Stable Diffusion machine learning system adapted for music synthesis

The Riffusion project is developing a variant of the Stable Diffusion machine learning system adapted to generate music instead of images. Music can be synthesized from a natural-language text description or based on a supplied template. The music synthesis components are written in Python using the PyTorch framework and are available under the MIT license. The interface layer is implemented in TypeScript and is also distributed under the MIT license. The trained models are released under the Creative ML OpenRAIL-M license, which allows commercial use.

The project is interesting because it keeps using the "text-to-image" and "image-to-image" models to generate music, but treats spectrograms as the images. In other words, the classic Stable Diffusion is trained not on photographs and pictures but on images of spectrograms, which show how the frequency and amplitude of a sound wave change over time. Accordingly, the output is also a spectrogram, which is then converted into an audio representation.

The method can also be used to modify existing audio compositions and to synthesize music from a sample, analogous to image modification in Stable Diffusion. For example, generation can sample spectrograms with a reference style, combine different styles, make a smooth transition from one style to another, or alter existing audio to solve tasks such as raising the volume of individual instruments, changing the rhythm, and replacing instruments. Samples are also used to generate long-playing compositions, built from a series of closely related passages that vary slightly over time. Separately generated passages are joined into a continuous stream by interpolating the model's internal parameters.
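The interpolation between separately generated passages can be illustrated with a small sketch. It assumes the "internal parameters" are latent arrays and uses spherical linear interpolation (slerp), a common choice for diffusion latents; the text above does not say which interpolation scheme the project uses, and the function name, array shapes, and step count here are hypothetical.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-6):
    """Spherical linear interpolation between two arrays of the same shape.

    t = 0 returns v0, t = 1 returns v1. The angle is measured between the
    flattened, normalised arrays.
    """
    v0 = np.asarray(v0, dtype=float)
    v1 = np.asarray(v1, dtype=float)
    a = v0.ravel() / np.linalg.norm(v0)
    b = v1.ravel() / np.linalg.norm(v1)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    if abs(dot) > 1.0 - eps:
        # Nearly parallel or anti-parallel: fall back to linear interpolation
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(dot)
    s = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1

# Hypothetical latents for two neighbouring passages (shape is an assumption)
rng = np.random.default_rng(0)
latent_a = rng.standard_normal((4, 64, 64))
latent_b = rng.standard_normal((4, 64, 64))

# Five intermediate latents that move gradually from passage A to passage B
transition = [slerp(t, latent_a, latent_b) for t in np.linspace(0.0, 1.0, 5)]
```

Each intermediate latent would then be decoded into its own spectrogram, so consecutive passages differ only slightly and the stream sounds continuous.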

A windowed Fourier transform is used to build a spectrogram from audio. Converting a spectrogram back into audio runs into the problem of determining the phase (the spectrogram contains only frequency and amplitude), so the Griffin-Lim approximation algorithm is used for the reconstruction.
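The round trip described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's code: the window type (Hann), frame size, hop length, iteration count, and test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(signal, n_fft=256, hop=64):
    """Windowed Fourier transform: rows are frequency bins, columns are frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(spec, n_fft=256, hop=64):
    """Inverse transform by windowed overlap-add, normalised by the squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1)
    length = n_fft + hop * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame * window
        norm[i * hop : i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=50, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose spectrogram magnitude matches `magnitude`."""
    rng = np.random.default_rng(seed)
    # The phase is unknown, so start from a random guess
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Go to the time domain and back, keep only the resulting phase,
        # and re-impose the known magnitudes on the next pass
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)

# Example: discard the phase of a 440 Hz tone, then estimate it again
sample_rate = 8000
t = np.arange(4096) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
magnitude = np.abs(stft(tone))          # what a spectrogram image retains
audio = griffin_lim(magnitude, n_iter=30)
```

The result is an approximation: the recovered waveform has a spectrogram close to the target, but its phase generally differs from that of the original recording.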



Source: opennet.ru
