Code for the Whisper speech recognition and translation system has been opened

The OpenAI project, which develops public projects in the field of artificial intelligence, has published its work on the Whisper speech recognition system. For English speech, the system is claimed to deliver automatic recognition with reliability and accuracy close to human recognition. The code of the reference implementation, based on the PyTorch framework, has been opened together with a set of pre-trained, ready-to-use models. The code is released under the MIT license.

The model was trained on 680 thousand hours of speech data gathered from several collections covering different languages and subject areas. About one third of the speech data used in training is in languages other than English. The proposed system correctly handles situations such as accented pronunciation, background noise, and the use of technical jargon. In addition to transcribing speech into text, the system can translate speech from any language into English and detect when speech appears in an audio stream.
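As a rough illustration of the transcription and translation capabilities described above, the following Python sketch wraps both tasks in one helper. It assumes the `openai-whisper` package, whose model objects expose a `transcribe` method that accepts a `task` option and returns a dictionary with `text` and `language` keys. The helper name `transcribe_and_translate` and the file name `speech.mp3` are hypothetical and chosen only for this example.

```python
def transcribe_and_translate(model, audio_path):
    """Return the transcript in the spoken language plus an English translation.

    `model` is assumed to behave like a loaded Whisper model: its
    `transcribe` method returns a dict with "text" and "language" keys.
    """
    # Default task: transcribe in the language that was spoken.
    native = model.transcribe(audio_path)
    # Same audio, but ask the model to translate into English.
    english = model.transcribe(audio_path, task="translate")
    return {
        "language": native["language"],
        "text": native["text"],
        "english": english["text"],
    }

# Possible usage, assuming the package and a local audio file are available:
#   import whisper
#   model = whisper.load_model("tiny")
#   print(transcribe_and_translate(model, "speech.mp3"))
```

Passing the model in as an argument keeps the helper independent of how the model was loaded, so it can be exercised with a stand-in object when no GPU or model weights are at hand.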

The models come in two flavors: a model for English only and a multilingual model, which also supports Russian, Ukrainian, and Belarusian. Each flavor is in turn divided into 5 variants that differ in size and in the number of parameters in the model. The larger the size, the higher the accuracy and quality of recognition, but also the higher the GPU video memory requirements and the lower the performance. For example, the smallest variant has 39 million parameters and requires 1 GB of video memory, while the largest has 1550 million parameters and requires 10 GB of video memory. The smallest variant is 32 times faster than the largest.
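The size trade-off above can be expressed as a small lookup, sketched here in Python. Only the two variants for which the article gives figures are listed; the names `tiny` and `large` are assumed labels for the smallest and largest variants, and the `Variant` class and `pick_variant` helper are hypothetical. The three intermediate sizes are left out because the article gives no numbers for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str             # assumed label, not taken from the article
    params_millions: int  # number of parameters, in millions
    vram_gb: int          # required GPU video memory, in GB
    relative_speed: int   # speed relative to the largest variant


# Figures as quoted in the article; intermediate sizes are omitted.
VARIANTS = [
    Variant("tiny", 39, 1, 32),
    Variant("large", 1550, 10, 1),
]


def pick_variant(available_vram_gb: float) -> Optional[Variant]:
    """Return the largest listed variant that fits in the given video memory."""
    fitting = [v for v in VARIANTS if v.vram_gb <= available_vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda v: v.params_millions)
```

With these two entries, a GPU with 8 GB of video memory would fall back to the smallest variant, while one with 12 GB could run the largest.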

The system uses a Transformer neural network architecture consisting of an encoder and a decoder that interact with each other. The audio is split into 30-second chunks, which are converted into a log-Mel spectrogram and passed to the encoder. The encoder output is sent to the decoder, which predicts a text representation interleaved with special tokens. These tokens allow a single general model to handle tasks such as language detection, timestamping of when phrases are spoken, transcription of speech in different languages, and translation into English.
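The first step of that pipeline, cutting audio into 30-second chunks, can be sketched in plain Python as follows. This is a simplified illustration and not the project's actual code: the 16 kHz sample rate and the zero-padding of the final chunk are assumptions, the function name `split_into_chunks` is hypothetical, and the real implementation may choose chunk boundaries differently. The log-Mel conversion and the encoder are not shown.

```python
SAMPLE_RATE = 16_000   # assumption: 16 kHz mono audio
CHUNK_SECONDS = 30     # chunk length stated in the article


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a sequence of audio samples into fixed-length chunks.

    The last chunk is padded with zeros (silence) so that every chunk
    has exactly sample_rate * chunk_seconds samples.
    """
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad a short final chunk up to the full length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks
```

Under these assumptions, a 70-second recording yields three chunks, the last of which holds 10 seconds of audio followed by 20 seconds of padding.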

Source: opennet.ru
