Mozilla introduces the DeepSpeech 0.6 speech recognition engine

The release of DeepSpeech 0.6, a speech recognition engine developed by Mozilla, has been announced. It implements the speech recognition architecture of the same name proposed by researchers from Baidu. The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license. It runs on Linux, Android, macOS and Windows. Performance is sufficient to use the engine on LePotato, Raspberry Pi 3 and Raspberry Pi 4 boards.

The package also includes trained models, sample audio files and command-line recognition tools. To integrate speech recognition into your own programs, ready-to-use modules are provided for Python, NodeJS, C++ and .NET (third-party developers have separately prepared modules for Rust and Go). The finished model is supplied for English only, but for other languages you can train the system yourself by following the attached instructions and using the voice data collected by the Common Voice project.

DeepSpeech is much simpler than traditional systems and at the same time provides higher-quality recognition in the presence of extraneous noise. It does without traditional acoustic models and the concept of phonemes. Instead, it uses a highly optimized machine learning system based on a neural network, which removes the need to develop separate components for modeling various anomalies such as noise, echo and speech characteristics.

The downside of this approach is that, to train the neural network and obtain high-quality recognition, the DeepSpeech engine needs a large amount of heterogeneous data, dictated in real-world conditions by different voices and in the presence of natural noise. Such data is collected by the Common Voice project created at Mozilla, which provides a validated dataset with 780 hours of English, 325 hours of German, 173 hours of French and 27 hours of Russian.

The ultimate goal of the Common Voice project is to accumulate 10 thousand hours of recordings of various pronunciations of typical phrases of human speech, which should make it possible to reach an acceptable level of recognition errors. So far, project participants have dictated a total of 4.3 thousand hours, of which 3.5 thousand have been validated. The final English model for DeepSpeech was trained on 3816 hours of speech which, in addition to Common Voice, cover data from the LibriSpeech, Fisher and Switchboard projects and include about 1700 hours of transcribed radio show recordings.

With the ready-made English model offered for download, the recognition error rate of DeepSpeech is 7.5% when evaluated on the LibriSpeech test set. For comparison, the error rate of human recognition is estimated at 5.83%.
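
To make the error-rate figures more concrete, here is a minimal sketch of how such a rate is commonly computed, assuming the figure is a word error rate (the article only says "recognition error rate"). The function name `word_error_rate` is hypothetical and not part of DeepSpeech.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the error rate is a word error rate (WER); this helper is
    illustrative and is not taken from the DeepSpeech code base.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, one wrong word in a four-word reference gives a rate of 25%, so a 7.5% rate corresponds to roughly one word error in every thirteen reference words.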

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning methods to compute the probability that particular characters are present in the input audio. The decoder uses a beam search algorithm to convert the character probability data into a text representation.
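
The decoder step can be illustrated with a simplified sketch. It runs a plain beam search over per-frame character probabilities and then applies CTC-style collapsing (merge repeats, drop blanks). This is an assumption-laden illustration, not the DeepSpeech decoder: a production decoder would typically also merge equivalent prefixes and score candidates with a language model. The names `BLANK`, `collapse` and `beam_search` are hypothetical.

```python
import math

BLANK = "_"  # hypothetical symbol standing for the "no character" output


def collapse(path):
    """Merge repeated symbols, then drop blanks (CTC-style collapsing)."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def beam_search(frames, alphabet, beam_width=3):
    """Keep the beam_width most probable symbol paths frame by frame.

    frames: list of probability lists, each aligned with alphabet.
    Returns the collapsed text of the best path and its log-probability.
    """
    beams = [((), 0.0)]  # (symbol path, accumulated log-probability)
    for probs in frames:
        candidates = []
        for path, score in beams:
            for sym, p in zip(alphabet, probs):
                if p > 0.0:  # skip impossible symbols to avoid log(0)
                    candidates.append((path + (sym,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best_path, best_score = beams[0]
    return collapse(best_path), best_score


alphabet = [BLANK, "a", "b"]
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.2, 0.2],
    [0.1, 0.1, 0.8],
]
text, log_prob = beam_search(frames, alphabet)
```

With these made-up probabilities the best path is `a a _ b`, which collapses to the text `ab`.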

Main innovations in DeepSpeech 0.6 (the 0.6 branch is not compatible with previous releases and requires code and model updates):

  • A new streaming decoder is proposed that provides higher responsiveness and does not depend on the size of the audio data being processed. As a result, the new version of DeepSpeech has reduced recognition latency to 260 ms, which is 73% faster than before, and allows DeepSpeech to be used in on-the-fly speech recognition solutions.
  • Changes have been made to the API, and function names have been unified. Functions have been added for obtaining additional timing metadata, so that the output is not only a text representation but also lets you trace individual characters and sentences back to their position in the audio stream.
  • Support for the CuDNN library has been added to the training toolkit to optimize work with recurrent neural networks (RNN). This made it possible to achieve a significant (roughly twofold) increase in model training performance, but required code changes that break compatibility with previously prepared models.
  • The minimum required TensorFlow version has been raised from 1.13.1 to 1.14.0. Support has been added for the lightweight TensorFlow Lite edition, which reduces the size of the DeepSpeech package from 98 MB to 3.7 MB. For use on embedded and mobile devices, the size of the packaged model file has also been reduced from 188 MB to 47 MB (quantization is used to compress the model after training).
  • The language model has been converted to a different data structure format that allows files to be memory-mapped when loaded. Support for the old format has been discontinued.
  • The way the language model file is loaded has been changed, which reduces memory consumption and shortens the delay when processing the first request after the model is created. At runtime, DeepSpeech now uses 22 times less memory and starts 500 times faster.

  • Rare words have been filtered out of the language model. The total number of words has been reduced to the 500 thousand most popular words found in the text used to train the model. The cleanup made it possible to reduce the size of the language model from 1800 MB to 900 MB without affecting the recognition error rate.
  • Support has been added for various techniques for creating additional variations (augmentation) of the audio data used in training (for example, adding distortion or noise to the set of samples).
  • A library with bindings for integration with applications based on the .NET platform has been added.
  • The documentation has been reworked and is now collected on a separate website, deepspeech.readthedocs.io.
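
The rare-word filtering mentioned in the list above can be pictured with a small sketch: count word frequencies in the training text and keep only the most frequent ones. This illustrates the general idea under stated assumptions and is not Mozilla's actual tooling. The function names and the `<unk>` placeholder are hypothetical, and the cutoff of 2 words stands in for the 500 thousand used in the release.

```python
from collections import Counter

UNKNOWN = "<unk>"  # hypothetical placeholder for words outside the vocabulary


def build_vocabulary(corpus_lines, max_words):
    """Return the max_words most frequent words in the corpus."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.lower().split())
    return {word for word, _ in counts.most_common(max_words)}


def replace_rare_words(line, vocabulary):
    """Replace every word outside the vocabulary with the placeholder."""
    return " ".join(
        word if word in vocabulary else UNKNOWN
        for word in line.lower().split()
    )


corpus = ["the cat sat", "the cat ran", "the dog"]
vocabulary = build_vocabulary(corpus, max_words=2)
filtered = replace_rare_words("the dog sat", vocabulary)
```

Here only `the` and `cat` survive the cutoff, so the rarer words in the sample line are replaced by the placeholder. A smaller vocabulary means fewer entries for the language model to store, which is the intuition behind the reported size reduction.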

Source: opennet.ru
