Also included in the set
DeepSpeech is much simpler than traditional systems and at the same time provides higher-quality recognition in the presence of extraneous noise. It bypasses traditional acoustic models and the concept of phonemes. Instead it uses a heavily optimized machine learning system based on a neural network, which removes the need to develop separate components for modeling anomalies such as noise, echo, and speech peculiarities.
The downside of this approach is that, to achieve high-quality recognition and to train the neural network, the DeepSpeech engine needs a large amount of heterogeneous data, dictated in real-world conditions by different voices and with natural background noise.
Common Voice, a project created at Mozilla, collects such data.
The main goal of the Common Voice project is to collect 10 thousand hours of recordings of different pronunciations of typical phrases of human speech, which should make it possible to reach an acceptable level of recognition errors. So far, project participants have dictated a total of 4.3 thousand hours, of which 3.5 thousand have been verified. The final English language model for DeepSpeech was trained on 3816 hours of speech. In addition to Common Voice, this covered data from the LibriSpeech, Fisher and Switchboard projects, and included about 1700 hours of transcribed radio show recordings.
With the ready-made English language model offered for download, the recognition error rate in DeepSpeech is 7.5% when evaluated on the test set.
DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning methods to compute the probability that particular characters are present in the input audio. The decoder uses a beam search algorithm to convert the character probability data into a text representation.
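The decoder's role can be illustrated with a toy beam search. The sketch below assumes that the acoustic model emits, for each time step, a probability for every character plus a CTC-style blank symbol, and that repeated characters are collapsed unless a blank separates them. That assumption, the function name `ctc_beam_search`, and the tiny alphabet are illustrative only and are not taken from the DeepSpeech code.

```python
from collections import defaultdict


def ctc_beam_search(probs, alphabet, beam_width=4, blank=0):
    """Toy prefix beam search over per-time-step character probabilities.

    probs      -- list of time steps, each a list of probabilities, one per
                  entry of `alphabet` (index `blank` is the blank symbol)
    alphabet   -- list of symbols; alphabet[blank] is never emitted
    beam_width -- number of candidate prefixes kept after each time step
    """
    # Each prefix maps to (probability of ending in blank,
    #                      probability of ending in a non-blank symbol).
    beams = {(): (1.0, 0.0)}

    for step in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_char) in beams.items():
            for idx, p in enumerate(step):
                if idx == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_blank + p_char) * p
                elif prefix and prefix[-1] == idx:
                    # Same symbol again: it merges into the prefix unless
                    # a blank came in between, in which case it is new.
                    next_beams[prefix][1] += p_char * p
                    next_beams[prefix + (idx,)][1] += p_blank * p
                else:
                    # A different symbol always extends the prefix.
                    next_beams[prefix + (idx,)][1] += (p_blank + p_char) * p

        # Keep only the most probable prefixes (the "beam").
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}

    best_prefix = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(alphabet[i] for i in best_prefix)


if __name__ == "__main__":
    alphabet = ["-", "a", "b"]  # "-" stands for the blank symbol
    probs = [
        [0.1, 0.8, 0.1],
        [0.8, 0.1, 0.1],
        [0.1, 0.1, 0.8],
    ]
    print(ctc_beam_search(probs, alphabet))
```

A real decoder would work with log probabilities and would typically also score candidates with a language model. Both are left out here to keep the search itself visible.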
Key changes:
- A new streaming decoder is offered that is more responsive and does not depend on the size of the audio data being processed. As a result, the new version of DeepSpeech reduces recognition latency to 260 ms, which is 73% faster than before, and allows DeepSpeech to be used in on-the-fly speech recognition solutions.
- Changes have been made to the API, and function names have been unified. Functions have been added for obtaining additional timing metadata, so that the output is not limited to a text representation: individual characters and sentences can also be traced back to their position in the audio stream.
- Support for the CuDNN library has been added to the model training toolkit to optimize work with recurrent neural networks (RNN). This gives a significant (roughly twofold) increase in model training performance, but required code changes that break compatibility with previously prepared models.
- The minimum required TensorFlow version has been raised from 1.13.1 to 1.14.0. Support has been added for the lightweight TensorFlow Lite edition, which reduces the size of the DeepSpeech package from 98 MB to 3.7 MB. For use on embedded and mobile devices, the size of the packed model file has also been reduced from 188 MB to 47 MB (quantization is applied to compress the model after it has been trained).
- The language model has been converted to a different data structure format that allows files to be memory-mapped when loaded. Support for the old format has been discontinued.
- The way the language model file is loaded has been changed, which reduces memory consumption and shortens the delay when the first request is processed after the model is created. At run time, DeepSpeech now uses 22 times less memory and starts 500 times faster.
- Rare words have been filtered out of the language model. The vocabulary has been reduced to the 500 thousand most frequent words found in the text used to train the model. This cleanup cut the size of the language model from 1800 MB to 900 MB without affecting the recognition error rate.
- Support has been added for various techniques for creating additional variations (augmentation) of the audio data used in training, for example adding distortion or noise to the set of samples.
- A library with bindings for integration with applications based on the .NET platform has been added.
- The documentation has been reworked and is now collected on a separate website, deepspeech.readthedocs.io.
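The drop from 188 MB to 47 MB mentioned in the list above is a factor of four, which is what one would expect if 32-bit floating-point weights are stored as 8-bit integers. The article does not describe the exact scheme, so the following is only a minimal sketch of affine 8-bit quantization applied after training. The function names `quantize` and `dequantize` are hypothetical and this is not the TensorFlow Lite implementation.

```python
from array import array


def quantize(weights):
    """Map float weights onto 256 evenly spaced levels (one byte each).

    Returns the byte array together with the scale and offset that are
    needed to approximately restore the original values.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    levels = (min(255, max(0, round((w - lo) / scale))) for w in weights)
    return array("B", levels), scale, lo


def dequantize(quantized, scale, lo):
    """Approximately restore float weights from their 8-bit representation."""
    return [lo + level * scale for level in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    quantized, scale, lo = quantize(weights)

    float_bytes = array("f", weights).itemsize * len(weights)
    quant_bytes = quantized.itemsize * len(quantized)
    print("bytes as float32:", float_bytes, "bytes as uint8:", quant_bytes)
    print("restored:", dequantize(quantized, scale, lo))
```

The restored values differ from the originals by at most half a quantization step, which is the price paid for the smaller file.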
Source: opennet.ru