Mozilla unveils the DeepSpeech 0.6 speech recognition engine

Mozilla has released version 0.6 of its DeepSpeech speech recognition engine, which implements the speech recognition architecture of the same name proposed by researchers at Baidu. The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license. It runs on Linux, Android, macOS and Windows. Performance is sufficient to run the engine on LePotato, Raspberry Pi 3 and Raspberry Pi 4 boards.

The package also includes trained models, sample audio files and command-line recognition tools. To integrate speech recognition into your own programs, ready-to-use modules are offered for Python, NodeJS, C++ and .NET (third-party developers have separately prepared modules for Rust and Go). A ready-made model is supplied only for English, but for other languages you can train the system yourself by following the accompanying instructions and using voice data collected by the Common Voice project.

DeepSpeech is considerably simpler than traditional systems and at the same time provides higher recognition quality in the presence of extraneous noise. It bypasses traditional acoustic models and the concept of phonemes, instead using a heavily optimized machine learning system based on a neural network, which eliminates the need to develop separate components for modeling various anomalies such as noise, echo and speech characteristics.

The downside of this approach is that, to achieve high-quality recognition and train the neural network, the DeepSpeech engine requires a large amount of heterogeneous data, dictated in real-world conditions by different voices and in the presence of natural noise. Such data is collected by the Common Voice project created at Mozilla, which provides a validated dataset with 780 hours of English, 325 hours of German, 173 hours of French and 27 hours of Russian.

The ultimate goal of the Common Voice project is to accumulate 10 thousand hours of recordings of varied pronunciations of typical phrases of human speech, which should make it possible to reach an acceptable recognition error rate. So far, project participants have dictated a total of 4.3 thousand hours, of which 3.5 thousand have been validated. The final English model for DeepSpeech was trained on 3816 hours of speech, which, in addition to Common Voice, includes data from the LibriSpeech, Fisher and Switchboard projects, as well as about 1700 hours of transcribed radio show recordings.

With the ready-made English model offered for download, the recognition error rate in DeepSpeech is 7.5% when evaluated on the LibriSpeech test set. For comparison, the error rate for recognition by humans is estimated at 5.83%.
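
The article does not say how the error rate is measured. Results on LibriSpeech are conventionally reported as word error rate (WER), so the sketch below assumes that metric: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the sample sentences are illustrative and do not come from DeepSpeech.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn the lights off", "turn the light off"))  # 0.25
```

Under this definition, a 7.5% rate would correspond to roughly 7 or 8 word-level errors per 100 reference words.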

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning methods to compute the probability that particular characters are present in the input audio. The decoder uses a beam search algorithm to convert the character probability data into a text representation.
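
To make the decoder's role more concrete, here is a minimal beam search sketch. It assumes the acoustic model emits, for each audio frame, a probability for every character plus a CTC-style "blank" symbol; the article does not describe the output format, so treat this as an assumption. The sketch is not the DeepSpeech decoder and uses no language model. The function name, alphabet and probabilities are made up for illustration.

```python
from collections import defaultdict


def ctc_beam_search(frames, alphabet, beam_width=4):
    """Decode per-frame character probabilities with a prefix beam search.

    frames: list of rows; each row has len(alphabet) + 1 probabilities,
            the last entry being the blank symbol (an assumed layout).
    """
    blank = len(alphabet)
    # prefix -> (probability of paths ending in blank, ending in non-blank)
    beams = {(): (1.0, 0.0)}
    for row in frames:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            for index, p in enumerate(row):
                if index == blank:
                    # Blank keeps the prefix unchanged.
                    candidates[prefix][0] += (p_blank + p_non_blank) * p
                elif prefix and prefix[-1] == index:
                    # Same character again: it only starts a new character
                    # if a blank separated the two; otherwise it collapses.
                    candidates[prefix + (index,)][1] += p_blank * p
                    candidates[prefix][1] += p_non_blank * p
                else:
                    candidates[prefix + (index,)][1] += (p_blank + p_non_blank) * p
        # Keep only the most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(probs) for prefix, probs in ranked[:beam_width]}
    best = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(alphabet[i] for i in best)


# Columns: 'a', 'b', blank. Frames favour: a, a, blank, a, b.
frames = [
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.05, 0.05, 0.90],
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
]
print(ctc_beam_search(frames, "ab"))  # aab
```

The first two frames collapse into a single "a", while the blank in the third frame lets the next "a" count as a new character.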

Main innovations in DeepSpeech 0.6 (the 0.6 branch is not compatible with previous releases and requires updating both code and models):

  • A new streaming decoder is introduced that provides higher responsiveness and does not depend on the size of the audio data being processed. As a result, the new version of DeepSpeech reduces recognition latency to 260 ms, which is 73% faster than before, and allows DeepSpeech to be used in on-the-fly speech recognition solutions.
  • Changes have been made to the API, and function names have been unified. Functions have been added for obtaining additional timing metadata, so that in addition to the text output you can track how individual characters and sentences map to positions in the audio stream.
  • Support for the CuDNN library has been added to the training toolkit to optimize work with recurrent neural networks (RNN). This yields a significant (roughly twofold) increase in model training performance, but required code changes that break compatibility with previously prepared models.
  • The minimum required TensorFlow version has been raised from 1.13.1 to 1.14.0. Support has been added for the lightweight TensorFlow Lite edition, which reduces the size of the DeepSpeech package from 98 MB to 3.7 MB. For use on embedded and mobile devices, the size of the packed model file has also been reduced from 188 MB to 47 MB (quantization is used for compression after the model has been trained).
  • The language model has been converted to a different data structure format that allows files to be memory-mapped when loaded. Support for the old format has been discontinued.
  • The way the language model file is loaded has been changed, which reduces memory consumption and shortens the delay when processing the first request after the model is created. At runtime, DeepSpeech now consumes 22 times less memory and starts 500 times faster.

  • Rare words have been filtered out of the language model. The vocabulary has been reduced to the 500 thousand most popular words found in the text used to train the model. This cleanup reduced the size of the language model from 1800 MB to 900 MB with virtually no effect on the recognition error rate.
  • Support has been added for various techniques that create additional variations (augmentation) of the audio data used in training (for example, adding distortion or noise to the set of samples).
  • A library with bindings has been added for integration with applications based on the .NET platform.
  • The documentation has been reworked and is now collected on a separate site, deepspeech.readthedocs.io.
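
The reduction of the model file from 188 MB to 47 MB is a factor of four, which is what you would expect when 32-bit floating-point weights are stored as 8-bit integers. The article only says that quantization is used and does not describe the scheme, so the following is a generic sketch of 8-bit linear quantization rather than the procedure TensorFlow Lite applies. All names and values are illustrative.

```python
from array import array


def quantize(weights):
    """Map floats onto 256 evenly spaced levels between min and max."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant input
    codes = array(
        "B",
        (min(255, max(0, round((w - lo) / scale))) for w in weights),
    )
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float values from the 8-bit codes."""
    return [lo + code * scale for code in codes]


# Toy "model": a handful of 32-bit float weights.
weights = array("f", [-1.0, -0.25, 0.0, 0.5, 1.0])
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

float_bytes = len(weights) * weights.itemsize
quantized_bytes = len(codes) * codes.itemsize
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(float_bytes // quantized_bytes)  # 4
print(max_error <= scale / 2 + 1e-9)   # True
```

The price of the smaller file is a bounded rounding error per weight: at most half of one quantization step.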
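
The vocabulary pruning described in the list above can be sketched as a frequency count followed by a cut-off. This is only an illustration of the idea, not the script used by the project: the function name and the tiny corpus are made up, and the real cut-off is 500 thousand words rather than 2.

```python
from collections import Counter


def top_k_vocabulary(lines, k):
    """Return the k most frequent words in a text corpus."""
    counts = Counter(word for line in lines for word in line.lower().split())
    return {word for word, _ in counts.most_common(k)}


corpus = ["the cat sat", "the cat ran", "the dog"]
vocabulary = top_k_vocabulary(corpus, k=2)
print(sorted(vocabulary))  # ['cat', 'the']
```

Words outside the retained set would then be excluded when the language model is built.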

Source: opennet.ru
