Mozilla unveils the DeepSpeech 0.6 speech recognition engine

Mozilla has released DeepSpeech 0.6, a speech recognition engine that implements the speech recognition architecture of the same name proposed by researchers from Baidu. The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license. It runs on Linux, Android, macOS and Windows. Performance is sufficient to use the engine on LePotato, Raspberry Pi 3 and Raspberry Pi 4 boards.

The release also includes trained models, sample audio files and command-line recognition tools. To embed speech recognition in your own programs, ready-to-use modules are provided for Python, NodeJS, C++ and .NET (third-party developers have separately prepared modules for Rust and Go). The ready-made model is supplied for English only, but for other languages you can train the system yourself by following the accompanying instructions and using the voice data collected by the Common Voice project.

DeepSpeech is much simpler than traditional systems and at the same time provides higher recognition quality in the presence of extraneous noise. It does away with traditional acoustic models and the concept of phonemes, and instead uses a well-optimized machine learning system based on a neural network, which removes the need to develop separate components for modeling various deviations such as noise, echo and speech characteristics.

The downside of this approach is that, to achieve high-quality recognition and train the neural network, the DeepSpeech engine needs a large amount of heterogeneous data, dictated in real-world conditions by different voices and in the presence of natural noise.
Such data is collected by the Common Voice project created at Mozilla, which provides a validated dataset with 780 hours of English, 325 hours of German, 173 hours of French and 27 hours of Russian.

The ultimate goal of the Common Voice project is to accumulate 10 thousand hours of recordings of various pronunciations of typical phrases of human speech, which should make it possible to reach an acceptable level of recognition errors. So far, project participants have dictated a total of 4.3 thousand hours, of which 3.5 thousand have been validated. The final English model for DeepSpeech was trained on 3816 hours of speech, which, in addition to Common Voice, covers data from the LibriSpeech, Fisher and Switchboard projects and also includes about 1700 hours of transcribed radio show recordings.

With the ready-made English model offered for download, the recognition error rate of DeepSpeech is 7.5% when evaluated on the LibriSpeech test set. For comparison, the error rate of human recognition is estimated at 5.83%.
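To make the error-rate figures concrete, here is a minimal sketch of how such a metric is commonly computed, assuming the percentages are word error rates (the number of word-level edits divided by the number of words in the reference transcript). The function name and the sample sentences are illustrative and do not come from the DeepSpeech code base.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one substitution ("sat" -> "sit") and one
# missing word ("the") against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a 7.5% error rate corresponds to roughly 7 or 8 word-level edits per 100 reference words.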

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning methods to calculate the probability that particular characters are present in the input audio. The decoder uses a beam search algorithm to convert the character probabilities into a text representation.
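The decoder's role can be illustrated with a deliberately simplified beam search over per-frame character probabilities. This is a sketch under stated assumptions and not the DeepSpeech decoder: it assumes a CTC-style output with a blank symbol (which the article does not state), it scores individual frame-level paths rather than summing over all paths that collapse to the same text, and it uses no language model. The alphabet, the blank symbol `_` and the probability values are hypothetical.

```python
import math

BLANK = "_"  # hypothetical blank symbol (an assumption, not from the article)


def collapse(path):
    """CTC-style collapse: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)


def beam_search(frames, alphabet, beam_width=3):
    """Keep the `beam_width` most probable symbol paths after each frame.

    frames: one list of probabilities per audio frame, aligned with alphabet.
    Returns the collapsed text of the best path and its log probability.
    """
    beams = [((), 0.0)]  # (path so far, log probability)
    for probs in frames:
        candidates = []
        for path, score in beams:
            for ch, p in zip(alphabet, probs):
                if p > 0.0:
                    candidates.append((path + (ch,), score + math.log(p)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    best_path, best_score = beams[0]
    return collapse(best_path), best_score


# Made-up acoustic model output for four frames over a three-symbol alphabet.
ALPHABET = [BLANK, "h", "i"]
FRAMES = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.2, 0.2],
    [0.1, 0.1, 0.8],
]
decoded, log_prob = beam_search(FRAMES, ALPHABET)  # decoded == "hi"
```

A wider beam explores more alternatives at the cost of more computation. A production decoder would typically also merge paths that collapse to the same prefix and rescore candidates with a language model.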

Main innovations in DeepSpeech 0.6 (the 0.6 branch is not compatible with previous releases and requires code and models to be updated):

  • A new streaming decoder is offered that provides higher responsiveness and does not depend on the size of the audio data being processed. As a result, the new version of DeepSpeech reduces recognition latency to 260 ms, which is 73% faster than before, and allows DeepSpeech to be used in on-the-fly speech recognition solutions.
  • Changes have been made to the API, and function names have been unified. Functions have been added for obtaining additional metadata about synchronization, so that the output is not limited to a text representation: the binding of individual characters and sentences to a position in the audio stream can also be tracked.
  • Support for the CuDNN library has been added to the training toolkit to optimize work with recurrent neural networks (RNN). This made it possible to achieve a significant (roughly twofold) increase in model training performance, but required code changes that broke compatibility with previously prepared models.
  • The minimum required TensorFlow version has been raised from 1.13.1 to 1.14.0. Support has been added for the lightweight TensorFlow Lite edition, which reduces the size of the DeepSpeech package from 98 MB to 3.7 MB. For use on embedded and mobile devices, the size of the packed model file has also been reduced from 188 MB to 47 MB (quantization is applied to compress the model after training).
  • The language model has been converted to a different data structure format that allows files to be mapped into memory when loaded. Support for the old format has been discontinued.
  • The mode of loading the language model file has been changed, which has reduced memory consumption and reduced delays when processing the first request after the model is created. During operation, DeepSpeech now consumes 22 times less memory and starts 500 times faster.

  • Rare words have been filtered out of the language model. The total number of words was reduced to the 500 thousand most popular words found in the text used to train the model. The cleanup made it possible to reduce the size of the language model from 1800 MB to 900 MB, with practically no effect on the recognition error rate.
  • Support has been added for various techniques for creating additional variations (augmentation) of the audio data used in training (for example, adding variants with distortion or noise to the set).
  • A library with bindings for integration with applications based on the .NET platform has been added.
  • The documentation has been reworked and is now collected on a separate website, deepspeech.readthedocs.io.
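
The packed-model size reduction mentioned in the list above (188 MB to 47 MB) is a factor of exactly four, which is consistent with storing 32-bit floating-point weights as 8-bit integers. The article does not describe the exact scheme, so the following is only a minimal sketch of symmetric post-training quantization with a single shared scale factor. The function names and weight values are hypothetical and do not reflect the TensorFlow Lite implementation.

```python
from array import array


def quantize_int8(weights):
    """Map floats to signed 8-bit integers using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit representation."""
    return [q * scale for q in quantized]


# Made-up weights standing in for a trained model's parameters.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage per weight: 4 bytes as float32 versus 1 byte as int8.
float_bytes = array("f", weights).itemsize * len(weights)
int8_bytes = array("b", q).itemsize * len(q)
```

Each restored weight differs from the original by at most half the scale factor, which is the price paid for the fourfold storage saving in this sketch.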

Source: opennet.ru
