Also included in the kit
DeepSpeech is much simpler than traditional systems and at the same time provides higher-quality recognition in the presence of extraneous noise. It bypasses traditional acoustic models and the concept of phonemes, and instead uses a highly optimized machine learning system based on a neural network, which removes the need to develop separate components to model anomalies such as noise, echo, and speech characteristics.
The downside of this approach is that, to achieve high-quality recognition and to train the neural network, the DeepSpeech engine needs a large amount of heterogeneous data, dictated in real-world conditions by different voices and in the presence of natural noise.
The Common Voice project, created at Mozilla, collects such data.
The ultimate goal of the Common Voice project is to accumulate 10 thousand hours of recordings of varied pronunciations of typical phrases of human speech, which should make it possible to reach an acceptable recognition error rate. So far, project participants have dictated a total of 4.3 thousand hours, of which 3.5 thousand have been validated. The final English model for DeepSpeech was trained on 3816 hours of speech, which in addition to Common Voice covers data from the LibriSpeech, Fisher, and Switchboard projects, and also includes about 1700 hours of transcribed radio show recordings.
With the ready-made English model offered for download, the recognition error rate in DeepSpeech is 7.5% when evaluated on the LibriSpeech test set.
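For context, an error rate of this kind is commonly reported as word error rate (WER): the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The following is a minimal sketch, assuming that this is the metric behind the 7.5% figure; the function name and the sample sentences are illustrative and are not part of DeepSpeech.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition, a 7.5% rate corresponds to roughly one word-level error (substitution, deletion, or insertion) per 13 reference words.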
DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning methods to compute the probability that particular characters are present in the input audio. The decoder uses a beam search algorithm to convert the character probability data into a text representation.
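To illustrate how a beam search can turn per-frame character probabilities into text, here is a minimal sketch. It assumes a CTC-style output in which each frame also carries a probability for a "blank" symbol (stored last), which the description above does not spell out, and it omits any language model scoring. The function name, the parameters, and the toy probabilities are illustrative and are not the DeepSpeech API.

```python
from collections import defaultdict


def ctc_beam_search(frames, alphabet, beam_width=8):
    """Decode per-frame probabilities with a CTC prefix beam search.

    frames: list of probability lists of length len(alphabet) + 1;
            the last entry of each frame is the blank symbol (assumption).
    """
    blank = len(alphabet)
    # Each prefix maps to (probability of ending in blank, probability of ending in a character).
    beams = {(): (1.0, 0.0)}
    for frame in frames:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_char) in beams.items():
            for idx, p in enumerate(frame):
                if idx == blank:
                    # A blank keeps the prefix unchanged.
                    candidates[prefix][0] += (p_blank + p_char) * p
                elif prefix and prefix[-1] == idx:
                    # A repeated character extends the prefix only after a blank...
                    candidates[prefix + (idx,)][1] += p_blank * p
                    # ...otherwise it collapses into the same prefix.
                    candidates[prefix][1] += p_char * p
                else:
                    candidates[prefix + (idx,)][1] += (p_blank + p_char) * p
        # Keep only the most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}
    best = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(alphabet[i] for i in best)


if __name__ == "__main__":
    # Two frames over the alphabet "a"; blank is the most likely symbol in each frame.
    toy_frames = [[0.4, 0.6], [0.4, 0.6]]
    print(ctc_beam_search(toy_frames, "a"))
```

In the toy input, picking the most likely symbol per frame would yield an empty string, while the beam search sums the probability of every alignment that collapses to the same text and therefore prefers "a". This is the main reason a search over prefixes is used instead of a frame-by-frame argmax.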
Main changes:
- A new streaming decoder is introduced that is more responsive and does not depend on the size of the audio data being processed. As a result, the new version of DeepSpeech reduces recognition latency to 260 ms, which is 73% faster than before, and makes DeepSpeech usable in on-the-fly speech recognition solutions.
- Changes were made to the API, and function names were unified. Functions were added for obtaining additional timing metadata, so that besides the text representation of the output you can also track how individual characters and sentences map to positions in the audio stream.
- Support for the CuDNN library was added to the model training toolkit to optimize work with recurrent neural networks (RNN). This gave a significant (roughly twofold) increase in model training performance, but required code changes that broke compatibility with previously prepared models.
- The minimum required TensorFlow version was raised from 1.13.1 to 1.14.0. Support was added for the lightweight TensorFlow Lite edition, which reduces the size of the DeepSpeech package from 98 MB to 3.7 MB. For use on embedded and mobile devices, the size of the packed model file was also reduced, from 188 MB to 47 MB (quantization is applied to compress the model after it has been trained).
- The language model was converted to a different data structure format that allows files to be memory-mapped when loaded. Support for the old format has been discontinued.
- The way the language model file is loaded was changed, which reduced memory consumption and shortened the delay when processing the first request after the model is created. At run time, DeepSpeech now uses 22 times less memory and starts 500 times faster.
- Rare words were filtered out of the language model. The total vocabulary was reduced to the 500 thousand most popular words found in the text used to train the model. This cleanup reduced the size of the language model from 1800 MB to 900 MB, with practically no effect on recognition quality.
- Support was added for various techniques for creating additional variations (augmentation) of the audio data used in training (for example, adding distortion or noise to the set of samples).
- A library with bindings for integration with applications based on the .NET platform was added.
- The documentation was reworked and is now collected on a separate website, deepspeech.readthedocs.io.
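The vocabulary filtering mentioned in the list above amounts to counting word frequencies in the training text and keeping only the most common words. The following is a minimal sketch of that idea, assuming whitespace tokenization and lowercase normalization; the function name and the tiny corpus are illustrative and are not taken from the DeepSpeech tooling.

```python
from collections import Counter


def most_frequent_words(corpus_lines, max_words):
    """Return the set of the max_words most frequent words in the corpus."""
    counts = Counter(
        word
        for line in corpus_lines
        for word in line.lower().split()
    )
    return {word for word, _ in counts.most_common(max_words)}


if __name__ == "__main__":
    corpus = [
        "the cat sat",
        "the cat ran",
        "the dog sat quietly",
    ]
    # Keep only the three most frequent words; rarer ones are dropped.
    print(sorted(most_frequent_words(corpus, 3)))
```

In the release described above the cutoff is 500 thousand words; the sketch uses 3 only to keep the example readable.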
Source: opennet.ru