Mozilla releases the DeepSpeech 0.6 speech recognition engine

Mozilla has published the release of the DeepSpeech 0.6 speech recognition engine, which implements the speech recognition architecture of the same name proposed by researchers at Baidu. The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license. It supports operation on Linux, Android, macOS and Windows. The engine's performance is sufficient to run it on LePotato, Raspberry Pi 3 and Raspberry Pi 4 boards.

The release also includes a set of trained models, sample audio files and command-line recognition tools. To embed speech recognition into your programs, ready-to-use modules for Python, NodeJS, C++ and .NET are offered (third-party developers have prepared separate modules for Rust and Go). The finished model is supplied only for English, but for other languages you can, following the attached instructions, train the system yourself using the voice data collected by the Common Voice project.

DeepSpeech is considerably simpler than traditional systems while at the same time providing higher recognition quality in the presence of background noise. It bypasses classical acoustic models and the concept of phonemes, instead using a well-optimized neural-network-based machine learning system that removes the need for separate components to model various anomalies such as noise, echo and speech peculiarities.

The downside of this approach is that, to achieve high-quality recognition and training of the neural network, the DeepSpeech engine requires a large amount of heterogeneous data, dictated in real-world conditions by different voices and in the presence of background noise.
The Common Voice project created at Mozilla collects such data; it provides a verified dataset with 780 hours of English, 325 of German, 173 of French and 27 hours of Russian.

The ultimate goal of the Common Voice project is to accumulate 10 thousand hours of recordings of various pronunciations of typical human speech, which would allow an acceptable level of recognition error to be reached. In its current form, project participants have already dictated 4.3 thousand hours, of which 3.5 thousand have passed verification. The final English DeepSpeech model was trained on 3816 hours of speech, which, in addition to Common Voice, cover data from the LibriSpeech, Fisher and Switchboard projects, and also include about 1700 hours of transcribed radio show recordings.

When using the ready-made English model offered for download, the recognition error rate in DeepSpeech is 7.5% when evaluated on the LibriSpeech test set. For comparison, the human recognition error rate is estimated at 5.83%.
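The 7.5% figure is a word error rate (WER): the word-level edit distance between the recognized transcript and the reference transcript, divided by the number of reference words. A minimal sketch of how such a metric is computed (the sample sentences are invented for illustration, not taken from LibriSpeech):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words -> WER of 0.25
print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```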

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning methods to compute the probability of particular characters being present in the input sound. The decoder uses a beam search algorithm to convert the character probability data into a text representation.
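To illustrate the decoder's role, here is a toy beam search over per-frame character probabilities. This is a deliberately simplified stand-in, not DeepSpeech's actual decoder, which additionally handles CTC blank symbols and language-model scoring; the frame probabilities below are made up:

```python
import math

def beam_search(frame_probs, beam_width=2):
    """frame_probs: one dict per audio frame mapping character -> probability.
    Keeps only the beam_width most probable prefixes at each step."""
    beams = [("", 0.0)]  # (prefix, log probability)
    for probs in frame_probs:
        candidates = []
        for prefix, logp in beams:
            for ch, p in probs.items():
                candidates.append((prefix + ch, logp + math.log(p)))
        # prune to the best beam_width hypotheses
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

# Three frames of (invented) acoustic-model output:
frames = [
    {"c": 0.9, "k": 0.1},
    {"a": 0.6, "o": 0.4},
    {"t": 0.8, "d": 0.2},
]
print(beam_search(frames))  # "cat"
```

Pruning to a fixed beam width is what keeps decoding tractable: instead of scoring every possible character sequence, only a handful of the most promising prefixes survive each frame.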

Key innovations in DeepSpeech 0.6 (the 0.6 branch is not backwards compatible with earlier releases and requires code and model updates):

  • A new streaming decoder has been proposed that provides higher responsiveness and does not depend on the size of the processed audio data. Thanks to it, the new version of DeepSpeech manages to reduce recognition latency to 260 ms, which is 73% faster than before, and allows DeepSpeech to be used in on-the-fly speech recognition solutions.
  • The API has been changed and work has been done to unify function names. Functions have been added for obtaining additional metadata about timing, allowing you not only to receive a text representation as output, but also to track the binding of individual characters and sentences to positions in the audio stream.
  • The training toolkit has gained support for the CuDNN library for optimizing work with recurrent neural networks (RNN), which made it possible to achieve a significant (roughly twofold) increase in model training performance, but required code changes that broke compatibility with previously prepared models.
  • The minimum required TensorFlow version has been raised from 1.13.1 to 1.14.0. Support for the lightweight TensorFlow Lite edition has been added, which reduces the size of the DeepSpeech package from 98 MB to 3.7 MB. For use on embedded and mobile devices, the size of the packaged model file has also been reduced from 188 MB to 47 MB (quantization is applied for compression after the model has been trained).
  • The language model has been moved to a different data structure format that allows files to be mapped into memory when loaded. Support for the old format has been discontinued.
  • The way the language model file is loaded has been changed, which reduced memory consumption and the latency of processing the first request after model creation. During operation, DeepSpeech now consumes 22 times less memory and starts 500 times faster.


  • Rare words are now filtered out of the language model. The total vocabulary has been reduced to the 500 thousand most popular words found in the texts used to train the model. This cleanup made it possible to reduce the size of the language model from 1800 MB to 900 MB with virtually no effect on the recognition rate.
  • Support has been added for various techniques for creating additional variations (augmentation) of the audio data used in training (for example, adding distortion or noise).
  • A library with bindings for integration with applications based on the .NET platform has been added.
  • The documentation has been reworked and is now collected on a single site: deepspeech.readthedocs.io.
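The roughly fourfold shrinkage of the packaged model (188 MB to 47 MB) comes from post-training quantization: 32-bit float weights are stored as 8-bit integers plus a per-tensor scale factor and are expanded back on load. A minimal sketch of the idea (an illustration of the general technique, not DeepSpeech's actual quantization code):

```python
def quantize(weights, bits=8):
    """Map float weights to signed integers plus one scale factor per tensor.
    Storing int8 instead of float32 gives roughly a 4x size reduction."""
    qmax = 2 ** (bits - 1) - 1                   # 127 for 8-bit signed
    scale = max(abs(w) for w in weights) / qmax  # largest weight maps to +/-qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights at load time."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Each restored value is within one quantization step (scale) of the original:
print(max(abs(a - b) for a, b in zip(weights, restored)) <= scale)  # True
```

The trade-off is a small, bounded rounding error per weight in exchange for the large reduction in storage, which is why quantization is applied only after training is finished.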

Source: opennet.ru
