Mozilla unveils the DeepSpeech 0.6 speech recognition engine

Mozilla has released DeepSpeech 0.6, a speech recognition engine that implements the speech recognition architecture of the same name, proposed by researchers at Baidu. The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license. It runs on Linux, Android, macOS and Windows. Performance is sufficient to use the engine on LePotato, Raspberry Pi 3 and Raspberry Pi 4 boards.

The package also includes trained models, sample audio files and command-line recognition tools. To integrate speech recognition into your own programs, ready-to-use modules are provided for Python, NodeJS, C++ and .NET (third-party developers have separately prepared modules for Rust and Go). The ready-made model is supplied only for English, but for other languages the attached instructions let you train the system yourself using voice data collected by the Common Voice project.

DeepSpeech is much simpler than traditional systems and at the same time provides higher recognition quality in the presence of extraneous noise. It bypasses traditional acoustic models and the concept of phonemes, instead using a well-optimized machine learning system based on a neural network, which eliminates the need to develop separate components to model various anomalies such as noise, echo and speech features.

The downside of this approach is that, to achieve high-quality recognition and train the neural network, the DeepSpeech engine requires a large amount of heterogeneous data, dictated in real conditions by different voices and in the presence of natural noise. Such data is collected by the Common Voice project, created at Mozilla, which provides a verified dataset with 780 hours of English, 325 hours of German, 173 hours of French and 27 hours of Russian.

The ultimate goal of the Common Voice project is to accumulate 10 thousand hours of recordings of various pronunciations of typical phrases of human speech, which would make it possible to reach an acceptable level of recognition errors. At present, project participants have already dictated a total of 4.3 thousand hours, of which 3.5 thousand have been verified. The final English model for DeepSpeech was trained on 3816 hours of speech, covering, in addition to Common Voice, data from the LibriSpeech, Fisher and Switchboard projects, as well as about 1700 hours of transcribed radio show recordings.

When using the ready-to-download English model, the recognition error rate in DeepSpeech is 7.5% when evaluated on the LibriSpeech test set. For comparison, the error rate for human recognition is estimated at 5.83%.
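To make the error-rate figures concrete, here is a minimal sketch of how such a metric is commonly computed, assuming the figure is a word error rate (the text above only says "recognition error rate"). The function name `word_error_rate` and the sample sentences are illustrative and are not taken from DeepSpeech.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one substituted word and one dropped word.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.1%}")
```

Under this definition, a rate of 7.5% corresponds to roughly one wrong, missing or extra word for every 13 words of reference text.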

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning methods to compute the probability that certain characters are present in the input sound. The decoder uses a beam search algorithm to convert the character probability data into a text representation.
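The decoding step can be illustrated with a small beam search over per-frame character probabilities. This is only a sketch under stated assumptions: it assumes a CTC-style output in which the last index of every frame is a "blank" symbol and repeated characters are merged (the text above does not describe the output format), and it leaves out language-model scoring entirely. The function `ctc_beam_search`, the two-letter alphabet and the probabilities are all made up for illustration and are not the DeepSpeech decoder.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(*values):
    """Return log(sum(exp(v))) for log-probabilities, tolerating -inf."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def ctc_beam_search(frames, alphabet, beam_width=8):
    """Decode frames of probabilities; index len(alphabet) is the blank."""
    blank = len(alphabet)
    # prefix -> (log-prob of paths ending in blank, log-prob ending in a character)
    beams = {(): (0.0, NEG_INF)}
    for frame in frames:
        candidates = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_blank, p_char) in beams.items():
            for index, prob in enumerate(frame):
                if prob <= 0.0:
                    continue
                lp = math.log(prob)
                if index == blank:
                    # A blank keeps the prefix unchanged.
                    b, c = candidates[prefix]
                    candidates[prefix] = (log_add(b, p_blank + lp, p_char + lp), c)
                    continue
                extended = prefix + (index,)
                b, c = candidates[extended]
                if prefix and prefix[-1] == index:
                    # Same character again: a new copy needs a blank in between...
                    candidates[extended] = (b, log_add(c, p_blank + lp))
                    # ...otherwise the repeat collapses into the existing prefix.
                    b, c = candidates[prefix]
                    candidates[prefix] = (b, log_add(c, p_char + lp))
                else:
                    candidates[extended] = (b, log_add(c, p_blank + lp, p_char + lp))
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda item: log_add(*item[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix = max(beams.items(), key=lambda item: log_add(*item[1]))[0]
    return "".join(alphabet[i] for i in best_prefix)


# Three frames over the alphabet "ab" plus blank: mostly 'a', then blank, then 'b'.
frames = [
    [0.90, 0.05, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.90, 0.05],
]
decoded = ctc_beam_search(frames, "ab")
print(decoded)
```

Keeping several candidate prefixes at each step, rather than only the single most likely character, is what lets a decoder of this kind combine the probability of different frame alignments that spell the same text.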

Main innovations in DeepSpeech 0.6 (the 0.6 branch is not compatible with previous releases and requires updating both code and models):

  • A new streaming decoder has been introduced that offers higher responsiveness and does not depend on the size of the audio data being processed. As a result, the new version of DeepSpeech reduced recognition latency to 260 ms, which is 73% faster than before, and allows DeepSpeech to be used in on-the-fly speech recognition solutions.
  • Changes have been made to the API and function names have been unified. Functions have been added to obtain additional timing metadata, which make it possible not only to receive a text representation as output but also to track how individual characters and sentences are bound to positions in the audio stream.
  • Support for using the CuDNN library has been added to the model training toolkit to optimize work with recurrent neural networks (RNN). This made it possible to achieve a significant (approximately twofold) increase in model training performance, but required changes to the code that broke compatibility with previously prepared models.
  • The minimum required TensorFlow version has been raised from 1.13.1 to 1.14.0. Support has been added for the lightweight TensorFlow Lite edition, which reduces the size of the DeepSpeech package from 98 MB to 3.7 MB. For use on embedded and mobile devices, the size of the packaged model file has also been reduced from 188 MB to 47 MB (quantization is used to compress the model after training).
  • The language model has been converted to a different data structure format that allows files to be memory-mapped when loaded. Support for the old format has been discontinued.
  • The mode of loading the language model file has been changed, which reduced memory consumption and decreased delays when processing the first request after the model is created. During operation, DeepSpeech now consumes 22 times less memory and starts 500 times faster.

  • Rare words have been filtered out of the language model. The total number of words has been reduced to the 500 thousand most popular words found in the text used to train the model. The cleanup made it possible to reduce the size of the language model from 1800 MB to 900 MB, with almost no effect on the recognition error rate.
  • Support has been added for various techniques for creating additional variations (augmentation) of the audio data used in training (for example, adding distortion or noise to the set of options).
  • A library with bindings for integration with applications based on the .NET platform has been added.
  • The documentation has been reworked and is now collected on a separate website, deepspeech.readthedocs.io.
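The post-training quantization mentioned in the list above can be pictured with a small sketch. It assumes simple symmetric 8-bit linear quantization of 32-bit floating-point weights; the text does not say which scheme is used, although a fourfold reduction (188 MB to 47 MB) is what replacing 4-byte values with 1-byte codes would give. The helper names and the weight values are hypothetical.

```python
import array


def quantize_int8(weights):
    """Map float weights to int8 codes plus a single float scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = array.array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [code * scale for code in codes]


# Made-up weights stored as 32-bit floats.
weights = array.array("f", [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.45, -0.81])
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

float_bytes = len(weights) * weights.itemsize  # 4 bytes per float32 value
int8_bytes = len(codes) * codes.itemsize       # 1 byte per int8 code
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(float_bytes, int8_bytes, max_error)
```

The cost of the smaller file is a bounded rounding error per weight: at most half of one quantization step, which is `scale / 2` in this sketch.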

Source: opennet.ru
