New release of the Silero speech synthesis system

A new public release of the Silero Text-to-Speech neural network speech synthesis system is available. The project's main goal is to build a modern, high-quality speech synthesis system that is not inferior to commercial solutions from large corporations and that anyone can use without expensive server hardware.

The models are distributed under the GNU AGPL license, but the company developing the project does not disclose how the models are trained. To run them, you can use PyTorch and frameworks that support the ONNX format. Speech synthesis in Silero is based on heavily modified modern neural network algorithms and digital signal processing methods.

The developers note that the main problem with modern neural network solutions for speech synthesis is that they are often available only as part of paid cloud services, while public products have high hardware requirements, are of lower quality, or are not finished, ready-to-use products. For example, to run VITS, one of the new popular end-to-end architectures, smoothly in synthesis mode (that is, not for model training), video cards with more than 16 gigabytes of VRAM are required.

Contrary to this trend, Silero runs successfully even on a single x86 thread of an Intel processor with AVX2 instructions. On 4 processor threads, it can synthesize 30 to 60 seconds of audio per second in 8 kHz mode, 15-20 seconds in 24 kHz mode, and about 10 seconds in 48 kHz mode.
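
To put these throughput figures in perspective, they can be converted into a real-time factor (processing time divided by audio duration). The sketch below only illustrates that arithmetic. The function names and the table are hypothetical, and the values are the approximate figures quoted above for 4 processor threads, not new measurements.

```python
from typing import Tuple

# Approximate throughput quoted above for 4 CPU threads:
# seconds of audio synthesized per second of wall-clock time.
# Ranges are stored as (low, high); the 48 kHz figure is a single estimate.
AUDIO_SECONDS_PER_SECOND = {
    8000: (30.0, 60.0),
    24000: (15.0, 20.0),
    48000: (10.0, 10.0),
}


def real_time_factor(audio_seconds_per_second: float) -> float:
    """Return processing time per second of audio (lower is faster)."""
    if audio_seconds_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / audio_seconds_per_second


def time_to_synthesize(audio_seconds: float, sample_rate: int) -> Tuple[float, float]:
    """Estimate (best, worst) wall-clock seconds needed to synthesize a clip."""
    low, high = AUDIO_SECONDS_PER_SECOND[sample_rate]
    return audio_seconds / high, audio_seconds / low
```

Under these assumptions, a one-minute clip at 24 kHz would take roughly 3 to 4 seconds to synthesize, which is `time_to_synthesize(60.0, 24000)`.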

Key features of the new Silero release:

  • Model size has been halved, to 50 megabytes;
  • The models can now insert pauses;
  • 4 high-quality Russian voices are available (plus an unlimited number of random ones). Pronunciation samples;
  • The models have become 10 times faster and, for example, in 24 kHz mode can synthesize up to 20 seconds of audio per second on 4 processor threads;
  • All voice options for one language are packed into a single model;
  • The models accept whole paragraphs of text as input, and SSML tags are supported;
  • Synthesis works at three selectable sampling rates: 8, 24, and 48 kilohertz;
  • "Childhood problems" have been fixed: instability and dropped words;
  • Flags have been added to control automatic stress placement and placement of the letter "ё".
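
As an illustration of the SSML input and the three sampling rates mentioned in the list above, here is a minimal sketch that prepares a paragraph with pauses and validates the requested rate. The helper names are hypothetical. The article does not say which SSML tags Silero accepts, so the standard SSML elements `<speak>`, `<p>`, `<s>`, and `<break>` used here are an assumption.

```python
from xml.sax.saxutils import escape

# Sampling rates listed above, in hertz.
SUPPORTED_SAMPLE_RATES = (8000, 24000, 48000)


def build_ssml(sentences, pause_ms=500):
    """Join sentences into one SSML paragraph with a pause between them.

    Assumption: the synthesizer understands standard <p>, <s>, and <break>.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # Escape &, <, > so plain text cannot break the markup.
    body = pause.join(f"<s>{escape(s.strip())}</s>" for s in sentences)
    return f"<speak><p>{body}</p></speak>"


def check_sample_rate(sample_rate):
    """Return the rate unchanged if it is one of the three listed modes."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate
```

The resulting string and rate would then be handed to the model's synthesis call. That call is not shown in the article and is left out here.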

For now, 4 Russian voices are publicly available for the new version of the synthesizer, but the next version will be published soon with the following changes:

  • Synthesis speed will increase by another 2-4 times;
  • Synthesis models for CIS languages will be updated: Kalmyk, Tatar, Uzbek, and Ukrainian;
  • Models for European languages will be added;
  • Models for Indian languages will be added;
  • Models for English will be added.

Some of the systemic shortcomings inherent in Silero synthesis:

  • Unlike traditional synthesis solutions such as RHVoice, Silero has no SAPI integration, no easy-to-install clients, and no integrations for Windows and Android;
  • Speed, although unprecedentedly high for a solution of this kind, may not be sufficient for high-quality on-the-fly synthesis on weak processors;
  • Automatic stress placement does not handle homographs (words spelled the same but stressed differently, such as the Russian "замок", which means either "castle" or "lock") and still makes errors, but this will be fixed in future releases;
  • The current version of the synthesizer does not work on processors without AVX2 instructions (or you need to explicitly change PyTorch settings), because one of the modules inside the model is quantized;
  • The current version of the synthesizer essentially has PyTorch as its only dependency; everything else is "hard-wired" inside the model and the JIT packages. The model source code is not published, nor is code for running the models from PyTorch clients for other languages;
  • Libtorch, which is available for mobile platforms, is much bulkier than ONNX Runtime, but an ONNX version of the model has not yet been provided.
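
Because of the AVX2 requirement listed above, it can be useful to check the processor before loading a model. The following is a minimal, Linux-only sketch that assumes the kernel exposes CPU flags in `/proc/cpuinfo`. The function names are hypothetical, and the sketch does not cover the PyTorch settings workaround, since the article does not say which settings are meant.

```python
from pathlib import Path


def flags_include_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the flag as a whole word so that, e.g., a model name
        # containing "avx2" is not counted.
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def cpu_has_avx2(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Check the local CPU on Linux.

    Returns False when the file cannot be read, so on other systems
    False means "unknown", not "definitely unsupported".
    """
    try:
        return flags_include_avx2(Path(cpuinfo_path).read_text())
    except OSError:
        return False
```

Calling `cpu_has_avx2()` before loading the model gives an early, readable warning on Linux instead of a failure inside the quantized module.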

Source: opennet.ru
