New release of the Silero speech synthesis system

A new public release of Silero Text-to-Speech, a neural-network speech synthesis system, is now available. The project's main goal is a modern, high-quality speech synthesis system that is not inferior to commercial solutions from large companies and that anyone can use without expensive server hardware.

The models are distributed under the GNU AGPL license, but the company developing the project does not disclose how the models are trained. To run them, you can use PyTorch and frameworks that support the ONNX format. Speech synthesis in Silero is based on heavily modified neural-network algorithms combined with digital signal processing methods.

The announcement notes that the main problem with modern neural-network speech synthesis solutions is that they are often available only as part of paid cloud services, while publicly available products have high hardware requirements, low quality, or are incomplete and not ready for use as finished products. For example, running VITS, one of the popular new end-to-end architectures, smoothly in synthesis mode (that is, inference only, not model training) requires a graphics card with more than 16 GB of VRAM.

In contrast, Silero's models run efficiently even on a single thread of an x86 Intel processor with AVX2 instructions. On 4 processor threads, synthesis produces 30 to 60 seconds of speech per second in 8 kHz mode, 15 to 20 seconds per second in 24 kHz mode, and about 10 seconds per second in 48 kHz mode.
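The throughput figures above are easiest to compare as a real-time factor (RTF): seconds of compute per second of audio. The sketch below is a back-of-the-envelope calculator, not a benchmark. It assumes the midpoint of each range quoted in the article for 4 CPU threads, and the names `AUDIO_SECONDS_PER_SECOND`, `synthesis_time` and `real_time_factor` are hypothetical helpers, not part of Silero.

```python
# Throughput on 4 CPU threads as quoted in the article, in seconds of audio
# synthesized per wall-clock second. Where the article gives a range, the
# midpoint is used; these numbers are illustrative assumptions only.
AUDIO_SECONDS_PER_SECOND = {
    8_000: 45.0,   # article: 30 to 60 s of audio per second at 8 kHz
    24_000: 17.5,  # article: 15 to 20 s of audio per second at 24 kHz
    48_000: 10.0,  # article: about 10 s of audio per second at 48 kHz
}


def synthesis_time(audio_seconds: float, sample_rate: int) -> float:
    """Estimated wall-clock seconds needed to synthesize `audio_seconds` of speech."""
    if sample_rate not in AUDIO_SECONDS_PER_SECOND:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return audio_seconds / AUDIO_SECONDS_PER_SECOND[sample_rate]


def real_time_factor(sample_rate: int) -> float:
    """Compute time per second of audio; values below 1 mean faster than real time."""
    return 1.0 / AUDIO_SECONDS_PER_SECOND[sample_rate]


if __name__ == "__main__":
    # Estimate the cost of a 10-minute (600 s) recording at each sample rate.
    for rate in sorted(AUDIO_SECONDS_PER_SECOND):
        print(f"{rate // 1000} kHz: {synthesis_time(600, rate):.1f} s, "
              f"RTF {real_time_factor(rate):.3f}")
```

Under these assumed rates, a 10-minute recording takes about a minute at 48 kHz and under a quarter of a minute at 8 kHz. An RTF well below 1 is what makes on-the-fly synthesis feasible on a CPU at all.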

Key features of the new Silero release:

  • Model size has been halved, to 50 megabytes;
  • The models have learned to insert pauses;
  • 4 high-quality Russian voices are available (plus an unlimited number of random ones). Pronunciation samples are provided;
  • The models are 10 times faster; for example, in 24 kHz mode they synthesize 20 seconds of audio per second on 4 processor threads;
  • All voice variants of a single language are combined into one model;
  • The models accept whole paragraphs of text as input, and SSML tags are supported;
  • Synthesis works at three selectable sample rates: 8, 24 and 48 kilohertz;
  • "Teething problems" have been solved: instability and skipped words;
  • Flags have been added to control automatic stress placement and placement of the letter "ё".
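Because the models accept whole paragraphs and SSML markup, longer texts can be wrapped in a single `<speak>` document. The sketch below builds one with Python's standard library. The elements used (`<speak>`, `<p>`, `<break>`, `<prosody>`) are standard SSML; which of them Silero honours, and with which attribute values, is an assumption to check against the project's documentation. The helper `build_ssml` is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_ssml(paragraphs: list[str], pause_ms: int = 500, rate: str | None = None) -> str:
    """Wrap plain-text paragraphs in an SSML <speak> document.

    Each paragraph becomes a <p> element, and a <break> of `pause_ms`
    milliseconds is placed between consecutive paragraphs. If `rate` is
    given (for example "slow"), the paragraphs are wrapped in <prosody>.
    """
    speak = ET.Element("speak")
    container = ET.SubElement(speak, "prosody", rate=rate) if rate else speak
    for i, text in enumerate(paragraphs):
        if i > 0:
            ET.SubElement(container, "break", time=f"{pause_ms}ms")
        p = ET.SubElement(container, "p")
        p.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(["Привет!", "Это синтез речи Silero."], pause_ms=300))
```

Building the markup with an XML library rather than string concatenation keeps characters such as `&` and `<` in user text from breaking the document. In the project's published examples the resulting string is passed to the model's `apply_tts` method through an `ssml_text` argument; treat that parameter name as something to verify against the current README.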

At the moment, 4 Russian voices of the new synthesis version are publicly available, and the next version, with the following changes, is due to be published soon:

  • Synthesis speed will increase by a factor of 2 to 4;
  • Synthesis models for CIS languages will be updated: Kalmyk, Tatar, Uzbek and Ukrainian;
  • Models for European languages will be added;
  • Models for Indian languages will be added;
  • English models will be added.

Limitations of the current Silero synthesis system:

  • Unlike classic synthesis solutions such as RHVoice, Silero synthesis has no SAPI integration, no easy-to-install clients, and no Windows or Android integration;
  • The speed, although unprecedentedly high for a solution of this kind, may not be enough for high-quality on-the-fly synthesis on weak processors;
  • Automatic stress placement does not handle homographs (words such as замок, "castle", and замок, "lock") and makes mistakes, but this will be fixed in upcoming releases;
  • The current synthesis version does not work on processors without AVX2 instructions (unless you specifically change PyTorch settings) because one of the modules inside the model is quantized;
  • The current synthesis version essentially has a single dependency, PyTorch; all the glue code is "hardwired" inside the model and JIT packages. The source code of the models is not published, nor is code for running the models from PyTorch clients in other programming languages;
  • Libtorch, which is available on mobile platforms, is much larger than the ONNX runtime, but an ONNX version of the model is not yet available.
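Since the current models need AVX2, it can help to check a machine before downloading anything. The sketch below assumes a Linux host, where the kernel lists CPU feature flags in `/proc/cpuinfo`; the helpers `cpu_flags` and `has_avx2` are hypothetical. The article does not say which PyTorch setting must change on CPUs without AVX2; one plausible candidate, given that the module in question is quantized, is the quantized backend selected through `torch.backends.quantized.engine`, but that is an assumption.

```python
from __future__ import annotations

import platform
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in Linux /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux each logical CPU has a "flags" line with space-separated features.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text: str) -> bool:
    """True if the cpuinfo text advertises the AVX2 instruction set."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if platform.system() == "Linux" and cpuinfo.exists():
        print("AVX2 supported:", has_avx2(cpuinfo.read_text()))
    else:
        print("Cannot read /proc/cpuinfo on this system")
```

Matching whole tokens rather than substrings matters here: a CPU that lists only `avx` must not be reported as AVX2-capable.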

Source: opennet.ru
