New release of the Silero speech synthesis system

A new public release of Silero Text-to-Speech, a neural network speech synthesis system, is available. The project's main goal is a modern, high-quality speech synthesis system that matches commercial offerings from large corporations and is accessible to everyone without requiring expensive server hardware.

The models are distributed under the GNU AGPL license, but the company developing the project does not disclose how the models are trained. The models can be run with PyTorch and with frameworks that support the ONNX format. Speech synthesis in Silero is based on modern deep neural network algorithms combined with digital signal processing methods.

The developers note that the main problem with modern neural speech synthesis is that it is usually available only as a paid cloud service, while open products either have high hardware requirements, are of low quality, or are incomplete and not ready for use as products. For example, smoothly running VITS, one of the popular new end-to-end synthesis architectures, in inference mode (that is, synthesizing speech rather than training the model) requires a graphics card with more than 16 gigabytes of VRAM.

In contrast, Silero's models run successfully even on a single thread of an x86 Intel processor that supports the AVX2 instructions. On 4 processor threads, the synthesizer produces 30 to 60 seconds of speech per second of computation at an 8 kHz sampling rate, 15 to 20 seconds at 24 kHz, and about 10 seconds at 48 kHz.
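To put these figures in more familiar terms, the sketch below converts them into a real-time factor (RTF, compute time divided by audio duration) and into raw output samples per second. It assumes the conservative lower end of each quoted range (30 s, 15 s and 10 s of audio per second); the dictionary and function names are illustrative only and are not part of Silero.

```python
# Back-of-the-envelope arithmetic on the throughput figures quoted for 4 CPU threads.
# Assumption: the lower end of each quoted range is used ("about 10 s" -> 10).

# sampling rate (Hz) -> seconds of audio synthesized per wall-clock second
QUOTED_THROUGHPUT = {
    8_000: 30.0,   # 8 kHz: 30-60 s of audio per second
    24_000: 15.0,  # 24 kHz: 15-20 s of audio per second
    48_000: 10.0,  # 48 kHz: about 10 s of audio per second
}


def real_time_factor(audio_seconds_per_second: float) -> float:
    """Compute time divided by audio duration; below 1.0 means faster than real time."""
    return 1.0 / audio_seconds_per_second


def samples_per_second(sample_rate: int, audio_seconds_per_second: float) -> int:
    """Number of output audio samples produced per wall-clock second."""
    return int(sample_rate * audio_seconds_per_second)


def synthesis_time(audio_seconds: float, audio_seconds_per_second: float) -> float:
    """Estimated wall-clock seconds needed to synthesize a clip of the given length."""
    return audio_seconds / audio_seconds_per_second


if __name__ == "__main__":
    for rate, speed in QUOTED_THROUGHPUT.items():
        print(f"{rate // 1000} kHz: RTF ~ {real_time_factor(speed):.3f}, "
              f"~{samples_per_second(rate, speed):,} samples/s")
```

Under these assumptions a one-minute clip at 24 kHz takes about 4 seconds to synthesize. Note that the higher sampling rates yield fewer seconds of audio per second but more samples per second (240,000 at 8 kHz versus 480,000 at 48 kHz), so the slowdown is smaller than the sample-rate ratio alone would suggest.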

Key features of the Silero release:

  • Model size has been halved, to 50 megabytes;
  • The models can insert pauses;
  • 4 high-quality Russian voices are available (plus an unlimited number of random voices). Pronunciation examples have been published;
  • The models have become 10 times faster; for example, in 24 kHz mode they synthesize up to 20 seconds of audio per second on 4 processor threads;
  • All voice variants for a given language are packaged into a single model;
  • The models accept whole paragraphs of text as input, and SSML tags are supported;
  • Synthesis works at three sampling rates to choose from: 8, 24 and 48 kilohertz;
  • "Teething problems" have been solved: instability and skipped words;
  • Flags have been added to control automatic stress placement and placement of the letter "ё".
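Since the models accept whole paragraphs and SSML tags, a minimal sketch of turning plain paragraphs into a single SSML document may help. It uses only the standard SSML elements `<speak>`, `<p>` and `<break>`; whether Silero honors each of these tags exactly as the W3C specification describes is an assumption, and the helper `paragraphs_to_ssml` is hypothetical. Passing the resulting string to the model is left to the project's own documentation.

```python
from xml.sax.saxutils import escape


def paragraphs_to_ssml(paragraphs: list[str], pause_ms: int = 500) -> str:
    """Wrap plain-text paragraphs in one SSML <speak> document (hypothetical helper).

    Each paragraph becomes a <p> element and an explicit <break> is inserted
    between consecutive paragraphs. Text is XML-escaped so that characters
    such as '&' or '<' cannot break the markup.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    parts = []
    for i, text in enumerate(paragraphs):
        if i > 0:
            # pause between paragraphs, expressed in milliseconds
            parts.append(f'<break time="{pause_ms}ms"/>')
        parts.append(f"<p>{escape(text.strip())}</p>")
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(paragraphs_to_ssml(["Привет!", "Как дела?"], pause_ms=300))
```

With the input above the helper returns `<speak><p>Привет!</p><break time="300ms"/><p>Как дела?</p></speak>`.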

At present, 4 Russian voices are publicly available in the new version of the synthesizer, but the next version, with the following changes, is due to be published soon:

  • Synthesis speed will increase by a further 2-4 times;
  • Synthesis models for CIS languages will be updated: Kalmyk, Tatar, Uzbek and Ukrainian;
  • Models for European languages will be added;
  • Models for Indian languages will be added;
  • English models will be added.

Drawbacks of Silero synthesis include:

  • Unlike traditional open synthesis solutions such as RHVoice, Silero has no SAPI integration, no easy-to-install clients, and no integration with Windows or Android;
  • Speed, although unprecedentedly high for a solution of this kind, may still be insufficient for on-the-fly synthesis at the highest quality on weak processors;
  • The automatic stress placement does not handle homographs (words such as замок "castle" and замок "lock") and still makes mistakes, but this is to be fixed in future releases;
  • The current version of the synthesizer does not work on processors without the AVX2 instructions (unless you explicitly change PyTorch settings), because one of the modules inside the model is quantized;
  • The current version of the synthesizer depends on PyTorch alone: all preprocessing is "hardwired" inside the model and packaged with JIT. The source code of the models is not published, and neither is code for running the models from PyTorch clients in other languages;
  • Libtorch, which is available on mobile platforms, is much heavier than the ONNX runtime, but an ONNX version of the model is not yet available.
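Because the current release requires AVX2, it can be useful to check a machine before installing it. The article does not describe how to do this; the sketch below is a Linux-only assumption that parses the `flags` line of `/proc/cpuinfo` (as reported for x86 CPUs) and returns `None` on other systems. It does not cover the PyTorch settings change mentioned above, and the function names are hypothetical.

```python
import platform
from typing import Optional


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels report CPU features as "flags\t\t: fpu vme ... avx2 ..."
        if sep and key.strip() == "flags":
            # split on whitespace so that e.g. "avx512f" is not mistaken for "avx2"
            if "avx2" in value.split():
                return True
    return False


def local_cpu_has_avx2() -> Optional[bool]:
    """Check the local CPU on Linux; return None where /proc/cpuinfo is unavailable."""
    if platform.system() != "Linux":
        return None
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            return has_avx2(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(local_cpu_has_avx2())
```

Matching whole whitespace-separated tokens rather than substrings keeps related flags such as `avx512f` from producing a false positive.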

Source: opennet.ru
