NVIDIA has invested $1.5 million in the Mozilla Common Voice project. The interest in speech recognition systems stems from the prediction that, within the next ten years, voice technology will become one of the main ways people interact with a wide range of devices, from computers and phones to digital assistants and vending machines.
The performance of voice systems depends heavily on the volume and diversity of the voice data available for training machine learning models. Current voice technology is focused mainly on recognizing English and leaves out many languages, accents, and speech patterns. The investment will help accelerate the growth of publicly available voice data, engage more communities and volunteers, and increase the number of full-time project staff.
As a reminder, the Common Voice project aims to organize a collaborative effort to build a database of voice samples that accounts for the diversity of voices and speaking styles. Users are invited to read aloud phrases shown on the screen or to rate the quality of data added by other users. The resulting database, which holds recordings of varied pronunciations of typical phrases of human speech, can be used without restriction in machine learning systems and research projects.
The Common Voice dataset currently includes pronunciation samples from more than 164 thousand people, covering about 9 thousand hours of voice data in 60 different languages. The Russian dataset covers 1412 participants and 111 hours of speech material, while the Ukrainian dataset covers 459 participants and 30 hours. For comparison, more than 66 thousand people have contributed 1686 hours of validated speech to the English dataset. These datasets can be used in machine learning systems to build speech recognition and speech synthesis models. The data is published as public domain (CC0).
According to the author of the Vosk continuous speech recognition library, the drawbacks of the Common Voice set are the one-sidedness of the voice material (a predominance of men aged 20-30 and a shortage of material with the voices of women, children, and the elderly), the lack of vocabulary diversity (the same phrases are repeated), and the distribution of recordings in the MP3 format, which introduces distortion.
Source: opennet.ru
