Mozilla Common Voice 8.0 Update

Mozilla has released an update to its Common Voice datasets, which include pronunciation samples from nearly 200 thousand people. The data is published as public domain (CC0). The datasets can be used in machine learning systems to build speech recognition and speech synthesis models. Compared with the previous update, the volume of collected speech grew by about 30%, from 13.9 to 18.2 thousand hours. The number of supported languages rose from 67 to 87.

The Russian set covers 2452 participants and 193 hours of speech (previously 2136 participants and 173 hours), the Belarusian set covers 6160 participants and 987 hours (previously 3831 and 356 hours), and the Ukrainian set covers 684 participants and 76 hours (previously 615 and 66 hours). More than 79 thousand people took part in preparing the English material, dictating 2886 hours of validated speech (previously 75 thousand participants and 2637 hours).
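The growth figures quoted above can be checked with a few lines of arithmetic. The sketch below only restates the hour counts from the text; the names `HOURS`, `growth_percent` and `growth_table` are made up for this illustration and are not part of any Common Voice tooling.

```python
# Hours of speech as (previous release, this release), copied from the text.
# The "all languages" row is in thousands of hours; the rest are in hours.
HOURS = {
    "all languages": (13.9, 18.2),
    "Russian": (173, 193),
    "Belarusian": (356, 987),
    "Ukrainian": (66, 76),
    "English (validated)": (2637, 2886),
}


def growth_percent(before: float, after: float) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def growth_table(hours: dict) -> dict:
    """Map each dataset name to its growth in percent, rounded to one decimal."""
    return {name: round(growth_percent(b, a), 1) for name, (b, a) in hours.items()}


if __name__ == "__main__":
    for name, pct in growth_table(HOURS).items():
        print(f"{name}: {pct:+.1f}%")
```

On these numbers the overall collection grew by roughly 31%, in line with the "about 30%" figure, while the Belarusian set nearly tripled in size and the English set grew by under 10%.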

As a reminder, the Common Voice project aims to organize a collaborative effort to build up a database of voice samples that reflects the diversity of voices and speaking styles. Users are invited to read aloud phrases shown on the screen or to evaluate the quality of recordings added by other users. The accumulated database, containing recordings of varied pronunciations of typical phrases of human speech, can be used without restrictions in machine learning systems and research projects. According to the author of the Vosk continuous speech recognition library, the drawbacks of the Common Voice set include the one-sidedness of the voice material (a predominance of men aged 20-30 and a lack of material with women's voices).
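The imbalance described above is the kind of thing one can measure directly from a dataset's clip metadata. The following is a minimal sketch, assuming a tab-separated metadata file with `age` and `gender` columns; those column names, the value labels and the sample rows are assumptions made for this illustration, and `demographic_shares` is a hypothetical helper, not part of any Common Voice tool.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the assumed shape of a clip metadata file.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\n"
    "a1\tclip1.mp3\tHello there\ttwenties\tmale\n"
    "a1\tclip2.mp3\tGood morning\ttwenties\tmale\n"
    "b2\tclip3.mp3\tHello there\tthirties\tfemale\n"
    "c3\tclip4.mp3\tHow are you\t\t\n"
)


def demographic_shares(tsv_file, column):
    """Return {value: share of clips} for one metadata column.

    Empty or missing values are counted as "unspecified".
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter((row.get(column) or "unspecified") for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


if __name__ == "__main__":
    for column in ("gender", "age"):
        shares = demographic_shares(io.StringIO(SAMPLE_TSV), column)
        print(column, shares)
```

To run it against a real file, pass an opened file object instead of the `io.StringIO` sample. Note that this counts clips rather than speakers; grouping by `client_id` first would give per-speaker shares instead.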

In addition, we can note the release of the NVIDIA NeMo 1.6 toolkit, which provides machine learning methods for building speech recognition, speech synthesis and natural language processing systems. NeMo includes ready-to-use trained models for machine learning systems based on the PyTorch framework, prepared by NVIDIA using Common Voice speech data and covering a variety of languages, accents and forms of speech. The models may be useful to researchers developing voice-based dialogue systems, transcription platforms and automated call centers. For example, NVIDIA NeMo is used in the automated voice services of MTS and Sberbank. The NeMo code is written in Python using PyTorch and is distributed under the Apache 2.0 license.

Source: opennet.ru
