Mozilla Common Voice 9.0 update

Mozilla has released an update to its Common Voice datasets, which include pronunciation samples from almost 200,000 people. The data is published as public domain (CC0). The datasets can be used in machine learning systems to build speech recognition and speech synthesis models.

Compared with the previous update, the volume of collected speech grew by about 10%, from 18.2 to 20.2 thousand hours. The number of supported languages rose from 87 to 93. More than 100 hours of speech have been collected for 27 languages, and more than 500 hours for 9 of them. For 9 languages it was also possible to reach a share of female voices of at least 45%.

More than 81,000 people took part in preparing the English material, dictating 2,953 hours of speech (previously 79,000 participants and 2,886 hours). The Belarusian set covers 6,326 participants and 1,054 hours of speech (previously 6,160 participants and 987 hours); Russian, 2,585 participants and 201 hours (previously 2,452 participants and 193 hours); Uzbek, 1,503 participants and 231 hours (previously 1,355 participants and 227 hours); Ukrainian, 696 participants and 79 hours (previously 684 participants and 76 hours).

The Common Voice project aims to organize a collaborative effort to build a database of voice samples that reflects the diversity of voices and speaking styles. Users are asked to read aloud phrases shown on the screen or to evaluate the quality of recordings added by other users. The resulting database, containing varied pronunciations of typical phrases of human speech, can be used without restriction in machine learning systems and research projects.

Source: opennet.ru
