Facebook publishes a machine translation model supporting 200 languages

Facebook (banned in the Russian Federation) has published the work of the NLLB (No Language Left Behind) project, which aims to create a universal machine learning model for translating text directly from one language to another, bypassing an intermediate translation into English. The proposed model covers more than 200 languages, including rare languages of African and Australian peoples. The project's ultimate goal is to give any people a means of communication, regardless of the language they speak.

The model is licensed under Creative Commons BY-NC 4.0, which permits copying, redistribution, adaptation, and derivative works, provided that you give attribution, retain the license, and use the model for non-commercial purposes only. The tools for working with the models are provided under the MIT license. To encourage development based on the NLLB model, $200,000 has been allocated for grants to researchers.

To simplify the creation of projects that use the proposed model, the following have also been open-sourced: the code of the tools used to test and evaluate model quality (FLORES-200, NLLB-MD, Toxicity-200), the code for training models, and encoders based on the LASER3 library (Language-Agnostic SEntence Representations). The final model is offered in two versions, full and reduced. The reduced version requires fewer resources and is suitable for testing and for use in research projects.

Unlike other machine-learning-based translation systems, Facebook's solution is notable for offering a single general model for all 200 languages, which covers every language and does not require a separate model for each one. Translation is performed directly from the source language to the target language, without an intermediate translation into English. To build universal translation systems, a LID (Language IDentification) model is also offered, which makes it possible to determine the language in use. That is, the system can automatically recognize which language the information is given in and translate it into the user's language.
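The detect-then-translate flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the NLLB or LID code: the word-list language detector, the `translate_for_user` helper, and the `fake_translate` stub are all hypothetical stand-ins, and the language codes (`eng_Latn`, `fra_Latn`, `zul_Latn`) are assumed here as labels and do not come from the article.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-in for a LID model: a few marker words per language.
# A real LID model is a trained classifier, not a word list.
MARKER_WORDS: Dict[str, Set[str]] = {
    "eng_Latn": {"the", "is", "and", "hello"},
    "fra_Latn": {"le", "est", "et", "bonjour"},
    "zul_Latn": {"sawubona", "futhi", "ngiyabonga"},
}

# A translator takes (text, source language, target language).
Translator = Callable[[str, str, str], str]


def identify_language(text: str) -> str:
    """Return the language whose marker words overlap the text the most."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in MARKER_WORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        raise ValueError("language could not be identified")
    return best


def translate_for_user(text: str, user_lang: str, translate: Translator) -> str:
    """Detect the source language, then translate directly into user_lang.

    There is no intermediate English step: the same translator is called
    once with the detected source and the requested target.
    """
    source_lang = identify_language(text)
    if source_lang == user_lang:
        return text  # already in the user's language
    return translate(text, source_lang, user_lang)


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Stub that only tags the text; a real system would call the model."""
    return f"[{source_lang}->{target_lang}] {text}"
```

In this sketch a single `translate` callable serves every language pair, mirroring the idea of one model for all directions; swapping `fake_translate` for a real model call would not change the routing logic.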

Translation is supported in any direction between any of the 200 supported languages. To verify translation quality between arbitrary languages, the FLORES-200 reference test set was prepared. It showed that the translation quality of the NLLB-200 model is on average 44% higher than that of previously proposed machine-learning-based research systems, as measured by BLEU metrics, which compare machine translation with a reference human translation. For rare African languages and Indian dialects, the quality advantage reaches 70%. Translation quality can be assessed visually on a specially prepared demo site.
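To make the BLEU comparison above more concrete, here is a minimal sentence-level sketch of the metric: clipped n-gram precision for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing, so it is a teaching aid rather than the scoring setup used for FLORES-200; the function names are illustrative.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU against one reference translation."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram's count by how often it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A hypothesis identical to the reference scores 1.0, one sharing no words with it scores 0.0, and partial overlap falls in between. Real evaluations typically aggregate counts over a whole test set and use standardized tokenization, which this sketch leaves out.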

Source: opennet.ru
