Facebook publishes a machine translation model supporting 200 languages

Facebook (banned in the Russian Federation) has published the results of the NLLB (No Language Left Behind) project, which aims to create a universal machine learning model for translating text directly from one language to another, bypassing intermediate translation into English. The proposed model covers more than 200 languages, including rare languages of African and Australian peoples. The ultimate goal of the project is to provide a means of communication for all people, regardless of the language they speak.

The model is licensed under Creative Commons BY-NC 4.0, which permits copying, redistribution, adaptation, and derivative works, provided that you give attribution, retain the license, and use the model for non-commercial purposes only. The tools for working with the models are provided under the MIT license. To encourage development based on the NLLB model, it was decided to allocate $200,000 in grants to researchers.

To simplify the creation of projects that use the proposed model, several components have also been open-sourced: the code of the applications used to test and evaluate model quality (FLORES-200, NLLB-MD, Toxicity-200), and the code for training models and encoders based on the LASER3 library (Language-Agnostic SEntence Representation). The final model is offered in two versions: full and reduced. The reduced version requires fewer resources and is suitable for testing and for use in research projects.

Unlike other translation systems based on machine learning, Facebook's solution is notable in that it offers a single common model for all 200 languages, covering every language without requiring a separate model for each one. Translation is performed directly from the source language to the target language, without intermediate translation into English. To support building universal translation systems, a LID (Language IDentification) model is also provided, which makes it possible to determine the language in use. That is, the system can automatically recognize the language in which information is presented and translate it into the user's language.
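To give a sense of scale for the single-model approach, the following back-of-the-envelope sketch counts the translation directions that exist among 200 languages and compares that with the number of directions a design routed through one pivot language (such as English) would need. The function names are illustrative assumptions and are not part of the NLLB code.

```python
def translation_directions(num_languages: int) -> int:
    """Number of ordered (source, target) pairs with source != target."""
    return num_languages * (num_languages - 1)


def pivot_directions(num_languages: int) -> int:
    """Directions needed when every translation passes through one pivot language.

    Each non-pivot language needs one to-pivot and one from-pivot direction.
    """
    return 2 * (num_languages - 1)


if __name__ == "__main__":
    n = 200
    print(translation_directions(n))  # 39800 direct directions
    print(pivot_directions(n))        # 398 directions via a pivot
```

A one-model-per-direction design would therefore need tens of thousands of models to translate directly between every pair, which is why a pivot language is a common shortcut and why a single shared model is a notable alternative.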

Translation is supported in any direction between any of the 200 supported languages. To verify translation quality between arbitrary languages, the FLORES-200 reference test set was prepared. It showed that, measured with BLEU metrics, which compare machine translation against a reference human translation, the NLLB-200 model is on average 44% better in translation quality than previously proposed research systems based on machine learning. For rare African languages and Indian dialects, the quality advantage reaches 70%. Translation quality can be assessed visually on a specially prepared demo site.
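For readers unfamiliar with BLEU, the sketch below shows the textbook idea behind the metric: clipped n-gram precision of the machine translation against a human reference, combined with a brevity penalty. It is a simplified single-reference version on whitespace tokens with no smoothing, written for illustration only; it is not the evaluation code used with FLORES-200, and the function names are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, without smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU 0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty lowers the score of candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", reference))  # identical text
    print(sentence_bleu("the cat sat on the", reference))      # shorter candidate
```

An identical candidate scores 1.0, while a candidate that matches the reference but stops short is reduced only by the brevity penalty. Published BLEU figures are normally computed at corpus level with standardized tokenization, so scores from a sketch like this are not comparable to reported results.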

Source: opennet.ru
