Facebook publishes a machine translation model supporting 200 languages

Facebook (banned in the Russian Federation) has published the work of the NLLB (No Language Left Behind) project, which aims to create a universal machine learning model for translating text directly from one language to another, bypassing intermediate translation into English. The proposed model covers more than 200 languages, including rare languages of African and Australian peoples. The ultimate goal of the project is to give everyone a means of communication, regardless of the language they speak.

The model is licensed under Creative Commons BY-NC 4.0, which permits copying, redistribution, adaptation, and derivative works, provided that you give attribution, retain the license, and use the model only for non-commercial purposes. The tooling for working with the models is provided under the MIT license. To stimulate development based on the NLLB model, it was decided to allocate $200,000 in grants to researchers.

To simplify the creation of projects that use the proposed model, the code of the applications used for testing and evaluating model quality (FLORES-200, NLLB-MD, Toxicity-200), the model training code, and encoders based on the LASER3 library (Language-Agnostic SEntence Representation) have also been open-sourced. The final model is offered in two versions: full and reduced. The reduced version requires fewer resources and is suitable for testing and for use in research projects.

Unlike other translation systems based on machine learning, Facebook's solution is notable in that a single general model is offered for all 200 languages; it covers every language and does not require a separate model for each one. Translation is performed directly from the source language to the target language, without intermediate translation into English. To build universal translation systems, a LID (Language IDentification) model is additionally offered, which makes it possible to determine the language in use. That is, the system can automatically recognize which language the information is presented in and translate it into the user's language.
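To give a sense of scale for the "one model, direct translation" design, the arithmetic can be sketched as follows. This is only an illustration: it assumes exactly 200 languages, and the function names and the three-letter language codes are hypothetical, not part of the NLLB code.

```python
from itertools import permutations


def direction_count(num_languages: int) -> int:
    """Number of ordered (source, target) pairs with source != target."""
    return num_languages * (num_languages - 1)


def pivot_hops(source: str, target: str, pivot: str = "eng") -> int:
    """Translation steps needed if every pair is routed through a pivot language."""
    if source == target:
        return 0
    # A pair that already involves the pivot needs one step; any other pair needs two.
    return 1 if pivot in (source, target) else 2


# Cross-check the closed formula against explicit enumeration on a tiny example.
sample = ["eng", "xho", "zul", "fra"]  # hypothetical language codes
assert direction_count(len(sample)) == len(list(permutations(sample, 2)))

print(direction_count(200))      # 39800 translation directions
print(pivot_hops("xho", "zul"))  # 2 steps via English; a direct model needs 1
```

Under these assumptions, a system built from per-pair models would need one model per direction, while pivoting through English doubles the number of translation steps (and the chance of compounding errors) for every non-English pair.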

Translation is supported in any direction, between any of the 200 supported languages. To verify translation quality between any pair of languages, the FLORES-200 reference benchmark set was prepared; it showed that the translation quality of the NLLB-200 model is on average 44% higher than that of previously proposed machine-learning-based research systems, as measured by the BLEU metric, which compares machine translation against a reference human translation. For rare African languages and Indian languages, the quality gain reaches 70%. Translation quality can be assessed first-hand on a specially prepared demo site.
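For readers unfamiliar with BLEU, the idea of comparing a machine translation against a reference translation can be sketched roughly as below. This is a simplified sentence-level variant with whitespace tokenization and a single reference, written for illustration only; it is not the evaluation code used for FLORES-200, and the function names are hypothetical. Real evaluations typically use corpus-level scores with standardized tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))

    # Penalize candidates that are shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


score = simple_bleu("the cat sat on the mat", "the cat is on the mat", max_n=2)
print(round(score, 4))  # 0.7071
```

In the usage line, 5 of 6 unigrams and 3 of 5 bigrams match the reference, so the score is the square root of 5/6 × 3/5 = 0.5. The 44% figure in the article is a comparison of such scores between systems, not a score itself.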

Source: opennet.ru
