Facebook publishes a machine translation model supporting 200 languages

Facebook (banned in the Russian Federation) has published the results of the NLLB (No Language Left Behind) project, which aims to create a universal machine learning model for translating text directly from one language to another, bypassing an intermediate translation into English. The proposed model covers more than 200 languages, including rare languages of African and Australian peoples. The ultimate goal of the project is to provide a means of communication for all people, regardless of the language they speak.

The model is licensed under Creative Commons BY-NC 4.0, which permits copying, redistribution, adaptation, and derivative works, provided that you give attribution, retain the license, and use the model for non-commercial purposes only. The tooling for working with the models is provided under the MIT license. To stimulate development based on the NLLB model, it was decided to allocate $200,000 in grants to researchers.

To simplify the creation of projects that use the proposed model, several components have also been open-sourced: the code of the applications used to test and evaluate model quality (FLORES-200, NLLB-MD, Toxicity-200), the code for training the models, and the encoders based on the LASER3 library (Language-Agnostic SEntence Representation). The final model is offered in two versions, full and reduced. The reduced version requires fewer resources and is suitable for testing and for use in research projects.

Unlike other translation systems based on machine learning, Facebook's solution is notable in that it offers a single general model for all 200 languages, which covers every language and does not require a separate model for each one. Translation is performed directly from the source language to the target language, without an intermediate translation into English. To build universal translation systems, an LID (Language IDentification) model is also provided, which determines the language in use. In other words, the system can automatically recognize the language in which a piece of information is written and translate it into the user's language.
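The detect-then-translate flow described above can be sketched as follows. This is only a toy illustration and not the project's actual LID model: the stopword lists, the `identify_language` and `translate_auto` functions, and the `fake_translate` stand-in are all hypothetical, and the language codes merely follow the `eng_Latn` style used in FLORES-200.

```python
from typing import Callable, Dict, Set

# Hypothetical toy stopword lists. A real LID model is a trained classifier
# that covers far more languages; this only illustrates the control flow.
STOPWORDS: Dict[str, Set[str]] = {
    "eng_Latn": {"the", "and", "is", "of", "to", "in"},
    "fra_Latn": {"le", "la", "et", "est", "les", "des"},
    "deu_Latn": {"der", "die", "und", "ist", "das", "nicht"},
}

Translator = Callable[[str, str, str], str]


def identify_language(text: str) -> str:
    """Return the code whose stopword list overlaps the text the most."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=lambda lang: scores[lang])
    if scores[best] == 0:
        raise ValueError("could not identify the language")
    return best


def translate_auto(text: str, user_lang: str, translate: Translator) -> str:
    """Detect the source language, then translate directly into user_lang."""
    src = identify_language(text)
    if src == user_lang:
        return text  # already in the user's language
    # One call, source to target, with no intermediate English step.
    return translate(text, src, user_lang)


def fake_translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical stand-in for a single multilingual translation model."""
    return f"[{src}->{tgt}] {text}"


if __name__ == "__main__":
    print(translate_auto("le chat est sur la table", "deu_Latn", fake_translate))
```

Because the translator is passed in as a plain callable, the same routing code would work with any backend that accepts a text, a source code, and a target code.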

Translation is supported in any direction, between any of the 200 supported languages. To verify translation quality between arbitrary languages, the FLORES-200 reference test set was prepared. It showed that the NLLB-200 model is on average 44% better in translation quality than previously proposed machine-learning-based research systems when measured with BLEU, a metric that compares machine translation against a reference human translation. For rare African languages and Indian dialects, the quality advantage reaches 70%. Translation quality can also be assessed visually on a specially prepared demo site.
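To make the BLEU comparison more concrete, here is a minimal sketch of a simplified sentence-level BLEU: clipped n-gram precision for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing, so it is not the evaluation code used by the project. The `relative_gain` helper assumes the 44% figure is a relative improvement, and any scores passed to it here are made-up numbers.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU for one candidate against one reference (0.0 to 1.0)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipping: an n-gram is credited at most as often as the reference has it.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def relative_gain(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
    # Hypothetical scores, chosen only to illustrate a 44% relative gain.
    print(round(relative_gain(25.0, 36.0), 1))
```

Published BLEU scores are normally computed over a whole test corpus rather than sentence by sentence, so a per-sentence value like the one above is useful only for building intuition.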

Source: opennet.ru
