Jina Embedding, a model for vector representation of text meaning, released as open source

Jina has released jina-embeddings-v2.0, a machine learning model for vector representation of text, under the Apache 2.0 license. The model converts arbitrary text of up to 8192 tokens into a short sequence of real numbers. These numbers form a vector that corresponds to the source text and captures its semantics (meaning). Jina Embedding is the first open machine learning model to match the performance of the proprietary text embedding model from the OpenAI project (text-embedding-ada-002), which likewise handles texts of up to 8192 tokens.

The distance between two generated vectors can be used to judge how closely the source texts are related in meaning. In practice, the generated vectors can be used to analyze text similarity, to build search for topic-related material (ranking results by semantic proximity), to group texts by meaning, to generate recommendations (offering a list of similar text strings), to detect anomalies, to detect plagiarism, and to classify texts. Example application areas include analysis of legal documents, business analytics, processing of scientific articles in medical research, literary criticism, parsing of financial reports, and improving how well chatbots handle complex topics.
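The article does not say which distance measure is meant. A common choice for embedding vectors is cosine similarity, and the minimal sketch below assumes it. The three-dimensional vectors are made-up stand-ins for real model output, and the function names `cosine_similarity` and `rank_by_similarity` are illustrative, not part of any Jina API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
docs = {
    "contract": [0.9, 0.1, 0.0],
    "recipe": [0.0, 0.2, 0.9],
    "lawsuit": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query, docs)
print([doc_id for doc_id, _ in ranking])
```

With real embeddings the same ranking step would support the search and recommendation uses listed above; only the source of the vectors changes.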

Two versions of the jina-embeddings model are available for download (base - 0.27 GB and small - 0.07 GB). Both were trained on 400 million pairs of English text sequences covering a range of subject areas. Training used sequences of 512 tokens, which were extrapolated to a length of 8192 tokens using the ALiBi (Attention with Linear Biases) method.
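To illustrate why ALiBi allows extrapolation beyond the training length, here is a rough sketch of its bias term. It assumes the geometric per-head slopes from the ALiBi method for a power-of-two number of heads, and a symmetric penalty on the absolute distance |i - j|, which is one way to adapt the idea to a bidirectional encoder. The article does not describe the exact variant used in jina-embeddings, so treat both choices, and the helper names, as assumptions.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8*h/num_heads) for h = 1..num_heads."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles a power-of-two number of heads")
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(seq_len, slope):
    """Matrix added to attention scores before softmax: -slope * |i - j|.

    Assumption: symmetric distance, as a bidirectional encoder would need.
    """
    return [[-slope * abs(i - j) for j in range(seq_len)]
            for i in range(seq_len)]


slopes = alibi_slopes(8)
short_bias = alibi_bias(4, slopes[0])    # a short, training-like length
long_bias = alibi_bias(16, slopes[0])    # a longer length at inference

# The penalty depends only on the distance between positions, not on the
# absolute position, so the same rule applies at any sequence length.
print(short_bias[0][3], long_bias[5][8])
```

Because no position embeddings are learned per index, nothing in this term is tied to the 512-token training length, which is what makes running at longer lengths possible in principle.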

The base model has 137 million parameters and is intended for use on stationary systems with a GPU. The small model has 33 million parameters, offers lower accuracy, and is aimed at mobile devices and systems with limited memory. The developers also plan to publish a large model with 435 million parameters in the near future. A multilingual version of the model is in development as well, currently focused on support for German and Spanish. A separate plugin has been prepared for using the jina-embeddings model through the LLM toolkit.
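The parameter counts line up with the download sizes quoted earlier (0.27 GB and 0.07 GB) if each parameter takes two bytes. That storage format is an assumption made here for the arithmetic, not something the article states. A quick check:

```python
BYTES_PER_PARAM = 2  # assumption: weights stored as 16-bit floats


def weights_size_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate size of the raw weights in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


models = [
    ("base", 137_000_000),
    ("small", 33_000_000),
    ("large (planned)", 435_000_000),
]
for name, params in models:
    print(f"{name}: {weights_size_gb(params):.2f} GB")
```

Under the same assumption, the planned 435-million-parameter model would come to a little under 0.9 GB.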

Source: opennet.ru
