Jina Embedding open-sourced: a model for vector representation of text meaning

Jina has released jina-embeddings-v2, a machine learning model for vector representation of text, under the Apache 2.0 license. The model converts arbitrary text of up to 8192 tokens into a short sequence of real numbers. These numbers form a vector that corresponds to the source text and captures its semantics (meaning). Jina Embedding is the first open machine learning model to match the performance of OpenAI's proprietary text vectorization model (text-embedding-ada-002), which also handles text of up to 8192 tokens.

The distance between two generated vectors can be used to measure how closely the source texts are related in meaning. In practice, the vectors can be used to assess text similarity, to build topic-aware search (ranking results by semantic closeness), to cluster texts by meaning, to generate recommendations (offering a list of similar text strings), to detect anomalies, to detect plagiarism, and to classify texts. Example application areas include analysis of legal documents, business analytics, processing of scientific papers in medical research, literary criticism, parsing of financial reports, and improving how chatbots handle complex questions.
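As a rough illustration of ranking by semantic closeness, the sketch below compares vectors with cosine similarity, one common distance measure for embeddings. The article does not say which measure is recommended, so that choice is an assumption, and the three-component vectors and document names are made up for the example; real vectors would come from the model and be much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical): stand-ins for model output.
docs = {
    "contract law": [0.9, 0.1, 0.0],
    "court ruling": [0.8, 0.3, 0.1],
    "pasta recipe": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query, docs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same scores could feed clustering, recommendation, or plagiarism checks; only the step that consumes the ranking changes.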

Two versions of the jina-embeddings model are available for download: base (0.27 GB) and small (0.07 GB). Both were trained on 400 million pairs of English text sequences covering a variety of knowledge domains. Training used sequences of at most 512 tokens, and this limit was extrapolated to 8192 tokens using the ALiBi (Attention with Linear Biases) method.
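The idea behind ALiBi can be sketched in a few lines: instead of learned position embeddings, each attention head adds a penalty to its attention scores that grows linearly with the distance between two tokens. Because the penalty depends only on distance, the same rule applies to sequences longer than those seen in training. The slope formula below is the one given in the ALiBi paper for a power-of-two number of heads. The symmetric `abs(i - j)` form is an assumption suited to a bidirectional encoder; the article does not describe the exact variant used in jina-embeddings-v2.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Valid as written for num_heads that is a power of two.
    """
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias matrix added to attention scores for one head.

    Entry [i][j] is -slope * |i - j|: zero on the diagonal and
    increasingly negative for tokens that are further apart.
    (Symmetric form; an assumption for a bidirectional encoder.)
    """
    return [
        [-slope * abs(i - j) for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])
for row in bias:
    print(row)
```

Nothing in `alibi_bias` depends on a maximum length, which is why a model trained on 512-token sequences can still assign sensible biases at position 8192.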

The base model has 137 million parameters and is designed for stationary systems with a GPU. The small model has 33 million parameters, is less accurate, and is intended for mobile devices and systems with limited memory. A large model with 435 million parameters is planned for release in the near future. A multilingual version is also in development, currently focused on German and Spanish support. A separate plugin has been prepared for using the jina-embeddings model through the LLM toolkit.
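The download sizes quoted earlier line up with these parameter counts if each weight is stored in 16 bits (2 bytes). The article does not state the storage format, so the 2-byte figure below is an assumption used only to check the arithmetic.

```python
# Assumption: weights are stored as 16-bit values (2 bytes each).
BYTES_PER_PARAM = 2


def weights_size_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate size of the raw weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


base_gb = weights_size_gb(137_000_000)
small_gb = weights_size_gb(33_000_000)

print(f"base:  {base_gb:.2f} GB")   # prints "base:  0.27 GB"
print(f"small: {small_gb:.2f} GB")  # prints "small: 0.07 GB"
```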

Source: opennet.ru
