Jina Embedding open-sourced, a model for vector representation of text meaning

Jina has open-sourced jina-embeddings-v2, a machine learning model for vector representation of text, under the Apache 2.0 license. The model converts arbitrary text of up to 8192 tokens into a short sequence of real numbers. These numbers form a vector that is associated with the source text and captures its semantics (meaning). Jina Embedding was the first open machine learning model with capabilities comparable to the proprietary text vectorization model from OpenAI (text-embedding-ada-002), which can likewise process text of up to 8192 tokens.

The distance between two generated vectors can be used to gauge the semantic relatedness of the source texts. In practice, the generated vectors can be used to analyze text similarity, to organize search for material relevant to a topic (ranking results by semantic closeness), to group texts by meaning, to generate recommendations (offering a list of similar text strings), to detect anomalies, to check for plagiarism, and to classify texts. Example application areas include analysis of legal documents, business analytics, processing of scientific articles in medical research, literary criticism, parsing of financial reports, and improving how well chatbots handle complex questions.
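As an illustration of comparing vectors, here is a minimal sketch that ranks a few documents against a query by cosine similarity. Cosine similarity is an assumption here: it is one common choice for embeddings, and the article does not say which measure is used. The three-dimensional vectors and document names are made up for the example. Real embeddings come from the model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Sort (name, score) pairs so the closest document comes first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real model output.
query = [0.9, 0.1, 0.0]
documents = {
    "contract_clause": [0.8, 0.2, 0.1],
    "cooking_recipe": [0.0, 0.1, 0.9],
    "court_ruling": [0.7, 0.3, 0.0],
}

ranking = rank_by_similarity(query, documents)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A score near 1 means the vectors point in almost the same direction. A distance can be derived as 1 minus the similarity. The same ranking step underlies semantic search and recommendations. Clustering and anomaly detection compare many vectors with each other instead of with a single query.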

Two variants of the jina-embeddings model are available for download (base, 0.27 GB, and small, 0.07 GB). Both were trained on 400 million pairs of English text sequences covering a wide range of subject areas. Training used sequences of 512 tokens, which were extrapolated to 8192 tokens using the ALiBi (Attention with Linear Biases) method.
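To give a feel for why ALiBi lets a model trained on short sequences work on longer ones, the sketch below builds the linear bias that is added to attention scores. It is an illustration under stated assumptions, not Jina's implementation: the slope formula is the geometric sequence from the ALiBi paper for a power-of-two number of heads, the symmetric |i - j| form is assumed for a bidirectional encoder, and the head count of 8 is arbitrary.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... (num_heads must be a power of two)."""
    start = 2 ** (-8 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to the attention score between positions i and j.

    The penalty grows linearly with the distance |i - j|, so it depends
    only on relative position and needs no learned position table.
    """
    return [[-slope * abs(i - j) for j in range(seq_len)]
            for i in range(seq_len)]


slopes = alibi_slopes(8)                  # assumed head count, for illustration
short_bias = alibi_bias(4, slopes[0])     # a "training length" sequence
long_bias = alibi_bias(16, slopes[0])     # a longer sequence, same rule

print(slopes[0], slopes[-1])
print(short_bias[0])
print(long_bias[0][:4])
```

Because the bias for a given pair of positions is the same whatever the total sequence length, the rule extends unchanged to positions the model never saw during training.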

The base model has 137 million parameters and is intended for desktop systems with a GPU. The small model has 33 million parameters, offers lower accuracy, and is aimed at mobile devices and systems with little memory. A large model with 435 million parameters is planned for release in the near future. A multilingual version of the model is also in development, currently focused on German and Spanish support. A separate plugin has been prepared for using the jina-embeddings model through the LLM toolkit.
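The parameter counts and download sizes quoted above can be cross-checked with simple arithmetic. The sketch assumes each weight is stored as a 16-bit value (2 bytes), which the article does not state. Under that assumption the estimates land on the published sizes.

```python
BYTES_PER_PARAM = 2  # assumption: weights stored as 16-bit values


def approx_size_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Estimate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts as reported in the article.
models = {"base": 137_000_000, "small": 33_000_000}

for name, params in models.items():
    print(f"{name}: about {approx_size_gb(params):.2f} GB")
```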

Source: opennet.ru
