The RedPajama project is developing an open dataset for artificial-intelligence systems

RedPajama has been announced: a collaborative project aimed at creating open machine-learning models, along with the training data needed to build intelligent assistants that can compete with commercial products such as ChatGPT. The availability of open data and open large language models is expected to remove constraints on independent machine-learning research teams and make it easier to build specialized conversational systems. Organizations and communities including Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and the MILA Québec AI Institute have joined the project.

The first step was the publication of the RedPajama-Data-1T dataset for training conversational models, containing 1.2 trillion tokens. The RedPajama suite reproduces the publicly available data that Facebook used to train its LLaMA model (about 1.25 trillion tokens), but it is distributed under an open, unrestricted license (the LLaMA data and models were available only to researchers, by special request, for non-commercial use). The downloadable RedPajama-Data-1T set is 2.67 TB in size and includes data from web pages indexed by Common Crawl, Wikipedia archives, source code from GitHub, public-domain books from the Gutenberg library, scientific articles from the ArXiv archive, and discussions from Stack Overflow and other Stack Exchange sites.
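Corpora of this kind are typically shipped as JSONL shards, one JSON object per line holding the document text plus provenance metadata; the exact field names below (`text`, `meta`) are an illustrative assumption, not a documented schema. A minimal sketch of scanning one shard and estimating its size in tokens by whitespace splitting:

```python
import io
import json

def scan_shard(fp):
    """Iterate over a JSONL shard, yielding (text, meta) pairs.

    Assumes each line is a JSON object with a 'text' field and an
    optional 'meta' field (field names are illustrative).
    """
    for line in fp:
        record = json.loads(line)
        yield record["text"], record.get("meta", {})

def rough_token_count(fp):
    # Whitespace splitting only approximates a real tokenizer's count,
    # but it is enough to gauge the scale of a shard.
    return sum(len(text.split()) for text, _ in scan_shard(fp))

# Tiny in-memory stand-in for a downloaded shard.
sample = io.StringIO(
    '{"text": "Open data for open models.", "meta": {"source": "wikipedia"}}\n'
    '{"text": "def hello(): pass", "meta": {"source": "github"}}\n'
)
print(rough_token_count(sample))  # → 8
```

At the full corpus scale, the same loop would be run over thousands of such shards, streamed rather than loaded into memory.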

Ready-to-use models, trained on the prepared dataset and fine-tuned on ready examples of instruction-following dialogue from the Alpaca and OpenChatKit projects, are planned for release in the coming weeks. Comparable language-model initiatives include the partially open projects LLaMA, Alpaca, Vicuna, and Koala, as well as the fully open projects Pythia, OpenChatKit, Open Assistant, and Dolly.

In addition, several new machine-learning-related projects are worth noting:

  • MiniGPT-4 - extends traditional conversational chatbots with the ability to take visual information into account, letting you analyze images and recognize handwritten text while interacting with the system (for example, you can ask what kind of object is shown in a picture, ask the bot to write a story based on what the image depicts, or, starting from a sketch, ask it to generate a website). The MiniGPT-4 implementation is written in Python and distributed under the BSD license.
  • Facebook has published the toolkit and the DINOv2 computer-vision model, trained with self-supervised learning (SSL, which uses no human-prepared labels or annotations during training). It is suitable for general visual-data-processing tasks (image classification, extracting information about objects in images, understanding what is happening in a video) as well as pixel-level tasks (depth estimation, segmentation). The model was trained on a collection of 142 million images. The implementation is written in Python and distributed under the Creative Commons Attribution-NonCommercial 4.0 license, which prohibits commercial use.
  • GPT4All is a toolkit for quickly deploying self-hosted chatbots on your own hardware (they do not access external services and run on CPUs with AVX2 support). Connecting large language models based on GPT-J and LLaMA is supported. The code is written in Python and distributed under the MIT license.

Source: opennet.ru
