Reinforcement learning or evolution strategies? Both

Hello, Habr!

We do not often decide to post translations of texts that are two years old, contain no code, and are clearly academic in nature, but today we will make an exception. We hope that the dilemma posed in the title concerns many of our readers, and that you have either already read the foundational work on evolution strategies with which this post argues, or will read it now. Welcome under the cut!


In March 2017, OpenAI made waves in the deep learning community with the paper “Evolution Strategies as a Scalable Alternative to Reinforcement Learning”. The work described impressive results suggesting that reinforcement learning (RL) is not the only game in town, and that other methods are worth trying when training complex neural networks. A debate then erupted about the importance of reinforcement learning and how well it deserves its status as a “must-have” technology for teaching problem solving. Here I want to argue that these two technologies should not be seen as competitors, one of which is clearly better than the other; on the contrary, they ultimately complement each other. Indeed, if you think a little about what it takes to create general AI, meaning systems that can learn, judge, and plan throughout their existence, you will almost certainly conclude that some combined solution of this kind will be required. Incidentally, it was exactly this combined solution that nature arrived at when it endowed mammals and other higher animals with complex intelligence over the course of evolution.

Evolution strategies

The main thesis of the OpenAI paper was that, instead of using reinforcement learning combined with traditional backpropagation, the authors successfully trained a neural network to solve complex problems using what they called an “evolution strategy” (ES). This ES approach consists of maintaining a distribution over the network's weights, with many agents working in parallel and using parameters sampled from that distribution. Each agent acts in its own environment, and after a set number of episodes or episode steps, the algorithm returns the cumulative reward, expressed as a fitness score. Given this value, the parameter distribution can be shifted toward the more successful agents and away from the less successful ones. By repeating this operation millions of times with hundreds of agents, it is possible to move the weight distribution into a region that lets the agents form a high-quality policy for the task assigned to them. Indeed, the results presented in the paper are impressive: with a thousand agents running in parallel, anthropomorphic bipedal locomotion can be learned in under half an hour (whereas even the most advanced RL methods need more than an hour for this). For more detail, I recommend reading the excellent post by the authors of the experiment, as well as the scientific paper itself.
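The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the OpenAI implementation: the quadratic `toy_fitness` is a hypothetical stand-in for an episode's cumulative reward, the scores are normalised by their standard deviation (a common simplification), and all names and hyperparameters are chosen purely for the example.

```python
import random

def evolution_strategy(fitness, dim, iterations=300, population=50,
                       sigma=0.1, learning_rate=0.05, seed=0):
    """Minimal ES sketch: sample around a mean, score, shift the mean."""
    rng = random.Random(seed)
    mean = [0.0] * dim  # centre of the parameter distribution
    for _ in range(iterations):
        # Each "agent" receives parameters sampled around the current mean.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        scores = [fitness([m + sigma * n for m, n in zip(mean, noise)])
                  for noise in noises]
        # Normalise scores so the update does not depend on reward scale.
        avg = sum(scores) / population
        std = (sum((s - avg) ** 2 for s in scores) / population) ** 0.5 or 1.0
        weights = [(s - avg) / std for s in scores]
        # Shift the mean toward perturbations that scored above average.
        for i in range(dim):
            step = sum(w * noise[i] for w, noise in zip(weights, noises))
            mean[i] += learning_rate * step / population
    return mean

# Hypothetical toy task: the "reward" is highest when params equal TARGET.
TARGET = [0.5, -0.3, 0.8]

def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

solution = evolution_strategy(toy_fitness, dim=3)
```

Note that `fitness` is only ever called, never differentiated: the update uses nothing but the scalar scores and the noise that produced them.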


Different strategies for anthropomorphic upright walking, learned with the ES method from OpenAI.

Black box

The great advantage of this method is that it is easily parallelized. While RL methods such as A3C require information to be exchanged between worker threads and a parameter server, ES needs only fitness estimates and general information about the parameter distribution. It is thanks to this simplicity that the method is far ahead of modern RL methods in terms of scalability. However, all this does not come for free: you have to optimize the network on the black-box principle. Here, “black box” means that during training the internal structure of the network is ignored completely, and only the overall outcome (the episode reward) is used; it alone determines whether the weights of a particular network will be inherited by subsequent generations. In situations where we do not receive much feedback from the environment (and in many traditional RL problems the flow of rewards is very sparse), the problem goes from being a “partially black box” to a “completely black box”. In that case you can increase performance substantially, so the compromise is justified. “Who needs gradients if they are hopelessly noisy anyway?” is the general opinion.
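To make the communication argument concrete, here is a small sketch of one way workers could exchange nothing but scalars. It assumes every worker can regenerate another worker's perturbation from a shared integer seed; the function names, the seed scheme, and the linear toy fitness are hypothetical and chosen for illustration only.

```python
import random

def noise_from_seed(seed, dim):
    # Any worker can rebuild the same perturbation from the seed alone.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def worker_evaluate(mean, seed, sigma, fitness):
    # A worker perturbs the shared mean and reports only (seed, score).
    noise = noise_from_seed(seed, len(mean))
    return seed, fitness([m + sigma * n for m, n in zip(mean, noise)])

def apply_update(mean, reports, sigma, learning_rate):
    # Every worker can run this locally: no weights or gradients are sent.
    scores = [score for _, score in reports]
    avg = sum(scores) / len(scores)
    new_mean = list(mean)
    for seed, score in reports:
        noise = noise_from_seed(seed, len(mean))
        for i, n in enumerate(noise):
            new_mean[i] += learning_rate * (score - avg) * n / (len(reports) * sigma)
    return new_mean

def linear_fitness(params):
    # Hypothetical toy objective: larger parameters are better.
    return sum(params)

mean = [0.0, 0.0]
reports = [worker_evaluate(mean, seed, 0.1, linear_fitness) for seed in range(20)]
new_mean = apply_update(mean, reports, sigma=0.1, learning_rate=0.05)
```

Each report is a pair of numbers regardless of how large the network is, which is where the scalability comes from; a gradient-based worker would instead have to ship a vector as long as the parameter list.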

However, in situations where feedback is richer, things start to go wrong for ES. The OpenAI team describes how a simple MNIST classification network was trained using ES, and in that case training was 1000 times slower. The fact is that the gradient signal in image classification is extremely informative about how to teach the network better classification. So the problem lies less with the RL technique itself and more with sparse rewards in environments that produce noisy gradients.

Nature's solution

If we try to learn from nature's example when thinking about ways to develop AI, then in some cases AI can be thought of as a problem-oriented approach. After all, nature operates under constraints that computer scientists simply do not have. There is an opinion that a purely theoretical approach to a particular problem can yield more effective solutions than empirical alternatives. Nevertheless, I still think it is worth examining how a dynamic system operating under certain constraints (the Earth) has produced agents (animals, particularly mammals) capable of flexible and complex behavior. While some of these constraints do not apply in the world of data science, others carry over just fine.

Having examined the intelligent behavior of mammals, we see that it is formed by the complex mutual influence of two closely related processes: learning from the experience of others and learning by doing. The former is often equated with evolution driven by natural selection, but here I use a broader term to take into account epigenetics, microbiomes, and other mechanisms that allow experience to be shared between organisms that are not genetically related. The second process, learning from experience, is all the information an animal manages to learn over its lifetime, and this information is determined directly by the animal's interaction with the outside world. This category covers everything from learning to recognize objects to mastering the communication inherent in the learning process.

Roughly speaking, these two processes occurring in nature can be compared with two options for optimizing neural networks. Evolution strategies, where no gradient information is used to update the organism, come close to learning from the experience of others. Similarly, gradient methods, where a particular experience leads to a particular change in the agent's behavior, are comparable to learning from one's own experience. If we think about the kinds of intelligent behavior or abilities that each of these two approaches develops in animals, the comparison becomes even clearer. In both cases, “evolutionary methods” promote the learning of reactive behaviors that allow one to develop a certain fitness (enough to stay alive). Learning to walk or to escape from captivity is in many cases equivalent to more “instinctive” behaviors that are “hard-wired” in many animals at the genetic level. In addition, this example confirms that evolutionary methods are applicable in cases where the reward signal is extremely rare (for example, the fact of successfully raising offspring). In such a case it is impossible to associate the reward with any specific set of actions, which may have been performed many years before that fact occurred. On the other hand, if we consider the case in which ES fails, namely image classification, the results are remarkably comparable to the results of animal learning achieved in countless behavioral psychology experiments conducted over more than 100 years.

Learning in animals

The methods used in reinforcement learning are in many cases taken directly from the psychological literature on operant conditioning, and operant conditioning was studied using animal psychology. By the way, Richard Sutton, one of the two founders of reinforcement learning, has a bachelor's degree in psychology. In operant conditioning, animals learn to associate reward or punishment with specific behavioral patterns. Trainers and researchers can manipulate this reward association in one way or another, provoking animals to demonstrate intelligence or certain behaviors. However, operant conditioning as used in animal research is nothing more than a more pronounced form of the same conditioning by which animals learn throughout their lives. We constantly receive reinforcement signals from the environment and adjust our behavior accordingly. In fact, many neuroscientists and cognitive scientists believe that humans and other animals actually operate one level higher and continually learn to predict the outcomes of their behavior in future situations based on potential rewards.

The central role of prediction in learning from experience changes the dynamics described above in significant ways. The signal that was previously considered very sparse (the episodic reward) turns out to be very dense. In theory, the situation looks something like this: at any given moment, the mammalian brain is computing outcomes based on a complex stream of sensory stimuli and actions, while the animal is simply immersed in that stream. In this case, the animal's eventual behavior provides a dense signal that must be used to guide the correction of predictions and the development of behavior. The brain uses all of these signals to optimize its predictions (and, accordingly, the quality of the actions taken) in the future. An overview of this approach is given in the excellent book “Surfing Uncertainty” by the scientist and philosopher Andy Clark. If we extrapolate this reasoning to the training of artificial agents, a fundamental flaw in reinforcement learning is revealed: the signal used in this paradigm is hopelessly weak compared with what it could be (or should be). In cases where it is impossible to increase the richness of the signal (perhaps because it is inherently weak or tied to low-level reactivity), it is probably better to prefer a training method that parallelizes well, for example ES.

Richer training of neural networks

Building on the principles of higher neural activity found in the mammalian brain, which is constantly busy making predictions, recent advances have been made in reinforcement learning, which now takes the importance of such predictions into account. There are two similar works I can recommend straight away.

In both of these papers, the authors supplement the typical default policy of their neural networks with predictions about the state of the environment in the future. In the first paper, prediction is applied to a variety of measurement variables, and in the second, to changes in the environment and in the behavior of the agent as such. In both cases, the sparse signal associated with positive reinforcement becomes much richer and more informative, allowing both faster learning and the acquisition of more complex behaviors. Such improvements are available only to methods that use a gradient signal, and not to methods that operate on the “black box” principle, such as ES.
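As a rough illustration of why prediction makes the signal denser, the sketch below adds an auxiliary prediction error to an ordinary policy loss. It is not the formulation of either paper: `combined_loss`, `aux_weight`, and the mean-squared-error form are assumptions made for the example, and `policy_loss` is taken as a number computed elsewhere.

```python
def prediction_error(predicted, observed):
    # Mean squared error between a predicted and an observed state vector.
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def combined_loss(policy_loss, predicted_states, observed_states, aux_weight=0.5):
    # One error term per time step: feedback arrives even when reward is zero.
    aux = sum(prediction_error(p, o)
              for p, o in zip(predicted_states, observed_states))
    aux /= len(observed_states)
    # aux_weight (hypothetical) balances the reward-driven and predictive terms.
    return policy_loss + aux_weight * aux

loss = combined_loss(1.0, [[0.0, 0.0]], [[1.0, 1.0]], aux_weight=0.5)
```

An episode of a thousand steps contributes a thousand prediction errors to such a loss, but possibly only a single reward. A black-box method sees only the final scalar, so it cannot exploit the per-step terms.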

In addition, learning from experience and gradient methods are far more data-efficient. Even in cases where a particular problem could be learned faster with ES than with reinforcement learning, the gain was achieved because the ES approach used many times more data than RL did. If we consider the principles of learning in animals in this light, we note that the effect of learning from someone else's example shows up only after many generations, while sometimes a single event experienced first-hand is enough for an animal to learn a lesson forever. Although one-shot learning of this kind does not yet fit into traditional gradient methods, it is much closer to them than to ES. There are, for example, approaches such as neural episodic control, where Q-values are stored during training and the system then consults them before taking actions. The result is a gradient method that lets you learn to solve problems much faster than before. In the paper on neural episodic control, the authors mention the human hippocampus, which is able to retain information about an event even after a single experience and therefore plays a critical role in the process of recall. Such mechanisms require access to the internal organization of the agent, which is also, by definition, impossible in the ES paradigm.
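The idea of storing values and consulting them before acting can be sketched as a small lookup table. This is a simplified illustration, not the published architecture: the state keys here are hand-written tuples rather than learned embeddings, and `EpisodicMemory`, its methods, and the inverse-distance weighting constant are assumptions made for the example.

```python
class EpisodicMemory:
    """Toy episodic store: per action, a list of (state_key, q_value) pairs."""

    def __init__(self, k=3):
        self.k = k          # number of neighbours consulted per lookup
        self.entries = {}   # action -> list of (state_key, q_value)

    @staticmethod
    def _sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def write(self, action, state_key, q_value):
        # A single experience is enough to leave a usable trace.
        self.entries.setdefault(action, []).append((tuple(state_key), q_value))

    def estimate(self, action, state_key):
        stored = self.entries.get(action, [])
        if not stored:
            return 0.0  # assumed default for actions never tried
        nearest = sorted(stored, key=lambda e: self._sq_dist(e[0], state_key))
        nearest = nearest[:self.k]
        # Closer memories get more weight (inverse squared distance).
        weights = [1.0 / (self._sq_dist(key, state_key) + 1e-3)
                   for key, _ in nearest]
        total = sum(w * q for w, (_, q) in zip(weights, nearest))
        return total / sum(weights)

    def best_action(self, actions, state_key):
        # Consult the memory before acting.
        return max(actions, key=lambda a: self.estimate(a, state_key))

memory = EpisodicMemory()
memory.write("left", (0.0, 0.0), 1.0)    # one good experience
memory.write("right", (0.0, 0.0), -1.0)  # one bad experience
choice = memory.best_action(["left", "right"], (0.1, 0.0))
```

After one write per action the agent already prefers `"left"` in a nearby state, which is the single-experience behavior the paragraph describes. An ES worker, which reports only an episode score, has no place to keep such a memory.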

So why not combine them?

Much of this article may leave the impression that I am advocating RL methods. However, I actually think that in the long run the best solution is to combine both methods, so that each is used in the situations where it is best suited. Obviously, in the case of many reactive policies or in situations with very sparse positive reinforcement signals, ES wins, especially if you have the computing power at your disposal to run massively parallel training. On the other hand, gradient methods using reinforcement learning or supervised learning will be useful when we have access to extensive feedback and need to learn how to solve a problem quickly and with less data.

Turning to nature, we find that the first method, in essence, lays the foundation for the second. This is why, over the course of evolution, mammals have developed brains that allow them to learn extremely efficiently from the complex signals coming from the environment. So the question remains open. Perhaps evolution strategies will help us invent effective learning architectures that will also be useful for gradient-based learning methods. After all, the solution found by nature is indeed very successful.

Source: www.habr.com
