Reinforcement learning or evolution strategies? - Both

Hi Habr!

We do not often post translations of texts that are two years old, contain no code, and are clearly academic in nature, but today we are making an exception. We hope that the dilemma in the title concerns many of our readers, and that you have either already read the foundational work on evolution strategies that this post argues with, or will read it now. Welcome under the cut!


In March 2017, OpenAI made waves in the deep learning community with the paper "Evolution Strategies as a Scalable Alternative to Reinforcement Learning". The work described impressive results supporting the view that reinforcement learning (RL) is not the only game in town, and that other methods are worth trying when training complex neural networks. A debate followed about the importance of reinforcement learning and about how well it deserves its status as a "must-have" technology for teaching problem solving. Here I want to argue that these two technologies should not be seen as competitors, one of which is clearly better than the other; on the contrary, they ultimately complement each other. Indeed, if you think a little about what it takes to create general AI, meaning systems that can learn, judge, and plan throughout their existence, you will almost certainly conclude that some combined solution will be required. Incidentally, a combined solution is exactly what nature arrived at when, in the course of evolution, it gave mammals and other higher animals complex intelligence.

Evolution strategies

The main thesis of the OpenAI paper was that, instead of using reinforcement learning combined with traditional backpropagation, the authors successfully trained a neural network to solve complex problems using what they called an "evolution strategy" (ES). The ES approach maintains a distribution over the network's weights; many agents work in parallel, each using parameters sampled from that distribution. Each agent acts in its own environment, and after a given number of episodes or episode steps the algorithm returns a cumulative reward, expressed as a fitness score. Using this value, the parameter distribution can be shifted toward the more successful agents and away from the less successful ones. By repeating this operation millions of times with hundreds of agents, the weight distribution can be moved into a region where the agents form a high-quality policy for the assigned task. The results presented in the paper are indeed impressive: with a thousand agents running in parallel, anthropomorphic walking on two legs can be learned in under half an hour, while even the most advanced RL methods need more than an hour. For more detail, I recommend the excellent post by the authors of the experiment, as well as the scientific paper itself.
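The loop described above (sample parameters around a mean, score each agent, shift the mean toward the better ones) can be sketched in a few lines. This is a minimal illustration and not OpenAI's implementation: the `fitness` function, the `TARGET` vector, and all hyperparameters are made-up stand-ins for an episode's cumulative reward, and subtracting the average score as a baseline is my own variance-reduction assumption.

```python
import random

# Hypothetical target; stands in for "whatever parameters solve the task".
TARGET = [0.5, -0.3, 0.8]


def fitness(params):
    """Toy fitness score: 0 at TARGET, more negative further away."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def evolution_strategy(population=50, sigma=0.1, lr=0.1, generations=200, seed=0):
    rng = random.Random(seed)
    mean = [0.0] * len(TARGET)  # centre of the weight distribution
    for _ in range(generations):
        noises, scores = [], []
        for _ in range(population):
            # Each "agent" gets parameters sampled around the current mean.
            noise = [rng.gauss(0.0, 1.0) for _ in mean]
            candidate = [m + sigma * e for m, e in zip(mean, noise)]
            noises.append(noise)
            scores.append(fitness(candidate))
        # Assumption: use the average score as a baseline to reduce variance.
        baseline = sum(scores) / population
        # Shift the mean toward above-average agents and away from the rest.
        for i in range(len(mean)):
            estimate = sum((s - baseline) * e[i] for s, e in zip(scores, noises))
            mean[i] += lr * estimate / (population * sigma)
    return mean
```

Calling `evolution_strategy()` returns a mean vector that should end up close to `TARGET`. Note that the update only ever looks at the scalar scores, never at the inside of `fitness`.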


Different strategies for anthropomorphic upright walking, learned with the ES method from OpenAI.

Black box

The great advantage of this method is that it parallelizes easily. While RL methods such as A3C require information to be exchanged between worker threads and a parameter server, ES needs only fitness estimates and summary information about the parameter distribution. Thanks to this simplicity, the method is far ahead of current RL methods in its ability to scale. None of this comes for free, however: you have to optimize the network as a black box. Here "black box" means that the internal structure of the network is completely ignored during training; only the overall result (the reward per episode) is used, and it alone determines whether the weights of a particular network will be inherited by later generations. In situations where we get little feedback from the environment (and in many traditional RL problems the reward stream is very sparse) the problem goes from being a "partially black box" to a "completely black box". In that case you can increase throughput significantly, so the trade-off is certainly justified. "Who needs gradients if they are hopelessly noisy anyway?" is the general opinion.
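To make the communication argument concrete, here is a sketch of how little a worker might need to send back: a random seed and a fitness value. It is similar in spirit to the shared random seeds described by OpenAI, but the function names, the toy fitness, and the baseline subtraction are my own assumptions, and the "workers" here are just function calls rather than real processes.

```python
import random


def toy_fitness(params):
    """Hypothetical stand-in for an episode reward; best at (1, -1)."""
    return -((params[0] - 1.0) ** 2 + (params[1] + 1.0) ** 2)


def noise_from_seed(seed, n_params):
    """Anyone who knows the seed can rebuild the same perturbation."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_params)]


def worker_evaluate(mean, seed, sigma, fitness_fn):
    """Runs on a worker: perturb, evaluate, report only (seed, fitness)."""
    noise = noise_from_seed(seed, len(mean))
    candidate = [m + sigma * e for m, e in zip(mean, noise)]
    return seed, fitness_fn(candidate)


def apply_update(mean, messages, sigma, lr):
    """Runs anywhere: rebuild each perturbation from its seed and update."""
    baseline = sum(fit for _, fit in messages) / len(messages)
    new_mean = list(mean)
    for seed, fit in messages:
        noise = noise_from_seed(seed, len(mean))
        for i, e in enumerate(noise):
            new_mean[i] += lr * (fit - baseline) * e / (len(messages) * sigma)
    return new_mean


mean = [0.0, 0.0]
messages = [worker_evaluate(mean, seed, 0.1, toy_fitness) for seed in range(20)]
updated = apply_update(mean, messages, sigma=0.1, lr=0.1)
```

Each message is two numbers regardless of how many parameters the network has, which is the property that makes this style of training cheap to distribute.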

However, in situations where the feedback is richer, things start to go wrong for ES. The OpenAI team describes how a simple MNIST classification network was trained with ES, and this time training was 1000 times slower. The point is that the gradient signal in image classification is extremely informative about how to teach the network to classify better. So the problem lies less with the RL technique itself and more with sparse rewards in environments that produce noisy gradients.
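A small experiment can illustrate why an informative gradient is hard to beat. The sketch below does not reproduce the MNIST result; it uses a made-up quadratic loss and compares the exact gradient with an ES-style estimate built only from loss evaluations. Mirrored (antithetic) sampling is my own choice here to keep the estimate's variance down.

```python
import math
import random

# Hypothetical 10-dimensional problem with a known, fully informative gradient.
TARGET = [0.1 * i for i in range(10)]


def loss(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


def analytic_gradient(w):
    """What backpropagation would give in a single pass."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, TARGET)]


def es_gradient(w, n_pairs, sigma=0.1, seed=0):
    """Black-box estimate from 2 * n_pairs loss evaluations (mirrored noise)."""
    rng = random.Random(seed)
    grad = [0.0] * len(w)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in w]
        plus = loss([wi + sigma * e for wi, e in zip(w, eps)])
        minus = loss([wi - sigma * e for wi, e in zip(w, eps)])
        scale = (plus - minus) / (2.0 * sigma)
        for i, e in enumerate(eps):
            grad[i] += scale * e / n_pairs
    return grad


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


w = [0.0] * len(TARGET)
agreement = cosine(es_gradient(w, n_pairs=2000), analytic_gradient(w))
```

With a few thousand evaluations the estimate points in nearly the same direction as the true gradient, but the true gradient costs one pass. When the environment offers no such gradient, that comparison no longer applies, which is the case ES was designed for.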

Nature's solution

If we try to learn from nature's example when thinking about ways to develop AI, then in some cases AI can be thought of as a problem-driven approach. After all, nature operates within constraints that computer scientists simply do not have. There is an opinion that a purely theoretical approach to a particular problem can yield more effective solutions than empirical alternatives. Still, I think it is worth examining how a dynamic system operating under certain constraints (the Earth) has produced agents (animals, mammals in particular) capable of flexible and complex behavior. Some of these constraints do not apply in the simulated worlds of data science, but others fit perfectly well.

Looking at the intelligent behavior of mammals, we see that it is shaped by the combined influence of two closely related processes: learning from the experience of others and learning by doing. The former is often equated with evolution driven by natural selection, but here I use a broader term to take into account epigenetics, microbiomes, and other mechanisms that allow experience to be shared between organisms that are not genetically related. The second process, learning from one's own experience, covers all the information an animal manages to learn during its lifetime, and this information is determined directly by the animal's interaction with the outside world. This category includes everything from learning to recognize objects to mastering the communication inherent in the learning process.

Roughly speaking, these two processes in nature can be compared with two options for optimizing neural networks. Evolution strategies, where the organism is updated from its overall fitness rather than from gradient information, come close to learning from the experience of others. Similarly, gradient methods, where a particular experience leads to a particular change in the agent's behavior, are comparable to learning from one's own experience. If we think about the kinds of intelligent behavior or abilities that each of these two approaches develops in animals, the comparison becomes even sharper. In both cases, "evolutionary methods" promote the learning of reactive behaviors that provide a certain level of fitness (enough to stay alive). Learning to walk or to escape from captivity is in many cases equivalent to more "instinctive" behaviors that are "hard-wired" in many animals at the genetic level. This example also confirms that evolutionary methods apply in cases where the reward signal is extremely rare (for example, the fact of successfully raising offspring). In such a case it is impossible to tie the reward to any specific set of actions, which may have been performed many years before the fact occurred. On the other hand, if we consider a case where ES fails, namely image classification, the results are remarkably comparable to the results of animal learning obtained in countless behavioral psychology experiments conducted over more than 100 years.

Learning in animals

The methods used in reinforcement learning are in many cases taken directly from the psychological literature on operant conditioning, and operant conditioning was studied using animal psychology. By the way, Richard Sutton, one of the two founders of reinforcement learning, holds a bachelor's degree in psychology. In operant conditioning, animals learn to associate reward or punishment with specific patterns of behavior. Trainers and researchers can manipulate this reward association in one way or another, prompting animals to demonstrate intelligence or particular behaviors. However, operant conditioning as used in animal research is nothing more than a more pronounced form of the same conditioning through which animals learn throughout their lives. We constantly receive positive reinforcement signals from the environment and adjust our behavior accordingly. In fact, many neuroscientists and cognitive scientists believe that humans and other animals operate at an even higher level, continually learning to predict the outcomes of their behavior in future situations based on potential rewards.

The central role of prediction in learning from experience changes the dynamics described above in important ways. The signal that was previously considered very sparse (the episodic reward) turns out to be very dense. In theory, the situation looks like this: at any given moment the mammalian brain is computing outcomes based on a complex stream of sensory stimuli and actions, while the animal is simply immersed in that stream. The animal's eventual behavior then provides a strong signal that should guide the correction of predictions and the development of behavior. The brain uses all of these signals to improve its predictions (and, accordingly, the quality of the actions it takes) in the future. An overview of this approach is given in the excellent book "Surfing Uncertainty" by the cognitive scientist and philosopher Andy Clark. If we carry this reasoning over to the training of artificial agents, a fundamental flaw in reinforcement learning is revealed: the signal used in this paradigm is hopelessly weak compared with what it could (or should) be. In cases where the signal cannot be made denser (perhaps because it is inherently weak or tied to low-level reactivity), it is probably better to prefer a training method that parallelizes well, such as ES.

Richer training of neural networks

Building on the principles of higher neural activity in the mammalian brain, which is constantly busy making predictions, reinforcement learning has recently made progress by taking the importance of such predictions into account. I can immediately recommend two similar papers to you:

In both of these papers, the authors supplement the usual default policy of their neural networks with predictions about the state of the environment in the future. In the first paper, prediction is applied to a set of measurement variables; in the second, it is applied to changes in the environment and to the behavior of the agent itself. In both cases, the sparse signal associated with positive reinforcement becomes much richer and more informative, allowing both faster learning and the acquisition of more complex behaviors. Such improvements are available only to methods that use a gradient signal, and not to methods that work on the "black box" principle, such as ES.
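The difference in signal density is easy to see in a toy setting. The sketch below is not taken from either paper: the environment dynamics (`next_state = 0.8 * state`), the one-parameter forward model, and the episode layout are all hypothetical. It only shows that a prediction target exists at every step, while the task reward shows up once per episode.

```python
import random


def train_forward_model(episodes=10, steps_per_episode=20, lr=0.5, seed=0):
    rng = random.Random(seed)
    true_decay = 0.8       # hypothetical environment: next_state = 0.8 * state
    predicted_decay = 0.0  # the agent's learned one-parameter forward model
    reward_signals = 0
    prediction_signals = 0
    for _ in range(episodes):
        for step in range(steps_per_episode):
            state = rng.uniform(-1.0, 1.0)
            next_state = true_decay * state
            # Auxiliary task: predict the next state. An error is available
            # at every single step, so every step yields a gradient.
            error = next_state - predicted_decay * state
            predicted_decay += lr * error * state  # descent on 0.5 * error**2
            prediction_signals += 1
            # Sparse task reward: assumed to arrive only when the episode ends.
            if step == steps_per_episode - 1:
                reward_signals += 1
    return predicted_decay, reward_signals, prediction_signals


decay, n_reward, n_prediction = train_forward_model()
```

With the defaults, the agent sees 200 prediction errors but only 10 rewards, and the forward model converges on the true dynamics long before the reward alone could have said much. A black-box method has no way to use those 200 per-step errors, because they live inside the agent.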

In addition, learning from experience and gradient methods are much more data-efficient. Even in cases where a particular problem was solved faster with ES than with reinforcement learning, the gain came from the fact that the ES run consumed many times more data than RL. Thinking about this in terms of animal learning, we note that the effect of learning from someone else's example shows up only after many generations, while a single event experienced first-hand is sometimes enough for an animal to learn a lesson forever. Although such one-shot learning does not quite fit into traditional gradient methods, it is much closer to them than to ES. There are, for example, approaches such as neural episodic control, where Q-values are stored during training and the program consults them before taking an action. The result is a gradient method that learns to solve problems much faster than before. In the paper on neural episodic control, the authors mention the human hippocampus, which can retain information about an event even after a single experience and therefore plays a critical role in remembering. Such mechanisms require access to the agent's internal organization, which is, by definition, impossible in the ES paradigm.
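The idea of storing Q-values and consulting them before acting can be shown with a deliberately simplified, non-neural memory. This is not the neural episodic control architecture from the paper: the class name, the nearest-neighbour lookup, and the rule of keeping the best value seen for a state are all assumptions made for illustration.

```python
import math


class EpisodicMemory:
    """Toy episodic store: one list of (state, Q-value) entries per action."""

    def __init__(self, actions):
        self.tables = {action: [] for action in actions}

    def write(self, state, action, q_value):
        # Assumption: if the state was seen before, keep the best value so far.
        for entry in self.tables[action]:
            if entry[0] == state:
                entry[1] = max(entry[1], q_value)
                return
        self.tables[action].append([state, q_value])

    def estimate(self, state, action, default=0.0):
        """Return the stored Q-value of the nearest remembered state."""
        entries = self.tables[action]
        if not entries:
            return default
        nearest = min(entries, key=lambda entry: math.dist(entry[0], state))
        return nearest[1]

    def act(self, state):
        """Consult the memory first, then pick the best-looking action."""
        return max(self.tables, key=lambda action: self.estimate(state, action))


memory = EpisodicMemory(["left", "right"])
memory.write((0.0, 0.0), "right", 1.0)  # a single rewarding experience
choice = memory.act((0.1, 0.0))         # a nearby state reuses it at once
```

One stored experience is enough to change behavior in nearby states, which is the one-shot flavour the paragraph describes. The lookup reads the agent's own internal records, so there is no obvious black-box equivalent.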

So why not combine them?

Much of this article may leave the impression that I am advocating RL methods. In fact, I think that in the long run the best solution is to combine both methods, so that each is used in the situations it suits best. Clearly, for many reactive policies, or in situations with very sparse positive reinforcement signals, ES wins, especially if you have the computing power to run massively parallel training. On the other hand, gradient methods using reinforcement learning or supervised learning will be useful when we have access to extensive feedback and need to learn to solve a problem quickly and with less data.

Turning to nature, we find that the first method essentially lays the foundation for the second. That is why, over the course of evolution, mammals developed brains that allow them to learn extremely effectively from the complex signals coming from the environment. So the question remains open. Perhaps evolution strategies will help us invent effective learning architectures that will also be useful for gradient-based learning methods. After all, the solution found by nature is indeed a very successful one.
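One very small way to picture "evolution laying the foundation for learning" is an outer evolutionary loop that tunes how an inner gradient learner learns. The sketch below is purely illustrative: the inner task, the step budget, and all constants are made up, and the outer loop is a classic elitist (1+λ) evolution strategy, not the distribution-based variant from the OpenAI paper.

```python
import math
import random


def final_loss(lr, steps=5):
    """Inner loop: plain gradient descent on f(x) = (x - 3)^2 from x = 0."""
    x = 0.0
    for _ in range(steps):
        x -= lr * 2.0 * (x - 3.0)
    return (x - 3.0) ** 2


def evolve_learning_rate(initial_lr=0.01, children=8, sigma=0.5,
                         generations=30, seed=0):
    """Outer loop: (1+lambda) evolution strategy over log(learning rate)."""
    rng = random.Random(seed)
    log_lr = math.log(initial_lr)
    best = final_loss(initial_lr)
    for _ in range(generations):
        # All children are mutations of the current parent.
        offspring = [log_lr + sigma * rng.gauss(0.0, 1.0) for _ in range(children)]
        for candidate in offspring:
            candidate_loss = final_loss(math.exp(candidate))
            # Elitist selection: the parent survives unless a child beats it.
            if candidate_loss < best:
                log_lr, best = candidate, candidate_loss
    return math.exp(log_lr), best


tuned_lr, tuned_loss = evolve_learning_rate()
```

The outer loop treats the learner as a black box and sees only its final loss, while the inner loop uses gradients at every step. That division of labour mirrors the one suggested above: slow selection shapes the learner, and the learner does the fast, data-efficient adaptation.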

Source: www.habr.com
