The Magic of Ensemble Learning

Hi, Habr! We invite Data Engineers and Machine Learning specialists to the free demo lesson "Bringing ML models into production, using online recommendations as an example". We are also publishing an article by Luca Monno, Head of Financial Analytics at CDP SpA.

One of the most useful and simplest machine learning techniques is Ensemble Learning (EL). Ensemble Learning is the idea behind XGBoost, Bagging, Random Forest and many other algorithms.

There are many great articles on data science, but I picked two stories (the first and the second) that I like best. So why write another article about EL? Because I want to show you how it works with a simple example, the one that made me realize there is no magic here.

When I first saw EL in action (with very simple regression models) I could not believe my eyes, and I still remember the professor who taught me this technique.

I had two different models (two weak learners) with out-of-sample R² of 0.90 and 0.93, respectively. Before looking at the result, I expected to get an R² somewhere between the two original values. In other words, I believed EL could make a model that performs not as badly as the worst model, but not as well as the best one.

To my great surprise, simply averaging the predictions produced an R² of 0.95.

At first I went looking for a mistake, but then I thought there might be some magic hidden here!

What is Ensemble Learning

With EL you can combine the predictions of two or more models to produce a more robust and better-performing model. There are many ways to work with ensembles of models. Here I will touch on the two most useful ones to give an overview.

With regression, you can average the predictions of the available models.
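A minimal sketch of averaging for regression, assuming each model has already produced a list of predictions for the same observations. The function name `average_predictions` and the sample numbers are illustrative and do not come from the article.

```python
def average_predictions(*model_predictions):
    """Return the element-wise mean of several equally long prediction lists."""
    if not model_predictions:
        raise ValueError("at least one list of predictions is required")
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("all models must predict the same observations")
    # zip(*...) groups the predictions made for the same observation.
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical predictions from two regression models for three observations.
model_a = [10.0, 20.0, 30.0]
model_b = [12.0, 18.0, 33.0]
ensemble = average_predictions(model_a, model_b)
print(ensemble)
```

A plain mean gives every model the same weight; a weighted mean would be a natural variation if one model is known to be more reliable.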

With classification, you can let the models vote on the labels. The label chosen most often is the one the new model will output.
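Voting can be sketched in a few lines, assuming each classifier has already produced one label per observation. The name `majority_vote` and the labels are made up for illustration.

```python
from collections import Counter


def majority_vote(*model_labels):
    """For each observation, return the label predicted by the most models."""
    voted = []
    for labels in zip(*model_labels):
        # most_common(1) returns the (label, count) pair with the highest count;
        # on a tie, the label that was encountered first wins.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical labels from three classifiers for four observations.
clf_a = ["cat", "dog", "dog", "cat"]
clf_b = ["cat", "cat", "dog", "dog"]
clf_c = ["dog", "cat", "dog", "cat"]
ensemble_labels = majority_vote(clf_a, clf_b, clf_c)
print(ensemble_labels)
```

Using an odd number of classifiers for a two-class problem avoids ties altogether.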

Why EL works better

The main reason EL performs better is that every prediction carries an error (we know this from probability theory). Combining two predictions can help reduce that error, and therefore improve the performance metrics (RMSE, R², etc.).
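One way to make this concrete, as a sketch under assumptions the article does not state: let the errors of the two models, e_1 and e_2, have zero mean, standard deviations \sigma_1 and \sigma_2, and correlation \rho. The error of the averaged prediction is (e_1 + e_2)/2, and its variance is

```latex
\operatorname{Var}\!\left(\frac{e_1 + e_2}{2}\right)
  = \frac{\sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2}{4}
```

For \sigma_1 = \sigma_2 = \sigma this reduces to \sigma^2 (1 + \rho)/2, which is below \sigma^2 whenever \rho < 1 and shrinks as \rho decreases.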

The following figure shows how two weak learners behave on a dataset. The first learner has a larger slope than needed, while the second has a slope of almost zero (possibly due to over-regularization). The ensemble, however, gives much better results.

Looking at R², the first and second learners score -0.01¹ and 0.22, respectively, while the ensemble scores 0.73.
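The effect can be reproduced with a small simulation. This is a sketch on synthetic data, not the author's dataset: the true slope of 1, the noise level, and the two hand-picked learners (slopes 2 and 0) are all assumptions, so the printed R² values will differ from the ones quoted above.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (an assumption): true slope 1 plus Gaussian noise.
x = [-1.0 + 2.0 * i / 199 for i in range(200)]
y = [xi + rng.gauss(0.0, 0.5) for xi in x]

# Hand-picked weak learners, not fitted to the data.
too_steep = [2.0 * xi for xi in x]  # slope larger than needed
too_flat = [0.0 for _ in x]         # slope of zero
ensemble = [(a + b) / 2.0 for a, b in zip(too_steep, too_flat)]

r2_steep = r_squared(y, too_steep)
r2_flat = r_squared(y, too_flat)
r2_ensemble = r_squared(y, ensemble)
print(f"steep: {r2_steep:.2f}  flat: {r2_flat:.2f}  ensemble: {r2_ensemble:.2f}")
```

Because one learner overshoots where the other undershoots, their average lands close to the true line and scores higher than either learner alone.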

There are many reasons why an algorithm can be a poor model even on a basic example like this one: maybe you decided to use regularization to avoid overfitting, or decided not to exclude some outliers, or maybe you used polynomial regression and picked the wrong degree (for example, you used a second-degree polynomial while the test data show a clear asymmetry that a third-degree polynomial would fit better).

When EL works better

Let's look at two other learners working on the same data.

Here you can see that combining the two models did not improve performance much. The two learners had R² of -0.37 and 0.22, respectively, and the ensemble scored 0.04. That is, the EL model ended up with a value between the two.

However, there is a big difference between these two examples: in the first example the errors of the models were negatively correlated, and in the second they were positively correlated. (The coefficients of the three models were not estimated; the author simply chose them for illustration.)

Therefore, Ensemble Learning can be used to improve the bias/variance balance in any case, but it is when the model errors are not positively correlated that using EL can lead to improved performance.
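The role of error correlation can be checked on a toy case. The error vectors below are made up for illustration; the only fact used is that the error of an averaged prediction equals the average of the individual errors.

```python
def mean_squared(values):
    """Mean of the squared values, i.e. the MSE of an error vector."""
    return sum(v * v for v in values) / len(values)


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def averaged(e1, e2):
    """Error of the averaged prediction: the average of the two errors."""
    return [(x + y) / 2.0 for x, y in zip(e1, e2)]


# Made-up error vectors (prediction minus truth) for four observations.
errors_a = [1.0, -1.0, 2.0, -2.0]
errors_negative = [-1.0, 1.0, -2.0, 2.0]  # mirrors errors_a
errors_positive = [2.0, -2.0, 4.0, -4.0]  # same direction as errors_a

for name, other in [("negative", errors_negative), ("positive", errors_positive)]:
    combined = averaged(errors_a, other)
    print(name, pearson(errors_a, other), mean_squared(combined))
```

With mirrored errors the average cancels them completely, while with errors pointing the same way the ensemble's mean squared error falls between those of the two models, which matches the pattern in the two examples above.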

Homogeneous and heterogeneous models

EL is very often used with homogeneous models (as in this example, or in a random forest), but in fact you can combine very different models (linear regression + neural network + XGBoost) with different sets of explanatory variables. This is likely to produce uncorrelated errors and improved performance.

Comparison with portfolio diversification

EL works much like diversification in portfolio theory, but with an advantage for us.

When you diversify, you try to reduce the variance of your returns by investing in uncorrelated stocks. A well-diversified stock portfolio will perform better than the worst individual stock, but never better than the best one.

To quote Warren Buffett:

"Diversification is protection against ignorance; for someone who knows what they are doing, [diversification] makes little sense."

In machine learning, EL helps reduce the variance of your model, but it can also produce a model that performs better than the best individual model.

Summing up

Combining several models into one is a fairly simple technique that can help with the variance problem and improve performance.

If you have two or more models that work well, do not choose between them: use them all (but with care)!

Interested in developing in this direction? Sign up for the free demo lesson "Bringing ML models into production, using online recommendations as an example" and take part in an online meeting with Andrey Kuznetsov, Machine Learning Engineer at Mail.ru Group.

Source: www.habr.com
