Feature Selection in Machine Learning

Hi Habr!

We at Reksoft have translated the article "Feature Selection in Machine Learning" into Russian. We hope it will be useful to everyone interested in the topic.

In the real world, data is not always as clean as business customers sometimes assume. That is why data mining and data wrangling are in demand. They help find missing values and patterns in query-structured data that a human cannot spot. Machine learning then comes in handy for finding these patterns and using the relationships discovered in the data to predict outcomes.

To understand any algorithm, you need to look at all the variables in the data and work out what those variables represent. This matters because the reasoning behind the results rests on understanding the data. If the data contains 5 or even 50 variables, you can examine them all. But what if there are 200? Then there simply will not be enough time to study every single one. Moreover, some algorithms do not work with categorical data, so you have to convert all categorical columns into quantitative variables (they may look quantitative, but the metrics will show they are categorical) before adding them to the model. The number of variables grows as a result, and now there are about 500 of them. What do you do now? One might think the answer is dimensionality reduction. Dimensionality reduction algorithms do reduce the number of parameters, but they hurt interpretability. What if there are other techniques that eliminate features while keeping the remaining ones easy to understand and interpret?

Depending on whether the analysis is based on regression or classification, the feature selection algorithms may differ, but the main idea behind them stays the same.

Highly Correlated Variables

Variables that are highly correlated with each other give the model the same information, so there is no need to use all of them in the analysis. For example, if a dataset contains the features "Online Time" and "Traffic Used", we can assume they will be correlated to some degree, and we will see a strong correlation even if we pick an unbiased data sample. In this case, only one of these variables is needed in the model. If you use both, the model will be overfitted and biased toward one particular feature.
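A correlation filter like the one described above could be sketched as follows. This is only an illustration: the synthetic columns `online_time`, `traffic_used` and `age`, the 0.9 threshold and the helper `drop_correlated` are all assumptions made for the example, not something taken from the original article.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedily keep a feature only if its |Pearson r| with every
    already-kept feature does not exceed the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic data (hypothetical): traffic is almost a linear function of time online.
rng = np.random.default_rng(0)
n = 500
online_time = rng.normal(10.0, 3.0, n)
traffic_used = 2.0 * online_time + rng.normal(0.0, 0.5, n)
age = rng.normal(35.0, 10.0, n)          # unrelated to the other two

X = np.column_stack([online_time, traffic_used, age])
names = ["online_time", "traffic_used", "age"]

selected = drop_correlated(X, names)
print(selected)
```

Which feature of a correlated pair survives depends on column order here; in practice you would probably keep the one that is easier to explain to the business.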

P-values

For algorithms such as linear regression, building an initial statistical model is always a good idea. It helps show the importance of the features through the p-values obtained from that model. Having set a significance level, we check the resulting p-values, and if a value is below the chosen significance level, the feature is declared significant, meaning that a change in its value is likely to lead to a change in the value of the target.
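To make the p-value check concrete, here is a minimal sketch of an ordinary least squares fit with per-feature p-values. It assumes a large sample, so the t distribution is approximated by the normal distribution (via `math.erfc`); the function name `ols_p_values`, the synthetic data and the 0.05 significance level are illustrative choices, not part of the original text.

```python
import math
import numpy as np

def ols_p_values(X, y):
    """Fit OLS with an intercept and return (coefficients, two-sided p-values).

    Assumption: the sample is large, so a normal approximation to the
    t distribution is used for the p-values.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])      # noise variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    z = beta / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta[1:], p[1:]                         # drop the intercept

rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 3))
# The third column has no real effect on the target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 1.0, n)

coefs, p_vals = ols_p_values(X, y)
alpha = 0.05                                       # chosen significance level
for name, c, p in zip(["x1", "x2", "x3"], coefs, p_vals):
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name}: coef={c:+.3f}  p={p:.4f}  {verdict}")
```

For small samples a proper t distribution should be used instead of the normal approximation.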

Forward Selection

Forward selection is a technique based on stepwise regression. Model building starts from a complete zero, that is, an empty model, and each iteration then adds a variable that improves the model being built. Which variable gets added is determined by its significance, which can be computed using various metrics. The most common way is to use the p-values obtained from the initial statistical model built on all variables. Forward selection can sometimes lead to overfitting, because the model may end up with highly correlated variables even though they give it the same information (the model still shows an improvement).
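The loop below is one possible sketch of forward selection. Note the swap: instead of p-values it uses the relative drop in the residual sum of squares as the significance metric, and it stops when the best candidate improves the fit by less than `min_gain`. The helper names, the 5% threshold and the synthetic data are assumptions made for the illustration.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def forward_selection(X, y, min_gain=0.05):
    """Start from the empty model and add the best feature at each step."""
    selected, remaining = [], list(range(X.shape[1]))
    current = rss(X, y, selected)                  # intercept-only model
    while remaining and current > 0:
        scores = {c: rss(X, y, selected + [c]) for c in remaining}
        best = min(scores, key=scores.get)
        if (current - scores[best]) / current < min_gain:
            break                                  # no candidate helps enough
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 5))
# Only columns 0 and 2 drive the target; the rest are noise.
y = 4.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0.0, 1.0, 400)

selected = forward_selection(X, y)
print("selected columns, in order of addition:", selected)
```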

Backward Selection

Backward selection also eliminates features step by step, but in the opposite direction compared with forward selection. Here the initial model includes all independent variables. Variables are then removed (one per iteration) if they add no value to the new regression model at that iteration. Feature elimination is based on the p-values of the initial model. This method is also unreliable when removing highly correlated variables.
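Backward elimination could look roughly like this. The sketch refits the model and recomputes p-values after every removal, which is a common variant rather than necessarily the exact procedure the author has in mind. P-values again rely on a normal approximation (a large-sample assumption), and all names and data are made up for the example.

```python
import math
import numpy as np

def p_values(X, y, cols):
    """Two-sided p-values (normal approximation) for an OLS fit on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    z = beta[1:] / se[1:]                          # skip the intercept
    return [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]

def backward_elimination(X, y, alpha=0.05):
    """Start with all features; drop the least significant one per iteration."""
    kept = list(range(X.shape[1]))
    while kept:
        p = p_values(X, y, kept)
        worst = max(range(len(kept)), key=lambda i: p[i])
        if p[worst] < alpha:
            break                                  # everything left is significant
        kept.pop(worst)
    return kept

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 5))
# Only columns 0 and 2 drive the target; the rest are noise.
y = 4.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0.0, 1.0, 400)

kept = backward_elimination(X, y)
print("columns kept:", kept)
```

With a 0.05 significance level a pure-noise column can occasionally survive by chance, so the result should be read as a shortlist rather than a final answer.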

Recursive Feature Elimination

RFE is a widely used technique/algorithm for selecting an exact number of significant features. Sometimes it is used to explain a certain number of the "most important" features that influence the results; sometimes it is used to reduce a very large number of variables (about 200-400), keeping only those that make at least some contribution to the model and discarding the rest. RFE uses a ranking system. The features in the dataset are assigned ranks, and these ranks are then used to recursively eliminate features based on the collinearity between them and on their importance in the model. Besides ranking the features, RFE can show whether these features are important or not for a given number of features (it is quite possible that the chosen number of features is not optimal, and the optimal number may be either larger or smaller than the one chosen).
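The recursive idea can be sketched in a few lines. This version assumes a linear model on standardized features and treats the absolute coefficient as the importance score; the ranking convention (1 for kept features, larger numbers for features dropped earlier) and the function name `rfe` are choices made for this illustration.

```python
import numpy as np

def rfe(X, y, n_features_to_select):
    """Repeatedly fit a linear model and drop the feature with the
    smallest absolute coefficient until the requested number remains."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)      # make coefficients comparable
    yc = y - y.mean()
    active = list(range(X.shape[1]))
    ranking = np.ones(X.shape[1], dtype=int)       # 1 = selected
    while len(active) > n_features_to_select:
        beta, *_ = np.linalg.lstsq(Xs[:, active], yc, rcond=None)
        weakest = active[int(np.argmin(np.abs(beta)))]
        ranking[weakest] = len(active) - n_features_to_select + 1
        active.remove(weakest)
    return active, ranking

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
# Columns 0, 1 and 3 carry signal of decreasing strength; 2, 4 and 5 are noise.
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0 * X[:, 3] + rng.normal(0.0, 1.0, 500)

selected, ranking = rfe(X, y, n_features_to_select=3)
print("selected:", selected)
print("ranking :", ranking.tolist())
```

Running it with different values of `n_features_to_select` and comparing model quality is one way to check whether the chosen number of features is too large or too small.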

Feature Importance Chart

When we talk about the interpretability of machine learning algorithms, we usually discuss linear regression (which lets you analyze feature importance using p-values) and decision trees (which literally show feature importance in the form of a tree, along with the feature hierarchy). On the other hand, algorithms such as Random Forest, LightGBM and XGBoost often rely on a feature importance chart, that is, a plot of the variables against their "importance scores". This is especially useful when you need to give a structured rationale for the importance of features in terms of their impact on the business.
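The libraries named above compute their own importance scores. As a stand-in that needs nothing beyond NumPy, the sketch below uses permutation importance on a plain least squares model (an assumption for the example, not what those libraries do internally) and prints the scores as a sorted text bar chart. The feature names are hypothetical.

```python
import numpy as np

def fit_predictor(X, y):
    """Fit OLS with an intercept and return a prediction function."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta

def permutation_importance(predict, X, y, rng, n_repeats=5):
    """Average increase in mean squared error when one column is shuffled."""
    base = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.mean((y - predict(Xp)) ** 2) - base
    return scores / n_repeats

rng = np.random.default_rng(3)
names = ["tenure", "monthly_fee", "support_calls", "noise"]   # hypothetical features
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 1.0, 500)

predict = fit_predictor(X, y)
importance = permutation_importance(predict, X, y, rng)

order = np.argsort(importance)[::-1]               # most important first
top = importance.max()
for j in order:
    bar = "#" * int(round(20 * max(importance[j], 0.0) / top))
    print(f"{names[j]:>14} | {bar} {importance[j]:.2f}")
```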

Regularization

Regularization is used to control the balance between bias and variance. Bias shows how far the model's fit is from the training data, which is a sign of underfitting. Variance shows how much the predictions differ between the training and test datasets, which is a sign of overfitting. Ideally, both bias and variance should be small. This is where regularization comes to the rescue! There are two main techniques:

L1 Regularization - Lasso: Lasso penalizes the model's beta weights to change their importance for the model and can even zero them out (that is, remove those variables from the final model). Lasso is typically used when the dataset contains a large number of variables and you want to exclude some of them to better understand how the important features affect the model (that is, the features that Lasso selected and assigned importance to).

L2 Regularization - Ridge: Ridge's job is to keep all the variables while assigning them importance based on their contribution to model performance. Ridge is a good choice when the dataset contains a small number of variables and all of them are needed to interpret the findings and the results.

Since Ridge keeps all the variables and Lasso does a better job of establishing their importance, an algorithm was developed that combines the best of both regularizations. It is known as Elastic-Net.
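The difference between the three penalties is easy to see on synthetic data. Below is a minimal coordinate descent sketch, assuming standardized features and a centered target; the `alpha` and `l1_ratio` parameters follow a common parametrization of the Elastic-Net objective, and the function names and data are illustrative only.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it falls inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net(X, y, alpha, l1_ratio, n_iter=200):
    """Coordinate descent for
        (1 / 2n) * ||y - Xb||^2 + alpha * l1_ratio * ||b||_1
                                + 0.5 * alpha * (1 - l1_ratio) * ||b||^2
    l1_ratio=1 gives Lasso, l1_ratio=0 gives Ridge."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - X @ b + X[:, j] * b[j]   # residual ignoring feature j
            rho = X[:, j] @ partial_resid / n
            b[j] = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
    return b

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize features
# Only the first two columns matter; the other four are noise.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0.0, 1.0, 500)
y = y - y.mean()                                   # center the target

lasso = elastic_net(X, y, alpha=0.3, l1_ratio=1.0)
ridge = elastic_net(X, y, alpha=0.3, l1_ratio=0.0)
enet = elastic_net(X, y, alpha=0.3, l1_ratio=0.5)

for name, b in [("lasso", lasso), ("ridge", ridge), ("elastic-net", enet)]:
    print(f"{name:>11}: nonzero={np.count_nonzero(b)}  coefs={np.round(b, 3).tolist()}")
```

Lasso drives the noise coefficients to exactly zero, Ridge only shrinks them, and Elastic-Net sits in between depending on `l1_ratio`.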

There are many other ways to select features for machine learning, but the main idea is always the same: demonstrate the importance of the variables and then eliminate some of them based on that importance. Importance is a very subjective term, since it is not a single metric but a whole set of metrics and charts that can be used to find the key features.

Thank you for reading! Happy learning!

Source: www.habr.com
