The Magic of Ensemble Learning

Hi Habr! We invite data engineers and machine learning specialists to a free demo lesson, "Deploying ML models to production using online recommendations as an example". We are also publishing an article by Luca Monno, Head of Financial Analytics at CDP SpA.

Ensemble Learning is one of the simplest and most approachable machine learning techniques. It is the underlying idea behind XGBoost, Bagging, Random Forest, and many other algorithms.

There are many great articles on Towards Data Science, but I picked two (the first and the second) that I liked most. So why write another article about EL? Because I want to show you how it works on a simple example, the one that made me realize there is no magic here.

When I first saw EL in action (working with some very simple regression models), I could not believe my eyes, and I still remember the professor who taught me this technique.

I had two different models (two weak learning algorithms) with out-of-sample R² scores of 0.90 and 0.93 respectively. Before looking at the result, I expected to get an R² somewhere between the two original values. In other words, I believed EL could produce a model that performs not as badly as the worst model, but not as well as the best model either.

To my great surprise, simply averaging the predictions gave an R² of 0.95.

At first I started looking for a mistake, but then I thought there might be some magic hidden here!

What is Ensemble Learning

With EL, you can combine the predictions of two or more models to obtain a more robust and better-performing model. There are many methodologies for working with model ensembles. Here I will touch on the two most useful ones to give you an idea.

With regression, you can average the outputs of the available models.

With classification, you can let the models vote on the labels. The label chosen most often is the one the new model will pick.
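The two combination rules above can be sketched in a few lines of plain Python. This is only an illustration: the function names `average_predictions` and `majority_vote` and all the sample values are made up for the example and do not come from the article.

```python
from collections import Counter
from statistics import mean


def average_predictions(predictions_per_model):
    """Regression ensemble: average the models' outputs sample by sample."""
    return [mean(sample) for sample in zip(*predictions_per_model)]


def majority_vote(labels_per_model):
    """Classification ensemble: every model votes, the most common label wins.

    Ties are resolved in favour of the label seen first (Counter ordering).
    """
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*labels_per_model)]


# Hypothetical outputs of three regression models for four samples
reg_a = [1.0, 2.0, 3.0, 4.0]
reg_b = [1.2, 1.8, 3.4, 3.6]
reg_c = [0.8, 2.2, 2.6, 4.4]
averaged = average_predictions([reg_a, reg_b, reg_c])

# Hypothetical labels from three classifiers for the same four samples
clf_a = ["cat", "dog", "dog", "cat"]
clf_b = ["cat", "cat", "dog", "dog"]
clf_c = ["dog", "dog", "dog", "cat"]
voted = majority_vote([clf_a, clf_b, clf_c])

print(averaged)
print(voted)
```

Using an odd number of classifiers on a binary problem, as here, avoids ties altogether.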

Why EL works better

The main reason EL performs better is that every prediction carries an error (we know this from probability theory), and combining two predictions can help reduce that error and thus improve the performance metrics (RMSE, R², etc.).

The following figure shows two weak algorithms at work on one dataset. The first algorithm has a slope larger than needed, while the second has a slope of nearly zero (possibly due to over-regularization). The ensemble, however, shows much better results.

If you look at R², the first and second learning algorithms score -0.01 and 0.22 respectively, while the ensemble scores 0.73.
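A rough way to reproduce this effect is with synthetic data. The sketch below assumes a true slope of 1, one model with a slope of 1.9 and one with a slope of 0.1; these coefficients, the noise level and the helper `r2_score` are all chosen for illustration, so the resulting R² values will not match the figures quoted above.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: y = x + Gaussian noise (assumed ground truth)
x = [i / 10 for i in range(-50, 51)]
y = [xi + rng.gauss(0, 0.5) for xi in x]

pred_steep = [1.9 * xi for xi in x]  # weak model 1: slope too large
pred_flat = [0.1 * xi for xi in x]   # weak model 2: slope close to zero
pred_ens = [(a + b) / 2 for a, b in zip(pred_steep, pred_flat)]  # simple average

r2_steep = r2_score(y, pred_steep)
r2_flat = r2_score(y, pred_flat)
r2_ens = r2_score(y, pred_ens)

print(f"steep model R2: {r2_steep:.2f}")
print(f"flat model  R2: {r2_flat:.2f}")
print(f"ensemble    R2: {r2_ens:.2f}")
```

The two models err in opposite directions, so their mistakes largely cancel in the average. That cancellation, not the averaging as such, is what lifts the ensemble above both of them.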

[Figure: The Magic of Ensemble Learning]

There are many reasons why an algorithm can end up being a poor model for a dataset like this one: maybe you decided to use regularization to avoid overfitting, or you decided not to remove some anomalies, or you used polynomial regression and chose the wrong degree (for example, you used a second-degree polynomial while the test data shows a clear asymmetry, for which a third degree would fit better).

EL in action

Let's look at two learning algorithms working on the same data.

[Figure: The Magic of Ensemble Learning]

Here you can see that combining the two models did not improve performance much. The two learning algorithms started with R² values of -0.37 and 0.22 respectively, and the ensemble came out at -0.04. In other words, the EL model ended up with roughly the average of the two scores.

However, there is a big difference between these two examples: in the first one the models' errors were negatively correlated, and in the second they were positively correlated (the coefficients of the three models were not estimated, but simply chosen by the author as an example).

So Ensemble Learning can be used to improve the bias/variance balance in any case, but it is when the model errors are not positively correlated that using EL can lead to better performance.
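The role of correlation can be made concrete with the standard identity for the variance of the average of two errors: Var((e1 + e2) / 2) = (s1² + s2² + 2·rho·s1·s2) / 4, where s1 and s2 are the error standard deviations and rho is their correlation. The helper below is a hypothetical illustration of that identity, not code from the article.

```python
def averaged_error_variance(sigma1, sigma2, rho):
    """Variance of (e1 + e2) / 2 for two errors with standard deviations
    sigma1, sigma2 and correlation rho (standard variance-of-a-sum identity)."""
    return (sigma1 ** 2 + sigma2 ** 2 + 2 * rho * sigma1 * sigma2) / 4


# Two equally noisy models (unit error variance each) at different correlations
for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"rho = {rho:+.1f} -> variance of averaged error = "
          f"{averaged_error_variance(1.0, 1.0, rho):.2f}")
```

With equally noisy models, perfectly positively correlated errors (rho = 1) leave the variance unchanged, uncorrelated errors halve it, and perfectly negatively correlated errors cancel completely.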

Homogeneous and heterogeneous models

EL is most often used with homogeneous models (as in this example, or in a random forest), but in fact you can combine very different models (linear regression + neural network + XGBoost) built on different sets of explanatory variables. This is likely to produce uncorrelated errors and improve performance.
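As a minimal sketch of a heterogeneous ensemble, the example below averages a least-squares line fitted on one explanatory variable with a k-nearest-neighbours regressor fitted on another. Both models are small stand-ins written in plain Python rather than the neural network or XGBoost mentioned above, and the synthetic data, the helper names and `k=5` are assumptions made for the illustration.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a predict function."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbours regressor on one feature; returns a predict function."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


rng = random.Random(42)
# Synthetic target that depends on two explanatory variables (assumed setup)
x1 = [rng.uniform(-3, 3) for _ in range(200)]
x2 = [rng.uniform(-3, 3) for _ in range(200)]
y = [a + b * b + rng.gauss(0, 0.3) for a, b in zip(x1, x2)]

train, test = slice(0, 150), slice(150, 200)
linear = fit_line(x1[train], y[train])   # model 1 sees only x1
knn = fit_knn(x2[train], y[train], k=5)  # model 2 sees only x2

pred_linear = [linear(v) for v in x1[test]]
pred_knn = [knn(v) for v in x2[test]]
pred_ens = [(a + b) / 2 for a, b in zip(pred_linear, pred_knn)]

mse_linear = mse(y[test], pred_linear)
mse_knn = mse(y[test], pred_knn)
mse_ens = mse(y[test], pred_ens)
print(f"linear on x1: {mse_linear:.3f}, k-NN on x2: {mse_knn:.3f}, ensemble: {mse_ens:.3f}")
```

Because squared error is convex, the averaged prediction's MSE can never exceed the average of the two individual MSEs. Whether it also beats the better of the two models depends on how the errors are correlated.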

Comparison with portfolio diversification

EL works in much the same way as diversification in portfolio theory, but with an even better outcome for us.

When you diversify, you try to reduce the variance of your returns by investing in uncorrelated stocks. A well-diversified portfolio will perform better than the worst individual stock, but never better than the best one.

As Warren Buffett put it:

"Diversification is protection against ignorance. It [diversification] makes little sense if you know what you are doing."

In machine learning, EL helps reduce the variance of your model, but it can also give you a model whose overall performance is better than that of the best individual model.

Summing up

Combining several models into one is a relatively simple technique that can help resolve the bias-variance problem and improve performance.

If you have two or more models that perform well, do not choose between them: use them all (but with caution)!

Want to grow in this area? Sign up for the free demo lesson "Deploying ML models to production using online recommendations as an example" and join the online meetup with Andrey Kuznetsov, Machine Learning Engineer at Mail.ru Group.

Source: www.habr.com
