Your first step in Data Science. Titanic

A short introduction

I believe we could all do more if we were given step-by-step instructions that tell us what to do and how to do it. I remember moments in my own life when I could not start something simply because it was hard to understand where to begin. Perhaps you once saw the words "Data Science" on the Internet and decided that you were a long way from it, and that the people who do it are somewhere out there, in another world. They are not: they are right here. And it may well be thanks to people from this field that this article showed up in your feed. There are plenty of courses that will help you get used to this craft; here I will help you take the first step.

Well, are you ready? I will say right away that you need to know Python 3, since that is what I use here. I also advise you to install Jupyter Notebook in advance, or to look at how to use Google Colab.

Step one

Kaggle is your key helper here. In principle you can do without it, but I will talk about that in another article. Kaggle is a platform that hosts Data Science competitions. In every competition, especially in the early stages, you gain an incredible amount of experience in solving problems of different kinds, in development, and in working in a team, which matters a lot these days.

We will take our task from there. It is called "Titanic". The statement is: predict whether each individual passenger survives. Generally speaking, the job of someone doing DS is to collect data, process it, train a model, make a prediction, and so on. On Kaggle we are allowed to skip the data collection stage: the data is provided on the platform. We only need to download it and we can get started!

You can do it like this:

The competition's Data tab contains the files that hold the data.

We have downloaded the data, prepared our Jupyter notebooks and...

Step two

How do we load this data now?

First, let's import the libraries we need:

import pandas as pd
import numpy as np

Pandas lets us load .csv files for further processing.

NumPy is needed to represent our data table as a matrix of numbers.

Moving on. Let's take the train.csv file and load it:

dataset = pd.read_csv('train.csv')

We will refer to our train.csv sample through the dataset variable. Let's see what is in there:

dataset.head()

The head() method lets us look at the first few rows of a dataframe.
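Besides head(), a couple of other pandas calls help to get a feel for a table. The sketch below uses a tiny hand-made frame: the column names match a few Titanic columns, but the rows are invented for illustration and are not taken from train.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv: invented rows, real column names.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
})

first_rows = toy.head(2)               # only the first two rows
n_rows, n_cols = toy.shape             # size of the table
missing_per_column = toy.isna().sum()  # number of NaN values in each column

print(first_rows)
print(n_rows, n_cols)
print(missing_per_column)
```

On the real data, the same calls on dataset would show which columns have gaps before any processing.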

The Survived column is exactly our outcome, and in this dataframe it is known. The task asks us to predict the Survived column for the data in test.csv. That file holds information about other Titanic passengers, for whom we, the ones solving the task, do not know the outcome.

So, let's split our table into dependent and independent data. It is all quite simple. Dependent data is the data that depends on the independent data, that is, the outcomes. Independent data is the data that influences the outcome.

For example, suppose we have this data set:

"Did Vova study computer science? No.
Vova got a 2 (a failing grade) in computer science."

The grade in computer science depends on the answer to the question: did Vova study computer science? Clear? Let's move on, we are already closer to the goal!

The traditional variable name for the independent data is X. For the dependent data it is y.

We do the following:

X = dataset.iloc[ : , 2 : ]
y = dataset.iloc[ : , 1 : 2 ]

What is this? With iloc[:, 2:] we tell Python: I want the variable X to hold the data starting from column 2 (inclusive, with counting starting from zero). In the second line we say that we want y to hold the data from column 1, which is the Survived column.

[a:b, c:d] is the general form of what we put in the brackets. If you leave out any of these values, the default is used. For example, we can write [:, :d] and get all the columns of the dataframe except those from index d onward. The values a and b select rows, but we need all of them, so we leave that part at its default.
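To make the slicing rules concrete, here is a small sketch on an invented four-column table (the values are made up; only the column order mimics the Titanic data).

```python
import pandas as pd

# Toy table: id first, target second, features after that.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, 26.0],
})

X_toy = toy.iloc[:, 2:]      # all rows, columns from position 2 to the end
y_toy = toy.iloc[:, 1:2]     # all rows, column 1 only (still a one-column DataFrame)
head_cols = toy.iloc[:, :2]  # all rows, columns 0 and 1 (position 2 is excluded)

print(list(X_toy.columns))
print(list(y_toy.columns))
print(list(head_cols.columns))
```

Note that a slice such as 1:2 keeps the result two-dimensional, which is why y ends up as a table with a single column rather than a plain list of values.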

Let's see what we got:

X.head()

y.head()

To keep this small lesson simple, we will drop the columns that either need special handling or do not affect survival at all. They contain data of type str.

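# Text columns that we leave out of this simple model
cols_to_drop = ['Name', 'Ticket', 'Cabin', 'Embarked']
X = X.drop(columns=cols_to_drop)  # returns a new table, so no in-place edit of a slice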

Great! Let's move on to the next step.

Step three

Here we need to encode our data so that the machine can better understand how this data affects the result. We will not encode everything, only the str data that we kept: the "Sex" column. How do we want to encode it? Let's represent a person's sex as a two-element vector: 01 for male, 10 for female (the encoder used here sorts the categories alphabetically, so the female position comes first).

First, let's convert our tables into NumPy arrays:

X = np.array(X)
y = np.array(y)

And now watch:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))

The sklearn library is a very handy library that lets us do a full cycle of Data Science work. It contains a large number of interesting machine learning models and also lets us prepare the data.

OneHotEncoder lets us encode a person's sex in the representation we described. Two columns are created: female and male. If the person is male, a 1 is written in the "male" column and a 0 in the "female" column.

After OneHotEncoder() comes [1], which means that we want to encode column number 1 (counting from zero).
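Here is a minimal sketch of the same encoding on three invented rows, so the resulting columns are easy to read. The toy array and the names toy, ct_toy and encoded are my own and are not part of the article's notebook.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy rows in the order [Pclass, Sex, Age]; the values are invented.
toy = np.array([[3, "male", 22.0],
                [1, "female", 38.0],
                [3, "female", 26.0]], dtype=object)

ct_toy = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],  # encode column 1 only
    remainder="passthrough",                            # keep the other columns as they are
)
encoded = np.array(ct_toy.fit_transform(toy))

# The encoder sorts the categories, so the new columns are "female", then "male".
categories = list(ct_toy.named_transformers_["encoder"].categories_[0])
print(categories)
print(encoded)
```

The encoded columns are placed first and the untouched columns follow, which is why Pclass and Age end up at positions 2 and 3 of the result.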

Super. Let's move on!

As a rule, some of the data turns out to be missing (that is, NaN, "not a number"). For example, there is information about a person: their name and sex, but nothing about their age. In that case we will use the following method: compute the arithmetic mean of each column, and if some values in a column are missing, fill the gaps with that column's mean.

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X)
X = imputer.transform(X)
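What mean imputation does is easiest to see on a tiny matrix. The numbers below are invented; only the SimpleImputer call mirrors the one above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy matrix with two columns and one missing value in each.
toy = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, np.nan]])

imputer_toy = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer_toy.fit_transform(toy)

# Each gap is filled with the mean of the known values in its own column:
# (1 + 3) / 2 for the first column and (10 + 20) / 2 for the second.
print(imputer_toy.statistics_)
print(filled)
```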

Now let's take into account that the data can be on very different scales. Some values lie in the interval [0, 1], while others run into the hundreds and thousands. To remove this spread and help the model compute more accurately, we will scale the data. After standardization each scaled column has zero mean and unit variance, so most values end up within roughly three units of zero (outliers can still fall outside that range). For this we will use StandardScaler.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X[:, 2:] = sc.fit_transform(X[:, 2:])
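A small sketch of what StandardScaler does to a single column. The column is invented: fifteen small values and one large outlier, chosen to show that the scaled column has zero mean and unit standard deviation while an outlier can still land beyond 3.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented column: the values 1..15 plus one outlier of 500.
toy = np.append(np.arange(1.0, 16.0), 500.0).reshape(-1, 1)

scaler_toy = StandardScaler()
scaled = scaler_toy.fit_transform(toy)   # (x - column mean) / column standard deviation

print(scaled.mean(), scaled.std())       # close to 0 and 1
print(scaled.max())                      # the outlier stays well above the rest
```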

Our data now consists only of numbers on comparable scales.

Great. We are already close to our goal!

Step four

Let's train our first model! The sklearn library offers a huge number of interesting things. For this problem I used the Gradient Boosting Classifier model. We use a classifier because our task is a classification task: each prediction must be assigned to 1 (survived) or 0 (did not survive).

from sklearn.ensemble import GradientBoostingClassifier
gbc = GradientBoostingClassifier(learning_rate=0.5, max_depth=5, n_estimators=150)
# y is an (n, 1) column; np.ravel flattens it to the 1-D shape that fit() expects
gbc.fit(X, np.ravel(y))

The fit method tells Python: let the model look for dependencies between X and y.

Less than a second, and the model is ready.
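Before submitting anything, one common way to get a rough idea of the model's quality is to hold back part of the labelled rows and score the model on them. The sketch below does this on synthetic data: the features, the labelling rule and the names X_toy, y_toy and model are invented for illustration, and only the classifier settings are the ones used above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared matrix: 200 rows, 3 numeric features.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = (X_toy[:, 0] > 0).astype(int)   # invented rule: class 1 when feature 0 is positive

# Hold back a quarter of the rows to check the model on data it has not seen.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0)

model = GradientBoostingClassifier(
    learning_rate=0.5, max_depth=5, n_estimators=150, random_state=0)
model.fit(X_fit, y_fit)                   # learn the relation between features and labels

val_pred = model.predict(X_val)           # one 0/1 label per held-out row
val_accuracy = model.score(X_val, y_val)  # share of correct predictions
print(val_pred[:10], val_accuracy)
```

The accuracy on such an easy synthetic rule says nothing about the Titanic data; the point is only the pattern of fitting on one part and scoring on the other.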

How do we apply it? We will see right now!

Step five. Conclusion

Now we need to load the table with our test data, for which we have to make a prediction. With this table we will go through all the same steps that we did for X.

X_test = pd.read_csv('test.csv', index_col=0)

# Drop the same text columns as for the training data
cols_to_drop = ['Name', 'Ticket', 'Cabin', 'Embarked']
X_test = X_test.drop(columns=cols_to_drop)

X_test = np.array(X_test)

# Reuse the transformers already fitted on the training data (ct, imputer, sc),
# so the test rows are encoded, filled and scaled exactly like X
X_test = np.array(ct.transform(X_test))
X_test = imputer.transform(X_test)
X_test[:, 2:] = sc.transform(X_test[:, 2:])
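A common convention is to learn the preprocessing statistics (means, standard deviations) from the training data only and then reuse them on the test data, so that both tables are transformed in exactly the same way. A minimal sketch on invented numbers, with names of my own choosing:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

train_toy = np.array([[10.0], [20.0], [30.0]])  # invented training column
test_toy = np.array([[np.nan], [40.0]])         # invented test column with a gap

# fit() learns the statistics from the training column only.
imputer_toy = SimpleImputer(strategy="mean").fit(train_toy)  # training mean is 20
scaler_toy = StandardScaler().fit(train_toy)                 # training mean and std

# transform() applies those learned statistics to the test column.
test_filled = imputer_toy.transform(test_toy)    # NaN becomes 20, the training mean
test_scaled = scaler_toy.transform(test_filled)  # scaled with the training mean and std

print(test_filled.ravel())
print(test_scaled.ravel())
```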

Now let's apply our model!

gbc_predict = gbc.predict(X_test)

That's it. We have made a prediction. Now it needs to be written to a csv file and sent to the site.

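# fmt='%d' writes plain 0/1 integers; comments='' keeps the header free of the default '# ' prefix
np.savetxt('my_gbc_predict.csv', gbc_predict, fmt='%d', delimiter=",", header='Survived', comments='')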
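Kaggle's Titanic competition expects a submission with two columns, PassengerId and Survived. A minimal sketch of writing such a file with pandas follows. The ids, the predictions and the file name here are placeholders; in the notebook they would be the PassengerId values of test.csv and the gbc_predict array.

```python
import numpy as np
import pandas as pd

# Placeholder values standing in for the real ids and predictions.
passenger_ids = np.array([892, 893, 894])
predictions = np.array([0, 1, 0])

submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Survived": predictions.astype(int),   # plain 0/1 integers
})
submission.to_csv("my_submission.csv", index=False)  # hypothetical file name, no index column

with open("my_submission.csv") as f:
    print(f.read())
```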

Done. We have a file that contains a prediction for every passenger. All that remains is to upload these results to the site and get a score for the prediction. Such a primitive solution gives not only 74% correct answers on the public leaderboard, but also some impetus into Data Science. The most curious can write to me in private messages at any time and ask a question. Thanks, everyone!

Source: www.habr.com
