Your First Step in Data Science. Titanic

A short introduction

I believe we could do more things if we were given step-by-step instructions telling us what to do and how to do it. I remember moments in my own life when I could not start something simply because it was hard to understand where to begin. Perhaps you once saw the words "Data Science" on the internet and decided that you are far from it, and that the people who do it are somewhere out there, in another world. No, they are right here. And perhaps, thanks to people from this field, an article showed up in your feed. There are many courses that will help you get used to this craft, but here I will help you take the first step.

Well, are you ready? Let me say right away that you will need to know Python 3, because that is what I will use here. I also recommend installing Jupyter Notebook in advance, or looking at how to use Google Colab.

Step one


Kaggle is an important assistant in this matter. In principle you can do without it, but I will talk about that in another article. It is a platform that hosts Data Science competitions. In each such competition, even in the early stages, you gain an incredible amount of experience in solving problems of various kinds, development experience, and experience of working in a team, which matters a great deal these days.

We will take our task from there. It is called "Titanic". The task is this: predict whether each individual passenger will survive. In general, the work of a person involved in DS consists of collecting data, processing it, training a model, making predictions, and so on. On Kaggle we are allowed to skip the data collection stage, because the data is provided on the platform. We only need to download it and we can get started!

You can do this as follows: the Data tab of the competition page contains the data files.


We have downloaded the data, prepared our Jupyter notebooks, and...

Step two

How do we now load this data?

First, let's import the necessary libraries:

import pandas as pd
import numpy as np

Pandas will let us load .csv files for further processing.

NumPy is needed to represent our data table as a matrix of numbers.
Moving on. Let's take the file train.csv and load it:

dataset = pd.read_csv('train.csv')

We will refer to the train.csv data through the dataset variable. Let's see what is in it:

dataset.head()


The head() function lets us look at the first few rows of a dataframe.
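As a small illustration of head(), here is a sketch on a made-up table. The rows and the name toy are invented for the example and are not taken from train.csv.

```python
import pandas as pd

# Hypothetical mini-table; the values are invented for illustration only.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Survived": [0, 1, 1, 1, 0, 0, 0],
    "Sex": ["male", "female", "female", "female", "male", "male", "male"],
})

first_rows = toy.head()   # with no argument, head() returns the first 5 rows
first_two = toy.head(2)   # an explicit row count can be passed instead

print(first_rows)
print(first_two.shape)
```

In a notebook, leaving toy.head() as the last line of a cell displays the same rows as a formatted table.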

The Survived column holds exactly our results, which are known in this dataframe. For the task, we need to predict the Survived column for the test.csv data. That file stores information about other Titanic passengers, for whom we, the people solving the problem, do not know the outcome.

So, let's split our table into dependent and independent data. It is all simple here. Dependent data is the data that depends on the independent data; in our case, the outcomes. Independent data is the data that influences the outcome.

For example, we have the following data set:

"Vova studied computer science: no.
Vova got a 2 in computer science" (a failing grade on the five-point scale).

The grade in computer science depends on the answer to the question: did Vova study computer science? Is that clear? Let's move on, we are already closer to the goal!

The conventional variable name for independent data is X; for dependent data, y.

We do the following:

X = dataset.iloc[ : , 2 : ]
y = dataset.iloc[ : , 1 : 2 ]

What is this? With iloc[:, 2:] we tell Python: I want the variable X to hold the data starting from column 2 (inclusive, and counting from zero). In the second line we say that we want y to hold the data in column 1, which is Survived.

[a:b, c:d] is the general form of what we write in the brackets. If you leave any of these values out, the default is used. For instance, we can write [:, :d] and get all the columns of the dataframe except those from position d onward. The values a and b select rows, but we need all of the rows, so we leave them at their defaults.
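The slicing rules above can be checked on a tiny table. This is only a sketch: the table toy and its values are made up, and it merely mimics the leading column order of train.csv.

```python
import pandas as pd

# Hypothetical table that mimics the first columns of train.csv.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, 26.0],
})

X_toy = toy.iloc[:, 2:]    # all rows, columns from position 2 to the end
y_toy = toy.iloc[:, 1:2]   # all rows, position 1 only (the stop index 2 is excluded)
up_to = toy.iloc[:, :2]    # all rows, columns before position 2

print(list(X_toy.columns))
print(list(y_toy.columns))
print(list(up_to.columns))
```

Note that 1:2 keeps y as a one-column table, while a plain 1 would return a one-dimensional Series.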

Let's see what we got:

X.head()


y.head()


To keep this small lesson simple, we will remove the columns that require special care or do not affect survival at all. They contain data of type str.

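# Columns left out of this first model (free text, or in need of extra handling)
count = ['Name', 'Ticket', 'Cabin', 'Embarked']
X = X.drop(columns=count)  # returns a new frame instead of modifying a slice of dataset in place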
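A minimal sketch of dropping columns, on invented data. The names toy, to_drop and trimmed are hypothetical; the point is only that drop() returns a new table unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical table; names and ticket codes are invented.
toy = pd.DataFrame({
    "Pclass": [3, 1],
    "Name": ["A. Example", "B. Example"],
    "Sex": ["male", "female"],
    "Ticket": ["T1", "T2"],
})

to_drop = ["Name", "Ticket"]
trimmed = toy.drop(columns=to_drop)   # new frame; toy itself keeps all four columns

print(list(trimmed.columns))
print(list(toy.columns))
```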

Super! Let's move on to the next step.

Step three

Here we need to encode our data so that the machine better understands how it affects the outcome. We will not encode everything, only the str data that we kept: the "Sex" column. How do we want to encode it? Let's represent a person's sex as a two-element vector with a single 1 in it, for example 10 for male and 01 for female.

First, let's convert our tables into NumPy arrays:

X = np.array(X)
y = np.array(y).ravel()  # flatten the (n, 1) column into the 1-D label vector scikit-learn expects

And now, look:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))

sklearn (scikit-learn) is a cool library that lets us do the full range of Data Science work. It contains a large number of interesting machine learning models and also lets us prepare data.

OneHotEncoder lets us encode a person's sex in the representation we described. Two classes are created: female and male. OneHotEncoder orders the categories alphabetically, so the first new column is "female" and the second is "male": a man gets 0 in the "female" column and 1 in the "male" column, and a woman the other way round. The column order is therefore the reverse of the 10/01 illustration above.

After OneHotEncoder() comes [1]: this means that we want to encode column number 1 (counting from zero).
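To see what the encoder produces, here is a sketch on three invented rows laid out as [Pclass, Sex, Age]. The names toy and ct_toy are hypothetical; the calls are the same ones used in the article's code.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows in the order [Pclass, Sex, Age]; the values are invented.
toy = np.array([[3, "male", 22.0],
                [1, "female", 38.0],
                [3, "female", 26.0]], dtype=object)

ct_toy = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],  # encode column 1 only
    remainder="passthrough",                           # keep the other columns as they are
)
encoded = np.array(ct_toy.fit_transform(toy))

# Categories found in column 1, in the order of the new columns.
categories = ct_toy.named_transformers_["encoder"].categories_[0]

print(categories)
print(encoded)
```

The encoded columns are placed first and the untouched columns follow, so each row becomes [female, male, Pclass, Age].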

Done. Let's move even further!

As a rule, some of the data turns out to be missing (that is, NaN, "not a number"). For example, there is information about a person: their name and sex. But there is no information about their age. In this case we will use the following approach: for each column we compute the arithmetic mean, and if a value is missing in that column, we fill the gap with that mean.

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X)
X = imputer.transform(X)
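A small sketch of mean imputation on invented numbers, assuming two hypothetical columns [Age, Fare] with one missing age. The name imputer_toy is made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical columns [Age, Fare]; np.nan marks the missing age.
toy = np.array([[20.0, 7.0],
                [np.nan, 8.0],
                [40.0, 9.0]])

imputer_toy = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer_toy.fit_transform(toy)

print(imputer_toy.statistics_)  # per-column means computed from the values that are present
print(filled)                   # the gap in the first column is replaced by that column's mean
```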

Now let's take into account that values can differ greatly in scale. Some data lies in the interval [0, 1], while some can run into the hundreds and thousands. To eliminate this spread and make the computer's calculations more accurate, we will scale the data. For this we use StandardScaler, which rescales each column to zero mean and unit variance, so that most values end up within about three units of zero.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X[:, 2:] = sc.fit_transform(X[:, 2:])
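The effect of scaling can be sketched on two invented columns of very different magnitude. The names toy and sc_toy are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns on very different scales.
toy = np.array([[1.0, 100.0],
                [2.0, 200.0],
                [3.0, 300.0]])

sc_toy = StandardScaler()
scaled = sc_toy.fit_transform(toy)   # subtract each column's mean, divide by its standard deviation

print(scaled.mean(axis=0))  # approximately 0 for every column
print(scaled.std(axis=0))   # approximately 1 for every column
```

After scaling, both columns hold the same values, because they differed only by a constant factor.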

Our data is now fully numeric and scaled.

Great. We are already close to our goal!

Step four

Let's build our first model! The sklearn library offers a huge number of interesting things. I applied the Gradient Boosting Classifier model to this problem. We use a classifier because ours is a classification task: each prediction must be assigned to 1 (survived) or 0 (did not survive).

from sklearn.ensemble import GradientBoostingClassifier
gbc = GradientBoostingClassifier(learning_rate=0.5, max_depth=5, n_estimators=150)
gbc.fit(X, y)

The fit function tells Python: let the model look for dependencies between X and y.

Less than a second, and the model is ready.
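Before submitting anything, it can help to estimate accuracy locally by holding back part of the labelled data. This is not part of the original walkthrough; it is a hedged sketch on synthetic data (X_demo and y_demo are generated here and are not the Titanic data), reusing the same classifier settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for the example, not the Titanic table).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Keep 25% of the labelled rows aside for validation.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

model = GradientBoostingClassifier(
    learning_rate=0.5, max_depth=5, n_estimators=150, random_state=0
)
model.fit(X_fit, y_fit)

val_accuracy = model.score(X_val, y_val)  # share of correct predictions on the held-back rows
print(val_accuracy)
```

With the real data, the same split could be applied to X and y before calling fit.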


How do we apply it? We'll see now!

Step five. Conclusion

Now we need to load the table with our test data, for which we have to make a prediction. With this table we will perform all the same steps that we did for X.

X_test = pd.read_csv('test.csv', index_col=0)

count = ['Name', 'Ticket', 'Cabin', 'Embarked']
X_test.drop(count, inplace=True, axis=1)

X_test = np.array(X_test)

# Reuse the transformers already fitted on the training data (ct, imputer, sc),
# so that the test set is encoded, filled, and scaled exactly like X.
X_test = np.array(ct.transform(X_test))

X_test = imputer.transform(X_test)

X_test[:, 2:] = sc.transform(X_test[:, 2:])

Let's apply our model now!

gbc_predict = gbc.predict(X_test)

That's it. We have made a prediction. Now it needs to be written to a csv file and submitted on the site.

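# fmt='%d' writes integer labels; comments='' keeps the header free of the default '# ' prefix
np.savetxt('my_gbc_predict.csv', gbc_predict, fmt='%d', delimiter=",", header='Survived', comments='')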
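Kaggle's Titanic competition expects a submission file with two columns, PassengerId and Survived. The sketch below shows one way to build such a file with pandas. The ids and predictions are invented stand-ins for the real test.csv ids and gbc_predict, and the file is written to an in-memory buffer instead of disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins: in practice the ids come from test.csv
# and the predictions from the trained model.
passenger_ids = pd.Series([892, 893, 894], name="PassengerId")
predictions = np.array([0, 1, 0])

submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Survived": predictions.astype(int),
})

buffer = io.StringIO()                   # stands in for a file such as 'submission.csv'
submission.to_csv(buffer, index=False)   # index=False keeps the row index out of the file
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name instead of the buffer writes the same content to disk.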

Done. We now have a file with a prediction for every passenger. All that remains is to upload this solution to the site and get a score for the predictions. Such a primitive solution gives not only 74% correct answers on the public leaderboard, but also some momentum in Data Science. The most curious can write to me in private messages at any time and ask a question. Thanks to everyone!

Source: www.habr.com
