Speeding up exploratory data analysis with the pandas-profiling library

The first step when you start working with a new dataset is to understand it. To do that you need, for example, to find out the ranges of values the variables take, their types, and the number of missing values.

The pandas library gives us many useful tools for exploratory data analysis (EDA). Before reaching for them, though, you usually start with more general functions such as df.describe(). The capabilities of such functions are limited, however, and the first stages of EDA look very much alike for any dataset.

The author of the material we are publishing today says he is not fond of repetitive work. So, while looking for tools that make exploratory data analysis fast and efficient, he found the pandas-profiling library. Its output is not a handful of individual indicators but a fairly detailed HTML report containing most of what you may need to know about the data before you start working with it more closely.

Here we look at how the pandas-profiling library is used, taking the Titanic dataset as an example.

Exploratory data analysis with pandas

I decided to try pandas-profiling on the Titanic dataset because of the different data types it contains and the presence of missing values in it. I believe pandas-profiling is especially interesting when the data has not yet been cleaned and needs further processing that depends on its characteristics. To carry out that processing successfully, you need to know where to start and what to pay attention to. This is where pandas-profiling comes in handy.

First, we import the data and use pandas to get descriptive statistics:

# import the required packages
import pandas as pd
import pandas_profiling
import numpy as np

# load the data
df = pd.read_csv('/Users/lukas/Downloads/titanic/train.csv')

# compute descriptive statistics
df.describe()

Running this snippet gives you what is shown in the following figure.

Descriptive statistics obtained with standard pandas tools

Although there is a lot of useful information here, it does not include everything that would be interesting to know about the data. For example, you might guess that the DataFrame has 891 rows. To check that, you need another line of code that determines the size of the frame. These computations are not particularly resource-intensive, but repeating them all the time inevitably wastes time that would probably be better spent cleaning the data.
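To make the "extra lines" concrete, here is a minimal sketch of the checks that df.describe() leaves out. The small frame below is a made-up stand-in for the Titanic data (the values are illustrative only), so the snippet runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic frame; the values are made up.
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0],
    "Sex": ["male", "female", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86],
})

n_rows, n_cols = df.shape          # size of the frame
dtypes = df.dtypes                 # type of each variable
missing_count = df.isna().sum()    # number of missing values per column
missing_share = df.isna().mean()   # fraction of missing values per column

print(n_rows, n_cols)
print(dtypes)
print(missing_count)
```

None of these calls is expensive, but they have to be retyped for every new dataset, which is exactly the repetition the article is complaining about.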

Exploratory data analysis with pandas-profiling

Now let's do the same thing with pandas-profiling:

pandas_profiling.ProfileReport(df)

Running the line of code above generates a report with exploratory data analysis indicators. The code shown above displays the results inline, but you can also have it write an HTML file that you can, for example, show to someone else.
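Writing the report to a file could look roughly like the sketch below. It assumes the installed pandas-profiling release exposes ProfileReport.to_file; the path is passed positionally because the keyword name has differed between releases. The helper name save_profile_report and the output file name are hypothetical.

```python
import pandas as pd


def save_profile_report(df, path="titanic_report.html"):
    """Write a profiling report for `df` to `path`.

    Returns True if the report was written and False if pandas-profiling
    is not installed. The helper itself is a hypothetical convenience
    wrapper, not part of the library.
    """
    try:
        from pandas_profiling import ProfileReport
    except ImportError:
        return False
    # to_file() renders the report as a standalone HTML document.
    ProfileReport(df).to_file(path)
    return True
```

With the frame from the earlier snippet this would be called as save_profile_report(df), and the resulting HTML file can be opened in any browser or sent to a colleague.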

The first part of the report contains an Overview section with basic information about the data (number of observations, number of variables, and so on). It also contains a list of warnings that tell the analyst what deserves special attention. These warnings can hint at where to focus your data-cleaning efforts.

Overview section of the report

Exploratory analysis of variables

Below the Overview section, the report gives useful information about each variable. Among other things, this includes small charts that describe the distribution of each variable.

About the numeric variable Age

As the previous example shows, pandas-profiling gives us several useful indicators, such as the percentage and number of missing values, along with the descriptive statistics we have already seen. Because Age is a numeric variable, visualising its distribution as a histogram lets us conclude that the distribution is skewed to the right.
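The same indicators can be reproduced by hand. The sketch below uses a short made-up series of ages (not the real Titanic values) with a long tail towards older passengers; a positive skewness, together with a mean above the median, is what "skewed to the right" means numerically.

```python
import numpy as np
import pandas as pd

# Made-up ages with a long right tail and two missing entries.
age = pd.Series(
    [2, 18, 19, 21, 22, 22, 24, 25, 26, 28, 30, 45, 62, 80, np.nan, np.nan],
    name="Age",
)

n_missing = int(age.isna().sum())   # number of missing values
p_missing = n_missing / len(age)    # share of missing values
skewness = age.skew()               # NaN values are skipped by default

print(f"missing: {n_missing} ({p_missing:.1%})")
print(f"mean={age.mean():.1f}, median={age.median():.1f}, skewness={skewness:.2f}")
```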

For a categorical variable, the output differs slightly from what is produced for a numeric variable.

About the categorical variable Sex

That is, instead of the mean, minimum and maximum, pandas-profiling found the number of classes. Because Sex is a binary variable, its values are represented by two classes.
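For comparison, a plain-pandas sketch of the same categorical summary might look like this. The series is a made-up sample, not the real column.

```python
import pandas as pd

# Made-up sample of a binary categorical variable.
sex = pd.Series(
    ["male", "female", "female", "male", "male", "female", "male"],
    name="Sex",
)

n_classes = sex.nunique()                   # number of distinct classes
counts = sex.value_counts()                 # observations per class
shares = sex.value_counts(normalize=True)   # share of each class

print(n_classes)
print(counts)
print(shares)
```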

If you like digging into code as much as I do, you may be interested in how exactly pandas-profiling computes these metrics. Since the library's code is open and available on GitHub, finding out is not hard. I am not a big fan of using black boxes in my projects, so I had a look at the library's source code. For example, this is what the mechanism for processing numeric variables looks like, implemented by the function describe_numeric_1d:

def describe_numeric_1d(series, **kwargs):
    """Compute summary statistics of a numerical (`TYPE_NUM`) variable (a Series).
    Also create histograms (mini an full) of its distribution.
    Parameters
    ----------
    series : Series
        The variable to describe.
    Returns
    -------
    Series
        The description of the variable as a Series with index being stats keys.
    """
    # Format a number as a percentage. For example 0.25 will be turned to 25%.
    _percentile_format = "{:.0%}"
    stats = dict()
    stats['type'] = base.TYPE_NUM
    stats['mean'] = series.mean()
    stats['std'] = series.std()
    stats['variance'] = series.var()
    stats['min'] = series.min()
    stats['max'] = series.max()
    stats['range'] = stats['max'] - stats['min']
    # To avoid to compute it several times
    _series_no_na = series.dropna()
    for percentile in np.array([0.05, 0.25, 0.5, 0.75, 0.95]):
        # The dropna() is a workaround for https://github.com/pydata/pandas/issues/13098
        stats[_percentile_format.format(percentile)] = _series_no_na.quantile(percentile)
    stats['iqr'] = stats['75%'] - stats['25%']
    stats['kurtosis'] = series.kurt()
    stats['skewness'] = series.skew()
    stats['sum'] = series.sum()
    stats['mad'] = series.mad()
    stats['cv'] = stats['std'] / stats['mean'] if stats['mean'] else np.NaN
    stats['n_zeros'] = (len(series) - np.count_nonzero(series))
    stats['p_zeros'] = stats['n_zeros'] * 1.0 / len(series)
    # Histograms
    stats['histogram'] = histogram(series, **kwargs)
    stats['mini_histogram'] = mini_histogram(series, **kwargs)
    return pd.Series(stats, name=series.name)

Although this piece of code may look fairly large and complicated, it is actually very easy to understand. The library's source code has a function that determines the types of variables. If the library comes across a numeric variable, the function above computes the metrics we have been looking at. It uses standard pandas operations for Series objects, such as series.mean(). The results are stored in the stats dictionary. Histograms are produced with an adapted version of matplotlib.pyplot.hist; the adaptation is there so that the function can work with different kinds of datasets.
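The dispatch-by-type idea can be illustrated with a much smaller sketch. This is not the library's code: the function names describe_numeric, describe_categorical and describe_1d, the "NUM"/"CAT" labels and the toy data are assumptions made for the example, and only a few of the statistics are reproduced.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def describe_numeric(series):
    """Summary statistics for a numeric column (simplified)."""
    clean = series.dropna()
    return {
        "type": "NUM",
        "mean": clean.mean(),
        "std": clean.std(),
        "min": clean.min(),
        "max": clean.max(),
        "range": clean.max() - clean.min(),
        "iqr": clean.quantile(0.75) - clean.quantile(0.25),
        # Mean absolute deviation computed by hand, because Series.mad()
        # is not available in recent pandas releases.
        "mad": (clean - clean.mean()).abs().mean(),
    }


def describe_categorical(series):
    """Summary statistics for a non-numeric column (simplified)."""
    return {
        "type": "CAT",
        "distinct_count": series.nunique(),
        "top": series.mode().iloc[0],  # most frequent value
    }


def describe_1d(series):
    """Pick the summary routine from the dtype of the column."""
    if is_numeric_dtype(series):
        stats = describe_numeric(series)
    else:
        stats = describe_categorical(series)
    return pd.Series(stats, name=series.name)


# Toy data, made up for the example.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Sex": ["male", "female", "female", "female"],
})
age_stats = describe_1d(df["Age"])
sex_stats = describe_1d(df["Sex"])
print(age_stats)
print(sex_stats)
```

The design mirrors what the paragraph describes: one function decides the variable type, and a type-specific function fills a dictionary that is finally wrapped in a Series.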

Correlations and sample data

After the results of the variable analysis, pandas-profiling shows the Pearson and Spearman correlation matrices in the Correlations section.

Pearson correlation matrix

If needed, you can set the thresholds used when computing correlations in the line of code that starts report generation. This lets you specify what strength of correlation counts as important for your analysis.
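To show what the two matrices and a threshold amount to, here is a plain-pandas sketch rather than the report's own option. The frame is made up, and the 0.9 cut-off is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Made-up rows loosely shaped like the Titanic columns.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 13.00],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 27.0],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

# Correlations are only defined for the numeric columns.
numeric = df.select_dtypes(include="number")
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

threshold = 0.9  # hypothetical cut-off for a "strong" correlation
strong_pairs = [
    (a, b, round(float(pearson.loc[a, b]), 2))
    for i, a in enumerate(pearson.columns)
    for b in pearson.columns[i + 1:]
    if abs(pearson.loc[a, b]) >= threshold
]

print(pearson)
print(spearman)
print(strong_pairs)
```

Raising or lowering the threshold changes which pairs of variables are flagged, which is the effect the report setting is meant to control.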

Finally, in the Sample section, the pandas-profiling report shows, as an example, a piece of data taken from the beginning of the dataset. This approach can lead to unpleasant surprises, because the first few observations may form a sample that does not reflect the characteristics of the dataset as a whole.

Section with a sample of the data under study

Because of this, I do not recommend paying attention to this last section. It is better to use df.sample(5), which picks 5 observations from the dataset at random.
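The difference is easy to see on a frame that happens to be sorted. In the made-up example below, the first rows almost all belong to one class, while a random sample draws from the whole frame. The random_state argument is only there to make the sketch reproducible; the article's df.sample(5) works without it.

```python
import pandas as pd

# Hypothetical frame sorted by class, so the first rows all look alike.
df = pd.DataFrame({
    "PassengerId": range(1, 13),
    "Pclass": [1] * 4 + [2] * 4 + [3] * 4,
})

first_rows = df.head(5)                       # what a "first rows" preview shows
random_rows = df.sample(5, random_state=42)   # 5 rows drawn at random

print(first_rows)
print(random_rows)
```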

Summary

To sum up, pandas-profiling gives the analyst some useful capabilities that come in handy when you need to get a rough idea of the data quickly or hand an exploratory analysis report over to someone. The real work with the data, which takes its particular characteristics into account, is still done by hand, just as it is without pandas-profiling.

If you want to see what a complete exploratory data analysis looks like in a single Jupyter notebook, take a look at this project of mine, rendered with nbviewer. The corresponding code can be found in this GitHub repository.

Dear readers! Where do you start when analysing a new dataset?

Source: www.habr.com
