Speeding up exploratory data analysis with the pandas-profiling library

The first step when you start working with a new dataset is to understand it. To do that you need, for example, to find out the ranges of values the variables take, their types, and the number of missing values.

The pandas library gives us many useful tools for exploratory data analysis (EDA). Before using them, though, you usually start with general-purpose functions such as df.describe(). Keep in mind that what these functions provide is limited, and that the first steps of EDA look very much alike from one dataset to the next.


The author of the article we are publishing today says he is no fan of repetitive work. While looking for tools that make exploratory data analysis quick and efficient, he came across the pandas-profiling library. Its output is not a handful of separate indicators but a fairly detailed HTML report containing most of what you need to know about the data before you start working with it closely.

Here we look at how the pandas-profiling library is used, with the Titanic dataset as the example.

Exploratory data analysis with pandas

I decided to try pandas-profiling on the Titanic dataset because it contains several different data types and has missing values. I think the library is most interesting when the data has not been cleaned yet and still needs processing that depends on its particular features. To do that processing well, you need to know where to start and what to pay attention to. This is where pandas-profiling comes in handy.

First, we load the data and use pandas to get descriptive statistics:

# import the required packages
import pandas as pd
import pandas_profiling
import numpy as np

# load the data
df = pd.read_csv('/Users/lukas/Downloads/titanic/train.csv')

# compute descriptive statistics
df.describe()

Running this code produces the output shown in the following figure.

Descriptive statistics obtained with standard pandas tools

There is a lot of useful information here, but not everything you would want to know about the data. For example, you might assume that the DataFrame has 891 rows. Checking that takes one more line of code to get the size of the frame. These calculations are not expensive, but repeating them every time uses up time that would be better spent cleaning the data.
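The extra checks mentioned above can be sketched in a few lines. This is a minimal illustration on a tiny invented frame: the column names mirror the Titanic data, but the values are made up for the example and are not taken from train.csv.

```python
import numpy as np
import pandas as pd

# Tiny invented frame; the values are illustrative only.
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

n_rows, n_cols = df.shape          # number of observations and variables
missing_counts = df.isna().sum()   # missing values per column
missing_share = df.isna().mean()   # fraction of missing values per column

print(n_rows, n_cols)
print(df.dtypes)                   # type of each variable
print(missing_counts)
```

On the real dataset the same three calls would apply unchanged to the frame loaded with pd.read_csv.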

Exploratory data analysis with pandas-profiling

Now let's do the same thing with pandas-profiling:

pandas_profiling.ProfileReport(df)

Running the line of code above generates a report with the exploratory data analysis indicators. As written, the code displays the report inline, but you can also produce an HTML file that you can, for example, show to someone else.
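Saving the report to a file could look roughly like the sketch below. The helper name export_profile and the output file name are hypothetical. The sketch assumes an installed release of pandas-profiling whose ProfileReport object has a to_file method; the path is passed positionally because the keyword name has not been the same in every release.

```python
import pandas as pd


def export_profile(df: pd.DataFrame, output_path: str) -> None:
    """Write a pandas-profiling report for `df` to an HTML file (hypothetical helper)."""
    # Imported lazily so the helper can be defined even if the package is missing.
    import pandas_profiling

    profile = pandas_profiling.ProfileReport(df)
    # Positional argument: the keyword name differs between releases.
    profile.to_file(output_path)


# Usage, assuming `df` is the frame loaded earlier:
# export_profile(df, "titanic_report.html")
```

The resulting HTML file can be opened in a browser or sent to a colleague without them having to run any code.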

The first part of the report is the Overview section, which gives basic information about the data (number of observations, number of variables, and so on). It also contains a list of warnings that tell the analyst what deserves special attention. These warnings can hint at where to focus your data cleaning efforts.

The Overview section of the report

Exploring the variables

Below the Overview section, the report gives useful information about each variable. Among other things, it includes small charts showing the distribution of each variable.

Information about the numeric variable Age

As the previous example shows, pandas-profiling gives us several useful indicators, such as the percentage and the number of missing values, along with the descriptive statistics we have already seen. Because Age is a numeric variable, its distribution is drawn as a histogram, from which we can conclude that the distribution is skewed to the right.
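The right skew read off the histogram can also be checked numerically. The sketch below uses invented ages with a long tail of older passengers, not the real Titanic column; a positive sample skewness and a mean above the median both point to a right-skewed distribution.

```python
import pandas as pd

# Invented ages with a long right tail; not the real Titanic data.
age = pd.Series(
    [18, 19, 20, 21, 22, 22, 23, 24, 25, 28, 35, 48, 62, 71],
    dtype=float,
    name="Age",
)

skewness = age.skew()                          # > 0 means a longer right tail
mean_above_median = age.mean() > age.median()  # typical for right-skewed data

print(round(skewness, 2), mean_above_median)
```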

For a categorical variable, the output differs slightly from what is shown for a numeric one.

Information about the categorical variable Sex

Namely, instead of the mean, minimum, and maximum, pandas-profiling reports the number of classes. Because Sex is a binary variable, its values fall into two classes.
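A rough equivalent of these categorical indicators in plain pandas might look like the following. The values are invented for illustration, and this is only a sketch of the idea, not the library's own implementation.

```python
import pandas as pd

# Invented values for illustration only.
sex = pd.Series(["male", "female", "female", "male", "male"], name="Sex")

n_classes = sex.nunique()                  # number of distinct classes
counts = sex.value_counts()                # observations per class
shares = sex.value_counts(normalize=True)  # share of each class

print(n_classes)
print(counts)
```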

If, like me, you enjoy reading code, you may be curious how exactly pandas-profiling computes these metrics. Since the library is open source and available on GitHub, finding out is not hard. I am not a big fan of black boxes in my projects, so I took a look at the library's source code. For example, this is how numeric variables are processed, in the function describe_numeric_1d:

def describe_numeric_1d(series, **kwargs):
    """Compute summary statistics of a numerical (`TYPE_NUM`) variable (a Series).
    Also create histograms (mini an full) of its distribution.
    Parameters
    ----------
    series : Series
        The variable to describe.
    Returns
    -------
    Series
        The description of the variable as a Series with index being stats keys.
    """
    # Format a number as a percentage. For example 0.25 will be turned to 25%.
    _percentile_format = "{:.0%}"
    stats = dict()
    stats['type'] = base.TYPE_NUM
    stats['mean'] = series.mean()
    stats['std'] = series.std()
    stats['variance'] = series.var()
    stats['min'] = series.min()
    stats['max'] = series.max()
    stats['range'] = stats['max'] - stats['min']
    # To avoid to compute it several times
    _series_no_na = series.dropna()
    for percentile in np.array([0.05, 0.25, 0.5, 0.75, 0.95]):
        # The dropna() is a workaround for https://github.com/pydata/pandas/issues/13098
        stats[_percentile_format.format(percentile)] = _series_no_na.quantile(percentile)
    stats['iqr'] = stats['75%'] - stats['25%']
    stats['kurtosis'] = series.kurt()
    stats['skewness'] = series.skew()
    stats['sum'] = series.sum()
    stats['mad'] = series.mad()
    stats['cv'] = stats['std'] / stats['mean'] if stats['mean'] else np.NaN
    stats['n_zeros'] = (len(series) - np.count_nonzero(series))
    stats['p_zeros'] = stats['n_zeros'] * 1.0 / len(series)
    # Histograms
    stats['histogram'] = histogram(series, **kwargs)
    stats['mini_histogram'] = mini_histogram(series, **kwargs)
    return pd.Series(stats, name=series.name)

This code fragment may look large and complicated, but it is actually easy to follow. The library's source code contains a function that determines the type of each variable. When the library encounters a numeric variable, the function above computes the metrics we have been looking at. It uses standard pandas operations on Series objects, such as series.mean(), and stores the results in the stats dictionary. The histograms are generated with an adapted version of matplotlib.pyplot.hist; the adaptation is there so that the function can work with different kinds of datasets.
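To see the pattern in isolation, here is a trimmed-down sketch of the same idea. The function summarize_numeric is hypothetical and is not part of pandas-profiling; it keeps only a few of the statistics and leaves out the histograms, but it shows how the percentile keys such as "25%" are produced by the format string and then reused to compute the interquartile range.

```python
import numpy as np
import pandas as pd


def summarize_numeric(series: pd.Series) -> pd.Series:
    """Hypothetical, reduced variant of the summary logic quoted above."""
    stats = {
        "mean": series.mean(),
        "std": series.std(),
        "min": series.min(),
        "max": series.max(),
    }
    stats["range"] = stats["max"] - stats["min"]

    no_na = series.dropna()  # drop missing values once, reuse for every quantile
    for q in (0.05, 0.25, 0.5, 0.75, 0.95):
        # "{:.0%}".format(0.25) produces the key "25%"
        stats["{:.0%}".format(q)] = no_na.quantile(q)
    stats["iqr"] = stats["75%"] - stats["25%"]
    return pd.Series(stats, name=series.name)


summary = summarize_numeric(pd.Series([1.0, 2.0, 3.0, 4.0, np.nan], name="x"))
print(summary)
```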

Correlations and sample data

After the per-variable results, pandas-profiling shows the Pearson and Spearman correlation matrices in the Correlations section.

Pearson correlation matrix

If necessary, you can set the threshold values used in the correlation calculation in the line of code that generates the report. This lets you specify how strong a correlation must be to count as important for your analysis.
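The idea behind the two matrices and a threshold can be illustrated in plain pandas. The sketch below is not the library's own option (its name and form depend on the installed release); the data and the threshold value of 0.9 are invented for the example.

```python
import pandas as pd

# Invented numbers for illustration only.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],      # linear in x
    "z": [1.0, 8.0, 27.0, 64.0, 125.0],   # monotonic in x, but not linear
    "w": [3.0, 1.0, 4.0, 1.0, 5.0],       # only weakly related to x
})

pearson = df.corr(method="pearson")    # strength of linear relationships
spearman = df.corr(method="spearman")  # strength of monotonic relationships

threshold = 0.9  # hypothetical cut-off for a "strong" correlation
strong_pairs = [
    (a, b, round(pearson.loc[a, b], 3))
    for i, a in enumerate(pearson.columns)
    for b in pearson.columns[i + 1:]
    if abs(pearson.loc[a, b]) >= threshold
]

print(strong_pairs)
```

Here x and z have a Spearman coefficient of 1 because z always grows with x, while their Pearson coefficient is lower because the relationship is not linear.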

Finally, in the Sample section, the pandas-profiling report shows a piece of data taken from the beginning of the dataset as an example. This approach can lead to unpleasant surprises, because the first few observations may form a sample that does not reflect the characteristics of the dataset as a whole.

The section with a sample of the data

For that reason, I do not recommend relying on this last section. It is better to use the command df.sample(5), which randomly selects 5 observations from the dataset.
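The difference is easy to see on a frame that happens to be sorted. The sketch below uses an invented frame ordered by class, so its first rows all share one value; random_state is passed only to make the random draw repeatable.

```python
import pandas as pd

# Invented frame sorted by class; not the real Titanic data.
df = pd.DataFrame({
    "PassengerId": range(1, 31),
    "Pclass": [1] * 10 + [2] * 10 + [3] * 10,
})

first_rows = df.head(5)                     # always the first five rows
random_rows = df.sample(5, random_state=0)  # five rows drawn at random

print(first_rows["Pclass"].unique())
print(random_rows)
```

The first five rows contain only one class, which says nothing about the other two thirds of the data.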

Summary

To sum up, pandas-profiling gives the analyst some useful capabilities that come in handy when you need a quick overall picture of the data or want to hand an exploratory analysis report to someone else. The real work with the data, which has to take its particular features into account, is still done by hand, just as it is without pandas-profiling.

If you want to see what the whole exploratory data analysis looks like in a single Jupyter notebook, take a look at this project of mine, rendered with nbviewer. The corresponding code is in this GitHub repository.

Dear readers! Where do you start when you analyze a new dataset?


Source: www.habr.com
