People who enter the field of Data Science often have less-than-realistic expectations of what awaits them. Many think they will now write cool neural networks, build a voice assistant like the one from Iron Man, or beat everyone in the financial markets.
But the work of a Data Scientist is driven by data, and one of the most important and time-consuming parts of it is processing the data before feeding it into a neural network or analyzing it in some way.
In this article, our team describes how you can process data quickly and easily, with step-by-step instructions and code. We tried to make the code flexible enough to be used on different datasets.
Many professionals may not find anything extraordinary in this article, but beginners will be able to learn something new, and anyone who has long dreamed of making a separate notebook for fast and structured data processing can copy the code and adapt it for themselves, or
We got the data. What next?
So, the standard approach: we need to understand what we are dealing with and see the overall picture. To do this, we use pandas to simply determine the different data types.
import pandas as pd  # import pandas
import numpy as np  # import numpy
df = pd.read_csv("AB_NYC_2019.csv")  # read the dataset and store it in the variable df
df.head(3)  # look at the first 3 rows to see what the values look like
df.info()  # show information about the columns
Let's look at the column values:
- Does the number of rows in each column match the total number of rows?
- What is the meaning of the data in each column?
- Which column do we want to target in order to make predictions for it?
The answers to these questions will let you analyze the dataset and roughly sketch a plan for your next steps.
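The first question can also be checked directly instead of reading it off the df.info() output. The following is only a minimal sketch on a toy DataFrame; the column names are made up for illustration and are not taken from AB_NYC_2019.csv.

```python
import numpy as np
import pandas as pd

# Toy data standing in for a real dataset; the columns are hypothetical.
toy = pd.DataFrame({
    "price": [100, 250, np.nan, 80],
    "room_type": ["private", None, "shared", "private"],
    "id": [1, 2, 3, 4],
})

total_rows = len(toy)
non_null = toy.count()                         # non-null values per column
incomplete = non_null[non_null < total_rows]   # columns with fewer values than rows

print(f"total rows: {total_rows}")
print(incomplete)
```

Any column listed in incomplete has gaps that will need a decision later on.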
Also, for a deeper look at the values in each column, we can use the pandas describe() function. The drawback of this function is that, by default, it gives no information about columns with string values. We will deal with them later.
df.describe()
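If a quick summary of the string columns is wanted right away, describe() accepts include and exclude arguments. A small sketch on toy data (the column names are assumptions for illustration), excluding the numeric columns so that only the remaining ones are summarized:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one numeric and one string column.
toy = pd.DataFrame({
    "price": [100, 250, 80, 80],
    "room_type": ["private", "shared", "private", "private"],
})

numeric_summary = toy.describe()                      # numeric columns only (default)
string_summary = toy.describe(exclude=[np.number])    # count, unique, top, freq

print(string_summary)
```

For string columns the summary reports the number of values, the number of distinct values, the most frequent value, and how often it occurs.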
Visualization magic
Let's see where we have no values at all:
import seaborn as sns
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')
That was a quick look from above; now we move on to more interesting things.
Let's try to find and, if possible, remove columns that have only one value in all rows (they will not affect the result in any way):
# Overwrite the dataset, keeping only the columns that have more than one unique value
df = df[[c for c in df.columns if len(df[c].unique()) > 1]]
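To see which columns would be removed before actually removing them, the same idea can be written in two steps. This is a sketch on toy data with hypothetical column names; nunique(dropna=False) is used so that NaN counts as a value, matching the behaviour of unique().

```python
import pandas as pd

# Hypothetical toy data: "city" holds the same value in every row.
toy = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC"],
    "price": [100, 250, 80],
    "room_type": ["private", "shared", "private"],
})

# Columns with a single distinct value carry no information for a model.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) <= 1]
toy = toy.drop(columns=constant_cols)

print(constant_cols)
print(list(toy.columns))
```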
Now we protect ourselves and the success of our project from duplicate rows (rows that contain the same information as an existing row):
df.drop_duplicates(inplace=True)  # Do this only if we consider it necessary.
# In some projects such data should not be removed right at the start.
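Since dropping duplicates is not always desirable, it can help to count them first. A minimal sketch on toy data (column names are illustrative assumptions):

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the first one.
toy = pd.DataFrame({
    "price": [100, 250, 100, 80],
    "room_type": ["private", "shared", "private", "private"],
})

n_duplicates = toy.duplicated().sum()   # rows identical to an earlier row
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```

Here the original frame is left untouched, so the decision can still be reversed.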
We split the dataset into two: one part with qualitative values, the other with quantitative values.
A small clarification is needed here: if the rows with missing data in the qualitative and quantitative parts do not correlate strongly with each other, we will have to decide what to sacrifice: all rows with missing data, only some of them, or certain columns. If the rows do correlate, we have every right to split the dataset in two. Otherwise, you first need to deal with the rows in which the missing qualitative and quantitative data do not coincide, and only then split the dataset in two.
df_numerical = df.select_dtypes(include = [np.number])
df_categorical = df.select_dtypes(exclude = [np.number])
We do this to make it easier to process these two different types of data; later we will see how much this simplifies our lives.
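One way to judge how strongly the missing rows in the two parts coincide is to cross-tabulate per-row missingness flags. This is only a sketch on toy data with hypothetical columns; what counts as "correlated enough" is a judgment call and is not defined in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 1 and 2 have gaps in both parts.
toy = pd.DataFrame({
    "price": [100, np.nan, np.nan, 80, 120],
    "reviews": [5, np.nan, 3, 1, 2],
    "room_type": ["private", None, None, "shared", "private"],
})

num_part = toy.select_dtypes(include=[np.number])
cat_part = toy.select_dtypes(exclude=[np.number])

# True where a row has at least one missing value in that part.
num_missing = num_part.isnull().any(axis=1).rename("numeric_missing")
cat_missing = cat_part.isnull().any(axis=1).rename("categorical_missing")

overlap = pd.crosstab(num_missing, cat_missing)
print(overlap)
```

If most of the counts sit on the diagonal of the table, the gaps largely fall in the same rows of both parts.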
Working with quantitative data
The first thing we should do is determine whether there are "spy columns" in the quantitative data. We call them that because they pose as quantitative data but actually behave as qualitative data.
How do we identify them? Of course, it all depends on the nature of the data you are analyzing, but in general such columns have few unique values (in the region of 3-10).
print(df_numerical.nunique())
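The candidates can also be listed automatically. The sketch below uses toy data, and the cutoff MAX_UNIQUE is a hypothetical value taken from the upper end of the 3-10 range mentioned above; a column flagged this way still needs a manual look.

```python
import pandas as pd

# Hypothetical toy data: "floor" is numeric but has only 3 distinct values.
toy_numerical = pd.DataFrame({
    "price": [100, 250, 80, 120, 95, 300, 60, 75, 110, 130, 140, 150],
    "floor": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
})

MAX_UNIQUE = 10   # assumed cutoff, adjust for your data
unique_counts = toy_numerical.nunique()
spy_candidates = unique_counts[unique_counts <= MAX_UNIQUE].index.tolist()

print(spy_candidates)
```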
Once we have identified the spy columns, we move them from the quantitative data to the qualitative data:
spy_columns = df_numerical[['column1', 'column2', 'column3']]  # select the spy columns and store them in a separate dataframe
df_numerical.drop(labels=['column1', 'column2', 'column3'], axis=1, inplace=True)  # cut these columns out of the quantitative data
df_categorical.insert(1, 'column1', spy_columns['column1'])  # add the first spy column to the qualitative data
df_categorical.insert(1, 'column2', spy_columns['column2'])  # add the second spy column to the qualitative data
df_categorical.insert(1, 'column3', spy_columns['column3'])  # add the third spy column to the qualitative data
Finally, we have completely separated the quantitative data from the qualitative data and can now work with it properly. The first step is to understand where we have empty values (NaN, and in some cases 0 will also be treated as an empty value).
for i in df_numerical.columns:
    print(i, df_numerical[i][df_numerical[i] == 0].count())  # number of zeros in each column
At this stage it is important to understand in which columns zeros may indicate missing values: is it due to how the data was collected? Or could it be related to the data values themselves? These questions must be answered on a case-by-case basis.
So, if we still decide that data may be missing where there are zeros, we should replace the zeros with NaN to make it easier to work with this missing data later:
df_numerical[["column 1", "column 2"]] = df_numerical[["column 1", "column 2"]].replace(0, np.nan)
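A small illustration of why this has to be decided per column: in the toy data below (hypothetical columns), a price of 0 is assumed to mean "unknown", while 0 reviews is a legitimate value and is left alone.

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100, 0, 80],     # 0 is treated as "unknown price" here
    "reviews": [0, 5, 2],      # 0 reviews is a real value
})

zero_means_missing = ["price"]   # assumed choice for this toy data
toy_numerical[zero_means_missing] = toy_numerical[zero_means_missing].replace(0, np.nan)

print(toy_numerical.isnull().sum())
```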
Now let's see where we are missing data:
sns.heatmap(df_numerical.isnull(), yticklabels=False, cbar=False, cmap='viridis')  # You can also use df_numerical.info()
Here the missing values inside the columns should be marked in yellow. And now the fun begins: how do we deal with these values? Should we delete the rows that contain them, or the columns? Or fill the empty values with something else?
Here is a rough scheme that can help you decide what can, in principle, be done with empty values:
0. Remove unnecessary columns
df_numerical.drop(labels=["column1", "column2"], axis=1, inplace=True)
1. Is the share of empty values in this column greater than 50%?
print(df_numerical.isnull().sum() / df_numerical.shape[0] * 100)
df_numerical.drop(labels=["column1", "column2"], axis=1, inplace=True)  # Drop a column if more than 50% of its values are empty
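Instead of typing the column names by hand, the columns above the threshold can be selected from the computed shares. A sketch on toy data with hypothetical columns; the 50% threshold is the one used in the step above.

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100, np.nan, 80, 120],                       # 25% missing
    "last_review_score": [np.nan, np.nan, np.nan, 4.5],    # 75% missing
})

missing_share = toy_numerical.isnull().mean() * 100   # percent of missing values per column
too_sparse = missing_share[missing_share > 50].index.tolist()
toy_numerical = toy_numerical.drop(columns=too_sparse)

print(too_sparse)
```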
2. Remove rows with empty values
df_numerical.dropna(inplace=True)  # Drop rows with empty values, if enough data will remain for training afterwards
3.1. Insert a random value
import random  # import random
observed = df_numerical["column"].dropna().tolist()  # values actually present in the column
df_numerical["column"] = df_numerical["column"].apply(lambda x: random.choice(observed) if pd.isnull(x) else x)  # insert random observed values into the empty cells
3.2. Insert a constant value
from sklearn.impute import SimpleImputer  # import SimpleImputer, which will help insert the values
imputer = SimpleImputer(strategy='constant', fill_value="<Your value here>")  # insert a specific value using SimpleImputer (use a number for numeric columns)
df_numerical[["new_column1", 'new_column2', 'new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']])  # apply this to our table
df_numerical.drop(labels=["column1", "column2", "column3"], axis=1, inplace=True)  # remove the columns with the old values
3.3. Insert the mean or the most frequent value
from sklearn.impute import SimpleImputer  # import SimpleImputer, which will help insert the values
imputer = SimpleImputer(strategy='mean', missing_values=np.nan)  # instead of mean you can also use most_frequent
df_numerical[["new_column1", 'new_column2', 'new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']])  # apply this to our table
df_numerical.drop(labels=["column1", "column2", "column3"], axis=1, inplace=True)  # remove the columns with the old values
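To make the effect of the mean strategy concrete, here is a small worked example on toy data (the column names are hypothetical). Unlike the snippet above, this sketch writes the imputed values back into the same columns rather than creating new ones, which is just one possible variation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

toy_numerical = pd.DataFrame({
    "price": [100.0, np.nan, 80.0],    # mean of the known values is 90
    "reviews": [4.0, 2.0, np.nan],     # mean of the known values is 3
})

imputer = SimpleImputer(strategy="mean", missing_values=np.nan)
cols = ["price", "reviews"]
toy_numerical[cols] = imputer.fit_transform(toy_numerical[cols])

print(toy_numerical)
```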
3.4. Insert a value computed by another model
Sometimes values can be computed using regression models from the sklearn library or other similar libraries. Our team will devote a separate article to how this can be done in the near future.
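Until then, the general idea can be sketched as follows. This is not the approach of the promised follow-up article, only one simple possibility: fit sklearn's LinearRegression on the rows where the value is known and predict it for the rows where it is missing. The columns "area" and "price" are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data in which price is exactly 10 * area wherever it is known.
toy_numerical = pd.DataFrame({
    "area": [20.0, 30.0, 40.0, 50.0, 60.0],
    "price": [200.0, 300.0, np.nan, 500.0, np.nan],
})

known = toy_numerical[toy_numerical["price"].notnull()]
unknown = toy_numerical[toy_numerical["price"].isnull()]

# Learn price from area on the complete rows only.
model = LinearRegression()
model.fit(known[["area"]], known["price"])

# Fill the gaps with the model's predictions.
toy_numerical.loc[unknown.index, "price"] = model.predict(unknown[["area"]])
print(toy_numerical)
```

On real data the predictor columns must themselves be free of gaps, and the quality of the fit should be checked before trusting the filled values.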
So, for now we pause the discussion of quantitative data. There are many other nuances of how best to prepare and preprocess data for different tasks, but the basics for quantitative data have been covered in this article, and now it is time to return to the qualitative data that we separated from the quantitative data several steps back. You can change this notebook as you like, adapting it to different tasks, so that data preprocessing goes very quickly!
Qualitative data
Basically, for qualitative data the One-hot-encoding method is used to convert it from a string (or object) to a number. Before moving on to this point, let's use the scheme and code above to deal with empty values.
df_categorical.nunique()
sns.heatmap(df_categorical.isnull(),yticklabels=False,cbar=False,cmap='viridis')
0. Remove unnecessary columns
df_categorical.drop(labels=["column1", "column2"], axis=1, inplace=True)
1. Is the share of empty values in this column greater than 50%?
print(df_categorical.isnull().sum() / df_categorical.shape[0] * 100)
df_categorical.drop(labels=["column1", "column2"], axis=1, inplace=True)  # Drop a column if more than 50% of its values are empty
2. Remove rows with empty values
df_categorical.dropna(inplace=True)  # Drop rows with empty values, if enough data will remain for training afterwards
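3.1. Insert a random value
import random
observed = df_categorical["column"].dropna().tolist()  # values actually present in the column
df_categorical["column"] = df_categorical["column"].apply(lambda x: random.choice(observed) if pd.isnull(x) else x)  # insert random observed values into the empty cells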
3.2. Insert a constant value
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='constant', fill_value="<Your value here>")
df_categorical[["new_column1", 'new_column2', 'new_column3']] = imputer.fit_transform(df_categorical[['column1', 'column2', 'column3']])
df_categorical.drop(labels=["column1", "column2", "column3"], axis=1, inplace=True)
So, we have finally dealt with the empty values in the qualitative data. Now it is time to perform one-hot encoding on the values in your dataset. This method is very often used so that your algorithm can learn from high-quality data.
def encode_and_bind(original_dataframe, feature_to_encode):
    dummies = pd.get_dummies(original_dataframe[[feature_to_encode]])  # one indicator column per category
    res = pd.concat([original_dataframe, dummies], axis=1)
    res = res.drop([feature_to_encode], axis=1)  # drop the original string column
    return res
features_to_encode = ["column1", "column2", "column3"]
for feature in features_to_encode:
    df_categorical = encode_and_bind(df_categorical, feature)
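To show what the encoding produces, here is a small example on toy data (hypothetical column names). It uses the columns argument of pd.get_dummies, which encodes several columns in one call as an alternative to the loop; dtype=int asks for 0/1 integers instead of booleans.

```python
import pandas as pd

toy_categorical = pd.DataFrame({
    "room_type": ["private", "shared", "private"],
    "borough": ["Brooklyn", "Queens", "Queens"],
})

# Each listed column is replaced by one indicator column per category.
encoded = pd.get_dummies(toy_categorical, columns=["room_type", "borough"], dtype=int)

print(list(encoded.columns))
print(encoded)
```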
So, we have finally finished processing the qualitative and quantitative data separately; time to combine them back together.
new_df = pd.concat([df_numerical,df_categorical], axis=1)
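One thing to watch when recombining: pd.concat with axis=1 aligns rows on the index, so if rows were dropped from only one of the two parts (for example by dropna), the combined frame will contain NaN for those rows. A minimal sketch on toy data, assuming you want to keep only the rows present in both parts:

```python
import pandas as pd

# Hypothetical situation: row 1 was dropped from the qualitative part only.
toy_numerical = pd.DataFrame({"price": [100, 250, 80]}, index=[0, 1, 2])
toy_categorical = pd.DataFrame({"room_type_private": [1, 0]}, index=[0, 2])

outer = pd.concat([toy_numerical, toy_categorical], axis=1)                 # default join="outer"
inner = pd.concat([toy_numerical, toy_categorical], axis=1, join="inner")   # rows present in both

print(outer.isnull().sum().sum(), len(inner))
```

Whether to use the inner join or to fill the resulting gaps depends on how much data you can afford to lose.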
After we have combined the datasets into one, we can finally apply a data transformation using MinMaxScaler from the sklearn library. This brings our values into the range between 0 and 1, which will help when training the model later.
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
new_df = min_max_scaler.fit_transform(new_df)
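Note that fit_transform returns a NumPy array, so the column names are lost. If they are still needed, the result can be wrapped back into a DataFrame, as in this sketch on toy data (hypothetical columns):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

toy = pd.DataFrame({
    "price": [100.0, 250.0, 400.0],
    "reviews": [0.0, 5.0, 10.0],
})

scaler = MinMaxScaler()
# Keep the original column names and index after scaling.
scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns, index=toy.index)

print(scaled)
```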
This data is now ready for anything: neural networks, standard ML algorithms, and so on!
In this article we did not consider working with time-series data, because such data calls for slightly different processing techniques, depending on your task. In the future, our team will devote a separate article to this topic, and we hope it will bring something interesting, new, and useful into your life, just like this one.
Source: www.habr.com