Notepad cheat sheet for fast data preprocessing

People entering the field of Data Science often have less than realistic expectations of what awaits them. Many think they will now be writing cool neural networks, building a voice assistant like the one in Iron Man, or beating everyone on the financial markets.
But the work of a Data Scientist is driven by data, and one of the most important and time-consuming parts of it is processing the data before feeding it to a neural network or analyzing it in some other way.

In this article, our team describes how you can process data quickly and easily, with step-by-step instructions and code. We tried to keep the code flexible so that it can be reused for different datasets.

Many professionals may find nothing unusual in this article, but beginners will be able to learn something new. Anyone who has long dreamed of keeping a separate notebook for fast, structured data processing can copy the code and format it for themselves, or download the finished notebook from GitHub.

We have received a dataset. What do we do next?

So, the standard first step: we need to understand what we are dealing with and see the big picture. To do this, we use pandas to identify the different data types.

import pandas as pd  # import pandas
import numpy as np   # import numpy
df = pd.read_csv("AB_NYC_2019.csv")  # read the dataset and store it in the variable df

df.head(3)  # look at the first 3 rows to see what the values look like


df.info()  # show information about the columns


Let's look at the column values:

  1. Does the number of non-empty rows in each column match the total number of rows?
  2. What is the essence of the data in each column?
  3. Which column do we want to use as the target for our predictions?

The answers to these questions will let you analyze the dataset and sketch a rough plan for your next steps.
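The first question in particular can be answered in code. Here is a minimal sketch on a small made-up table; the column names `price`, `room_type` and `reviews` are assumptions for illustration, not columns your dataset is guaranteed to have:

```python
import numpy as np
import pandas as pd

# Toy data standing in for a real dataset (hypothetical columns).
toy = pd.DataFrame({
    "price": [100.0, 80.0, np.nan, 120.0],
    "room_type": ["Private", "Shared", "Private", None],
    "reviews": [3, 0, 7, 1],
})

overview = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),              # what kind of data each column holds
    "non_null": toy.notna().sum(),                # filled rows per column
    "complete": toy.notna().sum() == len(toy),    # does it match the total row count?
})
print(overview)
```

Columns where `complete` is False are the ones that will need one of the missing-value treatments later on.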

Also, for a deeper look at the values in each column, we can use the pandas describe() function. The drawback of this function is that, by default, it gives no information about columns with string values. We will deal with them later.

df.describe()
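As a hedged aside, `describe()` also accepts `include` and `exclude` arguments, so the non-numeric columns can be summarized in a separate call. A small sketch on made-up data (the column names are assumptions):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "price": [100, 80, 120, 80],
    "room_type": ["Private", "Shared", "Private", "Private"],
})

numeric_summary = toy.describe()                      # numeric columns only
text_summary = toy.describe(exclude=[np.number])      # count, unique, top, freq
print(text_summary)
```

For string columns the summary reports the number of distinct values and the most frequent one instead of the mean and quartiles.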


Magic visualization

Let's look at where we have no values at all:

import seaborn as sns
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')


That was a brief look from above; now we move on to more interesting things.

Let's try to find and, if possible, remove columns that have only one value in all rows (they will not affect the result in any way):

# Overwrite the dataset, keeping only the columns that have more than one unique value
df = df[[c for c
        in list(df)
        if len(df[c].unique()) > 1]]
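One subtlety worth knowing: `unique()` treats NaN as a value of its own, so a column that holds one real value plus gaps survives the filter above. The sketch below, on made-up data with assumed column names, compares it with `nunique()`, which skips NaN by default:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC"],     # constant: carries no information
    "price": [100, 80, 120],
    "flag": [1.0, np.nan, 1.0],        # one real value plus a gap
})

# unique() counts NaN as a value, nunique() ignores it by default.
kept_with_unique = [c for c in toy.columns if len(toy[c].unique()) > 1]
kept_with_nunique = [c for c in toy.columns if toy[c].nunique() > 1]

print(kept_with_unique)
print(kept_with_nunique)
```

Which behaviour you want depends on whether "value or missing" is itself informative for your task.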

Now we protect ourselves and the success of our project from duplicate rows (rows that contain the same information, in the same order, as one of the existing rows):

df.drop_duplicates(inplace=True)  # Do this if you consider it necessary.
                                  # In some projects such data should not be removed right at the start.
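Before dropping anything it can help to count how many rows would actually go. A minimal sketch on a toy table (the columns `host` and `price` are assumptions):

```python
import pandas as pd

toy = pd.DataFrame({
    "host": ["a", "b", "a", "c"],
    "price": [100, 80, 100, 120],
})

n_duplicates = int(toy.duplicated().sum())    # rows identical to an earlier row
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```

Resetting the index afterwards is optional; it just keeps row labels contiguous.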

We split the dataset into two: one with qualitative values and one with quantitative values

A small clarification is needed here. If the rows with missing values in the qualitative data and in the quantitative data do not correlate strongly with each other, we have to decide what to sacrifice: all rows with missing data, only part of them, or certain columns. If the rows do correlate, we have every right to split the dataset into two. Otherwise, first deal with the rows in which the missing qualitative and quantitative data do not correlate, and only then split the dataset into two.

df_numerical = df.select_dtypes(include = [np.number])
df_categorical = df.select_dtypes(exclude = [np.number])

We do this to make it easier to process these two different types of data. Later we will see how much this simplifies our life.
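To see what the split produces, here is a small sketch on made-up data (all column names are assumptions). The explicit `.copy()` calls are an addition of this sketch, meant to keep later in-place edits clearly separate from the original frame:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "price": [100.0, 80.0, 120.0],
    "nights": [1, 3, 2],
    "room_type": ["Private", "Shared", "Private"],
})

# Numeric columns go to one frame, everything else to the other.
toy_numerical = toy.select_dtypes(include=[np.number]).copy()
toy_categorical = toy.select_dtypes(exclude=[np.number]).copy()

print(list(toy_numerical.columns), list(toy_categorical.columns))
```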

Working with quantitative data

The first thing to do is to check whether there are "spy columns" in the quantitative data. We call them that because they pose as quantitative data but actually behave like qualitative data.

How do we spot them? Of course, it all depends on the nature of the data you are analyzing, but in general such columns have few unique values (somewhere in the region of 3-10 unique values).

print(df_numerical.nunique())
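The 3-10 rule of thumb can be turned into a quick filter. This is only a sketch on made-up columns (`price`, `floor` and `rating` are assumptions), and the result is a list of candidates to review by eye, not a final verdict:

```python
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": range(100, 120),          # 20 distinct values
    "floor": [1, 2, 3, 4] * 5,         # only 4 distinct values
    "rating": [1, 2, 3, 4, 5] * 4,     # only 5 distinct values
})

unique_counts = toy_numerical.nunique()
# The 3-10 window is the rule of thumb from the text, not a hard rule.
spy_candidates = unique_counts[unique_counts.between(3, 10)].index.tolist()

print(spy_candidates)
```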

Once we have identified the spy columns, we move them from the quantitative data to the qualitative data:

spy_columns = df_numerical[['column1', 'column2', 'column3']]  # select the spy columns and store them in a separate dataframe
df_numerical.drop(labels=['column1', 'column2', 'column3'], axis=1, inplace=True)  # cut these columns out of the quantitative data
df_categorical.insert(1, 'column1', spy_columns['column1'])  # add the first spy column to the qualitative data
df_categorical.insert(1, 'column2', spy_columns['column2'])  # add the second spy column to the qualitative data
df_categorical.insert(1, 'column3', spy_columns['column3'])  # add the third spy column to the qualitative data

Finally, we have fully separated the quantitative data from the qualitative data and can now work with it properly. The first thing is to understand where we have empty values (NaN, and in some cases 0 should also be treated as an empty value).

for i in df_numerical.columns:
    print(i, (df_numerical[i] == 0).sum())  # number of zeros in each quantitative column

At this stage it is important to understand in which columns zeros may stand for missing values: is this due to the way the data was collected? Or are the zeros genuine data values? These questions have to be answered case by case.

So, if we do decide that zeros may hide missing data, we should replace them with NaN to make this missing data easier to work with later:

df_numerical[["column 1", "column 2"]] = df_numerical[["column 1", "column 2"]].replace(0, np.nan)
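A small sketch of why this decision is made per column. In the made-up table below we assume a price of 0 means "not recorded", while 0 reviews is a perfectly valid value; both assumptions are for illustration only:

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100, 0, 120],     # a price of 0 is assumed to mean "not recorded"
    "reviews": [0, 5, 2],       # 0 reviews is a legitimate value, keep it
})

zero_means_missing = ["price"]  # hypothetical choice for this toy table
toy_numerical[zero_means_missing] = toy_numerical[zero_means_missing].replace(0, np.nan)

print(toy_numerical)
```

Only the listed columns are touched, so genuine zeros elsewhere stay intact.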

Now let's see where we are missing data:

sns.heatmap(df_numerical.isnull(),yticklabels=False,cbar=False,cmap='viridis') # You can also use df_numerical.info()


In this heatmap, the missing values inside the columns should be marked in yellow. And now the fun begins: how do we deal with these values? Should we delete the rows or the columns that contain them? Or fill the empty values with something else?

Here is a rough scheme, given as the numbered steps that follow, that can help you decide what can in principle be done with empty values:


0. Remove unnecessary columns

df_numerical.drop(labels=["column1","column2"], axis=1, inplace=True)

1. Is the share of empty values in this column greater than 50%?

print(df_numerical.isnull().sum() / df_numerical.shape[0] * 100)

df_numerical.drop(labels=["column1","column2"], axis=1, inplace=True)  # drop a column if more than 50% of its values are empty
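Instead of reading the percentages and typing the column names by hand, the same check can be automated. A sketch on made-up data; the 50% threshold comes from the step above, while the column names are assumptions:

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100.0, np.nan, 120.0, 90.0],               # 25% missing
    "last_review_gap": [np.nan, np.nan, np.nan, 4.0],    # 75% missing
})

missing_share = toy_numerical.isnull().mean() * 100      # percent of empty cells per column
too_sparse = missing_share[missing_share > 50].index.tolist()
toy_numerical = toy_numerical.drop(columns=too_sparse)

print(too_sparse, list(toy_numerical.columns))
```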

2. Delete rows with empty values

df_numerical.dropna(inplace=True)  # drop rows with empty values if enough data will remain for training afterwards

3.1. Insert a random value

import random  # import random
observed = df_numerical["column"].dropna().tolist()  # values that are actually present in the column
df_numerical["column"] = df_numerical["column"].apply(lambda x: random.choice(observed) if pd.isna(x) else x)  # put random observed values into the empty cells
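The same idea can be written in vectorized form. The sketch below is one possible variant, not the only way to do it: it draws one observed value per gap with NumPy's random generator, and the fixed seed and the column name `price` are assumptions made to keep the example repeatable:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)     # fixed seed so the sketch is repeatable
col = pd.Series([10.0, np.nan, 30.0, np.nan, 50.0], name="price")

observed = col.dropna().to_numpy()     # values we are allowed to sample from
missing_mask = col.isna()

# Draw one observed value per gap, with replacement.
col_filled = col.copy()
col_filled[missing_mask] = rng.choice(observed, size=int(missing_mask.sum()))

print(col_filled.tolist())
```

Random filling preserves the overall distribution of the column better than a single constant, at the cost of adding noise to individual rows.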

3.2. Insert a constant value

from sklearn.impute import SimpleImputer  # import SimpleImputer, which will help insert the values
imputer = SimpleImputer(strategy='constant', fill_value="<your value here>")  # insert a fixed value with SimpleImputer (use a number for numeric columns)
df_numerical[["new_column1",'new_column2','new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']])  # apply it to our table
df_numerical.drop(labels = ["column1","column2","column3"], axis = 1, inplace = True)  # remove the columns with the old values

3.3. Insert the mean or the most frequent value

from sklearn.impute import SimpleImputer  # import SimpleImputer, which will help insert the values
imputer = SimpleImputer(strategy='mean', missing_values = np.nan)  # most_frequent can be used instead of mean
df_numerical[["new_column1",'new_column2','new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']])  # apply it to our table
df_numerical.drop(labels = ["column1","column2","column3"], axis = 1, inplace = True)  # remove the columns with the old values
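To make the difference between the two strategies concrete, here is a sketch that runs both on the same made-up table (column names are assumptions). Since `fit_transform` returns a NumPy array by default, the sketch wraps the result back into a DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

toy_numerical = pd.DataFrame({
    "price": [100.0, np.nan, 120.0, 100.0],
    "nights": [1.0, 2.0, np.nan, 2.0],
})

filled = {}
for strategy in ("mean", "most_frequent"):
    imputer = SimpleImputer(strategy=strategy, missing_values=np.nan)
    # Wrap the array back into a DataFrame to keep column names and index.
    filled[strategy] = pd.DataFrame(
        imputer.fit_transform(toy_numerical),
        columns=toy_numerical.columns,
        index=toy_numerical.index,
    )

print(filled["mean"])
print(filled["most_frequent"])
```

The mean is pulled around by outliers, while the most frequent value always lands on a value that actually occurs in the column.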

3.4. Insert a value computed by another model

Sometimes the missing values can be computed with regression models, using models from the sklearn library or other similar libraries. Our team will devote a separate article to how this can be done in the near future.
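Until then, here is a very rough sketch of the idea, not the method of that future article. It assumes a made-up table in which `price` depends linearly on `area`, fits `LinearRegression` on the rows where `price` is known, and predicts it for the rows where it is missing:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

toy = pd.DataFrame({
    "area": [20.0, 30.0, 40.0, 50.0, 60.0],
    "price": [40.0, 60.0, np.nan, 100.0, 120.0],   # price = 2 * area in the known rows
})

known = toy["price"].notna()

# Train on complete rows only, then predict the gaps from the other feature.
model = LinearRegression().fit(toy.loc[known, ["area"]], toy.loc[known, "price"])
toy.loc[~known, "price"] = model.predict(toy.loc[~known, ["area"]])

print(toy)
```

On real data the predictor columns must themselves be free of gaps, and the quality of the fill is only as good as the model.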

So, for now we pause the story about quantitative data. There are many more nuances in how best to prepare and preprocess data for different tasks, but the basics for quantitative data have been covered in this article, and it is time to return to the qualitative data that we separated from the quantitative data several steps back. You can change this notebook as you like and adapt it to different tasks, so that data preprocessing goes quickly!

Qualitative data

For qualitative data, the one-hot encoding method is typically used to convert it from a string (or object) to a number. Before moving on to that step, let's use the scheme and code above to deal with the empty values.

df_categorical.nunique()

sns.heatmap(df_categorical.isnull(),yticklabels=False,cbar=False,cmap='viridis')


0. Remove unnecessary columns

df_categorical.drop(labels=["column1","column2"], axis=1, inplace=True)

1. Is the share of empty values in this column greater than 50%?

print(df_categorical.isnull().sum() / df_categorical.shape[0] * 100)

df_categorical.drop(labels=["column1","column2"], axis=1, inplace=True)  # drop a column if more than
                                                                         # 50% of its values are empty

2. Delete rows with empty values

df_categorical.dropna(inplace=True)  # drop rows with empty values,
                                     # if enough data will remain for training afterwards

3.1. Insert a random value

import random
observed = df_categorical["column"].dropna().tolist()  # values that are actually present in the column
df_categorical["column"] = df_categorical["column"].apply(lambda x: random.choice(observed) if pd.isna(x) else x)

3.2. Insert a constant value

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='constant', fill_value="<your value here>")
df_categorical[["new_column1",'new_column2','new_column3']] = imputer.fit_transform(df_categorical[['column1', 'column2', 'column3']])
df_categorical.drop(labels = ["column1","column2","column3"], axis = 1, inplace = True)

So, we have finally dealt with the nulls in the qualitative data. Now it is time to apply one-hot encoding to the values in your dataset. This method is used very often so that your algorithm can learn from qualitative data.

def encode_and_bind(original_dataframe, feature_to_encode):
    dummies = pd.get_dummies(original_dataframe[[feature_to_encode]])
    res = pd.concat([original_dataframe, dummies], axis=1)
    res = res.drop([feature_to_encode], axis=1)
    return(res)

features_to_encode = ["column1","column2","column3"]
for feature in features_to_encode:
    df_categorical = encode_and_bind(df_categorical, feature)
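To see what the encoding produces, here is a sketch on a made-up table (the column names are assumptions). It uses the `columns` argument of `pd.get_dummies`, which encodes several columns in one call and drops the originals, as a compact alternative to the loop:

```python
import pandas as pd

toy_categorical = pd.DataFrame({
    "room_type": ["Private", "Shared", "Private"],
    "district": ["Brooklyn", "Queens", "Brooklyn"],
})

# Each distinct value becomes its own 0/1 column named <column>_<value>.
encoded = pd.get_dummies(toy_categorical, columns=["room_type", "district"], dtype=int)

print(encoded)
```

Every category turns into a separate column, so columns with many distinct values can make the table very wide.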

So, we have finally finished processing the qualitative and quantitative data separately. Time to combine them back

new_df = pd.concat([df_numerical,df_categorical], axis=1)
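One caveat: `pd.concat` with `axis=1` aligns rows by index. If rows were dropped separately from each half, the default outer join brings the gaps back as NaN. The sketch below shows this on two made-up frames and uses `join="inner"` to keep only the rows present in both; whether that is the right choice for your data is a judgment call:

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({"price": [100.0, np.nan, 120.0]}).dropna()     # row 1 dropped
toy_categorical = pd.DataFrame({"is_private": [1, 0, np.nan]}).dropna()      # row 2 dropped

outer = pd.concat([toy_numerical, toy_categorical], axis=1)                  # union of indexes
inner = pd.concat([toy_numerical, toy_categorical], axis=1, join="inner")    # shared rows only

print(outer)
print(inner)
```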

After we have combined the datasets into one, we can finally apply a data transformation using MinMaxScaler from the sklearn library. This will bring our values into the range from 0 to 1, which will help when training the model later.

from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
new_df = min_max_scaler.fit_transform(new_df)
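With default settings MinMaxScaler maps each column as x' = (x - min) / (max - min). Note that `fit_transform` returns a NumPy array, so column names are lost unless the result is wrapped again. A sketch on made-up data (column names are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

toy = pd.DataFrame({"price": [50.0, 100.0, 150.0], "nights": [1.0, 2.0, 5.0]})

scaler = MinMaxScaler()
# Wrap the returned array to keep the column names and the index.
scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns, index=toy.index)

print(scaled)
```

Each column is scaled independently, so the smallest value in every column becomes 0 and the largest becomes 1.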

This data is now ready for anything: neural networks, standard ML algorithms, and so on!

In this article we did not cover working with date and time data, because such data calls for slightly different processing techniques, depending on your task. In the future our team will devote a separate article to this topic, and we hope it will bring something interesting, new and useful into your life, just like this one.

Source: www.habr.com
