Notebook cheat sheet for fast data preprocessing

People entering Data Science often have unrealistic expectations about what awaits them. Many think they will immediately be writing cool neural networks, building a voice assistant like the one in Iron Man, or beating everyone in the financial markets.
But the work of a Data Scientist is driven by data, and one of its most important and time-consuming parts is processing the data before feeding it to a neural network or analyzing it in some other way.

In this article, our team describes how to process data quickly and easily, with step-by-step instructions and code. We tried to make the code flexible enough to be reused across different datasets.

Many professionals may not find anything unusual in this article, but beginners will be able to learn something new, and anyone who has long dreamed of having a separate notebook for fast, structured data processing can copy the code and adapt it for themselves, or download the finished notebook from Github.

We've got a dataset. What do we do next?

So, the standard first step: we need to understand what we are dealing with and get the overall picture. To do this, we use pandas to look at the different data types.

import pandas as pd # import pandas
import numpy as np  # import numpy
df = pd.read_csv("AB_NYC_2019.csv") # read the dataset and store it in the variable df

df.head(3) # look at the first 3 rows to see what the values look like


df.info() # show information about the columns


Let's look at the column values:

  1. Does the number of non-null entries in each column match the total number of rows?
  2. What is the nature of the data in each column?
  3. Which column do we want to use as the target for our predictions?

The answers to these questions will let you analyze the dataset and roughly sketch a plan for your next steps.

Also, to take a deeper look at the values in each column, we can use the pandas describe() function. Its drawback is that, by default, it gives no information about columns with string values. We will deal with them later.

df.describe()
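As a small illustration of that limitation, the sketch below runs describe() twice on a toy frame: once with the defaults and once restricted to non-numeric columns. The frame and its column names are hypothetical and are not taken from the article's dataset.

```python
import numpy as np
import pandas as pd

# Toy frame; the column names are hypothetical.
toy = pd.DataFrame({
    "price": [100, 150, 150, 80],
    "room_type": ["Private room", "Entire home", "Entire home", "Shared room"],
})

# Default call: only the numeric column is summarized.
numeric_summary = toy.describe()

# Excluding numeric dtypes summarizes the string column instead
# (count, unique, top, freq).
text_summary = toy.describe(exclude=[np.number])

print(numeric_summary)
print(text_summary)
```

The second summary is only a quick look at the string columns; it does not replace the more careful treatment of qualitative data later in the notebook.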


Visualization magic

Let's see where values are missing entirely:

import seaborn as sns
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')


That was a quick look from above; now we move on to more interesting things.

Let's try to find and, if possible, remove columns that hold a single value in every row (they cannot affect the result in any way):

df = df[[c for c
        in list(df)
        if len(df[c].unique()) > 1]] # overwrite the dataset, keeping only the columns that have more than one unique value

Now we protect ourselves and the success of our project from duplicate rows (rows that contain the same information, in the same order, as an existing row):

df.drop_duplicates(inplace=True) # do this if you consider it necessary
                                 # in some projects such data should not be removed right at the start
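Because dropping duplicates is not always appropriate, it can help to count them before removing anything. The sketch below does that on a toy frame; the data and column names are made up for illustration.

```python
import pandas as pd

# Toy frame with one exact duplicate row; names are hypothetical.
toy = pd.DataFrame({
    "host": ["a", "a", "b"],
    "price": [10, 10, 20],
})

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = int(toy.duplicated().sum())

# Work on a copy so the original frame stays available for inspection.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(toy), len(deduplicated))
```

If the count is unexpectedly high, that is usually a reason to look at how the data was collected before deleting rows.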

We split the dataset in two: one part with qualitative values and one with quantitative values

A small clarification is needed here. If the rows with missing values in the qualitative data and in the quantitative data are not strongly related to each other, we have to decide what to sacrifice: all rows with missing data, only some of them, or certain columns. If the rows are related, we are free to split the dataset in two. Otherwise, first deal with the rows whose missing values do not line up between the qualitative and quantitative data, and only then split the dataset.

df_numerical = df.select_dtypes(include = [np.number]).copy() # numeric columns; copy() lets us modify the result safely
df_categorical = df.select_dtypes(exclude = [np.number]).copy() # all other columns

We do this to make it easier to process these two different types of data; later we will see how much simpler this makes our lives.
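One possible way to check how the missing rows in the two parts relate is to cross-tabulate two flags per row: "has a missing numeric value" and "has a missing non-numeric value". This is only a sketch of one reading of the clarification above, on made-up data with hypothetical column names.

```python
import numpy as np
import pandas as pd

# Toy frame; the column names are hypothetical.
toy = pd.DataFrame({
    "price": [100.0, np.nan, 150.0, np.nan],
    "room_type": ["a", None, "c", "d"],
})

# One boolean flag per row for each half of the data.
num_missing = toy.select_dtypes(include=[np.number]).isnull().any(axis=1)
cat_missing = toy.select_dtypes(exclude=[np.number]).isnull().any(axis=1)

# Rows of the table: numeric part missing? Columns: categorical part missing?
overlap = pd.crosstab(
    num_missing, cat_missing,
    rownames=["numeric_missing"], colnames=["categorical_missing"],
)
print(overlap)
```

Large off-diagonal counts mean the gaps sit in different rows of the two parts, which is the case where dropping rows in one part and not the other needs extra care.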

Working with quantitative data

The first thing to do is determine whether the quantitative data contains any "spy columns". We call them that because they present themselves as quantitative data but actually behave as qualitative data.

How do we spot them? Of course, it all depends on the nature of the data you are analyzing, but in general such columns tend to have few unique values (roughly 3 to 10).

print(df_numerical.nunique())
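The unique-value counts can also be filtered automatically. Below is a minimal sketch that lists numeric columns with at most a given number of distinct values; the helper name find_spy_columns and the default threshold of 10 are assumptions made for this illustration, and the result should still be reviewed by hand.

```python
import pandas as pd

def find_spy_columns(numeric_df, max_unique=10):
    """Return names of columns with at most `max_unique` distinct values."""
    counts = numeric_df.nunique()
    return counts[counts <= max_unique].index.tolist()

# Toy frame; the column names are hypothetical.
toy = pd.DataFrame({
    "price": range(100, 120),         # 20 distinct values
    "floor_code": [1, 2, 3, 4] * 5,   # 4 distinct values, likely categorical
})

spy_candidates = find_spy_columns(toy)
print(spy_candidates)
```

A low count is only a hint: a genuinely numeric column such as a small rating scale can also have few distinct values.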

Once we have identified the spy columns, we move them from the quantitative data to the qualitative data:

spy_columns = df_numerical[['column1', 'column2', 'column3']] # select the spy columns and store them in a separate dataframe
df_numerical.drop(labels=['column1', 'column2', 'column3'], axis=1, inplace=True) # cut these columns out of the quantitative data
df_categorical.insert(1, 'column1', spy_columns['column1']) # add the first spy column to the qualitative data
df_categorical.insert(1, 'column2', spy_columns['column2']) # add the second spy column to the qualitative data
df_categorical.insert(1, 'column3', spy_columns['column3']) # add the third spy column to the qualitative data

At last, we have fully separated the quantitative data from the qualitative data and can work with it properly. The first thing is to understand where we have empty values (NaN, and in some cases 0 will also be treated as an empty value).

for i in df_numerical.columns:
    print(i, (df_numerical[i] == 0).sum()) # number of zeros in each numeric column

At this point, it is important to understand in which columns zeros may stand for missing values: is it because of how the data was collected? Or are the zeros meaningful values? These questions have to be answered case by case.

So, if we do decide that zeros may hide missing data, we should replace them with NaN to make that missing data easier to work with later:

df_numerical[["column1", "column2"]] = df_numerical[["column1", "column2"]].replace(0, np.nan)

Now let's see where we are missing data:

sns.heatmap(df_numerical.isnull(),yticklabels=False,cbar=False,cmap='viridis') # you can also use df_numerical.info()


Here, the missing values inside the columns should be marked in yellow. And now the fun begins: how do we deal with these values? Should we delete the rows or the columns that contain them? Or fill the empty values with something else?

Here is a rough diagram that can help you decide what can be done with empty values:

[Figure: decision diagram for handling empty values]

0. Remove unnecessary columns

df_numerical.drop(labels=["column1","column2"], axis=1, inplace=True)

1. Is the share of empty values in this column greater than 50%?

print(df_numerical.isnull().sum() / df_numerical.shape[0] * 100)

df_numerical.drop(labels=["column1","column2"], axis=1, inplace=True) # drop a column if more than 50% of its values are empty
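Instead of reading the percentages and typing column names by hand, the same check can be applied programmatically. The sketch below keeps only the columns whose share of missing values does not exceed a threshold; the helper name drop_sparse_columns and the toy data are assumptions made for this example.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(frame, max_missing_share=0.5):
    """Return a copy of `frame` without columns that are mostly empty."""
    missing_share = frame.isnull().mean()   # fraction of NaN per column
    keep = missing_share[missing_share <= max_missing_share].index
    return frame[keep].copy()

# Toy frame; the column names are hypothetical.
toy = pd.DataFrame({
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],   # 75% missing
    "mostly_full": [1.0, 2.0, np.nan, 4.0],          # 25% missing
})

trimmed = drop_sparse_columns(toy)
print(trimmed.columns.tolist())
```

The 50% threshold follows the rule of thumb in the step above; a different cut-off may suit other datasets better.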

2. Remove rows with empty values

df_numerical.dropna(inplace=True) # drop rows with empty values if enough data remains for training afterwards

3.1. Insert a random value

missing = df_numerical["column"].isnull() # mask of the empty cells
observed = df_numerical.loc[~missing, "column"].to_numpy() # values that are actually present
df_numerical.loc[missing, "column"] = np.random.choice(observed, size=missing.sum()) # fill each empty cell with a randomly chosen observed value

3.2. Insert a constant value

from sklearn.impute import SimpleImputer # import SimpleImputer, which helps insert values
imputer = SimpleImputer(strategy='constant', fill_value="<your value here>") # insert a fixed value with SimpleImputer; for numeric columns it must be a number
df_numerical[["new_column1",'new_column2','new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']]) # apply it to our table
df_numerical.drop(labels = ["column1","column2","column3"], axis = 1, inplace = True) # remove the columns with the old values

3.3. Insert the mean or the most frequent value

from sklearn.impute import SimpleImputer # import SimpleImputer, which helps insert values
imputer = SimpleImputer(strategy='mean', missing_values = np.nan) # most_frequent can be used instead of mean
df_numerical[["new_column1",'new_column2','new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']]) # apply it to our table
df_numerical.drop(labels = ["column1","column2","column3"], axis = 1, inplace = True) # remove the columns with the old values

3.4. Insert a value computed by another model

Sometimes the missing values can be computed with regression models from the sklearn library or similar libraries. Our team will devote a separate article to how this can be done in the near future.
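Until that article appears, here is a minimal sketch of the idea, not the authors' method: fit a plain LinearRegression on the rows where the target column is present and predict it for the rows where it is missing. The helper name impute_with_regression and the toy columns are hypothetical, and the sketch assumes the predictor columns themselves contain no missing values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def impute_with_regression(frame, target, predictors):
    """Fill NaN in `target` with predictions from a linear model."""
    result = frame.copy()
    missing = result[target].isnull()
    if not missing.any():
        return result
    model = LinearRegression()
    # Train only on rows where the target is known.
    model.fit(result.loc[~missing, predictors], result.loc[~missing, target])
    # Predict the target for the rows where it is missing.
    result.loc[missing, target] = model.predict(result.loc[missing, predictors])
    return result

# Toy data where price is exactly 10 * area; names are hypothetical.
toy = pd.DataFrame({
    "area": [10.0, 20.0, 30.0, 40.0, 50.0],
    "price": [100.0, 200.0, np.nan, 400.0, 500.0],
})

filled = impute_with_regression(toy, target="price", predictors=["area"])
print(filled)
```

On real data the relationship will rarely be this clean, so it is worth validating the model on known values before trusting the filled ones.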

So, for now, we pause the story about quantitative data: there are many more nuances in how best to prepare and preprocess data for different tasks, and this article has covered the basics. Now it is time to return to the qualitative data, which we separated from the quantitative data a few steps back. You can change this notebook as you like and adapt it to different tasks, so that data preprocessing goes very quickly!

Qualitative data

For qualitative data, the One-hot-encoding method is generally used to convert it from a string (or object) to a number. Before moving on to that, let's use the diagram and the code above to deal with empty values.

df_categorical.nunique()

sns.heatmap(df_categorical.isnull(),yticklabels=False,cbar=False,cmap='viridis')


0. Remove unnecessary columns

df_categorical.drop(labels=["column1","column2"], axis=1, inplace=True)

1. Is the share of empty values in this column greater than 50%?

print(df_categorical.isnull().sum() / df_categorical.shape[0] * 100)

df_categorical.drop(labels=["column1","column2"], axis=1, inplace=True) # drop a column if more than
                                                                        # 50% of its values are empty

2. Remove rows with empty values

df_categorical.dropna(inplace=True) # drop rows with empty values
                                    # if enough data remains for training afterwards

3.1. Insert a random value

missing = df_categorical["column"].isnull() # mask of the empty cells
observed = df_categorical.loc[~missing, "column"].to_numpy() # values that are actually present
df_categorical.loc[missing, "column"] = np.random.choice(observed, size=missing.sum()) # fill each empty cell with a randomly chosen observed value

3.2. Insert a constant value

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='constant', fill_value="<your value here>")
df_categorical[["new_column1",'new_column2','new_column3']] = imputer.fit_transform(df_categorical[['column1', 'column2', 'column3']])
df_categorical.drop(labels = ["column1","column2","column3"], axis = 1, inplace = True)

So, we have finally got a handle on the empty values in the qualitative data. Now it is time to apply one-hot encoding to the values in your dataset. This method is very commonly used so that your algorithm can learn from qualitative data.

def encode_and_bind(original_dataframe, feature_to_encode):
    dummies = pd.get_dummies(original_dataframe[[feature_to_encode]]) # one indicator column per category
    res = pd.concat([original_dataframe, dummies], axis=1) # attach the indicator columns
    res = res.drop([feature_to_encode], axis=1) # drop the original column
    return res

features_to_encode = ["column1","column2","column3"]
for feature in features_to_encode:
    df_categorical = encode_and_bind(df_categorical, feature)
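To make the effect of one-hot encoding concrete, here is a small self-contained sketch on a toy frame. It uses the columns argument of pd.get_dummies rather than the helper above; the column names are hypothetical.

```python
import pandas as pd

# Toy frame; names are hypothetical. floor_code plays the role of a
# numeric "spy" column that should be treated as categorical.
toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home", "Private room"],
    "floor_code": [1, 2, 1],
})

# Listing the columns explicitly encodes them regardless of dtype.
encoded = pd.get_dummies(toy, columns=["room_type", "floor_code"])
print(encoded.columns.tolist())
```

Without the columns argument, get_dummies only converts string, object, and category columns, so a numeric spy column would pass through unchanged unless it is first cast to a string or category type.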

So, we have finally finished processing the qualitative and quantitative data separately; it is time to combine them again.

new_df = pd.concat([df_numerical,df_categorical], axis=1)
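One thing to watch when combining: pd.concat with axis=1 aligns rows by index, so if rows were dropped from only one of the two parts, the missing values reappear in the combined frame. The sketch below shows this on made-up data and one possible remedy, join="inner", which keeps only the rows present in both parts.

```python
import numpy as np
import pandas as pd

# Toy halves of a dataset; names are hypothetical.
numeric_part = pd.DataFrame({"price": [100.0, np.nan, 150.0]}).dropna()  # row 1 removed
categorical_part = pd.DataFrame({"room_type": ["a", "b", "c"]})          # all rows kept

# Default outer join on the index: row 1 comes back with NaN in price.
naive = pd.concat([numeric_part, categorical_part], axis=1)

# Inner join keeps only the rows that exist in both parts.
aligned = pd.concat([numeric_part, categorical_part], axis=1, join="inner")

print(naive)
print(aligned)
```

Whether an inner join is the right choice depends on how much data it discards; checking the row counts before and after is a cheap safeguard.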

Now that we have combined the datasets into one, we can finally scale the data with MinMaxScaler from the sklearn library. This brings our values into the range from 0 to 1, which will help when training a model later.

from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
new_df = min_max_scaler.fit_transform(new_df)
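Note that fit_transform returns a NumPy array, so the column names are lost. If you want to keep working with a DataFrame, one option is to wrap the result again, as in this sketch on toy data with hypothetical column names. Each value is rescaled per column as (x - min) / (max - min).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame; the column names are hypothetical.
toy = pd.DataFrame({
    "price": [50.0, 100.0, 150.0],
    "nights": [1.0, 2.0, 5.0],
})

scaler = MinMaxScaler()

# Wrap the returned array so column names and the index are preserved.
scaled = pd.DataFrame(
    scaler.fit_transform(toy), columns=toy.columns, index=toy.index
)
print(scaled)
```

Keeping the fitted scaler object around is useful later, because new data must be transformed with the same minimum and maximum values.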

This data is now ready for anything: neural networks, standard ML algorithms, and so on!

In this article, we did not cover working with time series data, because such data calls for somewhat different processing techniques, depending on your task. In the future, our team will devote a separate article to this topic, and we hope it will bring something interesting, new, and useful into your life, just like this one.

Source: www.habr.com
