A cheat-sheet notebook for fast data preprocessing

People entering the field of Data Science often have unrealistic expectations of what awaits them. Many think that they will now write cool neural networks, build a voice assistant like the one in Iron Man, or beat everyone on the financial markets.
But the work of a Data Scientist is driven by data, and one of the most important and time-consuming parts of it is processing the data before feeding it to a neural network or analyzing it in some other way.

In this article, our team describes how to process data quickly and easily, with step-by-step instructions and code. We tried to keep the code flexible so that it can be used for different datasets.

Many professionals may find nothing unusual in this article, but beginners will be able to learn something new, and anyone who has long dreamed of keeping a separate notebook for fast, structured data processing can copy the code and adapt it, or download the finished notebook from Github.

We have received a dataset. What do we do next?

So, the standard routine: we need to understand what we are dealing with and get the overall picture. To do this, we use pandas to simply identify the different data types.

import pandas as pd  # import pandas
import numpy as np  # import numpy
df = pd.read_csv("AB_NYC_2019.csv")  # read the dataset and store it in the variable df

df.head(3)  # look at the first 3 rows to see what the values look like

df.info()  # show information about the columns

Let's look at the column values:

  1. Does the number of entries in each column match the total number of rows?
  2. What is the essence of the data in each column?
  3. Which column do we want to target in order to make predictions for it?

The answers to these questions will let you analyze the dataset and roughly plan your next steps.
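The first question can be checked directly instead of being read off the `df.info()` output. A minimal sketch, assuming a small made-up frame (the column names and values are illustrative only and do not come from the article's dataset):

```python
import numpy as np
import pandas as pd

# Toy data standing in for a real dataset (hypothetical values)
toy = pd.DataFrame({
    "price": [100.0, 150.0, np.nan, 80.0],
    "name": ["a", None, "c", "d"],
    "id": [1, 2, 3, 4],
})

total_rows = len(toy)
non_null = toy.count()                        # non-null entries per column
incomplete = non_null[non_null < total_rows]  # columns that have gaps

print(toy.dtypes)   # data type of each column
print(incomplete)   # columns with fewer entries than rows
```

Any column listed in `incomplete` has missing values and will need one of the treatments discussed later.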

Also, for a deeper look at the values in each column, we can use the pandas describe() function. A drawback of this function is that, by default, it gives no information about columns with string values. We will deal with them later.

df.describe()
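String columns can still be summarised by asking `describe()` to skip the numeric ones. The sketch below assumes a tiny invented frame; `room_type` and `price` are hypothetical column names used only for illustration:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home", "Private room", None],
    "price": [100, 150, 80, 120],
})

numeric_summary = toy.describe()                     # numeric columns only (default)
string_summary = toy.describe(exclude=[np.number])   # everything that is not numeric

print(string_summary)
```

For non-numeric columns the summary reports the count of non-null entries, the number of unique values, the most frequent value (`top`) and how often it occurs (`freq`).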

Magic visualization

Let's see where we have no values at all:

import seaborn as sns
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')

That was a quick look from above; now we move on to more interesting things.

Let's try to find and, if possible, remove columns that have only one value in all rows (they cannot affect the result in any way):

# Overwrite the dataset, keeping only the columns that have more than one unique value
df = df[[c for c
        in list(df)
        if len(df[c].unique()) > 1]]
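The same filter can be written with `nunique()`. This is only a sketch on an invented frame; note that `unique()` counts NaN as a value while `nunique()` ignores it unless `dropna=False` is passed, so the flag is needed to reproduce the behaviour of the list comprehension:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "country": ["US", "US", "US"],   # constant: carries no information
    "price": [100, 150, 80],
    "flag": [1.0, np.nan, 1.0],      # one value plus a gap
})

# dropna=False counts NaN as its own value, matching len(df[c].unique())
keep = toy.columns[toy.nunique(dropna=False) > 1]
reduced = toy[keep]

print(list(reduced.columns))
```

Whether a column like `flag` (a single value plus gaps) should survive is a judgment call; dropping `dropna=False` would remove it as well.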

Now we protect ourselves and the success of our project from duplicate rows (rows that contain the same information, in the same order, as one of the existing rows):

df.drop_duplicates(inplace=True)  # Do this only if we consider it necessary.
                                  # In some projects such data should not be removed right at the start.
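Before dropping anything it can help to count how many rows would go. A small sketch on made-up data (the `host` and `price` columns are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "host": ["ann", "bob", "ann", "ann"],
    "price": [100, 150, 100, 120],
})

# duplicated() marks rows that are identical to an earlier row
n_duplicates = int(toy.duplicated().sum())
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```

Here only the third row repeats the first one exactly; the fourth row shares the host but not the price, so it is kept.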

We split the dataset into two parts: one with qualitative values and one with quantitative values.

A small clarification is needed here. If the rows with missing data in the qualitative part and in the quantitative part are not strongly correlated with each other, we will have to decide what to sacrifice: all rows with missing data, only some of them, or certain columns. If the rows are correlated, we have every right to split the dataset in two. Otherwise, you first need to deal with the rows whose missing data does not coincide between the qualitative and quantitative parts, and only then split the dataset in two.
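One rough way to check this is to compare, row by row, whether a gap appears in the numeric part, the non-numeric part, or both. The sketch below interprets "correlated" as "the gaps fall in the same rows", which is an assumption, and uses invented data:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "price": [100.0, np.nan, 80.0, np.nan],
    "reviews": [5.0, np.nan, 3.0, 2.0],
    "room_type": ["Private", None, "Shared", "Private"],
})

# True for rows that have at least one gap in the respective part
num_gap = toy.select_dtypes(include=[np.number]).isnull().any(axis=1)
cat_gap = toy.select_dtypes(exclude=[np.number]).isnull().any(axis=1)

# Off-diagonal cells are rows with gaps in only one of the two parts
overlap = pd.crosstab(num_gap, cat_gap,
                      rownames=["numeric gap"], colnames=["categorical gap"])
only_one_side = int((num_gap != cat_gap).sum())

print(overlap)
print(only_one_side)
```

If `only_one_side` is small relative to the dataset, dropping incomplete rows in either part removes roughly the same rows, and splitting first is safe.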

df_numerical = df.select_dtypes(include = [np.number])
df_categorical = df.select_dtypes(exclude = [np.number])

We do this to make it easier to process these two different kinds of data; later we will see how much simpler this makes our life.

Working with quantitative data

The first thing to do is to find out whether there are "spy columns" in the quantitative data. We call them that because they pass themselves off as quantitative data but actually behave like qualitative data.

How do we spot them? Of course, it all depends on the nature of the data you are analyzing, but in general such columns tend to have few unique values (somewhere around 3-10 unique values).

print(df_numerical.nunique())
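The output of `nunique()` can be filtered to produce a shortlist of candidates. This is a sketch with invented data; `MAX_UNIQUE` is a hypothetical cut-off, and the shortlist still has to be reviewed by hand, since a numeric column with few values is not necessarily categorical:

```python
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100, 150, 80, 120, 95, 300],
    "floor": [1, 2, 1, 3, 2, 1],            # few distinct values: likely a category
    "rating": [4.5, 3.9, 4.8, 4.1, 4.7, 3.2],
})

MAX_UNIQUE = 3  # hypothetical threshold; the text suggests roughly 3-10
counts = toy_numerical.nunique()
spy_candidates = counts[counts <= MAX_UNIQUE].index.tolist()

print(spy_candidates)
```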

Once we have identified the spy columns, we move them from the quantitative data to the qualitative data:

# 'column1', 'column2', 'column3' are placeholders for the names of your spy columns
spy_columns = df_numerical[['column1', 'column2', 'column3']]  # select the spy columns into a separate dataframe
df_numerical.drop(labels=['column1', 'column2', 'column3'], axis=1, inplace=True)  # cut these columns out of the quantitative data
df_categorical.insert(1, 'column1', spy_columns['column1'])  # add the first spy column to the qualitative data
df_categorical.insert(1, 'column2', spy_columns['column2'])  # add the second spy column to the qualitative data
df_categorical.insert(1, 'column3', spy_columns['column3'])  # add the third spy column to the qualitative data

At last we have fully separated the quantitative data from the qualitative data and can work with it properly. The first step is to understand where we have empty values (NaN, and in some cases 0 will also be treated as an empty value).

for i in df_numerical.columns:
    print(i, (df_numerical[i] == 0).sum())  # number of zeros in each column

At this stage it is important to understand which columns may use zeros to stand for missing values: is it a consequence of how the data was collected? Or is it related to the values of the data themselves? These questions have to be answered case by case.

So, if we do decide that data may be missing where there are zeros, we should replace the zeros with NaN so that this missing data is easier to work with later:

df_numerical[["column 1", "column 2"]] = df_numerical[["column 1", "column 2"]].replace(0, np.nan)  # placeholder column names
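A worked example may make the case-by-case decision more concrete. In this sketch (invented data, hypothetical column names) a price of 0 is assumed to mean "not recorded", while 0 reviews is a legitimate value and is left alone:

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100, 0, 80],     # 0 is treated here as "not recorded"
    "reviews": [0, 4, 7],      # 0 reviews is a real value
})

zero_counts = (toy_numerical == 0).sum()   # zeros per column
toy_numerical["price"] = toy_numerical["price"].replace(0, np.nan)

print(zero_counts)
print(toy_numerical)
```

Only the column where zero cannot be a real measurement is converted, so later imputation steps do not overwrite valid zeros.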

Now let's see where we are missing data:

sns.heatmap(df_numerical.isnull(), yticklabels=False, cbar=False, cmap='viridis')  # you can also use df_numerical.info()

Here the missing values inside the columns should be marked in yellow. And now the fun begins: how do we deal with these values? Should we delete the rows or the columns that contain them? Or fill these empty values with something else?

Here is a rough scheme that can help you decide what, in principle, can be done with empty values:

0. Remove unneeded columns

df_numerical.drop(labels=["column1", "column2"], axis=1, inplace=True)  # placeholder column names

1. Is the share of empty values in this column greater than 50%?

print(df_numerical.isnull().sum() / df_numerical.shape[0] * 100)

df_numerical.drop(labels=["column1", "column2"], axis=1, inplace=True)  # remove a column if more than 50% of its values are empty
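Instead of reading the percentages and typing the names by hand, the columns above the threshold can be selected programmatically. A minimal sketch on invented data; the 50% limit follows the step above and the column names are hypothetical:

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100.0, np.nan, 80.0, 120.0],
    "last_review_gap": [np.nan, np.nan, np.nan, 3.0],
})

# isnull().mean() is the fraction of missing values per column
missing_share = toy_numerical.isnull().mean() * 100
to_drop = missing_share[missing_share > 50].index.tolist()
toy_numerical = toy_numerical.drop(columns=to_drop)

print(to_drop)
```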

2. Delete rows with empty values

df_numerical.dropna(inplace=True)  # delete rows with empty values, if enough data remains for training afterwards

3.1. Inserting a random value

import random  # import random
# Fill empty cells with random values drawn from the column's observed values ("column" is a placeholder name)
observed = df_numerical["column"].dropna().tolist()
df_numerical["column"] = df_numerical["column"].apply(lambda x: random.choice(observed) if pd.isna(x) else x)
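The same idea can be sketched in a vectorised form with NumPy's random generator. This is an illustration on an invented series, not the article's own code; the fixed seed is there only to make the run reproducible:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed so the result is reproducible

s = pd.Series([10.0, np.nan, 30.0, np.nan, 50.0], name="price")
observed = s.dropna().to_numpy()
mask = s.isna()

filled = s.copy()
# Draw one observed value (with replacement) for every missing cell
filled[mask] = rng.choice(observed, size=int(mask.sum()))

print(filled.tolist())
```

Sampling from the observed values keeps the filled column within the range of the real data, though it cannot capture any relationship with other columns.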

3.2. Inserting a constant value

from sklearn.impute import SimpleImputer  # import SimpleImputer, which will help insert the values
imputer = SimpleImputer(strategy='constant', fill_value="<Your value here>")  # insert a chosen value using SimpleImputer; use a number for numeric columns
df_numerical[["new_column1", 'new_column2', 'new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']])  # apply this to our table
df_numerical.drop(labels=["column1", "column2", "column3"], axis=1, inplace=True)  # remove the columns with the old values

3.3. Inserting the mean or the most frequent value

from sklearn.impute import SimpleImputer  # import SimpleImputer, which will help insert the values
imputer = SimpleImputer(strategy='mean', missing_values=np.nan)  # 'most_frequent' can be used instead of 'mean'
df_numerical[["new_column1", 'new_column2', 'new_column3']] = imputer.fit_transform(df_numerical[['column1', 'column2', 'column3']])  # apply this to our table
df_numerical.drop(labels=["column1", "column2", "column3"], axis=1, inplace=True)  # remove the columns with the old values
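For a quick experiment the same fill can be done with pandas alone, without creating new columns and dropping the old ones. This is an alternative sketch on invented data, not a replacement for the SimpleImputer approach above:

```python
import numpy as np
import pandas as pd

toy_numerical = pd.DataFrame({
    "price": [100.0, np.nan, 80.0, 120.0],
    "floor": [1.0, 1.0, np.nan, 3.0],
})

# fillna with a Series fills each column with its own value
mean_filled = toy_numerical.fillna(toy_numerical.mean())
# mode() returns a DataFrame (ties give several rows); take the first row
mode_filled = toy_numerical.fillna(toy_numerical.mode().iloc[0])

print(mean_filled)
print(mode_filled)
```

An imputer object has the advantage that the values learned on the training data can be reused on new data through `transform`, which a one-off `fillna` does not give you.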

3.4. Inserting a value calculated by another model

Sometimes the values can be calculated with regression models from the sklearn library or other similar libraries. Our team will devote a separate article to how this can be done in the near future.

So, for now, the story about quantitative data is interrupted here, because there are many more nuances about how best to prepare and preprocess data for different tasks, and the basics for quantitative data have been covered in this article. Now it is time to return to the qualitative data, which we separated from the quantitative data several steps ago. You can change this notebook however you like and adapt it to different tasks, so that data preprocessing goes very quickly!
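Until then, a deliberately simple sketch of the idea: fit a regression on the rows where the target column is known and predict it where it is missing. The data is invented, `area` and `price` are hypothetical columns, and a plain `LinearRegression` is assumed here only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

toy = pd.DataFrame({
    "area": [20.0, 30.0, 40.0, 50.0, 60.0],
    "price": [40.0, 60.0, np.nan, 100.0, np.nan],   # in this toy data price = 2 * area
})

known = toy["price"].notna()
model = LinearRegression()
model.fit(toy.loc[known, ["area"]], toy.loc[known, "price"])

# Predict only for the rows where the target is missing
toy.loc[~known, "price"] = model.predict(toy.loc[~known, ["area"]])

print(toy)
```

The predictor columns themselves must be free of gaps for this to work, so in practice this step usually comes after the simpler fills.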

Qualitative data

For qualitative data, the One-hot-encoding method is basically what is used to convert it from a string (or object) to a number. Before moving on to that, let's use the scheme and the code above to deal with empty values.

df_categorical.nunique()

sns.heatmap(df_categorical.isnull(),yticklabels=False,cbar=False,cmap='viridis')

0. Remove unneeded columns

df_categorical.drop(labels=["column1", "column2"], axis=1, inplace=True)  # placeholder column names

1. Is the share of empty values in this column greater than 50%?

print(df_categorical.isnull().sum() / df_categorical.shape[0] * 100)

df_categorical.drop(labels=["column1", "column2"], axis=1, inplace=True)  # remove a column if more than
                                                                          # 50% of its values are empty

2. Delete rows with empty values

df_categorical.dropna(inplace=True)  # delete rows with empty values,
                                     # if enough data remains for training afterwards

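3.1. Inserting a random value

import random
# Fill empty cells with random values drawn from the column's observed values ("column" is a placeholder name)
observed = df_categorical["column"].dropna().tolist()
df_categorical["column"] = df_categorical["column"].apply(lambda x: random.choice(observed) if pd.isna(x) else x)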

3.2. Inserting a constant value

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='constant', fill_value="<Your value here>")
df_categorical[["new_column1", 'new_column2', 'new_column3']] = imputer.fit_transform(df_categorical[['column1', 'column2', 'column3']])
df_categorical.drop(labels=["column1", "column2", "column3"], axis=1, inplace=True)
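For qualitative columns, a common choice of constant is an explicit placeholder category. A pandas-only sketch on invented data; the label "Unknown" and the column names are arbitrary assumptions:

```python
import pandas as pd

toy_categorical = pd.DataFrame({
    "room_type": ["Private", None, "Shared", "Private"],
    "district": ["North", "South", None, None],
})

# Option 1: an explicit placeholder category ("Unknown" is an arbitrary label)
constant_filled = toy_categorical.fillna("Unknown")

# Option 2: the most frequent value of each column
mode_filled = toy_categorical.fillna(toy_categorical.mode().iloc[0])

print(constant_filled)
print(mode_filled)
```

A placeholder category keeps the fact that a value was missing visible to the model after encoding, whereas the most frequent value hides it.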

So, we have finally dealt with the empty values in the qualitative data. Now it is time to apply one-hot encoding to the values in your dataset. This method is used very often so that your algorithm can learn from qualitative data.

def encode_and_bind(original_dataframe, feature_to_encode):
    # One-hot encode a single column and replace it with the resulting dummy columns
    dummies = pd.get_dummies(original_dataframe[[feature_to_encode]])
    res = pd.concat([original_dataframe, dummies], axis=1)
    res = res.drop([feature_to_encode], axis=1)
    return res

features_to_encode = ["column1", "column2", "column3"]  # placeholder column names
for feature in features_to_encode:
    df_categorical = encode_and_bind(df_categorical, feature)
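To see what the encoding produces, here is a small sketch on invented data. It uses the `columns=` parameter of `pd.get_dummies`, which encodes several columns and drops the originals in a single call; this is an alternative to the helper above, not the article's own code:

```python
import pandas as pd

toy_categorical = pd.DataFrame({
    "room_type": ["Private", "Shared", "Private"],
    "district": ["North", "South", "North"],
})

# columns= encodes the listed columns and removes the originals in one call
encoded = pd.get_dummies(toy_categorical, columns=["room_type", "district"])

print(list(encoded.columns))
```

Each distinct value becomes its own indicator column named `<column>_<value>`, so columns with many distinct values can make the table very wide.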

So, we have finally finished processing the separate qualitative and quantitative data; time to combine them again:

new_df = pd.concat([df_numerical,df_categorical], axis=1)
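One thing worth checking at this point: `pd.concat` with `axis=1` aligns rows by index. If rows were dropped separately from the two parts (for example with `dropna`), the indexes no longer match and new gaps appear. A sketch on invented data:

```python
import pandas as pd

# Two parts that lost different rows during cleaning (hypothetical data)
num = pd.DataFrame({"price": [100.0, 80.0, 120.0]}, index=[0, 2, 3])
cat = pd.DataFrame({"is_private": [1, 0, 1]}, index=[0, 1, 3])

outer = pd.concat([num, cat], axis=1)                 # union of indexes, gaps become NaN
inner = pd.concat([num, cat], axis=1, join="inner")   # only rows present in both parts

print(len(outer), len(inner))
```

Passing `join="inner"` keeps only the rows that survived in both parts, which avoids reintroducing NaN right before scaling.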

Once the datasets are combined into one, we can finally apply a data transformation using MinMaxScaler from the sklearn library. This brings our values into the range between 0 and 1, which will help when training a model later.

from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
new_df = min_max_scaler.fit_transform(new_df)
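Note that `fit_transform` returns a NumPy array, so the column names are lost. If they are still needed, the result can be wrapped back into a DataFrame. A minimal sketch on invented data:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

toy = pd.DataFrame({"price": [50.0, 100.0, 150.0], "floor": [1.0, 2.0, 5.0]})

scaler = MinMaxScaler()
# Rebuild a DataFrame so that column names and the index survive the scaling
scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns, index=toy.index)

print(scaled)
```

Each column is rescaled independently as (x - min) / (max - min), so the smallest value in a column maps to 0 and the largest to 1.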

This data is now ready for anything: neural networks, standard ML algorithms, and so on!

In this article we did not cover working with time series data, because such data calls for somewhat different processing techniques, depending on your task. In the future our team will devote a separate article to this topic, and we hope it will bring something interesting, new and useful into your life, just like this one.

Source: www.habr.com
