In this article we walk through the calculations that transform a linear regression function into the inverse logit transformation function (also called the logistic response function). Then, using the arsenal of the maximum likelihood method and following the logistic regression model, we derive the Logistic Loss function; in other words, we define the function by which the parameters of the weight vector $w$ are selected in the logistic regression model.
Article outline:
- Let us recall the linear relationship between two variables
- Let us see why the linear regression function needs to be transformed into the logistic response function
- Let us carry out the transformation and derive the logistic response function
- Let us try to understand why the least squares method is a poor choice for selecting the parameters $w$ of the Logistic Loss function
- We use the maximum likelihood method to determine the function for selecting the parameters $w$:
5.1. Case 1: the Logistic Loss function for objects with class labels 0 and 1
5.2. Case 2: the Logistic Loss function for objects with class labels -1 and +1
The article is full of simple examples in which all calculations are easy to do in your head or on paper; in some cases a calculator may be needed. So get ready :)
This article is mainly intended for data scientists with an entry-level knowledge of the basics of machine learning.
The article also provides code for drawing the graphs and doing the calculations. All code is written in Python 2.7. Let me explain in advance the "novelty" of the version used: it is one of the requirements for taking a well-known course from Yandex on the equally well-known online education platform Coursera, and, as one might guess, the material was prepared on the basis of that course.
01. Straight-line dependence
It is quite reasonable to ask: what do straight-line dependence and logistic regression have to do with each other?
It's simple! Logistic regression is one of the models that belong to the linear classifiers. In simple terms, the task of a linear classifier is to predict the target values $y$ from the variables (regressors) $X$. The dependence between the features $X$ and the target values $y$ is assumed to be linear. Hence the name of the classifier: linear. Put very roughly, the logistic regression model is based on the assumption that there is a linear relationship between the features $X$ and the target values $y$. That is the connection.
Here comes the first example, and it is, fittingly, about a straight-line dependence between the quantities under study. While preparing the article I came across an example that has already set many people's teeth on edge: the dependence of current on voltage ("Applied regression analysis", N. Draper, G. Smith). We will look at it here too.
According to Ohm's law:
$$I = \frac{U}{R},$$
where $I$ is the current, $U$ is the voltage, and $R$ is the resistance.
If we did not know Ohm's law, we could find the dependence empirically by varying $U$ and measuring $I$ while keeping $R$ fixed. We would then see that the graph of $I$ against $U$ gives a more or less straight line through the origin. We say "more or less" because, although the relationship is actually exact, our measurements may contain small errors, so the points on the graph may not fall exactly on the line but will be scattered around it at random.
Graph 1 "Dependence of $I$ on $U$"
Code for drawing the graph
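import matplotlib.pyplot as plt
%matplotlib inline
# the line above is a Jupyter magic; remove it when running outside a notebook
import numpy as np
import random

R = 13.75  # resistance, held fixed

# voltage values and the exact current given by Ohm's law
x_line = np.arange(0,220,1)
y_line = []
for i in x_line:
    y_line.append(i/R)

# "measured" current: the exact value plus a small random error
y_dot = []
for i in y_line:
    y_dot.append(i+random.uniform(-0.9,0.9))

fig, axes = plt.subplots(figsize = (14,6), dpi = 80)
plt.plot(x_line,y_line,color = 'purple',lw = 3, label = 'I = U/R')
plt.scatter(x_line,y_dot,color = 'red', label = 'Actual results')
plt.xlabel('U', size = 16)  # voltage is on the horizontal axis
plt.ylabel('I', size = 16)  # current is on the vertical axis
plt.legend(prop = {'size': 14})
plt.show()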
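The idea of finding the dependence empirically can be sketched in a few lines. The snippet below is only an illustration under our own assumptions: it simulates noisy measurements in the same way as the plotting code, and the helper name `estimate_resistance` and the fixed random seed are ours, not part of the original article. It recovers R as the inverse of the least-squares slope of a line through the origin.

```python
import random

def estimate_resistance(voltages, currents):
    """Fit I = k * U through the origin by least squares and return R = 1 / k."""
    numerator = sum(u * i for u, i in zip(voltages, currents))
    denominator = sum(u * u for u in voltages)
    slope = numerator / denominator  # least-squares estimate of k = 1 / R
    return 1.0 / slope

random.seed(0)  # fixed seed (our choice) so that the sketch is reproducible
R_true = 13.75
voltages = [float(u) for u in range(1, 220)]
# exact current from Ohm's law plus a small measurement error
currents = [u / R_true + random.uniform(-0.9, 0.9) for u in voltages]

R_hat = estimate_resistance(voltages, currents)
print('estimated R = {:.2f}'.format(R_hat))
```

With a couple of hundred measurements the estimate should land close to the true resistance, even though individual points are scattered around the line.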
02. Why the linear regression equation needs to be transformed
Let us look at another example. Suppose we work at a bank and our task is to determine the probability that a borrower repays a loan depending on certain factors. To simplify the task, we will consider only two factors: the borrower's monthly salary and the monthly loan payment.
The task is very contrived, but with this example we can understand why using the linear regression function alone is not enough, and also find out what transformations need to be applied to the function.
Back to the example. It is clear that the higher the salary, the more the borrower can allocate each month to repaying the loan. At the same time, over a certain salary range this relationship will be quite linear. For example, let us take a salary range from 60,000 RUR to 200,000 RUR and assume that within this range the dependence of the monthly payment on the salary is linear. Suppose that for this salary range it has been established that the salary-to-payment ratio cannot fall below 3 and that the borrower must still have 5,000 RUR in reserve. Only in this case will we assume that the borrower repays the loan to the bank. The linear regression equation then takes the form:
$$f(w,x_i) = w_0 + w_1 x_{i1} + w_2 x_{i2},$$
where $w_0 = -5000$, $w_1 = 1$, $w_2 = -3$, $x_{i1}$ is the salary of the $i$-th borrower, and $x_{i2}$ is the loan payment of the $i$-th borrower.
Substituting the salary and the loan payment into the equation with the fixed parameters $w$, we can decide whether to grant or refuse the loan.
Looking ahead, we note that with the given parameters $w$ the linear regression function, when used inside the logistic response function, will produce large values that make it hard to compute the probabilities of loan repayment. It is therefore proposed to reduce our coefficients by, say, a factor of 25,000. This change in the coefficients does not alter the decision to grant the loan. Let us keep this point in mind for later; for now, to make it clearer what we are talking about, let us consider a situation with three potential borrowers.
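A quick way to convince ourselves that dividing all coefficients by the same positive number leaves the decision unchanged is to compare the sign of the function before and after scaling. The sketch below uses the three borrowers from Table 1; the helper `linear_score` is a hypothetical name chosen for illustration.

```python
def linear_score(weights, salary, payment):
    """Value of f(w, x) = w0 + w1 * salary + w2 * payment."""
    w0, w1, w2 = weights
    return w0 + w1 * salary + w2 * payment

raw_w = (-5000.0, 1.0, -3.0)               # coefficients in their original form
scale = 25000.0                            # the reduction factor used in the article
scaled_w = tuple(w / scale for w in raw_w)

# (salary, payment) of the three potential borrowers
borrowers = {'Vasya': (120000, 3000), 'Fedya': (180000, 50000), 'Lesha': (210000, 70000)}

for name in sorted(borrowers):
    salary, payment = borrowers[name]
    raw = linear_score(raw_w, salary, payment)
    scaled = linear_score(scaled_w, salary, payment)
    # the loan is approved when the score is positive; scaling keeps the sign
    print('{}: raw={}, scaled={}, same decision={}'.format(
        name, raw, scaled, (raw > 0) == (scaled > 0)))
```

Only the magnitude of the score changes, which matters later when the score is passed through the logistic response function.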
Table 1 "Potential borrowers"
Code for generating the table
import numpy as np
import pandas as pd

r = 25000.0  # factor by which the coefficients are reduced
w_0 = -5000.0/r
w_1 = 1.0/r
w_2 = -3.0/r

data = {'The borrower':np.array(['Vasya', 'Fedya', 'Lesha']),
        'Salary':np.array([120000,180000,210000]),
        'Payment':np.array([3000,50000,70000])}
df = pd.DataFrame(data)
df['f(w,x)'] = w_0 + df['Salary']*w_1 + df['Payment']*w_2

# the loan is approved when the value of the linear function is positive
decision = []
for i in df['f(w,x)']:
    if i > 0:
        dec = 'Approved'
        decision.append(dec)
    else:
        dec = 'Refusal'
        decision.append(dec)
df['Decision'] = decision
df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision']]
According to the data in the table, Vasya, with a salary of 120,000 RUR, wants to take a loan with a monthly repayment of 3,000 RUR. We determined that, for a loan to be approved, Vasya's salary must exceed three times the payment amount, with 5,000 RUR still left over. Vasya meets this requirement: $120000 - (3 \cdot 3000 + 5000) = 106000 > 0$. As much as 106,000 RUR remains. Even though we reduced the coefficients by a factor of 25,000 when computing $f(w,x)$, the result is the same: the loan can be approved. Fedya will also get a loan, but Lesha, despite earning the most, will have to curb his appetite.
Let us draw a graph for this case.
Graph 2 "Classification of borrowers"
Code for drawing the graph
salary = np.arange(60000,240000,20000)
payment = (-w_0-w_1*salary)/w_2
fig, axes = plt.subplots(figsize = (14,6), dpi = 80)
plt.plot(salary, payment, color = 'grey', lw = 2, label = '$f(w,x_i)=w_0 + w_1x_{i1} + w_2x_{i2}$')
plt.plot(df[df['Decision'] == 'Approved']['Salary'], df[df['Decision'] == 'Approved']['Payment'],
'o', color ='green', markersize = 12, label = 'Decision - Loan approved')
plt.plot(df[df['Decision'] == 'Refusal']['Salary'], df[df['Decision'] == 'Refusal']['Payment'],
's', color = 'red', markersize = 12, label = 'Decision - Loan refusal')
plt.xlabel('Salary', size = 16)
plt.ylabel('Payment', size = 16)
plt.legend(prop = {'size': 14})
plt.show()
So our straight line, built according to the function $f(w,x_i) = w_0 + w_1 x_{i1} + w_2 x_{i2}$, separates the "bad" borrowers from the "good" ones. Borrowers whose wishes do not match their means lie above the line (Lesha), while those who, according to the parameters of our model, are able to repay the loan lie below the line (Vasya and Fedya). In other words, our line divides the borrowers into two classes. Let us denote them as follows: to class $+1$ we assign the borrowers who will most likely repay the loan, and to class $-1$ or $0$ those who most likely will not be able to repay it.
Let us summarize the conclusions from this simple example. Take a point $(x_1, x_2)$ and, substituting its coordinates into the corresponding equation of the line $f(w,x) = w_0 + w_1 x_1 + w_2 x_2$, consider three options:
- If the point lies below the line and we assign it to class $+1$, then the value of the function $f(w,x)$ will be positive, from $0$ to $+\infty$. This means we can assume that the probability of loan repayment lies within $(0.5, 1]$. The larger the value of the function, the higher the probability.
- If the point lies above the line and we assign it to class $-1$ or $0$, then the value of the function will be negative, from $0$ to $-\infty$. We will then assume that the probability of loan repayment lies within $[0, 0.5)$, and the larger the absolute value of the function, the higher our confidence.
- The point lies on the line, on the boundary between the two classes. In this case the value of the function $f(w,x)$ equals $0$ and the probability of loan repayment equals $0.5$.
Now let us imagine that we have not two factors but dozens, and not three but thousands of borrowers. Then instead of a straight line we will have an m-dimensional plane, and the coefficients $w$ will not be pulled out of thin air but derived according to all the rules, on the basis of accumulated data on borrowers who did or did not repay their loans. Note also that so far we have been selecting borrowers using already known coefficients $w$. In fact, the task of the logistic regression model is precisely to determine the parameters $w$ at which the value of the Logistic Loss function is minimal. How the vector $w$ is computed is covered in section 5 of the article. In the meantime, we return to the promised land: to our banker and his three clients.
Thanks to the function $f(w,x)$ we know who can be given a loan and who should be refused. But you cannot go to the director with such information, because what he wanted from us was the probability of loan repayment for each borrower. What to do? The answer is simple: we need to somehow transform the function $f(w,x)$, whose values lie in the range $(-\infty, +\infty)$, into a function whose values lie in the range $[0, 1]$. Such a function exists; it is called the logistic response function or inverse-logit transformation. Meet it:
$$p_+ = \frac{1}{1+e^{-f(w,x)}}$$
Let us see step by step how the logistic response function is obtained. Note that we will move in the opposite direction: we will assume that we know the probability value, which lies in the range from $0$ to $1$, and then "unroll" this value onto the whole range of numbers from $-\infty$ to $+\infty$.
03. Deriving the logistic response function
Step 1. Convert the probability values to the range $[0, +\infty)$
While the function $f(w,x)$ is being transformed into the logistic response function, we will leave our credit analyst alone and take a trip to the bookmakers instead. No, of course we are not going to place bets; all that interests us there is the meaning of an expression such as "the odds are 4 to 1". Odds, familiar to every bettor, are the ratio of "successes" to "failures". In probability terms, the odds are the probability that an event occurs divided by the probability that the event does not occur. Let us write down the formula for the odds that an event occurs, $odds_+$:
$$odds_+ = \frac{p_+}{1-p_+},$$
where $p_+$ is the probability that the event occurs and $(1-p_+)$ is the probability that the event does NOT occur.
For example, if the probability that a young, strong and playful horse nicknamed "Veterok" beats an old and flabby mare named "Matilda" in a race equals $0.8$, then the odds of success for "Veterok" are $4$ to $1$, since $\frac{0.8}{1-0.8} = 4$, and conversely, knowing the odds, it is not hard for us to compute the probability $p_+$: $\frac{4}{4+1} = 0.8$
So we have learned to "convert" probability into odds, which take values from $0$ to $+\infty$. Let us take one more step and learn to "convert" the odds onto the whole number line from $-\infty$ to $+\infty$.
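The conversion between probability and odds described in this step can be sketched as two small functions. The function names are ours and are used only for illustration.

```python
def probability_to_odds(p):
    """odds_+ = p_+ / (1 - p_+); defined for probabilities in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError('probability must lie in [0, 1)')
    return p / (1.0 - p)

def odds_to_probability(odds):
    """Inverse conversion: p_+ = odds_+ / (1 + odds_+)."""
    if odds < 0.0:
        raise ValueError('odds must be non-negative')
    return odds / (1.0 + odds)

veterok_odds = probability_to_odds(0.8)      # the "4 to 1" from the horse race example
veterok_probability = odds_to_probability(4.0)
print('odds = {:.2f}, probability = {:.2f}'.format(veterok_odds, veterok_probability))
```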
Step 2. Convert the probability values to the range $(-\infty, +\infty)$
This step is very simple: let us take the logarithm of the odds to the base of Euler's number $e$, and we get:
$$f(w,x) = \ln(odds_+)$$
Now we know that if $p_+ = 0.8$, then computing the value of $f(w,x)$ is very easy and, moreover, it must be positive: $f(w,x) = \ln(odds_+) = \ln(4) \approx 1.386$. And so it is.
Out of curiosity, let us check what happens if $p_+ = 0.2$; then we expect to see a negative value of $f(w,x)$. We check: $f(w,x) = \ln\left(\frac{0.2}{1-0.2}\right) = \ln(0.25) \approx -1.386$. That is right.
Now we know how to convert a probability value from $0$ to $1$ onto the whole number line from $-\infty$ to $+\infty$. In the next step we will do the opposite.
For now, we note that, by the rules of logarithms, knowing the value of the function $f(w,x)$, we can compute the odds:
$$odds_+ = e^{f(w,x)}$$
This way of determining the odds will be useful to us in the next step.
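As a small numerical check of this step, the sketch below computes the log-odds for the two probabilities discussed above and then recovers the odds with the exponential. The function names are illustrative, not taken from the original code.

```python
import math

def log_odds(p):
    """f = ln(p_+ / (1 - p_+)); defined for probabilities strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def odds_from_log_odds(f):
    """odds_+ = e^f, the inverse of taking the natural logarithm."""
    return math.exp(f)

f_high = log_odds(0.8)   # positive, because the event is more likely than not
f_low = log_odds(0.2)    # negative, and symmetric to f_high
print('f(0.8) = {:.5f}, f(0.2) = {:.5f}'.format(f_high, f_low))
print('odds recovered from f(0.8): {:.2f}'.format(odds_from_log_odds(f_high)))
```

The symmetry of the two values around zero reflects the fact that probabilities 0.8 and 0.2 are equally far from the boundary value 0.5.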
Step 3. Let us derive the formula for determining $p_+$
So we have learned, knowing $p_+$, to find the value of the function $f(w,x)$. However, we actually need exactly the opposite: knowing the value of $f(w,x)$, to find $p_+$. To do this, let us turn to the notion of the inverse odds function, according to which:
$$p_+ = \frac{odds_+}{1+odds_+}$$
In this article we will not derive the formula above, but we will check it using the numbers from the example above. We know that with odds of 4 to 1 ($odds_+ = 4$), the probability that the event occurs is 0.8 ($p_+ = 0.8$). Let us substitute: $p_+ = \frac{4}{1+4} = 0.8$. This agrees with our earlier calculations. Let us move on.
In the previous step we found that $odds_+ = e^{f(w,x)}$, which means we can substitute it into the inverse odds function. We get:
$$p_+ = \frac{e^{f(w,x)}}{1+e^{f(w,x)}}$$
Divide both the numerator and the denominator by $e^{f(w,x)}$; then:
$$p_+ = \frac{1}{1+e^{-f(w,x)}}$$
Just in case, to make sure that we have not made a mistake anywhere, let us do one more small check. In Step 2 we found that for $p_+ = 0.8$ the value is $f(w,x) \approx 1.386$. Then, substituting this value into the logistic response function, we expect to get $p_+ = 0.8$. We substitute and get: $p_+ = \frac{1}{1+e^{-1.386}} \approx 0.8$
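The two equivalent forms obtained in this step can also be compared numerically. The sketch below is our own illustration: `response_from_odds` follows the formula with the odds, while `logistic_response` uses the final form and, as an extra precaution that the article does not discuss, picks the branch in which `exp` never receives a large positive argument.

```python
import math

def response_from_odds(f):
    """p_+ = e^f / (1 + e^f), the inverse odds function with odds_+ = e^f."""
    odds = math.exp(f)
    return odds / (1.0 + odds)

def logistic_response(f):
    """p_+ = 1 / (1 + e^{-f}), evaluated without overflow for large |f|."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)  # f < 0 here, so z lies in [0, 1)
    return z / (1.0 + z)

for f in (-5.0, -1.38629, 0.0, 1.38629, 5.0):
    print('f = {:8.5f}: {:.6f} vs {:.6f}'.format(
        f, response_from_odds(f), logistic_response(f)))
```

For moderate values of f both functions agree; the second one simply stays safe for extreme inputs such as f = 1000 or f = -1000.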
Congratulations, dear reader, we have just derived and tested the logistic response function. Let us look at the graph of the function.
Graph 3 "Logistic response function"
Code for drawing the graph
import math

# logistic response (inverse logit) function; the name logit is kept from the original code
def logit (f):
    return 1/(1+math.exp(-f))

f = np.arange(-7,7,0.05)
p = []
for i in f:
    p.append(logit(i))
fig, axes = plt.subplots(figsize = (14,6), dpi = 80)
plt.plot(f, p, color = 'grey', label = '$ 1 / (1+e^{-w^Tx_i})$')
plt.xlabel('$f(w,x_i) = w^Tx_i$', size = 16)
plt.ylabel('$p_{i+}$', size = 16)
plt.legend(prop = {'size': 14})
plt.show()
In the literature you can also find this function under the name sigmoid function. The graph clearly shows that the main change in the probability of an object belonging to a class takes place within a relatively small range of $f(w,x)$, somewhere from $-4$ to $+4$.
I suggest returning to our credit analyst and helping him compute the probability of loan repayment, otherwise he risks being left without a bonus :)
Table 2 "Potential borrowers"
Code for generating the table
proba = []
for i in df['f(w,x)']:
    proba.append(round(logit(i),2))
df['Probability'] = proba
df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision', 'Probability']]
So, we have determined the probability of loan repayment. On the whole, this looks plausible.
Indeed, the probability that Vasya, with a salary of 120,000 RUR, will be able to pay the bank 3,000 RUR every month is close to 100%. By the way, we should understand that the bank may still grant a loan to Lesha if the bank's policy provides, for example, for lending to clients with a loan repayment probability above, say, 0.3. It is just that in this case the bank will set aside a larger reserve for possible losses.
It should also be noted that the salary-to-payment ratio of at least 3 and the margin of 5,000 RUR were taken off the top of our heads. Therefore we could not use the weight vector in its original form, $w = (-5000, 1, -3)$. We needed to reduce the coefficients considerably, and in this case we divided each coefficient by 25,000; that is, in essence, we fitted the result. But this was done deliberately, to make the material easier to understand at the initial stage. In real life we will not have to invent and adjust the coefficients but to find them. In the next sections of the article we will derive the equations by which the parameters $w$ are selected.
04. The least squares method for determining the weight vector $w$ in the logistic response function
We already know a method for selecting the weight vector $w$, namely the least squares method (LSM), so why not use it in binary classification problems as well? Indeed, nothing prevents you from using LSM; it is just that in classification problems this method gives less accurate results than Logistic Loss. There is a theoretical basis for this. Let us first look at one simple example.
Suppose that our models (one using MSE and one using Logistic Loss) have already started selecting the weight vector $w$ and we stopped the computation at some step. It does not matter whether in the middle, at the end or at the beginning; the main thing is that we already have some values of the weight vector, and let us assume that at this step the weight vector $w$ is the same for both models. Then we take the resulting weights and substitute them into the logistic response function ($\frac{1}{1+e^{-w^T x_i}}$) for some object that belongs to class $+1$. We examine two cases: one where, with the selected weight vector, our model is badly mistaken, and the opposite one, where the model is very confident that the object belongs to class $+1$. Let us see what penalties are issued when using LSM and Logistic Loss.
Code for computing the penalties depending on the loss function used
import math

# class of the object
y = 1
# probability of assigning the object to class +1 according to the parameters w
proba_1 = 0.01
MSE_1 = (y - proba_1)**2
print('MSE penalty for a gross error = {}'.format(MSE_1))

# function that computes f(w,x) from a known probability of assigning the object to class +1 (f(w,x)=ln(odds+))
def f_w_x(proba):
    return math.log(proba/(1-proba))

LogLoss_1 = math.log(1+math.exp(-y*f_w_x(proba_1)))
print('Log Loss penalty for a gross error = {}'.format(LogLoss_1))

proba_2 = 0.99
MSE_2 = (y - proba_2)**2
LogLoss_2 = math.log(1+math.exp(-y*f_w_x(proba_2)))
print('**************************************************************')
print('MSE penalty for strong confidence = {}'.format(MSE_2))
print('Log Loss penalty for strong confidence = {}'.format(LogLoss_2))
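The case of a gross error: the model assigns the object to class $+1$ with a probability of 0.01
The penalty when using LSM will be: $MSE = (y - p_+)^2 = (1 - 0.01)^2 = 0.9801$
The penalty when using Logistic Loss will be: $LogLoss = \ln\left(1 + e^{-y \cdot f(w,x)}\right) = \ln\left(1 + e^{-\ln(0.01/0.99)}\right) = \ln(100) \approx 4.605$
The case of strong confidence: the model assigns the object to class $+1$ with a probability of 0.99
The penalty when using LSM will be: $MSE = (y - p_+)^2 = (1 - 0.99)^2 = 0.0001$
The penalty when using Logistic Loss will be: $LogLoss = \ln\left(1 + e^{-y \cdot f(w,x)}\right) = \ln\left(1 + e^{-\ln(0.99/0.01)}\right) = \ln\left(\frac{100}{99}\right) \approx 0.01005$
This example illustrates well that in the case of a gross error the Log Loss function penalizes the model much more heavily than MSE. Let us now understand the theoretical background for using the Log Loss function in classification problems.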
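To see how the two penalties diverge between the two extreme cases, we can tabulate them for several probabilities of the true class $+1$. This is a sketch under the same setup as the example above (a single object with $y = 1$); the function names and the chosen grid of probabilities are ours.

```python
import math

def mse_penalty(p_true_class):
    """Squared error (y - p)^2 for an object whose true class is y = 1."""
    return (1.0 - p_true_class) ** 2

def log_loss_penalty(p_true_class):
    """ln(1 + e^{-y f}) with y = +1 and f = ln(p / (1 - p))."""
    f = math.log(p_true_class / (1.0 - p_true_class))
    return math.log(1.0 + math.exp(-f))

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print('p = {:4.2f}: MSE = {:.4f}, Log Loss = {:.4f}'.format(
        p, mse_penalty(p), log_loss_penalty(p)))
```

The MSE penalty can never exceed 1, while the Log Loss penalty grows without bound as the probability assigned to the true class approaches zero.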
05. The maximum likelihood method and logistic regression
As promised at the beginning, the article is full of simple examples. Here comes another example with our old guests, the bank's borrowers: Vasya, Fedya and Lesha.
Just in case, before developing the example, let me remind you that in real life we deal with training samples of thousands or millions of objects with tens or hundreds of features. Here, however, the numbers are chosen so that they fit easily into the head of a novice data scientist.
Back to the example. Let us imagine that the bank's director decided to grant a loan to everyone who asked, even though the algorithm told him not to grant it to Lesha. Now enough time has passed and we know which of the three heroes repaid the loan and which did not. As was to be expected, Vasya and Fedya repaid the loan, but Lesha did not. Now let us imagine that this result becomes a new training sample for us and, at the same time, that all data on the factors affecting the probability of loan repayment (the borrower's salary, the size of the monthly payment) have disappeared. Then, intuitively, we can assume that every third borrower does not repay the loan to the bank, or, in other words, that the probability that the next borrower repays the loan is $p = \frac{2}{3}$. This intuitive assumption has a theoretical confirmation and is based on the maximum likelihood method, which in the literature is often called the maximum likelihood principle.
First, let us get acquainted with the conceptual apparatus.
Sample likelihood is the probability of obtaining exactly this sample, that is, exactly these observations/results: the product of the probabilities of obtaining each of the sample results (for example, whether the loans of Vasya, Fedya and Lesha were repaid or not repaid, all at the same time).
The likelihood function relates the likelihood of the sample to the values of the distribution parameters.
In our case the training sample is a generalized Bernoulli scheme, in which the random variable takes only two values: $1$ or $0$. Therefore the sample likelihood can be written as a likelihood function of the parameter $p$ as follows:
$$L(p) = \prod_{i=1}^{3} p^{y_i}(1-p)^{1-y_i} = p \cdot p \cdot (1-p) = p^2(1-p)$$
The expression above can be interpreted as follows. The joint probability that Vasya and Fedya repay the loan equals $p \cdot p = p^2$, the probability that Lesha does NOT repay the loan equals $1-p$ (since it was non-repayment that actually occurred), so the joint probability of all three events equals $p^2(1-p)$.
The maximum likelihood method is a method of estimating an unknown parameter by maximizing the likelihood function. In our case we need to find the value of $p$ at which $p^2(1-p)$ reaches its maximum.
Where does the idea itself come from, of looking for the value of the unknown parameter at which the likelihood function reaches its maximum? The idea stems from the notion that the sample is the only source of knowledge about the population available to us. Everything we know about the population is represented in the sample. Therefore all we can say is that the sample is the most accurate reflection of the population available to us. Consequently, we need to find the parameter at which the available sample becomes the most probable.
Obviously, we are dealing with an optimization problem in which we need to find the extremum point of a function. To find the extremum point we need to consider the first-order condition, that is, set the derivative of the function to zero and solve the equation with respect to the desired parameter. However, finding the derivative of a product of a large number of factors can be tedious; to avoid this there is a special technique: switching to the logarithm of the likelihood function. Why is such a switch possible? Note that we are not looking for the extremum of the function itself but for the extremum point, that is, the value of the unknown parameter $p$ at which the likelihood reaches its maximum. When we switch to the logarithm, the extremum point does not change (although the extremum itself will differ), because the logarithm is a monotonic function.
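In line with the above, let us continue developing our example with the loans of Vasya, Fedya and Lesha. First we move to the logarithm of the likelihood function $L(p) = p^2(1-p)$:
$$\ln L(p) = \ln\left(p^2(1-p)\right) = 2\ln p + \ln(1-p)$$
Now we can easily differentiate the expression with respect to $p$:
$$\frac{d \ln L(p)}{dp} = \frac{2}{p} - \frac{1}{1-p}$$
And finally, consider the first-order condition: we set the derivative of the function to zero:
$$\frac{2}{p} - \frac{1}{1-p} = 0 \;\Rightarrow\; 2(1-p) = p \;\Rightarrow\; p = \frac{2}{3}$$
Thus our intuitive estimate of the loan repayment probability, $p = \frac{2}{3}$, has been theoretically justified.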
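The same result can be checked without calculus by a brute-force search. The sketch below is our own illustration: it evaluates the likelihood $p^2(1-p)$ and its logarithm on a grid of candidate values (the grid step of 0.001 is an arbitrary choice) and confirms that both are maximized at the same point.

```python
import math

def likelihood(p):
    """Likelihood of the sample (repaid, repaid, not repaid) for a constant p."""
    return p * p * (1.0 - p)

def log_likelihood(p):
    """Logarithm of the likelihood: 2 ln p + ln(1 - p)."""
    return 2.0 * math.log(p) + math.log(1.0 - p)

# candidate values of p strictly between 0 and 1
grid = [k / 1000.0 for k in range(1, 1000)]

p_best = max(grid, key=likelihood)
p_best_log = max(grid, key=log_likelihood)
print('argmax of L: {}, argmax of ln L: {}'.format(p_best, p_best_log))
```

Both searches stop at the grid point closest to 2/3, which is what the monotonicity of the logarithm promises: the maximum value changes, the maximizing point does not.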
Great, but what should we do with this information now? If we assume that every third borrower does not return the money to the bank, then the latter will inevitably go bankrupt. That is so, but only because, in estimating the loan repayment probability as $\frac{2}{3}$, we did not take into account the factors affecting loan repayment: the borrower's salary and the size of the monthly payment. Recall that earlier we computed the probability of loan repayment for each client taking these very factors into account. It is logical that we obtained probabilities that differ from the constant $\frac{2}{3}$.
Let us determine the likelihood of the samples:
Code for computing the sample likelihoods
from functools import reduce

def likelihood(y,p):
    # product of the Bernoulli probabilities of the observed outcomes
    line_true_proba = []
    for i in range(len(y)):
        ltp_i = p[i]**y[i]*(1-p[i])**(1-y[i])
        line_true_proba.append(ltp_i)
    return reduce(lambda a, b: a*b, line_true_proba)

y = [1.0,1.0,0.0]
p_log_response = df['Probability']
const = 2.0/3.0
p_const = [const, const, const]

print('Sample likelihood at the constant value p=2/3: {}'.format(round(likelihood(y,p_const),3)))
print('****************************************************************************************************')
print('Sample likelihood at the computed value of p: {}'.format(round(likelihood(y,p_log_response),3)))
Sample likelihood at the constant value $p = \frac{2}{3}$: $\frac{2}{3} \cdot \frac{2}{3} \cdot \left(1 - \frac{2}{3}\right) \approx 0.148$
Sample likelihood when the probability of loan repayment is computed taking the factors into account: $0.99 \cdot 0.73 \cdot (1 - 0.45) \approx 0.397$
The likelihood of the sample with the probability computed from the factors turned out to be higher than the likelihood with a constant probability value. What does this mean? It means that knowledge of the factors made it possible to determine the probability of loan repayment for each client more accurately. Therefore, when granting the next loan, it would be more correct to use the model proposed at the end of section 3 of the article for estimating the probability of loan repayment.
But then, if we want to maximize the sample likelihood function, why not use some algorithm that produces probabilities for Vasya, Fedya and Lesha equal, for example, to 0.99, 0.99 and 0.01, respectively? Perhaps such an algorithm would perform well on the training sample, since it brings the sample likelihood close to $1$, but, first, such an algorithm would most likely have trouble generalizing, and second, this algorithm would definitely not be linear. And while methods of combating overfitting (that is, weak generalization ability) are clearly outside the scope of this article, let us go through the second point in more detail. To do so, it is enough to answer a simple question. Can the probabilities of Vasya and Fedya repaying the loan be the same, given the factors known to us? From the standpoint of sound logic, of course not. Vasya will pay 2.5% of his salary per month to repay the loan, while Fedya will pay almost 27.8%. Also, on Graph 2 "Classification of borrowers" we see that Vasya is much farther from the line separating the classes than Fedya. And finally, we know that the function $f(w,x)$ takes different values for Vasya and Fedya: 4.24 for Vasya and 1.0 for Fedya. Now, if Fedya, for example, earned an order of magnitude more or asked for a smaller loan, then the probabilities of loan repayment for Vasya and Fedya would be similar. In other words, linear dependence cannot be fooled. And if we had actually computed the coefficients $w$ rather than pulling them out of thin air, we could safely say that our values of $f(w,x)$ give the best estimate of the probability of loan repayment by each borrower; but since we agreed to assume that the coefficients $w$ were determined according to all the rules, we will assume just that: our coefficients allow us to give the best estimate of the probability :)
However, we digress. In this section we need to understand how the weight vector $w$ is determined, which is needed to estimate the probability of loan repayment by each borrower.
Let us briefly summarize what arsenal we bring to the search for the coefficients $w$:
1. We assume that the relationship between the target variable (the predicted value) and the factors affecting the result is linear. For this reason a linear regression function of the form $f(w,x_i) = w^T x_i$ is used, whose line separates the objects (clients) into the classes $+1$ and $-1$ or $0$ (clients who are able to repay the loan and those who are not). In our case the equation has the form $f(w,x_i) = w_0 + w_1 x_{i1} + w_2 x_{i2}$.
2. We use the inverse logit function of the form $p_{i+} = \frac{1}{1+e^{-w^T x_i}}$ to determine the probability that an object belongs to class $+1$.
3. We treat our training set as a realization of a generalized Bernoulli scheme, that is, for each object a random variable is generated which with probability $p_{i+}$ (its own for each object) takes the value 1 and with probability $(1 - p_{i+})$ takes the value 0.
4. We know that we need to maximize the sample likelihood function, taking the accepted factors into account, so that the available sample becomes the most plausible. In other words, we need to select parameters at which the sample is most plausible. In our case the parameter being selected is the probability of loan repayment $p_{i+}$, which in turn depends on the unknown coefficients $w$. So we need to find a weight vector $w$ at which the sample likelihood is maximal.
5. We know that to maximize the sample likelihood function we can use the maximum likelihood method. And we know all the tricks for working with this method.
That is how it turns out to be a multi-step process :)
Now recall that at the very beginning of the article we wanted to derive two kinds of the Logistic Loss function, depending on how the object classes are labeled. It so happens that in classification problems with two classes, the classes are denoted as $+1$ and $0$ or $-1$. Depending on the notation, the resulting loss function will have a corresponding form.
Case 1. Classification of objects into $0$ and $1$
Earlier, when determining the sample likelihood in which the probability of loan repayment by a borrower was computed from the factors and the given coefficients $w$, we used the formula:
$$L = \prod_{i=1}^{n} p_{i+}^{\,y_i}\,(1-p_{i+})^{1-y_i}$$
In fact, $p_{i+}$ is the value of the logistic response function $\frac{1}{1+e^{-w^T x_i}}$ for a given weight vector $w$.
Then nothing prevents us from writing the sample likelihood function as follows:
$$L(w) = \prod_{i=1}^{n} \left(\frac{1}{1+e^{-w^T x_i}}\right)^{y_i}\left(1-\frac{1}{1+e^{-w^T x_i}}\right)^{1-y_i}$$
It sometimes happens that some novice analysts find it hard to grasp right away how this function works. Let us look at four short examples that make everything clear:
1. If $y_i = 1$ (that is, according to the training sample, the object belongs to class 1), and our algorithm determines the probability of assigning the object to class 1 to be 0.9, then this piece of the sample likelihood is computed as follows: $0.9^1 \cdot (1-0.9)^{(1-1)} = 0.9$
2. If $y_i = 1$ and $p_{i+} = 0.1$, then the calculation is as follows: $0.1^1 \cdot (1-0.1)^{(1-1)} = 0.1$
3. If $y_i = 0$ and $p_{i+} = 0.1$, then the calculation is as follows: $0.1^0 \cdot (1-0.1)^{(1-0)} = 0.9$
4. If $y_i = 0$ and $p_{i+} = 0.9$, then the calculation is as follows: $0.9^0 \cdot (1-0.9)^{(1-0)} = 0.1$
Obviously, the likelihood function is maximized in cases 1 and 3 or, in the general case, when the probabilities of assigning an object to class 1 are guessed correctly.
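The four situations above can be reproduced with a one-line function. In this sketch the probabilities 0.9 and 0.1 are illustrative values and `bernoulli_term` is a hypothetical helper name.

```python
def bernoulli_term(y, p):
    """Contribution of one object to the likelihood: p^y * (1 - p)^(1 - y), y in {0, 1}."""
    return (p ** y) * ((1.0 - p) ** (1 - y))

# (true label, predicted probability of class 1)
cases = [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]

for number, (y, p) in enumerate(cases, start=1):
    print('case {}: y = {}, p = {}, term = {:.1f}'.format(
        number, y, p, bernoulli_term(y, p)))
```

The exponent simply switches between the two factors: for y = 1 only p survives, for y = 0 only (1 - p) does.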
Since, when determining the probability of assigning an object to class 1, the only thing we do not know is the coefficients $w$, we will look for them. As mentioned above, this is an optimization problem in which we first need to find the derivative of the likelihood function with respect to the weight vector $w$. However, it makes sense first to simplify the task for ourselves: we will look for the derivative of the logarithm of the likelihood function:
$$L_{log}(X, y, w) = -\sum_{i=1}^{n}\left(y_i \ln p_{i+} + (1-y_i)\ln(1-p_{i+})\right) \to \min, \quad p_{i+} = \frac{1}{1+e^{-w^T x_i}}$$
Why, after taking the logarithm, did we change the sign from $+$ to $-$ in the logistic error function? It is simple: since in problems of assessing model quality it is customary to minimize the value of a function, we multiplied the right-hand side of the expression by $-1$ and, accordingly, instead of maximizing we now minimize the function.
Actually, right now, before your very eyes, the loss function has been painstakingly derived: Logistic Loss for a training set with two classes, $1$ and $0$.
Now, to find the coefficients, we just need to find the derivative of the logistic error function and then, using numerical optimization methods such as gradient descent or stochastic gradient descent, select the optimal coefficients $w$. But, given the already considerable length of the article, it is suggested that you carry out the differentiation yourself, or perhaps this will be the topic of a next article with a lot of arithmetic and without such detailed examples.
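For readers who want to see where this leads, here is a minimal sketch of batch gradient descent on the Logistic Loss for labels 0 and 1. It is not taken from the article: it relies on the standard result that the gradient of this loss is $\sum_i (p_{i+} - y_i)\,x_i$, and the feature scaling (salary and payment in units of 100,000 RUR), the learning rate and the number of steps are all our own assumptions.

```python
import math

def sigmoid(z):
    """Logistic response 1 / (1 + e^{-z}), written to avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    """Probability of class 1 for one object x (x[0] is the constant 1)."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)))

def log_loss(w, X, y):
    """Logistic Loss for labels 0/1, summed over the sample."""
    return -sum(y_i * math.log(predict(w, x_i)) +
                (1 - y_i) * math.log(1.0 - predict(w, x_i))
                for x_i, y_i in zip(X, y))

def gradient(w, X, y):
    """Gradient of log_loss with respect to w: sum over objects of (p_i - y_i) * x_i."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        error = predict(w, x_i) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += error * x_ij
    return grad

def gradient_descent(X, y, learning_rate=0.1, steps=200):
    """Plain batch gradient descent starting from zero weights."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(w, X, y)
        w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, g)]
    return w

# Vasya, Fedya, Lesha: (1, salary, payment), both in units of 100,000 RUR (our scaling)
X = [(1.0, 1.2, 0.03), (1.0, 1.8, 0.5), (1.0, 2.1, 0.7)]
y = [1, 1, 0]  # repaid, repaid, did not repay

initial_loss = log_loss([0.0, 0.0, 0.0], X, y)
w_fit = gradient_descent(X, y)
final_loss = log_loss(w_fit, X, y)
print('loss before: {:.4f}, loss after: {:.4f}'.format(initial_loss, final_loss))
```

With only three borrowers that a straight line separates perfectly, the weights keep growing as the number of steps increases, so this shows the mechanics of the method rather than a realistic fit.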
Case 2. Classification of objects into $-1$ and $+1$
The approach here is the same as with the classes $1$ and $0$, but the path to deriving the Logistic Loss function itself is more ornate. Let us begin. For the likelihood function we will use the operator "if ... then ...". That is, if the $i$-th object belongs to class $+1$, then to compute the sample likelihood we use the probability $p_{i+}$; if the object belongs to class $-1$, then we substitute $(1 - p_{i+})$ into the likelihood. This is what the likelihood function looks like:
$$L(w) = \prod_{i=1}^{n}\left([y_i = +1]\,p_{i+} + [y_i = -1]\,(1-p_{i+})\right),$$
where $[\cdot]$ equals 1 if the condition in brackets holds and 0 otherwise.
Let us spell out on our fingers how it works. Consider 4 cases:
1. If $y_i = +1$ and $p_{i+} = 0.9$, then $0.9$ "goes" into the sample likelihood
2. If $y_i = +1$ and $p_{i+} = 0.1$, then $0.1$ "goes" into the sample likelihood
3. If $y_i = -1$ and $p_{i+} = 0.1$, then $1 - 0.1 = 0.9$ "goes" into the sample likelihood
4. If $y_i = -1$ and $p_{i+} = 0.9$, then $1 - 0.9 = 0.1$ "goes" into the sample likelihood
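Obviously, in cases 1 and 3, when the probabilities were determined correctly by the algorithm, the likelihood function is maximized, which is exactly what we wanted to obtain. However, this approach is rather cumbersome, and next we will consider a more compact notation. But first, let us take the logarithm of the likelihood function with a change of sign, since we will now minimize it:
$$L_{log}(X, y, w) = -\sum_{i=1}^{n}\left([y_i = +1]\ln p_{i+} + [y_i = -1]\ln(1-p_{i+})\right) \to \min$$
Let us substitute for $p_{i+}$ the expression $\frac{1}{1+e^{-w^T x_i}}$:
$$L_{log}(X, y, w) = -\sum_{i=1}^{n}\left([y_i = +1]\ln\frac{1}{1+e^{-w^T x_i}} + [y_i = -1]\ln\left(1-\frac{1}{1+e^{-w^T x_i}}\right)\right) \to \min$$
Let us simplify the right-hand term under the logarithm using simple arithmetic, $1-\frac{1}{1+e^{-w^T x_i}} = \frac{e^{-w^T x_i}}{1+e^{-w^T x_i}} = \frac{1}{1+e^{w^T x_i}}$, and get:
$$L_{log}(X, y, w) = -\sum_{i=1}^{n}\left([y_i = +1]\ln\frac{1}{1+e^{-w^T x_i}} + [y_i = -1]\ln\frac{1}{1+e^{w^T x_i}}\right) \to \min$$
Now it is time to get rid of the "if ... then ..." operator. Note that when an object belongs to class $+1$, then in the expression under the logarithm, in the denominator, $e$ is raised to the power $-w^T x_i$; if the object belongs to class $-1$, then $e$ is raised to the power $+w^T x_i$. Therefore the notation for the exponent can be simplified by combining both cases into one: $-y_i w^T x_i$. Then the logistic error function takes the form:
$$L_{log}(X, y, w) = -\sum_{i=1}^{n}\ln\frac{1}{1+e^{-y_i w^T x_i}} \to \min$$
By the rules of logarithms, we flip the fraction and bring the "-" (minus) sign out of the logarithm, and we get:
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\ln\left(1+e^{-y_i w^T x_i}\right) \to \min$$
Here is the logistic loss function, which is used for a training set with objects assigned to the classes $-1$ and $+1$.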
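Since both forms of the loss describe the same likelihood, they should give identical numbers once the labels are recoded. The sketch below checks this on the three values of the linear function computed earlier for Vasya, Fedya and Lesha (4.24, 1.0 and -0.2); the function names are ours.

```python
import math

def log_loss_01(y, f):
    """Logistic Loss of one object for labels y in {0, 1} and linear score f."""
    p = 1.0 / (1.0 + math.exp(-f))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def log_loss_pm1(y, f):
    """Logistic Loss of one object for labels y in {-1, +1} and linear score f."""
    return math.log(1.0 + math.exp(-y * f))

scores = [4.24, 1.0, -0.2]                      # f(w, x) for Vasya, Fedya, Lesha
labels_01 = [1, 1, 0]                           # repaid, repaid, did not repay
labels_pm1 = [2 * y - 1 for y in labels_01]     # recode 0/1 into -1/+1

total_01 = sum(log_loss_01(y, f) for y, f in zip(labels_01, scores))
total_pm1 = sum(log_loss_pm1(y, f) for y, f in zip(labels_pm1, scores))
print('loss with 0/1 labels: {:.6f}'.format(total_01))
print('loss with -1/+1 labels: {:.6f}'.format(total_pm1))
```

The choice between the two notations is therefore a matter of convenience: the 0/1 form follows the Bernoulli likelihood directly, while the -1/+1 form is more compact.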
Well, at this point I take my leave and we conclude the article.
Supporting materials
1. Literature
1) Applied regression analysis / N. Draper, G. Smith. 2nd ed. Moscow: Finance and Statistics, 1986 (translated from English)
2) Probability theory and mathematical statistics / V.E. Gmurman. 9th ed. Moscow: Vysshaya Shkola, 2003
3) Probability theory / N.I. Chernova. Novosibirsk: Novosibirsk State University, 2007
4) Business analytics: from data to knowledge / N.B. Paklin, V.I. Oreshkov. 2nd ed. St. Petersburg: Piter, 2013
5) Data Science from Scratch / Joel Grus. St. Petersburg: BHV-Petersburg, 2017
6) Practical Statistics for Data Scientists / P. Bruce, E. Bruce. St. Petersburg: BHV-Petersburg, 2018
Source: www.habr.com