In this article we walk through the theory behind transforming the linear regression function $f(w,x_i) = w^Tx_i$ into the inverse logit transformation function (also called the logistic response function). Then, using the maximum likelihood method and following the logistic regression model, we derive the Logistic Loss function; in other words, we define the function used to select the components of the weight vector $w$ in a logistic regression model.
Article outline:
- We review the linear relationship between two variables
- We identify the need to transform the linear regression function $f(w,x_i) = w^Tx_i$ into the logistic response function $\frac{1}{1+e^{-w^Tx_i}}$
- We carry out the transformations and derive the logistic response function
- We try to understand why the least squares method is a poor choice for selecting the parameters $w$ of the Logistic Loss function
- We use the maximum likelihood method to determine the function for selecting the parameters $w$:
5.1. Case 1: the Logistic Loss function for objects with class labels 0 and 1:
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\left(-y_i\ln\frac{1}{1+e^{-w^Tx_i}} - (1-y_i)\ln\left(1-\frac{1}{1+e^{-w^Tx_i}}\right)\right) \to \min$$
5.2. Case 2: the Logistic Loss function for objects with class labels -1 and +1:
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\ln\left(1+e^{-y_iw^Tx_i}\right) \to \min$$
The article is full of simple examples in which every calculation is easy to do in your head or on paper; in some cases a calculator may be needed. So get ready :)
The article is aimed mainly at data scientists with an introductory knowledge of the basics of machine learning.
The article also provides code for drawing the graphs and doing the calculations. All code is written in Python 2.7. Let me explain in advance the "novelty" of the version used: it is one of the requirements of a well-known course from Yandex on the equally well-known online learning platform Coursera, and, as one might guess, the material was prepared on the basis of that course.
01. Straight-line dependence
It is quite reasonable to ask: what do straight-line dependence and logistic regression have to do with each other?
It's simple! Logistic regression is one of the models that belong to the family of linear classifiers. Put simply, the task of a linear classifier is to predict the target values $y$ from the variables (regressors) $X$. The relationship between the features $X$ and the target values $y$ is assumed to be linear, hence the name of the classifier. Put very roughly, the logistic regression model rests on the assumption that there is a linear relationship between the features $X$ and the target values $y$. That is the connection.
Here is the first example, and it is, fittingly, about a straight-line dependence between the quantities under study. While preparing this article I came across an example that has set many teeth on edge already: the dependence of current on voltage ("Applied regression analysis", N. Draper, G. Smith). We will look at it here as well.
According to Ohm's law:
$$I = \frac{U}{R}$$
where $I$ is the current, $U$ is the voltage, and $R$ is the resistance.
If we did not know Ohm's law, we could find the dependence empirically by varying $U$ and measuring $I$ while keeping $R$ fixed. We would then see that the graph of $I$ against $U$ gives a more or less straight line through the origin. We say "more or less" because, although the relationship is in fact exact, our measurements may contain small errors, so the points on the graph may not fall exactly on the line but will be scattered around it.
Graph 1 "Dependence of $I$ on $U$"
Code for drawing the graph
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import random
# resistance
R = 13.75
# voltage values U and the exact current I = U/R
x_line = np.arange(0,220,1)
y_line = []
for i in x_line:
    y_line.append(i/R)
# "measured" current: the exact value plus a small random error
y_dot = []
for i in y_line:
    y_dot.append(i+random.uniform(-0.9,0.9))
fig, axes = plt.subplots(figsize = (14,6), dpi = 80)
plt.plot(x_line,y_line,color = 'purple',lw = 3, label = 'I = U/R')
plt.scatter(x_line,y_dot,color = 'red', label = 'Actual results')
# voltage is on the horizontal axis, current on the vertical one
plt.xlabel('U', size = 16)
plt.ylabel('I', size = 16)
plt.legend(prop = {'size': 14})
plt.show()
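The idea of recovering the law from measurements can also be sketched numerically. The snippet below is a minimal illustration and not part of the original calculations: it simulates noisy current readings with the same $R = 13.75$ and noise range as in the graph, then estimates the slope of a line through the origin by least squares (slope $= \sum U_iI_i / \sum U_i^2$). The fixed seed, the helper name `fit_through_origin` and the use of the standard library only are assumptions made for the example.

```python
import random

def fit_through_origin(xs, ys):
    """Least-squares slope k of the line y = k*x that passes through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

rng = random.Random(0)          # fixed seed so the run is repeatable (assumption)
R_true = 13.75                  # resistance used in the article's graph

U = list(range(1, 220))         # voltage values
I = [u / R_true + rng.uniform(-0.9, 0.9) for u in U]   # noisy current readings

slope = fit_through_origin(U, I)   # estimate of 1/R
R_est = 1.0 / slope
print('estimated R = {:.2f}'.format(R_est))
```

The estimate should land close to the true resistance even though no single point lies exactly on the line, which is the sense in which the scattered points still "give" a straight line.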
02. The need to transform the linear regression equation
Let's look at another example. Suppose we work at a bank and our task is to determine the probability that a borrower will repay a loan, depending on certain factors. To simplify the task, we consider only two factors: the borrower's monthly salary and the monthly loan payment.
The task is highly artificial, but this example lets us understand why the linear regression function alone is not enough, and also find out which transformations need to be applied to it.
Let's return to the example. Clearly, the higher the salary, the more the borrower can set aside each month to repay the loan. At the same time, over a certain salary range this relationship will be quite linear. For example, take the salary range from 60,000 RUR to 200,000 RUR and assume that within it the dependence of the monthly payment on the salary is linear. Suppose that for this salary range it was found that the salary-to-payment ratio cannot fall below 3 and that the borrower must still have 5,000 RUR left in reserve. Only in that case do we assume that the borrower will repay the loan to the bank. The linear regression equation then takes the form:
$$f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$$
where $w_0 = -5000$, $w_1 = 1$, $w_2 = -3$, $x_{i1}$ is the salary of the $i$-th borrower, and $x_{i2}$ is the loan payment of the $i$-th borrower.
Substituting the salary and the loan payment into the equation with the fixed parameters $w$, we can decide whether to grant or refuse the loan.
Looking ahead, we note that with the given parameters $w$ the linear regression function, when used inside the logistic response function, produces large values that make it hard to compute the repayment probabilities. It is therefore proposed to reduce our coefficients by a factor of, say, 25,000. Scaling all the coefficients by the same positive factor does not change the sign of $f(w,x_i)$, so it does not change the lending decision. Let's keep this point in mind for later; for now, to make it clearer what we are talking about, let's consider the situation with three potential borrowers.
Table 1 "Potential borrowers"
Code for building the table
import pandas as pd
r = 25000.0
w_0 = -5000.0/r
w_1 = 1.0/r
w_2 = -3.0/r
data = {'The borrower':np.array(['Vasya', 'Fedya', 'Lesha']),
'Salary':np.array([120000,180000,210000]),
'Payment':np.array([3000,50000,70000])}
df = pd.DataFrame(data)
df['f(w,x)'] = w_0 + df['Salary']*w_1 + df['Payment']*w_2
# approve the loan when f(w,x) is positive, refuse otherwise
decision = []
for i in df['f(w,x)']:
    if i > 0:
        dec = 'Approved'
        decision.append(dec)
    else:
        dec = 'Refusal'
        decision.append(dec)
df['Decision'] = decision
df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision']]
According to the table, Vasya, with a salary of 120,000 RUR, wants a loan with a monthly payment of 3,000 RUR. We established that, for the loan to be approved, Vasya's salary must exceed three times the payment and 5,000 RUR must still be left over. Vasya meets this requirement: $120000 - 3 \cdot 3000 - 5000 = 106000$. As much as 106,000 RUR is left over. Although in computing $f(w,x_i)$ we reduced the coefficients $w$ by a factor of 25,000, the result is the same: the loan can be approved. Fedya will also get a loan, but Lesha, even though he earns the most, will have to curb his appetite.
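The arithmetic for all three borrowers can be checked in a few lines. This is a hedged sketch that uses only the numbers from the table code; the names `borrowers`, `f_raw` and `f_scaled` are made up for the illustration. It shows that dividing every coefficient by 25,000 changes the value of the linear function but not its sign, and therefore not the decision.

```python
# (salary, payment) for each borrower, taken from the table code above
borrowers = {'Vasya': (120000, 3000), 'Fedya': (180000, 50000), 'Lesha': (210000, 70000)}

w = (-5000.0, 1.0, -3.0)   # original coefficients w_0, w_1, w_2
r = 25000.0                # scaling factor used in the article

def f(weights, salary, payment):
    """Linear function f(w,x) = w_0 + w_1*salary + w_2*payment."""
    return weights[0] + weights[1] * salary + weights[2] * payment

f_raw = {}
f_scaled = {}
for name, (salary, payment) in borrowers.items():
    f_raw[name] = f(w, salary, payment)
    f_scaled[name] = f([c / r for c in w], salary, payment)
    decision = 'Approved' if f_scaled[name] > 0 else 'Refusal'
    print('{}: f = {:.0f}, scaled f = {:.2f}, {}'.format(name, f_raw[name], f_scaled[name], decision))
```

Only the sign of the linear function matters for the decision, which is why a common positive scaling factor is harmless here.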
Let's draw a graph for this case.
Graph 2 "Classification of borrowers"
Code for drawing the graph
salary = np.arange(60000,240000,20000)
payment = (-w_0-w_1*salary)/w_2
fig, axes = plt.subplots(figsize = (14,6), dpi = 80)
plt.plot(salary, payment, color = 'grey', lw = 2, label = '$f(w,x_i)=w_0 + w_1x_{i1} + w_2x_{i2}$')
plt.plot(df[df['Decision'] == 'Approved']['Salary'], df[df['Decision'] == 'Approved']['Payment'],
'o', color ='green', markersize = 12, label = 'Decision - Loan approved')
plt.plot(df[df['Decision'] == 'Refusal']['Salary'], df[df['Decision'] == 'Refusal']['Payment'],
's', color = 'red', markersize = 12, label = 'Decision - Loan refusal')
plt.xlabel('Salary', size = 16)
plt.ylabel('Payment', size = 16)
plt.legend(prop = {'size': 14})
plt.show()
So our straight line, built according to the function $f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$, separates the "bad" borrowers from the "good" ones. Borrowers whose wishes do not match their means lie above the line (Lesha), while those who, according to the parameters of our model, are able to repay the loan lie below the line (Vasya and Fedya). In other words, our line divides the borrowers into two classes. Let's denote them as follows: class $+1$ contains the borrowers who will most likely repay the loan, and class $-1$ or $0$ contains the borrowers who most likely will not.
Let's summarize the conclusions from this simple example. Take a point $M(x_1, x_2)$ and, substituting its coordinates into the corresponding equation of the line $f(w,x) = w_0 + w_1x_1 + w_2x_2$, consider three cases:
- If the point lies below the line and we assign it to class $+1$, then the value of the function $f(w,x)$ is positive, from $0$ to $+\infty$. This means we can assume that the probability of repaying the loan lies within $(0.5, 1]$. The larger the value of the function, the higher the probability.
- If the point lies above the line and we assign it to class $-1$ or $0$, then the value of the function is negative, from $0$ to $-\infty$. We then assume that the probability of repaying the loan lies within $[0, 0.5)$, and the larger the absolute value of the function, the greater our confidence.
- If the point lies on the line, on the boundary between the two classes, then the value of the function $f(w,x)$ equals $0$ and the probability of repaying the loan equals $0.5$.
Now imagine that we have not two factors but dozens, and not three borrowers but thousands. Then instead of a straight line we have an $m$-dimensional plane, and the coefficients $w$ are not pulled out of thin air but derived by all the rules, from accumulated data on borrowers who did or did not repay their loans. And indeed, note that we are currently selecting borrowers using already known coefficients $w$. In reality, the task of a logistic regression model is precisely to determine the parameters $w$ at which the value of the Logistic Loss function is minimal. How the vector $w$ is computed is covered in section 5 of the article. For now, we return to the promised land: to our banker and his three clients.
Thanks to the function $f(w,x_i)$ we know who can be given a loan and who must be refused. But we cannot go to the director with that information, because he wants us to provide the probability that each borrower repays the loan. What to do? The answer is simple: we need to somehow transform the function $f(w,x_i)$, whose values lie in the range $(-\infty, +\infty)$, into a function whose values lie in the range $[0, 1]$. Such a function exists; it is called the logistic response function or the inverse logit transformation. Meet it:
$$p_{i+} = \frac{1}{1+e^{-f(w,x_i)}} = \frac{1}{1+e^{-w^Tx_i}}$$
Let's see step by step how the logistic response function is obtained. Note that we will move in the opposite direction: we assume that we know the value of the probability, which lies in the range from $0$ to $1$, and then we "unroll" this value onto the whole range of numbers from $-\infty$ to $+\infty$.
03. Deriving the logistic response function
Step 1. Convert the probability values to the range $[0, +\infty)$
While we transform the function $f(w,x_i)$ into the logistic response function, we will leave our credit analyst alone and take a detour to the bookmakers instead. No, of course we are not going to place bets; all that interests us is the meaning of an expression such as "the odds are 4 to 1". Odds, familiar to everyone who bets, are the ratio of "successes" to "failures". In probability terms, the odds are the probability that an event occurs divided by the probability that it does not occur. Let's write down the formula for the odds that the event occurs, $odds_+$:
$$odds_+ = \frac{p_+}{1-p_+}$$
where $p_+$ is the probability that the event occurs and $(1-p_+)$ is the probability that the event does NOT occur.
For example, if the probability that a young, strong and spirited horse named "Veterok" beats an old and flabby mare named "Matilda" in a race is $0.8$, then the odds of "Veterok" winning are $4$ to $1$ $(0.8/(1-0.8))$, and vice versa: knowing the odds, it is easy to compute the probability $p_+$:
$$\frac{p_+}{1-p_+} = 4 \;\Rightarrow\; p_+ = 0.8$$
So we have learned to "translate" a probability into odds, which take values from $0$ to $+\infty$. Let's take one more step and learn to "translate" a probability onto the whole number line from $-\infty$ to $+\infty$.
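The two conversions from this step can be written down directly. The sketch below is only an illustration; the function names `prob_to_odds` and `odds_to_prob` are invented here, and the inverse conversion uses $p = odds / (1 + odds)$, which is what you get by solving the odds formula for the probability.

```python
def prob_to_odds(p):
    """Odds in favour of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """Probability that corresponds to the given odds (odds >= 0)."""
    return odds / (1.0 + odds)

odds_veterok = prob_to_odds(0.8)
print(odds_veterok)                 # close to 4, i.e. "4 to 1"
print(odds_to_prob(odds_veterok))   # back to (approximately) 0.8
```

Note that a probability of 0.5 corresponds to odds of exactly 1, and that the odds grow without bound as the probability approaches 1.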
Step 2. Convert the probability values to the range $(-\infty, +\infty)$
This step is very simple: we take the logarithm of the odds to the base of Euler's number $e$ and get:
$$f(w,x_i) = w^Tx_i = \ln(odds_+)$$
Now we know that if $p_+ = 0.8$, then computing the value of $f(w,x_i)$ is easy, and it should be positive: $f(w,x_i) = \ln(0.8/0.2) = \ln 4 \approx 1.386$. It is.
Out of curiosity, let's check the case $p_+ = 0.2$, where we expect a negative value of $f(w,x_i)$. We check: $f(w,x_i) = \ln(0.2/0.8) = \ln 0.25 \approx -1.386$. Correct.
Now we know how to convert a probability value from $0$ to $1$ onto the whole number line from $-\infty$ to $+\infty$. In the next step we do the opposite.
For now, note that, by the rules of logarithms, knowing the value of the function $f(w,x_i)$ we can compute the odds:
$$odds_+ = e^{f(w,x_i)}$$
This way of determining the odds will come in handy in the next step.
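As a quick numerical check of this step, the sketch below computes the logarithm of the odds for a few probabilities and then recovers the odds with the exponential. The helper name `log_odds` is an assumption made for the example.

```python
import math

def log_odds(p):
    """Natural logarithm of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

for p in (0.2, 0.5, 0.8):
    f = log_odds(p)
    # exp(f) recovers the odds, as in the formula odds = e^f
    print('p = {}: f = {:.3f}, odds = {:.2f}'.format(p, f, math.exp(f)))
```

Probabilities below 0.5 map to negative values, 0.5 maps to zero, and probabilities above 0.5 map to positive values, symmetric around zero.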
Step 3. Derive the formula for determining $p_+$
So we have learned, knowing $p_+$, to find the value of the function $f(w,x_i)$. In fact, however, we need exactly the opposite: knowing the value of $f(w,x_i)$, to find $p_+$. To do this, we turn to the inverse odds function, according to which:
$$p_+ = \frac{odds_+}{1+odds_+}$$
In this article we do not derive the formula above; we check it using the numbers from the example above. We know that with odds of 4 to 1 ($odds_+ = 4$) the probability of the event is 0.8 ($p_+ = 0.8$). Let's substitute: $p_+ = \frac{4}{1+4} = 0.8$. This matches the calculation we did earlier. Let's move on.
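For readers who want the omitted step anyway, the inverse odds formula follows in a few lines from the definition of odds given in step 1. A short derivation, using only the symbols $p_+$ and $odds_+$ introduced there:

```latex
\begin{aligned}
odds_+ &= \frac{p_+}{1 - p_+} \\
odds_+ \,(1 - p_+) &= p_+ \\
odds_+ &= p_+ \,(1 + odds_+) \\
p_+ &= \frac{odds_+}{1 + odds_+}
\end{aligned}
```

With $odds_+ = 4$ the last line gives $4/5 = 0.8$, in agreement with the horse-racing example.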
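In the previous step we found that $odds_+ = e^{f(w,x_i)}$, which means we can substitute it into the inverse odds function. We get:
$$p_+ = \frac{e^{f(w,x_i)}}{1+e^{f(w,x_i)}}$$
Divide the numerator and the denominator by $e^{f(w,x_i)}$, then:
$$p_+ = \frac{1}{1+e^{-f(w,x_i)}} = \frac{1}{1+e^{-w^Tx_i}}$$
Just in case, to make sure we have not made a mistake anywhere, let's do one more small check. In step 2, for $p_+ = 0.8$ we determined that $f(w,x_i) \approx 1.386$. Then, substituting this value into the logistic response function, we expect to get $p_+ = 0.8$. We substitute and get:
$$p_+ = \frac{1}{1+e^{-1.386}} \approx 0.8$$
Congratulations, dear reader, we have just derived and tested the logistic response function. Let's look at the graph of the function.
Graph 3 "Logistic response function"
Code for drawing the graph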
import math
# despite its name, this function is the inverse logit (the logistic response function)
def logit (f):
    return 1/(1+math.exp(-f))
f = np.arange(-7,7,0.05)
p = []
for i in f:
    p.append(logit(i))
fig, axes = plt.subplots(figsize = (14,6), dpi = 80)
plt.plot(f, p, color = 'grey', label = '$ 1 / (1+e^{-w^Tx_i})$')
plt.xlabel('$f(w,x_i) = w^Tx_i$', size = 16)
plt.ylabel('$p_{i+}$', size = 16)
plt.legend(prop = {'size': 14})
plt.show()
In the literature this function is also called the sigmoid function. The graph clearly shows that the main change in the probability of an object belonging to the class takes place within a relatively small range of $f(w,x_i)$, roughly from $-4$ to $+4$.
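To put numbers on how quickly the curve saturates, the sketch below evaluates the logistic response function at a few points. It is an illustration rather than the article's code: the function name `sigmoid` and the two-branch form (used to avoid overflow in `math.exp` for large negative arguments) are choices made here.

```python
import math

def sigmoid(f):
    """Logistic response 1 / (1 + e^(-f)), written to avoid overflow for large |f|."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)            # f < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)

for f in (-7, -4, -1, 0, 1, 4, 7):
    print('f = {:>2}: p = {:.3f}'.format(f, sigmoid(f)))
```

Between $-4$ and $+4$ the probability runs from about 0.018 to about 0.982, so outside that interval the curve is already nearly flat.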
I suggest returning to our credit analyst and helping him compute the probability of loan repayment, otherwise he risks being left without a bonus :)
Table 2 "Potential borrowers"
Code for building the table
# probability of repayment for each borrower, rounded to two decimal places
proba = []
for i in df['f(w,x)']:
    proba.append(round(logit(i),2))
df['Probability'] = proba
df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision', 'Probability']]
So, we have determined the probability of loan repayment. On the whole, this looks plausible.
Indeed, the probability that Vasya, with a salary of 120,000 RUR, will be able to pay the bank 3,000 RUR every month is close to 100%. By the way, we should understand that the bank may still give Lesha a loan if its policy allows, for example, lending to clients whose probability of repayment is greater than, say, 0.3. In that case the bank will simply set aside a larger reserve for possible losses.
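The effect of the bank's policy threshold can be sketched as follows. The probabilities are the rounded values that the table code above produces for $f(w,x_i)$ equal to 4.24, 1.0 and -0.2; the function name `approve` and the two thresholds are illustrative assumptions.

```python
# repayment probabilities as produced by the table code (rounded to two decimals)
probability = {'Vasya': 0.99, 'Fedya': 0.73, 'Lesha': 0.45}

def approve(p, threshold):
    """Approve the loan when the repayment probability exceeds the threshold."""
    return p > threshold

for threshold in (0.5, 0.3):
    approved = sorted(name for name, p in probability.items() if approve(p, threshold))
    print('threshold {}: {}'.format(threshold, ', '.join(approved)))
```

A threshold of 0.5 reproduces the earlier decision based on the sign of the linear function, while lowering it to 0.3 lets Lesha in as well.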
It should also be noted that the salary-to-payment ratio of at least 3 and the 5,000 RUR margin were plucked out of thin air. That is why we could not use the weight vector in its original form $w = (-5000, 1, -3)$. We had to reduce the coefficients considerably, and here we divided each coefficient by 25,000, that is, in effect, we fitted the result. But this was done specifically to make the material easier to grasp at the initial stage. In real life we do not need to invent and tune the coefficients; we need to find them. In the following sections of the article we derive the equations by which the parameters $w$ are selected.
04. The least squares method for determining the weight vector $w$ in the logistic response function
We already know a method for selecting the weight vector $w$, namely the least squares method (LSM), so why not use it in binary classification problems too? Indeed, nothing prevents you from using LSM; it is just that in classification problems this method gives less accurate results than Logistic Loss. There is a theoretical basis for this. Let's first look at one simple example.
Suppose that our models (one using MSE and one using Logistic Loss) have already started selecting the weight vector $w$ and we have stopped the computation at some step. It does not matter whether in the middle, at the end or at the beginning; the main thing is that we already have some values of the weight vector, and let's assume that at this step the weight vectors $w$ of the two models do not differ. We then take the obtained weights and substitute them into the logistic response function ($\frac{1}{1+e^{-w^Tx_i}}$) for some object that belongs to class $+1$. We examine two cases: one where, with the chosen weight vector, our model is badly mistaken, and the opposite one, where the model is very confident that the object belongs to class $+1$. Let's see what penalties are assigned when using LSM and Logistic Loss.
Code for computing the penalties depending on the loss function used
# class of the object
y = 1
# probability of assigning the object to the class according to the parameters w
proba_1 = 0.01
MSE_1 = (y - proba_1)**2
print('MSE penalty for a gross error = {}'.format(MSE_1))
# function that computes f(w,x) for a known probability of assigning the object to class +1 (f(w,x)=ln(odds+))
def f_w_x(proba):
    return math.log(proba/(1-proba))
LogLoss_1 = math.log(1+math.exp(-y*f_w_x(proba_1)))
print('Log Loss penalty for a gross error = {}'.format(LogLoss_1))
proba_2 = 0.99
MSE_2 = (y - proba_2)**2
LogLoss_2 = math.log(1+math.exp(-y*f_w_x(proba_2)))
print('**************************************************************')
print('MSE penalty for strong confidence = {}'.format(MSE_2))
print('Log Loss penalty for strong confidence = {}'.format(LogLoss_2))
Gross error: the model assigns the object to class $+1$ with probability 0.01
The penalty when using LSM is: $(1 - 0.01)^2 = 0.9801$
The penalty when using Logistic Loss is: $\ln\left(1+e^{-\ln(0.01/0.99)}\right) = \ln 100 \approx 4.605$
Strong confidence: the model assigns the object to class $+1$ with probability 0.99
The penalty when using LSM is: $(1 - 0.99)^2 = 0.0001$
The penalty when using Logistic Loss is: $\ln\left(1+e^{-\ln(0.99/0.01)}\right) = \ln(100/99) \approx 0.01005$
This example illustrates well that, in the case of a gross error, the Log Loss function penalizes the model considerably more than MSE does. Let's now look at the theoretical grounds for using the Log Loss function in classification problems.
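The same comparison can be extended to a range of probabilities. The sketch below is an illustration with invented helper names. It relies on the identity $\ln(1+e^{-f}) = -\ln p$ for an object of class $+1$ when $f = \ln\frac{p}{1-p}$, so the Log Loss penalty is computed simply as minus the logarithm of the probability assigned to the true class.

```python
import math

def mse_penalty(y, p):
    """Squared error between the label y (0 or 1) and the predicted probability p."""
    return (y - p) ** 2

def log_loss_penalty(p_true_class):
    """Log Loss for one object: minus the log of the probability given to its true class."""
    return -math.log(p_true_class)

# the object belongs to class +1; p is the probability the model assigns to that class
for p in (0.5, 0.1, 0.01, 0.001):
    print('p = {}: MSE = {:.4f}, Log Loss = {:.3f}'.format(p, mse_penalty(1, p), log_loss_penalty(p)))
```

The squared error can never exceed 1 here, however wrong the model is, while the Log Loss penalty keeps growing as the probability assigned to the true class approaches zero.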
05. The maximum likelihood method and logistic regression
As promised at the beginning, the article is full of simple examples. Here is another example, with our old guests, the bank's borrowers: Vasya, Fedya and Lesha.
Just in case, before developing the example, let me remind you that in real life we deal with training samples of thousands or millions of objects with tens or hundreds of features. Here, however, the numbers are chosen so that they easily fit into the head of a novice data scientist.
Let's return to the example. Suppose the bank's director decided to give a loan to everyone who asked, even though the algorithm told him not to give one to Lesha. Enough time has now passed, and we know which of the three heroes repaid the loan and which did not. As was to be expected, Vasya and Fedya repaid the loan, and Lesha did not. Now imagine that this result is a new training sample for us and that, at the same time, all the data on the factors that affect the probability of repayment (the borrower's salary, the size of the monthly payment) has vanished. Then, intuitively, we can assume that every third borrower does not repay the loan, or, in other words, that the probability that the next borrower repays the loan is $p = 2/3$. This intuitive assumption has a theoretical justification and is based on the maximum likelihood method, which in the literature is often called the maximum likelihood principle.
First, let's get acquainted with the conceptual apparatus.
The sample likelihood is the probability of obtaining exactly this sample, that is, exactly these observations/results: the product of the probabilities of obtaining each of the results in the sample (for example, whether the loans of Vasya, Fedya and Lesha were or were not repaid, all at the same time).
The likelihood function relates the likelihood of the sample to the values of the distribution parameters.
In our case the training sample is a generalized Bernoulli scheme, in which the random variable takes only two values: $1$ or $0$. The sample likelihood can therefore be written as a likelihood function of the parameter $p$ as follows:
$$P(y \mid p) = \prod_{i=1}^{3} p^{y_i}(1-p)^{1-y_i} = p \cdot p \cdot (1-p) = p^2(1-p)$$
The expression above can be interpreted as follows. The joint probability that Vasya and Fedya repay the loan equals $p \cdot p = p^2$, the probability that Lesha does NOT repay the loan equals $(1-p)$ (since it was non-repayment that took place), and therefore the joint probability of all three events equals $p^2(1-p)$.
The maximum likelihood method is a method of estimating an unknown parameter by maximizing the likelihood function. In our case we need to find the value of $p$ at which $P(y \mid p)$ reaches its maximum.
Where does the idea come from of looking for the value of the unknown parameter at which the likelihood function reaches its maximum? It stems from the notion that the sample is the only source of knowledge about the population available to us. Everything we know about the population is represented in the sample. So all we can say is that the sample is the most accurate reflection of the population that we have. Therefore we need to find the parameter at which the available sample becomes the most probable.
Obviously, we are dealing with an optimization problem in which we need to find an extremum point of a function. To find the extremum point, we consider the first-order condition, that is, we set the derivative of the function to zero and solve the equation for the desired parameter. However, finding the derivative of a product of a large number of factors can be tedious; to avoid this there is a special technique: switching to the logarithm of the likelihood function. Why is such a switch possible? Note that we are looking not for the extremum of the function itself but for the extremum point, that is, the value of the unknown parameter $p$ at which $P(y \mid p)$ reaches its maximum. When switching to the logarithm, the extremum point does not change (although the extremum itself does), since the logarithm is a monotonic function.
In line with the above, let's continue with our example of the loans of Vasya, Fedya and Lesha. First we switch to the logarithm of the likelihood function:
$$\ln P(y \mid p) = \ln\left(p^2(1-p)\right) = 2\ln p + \ln(1-p)$$
Now we can easily differentiate the expression with respect to $p$:
$$\frac{d \ln P(y \mid p)}{dp} = \frac{2}{p} - \frac{1}{1-p}$$
And finally, consider the first-order condition: we set the derivative of the function to zero:
$$\frac{2}{p} - \frac{1}{1-p} = 0 \;\Rightarrow\; 2(1-p) = p \;\Rightarrow\; p = \frac{2}{3}$$
Thus our intuitive estimate of the probability of loan repayment, $p = 2/3$, has been theoretically justified.
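The same result can be confirmed without calculus by a brute-force search. The sketch below is a minimal illustration (the grid step and the variable names are arbitrary choices): it evaluates the likelihood of the sample "two repaid, one did not" on a grid of candidate probabilities and picks the best one, once for the likelihood itself and once for its logarithm.

```python
import math

def likelihood(p):
    """Likelihood of the sample 'two repaid, one did not' for a constant probability p."""
    return p * p * (1.0 - p)

# candidate values of p strictly between 0 and 1 (grid step 0.001 is an arbitrary choice)
grid = [k / 1000.0 for k in range(1, 1000)]

p_best = max(grid, key=likelihood)
p_best_log = max(grid, key=lambda p: math.log(likelihood(p)))

print(p_best, p_best_log)
```

Both searches select the same grid point, the one closest to $2/3$, which illustrates why taking the logarithm changes the maximum value but not the point where it is reached.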
Fine, but what are we supposed to do with this information now? If we assume that every third borrower does not return the money to the bank, then the bank will inevitably go bankrupt. That is so, but only because, when estimating the probability of repayment as $2/3$, we did not take into account the factors that affect repayment: the borrower's salary and the size of the monthly payment. Recall that earlier we computed the probability of repayment for each client taking these very factors into account. It is logical that the probabilities we obtained differ from the constant $2/3$.
Let's determine the likelihoods of the samples:
Code for computing the sample likelihoods
from functools import reduce
# sample likelihood: product over the objects of p_i^y_i * (1-p_i)^(1-y_i)
def likelihood(y,p):
    line_true_proba = []
    for i in range(len(y)):
        ltp_i = p[i]**y[i]*(1-p[i])**(1-y[i])
        line_true_proba.append(ltp_i)
    return reduce(lambda a, b: a*b, line_true_proba)
y = [1.0,1.0,0.0]
p_log_response = df['Probability']
const = 2.0/3.0
p_const = [const, const, const]
print('Sample likelihood at the constant value p=2/3: {}'.format(round(likelihood(y,p_const),3)))
print('****************************************************************************************************')
print('Sample likelihood at the computed value of p: {}'.format(round(likelihood(y,p_log_response),3)))
Sample likelihood at the constant value $p = 2/3$: $\frac{2}{3} \cdot \frac{2}{3} \cdot \frac{1}{3} = \frac{4}{27} \approx 0.148$
Sample likelihood when the probability of repayment is computed taking the factors into account: $0.99 \cdot 0.73 \cdot (1 - 0.45) \approx 0.397$
The likelihood of the sample with probabilities computed from the factors turned out to be higher than the likelihood with a constant probability. What does this mean? It suggests that knowledge of the factors made it possible to determine the probability of repayment for each client more accurately. Therefore, when issuing the next loan, it would be more correct to use the model proposed at the end of section 3 of the article to estimate the probability of repayment.
But then, if we want to maximize the sample likelihood function, why not use some algorithm that would output probabilities for Vasya, Fedya and Lesha equal to, say, 0.99, 0.99 and 0.01, respectively? Such an algorithm might perform well on the training sample, since it would bring the sample likelihood closer to $1$, but, first, such an algorithm would most likely have difficulties with generalization, and second, it would definitely not be linear. And while methods of combating overfitting (equally, weak generalization ability) are clearly outside the plan of this article, let's go through the second point in more detail. To do this, it is enough to answer a simple question: can the probabilities of Vasya and Fedya repaying the loan be the same, given the factors known to us? From the standpoint of sound logic, of course not. Vasya will give 2.5% of his salary per month to repay the loan, and Fedya almost 27.8%. Also, in graph 2 "Classification of borrowers" we see that Vasya is much farther from the line separating the classes than Fedya is. And finally, we know that the function $f(w,x_i)$ takes different values for Vasya and Fedya: 4.24 for Vasya and 1.0 for Fedya. Now, if Fedya, for example, earned an order of magnitude more or asked for a smaller loan, then the probabilities of repayment for Vasya and Fedya would be similar. In other words, a linear dependence cannot be fooled. And if we had actually computed the coefficients $w$ rather than pulling them out of thin air, we could safely say that our values of $w$ give the best estimate of the probability of repayment for each borrower; but since we agreed to assume that the coefficients $w$ were determined by all the rules, we will assume just that: our coefficients let us give a better estimate of the probability :)
However, we digress. In this section we need to understand how the weight vector $w$ is determined, which is needed to estimate the probability of repayment for each borrower.
Let's briefly summarize the arsenal with which we set off in search of the coefficients $w$:
1. We assume that the relationship between the target variable (the predicted value) and the factors that influence the result is linear. For this reason we use a linear regression function of the form $f(w,x_i) = w^Tx_i$, whose line divides the objects (clients) into classes $+1$ and $-1$ or $0$ (clients who are able to repay the loan and those who are not). In our case the equation has the form $f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$.
2. We use the inverse logit function of the form $p_{i+} = \frac{1}{1+e^{-w^Tx_i}}$ to determine the probability that an object belongs to class $+1$.
3. We treat our training sample as a realization of a generalized Bernoulli scheme, that is, for each object a random variable is generated which with probability $p$ (its own for each object) takes the value 1 and with probability $(1-p)$ takes the value 0.
4. We know that we need to maximize the sample likelihood function, taking the accepted factors into account, so that the available sample becomes the most plausible. In other words, we need to choose the parameters at which the sample is most plausible. In our case the parameter being selected is the probability of repayment $p$, which in turn depends on the unknown coefficients $w$. So we need to find the weight vector $w$ at which the sample likelihood is maximal.
5. We know that to maximize the sample likelihood function we can use the maximum likelihood method. And we know all the tricks of working with this method.
That's the multi-move combination we have ended up with :)
Now recall that at the very beginning of the article we wanted to derive two kinds of Logistic Loss function, depending on how the object classes are labeled. It so happens that in classification problems with two classes, the classes are labeled as $+1$ and $0$ or $-1$. Depending on the notation, the result is the corresponding loss function.
Case 1. Classification of objects into $+1$ and $0$
Earlier, when determining the likelihood of the sample in which the probability of repayment was computed from the factors and the given coefficients $w$, we used the formula:
$$P(y \mid p) = \prod_{i=1}^{3} p_i^{y_i}(1-p_i)^{1-y_i}$$
In fact, $p_i$ is the value of the logistic response function $\frac{1}{1+e^{-w^Tx_i}}$ for the given weight vector $w$.
Then nothing prevents us from writing the sample likelihood function as follows:
$$P(y \mid X, w) = \prod_{i=1}^{n}\left(\frac{1}{1+e^{-w^Tx_i}}\right)^{y_i}\left(1-\frac{1}{1+e^{-w^Tx_i}}\right)^{1-y_i}$$
It sometimes happens that novice analysts find it hard to grasp right away how this function works. Let's look at 4 short examples that should clear everything up:
1. If $y_i = +1$ (i.e., according to the training sample, the object belongs to class +1) and our algorithm $\frac{1}{1+e^{-w^Tx_i}}$ estimates the probability of assigning the object to class $+1$ as 0.9, then this piece of the sample likelihood is computed as follows: $0.9^{1} \cdot (1-0.9)^{(1-1)} = 0.9$
2. If $y_i = +1$ and $\frac{1}{1+e^{-w^Tx_i}} = 0.1$, then the calculation is: $0.1^{1} \cdot (1-0.1)^{(1-1)} = 0.1$
3. If $y_i = 0$ and $\frac{1}{1+e^{-w^Tx_i}} = 0.1$, then the calculation is: $0.1^{0} \cdot (1-0.1)^{(1-0)} = 0.9$
4. If $y_i = 0$ and $\frac{1}{1+e^{-w^Tx_i}} = 0.9$, then the calculation is: $0.9^{0} \cdot (1-0.9)^{(1-0)} = 0.1$
It is obvious that the likelihood function is maximized in cases 1 and 3 or, in the general case, when the probabilities of assigning an object to class $+1$ are guessed correctly.
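The four cases above can be reproduced mechanically. In the sketch below the helper name `bernoulli_term` is an assumption; the label is 1 for class $+1$ and 0 for class $0$, and each call returns the contribution of one object to the sample likelihood.

```python
def bernoulli_term(y, p):
    """Contribution of one object to the likelihood: p^y * (1 - p)^(1 - y), with y in {0, 1}."""
    return p ** y * (1.0 - p) ** (1 - y)

# (true label, predicted probability of class +1) for the four cases in the text
cases = [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]
terms = [bernoulli_term(y, p) for y, p in cases]
for number, ((y, p), term) in enumerate(zip(cases, terms), start=1):
    print('case {}: y = {}, p = {} -> {:.1f}'.format(number, y, p, term))
```

The exponent acts as a switch: only the factor that matches the true class survives, so a confident correct prediction contributes a value near 1 and a confident wrong one a value near 0.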
Since, when determining the probability of assigning an object to class $+1$, the only thing we do not know is the coefficients $w$, those are what we will look for. As mentioned above, this is an optimization problem in which we first need to find the derivative of the likelihood function with respect to the weight vector $w$. But first it makes sense to simplify the task for ourselves: we will look for the derivative of the logarithm of the likelihood function:
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\left(-y_i\ln\frac{1}{1+e^{-w^Tx_i}} - (1-y_i)\ln\left(1-\frac{1}{1+e^{-w^Tx_i}}\right)\right) \to \min$$
Why, after taking the logarithm, did we change the sign from $+$ to $-$ in the logistic error function? It is simple: since in problems of assessing model quality it is customary to minimize the value of a function, we multiplied the right-hand side of the expression by $-1$ and, accordingly, instead of maximizing we now minimize the function.
In fact, right now, before your very eyes, the loss function was painstakingly derived: Logistic Loss for a training sample with two classes, $+1$ and $0$.
Now, to find the coefficients, we only need to find the derivative of the logistic error function and then, using numerical optimization methods such as gradient descent or stochastic gradient descent, select the optimal coefficients $w$. But given the already considerable length of the article, you are invited to do the differentiation yourself; or perhaps it will be the topic of a next article, with a lot of arithmetic and without such detailed examples.
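For orientation only, here is a rough sketch of what that numerical step might look like. It is not the author's derivation: it uses the standard result that the gradient of Logistic Loss for labels 0/1 is $\sum_i (p_i - y_i)x_i$, and it runs plain gradient descent on the three borrowers. The rescaling of money to units of 100,000 RUR, the learning rate and the number of steps are all assumptions chosen for the illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, X, y):
    """Logistic Loss for labels 0/1: sum of -y*ln(p) - (1-y)*ln(1-p)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)))
        total += -y_i * math.log(p) - (1 - y_i) * math.log(1.0 - p)
    return total

def gradient(w, X, y):
    """Gradient of the loss above: sum over objects of (p_i - y_i) * x_i."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        p = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)))
        for j, x_ij in enumerate(x_i):
            grad[j] += (p - y_i) * x_ij
    return grad

# rows: [1 (for w_0), salary, payment], money in units of 100,000 RUR (scaling is an assumption)
X = [[1.0, 1.2, 0.03], [1.0, 1.8, 0.5], [1.0, 2.1, 0.7]]
y = [1, 1, 0]            # Vasya and Fedya repaid, Lesha did not

w = [0.0, 0.0, 0.0]
learning_rate = 0.1      # arbitrary illustrative value
loss_start = log_loss(w, X, y)
for step in range(2000):
    g = gradient(w, X, y)
    w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, g)]
loss_end = log_loss(w, X, y)

print('loss: {:.3f} -> {:.3f}'.format(loss_start, loss_end))
print('weights: {}'.format([round(w_j, 2) for w_j in w]))
```

With only three objects that a straight line can separate perfectly, the loss keeps shrinking and the weights keep growing the longer the loop runs, so the numbers printed here show the mechanics of the method rather than meaningful coefficients.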
Case 2. Classification of objects into $+1$ and $-1$
The approach here is the same as for classes $1$ and $0$, but the path to the Logistic Loss function itself will be more ornate. Let's begin. For the likelihood function we use the "if ... then ..." operator. That is, if the $i$-th object belongs to class $+1$, then to compute the sample likelihood we use the probability $p_{i+}$; if the object belongs to class $-1$, then we substitute $(1-p_{i+})$ into the likelihood. This is what the likelihood function looks like (the bracket $[\cdot]$ equals 1 when the condition inside holds and 0 otherwise):
$$P(y \mid X, w) = \prod_{i=1}^{n} p_{i+}^{[y_i=+1]}\,(1-p_{i+})^{[y_i=-1]}$$
Let's explain on our fingers how it works. Consider 4 cases:
1. If $y_i = +1$ and $p_{i+} = 0.9$, then $0.9$ "goes" into the sample likelihood
2. If $y_i = +1$ and $p_{i+} = 0.1$, then $0.1$ "goes" into the sample likelihood
3. If $y_i = -1$ and $p_{i+} = 0.1$, then $1 - 0.1 = 0.9$ "goes" into the sample likelihood
4. If $y_i = -1$ and $p_{i+} = 0.9$, then $1 - 0.9 = 0.1$ "goes" into the sample likelihood
It is obvious that in cases 1 and 3, when the probabilities were determined correctly by the algorithm, the likelihood function is maximized, which is exactly what we wanted. However, this approach is rather cumbersome, and below we will consider a more compact notation. But first, let's take the logarithm of the likelihood function with a change of sign, since we will now be minimizing it.
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\left(-[y_i=+1]\ln p_{i+} - [y_i=-1]\ln(1-p_{i+})\right) \to \min$$
Let's substitute the expression $\frac{1}{1+e^{-w^Tx_i}}$ for $p_{i+}$:
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\left(-[y_i=+1]\ln\frac{1}{1+e^{-w^Tx_i}} - [y_i=-1]\ln\left(1-\frac{1}{1+e^{-w^Tx_i}}\right)\right) \to \min$$
Let's simplify the right-hand term under the logarithm using simple arithmetic and get:
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\left(-[y_i=+1]\ln\frac{1}{1+e^{-w^Tx_i}} - [y_i=-1]\ln\frac{1}{1+e^{w^Tx_i}}\right) \to \min$$
Now it is time to get rid of the "if ... then ..." operator. Note that when an object belongs to class $+1$, then in the expression under the logarithm, in the denominator, $e$ is raised to the power $-w^Tx_i$; if the object belongs to class $-1$, then $e$ is raised to the power $+w^Tx_i$. Therefore the notation of the exponent can be simplified by combining both cases into one: $-y_iw^Tx_i$. Then the logistic error function takes the form:
$$L_{log}(X, y, w) = \sum_{i=1}^{n} -\ln\frac{1}{1+e^{-y_iw^Tx_i}} \to \min$$
By the rules of logarithms, we flip the fraction, which absorbs the "$-$" (minus) sign in front of the logarithm, and get:
$$L_{log}(X, y, w) = \sum_{i=1}^{n}\ln\left(1+e^{-y_iw^Tx_i}\right) \to \min$$
Here is the logistic loss function that is used for a training sample with objects assigned to the classes $+1$ and $-1$.
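The two forms of the loss describe the same quantity under different label conventions, which is easy to confirm numerically. The sketch below is an illustration with invented function names; it uses the values of the linear function for Vasya, Fedya and Lesha quoted earlier (4.24, 1.0 and -0.2) and maps the labels with $y_{\pm1} = 2y_{01} - 1$.

```python
import math

def loss_01(y, z):
    """Logistic Loss term for a label y in {0, 1} and a linear score z = w^T x."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)

def loss_pm1(y, z):
    """Logistic Loss term for a label y in {-1, +1} and a linear score z = w^T x."""
    return math.log(1.0 + math.exp(-y * z))

# scores of Vasya, Fedya and Lesha; Lesha is the one who did not repay
scores = [4.24, 1.0, -0.2]
labels_01 = [1, 1, 0]
labels_pm1 = [2 * y - 1 for y in labels_01]    # maps 0 -> -1 and 1 -> +1

total_01 = sum(loss_01(y, z) for y, z in zip(labels_01, scores))
total_pm1 = sum(loss_pm1(y, z) for y, z in zip(labels_pm1, scores))
print('{:.6f} {:.6f}'.format(total_01, total_pm1))
```

The two totals agree up to floating-point rounding, so the choice between the 0/1 and the -1/+1 form is a matter of notation and convenience, not of the model being fitted.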
Well, at this point I take my leave, and we conclude the article.
Supporting materials
1. Literature
1) Applied Regression Analysis / N. Draper, G. Smith - 2nd ed. - M.: Finance and Statistics, 1986 (translated from English)
2) Probability Theory and Mathematical Statistics / V.E. Gmurman - 9th ed. - M.: Higher School, 2003
3) Probability Theory / N.I. Chernova - Novosibirsk: Novosibirsk State University, 2007
4) Business Analytics: From Data to Knowledge / Paklin N. B., Oreshkov V. I. - 2nd ed. - St. Petersburg: Piter, 2013
5) Data Science from Scratch / Joel Grus - St. Petersburg: BHV Petersburg, 2017
6) Practical Statistics for Data Scientists / P. Bruce, E. Bruce - St. Petersburg: BHV Petersburg, 2018
Source: www.habr.com