In this article we will work through the theory behind transforming the linear regression function into the inverse logit transformation function (also called the logistic response function). Then, using the maximum likelihood method together with the logistic regression model, we will derive the Log Loss function. In other words, we will define the function by which the parameters of the weight vector $w$ are selected in the logistic regression model.
Article outline:
1. Recall the linear relationship between two variables
2. Identify why the linear regression function has to be transformed into the logistic response function
3. Carry out the transformations and derive the logistic response function
4. Try to understand why the least squares method is a poor choice for selecting the parameters $w$ of the Log Loss function
5. Use the maximum likelihood method to derive the functions by which the parameters $w$ are selected:
5.1. Case 1: the Log Loss function for objects with class labels $0$ and $1$:
$$\text{LogLoss} = -\sum_{i=1}^{n}\left[y_i\ln\frac{1}{1+e^{-w^Tx_i}} + (1-y_i)\ln\left(1-\frac{1}{1+e^{-w^Tx_i}}\right)\right]$$
5.2. Case 2: the Log Loss function for objects with class labels $-1$ and $+1$:
$$\text{LogLoss} = \sum_{i=1}^{n}\ln\left(1+e^{-y_iw^Tx_i}\right)$$
The article is full of simple examples in which all calculations are easy to do in your head or on paper; in a few cases you may need a calculator. So get ready :)
This article is mainly intended for data scientists with an entry-level knowledge of the fundamentals of machine learning.
The article also provides code for plotting graphs and for the calculations. All code is written in Python 2.7. A word in advance about the "novelty" of the version used: it is one of the requirements for taking the well-known Yandex course on the equally well-known online learning platform Coursera, and, as you might guess, the material was prepared based on this course.
01. Linear dependence
It is fair to ask: what do linear dependence and logistic regression have to do with each other?
Simple! Logistic regression is one of the models belonging to the family of linear classifiers. In plain terms, the task of a linear classifier is to predict target values $y$ from variables (regressors) $X$. The dependence between the features $X$ and the target values $y$ is assumed to be linear, hence the name of the classifier: linear. Roughly speaking, the logistic regression model rests on the assumption that there is a linear relationship between the features $X$ and the target values $y$. That is the connection.
Our first example is, fittingly, about a straight-line dependence between measured quantities. While preparing this article, I came across an example that has already become something of a cliché: the dependence of current on voltage ("Applied Regression Analysis", N. Draper, H. Smith). We will look at it here as well.
According to Ohm's law:
$$I = \frac{U}{R}$$
where $I$ is the current, $U$ is the voltage and $R$ is the resistance.
If we did not know Ohm's law, we could find the dependence empirically by varying $U$ and measuring $I$ while keeping $R$ fixed. We would then see that the graph of $I$ against $U$ is a more or less straight line through the origin. We say "more or less" because, although the relationship is exact, our measurements may contain small errors, so the points on the graph will not fall exactly on the line but will be scattered randomly around it.
Graph 1 "Dependence of I on U"
Code for drawing the chart:
```python
import random

import matplotlib.pyplot as plt
import numpy as np
# %matplotlib inline  # uncomment when running in a Jupyter notebook

R = 13.75  # resistance, held fixed

# Voltage values U and the "true" current I = U / R given by Ohm's law
x_line = np.arange(0, 220, 1)
y_line = [u / R for u in x_line]

# Simulated measurements: the true current plus small random errors
y_dot = [i + random.uniform(-0.9, 0.9) for i in y_line]

fig, axes = plt.subplots(figsize=(14, 6), dpi=80)
plt.plot(x_line, y_line, color='purple', lw=3, label='I = U/R')
plt.scatter(x_line, y_dot, color='red', label='Actual results')
plt.xlabel('U', size=16)  # voltage on the horizontal axis
plt.ylabel('I', size=16)  # current on the vertical axis
plt.legend(prop={'size': 14})
plt.show()
```
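The empirical route described above (vary $U$, measure $I$, keep $R$ fixed) can be sketched in a few lines of Python 3. This is a minimal illustration, assuming uniform measurement noise as in the chart code and a line that passes through the origin; the helper names `simulate_measurements` and `fit_slope_through_origin` are hypothetical. Least squares recovers the slope $1/R$, and hence $R$, from the noisy points.

```python
import random

R_TRUE = 13.75  # resistance used to simulate the data (same value as in the chart code)

def simulate_measurements(n_points=220, noise=0.9, seed=0):
    """Simulate (U, I) pairs: I = U / R plus uniform measurement noise."""
    rng = random.Random(seed)
    voltages = list(range(n_points))
    currents = [u / R_TRUE + rng.uniform(-noise, noise) for u in voltages]
    return voltages, currents

def fit_slope_through_origin(xs, ys):
    """Least-squares slope k of the model y = k * x (no intercept): k = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

U, I = simulate_measurements()
slope = fit_slope_through_origin(U, I)  # estimate of 1 / R
R_estimated = 1 / slope
print('Estimated R = {:.2f} (true value {})'.format(R_estimated, R_TRUE))
```

Because the line is assumed to pass through the origin, the model has no intercept term, which keeps the estimator to a single ratio of sums.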
02. Why the linear regression equation has to be transformed
Let's look at another example. Suppose we work at a bank and our task is to estimate the probability that a borrower will repay a loan, depending on certain factors. To keep things simple, we will consider only two factors: the borrower's monthly salary and the monthly loan payment.
The task is highly simplified, but this example lets us see why linear regression functions alone are not enough, and find out which transformations the function needs.
Back to the example. Clearly, the higher the salary, the more the borrower can set aside each month to repay the loan. At the same time, within a certain salary range this relationship will be quite linear. For example, take a salary range from 60,000 RUR to 200,000 RUR and assume that within this range the monthly payment depends linearly on the salary. Suppose that for this salary range it was found that the salary-to-payment ratio cannot fall below 3 and that the borrower must still have 5,000 RUR in reserve. Only in this case will we assume that the borrower will repay the loan. Then the linear regression equation takes the form:
$$f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$$
where $w_0 = -5000$, $w_1 = 1$, $w_2 = -3$, $x_{i1}$ is the salary of the $i$-th borrower and $x_{i2}$ is the loan payment of the $i$-th borrower.
By substituting a borrower's salary and loan payment into this equation with fixed parameters $w$, we can decide whether to issue or refuse the loan.
Looking ahead, note that with the given parameters $w$, the linear regression function, when plugged into the logistic response function, will produce large values that get in the way of computing loan repayment probabilities. We therefore propose to shrink the coefficients by, say, a factor of 25,000. This change to the coefficients will not change the decision on issuing the loan. Let's keep this point in mind for later; for now, to make it even clearer what we are talking about, let's consider three potential borrowers.
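A quick way to convince yourself that dividing the coefficients by 25,000 does not change any decision: a positive constant rescales $f(w,x)$ but cannot flip its sign. A minimal Python 3 sketch using the three borrowers from Table 1; the names `W_RAW`, `W_SCALED` and `f_linear` are illustrative, not from the original code.

```python
# Coefficients from the text: reserve of 5,000 RUR, salary must cover 3x the payment
W_RAW = (-5000.0, 1.0, -3.0)
SCALE = 25000.0
W_SCALED = tuple(w / SCALE for w in W_RAW)

def f_linear(w, salary, payment):
    """f(w, x) = w0 + w1 * salary + w2 * payment."""
    return w[0] + w[1] * salary + w[2] * payment

borrowers = {'Vasya': (120000, 3000), 'Fedya': (180000, 50000), 'Lesha': (210000, 70000)}

for name, (salary, payment) in borrowers.items():
    raw = f_linear(W_RAW, salary, payment)
    scaled = f_linear(W_SCALED, salary, payment)
    # Dividing all weights by a positive constant rescales f but keeps its sign
    print('{}: f_raw = {:.0f}, f_scaled = {:.2f}, same decision: {}'.format(
        name, raw, scaled, (raw > 0) == (scaled > 0)))
```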
Table 1 "Potential borrowers"
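Code for generating the table:
```python
import numpy as np
import pandas as pd

# Scale factor: all coefficients are divided by r so that f(w, x) stays small
r = 25000.0
w_0 = -5000.0 / r  # required reserve of 5,000 RUR
w_1 = 1.0 / r      # weight of the salary
w_2 = -3.0 / r     # weight of the payment: salary must be at least 3x the payment

data = {'The borrower': np.array(['Vasya', 'Fedya', 'Lesha']),
        'Salary': np.array([120000, 180000, 210000]),
        'Payment': np.array([3000, 50000, 70000])}
df = pd.DataFrame(data)

# Value of the linear function f(w, x) for each borrower
df['f(w,x)'] = w_0 + df['Salary'] * w_1 + df['Payment'] * w_2

# Approve the loan if f(w, x) > 0, refuse otherwise
df['Decision'] = ['Approved' if f > 0 else 'Refusal' for f in df['f(w,x)']]

df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision']]
```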
According to the table, Vasya, with a salary of 120,000 RUR, wants a loan that he will repay at 3,000 RUR a month. We decided that to approve the loan, Vasya's salary must exceed three times the payment and still leave 5,000 RUR. Vasya meets this requirement: $120000 - 3 \cdot 3000 - 5000 = 106000 > 0$. He even has 106,000 RUR left over. Even though we divided the coefficients $w$ by 25,000 when computing $f(w,x)$, the result is the same: the loan can be approved. Fedya will also get a loan, but Lesha, despite earning the most, will have to curb his appetite.
Let's draw a graph for this case.
Chart 2 "Classification of borrowers"
Code for drawing the graph:
```python
import matplotlib.pyplot as plt
import numpy as np

# Uses w_0, w_1, w_2 and df from the Table 1 code above.
# Boundary line f(w, x) = 0 solved for the payment: payment = (-w_0 - w_1 * salary) / w_2
salary = np.arange(60000, 240000, 20000)
payment = (-w_0 - w_1 * salary) / w_2

approved = df[df['Decision'] == 'Approved']
refused = df[df['Decision'] == 'Refusal']

fig, axes = plt.subplots(figsize=(14, 6), dpi=80)
plt.plot(salary, payment, color='grey', lw=2,
         label='$f(w,x_i)=w_0 + w_1x_{i1} + w_2x_{i2}$')
plt.plot(approved['Salary'], approved['Payment'],
         'o', color='green', markersize=12, label='Decision - Loan approved')
plt.plot(refused['Salary'], refused['Payment'],
         's', color='red', markersize=12, label='Decision - Loan refusal')
plt.xlabel('Salary', size=16)
plt.ylabel('Payment', size=16)
plt.legend(prop={'size': 14})
plt.show()
```
So our straight line, built from the function $f(w,x) = w_0 + w_1x_1 + w_2x_2$, separates "bad" borrowers from "good" ones. Borrowers whose wishes exceed their means lie above the line (Lesha), while those who, according to our model's parameters, can repay the loan lie below it (Vasya and Fedya). In other words, our line divides the borrowers into two classes. Let's label them as follows: borrowers who are likely to repay the loan go into class $+1$, and borrowers who are likely not to repay go into class $0$ or $-1$.
Let's summarize what this simple example tells us. Take a point $(x_1, x_2)$ and substitute its coordinates into the line equation $f(w,x) = w_0 + w_1x_1 + w_2x_2$. There are three options:
- If the point lies below the line and we assign it to class $+1$, then $f(w,x)$ is positive, somewhere between $0$ and $+\infty$. This means we can treat the probability of repayment as lying in $(0.5, 1)$. The larger $f(w,x)$, the higher the probability.
- If the point lies above the line and we assign it to class $0$ or $-1$, then $f(w,x)$ is negative, somewhere between $0$ and $-\infty$. We then take the probability of repayment to lie in $(0, 0.5)$, and the larger the absolute value of $f(w,x)$, the more confident we are.
- The point lies on the line, on the boundary between the two classes. In this case $f(w,x) = 0$ and the probability of repayment equals $0.5$.
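The three cases can be checked directly: compute the point of the boundary line for a given salary and look at the sign of $f(w,x)$. A Python 3 sketch assuming the scaled coefficients from the text; `boundary_payment` and `position` are hypothetical helpers, and `boundary_payment` is the same expression the Chart 2 code uses for the grey line.

```python
# Scaled coefficients from the text (w0, w1, w2 divided by 25,000)
w0, w1, w2 = -5000.0 / 25000, 1.0 / 25000, -3.0 / 25000

def f_linear(salary, payment):
    """f(w, x) = w0 + w1 * salary + w2 * payment."""
    return w0 + w1 * salary + w2 * payment

def boundary_payment(salary):
    """Payment at which the point lies exactly on the line f(w, x) = 0."""
    return (-w0 - w1 * salary) / w2

def position(salary, payment, tol=1e-9):
    """Which of the three cases from the text a point falls into."""
    f = f_linear(salary, payment)
    if abs(f) < tol:
        return 'on the line: f = 0, probability 0.5'
    if f > 0:
        return 'below the line: f > 0, class +1'
    return 'above the line: f < 0, class 0 (or -1)'

salary = 120000
edge = boundary_payment(salary)  # (120000 - 5000) / 3
print(position(salary, 3000))
print(position(salary, edge))
print(position(salary, 60000))
```

The tolerance `tol` is an arbitrary choice that absorbs floating-point rounding for points computed to lie exactly on the line.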
Now imagine that we have not two factors but dozens, and not three borrowers but thousands. Then instead of a straight line we will have an m-dimensional hyperplane, and the coefficients $w$ will not be pulled out of thin air but derived by all the rules, from the accumulated data on borrowers who did or did not repay their loans. Note, too, that right now we are selecting borrowers using coefficients $w$ that are already known. In fact, the task of the logistic regression model is precisely to find the parameters $w$ at which the value of the Log Loss function is minimal. How the vector $w$ is computed we will find out in section 5 of the article. For now, back to the promised land: our banker and his three clients.
Thanks to the function $f(w,x)$, we know who can be given a loan and who must be refused. But you cannot go to the director with that information, because they wanted us to provide the probability that each borrower will repay the loan. What should we do? The answer is simple: we need to somehow transform the function $f(w,x)$, whose values lie in the range $(-\infty, +\infty)$, into a function whose values lie in the range $[0, 1]$. Such a function exists; it is called the logistic response function, or the inverse logit transformation. Meet:
$$p_{i+} = \frac{1}{1+e^{-w^Tx_i}}$$
Let's see, step by step, how the logistic response function is obtained. Note that we will work in the opposite direction: we will assume that we know the probability, which lies in the range from $0$ to $1$, and then "unwind" this value onto the whole number line from $-\infty$ to $+\infty$.
03. Deriving the logistic response function
Step 1. Convert probability values to the range $[0, +\infty)$
While transforming the function $f(w,x)$ into the logistic response function, we will leave our credit analyst alone and visit the bookmakers instead. No, of course we will not be placing bets; all we are interested in there is what an expression like "the odds are 4 to 1" means. Odds, familiar to every bettor, are the ratio of "successes" to "failures". In terms of probability, the odds are the probability that an event occurs divided by the probability that it does not occur. Let's write down the formula for the odds of an event occurring:
$$odds_+ = \frac{p_+}{1-p_+}$$
where $p_+$ is the probability that the event occurs and $1-p_+$ is the probability that the event does NOT occur.
For example, if the probability that the young, strong and frisky horse nicknamed "Veterok" will beat the old, flabby mare named "Matilda" in a race is $0.8$, then Veterok's odds of winning are 4 to 1:
$$odds_+ = \frac{0.8}{1-0.8} = 4$$
Conversely, knowing the odds, we can easily compute the probability $p_+$:
$$\frac{p_+}{1-p_+} = 4 \;\Rightarrow\; p_+ = 4 - 4p_+ \;\Rightarrow\; p_+ = 0.8$$
So we have learned to "translate" probability into odds, which take values from $0$ to $+\infty$. Let's take one more step and learn to "translate" odds onto the whole number line from $-\infty$ to $+\infty$.
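The odds conversion from Step 1 and its inverse fit in two one-line functions. A minimal Python 3 sketch; the function names are illustrative. Odds map $[0, 1)$ onto $[0, +\infty)$, which is exactly the change of range this step is about.

```python
def odds(p):
    """Odds in favour of an event with probability p: p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)

def probability_from_odds(o):
    """Inverse of odds(): solve o = p / (1 - p) for p."""
    return o / (1 + o)

p_veterok = 0.8
o = odds(p_veterok)  # 4 to 1
print('odds = {:.2f}'.format(o))
print('back to probability = {:.2f}'.format(probability_from_odds(o)))
```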
Step 2. Convert odds values to the range $(-\infty, +\infty)$
This step is very simple: take the logarithm of the odds to the base of Euler's number, and we get:
$$f(w,x) = \ln(odds_+) = \ln\frac{p_+}{1-p_+}$$
Now we know that if $p_+ = 0.8$, computing $f(w,x)$ is straightforward and, moreover, the result should be positive: $f(w,x) = \ln\frac{0.8}{0.2} = \ln 4 \approx 1.386$. That is indeed the case.
Out of curiosity, let's check that for $p_+ = 0.2$ we get the expected negative value of $f(w,x)$. Checking: $f(w,x) = \ln\frac{0.2}{0.8} = \ln 0.25 \approx -1.386$. Correct.
Now we know how to map a probability from $0$ to $1$ onto the whole number line from $-\infty$ to $+\infty$. In the next step we will do the opposite.
For now, note that by the rules of logarithms, knowing the value of $f(w,x)$, we can compute the odds:
$$odds_+ = e^{f(w,x)} = e^{w^Tx}$$
This way of obtaining the odds will be useful in the next step.
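Step 2 and the way back to the odds can be checked numerically. A Python 3 sketch; `log_odds` is an illustrative name. The logarithm sends odds above 1 to positive values and odds below 1 to negative values, and `math.exp` undoes it.

```python
import math

def log_odds(p):
    """Step 2: ln(p / (1 - p)) maps (0, 1) onto the whole number line."""
    return math.log(p / (1 - p))

for p in (0.8, 0.5, 0.2):
    f = log_odds(p)
    # exp() undoes the logarithm and returns the odds
    print('p = {:.1f}: f = {:+.3f}, odds = e^f = {:.2f}'.format(p, f, math.exp(f)))
```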
Step 3. Derive the formula for $p_+$
So we have learned, knowing $p_+$, to find the value of $f(w,x)$. In practice, however, we need the opposite: knowing $f(w,x)$, find $p_+$. To do this, let's use the inverse odds function, according to which:
$$p_+ = \frac{odds_+}{1+odds_+}$$
We will not derive this formula in the article, but we will check it with the numbers from the example above. We know that with odds of 4 to 1 ($odds_+ = 4$), the probability of the event is $0.8$ ($p_+ = 0.8$). Substituting: $p_+ = \frac{4}{1+4} = 0.8$. This agrees with our earlier calculations. Moving on.
In the previous step we found that $odds_+ = e^{w^Tx}$, which means we can substitute it into the inverse odds function. We get:
$$p_+ = \frac{e^{w^Tx}}{1+e^{w^Tx}}$$
Divide both the numerator and the denominator by $e^{w^Tx}$:
$$p_+ = \frac{1}{1+e^{-w^Tx}}$$
Just in case, to make sure we have not made a mistake anywhere, let's do one more small check. In Step 2 we found that for $p_+ = 0.8$, $f(w,x) = \ln 4 \approx 1.386$. So, substituting this value into the logistic response function, we expect to get $p_+ = 0.8$. Substituting, we get:
$$p_+ = \frac{1}{1+e^{-1.386}} \approx \frac{1}{1+0.25} = 0.8$$
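The same check in Python 3, extended to confirm that the two forms of the formula (before and after dividing the numerator and denominator by $e^{w^Tx}$) agree for a range of inputs. A minimal sketch; `sigmoid_from_odds` is an illustrative name for the intermediate form.

```python
import math

def sigmoid(f):
    """Logistic response function 1 / (1 + e^(-f))."""
    return 1 / (1 + math.exp(-f))

def sigmoid_from_odds(f):
    """Same value written as odds / (1 + odds) with odds = e^f."""
    return math.exp(f) / (1 + math.exp(f))

f = math.log(4)  # the value obtained in Step 2 for p = 0.8
print(sigmoid(f))            # ≈ 0.8
print(sigmoid_from_odds(f))  # ≈ 0.8
```

For large positive `f`, the `sigmoid` form is also the safer one numerically, since `math.exp(f)` in the other form can overflow.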
Congratulations, dear reader: we have just derived and checked the logistic response function. Let's look at its graph.
Graph 3 "Logistic response function"
Code for drawing the graph:
```python
import math

import matplotlib.pyplot as plt
import numpy as np

def logit(f):
    """Logistic response function (inverse logit, sigmoid): maps f(w, x) to a probability."""
    return 1 / (1 + math.exp(-f))

f = np.arange(-7, 7, 0.05)
p = [logit(i) for i in f]

fig, axes = plt.subplots(figsize=(14, 6), dpi=80)
plt.plot(f, p, color='grey', label='$ 1 / (1+e^{-w^Tx_i})$')
plt.xlabel('$f(w,x_i) = w^Tx_i$', size=16)
plt.ylabel('$p_{i+}$', size=16)
plt.legend(prop={'size': 14})
plt.show()
```
In the literature this function is also called the sigmoid function. The graph clearly shows that the main change in the probability of an object belonging to class $+1$ happens within a relatively narrow range of $f(w,x)$ around zero, roughly from $-5$ to $+5$.
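To see the narrow transition zone in numbers rather than on the plot, evaluate the sigmoid at a few points (Python 3 sketch). Outside roughly $[-5, 5]$ the probability is already within about 0.007 of 0 or 1, and the curve is symmetric: $\sigma(-f) = 1 - \sigma(f)$.

```python
import math

def sigmoid(f):
    """Logistic response function 1 / (1 + e^(-f))."""
    return 1 / (1 + math.exp(-f))

for f in (-7, -5, -2, 0, 2, 5, 7):
    print('f = {:+d}: p = {:.4f}'.format(f, sigmoid(f)))
```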
I suggest we go back to our credit analyst and help him calculate the loan repayment probabilities; otherwise he risks being left without a bonus :)
Table 2 "Potential borrowers"
Code for generating the table:
```python
# Uses df (Table 1 code) and logit() (Graph 3 code) defined above
df['Probability'] = [round(logit(f), 2) for f in df['f(w,x)']]
df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision', 'Probability']]
```
So we have estimated the loan repayment probabilities. Overall, they look plausible.
Indeed, the probability that Vasya, with a salary of 120,000 RUR, will be able to pay the bank 3,000 RUR every month is close to 100%. By the way, note that the bank could still lend to Lesha if its policy allows, for example, lending to clients whose probability of repayment is above, say, 0.3. It is just that in that case the bank would set aside a larger reserve for possible losses.
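A bank policy such as "lend if the repayment probability exceeds 0.3" is just a different threshold on $p$. A Python 3 sketch, assuming the rounded probabilities that the Table 2 code produces (0.99, 0.73 and 0.45, i.e. the sigmoid of 4.24, 1.0 and -0.2); `decisions` is a hypothetical helper.

```python
# Probabilities from Table 2 (rounded to two decimals)
probabilities = {'Vasya': 0.99, 'Fedya': 0.73, 'Lesha': 0.45}

def decisions(probs, threshold):
    """Approve every borrower whose repayment probability exceeds the bank's threshold."""
    return {name: ('Approved' if p > threshold else 'Refusal') for name, p in probs.items()}

print(decisions(probabilities, 0.5))  # f(w, x) > 0 is the same as p > 0.5
print(decisions(probabilities, 0.3))  # a looser policy, as in the example with Lesha
```

A threshold of 0.5 reproduces the decisions of Table 1, because $\sigma(f) > 0.5$ exactly when $f > 0$.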
We should also note that the salary-to-payment ratio of at least 3 and the 5,000 RUR margin were taken out of thin air. That is why we could not use the weight vector in its original form, $w = (-5000, 1, -3)$. We had to shrink the coefficients considerably, dividing each of them by 25,000, which in effect means we tuned the result by hand. But this was done deliberately, to make the material easier to grasp at the start. In practice we will not invent and tune the coefficients; we will find them. In the following sections of the article we will derive the equations by which the parameters $w$ are selected.
04. The least squares method for determining the weight vector $w$ in the logistic response function
We already know one method for selecting the weight vector $w$: the least squares method (LSM). So why not use it for binary classification? Indeed, nothing prevents you from using LSM; it is just that in classification problems it gives less accurate results than Log Loss. There is a theoretical basis for this. Let's start with a simple example.
Suppose our models (one using MSE and one using Log Loss) have already started selecting the weight vector $w$, and we stopped the computation at some step. It does not matter whether this is in the middle, at the end or at the beginning; what matters is that we already have some values of the weight vector, and let's assume that at this step the weight vector $w$ is the same for both models. Now take the resulting weights and substitute them into the logistic response function $\frac{1}{1+e^{-w^Tx_i}}$ for some object that belongs to class $+1$. We examine two cases: with the chosen weight vector our model is badly wrong, and, conversely, the model is very confident that the object belongs to class $+1$. Let's see what penalties LSM and Log Loss assign.
Code for calculating the penalties depending on the loss function used:
```python
import math

# class of the object
y = 1

# probability of assigning the object to class +1 with parameters w (a gross error)
proba_1 = 0.01
MSE_1 = (y - proba_1) ** 2
print('MSE penalty for a gross error = {}'.format(MSE_1))

# f(w, x) for a known probability of class +1: f(w, x) = ln(odds+)
def f_w_x(proba):
    return math.log(proba / (1 - proba))

LogLoss_1 = math.log(1 + math.exp(-y * f_w_x(proba_1)))
print('Log Loss penalty for a gross error = {}'.format(LogLoss_1))

# the model is strongly (and correctly) confident
proba_2 = 0.99
MSE_2 = (y - proba_2) ** 2
LogLoss_2 = math.log(1 + math.exp(-y * f_w_x(proba_2)))
print('*' * 62)
print('MSE penalty for strong confidence = {}'.format(MSE_2))
print('Log Loss penalty for strong confidence = {}'.format(LogLoss_2))
```
Case of a gross error: the model assigns the object to class $+1$ with probability $0.01$
Penalty with LSM: $(1 - 0.01)^2 = 0.9801$
Penalty with Log Loss: $\ln\left(1+e^{-\ln\frac{0.01}{0.99}}\right) = \ln 100 \approx 4.605$
Case of strong confidence: the model assigns the object to class $+1$ with probability $0.99$
Penalty with LSM: $(1 - 0.99)^2 = 0.0001$
Penalty with Log Loss: $\ln\left(1+e^{-\ln\frac{0.99}{0.01}}\right) = \ln\frac{100}{99} \approx 0.01$
This example clearly shows that in the case of a gross error, the Log Loss function penalizes the model much more heavily than MSE. Now let's look at the theoretical background for using Log Loss in classification problems.
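The two penalties can be compared over a range of predicted probabilities for an object of class $+1$. This Python 3 sketch uses the $0/1$ form of Log Loss, $-[y\ln p + (1-y)\ln(1-p)]$; for $y = 1$ it equals $\ln(1+e^{-f(w,x)})$ used in the code above. The squared error never exceeds 1, while Log Loss grows without bound as $p \to 0$.

```python
import math

def mse_penalty(y, p):
    """Squared error between the label y (0 or 1) and the predicted probability p."""
    return (y - p) ** 2

def log_loss_penalty(y, p):
    """Log Loss for one object with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# The object belongs to class +1 (y = 1); p is the predicted probability of class +1
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print('p = {:.2f}: MSE = {:.4f}, Log Loss = {:.4f}'.format(
        p, mse_penalty(1, p), log_loss_penalty(1, p)))
```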
05. The maximum likelihood method and logistic regression
As promised at the beginning, the article is full of simple examples. Here is another one, with some old acquaintances: the bank's borrowers Vasya, Fedya and Lesha.
Just in case, before we develop the example, let me remind you that in practice we deal with training samples of thousands or millions of objects with tens or hundreds of features. Here, however, the numbers are chosen so that they fit easily in the head of a novice data scientist.
Back to the example. Suppose the bank director decided to give a loan to everyone who asked, even though the algorithm advised against lending to Lesha. Enough time has now passed, and we know which of our three heroes repaid the loan and which did not. As expected, Vasya and Fedya repaid, but Lesha did not. Now imagine that this outcome is our new training sample and that, at the same time, all the data on the factors that affect the probability of repayment (the borrower's salary and the size of the monthly payment) has disappeared. Then, intuitively, we can assume that one borrower in three does not repay the bank, or in other words, that the probability that the next borrower repays is $p = \frac{2}{3}$. This intuitive assumption has a theoretical justification and rests on the maximum likelihood method, often called the maximum likelihood principle in the literature.
First, let's get acquainted with the concepts.
The sample likelihood is the probability of obtaining exactly this sample, that is, exactly these observations or outcomes. It equals the product of the probabilities of each outcome in the sample (for example, that Vasya's, Fedya's and Lesha's loans were or were not repaid, all at once).
The likelihood function links the sample likelihood to the values of the distribution parameters.
In our case, the training sample is a generalized Bernoulli scheme in which the random variable takes only two values: $1$ or $0$. So the sample likelihood can be written as a likelihood function of the parameter $p$ as follows:
$$L(p) = \prod_{i=1}^{3} p^{y_i}(1-p)^{1-y_i} = p \cdot p \cdot (1-p) = p^2(1-p)$$
This expression can be interpreted as follows. The joint probability that Vasya and Fedya repay their loans is $p \cdot p = p^2$, and the probability that Lesha does NOT repay is $1-p$ (since non-repayment is what happened), so the joint probability of all three events is $p^2(1-p)$.
The maximum likelihood method estimates an unknown parameter by maximizing the likelihood function. In our case, we need to find the value of $p$ at which $L(p)$ reaches its maximum.
Where does the idea come from, to look for the value of an unknown parameter at which the likelihood function reaches its maximum? It stems from the view that the sample is the only source of knowledge about the population available to us. Everything we know about the population is represented in the sample. So all we can say is that the sample is the most accurate reflection of the population available to us. Therefore, we need to find the parameter value at which the available sample becomes the most probable.
Clearly, we are dealing with an optimization problem in which we need to find the extremum point of a function. To find it, we use the first-order condition: set the derivative of the function to zero and solve the equation for the desired parameter. However, differentiating a product of many factors can be tedious. To avoid this, there is a special trick: switch to the logarithm of the likelihood function. Why is this allowed? Note that we are not looking for the extremum of the function itself but for the extremum point, that is, the value of the unknown parameter $p$ at which $L(p)$ reaches its maximum. Taking the logarithm does not move the extremum point (although the value at the extremum changes), because the logarithm is a monotonic function.
Following the reasoning above, let's continue our example with the loans of Vasya, Fedya and Lesha. First, let's move to the logarithm of the likelihood function:
$$\ln L(p) = \ln\left(p^2(1-p)\right) = 2\ln p + \ln(1-p)$$
Now we can easily differentiate the expression with respect to $p$:
$$\frac{\partial \ln L(p)}{\partial p} = \frac{2}{p} - \frac{1}{1-p}$$
And finally, apply the first-order condition by setting the derivative to zero:
$$\frac{2}{p} - \frac{1}{1-p} = 0 \;\Rightarrow\; 2(1-p) = p \;\Rightarrow\; p = \frac{2}{3}$$
Thus our intuitive estimate of the loan repayment probability, $p = \frac{2}{3}$, has been justified theoretically.
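The result can also be checked numerically with a Python 3 sketch: a grid search over $p$ finds the same maximum for $L(p)$ and for $\ln L(p)$, which illustrates that the logarithm does not move the extremum point, and the derivative vanishes at $p = 2/3$. The grid step of 0.001 is an arbitrary choice.

```python
import math

def likelihood(p):
    """L(p) = p^2 (1 - p): Vasya and Fedya repaid, Lesha did not."""
    return p ** 2 * (1 - p)

def log_likelihood(p):
    """ln L(p) = 2 ln p + ln(1 - p)."""
    return 2 * math.log(p) + math.log(1 - p)

def d_log_likelihood(p):
    """Derivative of ln L(p): 2 / p - 1 / (1 - p)."""
    return 2 / p - 1 / (1 - p)

grid = [i / 1000 for i in range(1, 1000)]  # p from 0.001 to 0.999
p_best = max(grid, key=likelihood)
p_best_log = max(grid, key=log_likelihood)  # the logarithm does not move the maximum

print(p_best, p_best_log)         # both ≈ 0.667
print(d_log_likelihood(2 / 3))    # ≈ 0
print(likelihood(2 / 3))          # 4/27 ≈ 0.148
```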
Great, but what do we do with this knowledge now? If we assume that one borrower in three does not return the money, the bank will inevitably go bust. True, but when we estimated the repayment probability as $p = \frac{2}{3}$, we ignored the factors that affect repayment: the borrower's salary and the size of the monthly payment. Recall that earlier we computed each client's probability of repayment taking exactly these factors into account. It is only natural that we obtained probabilities that differ from the constant $p = \frac{2}{3}$.
Let's compute the sample likelihood in both cases:
Code for calculating the sample likelihood:
```python
from functools import reduce

def likelihood(y, p):
    """Sample likelihood: the product of p_i^y_i * (1 - p_i)^(1 - y_i) over all objects."""
    line_true_proba = [p[i] ** y[i] * (1 - p[i]) ** (1 - y[i]) for i in range(len(y))]
    return reduce(lambda a, b: a * b, line_true_proba)

# Observed outcomes: Vasya and Fedya repaid (1), Lesha did not (0)
y = [1.0, 1.0, 0.0]

# Probabilities from the logistic response function (Table 2; uses df from the code above)
p_log_response = df['Probability']

# The same constant probability p = 2/3 for every borrower
const = 2.0 / 3.0
p_const = [const, const, const]

print('Sample likelihood with constant p = 2/3: {}'.format(round(likelihood(y, p_const), 3)))
print('*' * 100)
print('Sample likelihood with computed p: {}'.format(round(likelihood(y, p_log_response), 3)))
```
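Sample likelihood with the constant probability $p = \frac{2}{3}$:
$$L = \left(\frac{2}{3}\right)^2 \cdot \frac{1}{3} = \frac{4}{27} \approx 0.148$$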
Sample likelihood when the repayment probability is calculated from the factors:
$$L = 0.99 \cdot 0.73 \cdot (1 - 0.45) \approx 0.397$$
The sample likelihood with probabilities calculated from the factors turned out to be higher than with the constant probability. What does this mean? It suggests that knowing the factors allowed us to estimate each client's repayment probability more accurately. Therefore, when issuing the next loan, it would be more appropriate to use the model proposed at the end of section 3 of the article to assess the probability of repayment.
But if we want to maximize the sample likelihood, why not use some algorithm that outputs probabilities for Vasya, Fedya and Lesha of, say, 0.99, 0.99 and 0.01 respectively? Such an algorithm might well perform nicely on the training sample, since it would bring the sample likelihood close to 1. But, first, it would most likely generalize poorly, and second, it would certainly not be linear. Since ways of fighting overfitting (that is, equally poor generalization) are clearly outside the scope of this article, let's look at the second point in more detail. To do so, just answer a simple question: given the factors we know, can the probabilities that Vasya and Fedya repay their loans be the same? Common sense says no, they cannot. Vasya would spend 2.5% of his salary each month on the loan, while Fedya would spend almost 27.8%. In Chart 2 we also see that Vasya is much farther from the line separating the classes than Fedya. Finally, we know that $f(w,x)$ takes different values for them: 4.24 for Vasya and 1.0 for Fedya. Only if Fedya, say, earned an order of magnitude more or asked for a smaller loan would the repayment probabilities of Vasya and Fedya become similar. In other words, a linear dependence cannot be fooled. And if we had actually computed the coefficients $w$ instead of pulling them out of thin air, we could safely say that our values of $w$ give the best estimate of each borrower's repayment probability. But since we agreed to assume that the coefficients were determined by all the rules, let's assume just that: our coefficients give a better estimate of the probability :)
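The numbers used in this argument are easy to reproduce in Python 3: the share of salary each borrower would spend on the monthly payment, and the likelihood that the hypothetical "memorizing" algorithm with probabilities 0.99, 0.99 and 0.01 would achieve on the training sample. The variable names are illustrative.

```python
# Share of salary each borrower would spend on the monthly payment
borrowers = {'Vasya': (120000, 3000), 'Fedya': (180000, 50000)}
for name, (salary, payment) in borrowers.items():
    print('{}: {:.1%} of salary'.format(name, payment / salary))

# Likelihood of the observed outcomes (1, 1, 0) under the hypothetical "memorizing" probabilities
y = [1, 1, 0]
p_memorized = [0.99, 0.99, 0.01]
L = 1.0
for y_i, p_i in zip(y, p_memorized):
    L *= p_i ** y_i * (1 - p_i) ** (1 - y_i)
print('likelihood = {:.4f}'.format(L))  # close to 1, far above 4/27 for the constant model
```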
But we digress. In this section we need to understand how the weight vector $w$ is determined, since we need it to estimate each borrower's probability of repaying the loan.
Let's briefly summarize the arsenal we will use to find the coefficients $w$:
1. We assume that the relationship between the target variable (the predicted value) and the factors affecting the result is linear. For this reason we use a linear regression function of the form $f(w,x_i) = w^Tx_i$, whose line divides objects (clients) into class $+1$ and class $0$ or $-1$ (clients who can repay the loan and those who cannot). In our case the equation has the form $f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$.
2. We use the inverse logit function of the form $p_{i+} = \frac{1}{1+e^{-w^Tx_i}}$ to determine the probability that an object belongs to class $+1$.
3. We treat our training set as a realization of the generalized Bernoulli scheme: for each object a random variable is generated that takes the value 1 with probability $p_{i+}$ (its own for each object) and the value 0 with probability $1-p_{i+}$.
4. We know that we need to maximize the sample likelihood, taking the chosen factors into account, so that the available sample becomes the most plausible. In other words, we need to choose the parameters at which the sample is most plausible. In our case the parameter being chosen is the probability of loan repayment $p_{i+}$, which in turn depends on the unknown coefficients $w$. So we need to find the weight vector $w$ at which the sample likelihood is maximal.
5. We know that the sample likelihood can be maximized with the maximum likelihood method. And we know all the clever tricks for working with this method.
Quite a multi-step combination, isn't it :)
Now recall that at the very beginning of the article we set out to derive two forms of the Log Loss function, depending on how the object classes are labelled. In two-class classification problems the classes are usually labelled either $0$ and $1$ or $-1$ and $+1$. Depending on the notation, the derivation yields the corresponding loss function.
Case 1. Classifying objects into $0$ and $1$
Earlier, when we computed the sample likelihood with the probability of repayment by the $i$-th borrower calculated from the factors $x_i$ and the given coefficients $w$, we used the formula:
$$L(P) = \prod_{i=1}^{n} p_{i+}^{\,y_i}(1-p_{i+})^{1-y_i}$$
In fact, $p_{i+}$ is simply the value of the logistic response function for the given weight vector $w$:
$$p_{i+} = \frac{1}{1+e^{-w^Tx_i}}$$
So nothing prevents us from writing the sample likelihood function as follows:
$$L(w) = \prod_{i=1}^{n}\left(\frac{1}{1+e^{-w^Tx_i}}\right)^{y_i}\left(1-\frac{1}{1+e^{-w^Tx_i}}\right)^{1-y_i}$$
Some novice analysts find it hard to see at once how this function works. Let's look at four short examples that make things clear:
1. If $y_i = 1$ (that is, according to the training sample the object belongs to class $+1$) and our algorithm estimates the probability of assigning the object to class $+1$ as $p_{i+} = 0.9$, then this piece of the sample likelihood is computed as follows: $0.9^1 \cdot (1-0.9)^{1-1} = 0.9$
2. If $y_i = 1$ and $p_{i+} = 0.1$, the calculation is: $0.1^1 \cdot (1-0.1)^{1-1} = 0.1$
3. If $y_i = 0$ and $p_{i+} = 0.1$, the calculation is: $0.1^0 \cdot (1-0.1)^{1-0} = 0.9$
4. If $y_i = 0$ and $p_{i+} = 0.9$, the calculation is: $0.9^0 \cdot (1-0.9)^{1-0} = 0.1$
Clearly, the likelihood function is maximized in cases 1 and 3, or in general, when the probabilities of assigning objects to class $+1$ are estimated correctly.
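The four cases can be reproduced with a one-line function for a single object's contribution to $L(w)$ with labels in $\{0, 1\}$. A Python 3 sketch with $p_{i+}$ equal to 0.9 or 0.1; `likelihood_piece` is an illustrative name.

```python
def likelihood_piece(y, p):
    """Contribution of one object to L(w) for labels y in {0, 1}: p^y * (1 - p)^(1 - y)."""
    return p ** y * (1 - p) ** (1 - y)

examples = [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]  # (y_i, p_i+) for cases 1-4
for case, (y, p) in enumerate(examples, start=1):
    print('case {}: y = {}, p = {} -> {:.1f}'.format(case, y, p, likelihood_piece(y, p)))
```

The exponents act as a switch: when $y_i = 1$ only $p_{i+}$ survives, and when $y_i = 0$ only $1 - p_{i+}$ does.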
Since the only unknowns in the probability of assigning an object to class $+1$ are the coefficients $w$, these are what we will look for. As mentioned above, this is an optimization problem in which we first need to find the derivative of the likelihood function with respect to the weight vector $w$. But first it makes sense to make the task easier: we will look for the derivative of the logarithm of the likelihood function:
$$-\ln L(w) = -\sum_{i=1}^{n}\left[y_i\ln\frac{1}{1+e^{-w^Tx_i}} + (1-y_i)\ln\left(1-\frac{1}{1+e^{-w^Tx_i}}\right)\right]$$
Why did we change the sign from $+$ to $-$ after taking the logarithm of the likelihood function? It is simple: since in model-quality problems it is customary to minimize the value of a function, we multiplied the right-hand side of the expression by $-1$, and accordingly, instead of maximizing the function, we now minimize it.
In fact, right before your eyes, we have just painstakingly derived the loss function: Log Loss for a training set with two classes, $0$ and $1$.
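The formula just derived translates directly into code. A minimal pure-Python 3 sketch, assuming each $x_i$ starts with a 1 so that $w_0$ acts as the intercept; the function names are illustrative. With the three borrowers, their actual outcomes and the scaled coefficients from section 02, it equals minus the logarithm of the sample likelihood computed with unrounded probabilities.

```python
import math

def sigmoid(z):
    """Logistic response function 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

def dot(w, x):
    """Scalar product w^T x."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))

def log_loss_01(w, X, y):
    """-ln L(w) for labels y_i in {0, 1}: -sum(y_i ln p_i + (1 - y_i) ln(1 - p_i))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p_i = sigmoid(dot(w, x_i))
        total -= y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return total

# Scaled coefficients from section 02; each x_i starts with 1 for the intercept w_0
w_scaled = [-5000.0 / 25000, 1.0 / 25000, -3.0 / 25000]
X = [[1, 120000, 3000],   # Vasya
     [1, 180000, 50000],  # Fedya
     [1, 210000, 70000]]  # Lesha
y = [1, 1, 0]             # Vasya and Fedya repaid, Lesha did not

print('Log Loss = {:.3f}'.format(log_loss_01(w_scaled, X, y)))  # ≈ 0.926
```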
Now, to find the coefficients, all we need is to take the derivative of the log-likelihood function and then use numerical optimization methods, such as gradient descent or stochastic gradient descent, to select the optimal coefficients $w$. However, given the already considerable length of the article, you are invited to carry out the differentiation yourself; or perhaps this will be the topic of a follow-up article with plenty of arithmetic but without such detailed examples.
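For reference, a standard result is that the gradient of this Log Loss with respect to $w$ is $\sum_{i}\left(\sigma(w^Tx_i) - y_i\right)x_i$, where $\sigma$ is the logistic response function. Below is a minimal Python 3 sketch of plain batch gradient descent built on that gradient. It uses a hypothetical toy dataset rather than the bank data: the three borrowers are linearly separable, so the weights would grow without bound. The learning rate and the number of steps are arbitrary illustrative choices.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_loss_01(w, X, y):
    """-ln L(w) for labels y_i in {0, 1}."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p_i = sigmoid(sum(a * b for a, b in zip(w, x_i)))
        total -= y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return total

def gradient(w, X, y):
    """Gradient of the Log Loss: sum over i of (sigmoid(w^T x_i) - y_i) * x_i."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        error = sigmoid(sum(a * b for a, b in zip(w, x_i))) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += error * x_ij
    return grad

def gradient_descent(X, y, lr=0.1, n_steps=1000):
    """Plain batch gradient descent starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(n_steps):
        g = gradient(w, X, y)
        w = [w_j - lr * g_j for w_j, g_j in zip(w, g)]
    return w

# Hypothetical toy data (not separable): a leading 1 for the intercept and one feature
X = [[1, -2.0], [1, -1.0], [1, -0.5], [1, 0.5], [1, 1.0], [1, 2.0]]
y = [0, 0, 1, 0, 1, 1]

w_fit = gradient_descent(X, y)
print('w = {}'.format(w_fit))
print('Log Loss: {:.4f} -> {:.4f}'.format(log_loss_01([0.0, 0.0], X, y), log_loss_01(w_fit, X, y)))
```

Stochastic gradient descent would differ only in using one object (or a small batch) per update instead of the full sum.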
Case 2. Classifying objects into $-1$ and $+1$
The approach here is the same as for classes $0$ and $1$, but the path to the Log Loss function itself is more elegant. Let's begin. For the likelihood function we will use an "if... then..." operator: if the $i$-th object belongs to class $+1$, we use the probability $p_{i+}$ when computing the sample likelihood; if it belongs to class $-1$, we substitute $1-p_{i+}$ instead. This is what the likelihood function looks like:
$$L(w) = \prod_{i=1}^{n}\begin{cases} p_{i+}, & \text{if } y_i = +1\\ 1-p_{i+}, & \text{if } y_i = -1\end{cases}$$
Let's explain how it works in the simplest terms. Consider four cases:
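1. If $y_i = +1$ and $p_{i+} = 0.9$, the sample likelihood "takes" $0.9$
2. If $y_i = +1$ and $p_{i+} = 0.1$, the sample likelihood "takes" $0.1$
3. If $y_i = -1$ and $p_{i+} = 0.1$, the sample likelihood "takes" $1 - 0.1 = 0.9$
4. If $y_i = -1$ and $p_{i+} = 0.9$, the sample likelihood "takes" $1 - 0.9 = 0.1$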
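The "if... then..." form translates literally into a Python 3 function. A sketch with $p_{i+}$ equal to 0.9 or 0.1; `likelihood_piece_pm` is an illustrative name for one object's contribution to $L(w)$ with labels in $\{-1, +1\}$.

```python
def likelihood_piece_pm(y, p):
    """Contribution of one object to L(w) for labels y in {-1, +1}: p if y = +1, else 1 - p."""
    if y == +1:
        return p
    return 1 - p

examples = [(+1, 0.9), (+1, 0.1), (-1, 0.1), (-1, 0.9)]  # (y_i, p_i+) for cases 1-4
for case, (y, p) in enumerate(examples, start=1):
    print('case {}: y = {:+d}, p = {} -> {:.1f}'.format(case, y, p, likelihood_piece_pm(y, p)))
```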
Clearly, in cases 1 and 3, where the algorithm estimated the probabilities correctly, the likelihood function is maximized, which is exactly what we wanted. However, this notation is rather cumbersome, so below we will work toward a more compact one. But first, let's take the logarithm of the likelihood function and change its sign, since we are now going to minimize it:
$$-\ln L(w) = -\sum_{i=1}^{n}\begin{cases}\ln p_{i+}, & \text{if } y_i = +1\\ \ln(1-p_{i+}), & \text{if } y_i = -1\end{cases}$$
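Let's substitute the expression $p_{i+} = \frac{1}{1+e^{-w^Tx_i}}$:
$$-\ln L(w) = -\sum_{i=1}^{n}\begin{cases}\ln\dfrac{1}{1+e^{-w^Tx_i}}, & \text{if } y_i = +1\\ \ln\left(1-\dfrac{1}{1+e^{-w^Tx_i}}\right), & \text{if } y_i = -1\end{cases}$$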
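Simplify the lower expression under the logarithm with some basic arithmetic:
$$1-\frac{1}{1+e^{-w^Tx_i}} = \frac{e^{-w^Tx_i}}{1+e^{-w^Tx_i}} = \frac{1}{1+e^{w^Tx_i}}$$
and we get:
$$-\ln L(w) = -\sum_{i=1}^{n}\begin{cases}\ln\dfrac{1}{1+e^{-w^Tx_i}}, & \text{if } y_i = +1\\ \ln\dfrac{1}{1+e^{w^Tx_i}}, & \text{if } y_i = -1\end{cases}$$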
Now it is time to get rid of the "if... then..." operator. Note that when the object $x_i$ belongs to class $+1$, the $e$ in the denominator of the expression under the logarithm is raised to the power $-w^Tx_i$, and when the object belongs to class $-1$, $e$ is raised to the power $w^Tx_i$. So the exponent can be written more simply by combining both cases into one: $-y_iw^Tx_i$. The log-likelihood function then takes the form:
$$-\ln L(w) = -\sum_{i=1}^{n}\ln\frac{1}{1+e^{-y_iw^Tx_i}}$$
Using the rules of logarithms, we flip the fraction and move the "$-$" sign outside the logarithm, which gives:
$$-\ln L(w) = \sum_{i=1}^{n}\ln\left(1+e^{-y_iw^Tx_i}\right)$$
And here is the logistic loss function used for a training set whose objects are assigned to the classes $-1$ and $+1$.
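The two loss functions describe the same model with different label coding, so recoding $y \in \{0, 1\}$ as $y' = 2y - 1 \in \{-1, +1\}$ must give the same value. A quick Python 3 check, assuming the values $f(w,x) = 4.24$, $1.0$ and $-0.2$ of the three borrowers from Table 1 and their actual outcomes; the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_loss_labels_01(f_values, y01):
    """Case 1: labels in {0, 1}."""
    return -sum(y * math.log(sigmoid(f)) + (1 - y) * math.log(1 - sigmoid(f))
                for f, y in zip(f_values, y01))

def log_loss_labels_pm(f_values, y_pm):
    """Case 2: labels in {-1, +1}: sum of ln(1 + e^(-y * f))."""
    return sum(math.log(1 + math.exp(-y * f)) for f, y in zip(f_values, y_pm))

f_values = [4.24, 1.0, -0.2]     # f(w, x) for Vasya, Fedya and Lesha
y01 = [1, 1, 0]                  # repaid / repaid / did not repay
y_pm = [2 * y - 1 for y in y01]  # the same labels recoded as +1 / -1

print(log_loss_labels_01(f_values, y01), log_loss_labels_pm(f_values, y_pm))  # the two values coincide
```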
Well, at this point I will take my leave and wrap up the article.
Useful materials
1. Literature
1) Applied Regression Analysis / N. Draper, H. Smith. 2nd ed. Moscow: Finance and Statistics, 1986 (translated from English)
2) Probability Theory and Mathematical Statistics / V.E. Gmurman. 9th ed. Moscow: Vysshaya Shkola, 2003
3) Probability Theory / N.I. Chernova. Novosibirsk: Novosibirsk State University, 2007
4) Business Analytics: From Data to Knowledge / N.B. Paklin, V.I. Oreshkov. 2nd ed. St. Petersburg: Piter, 2013
5) Data Science from Scratch / Joel Grus. St. Petersburg: BHV-Petersburg, 2017
6) Practical Statistics for Data Scientists / P. Bruce, A. Bruce. St. Petersburg: BHV-Petersburg, 2018
2. Lectures and courses (video)
1)
2)
3)
4)
5)
3. Internet resources
1)
2)
3)
4)
6)
7)
Source: www.habr.com