In this article we will work through the theory behind transforming the linear regression function into the inverse logit transformation function (also called the logistic response function). Then, using the maximum likelihood method and following the logistic regression model, we will derive the Logistic Loss function. In other words, we will define the function that the logistic regression model uses to choose the parameters of the weight vector $\vec{w}$.
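Article outline:
1. Recap the linear relationship between two variables
2. Show why the linear regression function $f(w,x_i) = w^Tx_i$ has to be transformed into the logistic response function $\sigma(w,x_i) = \frac{1}{1+e^{-w^Tx_i}}$
3. Carry out the transformation and derive the logistic response function
4. Try to understand why the least squares method is a poor choice for fitting the parameters $\vec{w}$ of the Logistic Loss function
5. Use the maximum likelihood method to derive the function for choosing the parameters $\vec{w}$:
5.1. Case 1: the Logistic Loss function for objects with class labels $0$ and $1$: $L_{log}(X,y,w) = -\sum_{i=1}^{n}\left(y_i\ln\sigma(w,x_i) + (1-y_i)\ln\left(1-\sigma(w,x_i)\right)\right)$
5.2. Case 2: the Logistic Loss function for objects with class labels $-1$ and $+1$: $L_{log}(X,y,w) = \sum_{i=1}^{n}\ln\left(1+e^{-y_iw^Tx_i}\right)$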
The article is full of simple examples. All of the calculations are easy to do in your head or on paper, although a calculator may help in a few places. So get ready :)
This article is intended primarily for data scientists with a basic knowledge of machine learning fundamentals.
The article also includes code for drawing the charts and doing the calculations. All the code is written in Python 2.7. I should explain the "novelty" of this version up front. Python 2.7 is one of the requirements of a well-known Yandex course on the equally well-known online learning platform Coursera, and, as you might guess, the material was prepared based on that course.
01. Linear dependence
It is perfectly reasonable to ask: what does linear dependence have to do with logistic regression?
It's simple! Logistic regression is one of the models that belong to the family of linear classifiers. In simple terms, a linear classifier predicts target values $y$ from variables (regressors) $X$. The dependence between the features $X$ and the target values $y$ is assumed to be linear, which is where the name "linear classifier" comes from. Put very roughly, the logistic regression model rests on the assumption that there is a linear relationship between the features $X$ and the target values $y$. That is the connection.
Let's start with the first example, which is, fittingly, about a straight-line dependence between the quantities being studied. While preparing the article, I came across an example that many readers are already tired of: the dependence of current on voltage ("Applied Regression Analysis", N. Draper, H. Smith). We will look at it here as well.
According to Ohm's law:
$$I = \frac{U}{R}$$
where $I$ is the current, $U$ is the voltage and $R$ is the resistance.
If we did not know Ohm's law, we could find the dependence empirically by varying $U$ and measuring $I$ while keeping $R$ fixed. We would then see that the graph of $I$ against $U$ is a more or less straight line through the origin. We say "more or less" because the relationship itself is exact, but our measurements may contain small errors. As a result, the points on the graph may not fall exactly on the line and will instead be scattered randomly around it.
Graph 1 "Dependence of $I$ on $U$"
Code for drawing the chart
import matplotlib.pyplot as plt
# Jupyter magic: remove the next line when running as a plain script
%matplotlib inline
import numpy as np
import random

R = 13.75                        # fixed resistance
x_line = np.arange(0, 220, 1)    # voltage U
y_line = []                      # exact current I = U/R
for i in x_line:
    y_line.append(i / R)
y_dot = []                       # simulated measurements with small random errors
for i in y_line:
    y_dot.append(i + random.uniform(-0.9, 0.9))

fig, axes = plt.subplots(figsize=(14, 6), dpi=80)
plt.plot(x_line, y_line, color='purple', lw=3, label='I = U/R')
plt.scatter(x_line, y_dot, color='red', label='Actual results')
plt.xlabel('U', size=16)         # voltage on the horizontal axis
plt.ylabel('I', size=16)         # current on the vertical axis
plt.legend(prop={'size': 14})
plt.show()
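The following is a minimal sketch that makes the "more or less straight line" idea concrete by recovering the resistance from noisy measurements with least squares. It assumes the same setup as the plotting code: voltages $0, \dots, 219$, a true $R = 13.75$ and uniform measurement errors in $[-0.9, 0.9]$. It uses a seeded NumPy generator so that the run is reproducible. It fits a line through the origin, $I = bU$, for which the least-squares slope is $b = \sum U_jI_j / \sum U_j^2$, and takes $1/b$ as the estimate of $R$. The name `estimate_resistance` is illustrative.

```python
import numpy as np


def estimate_resistance(u, i_measured):
    """Least-squares fit of I = b * U (a line through the origin); returns R = 1 / b."""
    u = np.asarray(u, dtype=float)
    i_measured = np.asarray(i_measured, dtype=float)
    slope = np.sum(u * i_measured) / np.sum(u * u)
    return 1.0 / slope


R_TRUE = 13.75
rng = np.random.RandomState(0)                              # fixed seed, reproducible run
u = np.arange(0, 220, 1)                                    # voltage U
i_exact = u / R_TRUE                                        # exact current from Ohm's law
i_noisy = i_exact + rng.uniform(-0.9, 0.9, size=u.shape)    # small measurement errors

r_hat = estimate_resistance(u, i_noisy)
print("Estimated R: %.3f (true value %.2f)" % (r_hat, R_TRUE))
```

The estimate lands close to 13.75 because the random errors largely cancel out over 220 points.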
02. The need to transform the linear regression equation
Let's look at another example. Imagine that we work at a bank and our task is to estimate the probability that a borrower will repay a loan, depending on certain factors. To keep things simple, we will consider only two factors: the borrower's monthly salary and the monthly loan payment.
The task is highly artificial, but it will help us understand why the linear regression function $f(w,x_i) = w^Tx_i$ is not enough on its own. It will also show us what transformations the function needs.
Back to the example. Clearly, the higher the salary, the more the borrower can set aside each month to repay the loan. Within a certain salary range, this relationship will be quite linear. For example, take the salary range from 60.000 RUR to 200.000 RUR and assume that, within this range, the monthly payment depends linearly on the salary. Suppose it was established that for this salary range the salary-to-payment ratio cannot fall below 3, and the borrower must still keep 5.000 RUR in reserve. Only in that case will we assume that the borrower repays the loan to the bank. The linear regression equation then takes the form:
$$f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$$
where $w_0 = -5000$, $w_1 = 1$ and $w_2 = -3$, $x_{i1}$ is the salary of the $i$-th borrower and $x_{i2}$ is the loan payment of the $i$-th borrower.
By substituting the salary and the loan payment into the equation with fixed parameters $\vec{w}$, you can decide whether to grant or refuse the loan.
Looking ahead, note that with these parameters $\vec{w}$, the linear regression function produces large values when it is used inside the logistic response function. These large values complicate the calculation of repayment probabilities. We therefore shrink our coefficients by, say, a factor of 25.000. This does not change the lending decision, because dividing all the coefficients by the same positive number does not change the sign of $f(w,x_i)$. Let's remember this point for later. For now, to make things even clearer, let's consider three potential borrowers.
Table 1 "Potential borrowers"
Code for generating the table
import pandas as pd

r = 25000.0            # scaling factor: every coefficient is divided by 25,000
w_0 = -5000.0 / r
w_1 = 1.0 / r
w_2 = -3.0 / r

data = {'The borrower': np.array(['Vasya', 'Fedya', 'Lesha']),
        'Salary': np.array([120000, 180000, 210000]),
        'Payment': np.array([3000, 50000, 70000])}
df = pd.DataFrame(data)
# value of the linear function f(w, x) for each borrower
df['f(w,x)'] = w_0 + df['Salary'] * w_1 + df['Payment'] * w_2

# approve the loan when f(w, x) > 0, refuse otherwise
decision = []
for i in df['f(w,x)']:
    if i > 0:
        decision.append('Approved')
    else:
        decision.append('Refusal')
df['Decision'] = decision
df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision']]
According to the table, Vasya has a salary of 120.000 RUR and wants a loan that he will repay at 3.000 RUR per month. We determined that to approve the loan, Vasya's salary must exceed three times the payment and still leave 5.000 RUR. Vasya meets this requirement: $120.000 - 3 \cdot 3.000 - 5.000 = 106.000$, so he even has 106.000 RUR to spare. We divided the coefficients $\vec{w}$ by 25.000 when calculating $f(w,x_i)$, yet the result is the same: the loan can be approved. Fedya will also get a loan. Lesha, however, will have to curb his appetite, even though he earns the most.
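Here is a minimal sketch of the claim that dividing the weights by 25.000 leaves every decision unchanged: dividing by a positive constant preserves the sign of $f(w,x_i)$. The helper `linear_score` is a hypothetical name, and the borrower data are copied from Table 1.

```python
import numpy as np


def linear_score(w, salary, payment):
    """f(w, x) = w0 + w1 * salary + w2 * payment for arrays of borrowers."""
    return w[0] + w[1] * salary + w[2] * payment


salary = np.array([120000.0, 180000.0, 210000.0])   # Vasya, Fedya, Lesha
payment = np.array([3000.0, 50000.0, 70000.0])

w_raw = np.array([-5000.0, 1.0, -3.0])               # rule: salary - 3 * payment - 5000 > 0
w_scaled = w_raw / 25000.0                           # the same rule with shrunken coefficients

f_raw = linear_score(w_raw, salary, payment)         # 106000, 25000, -5000
f_scaled = linear_score(w_scaled, salary, payment)   # about 4.24, 1.0, -0.2
approved = f_scaled > 0
```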
Let's draw a chart for this case.
Graph 2 "Borrower classification"
Code for drawing the chart
# boundary line f(w, x) = 0 solved for the payment: payment = (-w_0 - w_1 * salary) / w_2
salary = np.arange(60000, 240000, 20000)
payment = (-w_0 - w_1 * salary) / w_2
fig, axes = plt.subplots(figsize=(14, 6), dpi=80)
plt.plot(salary, payment, color='grey', lw=2, label='$f(w,x_i)=w_0 + w_1x_{i1} + w_2x_{i2}$')
# approved borrowers: green circles; refused borrowers: red squares
plt.plot(df[df['Decision'] == 'Approved']['Salary'], df[df['Decision'] == 'Approved']['Payment'],
         'o', color='green', markersize=12, label='Decision - Loan approved')
plt.plot(df[df['Decision'] == 'Refusal']['Salary'], df[df['Decision'] == 'Refusal']['Payment'],
         's', color='red', markersize=12, label='Decision - Loan refusal')
plt.xlabel('Salary', size=16)
plt.ylabel('Payment', size=16)
plt.legend(prop={'size': 14})
plt.show()
So our straight line, built from the function $f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$, separates "bad" borrowers from "good" ones. Borrowers whose wishes exceed their means lie above the line (Lesha). Borrowers who, according to our model's parameters, can repay the loan lie below the line (Vasya and Fedya). In other words, our line divides the borrowers into two classes. Let's label them as follows. Class $+$ contains the borrowers who are likely to repay the loan. Class $-$ (or $0$) contains those who are likely unable to repay it.
Let's draw conclusions from this simple example. Take a point $M(x_1, x_2)$ and substitute its coordinates into the line equation $f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$. There are three possibilities:
- If the point lies below the line and we assign it to class $+$, the value of $f(w,x_i)$ is positive, somewhere in $(0, +\infty)$. We can then assume that the probability of repayment lies in $(0.5, 1)$. The larger the value of the function, the higher the probability.
- If the point lies above the line and we assign it to class $-$ (or $0$), the value of the function is negative, somewhere in $(-\infty, 0)$. We then assume that the probability of repayment lies in $(0, 0.5)$. The larger the absolute value of the function, the more confident we are.
- If the point lies on the line, it sits on the boundary between the two classes. In this case $f(w,x_i) = 0$ and the probability of repayment is $0.5$.
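The three cases can be checked with a short sketch that uses the unscaled rule $f = -5000 + x_1 - 3x_2$, since scaling does not change the sign. The point $M(65.000, 20.000)$ is a hypothetical borrower chosen to lie exactly on the line, because $65.000 - 3 \cdot 20.000 - 5.000 = 0$. The function name `region` is illustrative.

```python
def region(salary, payment, w=(-5000, 1, -3)):
    """Classify a point by the sign of f(w, x) = w0 + w1 * salary + w2 * payment."""
    f = w[0] + w[1] * salary + w[2] * payment
    if f > 0:
        return "below the line: class +, repayment probability above 0.5"
    if f < 0:
        return "above the line: class - (or 0), repayment probability below 0.5"
    return "on the line: repayment probability exactly 0.5"


for name, salary, payment in [("Vasya", 120000, 3000),
                              ("Fedya", 180000, 50000),
                              ("Lesha", 210000, 70000),
                              ("M (hypothetical)", 65000, 20000)]:
    print(name + ": " + region(salary, payment))
```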
Now imagine that we have dozens of factors instead of two, and thousands of borrowers instead of three. Instead of a straight line we would then have a hyperplane in $m$-dimensional space. The coefficients $\vec{w}$ would not be pulled out of thin air. They would be derived by the book from accumulated data on borrowers who did or did not repay their loans. Note that right now we are selecting borrowers using coefficients $\vec{w}$ that are already known. In fact, the task of the logistic regression model is precisely to find the parameters $\vec{w}$ that make the value of the Logistic Loss function tend to a minimum. We will see how the vector $\vec{w}$ is computed in section 5 of the article. For now, let's return to the promised land: our banker and his three clients.
Thanks to the function $f(w,x_i)$, we know who can get a loan and who must be refused. But we cannot take that information to the director, because he wants the probability that each borrower will repay. What should we do? The answer is simple. We need to transform the function $f(w,x_i)$, whose values lie in $(-\infty, +\infty)$, into a function whose values lie in $(0, 1)$. Such a function exists. It is called the logistic response function, or the inverse logit transformation. Meet it:
$$p_{i+} = \sigma(w,x_i) = \frac{1}{1+e^{-f(w,x_i)}} = \frac{1}{1+e^{-w^Tx_i}}$$
Let's see step by step how the logistic response function is obtained. We will work in the reverse direction. We start from a known probability between $0$ and $1$ and then "unwind" this value onto the whole number line from $-\infty$ to $+\infty$.
03. Deriving the logistic response function
Step 1. Map the probability values to the range $[0, +\infty)$
While we transform $f(w,x_i)$ into the logistic response function $\sigma(w,x_i)$, we will leave our credit analyst alone and visit the bookmakers instead. Of course we are not going to place bets. All that interests us there is the meaning of an expression such as "the odds are 4 to 1". Odds, familiar to every bettor, are the ratio of "successes" to "failures". In terms of probabilities, odds are the probability that an event occurs divided by the probability that it does not. The odds of an event, $odds_+$, are:
$$odds_+ = \frac{p_+}{1-p_+}$$
where $p_+$ is the probability that the event occurs and $(1-p_+)$ is the probability that the event does NOT occur.
For example, suppose a young, strong and playful horse nicknamed "Veterok" beats a frail old mare named "Matilda" in a race with probability $0.8$. Veterok's odds of winning are then $4$ to $1$ ($0.8/(1-0.8) = 4$). Conversely, if we know the odds, we can easily compute the probability $p_+$:
$$\frac{p_+}{1-p_+} = 4 \;\Rightarrow\; p_+ = 4(1-p_+) \;\Rightarrow\; 5p_+ = 4 \;\Rightarrow\; p_+ = 0.8$$
So we have learned to "translate" a probability into odds, which take values from $0$ to $+\infty$. Let's take one more step and learn to "translate" a probability onto the whole number line from $-\infty$ to $+\infty$.
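Both directions of step 1 fit into two one-line functions. This is a sketch with illustrative names (`odds_from_proba`, `proba_from_odds`). The second function generalizes the calculation above: solving $p_+ = odds_+(1-p_+)$ for $p_+$ gives $p_+ = odds_+/(1+odds_+)$.

```python
def odds_from_proba(p):
    """Odds of an event with probability p (0 <= p < 1): successes per failure."""
    return p / (1.0 - p)


def proba_from_odds(odds):
    """Inverse of odds_from_proba: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


veterok_odds = odds_from_proba(0.8)     # about 4, i.e. "4 to 1"
veterok_p = proba_from_odds(4.0)        # 0.8
```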
Step 2. Map the probability values to the range $(-\infty, +\infty)$
This step is very simple. Take the logarithm of the odds to the base of Euler's number $e$, and we get:
$$f(w,x_i) = \ln(odds_+) = \ln\frac{p_+}{1-p_+}$$
If $p_+ = 0.8$, the value of $f(w,x_i)$ is easy to compute and should be positive: $f(w,x_i) = \ln\frac{0.8}{1-0.8} = \ln 4 \approx 1.386 > 0$. That is indeed the case.
Out of curiosity, let's check that $p_+ = 0.2$ gives a negative value of $f(w,x_i)$. We check: $f(w,x_i) = \ln\frac{0.2}{1-0.2} = \ln 0.25 \approx -1.386 < 0$. Correct.
Now we know how to map a probability from $(0, 1)$ onto the whole number line from $-\infty$ to $+\infty$. In the next step we will do the opposite.
For now, note that by the rules of logarithms, knowing the value of $f(w,x_i)$ lets us compute the odds:
$$odds_+ = e^{f(w,x_i)}$$
This way of obtaining the odds will be useful in the next step.
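Here is a sketch of step 2 using the standard library. `log_odds` (an illustrative name) implements the formula above, and `math.exp` turns the result back into odds.

```python
import math


def log_odds(p):
    """f = ln(p / (1 - p)): maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


f_high = log_odds(0.8)          # ln 4, about 1.386
f_low = log_odds(0.2)           # ln 0.25, about -1.386
odds_back = math.exp(f_high)    # e^f recovers the odds, about 4
```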
Step 3. Deriving the formula for $p_+$
So we have learned to find $f(w,x_i)$ when $p_+$ is known. In practice, however, we need exactly the opposite: given $f(w,x_i)$, find $p_+$. For this we turn to the inverse odds function:
$$p_+ = \frac{odds_+}{1+odds_+}$$
We will not derive this formula in the article, but we will check it on the numbers from the example above. We know that odds of 4 to 1 ($odds_+ = 4$) correspond to an event probability of 0.8 ($p_+ = 0.8$). Substituting gives $p_+ = \frac{4}{1+4} = 0.8$, which matches our earlier calculation. Let's move on.
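In the previous step we found that $odds_+ = e^{f(w,x_i)}$, so we can substitute this into the inverse odds function. We get:
$$p_+ = \frac{e^{f(w,x_i)}}{1+e^{f(w,x_i)}}$$
Dividing both the numerator and the denominator by $e^{f(w,x_i)}$ gives:
$$p_+ = \frac{1}{1+e^{-f(w,x_i)}} = \sigma(w,x_i)$$
Just in case, let's do one more small check to make sure we have not made a mistake anywhere. In step 2 we found that $p_+ = 0.8$ gives $f(w,x_i) \approx 1.386$. Substituting this value of $f(w,x_i)$ into the logistic response function should therefore give $p_+ = 0.8$. Substituting, we get:
$$p_+ = \frac{1}{1+e^{-1.386}} \approx \frac{1}{1+0.25} = 0.8$$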
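The same check can be run over many probabilities at once. This sketch uses an illustrative function name, `sigma`. It verifies that the logistic response function undoes the log-odds transformation from step 2, so the two are inverses of each other.

```python
import numpy as np


def sigma(f):
    """Logistic response function: p = 1 / (1 + e^(-f))."""
    return 1.0 / (1.0 + np.exp(-f))


p = np.linspace(0.01, 0.99, 99)          # probabilities strictly inside (0, 1)
f = np.log(p / (1.0 - p))                # step 2: probability -> log-odds
p_back = sigma(f)                        # step 3: log-odds -> probability

max_error = np.max(np.abs(p_back - p))   # tiny, limited only by floating-point rounding
```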
Congratulations, dear reader: we have derived and tested the logistic response function. Let's look at its graph.
Graph 3 "Logistic response function"
Code for drawing the chart
import math

# The name "logit" is kept because later code calls it;
# mathematically this is the inverse logit (sigmoid) function.
def logit(f):
    return 1 / (1 + math.exp(-f))

f = np.arange(-7, 7, 0.05)
p = []
for i in f:
    p.append(logit(i))
fig, axes = plt.subplots(figsize=(14, 6), dpi=80)
plt.plot(f, p, color='grey', label='$ 1 / (1+e^{-w^Tx_i})$')
plt.xlabel('$f(w,x_i) = w^Tx_i$', size=16)
plt.ylabel('$p_{i+}$', size=16)
plt.legend(prop={'size': 14})
plt.show()
In the literature this function is also called the sigmoid function. The graph clearly shows that most of the change in the probability of assigning an object to class $+$ happens within a relatively narrow range of $f(w,x_i)$, roughly from $-4$ to $+4$.
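The original code does not cover one practical caveat. `math.exp(-f)` in the `logit` function above raises `OverflowError` once $-f$ exceeds roughly 709. That can happen with unscaled coefficients such as $\vec{w} = (-5000, 1, -3)$. A common workaround, sketched below under the illustrative name `stable_sigmoid`, evaluates the equivalent form $e^{f}/(1+e^{f})$ when $f$ is negative, so the exponent is never positive. The last line also confirms the steep region visible on the graph.

```python
import math


def stable_sigmoid(f):
    """1 / (1 + e^(-f)) computed without overflow for large |f|."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)              # f < 0, so e^f lies in (0, 1) and cannot overflow
    return z / (1.0 + z)


# Lesha's unscaled score is f = -5000; math.exp(5000) would overflow
lesha_unscaled = stable_sigmoid(-5000.0)                  # underflows harmlessly to 0.0
edges = [stable_sigmoid(x) for x in (-4.0, 0.0, 4.0)]     # about 0.018, 0.5, 0.982
```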
I suggest we return to our credit analyst and help him calculate the repayment probabilities. Otherwise he risks being left without a bonus :)
Table 2 "Potential borrowers"
Code for generating the table
# repayment probability for each borrower, rounded to two decimals
proba = []
for i in df['f(w,x)']:
    proba.append(round(logit(i), 2))
df['Probability'] = proba
df[['The borrower', 'Salary', 'Payment', 'f(w,x)', 'Decision', 'Probability']]
So we have estimated the repayment probabilities, and on the whole the results look plausible.
Indeed, the probability that Vasya, with a salary of 120.000 RUR, can pay the bank 3.000 RUR every month is close to 100%. Note also that the bank could grant a loan to Lesha too. This would happen if, for example, the bank's policy were to lend to clients whose repayment probability exceeds, say, 0.3. In that case the bank would simply set aside a larger reserve for possible losses.
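Here is a short sketch of that policy argument. The repayment probabilities are recomputed from the scores $f(w,x_i) = 4.24,\ 1.0,\ -0.2$ obtained for Table 1. The two cut-offs are the default $0.5$ and the hypothetical policy value $0.3$ mentioned above. The function name `approve` is illustrative.

```python
import numpy as np

scores = np.array([4.24, 1.0, -0.2])             # f(w, x) for Vasya, Fedya, Lesha
proba = 1.0 / (1.0 + np.exp(-scores))            # logistic response
proba_rounded = np.round(proba, 2)               # 0.99, 0.73, 0.45


def approve(probabilities, threshold):
    """Grant a loan when the repayment probability exceeds the bank's cut-off."""
    return probabilities > threshold


default_policy = approve(proba, 0.5)             # same decisions as the sign of f(w, x)
lenient_policy = approve(proba, 0.3)             # Lesha is now approved as well
```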
Note also that the salary-to-payment ratio of at least 3 and the 5.000 RUR margin were pulled out of thin air. That is why we could not use the weight vector in its original form, $\vec{w} = (-5000, 1, -3)$. We had to shrink the coefficients considerably, so we divided each coefficient by 25.000; in effect, we adjusted the result. This was done deliberately, to make the material easier to understand at the start. In practice we will not invent and adjust the coefficients; we will find them. In the following sections we will derive the equations used to choose the parameters $\vec{w}$.
04. The least squares method for determining the weight vector $\vec{w}$ in the logistic response function
We already know one method for choosing the weight vector $\vec{w}$: the least squares method (LSM). So why not use it for binary classification problems? Nothing actually prevents you from using LSM. It is just that, in classification problems, this method gives less accurate results than Logistic Loss, and there is a theoretical reason for this. Let's start with a simple example.
Suppose that our two models, one using MSE and one using Logistic Loss, have already started choosing the weight vector $\vec{w}$, and we stopped the computation at some step. It does not matter whether that step was at the beginning, the middle or the end. What matters is that we already have some values of the weight vector, and we assume that at this step the weight vector $\vec{w}$ is the same for both models. Let's take these weights and substitute them into the logistic response function ($\frac{1}{1+e^{-w^Tx_i}}$) for some object that belongs to class $+1$. We examine two cases. In the first, the chosen weight vector makes our model badly wrong. In the second, the model is very confident that the object belongs to class $+1$. Let's see what penalties LSM and Logistic Loss assign.
Code for computing the penalty for each loss function
from __future__ import print_function  # lets print() work in both Python 2.7 and 3
import math

# object's class
y = 1
# probability of assigning the object to class +1 given the parameters w
proba_1 = 0.01
MSE_1 = (y - proba_1)**2
print('MSE penalty for a gross error =', MSE_1)

# compute f(w,x) from a known probability of class +1: f(w,x) = ln(odds+)
def f_w_x(proba):
    return math.log(proba / (1 - proba))

LogLoss_1 = math.log(1 + math.exp(-y * f_w_x(proba_1)))
print('Log Loss penalty for a gross error =', LogLoss_1)

proba_2 = 0.99
MSE_2 = (y - proba_2)**2
LogLoss_2 = math.log(1 + math.exp(-y * f_w_x(proba_2)))
print('**************************************************************')
print('MSE penalty for strong confidence =', MSE_2)
print('Log Loss penalty for strong confidence =', LogLoss_2)
Gross error: the model assigns the object to class $+1$ with probability 0.01
Penalty when using LSM: $(1 - 0.01)^2 = 0.9801$
Penalty when using Logistic Loss: $\ln\left(1 + e^{-f(w,x)}\right) = \ln(1 + 99) \approx 4.605$
Strong confidence: the model assigns the object to class $+1$ with probability 0.99
Penalty when using LSM: $(1 - 0.99)^2 = 0.0001$
Penalty when using Logistic Loss: $\ln\left(1 + e^{-f(w,x)}\right) = \ln\left(1 + \tfrac{1}{99}\right) \approx 0.010$
This example shows that, for a gross error, the Log Loss function penalizes the model much more heavily than MSE (about 4.6 versus about 0.98). Let's now look at the theoretical basis for using the Log Loss function in classification problems.
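To see that this is not a quirk of the two chosen probabilities, the sketch below tabulates both penalties for a class $+1$ object across a range of predicted probabilities $p$. It relies on the identity $\ln(1+e^{-f}) = -\ln p$ for $f = \ln\frac{p}{1-p}$, which follows directly from step 3. The function names are illustrative. The MSE penalty can never exceed $1$, whereas the Log Loss penalty grows without bound as $p \to 0$.

```python
import math


def mse_penalty(p, y=1):
    """Squared error between the label and the predicted probability."""
    return (y - p) ** 2


def log_loss_penalty(p):
    """Log Loss for an object of class +1: ln(1 + e^(-f)) with f = ln(p / (1 - p))."""
    f = math.log(p / (1.0 - p))
    return math.log(1.0 + math.exp(-f))


rows = []
for p in (0.99, 0.9, 0.5, 0.1, 0.01, 0.001):
    rows.append((p, mse_penalty(p), log_loss_penalty(p)))
    print("p = %-6s MSE = %.4f  Log Loss = %.4f" % (p, rows[-1][1], rows[-1][2]))
```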
05. The maximum likelihood method and logistic regression
As promised at the beginning, the article is full of simple examples. Here is another one, featuring some familiar guests: the bank's borrowers Vasya, Fedya and Lesha.
Before developing the example, let me remind you that in practice we work with training sets of thousands or millions of objects, each with tens or hundreds of features. Here, however, the numbers are chosen so that they fit easily in the head of a novice data scientist.
Back to the example. Imagine that the bank's director decided to give a loan to everyone who asked for one, even though the algorithm advised against lending to Lesha. Enough time has passed that we now know which of the three repaid and which did not. As expected, Vasya and Fedya repaid their loans, and Lesha did not. Now imagine that this outcome becomes our new training set, and that all the data on the factors affecting repayment (the borrower's salary and the monthly payment) have somehow vanished. Intuitively, we could then assume that every third borrower fails to repay the bank. In other words, the probability that the next borrower repays is $p = 2/3$. This intuitive assumption has a theoretical justification based on the maximum likelihood method, which the literature often calls the maximum likelihood principle.
First, let's get acquainted with the terminology.
The likelihood of a sample is the probability of obtaining exactly this sample, with exactly these observations or outcomes. It is the product of the probabilities of each outcome in the sample (for example, the joint outcome of whether Vasya, Fedya and Lesha repaid their loans).
The likelihood function relates the likelihood of a sample to the values of the distribution parameters.
In our case, the training set follows a generalized Bernoulli scheme, in which the random variable takes only two values: $1$ or $0$. The likelihood of the sample can therefore be written as a likelihood function of the parameter $p$:
$$P(\{1, 1, 0\} \mid p) = p \cdot p \cdot (1-p) = p^2(1-p)$$
This expression can be read as follows. The joint probability that Vasya and Fedya repay their loans is $p \cdot p = p^2$. The probability that Lesha does not repay is $1-p$, because what actually happened in his case was a non-repayment. The joint probability of all three events is therefore $p^2(1-p)$.
The maximum likelihood method estimates an unknown parameter by maximizing the likelihood function. In our case, we need to find the value of $p$ at which $p^2(1-p)$ reaches its maximum.
Where does the idea of maximizing the likelihood function over the unknown parameter come from? It stems from the view that the sample is our only source of knowledge about the population. Everything we know about the population is represented in the sample, so the sample is the most accurate picture of the population available to us. We therefore look for the parameter value that makes the observed sample the most probable.
Clearly, this is an optimization problem in which we need to find the extremum point of a function. To find it, we use the first-order condition: set the derivative of the function to zero and solve for the parameter. However, differentiating a product of many factors can be tedious. To avoid this, there is a standard technique: switching to the logarithm of the likelihood function. Why is this allowed? We are looking not for the extremum value itself but for the extremum point, that is, the value of $p$ at which $p^2(1-p)$ reaches its maximum. Taking the logarithm does not move the extremum point (although the extremum value changes), because the logarithm is a monotonically increasing function.
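With this in mind, let's continue our example with the loans of Vasya, Fedya and Lesha. First, we switch to the logarithm of the likelihood function:
$$\ln P(\{1, 1, 0\} \mid p) = \ln\left(p^2(1-p)\right) = 2\ln p + \ln(1-p)$$
Now we can easily differentiate this expression with respect to $p$:
$$\frac{d}{dp}\left(2\ln p + \ln(1-p)\right) = \frac{2}{p} - \frac{1}{1-p}$$
Finally, we apply the first-order condition and set the derivative to zero:
$$\frac{2}{p} - \frac{1}{1-p} = 0 \;\Rightarrow\; 2(1-p) = p \;\Rightarrow\; p = \frac{2}{3}$$
So our intuitive estimate of the repayment probability, $p = 2/3$, has a theoretical justification.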
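Here is a quick numerical check of this derivation. The sketch evaluates the likelihood $p^2(1-p)$ and its logarithm on a fine grid of $p$ values and confirms that both peak at the same point, close to $2/3$. The grid search is only an illustration; the analytic solution above is exact.

```python
import numpy as np

p = np.linspace(0.001, 0.999, 9981)          # grid with step 0.0001
likelihood = p ** 2 * (1.0 - p)              # P({1, 1, 0} | p)
log_likelihood = 2.0 * np.log(p) + np.log(1.0 - p)

p_best = p[np.argmax(likelihood)]
p_best_log = p[np.argmax(log_likelihood)]    # the logarithm does not move the maximum
```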
Great, but what do we do with this information now? If we assume that every third borrower does not return the money, the bank will inevitably go bankrupt. True, but the estimate of $2/3$ ignored the factors that affect repayment: the borrower's salary and the size of the monthly payment. Recall that earlier we calculated each client's repayment probability with these very factors taken into account. Naturally, those probabilities differ from the constant $2/3$.
Let's compute the likelihood of the sample in both cases:
Code for computing the sample likelihood
from __future__ import print_function  # lets print() work in both Python 2.7 and 3
from functools import reduce

def likelihood(y, p):
    # contribution of each object: p_i if y_i = 1, and (1 - p_i) if y_i = 0
    line_true_proba = []
    for i in range(len(y)):
        ltp_i = p[i]**y[i] * (1 - p[i])**(1 - y[i])
        line_true_proba.append(ltp_i)
    # the likelihood of the sample is the product of all contributions
    return reduce(lambda a, b: a * b, line_true_proba)

y = [1.0, 1.0, 0.0]                 # Vasya and Fedya repaid, Lesha did not
p_log_response = df['Probability']  # probabilities from Table 2
const = 2.0 / 3.0
p_const = [const, const, const]
print('Sample likelihood with constant p = 2/3:', round(likelihood(y, p_const), 3))
print('****************************************************************************************************')
print('Sample likelihood with computed p:', round(likelihood(y, p_log_response), 3))
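Sample likelihood with the constant value $p = 2/3$: $\left(\tfrac{2}{3}\right)^2 \cdot \tfrac{1}{3} = \tfrac{4}{27} \approx 0.148$
Sample likelihood when the repayment probabilities are estimated from the factors $f(w,x_i)$: $0.99 \cdot 0.73 \cdot (1 - 0.45) \approx 0.397$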
The sample likelihood with probabilities computed from the factors $f(w,x_i)$ turned out to be higher than the likelihood with a constant probability. What does this mean? It shows that knowing the factors let us estimate each client's repayment probability more accurately. Therefore, when granting the next loan, it would be better to estimate the repayment probability with the model proposed at the end of section 3.
But if we want to maximize the sample likelihood, why not use an algorithm that gives Vasya, Fedya and Lesha probabilities of, say, 0.99, 0.99 and 0.01? Such an algorithm might do well on the training set, since it would bring the sample likelihood close to $1$. However, it would most likely generalize poorly, and it would certainly not be linear. Methods for combating overfitting (that is, poor generalization) are outside the scope of this article, but let's look at the second point in more detail. To do so, it is enough to answer a simple question: given the factors we know, can Vasya and Fedya have the same repayment probability? Common sense says no. Vasya will spend 2.5% of his monthly salary on the loan, while Fedya will spend almost 27.8%. In Graph 2 "Borrower classification" we also see that Vasya is much farther from the line separating the classes than Fedya. Finally, the function $f(w,x)$ takes different values for them: 4.24 for Vasya and 1.0 for Fedya. If Fedya earned an order of magnitude more or asked for a smaller loan, the two repayment probabilities would be similar. In other words, you cannot fool a linear dependence. Had we actually calculated the coefficients $\vec{w}$ instead of pulling them out of thin air, we could confidently claim that our $\vec{w}$ gives the best estimate of each borrower's repayment probability. Since we agreed to assume that the coefficients $\vec{w}$ were determined by the book, we will assume exactly that: our coefficients give the best probability estimates :)
However, we have digressed. In this section we need to understand how to determine the weight vector $\vec{w}$, which we need to estimate each borrower's repayment probability.
Let's briefly summarize the arsenal we take with us to find the coefficients $\vec{w}$:
1. We assume that the relationship between the target variable (the predicted value) and the factors affecting the result is linear. We therefore use a linear regression function of the form $f(w,x_i) = w^Tx_i$. Its line divides the objects (clients) into class $+1$ and class $-1$ or $0$, that is, clients who can repay the loan and those who cannot. In our case the equation is $f(w,x_i) = w_0 + w_1x_{i1} + w_2x_{i2}$.
2. We use the inverse logit function $\sigma(w,x_i) = \frac{1}{1+e^{-w^Tx_i}}$ to determine the probability that an object belongs to class $+1$.
3. We treat our training set as a realization of a generalized Bernoulli scheme. For each object, a random variable is generated that takes the value 1 with probability $p$ (different for each object) and the value 0 with probability $1-p$.
4. We know that we need to maximize the sample likelihood function, taking the chosen factors into account, so that the available sample becomes the most plausible. In other words, we need to choose the parameters that make the sample most plausible. In our case, the parameter being chosen is the repayment probability $p$, which in turn depends on the unknown coefficients $\vec{w}$. So we need to find the weight vector $\vec{w}$ that maximizes the sample likelihood.
5. We know that the maximum likelihood method can be used to maximize sample likelihood functions, and we know the useful tricks for working with it.
That's quite a multi-step combination :)
Now recall that at the very beginning of the article we set out to derive two versions of the Logistic Loss function, depending on how the object classes are labelled. By convention, in two-class classification problems the classes are labelled either $0$ and $1$ or $-1$ and $+1$. Each labelling leads to its own form of the loss function.
Case 1. Classifying objects into $0$ and $1$
Earlier we computed the likelihood of a sample in which each borrower's repayment probability came from the factors and the given coefficients $\vec{w}$. There we used the formula:
$$p_i^{y_i}(1-p_i)^{1-y_i}$$
Here $p_i$ is in fact the value of the logistic response function $\sigma(w,x_i)$ for a given weight vector $\vec{w}$.
So nothing prevents us from writing the sample likelihood function as:
$$L(w) = \prod_{i=1}^{n}\sigma(w,x_i)^{y_i}\left(1-\sigma(w,x_i)\right)^{1-y_i}$$
Some novice analysts find it hard to see right away how this function works. Here are 4 short examples to clear things up:
1. Suppose $y_i = 1$ (i.e. according to the training set the object belongs to class $+1$) and our algorithm $\sigma(w,x_i)$ estimates the probability of class $+1$ as 0.9. This piece of the sample likelihood is then computed as:
$$0.9^{1} \cdot (1-0.9)^{1-1} = 0.9$$
2. If $y_i = 1$ and $\sigma(w,x_i) = 0.1$, the calculation is:
$$0.1^{1} \cdot (1-0.1)^{1-1} = 0.1$$
3. If $y_i = 0$ and $\sigma(w,x_i) = 0.1$, the calculation is:
$$0.1^{0} \cdot (1-0.1)^{1-0} = 0.9$$
4. If $y_i = 0$ and $\sigma(w,x_i) = 0.9$, the calculation is:
$$0.9^{0} \cdot (1-0.9)^{1-0} = 0.1$$
Clearly, the likelihood function is maximized in cases 1 and 3. In general, it is maximized when the probabilities of assigning objects to class $+1$ are estimated correctly.
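The four cases can be reproduced with a small helper. `likelihood_piece` is an illustrative name for one factor $\sigma^{y}(1-\sigma)^{1-y}$ of the product, and the $\sigma$ values are the ones used in the examples above.

```python
def likelihood_piece(y, sigma):
    """One factor of the sample likelihood for labels 0/1: sigma^y * (1 - sigma)^(1 - y)."""
    return sigma ** y * (1.0 - sigma) ** (1 - y)


cases = [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]
pieces = [likelihood_piece(y, s) for y, s in cases]   # about 0.9, 0.1, 0.9, 0.1
```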
When we estimate the probability of class $+1$, the only unknowns are the coefficients $\vec{w}$, so those are what we will look for. As mentioned above, this is an optimization problem, and the first step is to find the derivative of the likelihood function with respect to the weight vector $\vec{w}$. It makes sense to simplify the task first, though: we will differentiate the logarithm of the likelihood function instead.
$$L_{log}(X,y,w) = -\ln L(w) = -\sum_{i=1}^{n}\left(y_i\ln\sigma(w,x_i) + (1-y_i)\ln\left(1-\sigma(w,x_i)\right)\right) \to \min$$
Why does the logistic error function have a minus sign in front after taking the logarithm? The reason is simple. When evaluating model quality, it is customary to minimize a function. So we multiplied the right-hand side by $-1$, and instead of maximizing the function we now minimize it.
So, right before your eyes, we have finally derived the Logistic Loss function for a training set with two classes, $0$ and $1$.
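As a sketch, the loss above can be written directly with NumPy. The clipping constant `eps` is an assumption added here to keep $\ln 0$ out of the computation and is not part of the derivation. Applied to the three borrowers (labels $1, 1, 0$ and the Table 2 probabilities), the loss equals minus the logarithm of the sample likelihood $0.99 \cdot 0.73 \cdot (1 - 0.45)$.

```python
import numpy as np


def log_loss_01(y, sigma, eps=1e-15):
    """Logistic Loss for labels 0/1: -sum(y ln(sigma) + (1 - y) ln(1 - sigma))."""
    y = np.asarray(y, dtype=float)
    sigma = np.clip(np.asarray(sigma, dtype=float), eps, 1.0 - eps)  # avoid ln(0)
    return -np.sum(y * np.log(sigma) + (1.0 - y) * np.log(1.0 - sigma))


y = [1, 1, 0]                    # Vasya and Fedya repaid, Lesha did not
sigma = [0.99, 0.73, 0.45]       # repayment probabilities from Table 2
loss = log_loss_01(y, sigma)
likelihood = 0.99 * 0.73 * (1 - 0.45)
```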
To find the coefficients, all that remains is to differentiate the logistic error function and then choose the optimal coefficients $\vec{w}$ with a numerical optimization method such as gradient descent or stochastic gradient descent. Given the length of the article, though, you are invited to do the differentiation yourself. Perhaps it will be the topic of a future article with plenty of arithmetic and fewer detailed examples.
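For readers who want to see the whole pipeline run, here is a minimal batch gradient descent sketch. It uses the standard result $\nabla_w L_{log} = X^T(\sigma - y)$ for the loss above, stated without the derivation the text leaves as an exercise, and checks it against a finite-difference approximation. The tiny dataset is invented for illustration: a bias column of ones plus one feature. The classes overlap, so the optimum is finite. On perfectly separable data, such as the three borrowers, the weights would grow without bound. The learning rate and iteration count are arbitrary choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(w, X, y):
    """L_log(X, y, w) for labels 0/1."""
    s = sigmoid(X.dot(w))
    return -np.sum(y * np.log(s) + (1.0 - y) * np.log(1.0 - s))


def gradient(w, X, y):
    """Gradient of the loss with respect to w: X^T (sigma(Xw) - y)."""
    return X.T.dot(sigmoid(X.dot(w)) - y)


# hypothetical toy data: column 0 is the bias term (for w_0), column 1 is one feature
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

w = np.zeros(2)
initial_loss = log_loss(w, X, y)
learning_rate = 0.1
for _ in range(2000):
    w = w - learning_rate * gradient(w, X, y)
final_loss = log_loss(w, X, y)

# central finite-difference check of the analytic gradient at a random point
rng = np.random.RandomState(1)
w_test = rng.normal(size=2)
h = 1e-6
numeric = np.array([(log_loss(w_test + h * e, X, y) - log_loss(w_test - h * e, X, y)) / (2 * h)
                    for e in np.eye(2)])
```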
Case 2. Classifying objects into $-1$ and $+1$
The approach is the same as for classes $0$ and $1$, but the path to the Logistic Loss function is more roundabout. Let's begin. For the likelihood function we use an "if... then..." operator. If the $i$-th object belongs to class $+1$, we use the probability $\sigma(w,x_i)$ in the sample likelihood. If the object belongs to class $-1$, we use $1-\sigma(w,x_i)$ instead. The likelihood function then looks like this:
$$L(w) = \prod_{i=1}^{n}\begin{cases}\sigma(w,x_i), & \text{if } y_i = +1\\ 1-\sigma(w,x_i), & \text{if } y_i = -1\end{cases}$$
Let's walk through how it works, step by step. Consider 4 cases:
1. if $y_i = +1$ and $\sigma(w,x_i) = 0.9$, then $0.9$ "goes into" the sample likelihood
2. if $y_i = +1$ and $\sigma(w,x_i) = 0.1$, then $0.1$ "goes into" the sample likelihood
3. if $y_i = -1$ and $\sigma(w,x_i) = 0.1$, then $1 - 0.1 = 0.9$ "goes into" the sample likelihood
4. if $y_i = -1$ and $\sigma(w,x_i) = 0.9$, then $1 - 0.9 = 0.1$ "goes into" the sample likelihood
Clearly, in cases 1 and 3, where the algorithm estimates the probabilities correctly, the likelihood function is maximized, which is exactly what we want. However, this notation is rather cumbersome, so next we will find a more compact one. But first, let's take the logarithm of the likelihood function and change its sign, since we will now minimize it:
$$L_{log}(X,y,w) = -\sum_{i=1}^{n}\begin{cases}\ln\sigma(w,x_i), & \text{if } y_i = +1\\ \ln\left(1-\sigma(w,x_i)\right), & \text{if } y_i = -1\end{cases} \to \min$$
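Let's substitute the expression $\frac{1}{1+e^{-w^Tx_i}}$ for $\sigma(w,x_i)$:
$$L_{log}(X,y,w) = -\sum_{i=1}^{n}\begin{cases}\ln\frac{1}{1+e^{-w^Tx_i}}, & \text{if } y_i = +1\\ \ln\left(1-\frac{1}{1+e^{-w^Tx_i}}\right), & \text{if } y_i = -1\end{cases} \to \min$$
Now let's simplify the second term under the logarithm with some basic arithmetic. Since $1-\frac{1}{1+e^{-w^Tx_i}} = \frac{e^{-w^Tx_i}}{1+e^{-w^Tx_i}} = \frac{1}{1+e^{w^Tx_i}}$, we get:
$$L_{log}(X,y,w) = -\sum_{i=1}^{n}\begin{cases}\ln\frac{1}{1+e^{-w^Tx_i}}, & \text{if } y_i = +1\\ \ln\frac{1}{1+e^{w^Tx_i}}, & \text{if } y_i = -1\end{cases} \to \min$$
Now it is time to get rid of the "if... then..." operator. When the object $x_i$ belongs to class $+1$, the $e$ in the denominator under the logarithm is raised to the power $-w^Tx_i$. When the object belongs to class $-1$, $e$ is raised to the power $w^Tx_i$. Both cases can therefore be combined into a single exponent, $-y_iw^Tx_i$. The logistic error function then takes the form:
$$L_{log}(X,y,w) = -\sum_{i=1}^{n}\ln\frac{1}{1+e^{-y_iw^Tx_i}} \to \min$$
By the rules of logarithms, flipping the fraction turns $\ln\frac{1}{a}$ into $-\ln a$, and this minus sign cancels the one in front of the sum. We get:
$$L_{log}(X,y,w) = \sum_{i=1}^{n}\ln\left(1+e^{-y_iw^Tx_i}\right) \to \min$$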
This is the Logistic Loss function for a training set whose objects belong to the classes $-1$ and $+1$.
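The sketch below checks that the two Logistic Loss variants agree. For any scores $f_i = w^Tx_i$, relabelling $y_i \in \{-1, +1\}$ as $(y_i + 1)/2 \in \{0, 1\}$ should give the same loss value. The scores are the borrowers' $f(w,x_i)$ values from Table 1, and the labels encode "Vasya and Fedya repaid, Lesha did not". The function names are illustrative. `np.logaddexp(0, -y*f)` computes $\ln(1+e^{-yf})$ without overflow. This is a NumPy convenience, not part of the derivation.

```python
import numpy as np


def log_loss_pm1(y_pm, f):
    """Case 2: sum of ln(1 + e^(-y f)) for labels -1/+1."""
    return np.sum(np.logaddexp(0.0, -y_pm * f))


def log_loss_01_from_scores(y01, f):
    """Case 1: -sum(y ln(sigma) + (1 - y) ln(1 - sigma)) for labels 0/1."""
    sigma = 1.0 / (1.0 + np.exp(-f))
    return -np.sum(y01 * np.log(sigma) + (1.0 - y01) * np.log(1.0 - sigma))


f = np.array([4.24, 1.0, -0.2])           # f(w, x) for Vasya, Fedya, Lesha
y_pm = np.array([1.0, 1.0, -1.0])         # +1: repaid, -1: not repaid
y01 = (y_pm + 1.0) / 2.0                  # the same labels written as 1, 1, 0

loss_case_2 = log_loss_pm1(y_pm, f)
loss_case_1 = log_loss_01_from_scores(y01, f)
```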
On that note, I will take my leave, and this concludes the article.
Supporting materials
1. Literature
1) Applied Regression Analysis / N. Draper, H. Smith. 2nd ed. Moscow: Finansy i Statistika, 1986 (translated from English)
2) Probability Theory and Mathematical Statistics / V.E. Gmurman. 9th ed. Moscow: Vysshaya Shkola, 2003
3) Probability Theory / N.I. Chernova. Novosibirsk: Novosibirsk State University, 2007
4) Business Analytics: From Data to Knowledge / N.B. Paklin, V.I. Oreshkov. 2nd ed. St. Petersburg: Piter, 2013
5) Data Science from Scratch / Joel Grus. St. Petersburg: BHV-Petersburg, 2017
6) Practical Statistics for Data Scientists / P. Bruce, A. Bruce. St. Petersburg: BHV-Petersburg, 2018
2. Lectures, courses (video)
1)
2)
3)
4)
5)
3. Internet resources
1)
2)
4)
5)
7)
source: www.habr.com