This article looks at several ways of determining the mathematical equation of a simple (paired) regression line.
All of the solution methods discussed here are based on the least squares method. Let us denote the methods as follows:
- Analytical solution
- Gradient descent
- Stochastic gradient descent
For each way of solving the equation of the line, the article provides different functions, which mainly divide into those written without the numpy library and those that use numpy for the calculations. Skillful use of numpy is believed to reduce computational costs.
All the code in the article is written in Python 2.7 using Jupyter Notebook. The file with the source code and the sample data is published with the article.
The article is aimed mostly at beginners, at those who are just starting, step by step, to explore a very broad branch of artificial intelligence: machine learning.
To illustrate the material, we use a very simple example.
Example conditions
We have five values describing the dependence of Y on X (Table 1):
Table 1 "Example conditions"
We will assume that the values x_i are the months of the year and y_i is the revenue in the corresponding month. In other words, revenue depends on the month of the year, and the month is the only feature on which revenue depends.
The example is conventional, of course, both in terms of the supposed dependence of revenue on the month of the year and in terms of the number of values: there are very few of them. However, such a simplification makes it possible to explain, as they say, on one's fingers the material that beginners absorb not always with ease. The simplicity of the numbers also allows those who wish to work the example out on paper without significant effort.
Let us assume that the dependence given in the example can be approximated rather well by the mathematical equation of a simple (paired) regression line of the form
\begin{equation*}
f(x_i) = a + bx_i
\end{equation*}
where x_i is the month in which the revenue was received, f(x_i) is the revenue corresponding to that month, and a and b are the regression coefficients of the estimated line.
Note that the coefficient b is often called the slope or gradient of the estimated line; it represents the amount by which f(x_i) changes when x_i changes.
Obviously, our task in this example is to select such coefficients a and b in the equation that the deviations of our calculated monthly revenue values from the true answers, i.e. the values given in the sample, are minimal.
The least squares method
According to the least squares method, the deviation should be calculated by squaring it. This technique avoids mutual cancellation of deviations that have opposite signs. For example, if in one case the deviation is +5 and in another it is -5, the sum of the deviations cancels out and equals 0. Instead of squaring the deviation one could take its absolute value, and then all deviations would be positive and would accumulate. We will not dwell on this point in detail; we will simply note that, for convenience of calculation, it is customary to square the deviation.
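As a tiny illustration of the cancellation effect (plain Python, nothing beyond what the text above already says):
deviations = [5, -5]
# plain sum: the deviations cancel each other out
print 'sum of deviations: %s' % sum(deviations)                        # 0
# sum of squares: the deviations accumulate instead of cancelling
print 'sum of squared deviations: %s' % sum(d**2 for d in deviations)  # 50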
The quantity we minimize, the sum of squared deviations (errors), looks like this:
\begin{equation*}
\sum\limits_{i=1}^{n}(f(x_i) - y_i)^2 = \sum\limits_{i=1}^{n}(a + bx_i - y_i)^2 \rightarrow \min
\end{equation*}
where f(x_i) is the function approximating the true answers (i.e. the revenue we calculate),
y_i are the true answers (the revenue given in the sample),
and i is the sample index (the number of the month in which the deviation is determined).
Let us differentiate the function, define the first-order partial derivative equations, and prepare to move on to the analytical solution. But first, a short excursion into what differentiation is, and a reminder of the geometric meaning of the derivative.
Differentiation
Differentiation is the operation of finding the derivative of a function.
What is the derivative used for? The derivative of a function characterizes the rate of change of the function and tells us its direction. If the derivative at a given point is positive, the function increases; otherwise it decreases. And the greater the absolute value of the derivative, the higher the rate of change of the function's values and the steeper the slope of its graph.
For example, in a Cartesian coordinate system, a derivative of +25 at the point M(0,0) means that when x is shifted one conventional unit to the right, the value of y increases by 25 conventional units. On the graph this looks like a steep rise in the values of y from that point.
Another example. A derivative of -0.1 means that when x is shifted by one conventional unit, the value of y decreases by only 0.1 conventional units. On the graph of the function we would observe a barely noticeable downward slope. Drawing an analogy with a mountain: it is as if we were very slowly descending a gentle slope, unlike the previous example, where we had to climb very steep peaks :)
So, after differentiating the error function with respect to the coefficients a and b, we define the first-order partial derivative equations. Having defined the equations, we obtain a system of two equations; by solving it we can select values of the coefficients a and b at which the values of the corresponding derivatives change by a very, very small amount at the given points, and in the analytical solution do not change at all. In other words, the error function reaches its minimum at the found coefficients, because at these points the values of the partial derivatives equal zero.
So, by the rules of differentiation, the first-order partial derivative with respect to the coefficient a takes the form:
\begin{equation*}
2\sum\limits_{i=1}^{n}(a + bx_i - y_i) = 2\left(na + b\sum\limits_{i=1}^{n}x_i - \sum\limits_{i=1}^{n}y_i\right)
\end{equation*}
The first-order partial derivative with respect to b takes the form:
\begin{equation*}
2\sum\limits_{i=1}^{n}x_i(a + bx_i - y_i) = 2\left(a\sum\limits_{i=1}^{n}x_i + b\sum\limits_{i=1}^{n}x_i^2 - \sum\limits_{i=1}^{n}x_iy_i\right)
\end{equation*}
As a result, setting both derivatives to zero, we obtain a system of equations that has a very simple analytical solution:
\begin{equation*}
\begin{cases}
na + b\sum\limits_{i=1}^{n}x_i - \sum\limits_{i=1}^{n}y_i = 0
\\
a\sum\limits_{i=1}^{n}x_i + b\sum\limits_{i=1}^{n}x_i^2 - \sum\limits_{i=1}^{n}x_iy_i = 0
\end{cases}
\end{equation*}
Before solving the equation, let us first load the data, check that it loaded correctly, and format it.
Loading and formatting the data
It should be noted that for the analytical solution, and later for gradient and stochastic gradient descent, we will use the code in two variants: with the numpy library and without it, so we will need the data formatted accordingly (see the code).
Data loading and processing code
# import all the libraries we need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import pylab as pl
import random
# display the plots inside Jupyter
%matplotlib inline
# set the figure size
from pylab import rcParams
rcParams['figure.figsize'] = 12, 6
# turn off Anaconda warnings
import warnings
warnings.simplefilter('ignore')
# load the values
table_zero = pd.read_csv('data_example.txt', header=0, sep='\t')
# look at the information about the table and at the table itself
print table_zero.info()
print '********************************************'
print table_zero
print '********************************************'
# prepare the data without using NumPy
x_us = []
[x_us.append(float(i)) for i in table_zero['x']]
print x_us
print type(x_us)
print '********************************************'
y_us = []
[y_us.append(float(i)) for i in table_zero['y']]
print y_us
print type(y_us)
print '********************************************'
# prepare the data using NumPy
x_np = table_zero[['x']].values
print x_np
print type(x_np)
print x_np.shape
print '********************************************'
y_np = table_zero[['y']].values
print y_np
print type(y_np)
print y_np.shape
print '********************************************'
Visualization
Now that we have, first, loaded the data, second, checked that it loaded correctly, and finally formatted it, we will make the first visualization. The method often used for this is the pairplot of the Seaborn library. In our example there is no point in using Seaborn because of the tiny numbers involved. We will use the plain matplotlib library and simply look at the scatter plot.
Scatter plot code
print 'Chart No. 1 "Dependence of revenue on the month of the year"'
plt.plot(x_us,y_us,'o',color='green',markersize=16)
plt.xlabel('$Months$', size=16)
plt.ylabel('$Sales$', size=16)
plt.show()
Chart No. 1 "Dependence of revenue on the month of the year"
Analytical solution
Let us use the most common tools in python and solve the system of equations:
\begin{equation*}
\begin{cases}
na + b\sum\limits_{i=1}^{n}x_i - \sum\limits_{i=1}^{n}y_i = 0
\\
a\sum\limits_{i=1}^{n}x_i + b\sum\limits_{i=1}^{n}x_i^2 - \sum\limits_{i=1}^{n}x_iy_i = 0
\end{cases}
\end{equation*}
Using Cramer's rule, we find the general determinant, as well as the determinants with respect to a and with respect to b; then, dividing the determinant with respect to a by the general determinant, we find the coefficient a, and we find the coefficient b in the same way.
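Spelled out (the notation here is my own, but the expressions match the Kramer_method function below), the determinants are:
\begin{equation*}
\Delta = n\sum\limits_{i=1}^{n}x_i^2 - \left(\sum\limits_{i=1}^{n}x_i\right)^2, \qquad
\Delta_a = \sum\limits_{i=1}^{n}y_i\sum\limits_{i=1}^{n}x_i^2 - \sum\limits_{i=1}^{n}x_i\sum\limits_{i=1}^{n}x_iy_i, \qquad
\Delta_b = n\sum\limits_{i=1}^{n}x_iy_i - \sum\limits_{i=1}^{n}x_i\sum\limits_{i=1}^{n}y_i
\end{equation*}
\begin{equation*}
a = \frac{\Delta_a}{\Delta}, \qquad b = \frac{\Delta_b}{\Delta}
\end{equation*}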
Analytical solution code
# define a function that calculates the coefficients a and b by Cramer's rule
def Kramer_method (x,y):
    # sum of the values (all months)
    sx = sum(x)
    # sum of the true answers (revenue for the whole period)
    sy = sum(y)
    # sum of the products of the values and the true answers
    list_xy = []
    [list_xy.append(x[i]*y[i]) for i in range(len(x))]
    sxy = sum(list_xy)
    # sum of the squared values
    list_x_sq = []
    [list_x_sq.append(x[i]**2) for i in range(len(x))]
    sx_sq = sum(list_x_sq)
    # number of values
    n = len(x)
    # general determinant
    det = sx_sq*n - sx*sx
    # determinant with respect to a
    det_a = sx_sq*sy - sx*sxy
    # desired parameter a
    a = (det_a / det)
    # determinant with respect to b
    det_b = sxy*n - sy*sx
    # desired parameter b
    b = (det_b / det)
    # control values (check that the found coefficients satisfy the system)
    check1 = (n*a + b*sx - sy)
    check2 = (a*sx + b*sx_sq - sxy)
    return [round(a,4), round(b,4)]

# run the function and record the correct answers
ab_us = Kramer_method(x_us,y_us)
a_us = ab_us[0]
b_us = ab_us[1]
print '\033[1m' + '\033[4m' + "Optimal values of the coefficients a and b:" + '\033[0m'
print 'a =', a_us
print 'b =', b_us
print

# define a function that calculates the sum of squared errors
def errors_sq_Kramer_method(answers,x,y):
    list_errors_sq = []
    for i in range(len(x)):
        err = (answers[0] + answers[1]*x[i] - y[i])**2
        list_errors_sq.append(err)
    return sum(list_errors_sq)

# run the function and record the error value
error_sq = errors_sq_Kramer_method(ab_us,x_us,y_us)
print '\033[1m' + '\033[4m' + "Sum of squared deviations" + '\033[0m'
print error_sq
print

# measure the calculation time
# print '\033[1m' + '\033[4m' + "Execution time for calculating the sum of squared deviations:" + '\033[0m'
# %timeit error_sq = errors_sq_Kramer_method(ab_us,x_us,y_us)
Here is what we got:
So, the coefficient values have been found and the sum of squared deviations determined. Let us draw a straight line on the scatter plot according to the coefficients we found.
Regression line code
# define a function that forms the array of calculated revenue values
def sales_count(ab,x,y):
    line_answers = []
    [line_answers.append(ab[0]+ab[1]*x[i]) for i in range(len(x))]
    return line_answers

# plot the chart
print 'Chart No. 2 "True and calculated answers"'
plt.plot(x_us,y_us,'o',color='green',markersize=16, label = '$True$ $answers$')
plt.plot(x_us, sales_count(ab_us,x_us,y_us), color='red',lw=4,
label='$Function: a + bx,$ $where$ $a='+str(round(ab_us[0],2))+',$ $b='+str(round(ab_us[1],2))+'$')
plt.xlabel('$Months$', size=16)
plt.ylabel('$Sales$', size=16)
plt.legend(loc=1, prop={'size': 16})
plt.show()
Chart No. 2 "True and calculated answers"
You can also look at the deviation chart for each month. In our case we will not derive any significant practical value from it, but we will satisfy our curiosity about how well the simple linear regression equation describes the dependence of revenue on the month of the year.
Deviation chart code
# define a function that forms the array of deviations in percent
def error_per_month(ab,x,y):
    sales_c = sales_count(ab,x,y)
    errors_percent = []
    for i in range(len(x)):
        errors_percent.append(100*(sales_c[i]-y[i])/y[i])
    return errors_percent

# plot the chart
print 'Chart No. 3 "Month-by-month deviations, %"'
plt.gca().bar(x_us, error_per_month(ab_us,x_us,y_us), color='brown')
plt.xlabel('Months', size=16)
plt.ylabel('Calculation error, %', size=16)
plt.show()
Chart No. 3 "Deviations, %"
Not perfect, but we have completed our task.
Let us now write a function that determines the coefficients a and b using the numpy library. More precisely, we will write two functions: one using the pseudoinverse matrix (not recommended in practice, since the process is computationally heavy and unstable), the other using the matrix equation.
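Both functions below solve, in one form or another, the same matrix relation; in standard least-squares notation (added here only for reference, with X the feature matrix padded with a column of ones and \vec{w} = (a, b)^T):
\begin{equation*}
X^T X \vec{w} = X^T \vec{y} \qquad\Longrightarrow\qquad \vec{w} = (X^T X)^{-1} X^T \vec{y}
\end{equation*}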
Analytical solution code (NumPy)
# to begin with, add a column with the constant value 1.
# This column is needed so that we do not have to treat the coefficient a separately
vector_1 = np.ones((x_np.shape[0],1))
x_np = table_zero[['x']].values # just in case, bring the vector x_np back to its original format
x_np = np.hstack((vector_1,x_np))

# check that everything was done correctly
print vector_1[0:3]
print x_np[0:3]
print '***************************************'
print

# write a function that determines the coefficients a and b using the pseudoinverse matrix
def pseudoinverse_matrix(X, y):
    # set the explicit format of the feature matrix
    X = np.matrix(X)
    # determine the transposed matrix
    XT = X.T
    # determine the square matrix
    XTX = XT*X
    # determine the pseudoinverse matrix
    inv = np.linalg.pinv(XTX)
    # set the explicit format of the answer matrix
    y = np.matrix(y)
    # find the vector of weights
    return (inv*XT)*y

# run the function
ab_np = pseudoinverse_matrix(x_np, y_np)
print ab_np
print '***************************************'
print

# write a function that uses the matrix equation for the solution
def matrix_equation(X,y):
    a = np.dot(X.T, X)
    b = np.dot(X.T, y)
    return np.linalg.solve(a, b)

# run the function
ab_np = matrix_equation(x_np,y_np)
print ab_np
Let us compare the time spent determining the coefficients a and b with the three methods presented.
Code for measuring the computation time
print '\033[1m' + '\033[4m' + "Execution time for calculating the coefficients without the NumPy library:" + '\033[0m'
%timeit ab_us = Kramer_method(x_us,y_us)
print '***************************************'
print
print '\033[1m' + '\033[4m' + "Execution time for calculating the coefficients using the pseudoinverse matrix:" + '\033[0m'
%timeit ab_np = pseudoinverse_matrix(x_np, y_np)
print '***************************************'
print
print '\033[1m' + '\033[4m' + "Execution time for calculating the coefficients using the matrix equation:" + '\033[0m'
%timeit ab_np = matrix_equation(x_np, y_np)
With a small amount of data, the "self-written" function that finds the coefficients by Cramer's method comes out ahead.
Now we can move on to other ways of finding the coefficients a and b.
Gradient descent
First, let us define what a gradient is. Simply put, the gradient is a segment that indicates the direction of the steepest growth of a function. By analogy with climbing a mountain, the gradient points to where the steepest ascent to the top is. Developing the mountain example, we recall that we actually need the steepest descent in order to reach the lowland as quickly as possible, that is, the minimum: the place where the function neither increases nor decreases. At that point the derivative equals zero. Therefore we need not the gradient but the antigradient. To find the antigradient it is enough to multiply the gradient by -1.
Let us note that a function can have several minima, and if, using the algorithm proposed below, we descend into one of them, we will not be able to find another minimum that might be lower than the one found. Relax, this does not threaten us! In our case we are dealing with a single minimum, since our error function, viewed as a function of the coefficients, is an ordinary parabola-shaped bowl. And as we all should remember very well from the school mathematics course, a parabola has only one minimum.
After we have established why we need the gradient, and also that the gradient is a segment, that is, a vector with given coordinates, which are exactly the coefficients a and b, we can implement gradient descent.
Before starting, I suggest reading a few sentences about the descent algorithm:
- We determine the coordinates of the coefficients a and b in a pseudo-random way. In our example we determine the coefficients near zero. This is common practice, but each case may have its own practice.
- From the coordinate a we subtract the value of the first-order partial derivative at the point a. So, if the derivative is positive, the function increases at that point; therefore, by subtracting the value of the derivative, we move in the direction opposite to growth, that is, in the direction of descent. If the derivative is negative, the function decreases at that point, and by subtracting the value of the derivative we again move in the direction of descent.
- We perform a similar operation with the coordinate b: we subtract the value of the partial derivative at the point b.
- In order not to jump over the minimum and not fly off into deep space, it is necessary to set the step size in the direction of descent. In general, one could write a whole article about how to set the step correctly and how to change it during the descent to reduce computational costs. But now we have a slightly different task, and we will set the step size by the scientific method of "poking", or, as they say, empirically.
- Having subtracted the values of the derivatives from the given coordinates a and b, we obtain new coordinates a and b. We take the next step (the next subtraction) from these calculated coordinates, and so the cycle starts again and again until the required convergence is achieved (the update rules are written out as formulas right after this list).
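Written as formulas (this matches the step performed in the code below; α denotes the step length, which in the code is the parameter l):
\begin{equation*}
a := a - \alpha\,\frac{1}{n}\left(na + b\sum\limits_{i=1}^{n}x_i - \sum\limits_{i=1}^{n}y_i\right), \qquad
b := b - \alpha\,\frac{1}{n}\left(a\sum\limits_{i=1}^{n}x_i + b\sum\limits_{i=1}^{n}x_i^2 - \sum\limits_{i=1}^{n}x_iy_i\right)
\end{equation*}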
That's it! Now we are ready to go searching for the deepest gorge of the Mariana Trench. Let's begin.
Gradient descent code
# write a gradient descent function without using the NumPy library.
# The function takes as input the ranges of x and y values, the step length (default = 0.1)
# and the allowed error (tolerance)
def gradient_descent_usual(x_us,y_us,l=0.1,tolerance=0.000000000001):
    # sum of the values (all months)
    sx = sum(x_us)
    # sum of the true answers (revenue for the whole period)
    sy = sum(y_us)
    # sum of the products of the values and the true answers
    list_xy = []
    [list_xy.append(x_us[i]*y_us[i]) for i in range(len(x_us))]
    sxy = sum(list_xy)
    # sum of the squared values
    list_x_sq = []
    [list_x_sq.append(x_us[i]**2) for i in range(len(x_us))]
    sx_sq = sum(list_x_sq)
    # number of values
    num = len(x_us)
    # initial values of the coefficients, determined pseudo-randomly
    a = float(random.uniform(-0.5, 0.5))
    b = float(random.uniform(-0.5, 0.5))
    # create the array of errors; we use the values 1 and 0 to start
    # and drop these starting values after the descent finishes
    errors = [1,0]
    # start the descent loop
    # the loop runs until the change between the last and the previous
    # sum-of-squares error becomes smaller than tolerance
    while abs(errors[-1]-errors[-2]) > tolerance:
        a_step = a - l*(num*a + b*sx - sy)/num
        b_step = b - l*(a*sx + b*sx_sq - sxy)/num
        a = a_step
        b = b_step
        ab = [a,b]
        errors.append(errors_sq_Kramer_method(ab,x_us,y_us))
    return (ab),(errors[2:])

# record the array of values
list_parametres_gradient_descence = gradient_descent_usual(x_us,y_us,l=0.1,tolerance=0.000000000001)
print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_gradient_descence[0][0],3)
print 'b =', round(list_parametres_gradient_descence[0][1],3)
print
print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_gradient_descence[1][-1],3)
print
print '\033[1m' + '\033[4m' + "Number of iterations in gradient descent:" + '\033[0m'
print len(list_parametres_gradient_descence[1])
print
We dived to the very bottom of the Mariana Trench and found there the very same values of the coefficients a and b, which is exactly what we expected.
Let's make another dive, only this time our deep-sea vehicle will be filled with other technology, namely the numpy library.
Gradient descent code (NumPy)
# before defining the gradient descent function that uses the NumPy library,
# we write a function that calculates the sum of squared deviations, also using NumPy
def error_square_numpy(ab,x_np,y_np):
    y_pred = np.dot(x_np,ab)
    error = y_pred - y_np
    return sum((error)**2)

# write a gradient descent function using the NumPy library.
# The function takes as input the ranges of x and y values, the step length (default = 0.1)
# and the allowed error (tolerance)
def gradient_descent_numpy(x_np,y_np,l=0.1,tolerance=0.000000000001):
    # sum of the values (all months)
    sx = float(sum(x_np[:,1]))
    # sum of the true answers (revenue for the whole period)
    sy = float(sum(y_np))
    # sum of the products of the values and the true answers
    sxy = x_np*y_np
    sxy = float(sum(sxy[:,1]))
    # sum of the squared values
    sx_sq = float(sum(x_np[:,1]**2))
    # number of values
    num = float(x_np.shape[0])
    # initial values of the coefficients, determined pseudo-randomly
    a = float(random.uniform(-0.5, 0.5))
    b = float(random.uniform(-0.5, 0.5))
    # create the array of errors; we use the values 1 and 0 to start
    # and drop these starting values after the descent finishes
    errors = [1,0]
    # start the descent loop
    # the loop runs until the change between the last and the previous
    # sum-of-squares error becomes smaller than tolerance
    while abs(errors[-1]-errors[-2]) > tolerance:
        a_step = a - l*(num*a + b*sx - sy)/num
        b_step = b - l*(a*sx + b*sx_sq - sxy)/num
        a = a_step
        b = b_step
        ab = np.array([[a],[b]])
        errors.append(error_square_numpy(ab,x_np,y_np))
    return (ab),(errors[2:])

# record the array of values
list_parametres_gradient_descence = gradient_descent_numpy(x_np,y_np,l=0.1,tolerance=0.000000000001)
print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_gradient_descence[0][0],3)
print 'b =', round(list_parametres_gradient_descence[0][1],3)
print
print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_gradient_descence[1][-1],3)
print
print '\033[1m' + '\033[4m' + "Number of iterations in gradient descent:" + '\033[0m'
print len(list_parametres_gradient_descence[1])
print
The values of the coefficients a and b are unchanged.
Let us look at how the error changed during gradient descent, that is, how the sum of squared deviations changed at each step.
Code for plotting the sum of squared deviations
print 'Chart No. 4 "Sum of squared deviations step by step"'
plt.plot(range(len(list_parametres_gradient_descence[1])), list_parametres_gradient_descence[1], color='red', lw=3)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()
Chart No. 4 "Sum of squared deviations during gradient descent"
On the chart we see that at every step the error decreases, and after a certain number of iterations we observe an almost horizontal line.
Finally, let us estimate the difference in code execution time:
Code for measuring the gradient descent computation time
print '\033[1m' + '\033[4m' + "Execution time of gradient descent without the NumPy library:" + '\033[0m'
%timeit list_parametres_gradient_descence = gradient_descent_usual(x_us,y_us,l=0.1,tolerance=0.000000000001)
print '***************************************'
print
print '\033[1m' + '\033[4m' + "Execution time of gradient descent using the NumPy library:" + '\033[0m'
%timeit list_parametres_gradient_descence = gradient_descent_numpy(x_np,y_np,l=0.1,tolerance=0.000000000001)
Perhaps we are doing something wrong, but once again the simple "home-written" function that does not use the numpy library beats the computation time of the function that uses numpy.
But we are not standing still; we are moving on to another exciting way of solving the simple linear regression equation. Meet it!
Stochastic gradient descent
To quickly grasp how stochastic gradient descent works, it is best to see how it differs from ordinary gradient descent. In gradient descent, the derivative equations with respect to a and b used the sums over all feature values and all true answers available in the sample (that is, the sums of all x_i and y_i). In stochastic gradient descent we do not use all the values of the sample; instead we pseudo-randomly select the so-called sample index and use only its values.
For example, if the index turns out to be number 3, we take the values x_3 and y_3, substitute them into the derivative equations and determine new coordinates. Then, having determined the coordinates, we again pseudo-randomly pick a sample index, substitute the values corresponding to that index into the partial derivative equations and update the coordinates a and b once more, and so on, over and over, until convergence sets in. At first glance it may seem that this cannot work at all, but it does. True, it is worth noting that the error does not decrease at every step, but the trend is definitely there.
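For a single randomly chosen index k the update looks like this (it matches the step function stoch_grad_step_usual below; α is again the step length l):
\begin{equation*}
a := a - \alpha\,(a + bx_k - y_k), \qquad
b := b - \alpha\,x_k\,(a + bx_k - y_k)
\end{equation*}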
What advantage does stochastic gradient descent have over the conventional one? If our sample size is very large and measured in tens of thousands of values, it is much easier to process, say, a random thousand of them than the whole sample. This is exactly where stochastic gradient descent comes into play. In our case, of course, we will not notice much of a difference.
Let's look at the code.
Stochastic gradient descent code
# define the stochastic gradient step function
def stoch_grad_step_usual(vector_init, x_us, ind, y_us, l):
    # pick the x value that corresponds to the random value of the parameter ind
    # (see the function stoch_grad_descent_usual)
    x = x_us[ind]
    # calculate the y value (revenue) that corresponds to the chosen x value
    y_pred = vector_init[0] + vector_init[1]*x_us[ind]
    # calculate the error of the calculated revenue relative to the one in the sample
    error = y_pred - y_us[ind]
    # determine the first coordinate of the gradient ab
    grad_a = error
    # determine the second coordinate of ab
    grad_b = x_us[ind]*error
    # calculate the new coefficient vector
    vector_new = [vector_init[0]-l*grad_a, vector_init[1]-l*grad_b]
    return vector_new

# define the stochastic gradient descent function
def stoch_grad_descent_usual(x_us, y_us, l=0.1, steps = 800):
    # set the initial values of the coefficients before the descent starts
    vector_init = [float(random.uniform(-0.5, 0.5)), float(random.uniform(-0.5, 0.5))]
    errors = []
    # start the descent loop
    # the loop runs for a fixed number of steps (steps)
    for i in range(steps):
        ind = random.choice(range(len(x_us)))
        new_vector = stoch_grad_step_usual(vector_init, x_us, ind, y_us, l)
        vector_init = new_vector
        errors.append(errors_sq_Kramer_method(vector_init,x_us,y_us))
    return (vector_init),(errors)

# record the array of values
list_parametres_stoch_gradient_descence = stoch_grad_descent_usual(x_us, y_us, l=0.1, steps = 800)
print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_stoch_gradient_descence[0][0],3)
print 'b =', round(list_parametres_stoch_gradient_descence[0][1],3)
print
print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_stoch_gradient_descence[1][-1],3)
print
print '\033[1m' + '\033[4m' + "Number of iterations in stochastic gradient descent:" + '\033[0m'
print len(list_parametres_stoch_gradient_descence[1])
We look closely at the coefficients and catch ourselves asking: "How can that be?" We got different values of the coefficients a and b. Maybe stochastic gradient descent found more optimal parameters for the equation? Unfortunately, no. It is enough to look at the sum of squared deviations to see that with the new coefficient values the error is larger. We are in no hurry to despair. Let us plot how the error changes.
Code for plotting the sum of squared deviations in stochastic gradient descent
print 'Chart No. 5 "Sum of squared deviations step by step"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1])), list_parametres_stoch_gradient_descence[1], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()
Chart No. 5 "Sum of squared deviations during stochastic gradient descent"
Looking at the chart, everything falls into place, and now we will fix it.
So what happened? The following happened. When we randomly choose a month, it is for that chosen month that our algorithm seeks to reduce the error in calculating revenue. Then we choose another month and repeat the calculation, but now we reduce the error for the second chosen month. Recall that the first two months deviate significantly from the line of the simple linear regression equation. This means that whenever either of these two months is chosen, reducing its individual error makes our algorithm seriously increase the error over the whole sample. So what should be done? The answer is simple: the descent step must be reduced. With a smaller step, the error also stops "jumping" up and down; or rather, the error does not stop "jumping", but it does not do it as quickly :) Let's check.
Code for running SGD with smaller steps
# run the function, reducing the step by a factor of 100 and increasing the number of steps accordingly
list_parametres_stoch_gradient_descence = stoch_grad_descent_usual(x_us, y_us, l=0.001, steps = 80000)
print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_stoch_gradient_descence[0][0],3)
print 'b =', round(list_parametres_stoch_gradient_descence[0][1],3)
print
print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_stoch_gradient_descence[1][-1],3)
print
print '\033[1m' + '\033[4m' + "Number of iterations in stochastic gradient descent:" + '\033[0m'
print len(list_parametres_stoch_gradient_descence[1])
print 'Chart No. 6 "Sum of squared deviations step by step"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1])), list_parametres_stoch_gradient_descence[1], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()
Chart No. 6 "Sum of squared deviations during stochastic gradient descent (80 thousand steps)"
The coefficients have improved, but they are still not ideal. Hypothetically this can be corrected as follows: we could, for example, pick the coefficient values at which the error was minimal over the last 1000 iterations. True, for that we would also have to record the coefficient values themselves. We will not do that here; instead, let us pay attention to the chart. It looks smooth and the error seems to decrease evenly. In fact this is not true. Let us look at the first 1000 iterations and compare them with the last ones.
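For the curious, here is one possible sketch of the "record the coefficients and pick the best" idea mentioned above. It is only an illustration under my own assumptions (it tracks the best coefficients over the whole run rather than over the last 1000 iterations, and the name stoch_grad_descent_keep_best is not from the original code); it reuses random, stoch_grad_step_usual and errors_sq_Kramer_method defined earlier.
def stoch_grad_descent_keep_best(x_us, y_us, l=0.001, steps=80000):
    # the same descent as stoch_grad_descent_usual, but we also remember
    # the coefficient vector that has produced the smallest error so far
    vector_init = [float(random.uniform(-0.5, 0.5)), float(random.uniform(-0.5, 0.5))]
    best_vector = list(vector_init)
    best_error = errors_sq_Kramer_method(vector_init, x_us, y_us)
    errors = []
    for i in range(steps):
        ind = random.choice(range(len(x_us)))
        vector_init = stoch_grad_step_usual(vector_init, x_us, ind, y_us, l)
        err = errors_sq_Kramer_method(vector_init, x_us, y_us)
        errors.append(err)
        if err < best_error:
            best_error = err
            best_vector = list(vector_init)
    # return the best coefficients seen during the descent, not the last ones
    return (best_vector), (best_error), (errors)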
Code for the SGD charts (first and last 1000 steps)
print 'Chart No. 7 "Sum of squared deviations step by step. First 1000 iterations"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1][:1000])),
list_parametres_stoch_gradient_descence[1][:1000], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()
print 'Chart No. 8 "Sum of squared deviations step by step. Last 1000 iterations"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1][-1000:])),
list_parametres_stoch_gradient_descence[1][-1000:], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()
Chart No. 7 "SGD sum of squared deviations (first 1000 steps)"
Chart No. 8 "SGD sum of squared deviations (last 1000 steps)"
At the very beginning of the descent we observe a fairly uniform and sharp decrease of the error. In the last iterations we see that the error hovers around the value 1.475, at some moments even hitting this optimal value, but then it still goes up... I repeat: one could record the values of the coefficients a and b and then pick those for which the error is minimal. However, we had a more serious problem: we had to take 80 thousand steps (see the code) to get values close to optimal. And this already contradicts the idea of saving computation time with stochastic gradient descent relative to plain gradient descent. What can be corrected and improved? It is not hard to notice that in the first iterations we are confidently going down, and therefore we should leave the step large in the first iterations and reduce it as we move along. We will not do this in this article, which is already too long. Those who wish can think for themselves how to do it; it is not difficult :)
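For completeness, here is a minimal sketch of such a decaying step, assuming a simple 1/(1 + decay*i) schedule (both the schedule and the function name stoch_grad_descent_decay are my own illustration, not something from the original article):
def stoch_grad_descent_decay(x_us, y_us, l_start=0.1, decay=0.01, steps=5000):
    # the same stochastic descent as before, but the step shrinks as the iterations go on
    vector_init = [float(random.uniform(-0.5, 0.5)), float(random.uniform(-0.5, 0.5))]
    errors = []
    for i in range(steps):
        # large steps at the beginning, progressively smaller ones later
        l = l_start / (1.0 + decay * i)
        ind = random.choice(range(len(x_us)))
        vector_init = stoch_grad_step_usual(vector_init, x_us, ind, y_us, l)
        errors.append(errors_sq_Kramer_method(vector_init, x_us, y_us))
    return (vector_init), (errors)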
Now let us implement stochastic gradient descent using the numpy library (and let us not stumble over the stones we identified earlier).
Stochastic gradient descent code (NumPy)
# first, write the gradient step function
def stoch_grad_step_numpy(vector_init, X, ind, y, l):
    x = X[ind]
    y_pred = np.dot(x,vector_init)
    err = y_pred - y[ind]
    grad_a = err
    grad_b = x[1]*err
    return vector_init - l*np.array([grad_a, grad_b])

# define the stochastic gradient descent function
def stoch_grad_descent_numpy(X, y, l=0.1, steps = 800):
    vector_init = np.array([[np.random.randint(X.shape[0])], [np.random.randint(X.shape[0])]])
    errors = []
    for i in range(steps):
        ind = np.random.randint(X.shape[0])
        new_vector = stoch_grad_step_numpy(vector_init, X, ind, y, l)
        vector_init = new_vector
        errors.append(error_square_numpy(vector_init,X,y))
    return (vector_init), (errors)

# record the array of values
list_parametres_stoch_gradient_descence = stoch_grad_descent_numpy(x_np, y_np, l=0.001, steps = 80000)
print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_stoch_gradient_descence[0][0],3)
print 'b =', round(list_parametres_stoch_gradient_descence[0][1],3)
print
print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_stoch_gradient_descence[1][-1],3)
print
print '\033[1m' + '\033[4m' + "Number of iterations in stochastic gradient descent:" + '\033[0m'
print len(list_parametres_stoch_gradient_descence[1])
print
The values came out almost the same as in the descent without numpy. However, that is logical.
Let us find out how long stochastic gradient descent took us.
Code for measuring the SGD computation time (80 thousand steps)
print '\033[1m' + '\033[4m' + "Execution time of stochastic gradient descent without the NumPy library:" + '\033[0m'
%timeit list_parametres_stoch_gradient_descence = stoch_grad_descent_usual(x_us, y_us, l=0.001, steps = 80000)
print '***************************************'
print
print '\033[1m' + '\033[4m' + "Execution time of stochastic gradient descent using the NumPy library:" + '\033[0m'
%timeit list_parametres_stoch_gradient_descence = stoch_grad_descent_numpy(x_np, y_np, l=0.001, steps = 80000)
The deeper into the forest, the darker the clouds: once again the "self-written" formula shows the best result. All of this suggests that there must be even subtler ways of using the numpy library that really do speed up computations. We will not get to them in this article; there will be something to think about in your spare time :)
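One plausible direction, sketched here under my own assumptions (it is not a claim about why exactly the numpy versions above are slower): keep the data in flat 1-D arrays, avoid np.matrix, and replace the Python-level sums inside the loop with whole-array operations.
# a more "numpy-idiomatic" variant of full-batch gradient descent:
# flat 1-D arrays and vectorized operations instead of per-element sums
x_flat = np.asarray(x_us)   # months as a 1-D array
y_flat = np.asarray(y_us)   # revenue as a 1-D array

def gradient_descent_vectorized(x, y, l=0.1, tolerance=0.000000000001):
    a, b = np.random.uniform(-0.5, 0.5, 2)
    errors = [1.0, 0.0]
    while abs(errors[-1] - errors[-2]) > tolerance:
        # deviations for all months at once
        residual = a + b*x - y
        # the same step as in gradient_descent_usual, written with array operations
        a, b = a - l*residual.mean(), b - l*(residual*x).mean()
        errors.append(float(np.sum((a + b*x - y)**2)))
    return (a, b), errors[2:]

list_parametres_vectorized = gradient_descent_vectorized(x_flat, y_flat)
print 'a =', round(list_parametres_vectorized[0][0],3)
print 'b =', round(list_parametres_vectorized[0][1],3)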
Let us sum up
Before summing up, I would like to answer a question that most likely arose in our dear reader: why, in fact, all this "torture" with descents, why do we need to walk up and down the mountain (mostly down) in search of the treasured lowland, if we hold in our hands such a powerful and simple device, in the form of the analytical solution, which teleports us to the right place instantly?
The answer to this question lies on the surface. We have just considered a very simple example, in which the true answer y depends on a single feature x. You do not see this often in real life, so imagine that we have 2, 30, 50 or more features. Add to this thousands, or even tens of thousands, of values for each feature. In that case the analytical solution may not withstand the test and fail. In turn, gradient descent and its variations will slowly but surely bring us closer to the goal: the minimum of the function. And do not worry about the speed: we will likely still look at ways to set and regulate the step length (that is, the speed).
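For the many-feature case this paragraph alludes to, the descent step is usually written in matrix form (standard notation, not something derived in this article; the constant factor from differentiation is absorbed into the step length α):
\begin{equation*}
\vec{w} := \vec{w} - \frac{\alpha}{n}\,X^T\left(X\vec{w} - \vec{y}\right)
\end{equation*}
where X is the feature matrix with a column of ones, \vec{w} is the vector of coefficients, and n is the sample size.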
And now the actual brief summary.
First, I hope the material presented in the article will help beginning "data scientists" understand how to solve simple (and not only simple) linear regression equations.
Second, we examined several ways of solving the equation. Now, depending on the situation, we can choose the one best suited to the problem at hand.
Third, we saw the power of an additional setting, namely the length of the gradient descent step. This parameter must not be neglected. As noted above, to reduce the cost of the computations, the step length should be changed during the descent.
Fourth, in our case the "home-written" functions showed the best computation times. This is probably explained by a less than fully professional use of the numpy library's capabilities. Be that as it may, the following conclusion suggests itself: on the one hand, it is sometimes worth questioning established opinions, and on the other, it is not always worth complicating things; sometimes a simpler way of solving a problem is more effective. And since our goal was to analyze three approaches to solving the simple linear regression equation, using "self-written" functions was quite enough for us.