Solving a Simple Linear Regression Equation

This article discusses several ways to determine the mathematical equation of a simple (paired) regression line.

All of the methods for solving the equation considered here are based on the least squares method. The methods are as follows:

  • Analytical solution
  • gradient descent
  • Stochastic Gradient Descent

For each way of solving the equation of the line, the article provides various functions, which are mainly divided into those written without using the NumPy library and those that use NumPy for the calculations. Skillful use of NumPy is believed to reduce the cost of computation.

All code in this article is written in Python 2.7 in a Jupyter Notebook. The source code and the sample data file are available on GitHub.

The article is aimed both at beginners and at those who have already gradually begun to master a very extensive area of artificial intelligence: machine learning.

Let's use a very simple example to illustrate the material.

Example Conditions

We have five values that characterize the dependence of Y on X (Table No. 1):

Table No. 1 "Conditions of the example"


We will assume that the values of $x$ are the months of the year and $y$ is the revenue in the corresponding month. In other words, revenue depends on the month of the year, and $x$ is the only feature on which revenue depends.

The example is rather contrived, both in terms of the assumed dependence of revenue on the month of the year and in terms of the number of values: there are very few of them. However, such a simplification makes it possible to explain, as they say, on one's fingers, material that beginners do not always absorb with ease. The simplicity of the numbers also allows anyone who wishes to solve the example on paper without significant effort.

Suppose that the dependence given in the example can be approximated quite well by the mathematical equation of a simple (paired) regression line of the form:

\begin{equation*}
y = a + bx
\end{equation*}

where $x$ is the month in which the revenue was received, $y$ is the revenue corresponding to that month, and $a$ and $b$ are the regression coefficients of the estimated line.

Note that the coefficient $b$ is often called the slope or gradient of the estimated line; it is the amount by which $y$ changes when $x$ changes by one.

Obviously, our task in the example is to select coefficients $a$ and $b$ in the equation for which the deviations of our estimated monthly revenue values from the true answers, i.e. the values given in the sample, are minimal.

Least squares method

According to the least squares method, each deviation should be squared. This technique avoids the mutual cancellation of deviations that have opposite signs. For example, if in one case the deviation is +5 (plus five) and in another -5 (minus five), the deviations cancel each other out and sum to 0 (zero). Instead of squaring the deviations one could take their absolute values, and then all deviations would be positive and would accumulate. We will not dwell on this point in detail; we simply note that, for the convenience of calculations, it is customary to square the deviations.
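A tiny illustration of this point, using the +5 and -5 deviations from the paragraph above (the deviation values here are only those made-up numbers):

# deviations of opposite sign cancel out when they are simply summed
deviations = [5.0, -5.0]
print 'sum of raw deviations:', sum(deviations)                        # 0.0, looks like a perfect fit
print 'sum of squared deviations:', sum(d**2 for d in deviations)      # 50.0, the error is visible
print 'sum of absolute deviations:', sum(abs(d) for d in deviations)   # 10.0, also works, but less convenient analytically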

Here is the formula with the help of which we will determine the smallest sum of squared deviations (errors):

\begin{equation*}
ERR(a, b) = \sum\limits_{i=1}^n \left(f(x_i) - y_i\right)^2 \rightarrow \min
\end{equation*}

where $f(x_i) = a + bx_i$ is the function that approximates the true answers (that is, the revenue we calculate),

$y_i$ are the true answers (the revenue given in the sample),

$i$ is the index of the sample element (the number of the month in which the deviation is determined).

Let us differentiate the function, write down the first-order partial derivative equations, and be ready to move on to the analytical solution. But first, a short digression about what differentiation is and a reminder of the geometric meaning of the derivative.

Differentiation

Differentiation is the operation of finding the derivative of a function.

What is the derivative for? The derivative of a function characterizes the rate of change of the function and shows us its direction. If the derivative at a given point is positive, the function is increasing; otherwise, the function is decreasing. And the greater the absolute value of the derivative, the higher the rate of change of the function's values and the steeper the slope of the function's graph.

For example, in a Cartesian coordinate system, a derivative value of +25 at the point M(0,0) means that if $x$ is shifted one conventional unit to the right at that point, $y$ increases by 25 conventional units. On the graph this looks like a fairly steep rise in the values of $y$ from the given point.

Another example: a derivative value of -0.1 means that when $x$ is shifted by one conventional unit, $y$ decreases by only 0.1 conventional unit. At the same time, on the graph of the function we can observe a barely noticeable downward slope. Drawing an analogy with a mountain, it is as if we were very slowly descending a gentle slope, unlike the previous example, where we had to climb very steep peaks :)
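To make the geometric meaning tangible, here is a minimal sketch that estimates a derivative numerically with a central finite difference. The two functions below are made up purely to reproduce the +25 and -0.1 values from the examples above; they are not part of the regression problem:

# numerical estimate of the derivative via a central finite difference
def derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

steep = lambda x: 25 * x            # a steep straight line: the derivative is +25 everywhere
print derivative(steep, 0.0)        # ~25.0, shifting x by one unit raises y by about 25 units

gentle = lambda x: -0.1 * x         # a gentle downward slope: the derivative is -0.1
print derivative(gentle, 0.0)       # ~-0.1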

Thus, after differentiating the function with respect to the coefficients $a$ and $b$, we obtain the first-order partial derivative equations. Having written down these equations, we get a system of two equations; by solving it we can select values of the coefficients $a$ and $b$ for which the values of the corresponding derivatives at the given points change by a very, very small amount, and in the case of the analytical solution do not change at all. In other words, the error function reaches a minimum at the coefficients found, since the values of the partial derivatives at these points are equal to zero.

So, according to the rules of differentiation, the first-order partial derivative equation with respect to the coefficient $a$ takes the form:

\begin{equation*}
\frac{\partial ERR}{\partial a} = 2\sum\limits_{i=1}^n \left(a + bx_i - y_i\right) = 0
\end{equation*}

The first-order partial derivative equation with respect to $b$ takes the form:

\begin{equation*}
\frac{\partial ERR}{\partial b} = 2\sum\limits_{i=1}^n x_i\left(a + bx_i - y_i\right) = 0
\end{equation*}

As a result, we get a system of equations that has a fairly simple analytical solution:

\begin{equation*}
\begin{cases}
na + b\sum\limits_{i=1}^n x_i - \sum\limits_{i=1}^n y_i = 0
\\
a\sum\limits_{i=1}^n x_i + b\sum\limits_{i=1}^n x_i^2 - \sum\limits_{i=1}^n x_i y_i = 0
\end{cases}
\end{equation*}
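Solving this system by hand (for example by substitution, or by Cramer's rule as in the code further below) gives the standard closed-form expressions; they are written out here as a reference point for checking the numerical methods:

\begin{equation*}
b = \frac{n\sum\limits_{i=1}^n x_i y_i - \sum\limits_{i=1}^n x_i \sum\limits_{i=1}^n y_i}{n\sum\limits_{i=1}^n x_i^2 - \left(\sum\limits_{i=1}^n x_i\right)^2},
\qquad
a = \frac{\sum\limits_{i=1}^n y_i - b\sum\limits_{i=1}^n x_i}{n}
\end{equation*}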

Before solving the equations, let us first load the data, check that it loaded correctly, and format it.

Loading and formatting data

Note that since for the analytical solution, and later for gradient and stochastic gradient descent, we will use the code in two variants, with the NumPy library and without it, we need the data formatted appropriately (see the code).

Data loading and processing code

# import all the libraries we need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import pylab as pl
import random

# display the plots inside Jupyter
%matplotlib inline

# set the plot size
from pylab import rcParams
rcParams['figure.figsize'] = 12, 6

# turn off Anaconda warnings
import warnings
warnings.simplefilter('ignore')

# load the values
table_zero = pd.read_csv('data_example.txt', header=0, sep='\t')

# look at the information about the table and at the table itself
print table_zero.info()
print '********************************************'
print table_zero
print '********************************************'

# prepare the data without using NumPy

x_us = []
[x_us.append(float(i)) for i in table_zero['x']]
print x_us
print type(x_us)
print '********************************************'

y_us = []
[y_us.append(float(i)) for i in table_zero['y']]
print y_us
print type(y_us)
print '********************************************'

# prepare the data using NumPy

x_np = table_zero[['x']].values
print x_np
print type(x_np)
print x_np.shape
print '********************************************'

y_np = table_zero[['y']].values
print y_np
print type(y_np)
print y_np.shape
print '********************************************'

Visualization

Now that we have, first, loaded the data, second, checked that it loaded correctly and, finally, formatted it, let us carry out the first visualization. The pairplot method of the Seaborn library is often used for this. In our example, because of the small amount of data, there is no point in using Seaborn. We will use the ordinary Matplotlib library and look only at the scatterplot.

Scatterplot Code

print 'Chart No. 1 "Dependence of revenue on the month of the year"'

plt.plot(x_us,y_us,'o',color='green',markersize=16)
plt.xlabel('$Months$', size=16)
plt.ylabel('$Sales$', size=16)
plt.show()

Chart No. 1 "Dependence of revenue on the month of the year"


Analytical solution

Let us use the most common tools in Python and solve the system of equations:

\begin{equation*}
\begin{cases}
na + b\sum\limits_{i=1}^n x_i - \sum\limits_{i=1}^n y_i = 0
\\
a\sum\limits_{i=1}^n x_i + b\sum\limits_{i=1}^n x_i^2 - \sum\limits_{i=1}^n x_i y_i = 0
\end{cases}
\end{equation*}

Using Cramer's rule, we find the common determinant as well as the determinants with respect to $a$ and with respect to $b$; then, dividing the determinant with respect to $a$ by the common determinant, we find the coefficient $a$, and the coefficient $b$ is found similarly.

Analytical solution code

# define a function that calculates the coefficients a and b by Cramer's rule
def Kramer_method (x,y):
    # sum of the values (all months)
    sx = sum(x)
    # sum of the true answers (revenue for the whole period)
    sy = sum(y)
    # sum of the products of the values and the true answers
    list_xy = []
    [list_xy.append(x[i]*y[i]) for i in range(len(x))]
    sxy = sum(list_xy)
    # sum of the squared values
    list_x_sq = []
    [list_x_sq.append(x[i]**2) for i in range(len(x))]
    sx_sq = sum(list_x_sq)
    # number of values
    n = len(x)
    # common determinant
    det = sx_sq*n - sx*sx
    # determinant with respect to a
    det_a = sx_sq*sy - sx*sxy
    # the sought parameter a
    a = (det_a / det)
    # determinant with respect to b
    det_b = sxy*n - sy*sx
    # the sought parameter b
    b = (det_b / det)
    # control values (check): both should be close to zero
    check1 = (n*a + b*sx - sy)
    check2 = (a*sx + b*sx_sq - sxy)
    return [round(a,4), round(b,4)]

# run the function and write down the correct answers
ab_us = Kramer_method(x_us,y_us)
a_us = ab_us[0]
b_us = ab_us[1]
print '\033[1m' + '\033[4m' + "Optimal values of the coefficients a and b:"  + '\033[0m' 
print 'a =', a_us
print 'b =', b_us
print

# define a function that calculates the sum of squared errors
def errors_sq_Kramer_method(answers,x,y):
    list_errors_sq = []
    for i in range(len(x)):
        err = (answers[0] + answers[1]*x[i] - y[i])**2
        list_errors_sq.append(err)
    return sum(list_errors_sq)

# run the function and write down the error value
error_sq = errors_sq_Kramer_method(ab_us,x_us,y_us)
print '\033[1m' + '\033[4m' + "Sum of squared deviations" + '\033[0m'
print error_sq
print

# measure the calculation time
# print '\033[1m' + '\033[4m' + "Execution time of the sum of squared deviations calculation:" + '\033[0m'
# %timeit error_sq = errors_sq_Kramer_method(ab,x_us,y_us)

Here is what we got: the values of the coefficients have been found and the sum of squared deviations computed. Let us draw a straight line on the scatterplot in accordance with the coefficients found.

Regression line code

# define a function that builds the array of calculated revenue values
def sales_count(ab,x,y):
    line_answers = []
    [line_answers.append(ab[0]+ab[1]*x[i]) for i in range(len(x))]
    return line_answers

# draw the charts
print 'Chart No. 2 "Correct and calculated answers"'
plt.plot(x_us,y_us,'o',color='green',markersize=16, label = '$True$ $answers$')
plt.plot(x_us, sales_count(ab_us,x_us,y_us), color='red',lw=4,
         label='$Function: a + bx,$ $where$ $a='+str(round(ab_us[0],2))+',$ $b='+str(round(ab_us[1],2))+'$')
plt.xlabel('$Months$', size=16)
plt.ylabel('$Sales$', size=16)
plt.legend(loc=1, prop={'size': 16})
plt.show()

Chart No. 2 "Correct and calculated answers"


You can also look at the graph of deviations for each month. In our case we will not derive any significant practical value from it, but it will satisfy our curiosity about how well the simple linear regression equation characterizes the dependence of revenue on the month of the year.

Deviation chart code

# define a function that builds the array of deviations in percent
def error_per_month(ab,x,y):
    sales_c = sales_count(ab,x,y)
    errors_percent = []
    for i in range(len(x)):
        errors_percent.append(100*(sales_c[i]-y[i])/y[i])
    return errors_percent

# draw the chart
print 'Chart No. 3 "Deviations by month, %"'
plt.gca().bar(x_us, error_per_month(ab_us,x_us,y_us), color='brown')
plt.xlabel('Months', size=16)
plt.ylabel('Calculation error, %', size=16)
plt.show()

Chart No. 3 "Deviations by month, %"


Not perfect, but we did our job.

Let us now write a function that uses the NumPy library to determine the coefficients $a$ and $b$. More precisely, we will write two functions: one using the pseudo-inverse matrix (not recommended in practice, since the process is computationally expensive and unstable), the other using a matrix equation.
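For reference, both functions solve the same normal equations written in matrix form. This is the standard textbook result, stated here with $X$ denoting the feature matrix that includes a column of ones and $w = (a, b)^T$ the vector of coefficients:

\begin{equation*}
X^T X w = X^T y
\qquad\Longrightarrow\qquad
w = \left(X^T X\right)^{-1} X^T y
\end{equation*}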

Analytic Solution Code (NumPy)

# first, add a column with the constant value 1. 
# This column is needed so that the coefficient a does not have to be handled separately
vector_1 = np.ones((x_np.shape[0],1))
x_np = table_zero[['x']].values # just in case, reset the vector x_np to its original format
x_np = np.hstack((vector_1,x_np))

# check that everything was done correctly
print vector_1[0:3]
print x_np[0:3]
print '***************************************'
print

# write a function that determines the values of the coefficients a and b using the pseudo-inverse matrix
def pseudoinverse_matrix(X, y):
    # set the explicit matrix format of the feature matrix
    X = np.matrix(X)
    # compute the transposed matrix
    XT = X.T
    # compute the square matrix
    XTX = XT*X
    # compute the pseudo-inverse matrix
    inv = np.linalg.pinv(XTX)
    # set the explicit matrix format of the answers
    y = np.matrix(y)
    # find the vector of weights
    return (inv*XT)*y

# run the function
ab_np = pseudoinverse_matrix(x_np, y_np)
print ab_np
print '***************************************'
print

# write a function that uses a matrix equation for the solution
def matrix_equation(X,y):
    a = np.dot(X.T, X)
    b = np.dot(X.T, y)
    return np.linalg.solve(a, b)

# run the function
ab_np = matrix_equation(x_np,y_np)
print ab_np

Let us compare the time it takes to determine the coefficients $a$ and $b$ by the three methods presented.

Code for measuring calculation time

print '\033[1m' + '\033[4m' + "Execution time of the coefficient calculation without the NumPy library:" + '\033[0m'
%timeit ab_us = Kramer_method(x_us,y_us)
print '***************************************'
print
print '\033[1m' + '\033[4m' + "Execution time of the coefficient calculation using the pseudo-inverse matrix:" + '\033[0m'
%timeit ab_np = pseudoinverse_matrix(x_np, y_np)
print '***************************************'
print
print '\033[1m' + '\033[4m' + "Execution time of the coefficient calculation using the matrix equation:" + '\033[0m'
%timeit ab_np = matrix_equation(x_np, y_np)


On a small amount of data, the "self-written" function, which finds the coefficients by Cramer's method, comes out ahead.

Now we can move on to other ways of finding the coefficients $a$ and $b$.

Gradient descent

First, let us define what a gradient is. Put simply, the gradient is a vector that indicates the direction of maximum growth of a function. By analogy with climbing a mountain, the gradient points to where the steepest ascent to the summit is. Developing the mountain example, remember that we actually need the steepest descent in order to reach the lowland as quickly as possible, that is, the minimum: the place where the function neither increases nor decreases. At this point the derivative equals zero. Therefore, we need not the gradient but the anti-gradient. To find the anti-gradient, the gradient simply has to be multiplied by -1 (minus one).

Note that a function can have several minima, and having descended into one of them using the algorithm proposed below, we will not be able to find another minimum that may lie lower than the one found. Relax, we are not in danger! In our case we are dealing with a single minimum, since our error function is, graphically, an ordinary paraboloid, the two-variable analogue of a parabola. And as we all should know well from the school mathematics course, a parabola has only one minimum.
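A one-line justification of the single minimum, added here for completeness: the error function

\begin{equation*}
ERR(a, b) = \sum\limits_{i=1}^n \left(a + bx_i - y_i\right)^2
\end{equation*}

is a quadratic function of $a$ and $b$ whose matrix of second derivatives, $2\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}$, is positive definite as long as not all $x_i$ are equal; hence the surface is a convex paraboloid with a single global minimum.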

After we have figured out why we need the gradient, and also that the gradient is a vector whose coordinates here are the partial derivatives with respect to the coefficients $a$ and $b$, we can implement gradient descent.

Before starting, I suggest reading just a few sentences about the descent algorithm:

  • We determine pseudo-randomly the coordinates of the coefficients Solving a Simple Linear Regression Equation ΠΈ Solving a Simple Linear Regression Equation. In our example, we will define coefficients near zero. This is a common practice, but each case may have its own practice.
  • From coordinate Solving a Simple Linear Regression Equation subtract the value of the partial derivative of the 1st order at the point Solving a Simple Linear Regression Equation. So, if the derivative is positive, then the function is increasing. Therefore, subtracting the value of the derivative, we will move in the opposite direction of growth, that is, in the direction of descent. If the derivative is negative, then the function at this point is decreasing and subtracting the value of the derivative, we move towards the descent.
  • We carry out a similar operation with the coordinate Solving a Simple Linear Regression Equation: subtract the value of the partial derivative at the point Solving a Simple Linear Regression Equation.
  • In order not to jump over the minimum and not fly away into deep space, it is necessary to set the step size in the direction of descent. In general, one could write an entire article on how to set the step correctly and how to change it during descent in order to reduce the cost of calculations. But now we have a slightly different task, and we will establish the step size by the scientific method of "poke" or, as they say in the common people, empirically.
  • Once we're out of the given coordinates Solving a Simple Linear Regression Equation ΠΈ Solving a Simple Linear Regression Equation subtract the values ​​of the derivatives, we get new coordinates Solving a Simple Linear Regression Equation ΠΈ Solving a Simple Linear Regression Equation. We take the next step (subtraction), already from the calculated coordinates. And so the cycle starts again and again, until the required convergence is reached.

That's it! Now we are ready to go in search of the deepest gorge of the Mariana Trench. Let's get started.

Gradient Descent Code

# write a gradient descent function without using the NumPy library. 
# The function takes as input the x and y values, the step length (default 0.1) and the tolerance
def gradient_descent_usual(x_us,y_us,l=0.1,tolerance=0.000000000001):
    # sum of the values (all months)
    sx = sum(x_us)
    # sum of the true answers (revenue for the whole period)
    sy = sum(y_us)
    # sum of the products of the values and the true answers
    list_xy = []
    [list_xy.append(x_us[i]*y_us[i]) for i in range(len(x_us))]
    sxy = sum(list_xy)
    # sum of the squared values
    list_x_sq = []
    [list_x_sq.append(x_us[i]**2) for i in range(len(x_us))]
    sx_sq = sum(list_x_sq)
    # number of values
    num = len(x_us)
    # initial values of the coefficients, chosen pseudo-randomly
    a = float(random.uniform(-0.5, 0.5))
    b = float(random.uniform(-0.5, 0.5))
    # create an array of errors; the values 1 and 0 are used to start
    # the starting values will be dropped after the descent finishes
    errors = [1,0]
    # start the descent loop
    # the loop runs until the difference between the last and the previous sum-of-squares error becomes smaller than tolerance
    while abs(errors[-1]-errors[-2]) > tolerance:
        a_step = a - l*(num*a + b*sx - sy)/num
        b_step = b - l*(a*sx + b*sx_sq - sxy)/num
        a = a_step
        b = b_step
        ab = [a,b]
        errors.append(errors_sq_Kramer_method(ab,x_us,y_us))
    return (ab),(errors[2:])

# write down the array of values 
list_parametres_gradient_descence = gradient_descent_usual(x_us,y_us,l=0.1,tolerance=0.000000000001)


print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_gradient_descence[0][0],3)
print 'b =', round(list_parametres_gradient_descence[0][1],3)
print


print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_gradient_descence[1][-1],3)
print



print '\033[1m' + '\033[4m' + "Number of iterations of gradient descent:" + '\033[0m'
print len(list_parametres_gradient_descence[1])
print


We dived to the very bottom of the Mariana Trench and found there exactly the same values of the coefficients $a$ and $b$, which is really what was to be expected.

Let's make another dive, only this time our deep-sea vehicle will be filled with different technology, namely the NumPy library.

Gradient Descent Code (NumPy)

# before defining the gradient descent function that uses the NumPy library, 
# let us write a function that computes the sum of squared deviations, also using NumPy
def error_square_numpy(ab,x_np,y_np):
    y_pred = np.dot(x_np,ab)
    error = y_pred - y_np
    return sum((error)**2)

# write a gradient descent function using the NumPy library. 
# The function takes as input the x and y values, the step length (default 0.1) and the tolerance
def gradient_descent_numpy(x_np,y_np,l=0.1,tolerance=0.000000000001):
    # sum of the values (all months)
    sx = float(sum(x_np[:,1]))
    # sum of the true answers (revenue for the whole period)
    sy = float(sum(y_np))
    # sum of the products of the values and the true answers
    sxy = x_np*y_np
    sxy = float(sum(sxy[:,1]))
    # sum of the squared values
    sx_sq = float(sum(x_np[:,1]**2))
    # number of values
    num = float(x_np.shape[0])
    # initial values of the coefficients, chosen pseudo-randomly
    a = float(random.uniform(-0.5, 0.5))
    b = float(random.uniform(-0.5, 0.5))
    # create an array of errors; the values 1 and 0 are used to start
    # the starting values will be dropped after the descent finishes
    errors = [1,0]
    # start the descent loop
    # the loop runs until the difference between the last and the previous sum-of-squares error becomes smaller than tolerance
    while abs(errors[-1]-errors[-2]) > tolerance:
        a_step = a - l*(num*a + b*sx - sy)/num
        b_step = b - l*(a*sx + b*sx_sq - sxy)/num
        a = a_step
        b = b_step
        ab = np.array([[a],[b]])
        errors.append(error_square_numpy(ab,x_np,y_np))
    return (ab),(errors[2:])

# write down the array of values 
list_parametres_gradient_descence = gradient_descent_numpy(x_np,y_np,l=0.1,tolerance=0.000000000001)

print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_gradient_descence[0][0],3)
print 'b =', round(list_parametres_gradient_descence[0][1],3)
print


print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_gradient_descence[1][-1],3)
print

print '\033[1m' + '\033[4m' + "Number of iterations of gradient descent:" + '\033[0m'
print len(list_parametres_gradient_descence[1])
print

The values of the coefficients $a$ and $b$ are unchanged.

Let's look at how the error changed during gradient descent, that is, how the sum of the squared deviations changed with each step.

Code for the sum of squared deviations plot

print 'Chart No. 4 "Sum of squared deviations step by step"'
plt.plot(range(len(list_parametres_gradient_descence[1])), list_parametres_gradient_descence[1], color='red', lw=3)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()

Chart #4 "Sum of Squared Deviations in Gradient Descent"


On the graph, we see that the error decreases with each step, and after a certain number of iterations, we observe an almost horizontal line.

Finally, let's evaluate the difference in code execution time:

Code for timing gradient descent computation

print '\033[1m' + '\033[4m' + "Execution time of gradient descent without the NumPy library:" + '\033[0m'
%timeit list_parametres_gradient_descence = gradient_descent_usual(x_us,y_us,l=0.1,tolerance=0.000000000001)
print '***************************************'
print

print '\033[1m' + '\033[4m' + "Execution time of gradient descent using the NumPy library:" + '\033[0m'
%timeit list_parametres_gradient_descence = gradient_descent_numpy(x_np,y_np,l=0.1,tolerance=0.000000000001)


Perhaps we are doing something wrong, but once again the simple "self-written" function that does not use the NumPy library beats the calculation time of the function that uses NumPy.

But we are not standing still; we are moving on to another exciting way of solving the simple linear regression equation. Meet it!

Stochastic gradient descent

To quickly understand how stochastic gradient descent works, it is best to define how it differs from ordinary gradient descent. In the case of gradient descent, the equations of the derivatives with respect to $a$ and $b$ used the sums of the values of all the features and of all the true answers available in the sample (that is, the sums over all $x_i$ and $y_i$). In stochastic gradient descent we do not use all the values in the sample; instead, we pseudo-randomly choose the so-called sample index and use the values corresponding to it.

For example, if the index turns out to be 3 (three), then we take the values $x_3$ and $y_3$, substitute them into the derivative equations and determine the new coordinates. Then, having determined the coordinates, we again pseudo-randomly determine a sample index, substitute the values corresponding to that index into the partial derivative equations, determine the coordinates $a$ and $b$ in the new way, and so on until convergence. At first glance it may seem that this could not work at all, but it does. True, it is worth noting that the error does not decrease with every step, but there is certainly a downward trend.
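In formula form, a single stochastic step with a randomly chosen index $k$ looks like this (again with the factor 2 absorbed into the step length $\ell$); this is exactly what the step functions below implement:

\begin{equation*}
a := a - \ell \left(a + bx_k - y_k\right),
\qquad
b := b - \ell \, x_k \left(a + bx_k - y_k\right)
\end{equation*}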

What are the advantages of stochastic gradient descent over ordinary gradient descent? If our sample is very large and measured in tens of thousands of values, then it is much easier to process, say, a random thousand of them than the whole sample. This is exactly where stochastic gradient descent comes into play. In our case, of course, we will not notice a big difference.
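As an aside, the "random thousand of values" idea from the previous paragraph is usually called mini-batch gradient descent. Here is a minimal, hedged sketch of it; the function name minibatch_grad_step and the default batch size of 2 (our sample has only five points) are made up for illustration, and this function is not used anywhere else in the article:

# one mini-batch step: the gradient is averaged over a random subset of indices
def minibatch_grad_step(vector_init, x_us, y_us, l, batch_size=2):
    batch = random.sample(range(len(x_us)), batch_size)
    grad_a = 0.0
    grad_b = 0.0
    for ind in batch:
        error = vector_init[0] + vector_init[1]*x_us[ind] - y_us[ind]
        grad_a += error
        grad_b += x_us[ind]*error
    # average the gradient over the batch
    grad_a /= batch_size
    grad_b /= batch_size
    return [vector_init[0] - l*grad_a, vector_init[1] - l*grad_b]

With batch_size equal to the whole sample this turns back into ordinary gradient descent, and with batch_size=1 it becomes the stochastic version described above.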

Let's look at the code.

Code for Stochastic Gradient Descent

# define the stochastic gradient step function
def stoch_grad_step_usual(vector_init, x_us, ind, y_us, l):
    # choose the x value corresponding to the random value of the parameter ind
    # (see the function stoch_grad_descent_usual)
    x = x_us[ind]
    # calculate the y value (revenue) corresponding to the chosen x value
    y_pred = vector_init[0] + vector_init[1]*x_us[ind]
    # calculate the error of the computed revenue relative to the one given in the sample
    error = y_pred - y_us[ind]
    # determine the first coordinate of the gradient ab
    grad_a = error
    # determine the second coordinate of ab
    grad_b = x_us[ind]*error
    # calculate the new vector of coefficients
    vector_new = [vector_init[0]-l*grad_a, vector_init[1]-l*grad_b]
    return vector_new


# define the stochastic gradient descent function
def stoch_grad_descent_usual(x_us, y_us, l=0.1, steps = 800):
    # to start, set the initial values of the coefficients
    vector_init = [float(random.uniform(-0.5, 0.5)), float(random.uniform(-0.5, 0.5))]
    errors = []
    # start the descent loop
    # the loop runs for a given number of steps (steps)
    for i in range(steps):
        ind = random.choice(range(len(x_us)))
        new_vector = stoch_grad_step_usual(vector_init, x_us, ind, y_us, l)
        vector_init = new_vector
        errors.append(errors_sq_Kramer_method(vector_init,x_us,y_us))
    return (vector_init),(errors)


# write down the array of values 
list_parametres_stoch_gradient_descence = stoch_grad_descent_usual(x_us, y_us, l=0.1, steps = 800)

print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_stoch_gradient_descence[0][0],3)
print 'b =', round(list_parametres_stoch_gradient_descence[0][1],3)
print


print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_stoch_gradient_descence[1][-1],3)
print

print '\033[1m' + '\033[4m' + "Number of iterations of stochastic gradient descent:" + '\033[0m'
print len(list_parametres_stoch_gradient_descence[1])


We look carefully at the coefficients and catch ourselves asking: "How can that be?" We got different values of the coefficients $a$ and $b$. Maybe stochastic gradient descent found more optimal parameters for the equation? Unfortunately, no. It is enough to look at the sum of squared deviations to see that with the new values of the coefficients the error is larger. We are in no hurry to despair. Let us plot how the error changes.

Code for plotting the sum of squared deviations in stochastic gradient descent

print 'Chart No. 5 "Sum of squared deviations step by step"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1])), list_parametres_stoch_gradient_descence[1], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()

Chart #5 "Sum of Squared Deviations in Stochastic Gradient Descent"


After looking at the chart, everything falls into place, and now we will fix it.

So what happened? The following. When we randomly select a month, it is for that selected month that our algorithm tries to reduce the error in calculating the revenue. Then we select another month and repeat the calculation, but now we reduce the error for the second selected month. Recall that the first two months deviate significantly from the line of the simple linear regression equation. This means that whenever either of these two months is chosen, by reducing its error our algorithm seriously increases the error over the whole sample. So what can be done? The answer is simple: we need to reduce the descent step. Indeed, with a smaller descent step the error will stop "jumping" up and down. Or rather, the error will not stop "jumping", but it will not do so as quickly :) Let's check.

Code to run SGD with a smaller step

# run the function with the step reduced 100 times and the number of steps increased accordingly 
list_parametres_stoch_gradient_descence = stoch_grad_descent_usual(x_us, y_us, l=0.001, steps = 80000)

print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_stoch_gradient_descence[0][0],3)
print 'b =', round(list_parametres_stoch_gradient_descence[0][1],3)
print


print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_stoch_gradient_descence[1][-1],3)
print



print '\033[1m' + '\033[4m' + "Number of iterations of stochastic gradient descent:" + '\033[0m'
print len(list_parametres_stoch_gradient_descence[1])

print 'Chart No. 6 "Sum of squared deviations step by step"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1])), list_parametres_stoch_gradient_descence[1], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()


Chart #6 "Sum of squared deviations for stochastic gradient descent (80k steps)"


The coefficients have improved but are still not ideal. Hypothetically, this could be corrected as follows: select, for example, from the last 1000 iterations the values of the coefficients at which the error was smallest. True, for this we would have to record the values of the coefficients themselves. We will not do this here; instead, let us pay attention to the chart. It looks smooth and the error seems to decrease evenly. In fact, this is not so. Let's look at the first 1000 iterations and compare them with the last ones.

Code for the SGD charts (first and last 1000 steps)

print 'Chart No. 7 "Sum of squared deviations step by step. First 1000 iterations"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1][:1000])), 
         list_parametres_stoch_gradient_descence[1][:1000], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()

print 'Chart No. 8 "Sum of squared deviations step by step. Last 1000 iterations"'
plt.plot(range(len(list_parametres_stoch_gradient_descence[1][-1000:])), 
         list_parametres_stoch_gradient_descence[1][-1000:], color='red', lw=2)
plt.xlabel('Steps (Iteration)', size=16)
plt.ylabel('Sum of squared deviations', size=16)
plt.show()

Chart No. 7 "Sum of squared deviations of SGD (first 1000 steps)"


Chart #8 "Sum of squared deviations of SGD (last 1000 steps)"


At the very beginning of the descent we observe a fairly uniform and steep decrease in the error. In the last iterations we see that the error circles around the value of 1.475, at some moments even equals this optimal value, but then still goes up... I repeat: you could record the values of the coefficients $a$ and $b$ and then choose those for which the error is minimal. However, we had a bigger problem: we had to take 80 thousand steps (see the code) to get values close to optimal. And this already contradicts the idea of saving computation time with stochastic gradient descent relative to gradient descent. What can be corrected and improved? It is not hard to see that in the first iterations we are confidently going down, and therefore we should keep a large step in the first iterations and reduce the step as we move along. We will not do this in this article, which has already dragged on; those who wish can think for themselves how to do it, it is not difficult πŸ™‚ (a small sketch of both ideas is given just below).
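For those who want to experiment, here is a minimal, hedged sketch of both ideas at once: the step length gradually shrinks during the descent, and the best coefficients seen so far are recorded. The function name stoch_grad_descent_decay and the decay schedule are made up for illustration and are not used elsewhere in the article:

# stochastic gradient descent with a decaying step and tracking of the best coefficients seen
def stoch_grad_descent_decay(x_us, y_us, l_start=0.1, steps=80000):
    vector_init = [float(random.uniform(-0.5, 0.5)), float(random.uniform(-0.5, 0.5))]
    best_vector = list(vector_init)
    best_error = errors_sq_Kramer_method(vector_init, x_us, y_us)
    errors = []
    for i in range(steps):
        # the step shrinks as the iteration number grows (a simple 1/(1 + 0.001*i) schedule)
        l = l_start / (1.0 + 0.001*i)
        ind = random.choice(range(len(x_us)))
        vector_init = stoch_grad_step_usual(vector_init, x_us, ind, y_us, l)
        current_error = errors_sq_Kramer_method(vector_init, x_us, y_us)
        errors.append(current_error)
        # remember the coefficients with the smallest error seen so far
        if current_error < best_error:
            best_error = current_error
            best_vector = list(vector_init)
    return best_vector, errors

The 0.001 decay rate here is purely illustrative; in practice the schedule itself is another parameter to tune.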

Now let us perform stochastic gradient descent using the NumPy library (without stumbling over the stones we identified earlier).

Code for Stochastic Gradient Descent (NumPy)

# first, write the gradient step function
def stoch_grad_step_numpy(vector_init, X, ind, y, l):
    # the row of the feature matrix corresponding to the randomly chosen index
    x = X[ind]
    # calculated revenue for the chosen object
    y_pred = np.dot(x,vector_init)
    # error relative to the true answer
    err = y_pred - y[ind]
    # the two coordinates of the gradient
    grad_a = err
    grad_b = x[1]*err
    # the new vector of coefficients
    return vector_init - l*np.array([grad_a, grad_b])

# define the stochastic gradient descent function
def stoch_grad_descent_numpy(X, y, l=0.1, steps = 800):
    # initial values of the coefficients
    vector_init = np.array([[np.random.randint(X.shape[0])], [np.random.randint(X.shape[0])]])
    errors = []
    for i in range(steps):
        ind = np.random.randint(X.shape[0])
        new_vector = stoch_grad_step_numpy(vector_init, X, ind, y, l)
        vector_init = new_vector
        errors.append(error_square_numpy(vector_init,X,y))
    return (vector_init), (errors)

# write down the array of values 
list_parametres_stoch_gradient_descence = stoch_grad_descent_numpy(x_np, y_np, l=0.001, steps = 80000)

print '\033[1m' + '\033[4m' + "Values of the coefficients a and b:" + '\033[0m'
print 'a =', round(list_parametres_stoch_gradient_descence[0][0],3)
print 'b =', round(list_parametres_stoch_gradient_descence[0][1],3)
print


print '\033[1m' + '\033[4m' + "Sum of squared deviations:" + '\033[0m'
print round(list_parametres_stoch_gradient_descence[1][-1],3)
print



print '\033[1m' + '\033[4m' + "Number of iterations of stochastic gradient descent:" + '\033[0m'
print len(list_parametres_stoch_gradient_descence[1])
print


The values turned out to be almost the same as in the descent without NumPy, which, however, is to be expected.

Let's find out how much time stochastic gradient descents took us.

Code to determine SGD calculation time (80k steps)

print '\033[1m' + '\033[4m' + "Execution time of stochastic gradient descent without the NumPy library:" + '\033[0m'
%timeit list_parametres_stoch_gradient_descence = stoch_grad_descent_usual(x_us, y_us, l=0.001, steps = 80000)
print '***************************************'
print

print '\033[1m' + '\033[4m' + "Execution time of stochastic gradient descent using the NumPy library:" + '\033[0m'
%timeit list_parametres_stoch_gradient_descence = stoch_grad_descent_numpy(x_np, y_np, l=0.001, steps = 80000)


The further into the forest, the darker it gets: once again the simple "self-written" function shows the better result. All this suggests that there must be even more subtle ways of using the NumPy library that really do speed up computations. We will not get to them in this article. Something to think about at your leisure :)

Let's summarize

Before summarizing, I would like to answer a question that has most likely already occurred to our dear reader: why, in fact, all these "torments" with descents, why do we need to walk up and down the mountain (mostly down) to find the treasured lowland, if we have such a powerful and simple device in our hands, in the form of the analytical solution, which instantly teleports us to the right place?

The answer to this question lies on the surface. Here we have analyzed a very simple example, in which the true answer $y$ depends on a single feature $x$. You do not see this often in real life, so imagine that we have 2, 30, 50 or more features. Add to this thousands, or even tens of thousands, of values for each feature. In this case the analytical solution may not withstand the test and fail. In turn, gradient descent and its variations will slowly but surely bring us closer to the goal, the minimum of the function. And do not worry about speed: we will probably still look at ways to set and adjust the step length (that is, the speed).
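To make the scaling argument concrete, here is a minimal, hedged sketch of a fully vectorized gradient descent for an arbitrary number of features; it also hints at the "more subtle use of NumPy" mentioned earlier. The function name gradient_descent_many_features and the random test data are made up for illustration:

# vectorized gradient descent for a feature matrix X with any number of columns (the first column is ones)
def gradient_descent_many_features(X, y, l=0.01, steps=10000):
    n = X.shape[0]
    w = np.zeros((X.shape[1], 1))                   # start with all coefficients at zero
    for i in range(steps):
        grad = np.dot(X.T, np.dot(X, w) - y) / n    # gradient of the mean squared error (factor 2 absorbed into l)
        w = w - l*grad
    return w

# made-up test data: 10000 objects, 30 features plus a column of ones
X_demo = np.hstack((np.ones((10000, 1)), np.random.rand(10000, 30)))
w_true = np.random.rand(31, 1)
y_demo = np.dot(X_demo, w_true)

print gradient_descent_many_features(X_demo, y_demo, l=0.1, steps=5000)[:3]
print w_true[:3]  # for comparison: the recovered coefficients should be close to these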

And now for a short summary.

First, I hope that the material presented in the article will help beginner "data scientists" understand how to solve equations of simple (and not only simple) linear regression.

Second, we looked at several ways to solve the equation. Now, depending on the situation, we can choose the one that is best suited for the task at hand.

Third, we saw the importance of additional settings, namely the gradient descent step length. This parameter must not be neglected. As noted above, to reduce the cost of the calculations, the step length should be changed during the descent.

Fourth, in our case the "self-written" functions showed the best calculation times. This is probably due to a less than professional use of NumPy's capabilities. But be that as it may, the following conclusion suggests itself: on the one hand, it is sometimes worth questioning established opinions, and on the other hand, it is not always worth complicating things; on the contrary, sometimes a simpler way of solving a problem is more effective. And since our goal was to analyze three approaches to solving the simple linear regression equation, the use of "self-written" functions was quite enough for us.

Literature (or something like that)

1. Linear regression

http://statistica.ru/theory/osnovy-lineynoy-regressii/

2. Least squares method

mathprofi.ru/method_naimenshih_kvadratov.html

3. Derivative

www.mathprofi.ru/chastnye_proizvodnye_primery.html

4. Gradient

mathprofi.ru/proizvodnaja_po_napravleniju_i_gradient.html

5. Gradient Descent

habr.com/en/post/471458

habr.com/en/post/307312

artemarakcheev.com//2017-12-31/linear_regression

6. NumPy Library

docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.linalg.solve.html

docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.linalg.pinv.html

pythonworld.com/numpy/2.html

Source: habr.com
