The purpose of this article is to support novice data scientists.
Why does it make sense to pay special attention to the formula $w = \left( X^T X \right)^{-1} X^T y$?
In most cases, acquaintance with linear regression begins with this matrix equation. At the same time, a detailed derivation of the formula is rarely shown.
For example, in Yandex machine learning courses, when students are introduced to regularization, they are offered functions from the sklearn library, while not a word is said about the matrix representation of the algorithm. It is at this point that some listeners may want to understand the issue in more detail and write the code without ready-made functions. And for that, we must first present the equation with a regularizer in matrix form. This article will help those who wish to master such skills. Let's get started.
Initial conditions
Targets
We have a range of target values. A target could be, for example, the price of an asset: oil, gold, wheat, the dollar, etc. By the range of values of the target indicator we mean the number of observations. Such observations could be, say, monthly oil prices over a year, that is, 12 target values. Let's start introducing notation. Denote each value of the target indicator as $y_i$. In total we have $n$ observations, so our observations can be represented as $y_1, y_2, \dots, y_n$.
Regressors
We will assume that there are factors that to a certain extent explain the values of the target indicator. For example, the exchange rate of the dollar/ruble pair is strongly influenced by the price of oil, the Fed rate, etc. Such factors are called regressors. Each value of the target indicator must correspond to a value of each regressor: if we have 12 target values for each month of 2018, then we must also have 12 regressor values for the same period. Denote the value of the $j$-th regressor in the $i$-th observation by $x_{ij}$. Let us have $m$ regressors (i.e. $m$ factors that influence the target values). Then our regressors can be represented as follows: for the 1st regressor (for example, the price of oil): $x_{11}, x_{21}, \dots, x_{n1}$; for the 2nd regressor (for example, the Fed rate): $x_{12}, x_{22}, \dots, x_{n2}$; for the $m$-th regressor: $x_{1m}, x_{2m}, \dots, x_{nm}$.
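To make the setup concrete, here is a minimal sketch in NumPy. The numbers below are invented purely for illustration (hypothetical oil prices, a Fed rate and an exchange rate), not real data:

```python
import numpy as np

# 12 monthly observations of the target (hypothetical oil prices)
y = np.array([61.1, 63.5, 64.8, 69.0, 71.3, 74.4,
              72.7, 70.2, 73.1, 75.6, 62.3, 53.8])
n = y.size  # number of observations, n = 12

# Two hypothetical regressors observed over the same 12 months,
# one column per regressor: the j-th column holds x_{1j}, ..., x_{nj}
X_raw = np.column_stack([
    [1.9, 2.0, 2.1, 2.3, 2.4, 2.4, 2.4, 2.4, 2.5, 2.5, 2.4, 2.4],            # Fed rate
    [56.6, 56.3, 57.0, 61.3, 62.2, 62.6, 62.9, 66.1, 65.6, 65.9, 66.2, 69.7],  # FX rate
])
m = X_raw.shape[1]  # number of regressors, m = 2
```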
Dependence of target indicators on regressors
Let's assume that the dependence of the target indicator on the regressors in the $i$-th observation can be expressed through a linear regression equation of the form:

$$f(w, x_i) = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_m x_{im}$$
Where - "-th" value of the regressor from 1 to ,
β the number of regressors from 1 to
β slope coefficients, which represent the amount by which the calculated target indicator will change on average when the regressor changes.
In other words, for each regressor we determine "its own" coefficient $w_j$ (with the exception of $w_0$, which is not multiplied by any regressor); then we multiply the coefficients by the regressor values of the $i$-th observation and as a result obtain a certain approximation of the $i$-th target indicator.
Therefore, we need to choose coefficients $w_j$ for which the values of our approximating function lie as close as possible to the target values.
Estimation of the quality of the approximating function
We will estimate the quality of the approximating function by the least squares method. The quality evaluation function in this case takes the following form:

$$Err(w) = \sum_{i=1}^{n} \left( y_i - f(w, x_i) \right)^2 = \sum_{i=1}^{n} \left( y_i - (w_0 + w_1 x_{i1} + \dots + w_m x_{im}) \right)^2$$
We need to choose values of the coefficients $w$ for which the value of $Err(w)$ is the smallest.
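A possible NumPy sketch of this quality function, reusing the toy `y` and `X_raw` arrays from the snippet above (the trial coefficients are arbitrary):

```python
def sse(w0, w, y, X):
    """Sum of squared errors for an intercept w0 and slope vector w."""
    predictions = w0 + X @ w        # f(w, x_i) for every observation i
    residuals = y - predictions     # y_i - f(w, x_i)
    return np.sum(residuals ** 2)   # Err(w)

# The better the trial coefficients, the smaller the error
print(sse(10.0, np.array([1.0, 0.8]), y, X_raw))
```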
We translate the equation into matrix form
Vector representation
To begin with, to make your life easier, pay attention to the linear regression equation and notice that the first coefficient $w_0$ is not multiplied by any regressor. When we convert the data into matrix form, this circumstance will seriously complicate the calculations. In this regard, it is proposed to introduce one more regressor for the first coefficient and set it equal to one. More precisely, to set each $i$-th value of this regressor to one: after all, when multiplying by one, nothing changes from the point of view of the result of the calculations, while from the point of view of the rules for the product of matrices, our torment will be significantly reduced.
Now, for the moment, for the sake of simplicity, let's assume that we have only one ($i$-th) observation. Then we represent the regressor values of the $i$-th observation as a vector $x_i$. The vector $x_i$ has dimension $(m+1) \times 1$, that is, $m+1$ rows and 1 column:

$$x_i = \begin{pmatrix} x_{i0} \\ x_{i1} \\ \vdots \\ x_{im} \end{pmatrix}, \qquad x_{i0} = 1$$
We represent the required coefficients as a vector $w$, which has dimension $(m+1) \times 1$:

$$w = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_m \end{pmatrix}$$
The linear regression equation for the $i$-th observation takes the form:

$$f(w, x_i) = x_i^T w$$
The function for assessing the quality of the linear model takes the form:

$$Err(w) = \left( y_i - x_i^T w \right)^2$$
Note that, in accordance with the rules of matrix multiplication, we had to transpose the vector $x_i$.
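A small sketch of the vector form for a single observation, again with made-up numbers:

```python
# i-th observation with the unit regressor prepended: x_{i0} = 1
x_i = np.array([1.0, 2.4, 62.6])    # shape (m+1,)
w = np.array([10.0, 1.0, 0.8])      # trial coefficients, shape (m+1,)

prediction = x_i @ w                # x_i^T w, a scalar
err_i = (y[5] - prediction) ** 2    # squared error for this observation
```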
Matrix representation
As a result of the vector multiplication we get a number, $x_i^T w$, which is to be expected. This number is an approximation of the $i$-th target indicator. But we need an approximation not of one target value, but of all of them. To do this, we write the regressors of all $n$ observations into a matrix $X$. The resulting matrix has dimension $n \times (m+1)$:

$$X = \begin{pmatrix} x_{10} & x_{11} & \dots & x_{1m} \\ x_{20} & x_{21} & \dots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n0} & x_{n1} & \dots & x_{nm} \end{pmatrix}, \qquad x_{i0} = 1 \text{ for all } i$$
Now the linear regression equation takes the form:

$$f(w, X) = Xw$$
Let us denote the values of the target indicators (all $n$ of them) by a vector $y$ of dimension $n \times 1$:

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$
Now we can write the equation for estimating the quality of the linear model in matrix format:

$$Err(w) = \left( y - Xw \right)^T \left( y - Xw \right)$$
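As a sanity check, a short sketch (reusing the toy arrays and the trial `w` from above) confirming that the matrix form gives the same number as the element-wise sum of squares:

```python
# Design matrix: prepend the unit regressor as a column of ones
X = np.column_stack([np.ones(n), X_raw])   # shape (n, m+1)

residuals = y - X @ w
err_matrix = residuals.T @ residuals       # (y - Xw)^T (y - Xw)
err_sum = np.sum(residuals ** 2)           # element-wise definition
assert np.isclose(err_matrix, err_sum)
```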
Actually, from this formula we then obtain the formula known to us:

$$w = \left( X^T X \right)^{-1} X^T y$$
How is it done? The parentheses are opened, differentiation is performed, the resulting expressions are transformed, and so on; this is exactly what we will do now.
Matrix transformations
Let's open the brackets:

$$Err(w) = \left( y - Xw \right)^T \left( y - Xw \right) = y^T y - y^T X w - (Xw)^T y + (Xw)^T X w$$
Let's prepare the equation for differentiation
To do this, we will carry out some transformations. In the subsequent calculations it will be more convenient for us if the vector $w^T$ appears at the beginning of each product in the equation.
Transformation 1
$$y^T X w = \left( y^T X w \right)^T = w^T X^T y$$

How did this happen? To answer the question, it is enough to look at the sizes of the matrices being multiplied and see that at the output we get a number, or in other words a $1 \times 1$ matrix, and a $1 \times 1$ matrix equals its own transpose.
Let's write down the sizes of the matrix expressions:

$$\underbrace{y^T}_{1 \times n}\,\underbrace{X}_{n \times (m+1)}\,\underbrace{w}_{(m+1) \times 1} = \underbrace{w^T}_{1 \times (m+1)}\,\underbrace{X^T}_{(m+1) \times n}\,\underbrace{y}_{n \times 1}$$
Transformation 2
Let us write it out similarly to transformation 1, this time using the rule for the transpose of a product, $(Xw)^T = w^T X^T$:

$$(Xw)^T y = w^T X^T y, \qquad (Xw)^T X w = w^T X^T X w$$
At the output, we get the equation that we have to differentiate:

$$Err(w) = y^T y - 2\, w^T X^T y + w^T X^T X w$$
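A quick numeric check that the expanded form agrees with the compact one, reusing `X`, `y` and `w` from the earlier sketches:

```python
expanded = y @ y - 2 * (w @ X.T @ y) + w @ X.T @ X @ w
compact = (y - X @ w) @ (y - X @ w)
assert np.isclose(expanded, compact)
```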
We differentiate the function for assessing the quality of the model
Differentiate with respect to the vector $w$:

$$\frac{\partial\, Err(w)}{\partial w} = \frac{\partial}{\partial w}\left( y^T y \right) - 2\,\frac{\partial}{\partial w}\left( w^T X^T y \right) + \frac{\partial}{\partial w}\left( w^T X^T X w \right)$$
There should be no questions about why $\frac{\partial}{\partial w}\left( y^T y \right) = 0$ (this term does not depend on $w$ at all), but we will analyze the operations for determining the derivatives in the other two expressions in more detail.
Derivation 1
Let's expand the differentiation:

$$\frac{\partial}{\partial w}\left( w^T X^T X w \right)$$
In order to determine the derivative of a matrix or vector expression, you need to look at what is inside it. We look:

$$\underbrace{w^T}_{1 \times (m+1)}\,\underbrace{X^T X}_{(m+1) \times (m+1)}\,\underbrace{w}_{(m+1) \times 1}$$
Denote the product of matrices $X^T X$ by the matrix $A$. The matrix $A$ is square and, moreover, symmetric. These properties will be useful to us later, so remember them. The matrix $A$ has dimension $(m+1) \times (m+1)$:

$$A = X^T X = \begin{pmatrix} a_{00} & a_{01} & \dots & a_{0m} \\ a_{10} & a_{11} & \dots & a_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m0} & a_{m1} & \dots & a_{mm} \end{pmatrix}$$
Now our task is to correctly multiply the vector by the matrix and not get "two times two is five", so let's focus and be extremely careful. Multiplying everything out element by element, we get:

$$w^T A w = \sum_{p=0}^{m} \sum_{q=0}^{m} a_{pq}\, w_p w_q$$

What an intricate expression we have! In fact, we got a number, a scalar. And now, for real, we turn to differentiation. It is necessary to find the derivative of the resulting expression with respect to each coefficient $w_0, w_1, \dots, w_m$ and obtain as output a vector of dimension $(m+1) \times 1$. Just in case, I will write down the procedure step by step:
1) differentiate with respect to $w_0$, we get: $2a_{00}w_0 + (a_{01} + a_{10})w_1 + \dots + (a_{0m} + a_{m0})w_m$

2) differentiate with respect to $w_1$, we get: $(a_{10} + a_{01})w_0 + 2a_{11}w_1 + \dots + (a_{1m} + a_{m1})w_m$

3) differentiate with respect to $w_m$, we get: $(a_{m0} + a_{0m})w_0 + (a_{m1} + a_{1m})w_1 + \dots + 2a_{mm}w_m$
The output is the promised vector of size $(m+1) \times 1$:

$$\frac{\partial}{\partial w}\left( w^T A w \right) = \begin{pmatrix} 2a_{00}w_0 + (a_{01} + a_{10})w_1 + \dots + (a_{0m} + a_{m0})w_m \\ (a_{10} + a_{01})w_0 + 2a_{11}w_1 + \dots + (a_{1m} + a_{m1})w_m \\ \vdots \\ (a_{m0} + a_{0m})w_0 + (a_{m1} + a_{1m})w_1 + \dots + 2a_{mm}w_m \end{pmatrix}$$
If you take a closer look at the vector, you will notice that the left and the corresponding right elements of each row can be grouped in such a way that a vector of size $(m+1) \times 1$ can be split off from the presented vector. For example, $2a_{00}w_0$ (the left element of the top row) can be represented as $a_{00}w_0 + a_{00}w_0$, and $(a_{01} + a_{10})w_1$ as $a_{01}w_1 + a_{10}w_1$, and so on for each row. Let's group:

$$\begin{pmatrix} a_{00}w_0 + a_{01}w_1 + \dots + a_{0m}w_m \\ a_{10}w_0 + a_{11}w_1 + \dots + a_{1m}w_m \\ \vdots \\ a_{m0}w_0 + a_{m1}w_1 + \dots + a_{mm}w_m \end{pmatrix} + \begin{pmatrix} a_{00}w_0 + a_{10}w_1 + \dots + a_{m0}w_m \\ a_{01}w_0 + a_{11}w_1 + \dots + a_{m1}w_m \\ \vdots \\ a_{0m}w_0 + a_{1m}w_1 + \dots + a_{mm}w_m \end{pmatrix}$$
Take the vector $w$ out of the brackets, and at the output we get:

$$\frac{\partial}{\partial w}\left( w^T A w \right) = \left( A + A^T \right) w$$
Now let's look at the resulting matrix: it is the sum of the two matrices $A$ and $A^T$.
Recall that a little earlier we noted one important property of the matrix $A$: it is symmetric. Based on this property, we can confidently state that $A^T = A$. This is easy to verify by expanding the product of matrices $X^T X$ element by element; we will not do it here, and those who wish can check it themselves.
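For those who prefer a numerical check, a tiny sketch (reusing `X` from above):

```python
A = X.T @ X                   # the matrix A = X^T X
assert np.allclose(A, A.T)    # A is symmetric, so A^T = A
```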
Let's go back to our expression. After our transformations, it turned out the way we wanted to see it:

$$\frac{\partial}{\partial w}\left( w^T X^T X w \right) = \left( A + A^T \right) w = 2Aw = 2X^T X w$$
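To make sure nothing was lost along the way, the analytic derivative can be compared against a numerical (finite-difference) gradient; a minimal sketch:

```python
def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    grad = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        grad[k] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

quad = lambda v: v @ A @ v    # w^T A w as a function of w
assert np.allclose(numeric_grad(quad, w), 2 * A @ w, rtol=1e-4)
```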
So, we have dealt with the first differentiation. Let's move on to the second expression.
Derivation 2
Let's go down the beaten track. It will be much shorter than the previous one, so don't go too far from the screen.
Let's expand the vector and matrix element by element:

$$2\,\underbrace{w^T}_{1 \times (m+1)}\,\underbrace{X^T}_{(m+1) \times n}\,\underbrace{y}_{n \times 1}$$
For a while, we will remove the two from the calculations; it does not play a big role, and later we will return it to its place. We multiply the vectors and the matrix. First, multiply the matrix $X^T$ by the vector $y$; we have no restrictions here. We get a vector of size $(m+1) \times 1$:

$$X^T y = \begin{pmatrix} x_{10}y_1 + x_{20}y_2 + \dots + x_{n0}y_n \\ x_{11}y_1 + x_{21}y_2 + \dots + x_{n1}y_n \\ \vdots \\ x_{1m}y_1 + x_{2m}y_2 + \dots + x_{nm}y_n \end{pmatrix}$$
Let's perform the following action: multiply the vector $w^T$ by the resulting vector. At the output, a number will be waiting for us:

$$w^T \left( X^T y \right) = w_0 \left( X^T y \right)_0 + w_1 \left( X^T y \right)_1 + \dots + w_m \left( X^T y \right)_m$$
Then we differentiate it. At the output we get a vector of dimension $(m+1) \times 1$:

$$\frac{\partial}{\partial w}\left( w^T X^T y \right) = \begin{pmatrix} \left( X^T y \right)_0 \\ \left( X^T y \right)_1 \\ \vdots \\ \left( X^T y \right)_m \end{pmatrix}$$
Does it remind you of something? That's right! This is the product of the matrix $X^T$ and the vector $y$. Returning the two to its place, we get $2X^T y$.
Thus, the second differentiation is successfully completed.
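The same finite-difference check works for the second derivative as well (reusing `numeric_grad` from the previous sketch):

```python
linear = lambda v: v @ X.T @ y    # w^T X^T y as a function of w
assert np.allclose(numeric_grad(linear, w), X.T @ y, rtol=1e-4)
```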
Instead of a conclusion
Now we know how the equality $w = \left( X^T X \right)^{-1} X^T y$ came about: we equate the derivative of the quality function to zero and solve the resulting equation for $w$:

$$\frac{\partial\, Err(w)}{\partial w} = 2X^T X w - 2X^T y = 0 \;\Longrightarrow\; X^T X w = X^T y \;\Longrightarrow\; w = \left( X^T X \right)^{-1} X^T y$$
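Putting it all together: a minimal from-scratch sketch that solves the normal equation on the toy data and cross-checks the result against NumPy's built-in least-squares routine:

```python
# Solve X^T X w = X^T y for w
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_hat, w_lstsq)
```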
Finally, we describe a quick way to transform the basic formulas.
Let's evaluate the quality of the model in accordance with the least squares method:

$$Err(w) = \left( y - Xw \right)^T \left( y - Xw \right) = y^T y - 2\, w^T X^T y + w^T X^T X w$$
We differentiate the resulting expression and equate it to zero:

$$\frac{\partial\, Err(w)}{\partial w} = -2X^T y + 2X^T X w = 0 \;\Longrightarrow\; w = \left( X^T X \right)^{-1} X^T y$$
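And a literal transcription of the final formula (explicit inversion is shown only for illustration; `np.linalg.solve`, as in the previous sketch, is numerically safer):

```python
w_closed = np.linalg.inv(X.T @ X) @ X.T @ y   # w = (X^T X)^{-1} X^T y
fitted = X @ w_closed                         # approximations f(w, X)
print(w_closed, np.sum((y - fitted) ** 2))    # coefficients and Err(w)
```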
Source: habr.com