Machine learning without Python, Anaconda and other reptiles

No, of course I'm not entirely serious: there must be a limit to how far a subject can be simplified. But for the first steps, for grasping the basic concepts and quickly getting into the topic, it may be acceptable. As for what to call this material properly (options: "Machine learning for dummies", "Data analysis from the cradle", "Algorithms for the little ones"), we will decide at the end.

Down to business. I wrote several small applications in MS Excel that model and visualize the processes going on in various machine learning methods when they analyze data. Seeing is believing, after all, as the native speakers of the culture that developed most of these methods say (by the way, not all of them: the most powerful one, the support vector machine, or SVM, is an invention of our compatriot Vladimir Vapnik of the Institute of Control Sciences in Moscow. In 1963, no less! Now, however, he teaches and works in the USA).

Three files for review

1. Clustering by k-means

Problems of this kind belong to "unsupervised learning": we need to split the source data into a number of categories known in advance, but we are not given any "correct answers"; we must extract them from the data itself. The classical problem of identifying subspecies of iris flowers (Ronald Fisher, 1936!), which is considered the harbinger of this field of knowledge, is of exactly this nature.

The method is quite simple. We have a set of objects, each represented as a vector (a set of N numbers). For the irises these are sets of 4 numbers characterizing the flower: the length and width of the outer and inner perianth segments (the sepal and the petal), respectively (Fisher's irises - Wikipedia). As the distance, or measure of proximity between objects, the usual Euclidean metric is chosen.
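
For reference, for two objects represented by vectors a and b this is the familiar

$$ d(a, b) = \sqrt{ \sum_{i=1}^{N} (a_i - b_i)^2 }. $$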

Next, the cluster centers are chosen, arbitrarily or not (see below), and the distance from each object to each cluster center is calculated. At a given iteration step, each object is marked as belonging to the nearest center. Then the center of each cluster is moved to the arithmetic mean of the coordinates of its members (by analogy with physics, this is also called the "center of mass"), and the procedure is repeated.
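
The workbook drives this with a macro. As a rough idea of what such a loop can look like, here is a minimal VBA sketch over arrays already read from the sheet (all names are illustrative, not the ones in the actual file):

' k-means sketch: pts(1 To n, 1 To dims) are the objects,
' ctr(1 To k, 1 To dims) are the current cluster centers,
' labels(1 To n) receives the cluster number of each object.
Sub KMeans(pts() As Double, ctr() As Double, labels() As Long)
    Dim n As Long, dims As Long, k As Long
    Dim i As Long, j As Long, c As Long
    Dim best As Long, bestD As Double, d As Double, moved As Boolean
    n = UBound(pts, 1): dims = UBound(pts, 2): k = UBound(ctr, 1)
    Do
        moved = False
        ' Step 1: mark each object as belonging to the nearest center
        For i = 1 To n
            best = 1: bestD = Dist(pts, i, ctr, 1, dims)
            For c = 2 To k
                d = Dist(pts, i, ctr, c, dims)
                If d < bestD Then bestD = d: best = c
            Next c
            If labels(i) <> best Then labels(i) = best: moved = True
        Next i
        ' Step 2: move each center to the mean of its members ("center of mass")
        ReDim sums(1 To k, 1 To dims) As Double
        ReDim cnts(1 To k) As Long
        For i = 1 To n
            cnts(labels(i)) = cnts(labels(i)) + 1
            For j = 1 To dims
                sums(labels(i), j) = sums(labels(i), j) + pts(i, j)
            Next j
        Next i
        For c = 1 To k
            If cnts(c) > 0 Then        ' an empty cluster keeps its old center
                For j = 1 To dims: ctr(c, j) = sums(c, j) / cnts(c): Next j
            End If
        Next c
    Loop While moved                   ' repeat until no point changes cluster
End Sub

' Euclidean distance between object i and cluster center c
Function Dist(pts() As Double, i As Long, ctr() As Double, c As Long, dims As Long) As Double
    Dim j As Long, s As Double
    For j = 1 To dims: s = s + (pts(i, j) - ctr(c, j)) ^ 2: Next j
    Dist = Sqr(s)
End Function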

The process converges fairly quickly. In two-dimensional pictures it looks like this:

1. Initial random distribution of points on the plane and the number of clusters

2. Setting the centers of clusters and assigning points to their clusters

3. Moving the cluster centers and recalculating the points' membership until the centers stabilize. The trajectory of a cluster center's movement to its final position is visible.

At any time, you can set new cluster centers (without generating a new distribution of points!) and see that the partitioning process is not always unambiguous. Mathematically, this means that for the function being optimized (the sum of squared distances from the points to the centers of their clusters) we find not the global minimum but only a local one. This problem can be overcome either by a non-random choice of the initial cluster centers, or by enumerating possible centers (sometimes it is advantageous to place them exactly at some of the points, then at least there is a guarantee that we will not get empty clusters). In any case, over a finite set the minimum always exists.
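
To compare restarts, it is enough to compute that sum for each resulting partition and keep the smallest. A sketch of such a scoring function, in the same illustrative vein as the one above:

' Objective of k-means: the within-cluster sum of squared distances.
' Smaller is better; run several restarts and keep the best partition.
Function WCSS(pts() As Double, ctr() As Double, labels() As Long) As Double
    Dim i As Long, j As Long, s As Double
    For i = 1 To UBound(pts, 1)
        For j = 1 To UBound(pts, 2)
            s = s + (pts(i, j) - ctr(labels(i), j)) ^ 2
        Next j
    Next i
    WCSS = s
End Function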

You can play with this file at this link (do not forget to enable macro support; the files have been checked for viruses).

Description of the method on Wikipedia − k-means method

2. Polynomial approximation and data splitting. Overfitting

K.V. Vorontsov, a remarkable scientist and popularizer of data science, briefly describes machine learning methods as "the science of drawing curves through points". In this example we will find a pattern in the data using the least squares method.

The file demonstrates the technique of splitting the initial data into "training" and "control" sets, as well as the phenomenon of overfitting, an excessive adjustment to the data. With a correct approximation we get some error on the training data and a slightly larger error on the control data; with an incorrect one, a perfect fit to the training data and a huge error on the control data.

(It is a well-known fact that through N points one can always draw a unique polynomial curve of degree N-1, but this generally does not give the desired result: Lagrange interpolation polynomial on Wikipedia.)
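
For reference, that unique curve through the points (x_1, y_1), ..., (x_N, y_N) is the Lagrange polynomial

$$ L(x) = \sum_{i=1}^{N} y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}, $$

of degree at most N-1; it passes through every point exactly, which is precisely what makes it so eager to overfit.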

1. Set the initial distribution

2. We divide the points into "training" and "control" sets in the ratio 70 to 30.

3. We draw an approximating curve through the training points and see the error it gives on the control data

4. We draw an exact curve through the training points and see a monstrous error on the control data (and zero error on the training ones, but what's the point?).

Of course, the simplest variant is shown, with a single partition into "training" and "control" subsets; in the general case the split is done repeatedly for the best adjustment of the coefficients.
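
As a rough sketch of a single fit-and-check pass in VBA, leaning on Excel's built-in LINEST for the least squares part (illustrative names, not the workbook's actual code):

' Fit a polynomial of degree deg to the training points by least squares,
' then return the mean squared error on the control points.
Function PolyFitMSE(xTrain() As Double, yTrain() As Double, _
                    xCtrl() As Double, yCtrl() As Double, deg As Long) As Double
    Dim i As Long, p As Long, n As Long, m As Long
    Dim c As Variant, yHat As Double, sse As Double
    n = UBound(xTrain)
    ' Design matrix with columns x, x^2, ..., x^deg, and y as a column
    ReDim xp(1 To n, 1 To deg) As Double
    ReDim yCol(1 To n, 1 To 1) As Double
    For i = 1 To n
        yCol(i, 1) = yTrain(i)
        For p = 1 To deg: xp(i, p) = xTrain(i) ^ p: Next p
    Next i
    ' Excel's LINEST does the least squares fit; it returns the
    ' coefficients in reverse order: (a_deg, ..., a_1, a_0)
    c = Application.WorksheetFunction.LinEst(yCol, xp)
    ' Evaluate the fitted polynomial on the control points
    m = UBound(xCtrl)
    For i = 1 To m
        yHat = c(deg + 1)                        ' the intercept a_0
        For p = 1 To deg
            yHat = yHat + c(deg + 1 - p) * xCtrl(i) ^ p
        Next p
        sse = sse + (yCtrl(i) - yHat) ^ 2
    Next i
    PolyFitMSE = sse / m
End Function

With a small deg this gives the "correct" picture from step 3; with deg equal to the number of training points minus one, the "monstrous" picture from step 4.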

The file is available here, checked by antivirus. Enable macros for it to work correctly.

3. Gradient descent and error dynamics

Here we will have a 4-dimensional case and linear regression. The linear regression coefficients are determined step by step by the gradient descent method, with all coefficients initially zero. A separate graph shows the dynamics of the error decrease as the coefficients are tuned more and more finely. All four 2D projections can be viewed.

If we set the gradient descent step too large, we can see that each time we overshoot the minimum and reach the result in a larger number of steps, although in the end we still get there (unless we overdo the step size entirely, and then the algorithm goes "haywire"). The graph of the error as a function of the iteration number will then be not smooth but "twitchy".
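
As a rough idea of the macro behind this (again, a sketch with illustrative names, not the actual workbook code):

' Gradient descent for linear regression with an intercept:
' y ~ w(0) + w(1)*x1 + ... + w(4)*x4, X(1 To n, 1 To 4), y(1 To n).
' rate is the descent step; all coefficients start at zero.
Sub GradDescent(X() As Double, y() As Double, rate As Double, nSteps As Long)
    Dim n As Long, d As Long, i As Long, j As Long, t As Long
    Dim r As Double, mse As Double
    n = UBound(X, 1): d = UBound(X, 2)
    ReDim w(0 To d) As Double              ' all coefficients start at zero
    For t = 1 To nSteps
        ReDim g(0 To d) As Double          ' gradient of the mean squared error
        mse = 0
        For i = 1 To n
            r = w(0)
            For j = 1 To d: r = r + w(j) * X(i, j): Next j
            r = r - y(i)                   ' residual: prediction minus target
            mse = mse + r * r / n
            g(0) = g(0) + 2 * r / n
            For j = 1 To d: g(j) = g(j) + 2 * r * X(i, j) / n: Next j
        Next i
        ' Step against the gradient; too large a rate overshoots the minimum,
        ' far too large a rate makes the error grow instead of shrinking
        For j = 0 To d: w(j) = w(j) - rate * g(j): Next j
        Debug.Print t, mse                 ' the error dynamics, step by step
    Next t
End Sub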

1. Generate data, set the gradient descent step

2. With a correct choice of the gradient descent step, we come to the minimum smoothly and quickly enough

3. If the gradient descent step is chosen incorrectly, we overshoot the minimum, the error graph is "twitchy", and convergence takes a greater number of steps

4. With a completely wrong choice of the gradient descent step, we move away from the minimum

(To reproduce the process with the gradient descent step values shown in the pictures, check the "reference data" box.)

The file is at this link; you need to enable macros, there are no viruses.

In the opinion of the respected community: is such a simplification and way of presenting the material acceptable? Is it worth translating the article into English?

Source: habr.com
