Is the Machine Learning Bubble Bursting or the Beginning of a New Dawn

A recently released article shows a telling trend in machine learning over recent years. In short: the number of machine learning startups has plummeted in the past two years.

Well, let's analyze whether the bubble has burst, how to keep on living, and talk about where such a squiggle comes from in the first place.

First, let's talk about what boosted this curve and where it came from. Probably everyone remembers machine learning's 2012 victory at the ImageNet competition as the first global event. But in reality that is not the case, and the curve starts growing a little earlier. I would break it down into several stages.

  1. 2008: the term “big data” emerges. Real products started to appear around 2010. Big data is directly related to machine learning: without big data, the stable operation of the algorithms that existed at that time is impossible. And these are not neural networks; until 2012, neural networks were the domain of a marginal minority. Instead, completely different algorithms that had existed for years, or even decades, began to work: SVM (1963, 1993), Random Forest (1995), AdaBoost (2003), and so on. Startups of those years were primarily associated with automatic processing of structured data: cash registers, users, advertising, and much more.

    A derivative of this first wave is the set of frameworks such as XGBoost, CatBoost, and LightGBM (see the first sketch after this list).

  2. In 2011-2012, convolutional neural networks won a number of image recognition competitions. Their actual use was somewhat delayed; I would say that meaningful startups and solutions began to appear en masse around 2014. It took two years to digest the fact that neural networks actually work, to build convenient frameworks that could be installed and launched in a reasonable time, and to develop methods that would stabilize and speed up convergence.

    Convolutional networks made it possible to solve machine vision problems: classification of images and objects within an image, object detection, object and person recognition, image enhancement, and so on (see the second sketch after this list).

  3. 2015-2017. The boom of algorithms and projects built on recurrent networks or their analogues (LSTM, GRU, TransformerNet, etc.). Well-functioning speech-to-text algorithms and machine translation systems appeared. They rely partly on convolutional networks to extract core features, and partly on the fact that we learned to collect really large, high-quality datasets (see the third sketch after this list).
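For the first wave, here is a minimal sketch of gradient boosting on tabular data with XGBoost; the dataset and hyperparameters are placeholders, not anything from a specific startup:

```python
# Minimal sketch: gradient boosting on tabular data (illustrative).
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)          # stand-in structured data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```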
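For the second wave, a minimal sketch of a convolutional classifier in PyTorch; the architecture is a toy, not any specific model:

```python
# Minimal sketch: a small convolutional image classifier (illustrative, PyTorch).
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallConvNet()
logits = model(torch.randn(1, 3, 32, 32))  # one fake 32x32 RGB image
print(logits.shape)                        # torch.Size([1, 10])
```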
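And for the third wave, a minimal sketch of the recurrent building block behind those speech and translation systems; all sizes are made up:

```python
# Minimal sketch: an LSTM encoding a token sequence (illustrative, PyTorch).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128   # made-up sizes
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 12))      # one fake 12-token sentence
outputs, (h_n, c_n) = lstm(embed(tokens))
print(outputs.shape)  # torch.Size([1, 12, 128]): one hidden state per token
```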


“Has the bubble burst? Is the hype overheated? Did they die off like blockchain?”
Sure! Tomorrow Siri will stop working on your phone, and the day after tomorrow Tesla won't be able to tell a turn from a kangaroo.

Neural networks already work. They are built into dozens of devices. They genuinely make money, change markets, and change the world around us. The hype just looks a little different now.


It's just that neural networks have ceased to be something new. Yes, many people still have inflated expectations. But a large number of companies have learned to use neural networks and build products on them. Neural networks provide new functionality, let you cut jobs, and reduce the price of services:

  • Manufacturing companies are integrating algorithms to analyze defects on the assembly line.
  • Livestock farms buy systems to monitor their cows.
  • Automatic harvesters.
  • Automated call centers.
  • Filters in Snapchat (well, at least something useful!).

But the main thing, and the least obvious one: "there are no more new ideas, or at least none that will bring instant capital." Neural networks have already solved dozens of problems, and they will solve more. All the obvious ideas spawned many startups, but everything lying on the surface has already been picked up. Over the past two years I have not seen a single new idea for applying neural networks, and not a single new approach (well, OK, there are a few issues around GANs).

And each next startup is harder and harder. It no longer takes two guys training a network on open data. It takes programmers, servers, a data labeling team, complex support, and so on.

As a result, there are fewer startups, but more production use. Need to add license plate recognition? There are hundreds of specialists with relevant experience on the market. Hire one, and in a couple of months your employee will build the system. Or buy one ready-made. But founding a new startup around it?.. Madness!

Need a visitor tracking system? Why pay for a bunch of licenses when you can build your own in 3-4 months, tailored to your business.

Now neural networks are going the same way that dozens of other technologies have gone.

Remember how the notion of a "website developer" changed after 1995? For now, the market is not yet saturated with specialists, and real professionals are few. But I can bet that in 5-10 years there won't be much difference between a Java programmer and a neural network developer. Both kinds of specialists will be plentiful on the market.

There will simply be a class of tasks that neural networks solve. A task comes up? Hire a specialist.

"What's next? Where is the promised artificial intelligence?”

And here lies a small but interesting misunderstanding :)

The technology stack that exists today will apparently not lead us to artificial intelligence. The ideas and their novelty have largely exhausted themselves. Let's talk about what holds development at its current level.

Limitations

Let's start with self-driving cars. It seems clear that fully autonomous cars can be built with today's technologies. But how many years away that is remains unclear. Tesla believes it will happen within a couple of years.

There are many other experts who estimate it at 5-10 years.

Most likely, in my opinion, within 15 years city infrastructure itself will change enough that the arrival of autonomous cars becomes inevitable, a natural continuation of it. But this cannot be considered intelligence. A modern Tesla is a very complex pipeline for filtering data, searching through it, and retraining: rules upon rules upon rules, data collection, and filters over it all (I wrote a little more about this here, or watch from this mark).

First problem

And this is where we run into the first fundamental problem: big data. This is exactly what spawned the current wave of neural networks and machine learning. To do something complex and automatic today, you need a lot of data. Not just a lot, but a lot, a lot. We need automated pipelines for collecting, labeling, and using it. If we want the car to see trucks against the sun, we must first collect a sufficient number of such examples. If we want the car not to go crazy over a bicycle bolted to a trunk, we need more samples.

And one example is not enough. Hundreds? Thousands?


Second problem

The second problem is visualizing what our neural network has actually understood. This is a very non-trivial task; to this day, few people understand how to do it. The articles on the subject are very recent, and here are just a few scattered examples:
Visualizing the network's obsession with textures. It shows well what the network tends to fixate on and what it treats as its starting information.

Visualizing attention in translation. In fact, attention can often be used to show exactly what caused a given network reaction. I have seen such things used both for debugging and in production solutions. There are a lot of articles on this subject, but the more complex the data, the harder it is to achieve a stable visualization.


And of course, the good old "look what the network has inside its filters" exercise. These pictures were popular 3-4 years ago, but everyone quickly realized that while the pictures are beautiful, there is not much sense in them.
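For reference, a minimal sketch of that kind of filter peeking, assuming a recent torchvision with pretrained weights available; it just plots the first-layer convolution weights:

```python
# Minimal sketch: plotting a pretrained network's first-layer conv filters.
import matplotlib.pyplot as plt
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1")    # any pretrained net will do
filters = model.conv1.weight.detach()               # shape: (64, 3, 7, 7)
filters = (filters - filters.min()) / (filters.max() - filters.min())

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0))                   # channels last for imshow
    ax.axis("off")
plt.show()
```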


And I haven't named dozens of other tricks, methods, hacks, and studies on how to display the insides of a network. Do these tools work? Do they help you quickly understand what the problem is and debug the network? Squeeze out the last percent? Well, more or less.


Look at any competition on Kaggle, and at the write-ups of how people build their final solutions: "We stacked 100-500-800 models and it worked!"

I am exaggerating, of course. But these approaches do not provide quick and direct answers.
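For scale, a minimal sketch of what that stacking looks like in scikit-learn, with far fewer than 800 models; the dataset is a stand-in:

```python
# Minimal sketch: stacking a few base models, Kaggle-style (illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()), ("svm", SVC())],
    final_estimator=LogisticRegression(),  # meta-model over base predictions
)
print(cross_val_score(stack, X, y, cv=3).mean())
```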

With enough experience, poking through different options can yield a verdict on why your system made a given decision. But correcting the system's behavior is hard: add a crutch, move a threshold, extend the dataset, swap in another backbone network.

Third problem

The third fundamental problem: networks learn statistics, not logic. Statistically, this is a face:

(image: something the network statistically counts as a face)

Logically, it's not very similar. A neural network doesn't learn anything complex unless it is forced to. It always learns the simplest features it can. Eyes, nose, head? Then it's a face! Unless you give it examples where eyes do not imply a face. And again, that takes millions of examples.

There's Plenty of Room at the Bottom

I would say these three global problems are what currently limit the development of neural networks and machine learning. And wherever these problems do not apply, neural networks are already in active use.

Is this the end? Are neural networks done?

Unknown. But, of course, everyone hopes not.

There are many approaches and directions for solving the fundamental problems I highlighted above. But so far, none of them has made it possible to do something fundamentally new, to solve something that has not yet been solved. So far, all the landmark projects are built on stable approaches (Tesla) or remain research projects of institutes or corporations (Google Brain, OpenAI).

Roughly speaking, the main direction is creating some high-level representation of the input data. In a sense, "memory". The simplest example of such memory is the various embedding representations of images: for example, all face recognition systems. The network learns to extract from a face a stable representation that does not depend on rotation, lighting, or resolution. In essence, the network minimizes the metric "different faces are far apart" and "identical faces are close".
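A minimal sketch of that "far/close" objective via a triplet loss in PyTorch; the encoder here is a placeholder, not a real face model:

```python
# Minimal sketch: metric learning with a triplet loss (illustrative, PyTorch).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # placeholder
loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor   = encoder(torch.randn(8, 3, 64, 64))  # faces of some people
positive = encoder(torch.randn(8, 3, 64, 64))  # same people, other photos
negative = encoder(torch.randn(8, 3, 64, 64))  # different people
loss = loss_fn(anchor, positive, negative)     # pulls same close, pushes others far
loss.backward()
```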


Such training needs tens or hundreds of thousands of examples. But the result carries the rudiments of "one-shot learning": now we don't need hundreds of faces to remember a person. One face is enough, and that's all we need to learn!
There's just one problem: the network can only learn fairly simple objects this way. When you try to distinguish not faces but, say, people by their clothes (the re-identification task), quality drops by orders of magnitude, and the network can no longer learn fairly obvious changes of viewpoint.

And besides, learning from millions of examples is rather dubious fun.

There is work on drastically reducing the required sample sizes. For example, one can immediately recall one of the first works on one-shot learning from Google.


There are many such works, for example 1, 2, or 3.

One drawback: training usually works well only on simple, MNIST-like examples. When you move on to complex tasks, you need a large database, an object model, or some kind of magic.
In general, one-shot learning is a very interesting topic, and you find many ideas in it. But for the most part, the two problems I listed (pre-training on a huge dataset, and instability on complex data) get in the way of training.
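A minimal sketch of the one-shot matching idea, assuming a (hypothetical) pretrained encoder and simple nearest-neighbor lookup in embedding space:

```python
# Minimal sketch: one-shot recognition by nearest neighbor in embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # pretend pretrained

# One reference photo per person ("one shot"), keyed by name.
gallery = {name: encoder(torch.randn(1, 3, 64, 64)) for name in ["alice", "bob"]}

query = encoder(torch.randn(1, 3, 64, 64))          # new photo to identify
scores = {n: F.cosine_similarity(query, e).item() for n, e in gallery.items()}
print(max(scores, key=scores.get))                  # best-matching identity
```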

GANs, generative adversarial networks, approach the topic of embeddings from the other side. You have probably read a bunch of articles about them on Habr (1, 2, 3).
A distinctive feature of a GAN is that it forms an internal state space (essentially the same embedding) from which an image can be drawn. It can be faces; it can be activities.
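A minimal sketch of that generator-discriminator pair in PyTorch; sizes are placeholders and the actual adversarial training loop is omitted:

```python
# Minimal sketch of the GAN generator/discriminator pair (illustrative).
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28  # made-up sizes

generator = nn.Sequential(         # latent code -> fake image
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)
discriminator = nn.Sequential(     # image -> "is it real?" probability
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)    # points in the internal state space
fake = generator(z)                # each point "draws" an image
print(discriminator(fake).shape)   # torch.Size([16, 1])
```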


The problem with GANs is that the more complex the generated object, the harder it is to describe it in "generator versus discriminator" terms. As a result, the only real GAN application anyone hears about is DeepFake, which, again, manipulates face representations (for which there is a huge database).

I have seen very few other useful applications. Usually it's some kind of gimmick around drawing pictures.

And again: no one understands how this will let us move into a brighter future. Representing logic and space inside a neural network is good. But we need a huge number of examples, we don't understand how the network represents things internally, and we don't understand how to make it remember a really complex representation.

Reinforcement learning is a completely different approach. Surely you remember how Google beat everyone at Go, and the recent victories in StarCraft and Dota. But here things are far from rosy and promising. RL and its complexity are best described in this article.

To summarize briefly what the author wrote:

  • Out-of-the-box models don't fit, or work poorly, in most cases
  • Practical problems are easier to solve in other ways: Boston Dynamics does not use RL because of its complexity, unpredictability, and computational cost
  • To make RL work, you need a complex reward function, which is often difficult to design and write
  • Models are difficult to train: you have to spend a lot of time nudging them out of local optima
  • As a result, models are hard to reproduce and unstable at the slightest change
  • Models often overfit to spurious patterns, all the way down to the random number generator

The key point is that RL is not yet used in production. Google has some experiments (1, 2), but I haven't seen a single production system.
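For context, a minimal sketch of tabular Q-learning, the textbook core all of this scales up from; the five-state corridor environment is made up:

```python
# Minimal sketch: tabular Q-learning on a made-up 5-state corridor.
import random

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.9, 0.1          # learning rate, discount, exploration

for _ in range(2000):
    s = 0
    while s != n_states - 1:               # episode ends at the rightmost state
        if random.random() < eps:
            a = random.randrange(n_actions)                   # explore
        else:
            a = max(range(n_actions), key=lambda i: Q[s][i])  # exploit
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0                # reward only at goal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should be "always go right".
print([max(range(n_actions), key=lambda i: Q[s][i]) for s in range(n_states)])
```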

Memory. The downside of everything described above is its lack of structure. One way people try to tidy this up is to give the neural network access to separate memory, where it can record and rewrite the results of its steps. The network can then be driven by the current state of that memory. This is very similar to classic processors and computers.
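A heavily simplified sketch of the idea, loosely in the spirit of Neural Turing Machine-style soft addressing (my own toy example, not any particular paper's architecture):

```python
# Minimal sketch: a network reading/writing external memory (illustrative).
import torch
import torch.nn.functional as F

memory = torch.zeros(16, 32)                     # 16 slots of 32 numbers
key = torch.randn(32)                            # controller-produced query

# Soft read: attention weights over slots, then a weighted sum.
w = F.softmax(memory @ key, dim=0)               # similarity-based addressing
read_vector = w @ memory

# Soft write: blend new content into every slot by the same weights.
new_content = torch.randn(32)
memory = memory + w.unsqueeze(1) * new_content   # differentiable "rewrite"
print(read_vector.shape, memory.shape)
```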

The most famous and popular article on this is from DeepMind.


It might seem that this is it, the key to understanding intelligence? Probably not. The system still needs a huge amount of data for training, and it works mainly with structured tabular data. Moreover, when Facebook tackled a similar problem, they took the path of "forget the memory, just make the network more complicated, give it more examples, and it will learn by itself."

Disentanglement. Another way to create meaningful memory is to take the same embeddings but, during training, introduce additional criteria that let us pick out "meanings" within them. For example, say we want to train a neural network to distinguish kinds of human behavior in a store. Following the standard path, we would have to build a dozen networks: one finds the person, the second determines what they are doing, the third their age, the fourth their gender. Separate logic looks at the part of the store where they act and is trained for it; another determines their trajectory, and so on.

Or, if there were an infinite amount of data, we could train one network for all possible outcomes (obviously, such an array of data cannot be collected).

The disentanglement approach says: let's train the network so that it can itself distinguish between concepts. It should form an embedding from the video in which one region encodes the action, one the position on the floor over time, one the person's height, and another their gender. At the same time, during training we would like to barely hint at these key concepts, and instead have the network pick out and group the regions itself. There are quite a few such articles (for example 1, 2, 3), and on the whole they are quite theoretical.
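A minimal sketch of one common recipe from this literature, a beta-VAE-style loss, where a large beta weight on the KL term pressures the latent dimensions to separate into factors; all sizes and weights are illustrative:

```python
# Minimal sketch: a beta-VAE training step, a common disentanglement recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(8, 784)                       # fake flattened inputs
enc = nn.Linear(784, 2 * 10)                  # outputs mean and log-variance
dec = nn.Linear(10, 784)
beta = 4.0                                    # beta > 1 pushes toward disentanglement

mu, logvar = enc(x).chunk(2, dim=1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
recon = dec(z)

kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
loss = F.mse_loss(recon, x) + beta * kl       # reconstruction + weighted KL
loss.backward()
```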

But this direction, at least theoretically, should address the problems listed at the beginning.


Decomposition of an image into the parameters “wall color / floor color / object shape / object color / etc.”


Decomposition of a face into the parameters “size, eyebrows, orientation, skin color, etc.”

Other

There are many other, less global directions that let you shrink the required datasets, work with more heterogeneous data, and so on.

Attention. It probably doesn't make sense to single this out as a separate method; it's just an approach that enhances others. Many articles are devoted to it (1, 2, 3). The point of attention is to strengthen the network's reaction to significant objects during training, often via some external target designation or a small external network.
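A minimal sketch of the now-standard scaled dot-product form of attention; all shapes are made up:

```python
# Minimal sketch: scaled dot-product attention (illustrative, PyTorch).
import math
import torch
import torch.nn.functional as F

q = torch.randn(1, 5, 64)    # 5 query positions, 64-dim each
k = torch.randn(1, 7, 64)    # 7 key/value positions
v = torch.randn(1, 7, 64)

weights = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(64), dim=-1)
output = weights @ v         # each query gets a weighted mix of values
print(weights.shape)         # (1, 5, 7): importance of each input per query
```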

3D simulation. If you build a good 3D engine, it can often cover 90% of the training data (I have even seen an example where almost 99% of the data was covered by a good engine). There are many ideas and hacks for making a network trained on a 3D engine work on real data (fine-tuning, style transfer, etc.). But building a good engine is often several orders of magnitude harder than collecting data. Examples where engines were built (a fine-tuning sketch follows this list):
  • Robot training (Google, braingarden)
  • Training recognition of goods in a store (though in the two projects we did, we easily managed without it)
  • Training at Tesla (again, the video mentioned above)
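As promised, a minimal sketch of the fine-tuning step under the usual assumptions: take a backbone pretrained elsewhere (here a stock ImageNet model stands in for one trained on an engine), freeze it, and retrain only the head on a small real dataset:

```python
# Minimal sketch: fine-tuning a pretrained backbone on real data (illustrative).
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1")   # stand-in for "engine-trained"
for p in model.parameters():
    p.requires_grad = False                        # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 5)      # new head for 5 real classes

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 5, (4,))  # fake real batch
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```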

Conclusions

The whole article is, in a sense, a set of conclusions. Probably the main message I wanted to convey is: "the freebie is over, neural networks no longer hand out simple solutions." Now we have to work hard building complex solutions, or work hard doing complex scientific research.

In general, the topic is debatable. Maybe readers have more interesting examples?

Source: habr.com
