Yandex Resident Program, or How an Experienced Backender Becomes an ML Engineer

Yandex is launching a machine learning residency program for experienced back-end developers. If you have written a lot of code in C++/Python and want to apply this knowledge in ML, we will teach you how to do practical research and assign you experienced mentors. You will work on key Yandex services and gain skills in areas such as linear models and gradient boosting, recommender systems, and neural networks for image, text and sound analysis. You will also learn how to evaluate your models correctly using offline and online metrics.

The program lasts one year, during which participants will work in the Yandex Machine Intelligence and Research Department and attend lectures and seminars. Participation is paid and requires full-time employment: 40 hours per week, starting July 1 this year. Applications are already open and will be accepted until May 1.

And now in more detail: what audience we are looking for, what the workflow will be like, and, more generally, how a back-end specialist can switch to a career in ML.

Who the program is for

Many companies, including Google and Facebook, run residency programs. They are mainly aimed at junior and mid-level professionals who are trying to move into ML research. Our program is for a different audience. We invite back-end developers who have already gained substantial experience and know for sure that they need to shift their competencies toward ML and gain practical skills (not the skills of a scientist) in solving industrial machine learning problems. This does not mean that we do not support young researchers: for them we have a separate program, the Ilya Segalovich Prize, which also makes it possible to work at Yandex.

Where residents will work

In the Machine Intelligence and Research Department, we develop project ideas ourselves. The main source of inspiration is the scientific literature: papers and trends in the research community. My colleagues and I analyze what we read and look for ways to improve or extend the methods proposed by researchers. Each of us also draws on our own area of expertise and interests and formulates tasks in the areas we consider important. A project idea is usually born at the intersection of external research results and our own competencies.

The advantage of this system is that it largely solves the technological problems of Yandex services before they even arise. When a service runs into a problem, its representatives come to us, most likely to take a technology we have already prepared, which only needs to be applied correctly in the product. If something is not ready, we will at least quickly recall where to "start digging" and which papers to look in for a solution. As the saying goes, the scientific approach is to stand on the shoulders of giants.

What you will do

At Yandex, and specifically in our department, all relevant areas of ML are being developed. Our task is to improve the quality of a wide variety of products, which gives us an incentive to try everything new. In addition, new services appear regularly. So the lecture program covers all the key, well-established areas of machine learning in industrial development. When preparing my part of the course, I drew on my experience teaching at the School of Data Analysis (ShAD), as well as on the materials and achievements of other ShAD instructors. I know that my colleagues did the same.

In the first months, coursework will take up approximately 30% of your working time, and later about 10%. However, it is important to understand that working with the ML models themselves will still take about a quarter of the time spent on all the related processes. These include preparing the backend, obtaining data, writing a pipeline to preprocess it, optimizing code, adapting to specific hardware, and so on. An ML engineer is, if you like, a full-stack developer (only with a stronger bias toward machine learning) who solves a problem from start to finish. Even with a finished model, you will probably need to take a number of further steps: parallelize its execution across several machines, or package it as an API endpoint (a "handle", in Yandex jargon), a library, or a component of the service itself.

A note for students

If you have the impression that it is better to become an ML engineer only after first working as a backend developer, that is not so. Entering ShAD without real experience in developing services, studying there, and becoming highly sought after on the market is a great option. Many specialists at Yandex reached their current positions this way. If a company is ready to offer you an ML job immediately after graduation, it is probably worth accepting that offer too. Try to get into a good team with an experienced mentor, and be ready to learn a lot.

What usually gets in the way of moving into ML

If a backender wants to become an ML engineer, then, setting the residency program aside, they can choose between two paths.

The first is to study in an educational course. Coursera classes will bring you closer to understanding the basic techniques, but to immerse yourself in the profession deeply enough, you need to devote much more time to it, for example, by completing ShAD. Over the years, ShAD has offered different numbers of courses devoted directly to machine learning, about eight on average. Each of them is genuinely important and useful, graduates included.

The second is to participate in real production projects where you need to implement some ML algorithm. However, there are very few such projects on the IT market: most tasks do not involve machine learning at all. Even in banks, which are actively exploring the opportunities ML offers, only a few teams work on data analysis. If you were not able to join one of these teams, your only options are to start your own project (where you will most likely set your own deadlines, which has little to do with real production tasks) or to start competing on Kaggle.

Indeed, it is relatively easy to team up with other community members and try your hand at competitions, especially if you back up your skills with training and the Coursera courses mentioned above. Each competition has a deadline, which will serve as an incentive and prepare you for a similar system in IT companies. This is a good path, though it is also somewhat detached from real processes. Kaggle gives you preprocessed, if not always perfect, data; it does not ask you to think about your contribution to the product; and, most importantly, it does not require production-ready solutions. Your algorithms will probably work and achieve high accuracy, but your models and code will look like Frankenstein's monster stitched together from different parts. In a production project, this whole construction will run too slowly and will be difficult to update and extend (for example, language and speech algorithms are always partially rewritten as the language evolves). Companies need not only you (as the author of the solution, you obviously can) but also any of your colleagues to be able to do this work. Much has been said about the difference between competitive and industrial programming, and Kaggle trains "athletes", even if it does so very well and lets you gain part of the necessary experience.

I have described two possible paths: learning through educational programs and learning in practice, for example on Kaggle. The residency program combines these two approaches. You can expect lectures and seminars at the ShAD level, as well as genuine production projects.

Source: habr.com
