MLOps: DevOps in the Machine Learning world

In 2018, the concept of MLOps appeared in professional circles and at conferences dedicated to AI; it quickly gained a foothold in the industry and is now developing as an independent discipline. In the future, MLOps may become one of the most in-demand areas in IT. In this article we look at what it is and why it matters.


What is MLOps

MLOps (a merger of machine learning technologies and processes with approaches to embedding the resulting models in business processes) is a new way for business representatives, data scientists, mathematicians, machine learning specialists, and IT engineers to cooperate in building artificial intelligence systems.

In other words, it is a way to turn machine learning methods and technologies into a useful tool for solving business problems. 

It is important to understand that the path to production begins long before model development. Its first step is defining the business objective, the hypothesis about the value that can be extracted from the data, and the business idea for applying it.

The very concept of MLOps arose as an analogue of DevOps applied to machine learning models and technologies. DevOps is a software development approach that increases the speed of delivering individual changes while maintaining flexibility and reliability through a number of practices: continuous development, splitting functionality into independent microservices, automated testing and deployment of individual changes, global health monitoring, rapid response to detected failures, and so on.

DevOps defined the software life cycle, and the idea arose in the software community to apply the same technique to big data. DataOps is an attempt to adapt and extend the methodology to account for the peculiarities of storing, transmitting, and processing large volumes of data across diverse, interacting platforms.
  
Once a critical mass of machine learning models had been embedded in enterprise business processes, a strong similarity was noticed between the life cycle of machine learning models and the software life cycle. The only difference is that model algorithms are created using machine learning tools and methods. It was therefore natural to apply and adapt well-known software development approaches to machine learning models. The following key stages can be distinguished in the life cycle of a machine learning model:

  • defining a business idea;
  • model training;
  • testing and implementation of the model in the business process;
  • model operation.

When, during operation, it becomes necessary to change the model or retrain it on new data, the cycle starts anew: the model is revised, tested, and a new version is deployed.
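The stages above, together with the retraining loop, can be sketched as a simple state machine. This is a minimal illustration; the stage names and the `ModelLifecycle` class are assumptions for the sketch, not something from the article:

```python
from dataclasses import dataclass, field

# The four key lifecycle stages listed above.
STAGES = ["business_idea", "training", "testing_and_deployment", "operation"]

@dataclass
class ModelLifecycle:
    """Hypothetical sketch: tracks which stage a model version is in."""
    version: int = 1
    stage: str = STAGES[0]
    history: list = field(default_factory=list)

    def advance(self) -> None:
        """Move the model to the next stage of the cycle."""
        i = STAGES.index(self.stage)
        if i < len(STAGES) - 1:
            self.stage = STAGES[i + 1]

    def retrain(self) -> None:
        """New data arrived: archive the current version and restart the
        cycle from training (the business idea is already defined)."""
        self.history.append((self.version, self.stage))
        self.version += 1
        self.stage = "training"

lifecycle = ModelLifecycle()
for _ in range(3):
    lifecycle.advance()   # business_idea -> training -> ... -> operation
lifecycle.retrain()       # the cycle starts anew as version 2
```

The point of the sketch is only that retraining does not restart the whole chain: the business idea survives, while training, testing, and deployment repeat for each new version.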

A terminological aside: "model retraining" has a double meaning. Among specialists it can also denote overfitting, a defect in which the model predicts well on the training set, essentially reproducing the target there, but performs much worse on external data. Such a model is defective, since the defect prevents it from being applied. Throughout this article, retraining means updating a model on new data.

In this life cycle, it seems logical to use DevOps tools: automated testing, deployment and monitoring, and registering model inference as separate microservices. But a number of features prevent using these tools directly without additional ML-specific adaptation.


How to make models work and make a profit

As an example of applying the MLOps approach, let us take the now-classic task of automating the support chat for a banking (or any other) product. A typical support chat business process looks like this: a customer types a question into the chat and receives an answer from a specialist within a predefined dialogue tree. The task of automating such a chat is usually solved with expertly defined rule sets that are very laborious to develop and maintain. The efficiency of such automation, depending on the complexity of the task, can be 20-30%. Naturally, the idea arises that it is more effective to implement an artificial intelligence module, a model developed using machine learning, that:

  • is able to process more requests without operator involvement (depending on the topic, in some cases the efficiency can reach 70-80%);
  • adapts better to non-standard wording in a dialogue: it can determine the intent, the user's real need, behind a vaguely formulated request;
  • can determine when its own answer is adequate, and when there is doubt about the answer's reliability and it is necessary to ask an additional clarifying question or hand over to an operator;
  • can be retrained automatically (instead of a group of developers constantly adapting and correcting response scenarios, the model is retrained by a data scientist using the appropriate machine learning libraries).
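The third bullet, deciding when the model's answer is trustworthy enough, can be illustrated with a simple routing rule over the model's intent probabilities. This is a sketch; the thresholds, intent names, and the `route` function are illustrative assumptions, not taken from the article:

```python
def route(intent_probs: dict, answer_at: float = 0.8, clarify_at: float = 0.5):
    """Route a chat request based on model confidence (thresholds are
    illustrative). Returns an action and, where relevant, the intent:
      - confident enough     -> answer automatically
      - moderately confident -> ask a clarifying question
      - otherwise            -> hand over to a human operator
    """
    intent, p = max(intent_probs.items(), key=lambda kv: kv[1])
    if p >= answer_at:
        return ("answer", intent)
    if p >= clarify_at:
        return ("clarify", intent)
    return ("operator", None)

# Example: the model is 92% sure the user wants to block a card.
decision = route({"block_card": 0.92, "card_limits": 0.05, "other": 0.03})
```

Whatever classifier produces the probabilities, this thin decision layer is what keeps unclear requests from receiving confidently wrong automated answers.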


How to make such an advanced model work? 

As with any other task, before developing such a module it is necessary to define the business process and formally describe the specific task we are going to solve with machine learning. At this point the process of operationalization, denoted by the abbreviation Ops, begins.

Next, the data scientist, in collaboration with the data engineer, checks that the data are available and sufficient and that the business hypothesis behind the idea holds, developing a prototype of the model and verifying its actual effectiveness. Only after confirmation by the business can the transition begin from developing the model to embedding it into the systems that run the specific business process. End-to-end implementation planning, with a deep understanding at each stage of how the model will be used and what economic effect it will bring, is fundamental to introducing MLOps practices into a company's technology landscape.

With the development of AI technologies, the number and variety of tasks that can be solved with machine learning are growing like an avalanche. Each such business process saves the company money by automating the work of employees in mass positions (call centers, checking and sorting documents, etc.), expands the client base by adding new attractive and convenient features, and cuts costs through optimal use and redistribution of resources, among other things. Ultimately, every process is focused on creating value and must therefore deliver a certain economic effect. Here it is very important to clearly articulate the business idea and calculate the expected profit from implementing the model within the company's overall value chain. There are situations when implementing a model does not pay off, and the time of machine learning specialists costs far more than the workplace of the operator performing the task. That is why it is necessary to identify such cases at the early stages of building AI systems.

Consequently, models begin to bring profit only when the business task has been correctly formulated in the MLOps process, priorities have been set, and the process of introducing the model into the system has been planned at the early stages of development.

New process - new challenges

An exhaustive answer to the fundamental business question of whether ML models are applicable to a given problem, and the broader question of trust in AI, is one of the key challenges in developing and implementing MLOps approaches. Initially, businesses are skeptical about introducing machine learning into their processes: it is difficult to rely on models where people have traditionally done the work. To the business, the programs appear to be a "black box" whose answers still have to be proven relevant. In addition, banking, the business of telecom operators, and other industries face strict requirements from state regulators: all systems and algorithms embedded in banking processes are subject to audit. To address this, and to prove to business and regulators that the answers of artificial intelligence are valid and correct, monitoring tools are introduced along with the model. There is also an independent validation procedure, mandatory for regulatory models, that meets the requirements of the Central Bank: an independent expert group audits the results obtained by the model, taking the input data into account.

The second challenge is assessing and accounting for model risk when implementing a machine learning model. If even a person cannot say with absolute certainty whether that famous dress was white or blue, then artificial intelligence also has the right to make mistakes. It is also worth remembering that data change over time, and models need to be retrained to keep producing sufficiently accurate results. So that the business process does not suffer, model risks must be managed and the model's performance monitored, with regular retraining on new data.
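The monitoring-and-retraining loop described above can be sketched as a rolling-accuracy check over recent labelled requests. The window size, threshold, and `PerformanceMonitor` class are illustrative assumptions, not something the article prescribes:

```python
from collections import deque

class PerformanceMonitor:
    """Hypothetical sketch: tracks accuracy over the last `window` labelled
    requests and flags the model for retraining when quality degrades."""
    def __init__(self, window: int = 100, min_accuracy: float = 0.7):
        self.outcomes = deque(maxlen=window)  # True/False per prediction
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual) -> None:
        """Store whether the model's prediction matched the true label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def needs_retraining(self) -> bool:
        """Only decide once a full window of feedback has accumulated."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy

monitor = PerformanceMonitor(window=10, min_accuracy=0.7)
for i in range(10):
    # Simulated feedback: 6 of the last 10 predictions were correct.
    monitor.record(predicted="yes", actual="yes" if i < 6 else "no")
# 6/10 correct is below the 0.7 threshold, so retraining is due.
```

In a real system the threshold and window would be tied to the model risk the business is willing to accept, and the retraining trigger would feed back into the life cycle described earlier.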


But after the initial stage of mistrust, the opposite effect begins to appear. The more models are successfully introduced into processes, the more the business's appetite for artificial intelligence grows: new tasks keep emerging that can be solved with machine learning methods. Each task launches a whole process that requires certain competencies:

  • data engineers prepare and process the data;
  • data scientists apply machine learning tools and develop the model;
  • IT specialists embed the model into the system;
  • the ML engineer determines how to integrate the model into the process correctly and which IT tools to use, depending on the requirements for how the model is applied, taking into account the request flow, response time, etc.;
  • the ML architect designs how the software product can be physically implemented in a production system.

The whole cycle requires a large number of highly qualified specialists. At a certain point in the growth and penetration of ML models into business processes, linearly scaling the number of specialists in proportion to the number of tasks becomes expensive and inefficient. Hence the question of automating the MLOps process: defining several standard classes of machine learning problems and developing typical pipelines for data processing and model retraining. Ideally, solving such problems requires professionals who are equally strong at the junction of Big Data, Data Science, DevOps, and IT. Therefore, the biggest problem in the Data Science industry, and the biggest challenge in organizing MLOps processes, is the lack of such competencies in the existing training market. Specialists who meet these requirements are currently rare on the labor market and worth their weight in gold.

To the question of competencies

In theory, all MLOps tasks can be solved with classic DevOps tools, without resorting to a specialized extension of the role model. In that case, as noted above, the data scientist must be not only a mathematician and data analyst but a guru of the entire pipeline: architecture design, programming the model in several languages depending on the architecture, preparing the data mart, and deploying the application itself all fall on their shoulders. However, building the technological plumbing of an end-to-end MLOps process takes up to 80% of the labor costs, which means a qualified mathematician, which is what a good data scientist is, would devote only 20% of their time to their actual specialty. Therefore, differentiating the roles of the specialists who implement machine learning models becomes vital.

How finely the roles should be delineated depends on the size of the enterprise. It is one thing when a startup has a single specialist who is data engineer, ML engineer, architect, and DevOps engineer all in one. It is quite another when, in a large enterprise, all model development is concentrated in a few high-level data scientists, while a programmer or database specialist, a more common and less expensive competency on the labor market, takes on most of the routine tasks.

Thus, where the boundary is drawn in choosing specialists to support the MLOps process, and how the operationalization of the developed models is organized, directly affects the speed and quality of model development, the team's productivity, and its microclimate.

What has already been done by our team

We recently started building our competency framework and MLOps processes. But our projects on model life-cycle management and on models-as-a-service are already at the MVP testing stage.

We have also determined the optimal competency structure for a large enterprise and the organizational structure for interaction between all participants in the process. Agile teams were organized to solve problems for the entire spectrum of business customers, and a process of interaction was established with the project teams that build the platforms and infrastructure, the foundation of the MLOps building under construction.

Questions for the future

MLOps is a growing area that suffers from a shortage of competencies and will gain momentum in the future. In the meantime, it is best to build on DevOps developments and practices. The main goal of MLOps is to use ML models more effectively to solve business problems. But this raises many questions:

  • How to reduce the time to launch models in production?
  • How to reduce bureaucratic friction between teams of different competencies and increase the focus on cooperation?
  • How to track models, manage versions and organize effective monitoring?
  • How to create a truly circular lifecycle for a modern ML model?
  • How to standardize the machine learning process?
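One way to approach the tracking and versioning question above is a minimal model registry that assigns version numbers and fingerprints the training metadata, so any deployed version can be traced back to how it was built. Everything here, including the `ModelRegistry` class and the model name, is a hypothetical sketch:

```python
import hashlib
import json

class ModelRegistry:
    """Hypothetical sketch: tracks model versions and a fingerprint of the
    training metadata for each registered version."""
    def __init__(self):
        self.models = {}

    def register(self, name: str, metadata: dict) -> dict:
        """Record a new version of `name`, fingerprinting its metadata."""
        versions = self.models.setdefault(name, [])
        blob = json.dumps(metadata, sort_keys=True).encode()
        entry = {
            "version": len(versions) + 1,
            "fingerprint": hashlib.sha256(blob).hexdigest()[:12],
            "metadata": metadata,
        }
        versions.append(entry)
        return entry

    def latest(self, name: str) -> dict:
        return self.models[name][-1]

registry = ModelRegistry()
registry.register("support_chat_intents", {"algorithm": "logreg", "train_rows": 50_000})
registry.register("support_chat_intents", {"algorithm": "logreg", "train_rows": 75_000})
```

Production tools (model registries, experiment trackers) do far more, but even this much answers the basic audit question of which model version answered a given request and what data it was trained on.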

The answers to these questions will largely determine how quickly MLOps will reveal its full potential.

Source: habr.com
