General overview of the service architecture for evaluating appearance based on neural networks


Introduction

Hi!

In this article, I will share my experience of building a microservice architecture for a project using neural networks.

We will talk about the requirements for the architecture, look at various structural diagrams, analyze each of the components of the finished architecture, and evaluate the technical metrics of the solution.

Enjoy reading!

A few words about the problem and its solution

The main idea is to evaluate the attractiveness of a person on a ten-point scale based on a photo.

In this article, we will not go into the details of either the neural networks used or the data preparation and training process. However, in one of the following publications, we will definitely return to an in-depth analysis of the evaluation pipeline.

For now, we will walk through the evaluation pipeline at the top level, with the emphasis on the interaction of microservices in the context of the overall project architecture.

When working on the attractiveness assessment pipeline, the task was decomposed into the following components:

  1. Detecting faces in a photo
  2. Evaluating each face
  3. Rendering the result

The first task is handled by a pre-trained MTCNN. For the second, a convolutional neural network was trained in PyTorch, with ResNet34 as the backbone - a compromise between quality and inference speed on the CPU.
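The decomposition above can be sketched as a minimal structural outline. The function and argument names here are illustrative, not the project's actual code; the callables stand in for the real MTCNN detector, the ResNet34 scorer and the drawing code.

```python
# Hypothetical sketch of the three-step pipeline: detect -> score -> render.
def evaluate_image(image, detect_faces, score_face, render):
    faces = detect_faces(image)                    # 1. find faces (MTCNN)
    scores = [score_face(face) for face in faces]  # 2. score each face (ResNet34)
    return render(image, faces, scores)            # 3. draw boxes and grades
```

Keeping the three stages behind simple callables also makes each of them easy to swap out or test in isolation.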


Functional evaluation pipeline diagram

Analysis of project architecture requirements

In the life cycle of an ML project, the stages of working on the architecture and automating model deployment are often among the most time- and resource-consuming.


Life cycle of an ML project

This project is no exception - it was decided to wrap the evaluation pipeline in an online service, which required diving into the architecture. The following basic requirements were identified:

  1. Unified log storage - all services should write logs in one place, and the logs should be convenient to analyze
  2. The ability to horizontally scale the evaluation service - as the most likely bottleneck
  3. The same amount of processor resources should be allocated to evaluate each image - to avoid spikes in the distribution of inference time
  4. Fast (re)deployment of both individual services and the stack as a whole
  5. The ability, when necessary, to share common objects between services

Architecture

After analyzing the requirements, it became obvious that the microservice architecture fits almost perfectly.

In order to get rid of unnecessary headaches, the Telegram API was chosen as the frontend.

First, let's look at the structural diagram of the finished architecture, then proceed to the description of each of the components, and also formalize the process of successful image processing.


Structural diagram of the finished architecture

Let's talk in more detail about each component of the diagram and define its single responsibility in the image evaluation process.

Microservice "attrai-telegram-bot"

This microservice encapsulates all interactions with the Telegram API. There are 2 main scenarios - handling a user's image and handling the result of the evaluation pipeline. Let's take a look at both.

When a user message with an image is received:

  1. Filtering is performed, consisting of the following checks:
    • Whether the image size is acceptable
    • The number of the user's images already in the queue
  2. If the image passes this initial filtering, it is saved to a Docker volume
  3. A task is produced in the “to_estimate” queue, containing, among other things, the path to the image in our volume
  4. If the above steps complete successfully, the user receives a message with an estimated image processing time, calculated from the number of tasks in the queue. In the event of an error, the user is explicitly notified by a message describing what could have gone wrong.
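The filtering and time-estimation logic above can be sketched as a small pure function. The per-user limit and the average per-image time are assumed values for illustration; the article does not give the real settings.

```python
# Hypothetical sketch of the pre-queue check and wait-time estimate.
MAX_TASKS_PER_USER = 5       # assumption: cap on a user's queued images
AVG_SECONDS_PER_IMAGE = 4.0  # assumption: mean pipeline time per image

def check_and_estimate(user_queued: int, queue_length: int):
    """Return (accepted, estimated_wait_seconds)."""
    if user_queued >= MAX_TASKS_PER_USER:
        return False, None  # reject: user already has too many images queued
    # The estimate is based on how many tasks are already in "to_estimate".
    return True, queue_length * AVG_SECONDS_PER_IMAGE
```

The bot would send the estimate to the user on acceptance, or an explanatory error message on rejection.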

This microservice also acts as a Celery worker, listening to the “after_estimate” queue, which is intended for tasks that have passed through the evaluation pipeline.

When receiving a new task from “after_estimate”:

  1. If the image was processed successfully, we send the result to the user; if not, we notify them of the error.
  2. The image produced by the evaluation pipeline is deleted

Microservice "attrai-estimator"

This microservice is a Celery worker and encapsulates everything related to the image evaluation pipeline. There is only one workflow here - let's walk through it.

When receiving a new task from “to_estimate”:

  1. Run the image through the evaluation pipeline:
    1. Load the image into memory
    2. Resize the image to the target size
    3. Find all the faces (MTCNN)
    4. Evaluate all the faces (wrap the faces found in the previous step into a batch and run it through ResNet34)
    5. Render the final image
      1. Draw the bounding boxes
      2. Draw the grades
  2. Delete the user's original image
  3. Save the output of the evaluation pipeline
  4. Put a task into the “after_estimate” queue, which is listened to by the “attrai-telegram-bot” microservice described above
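As an illustration of step 1.2, resizing can be done by scaling the longest side down to a maximum while preserving the aspect ratio. The 1280-pixel limit here is an assumption, not the project's actual value.

```python
# Hypothetical sketch of the resize step: cap the longest side at max_side,
# keeping the aspect ratio, and leave smaller images untouched.
def target_size(width: int, height: int, max_side: int = 1280):
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, no resize needed
    scale = max_side / longest
    return round(width * scale), round(height * scale)
```

Capping the input size keeps MTCNN's detection time bounded, which matters for the per-image CPU budget discussed in the requirements.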

Graylog (+ mongoDB + Elasticsearch)

Graylog is a solution for centralized log management, and in this project it was used for exactly that purpose.

It was chosen over the more conventional ELK stack because of how convenient it is to work with from Python. All you need to do to log to Graylog is add a GELFTCPHandler from the graypy package to the other root logger handlers of your Python microservice.

As someone who had previously only worked with the ELK stack, I had a generally positive experience with Graylog. The only disappointing thing is that Kibana is superior in features to the Graylog web interface.
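A minimal configuration fragment for that setup might look as follows; the hostname "graylog" and port 12201 (the default GELF TCP input port) are assumptions, not values from the article.

```python
# Configuration fragment: route the root logger to Graylog via graypy.
import logging

import graypy

root = logging.getLogger()
root.setLevel(logging.INFO)
# Assumed host/port: "graylog" resolves inside the stack network,
# 12201 is the default port of a GELF TCP input.
root.addHandler(graypy.GELFTCPHandler("graylog", 12201))

root.info("service started")  # this record now also reaches Graylog
```

Because the handler sits on the root logger, every module-level logger in the microservice inherits it without further changes.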

RabbitMQ

RabbitMQ is a message broker based on the AMQP protocol.

In this project, it was used as the most stable and time-tested broker for Celery, and it ran in durable mode.
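As an illustration, a Celery configuration fragment declaring the two queues from this project as durable might look like this; the broker URL and everything besides the queue names are assumptions.

```python
# Configuration fragment (assumed values): durable Celery queues on RabbitMQ.
from kombu import Queue

broker_url = "amqp://guest:guest@rabbitmq:5672//"  # assumed credentials/host
task_queues = (
    # Durable queues survive a broker restart; combined with persistent
    # messages this is what "durable mode" buys us.
    Queue("to_estimate", durable=True),
    Queue("after_estimate", durable=True),
)
```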

Redis

Redis is a NoSQL DBMS that works with key-value data structures.

Sometimes different Python microservices need to share common objects that implement some data structure.

For example, Redis stores a hashmap of the form “telegram_user_id => number of active tasks in the queue”, which makes it possible to limit the number of requests from a single user and thereby prevent DoS attacks.
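The counter logic can be sketched as below. To keep the sketch self-contained, a plain dict stands in for the Redis hash; with redis-py the increments would be atomic `HINCRBY` calls instead. The limit of 5 is an assumed value.

```python
# Illustrative sketch: per-user active-task counter, dict standing in for
# the Redis hash "telegram_user_id => number of active tasks in the queue".
MAX_ACTIVE_TASKS = 5  # assumption: the article does not state the real limit

active_tasks = {}  # stand-in for the Redis hashmap

def try_acquire(user_id: int) -> bool:
    """Increment the user's counter unless the limit is already reached."""
    if active_tasks.get(user_id, 0) >= MAX_ACTIVE_TASKS:
        return False  # reject: this user already has too many active tasks
    active_tasks[user_id] = active_tasks.get(user_id, 0) + 1
    return True

def release(user_id: int) -> None:
    """Decrement the counter when a user's task leaves the pipeline."""
    if active_tasks.get(user_id, 0) > 0:
        active_tasks[user_id] -= 1
```

In production the check-and-increment should happen atomically on the Redis side, since two bot workers may process messages from the same user concurrently.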

Let's formalize the process of successful image processing

  1. User sends an image to Telegram bot
  2. "attrai-telegram-bot" receives a message from the Telegram API and parses it
  3. The task with the image is added to the asynchronous queue "to_estimate"
  4. The user receives a message with the estimated evaluation time
  5. "attrai-estimator" takes the task from the "to_estimate" queue, runs the image through the evaluation pipeline and produces a task in the "after_estimate" queue
  6. "attrai-telegram-bot", listening to the "after_estimate" queue, sends the result to the user

DevOps

Finally, after reviewing the architecture, we can move on to the equally interesting part - DevOps

Docker swarm

 


Docker Swarm is a clustering system whose functionality is implemented inside the Docker Engine and is available out of the box.

In "swarm" mode, all the nodes of the cluster are divided into 2 types - workers and managers. Machines of the first type run groups of containers (stacks); machines of the second type are responsible for scaling, balancing, and other nice features. Managers are also workers by default.


Cluster with one leader manager and three workers

The minimum possible cluster size is 1 node - a single machine simultaneously acting as the leader manager and a worker. Given the size of the project and its minimal fault-tolerance requirements, it was decided to use this approach.

Looking ahead, I'll say that since the first production delivery in mid-June, there have been no problems associated with this cluster organization (but that does not mean such an organization is in any way acceptable for medium-to-large projects subject to fault-tolerance requirements).

Docker Stack

In “swarm” mode, docker stack is responsible for deploying stacks (sets of Docker services).

It supports docker-compose configs, additionally allowing the use of deploy options.

For example, these options were used to limit the resources of each evaluation microservice instance (we allocate N cores for N instances, and in the microservice itself we limit the number of cores used by PyTorch to one):

attrai_estimator:
  image: 'erqups/attrai_estimator:1.2'
  deploy:
    replicas: 4
    resources:
      limits:
        cpus: '4'
    restart_policy:
      condition: on-failure
      …
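The in-service half of that limiting, restricting PyTorch to a single thread per worker process, is a one-line runtime setting:

```python
# Configuration fragment: keep each estimator process on a single CPU core
# so that N replicas fit cleanly into the N cores granted by the deploy limit.
import torch

torch.set_num_threads(1)
```

Without this, each PyTorch process would try to use all available cores, and the replicas would contend with each other, producing exactly the inference-time spikes the requirements rule out.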

It is important to note that Redis, RabbitMQ and Graylog are stateful services, so they cannot be scaled as easily as "attrai-estimator".

Anticipating the question - why not Kubernetes?

Using Kubernetes in small and medium-sized projects seems like overkill; all the necessary functionality can be obtained from Docker Swarm, which is quite user-friendly for a container orchestrator and also has a low barrier to entry.

Infrastructure

All this was deployed on VDS with the following characteristics:

  • CPU: 4 cores Intel® Xeon® Gold 5120 CPU @ 2.20GHz
  • RAM: 8 GB
  • SSD: 160GB

After local load testing, it seemed that this machine would be just barely enough to handle a serious influx of users.

But immediately after the deployment I posted a link on one of the most popular imageboards in the CIS (yes, that one), after which people became interested, and within a few hours the service successfully processed tens of thousands of images. At the same time, at peak moments, CPU and RAM resources were not even half used.


A few more graphs

Number of unique users and evaluation requests, since deployment, by day


Estimation Pipeline Inference Time Distribution


Conclusions

Summing up, I can say that the architecture and the approach to container orchestration fully justified themselves - even at peak moments there were no failures or degradations in processing time.

I think that small and medium-sized projects that use real-time neural network inference on the CPU can successfully adopt the practices described in this article.

I will add that the article was originally longer, but to avoid a long read I decided to omit some points - we will return to them in future publications.

You can poke the bot on Telegram - @AttraiBot; it will work at least until the end of autumn 2020. Let me remind you: no user data is stored - neither the original images nor the results of the evaluation pipeline - everything is deleted after processing.

Source: habr.com
