Load optimization on a Highload project with ElasticSearch

Hey Habr! My name is Maxim Vasiliev, I work as an analyst and project manager at FINCH. Today I would like to tell you how, using ElasticSearch, we were able to process 15 million requests in 6 minutes and optimize the daily load on one of our clients' sites. Unfortunately, we have to do without names: we are under an NDA, and we hope the content of the article will not suffer from this. Let's go.

How the project works

On the backend, we build the services that keep our client's websites and mobile application running. The general structure can be seen in the diagram:

[Diagram: general architecture of the backend services]

We process a large number of transactions: purchases, payments, operations with user balances. We store extensive logs for all of this, and we also import and export the data to and from external systems.

There are also reverse flows, when we receive data from the client and pass it on to the user. In addition, there are processes for handling payments and bonus programs.

Brief background

Initially, we used PostgreSQL as our only data store. Its standard DBMS advantages (transactions, a mature query language, a wide range of integration tools), combined with good performance, satisfied our needs for quite a long time.

We stored absolutely all data in Postgres: from transactions to news. But the number of users grew, and with it the number of requests.

To put this in perspective: in 2017 the desktop site alone saw 131 million sessions; in 2018 it was 125 million, and in 2019 around 130 million again. Add another 100-200 million from the mobile version of the site and the mobile application, and you get a huge number of requests.

As the project grew, Postgres stopped coping with the load and we could no longer keep up: more and more diverse queries appeared, and we could not create enough indexes to cover them.

We realized we needed other data stores that would meet our needs and take the load off PostgreSQL. Elasticsearch and MongoDB were considered as candidates. The latter lost on the following points:

  1. Indexing slows down as the amount of data in the indexes grows, whereas with Elastic the indexing speed does not depend on the data volume.
  2. No full-text search.

So we chose Elastic for ourselves and prepared for the transition.

Transition to Elastic

1. We started the transition with the point-of-sale search service. Our client has about 70 points of sale in total, and this requires several types of search on the site and in the application:

  • Text search by city name
  • Geosearch within a given radius from some point, for example when the user wants to see which points of sale are closest to their home.
  • Search within a given square: the user draws a square on the map, and all points inside it are shown.
  • Search by additional filters, since points of sale differ from each other in assortment.

As for the data organization: Postgres remains the data source for both the map and the news, while Elastic works on snapshots taken from that original data. The thing is that Postgres initially could not cope with searching by all these criteria: not only were there many indexes, they also overlapped, so the Postgres query planner got confused and did not understand which index to use. Below is a sketch of what the geo queries from the list above might look like in Elastic.
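
For illustration only: a minimal sketch of a radius search with an extra assortment filter, sent through Elastic's HTTP search API. The index name stores and the field names location and has_tastings are invented for this example; only the query DSL and the _search endpoint are real Elastic API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StoreGeoSearch {
    public static void main(String[] args) throws Exception {
        // geo_distance filter: points of sale within 5 km of the user's
        // location, combined with an assortment filter. The "location" field
        // is assumed to be mapped as geo_point in the hypothetical index.
        String query = """
            {
              "query": {
                "bool": {
                  "filter": [
                    { "geo_distance": { "distance": "5km",
                                        "location": { "lat": 55.75, "lon": 37.61 } } },
                    { "term": { "has_tastings": true } }
                  ]
                }
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/stores/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());

        // For the "draw a square on the map" case, the geo_distance clause is
        // replaced by a geo_bounding_box clause with the top_left and
        // bottom_right corners of the drawn square.
    }
}
```

Text search by city name is just one more match clause in the same bool query, which is exactly why keeping all the criteria in one engine pays off.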

2. Next in line was the news section. Publications appear on the site every day, and so that the user does not get lost in the flow of information, the data must be sorted before it is served. This is what search is for: you can search the site by text match and connect additional filters at the same time, since they are also implemented through Elastic.
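
A text search with filters of this kind might look roughly as follows; the news index and its fields are assumptions for the sake of the example:

```java
public class NewsSearchQueries {
    // A text-match query for the news section combined with the same kind of
    // filters the site exposes, newest publications first. The field names
    // ("title", "category", "published_at") are assumptions for the example.
    static final String RECENT_PROMO_NEWS = """
        {
          "query": {
            "bool": {
              "must":   [ { "match": { "title": "prize draw" } } ],
              "filter": [ { "term":  { "category": "promo" } },
                          { "range": { "published_at": { "gte": "now-30d/d" } } } ]
            }
          },
          "sort": [ { "published_at": "desc" } ]
        }""";
}
```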

3. Then we moved transaction processing. Users can buy a certain product on the site and take part in a prize draw. After such purchases, we process a large amount of data, especially on weekends and holidays. For comparison: on ordinary days the number of purchases is somewhere between 1.5 and 2 million, while on holidays the figure can reach 53 million.

At the same time, the data must be processed in the shortest possible time: users do not like waiting days for a result. There was no way to achieve such deadlines with Postgres: we often ran into locks, and while we were processing all the requests, users could not check whether they had received their prizes. This is not very pleasant for business, so we moved the processing to Elasticsearch.

Periodicity

Updates are now event-based and are configured according to the following conditions:

  1. Points of sale. As soon as we receive data from an external source, we start the update immediately.
  2. News. As soon as a news item is edited on the site, it is automatically sent to Elastic.

Here it is worth mentioning the advantages of Elastic once again. In Postgres, when you send a request, you have to wait until it honestly processes all the records. In Elastic you can send 10 records and start working immediately, without waiting for the records to be distributed across all the shards. Of course, an individual shard or replica may not see the data right away, but everything becomes available very soon.
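
As an illustration of such an event-driven update, here is a minimal sketch. The news index, the document shape, and the indexNews hook are assumptions; only the Elastic document endpoint and the refresh parameter are real API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NewsIndexer {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Called from a hypothetical "news edited" event handler: the changed
    // document is pushed to Elastic right away, with no periodic job involved.
    static void indexNews(String id, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/news/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();

        // The call returns without waiting for the document to become visible
        // to search; it shows up on the next index refresh (the near-real-time
        // behaviour described above). Appending "?refresh=wait_for" to the URI
        // would block until it is searchable, at the cost of latency.
        HttpResponse<String> response =
                HTTP.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }

    public static void main(String[] args) throws Exception {
        indexNews("42", "{\"title\":\"New promo\",\"category\":\"promo\"}");
    }
}
```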

Integration methods

There are two ways to integrate with Elastic:

  1. Through the native client over TCP. The native driver is gradually dying out: it is no longer supported and has a very inconvenient syntax. So we practically do not use it and are trying to abandon it completely.
  2. Through the HTTP interface, which accepts both JSON requests and Lucene syntax (Lucene being the text engine Elastic is built on). With this option we can batch JSON requests over HTTP, and this is the option we are trying to use.

Thanks to the HTTP interface, we can use libraries that provide an asynchronous HTTP client implementation. Combining batching with the asynchronous API gives high performance, which helped a lot during the days of the big promotion (more on that below).
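
A minimal sketch of this combination, assuming Spring WebFlux's WebClient (mentioned later in the article) and Reactor: documents are grouped into batches of 1,000 and sent to the _bulk endpoint with at most 10 requests in flight. The transactions index and the plain-string documents are illustrative assumptions.

```java
import java.util.List;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class BulkIndexer {
    private final WebClient client = WebClient.create("http://localhost:9200");

    // Group the document stream into batches of 1000 and keep at most
    // 10 bulk requests in flight at any moment.
    Mono<Void> indexAll(Flux<String> jsonDocs) {
        return jsonDocs
                .buffer(1000)
                .flatMap(this::sendBulk, 10)
                .then();
    }

    // One _bulk request: NDJSON body, an action line followed by the
    // document source line for every item, each terminated by a newline.
    private Mono<String> sendBulk(List<String> docs) {
        StringBuilder body = new StringBuilder();
        for (String doc : docs) {
            body.append("{\"index\":{\"_index\":\"transactions\"}}\n")
                .append(doc).append('\n');
        }
        return client.post()
                .uri("/_bulk")
                .header("Content-Type", "application/x-ndjson")
                .bodyValue(body.toString())
                .retrieve()
                .bodyToMono(String.class);
    }
}
```

Batch size and concurrency here are just two tunable numbers, matching the "10 threads + batches of 1,000 elements" configuration in the comparison below.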

Some numbers for comparison:

  • Saving prize users to Postgres in 20 threads without batching: 460,713 records in 42 seconds
  • Elastic + reactive client, 10 threads + batches of 1,000 elements: 596,749 records in 11 seconds
  • Elastic + reactive client, 10 threads + batches of 1,000 elements: 23,801,684 records in 4 minutes

We have now written an HTTP request manager that builds the JSON body, batched or not, and sends it through any HTTP client, regardless of the library. You can also choose whether to send requests synchronously or asynchronously.
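
The contract of such a manager could look roughly like this; the interface and method names are invented for illustration and are not the actual project code:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// A rough outline of what such a request manager exposes: single or batched
// indexing, synchronous or asynchronous, independent of the HTTP library.
public interface ElasticRequestManager {

    /** Index a single document, blocking until Elastic responds. */
    String indexOne(String index, String jsonDoc);

    /** Index a batch of documents in one _bulk request, asynchronously. */
    CompletableFuture<String> indexBatch(String index, List<String> jsonDocs);
}
```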

In some integrations we still use the official transport client, but this is just a matter of the next refactoring. Processing in those cases is handled by a custom client built on top of Spring WebClient.


The big promotion

Once a year, the project runs a big promotion for users. This is true highload: during this time we work with tens of millions of users simultaneously.

Load peaks usually occur during the holidays, but this promotion is on a completely different level. The year before last, on the day of the promotion, we sold 27 million units of goods. The data took more than half an hour to process, which was inconvenient for users. Users did receive prizes for participation, but it became clear that the process needed to be sped up.

At the beginning of 2019, we decided that we needed Elasticsearch. For a whole year we organized the processing of the incoming data in Elastic and its delivery via the API of the mobile application and the website. As a result, the next year during the campaign we processed 15 million entries in 6 minutes.

Since a lot of people want to buy goods and take part in prize draws during promotions, this is a temporary measure. For now we send up-to-date information to Elastic, but going forward we plan to move archived information for past months into Postgres as permanent storage, so as not to clog up the Elastic index, which also has its limits.
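
One possible way to implement that rotation, assuming monthly indices such as transactions-2019-03 (the naming scheme is an assumption; the delete-index call itself is standard Elastic HTTP API):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.YearMonth;

public class IndexRotation {
    public static void main(String[] args) throws Exception {
        // Assuming monthly indices: once a month's data has been archived to
        // Postgres, the whole Elastic index for that month is simply dropped.
        YearMonth expired = YearMonth.now().minusMonths(3);
        String index = "transactions-" + expired;   // e.g. "transactions-2019-03"

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/" + index))
                .DELETE()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Dropped " + index + ": " + response.statusCode());
    }
}
```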

Conclusions

At the moment, we have moved all the services we wanted to Elastic and paused there for now. Now we build an index in Elastic on top of the main persistent storage in Postgres, and that index takes over the user load.

In the future, we plan to move a service whenever we see that its queries become too diverse and search across an unlimited set of columns. That is no longer a job for Postgres.

If a feature needs full-text search, or if there are many different search criteria, we already know it should be moved to Elastic.

⌘⌘⌘

Thanks for reading. If your company also uses ElasticSearch and has its own implementation stories, tell us about them. It will be interesting to hear how others are doing 🙂

Source: habr.com
