Convenient architectural patterns

Hey Habr!

In light of the current events around the coronavirus, a number of Internet services have started to receive increased load. For example, one of the retail chains in the UK simply shut down its online ordering site because there was not enough capacity. And it is far from always possible to speed a server up by just adding more powerful hardware; yet customer requests still have to be processed (or they will go to competitors).

In this article, I will briefly go over popular practices that allow you to build a fast and fault-tolerant service. Of all the possible design schemes, I have selected only those that are easy to use today: for each item, there are either ready-made libraries, or you have the option of solving the problem with a cloud platform.

Horizontal scaling

The simplest and most well-known item. Conventionally, the two most common load distribution schemes are horizontal and vertical scaling. In the first case, you run services in parallel, thereby distributing the load between them. In the second, you order more powerful servers or optimize the code.

As a running example, I will take an abstract cloud file storage, that is, some analogue of OwnCloud, OneDrive, and so on.

A standard picture of such a scheme is below, but it only demonstrates the complexity of the system: after all, we somehow need to synchronize the services. What happens if a user saves a file from a tablet and then wants to view it from a phone?

[diagram]
The difference between the approaches: in vertical scaling, we are ready to increase the capacity of nodes, and in horizontal scaling, we are ready to add new nodes to distribute the load.

CQRS

Command Query Responsibility Segregation is a rather important pattern, as it allows different clients not only to connect to different services, but also to receive the same event streams. Its benefits are not so obvious for a simple application, but it is extremely important (and simple) for a loaded service. Its essence: the incoming and outgoing data streams should not intersect. That is, you do not send a request and wait for a response on the same channel; instead, you send a request to service A, but receive the response from service B.

The first bonus of this approach is that the connection can be broken (in the broad sense of the word) while a long request is being processed. For example, let's take a more or less standard sequence:

  1. The client sent a request to the server.
  2. The server started a long processing.
  3. The server responded to the client with a result.

Imagine that at step 2 the connection was broken (the network reconnected, or the user went to another page and dropped the connection). In this case, it will be difficult for the server to deliver a response to the user describing what exactly was processed. With CQRS, the sequence is slightly different (a minimal sketch follows the list):

  1. The client has subscribed to updates.
  2. The client sent a request to the server.
  3. The server replied "request accepted".
  4. The server responded with a result through the channel from point "1".
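
To make the sequence concrete, here is a minimal, self-contained sketch of this subscribe-then-command flow in Java. It is purely illustrative: the in-memory subscriber list stands in for a real notification channel, and the class and method names are made up for the example.

    import java.util.concurrent.*;
    import java.util.function.Consumer;

    // Minimal in-memory sketch of the CQRS flow above (illustrative only):
    // commands go in through submit(), results come back through a subscription.
    public class CqrsSketch {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final CopyOnWriteArrayList<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

        // 1. The client subscribes to updates.
        public void subscribe(Consumer<String> listener) {
            subscribers.add(listener);
        }

        // 2-3. The client sends a request; the server immediately replies "accepted"
        //      and continues the long processing in the background.
        public String submit(String fileName) {
            worker.submit(() -> {
                try { Thread.sleep(500); } catch (InterruptedException ignored) {} // long processing
                // 4. The result is delivered through the subscription channel,
                //    not through the original request/response pair.
                subscribers.forEach(s -> s.accept("file saved: " + fileName));
            });
            return "accepted";
        }

        public static void main(String[] args) throws Exception {
            CqrsSketch service = new CqrsSketch();
            service.subscribe(event -> System.out.println("phone:  " + event));
            service.subscribe(event -> System.out.println("tablet: " + event)); // same event on both devices
            System.out.println(service.submit("photo.jpg"));                    // prints "accepted"
            Thread.sleep(1000);
            service.worker.shutdown();
        }
    }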

[diagram]

As you can see, the scheme is a little more complicated, and the intuitive request-response approach is gone. However, a broken connection during processing no longer leads to an error. Moreover, if the user is actually connected to the service from several devices (for example, from a phone and from a tablet), you can make the response arrive on both of them.

Interestingly, the code for processing incoming messages becomes (almost) the same both for events that the client itself triggered and for any other events, including those from other clients.

In addition, we get extra bonuses from the fact that a unidirectional stream can be processed in a functional style (using Rx and its analogues). And this is already a serious plus, since the application can essentially be made fully reactive, and with a functional approach on top of that. For large programs, this can significantly save development and support resources.
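
As a rough illustration of that idea, here is a small sketch assuming RxJava 3 is on the classpath; the FileEvent type and the user id are made up, and the point is only that one declarative pipeline handles the entire incoming stream.

    import io.reactivex.rxjava3.subjects.PublishSubject;

    public class ReactiveSketch {
        // FileEvent is a made-up event type just for this example.
        record FileEvent(String userId, String fileName, String type) {}

        public static void main(String[] args) {
            // One unidirectional stream of incoming events.
            PublishSubject<FileEvent> events = PublishSubject.create();

            // The same declarative pipeline handles events triggered by this client
            // and events coming from other clients and devices.
            events
                .filter(e -> e.userId().equals("user-42"))
                .map(e -> e.type() + ": " + e.fileName())
                .subscribe(msg -> System.out.println("UI update -> " + msg));

            events.onNext(new FileEvent("user-42", "photo.jpg", "FileSaved")); // caused by this device
            events.onNext(new FileEvent("user-42", "notes.txt", "FileSaved")); // caused by another device
        }
    }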

If we combine this approach with horizontal scaling, then as a bonus we get the ability to send requests to one server and receive responses from another. Thus, the client can choose the service that is convenient for him, and the system inside will still be able to process events correctly.

Event Sourcing

As you know, one of the main features of a distributed system is the absence of a shared clock and of a shared critical section. Within a single process, you can take a lock (on the usual mutexes) and be sure that no one else is executing that code. In a distributed system, however, this is dangerous: it requires extra overhead, and it kills all the charm of scaling, since every component ends up waiting for a single one.

From this we get an important fact: a fast distributed system cannot be synchronized in this sense, because that would reduce its performance. On the other hand, we often need a certain consistency between components. For this you can use eventual consistency, which guarantees that, if there are no new data changes, some time after the last update ("eventually") all queries will return the last updated value.

It is important to understand that classic databases quite often use strong consistency, where each node has the same information (this is often achieved by considering a transaction committed only after a second server has acknowledged it). There are some relaxations here due to isolation levels, but the general idea remains the same: you live in a fully consistent world.

However, back to the original problem. If part of the system can be built with eventual consistency, then the following diagram can be constructed.

[diagram]

Important features of this approach (a small sketch follows the list):

  • Each incoming request is put into one queue.
  • While processing a request, the service may also place tasks on other queues.
  • Each incoming event has an ID (which is required for deduplication).
  • The queue ideologically works according to the "append only" scheme. You cannot delete elements from it or rearrange them.
  • The queue works according to the FIFO scheme (sorry for the tautology). If you need to do parallel execution, then you should shift objects to different queues in one of the stages.
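
A minimal sketch of such an event and queue is below (illustrative Java; in a real system this role is usually played by a message broker such as Apache Kafka rather than an in-memory queue).

    import java.util.Set;
    import java.util.UUID;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    public class EventQueueSketch {
        // Every event carries an id so consumers can deduplicate repeated deliveries.
        record Event(UUID id, String type, String payload) {}

        // Append-only FIFO queue: elements are only added and consumed in order,
        // never deleted or reordered.
        private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        private final Set<UUID> processed = ConcurrentHashMap.newKeySet();

        public void append(Event e) {
            queue.add(e);
        }

        public void consumeOne() throws InterruptedException {
            Event e = queue.take();                // FIFO order
            if (!processed.add(e.id())) {
                return;                            // duplicate delivery -> skip
            }
            System.out.println("processing " + e.type() + ": " + e.payload());
            // ...here the handler may put derived tasks onto other queues...
        }

        public static void main(String[] args) throws InterruptedException {
            EventQueueSketch q = new EventQueueSketch();
            Event upload = new Event(UUID.randomUUID(), "FileUploaded", "photo.jpg");
            q.append(upload);
            q.append(upload);  // the same event delivered twice
            q.consumeOne();    // processed
            q.consumeOne();    // deduplicated, silently skipped
        }
    }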

Let me remind you that we are considering the case of online file storage. In this case, the system will look something like this:

[diagram]

It is important to note that the services in the diagram do not necessarily mean separate servers; they may even live in the same process. What matters is that, ideologically, these pieces are separated in such a way that horizontal scaling can be applied easily.

And for two users, the scheme will look like this (services intended for different users are indicated in different colors):

[diagram]

Bonuses from such a combination:

  • Information processing services are separated. The queues are also separated. If we need to increase the throughput of the system, then we just need to run more services on more servers.
  • When we receive information from a user, we do not have to wait until the data is fully saved. On the contrary, it is enough for us to answer “ok”, and then gradually start working. At the same time, the queue smooths out peaks, since adding a new object is fast, and the user does not have to wait for a full pass through the entire cycle.
  • As an example, I added a deduplication service that tries to merge identical files. If in 1% of cases it takes a long time, the client will hardly notice it (see above), which is a big plus, since we no longer require 100% speed and reliability from it.

However, the disadvantages are immediately visible:

  • Our system has lost strong consistency. This means that if, for example, you subscribe to different services, you can theoretically get different states (since one of the services may not yet have received a notification from the internal queue). As another consequence, the system now has no common time: it is impossible, for example, to sort all events simply by arrival time, since the clocks of different servers may not be synchronized (and identical time on two servers is a utopia).
  • Events can no longer simply be rolled back (as one could do with a database). Instead, a new, compensating event has to be added, which changes the latest state to the required one (a short sketch appears after this list). An example from a similar area: without rewriting history (which in some cases is bad), you cannot roll back a commit in git, but you can make a special revert commit, which essentially just restores the old state; both the erroneous commit and the revert remain in the history.
  • The data schema can change from release to release, but old events can no longer be updated to the new standard (since events, in principle, cannot be changed).
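
To illustrate the compensating-event idea, here is a tiny sketch with made-up event names: the erroneous deletion stays in the history, and a new event restores the previous state.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class CompensationSketch {
        record Event(String type, String fileId) {}

        public static void main(String[] args) {
            // The event log is append-only: nothing is ever rolled back in place.
            List<Event> log = new ArrayList<>();
            log.add(new Event("FileAdded",    "42"));
            log.add(new Event("FileDeleted",  "42"));  // turned out to be a mistake
            log.add(new Event("FileRestored", "42"));  // compensating event, not a rollback

            // The current state is always derived by replaying the whole history.
            Set<String> existingFiles = new HashSet<>();
            for (Event e : log) {
                switch (e.type()) {
                    case "FileAdded", "FileRestored" -> existingFiles.add(e.fileId());
                    case "FileDeleted"               -> existingFiles.remove(e.fileId());
                }
            }
            System.out.println(existingFiles); // [42] - the state is back, the history is intact
        }
    }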

As you can see, Event Sourcing gets along well with CQRS. In fact, it is hard to implement a system with efficient and convenient queues but without separating the data flows, because you would have to add synchronization points that neutralize the whole positive effect of the queues. Applying both approaches at once does require slightly adjusting the program's code. In our case, when a file is sent to the server, only an "ok" comes back in the response, which means merely that "the file-adding operation has been recorded". Formally, this does not mean that the data is already available on other devices (for example, the deduplication service may be rebuilding its index). However, after some time, the client will receive a notification along the lines of "file X saved".

As a result:

  • The number of file upload statuses increases: instead of the classic "file sent", we get two: "file added to the queue on the server" and "file saved in storage". The latter means that other devices can already start receiving the file (with the caveat that the queues operate at different speeds).
  • Because the upload information now arrives through different channels, we need to come up with a way to query the file processing status. As a consequence, unlike the classic request-response, the client can be restarted while the file is being processed, and the status of that processing will still be correct. This essentially works out of the box. As a consequence: we are now more tolerant of failures.

Sharding

As described above, there is no strong consistency in event sourcing systems. This means that we can use several storages without any synchronization between them. Applying this to our task, we can:

  • Separate files by type. For example, pictures/videos can be decoded and a more efficient format can be selected.
  • Separate accounts by country. Many laws may require this, and this architectural scheme provides the possibility automatically (see the sketch below).
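
A minimal sketch of such routing might look as follows (the storage endpoints and rules are invented for the example; the point is only that each account or file type maps to its own storage, with no cross-storage synchronization).

    import java.util.Map;

    public class ShardRoutingSketch {
        record Account(String id, String country) {}
        record StoredFile(String name, String mimeType) {}

        // Made-up storage endpoints; in reality these would be separate clusters or buckets.
        private static final Map<String, String> BY_COUNTRY = Map.of(
            "DE", "storage-eu.example.internal",
            "US", "storage-us.example.internal");

        static String chooseStorage(Account account, StoredFile file) {
            // Media files go to a storage that can transcode them into a better format.
            if (file.mimeType().startsWith("image/") || file.mimeType().startsWith("video/")) {
                return "media-storage.example.internal";
            }
            // Everything else is placed according to the account's country.
            return BY_COUNTRY.getOrDefault(account.country(), "storage-default.example.internal");
        }

        public static void main(String[] args) {
            Account user = new Account("user-42", "DE");
            System.out.println(chooseStorage(user, new StoredFile("report.pdf", "application/pdf")));
            System.out.println(chooseStorage(user, new StoredFile("photo.jpg", "image/jpeg")));
        }
    }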

[diagram]

If you need to transfer data from one storage to another, simple standard tools will not get you far: in the naive case you have to stop the queue, perform the migration, and then start the queue again. In the general case, data cannot be transferred on the fly; however, if the event queue is stored in full and you have snapshots of previous storage states, then we can replay the events as follows:

  • In Event Sourcing, each event has its own identifier (ideally non-decreasing). This means that we can add a field to the storage: the id of the last processed element.
  • We duplicate the queue so that all events can be processed for several independent storages (the first is the one where the data already lives, and the second is new but empty for now). The second queue, naturally, is not being processed yet.
  • We start the second queue (that is, we start replaying events).
  • When the new queue is relatively empty (that is, the average time difference between adding an element and extracting it is acceptable), you can start switching readers to the new storage.
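
A rough sketch of this replay idea is below (the storage and event types are made up; a real system would rely on its broker's offsets or snapshots rather than a hand-rolled lastProcessedId field).

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ReplaySketch {
        // Events carry a monotonically non-decreasing id.
        record Event(long id, String fileId, byte[] content) {}

        // A made-up storage that remembers the id of the last event it applied.
        static class Storage {
            final Map<String, byte[]> files = new ConcurrentHashMap<>();
            long lastProcessedId = -1;

            void apply(Event e) {
                if (e.id() <= lastProcessedId) return;  // already applied -> skip duplicates
                files.put(e.fileId(), e.content());
                lastProcessedId = e.id();
            }
        }

        public static void main(String[] args) {
            List<Event> history = List.of(
                new Event(1, "a.txt", "v1".getBytes()),
                new Event(2, "b.txt", "v1".getBytes()),
                new Event(3, "a.txt", "v2".getBytes()));

            Storage oldStorage = new Storage();
            history.forEach(oldStorage::apply);         // the storage that is already live

            // The new, empty storage catches up by replaying the same event history.
            Storage newStorage = new Storage();
            history.forEach(newStorage::apply);

            // Once newStorage.lastProcessedId is close enough to the head of the queue,
            // readers can be switched over to it.
            System.out.println(newStorage.lastProcessedId + " == " + oldStorage.lastProcessedId);
        }
    }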

As you can see, we did not have, and still do not have, strong consistency in the system. There is only eventual consistency, that is, a guarantee that events are processed in the same order (though possibly with different delays). And, using this, we can relatively easily transfer data to the other side of the globe without stopping the system.

Thus, continuing our example of online storage for files, such an architecture already gives us a number of bonuses:

  • We can move objects closer to users, and in a dynamic way. Thus, the quality of service can be improved.
  • We can store some data on the customers' side. For example, enterprise users often require their data to be stored in data centers they control (to avoid data leaks). With sharding, we can easily support this, and the task is even easier if the customer has a compatible cloud (for example, a self-hosted Azure stack).
  • And most importantly, we can simply not do any of this yet. Indeed, to begin with, one storage for all accounts would suit us perfectly well (so we can start working quickly). The key feature of this system is that, although it is extensible, it is quite simple at the initial stage. You just don't have to immediately write code that works with a million separate independent queues and so on; if necessary, that can be added later.

Static Content Hosting

This point may seem quite obvious, but it is still a must for any more or less standard loaded application. Its essence is simple: all static content is served not from the same server where the application lives, but from dedicated servers allocated specifically for this purpose. As a result, these operations are faster (the proverbial nginx serves files faster and cheaper than a Java server). Plus, a CDN (Content Delivery Network) architecture allows us to place our files closer to the end users, which has a positive effect on the perceived responsiveness of the service.

The simplest and most standard example of static content is a set of scripts and images for a website. Everything is simple with them - they are known in advance, then the archive is uploaded to CDN servers, from where it is distributed to end users.

However, in reality you can apply an approach to static content that is somewhat similar to the lambda architecture. Let's return to our task (online file storage), in which we need to deliver files to users. The simplest, head-on solution is to make a service that, for each user request, performs all the necessary checks (authorization and so on) and then serves the file directly from our storage. The main disadvantage of this approach is that static content (and a file at a specific revision is, in effect, static content) is served by the same server that contains the business logic. Instead, you can use the following scheme:

  • The server issues a download URL. It can be of the form file_id + key, where key is a mini digital signature that grants access to the resource for the next day (see the sketch after this list).
  • The distribution of the file is handled by a simple nginx with the following options:
    • Content caching. Since this service can be located on a separate server, we left ourselves a reserve for the future with the ability to store all the latest downloaded files on disk.
    • Checking the key at the time the connection is created.
  • Optional: streaming content processing. For example, if we compress all files in the service, we can do the decompression directly in this module. As a consequence, the IO operations happen where they belong: an archiver in Java will happily allocate a lot of extra memory, while rewriting the business-logic service in, say, Rust/C++ may not pay off either. In our case, different processes (or even services) are used, so the business logic and the IO operations can be separated quite effectively.
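
Here is a sketch of how the server side could issue such a key, assuming an HMAC secret shared with the distributing server; the URL layout and the one-day lifetime are illustrative, not a prescription.

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.time.Instant;
    import java.util.HexFormat;

    public class SignedUrlSketch {
        private static final byte[] SECRET = "shared-secret".getBytes(StandardCharsets.UTF_8);

        // Issues a URL of the form file_id + expiry + key, where key is an HMAC
        // that grants access until the expiry timestamp (one day here).
        static String downloadUrl(String fileId) throws Exception {
            long expires = Instant.now().plusSeconds(24 * 3600).getEpochSecond();
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
            byte[] sig = mac.doFinal((fileId + ":" + expires).getBytes(StandardCharsets.UTF_8));
            String key = HexFormat.of().formatHex(sig);
            return "/download/" + fileId + "?expires=" + expires + "&key=" + key;
        }

        public static void main(String[] args) throws Exception {
            // The distributing server recomputes the same HMAC at connection time
            // and refuses the request if the key or the expiry is wrong.
            System.out.println(downloadUrl("42"));
        }
    }

On the nginx side, a very similar check is implemented, for example, by its secure_link module, so no business-logic code is needed there.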

[diagram]

Such a scheme does not look much like classic static content hosting (since we do not upload a whole bundle of static assets somewhere), but in reality this approach is precisely about serving immutable data. Moreover, the scheme can be generalized to other cases where the content is not merely static, but can be represented as a set of immutable, non-removable blocks (even though new ones can be added).

Another example (to drive the point home): if you have worked with Jenkins/TeamCity, you know that both solutions are written in Java. Both are Java processes that handle both build orchestration and content management. In particular, they both have tasks like "transfer a file/folder from the server": serving artifacts, transferring source code (when the agent does not download the code from the repository directly, but the server does it on its behalf), access to logs. All these tasks vary in their IO load. In other words, the server responsible for complex business logic must, at the same time, be able to push large data streams through itself effectively. And what is most interesting, such an operation can be delegated to the same nginx by exactly the same scheme (except that the data key should be added to the request).

However, if we return to our system, we get a similar scheme:

[diagram]

As you can see, the system has become radically more complicated. It is no longer just a mini-process that stores files locally: now it requires non-trivial maintenance, API versioning, and so on. Therefore, once all the diagrams are drawn, it is worth evaluating in detail whether the extensibility is worth the cost. However, if you want the system to be able to grow (including serving far more users), you will have to go for solutions like these. In return, the architecture is ready for increased load (almost every component can be cloned for horizontal scaling), and the system can be updated without stopping it (some operations will simply slow down slightly).

As I said at the very beginning, a number of Internet services have recently started receiving increased load, and some of them simply stopped working properly. In effect, the systems failed at the exact moment when the business should have been making money. Instead of delaying delivery, instead of suggesting to customers "plan for delivery in the coming months", the system simply said "go to the competitors". This is the price of poor performance: the losses occur exactly when the profit would have been the highest.

Conclusion

All these approaches were known before. VK, for instance, has long been using the idea of Static Content Hosting to serve pictures. Plenty of online games use the Sharding scheme to split players by region or to split game locations (if the world itself is a single one). The Event Sourcing approach is actively used in e-mail. Most trading applications, where data is constantly streaming in, are actually built on the CQRS approach in order to be able to filter the incoming data. Well, and horizontal scaling has been used in many services for a long time.

However, most importantly, all these patterns have become very easy to apply in modern applications (where they are appropriate, of course). Clouds offer Sharding and horizontal scaling out of the box, which is much easier than ordering dedicated servers in different data centers yourself. CQRS has become much simpler, if only because of the development of libraries such as Rx: about ten years ago, few websites could support it. Event Sourcing is also incredibly easy to set up thanks to ready-made containers with Apache Kafka; ten years ago this would have been an innovation, now it is commonplace. The same goes for Static Content Hosting: thanks to more convenient technologies (including detailed documentation and a large base of answered questions), this approach has become even simpler.

As a result, implementing a number of rather complex architectural patterns has become much easier, which means it is worth taking a closer look at them in advance. If in a ten-year-old application one of the solutions above was rejected because of the high cost of implementation and operation, now, in a new application or after refactoring, you can build a service that is architecturally both extensible (in terms of performance) and ready for new customer requests (for example, keeping personal data within a region).

And most importantly: please don't use these approaches if you have a simple application. Yes, they are elegant and interesting, but for a site with a peak of 100 visitors you can often get by with a classic monolith (at least on the outside; inside, everything can still be split into modules and so on).

Source: habr.com
