How we made the core of Alfa-Bank's investment business based on Tarantool


The investment business is one of the most difficult areas in the banking world, because it involves not only loans, credits, and deposits, but also securities, currencies, commodities, derivatives, and all sorts of complexities in the form of structured products.

Recently, we have seen an increase in the financial literacy of the population. More and more people are trading in the securities markets. Individual investment accounts appeared not so long ago; they allow you to trade in the securities markets and either receive tax deductions or pay no taxes. All the clients who come to us want to manage their portfolios and see real-time reporting. Moreover, these portfolios are most often multi-product, that is, people are clients of several business areas at once.

In addition, the needs of regulators, both Russian and foreign, are also growing.

To meet current needs and lay the foundation for future upgrades, we have developed a core for the investment business based on Tarantool.

Some statistics. The investment business of Alfa-Bank provides brokerage services that give individuals and legal entities the opportunity to trade on various securities markets, depository services for the safekeeping of securities, trust management services for individuals with private and large capital, and securities issuance services for other companies. The investment business of Alfa-Bank receives more than 3 quotes per second from various trading platforms. More than 300 transactions are concluded on the markets during a working day on behalf of the bank or its clients. Up to 5 order executions per second occur on external and internal platforms. At the same time, all clients, both internal and external, want to see their positions in real time.

Background

Since the beginning of the 2000s, our lines of investment business developed independently: exchange trading, brokerage services, currency trading, over-the-counter trading in securities and various derivatives. As a result, we fell into the trap of functional silos. What does that mean? Each line of business has its own systems that duplicate each other's functions. Each system has its own data model, although they all operate with the same concepts: deals, instruments, counterparties, quotes, and so on. And as each system evolved independently, a diverse zoo of technologies emerged.

In addition, the systems' code base is rather outdated, since some products originated in the mid-1990s. In some areas this slowed down the development process and caused performance problems.

Requirements for a new solution

Business has realized that technological transformation is vital for further development. We were given the following tasks:

  1. Collect all business data in a single fast storage and in a single data model.
  2. We must not lose or change this information.
  3. It is necessary to version the data, because at any time the regulator can ask for statistics for past years.
  4. We must not just bring in some new, fashionable DBMS; we must create a platform for solving business problems.

In addition, our architects set their conditions:

  1. The new solution must be enterprise-class, that is, it must already have been proven in some large companies.
  2. The solution operation mode must be mission critical. This means that we must be present in several data centers at the same time and calmly survive the shutdown of one data center.
  3. The system must be horizontally scalable. The fact is that all of our current systems are only vertically scalable, and we are already hitting the ceiling because hardware performance growth has slowed. Therefore, the moment has come when we need a horizontally scalable system in order to survive.
  4. Among other things, we were told that the solution should be cheap.

We went the standard route: we formulated the requirements and contacted the purchasing department. From there we received a list of companies that were, in general, ready to do this for us. We described the problem to each of them, and from six of them we received assessments of their solutions.

At the bank, we don't take anyone's word for it; we like to test everything ourselves. Therefore, a prerequisite for our tender was passing load tests. We formulated the load test tasks, and three of the six companies agreed to implement a prototype solution based on in-memory technologies at their own expense so that we could test it.

I won’t tell you how we tested everything and how long it took; I’ll just summarize: the best performance in the load tests was shown by a Tarantool-based solution prototype from the Mail.ru Group development team. We signed a contract and started development. Four people came from Mail.ru Group, and Alfa-Bank contributed three developers, three system analysts, a solution architect, a product owner, and a Scrum master.

Next, I’ll talk about how our system grew, how it evolved, what we did and why.

Development

First of all, we asked ourselves how to get data from our current systems. We decided that HTTP is fine for us, because all current systems communicate with each other by sending XML or JSON over HTTP.

We use the HTTP server built into Tarantool, because we don't need to terminate SSL sessions, and its performance is enough for us.

As I already said, all our systems live in different data models, and at the input we need to bring each object to the model that we describe ourselves. A language was needed to transform the data. We chose imperative Lua. We run all data transformation code in a sandbox: a safe environment that the running code cannot escape. To do this, we simply loadstring the desired code, creating an environment with functions that cannot block or break anything.
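The same pattern can be sketched in Python (the real system uses Lua's loadstring; the helper names here are made up): user transformation code runs with only a whitelisted environment visible, so it cannot reach files, the network, or global state.

```python
def run_in_sandbox(source: str, record: dict) -> dict:
    """Compile user transformation code and run it in a restricted
    environment: only whitelisted helpers are visible to the code."""
    safe_env = {
        "__builtins__": {},   # hide all built-ins from the user code
        "upper": str.upper,   # whitelisted helper functions
        "strip": str.strip,
    }
    # The user code is expected to define a function `transform(record)`.
    exec(compile(source, "<user-code>", "exec"), safe_env)
    return safe_env["transform"](record)

user_code = """
def transform(record):
    return {"ticker": upper(strip(record["ticker"])), "price": record["price"]}
"""

print(run_in_sandbox(user_code, {"ticker": " aflt ", "price": 72.5}))
# → {'ticker': 'AFLT', 'price': 72.5}
```

Because the environment contains no built-ins, the transformation code can only call what we explicitly hand it.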

After the transformation, the data must be checked for compliance with the model that we are creating. We discussed for a long time what the model should be and what language to use to describe it. We settled on Apache Avro because the language is simple and Tarantool supports it. New versions of the model and of user code can be deployed to production several times a day, under load or not, at any time of day, so we can adapt to changes very quickly.
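The real validation is against Apache Avro schemas; the following Python sketch, with a hand-rolled checker and made-up field names, only illustrates the idea of checking incoming data against a declared model:

```python
# Avro-style record schema (illustrative field names, not our real model).
DEAL_SCHEMA = {
    "type": "record",
    "name": "Deal",
    "fields": [
        {"name": "deal_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string"},
    ],
}

# Map Avro primitive type names to Python types for the check.
AVRO_TO_PY = {"string": str, "double": float, "long": int}

def validate(record: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the
    record conforms to the schema."""
    errors = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], AVRO_TO_PY[ftype]):
            errors.append(f"wrong type for {name}: expected {ftype}")
    return errors

print(validate({"deal_id": "D-1", "amount": 100.0, "currency": "RUB"}, DEAL_SCHEMA))
# → []
print(validate({"deal_id": "D-2", "amount": "100"}, DEAL_SCHEMA))
# → ['wrong type for amount: expected double', 'missing field: currency']
```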

After verification, the data must be saved. We do this with vshard (we have geo-distributed replicas of shards).

At the same time, the specifics are such that most of the systems sending us data do not care whether we received it or not. Therefore, from the very beginning we implemented a repair queue. What is it? If for some reason an object fails data transformation or validation, we still acknowledge receipt, but we also save the object to the repair queue. The queue is consistent and lives in the main storage alongside the business data. We immediately wrote an administrator interface for it, plus various metrics and alerts. As a result, we do not lose data. Even if something changes in a source, even if the data model changes, we will detect it immediately and can adapt.
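The ingestion logic can be sketched in Python (an illustrative sketch; in the real system the repair queue lives in the main storage, next to the business data):

```python
from collections import deque

storage = []            # stands in for the main business-data storage
repair_queue = deque()  # holds objects that failed transform/validation

def transform(raw: dict) -> dict:
    # hypothetical transformation: expects a "ticker" field
    return {"ticker": raw["ticker"].upper()}

def ingest(raw: dict) -> str:
    """Always acknowledge receipt; objects that fail transformation or
    validation are saved to the repair queue instead of being lost."""
    try:
        storage.append(transform(raw))
    except Exception as err:
        repair_queue.append({"object": raw, "error": str(err)})
    return "ACK"  # the sender gets a confirmation either way

ingest({"ticker": "gazp"})   # transforms fine, goes to storage
ingest({"symbol": "sber"})   # missing "ticker": goes to the repair queue
print(len(storage), len(repair_queue))  # → 1 1
```

Once the source or the model is fixed, objects from the repair queue can be replayed through the same `ingest` path.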

Next, we needed to learn how to retrieve the saved data. We carefully analyzed our systems and saw that on the classic Java-plus-Oracle stack there is always some kind of ORM converting data from relational form to object form. So why not give systems objects as a graph right away? We gladly adopted GraphQL, which satisfied all our needs: it lets consumers receive data as graphs and pull out only what they need right now. You can even version the API with a lot of flexibility.
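A consumer can then ask for exactly the slice of the graph it needs. A sample query might look like this (the field names are hypothetical, not our actual model):

```graphql
query {
  deal(id: "D-1") {
    amount
    currency
    counterparty {
      name
    }
    instrument {
      ticker
      lastQuote {
        price
      }
    }
  }
}
```

The response mirrors the shape of the query, so no ORM layer is needed on the consumer side.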

Almost immediately, we realized that the extracted data alone was not enough. We made functions that can be bound to objects in the model, in effect calculated fields. That is, we bind a function to a field that, for example, calculates the average quote price. And the external consumer requesting the data does not even know it is a calculated field.
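A minimal Python sketch of the idea (the real implementation binds Lua functions to model fields; `avg_price` here is a made-up example):

```python
class Instrument:
    """Model object whose avg_price field is computed on access:
    a consumer reading the field cannot tell it is calculated."""

    def __init__(self, ticker: str, quotes: list):
        self.ticker = ticker
        self._quotes = quotes

    @property
    def avg_price(self) -> float:
        # the bound function: averages the stored quotes on every read
        return sum(self._quotes) / len(self._quotes)

inst = Instrument("AFLT", [70.0, 72.0, 74.0])
print(inst.avg_price)  # → 72.0
```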

We also implemented an authentication system.

Then we noticed that several roles were crystallizing in our solution. A role is a kind of aggregator of functions. As a rule, roles have different hardware usage profiles:

  • T-Connect: handles incoming connections; CPU-bound, consumes little memory, stores no state.
  • IB-Core: transforms the data it receives via the Tarantool protocol, that is, it operates on tuples. It also stores no state and is scalable.
  • Storage: only stores data and contains no logic. This role implements the simplest interfaces. Scalable thanks to vshard.

That is, roles let us decouple the different parts of the cluster from each other so that they can be scaled independently.

So, we have created asynchronous recording of the transactional data flow, plus a repair queue with an admin interface. The write is asynchronous from a business point of view: if we are guaranteed to have written the data somewhere on our side, we confirm it. If we do not confirm, something went wrong and the data must be resent. That is what the asynchronous write is about.

Testing

From the very beginning of the project, we decided to try to adopt test-driven development. We write unit tests in Lua using the tarantool/tap framework, and integration tests in Python using the pytest framework. Both developers and analysts are involved in writing integration tests.

How do we use test-driven development?

If we want some new feature, we try to write a test for it first. When we find a bug, we first write a test for it, and only then fix it. At first it is hard to work this way; there is misunderstanding among employees, even sabotage: “Let’s quickly fix it now, do something new, and cover it with tests later.” Only this “later” almost never comes.

Therefore, you must force yourself to write tests first, and ask others to do the same. Believe me, test-driven development pays off even in the short term. You will feel that life has become easier. About 99% of our code is now covered by tests. It sounds like a lot, but it causes no problems: the tests run on every commit.
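In practice the workflow looks roughly like this (an illustrative pytest-style example, not our real code): the failing test is written first, the implementation second.

```python
# Step 1: write the test for the desired behaviour first.
# Run it, watch it fail (position_value does not exist yet).
def test_position_value():
    assert position_value(quantity=10, price=72.5) == 725.0

# Step 2: only then write the implementation that makes it pass.
def position_value(quantity: float, price: float) -> float:
    return quantity * price

test_position_value()  # with pytest this function is collected automatically
```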

However, we love load testing most of all; we consider it the most important kind and conduct it regularly.

I'll tell you a little story about how we conducted the first stage of load testing of one of the first versions. We ran the system on a developer's laptop, turned on the load, and got 4 thousand transactions per second. A good result for a laptop. Then we deployed it on a virtual load stand of four servers, weaker than those in production, in the minimum configuration. We started the test and got a worse result than on the laptop in a single thread. Shocking.

We got very upset. We looked at the server load, and it turned out the servers were idle.

We called the developers, and they explained to us, people coming from the Java world, that Tarantool is single-threaded: under load it can effectively use only one processor core. Then we deployed the maximum possible number of Tarantool instances on each server, turned on the load, and got 14.5 thousand transactions per second.

Let me explain again. Because of the division into roles that use resources differently, the roles responsible for handling connections and transforming data loaded only the processor, strictly in proportion to the load.

On those roles, memory was used only for handling incoming connections and temporary objects.

On the storage servers, on the contrary, processor load grew, but much more slowly than on the servers handling connections.

And memory consumption grew in direct proportion to the amount of data loaded.


Services

To develop our new product as an application platform, we made a component for deploying services and libraries on it.

Services are not just small pieces of code operating on a few fields. They can be quite large and complex structures that are part of the cluster, check reference data, run business logic, and return answers. We also export the service schema to GraphQL, so the consumer gets a universal data access point with introspection across the entire model. It is very convenient.

Since services contain many more functions, we decided to move frequently used code into libraries. We add them to the safe environment, having first checked that they do not break anything. Now we can provide functions with additional environments in the form of libraries.

We wanted a platform not only for storage, but also for computing. And since we already had plenty of replicas and shards, we implemented a kind of distributed computing and called it map reduce, because it turned out similar to the original MapReduce.
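The scheme can be sketched in Python (an illustrative coordinator-side sketch, not the actual Tarantool implementation): the map step runs on every shard in parallel, and the reduce step folds the partial results on the coordinator.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical shards: each holds its own slice of the deals.
shards = [
    [{"amount": 100.0}, {"amount": 250.0}],
    [{"amount": 50.0}],
    [{"amount": 300.0}, {"amount": 75.0}],
]

def map_on_shard(shard):
    """The map step runs locally on each shard (on its replica set)."""
    return sum(deal["amount"] for deal in shard)

def map_reduce(map_fn, reduce_fn, initial):
    """Fan the map step out to all shards in parallel, then fold the
    partial results on the coordinator."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_fn, shards))
    return reduce(reduce_fn, partials, initial)

total = map_reduce(map_on_shard, lambda acc, x: acc + x, 0.0)
print(total)  # → 775.0
```

Only the small partial results travel over the network, not the raw data, which is the point of running the map step next to the storage.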

Old systems

Not all of our older systems can call us over HTTP and use GraphQL, even though they support HTTP as a protocol. Therefore, we created a mechanism that replicates data to these systems.

If something changes on our side, special triggers fire in the Storage role, and a message with the changes enters a processing queue. A separate replicator role sends it to the external system. This role stores no state.
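A rough Python sketch of this pipeline (trigger and role names are illustrative, not the Tarantool API):

```python
from collections import deque

change_queue = deque()   # processing queue fed by storage triggers
replicated = []          # stands in for the external system

def on_replace(old_tuple, new_tuple):
    """Trigger fired by the storage role on every change: it only
    enqueues a change message, it never talks to external systems."""
    change_queue.append({"old": old_tuple, "new": new_tuple})

def replicator_step():
    """The stateless replicator role drains the queue and pushes the
    changes out to the external system."""
    while change_queue:
        replicated.append(change_queue.popleft())

on_replace(None, {"deal_id": "D-1", "amount": 100.0})
replicator_step()
print(len(replicated))  # → 1
```

Decoupling the trigger from the delivery means a slow or unavailable external system only grows the queue, without blocking writes to storage.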

New improvements

As you remember, from a business point of view we implemented an asynchronous write. But then we realized it would not be enough, because there is a class of systems that need an immediate answer about the status of an operation. So we extended our GraphQL with mutations. They fit organically into the existing paradigm of working with data, giving us a single point of both reading and writing for this other class of systems.

We also realized that services alone would not be enough, because there are quite heavy reports that need to be built once a day, a week, or a month. This can take a long time, and a report can even block Tarantool's event loop. Therefore, we made two separate roles: scheduler and runner. Runners store no state; they execute heavy tasks that we cannot compute on the fly. The scheduler role watches the launch schedule, which is described in the configuration. The tasks themselves are stored in the same place as the business data. When the right time comes, the scheduler takes a task and hands it to a runner, which computes it and saves the result.
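A minimal Python sketch of the scheduler/runner split (the schedule format and task names are invented for illustration):

```python
import datetime

# Schedule described in configuration: report name -> launch hour.
SCHEDULE = {"daily_positions": 0, "weekly_pnl": 6}

completed = {}   # stands in for results saved next to the business data

def runner(task_name: str):
    """Runners are stateless: compute a heavy task and save the result."""
    completed[task_name] = f"report body for {task_name}"

def scheduler_tick(now: datetime.datetime):
    """The scheduler only watches the clock and hands due tasks to runners."""
    for task, hour in SCHEDULE.items():
        if now.hour == hour and task not in completed:
            runner(task)

scheduler_tick(datetime.datetime(2020, 1, 15, 0, 0))
print(sorted(completed))  # → ['daily_positions']
```

Because runners hold no state, any free runner can pick up any task, and heavy reports never block the roles serving live traffic.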

Not all tasks need to run on a schedule: some reports must be built on demand. As soon as such a request arrives, a task is formed in the sandbox and sent to a runner for execution. After a while, the user asynchronously receives a response that everything has been calculated and the report is ready.

Initially, we followed the paradigm of keeping all data, versioning it, and never deleting it. But in real life you still have to delete something from time to time, mostly raw or intermediate information. Based on expirationd, we made a mechanism for cleaning obsolete data out of storage.
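A Python sketch of one cleaning pass (expirationd itself is a Tarantool Lua module; this only illustrates its "walk the space, evict by predicate" idea, with made-up field names):

```python
import time

def expire_step(space, is_expired, now=None):
    """One pass of an expirationd-style cleaner: walk the space and
    drop tuples that the user-supplied predicate marks as obsolete."""
    now = now if now is not None else time.time()
    return [t for t in space if not is_expired(t, now)]

raw_quotes = [
    {"ticker": "AFLT", "ts": 100.0},
    {"ticker": "SBER", "ts": 900.0},
]

# Keep only records younger than 300 seconds at "now" = 1000.
fresh = expire_step(raw_quotes, lambda t, now: now - t["ts"] > 300, now=1000.0)
print([t["ticker"] for t in fresh])  # → ['SBER']
```

The expiry predicate is just a function, so each space can define its own notion of "obsolete" (age, version count, status, and so on).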

We also understand that sooner or later there will not be enough memory to hold all the data, yet the data must still be kept. For this, we will soon add disk storage.


Conclusion

We started with the task of loading data into a single model and spent three months on it. We had six data provider systems. The entire code for transforming into the single model is about 30 thousand lines of Lua. And most of the work is still ahead. Sometimes the neighboring teams lack motivation, and many circumstances complicate the work. If you ever face such a task, multiply the time that seems reasonable for its implementation by three, or even four.

Also remember that existing problems in business processes cannot be solved with a new DBMS, even a very fast one. What do I mean? At the start of our project, we gave our customers the impression that we would bring in a new fast database and life would be great: processes would speed up, everything would be fine. In fact, technology does not solve the problems of business processes, because business processes are people. And you need to work with people, not technology.

Test-driven development can be painful and time-consuming in the early stages. But its positive effect is noticeable even in the short term, when regression testing no longer requires any extra effort.

It is extremely important to conduct load testing at all stages of development. The sooner you notice a flaw in the architecture, the easier it is to fix, and that will save you a lot of time in the future.

There is nothing wrong with Lua. Anyone can learn to write in it: a Java developer, a JavaScript developer, a Python developer, front-end or back-end. Even our analysts write in it.

When we say that we have no SQL, it terrifies people. “How do you get data without SQL? Is that even possible?” Certainly. An OLTP-class system does not need SQL. There is an alternative in the form of a language that returns a document-oriented view right away, for example GraphQL. And there is distributed computing as another alternative.

If you understand that you will need to scale, design your Tarantool-based solution from the very start so that it can run in parallel on dozens of Tarantool instances. If you don't, it will be difficult and painful later, because Tarantool can only effectively use one processor core.

Source: habr.com
