Tarantool Cartridge: Lua backend sharding in three lines

At Mail.ru Group we have Tarantool: an application server in Lua that is also a database (or vice versa?). It is fast and great, but the capabilities of a single server are still not unlimited. Vertical scaling is not a panacea either, so Tarantool ships tools for horizontal scaling: the vshard module [1]. It lets you shard data across multiple servers, but you have to tinker with it to set it up and wire in the business logic.

Good news: we have already collected our share of bumps and bruises (see, for example, [2] and [3]) and built a framework that significantly simplifies solving this problem.

Tarantool Cartridge is a new framework for developing complex distributed systems. It allows you to focus on writing business logic instead of solving infrastructure problems. Under the cut, I will tell you how this framework works and how to write distributed services with it.

And what, in fact, is the problem?

We already have Tarantool, we already have vshard - what more could you want?

First, it is a matter of convenience. vshard is configured via Lua tables. For a distributed system of multiple Tarantool processes to work correctly, the configuration must be the same everywhere. Nobody wants to do that by hand, so all kinds of scripts, Ansible playbooks and deployment systems come into play.

Cartridge manages the vshard configuration itself, based on its own distributed configuration. Essentially, this is a simple YAML file, a copy of which is stored on every Tarantool instance. The simplification is that the framework itself monitors this configuration and ensures that it is the same everywhere.

Second, it is again a matter of convenience. The vshard configuration has nothing to do with developing business logic and only distracts the programmer from the actual work. When we discuss the architecture of a project, we usually talk about individual components and their interaction. It is too early to think about rolling the cluster out across three data centers.

We solved these problems time after time, and at some point we developed an approach that simplifies working with the application throughout its entire life cycle: creation, development, testing, CI/CD, maintenance.

Cartridge introduces the concept of a role for each Tarantool process. Roles are what allow the developer to focus on writing code. All the roles available in the project can be run on a single Tarantool instance, and that is enough for tests.

Main features of Tarantool Cartridge:

  • automated cluster orchestration;
  • expanding application functionality with new roles;
  • application template for development and deployment;
  • built-in automatic sharding;
  • integration with the Luatest test framework;
  • cluster management using WebUI and API;
  • packaging and deployment tools.

Hello World!

I can't wait to show the framework itself, so we'll leave the story about the architecture for later, and start simple. Assuming that Tarantool itself is already installed, the only thing left to do is

$ tarantoolctl rocks install cartridge-cli
$ export PATH=$PWD/.rocks/bin/:$PATH

These two commands install the command-line utility and let you create your first application from a template:

$ cartridge create --name myapp

And this is what we get:

myapp/
├── .git/
├── .gitignore
├── app/roles/custom.lua
├── deps.sh
├── init.lua
├── myapp-scm-1.rockspec
├── test
│   ├── helper
│   │   ├── integration.lua
│   │   └── unit.lua
│   ├── helper.lua
│   ├── integration/api_test.lua
│   └── unit/sample_test.lua
└── tmp/

This is a git repository with a "Hello, World!" application. Let's try to run it right away, after installing the dependencies (including the framework itself):

$ tarantoolctl rocks make
$ ./init.lua --http-port 8080

So we have one node of the future sharded application up and running. A curious reader can already open the web interface, click together a cluster of one node with the mouse and enjoy the result, but it is too early to celebrate. So far the application cannot do anything useful, so I will talk about deployment later; now it is time to write code.

Application Development

Just imagine, we are designing a project that should receive data, save it and build a report once a day.

We start drawing a diagram and put three components on it: gateway, storage and scheduler, and keep working on the architecture. Since we use vshard as the storage, we add vshard-router and vshard-storage to the schema. Neither the gateway nor the scheduler will access the storage directly; that is what the router is for, that is what it was created for.

This diagram still does not quite accurately reflect what we will create in the project, because the components look abstract. We still need to see how all this maps onto real Tarantool, so let's group our components by process.

It makes little sense to keep vshard-router and the gateway on separate instances. Why go over the network yet again if that is already the router's responsibility? They should run within the same process: both the gateway and vshard.router.cfg are initialized in one process and interact locally.
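
To make this concrete, here is a minimal sketch of what such a local interaction in the gateway could look like. It relies on the vshard router API; the storage_put function and the crc32-based bucket derivation are assumptions made up for this example.

-- Hypothetical gateway handler that routes a write through the
-- vshard router running in the same process.
local digest = require('digest')
local vshard = require('vshard')

local function put_record(key, value)
    -- Derive the bucket id from the key: it decides which
    -- replica set owns the record.
    local bucket_id = digest.crc32(key) % vshard.router.bucket_count() + 1
    -- 'storage_put' is an assumed function exposed on the storage side.
    return vshard.router.callrw(bucket_id, 'storage_put',
        {bucket_id, key, value}, {timeout = 1})
end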

At the design stage it was convenient to work with three components, but as a developer, while writing the code, I do not want to think about running three instances of Tarantool. I need to run tests and check that I spelled gateway correctly. Or maybe I want to demonstrate a feature to my colleagues. Why should I bother with deploying three instances? That is how the concept of roles was born. A role is a regular Lua module whose life cycle is managed by Cartridge. In this example there are four of them: gateway, router, storage and scheduler. Another project may have more. All roles can be run in one process, and that will be enough.
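
In the application template this boils down to listing all the known roles when the framework is configured in init.lua. A rough sketch, with the custom role names taken from our example rather than from the generated template:

#!/usr/bin/env tarantool

local cartridge = require('cartridge')

-- Register every role the project knows about. Which of them actually
-- run on a given instance is decided by the clusterwide configuration,
-- not by this file.
local ok, err = cartridge.cfg({
    roles = {
        'cartridge.roles.vshard-router',
        'cartridge.roles.vshard-storage',
        'app.roles.gateway',
        'app.roles.storage',
        'app.roles.scheduler',
    },
})
assert(ok, tostring(err))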

And when it comes to deploying to staging or production, we will assign each Tarantool process its own set of roles depending on the hardware capabilities.

Topology management

Information about which roles run where must be stored somewhere, and this "somewhere" is the distributed configuration I already mentioned above. The most important thing in it is the cluster topology; here, for example, are three replication groups of five Tarantool processes.

We do not want to lose data, so we handle information about running processes with care. Cartridge keeps track of the configuration using a two-phase commit. As soon as we want to update the configuration, it first checks that all instances are available and ready to accept the new configuration. Only after that does the second phase apply the config. Thus, even if one instance is temporarily unavailable, nothing bad will happen: the configuration simply will not be applied, and you will see the error in advance.

The topology section also contains such an important parameter as the leader of each replication group. This is usually the instance being written to. The rest are most often read-only, although there can be exceptions. Sometimes brave developers are not afraid of conflicts and write data to several replicas in parallel, but there are operations that, no matter what, must not be performed twice. That is what the leader flag is for.

Life of roles

For abstract roles to exist in such an architecture, the framework must somehow manage them. Naturally, this management happens without restarting the Tarantool process. There are four callbacks for managing roles. Cartridge calls them itself, depending on what is written in the distributed configuration, thereby applying the configuration to specific roles:

function init()
function validate_config()
function apply_config()
function stop()

Every role has an init function. It is called once: either when the role is enabled or when Tarantool is restarted. It is a convenient place, for example, to create spaces with box.schema.space.create, or for the scheduler to start some background fiber that will do its work at certain intervals.
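
As a sketch, a storage-like role's init might look like the following. The space name and format are made up for illustration; only the general shape of the module follows the Cartridge role interface.

-- app/roles/storage.lua (illustrative fragment)
local function init(opts)
    -- opts.is_master is true only on the replica set leader,
    -- so the schema is created exactly once per replica set.
    if opts.is_master then
        local space = box.schema.space.create('records', {if_not_exists = true})
        space:format({
            {name = 'key', type = 'string'},
            {name = 'bucket_id', type = 'unsigned'},
            {name = 'value', type = 'any'},
        })
        space:create_index('primary', {
            parts = {{field = 'key', type = 'string'}},
            if_not_exists = true,
        })
    end
    return true
end

return {
    role_name = 'app.roles.storage',
    init = init,
    dependencies = {'cartridge.roles.vshard-storage'},
}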

A single init function may not be enough. Cartridge allows roles to take advantage of the distributed configuration it uses to store the topology. We can declare a new section in that same configuration and store a fragment of the business configuration in it. In my example this could be a data schema or schedule settings for the scheduler role.

The cluster calls validate_config and apply_config every time the distributed configuration changes. When a configuration is applied by the two-phase commit, the cluster checks that every role is ready to accept the new configuration and, if necessary, reports an error to the user. Once everyone has agreed that the configuration is fine, apply_config is called.
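
For the scheduler from our example this could look roughly like the fragment below. The scheduler config section and its interval field are assumptions made up for illustration; such a section could later be changed clusterwide, for example with cartridge.config_patch_clusterwide().

-- Fragment of an illustrative scheduler role reading its own config section.
local report_interval = 60  -- default, in seconds

local function validate_config(conf_new, conf_old)
    local cfg = conf_new.scheduler or {}
    if cfg.interval ~= nil and type(cfg.interval) ~= 'number' then
        error('scheduler.interval must be a number')
    end
    return true
end

local function apply_config(conf, opts)
    -- Runs on every instance after all of them have accepted the config.
    local cfg = conf.scheduler or {}
    report_interval = cfg.interval or 60
    return true
end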

Roles also have a stop method, which is needed to clean up whatever the role has set up. If we decide that the scheduler is no longer needed on this server, it can stop the fibers it started in init.
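
Continuing the same illustrative scheduler: init starts a background fiber, and stop cancels it when the role is disabled on this instance (build_report is a made-up placeholder for the business logic):

local fiber = require('fiber')
local log = require('log')

local report_fiber

local function build_report()
    log.info('building the daily report')  -- placeholder
end

local function init(opts)
    report_fiber = fiber.create(function()
        while true do
            fiber.sleep(report_interval)  -- set in apply_config above
            build_report()
        end
    end)
    return true
end

local function stop()
    if report_fiber ~= nil and report_fiber:status() ~= 'dead' then
        report_fiber:cancel()
    end
    return true
end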

Roles can interact with each other. We are used to writing plain function calls in Lua, but it may turn out that the current process does not run the role we need. To simplify calls over the network, we use the auxiliary rpc (remote procedure call) module, which is built on top of net.box, the standard connector built into Tarantool. This can be useful if, for example, your gateway wants to ask the scheduler directly to do the job right now instead of waiting a day.
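
A hedged sketch of such a call from the gateway; the role and function names come from our running example, not from a real project:

local cartridge = require('cartridge')

-- Ask whichever instance runs the scheduler role to build the report
-- right now, preferring the replica set leader.
local result, err = cartridge.rpc_call(
    'app.roles.scheduler',   -- role name (illustrative)
    'build_report',          -- function exported by that role (illustrative)
    {},                      -- arguments
    {leader_only = true}
)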

Another important point is ensuring fault tolerance. Cartridge uses the SWIM protocol [4] to monitor health. In short, the processes exchange "rumors" with each other over UDP: each process tells its neighbors the latest news, and they respond. If the answer suddenly does not come, Tarantool begins to suspect that something is wrong, and after a while it declares the silent node dead and starts spreading this news to everyone around.

Based on this protocol, Cartridge organizes automatic failure handling. Each process monitors its environment, and if the leader suddenly stops responding, a replica can take over its role, and Cartridge reconfigures the running roles accordingly.

You need to be careful here, because frequent switching back and forth can lead to data conflicts during replication. Turning automatic failover on at random is certainly not worth it. You need to clearly understand what is happening and be sure that replication will not break after the leader recovers and the crown is returned to it.

From all of the above, you might get the feeling that roles are like microservices. In a sense they are, but only as modules inside Tarantool processes. There are, however, a number of fundamental differences. First, all project roles must live in the same codebase, and all Tarantool processes must run from that same codebase, so that there are no surprises like trying to initialize the scheduler only to find that it simply does not exist. You should also not allow differences in code versions, because the behavior of the system in such a situation is very hard to predict and debug.

Second, unlike with Docker, we cannot just take an "image" of a role, move it to another machine and run it there. Our roles are not as isolated as Docker containers. Also, we cannot run two identical roles on the same instance: a role is either there or it is not; in a sense, it is a singleton. Third, the roles must be the same within the entire replication group, because otherwise it would be absurd: the data is the same, but the configuration differs.

Deployment tools

I promised to show how Cartridge helps to deploy applications. To make life easier, the framework builds RPM packages:

$ cartridge pack rpm myapp   # packs ./myapp-0.1.0-1.rpm for us
$ sudo yum install ./myapp-0.1.0-1.rpm

The installed package contains almost everything you need: both the application and its Lua dependencies. Tarantool also arrives on the server as a dependency of the RPM package, and our service is ready to be launched. This is done through systemd, but first you need to write a little configuration: at a minimum, the URI of each process. Three are enough for our example.

$ sudo tee /etc/tarantool/conf.d/demo.yml <<CONFIG
myapp.router: {"advertise_uri": "localhost:3301", "http_port": 8080}
myapp.storage_A: {"advertise_uri": "localhost:3302", "http_enabled": false}
myapp.storage_B: {"advertise_uri": "localhost:3303", "http_enabled": false}
CONFIG

There is an interesting nuance here. Instead of specifying only the binary protocol port, we specify the entire public address of the process, including the hostname. This is necessary so that the cluster nodes know how to connect to each other. It is a bad idea to use 0.0.0.0 as the advertise_uri: it must be an address reachable by other nodes, not the bind address. Nothing will work without it, so Cartridge simply will not let you start a node with an incorrect advertise_uri.

Now that the configuration is ready, you can start the processes. Since a regular systemd unit does not allow more than one process to be started, applications on Cartridge are installed as so-called instantiated units, which work like this:

$ sudo systemctl start myapp@router
$ sudo systemctl start myapp@storage_A
$ sudo systemctl start myapp@storage_B

In the configuration, we specified the HTTP port on which Cartridge serves the web interface: 8080. Let's open it and take a look.

We see that the processes are running but not yet configured. Cartridge does not yet know who should replicate with whom and cannot make this decision on its own, so it is waiting for our actions. We do not have much choice: the life of a new cluster begins with configuring the first node. Then we add the remaining nodes to the cluster, assign roles to them, and at that point the deployment can be considered successfully completed.

Pour a glass of your favorite drink and relax after a long working week. The application is ready for operation.

Results

So what about the results? Try it, use it, leave feedback, and file issues on GitHub ([5], [6]).

References

[1] Tarantool » 2.2 » Reference » Rocks reference » Module vshard

[2] How we implemented the core of Alfa-Bank's investment business based on Tarantool

[3] New generation billing architecture: transformation with the transition to Tarantool

[4] SWIM - cluster building protocol

[5] GitHub - tarantool/cartridge-cli

[6] GitHub - tarantool/cartridge

Source: habr.com
