In Tarantool, you can combine a super-fast database with an application that works with its data. Here's how easy it is to do.

Five years ago I tried to work with Tarantool, but it didn't work out for me back then. Recently, though, I held a webinar about Hadoop and how MapReduce works, and someone asked: "Why not use Tarantool for this task?"

Out of curiosity, I decided to come back to it and test the latest version, and this time I really liked the project. Below I will show how to write a simple application in Tarantool, put it under load and check its performance, so you can see how easy and cool it all is.

What is Tarantool

Tarantool positions itself as an ultra-fast database. You can put any data you want into it. You can also replicate it, shard it (that is, split a huge amount of data across several servers and combine the results from them) and build fault-tolerant master-master setups.

Secondly, it is an application server. You can write your applications on it and work with the data, for example, delete old entries in the background according to certain rules. You can write an HTTP server directly in Tarantool that works with the data: returns the number of records, writes new data and consolidates it all on the master.

I read an article about how some engineers built a message queue in 300 lines of code that is blazingly fast: its minimum throughput is 20,000 messages per second. You really have room to go wild here and write a very large application, and it will not be stored procedures, as in PostgreSQL.

In this article I will try to describe a server roughly like that, only a simple one.

Installation

For the test, I started three standard virtual machines, each with a 20 GB hard drive, Ubuntu 18.04, 2 virtual CPUs and 4 GB of memory.

We install Tarantool: either run the bash script or add the repository and run apt-get install tarantool. The script is run like this: curl -L https://tarantool.io/installer.sh | VER=2.4 sudo -E bash. After installation we have the following command and directories:

tarantoolctl: the main command for managing Tarantool instances.
/etc/tarantool: the entire configuration lives here.
/var/log/tarantool: the logs are here.
/var/lib/tarantool: the data lives here, split by instance.

There are instances.available and instances.enabled folders. They contain what will be launched: the instance configuration file with Lua code, which describes which ports the instance listens on, how much memory is available to it, the Vinyl engine settings, and the code that runs when the server starts: sharding, queues, deletion of obsolete data, and so on.

Instances work like in PostgreSQL. For example, you may want to run several copies of a database on different ports. Several database instances are then launched on one server, each listening on its own port. They can have completely different settings: one instance implements one kind of logic, another implements a different one.

Instance management

We have the tarantoolctl command, which allows us to manage Tarantool instances. For example, tarantoolctl check example checks the configuration file and reports that the file is OK if there are no syntax errors.

You can see the status of the instance with tarantoolctl status example. In the same way, you can do start, stop and restart.

Once an instance is running, there are two ways to connect to it.

1. Administrative console

By default, Tarantool opens a socket over which plain ASCII text is sent to control Tarantool. A console connection always runs as the admin user and there is no authentication, so the console port must not be exposed externally.

To connect this way, run tarantoolctl enter <instance name>. The command launches the console and connects as the admin user. Never expose the console port to the outside; it is better to leave it as a Unix socket. Then only those who have write access to the socket will be able to connect to Tarantool.

This method is needed for administrative things. To work with data, use the second method - the binary protocol.

2. Using a binary protocol to connect to a specific port

There is a listen directive in the configuration, which opens a port for external communication. This port speaks the binary protocol, and authentication is enabled on it.

For this kind of connection, use tarantoolctl connect <port number>. With it you can connect to remote servers, use authentication and grant various access rights.

Data Recording and Box Module

Since Tarantool is both a database and an application server, it has various modules. We are interested in the box module - it implements work with data. When you write something to a box, Tarantool writes the data to disk, stores it in memory, or does something else with it.

Record

For example, we go into the box module and call the box.once function. It makes Tarantool run our code only once, the first time the server is initialized. We create a space where our data will be stored.

local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.grant('guest', 'read,write,execute', 'universe')

    -- Keep things safe by default
    --  box.schema.user.create('example', { password = 'secret' })
    --  box.schema.user.grant('example', 'replication')
    --  box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
end

After that, we create a primary index, named primary, by which we can search for data. By default, if no parameters are specified, the primary index uses the first field of each entry.

Then we grant privileges to the guest user, under which we connect via the binary protocol. We allow reading, writing and executing across the entire instance.

Compared to conventional databases, everything is quite simple here. We have a space, an area in which our data is simply stored. Each entry is called a tuple. It is packed in MessagePack. This is a very cool format: it is binary and takes up less space, 18 bytes versus 27 for the equivalent JSON.

It is quite convenient to work with. Almost every row, every data entry, can have completely different columns.
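To see where numbers like these come from, here is a minimal sketch in plain Lua. It hand-encodes a two-field map using a tiny subset of MessagePack and compares the size with the same data written as JSON. The sample document {"compact":true,"schema":0} is the one shown on the MessagePack home page, and I am assuming that is the comparison meant above. The helper names pack_value and pack_map are made up for this illustration; in Tarantool itself you would use the built-in msgpack module instead.

```lua
-- Minimal MessagePack encoder for a tiny subset of types (illustration only):
-- positive fixint (0..127), booleans, fixstr (< 32 bytes), fixmap (< 16 pairs).
local function pack_value(v)
    local t = type(v)
    if t == 'boolean' then
        return string.char(v and 0xC3 or 0xC2)
    elseif t == 'number' then
        assert(v >= 0 and v <= 127 and v % 1 == 0, 'only positive fixint is supported')
        return string.char(v)
    elseif t == 'string' then
        assert(#v < 32, 'only fixstr is supported')
        return string.char(0xA0 + #v) .. v
    end
    error('unsupported type: ' .. t)
end

-- pairs_list is an array of {key, value} pairs, so the field order is fixed.
local function pack_map(pairs_list)
    assert(#pairs_list < 16, 'only fixmap is supported')
    local out = { string.char(0x80 + #pairs_list) }
    for _, kv in ipairs(pairs_list) do
        out[#out + 1] = pack_value(kv[1])
        out[#out + 1] = pack_value(kv[2])
    end
    return table.concat(out)
end

local as_json = '{"compact":true,"schema":0}'
local as_msgpack = pack_map({ {'compact', true}, {'schema', 0} })

-- Print the size of each representation in bytes.
print(#as_json, #as_msgpack)
```

The script prints 27 for the JSON text and 18 for the MessagePack bytes. The saving comes from one-byte type headers replacing quotes, colons, commas and braces.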

We can view all spaces with the box.space command. To select a specific space, we write box.space.example and get full information about it.

There are two engines built into Tarantool: memtx and vinyl. Memtx stores all data in memory, so everything works simply and quickly. The data is also dumped to disk, and there is a write-ahead log mechanism, so we won't lose anything if the server crashes.

Vinyl stores data on disk in a more familiar form, so you can store more data than you have memory, and Tarantool will read it from disk.

For now we will use memtx.

unix/:/var/run/tarantool/example.control> box.space.example
---
- engine: memtx
  before_replace: 'function: 0x41eb02c8'
  on_replace: 'function: 0x41eb0568'
  ck_constraint: []
  field_count: 0
  temporary: false
  index:
    0: &0
      unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      type: TREE
      name: primary
    primary: *0
  is_local: false
  enabled: true
  name: example
  id: 512
...

unix/:/var/run/tarantool/example.control>

Index:

A primary index must be created for any space, because nothing will work without it. As in any database, we make the first field the record ID.

Parts:

This is where we specify what our index consists of. It consists of one part: the first field, of type unsigned, that is, a non-negative integer. As far as I remember from the documentation, the maximum value is about 18 quintillion (2^64 - 1). That is a lot.

We can then insert data using the insert command.

unix/:/var/run/tarantool/example.control> box.space.example:insert{1, 'test1', 'test2'}
---
- [1, 'test1', 'test2']
...

unix/:/var/run/tarantool/example.control> box.space.example:insert{2, 'test2', 'test3', 'test4'}
---
- [2, 'test2', 'test3', 'test4']
...

unix/:/var/run/tarantool/example.control> box.space.example:insert{3, 'test3'}
---
- [3, 'test3']
...

unix/:/var/run/tarantool/example.control> box.space.example:insert{4, 'test4'}
---
- [4, 'test4']
...

unix/:/var/run/tarantool/example.control>

The first field is used as the primary key, so it must be unique. We are not limited in the number of columns, so we can insert as many fields as we like. They are stored in the MessagePack format, which I described above.

Data output

Then we can display the data with the select command.

box.space.example:select with the key {1} displays the matching entry. If we omit the key, we see all the records we have. They all have a different number of columns, but here, in principle, there is no concept of columns, only field numbers.

There can be as much data as you like. Suppose we need to search by the second field. To do this, we create a new secondary index.


box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})

We use the create_index command and name the index secondary.

After that, we specify the parameters. The index type is TREE. The values may not be unique, so we set unique = false.

Then we indicate which parts the index consists of. field is the number of the field to which we bind the index, and type specifies the string type. With that, the index is created.

unix/:/var/run/tarantool/example.control> box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
---
- unique: false
  parts:
  - type: string
    is_nullable: false
    fieldno: 2
  id: 1
  space_id: 512
  type: TREE
  name: secondary
...

unix/:/var/run/tarantool/example.control>

Now we can query it like this:

unix/:/var/run/tarantool/example.control> box.space.example.index.secondary:select('test1')
---
- - [1, 'test1', 'test2']
...

Persistence

If we restart the instance and try to query the data again, we will see that it is not there: everything is empty. This happens because Tarantool makes checkpoints and saves the data to disk, but if we stop before the next save, we lose all operations made since, because we recover from the last checkpoint, which was taken, say, two hours ago.

Saving every second will not work either, because constantly dumping 20 GB to disk is a so-so idea.

This is why the write-ahead log was invented and implemented. With it, every change to the data produces a record in a small write-ahead log file.

Every entry made since the checkpoint is stored in these files. We set a size for them, for example 64 MB. When a file fills up, writing continues in the next file. After a restart, Tarantool recovers from the last checkpoint and then replays all later transactions up to the moment it stopped.
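The recovery logic described above can be sketched as a toy model in plain Lua. This is not Tarantool's real implementation or file format, only an illustration under simplifying assumptions: the "snapshot" and the "log" are ordinary Lua tables, the log is simply emptied at each checkpoint, and the names new_store, put, checkpoint and recover are invented for the example.

```lua
-- Toy model of checkpoint + write-ahead log recovery (not Tarantool's real format).
local function new_store()
    return { data = {}, snapshot = {}, wal = {} }
end

-- Every change is appended to the log before it is applied in memory.
local function put(store, key, value)
    store.wal[#store.wal + 1] = { key = key, value = value }
    store.data[key] = value
end

-- A checkpoint saves a full copy of the data and starts a fresh log.
local function checkpoint(store)
    store.snapshot = {}
    for k, v in pairs(store.data) do store.snapshot[k] = v end
    store.wal = {}
end

-- After a crash the in-memory data is gone; rebuild it from the last
-- checkpoint and then replay the log records written after it.
local function recover(store)
    local data = {}
    for k, v in pairs(store.snapshot) do data[k] = v end
    for _, rec in ipairs(store.wal) do data[rec.key] = rec.value end
    store.data = data
end

local store = new_store()
put(store, 1, 'test1')
checkpoint(store)
put(store, 2, 'test2')   -- written after the checkpoint, lives only in the log
store.data = {}          -- simulate a crash: memory is lost
recover(store)
print(store.data[1], store.data[2])
```

Both records survive the simulated crash: the first comes back from the snapshot, the second from replaying the log. Without the log, only the first one would be restored.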

To enable this logging, you need to set an option in the box.cfg settings (in the example.lua file):

wal_mode = "write";
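As a rough sketch, the durability-related part of box.cfg could look like the fragment below. The option names are the same ones used in the full configuration later in this article; the 64 MB log size is just the example figure mentioned above, not a recommendation, and the comments describe the options as I understand them from the Tarantool documentation.

```lua
-- example.lua (fragment): only the options related to saving data
box.cfg {
    -- 'none'  : no write-ahead log is kept
    -- 'write' : wait until the record is written to the log, without fsync
    -- 'fsync' : additionally fsync after each write
    wal_mode = 'write';
    wal_max_size = 64 * 1024 * 1024; -- start a new log file after 64 MB
    checkpoint_interval = 60 * 60;   -- make a checkpoint once an hour
    checkpoint_count = 6;            -- how many checkpoints to keep
}
```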

Data usage

With what we have written so far, you can already use Tarantool to store data, and it will work very fast as a database. And now the cherry on the cake: what you can do with all of this.

Writing an application

For example, let's write the following application for Tarantool:

box.cfg {
    listen = '0.0.0.0:3301';
    io_collect_interval = nil;
    readahead = 16320;
    memtx_memory = 128 * 1024 * 1024; -- 128Mb
    memtx_min_tuple_size = 16;
    memtx_max_tuple_size = 128 * 1024 * 1024; -- 128Mb
    vinyl_memory = 128 * 1024 * 1024; -- 128Mb
    vinyl_cache = 128 * 1024 * 1024; -- 128Mb
    vinyl_max_tuple_size = 128 * 1024 * 1024; -- 128Mb
    vinyl_write_threads = 2;
    wal_mode = "write";
    wal_max_size = 256 * 1024 * 1024;
    checkpoint_interval = 60 * 60; -- one hour
    checkpoint_count = 6;
    force_recovery = true;
    log_level = 5;
    log_nonblock = false;
    too_long_threshold = 0.5;
    read_only   = false
}

local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')

    box.schema.user.create('example', { password = 'secret' })
    box.schema.user.grant('example', 'read,write,execute', 'space', 'example')

    box.schema.user.create('repl', { password = 'replication' })
    box.schema.user.grant('repl', 'replication')
end

-- on the first run, create a space and set up grants
box.once('replica', bootstrap)

-- enabling console access
console = require('console')
console.listen('127.0.0.1:3302')

-- http config
local charset = {}  do -- [0-9a-zA-Z]
    for c = 48, 57  do table.insert(charset, string.char(c)) end
    for c = 65, 90  do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end

local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end

local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')

local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})

local router = http_router.new()

local function get_count()
    local cnt = box.space.example:len()
    return cnt
end

router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)

router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)

prometheus = require('prometheus')

fiber = require('fiber')
tokens_count = prometheus.gauge("tarantool_tokens_count",
                              "API Tokens Count")

function monitor_tokens_count()
  while true do
    tokens_count:set(get_count())
    fiber.sleep(5)
  end
end
fiber.create(monitor_tokens_count)

router:route( { method = 'GET', path = '/metrics' }, prometheus.collect_http)

httpd:set_router(router)
httpd:start()

We declare a table in Lua that holds the allowed characters. This table is needed to generate a random string.

local charset = {}  do -- [0-9a-zA-Z]
    for c = 48, 57  do table.insert(charset, string.char(c)) end
    for c = 65, 90  do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end

After that, we declare the randomString function, which takes the desired length as its argument.

local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end
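One thing worth knowing about the function above: it reseeds the random generator on every recursive call. Below is a possible alternative, a minimal sketch that seeds once and builds the string in a loop. The name random_string and the choice of os.time() as the seed are my assumptions for the example, not part of the original application, and math.random is not a cryptographically secure source, so treat this as a demo rather than a way to mint real security tokens.

```lua
-- Character set [0-9a-zA-Z], built the same way as in the application.
local charset = {}
for c = 48, 57  do charset[#charset + 1] = string.char(c) end
for c = 65, 90  do charset[#charset + 1] = string.char(c) end
for c = 97, 122 do charset[#charset + 1] = string.char(c) end

-- Seed the generator once at startup instead of on every call.
math.randomseed(os.time())

local function random_string(length)
    if not length or length <= 0 then return '' end
    local chars = {}
    for i = 1, length do
        chars[i] = charset[math.random(1, #charset)]
    end
    -- Join all characters in one step instead of concatenating in a chain.
    return table.concat(chars)
end

print(random_string(32))
```

The loop also avoids creating 32 intermediate strings, which the recursive concatenation does.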

Then we load the HTTP router and HTTP server modules for our Tarantool server, plus JSON, which we will return to the client.

local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')

After that, we create an HTTP server on port 8080 on all interfaces, which will log all requests and errors.

local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})

Next, we declare a route: if a GET request arrives on port 8080 at /count, we call a one-line function. It returns a status: 200, 404, 403 or whatever we specify.

router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)

In the body, we return the result of json.encode: we pass it count and a call to get_count, which returns the number of records in our database.

The second method

router:route({method = 'GET', path = '/token'}, function() 
    local token = randomString(32) 
    local last = box.space.example:len() 
    box.space.example:insert{ last + 1, token } 
    return {status = 200, body = json.encode({token = token})}
end)

In the line router:route({method = 'GET', path = '/token'}, function() we declare the handler function that generates a token.

The line local token = randomString(32) produces a random string of 32 characters.
In the line local last = box.space.example:len() we get the current number of records, which we treat as the last ID.
And in the line box.space.example:insert{ last + 1, token } we write the token to our database, simply increasing the ID by 1. By the way, this can be done in less clumsy ways: Tarantool has sequences for this case.
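Why is "count + 1" a clumsy way to pick the next ID? Here is a small model in plain Lua, where an ordinary table stands in for the space (the names insert, delete and next_id are hypothetical helpers, not Tarantool API). As soon as a record is deleted, the count no longer matches the highest ID, and the next insert hits an existing key. A sequence avoids this because it is a counter that only moves forward.

```lua
-- Toy model: a plain Lua table stands in for the space.
local rows, row_count = {}, 0

local function insert(id, value)
    assert(rows[id] == nil, 'duplicate key ' .. id)
    rows[id] = value
    row_count = row_count + 1
end

local function delete(id)
    if rows[id] ~= nil then
        rows[id] = nil
        row_count = row_count - 1
    end
end

insert(row_count + 1, 'a')   -- id 1
insert(row_count + 1, 'b')   -- id 2
insert(row_count + 1, 'c')   -- id 3
delete(1)                    -- two rows left: ids 2 and 3

-- "count + 1" now yields 3, which is already taken.
local ok = pcall(insert, row_count + 1, 'd')
print('count + 1 after delete succeeded:', ok)

-- A sequence is a counter that only moves forward, regardless of deletes.
local last_id = 3
local function next_id()
    last_id = last_id + 1
    return last_id
end

insert(next_id(), 'd')
print('sequence id used:', last_id)
```

In the application above nothing is ever deleted, so the simple approach happens to work, but it stops working as soon as you add, say, background cleanup of old tokens.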

Thus, we wrote an application in one file. You can access the data right away, and the box module does all the dirty work for you.

It listens on HTTP and works with the data, and everything lives in a single instance, both the application and the data. That is why everything happens quite quickly.

To run it, we install the http module:

root@test2:/# tarantoolctl rocks install http
Installing http://rocks.tarantool.org/http-scm-1.src.rock
Missing dependencies for http scm-1:
   checks >= 3.0.1 (not installed)

http scm-1 depends on checks >= 3.0.1 (not installed)
Installing http://rocks.tarantool.org/checks-3.0.1-1.rockspec

Cloning into 'checks'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 28 (delta 1), reused 16 (delta 1), pack-reused 0
Receiving objects: 100% (28/28), 12.69 KiB | 12.69 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Note: checking out '580388773ef11085015b5a06fe52d61acf16b201'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

No existing manifest. Attempting to rebuild...
checks 3.0.1-1 is now installed in /.rocks (license: BSD)

-- The C compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Found TARANTOOL: /usr/include (found version "2.4.2-80-g18f2bc82d")
-- Tarantool LUADIR is /.rocks/share/tarantool/rocks/http/scm-1/lua
-- Tarantool LIBDIR is /.rocks/share/tarantool/rocks/http/scm-1/lib
-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    version


-- Build files have been written to: /tmp/luarocks_http-scm-1-V4P9SM/http/build.luarocks
Scanning dependencies of target httpd
[ 50%] Building C object http/CMakeFiles/httpd.dir/lib.c.o
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:32:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c: In function ‘tpl_term’:
/usr/include/tarantool/lauxlib.h:144:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    (*(B)->p++ = (char)(c)))
    ~~~~~~~~~~~^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:62:7: note: in expansion of macro ‘luaL_addchar’
       luaL_addchar(b, '\');
       ^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:63:6: note: here
      default:
      ^~~~~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:39:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h: In function ‘tpe_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:147:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
    type = TPE_TEXT;
    ~~~~~^~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:149:3: note: here
   case TPE_LINECODE:
   ^~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:40:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h: In function ‘httpfast_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:372:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
                 code = 0;
                 ~~~~~^~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:374:13: note: here
             case status:
             ^~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:393:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
                 state = message;
                 ~~~~~~^~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:395:13: note: here
             case message:
             ^~~~
[100%] Linking C shared library lib.so
[100%] Built target httpd
[100%] Built target httpd
Install the project...
-- Install configuration: "Debug"
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/VERSION.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lib/http/lib.so
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/tsgi_adapter.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/nginx_server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/fs.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/matching.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/middleware.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/request.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/response.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/tsgi.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/utils.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/mime_types.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/codes.lua
http scm-1 is now installed in /.rocks (license: BSD)

root@test2:/#

We also need the prometheus module to run the application:

root@test2:/# tarantoolctl rocks install prometheus
Installing http://rocks.tarantool.org/prometheus-scm-1.rockspec

Cloning into 'prometheus'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 19 (delta 2), reused 5 (delta 0), pack-reused 0
Receiving objects: 100% (19/19), 10.73 KiB | 10.73 MiB/s, done.
Resolving deltas: 100% (2/2), done.
prometheus scm-1 is now installed in /.rocks (license: BSD)

root@test2:/#

We start the instance and can now access the endpoints:

root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive

{"token":"e2tPq9l5Z3QZrewRf6uuoJUl3lJgSLOI"}

root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive

{"token":"fR5aCA84gj9eZI3gJcV0LEDl9XZAG2Iu"}

root@test2:/# curl -D - -s http://127.0.0.1:8080/count
HTTP/1.1 200 Ok
Content-length: 11
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive

{"count":2}root@test2:/#

/count returns status 200 and the current number of records.
/token issues a token and writes it to the database.

Testing speed

Let's run a benchmark with 50,000 requests, 500 of them concurrent.

root@test2:/# ab -c 500 -n 50000 http://127.0.0.1:8080/token
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests


Server Software:        Tarantool
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /token
Document Length:        44 bytes

Concurrency Level:      500
Time taken for tests:   14.578 seconds
Complete requests:      50000
Failed requests:        0
Total transferred:      7950000 bytes
HTML transferred:       2200000 bytes
Requests per second:    3429.87 [#/sec] (mean)
Time per request:       145.778 [ms] (mean)
Time per request:       0.292 [ms] (mean, across all concurrent requests)
Transfer rate:          532.57 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   10 103.2      0    3048
Processing:    12   69 685.1     15   13538
Waiting:       12   69 685.1     15   13538
Total:         12   78 768.2     15   14573

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     15
  75%     16
  80%     16
  90%     16
  95%     16
  98%     21
  99%     42
 100%  14573 (longest request)
root@test2:/#

Tokens are issued, and we are constantly writing data. 99% of requests completed within 42 milliseconds. So we get about 3,500 requests per second on a small machine with 2 cores and 4 GB of memory.
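The headline numbers in the report follow directly from three figures: the number of requests, the concurrency level and the total time. Here is a quick check in plain Lua, using the rounded values printed by ab, so the last digits can differ slightly from the report. The variable names are mine.

```lua
-- Figures copied from the ab report above.
local requests    = 50000
local concurrency = 500
local seconds     = 14.578

-- Throughput: how many requests were completed per second.
local rps = requests / seconds
-- Mean time one request took as seen by a single client.
local ms_per_request = concurrency * seconds * 1000 / requests
-- Mean time per request across all concurrent clients.
local ms_across_all = seconds * 1000 / requests

print(string.format('%.0f req/s, %.2f ms per request, %.3f ms across all',
    rps, ms_per_request, ms_across_all))
```

This gives roughly 3430 requests per second and about 145.78 ms per request, which matches the "Requests per second" and "Time per request" lines in the report.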

You can also select, say, token number 50,000 and look at its value.

Besides HTTP, you can run background functions that process your data. There are also various triggers. For example, you can call functions on updates, check something or resolve conflicts.

You can write script applications directly in the database server itself, without being limited by anything, connect any modules and implement any logic.

The application server can access external servers, collect data and add it to its database. Data from this database will be used by other applications.

All of this is done by Tarantool itself, and there is no need to write a separate application.

In conclusion

This is just the first part of a larger piece of work. The second part will be published very soon on the Mail.ru Group blog, and we will definitely add a link to it in this article.

If you're interested in attending events where we build these things online and answer questions in real time, join the DevOps by REBRAIN channel.

If you need to move to the cloud or have questions about your infrastructure, feel free to submit a request.

P.S. We run 2 free audits per month; perhaps your project will be one of them.

Source: habr.com
