Five years ago I tried working with Tarantool, but it didn't work out for me at the time. Recently, though, I held a webinar about Hadoop and how MapReduce works, and I was asked: "Why not use Tarantool for this task?"
Out of curiosity, I decided to come back to it and test the latest version, and this time I really liked the project. Below I will show how to write a simple application in Tarantool, put it under load and check the performance, and you will see how easy and cool it all is.
What is Tarantool
First, Tarantool positions itself as an ultra-fast database. You can put any data you want into it. On top of that, you can replicate the data, shard it (that is, split a huge amount of data across several servers and combine the results from them), and build fault-tolerant master-master setups.
Second, it is an application server. You can write your applications on it and work with data, for example, delete old entries in the background according to certain rules. You can write an HTTP server directly in Tarantool that works with the data: returns the number of records, writes new data and sends it all to a master.
I read an article about how some guys wrote a 300-line message queue that simply flies - they get a minimum performance of 20 messages per second. Here you really have room to spread out and write a very large application, and it won't be stored procedures, as in PostgreSQL.
In this article I will try to describe a server of roughly this kind, only a simple one.
Installation
For the test, I started three standard virtual machines, each with a 20 GB disk, Ubuntu 18.04, 2 virtual CPUs and 4 GB of memory.
To install Tarantool, run the bash script or add the repository and run apt-get install tarantool. Link to the script - (curl -L
tarantoolctl is the main command for managing Tarantool instances.
/etc/tarantool - the entire configuration lives here.
/var/log/tarantool - the logs are here.
/var/lib/tarantool - the data lives here, split up by instance.
There are instances.available and instances.enabled folders. The latter contains what will be launched: the instance configuration file with Lua code, which describes which ports the instance listens on, how much memory is available to it, Vinyl engine settings, code that runs at server startup, sharding, queues, deletion of obsolete data, and so on.
Instances work like in PostgreSQL. For example, you may want to run several copies of a database listening on different ports. Several database instances are then launched on one server, each on its own port. They can have completely different settings: one instance implements one logic, the second another.
Instance management
We have the tarantoolctl command, which lets us manage Tarantool instances. For example, tarantoolctl check example will check the configuration file and report that the file is OK if there are no syntax errors.
You can see the status of the instance with tarantoolctl status example. In the same way, you can run start, stop and restart.
Once an instance is running, there are two ways to connect to it.
1. Administrative console
By default, Tarantool opens a socket that accepts plain ASCII text commands for controlling the instance. Connections to the console are always made as the admin user, without authentication, so the console port should not be exposed externally.
To connect this way, run tarantoolctl enter <instance name>. The command launches the console and connects as the admin user. Never expose the console port to the outside; it is better to leave it as a Unix socket. Then only those who have write access to the socket will be able to connect to Tarantool.
This method is meant for administrative tasks. To work with data, use the second method, the binary protocol.
2. Using a binary protocol to connect to a specific port
There is a listen directive in the configuration, which opens a port for external communication. This port speaks the binary protocol, and authentication is enabled on it.
For this kind of connection, use tarantoolctl connect <port number>. With it you can connect to remote servers, use authentication and grant various access rights.
Data Recording and Box Module
Since Tarantool is both a database and an application server, it comes with various modules. We are interested in the box module, which implements work with data. When you write something to box, Tarantool writes the data to disk, stores it in memory, or does something else with it.
Record
For example, we go into the box module and call the box.once function. It makes Tarantool run our code once, when the server is first initialized. In that code we create a space where our data will be stored.
local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.grant('guest', 'read,write,execute', 'universe')
    -- Keep things safe by default
    -- box.schema.user.create('example', { password = 'secret' })
    -- box.schema.user.grant('example', 'replication')
    -- box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
end
-- Run bootstrap() only once, on the first start of the instance
box.once('bootstrap', bootstrap)
After that, we create a primary index, named primary, by which we can search for data. By default, if no parameters are specified, the first field of each entry is used for the primary index.
Then we make a grant to the guest user, under which we connect via the binary protocol. We allow reading, writing and executing across the entire instance.
Compared to conventional databases, everything is quite simple here. We have a space, an area in which our data is stored. Each entry is called a tuple. It is packed in MessagePack. This is a very cool format: it is binary and takes up less space - 18 bytes versus 27.
It is quite convenient to work with. Practically every row, every data entry, can have completely different columns.
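The article does not say which data the "18 bytes versus 27" figure refers to. It happens to match the example on the msgpack.org home page, where {"compact":true,"schema":0} takes 27 bytes as JSON and 18 bytes as MessagePack, so the sketch below assumes that this is the comparison meant. The pack function is a hand-rolled, deliberately tiny encoder written only to make the byte layout visible; in a real application you would use Tarantool's built-in msgpack module instead.

```lua
-- Minimal MessagePack encoder for illustration only. It handles booleans,
-- integers 0..127, strings shorter than 32 bytes and maps with fewer than
-- 16 keys. It is not a replacement for Tarantool's msgpack module.
local function pack(value)
    local t = type(value)
    if t == 'boolean' then
        return string.char(value and 0xc3 or 0xc2)        -- true / false
    elseif t == 'number' then
        assert(value >= 0 and value <= 127 and value % 1 == 0,
               'only integers 0..127 are supported')
        return string.char(value)                         -- positive fixint
    elseif t == 'string' then
        assert(#value < 32, 'only short strings are supported')
        return string.char(0xa0 + #value) .. value        -- fixstr: 1 byte + data
    elseif t == 'table' then
        local parts, n = {}, 0
        for k, v in pairs(value) do
            n = n + 1
            parts[n] = pack(k) .. pack(v)
        end
        assert(n < 16, 'only small maps are supported')
        return string.char(0x80 + n) .. table.concat(parts)  -- fixmap
    end
    error('unsupported type: ' .. t)
end

local as_json = '{"compact":true,"schema":0}'
local as_msgpack = pack({ compact = true, schema = 0 })

-- Sizes in bytes: 27 for the JSON text, 18 for the MessagePack encoding.
print(#as_json, #as_msgpack)
```

The saving comes from dropping quotes, colons and braces: each short string costs one length byte plus its characters, and true and 0 take one byte each.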
We can view all spaces with the box.space command. To pick a specific space, we write box.space.example and get full information about it.
There are two engines built into Tarantool: Memory (memtx) and Vinyl. Memory stores all data in memory, so everything works simply and quickly. The data is dumped to disk, and there is also a write-ahead log mechanism, so we won't lose anything if the server crashes.
Vinyl stores data on disk in a more familiar form, so you can store more data than you have memory, and Tarantool will read it from disk.
For now we will use Memory.
unix/:/var/run/tarantool/example.control> box.space.example
---
- engine: memtx
  before_replace: 'function: 0x41eb02c8'
  on_replace: 'function: 0x41eb0568'
  ck_constraint: []
  field_count: 0
  temporary: false
  index:
    0: &0
      unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      type: TREE
      name: primary
    primary: *0
  is_local: false
  enabled: true
  name: example
  id: 512
...
unix/:/var/run/tarantool/example.control>
Index:
A primary index must be created for any space, because nothing will work without it. As in any database, we make the first field the record ID.
Parts:
This is where we specify what our index consists of. It consists of one part: the first field, of type unsigned, a non-negative integer. As far as I remember from the documentation, the maximum possible value is about 18 quintillion. That is a whole lot.
We can then insert data using the insert command.
unix/:/var/run/tarantool/example.control> box.space.example:insert{1, 'test1', 'test2'}
---
- [1, 'test1', 'test2']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{2, 'test2', 'test3', 'test4'}
---
- [2, 'test2', 'test3', 'test4']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{3, 'test3'}
---
- [3, 'test3']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{4, 'test4'}
---
- [4, 'test4']
...
unix/:/var/run/tarantool/example.control>
The first field is used as the primary key, so it must be unique. We are not limited in the number of columns, so we can insert as much data as we like. The data is stored in the MessagePack format described above.
Data output
Then we can display the data with the select command.
box.space.example:select with the key {1} returns the matching entry. If we omit the key, we will see all the records we have. They all differ in the number of columns, but here there is, in principle, no concept of columns, only field numbers.
There can be as much data as you like. Suppose, for example, that we need to search by the second field. To do this, we create a new secondary index.
box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
We use the create_index command.
We name the index secondary.
After that, we specify the parameters. The index type is TREE. The values may not be unique, so we set unique = false.
Then we indicate which parts our index consists of. field is the number of the field the index is bound to, and type specifies the string type. And so the index is created.
unix/:/var/run/tarantool/example.control> box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
---
- unique: false
  parts:
  - type: string
    is_nullable: false
    fieldno: 2
  id: 1
  space_id: 512
  type: TREE
  name: secondary
...
unix/:/var/run/tarantool/example.control>
Now we can query it like this:
unix/:/var/run/tarantool/example.control> box.space.example.index.secondary:select('test1')
---
- - [1, 'test1', 'test2']
...
Persistence
If we restart the instance and try to query the data again, we will see that it is gone: everything is empty. This happens because Tarantool makes checkpoints and saves the data to disk, but if we stop before the next save, we lose all operations made since then, because we recover from the last checkpoint, which was, for example, two hours ago.
Saving every second will not work either, because constantly dumping 20 GB to disk is a so-so idea.
This is why the concept of a write-ahead log was invented and implemented. With it, a record is appended to a small write-ahead log file for every change in the data.
Every change made since the last checkpoint is stored in these files. We set a size for them, for example 64 MB. When a file fills up, writing continues in the next file. After a restart, Tarantool recovers from the last checkpoint and then replays all later transactions up to the moment it stopped.
To enable this kind of logging, you need to specify an option in the box.cfg settings (in the example.lua file):
wal_mode = "write";
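The recovery scheme described above can be sketched as a toy model in plain Lua. This is not how Tarantool is implemented internally; the store table and the insert, checkpoint and recover functions are hypothetical names invented for this illustration. It only shows why a checkpoint plus a log of later changes is enough to restore the state after a crash.

```lua
-- Toy model of checkpoint + write-ahead log recovery (not Tarantool internals).
local function new_store()
    return { data = {}, checkpoint = {}, wal = {} }
end

-- Every change is appended to the log before it is applied in memory.
local function insert(store, key, value)
    table.insert(store.wal, { key = key, value = value })
    store.data[key] = value
end

-- A checkpoint copies the in-memory state; the log written so far
-- is no longer needed for recovery and is dropped in this model.
local function checkpoint(store)
    store.checkpoint = {}
    for k, v in pairs(store.data) do store.checkpoint[k] = v end
    store.wal = {}
end

-- After a crash memory is gone: start from the checkpoint, then replay the log.
local function recover(store)
    local data = {}
    for k, v in pairs(store.checkpoint) do data[k] = v end
    for _, rec in ipairs(store.wal) do data[rec.key] = rec.value end
    store.data = data
end

local store = new_store()
insert(store, 1, 'test1')
checkpoint(store)
insert(store, 2, 'test2')   -- written after the last checkpoint
store.data = {}             -- simulate a crash that wipes memory
recover(store)

-- Both records survive: one from the checkpoint, one from the log.
print(store.data[1], store.data[2])
```

Without the replay loop in recover, the second record would be lost, which is exactly the situation described at the start of this section.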
Data usage
With what we have written so far, you can use Tarantool to store data, and it will work very fast as a database. And now the cherry on the cake: what you can do with all this.
Writing an application
For example, let's write the following application for Tarantool:
box.cfg {
    listen = '0.0.0.0:3301';
    io_collect_interval = nil;
    readahead = 16320;
    memtx_memory = 128 * 1024 * 1024; -- 128Mb
    memtx_min_tuple_size = 16;
    memtx_max_tuple_size = 128 * 1024 * 1024; -- 128Mb
    vinyl_memory = 128 * 1024 * 1024; -- 128Mb
    vinyl_cache = 128 * 1024 * 1024; -- 128Mb
    vinyl_max_tuple_size = 128 * 1024 * 1024; -- 128Mb
    vinyl_write_threads = 2;
    wal_mode = "write";
    wal_max_size = 256 * 1024 * 1024;
    checkpoint_interval = 60 * 60; -- one hour
    checkpoint_count = 6;
    force_recovery = true;
    log_level = 5;
    log_nonblock = false;
    too_long_threshold = 0.5;
    read_only = false
}

local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.create('example', { password = 'secret' })
    box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
    box.schema.user.create('repl', { password = 'replication' })
    box.schema.user.grant('repl', 'replication')
end

-- on the first run, create the space and set up grants
box.once('replica', bootstrap)

-- enabling console access
console = require('console')
console.listen('127.0.0.1:3302')

-- http config
local charset = {} do -- [0-9a-zA-Z]
    for c = 48, 57 do table.insert(charset, string.char(c)) end
    for c = 65, 90 do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end

local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end

local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')

local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})
local router = http_router.new()

local function get_count()
    local cnt = box.space.example:len()
    return cnt
end

router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)

router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)

prometheus = require('prometheus')
fiber = require('fiber')
tokens_count = prometheus.gauge("tarantool_tokens_count",
                                "API Tokens Count")

function monitor_tokens_count()
    while true do
        tokens_count:set(get_count())
        fiber.sleep(5)
    end
end
fiber.create(monitor_tokens_count)

router:route( { method = 'GET', path = '/metrics' }, prometheus.collect_http)

httpd:set_router(router)
httpd:start()
We declare a Lua table that holds the set of characters. This table is needed to generate a random string.
local charset = {} do -- [0-9a-zA-Z]
    for c = 48, 57 do table.insert(charset, string.char(c)) end
    for c = 65, 90 do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end
After that, we declare the function randomString, which takes the desired length as its argument in parentheses.
local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end
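The version above reseeds the generator on every recursive call and grows the string one character per call. A possible alternative, shown here only as a sketch, seeds the generator once and builds the string in a loop. The name random_string and the choice of os.time() as the seed are assumptions made for this example, not part of the original application. Note that math.random is not a cryptographically secure generator, so neither version should be used as-is for tokens that protect anything important.

```lua
-- Character set with digits, uppercase and lowercase letters,
-- built the same way as in the article.
local charset = {}
for c = 48, 57 do table.insert(charset, string.char(c)) end   -- 0-9
for c = 65, 90 do table.insert(charset, string.char(c)) end   -- A-Z
for c = 97, 122 do table.insert(charset, string.char(c)) end  -- a-z

-- Seed the generator once at load time instead of on every call.
math.randomseed(os.time())

-- Hypothetical replacement for randomString: builds the result in a loop.
local function random_string(length)
    if not length or length <= 0 then return '' end
    local chars = {}
    for i = 1, length do
        chars[i] = charset[math.random(1, #charset)]
    end
    return table.concat(chars)
end

print(random_string(32))
```

Collecting the characters in a table and joining them with table.concat avoids creating an intermediate string for every character.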
Then we load the HTTP router and HTTP server modules into our Tarantool server, along with json, the format in which we will respond to the client.
local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')
After that, we create an HTTP server on port 8080 on all interfaces, which will log all requests and errors.
local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})
Next, we declare a route: if a GET request for /count arrives on port 8080, we call a one-line function. It returns a status: 200, 404, 403 or whatever we specify.
router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)
In the body we return the result of json.encode with a count field set to get_count(), which is called and returns the number of records in our database.
The second method:
router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)
In the line router:route({method = 'GET', path = '/token'}, function() we declare the handler function that generates a token.
The line local token = randomString(32) produces a random string of 32 characters.
In the line local last = box.space.example:len() we get the number of records in the space, which serves as the last used ID.
And in the line box.space.example:insert{ last + 1, token } we write the data to our database: we simply increase the ID by 1 and store the token next to it. By the way, this can be done in less clumsy ways: Tarantool has sequences for this case.
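As a rough sketch of the sequence-based approach mentioned above: the snippet assumes a running Tarantool instance in which box.cfg has already been called and the example space exists and is still empty (a sequence starts at 1 by default, so it would collide with IDs that were already inserted by hand). The sequence name token_id and the helper issue_token are hypothetical names chosen for this illustration.

```lua
-- Runs only inside Tarantool, after box.cfg{} and after the 'example'
-- space has been created. 'token_id' is a made-up sequence name.
box.schema.sequence.create('token_id', { if_not_exists = true })

-- Hypothetical helper: stores a token under the next ID from the sequence.
local function issue_token(token)
    -- next() advances the sequence and returns the new value.
    local id = box.sequence.token_id:next()
    return box.space.example:insert{ id, token }
end
```

Unlike len() + 1, the value taken from a sequence does not depend on how many records are currently in the space, so deleting old records does not lead to an ID being reused.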
Thus, we wrote an application in one file. You can access the data right away, and the box module will do all the dirty work for you.
It listens to http and works with data, everything is in a single instance - both the application and the data. Therefore, everything happens quite quickly.
To run it, we first install the http module:
root@test2:/# tarantoolctl rocks install http
Installing http://rocks.tarantool.org/http-scm-1.src.rock
Missing dependencies for http scm-1:
checks >= 3.0.1 (not installed)
http scm-1 depends on checks >= 3.0.1 (not installed)
Installing http://rocks.tarantool.org/checks-3.0.1-1.rockspec
Cloning into 'checks'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 28 (delta 1), reused 16 (delta 1), pack-reused 0
Receiving objects: 100% (28/28), 12.69 KiB | 12.69 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Note: checking out '580388773ef11085015b5a06fe52d61acf16b201'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
No existing manifest. Attempting to rebuild...
checks 3.0.1-1 is now installed in /.rocks (license: BSD)
-- The C compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Found TARANTOOL: /usr/include (found version "2.4.2-80-g18f2bc82d")
-- Tarantool LUADIR is /.rocks/share/tarantool/rocks/http/scm-1/lua
-- Tarantool LIBDIR is /.rocks/share/tarantool/rocks/http/scm-1/lib
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
version
-- Build files have been written to: /tmp/luarocks_http-scm-1-V4P9SM/http/build.luarocks
Scanning dependencies of target httpd
[ 50%] Building C object http/CMakeFiles/httpd.dir/lib.c.o
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:32:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c: In function ‘tpl_term’:
/usr/include/tarantool/lauxlib.h:144:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
(*(B)->p++ = (char)(c)))
~~~~~~~~~~~^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:62:7: note: in expansion of macro ‘luaL_addchar’
luaL_addchar(b, '\');
^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:63:6: note: here
default:
^~~~~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:39:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h: In function ‘tpe_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:147:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
type = TPE_TEXT;
~~~~~^~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:149:3: note: here
case TPE_LINECODE:
^~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:40:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h: In function ‘httpfast_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:372:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
code = 0;
~~~~~^~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:374:13: note: here
case status:
^~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:393:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
state = message;
~~~~~~^~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:395:13: note: here
case message:
^~~~
[100%] Linking C shared library lib.so
[100%] Built target httpd
[100%] Built target httpd
Install the project...
-- Install configuration: "Debug"
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/VERSION.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lib/http/lib.so
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/tsgi_adapter.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/nginx_server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/fs.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/matching.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/middleware.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/request.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/response.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/tsgi.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/utils.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/mime_types.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/codes.lua
http scm-1 is now installed in /.rocks (license: BSD)
root@test2:/#
We also need prometheus to run:
root@test2:/# tarantoolctl rocks install prometheus
Installing http://rocks.tarantool.org/prometheus-scm-1.rockspec
Cloning into 'prometheus'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 19 (delta 2), reused 5 (delta 0), pack-reused 0
Receiving objects: 100% (19/19), 10.73 KiB | 10.73 MiB/s, done.
Resolving deltas: 100% (2/2), done.
prometheus scm-1 is now installed in /.rocks (license: BSD)
root@test2:/#
We start the instance and can now access the endpoints:
root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"token":"e2tPq9l5Z3QZrewRf6uuoJUl3lJgSLOI"}
root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"token":"fR5aCA84gj9eZI3gJcV0LEDl9XZAG2Iu"}
root@test2:/# curl -D - -s http://127.0.0.1:8080/count
HTTP/1.1 200 Ok
Content-length: 11
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"count":2}root@test2:/#
/count gives us status 200.
/token issues a token and writes this token to the database.
Testing speed
Let's run a benchmark with 50,000 requests, 500 of them concurrent.
root@test2:/# ab -c 500 -n 50000 http://127.0.0.1:8080/token
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: Tarantool
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /token
Document Length: 44 bytes
Concurrency Level: 500
Time taken for tests: 14.578 seconds
Complete requests: 50000
Failed requests: 0
Total transferred: 7950000 bytes
HTML transferred: 2200000 bytes
Requests per second: 3429.87 [#/sec] (mean)
Time per request: 145.778 [ms] (mean)
Time per request: 0.292 [ms] (mean, across all concurrent requests)
Transfer rate: 532.57 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 10 103.2 0 3048
Processing: 12 69 685.1 15 13538
Waiting: 12 69 685.1 15 13538
Total: 12 78 768.2 15 14573
Percentage of the requests served within a certain time (ms)
50% 15
66% 15
75% 16
80% 16
90% 16
95% 16
98% 21
99% 42
100% 14573 (longest request)
root@test2:/#
Tokens are issued, and we are constantly writing data. 99% of requests completed within 42 milliseconds. Accordingly, we get roughly 3,400 requests per second on a small machine with 2 cores and 4 gigabytes of memory.
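The summary figures in the ApacheBench report follow directly from three numbers in its output: the request count, the concurrency level and the total time. A small check in plain Lua, using only values copied from the report (the variable names are mine):

```lua
-- Figures copied from the ApacheBench output above.
local requests = 50000
local concurrency = 500
local seconds = 14.578

local rps = requests / seconds                    -- requests per second
local per_request_ms = concurrency / rps * 1000   -- mean time as seen by one client
local per_request_all_ms = 1000 / rps             -- mean across all concurrent requests

-- prints: 3430 req/s, 145.8 ms, 0.292 ms
print(string.format('%.0f req/s, %.1f ms, %.3f ms',
                    rps, per_request_ms, per_request_all_ms))
```

The small difference from the 3429.87 in the report comes from the total time being rounded to three decimals in the printed output.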
You can also select, say, token number 50000 and look at its value.
You are not limited to HTTP: you can also run background functions that process your data. There are various triggers as well. For example, you can call functions on updates, run checks, or resolve conflicts.
You can write application scripts directly in the database server itself without being limited by anything, plug in any modules and implement any logic.
The application server can access external servers, collect data and add it to its database. Data from this database will then be used by other applications.
Tarantool will do all of this itself, and there is no need to write a separate application.
In conclusion
This is just the first part of a big job. The second one will be published very soon on the Mail.ru Group blog, and we will definitely add a link to it in this article.
Source: habr.com