Speed up Tarantool PHP connectors with Async, Swoole and Parallel

In the PHP ecosystem, there are currently two connectors for working with the Tarantool server: the official PECL extension tarantool/tarantool-php, written in C, and tarantool-php/client, written in PHP. I am the author of the latter.

In this article, I would like to share the results of performance testing both libraries and show how, with minimal code changes, you can achieve a 3-5x performance gain (on synthetic tests!).

What will we test?

We will test the synchronous connectors mentioned above running asynchronously, in parallel, and asynchronously in parallel. 🙂 At the same time, we don't want to touch the code of the connectors themselves. Several extensions currently make this possible:

  • Swoole is a high-performance asynchronous framework for PHP, used by Internet giants such as Alibaba and Baidu. Since version 4.1.0 it has had a magic method, Swoole\Runtime::enableCoroutine(), which allows you to "transform synchronous PHP network libraries into asynchronous ones with one line of code."
  • Async was, until recently, a very promising extension for asynchronous work in PHP. Why until recently? Unfortunately, for a reason unknown to me, the author deleted the repository, and the future of the project is unclear. We will have to use one of the forks. Like Swoole, this extension lets you switch on asynchrony with a flick of the wrist by replacing the standard implementation of TCP and TLS streams with asynchronous versions. This is done through the "async.tcp=1" option.
  • Parallel is a fairly new extension from the well-known Joe Watkins, the author of such libraries as phpdbg, apcu, pthreads, pcov and uopz. The extension provides an API for multi-threading in PHP and is positioned as a replacement for pthreads. A significant limitation of the library is that it only works with the ZTS (Zend Thread Safe) version of PHP.

How will we test?

Let's start a Tarantool instance with write-ahead logging disabled (wal_mode = none) and an increased network buffer (readahead = 1 * 1024 * 1024). The first option eliminates disk I/O; the second lets the server read more requests from the operating system buffer at once and thereby minimizes the number of system calls.

For benchmarks that work with data (insert, delete, read, etc.), a memtx space is (re)created before the benchmark starts. Its primary index values are produced by a sequence, that is, a generator of ordered integer values.
The space DDL looks like this:

space = box.schema.space.create(config.space_name, {id = config.space_id, temporary = true})
space:create_index('primary', {type = 'tree', parts = {1, 'unsigned'}, sequence = true})
space:format({{name = 'id', type = 'unsigned'}, {name = 'name', type = 'string', is_nullable = false}})

If necessary, before running the benchmark, the space is filled with 10,000 tuples of the form

{id, "tuple_<id>"}

Tuples are accessed by a random key value.

The benchmark itself is a single request to the server, executed 10,000 times (revolutions); these revolutions are, in turn, grouped into iterations. The iterations are repeated until the time deviations between all 5 iterations are within a 3%* margin of error. After that, the average result is taken. There is a 1 second pause between iterations to prevent the processor from throttling. The Lua garbage collector is disabled before each iteration and forced to run after it completes. The PHP process is started with only the extensions necessary for the benchmark, with output buffering enabled and the garbage collector disabled.

* The number of revolutions, iterations and the error threshold can be changed in the benchmark settings.
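
To make the stopping rule more concrete, here is a minimal pure-PHP sketch of it. This is not the code the benchmarks actually run. It assumes that "deviation" means the relative distance of each timing from the mean of the last 5 iterations, and the names isStable, benchmark and the $maxIterations safety limit are hypothetical, introduced only for illustration.

```php
<?php

declare(strict_types=1);

/**
 * True when every timing deviates from the mean of $times by at most
 * $threshold (0.03 means 3%). This reading of "deviation" is an assumption.
 */
function isStable(array $times, float $threshold): bool
{
    $mean = array_sum($times) / count($times);
    if ($mean <= 0.0) {
        return true; // all timings are zero, nothing to compare
    }
    foreach ($times as $time) {
        if (abs($time - $mean) / $mean > $threshold) {
            return false;
        }
    }
    return true;
}

/**
 * Runs $operation $revolutions times per iteration and repeats iterations
 * until the last $window timings are stable. Returns their average in seconds.
 * $maxIterations is a hypothetical safety limit, not part of the article.
 */
function benchmark(
    callable $operation,
    int $revolutions = 10000,
    int $window = 5,
    float $threshold = 0.03,
    int $pauseSeconds = 1,
    int $maxIterations = 100
): float {
    gc_disable(); // the article runs PHP with the garbage collector disabled

    $times = [];
    for ($iteration = 0; $iteration < $maxIterations; $iteration++) {
        $start = hrtime(true); // nanoseconds, available since PHP 7.3
        for ($i = 0; $i < $revolutions; $i++) {
            $operation();
        }
        $times[] = (hrtime(true) - $start) / 1e9;

        $recent = array_slice($times, -$window);
        if (count($recent) === $window && isStable($recent, $threshold)) {
            return array_sum($recent) / $window;
        }

        sleep($pauseSeconds); // pause between iterations to avoid throttling
    }

    throw new RuntimeException('Timings did not stabilize');
}
```

Calling benchmark() with a closure that performs one request would return the average time of one iteration; dividing the number of revolutions by that value gives operations per second.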

Test environment

The results published below were obtained on a MacBook Pro (2015) running Fedora 30 (kernel version 5.3.8-200.fc30.x86_64). Tarantool was launched in Docker with the "--network host" parameter.

Package versions:

Tarantool: 2.3.0-115-g5ba5ed37e
Docker: 19.03.3, build a872fc2f86
PHP: 7.3.11 (cli) (built: Oct 22 2019 08:11:04)
tarantool/client: 0.6.0
rybakit/msgpack: 0.6.1
ext-tarantool: 0.3.2 (+ patch for 7.3)*
ext-msgpack: 2.0.3
ext-async: 0.3.0-8c1da46
ext-swoole: 4.4.12
ext-parallel: 1.1.3

* Unfortunately, the official connector does not work with PHP versions newer than 7.2. To compile and run the extension on PHP 7.3, I had to apply a patch.

The results

Synchronous mode

Tarantool's protocol uses the MessagePack binary format to serialize messages. In the PECL connector, serialization is hidden deep inside the library, and it does not seem possible to affect the encoding process from userland code. The pure PHP connector, on the other hand, lets you customize the encoding process by extending the standard encoder or by using your own implementation. Two encoders are available out of the box: one is based on msgpack/msgpack-php (the official MessagePack PECL extension), the other on rybakit/msgpack (pure PHP).

Before comparing connectors, let's measure the performance of the MessagePack encoders for the PHP connector and in further tests we will use the one that shows the best result:

Although the PHP version (Pure) is slower than the PECL extension, in real projects I would still recommend using rybakit/msgpack, because the official MessagePack extension implements the format specification only partially (for example, there is no support for custom data types, without which you cannot use Decimal, a new data type introduced in Tarantool 2.3) and has a number of other problems (including compatibility issues with PHP 7.4). In general, the project looks abandoned.

So, let's measure the performance of connectors in synchronous mode:

As you can see from the graph, the PECL connector (Tarantool) shows better performance than the PHP connector (Client). This is not surprising, given that the latter, in addition to being implemented in a slower language, actually does more work: each call creates new Request and Response objects (plus Criteria in the case of Select, and Operations in the case of Update/Upsert), and the separate Connection, Packer and Handler entities add overhead as well. Obviously, there is a price to pay for flexibility. Overall, however, the PHP interpreter performs well: there is a difference, but it is negligible and will probably become even smaller with preloading in PHP 7.4, not to mention JIT in PHP 8.

Let's move on. Tarantool 2.0 introduced support for SQL. Let's try to execute the Select, Insert, Update and Delete operations using the SQL protocol and compare the results with their NoSQL (binary) equivalents:

The SQL results are not very impressive (recall that we are still testing synchronous mode). However, I would not get upset about this ahead of time: SQL support is still under active development (support for prepared statements, for example, was added relatively recently) and, judging by the list of issues, the SQL engine has a number of optimizations ahead of it.

Async

Well, now let's see how the Async extension can help us improve the results above. To write asynchronous programs, the extension provides an API based on coroutines, and we will use it. Empirically, we find out that the optimal number of coroutines for our environment is 25:

We spread 10,000 operations over 25 coroutines and see what happens:

The number of operations per second increased by more than 3 times for tarantool-php/client!

Sadly, the PECL connector failed to start with ext-async.

And what about SQL?

As you can see, in asynchronous mode the difference between the binary protocol and SQL has shrunk to within the margin of error.

Swoole

Again, we find out the optimal number of coroutines, now for Swoole:
Let's stop at 25. Let's repeat the same trick as with the Async extension and distribute 10,000 operations among 25 coroutines. In addition, we will add another test in which all the work is divided between 2 processes (that is, each process will perform 5,000 operations in 25 coroutines). The processes will be created using Swoole\Process.
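
The distribution itself can be sketched roughly as follows. This is a simplified illustration rather than the actual benchmark code: splitEvenly, runInCoroutines and the $factory callable are hypothetical names, and I assume Swoole 4.4 or newer, where Swoole\Coroutine\Scheduler is available. The factory is expected to create a separate connection for each coroutine and return a callable that performs one request.

```php
<?php

declare(strict_types=1);

/**
 * Splits $total operations into $workers near-equal shares.
 * Hypothetical helper, not part of Swoole or the connectors.
 */
function splitEvenly(int $total, int $workers): array
{
    if ($workers < 1) {
        throw new InvalidArgumentException('At least one worker is required');
    }

    $base = intdiv($total, $workers);
    $extra = $total % $workers;

    $shares = [];
    for ($i = 0; $i < $workers; $i++) {
        // The first $extra workers take one additional operation
        $shares[] = $base + ($i < $extra ? 1 : 0);
    }

    return $shares;
}

/**
 * Runs $total operations spread over $coroutines Swoole coroutines.
 * $factory is called once per coroutine and must return a callable
 * that performs a single request (an assumption of this sketch).
 */
function runInCoroutines(int $total, int $coroutines, callable $factory): void
{
    // Hook blocking stream functions so a synchronous connector
    // yields to other coroutines instead of blocking on I/O
    Swoole\Runtime::enableCoroutine();

    $scheduler = new Swoole\Coroutine\Scheduler();
    foreach (splitEvenly($total, $coroutines) as $share) {
        $scheduler->add(static function (int $share) use ($factory): void {
            $operation = $factory(); // e.g. open one connection per coroutine
            for ($i = 0; $i < $share; $i++) {
                $operation();
            }
        }, $share);
    }

    // Blocks until every coroutine has finished
    $scheduler->start();
}
```

With these numbers, splitEvenly(10000, 25) gives 25 shares of 400 operations each, and in the two-process variant each process would work through splitEvenly(5000, 25), that is, 200 operations per coroutine.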

Results:

Swoole shows a slightly lower result than Async when run in one process, but with 2 processes the picture changes dramatically (the number 2 was not chosen by chance: on my machine, it was 2 processes that showed the best result).

By the way, the Async extension also has an API for working with processes, but there I didn't notice any difference between running the benchmarks in one process and in several (it's possible that I made a mistake somewhere).

SQL vs binary protocol:

As with Async, the difference between binary and SQL operations is leveled out in asynchronous mode.

Parallel

Since the Parallel extension is not about coroutines, but about threads, let's measure the optimal number of parallel threads:

It is 16 on my machine. Let's run connector benchmarks on 16 parallel threads:

As you can see, the result is even better than with the asynchronous extensions (not counting Swoole running in 2 processes). Note that for the PECL connector, the results for the Update and Upsert operations are missing. This is because these operations crashed with an error; I don't know whether ext-parallel, ext-tarantool, or both are to blame.
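
For readers unfamiliar with ext-parallel, here is a rough sketch of how work can be spread over threads with parallel\Runtime. It is an illustration under assumptions, not the benchmark code: runInThreads is a hypothetical helper, $task must be a closure that ext-parallel is able to copy into a thread and that returns a value, and $bootstrap is an optional file (for example, an autoloader) loaded in every thread.

```php
<?php

declare(strict_types=1);

/**
 * Runs $task in $threads parallel threads, passing each one its share
 * of $total operations, and returns the values the tasks returned.
 * Hypothetical helper; requires ext-parallel and a ZTS build of PHP.
 */
function runInThreads(int $total, int $threads, Closure $task, ?string $bootstrap = null): array
{
    $runtimes = [];
    $futures = [];

    for ($i = 0; $i < $threads; $i++) {
        // Near-equal split: the first ($total % $threads) threads take one extra operation
        $share = intdiv($total, $threads) + ($i < $total % $threads ? 1 : 0);

        $runtime = $bootstrap === null
            ? new parallel\Runtime()
            : new parallel\Runtime($bootstrap);

        // Keep a reference to every runtime: destroying one waits for its
        // scheduled tasks, which would make the threads run one after another
        $runtimes[] = $runtime;
        $futures[] = $runtime->run($task, [$share]);
    }

    $results = [];
    foreach ($futures as $future) {
        $results[] = $future->value(); // blocks until the thread's task returns
    }

    foreach ($runtimes as $runtime) {
        $runtime->close();
    }

    return $results;
}
```

In a benchmark, the task would open its own connection inside the thread, perform its share of requests and return, say, the elapsed time. With 16 threads and 10,000 operations, each thread would receive 625 operations.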

Now let's compare SQL performance:

Notice the similarity with the graph for connectors running synchronously?

Together

And finally, let's summarize all the results in one graph to see the big picture for the tested extensions. We will add just one new test to the chart, one we haven't run yet: Async coroutines running in parallel using Parallel*. The idea of integrating these extensions has already been discussed by their authors, but no consensus was reached, so you will have to do it yourself.

* It was not possible to launch Swoole coroutines with Parallel, it seems these extensions are incompatible.

So the final results:

Instead of a conclusion

In my opinion, the results turned out to be quite decent, and for some reason I am sure that this is not the limit! Whether you need this in a real project is entirely up to you. I can only say that for me it was an interesting experiment that shows how much you can "squeeze" out of a synchronous TCP connector with minimal effort. If you have any ideas for improving the benchmarks, I will gladly consider your pull request. All the code, together with run instructions and results, is published in a separate repository.

Source: habr.com
