Network driver performance comparison in 10 programming languages

A group of researchers from German universities has published the results of an experiment in which 10 variants of a typical driver for 10-gigabit Intel Ixgbe (X5xx) network cards were developed in different programming languages. The driver runs in user space and is implemented in C, Rust, Go, C#, Java, OCaml, Haskell, Swift, JavaScript, and Python. The code was written for the highest possible performance, taking the characteristics of each language into account. All variants are functionally identical and consist of approximately 1000 lines of code each. The project's code is distributed under the BSD license.

The Rust version of the driver came very close in performance to the reference C driver. When packets were sent in blocks of 32, the Rust driver lagged slightly behind, but in tests with more than 32 packets per block its speed was practically indistinguishable from that of the C driver, reaching 28 million packets per second on a server with a Xeon E3-1230 v2 3.3 GHz CPU.
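To make "packets per block" concrete, here is a minimal sketch of a forwarding loop that moves packets in batches. `FakeQueue`, `rx_batch`, `tx_batch`, and `forward` are hypothetical names invented for this illustration; they are not taken from the drivers in the study, and the in-memory queue only stands in for real hardware queues.

```python
from collections import deque


class FakeQueue:
    """Hypothetical in-memory stand-in for a NIC queue (not a real driver API)."""

    def __init__(self, packets=()):
        self.packets = deque(packets)
        self.calls = 0  # number of batch operations performed on this queue

    def rx_batch(self, batch_size):
        """Return up to batch_size packets in a single call."""
        self.calls += 1
        count = min(batch_size, len(self.packets))
        return [self.packets.popleft() for _ in range(count)]

    def tx_batch(self, batch):
        """Accept a whole batch in a single call and return its size."""
        self.calls += 1
        self.packets.extend(batch)
        return len(batch)


def forward(rx, tx, batch_size):
    """Move every packet from rx to tx, batch_size packets per call."""
    forwarded = 0
    while True:
        batch = rx.rx_batch(batch_size)
        if not batch:
            break  # receive queue is empty
        forwarded += tx.tx_batch(batch)
    return forwarded
```

In this toy model a larger block size means fewer receive and transmit calls for the same number of packets, so any fixed per-call cost is spread over more packets. That is one plausible reason block size matters in the measurements, though the sketch says nothing about the actual drivers.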

The next tier in terms of performance was occupied by the Go and C# drivers, which showed fairly close results (the Go driver won in tests with blocks of up to 16 packets and began to lose slightly in tests with more than 16 packets per block). With 256 packets per block, peak performance was approximately 28 Mpps for the C# driver and approximately 25 Mpps for the Go driver.

Next, with fairly close results, came the drivers for Java, OCaml and Haskell, which were already noticeably behind the variants described above and could not get past 12 million packets per second. The Swift and JavaScript drivers lagged even further, processing about 5 million packets per second.

The Python driver came last, processing only 0.14 million packets per second. The Python implementation was used to evaluate the speed of interpreters without JIT and without specific optimizations (the code was run with CPython 3.7 and was not compatible with PyPy, although it is noted that optimizing the data structures could improve performance roughly tenfold).

Additionally, latency tests were carried out, which showed the effectiveness of buffering and the impact of the garbage collector. The test measured the delay of each packet forwarded by the driver relative to its known send time. The leaders were again the C and Rust drivers, whose results were almost indistinguishable for a flow of 1 million packets per second (about 20 µs). The Go driver also performed well: it was only slightly behind the leaders and likewise stayed at about 20 µs. The C# driver showed delays of about 50 µs. The JavaScript and Java drivers showed the largest delays (over 300 µs).
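The latency measurement described above can be sketched as follows: subtract each packet's known send time from its receive time, then summarize the samples. This is only an illustration. The function names, the nanosecond timestamps, and the choice of median, 99th percentile, and maximum are assumptions, since the text does not say which statistics the researchers reported.

```python
import math
import statistics


def latencies_us(send_times_ns, recv_times_ns):
    """Per-packet latency in microseconds from paired nanosecond timestamps."""
    return [(recv - send) / 1000 for send, recv in zip(send_times_ns, recv_times_ns)]


def summarize(samples_us):
    """Median, nearest-rank 99th percentile, and maximum of latency samples."""
    ordered = sorted(samples_us)
    # Nearest-rank percentile: smallest sample with at least 99% at or below it.
    p99_index = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return {
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }
```

Looking at the tail as well as the median is useful here because an occasional pause, such as a garbage collection cycle, can leave the median almost unchanged while pushing the highest percentiles up sharply.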

The study was carried out to evaluate the feasibility of developing drivers and operating system components in languages of a higher level than C. Currently, 39 out of 40 memory-related problems in Linux are driver-related, so the questions of adopting a safer language and moving drivers out of the kernel into user space remain relevant. Manufacturers are already actively experimenting in this direction (for example, Google has developed a TCP stack for the Fuchsia OS in Go, Cloudflare has created an implementation of the QUIC protocol in Rust, and Apple has moved the TCP stack on mobile devices to user space).

The researchers concluded that Rust is the best candidate for driver development. The features Rust provides eliminate the problems caused by low-level memory handling, at the cost of a performance penalty of about 2-10% compared to drivers written in C. Go and C# were also found suitable for building system components in situations where sub-millisecond latency caused by the garbage collector is acceptable.

Source: opennet.ru
