Kubernetes Networking Plugin (CNI) Benchmark Results over 10 Gbps Network (updated April 2019)

This is an update of my previous benchmark, now running on Kubernetes 1.14 with the latest CNI versions as of April 2019.

Firstly, I want to thank the Cilium team: the guys helped me check and fix the metrics monitoring scripts.

What has changed since November 2018

Here's what has changed since then (if you're interested):

Flannel remains the fastest and simplest CNI but still lacks network policy and encryption support.

Romana is no longer maintained, so we've removed it from the benchmark.

WeaveNet now supports network policies for Ingress and Egress! But performance has dropped.

With Calico, you still need to manually adjust the maximum packet size (MTU) for best performance. Calico now offers two installation options that remove the need for a separate etcd store:

  • storing state in the Kubernetes API as the data store (cluster size < 50 nodes);
  • storing state in the Kubernetes API with a Typha proxy to offload the Kubernetes API (cluster size > 50 nodes).

Calico has announced support for application-layer policies on top of Istio for application-layer security.

Cilium now supports encryption! It provides encryption via IPsec tunnels and is an alternative to the encrypted WeaveNet network. However, with encryption enabled, WeaveNet is faster than Cilium.

Cilium is now easier to deploy thanks to the built-in ETCD operator.

The Cilium team has worked to slim down their CNI by reducing memory and CPU usage, but the competition is still lighter.

Benchmark context

The benchmark is run on three non-virtualized Supermicro servers with a 10 Gb Supermicro switch. The servers are connected directly to the switch via SFP+ passive DAC cables and configured on the same VLAN with jumbo frames (MTU 9000).

Kubernetes 1.14.0 is installed on Ubuntu 18.04 LTS with Docker 18.09.2 (the default version of Docker in this release).

To improve reproducibility, we decided to always set up the master on the first node, host the back end of the benchmark on the second server, and the client side on the third. To do this, we use a nodeSelector in the Kubernetes deployments.
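
For illustration, a minimal sketch of that pinning might look like the following; the node hostname, labels and the iperf3 image are placeholders, not the exact manifests used in the benchmark.

    # Sketch: pin the benchmark back end to the second server with a nodeSelector.
    # Apply with: kubectl apply -f benchmark-server.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: benchmark-server
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: benchmark-server
      template:
        metadata:
          labels:
            app: benchmark-server
        spec:
          nodeSelector:
            kubernetes.io/hostname: node2   # the back end always lands on the second server
          containers:
            - name: iperf3
              image: networkstatic/iperf3   # example image; any iperf3 image will do
              args: ["-s"]                  # run iperf3 in server mode
              ports:
                - containerPort: 5201

The client deployment is pinned to the third node in the same way.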

The benchmark results will be described according to the following scale:

[Chart: results rating scale]

Choosing a CNI for a Benchmark

This benchmark covers only the CNIs listed in the official Kubernetes documentation, in the section on creating a single-master cluster with kubeadm. Of the nine CNIs listed there, we will take only six: we exclude those that are difficult to install and/or do not work out of the box when following the documentation (Romana, Contiv-VPP and Juniper Contrail / Tungsten Fabric).

We will compare the following CNIs:

  • Calico v3.6
  • Canal v3.6 (essentially Flannel for networking + Calico as a firewall)
  • Cilium 1.4.2
  • Flannel 0.11.0
  • kube-router 0.2.5
  • WeaveNet 2.5.1

Installation

The easier it is to install the CNI, the better our first impression will be. All CNIs from the benchmark are very easy to install (with one or two commands).

As we said, the servers and the switch are configured with jumbo frames enabled (MTU 9000). We would be happy if each CNI automatically determined the MTU based on the network adapter configuration. However, only Cilium and Flannel managed this. The other CNIs have open GitHub issues requesting automatic MTU discovery; for now we configure the MTU manually, by changing the ConfigMap for Calico, Canal and Kube-router, or by passing an environment variable for WeaveNet.
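
To give an idea of what that manual tuning looks like, here is a rough sketch. The key names shown (veth_mtu for Calico/Canal, WEAVE_MTU for WeaveNet) come from the upstream manifests and documentation, but the exact ConfigMap names and overhead values depend on the manifest version and encapsulation mode you use, so double-check them for your setup.

    # Calico / Canal: raise veth_mtu in the CNI ConfigMap, then recreate the calico-node pods.
    # 8980 = 9000-byte jumbo frame minus 20 bytes of IP-in-IP overhead (use 9000 with no encapsulation).
    kubectl -n kube-system patch configmap calico-config \
      --type merge -p '{"data":{"veth_mtu":"8980"}}'
    kubectl -n kube-system delete pod -l k8s-app=calico-node

    # WeaveNet: the MTU is passed as an environment variable on the weave container.
    # 8916 = 9000 minus the overhead of Weave's fast-datapath encapsulation.
    kubectl -n kube-system set env daemonset/weave-net -c weave WEAVE_MTU=8916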

What is the problem with wrong MTU? This diagram shows the difference between WeaveNet with default MTU and jumbo frames enabled:

[Chart: how the MTU setting affects throughput]

We have seen how important the MTU is for performance; now let's see how well each CNI determines it automatically:

[Chart: which CNIs detect the MTU automatically]

The graph shows that you need to adjust the MTU for Calico, Canal, Kube-router and WeaveNet for optimal performance. Cilium and Flannel were able to correctly determine the MTU themselves without any settings.

Security

We will compare CNI security in two aspects: the ability to encrypt transmitted data and the implementation of Kubernetes network policies (according to real tests, not documentation).

Only two CNIs encrypt data: Cilium and WeaveNet. WeaveNet encryption is enabled by setting an encryption password as an environment variable on the CNI; the WeaveNet documentation makes this sound complicated, but it is actually very simple. Cilium encryption is configured with commands, by creating Kubernetes secrets and modifying the DaemonSet (a bit more involved than WeaveNet, but Cilium provides step-by-step instructions).
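
As a sketch of the WeaveNet side (WEAVE_PASSWORD is the variable documented by Weave; in practice you would source it from a Kubernetes Secret with a secretKeyRef rather than set it inline):

    # Sketch: WeaveNet starts encrypting traffic once every weave pod has the same password.
    # Setting it on the DaemonSet spec propagates the same value to all nodes.
    kubectl -n kube-system set env daemonset/weave-net -c weave \
      WEAVE_PASSWORD="$(openssl rand -hex 32)"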

As for network policy implementations, Calico, Canal, Cilium and WeaveNet do well: you can configure both Ingress and Egress rules. Kube-router supports only Ingress rules, and Flannel has no network policies at all.
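
To make the difference concrete, here is a minimal policy with one Ingress and one Egress rule (standard Kubernetes API; the labels and ports are made up). Flannel would simply ignore it, and kube-router would enforce only the ingress half:

    # Sketch: allow traffic into "backend" only from "frontend", and out only to "database".
    # Apply with: kubectl apply -f backend-policy.yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: backend-policy
    spec:
      podSelector:
        matchLabels:
          app: backend
      policyTypes: ["Ingress", "Egress"]
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  app: frontend
          ports:
            - protocol: TCP
              port: 8080
      egress:
        - to:
            - podSelector:
                matchLabels:
                  app: database
          ports:
            - protocol: TCP
              port: 5432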

Here are the overall results:

[Chart: security benchmark results]

Performance

This benchmark shows the average throughput over at least three runs of each test. We test TCP and UDP performance (with iperf3), real applications such as HTTP (with Nginx and curl) and FTP (with vsftpd and curl), and finally the performance of applications that use encryption, via SCP (with the OpenSSH client and server).
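
Roughly, the measurements boil down to commands like these, run between pods pinned to the two worker nodes (the flags, durations and file names are assumptions, not the author's actual scripts):

    # TCP and UDP throughput with iperf3 (the server pod on the other node runs `iperf3 -s`).
    iperf3 -c "$SERVER_POD_IP" -t 60              # TCP
    iperf3 -c "$SERVER_POD_IP" -t 60 -u -b 10G    # UDP, trying to saturate the 10 Gbps link

    # "Real application" tests: transfer a large file over HTTP, FTP and SCP and record the rate.
    curl -o /dev/null http://"$SERVER_POD_IP"/bigfile.bin
    curl -o /dev/null ftp://user:pass@"$SERVER_POD_IP"/bigfile.bin
    scp user@"$SERVER_POD_IP":/data/bigfile.bin /dev/null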

For all tests, we also ran a bare-metal benchmark (green line) to compare CNI performance with native network performance. The charts use the same rating scale, expressed in colors:

  • Yellow = very good
  • Orange = good
  • Blue = so-so
  • Red = bad

We will not include misconfigured CNIs and will only show results for CNIs with the correct MTU. (Note: Cilium does not calculate the MTU correctly when encryption is enabled, so in version 1.4 you have to manually reduce it to 8900; version 1.5 does this automatically.)

Here are the results:

[Chart: TCP performance]

All CNIs performed well in the TCP benchmark. Encrypted CNIs are far behind because encryption is expensive.

[Chart: UDP performance]

Here, too, all CNIs do well, and the CNIs with encryption show almost the same results as the rest. Cilium is slightly behind the competition, but only by 2.3% relative to bare metal, so the result is not bad. Don't forget that only Cilium and Flannel determined the MTU correctly by themselves, and these are their results without any extra tweaking.

[Chart: HTTP performance]

How about a real application? As you can see, overall HTTP performance is slightly lower than raw TCP. Even though HTTP runs on top of TCP, in the TCP benchmark we configured iperf3 to avoid TCP slow start, which does affect the HTTP benchmark. Everyone did well here: Kube-router has a clear advantage, while WeaveNet did not perform well, at about 20% below bare metal. Cilium and WeaveNet with encryption look quite sad.

[Chart: FTP performance]

With FTP, another TCP-based protocol, the results vary. Flannel and Kube-router hold up well, while Calico, Canal and Cilium fall a little behind, about 10% slower than bare metal. WeaveNet falls short by as much as 17%, but encrypted WeaveNet is about 40% ahead of encrypted Cilium.

[Chart: SCP performance]

With SCP, you can immediately see what SSH encryption costs us. Almost all CNIs do well, and WeaveNet falls behind again. Cilium and WeaveNet with encryption are, as expected, the worst off because of double encryption (SSH + CNI).

Here is a summary table with the results:

[Table: performance benchmark summary]

Resource consumption

Now let's compare how the CNIs consume resources under heavy load (during a 10 Gbps TCP transfer). In the performance tests we compared the CNIs to bare metal (green line); for resource consumption we compare against a bare Kubernetes install without any CNI (purple line) to see how much extra each CNI consumes.

Let's start with memory. Here is the average node RAM usage (excluding buffers and cache), in MB, during the transfer.
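
A value like this can be sampled on each node during the transfer with something along these lines (a sketch, not the author's monitoring script):

    # RAM used excluding buffers and cache, in MB, sampled once per second.
    while true; do
      awk '/MemTotal:|MemFree:|^Buffers:|^Cached:/ {m[$1]=$2}
           END {printf "%.0f MB\n", (m["MemTotal:"]-m["MemFree:"]-m["Buffers:"]-m["Cached:"])/1024}' /proc/meminfo
      sleep 1
    done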

[Chart: memory consumption]

Flannel and Kube-router showed excellent results: only about 50 MB. Calico and Canal use about 70 MB each. WeaveNet clearly consumes more than the rest, around 130 MB, and Cilium uses as much as 400 MB.

Now let's check CPU time consumption. Note: the chart shows per-mille values (parts per thousand), not percentages; 38‰ for bare metal means 3.8%. Here are the results:

[Chart: CPU consumption]

Calico, Canal, Flannel, and Kube-router are very CPU efficient - only 2% more than Kubernetes without CNI. WeaveNet is far behind with an extra 5%, followed by Cilium with 7%.

Here is the resource consumption summary:

[Table: resource consumption summary]

Results

Table with all results:

[Table: overall benchmark results]

Conclusion

In this last part, I will give my subjective opinion about the results. Keep in mind that this benchmark only tests the throughput of a single connection on a very small cluster (3 nodes). It does not tell us anything about large clusters (> 50 nodes) or concurrent connections.

I advise using the following CNIs depending on the scenario:

  • If your cluster has nodes with few resources (a few GB of RAM, a few cores) and you don't need security features, choose Flannel. It is one of the most economical CNIs and is compatible with a wide variety of architectures (amd64, arm, arm64, etc.). In addition, it is one of only two CNIs (the other being Cilium) that can determine the MTU automatically, so you don't have to configure anything. Kube-router is also suitable, but it is less standard and you will need to configure the MTU manually.
  • If you need to encrypt the network for security, take WeaveNet. Don't forget to set the MTU if you use jumbo frames, and enable encryption by specifying the password via an environment variable. But forget about performance: that is the price of encryption.
  • For general use I recommend Calico. This CNI is widely used in various Kubernetes deployment tools (Kops, Kubespray, Rancher, etc.). As with WeaveNet, don't forget to set the MTU in the ConfigMap if you are using jumbo frames. It is a versatile tool that is efficient in terms of resource consumption, performance and security.

And finally, I advise you to follow the development of Cilium. This CNI has a very active team that works a lot on the product (features, resource savings, performance, security, clustering…) and they have very interesting plans.

[Chart: visual guide for CNI selection]

Source: habr.com
