When Linux conntrack is no longer your friend

Connection tracking (β€œconntrack”) is a core feature of the Linux kernel networking stack. It allows the kernel to keep track of all logical network connections, or flows, and thereby identify all of the packets that make up each flow so that they can be handled consistently together.

Conntrack is an important kernel feature that underpins some key use cases:

  • NAT relies on conntrack information so it can treat all packets from the same flow in the same way. For example, when a pod accesses a Kubernetes service, the kube-proxy load balancer uses NAT to direct the traffic to a specific pod within the cluster. Conntrack records that, for a particular connection, all packets to the service IP must be sent to the same backend pod, and that packets returned by the backend pod must be reverse-NATed back to the pod that made the request.
  • Stateful firewalls such as Calico rely on information from conntrack to whitelist "return" traffic. This allows you to write a network policy that says "allow my pod to connect to any remote IP" without having to write a policy to explicitly allow return traffic. (Without this, you would have to add the much less secure "allow packets to my pod from any IP" rule.)

In addition, conntrack generally improves system performance (reducing CPU consumption and packet latency), because only the first packet in a flow must go through the full processing of the network stack to determine what to do with it. See the post "Comparison of kube-proxy modes" for an example of how this works.
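As a quick aside, if you want to see what conntrack is tracking on a host, the conntrack command-line tool (from the conntrack-tools package) lets you inspect the flow table directly. A minimal sketch:

    # List a sample of the flows the kernel is currently tracking.
    conntrack -L | head

    # Show only flows that have been DNATed, e.g. by kube-proxy when
    # load balancing a Kubernetes service.
    conntrack -L --dst-nat

    # Per-CPU statistics; growing insert_failed or drop counters are a
    # classic sign of a conntrack table under pressure.
    conntrack -S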

However, conntrack has its limitations...

So where did it all go wrong?

The conntrack table has a configurable maximum size and, if it fills up, connections will typically start to be rejected or dropped. For most applications' traffic there is plenty of headroom in the table, and it will never become a problem. However, there are a few scenarios in which the conntrack table deserves a closer look:

  • The most obvious case is if your server is handling an extremely large number of concurrently active connections. For example, if your conntrack table is set to 128k entries, but you have > 128k concurrent connections, you will surely run into a problem!
  • A slightly less obvious case is if your server handles a very high number of connections per second. Even if the connections are short-lived, Linux continues to track them for some period of time (120s by default). For example, if your conntrack table is set to 128k entries and you are trying to handle 1,100 connections per second, you will exceed the size of the conntrack table even though the connections are very short-lived (128k / 120s ≈ 1,092 connections/s).

There are several niche application types that fall into these categories. In addition, if you are exposed to malicious actors, flooding your server's conntrack table with lots of half-open connections can be used as part of a denial-of-service (DoS) attack. In both cases, conntrack can become the limiting bottleneck in your system. In some cases, tweaking the conntrack table's parameters may be enough for your needs: increasing its size or reducing the conntrack timeouts (but if you get it wrong, you are in for a lot of trouble). In other cases, it will be necessary to bypass conntrack for the offending traffic.
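For reference, these knobs live under net.netfilter and can be inspected and changed with sysctl. A minimal sketch; the values below are illustrative assumptions, not recommendations:

    # How full is the table right now?
    sysctl net.netfilter.nf_conntrack_count   # entries currently tracked
    sysctl net.netfilter.nf_conntrack_max     # maximum table size

    # Example tuning: enlarge the table and shorten the TIME_WAIT
    # tracking timeout (the 120s default mentioned above).
    sysctl -w net.netfilter.nf_conntrack_max=1048576
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30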

A real example

To give a concrete example, one large SaaS provider we worked with ran a number of memcached servers on bare-metal hosts (not virtual machines), each of which handled 50k+ short-lived connections per second.

They experimented with the conntrack configuration, increasing the table size and reducing the tracking timeouts, but the configuration proved unreliable, RAM consumption grew significantly (on the order of gigabytes!), and the connections were so short-lived that conntrack did not deliver its usual performance gain (reduced CPU consumption or packet latency).

As an alternative, they turned to Calico. Calico network policy lets you bypass conntrack for specific types of traffic (using the doNotTrack option on a policy). This gave them the level of performance they needed, plus the extra layer of security that Calico provides.

What will you have to do to bypass conntrack?

  • Do-not-track network policies generally need to be symmetrical: since conntrack no longer identifies the return traffic for you, the policy must explicitly allow both directions (a sketch of such a policy follows this list). In the SaaS provider's case, their applications ran inside a protected zone, so they could use network policy to whitelist traffic from the other specific applications that were allowed to access memcached.
  • The do-not-track policy does not take the direction of the connection into account. So if the memcached server were ever compromised, it could in theory try to connect back to any of the memcached clients, as long as it uses the right source port. However, if you have defined network policy correctly for your memcached clients as well, these connection attempts will still be rejected on the client side.
  • A do-not-track policy is applied to every packet, in contrast to regular policies, which are applied only to the first packet in a flow. This can increase per-packet CPU usage, since the policy must be evaluated for each packet. For short-lived connections, however, this overhead is offset by the savings in conntrack processing. In the SaaS provider's case, for example, the number of packets per connection was very small, so the extra CPU cost of applying policy to every packet was justified.
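To make the symmetry requirement concrete, here is a minimal sketch of what such a policy could look like as a Calico GlobalNetworkPolicy. The labels and selectors are hypothetical, and memcached's standard port 11211 is assumed; note that doNotTrack policies apply to host endpoints and also require applyOnForward: true:

    calicoctl apply -f - <<EOF
    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkPolicy
    metadata:
      name: memcached-no-conntrack
    spec:
      # Hypothetical label selecting the memcached host endpoints.
      selector: role == 'memcached'
      doNotTrack: true
      applyOnForward: true
      ingress:
        # Allow requests from memcached clients to the memcached port.
        - action: Allow
          protocol: TCP
          source:
            selector: role == 'memcached-client'
          destination:
            ports: [11211]
      egress:
        # The symmetric rule: with conntrack bypassed, return traffic
        # (source port 11211 back to the clients) must be allowed
        # explicitly.
        - action: Allow
          protocol: TCP
          source:
            ports: [11211]
          destination:
            selector: role == 'memcached-client'
    EOF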

Let's get to the tests

We ran a test with a single memcached server pod and multiple memcached client pods running on remote nodes, so that we could drive a very high number of connections per second. The server hosting the memcached pod had 8 cores and 512k entries in the conntrack table (the default configured table size for the host).
We measured the performance difference between: no network policy; a regular Calico policy; and a Calico do-not-track policy.
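The article does not say which load generator was used; purely as an illustration, a tool such as memtier_benchmark can produce this kind of connection churn, since its --reconnect-interval flag forces clients to open a fresh connection after a given number of requests:

    # Hypothetical invocation: the server address and parameters are
    # placeholders, not the setup used in the tests described here.
    memtier_benchmark --server=<memcached-pod-ip> --port=11211 \
      --protocol=memcache_text --threads=8 --clients=50 \
      --reconnect-interval=1 --test-time=60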

For the first test, we limited the rate to 4,000 connections per second so that we could focus on the difference in CPU consumption. There was no significant difference between no policy and normal policy here, but do-not-track reduced CPU consumption by about 20%:

[Figure: CPU usage at 4,000 connections per second with no policy, normal policy, and do-not-track policy]

In the second test, we launched as many connections as our clients could generate and measured the maximum number of connections per second our memcached server could handle. As expected, the "no policy" and "normal policy" cases both hit the conntrack limit at just over 4,000 connections per second (512k / 120s ≈ 4,369 connections/s). With the do-not-track policy, our clients reached 60,000 connections per second without any problems. We are sure we could have pushed this number higher by adding more clients, but we feel these numbers are already enough to illustrate the point of this article!

[Figure: maximum connections per second with no policy, normal policy, and do-not-track policy]

Conclusion

Conntrack is an important kernel feature. It does its job well, and key system components depend on it. However, in certain specific scenarios, the overhead of conntrack outweighs the benefits it normally brings. In those scenarios, Calico network policy can be used to selectively disable conntrack while also improving network security. For all other traffic, conntrack remains your friend!
