Calico for networking in Kubernetes: an introduction and a bit of experience

The purpose of this article is to introduce the reader to the basics of networking and network policy management in Kubernetes, as well as to the third-party Calico plugin that extends the standard features. Along the way, the convenience of its configuration and some of its features will be demonstrated with real examples from our operations experience.

A Quick Introduction to Kubernetes Networking

A Kubernetes cluster is unthinkable without a network. We have already published materials on the basics: "An Illustrated Guide to Networking in Kubernetes" and "An Introduction to Kubernetes Network Policies for Security Professionals".

In the context of this article, it is important to note that K8s itself is not responsible for network connectivity between containers and nodes: this is the job of the various CNI (Container Network Interface) plugins. We have also written about this concept in more detail.

For example, the most common of these plugins is Flannel. It provides full network connectivity between all cluster nodes by raising bridges on each node and assigning it a subnet. However, full and unrestricted reachability is not always beneficial. To provide at least minimal isolation within the cluster, you have to intervene in the firewall configuration. In the general case the firewall is managed by the same CNI, which is why any third-party intervention in iptables may be interpreted incorrectly or ignored altogether.

And "out of the box" for organizing network policy management in a Kubernetes cluster is provided NetworkPolicy API. This resource, which spans selected namespaces, may contain rules to restrict access from one application to another. It also allows you to configure accessibility between specific pods, environments (namespaces) or blocks of IP addresses:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978

This not-so-primitive example from the official documentation may discourage you once and for all from trying to understand the logic of network policies. Nevertheless, let's try to grasp the basic principles and methods of handling traffic flows with network policies...

Logically, there are 2 types of traffic: entering the pod (Ingress) and leaving it (Egress).

Accordingly, policies are divided into these 2 categories based on the direction of traffic.

The next required attribute is a selector: the one to whom the rule applies. This can be a pod (or a group of pods) or an environment (i.e. a namespace). An important detail: both types of these objects must carry a label (label in Kubernetes terminology), since policies operate on labels.

In addition to a finite set of selectors united by some label, it is also possible to write rules like "Allow/deny everything/to everyone" in different variations. Constructions of the following form are used for this:

  podSelector: {}
  ingress: []
  policyTypes:
  - Ingress

- in this example, incoming traffic is blocked for all pods of the namespace. The opposite behavior can be achieved with the following construction:

  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress

Similarly for outgoing:

  podSelector: {}
  policyTypes:
  - Egress

- to deny it. And here is how to allow it:

  podSelector: {}
  egress:
  - {}
  policyTypes:
  - Egress

Returning to the choice of a CNI plugin for the cluster, it is worth noting that not every network plugin supports NetworkPolicy. For example, the already mentioned Flannel cannot configure network policies, which is stated bluntly in the official repository. An alternative is mentioned there as well: the Open Source project Calico, which significantly extends the standard set of Kubernetes APIs in terms of network policies.

Meet Calico: Theory

The Calico plugin can be used either in integration with Flannel (the Canal subproject) or on its own, covering both network connectivity and reachability management.

What does the "boxed" K8s solution provide, and what does the API set from Calico add?

Here's what's built into the NetworkPolicy:

  • policies are scoped to a namespace;
  • policies are applied to pods marked with labels;
  • rules can be applied to pods, environments, or subnets;
  • rules can contain protocols and numeric or named port references (a sketch with a named port follows this list).
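
To illustrate the last point, here is a minimal sketch of a policy that references a named port instead of a number. The policy name, labels, and the metrics port name are made up for illustration; the sketch assumes the selected pods declare a containerPort named metrics:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-named-port
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: monitoring
    ports:
    - protocol: TCP
      # a named container port; it is resolved individually for each target pod
      port: metrics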

And here is how Calico extends these features:

  • policies can be applied to any object: pod, container, virtual machine, or interface;
  • rules can contain a specific action (deny, allow, logging; see the sketch after this list);
  • the target or source of a rule can be a port, a range of ports, protocols, HTTP or ICMP attributes, an IP address or subnet (IPv4 or IPv6), or almost any selectors (pods, hosts, environments);
  • additionally, you can regulate the passage of traffic with DNAT settings and traffic forwarding policies.
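
As a hedged sketch of the "specific action" and port-range capabilities (the policy name, label, and port values below are ours, not from any real setup), a Calico policy can log a packet before denying it and can allow a whole port range in one rule:

apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: log-and-deny-ssh
  namespace: default
spec:
  selector: role == 'backend'
  types:
  - Ingress
  ingress:
  # log the packet, then continue to the next rule
  - action: Log
    protocol: TCP
    destination:
      ports: [22]
  - action: Deny
    protocol: TCP
    destination:
      ports: [22]
  # a port range is written as "min:max"
  - action: Allow
    protocol: TCP
    destination:
      ports: ["8080:8090"]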

The first commits in the Calico repository on GitHub date back to July 2016, and a year later the project took a leading position in organizing Kubernetes network connectivity; this is evidenced, for example, by the results of a survey conducted by The New Stack.

Many large managed K8s solutions, such as Amazon EKS, Azure AKS, Google GKE, and others, began to recommend it for use.

In terms of performance, everything is great here. When testing their product, the Calico development team demonstrated astronomical performance, running more than 50,000 containers on 500 physical nodes with a creation rate of 20 containers per second. No scaling issues were identified. These results were announced as far back as the first version. Independent studies focused on throughput and resource consumption also confirm that Calico's performance is almost on par with Flannel's.

The project is developing very quickly; it is supported in popular managed K8s solutions, OpenShift, and OpenStack, it can be used when deploying clusters with common provisioning tools, and there are references to building Service Mesh networks (here is an example of use together with Istio).

Practice with Calico

In the general case of vanilla Kubernetes, installing the CNI boils down to applying the calico.yaml file, downloaded from the official site, with kubectl apply -f.

As a rule, the current version of the plugin is compatible with the latest 2-3 versions of Kubernetes: operation on older versions is not tested and is not guaranteed. According to the developers, Calico runs on Linux kernel 3.10 or newer with CentOS 7, Ubuntu 16, or Debian 8, on top of iptables or IPVS.

Isolation within the environment

For a general understanding, let's look at a simple case that shows how network policies in Calico notation differ from the standard ones and how this approach to writing rules improves their readability and configuration flexibility.

There are 2 web applications deployed in the cluster, one on Node.js and one on PHP, one of which uses Redis. To block access to Redis from PHP while keeping it accessible from Node.js, just apply the following policy:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-redis-nodejs
spec:
  podSelector:
    matchLabels:
      service: redis
  ingress:
  - from:
    - podSelector:
        matchLabels:
          service: nodejs
    ports:
    - protocol: TCP
      port: 6379

In essence, we allowed incoming traffic to the Redis port from Node.js, and we clearly did not forbid anything else. As soon as a NetworkPolicy appears, all selectors mentioned in it become isolated unless specified otherwise. At the same time, the isolation rules do not apply to other objects that are not covered by the selector.

The example uses the out-of-the-box Kubernetes apiVersion, but nothing prevents you from using the resource of the same name from the Calico distribution. The syntax there is more verbose, so the rule for the case above has to be rewritten in the following form:

apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: allow-redis-nodejs
spec:
  selector: service == 'redis'
  ingress:
  - action: Allow
    protocol: TCP
    source:
      selector: service == 'nodejs'
    destination:
      ports:
      - 6379

The constructions shown above for allowing or denying all traffic through the regular NetworkPolicy API are hard to read and remember with all their braces and brackets. With Calico, to invert the logic of a firewall rule it is enough to change action: Allow to action: Deny.
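
For comparison, here is a minimal sketch of the "all incoming traffic" construction in Calico notation (the policy name and namespace are ours); flipping the single action field between Allow and Deny is all it takes to switch the behavior:

apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: all-ingress
  namespace: default
spec:
  selector: all()
  types:
  - Ingress
  ingress:
  # change Allow to Deny to block all incoming traffic instead
  - action: Allow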

Environment isolation

Now imagine a situation where an application generates business metrics that are collected by Prometheus and analyzed further in Grafana. The metrics export may contain sensitive data, which by default is again visible to everyone. Let's hide this data from prying eyes.

Prometheus is usually placed in a separate service environment; in this example it will be a namespace like this:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    module: prometheus
  name: kube-prometheus

The metadata.labels field is not there by accident. As mentioned above, namespaceSelector (like podSelector) operates on labels. Therefore, to allow metrics to be collected from all pods on a specific port, you will have to add some label (or take an existing one) and then apply a configuration like:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-prom
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          module: prometheus
    ports:
    - protocol: TCP
      port: 9100

And in the case of using Calico policies, the syntax will be as follows:

apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-prom
spec:
  ingress:
  - action: Allow
    protocol: TCP
    source:
      namespaceSelector: module == 'prometheus'
    destination:
      ports:
      - 9100

In general, by adding this kind of policy for specific needs, you can protect against malicious or accidental interference with the operation of applications in a cluster.

According to the creators of Calico, the best practice is the "deny everything and explicitly allow what you need" approach, documented in the official documentation (others follow a similar approach, in particular the article already mentioned).
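
As a rough sketch of that approach, and not a drop-in configuration (the policy name is ours, and the namespaceSelector relies on the projectcalico.org/name label that Calico adds to namespaces, which is worth verifying for your version), a cluster-wide default deny that spares the system namespaces could look like this:

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: default-deny
spec:
  # a high (low-precedence) order so explicit Allow policies are evaluated first
  order: 1000
  # do not police system namespaces to avoid breaking the cluster itself
  namespaceSelector: "projectcalico.org/name not in {'kube-system', 'calico-system'}"
  types:
  - Ingress
  - Egress

Explicit Allow policies, like the ones shown above, are then added on top for the traffic you actually need.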

Applying Additional Calico Objects

Let me remind you that the extended set of Calico APIs allows you to control the reachability of nodes, not just pods. In the following example, GlobalNetworkPolicy is used to block ICMP requests in the cluster (for example, pings from a pod to a node, between pods, or from a node to a pod's IP):

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: block-icmp
spec:
  order: 200
  selector: all()
  types:
  - Ingress
  - Egress
  ingress:
  - action: Deny
    protocol: ICMP
  egress:
  - action: Deny
    protocol: ICMP

In the case above, cluster nodes can still "reach" each other via ICMP. This issue is solved by means of a GlobalNetworkPolicy applied to a HostEndpoint entity:

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: deny-icmp-kube-02
spec:
  selector: "role == 'k8s-node'"
  order: 0
  ingress:
  - action: Deny
    protocol: ICMP
  egress:
  - action: Deny
    protocol: ICMP
---
apiVersion: crd.projectcalico.org/v1
kind: HostEndpoint
metadata:
  name: kube-02-eth0
  labels:
    role: k8s-node
spec:
  interfaceName: eth0
  node: kube-02
  expectedIPs: ["192.168.2.2"]

VPN case

Finally, here is a very real example of using Calico features for near-cluster interaction, where the standard set of policies is not enough. Clients use a VPN tunnel to access a web application, and this access is tightly controlled and limited to a specific list of allowed services.

Clients connect to the VPN through the standard UDP port 1194 and, upon connection, receive routes to the cluster's pod and service subnets. Whole subnets are pushed so that services are not lost during restarts and address changes.

The standard port in the configuration imposes some nuances on configuring the application and moving it to the Kubernetes cluster. For example, in AWS, LoadBalancer support for UDP appeared only at the end of last year and only in a limited list of regions, while NodePort cannot be used due to its forwarding on all cluster nodes and the impossibility of scaling the number of server instances for fault tolerance; on top of that, you would have to change the default port range...

After going through the possible solutions, the following was chosen:

  1. Pods with the VPN are scheduled one per node in hostNetwork mode, i.e. bound to the node's actual IP.
  2. The service is exposed externally through ClusterIP. A port is physically opened on the node and is accessible from the outside with minor reservations (such as the conditional presence of a real IP address). A rough sketch of these two steps follows this list.
  3. Determining which node the pod came up on is beyond the scope of this story. Let me just say that you can hard-pin the service to a node or write a small sidecar service that monitors the current IP address of the VPN service and edits the DNS records registered with clients; whatever your imagination allows.
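
A minimal sketch of steps 1 and 2 might look like this (all names, the image, and the replica count are ours for illustration; the actual VPN server configuration is out of scope):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpn-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vpn-server
  template:
    metadata:
      labels:
        app: vpn-server
    spec:
      # use the node's network stack: UDP 1194 is opened on the node's real IP
      hostNetwork: true
      containers:
      - name: openvpn
        image: example/openvpn:latest   # placeholder image
        ports:
        - containerPort: 1194
          protocol: UDP
---
apiVersion: v1
kind: Service
metadata:
  name: vpn-server
spec:
  # in-cluster access via ClusterIP; external clients come in through the node's IP
  type: ClusterIP
  selector:
    app: vpn-server
  ports:
  - port: 1194
    protocol: UDP
    targetPort: 1194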

From a routing point of view, we can uniquely identify a client behind the VPN by the IP address issued to it by the VPN server. Below is a primitive example of restricting such a client's access to services, illustrated with the aforementioned Redis:

apiVersion: crd.projectcalico.org/v1
kind: HostEndpoint
metadata:
  name: vpnclient-eth0
  labels:
    role: vpnclient
    environment: production
spec:
  interfaceName: "*"
  node: kube-02
  expectedIPs: ["172.176.176.2"]
---
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: vpn-rules
spec:
  selector: "role == 'vpnclient'"
  order: 0
  applyOnForward: true
  preDNAT: true
  ingress:
  - action: Deny
    protocol: TCP
    destination:
      ports: [6379]
  - action: Allow
    protocol: UDP
    destination:
      ports: [53, 67]

Here, connecting to port 6379 is strictly prohibited, while the operation of the DNS service, which quite often suffers when such rules are drawn up, is preserved. As mentioned earlier, once a selector appears, the default deny policy is applied to it unless specified otherwise.

Results

Thus, using the extended Calico API, you can flexibly configure and dynamically change routing in and around the cluster. In general, its use may look like cracking a nut with a sledgehammer, and implementing an L3 network with BGP and IP-IP tunnels looks monstrous for a simple Kubernetes installation in a flat network... Apart from that, though, the tool looks quite viable and useful.

Isolating a cluster to meet security requirements is not always feasible, and it is in such cases that Calico (or a similar solution) comes to the rescue. The examples given in this article (with minor modifications) are used in several of our customers' installations on AWS.

Source: habr.com
