How a pod in Kubernetes gets an IP address

Translator's note: This article, written by an SRE at LinkedIn, details the "inner magic" of Kubernetes, more precisely the interaction of CRI, CNI and kube-apiserver, and what happens when the next pod needs to be assigned an IP address.

One of the basic requirements of the Kubernetes networking model is that each pod must have its own IP address, and any other pod in the cluster must be able to reach it at that address. There are many network "providers" (Flannel, Calico, Canal, etc.) that help implement this network model.

When I first started working with Kubernetes, it was not entirely clear to me how exactly pods get their IP addresses. Even with an understanding of how the individual components function, it was difficult to imagine them working together. For example, I knew what CNI plugins were for, but had no idea how exactly they were called. Therefore, I decided to write this article to share knowledge about the various network components and how they work together in a Kubernetes cluster, which allows each pod to get its own unique IP address.

There are various ways to organize networking in Kubernetes, just as there are various container runtimes. This post uses Flannel for cluster networking and containerd as the container runtime. I also assume that you know how networking between containers works, so I will only touch on it briefly, purely for context.

Some basic concepts

Containers and networking at a glance

There are quite a few excellent posts on the web explaining how containers communicate with each other over the network. Therefore, I will only give a general overview of the basic concepts and limit myself to one approach, which involves creating a Linux bridge and encapsulating packets. Details are omitted, since container networking deserves a separate article of its own. Links to some particularly informative publications are provided at the end.

Containers on the same host

One way to let containers running on the same host communicate with each other over IP involves creating a Linux bridge. To do this, Kubernetes (and Docker) create veth (virtual ethernet) devices. One end of the veth device is connected to the container's network namespace, the other to the Linux bridge on the host network.

All containers on the same host have one end of a veth pair connected to this bridge, through which they can communicate with each other by IP address. The Linux bridge also has an IP address and acts as a gateway for egress traffic from pods destined for other nodes.
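To make this concrete, here is a minimal sketch of the same wiring done by hand with the ip tool. It is not what Kubernetes runs verbatim; the names ctr-ns, br0, veth0/veth1 and the addresses are illustrative:

# Create a namespace standing in for a container, and a bridge on the host
$ ip netns add ctr-ns
$ ip link add br0 type bridge
$ ip addr add 10.244.0.1/24 dev br0
$ ip link set br0 up

# Create a veth pair: one end goes into the namespace, the other onto the bridge
$ ip link add veth0 type veth peer name veth1
$ ip link set veth1 netns ctr-ns
$ ip link set veth0 master br0
$ ip link set veth0 up

# Inside the "container": an IP from the node subnet, with the bridge as the gateway
$ ip netns exec ctr-ns ip addr add 10.244.0.2/24 dev veth1
$ ip netns exec ctr-ns ip link set veth1 up
$ ip netns exec ctr-ns ip route add default via 10.244.0.1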

[Diagram: containers on the same host connected to a Linux bridge via veth pairs]

Containers on different hosts

Packet encapsulation is one way for containers on different hosts to communicate with each other using IP addresses. In Flannel, this is made possible by the vxlan technology, which "packages" the original packet into a UDP packet and then sends it to its destination.

In a Kubernetes cluster, Flannel creates a vxlan device and updates the route table on each node accordingly. Each packet destined for a container on another host passes through the vxlan device and is encapsulated in a UDP packet. At the destination, the inner packet is extracted and forwarded to the correct pod.
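On a Flannel node this setup can be inspected with standard tooling; the output below is illustrative (flannel.1 is the name Flannel usually gives the device, and 8472 is its default vxlan UDP port):

$ ip -d link show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 ...
    vxlan id 1 local 192.168.1.10 dev eth0 dstport 8472 ...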

[Diagram: traffic between pods on different hosts encapsulated in vxlan/UDP packets]
Note: This is just one of the ways to organize networking between containers.

What is CRI?

CRI (Container Runtime Interface) is a plugin interface that allows the kubelet to use different container runtimes. The CRI API is implemented by the various runtimes, so users can choose the runtime they want.
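For example, the kubelet talks to containerd over its CRI gRPC socket, and the same socket can be queried with crictl. The socket path below is containerd's usual default and may differ on your installation:

# Point the kubelet at the containerd CRI endpoint
$ kubelet --container-runtime-endpoint=unix:///run/containerd/containerd.sock ...

# Query the same CRI API directly
$ crictl --runtime-endpoint unix:///run/containerd/containerd.sock pods
$ crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps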

What is CNI?

The CNI (Container Network Interface) project comprises a specification for a universal networking solution for Linux containers. In addition, it includes plugins responsible for various functions when setting up a pod's network. A CNI plugin is an executable that conforms to the specification (we'll discuss some of the plugins below).
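Since a CNI plugin is just an executable, the call a runtime makes can be sketched by hand: parameters are passed via CNI_* environment variables and the network config arrives on stdin. A hedged example (the container ID, netns path and config file are illustrative; in a real cluster the runtime's CNI library drives this for you):

$ CNI_COMMAND=ADD \
  CNI_CONTAINERID=example-id \
  CNI_NETNS=/var/run/netns/ctr-ns \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
  /opt/cni/bin/bridge < /etc/cni/net.d/10-mynet.conf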

Subnetting hosts to assign IP addresses to pods

Because each pod in the cluster must have an IP address, it's important to make sure that this address is unique. This is achieved by assigning each node a unique subnet, from which the pods on that node are then assigned IP addresses.

Node IPAM Controller

When nodeipam is included in the --controllers flag of kube-controller-manager, it allocates each node a dedicated subnet (podCIDR) from the cluster CIDR (i.e. the range of IP addresses for the cluster network). Since these podCIDRs do not overlap, each pod can be assigned a unique IP address.
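As a hedged sketch, the relevant kube-controller-manager flags might look like this (the CIDR values are the ones commonly used with Flannel; nodeipam is part of the default --controllers set):

$ kube-controller-manager \
    --allocate-node-cidrs=true \
    --cluster-cidr=10.244.0.0/16 \
    --node-cidr-mask-size=24 \
    ...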

A Kubernetes node is assigned a podCIDR when it first registers with the cluster. To change the podCIDR of nodes, you must de-register them and then re-register them, making the appropriate changes to the Kubernetes control plane configuration in between. You can display the podCIDR of a node with the following command:

$ kubectl get no <nodeName> -o json | jq '.spec.podCIDR'
10.244.0.0/24
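To list the podCIDRs of all nodes at once (node names and ranges below are illustrative):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
node-1    10.244.0.0/24
node-2    10.244.1.0/24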

Kubelet, container runtime and CNI plugins: how it all works

Scheduling a pod to a node involves a lot of preparatory work. In this section, I'll only focus on the steps directly related to setting up the pod's network.

Scheduling a pod to a node triggers the following chain of events:

[Diagram: the chain of events triggered when a pod is scheduled to a node]

For reference: Containerd CRI Plugin Architecture.

Interaction between container runtime and CNI plugins

Each network provider has its own CNI plugin. The container runtime invokes it to configure the pod's network when the pod starts up. In the case of containerd, the CNI plugin is invoked by the Containerd CRI plugin.

Moreover, each provider has its own agent. It is installed on every Kubernetes node and is responsible for the network configuration of the pods. This agent either ships with a CNI config or generates one on the node itself. The config helps the CRI plugin determine which CNI plugin to call.

The location of the CNI config is configurable; by default it is /etc/cni/net.d/<config-file>. Cluster administrators are also responsible for installing the CNI plugins on each cluster node. Their location is configurable too; the default directory is /opt/cni/bin.

When using containerd, the paths to the config and the plugin binaries can be set in the [plugins."io.containerd.grpc.v1.cri".cni] section of the containerd configuration file.
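A hedged example of that section in /etc/containerd/config.toml (these are containerd's usual defaults; the exact section name can differ between containerd versions):

[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/opt/cni/bin"
  conf_dir = "/etc/cni/net.d"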

Since we're using Flannel as our network provider, let's talk a little about setting it up:

  • Flanneld (the Flannel daemon) is typically installed in the cluster as a DaemonSet, with install-cni as an init container.
  • install-cni creates the CNI configuration file (/etc/cni/net.d/10-flannel.conflist) on each node.
  • Flanneld creates a vxlan device, retrieves network metadata from the API server, and watches for pod updates. As pods are created, it distributes routes to them across the entire cluster (see the illustrative route table below).
  • These routes allow pods to communicate with each other by IP address.
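An illustrative look at those routes on a node whose own podCIDR is 10.244.0.0/24 (all addresses are examples):

$ ip route
default via 192.168.1.1 dev eth0
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink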

For more information about how Flannel works, see the links at the end of the article.

Here is the interaction diagram between the Containerd CRI plugin and the CNI plugins:

[Diagram: interaction between the Containerd CRI plugin and the CNI plugins]

As seen above, the kubelet calls the Containerd CRI plugin to create the pod, which in turn calls the CNI plugin to set up the pod's network. The network provider's CNI plugin then calls other core CNI plugins to configure different aspects of the network.

Interaction between CNI plugins

There are various CNI plugins, whose task is to help set up network communication between containers on the host. This article will discuss three of them.

Flannel CNI plugin

When using Flannel as the network provider, the Containerd CRI plugin calls the Flannel CNI plugin, using the CNI config file /etc/cni/net.d/10-flannel.conflist:

$ cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cni0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
         "ipMasq": false,
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    }
  ]
}

The Flannel CNI plugin works in conjunction with Flanneld. At startup, Flanneld retrieves the podCIDR and other network-related details from the API server and saves them to the file /run/flannel/subnet.env:

FLANNEL_NETWORK=10.244.0.0/16 
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450 
FLANNEL_IPMASQ=false

The Flannel CNI plugin uses data from /run/flannel/subnet.env to configure and call the bridge CNI plugin.

Bridge CNI Plugin

This plugin is called with the following configuration:

{
  "name": "cni0",
  "type": "bridge",
  "mtu": 1450,
  "ipMasq": false,
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24"
  }
}

The first time it is called, it creates a Linux bridge with "name": "cni0", as specified in the config. A veth pair is then created for each pod: one end is connected to the container's network namespace, the other to the Linux bridge on the host network. In this way, the bridge CNI plugin connects all of the host's containers to the Linux bridge on the host network.

When the veth pair is configured, the Bridge plugin calls the host-local IPAM CNI plugin. The IPAM plugin type can be configured in the CNI config that the CRI plugin uses to call the Flannel CNI plugin.

Host-Local IPAM CNI Plugin

The bridge CNI plugin calls the host-local IPAM CNI plugin with the following configuration:

{
  "name": "cni0",
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24",
    "dataDir": "/var/lib/cni/networks"
  }
}

The host-local IPAM (IP Address Management) plugin returns an IP address for the container from the subnet and stores the allocated IP on the host in the directory specified by dataDir: /var/lib/cni/networks/<network-name=cni0>/<ip>. That file contains the ID of the container to which the given IP address is assigned.
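A hedged look at what that directory typically contains (file names and the container ID are illustrative):

$ ls /var/lib/cni/networks/cni0/
10.244.0.2  10.244.0.3  last_reserved_ip.0  lock

# Each file is named after an allocated IP and stores the ID of the container holding it
$ cat /var/lib/cni/networks/cni0/10.244.0.2
5ff1a2c3d4e5...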

When called, the host-local IPAM plugin returns data like the following:

{
  "ip4": {
    "ip": "10.244.4.2",
    "gateway": "10.244.4.3"
  },
  "dns": {}
}

Summary

Kube-controller-manager assigns a podCIDR to each node. The pods on a node get IP addresses from the allocated podCIDR range. Since the podCIDRs of different nodes do not overlap, all pods receive unique IP addresses.

The Kubernetes cluster administrator configures and installs the kubelet, the container runtime and the network provider agent, and copies the CNI plugins to each node. At startup, the network provider agent generates the CNI config. When a pod is scheduled to a node, the kubelet calls the CRI plugin to create it. Then, if containerd is used, the Containerd CRI plugin calls the CNI plugin specified in the CNI config to set up the pod's network. As a result, the pod gets an IP address.
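Assuming the Flannel + containerd + bridge/host-local setup described above, the whole chain can be spot-checked on a node with a few commands (<nodeName> and <podName> are placeholders):

# The podCIDR assigned to the node by the nodeipam controller
$ kubectl get node <nodeName> -o jsonpath='{.spec.podCIDR}'

# The pod's IP, taken from that range
$ kubectl get pod <podName> -o wide

# On the node itself: the veth ends attached to the bridge, and the IPs reserved by host-local
$ ip link show master cni0
$ ls /var/lib/cni/networks/cni0/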

It took me some time to understand all the subtleties and nuances of these interactions. I hope what I have learned will help you better understand how Kubernetes works. If I'm wrong about anything, please contact me on Twitter or at [email protected]. Feel free to reach out if you'd like to discuss aspects of this article or anything else. I'll be happy to chat with you!

References

Containers and network

How Flannel Works

CRI and CNI

Source: habr.com
