Our experience working with data in a Kubernetes cluster's etcd directly (without the K8s API)

Increasingly, clients ask us to provide access to a Kubernetes cluster so that they can reach services inside it: connect directly to a database or a service, or connect a local application to applications inside the cluster...


For example, a client needs to connect from their local machine to the memcached.staging.svc.cluster.local service. We provide this with a VPN into the cluster: we announce the pod and service subnets and push the cluster DNS to the client. Then, when the client tries to connect to memcached.staging.svc.cluster.local, the request goes to the cluster DNS, which returns the address of this service from the cluster's service network, or a pod address.

We configure K8s clusters using kubeadm with the service subnet 192.168.0.0/16 by default and the pod network 10.244.0.0/16. Usually everything works fine, but there are a couple of caveats:

  • The 192.168.*.* subnet is often used in customer office networks, and even more often in developers' home networks. This leads to conflicts: home routers operate in this subnet while the VPN pushes the same subnets from the cluster to the client.
  • We often have multiple clusters per project (production, stage and/or several dev clusters). By default, all of them would get the same pod and service subnets, which makes it very difficult to work with services in several clusters simultaneously.

We adopted long ago the practice of using different subnets for services and pods within the same project, so that, in general, every cluster has its own networks. However, there are many clusters already in operation that we would not want to redeploy from scratch, since they run numerous services, stateful applications, and so on.

And then we asked ourselves: how to change the subnet in an existing cluster?

Searching for a solution

The most common approach is to recreate all services of type ClusterIP. You may also come across advice like this:

The following process has a problem: after everything is configured, the pods come up with the old IP as the DNS nameserver in /etc/resolv.conf.
Since I still did not find the solution, I had to reset the entire cluster with kubeadm reset and init it again.

But this does not suit everyone... Here are the details of our case:

  • Flannel is used;
  • There are clusters both in the clouds and on the hardware;
  • We would like to avoid redeploying all services in the cluster;
  • Everything should be done with a minimum number of problems;
  • Kubernetes version - 1.16.6 (however, further steps will be similar for other versions);
  • The main task: in a cluster deployed with kubeadm and the service subnet 192.168.0.0/16, replace it with 172.24.0.0/16.

And it just so happened that we had long been curious about what is stored in etcd in Kubernetes, how it is stored, and what can be done with it in general... So we thought: "Why not just update the data in etcd, replacing the old IP addresses (subnet) with new ones?"

Looking for ready-made tools for working with data in etcd, we did not find anything that fully solved the task. (By the way, if you know of any utilities for working with data directly in etcd, we will be grateful for links.) However, etcdhelper by OpenShift turned out to be a good starting point (thanks to its authors!).

This utility can connect to etcd using certificates and read data from it with the ls, get, and dump commands.

Extending etcdhelper

The next thought was natural: "What prevents us from extending this utility by adding the ability to write data to etcd?"

The result is a modified version of etcdhelper with two new functions, changeServiceCIDR and changePodCIDR. Its code can be seen here.

What do the new functions do? The changeServiceCIDR algorithm:

  • create a deserializer;
  • compile a regular expression to replace the CIDR;
  • iterate over all services of type ClusterIP in the cluster:
    • decode the value from etcd into a Go object;
    • replace the first two bytes of the address using the regular expression;
    • assign the service an IP address from the new subnet;
    • create a serializer, convert the Go object to protobuf, and write the new data to etcd.
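
Below is a minimal Go sketch of this algorithm. It is illustrative, not the exact etcdhelper code: error handling is trimmed, TLS setup is omitted, and both the old and new subnets are assumed to be /16 (so only the first two octets of each ClusterIP change):

package main

import (
	"bytes"
	"context"
	"fmt"
	"regexp"

	"go.etcd.io/etcd/clientv3"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime/serializer/protobuf"
	"k8s.io/kubectl/pkg/scheme"
)

// changeServiceCIDR rewrites the ClusterIP of every Service stored in etcd,
// moving it to the new /16 subnet given by newPrefix (e.g. "172.24.").
func changeServiceCIDR(cli *clientv3.Client, newPrefix string) error {
	// Deserializer for the protobuf-encoded objects the API server keeps in etcd.
	decoder := scheme.Codecs.UniversalDeserializer()
	// Serializer to write the modified objects back in the same format.
	encoder := protobuf.NewSerializer(scheme.Scheme, scheme.Scheme)
	// Both subnets are /16, so only the first two octets need replacing.
	re := regexp.MustCompile(`^\d+\.\d+\.`)

	resp, err := cli.Get(context.Background(), "/registry/services/specs/", clientv3.WithPrefix())
	if err != nil {
		return err
	}
	for _, kv := range resp.Kvs {
		obj, _, err := decoder.Decode(kv.Value, nil, nil)
		if err != nil {
			return err
		}
		svc, ok := obj.(*corev1.Service)
		if !ok || svc.Spec.ClusterIP == "" || svc.Spec.ClusterIP == "None" {
			continue // not a service with a ClusterIP (e.g. headless) — skip
		}
		// Assign the service an address from the new subnet, keeping the last two octets.
		svc.Spec.ClusterIP = re.ReplaceAllString(svc.Spec.ClusterIP, newPrefix)
		var buf bytes.Buffer
		if err := encoder.Encode(svc, &buf); err != nil {
			return err
		}
		if _, err := cli.Put(context.Background(), string(kv.Key), buf.String()); err != nil {
			return err
		}
		fmt.Printf("%s -> %s\n", kv.Key, svc.Spec.ClusterIP)
	}
	return nil
}

func main() {
	// Hypothetical wiring: the real etcdhelper takes the certificate paths and
	// endpoint as flags; TLS configuration is omitted here for brevity.
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"https://127.0.0.1:2379"}})
	if err != nil {
		panic(err)
	}
	defer cli.Close()
	if err := changeServiceCIDR(cli, "172.24."); err != nil {
		panic(err)
	}
}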

The changePodCIDR function is essentially the same as changeServiceCIDR: instead of editing the service specification, we do it for the node object and change .spec.podCIDR to the new subnet.

Practice

Change serviceCIDR

The plan for implementing the task is very simple, but it implies downtime while all pods in the cluster are re-created. After describing the basic steps, we will also share thoughts on how this downtime can be minimized in theory.

Preparatory steps:

  • installing the necessary software and building the patched etcdhelper;
  • backing up etcd and /etc/kubernetes.

Brief action plan for changing serviceCIDR:

  • changing manifests of apiserver and controller-manager;
  • reissue of certificates;
  • changing the ClusterIP of services in etcd;
  • restart all pods in the cluster.

The following is a complete sequence of actions in detail.

1. Install etcd-client for dumping data:

apt install etcd-client

2. Build etcdhelper:

  • Install golang:
    GOPATH=/root/golang
    mkdir -p $GOPATH/local
    curl -sSL https://dl.google.com/go/go1.14.1.linux-amd64.tar.gz | tar -xzvC $GOPATH/local
    echo "export GOPATH="$GOPATH"" >> ~/.bashrc
    echo 'export GOROOT="$GOPATH/local/go"' >> ~/.bashrc
    echo 'export PATH="$PATH:$GOPATH/local/go/bin"' >> ~/.bashrc
  • Download etcdhelper.go, fetch the dependencies, and build:
    wget https://raw.githubusercontent.com/flant/examples/master/2020/04-etcdhelper/etcdhelper.go
    go get go.etcd.io/etcd/clientv3 k8s.io/kubectl/pkg/scheme k8s.io/apimachinery/pkg/runtime
    go build -o etcdhelper etcdhelper.go

3. Back up etcd:

backup_dir=/root/backup
mkdir ${backup_dir}
cp -rL /etc/kubernetes ${backup_dir}
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt --endpoints https://192.168.199.100:2379 snapshot save ${backup_dir}/etcd.snapshot

4. Change the service subnet in the Kubernetes control plane manifests. In /etc/kubernetes/manifests/kube-apiserver.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml, change the --service-cluster-ip-range parameter to the new subnet: 172.24.0.0/16 instead of 192.168.0.0/16.

5. Since kubeadm issues the apiserver certificate for (among other things) the service subnet we are changing, the certificate must be reissued:

  1. Let's see for which domains and IP addresses the current certificate is issued:
    openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
    X509v3 Subject Alternative Name:
        DNS:dev-1-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:apiserver, IP Address:192.168.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100
  2. Let's prepare a minimal config for kubeadm:
    cat kubeadm-config.yaml
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterConfiguration
    networking:
      podSubnet: "10.244.0.0/16"
      serviceSubnet: "172.24.0.0/16"
    apiServer:
      certSANs:
      - "192.168.199.100" # IP-адрСс мастСр ΡƒΠ·Π»Π°
  3. Let's delete the old crt and key, because without this the new certificate will not be issued:
    rm /etc/kubernetes/pki/apiserver.{key,crt}
  4. Let's reissue certificates for the API server:
    kubeadm init phase certs apiserver --config=kubeadm-config.yaml
  5. Check that the certificate has been issued for the new subnet:
    openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
    X509v3 Subject Alternative Name:
        DNS:kube-2-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:172.24.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100
  6. After reissuing the API server certificate, restart its container:
    docker ps | grep k8s_kube-apiserver | awk '{print $1}' | xargs docker restart
  7. Regenerate the config for admin.conf:
    kubeadm alpha certs renew admin.conf
  8. Let's edit the data in etcd:
    ./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-service-cidr 172.24.0.0/16 

    Attention! At this point, DNS resolution stops working in the cluster: existing pods still have the old CoreDNS (kube-dns) address in /etc/resolv.conf, while kube-proxy has already changed the iptables rules from the old subnet to the new one. Possible ways to minimize downtime are discussed further below.

  9. Let's fix the ConfigMaps in the kube-system namespace:
    kubectl -n kube-system edit cm kubelet-config-1.16

    - replace clusterDNS with the new IP address of the kube-dns service (look it up with kubectl -n kube-system get svc kube-dns).

    kubectl -n kube-system edit cm kubeadm-config

    - change data.ClusterConfiguration.networking.serviceSubnet to the new subnet.

  10. Since the kube-dns address has changed, you need to update the kubelet config on all nodes:
    kubeadm upgrade node phase kubelet-config && systemctl restart kubelet
  11. It remains to restart all pods in the cluster:
    kubectl get pods --no-headers=true --all-namespaces | sed -r 's/(\S+)\s+(\S+).*/kubectl --namespace \1 delete pod \2/e'

Downtime minimization

Thoughts on how to minimize downtime:

  1. After changing the control plane manifests, create a new kube-dns service, for example, named kube-dns-tmp, with the new address 172.24.0.10.
  2. Add a condition to etcdhelper so that it does not modify the kube-dns service (see the sketch below).
  3. Replace the clusterDNS address in all kubelets with the new one, while the old service keeps working alongside the new one.
  4. Wait until the application pods are re-created, either naturally over time or at an agreed time.
  5. Delete the kube-dns-tmp service and change serviceSubnetCIDR for the kube-dns service.

This plan would minimize downtime to about a minute: the time it takes to delete the kube-dns-tmp service and change the subnet of the kube-dns service.
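
For step 2 of this plan, a guard inside the service loop of the changeServiceCIDR sketch above could look like this (hypothetical, matching that sketch rather than any shipped code):

// Leave kube-dns untouched: the temporary kube-dns-tmp service keeps serving
// DNS on the old address until all application pods have been re-created.
if svc.Namespace == "kube-system" && svc.Name == "kube-dns" {
	continue
}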

podNetwork modification

At the same time, we decided to look at how to modify podNetwork using the resulting etcdhelper. The sequence of actions is as follows:

  • fix configs in kube-system;
  • fix the kube-controller-manager manifest;
  • change podCIDR directly in etcd;
  • reboot all cluster nodes.

Now more about these actions:

1. Modify the ConfigMaps in the kube-system namespace:

kubectl -n kube-system edit cm kubeadm-config

- change data.ClusterConfiguration.networking.podSubnet to the new subnet 10.55.0.0/16.

kubectl -n kube-system edit cm kube-proxy

- set data.config.conf.clusterCIDR to 10.55.0.0/16.

2. Modify controller-manager's manifest:

vim /etc/kubernetes/manifests/kube-controller-manager.yaml

- change --cluster-cidr to 10.55.0.0/16.

3. Look at the current values of .spec.podCIDR and .spec.podCIDRs, along with the InternalIP from .status.addresses, for all cluster nodes:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'

[
  {
    "name": "kube-2-master",
    "podCIDR": "10.244.0.0/24",
    "podCIDRs": [
      "10.244.0.0/24"
    ],
    "InternalIP": "192.168.199.2"
  },
  {
    "name": "kube-2-master",
    "podCIDR": "10.244.0.0/24",
    "podCIDRs": [
      "10.244.0.0/24"
    ],
    "InternalIP": "10.0.1.239"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.244.1.0/24",
    "podCIDRs": [
      "10.244.1.0/24"
    ],
    "InternalIP": "192.168.199.222"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.244.1.0/24",
    "podCIDRs": [
      "10.244.1.0/24"
    ],
    "InternalIP": "10.0.4.73"
  }
]

4. Replace podCIDR by editing directly in etcd:

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-pod-cidr 10.55.0.0/16

5. Check that podCIDR has indeed changed:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'

[
  {
    "name": "kube-2-master",
    "podCIDR": "10.55.0.0/24",
    "podCIDRs": [
      "10.55.0.0/24"
    ],
    "InternalIP": "192.168.199.2"
  },
  {
    "name": "kube-2-master",
    "podCIDR": "10.55.0.0/24",
    "podCIDRs": [
      "10.55.0.0/24"
    ],
    "InternalIP": "10.0.1.239"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.55.1.0/24",
    "podCIDRs": [
      "10.55.1.0/24"
    ],
    "InternalIP": "192.168.199.222"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.55.1.0/24",
    "podCIDRs": [
      "10.55.1.0/24"
    ],
    "InternalIP": "10.0.4.73"
  }
]

6. Reboot all cluster nodes one by one.

7. Note that if at least one node keeps the old podCIDR, kube-controller-manager will fail to start, and pods in the cluster will not be scheduled.

In fact, podCIDR can be changed in a simpler way (for example, like this). But we wanted to learn how to work with etcd directly, because there are cases when editing Kubernetes objects in etcd is the only possible option. (For example, you cannot simply change a Service's spec.clusterIP field without downtime.)
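
To illustrate that last point, here is a minimal sketch (using recent client-go versions; the kubeconfig path, namespace, and Service name are hypothetical) of what happens if you try to change spec.clusterIP through the API: the API server rejects the update because the field is immutable.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.conf")
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	svc, err := clientset.CoreV1().Services("default").Get(context.TODO(), "my-svc", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	svc.Spec.ClusterIP = "172.24.0.15" // try to move the service to the new subnet
	_, err = clientset.CoreV1().Services("default").Update(context.TODO(), svc, metav1.UpdateOptions{})
	// Expected error: spec.clusterIP: Invalid value: "172.24.0.15": field is immutable
	fmt.Println(err)
}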

Conclusion

This article discusses the possibility of working with data in etcd directly, i.e., bypassing the Kubernetes API. Sometimes this approach allows you to do "tricky things". We tested the operations described above on real K8s clusters; however, their readiness for widespread use is that of a PoC (proof of concept). Therefore, if you want to use the modified version of etcdhelper on your clusters, do so at your own risk.


Source: habr.com
