Kubernetes 1.16: Highlights of what's new

Today, Wednesday, the next Kubernetes release takes place: 1.16. Following the tradition that has developed for our blog, this is the tenth (anniversary) time we talk about the most significant changes in the new version.

The information used to prepare this material is taken from the Kubernetes enhancements tracking tables, CHANGELOG-1.16, related issues and pull requests, and Kubernetes Enhancement Proposals (KEPs). So, let's go!

Nodes

A truly large number of notable innovations (all in alpha status) appear on the node side of K8s clusters (the Kubelet).

First, there are the so-called "ephemeral containers" (Ephemeral Containers), designed to simplify debugging in pods. The new mechanism lets you launch special containers that start in the namespace of an existing pod and live for a short time. Their purpose is to interact with other pods and containers in order to troubleshoot and debug. A new command, kubectl debug, has been implemented for this feature; it is similar in spirit to kubectl exec, except that instead of running a process in a container (as exec does) it starts a container in the pod. For example, this command will attach a new container to a pod:

kubectl debug -c debug-shell --image=debian target-pod -- bash

Details about ephemeral containers (and examples of how to use them) can be found in the corresponding KEP. The current implementation (in K8s 1.16) is an alpha version, and among the criteria for its promotion to beta is "testing the Ephemeral Containers API for at least 2 [Kubernetes] releases".

NB: In essence (and even in name) the feature resembles the existing kubectl-debug plugin, about which we have already written. It is expected that with the arrival of ephemeral containers, development of the separate external plugin will stop.

Another innovation, PodOverhead, is intended to provide a mechanism for calculating pod overhead, which can differ greatly depending on the runtime used. As an example, the authors of this KEP cite Kata Containers, which require a guest kernel, the kata agent, an init system, and so on to be running. When the overhead becomes this large it cannot be ignored, which means there must be a way to account for it in further quota management, scheduling, and so on. To implement it, an Overhead *ResourceList field has been added to PodSpec (and compared with the data in RuntimeClass, if one is used).
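
To make this more concrete, here is a hedged sketch (not taken from the release notes) of a RuntimeClass declaring a fixed per-pod overhead; the overhead.podFixed field follows the KEP, requires the alpha PodOverhead feature gate, and the names used are hypothetical:

apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata-containers        # hypothetical name
handler: kata                  # runtime handler configured on the nodes
overhead:
  podFixed:                    # fixed per-pod cost added on top of container requests
    cpu: 250m
    memory: 120Mi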

Another notable innovation is the Node Topology Manager, designed to unify the approach to fine-tuning the distribution of hardware resources for various components in Kubernetes. This initiative is driven by the growing need of various modern systems (in telecommunications, machine learning, financial services, etc.) for high-performance parallel computing and minimal latency in the execution of operations, for which they rely on advanced CPU and hardware-acceleration capabilities. Until now such optimizations in Kubernetes have been achieved through disparate components (CPU manager, Device manager, CNI); now a single internal interface will be added that unifies the approach and simplifies connecting new similar, so-called topology-aware, components on the Kubelet side. Details are in the corresponding KEP.

Topology Manager component diagram
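
As a rough sketch of how this could be switched on (field and gate names are taken from the KEP and may change while the feature is in alpha), the Topology Manager is configured on the Kubelet side, for example via its configuration file:

# Hedged sketch, assuming the alpha TopologyManager feature gate and the
# policy field described in the KEP; names may differ in later releases.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  TopologyManager: true
topologyManagerPolicy: single-numa-node   # other policies per the KEP: none, best-effort, restricted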

The next feature is a check of containers at startup (startup probe). As you know, for containers that take a long time to launch it is difficult to obtain an up-to-date status: they are either "killed" before they actually begin functioning or end up stuck in deadlock for a long time. The new check (enabled via a feature gate called StartupProbeEnabled) cancels, or rather postpones, all other checks until the moment the pod has finished starting. For this reason the feature was originally called pod-startup liveness-probe holdoff. For pods that take a long time to start, the state can be polled at relatively short intervals.
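
A minimal sketch of how such a check might look in a pod manifest (assuming the startup-probe feature gate is enabled; the image and endpoints are hypothetical): the startupProbe holds off the livenessProbe until it succeeds, after which the usual checks take over.

apiVersion: v1
kind: Pod
metadata:
  name: slow-start
spec:
  containers:
  - name: app
    image: example/slow-app          # hypothetical image with a long startup
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30           # up to 30 * 10s = 5 minutes to start
      periodSeconds: 10
    livenessProbe:                   # postponed until the startup probe succeeds
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5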

In addition, an improvement for RuntimeClass arrives immediately in beta status, adding support for "heterogeneous clusters". With RuntimeClass Scheduling it is no longer necessary for every node to support every RuntimeClass: you can choose a RuntimeClass for pods without thinking about the cluster topology. Previously, to achieve this (so that pods ended up on nodes with support for everything they needed), you had to assign appropriate rules to NodeSelector and tolerations. The KEP describes usage examples and, of course, the implementation details.
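
A hedged sketch of what this looks like in practice (class name, handler, and node label are hypothetical): the placement rules live on the RuntimeClass itself, and the pod only references it by name.

apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: gvisor                       # hypothetical class
handler: runsc
scheduling:
  nodeSelector:
    runtime: gvisor                  # hypothetical label on the nodes that support it
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor           # no nodeSelector/tolerations needed on the pod itself
  containers:
  - name: app
    image: nginx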

Network

Two significant networking features that appeared for the first time (in alpha) in Kubernetes 1.16 are:

  • Support for a dual network stack (IPv4/IPv6) and its corresponding "understanding" at the level of pods, nodes, and services. It includes IPv4-to-IPv4 and IPv6-to-IPv6 interaction between pods, from pods to external services, reference implementations (within the Bridge CNI, PTP CNI and Host-Local IPAM plugins), as well as backward compatibility with Kubernetes clusters running only over IPv4 or IPv6. Implementation details are in the KEP.

    An example of displaying two types of IP addresses (IPv4 and IPv6) in the list of pods:

    kube-master# kubectl get pods -o wide
    NAME               READY     STATUS    RESTARTS   AGE       IP                          NODE
    nginx-controller   1/1       Running   0          20m       fd00:db8:1::2,192.168.1.3   kube-minion-1
    kube-master#

  • A new API for endpoints: the Endpoint Slice API. It solves the performance/scalability issues of the existing Endpoints API that affect various control-plane components (apiserver, etcd, endpoints-controller, kube-proxy). The new API will be added to the Discovery API group and will be able to serve tens of thousands of backend endpoints per service in a cluster consisting of thousands of nodes. To do this, each Service is mapped to N EndpointSlice objects, each of which holds no more than 100 endpoints by default (the value is configurable). The EndpointSlice API will also provide opportunities for its future development: support for multiple IP addresses per pod, new endpoint states (not only Ready and NotReady), dynamic subsetting of endpoints.
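
For illustration, a hedged sketch of an EndpointSlice object as the alpha API describes it (these objects are normally created by the control plane rather than by hand, and the group and field names may still change while the feature is alpha):

apiVersion: discovery.k8s.io/v1alpha1
kind: EndpointSlice
metadata:
  name: example-svc-abc12                  # hypothetical name
  labels:
    kubernetes.io/service-name: example-svc
addressType: IP
ports:
- name: http
  protocol: TCP
  port: 8080
endpoints:
- addresses:
  - "10.1.2.3"
  conditions:
    ready: true
  topology:
    kubernetes.io/hostname: node-1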

The finalizer introduced in the previous release, named service.kubernetes.io/load-balancer-cleanup and attached to every service of type LoadBalancer, has advanced to beta. At the moment such a service is deleted, it prevents the actual removal of the resource until the "cleanup" of all related load-balancer resources is complete.
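
For reference, this is roughly how the finalizer appears on such a service (it is added automatically by the service controller; the service itself here is hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: my-lb                     # hypothetical service
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80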

API Machinery

The real "stabilization milestone" is in the area of ​​the Kubernetes API server and interaction with it. This was largely due to transfer to the status of stable who do not need special introduction CustomResourceDefinitions (CRD), which have had beta status since the distant Kubernetes 1.7 (and this is June 2017!). The same stabilization came to related features:

  • "subresources" (subresources) with /status и /scale for CustomResources;
  • transformation versions for CRD based on an external webhook;
  • recently introduced (in K8s 1.15) defaults (defaulting) and automatic deletion of fields (pruning) for CustomResources;
  • opportunity using the OpenAPI v3 schema to create and publish OpenAPI documentation used to validate CRD resources on the server side.
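
A minimal sketch of a CRD that uses these now-stable capabilities (the group, kind, and fields are hypothetical): the v1 API requires a structural OpenAPI v3 schema, and defaulting plus the /status and /scale subresources are declared per version.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
                default: 1            # defaulting, stable as of 1.16
          status:
            type: object
            properties:
              replicas:
                type: integer
    subresources:
      status: {}                      # the /status subresource
      scale:                          # the /scale subresource
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas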

Another mechanism long familiar to Kubernetes administrators, the admission webhook, also spent a long time in beta status (since K8s 1.9) and is now declared stable.

Two other features have reached beta: server-side apply and watch bookmarks.

And the only significant innovation in alpha is the deprecation of SelfLink, a special URI representing the given object and part of ObjectMeta and ListMeta (i.e. part of every object in Kubernetes). Why abandon it? Put simply, the motivation is the absence of any real (compelling) reason for this field to keep existing. The more formal reasons are to optimize performance (by removing an unnecessary field) and to simplify the work of the generic-apiserver, which is forced to handle this field in a special way (it is the only field set right before the object is serialized). The actual deprecation of SelfLink (as beta) is planned for Kubernetes 1.20, with final removal in 1.21.

Data Storage

The main work in the storage area, as in previous releases, is in CSI support. The main changes here are:

  • Support for CSI plugins on Windows worker nodes appeared for the first time (in alpha): the current way of working with storage will here also replace the in-tree plugins in the Kubernetes core and Microsoft's PowerShell-based FlexVolume plugins;

    CSI plugin implementation for Kubernetes on Windows

  • the ability to resize CSI volumes, introduced back in K8s 1.12, has grown to beta;
  • a similar promotion (from alpha to beta) was achieved by the ability to use CSI to create local ephemeral volumes (CSI Inline Volume Support).

The volume cloning feature (using an existing PVC as a DataSource to create a new PVC), introduced in the previous Kubernetes version, has now also reached beta status.
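
A hedged sketch of such a clone (names and storage class are hypothetical; the class must be backed by a CSI driver that supports cloning):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: csi-sc            # hypothetical CSI-backed storage class
  dataSource:
    name: source-pvc                  # existing PVC in the same namespace
    kind: PersistentVolumeClaim
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi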

Scheduler

Two notable scheduling changes (both in alpha):

  • EvenPodsSpreading: the ability to use pods, rather than logical application units (such as Deployment and ReplicaSet), for the "fair distribution" of workloads, and to adjust this distribution (as a hard requirement or as a soft condition, i.e. a priority). The feature extends the existing distribution options for scheduled pods, currently limited to PodAffinity and PodAntiAffinity, and gives administrators finer-grained control over this, which means better high availability and optimized resource consumption. Details are in the KEP; a hedged manifest sketch follows after the diagram below.
  • The use of the BestFit policy in the RequestedToCapacityRatio priority function during pod scheduling, which allows bin packing to be applied both to basic resources (CPU, memory) and to extended ones (such as GPUs). For details, see the KEP.

    Pod scheduling: before using the best-fit policy (directly via the default scheduler) and with it (via the scheduler extender)
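
Returning to EvenPodsSpreading, here is a minimal sketch of what the new constraints might look like in a pod spec (assuming the alpha feature gate is enabled; the labels and topology key are placeholders for whatever your nodes actually carry):

apiVersion: v1
kind: Pod
metadata:
  name: spread-example
  labels:
    app: web                          # hypothetical label
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # or whichever zone label your nodes use
    whenUnsatisfiable: DoNotSchedule  # hard requirement; ScheduleAnyway makes it a soft priority
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: app
    image: nginx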

Additionally, the ability to create your own scheduler plugins outside the main Kubernetes development tree (out-of-tree) has been introduced.

Other changes

Also notable in the Kubernetes 1.16 release is the initiative to bring the available metrics into full order, or more precisely, into accordance with the official regulations for K8s instrumentation. These largely rely on the corresponding Prometheus documentation. Inconsistencies arose for various reasons (for example, some metrics were simply created before the current guidelines appeared), and the developers decided it was time to bring everything to a single standard, "in line with the rest of the Prometheus ecosystem." The current implementation of this initiative is in alpha and will be progressively promoted to beta (1.17) and stable (1.18) in future Kubernetes releases.

In addition, the following changes can be noted:

  • Development of Windows support: the appearance of a kubeadm utility for this OS (alpha version), a RunAsUserName option for Windows containers (alpha version), improved Group Managed Service Account (gMSA) support up to beta, and mount/attach support for vSphere volumes.
  • A reworked data-compression mechanism in API responses. Previously, an HTTP filter was used for this purpose, which imposed a number of restrictions that prevented it from being enabled by default. "Transparent request compression" now works: clients sending Accept-Encoding: gzip in the header receive a GZIP-compressed response if its size exceeds 128 KB. Go clients support compression automatically (they send the right header), so they will immediately notice reduced traffic. (Slight modifications may be required for other languages.)
  • It is now possible to scale HPA from/to zero pods based on external metrics. If scaling is based on object/external metrics, then when workloads are idle you can automatically scale down to 0 replicas and save resources. This feature should be especially useful when workers request GPU resources and the number of idle workers of different types exceeds the number of available GPUs. A hedged manifest sketch is shown after this list.
  • A new client, k8s.io/client-go/metadata.Client, for "generalized" access to objects. It is designed to easily retrieve metadata (i.e. the metadata section) from cluster resources and to perform garbage-collection and quota operations on them.
  • Kubernetes can now be built without the legacy ("built-in" in-tree) cloud providers (alpha version).
  • The kubeadm utility gained an experimental (alpha) ability to apply kustomize patches during the init, join, and upgrade operations. To learn how to use the --experimental-kustomize flag, see the KEP.
  • A new endpoint for the apiserver, readyz, allows information about its readiness to be exported. The API server also got a --maximum-startup-sequence-duration flag, allowing its restarts to be controlled.
  • Two features for Azure are declared stable: support for Availability Zones and cross resource group (RG) support. In addition, Azure received a number of other additions.
  • AWS got support for EBS on Windows and optimized DescribeInstances EC2 API calls.
  • Kubeadm now migrates the CoreDNS configuration on its own when upgrading the CoreDNS version.
  • The etcd binaries in the corresponding Docker image have been made world-executable, which allows this image to be run without root privileges. Also, the etcd migration image dropped support for the etcd2 version.
  • Cluster Autoscaler 1.16.0 switched to distroless as the base image, improved performance, and added new cloud providers (DigitalOcean, Magnum, Packet).
  • Updates in used/dependent software: Go 1.12.9, etcd 3.3.15, CoreDNS 1.6.2.
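
As promised above, a minimal sketch of HPA scaling to zero (assumes the alpha HPAScaleToZero feature gate is enabled; the target deployment and metric name are hypothetical):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: gpu-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpu-worker                  # hypothetical workload
  minReplicas: 0                      # allowed only with the feature gate enabled
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_length            # hypothetical external metric
      target:
        type: AverageValue
        averageValue: "5"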

Source: habr.com
