Autoscaling Kubernetes Applications with Prometheus and KEDA

(Cover image: Balloon Man by Cimuanos)

Scalability is a key requirement for cloud applications. With Kubernetes, scaling an application is as simple as increasing the number of replicas for the corresponding Deployment or ReplicaSet - but it is a manual process.

Kubernetes lets you scale applications (that is, the pods in a Deployment or ReplicaSet) automatically and declaratively using the Horizontal Pod Autoscaler specification. The default scaling criterion is CPU usage (resource metrics), but you can integrate custom and externally provided metrics.
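For reference, a minimal CPU-based HorizontalPodAutoscaler looks roughly like this (a sketch; the app name is hypothetical, and the KEDA setup described below generates a similar resource for you):

apiVersion: autoscaling/v2   # autoscaling/v2beta2 on older clusters
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70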

The Kubernetes aaS team at Mail.ru has translated an article on how to use external metrics to automatically scale a Kubernetes application. To show how everything works, the author uses HTTP request metrics collected with Prometheus.

Rather than writing Horizontal Pod Autoscaler configuration by hand, the article uses Kubernetes Event-driven Autoscaling (KEDA), an open source Kubernetes operator. It integrates natively with the Horizontal Pod Autoscaler to provide seamless autoscaling (including to/from zero) for event-driven workloads. The code is available on GitHub.

Brief overview of system operation

In brief, here is how everything works:

  1. The application exposes HTTP hit-count metrics in Prometheus format.
  2. Prometheus is configured to scrape these metrics.
  3. The Prometheus scaler in KEDA is configured to autoscale the application based on the number of HTTP hits.

Now let's talk about each element in detail.

KEDA and Prometheus

Prometheus is an open source monitoring and alerting toolkit and part of the Cloud Native Computing Foundation. It collects metrics from various sources and stores them as time-series data. To visualize the data, you can use Grafana or other visualization tools that work with the Prometheus API.

KEDA supports the concept of a scaler, which acts as a bridge between KEDA and an external system. Each scaler implementation is specific to its target system and extracts data from it; KEDA then uses that data to control autoscaling.

Scalers support multiple data sources, such as Kafka, Redis, and Prometheus. This means KEDA can be used to automatically scale Kubernetes deployments using Prometheus metrics as the criterion, and switching sources is mostly a matter of trigger configuration, as the sketch below shows.
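For example, a Redis list trigger differs from the Prometheus trigger used later in this article only in its type and metadata. A sketch, where the field names follow the KEDA documentation for the Redis list scaler and the address and list name are hypothetical:

triggers:
- type: redis
  metadata:
    address: redis-server-master.default.svc.cluster.local:6379
    listName: mylist
    listLength: '5'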

Test application

The Golang test application exposes an HTTP endpoint and performs two important functions:

  1. Uses the Prometheus Go client library to instrument the application and expose an http_requests metric that holds the hit count. The endpoint serving Prometheus metrics is available at the /metrics URI.
    var httpRequestsCounter = promauto.NewCounter(prometheus.CounterOpts{
           Name: "http_requests",
           Help: "number of http requests",
       })
    
  2. In response to a GET request, the application increments the access_count key in Redis. This is an easy way to do some work inside the HTTP handler and also to verify the Prometheus metrics: the metric value should be the same as the access_count value in Redis.
    import (
        "fmt"
        "net/http"
        "os"
        "strconv"
        "time"

        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // client (a Redis client, e.g. go-redis) and redisCounterName
    // ("access_count") are initialized elsewhere in the application.
    func main() {
        http.Handle("/metrics", promhttp.Handler())
        http.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
            // Increment the Prometheus counter when the handler returns.
            defer httpRequestsCounter.Inc()
            count, err := client.Incr(redisCounterName).Result()
            if err != nil {
                fmt.Println("Unable to increment redis counter", err)
                os.Exit(1)
            }
            resp := "Accessed on " + time.Now().String() + "\nAccess count " + strconv.Itoa(int(count))
            w.Write([]byte(resp))
        })
        http.ListenAndServe(":8080", nil)
    }
    

The application is deployed to Kubernetes via a Deployment. A ClusterIP service is also created; it allows the Prometheus server to scrape the application's metrics.

Here is the application deployment manifest.
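A condensed sketch of that manifest, reconstructed from the names used throughout this walkthrough (go-prom-app, go-prom-app-service, port 8080); the image name is hypothetical, and the run label on the service is what Prometheus matches on later:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-prom-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-prom-app
  template:
    metadata:
      labels:
        app: go-prom-app
    spec:
      containers:
      - name: go-prom-app
        image: go-prom-app:latest   # hypothetical image name
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: go-prom-app-service
  labels:
    run: go-prom-app-service
spec:
  type: ClusterIP
  selector:
    app: go-prom-app
  ports:
  - port: 8080
    targetPort: 8080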

Prometheus Server

The Prometheus deployment manifest consists of:

  • ConfigMap - to pass the Prometheus config;
  • Deployment - to deploy Prometheus in the Kubernetes cluster;
  • ClusterIP - a service for access to the Prometheus UI;
  • ClusterRole, ClusterRoleBinding and ServiceAccount - for auto-discovery of services in Kubernetes.

Here is the manifest to run Prometheus.

KEDA Prometheus ScaledObject

The scaler acts as a bridge between KEDA and the external system from which metrics are obtained. ScaledObject is a custom resource; it needs to be deployed to synchronize the deployment with the event source, in this case Prometheus.

ScaledObject contains information about the deployment to scale, metadata about the event source (for example, connection secrets or a queue name), the polling interval, the cooldown period, and other data. It results in a corresponding autoscaling resource (an HPA definition) that scales the deployment.

When a ScaledObject is deleted, the corresponding HPA definition is cleaned up with it.

Here is the ScaledObject definition for our example; it uses the Prometheus scaler:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
  namespace: default
  labels:
    deploymentName: go-prom-app
spec:
  scaleTargetRef:
    deploymentName: go-prom-app
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-service.default.svc.cluster.local:9090
      metricName: access_frequency
      threshold: '3'
      query: sum(rate(http_requests[2m]))

Consider the following points:

  1. It points to the Deployment named go-prom-app.
  2. The trigger type is prometheus. The Prometheus server address is specified along with the metric name, the threshold, and the PromQL query to use: sum(rate(http_requests[2m])).
  3. Per pollingInterval, KEDA queries Prometheus for the target every fifteen seconds. At least one pod is always kept running (minReplicaCount), and the number of pods never exceeds maxReplicaCount (ten in this example).

You can set minReplicaCount to zero. In this case, KEDA activates the deployment from zero to one replica and then hands off to the HPA for further autoscaling. The reverse is also possible: scaling from one down to zero. In this example we didn't choose zero, because this is an HTTP service and not an on-demand system.
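If you did want scale-to-zero behavior, only the replica bounds in the ScaledObject above would need to change; a sketch of the relevant fields:

spec:
  # (scaleTargetRef, triggers and other fields as above)
  minReplicaCount: 0
  maxReplicaCount: 10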

The magic behind the autoscaling

The threshold is used as the trigger for scaling the deployment. In our example, the PromQL query sum(rate(http_requests[2m])) returns the aggregate HTTP request rate (requests per second) measured over the last two minutes.

Since the threshold is three, there will be one pod as long as the value of sum(rate(http_requests[2m])) stays below three. From there, an additional pod is added for every further three requests per second: for example, a value between 10 and 12 results in four pods.
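Under the hood, the HPA that KEDA creates applies the standard formula for average-value metrics. A simplified sketch of the arithmetic, using our threshold of 3:

desiredReplicas = ceil(currentMetricValue / threshold)

ceil(1.686 / 3) = 1    //one pod
ceil(10 / 3)    = 4    //four pods
ceil(30 / 3)    = 10   //the upper bound, maxReplicaCount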

Now let's try to customize!

Prerequisites

All you need is a Kubernetes cluster and the kubectl utility configured to use it. This example uses a minikube cluster, but any other will do. There is a guide for setting up a cluster.

Install the latest version on Mac:

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64 && chmod +x minikube
sudo mkdir -p /usr/local/bin/
sudo install minikube /usr/local/bin/

Install kubectl to access the Kubernetes cluster.

Install the latest version on Mac:

curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl"
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
kubectl version

Installing KEDA

You can deploy KEDA in several ways; they are listed in the documentation. I'm using a single monolithic YAML file:

kubectl apply -f https://raw.githubusercontent.com/kedacore/keda/master/deploy/KedaScaleController.yaml

KEDA and its components are installed into the keda namespace. To check:

kubectl get pods -n keda

Wait until the KEDA operator pod starts and enters the Running state. After that, continue.

Installing Redis with Helm

If you don't have Helm installed, use this guide. To install it on Mac:

brew install kubernetes-helm
helm init --history-max 200

helm init initializes the local command-line interface and also installs Tiller into the Kubernetes cluster. To check it:

kubectl get pods -n kube-system | grep tiller

Wait for the Tiller Pod to enter the Running state.

Translator's note: the author uses Helm 2, which requires the Tiller server component. Helm 3 is now current, and it does not need the server part.

After installing Helm, one command is enough to start Redis:

helm install --name redis-server --set cluster.enabled=false --set usePassword=false stable/redis

Verify that Redis started successfully:

kubectl get pods/redis-server-master-0

Wait until the Redis pod enters the Running state.

Application Deployment

Deployment command:

kubectl apply -f go-app.yaml

//output
deployment.apps/go-prom-app created
service/go-prom-app-service created

Check that everything is running:

kubectl get pods -l=app=go-prom-app

Wait until the application pod enters the Running state.

Prometheus Server Deployment

The Prometheus manifest uses Kubernetes service discovery. It allows the application to be discovered dynamically based on a label on its service:

kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_run]
  regex: go-prom-app-service
  action: keep

For deployment:

kubectl apply -f prometheus.yaml

//output
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/default configured
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
configmap/prom-conf created
deployment.extensions/prometheus-deployment created
service/prometheus-service created

Check that everything is running:

kubectl get pods -l=app=prometheus-server

Wait until the Prometheus pod enters the Running state.

Use kubectl port-forward to access the Prometheus UI (or API server) at http://localhost:9090.

kubectl port-forward service/prometheus-service 9090

Deploying the KEDA Autoscale Configuration

Command to create ScaledObject:

kubectl apply -f keda-prometheus-scaledobject.yaml

Check the logs of the KEDA operator:

KEDA_POD_NAME=$(kubectl get pods -n keda -o=jsonpath='{.items[0].metadata.name}')
kubectl logs $KEDA_POD_NAME -n keda

The result looks something like this:

time="2019-10-15T09:38:28Z" level=info msg="Watching ScaledObject:
default/prometheus-scaledobject"
time="2019-10-15T09:38:28Z" level=info msg="Created HPA with 
namespace default and name keda-hpa-go-prom-app"

Check the application pods. Exactly one instance should be running, since minReplicaCount equals 1:

kubectl get pods -l=app=go-prom-app

Verify that the HPA resource was successfully created:

kubectl get hpa

You should see something like:

NAME                   REFERENCE                TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-go-prom-app   Deployment/go-prom-app   0/3 (avg)   1         10        1          45s

Health Check: App Access

To access our application's REST endpoint, run:

kubectl port-forward service/go-prom-app-service 8080

You can now access the Go application at http://localhost:8080. To do so, run:

curl http://localhost:8080/test

The result looks something like this:

Accessed on 2019-10-21 11:29:10.560385986 +0000 UTC m=+406004.817901246
Access count 1

At this point, also check Redis. You will see that the key access_count increased to 1:

kubectl exec -it redis-server-master-0 -- redis-cli get access_count
//output
"1"

Make sure the http_requests metric reports the same value:

curl http://localhost:8080/metrics | grep http_requests
//output
# HELP http_requests number of http requests
# TYPE http_requests counter
http_requests 1

Generating load

We will use hey, a utility for load generation:

curl -o hey https://storage.googleapis.com/hey-release/hey_darwin_amd64 && chmod a+x hey

You can also download the utility for Linux or Windows.

Run it:

./hey http://localhost:8080/test

By default, the utility sends 200 requests. You can verify this using Prometheus metrics as well as Redis.

curl http://localhost:8080/metrics | grep http_requests
//output
# HELP http_requests number of http requests
# TYPE http_requests counter
http_requests 201
kubectl exec -it redis-server-master-0 -- redis-cli get access_count
//output
201

Validate the value of the actual metric (returned by the PromQL query):

curl -g 'http://localhost:9090/api/v1/query?query=sum(rate(http_requests[2m]))'
//output
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1571734214.228,"1.686057971014493"]}]}}

In this case, the actual result is 1.686057971014493, shown in the value field. This is not enough to trigger scaling, since the threshold we set is 3.

More load!

In the new terminal, keep track of the number of application pods:

kubectl get pods -l=app=go-prom-app -w

Let's increase the load with the command:

./hey -n 2000 http://localhost:8080/test

After a while, you will see the HPA scale the deployment up and launch new pods. Check the HPA to make sure:

kubectl get hpa
NAME                   REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-go-prom-app   Deployment/go-prom-app   1830m/3 (avg)   1         10        6          4m22s

When the load subsides, the deployment scales back down until only a single pod is running. If you want to check the actual metric value (returned by the PromQL query), use:

curl -g 'http://localhost:9090/api/v1/query?query=sum(rate(http_requests[2m]))'

Cleanup

//Delete KEDA
kubectl delete namespace keda
//Delete the app, Prometheus server and KEDA scaled object
kubectl delete -f .
//Delete Redis
helm del --purge redis-server

Conclusion

KEDA allows you to automatically scale your Kubernetes deployments (to/from zero) based on external metrics: for example, Prometheus metrics, queue length in Redis, or consumer lag in a Kafka topic.

KEDA integrates with the external source and exposes its metrics through a metrics server to the Horizontal Pod Autoscaler.

Good Luck!

What else to read:

  1. Best practices and recommendations for running containers and Kubernetes in production environments.
  2. 90+ Useful Tools for Kubernetes: Deployment, Management, Monitoring, Security and More.
  3. Our Telegram channel, Around Kubernetes.

Source: habr.com
