Save on Kubernetes Cloud Costs on AWS



How to save on cloud costs when working with Kubernetes? There is no one right answer, but this article outlines a few tools to help you manage your resources more efficiently and reduce cloud computing costs.

I wrote this article with Kubernetes for AWS in mind, but it will apply (almost) the same way to other cloud providers. I'm assuming your cluster(s) already have autoscaling configured (cluster-autoscaler). Removing resources and downsizing your deployment will only save money if it also reduces your fleet of worker nodes (EC2 instances).

This article will cover:

  • cleaning up unused resources,
  • scaling down outside business hours,
  • using horizontal autoscaling,
  • reducing redundant resource reservations,
  • using EC2 Spot instances.

Cleaning Up Unused Resources

Working in a fast-moving environment is great: we want our technical organization to move fast. Faster software delivery also means more PR deployments, preview environments, prototypes, and analytics solutions, all of it deployed on Kubernetes. Who has time to clean up test deployments manually? It's easy to forget to delete a week-old experiment. The cloud bill eventually creeps up because of things we forgot to shut down:


(Henning Jacobs:
"So true:"
(quoting) Corey Quinn:
"Myth: Your AWS bill is a function of how many users you have.
Fact: Your AWS bill is a function of how many engineers you have."

Ivan Kurnosov (in reply):
"Real fact: Your AWS bill is a function of how many things you forgot to switch off/delete.")

Kubernetes Janitor (kube-janitor) helps clean up your cluster. The janitor configuration is flexible for both global and local use:

  • General rules for the entire cluster can determine the maximum time to live (TTL - time-to-live) for PR / test deployments.
  • Individual resources can be annotated with janitor/ttl, for example to automatically remove a spike/prototype after 7 days (see the example manifest below).
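Here is what that might look like in a manifest; this is a hypothetical example, and the name and image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-prototype
  annotations:
    janitor/ttl: "7d"   # kube-janitor deletes this Deployment 7 days after its creation
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-prototype
  template:
    metadata:
      labels:
        app: my-prototype
    spec:
      containers:
        - name: app
          image: nginx   # placeholder image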

The general rules are defined in a YAML file. Its path is passed to kube-janitor via the --rules-file parameter. Here is an example rule that removes all namespaces with -pr- in their name after two days:

- id: cleanup-resources-from-pull-requests
  resources:
    - namespaces
  jmespath: "contains(metadata.name, '-pr-')"
  ttl: 2d

The following example requires the application label on Deployment and StatefulSet pods for all new Deployments/StatefulSets created after 2020-01-01, but still allows tests without this label to run for one week:

- id: require-application-label
  # delete Deployments and StatefulSets without the "application" label
  resources:
    - deployments
    - statefulsets
  # see http://jmespath.org/specification.html
  jmespath: "!(spec.template.metadata.labels.application) && metadata.creationTimestamp > '2020-01-01'"
  ttl: 7d

Here is how to run a time-limited demo for 30 minutes on a cluster where kube-janitor is running:

kubectl run nginx-demo --image=nginx
kubectl annotate deploy nginx-demo janitor/ttl=30m

Another source of growing costs is persistent volumes (AWS EBS). Deleting a Kubernetes StatefulSet does not delete its persistent volume claims (PVC, PersistentVolumeClaim). Unused EBS volumes can easily add up to hundreds of dollars per month. Kubernetes Janitor has a feature to clean up unused PVCs. For example, this rule removes all PVCs that are not mounted by a pod and are not referenced by a StatefulSet or CronJob:

# delete all PVCs that are not mounted and are not referenced by StatefulSets
- id: remove-unused-pvcs
  resources:
  - persistentvolumeclaims
  jmespath: "_context.pvc_is_not_mounted && _context.pvc_is_not_referenced"
  ttl: 24h

Kubernetes Janitor can help you keep your cluster clean and prevent cloud computing costs from slowly piling up. For deployment and configuration instructions, see the kube-janitor README.

Scale down after business hours

Test and staging systems are usually required to run only during business hours. Some production applications like back office/admin tools also require only limited availability and can be turned off at night.

Kubernetes Downscaler (kube-downscaler) allows users and operators to scale the system down during off-hours. Deployments and StatefulSets can be scaled to zero replicas, and CronJobs can be suspended. Kubernetes Downscaler can be configured for the entire cluster, for one or more namespaces, or for individual resources. You can specify either "downtime" or, conversely, "uptime". For example, to scale down as much as possible during nights and weekends:

image: hjacobs/kube-downscaler:20.4.3
args:
  - --interval=30
  # do not shut down infrastructure components
  - --exclude-namespaces=kube-system,infra
  # do not shut down kube-downscaler itself, and keep the Postgres Operator so that excluded databases can still be managed
  - --exclude-deployments=kube-downscaler,postgres-operator
  - --default-uptime=Mon-Fri 08:00-20:00 Europe/Berlin
  - --include-resources=deployments,statefulsets,stacks,cronjobs
  - --deployment-time-annotation=deployment-time

Here is a graph of cluster worker nodes scaling down over the weekend:

[Graph: cluster worker node count over the weekend]

Scaling down from ~13 to 4 worker nodes certainly makes a noticeable difference in the AWS bill.

But what if I need to work during the cluster's "downtime"? Certain deployments can be permanently excluded from downscaling by adding the downscaler/exclude: true annotation. Deployments can be temporarily excluded using the downscaler/exclude-until annotation with an absolute timestamp in YYYY-MM-DD HH:MM (UTC) format. If necessary, the entire cluster can be scaled back up by deploying a pod with the downscaler/force-uptime annotation, for example by running a dummy nginx deployment:

kubectl run scale-up --image=nginx
kubectl annotate deploy scale-up janitor/ttl=1h # delete the deployment after one hour
kubectl annotate pod $(kubectl get pod -l run=scale-up -o jsonpath="{.items[0].metadata.name}") downscaler/force-uptime=true
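Similarly, a single deployment can be temporarily excluded from downscaling until a given time (my-app and the timestamp are placeholders):

kubectl annotate deploy my-app downscaler/exclude-until="2020-06-01 08:00"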

See the kube-downscaler README if you are interested in deployment instructions and additional options.

Use horizontal autoscaling

Many applications/services deal with a dynamic load pattern: sometimes their pods are idle, and sometimes they run at full capacity. Running a constant fleet of pods sized for peak load is not economical. Kubernetes supports horizontal autoscaling via the HorizontalPodAutoscaler (HPA) resource. CPU usage is often a good metric to scale on:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 100
        type: Utilization

Zalando created a component to easily plug in custom metrics for scaling: Kube Metrics Adapter (kube-metrics-adapter) is a generic metrics adapter for Kubernetes that can collect and serve custom and external metrics for horizontal pod autoscaling. It supports scaling based on Prometheus metrics, SQS queues, and more. For example, to scale a deployment on a custom metric exposed by the application itself as JSON at /metrics:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps"
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        averageValue: 1k
        type: AverageValue

Configuring horizontal autoscaling with HPA should be one of the default performance improvements for stateless services. Spotify has a presentation with their experience and recommendations for HPA: scale your deployments, not your wallet.

Reducing redundant resource reservations

Kubernetes workloads declare their CPU/memory needs through "resource requests". CPU resources are measured in virtual cores or, more commonly, in "millicores"; for example, 500m means 50% of a vCPU. Memory resources are measured in bytes, with the usual suffixes, e.g. 500Mi meaning 500 mebibytes. Resource requests "lock" capacity on worker nodes: a pod with a CPU request of 1000m on a node with 4 vCPUs leaves only 3 vCPUs available to other pods. [1]
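As an illustration, this is how such requests look in a pod spec; the container name, image, and values are placeholders:

# Fragment of a pod template (spec.template.spec in a Deployment);
# names and numbers are placeholders.
containers:
  - name: my-app
    image: my-app:latest
    resources:
      requests:
        cpu: 500m       # reserves 50% of one vCPU on the node
        memory: 500Mi   # reserves 500 MiB of memory on the node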

Slack (redundant reserve) is the difference between requested resources and actual usage. For example, a pod that requests 2 GiB of memory but only uses 200 MiB has ~1.8 GiB of "excess" memory. Excess costs money. It can be roughly estimated that 1 GiB of excess memory costs ~$10/month. [2]

Kubernetes Resource Report (kube-resource-report) displays excess reserves and can help you identify savings potential:

[Screenshot: kube-resource-report showing slack aggregated by application and team]

Kubernetes Resource Report shows the excess aggregated by application and team. This allows you to find places where resource requests can be lowered. The generated HTML report only provides a snapshot of resource usage. You should look at cpu/memory usage over time to determine adequate resource requests. Here is a Grafana chart for a "typical" CPU-heavy service, with all pods using significantly less than the 3 requested CPU cores:

[Grafana chart: pod CPU usage well below the 3 requested CPU cores]

Reducing the CPU request from 3000m to ~400m frees up resources for other workloads and allows the cluster to be smaller.

"Average CPU usage of EC2 instances often fluctuates in the single-digit percentage range," writes Corey Quinn. While right-sizing EC2 instances can be a bad idea, changing some Kubernetes resource requests in a YAML file is easy and can bring huge savings.

But do we really want humans to change values in YAML files? No, machines can do it much better! The Kubernetes Vertical Pod Autoscaler (VPA) does just that: it adapts resource requests and limits to the workload. Here is an example graph of Prometheus CPU requests (thin blue line) adapted by VPA over time:

[Graph: Prometheus CPU requests (thin blue line) adapted by VPA over time]

Zalando uses VPA in all its clusters for infrastructure components. Non-critical applications can also use VPA.
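For reference, a minimal VPA manifest might look like the following sketch (assuming the VPA CRDs and controllers are installed in the cluster; my-app is a placeholder):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # VPA applies its recommendations by evicting and recreating pods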

Goldilocks by Fairwinds is a tool that creates a VPA for each deployment in a namespace and then displays the VPA recommendations on its dashboard. It can help developers set the correct CPU/memory requests for their applications:

[Screenshot: Goldilocks dashboard with VPA recommendations]

I wrote a small blog post about VPA in 2019, and the CNCF End User Community recently discussed the use of VPA.

Using EC2 Spot Instances

Last but not least, AWS EC2 costs can be reduced by using Spot Instances as Kubernetes worker nodes [3]. Spot Instances are available at up to 90% off on-demand pricing. Running Kubernetes on EC2 Spot is a good combo: you need to specify multiple different instance types for higher availability, meaning you can get a larger node for the same or lower price, and the increased capacity can be used by containerized Kubernetes workloads.

How to run Kubernetes on EC2 Spot? There are several options: use a third-party service like SpotInst (now called "Spot", don't ask me why), or simply add a Spot AutoScalingGroup (ASG) to your cluster. For example, here is a CloudFormation snippet for a "capacity-optimized" Spot ASG with multiple instance types:

MySpotAutoScalingGroup:
 Type: AWS::AutoScaling::AutoScalingGroup
 Properties:
   HealthCheckGracePeriod: 300
   HealthCheckType: EC2
   MixedInstancesPolicy:
     InstancesDistribution:
       OnDemandPercentageAboveBaseCapacity: 0
       SpotAllocationStrategy: capacity-optimized
     LaunchTemplate:
       LaunchTemplateSpecification:
         LaunchTemplateId: !Ref LaunchTemplate
         Version: !GetAtt LaunchTemplate.LatestVersionNumber
       Overrides:
         - InstanceType: "m4.2xlarge"
         - InstanceType: "m4.4xlarge"
         - InstanceType: "m5.2xlarge"
         - InstanceType: "m5.4xlarge"
         - InstanceType: "r4.2xlarge"
         - InstanceType: "r4.4xlarge"
   LaunchTemplate:
     LaunchTemplateId: !Ref LaunchTemplate
     Version: !GetAtt LaunchTemplate.LatestVersionNumber
   MinSize: 0
   MaxSize: 100
   Tags:
   - Key: k8s.io/cluster-autoscaler/node-template/label/aws.amazon.com/spot
     PropagateAtLaunch: true
     Value: "true"

Some notes on using Spot with Kubernetes:

  • You need to handle Spot terminations gracefully, for example by draining the node when it receives a shutdown notice
  • Zalando uses a fork of the official Cluster Autoscaler with node pool priorities
  • Spot nodes can be forced to accept "registrations" of workloads that are allowed to run on Spot (see the sketch after this list)
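As an illustration of that last point (my own sketch of one common pattern, not necessarily the exact mechanism meant above): Spot nodes can be tainted, for example via the kubelet flag --register-with-taints=aws.amazon.com/spot=true:NoSchedule, so that only workloads carrying a matching toleration are scheduled onto them:

# Pod spec fragment: only pods with this toleration can be scheduled onto
# Spot nodes tainted with aws.amazon.com/spot=true:NoSchedule
tolerations:
  - key: "aws.amazon.com/spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"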

Summary

I hope you find some of the tools presented useful for reducing your cloud computing bill. You can also find most of this article's content in my talk at DevOps Gathering 2019 on YouTube and in the slides.

What are your best practices for saving cloud costs on Kubernetes? Please let me know on Twitter (@try_except_).

[1] In fact, even less than 3 vCPUs will remain usable, since the node's capacity is reduced by reserved system resources. Kubernetes distinguishes between the physical capacity of a node and "allocatable" resources (Node Allocatable).

[2] Example calculation: one m5.large instance with 8 GiB of memory costs ~$84 per month (eu-central-1, On-Demand), so blocking 1/8 of the node costs approximately ~$10 per month.

[3] There are many more ways to reduce your EC2 bill, such as Reserved Instances, Savings Plans, etc. I won't cover those topics here, but you should definitely look into them!


Source: habr.com
