How to save on cloud costs when working with Kubernetes? There is no one right answer, but this article outlines a few tools to help you manage your resources more efficiently and reduce cloud computing costs.
I wrote this article with Kubernetes for AWS in mind, but it will apply (almost) the same way to other cloud providers. I'm assuming your cluster(s) already have autoscaling configured (cluster-autoscaler). Removing resources and downsizing your deployment will only save money if it also reduces your fleet of worker nodes (EC2 instances).
Working in a rapidly changing environment is great. We want technical organizations to move fast. Faster software delivery also means more PR deployments, preview environments, prototypes, and analytics solutions, all of it deployed on Kubernetes. Who has time to clean up test deployments manually? It's easy to forget about deleting a week-old experiment. The cloud bill will eventually creep up because of things we forgot to shut down:
(Henning Jacobs, quoting Corey Quinn:
Myth: Your AWS bill is a function of how many users you have.
Fact: Your AWS bill is a function of how many engineers you have.
Ivan Kurnosov, in reply:
Real fact: Your AWS bill is a function of how many things you forgot to disable/remove.)
Kubernetes Janitor (kube-janitor) helps clean up your cluster. The janitor configuration is flexible for both global and local use:
Global rules for the entire cluster can define a maximum time-to-live (TTL) for PR/test deployments.
Individual resources can be annotated with janitor/ttl, for example to automatically remove a spike/prototype after 7 days.
The global rules are defined in a YAML file whose path is passed to kube-janitor via the --rules-file parameter. Here is an example rule to remove all namespaces with -pr- in the name after two days:
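A sketch of such a rules file, following the kube-janitor rules format (the rule id is illustrative):

```yaml
rules:
  # remove all PR namespaces after two days
  - id: temp-pr-namespaces
    resources:
      - namespaces
    # jmespath.org expression evaluated against the resource
    jmespath: "contains(metadata.name, '-pr-')"
    ttl: 2d
```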
The following example requires the application label on Deployment and StatefulSet resources for all new Deployments/StatefulSets created in 2020, but still allows tests to run without this label for a week:
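A sketch of that rule (the rule id and exact cutoff date are illustrative):

```yaml
rules:
  # remove 2020 deployments/statefulsets without an "application" label after 7 days
  - id: require-application-label
    resources:
      - deployments
      - statefulsets
    jmespath: "!(spec.template.metadata.labels.application) && metadata.creationTimestamp > '2020-01-01'"
    ttl: 7d
```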
To run a time-limited 30-minute demo on a cluster where kube-janitor is running:
kubectl run nginx-demo --image=nginx
kubectl annotate deploy nginx-demo janitor/ttl=30m
Another source of rising costs is persistent volumes (AWS EBS). Deleting a Kubernetes StatefulSet does not delete its persistent volumes (PVC - PersistentVolumeClaim). Unused EBS volumes can easily add up to hundreds of dollars per month. Kubernetes Janitor has a feature to clean up unused PVCs. For example, this rule will remove all PVCs that are not mounted by a pod and are not referenced by a StatefulSet or CronJob:
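A sketch of that PVC rule, using kube-janitor's _context helpers for unmounted and unreferenced PVCs:

```yaml
rules:
  # remove all PVCs which are not mounted by a pod
  # and not referenced by a StatefulSet or CronJob
  - id: remove-unused-pvcs
    resources:
      - persistentvolumeclaims
    jmespath: "_context.pvc_is_not_mounted && _context.pvc_is_not_referenced"
    ttl: 24h
```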
Kubernetes Janitor can help you keep your cluster clean and prevent slowly accumulating cloud computing costs. For deployment and configuration instructions, see the kube-janitor README.
Scale down outside business hours
Test and staging systems are usually required to run only during business hours. Some production applications like back office/admin tools also require only limited availability and can be turned off at night.
Kubernetes Downscaler (kube-downscaler) allows users and operators to scale the system down during off-hours. Deployments and StatefulSets can scale to zero replicas, and CronJobs can be suspended. Kubernetes Downscaler is configurable for the entire cluster, for one or more namespaces, or for individual resources. You can configure either "downtime" or, conversely, "uptime". For example, to scale down as much as possible during nights and weekends:
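A sketch of the kube-downscaler deployment arguments for that setup (the namespace names and time zone are illustrative; the --default-uptime flag defines the hours everything stays up):

```yaml
# container args for the kube-downscaler deployment
args:
  - --interval=30
  # never touch system components
  - --exclude-namespaces=kube-system
  # keep everything up Mon-Fri 08:00-20:00, scale down otherwise
  - --default-uptime=Mon-Fri 08:00-20:00 Europe/Berlin
  - --include-resources=deployments,statefulsets,cronjobs
```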
Here is a graph of the number of cluster worker nodes over a weekend:
Scaling down from ~13 to 4 worker nodes certainly makes a noticeable difference in the AWS bill.
But what if I need to work during the cluster's "downtime"? Certain deployments can be permanently excluded from downscaling by adding the downscaler/exclude: true annotation. Deployments can be temporarily excluded using the downscaler/exclude-until annotation with an absolute timestamp in YYYY-MM-DD HH:MM (UTC) format. If necessary, the entire cluster can be scaled up again by deploying a pod with the downscaler/force-uptime annotation, for example by running a blank nginx deployment:
kubectl run scale-up --image=nginx
kubectl annotate deploy scale-up janitor/ttl=1h # remove the deployment after one hour
kubectl annotate pod $(kubectl get pod -l run=scale-up -o jsonpath="{.items[0].metadata.name}") downscaler/force-uptime=true
See the kube-downscaler README if you are interested in deployment instructions and additional options.
Use horizontal autoscaling
Many applications/services deal with a dynamic load pattern: sometimes their pods are idle, and sometimes they run at full capacity. Running a constant fleet of pods sized for peak load is not economical. Kubernetes supports horizontal autoscaling via the HorizontalPodAutoscaler (HPA) resource. CPU usage is often a good metric for scaling:
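A sketch of an HPA scaling on CPU utilization (the deployment name and replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    # scale out when average CPU usage exceeds 100% of the CPU request
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 100
```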
Zalando created a component to easily connect custom metrics for scaling: Kube Metrics Adapter (kube-metrics-adapter) is a generic metrics adapter for Kubernetes that can collect and serve custom and external metrics for horizontal pod autoscaling. It supports scaling based on Prometheus metrics, SQS queue lengths, and other sources. For example, to scale a deployment on a custom metric exposed by the application itself as JSON on /metrics, use:
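A sketch of such an HPA, using kube-metrics-adapter's json-path metric collector (the metric name, JSON key, port, and target value are all illustrative; check the adapter's README for the exact annotation names supported by your version):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  annotations:
    # collect "requests-per-second" from each pod's /metrics JSON endpoint
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_requests_per_second"
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests-per-second
        target:
          type: AverageValue
          averageValue: "1000"
```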
Configuring horizontal autoscaling with HPA should be one of the default performance improvements for stateless services. Spotify has a presentation with their experience and recommendations for HPA: scale your deployments, not your wallet.
Reducing redundant resource reservations
Kubernetes workloads declare their CPU/memory needs through "resource requests". CPU resources are measured in virtual cores or, more commonly, in "millicores"; for example, 500m means 50% of a vCPU. Memory resources are measured in bytes, and common suffixes can be used, such as 500Mi, meaning 500 mebibytes. Resource requests "lock" capacity on worker nodes, meaning a pod with a CPU request of 1000m on a node with 4 vCPUs will leave only 3 vCPUs available to other pods. [1]
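For example, a container's resource requests in a pod spec look like this (the values are illustrative):

```yaml
# fragment of a pod/deployment container spec
resources:
  requests:
    cpu: 500m     # half a vCPU
    memory: 500Mi # 500 mebibytes
```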
Slack (over-provisioned reserve) is the difference between requested resources and actual usage. For example, a pod that requests 2 GiB of memory but only uses 200 MiB has ~1.8 GiB of "excess" memory. Excess costs money. It can be roughly estimated that 1 GiB of excess memory costs ~$10/month. [2]
Kubernetes Resource Report (kube-resource-report) displays excess reserves and can help you identify savings potential:
Kubernetes Resource Report shows the excess aggregated by application and team. This allows you to find places where resource requests can be lowered. The generated HTML report only provides a snapshot of resource usage. You should look at cpu/memory usage over time to determine adequate resource requests. Here is a Grafana chart for a "typical" CPU-heavy service, with all pods using significantly less than the 3 requested CPU cores:
Reducing the CPU request from 3000m to ~400m frees up resources for other workloads and allows the cluster to be smaller.
"Average CPU usage of EC2 instances often fluctuates in the single digit percentage range," writes Corey Quinn. While right-sizing EC2 instances can be a risky undertaking, changing some Kubernetes resource requests in a YAML file is easy and can bring huge savings.
But do we really want people to change values in YAML files by hand? No, machines can do it much better! The Kubernetes Vertical Pod Autoscaler (VPA) does just that: it adapts resource requests and limits to the workload. Here is an example graph of Prometheus CPU requests (thin blue line) adapted by VPA over time:
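A sketch of a VPA resource that lets the autoscaler apply its recommendations automatically (the target deployment name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    # "Auto" lets VPA evict and recreate pods with updated requests;
    # use "Off" to only collect recommendations without applying them
    updateMode: "Auto"
```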
Goldilocks by Fairwinds is a tool that creates a VPA for each deployment in a namespace and then displays the VPA recommendations in its dashboard. It can help developers set the correct CPU/memory requests for their applications:
Last but not least, AWS EC2 costs can be reduced by using Spot Instances as Kubernetes worker nodes [3]. Spot Instances are available at up to 90% off on-demand pricing. Running Kubernetes on EC2 Spot is a good combo: you need to specify multiple different instance types for higher availability, meaning you can get a larger node for the same or lower price, and the increased capacity can be used by containerized Kubernetes workloads.
How to run Kubernetes on EC2 Spot? There are several options: use a third-party service like SpotInst (now called "Spot", don't ask me why), or simply add a Spot AutoScalingGroup (ASG) to your cluster. For example, here is a CloudFormation snippet for a "capacity-optimized" Spot ASG with multiple instance types:
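A sketch of such an ASG (the resource names, sizes, and instance types are illustrative; it assumes a worker launch template defined elsewhere in the same CloudFormation stack):

```yaml
SpotAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 0
    MaxSize: 20
    MixedInstancesPolicy:
      InstancesDistribution:
        # 100% Spot capacity, no On-Demand base
        OnDemandPercentageAboveBaseCapacity: 0
        SpotAllocationStrategy: capacity-optimized
      LaunchTemplate:
        LaunchTemplateSpecification:
          LaunchTemplateId: !Ref WorkerLaunchTemplate
          Version: !GetAtt WorkerLaunchTemplate.LatestVersionNumber
        # multiple instance types for higher Spot availability
        Overrides:
          - InstanceType: m5.2xlarge
          - InstanceType: m5.4xlarge
          - InstanceType: r4.2xlarge
          - InstanceType: r4.4xlarge
```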
What are your best practices for saving cloud costs on Kubernetes? Please let me know at Twitter (@try_except_).
[1] In fact, even less than 3 vCPUs will remain usable, as the node's capacity is reduced by reserved system resources. Kubernetes distinguishes between the physical capacity of a node and "allocatable" resources (Node Allocatable).
[2] Calculation example: one m5.large instance with 8 GiB of memory costs ~$84 per month (eu-central-1, On-Demand), so blocking 1/8 of the node is approximately ~$10 per month.
[3] There are many more ways to reduce your EC2 bill, such as Reserved Instances, Savings Plan, etc. - I won't cover those topics here, but you should definitely check them out!