Building a Scalable API on AWS Spot Instances

Hi all! My name is Kirill, I'm CTO at Adapty. Most of our architecture is on AWS, and today I'm going to talk about how we cut server costs by 3x by using Spot Instances in production, and how to autoscale them. First there will be an overview of how it works, and then detailed instructions for getting started.

What are Spot Instances?

Spot Instances are spare EC2 capacity that AWS sells at a big discount (Amazon advertises up to 90%; in our experience around 3x, varying by region, availability zone, and instance type). Their main difference from regular instances is that they can be shut down at any time. For a long time we therefore assumed they were fine for dev environments or for batch jobs that save intermediate results to S3 or a database, but not for production. There are third-party solutions for running spots in production, but for our case they involved too many workarounds, so we did not adopt them. The approach described in this article works entirely within standard AWS functionality, without extra scripts, crons, and so on.

Here are some screenshots that show the price history for Spot Instances.

m5.large in eu-west-1 (Ireland). The price has been mostly stable for 3 months, saving 2.9x at the moment.


m5.large in the us-east-1 (N. Virginia) region. Price fluctuates continuously over the course of 3 months, with current savings ranging from 2.3x to 2.8x depending on availability zone.


t3.small in us-east-1 (N. Virginia). Price stable for 3 months, currently saving 3.4x.


Service architecture

The basic architecture of the service, which we will talk about in this article, is shown in the diagram below.


Application Load Balancer → EC2 Target Group → Elastic Container Service

An Application Load Balancer (ALB) is used as the balancer; it routes requests to an EC2 Target Group (TG). The TG is responsible for opening ports on the instances for the ALB and binding them to the container ports of the Elastic Container Service (ECS). ECS is AWS's analogue of Kubernetes and manages the Docker containers.

Several containers with the same port can run on one instance, so ports cannot be assigned statically. When ECS starts a new task (in Kubernetes terminology, a pod), it tells the TG, which checks for free ports on the instance and assigns one of them to the task being launched. The TG also regularly checks, via a health check, that the instance and the API on it are working, and if it sees a problem, it stops sending requests there.

EC2 Auto Scaling Groups + ECS Capacity Providers

The diagram above does not show the EC2 Auto Scaling Groups (ASG) service. As the name suggests, it is responsible for scaling instances. Until recently, however, AWS had no built-in way to manage the number of running machines from ECS. ECS could scale the number of tasks, for example by CPU usage, RAM, or request count, but if tasks occupied all free instances, new machines were not started automatically.

This changed with the arrival of ECS Capacity Providers (ECS CP). Now each service in ECS can be associated with an ASG, and if tasks do not fit on the running instances, new ones are started (within the configured ASG limits). This also works in the opposite direction: if the ECS CP sees idle instances without tasks, it instructs the ASG to shut them down. The ECS CP can also target a given percentage of instance utilization, so that a certain number of machines are always free for quick task scaling; I will come back to this a little later.

EC2 Launch Templates

The last service I will describe before moving on to the detailed setup of this infrastructure is EC2 Launch Templates. It lets you create a template from which all machines are launched, so you do not have to repeat the configuration from scratch every time. Here you can choose the instance type to launch, the security group, the disk image, and many other options. You can also specify user data that is passed to every launched instance. User data can run scripts; for example, it can edit the ECS agent configuration file.

One of the most important configuration options in this article is ECS_ENABLE_SPOT_INSTANCE_DRAINING=true. With this setting enabled, as soon as ECS receives a signal that a Spot Instance is being reclaimed, it puts all tasks running on it into Draining status. No new tasks are assigned to the instance, and tasks about to be placed on it are cancelled. Requests from the balancer also stop arriving. The reclamation notice arrives 2 minutes before the actual event, so if your service does not run tasks longer than 2 minutes and does not save anything to disk, you can use Spot Instances without data loss.

About the disk: AWS recently made it possible to use Elastic File System (EFS) together with ECS, so with this scheme even a disk is not an obstacle, but we have not tried it, since in principle we do not need a disk to store state. By default, after receiving SIGTERM (sent when the task is moved to Draining status), all running tasks are stopped after 30 seconds, even if they have not yet finished; you can change this time with the ECS_CONTAINER_STOP_TIMEOUT parameter. The main thing is not to set it above 2 minutes for spot machines.

Service creation

Let's proceed directly to creating the described service. Along the way, I will mention a few useful details that were not covered above. This is generally a step-by-step guide, but I will not cover some very basic or, conversely, very specific cases. All actions are performed in the AWS visual console, but they can be reproduced programmatically with CloudFormation or Terraform. At Adapty, we use Terraform.

EC2 Launch Template

In this service, the configuration of the machines that will be used is created. Templates are managed in the EC2 -> Instances -> Launch templates section.

Amazon machine image (AMI) - specify the disk image with which all instances will be launched. For ECS, in most cases, you should use the optimized image from Amazon. It is regularly updated and contains everything you need to run ECS. To find out the current image ID, go to the page Amazon ECS-optimized AMIs, select the used region and copy the AMI ID for it. For example, for the us-east-1 region, the current ID at the time of writing is ami-00c7c1cf5bdc913ed. This ID must be inserted into the Specify a custom value item.
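Since we manage this with Terraform, here is a minimal sketch (the resource name is hypothetical) of how the current ECS-optimized AMI ID can be looked up automatically instead of hardcoding it, using the public SSM parameter that AWS maintains:

```hcl
# Resolve the latest ECS-optimized Amazon Linux 2 AMI for the current region.
# AWS publishes it as a public SSM parameter, so the ID never goes stale.
data "aws_ssm_parameter" "ecs_ami" {
  name = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
}

# The resolved ID can then be referenced from a launch template:
# image_id = data.aws_ssm_parameter.ecs_ami.value
```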

instance type - Specify the instance type. Choose the one that best suits your task.

Key pair (login) - specify the certificate with which you can connect to the instance via SSH, if necessary.

Network settings - specify the network settings. The networking platform in most cases should be Virtual Private Cloud (VPC). Security groups - security groups for your instances. Since we will put a balancer in front of the instances, I recommend specifying a group here that allows incoming connections only from the balancer. That is, you will have 2 security groups: one for the balancer, which allows inbound connections from anywhere on ports 80 (HTTP) and 443 (HTTPS), and a second for the machines, which allows inbound connections on any port from the balancer's group. Outbound connections in both groups must be open over TCP to all ports and all addresses. You can restrict the ports and addresses for outbound connections, but then you have to constantly watch out for attempts to reach something on a closed port.
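The two-group scheme described above can be sketched in Terraform roughly as follows (group names and the vpc_id variable are hypothetical placeholders):

```hcl
variable "vpc_id" {}

# Balancer group: HTTP/HTTPS inbound from anywhere.
resource "aws_security_group" "alb" {
  name   = "demo-api-alb"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Instance group: any port inbound, but only from the balancer's group.
resource "aws_security_group" "instances" {
  name   = "demo-api-instances"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 0
    to_port         = 65535
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```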

Storage (volumes) - specify the parameters of disks for machines. The disk size cannot be less than that specified in AMI, for ECS Optimized - 30 GiB.

advanced details - specify additional parameters.

Purchasing options - whether we want to buy Spot Instances. We want to, but we will not check this box here, we will configure it in the Auto Scaling Group, there are more options.

IAM instance profile - specify the role with which the instances will be launched. For instances to work in ECS, they need permissions, which are usually attached to the ecsInstanceRole role. In some accounts it already exists; if not, the AWS documentation has instructions on how to create it. Once created, specify it in the template.
Then come many parameters; in most cases you can leave the default values everywhere, and each of them has a clear description. I always enable the EBS-optimized instance option, and T2/T3 Unlimited if burstable instances are used.

User data - specify the user data. We will edit the /etc/ecs/ecs.config file, which contains the ECS agent configuration.
An example of what user data might look like:

#!/bin/bash
echo ECS_CLUSTER=DemoApiClusterProd >> /etc/ecs/ecs.config
echo ECS_ENABLE_SPOT_INSTANCE_DRAINING=true >> /etc/ecs/ecs.config
echo ECS_CONTAINER_STOP_TIMEOUT=1m >> /etc/ecs/ecs.config
echo ECS_ENGINE_AUTH_TYPE=docker >> /etc/ecs/ecs.config
echo 'ECS_ENGINE_AUTH_DATA={"registry.gitlab.com":{"username":"username","password":"password"}}' >> /etc/ecs/ecs.config

ECS_CLUSTER=DemoApiClusterProd - the parameter indicates that the instance belongs to a cluster with a given name, that is, this cluster will be able to host its tasks on this server. We have not yet created a cluster, but we will use this name when creating it.

ECS_ENABLE_SPOT_INSTANCE_DRAINING=true — the parameter indicates that when a signal is received about the shutdown of a Spot instance, all tasks on it should be transferred to the Draining status.

ECS_CONTAINER_STOP_TIMEOUT=1m - the parameter specifies that after receiving the SIGTERM signal, all tasks have 1 minute before they are killed.

ECS_ENGINE_AUTH_TYPE=docker - the parameter indicates that the docker scheme is used as the authorization mechanism.

ECS_ENGINE_AUTH_DATA=... - connection parameters to the private container registry, where your Docker images are stored. If it is public, then nothing needs to be specified.

In this article, I will use a public image from Docker Hub, so the ECS_ENGINE_AUTH_TYPE and ECS_ENGINE_AUTH_DATA parameters are not necessary.

Good to know: it is recommended to update the AMI regularly, because the new versions update the versions of Docker, Linux, ECS agent, etc. To keep this in mind, you can set up notifications about the release of new versions. You can receive notifications by email and manually update, or you can write a Lambda function that will automatically create a new version of the Launch Template with an updated AMI.
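Put together in Terraform, the Launch Template from this section might look roughly like this (a sketch: the security group variable is a placeholder, and the AMI ID is the one quoted above, which will age):

```hcl
variable "instances_security_group_id" {}

resource "aws_launch_template" "api" {
  name          = "DemoApiLaunchTemplate"
  image_id      = "ami-00c7c1cf5bdc913ed" # ECS-optimized AMI for us-east-1 at the time of writing
  instance_type = "m5.large"
  ebs_optimized = true

  iam_instance_profile {
    name = "ecsInstanceRole"
  }

  vpc_security_group_ids = [var.instances_security_group_id]

  # Same user data as above: join the cluster and enable spot draining.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    echo ECS_CLUSTER=DemoApiClusterProd >> /etc/ecs/ecs.config
    echo ECS_ENABLE_SPOT_INSTANCE_DRAINING=true >> /etc/ecs/ecs.config
    echo ECS_CONTAINER_STOP_TIMEOUT=1m >> /etc/ecs/ecs.config
  EOF
  )
}
```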

EC2 Auto Scaling Group

The Auto Scaling Group is responsible for launching and scaling instances. Groups are managed in the EC2 -> Auto Scaling -> Auto Scaling Groups section.

launch template - select the template created in the previous step. Leave the default version.

Purchase options and instance types - specify the instance types for the cluster. Adhere to launch template uses the instance type from the Launch Template. Combine purchase options and instance types allows flexible configuration of instance types. We will use the latter.

Optional On-Demand base - the number of regular, non-spot instances that will always be running.

On-Demand percentage above base - the ratio of regular to spot instances above the base: 50-50 distributes them equally, while with 20-80 four spot instances are started for every regular instance. For this example I will specify 50-50, but in reality we most often use 20-80, and in some cases 0-100.

Instance types - here you can specify additional instance types that will be used in the cluster. We have never used this, because I do not really understand the point of it. It may be related to limits on specific instance types, but those are easily raised through support. If you know a use case, I would be glad to read about it in the comments :)


Network - network settings, select VPC and subnets for machines, in most cases you should select all available subnets.

Load balancing - balancer settings; we will set the balancer up separately, so we do not touch anything here. Health checks will also be configured later.

Group size - specify the limits on the number of machines in the cluster and the desired number at the start. The number of machines in the cluster will never go below the specified minimum or above the maximum, even if the metrics call for scaling.

Scaling policies - scaling parameters; we will scale based on running ECS tasks, so we will configure scaling later.

Instance scale-in protection - protection of instances from deletion when scaling down. We enable it so that ASG does not delete a machine that has running tasks. ECS Capacity Provider will disable protection for instances that do not have tasks.

add tags - you can specify tags for instances (for this, the Tag new instances checkbox must be checked). I recommend that you specify the Name tag, then all instances that are launched within the group will have the same name, it is convenient to view them in the console.


After creating the group, open it and go to the Advanced configurations section, because not all options are visible in the console at the creation stage.

Termination policies - rules that are taken into account when removing instances. They are applied in order. We usually use the ones in the picture below. First, instances with the oldest Launch Template are removed (for example, if we updated the AMI and created a new version, but not all instances have switched to it yet). Then the instances closest to the next billing hour are selected. And then the oldest ones by launch date.


Good to know: to update all machines in the cluster, it is convenient to use Instance Refresh. Combine this with the Lambda function from the previous step and you get a fully automated instance update system. Before updating all machines, you must disable instance scale-in protection for all instances in the group: not the setting on the group, but the protection on the instances themselves; this is done on the Instance management tab.
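In Terraform, the group configured above, including the mixed On-Demand/Spot policy, scale-in protection, and termination policies, might be sketched like this (the variables are hypothetical placeholders):

```hcl
variable "subnet_ids" { type = list(string) }
variable "launch_template_id" {}

resource "aws_autoscaling_group" "api" {
  name                  = "DemoApiASG"
  min_size              = 1
  max_size              = 10
  desired_capacity      = 2
  vpc_zone_identifier   = var.subnet_ids
  protect_from_scale_in = true # removed per instance by the ECS Capacity Provider

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1  # always-on regular instances
      on_demand_percentage_above_base_capacity = 50 # 50-50 on-demand/spot above the base
      spot_allocation_strategy                 = "lowest-price"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = var.launch_template_id
        version            = "$Latest"
      }
    }
  }

  # Applied in order, as described above.
  termination_policies = ["OldestLaunchTemplate", "ClosestToNextInstanceHour", "OldestInstance"]
}
```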

Application Load Balancer and EC2 Target Group

The balancer is created in the EC2 → Load Balancing → Load Balancers section. We will use the Application Load Balancer; a comparison of the different load balancer types can be found on the service page.

Listeners - it makes sense to create listeners on ports 80 and 443 and later redirect 80 to 443 using balancer rules.

Availability Zones - in most cases, select all availability zones.

Configure Security Settings - here you specify the SSL certificate for the balancer; the most convenient option is to issue a certificate in ACM. The differences between Security Policies are described in the documentation; you can leave the default ELBSecurityPolicy-2016-08 selected. After creating the balancer, you will see its DNS name, to which you need to point a CNAME for your domain. For example, this is how it looks in Cloudflare.


Security Group - create or select a security group for the balancer; I wrote more about this earlier, in the EC2 Launch Template → Network settings section.

Target group - we create a group that is responsible for routing requests from the balancer to the machines and checks their availability so they can be replaced in case of problems. Target type must be Instance; Protocol and Port can be anything. If you use HTTPS between the balancer and the instances, you need to upload a certificate to them; in this example we will not do that and will simply leave port 80.

Health checks — service health check parameters. In a real service, this should be a dedicated request that exercises important parts of the business logic; for this example I will leave the default settings. Next, you can select the request interval, timeout, success codes, and so on. In our example, we will specify Success codes 200-399, because the Docker image we use returns a 304 code.


Register Targets - machines for the group are selected here, but in our case, ECS will deal with this, so we just skip this step.
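The Target Group with the health check settings above could be expressed in Terraform roughly as follows (a sketch; vpc_id is a placeholder):

```hcl
variable "vpc_id" {}

resource "aws_lb_target_group" "api" {
  name        = "DemoApiTargetGroup"
  target_type = "instance"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = var.vpc_id

  health_check {
    path                = "/"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 3
    unhealthy_threshold = 3
    matcher             = "200-399" # the demo image answers with 304
  }
}
```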

Good to know: at the balancer level, you can enable logs that will be saved to S3 in a specific format. From there they can be exported to third-party services for analytics, or you can run SQL queries directly on the data in S3 with the help of Athena. It is convenient and works without any additional code. I also recommend setting up deletion of logs from the S3 bucket after a specified period of time.
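Both "good to know" items can be sketched in Terraform, assuming a bucket for the logs already exists and grants the ELB service account write access (the bucket name and 30-day retention are arbitrary examples):

```hcl
variable "subnet_ids" { type = list(string) }
variable "alb_security_group_id" {}

resource "aws_lb" "api" {
  name               = "DemoApiALB"
  load_balancer_type = "application"
  subnets            = var.subnet_ids
  security_groups    = [var.alb_security_group_id]

  access_logs {
    bucket  = "demo-api-alb-logs" # hypothetical pre-existing bucket
    enabled = true
  }
}

# Expire the logs instead of storing them forever.
resource "aws_s3_bucket_lifecycle_configuration" "alb_logs" {
  bucket = "demo-api-alb-logs"

  rule {
    id     = "expire-alb-logs"
    status = "Enabled"

    expiration {
      days = 30
    }
  }
}
```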

ECS Task Definition

In the previous steps, we created everything related to the service infrastructure, now we move on to describing the containers that we will run. This is done in the ECS → Task Definitions section.

Launch type compatibility - select EC2.

Task execution IAM role - choose ecsTaskExecutionRole. With it, logs are written, access to secret variables is given, etc.

In the Container Definitions section, click Add Container.

Image - link to the image with the project code, in this example I will use a public image from Docker Hub bitnami/node-example:0.0.1.

Memory Limits - memory limits for the container. Hard limit: a hard cap; if the container exceeds this value, docker kill is executed and the container dies immediately. Soft limit: a soft cap; the container may exceed this value, but the parameter is taken into account when placing tasks on machines. For example, if a machine has 4 GiB of RAM and the container's soft limit is 2048 MiB, then at most 2 tasks with this container can run on that machine. In reality, a machine with 4 GiB of RAM has slightly less than 4096 MiB available, which can be seen on the ECS Instances tab in the cluster. The soft limit cannot be greater than the hard limit. Keep in mind that if a task has several containers, their limits are summed.

Port mappings - in Host port specify 0, which means the port will be assigned dynamically and tracked by the Target Group. Container port is the port your application listens on; it is often set in the start command, or assigned in your application code, Dockerfile, and so on. For our example we use 3000, because that is what is specified in the Dockerfile of the image we use.

health check - container health check parameters, not to be confused with the one configured in the Target Group.

Environment - environment settings. CPU units - similar to Memory limits, only about the processor. Each processor core is 1024 units, so if the server has a dual-core processor, and the container has a value of 512, then 4 tasks with this container can be launched on one server. CPU units always correspond to the number of cores, they cannot be a little less, as in the case of memory.

Command - command to start the service inside the container, all parameters are separated by commas. It could be gunicorn, npm, etc. If not specified, then the value of the CMD directive from the Dockerfile will be used. Specify npm,start.

environment variables - container environment variables. It can be either just text data or secret variables from Secret Manager or Parameter Store.

Storage and Logging - here we will configure logging in CloudWatch Logs (log service from AWS). To do this, just enable the Auto-configure CloudWatch Logs checkbox. After creating the Task Definition, a group of logs will be automatically created in CloudWatch. By default, logs are stored in it indefinitely, I recommend changing the Retention period from Never Expire to the required period. This is done in CloudWatch Log groups, you need to click on the current period and select a new one.
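The Task Definition described in this section corresponds roughly to the following Terraform sketch (the region, log group name, and CPU value are assumptions for illustration):

```hcl
variable "ecs_task_execution_role_arn" {}

resource "aws_ecs_task_definition" "api" {
  family             = "DemoApiTaskDefinition"
  execution_role_arn = var.ecs_task_execution_role_arn

  container_definitions = jsonencode([
    {
      name              = "api"
      image             = "bitnami/node-example:0.0.1"
      memory            = 512 # hard limit, MiB
      memoryReservation = 256 # soft limit, MiB
      cpu               = 256 # CPU units (1024 = one core)
      command           = ["npm", "start"]

      # hostPort = 0 means a dynamic port tracked by the Target Group.
      portMappings = [
        { containerPort = 3000, hostPort = 0 }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/DemoApiTaskDefinition"
          "awslogs-region"        = "us-east-1"
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}
```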


ECS Cluster and ECS Capacity Provider

Go to the ECS → Clusters section to create a cluster. We choose EC2 Linux + Networking as a template.

cluster name - very important, we make the same name here as specified in the Launch Template in the parameter ECS_CLUSTER, in our case - DemoApiClusterProd. Check the box Create an empty cluster. Optionally, you can enable Container Insights to view service metrics in CloudWatch. If you did everything correctly, then in the ECS Instances section you will see machines that were created in the Auto Scaling group.


Go to the Capacity Providers tab and create a new one. Let me remind you that it is needed to manage the creation and shutdown of machines depending on the number of running ECS tasks. It is important to note that a provider can only be associated with one group.

auto scaling group - select the previously created group.

managed scaling - enable so that the provider can scale the service.

Target capacity % - what percentage of instances loaded with tasks we want. If you specify 100%, all machines will always be occupied by running tasks. If you specify 50%, half of the machines will always be free. In that case, if there is a sharp spike in load, new tasks will immediately land on the free machines, without having to wait for instances to be deployed.

Managed termination protection - enable this; the parameter allows the provider to remove the scale-in protection from instances. This happens when there are no active tasks left on a machine, and it is what makes maintaining the Target capacity % possible.
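In Terraform, the cluster and Capacity Provider from this section might be sketched as follows (the ASG ARN variable and provider name are placeholders):

```hcl
variable "asg_arn" {}

resource "aws_ecs_cluster" "api" {
  name = "DemoApiClusterProd" # must match ECS_CLUSTER in the Launch Template user data
}

resource "aws_ecs_capacity_provider" "api" {
  name = "DemoApiCapacityProvider"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = var.asg_arn
    managed_termination_protection = "ENABLED"

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 50 # keep half of the instances free for bursts
    }
  }
}
```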

ECS Service and scaling setup

The last step :) To create a service, you need to go to the previously created cluster on the Services tab.

launch type - you need to click on Switch to capacity provider strategy and select the previously created provider.


Task Definition - select the previously created Task Definition and its revision.

Service name - in order not to get confused, we always specify the same as the Task Definition.

service type - always Replica.

Number of tasks — the desired number of active tasks in the service. This parameter is controlled by scaling, but it still needs to be specified.

Minimum healthy percent and Maximum percent determine the behavior of tasks during deployment. The default values of 100 and 200 mean that at deployment time the number of tasks doubles and then returns to the desired count. If you run 1 task with min=0 and max=100, then during deployment it is killed first and only then is a new one started, which means downtime. If you run 1 task with min=50 and max=150, the deployment will not happen at all, because 1 task can be neither halved nor increased by one and a half times.

Deployment type - leave Rolling update.

Placement Templates - rules for placing tasks on machines. The default is AZ Balanced Spread, which means that each new task is placed on a new instance until machines in all availability zones are in use. We usually use BinPack - CPU and Spread - AZ: with this policy, tasks are packed as densely as possible onto one machine by CPU, and if a new machine needs to be created, it is created in a new availability zone.


load balancer type - select Application Load Balancer.

Service IAM role - choose ecsServiceRole.

Load balancer name - select the previously created balancer.

Health check grace period - a pause before performing health checks after rolling out a new task, we usually set 60 seconds.

Container to load balance - in the Target group name item, select the group created earlier, and everything will be automatically filled.
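The service configuration above, including the capacity provider strategy and the BinPack - CPU / Spread - AZ placement, could be sketched in Terraform like this (the cluster, task definition, target group, and capacity provider references are placeholders):

```hcl
variable "cluster_arn" {}
variable "task_definition_arn" {}
variable "target_group_arn" {}

resource "aws_ecs_service" "api" {
  name                              = "DemoApiTaskDefinition" # same name as the Task Definition
  cluster                           = var.cluster_arn
  task_definition                   = var.task_definition_arn
  desired_count                     = 2
  health_check_grace_period_seconds = 60

  capacity_provider_strategy {
    capacity_provider = "DemoApiCapacityProvider" # placeholder provider name
    weight            = 100
  }

  # Pack tasks densely by CPU; spread new machines across AZs.
  ordered_placement_strategy {
    type  = "binpack"
    field = "cpu"
  }
  ordered_placement_strategy {
    type  = "spread"
    field = "attribute:ecs.availability-zone"
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "api"
    container_port   = 3000
  }
}
```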


Service Auto Scaling — service scaling parameters. Select Configure Service Auto Scaling to adjust your service's desired count. Set the minimum and maximum number of tasks when scaling.

IAM role for Service Auto Scaling - choose AWSServiceRoleForApplicationAutoScaling_ECSService.

Automatic task scaling policies — Rules for scaling. There are 2 types:

  1. Target tracking - tracking the target metric (CPU / RAM usage or the number of requests for each task). For example, we want the average CPU load to be 85%, when it gets higher, then new tasks will be added until it reaches the target value. If the load is lower, then the tasks, on the contrary, will be removed if protection against scaling down is not enabled (Disable scale-in).
  2. Step scaling - a reaction to an arbitrary event. Here you can configure a reaction to any event (CloudWatch Alarm); when it fires, you can add or remove a specified number of tasks, or set an exact number of tasks.

A service can have several scaling rules, this can be useful, the main thing is to make sure that they do not conflict with each other.
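A target-tracking policy like the CPU example above might look like this in Terraform (the 85% target mirrors the example in the text; resource names and capacity bounds are hypothetical):

```hcl
resource "aws_appautoscaling_target" "api" {
  service_namespace  = "ecs"
  resource_id        = "service/DemoApiClusterProd/DemoApiTaskDefinition"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 20
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 85 # average CPU % across the service's tasks

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```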

Conclusion

If you followed the instructions and used the same Docker image, your service should respond with the demo application's default page.


  1. We created a template from which all the machines in the service are launched. We also learned how to update the machines when the template changes.
  2. We have set up the processing of the Spot instance stop signal, so within a minute after receiving it, all running tasks are removed from the machine, so nothing is lost or interrupted.
  3. We raised the balancer to evenly distribute the load across the machines.
  4. We have created a service that runs on Spot Instances, due to this, the cost of machines is reduced by about 3 times.
  5. We set up autoscaling in both directions to handle the increase in loads, but at the same time not pay for downtime.
  6. We use the Capacity Provider so that the application manages the infrastructure (machines) and not vice versa.
  7. We are great.

If you have predictable load spikes, for example when you run a large email campaign, you can configure scaling on a schedule.
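Scheduled scaling for a predictable spike can be sketched like this (the cron expression and capacities are made-up examples, and the scalable target is assumed to be registered already):

```hcl
resource "aws_appautoscaling_scheduled_action" "campaign" {
  name               = "scale-out-before-campaign"
  service_namespace  = "ecs"
  resource_id        = "service/DemoApiClusterProd/DemoApiTaskDefinition"
  scalable_dimension = "ecs:service:DesiredCount"
  schedule           = "cron(0 11 * * ? *)" # hypothetical: before a daily 11:00 UTC mailing

  scalable_target_action {
    min_capacity = 10
    max_capacity = 30
  }
}
```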

You can also scale based on data from different parts of your system. For example, we have a feature that sends personalized promotional offers to mobile app users. Sometimes a campaign goes out to 1M+ people. After such a mailing there is always a large spike in API requests, as many users open the app at the same time. So if we see that the queue for sending promo pushes has grown significantly above its normal level, we can immediately launch several extra machines and tasks to be ready for the load.

I would be glad to hear in the comments about interesting cases of using Spot Instances and ECS, or anything related to scaling.

Soon there will be articles about how we process thousands of analytical events per second on a predominantly serverless stack (with money) and how services are deployed using GitLab CI and Terraform Cloud.

Subscribe to us, it will be interesting!


Do you use spot instances in production?

  • Yes: 22.2% (6 votes)

  • No: 66.7% (18 votes)

  • I learned about them from the article and plan to use them: 11.1% (3 votes)

27 users voted. 5 users abstained.

Source: habr.com
