Amazon Redshift Parallel Scaling Guide and Test Results


We at Skyeng use Amazon Redshift, including its parallel scaling feature, so we found this article, written by Stefan Gromoll, founder of dotgo.com, for intermix.io, interesting. After the translation we add a bit of our own experience from Skyeng data engineer Daniyar Belkhodzhaev.

The Amazon Redshift architecture allows scaling by adding new nodes to the cluster, but the need to handle peak query loads can lead to over-provisioning. Parallel scaling (Concurrency Scaling), in contrast to adding new nodes, adds computing power only when it is needed.

Amazon Redshift parallel scaling gives Redshift clusters additional capacity to handle peaks in query load. It works by offloading queries to new "parallel" clusters in the background. Queries are routed based on the WLM configuration and rules.

Pricing for parallel scaling is based on a credit model with a free tier. Usage beyond the free credits is billed for the time the parallel scaling clusters spend processing queries.

The author tested parallel scaling on one of his internal clusters. In this post he describes the test results and gives tips on how to get started.

Cluster Requirements

To use parallel scaling, an Amazon Redshift cluster must meet the following requirements:

- platform: EC2-VPC;
- node type: dc2.8xlarge, ds2.8xlarge, dc2.large, or ds2.xlarge;
- number of nodes: 2 to 32 (single node clusters are not supported).
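The requirements above can be expressed as a quick sanity check. This is our own illustrative sketch (the function name is ours, not an AWS API):

```python
# Sketch: check whether a cluster's shape qualifies for parallel scaling.
# The node types and node-count limits come from the list above.
SUPPORTED_NODE_TYPES = {"dc2.8xlarge", "ds2.8xlarge", "dc2.large", "ds2.xlarge"}

def supports_concurrency_scaling(node_type: str, node_count: int) -> bool:
    """True if the cluster shape meets the requirements listed above."""
    return node_type in SUPPORTED_NODE_TYPES and 2 <= node_count <= 32

# A single-node cluster is rejected; a 4-node dc2.large cluster qualifies.
```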

Valid Request Types

Parallel scaling is not suitable for all queries. In the first version it handles only read queries that satisfy three conditions:

- the query is a read-only SELECT (support for more query types is planned);
- the query does not reference a table with the INTERLEAVED sort style;
- the query does not use Amazon Redshift Spectrum to reference external tables.
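For intuition, the three conditions can be mirrored in a naive checker. Redshift makes this decision internally; this sketch (with simplistic token matching on table names) is ours and only illustrates the rules:

```python
# Rough illustration of the three routing conditions above; not how
# Redshift actually decides. Table-name matching here is naive.
def looks_eligible(sql, interleaved_tables, external_tables):
    text = sql.strip().lower()
    if not text.startswith("select"):                     # read-only SELECT only
        return False
    tokens = set(text.replace(",", " ").split())
    if tokens & {t.lower() for t in interleaved_tables}:  # no INTERLEAVED sort style
        return False
    if tokens & {t.lower() for t in external_tables}:     # no Spectrum external tables
        return False
    return True
```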

To be routed to a parallel scaling cluster, a query must first end up in a queue. In addition, queries that qualify for the SQA (Short Query Acceleration) queue will not run on parallel scaling clusters.

Queues and SQA require proper configuration of Redshift Workload Management (WLM). We recommend optimizing your WLM first: this will reduce the need for parallel scaling. And that matters, because parallel scaling is free only for a certain number of hours. AWS claims that parallel scaling will be free for 97% of customers, which brings us to the question of pricing.

Cost of parallel scaling

For parallel scaling, AWS uses a credit model. Each active Amazon Redshift cluster accumulates credits on an hourly basis, up to one hour of free parallel scaling credit per day.

You pay only when your usage of parallel scaling clusters exceeds the credits you have accumulated.

Pricing is based on a per-second on-demand rate for any parallel cluster usage beyond the free quota. You are charged only for the time your queries actually run, with a minimum charge of one minute each time a parallel scaling cluster is activated. The per-second on-demand rate follows the general Amazon Redshift pricing principles, i.e. it depends on the node type and the number of nodes in your cluster.
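The billing rules above can be put into a small worked example. The dollar figure and burst durations below are made up for illustration; only the rules themselves (per-second rate, one-minute minimum per activation, free credits applied first) come from the text, and the exact order in which AWS applies credits is our assumption:

```python
# Worked example of the billing rules described above (illustrative numbers).
def scaling_cost(burst_seconds, hourly_cluster_price, free_credit_seconds):
    billable = sum(max(s, 60) for s in burst_seconds)  # 1-minute minimum per burst
    billable = max(billable - free_credit_seconds, 0)  # free credits applied first
    return billable * hourly_cluster_price / 3600      # per-second on-demand rate

# Three bursts (30 s, 90 s, 600 s) on a cluster priced at $4.80/hour, with
# one free hour of accumulated credits, are fully covered by the free tier:
print(scaling_cost([30, 90, 600], 4.80, 3600))  # 0.0
```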

Running Parallel Scaling

Parallel scaling is enabled per WLM queue. Go to the AWS Redshift console and select Workload Management from the left navigation menu. Select your cluster's WLM parameter group from the drop-down menu.

You will see a new column called "Concurrency Scaling Mode" next to each queue. The default is "Disabled". Click "Edit" and you can change the settings for each queue.
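The console toggle corresponds to the wlm_json_configuration parameter: each user-defined queue gets a "concurrency_scaling" field ("auto" enables it, "off" disables it). The queue slots and groups below are example values of ours, not a recommended configuration:

```python
# Example wlm_json_configuration with parallel scaling enabled for one queue.
import json

wlm_config = [
    {"query_group": [], "user_group": [],
     "query_concurrency": 5,
     "concurrency_scaling": "auto"},   # scaling enabled for this queue
    {"short_query_queue": True},       # SQA queries stay on the main cluster
]
print(json.dumps(wlm_config))
```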


Configuration

Parallel scaling works by routing eligible queries to new, dedicated clusters. The new clusters have the same size (node type and number of nodes) as the main cluster.

The number of clusters used for parallel scaling defaults to one (1) and can be raised to a total of ten (10) clusters.
The total number of clusters for parallel scaling is set by the max_concurrency_scaling_clusters parameter. Increasing this value provisions additional standby clusters.
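Changing the parameter boils down to a parameter-group update. Below is a sketch of the payload; "my-wlm-group" is a placeholder name, and the actual call (commented out) would go through boto3's Redshift client:

```python
# Sketch: payload for raising max_concurrency_scaling_clusters.
payload = {
    "ParameterGroupName": "my-wlm-group",   # placeholder name
    "Parameters": [{
        "ParameterName": "max_concurrency_scaling_clusters",
        "ParameterValue": "3",    # up to 3 standby parallel clusters
        "ApplyType": "dynamic",   # applies without a cluster reboot
    }],
}
# import boto3
# boto3.client("redshift").modify_cluster_parameter_group(**payload)
```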


Monitoring

The AWS Redshift console has several additional charts. The "Max Configured Concurrency Scaling Clusters" chart displays the value of max_concurrency_scaling_clusters over time.


The number of active scaling clusters is displayed in the user interface under "Concurrency Scaling Activity":

[screenshot: Concurrency Scaling Activity chart]

The Queries tab has a column showing whether the query was run on the primary cluster or a Parallel Scaling cluster:

[screenshot: the Queries tab with the execution-location column]

Whether a particular query ran on the main cluster or on a parallel scaling cluster is stored in stl_query.concurrency_scaling_status.


A value of 1 indicates that the query ran on a parallel scaling cluster, while other values indicate that it ran on the primary cluster.
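A small helper mirrors this convention, alongside an illustrative query against the stl_query system table (the SQL is meant for your SQL client; both names come from the text above):

```python
# Illustrative audit query and a helper for interpreting the status column.
AUDIT_SQL = """
SELECT query, starttime, concurrency_scaling_status
FROM stl_query
ORDER BY starttime DESC
LIMIT 20;
"""

def ran_on_scaling_cluster(status: int) -> bool:
    """1 means a parallel scaling cluster; anything else, the main cluster."""
    return status == 1
```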


Parallel scaling information is also stored in some other tables and views, for example SVCS_CONCURRENCY_SCALING_USAGE, as well as in a number of catalog tables.

Results

The authors enabled parallel scaling for one queue on an internal cluster at approximately 18:30:00 GMT on 29.03.2019, and changed the max_concurrency_scaling_clusters parameter to 3 shortly afterwards.

To simulate a query queue, we reduced the number of slots for this queue from 15 to 5.

Below is a chart of the intermix.io dashboard showing the number of running and queued requests after the number of slots has been reduced.

[chart: running and queued queries after the slot reduction]

We can see that the queue time for queries increased, with the maximum wait exceeding 5 minutes.


Here is the relevant information from the AWS console on what happened during this time:

[screenshot: AWS console view of parallel scaling activity during the test]

Redshift launched three (3) parallel scaling clusters, as configured. It appears these clusters were not fully utilized, even though many queries in our cluster were queued.

The usage graph correlates with the scaling activity graph:

[chart: parallel cluster usage alongside scaling activity]

A few hours later we checked the queue; it looks like 6 queries had run with parallel scaling. We also spot-checked two queries through the user interface. We did not check how these values behave when several parallel clusters are active at once.


Conclusions

Parallel scaling can reduce the time spent in the request queue during peak periods.

Based on this basic test, the query-load situation partially improved. However, parallel scaling alone did not solve all concurrency problems.

This is due to the restrictions on the types of queries that can use parallel scaling. For example, the authors have many tables with interleaved sort keys, and most of their workload consists of writes.

Although parallel scaling is not a one-size-fits-all substitute for WLM tuning, the feature is simple and straightforward to use.

Therefore, the author recommends enabling it for your WLM queues. Start with a single parallel cluster and monitor peak load through the console to determine whether the new clusters are being fully utilized.

As AWS adds support for additional query types and tables, parallel scaling should gradually become more and more efficient.

Comment from Daniyar Belkhodzhaev, Skyeng Data Engineer

We at Skyeng also immediately noticed the possibility of parallel scaling.
The functionality is very attractive, especially considering that according to AWS, most users will not even have to pay extra for it.

It so happened that in mid-April we had an unusual flurry of queries against our Redshift cluster. During this period we often relied on Concurrency Scaling; at times an additional cluster ran 24 hours a day without stopping.

This, if not completely solving the queueing problem, at least made the situation acceptable.

Our observations largely coincide with the impression of the guys from intermix.io.

We also noticed that despite queries waiting in the queue, not all of them were immediately redirected to the parallel cluster. Apparently this is because the parallel cluster takes time to start up. As a result, during short-term peak loads we still get small queues, and the corresponding alarms still have time to fire.

After the abnormal April load subsided, we settled, as AWS intended, into occasional use within the free quota.
You can track parallel scaling costs in AWS Cost Explorer: select Service - Redshift and Usage Type - CS, for example USW2-CS:dc2.large.


Source: habr.com
