Load balancing in OpenStack

In large cloud systems, the problem of automatically balancing load across compute resources is especially acute. At Tionix (a developer and operator of cloud services, part of the Rostelecom group of companies) we also tackled this problem.

Since our main development platform is OpenStack, and we, like everyone else, prefer not to reinvent the wheel, we decided to pick up a ready-made module that is already part of the platform. Our choice fell on Watcher, which we decided to adapt to our needs.
First, let's look at terms and definitions.

Terms and definitions

A Goal is a human-readable, observable, and measurable end result to be achieved. Each goal has one or more strategies that can achieve it. A Strategy is an implementation of an algorithm capable of finding a solution for a given goal.

An Action is an elementary task that changes the current state of a managed resource in the OpenStack cluster, for example: migrating a virtual machine (migration), changing the power state of a node (change_node_power_state), changing the state of the nova service (change_nova_service_state), changing the flavor (resize), registering a NOP message (nop), pausing for a certain amount of time (sleep), or migrating a volume (volume_migrate).

An Action Plan is a flow of actions carried out in a specific order to achieve a particular goal. The action plan also carries an estimated global efficacy computed from a set of efficacy indicators. Watcher generates the action plan after a successful audit, that is, when the chosen strategy finds a solution that achieves the goal. An action plan consists of a list of sequential actions.

Audit is a cluster optimization request. Optimization is performed in order to achieve one Goal in a given cluster. For each successful audit, Watcher generates an Action Plan.

An Audit Scope is the set of resources within which an audit is performed (availability zone(s), host aggregates, individual compute or storage nodes, and so on). The audit scope is defined in the audit template. If no scope is specified, the entire cluster is audited.

An Audit Template is a saved set of settings for launching an audit. Templates make it possible to run audits repeatedly with the same settings. A template must specify the goal of the audit; if no strategy is specified, the most suitable of the existing strategies is selected automatically.

Cluster is a set of physical machines that provide compute, storage, and network resources and are managed by the same OpenStack control node.

Cluster Data Model (CDM) is a logical representation of the current state and topology of the resources managed by the cluster.

An Efficacy Indicator is an indicator that shows how well the solution built by a strategy performs. Efficacy indicators are specific to a given goal and are typically used to compute the global efficacy of the resulting action plan.

An Efficacy Specification is a set of requirements associated with each goal; it defines the efficacy indicators that a strategy achieving that goal must provide in its solution. Every solution proposed by a strategy is checked against this specification before its global efficacy is computed.

A Scoring Engine is an executable with well-defined inputs and well-defined outputs that performs a purely mathematical task. The computation therefore does not depend on the environment in which it runs: it produces the same result anywhere.

The Watcher Planner is part of the Watcher decision engine. It takes the set of actions generated by a strategy and builds a workflow that defines how these actions are scheduled in time and what the preconditions are for each action.
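
To make these terms concrete, here is a minimal sketch of the template -> audit -> action plan workflow in Python, assuming python-watcherclient and keystoneauth1 are installed. The client constructor and the manager/method names below mirror the openstack optimize CLI commands but are assumptions on our part; the endpoint, credentials, and resource names are placeholders, so verify everything against the client release you actually use.

```python
# A minimal sketch of the template -> audit -> action plan workflow, assuming
# python-watcherclient and keystoneauth1 are installed; the client constructor
# and manager names are assumptions, verify them against your client release.
from keystoneauth1 import loading
from keystoneauth1 import session
from watcherclient import client as watcher_client

# Authenticate against Keystone (URL and credentials are placeholders).
loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://controller:5000/v3',
    username='admin',
    password='secret',
    project_name='admin',
    user_domain_name='Default',
    project_domain_name='Default')
sess = session.Session(auth=auth)

# Build a Watcher client for API version 1.
watcher = watcher_client.get_client('1', session=sess)

# 1. Audit Template: saved settings (goal, optional strategy, optional scope).
template = watcher.audit_template.create(
    name='dummy-template',
    goal='dummy',        # goal identifier, see watcher.goal.list()
    strategy='dummy')    # strategy identifier, see watcher.strategy.list()

# 2. Audit: a one-shot optimization request based on the template.
audit = watcher.audit.create(
    audit_template_uuid=template.uuid,
    audit_type='ONESHOT')

# 3. Action Plan: produced by the Decision Engine once the audit succeeds;
#    starting it hands the actions over to the Applier.
for plan in watcher.action_plan.list(audit=audit.uuid):
    watcher.action_plan.start(plan.uuid)
```

The same workflow is also available through the openstack optimize CLI commands and the Horizon plugin.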

Watcher goals and strategies

| Goal | Strategies |
|---|---|
| Dummy goal | Dummy Strategy; Dummy Strategy using sample Scoring Engines; Dummy strategy with resize |
| Saving Energy | Saving Energy Strategy |
| Server Consolidation | Basic Offline Server Consolidation; VM Workload Consolidation Strategy |
| Workload Balancing | Workload Balance Migration Strategy; Storage Capacity Balance Strategy; Workload stabilization |
| Noisy Neighbor | Noisy Neighbor |
| Thermal Optimization | Outlet temperature based strategy |
| Airflow Optimization | Uniform airflow migration strategy |
| Hardware maintenance | Zone migration |
| Unclassified | Actuator |

Dummy goal - reserved goal that is used for testing purposes.

Related strategies: Dummy Strategy, Dummy Strategy using sample Scoring Engines, and Dummy strategy with resize. Dummy Strategy is a dummy strategy used for integration testing through Tempest. It does not provide any useful optimization; its only purpose is to be exercised by Tempest tests.

Dummy Strategy using sample Scoring Engines is similar to the previous one and differs only in that it uses a sample scoring engine that performs its calculation with machine learning methods.

Dummy strategy with resize is similar to the previous one and differs only in that it also changes the flavor (migration and resize).

None of these strategies are used in production.

Saving Energy - minimize energy consumption. The Saving Energy Strategy, together with the VM Workload Consolidation Strategy (Server Consolidation goal), can implement Dynamic Power Management (DPM): it saves energy by dynamically consolidating workloads even during periods of low resource usage, migrating virtual machines onto fewer nodes and powering off the nodes that are no longer needed. After consolidation, the strategy proposes a decision to power nodes on or off according to the given parameters: "min_free_hosts_num" is the number of free powered-on hosts waiting for load, and "free_used_percent" is the ratio, in percent, of free powered-on hosts to the number of hosts occupied by virtual machines. For the strategy to work, Ironic must be enabled and configured to power the nodes on and off.

Strategy parameters

parameter
type
by default
description

free_used_percent
Number
10.0
the ratio of the number of free computing nodes to the number of computing nodes with virtual machines

min_free_hosts_num
Int
1
minimum number of free computing nodes

The cloud must have at least two compute nodes. The method used is changing the node power state (change_node_power_state). The strategy does not require metric collection.
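
These parameters are supplied when the audit is created, not in the template. A hedged sketch, reusing the watcher client from the earlier example (the parameters keyword is assumed to map to the audit API's parameters field, and the saving_energy goal/strategy identifiers are the names the upstream documentation uses):

```python
# Hypothetical one-shot Saving Energy audit with explicit strategy parameters.
se_template = watcher.audit_template.create(
    name='saving-energy',
    goal='saving_energy',
    strategy='saving_energy')
audit = watcher.audit.create(
    audit_template_uuid=se_template.uuid,
    audit_type='ONESHOT',
    parameters={
        'free_used_percent': 10.0,   # free vs. occupied powered-on hosts, %
        'min_free_hosts_num': 1,     # keep at least one spare host powered on
    })
```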

Server Consolidation - minimize the number of compute nodes in use (consolidation). The goal has two strategies: Basic Offline Server Consolidation and VM Workload Consolidation Strategy.

The Basic Offline Server Consolidation strategy minimizes the total number of servers in use and also minimizes the number of migrations.

The basic strategy requires the following metrics:

| metric | service | plugins | comment |
|---|---|---|---|
| compute.node.cpu.percent | ceilometer | none | — |
| cpu_util | ceilometer | none | — |

Strategy parameters: migration_attempts — the number of combinations to try when searching for candidate hosts to power off (default 0, meaning no limit); period — the time interval, in seconds, used to retrieve the statistical aggregation from the metric data source (default 700).

Methods used: migration, changing the state of the nova service (change_nova_service_state).

The VM Workload Consolidation Strategy is based on a first-fit heuristic that focuses on measured CPU utilization and attempts to minimize the number of overloaded and underloaded nodes, given resource capacity constraints. The strategy produces a solution that leads to more efficient use of cluster resources in the following four steps:

  1. Unloading phase - processing of overused resources;
  2. Consolidation phase - handling underused resources;
  3. Solution optimization - reducing the number of migrations;
  4. Disable unused compute nodes.

The strategy requires the following metrics:

| metric | service | plugins | comment |
|---|---|---|---|
| memory | ceilometer | none | — |
| disk.root.size | ceilometer | none | — |

The following metrics are optional, but improve strategy accuracy if available:

| metric | service | plugins | comment |
|---|---|---|---|
| memory.resident | ceilometer | none | — |
| cpu_util | ceilometer | none | — |

Strategy parameters: period — the time interval, in seconds, used to retrieve the statistical aggregation from the metric data source (default 3600).

Uses the same methods as the previous strategy. More here.

Workload Balancing — balance the workload between computing nodes. The goal has three strategies: Workload Balance Migration Strategy, Workload stabilization, Storage Capacity Balance Strategy.

The Workload Balance Migration Strategy triggers VM migrations based on the VM workload of the hosts. A migration decision is made whenever a node's CPU or RAM usage percentage exceeds the specified threshold. The VM chosen for migration should bring the host closer to the average workload of all hosts.

Requirements

  • Use of physical processors;
  • At least two physical compute nodes;
  • The Ceilometer component installed and configured: ceilometer-agent-compute running on each compute node and the Ceilometer API, collecting the following metrics:

| metric | service | plugins | comment |
|---|---|---|---|
| cpu_util | ceilometer | none | — |
| memory.resident | ceilometer | none | — |

Strategy parameters:

| parameter | type | default | description |
|---|---|---|---|
| metrics | String | 'cpu_util' | The metric on which balancing is based: 'cpu_util' or 'memory.resident'. |
| threshold | Number | 25.0 | Workload threshold for migration. |
| period | Number | 300 | Aggregation period, in seconds, for the Ceilometer statistics. |

The method used is migration.
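
As with Saving Energy, the threshold and period are passed as audit parameters. A hedged sketch under the same assumptions as the earlier examples (the workload_balancing / workload_balance identifiers follow upstream naming; check them against the goal and strategy listings in your cloud):

```python
# Hypothetical audit for the Workload Balance Migration Strategy;
# parameter names follow the table above, values are illustrative only.
wb_template = watcher.audit_template.create(
    name='workload-balance',
    goal='workload_balancing',
    strategy='workload_balance')
audit = watcher.audit.create(
    audit_template_uuid=wb_template.uuid,
    audit_type='ONESHOT',
    parameters={
        'metrics': 'cpu_util',   # or 'memory.resident'
        'threshold': 25.0,       # workload threshold for migration, percent
        'period': 300,           # Ceilometer aggregation period, seconds
    })
```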

Workload stabilization is a strategy that stabilizes the workload using live migration. It is based on a standard deviation algorithm: it detects congestion in the cluster and responds by triggering VM migrations to stabilize the cluster.

Requirements

  • Use of physical processors;
  • At least two physical compute nodes;
  • The Ceilometer component installed and configured: ceilometer-agent-compute running on each compute node and the Ceilometer API, collecting the following metrics:

| metric | service | plugins | comment |
|---|---|---|---|
| cpu_util | ceilometer | none | — |
| memory.resident | ceilometer | none | — |

The Storage Capacity Balance Strategy (available starting with Queens) migrates volumes depending on the load of the Cinder pools. A migration decision is made whenever a pool's utilization exceeds the specified threshold. The volume chosen for migration should bring the pool closer to the average load of all Cinder pools.

Requirements and restrictions

  • At least two Cinder pools;
  • Possibility of disk migration.
  • The cluster data model is Cinder cluster data model collector.

Strategy parameters:

| parameter | type | default | description |
|---|---|---|---|
| volume_threshold | Number | 80.0 | Volume threshold for balancing. |

The method used is volume migration (volume_migrate).

Noisy Neighbor - identify and migrate a "noisy neighbor": a low-priority VM that negatively impacts the performance of a high-priority VM in terms of IPC by overusing the Last Level Cache. The goal has a strategy of the same name, Noisy Neighbor. The strategy parameter is cache_threshold (default 35): when performance drops to this value, migration is started. The strategy requires Last Level Cache (LLC) metrics to be enabled, a recent Intel server with CMT support, and collection of the following metrics:

| metric | service | plugins | comment |
|---|---|---|---|
| cpu_l3_cache | ceilometer | none | Requires Intel CMT. |

Cluster data model (default): Nova cluster data model collector. The method used is migration.

Working with this goal through the Dashboard is not fully implemented in Queens.

Thermal Optimization - optimize thermal conditions. The outlet (exhaust air) temperature is one of the important thermal telemetry readings for measuring a server's thermal/workload status. The goal has one strategy, the Outlet temperature based strategy, which decides to move workloads onto nodes with favorable thermal conditions (lowest outlet temperature) when the outlet temperature of the source hosts reaches a configurable threshold.

The strategy requires a server with installed and configured Intel Power Node Manager 3.0 or later, as well as collecting the following metrics:

| metric | service | plugins | comment |
|---|---|---|---|
| hardware.ipmi.node.outlet_temperature | ceilometer | IPMI | — |

Strategy parameters:

| parameter | type | default | description |
|---|---|---|---|
| threshold | Number | 35.0 | Temperature threshold for migration. |
| period | Number | 30 | The time interval, in seconds, to retrieve the statistical aggregation from the metric data source. |

The method used is migration.

Airflow Optimization — optimize airflow. The goal has its own strategy, Uniform Airflow using live migration. The strategy triggers a virtual machine migration whenever the airflow from the server's fans exceeds the specified threshold.

For the strategy to work, you need:

  • Hardware: compute nodes supporting Intel Node Manager 3.0;
  • At least two compute nodes;
  • ceilometer-agent-compute installed and configured on each compute node, along with the Ceilometer API, able to report metrics such as airflow, system power, and inlet temperature:

| metric | service | plugins | comment |
|---|---|---|---|
| hardware.ipmi.node.airflow | ceilometer | IPMI | — |
| hardware.ipmi.node.temperature | ceilometer | IPMI | — |
| hardware.ipmi.node.power | ceilometer | IPMI | — |

The strategy requires a server with Intel Power Node Manager 3.0 or later installed and configured.

Limitations: The concept is not intended for production.

It is proposed to use this algorithm with continuous audits, since only one virtual machine is planned to be migrated per iteration.

Live migrations are possible.

Strategy parameters:

| parameter | type | default | description |
|---|---|---|---|
| threshold_airflow | Number | 400.0 | Airflow threshold for migration, in units of 0.1 CFM. |
| threshold_inlet_t | Number | 28.0 | Inlet temperature threshold for the migration decision. |
| threshold_power | Number | 350.0 | System power threshold for the migration decision. |
| period | Number | 30 | The time interval, in seconds, to retrieve the statistical aggregation from the metric data source. |

The method used is migration.
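
Because only one virtual machine is migrated per iteration, this strategy is a natural fit for a continuous audit. A hedged sketch under the same assumptions as the earlier examples (the CONTINUOUS audit type and the audit interval exist in the Watcher API, but the field name used by the client and the uniform_airflow identifier should be verified against your release):

```python
# Hypothetical continuous Airflow Optimization audit; thresholds mirror the
# parameter table above, the interval value is illustrative.
af_template = watcher.audit_template.create(
    name='uniform-airflow',
    goal='airflow_optimization',
    strategy='uniform_airflow')
audit = watcher.audit.create(
    audit_template_uuid=af_template.uuid,
    audit_type='CONTINUOUS',
    interval=900,  # seconds between audit iterations (assumed field name)
    parameters={
        'threshold_airflow': 400.0,  # in units of 0.1 CFM
        'threshold_inlet_t': 28.0,   # degrees Celsius
        'threshold_power': 350.0,    # watts
        'period': 30,                # seconds
    })
```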

Hardware maintenance — maintenance of hardware. The strategy associated with this goal is Zone migration. It is a tool for efficient, automatic, and minimal migration of virtual machines and volumes when hardware maintenance is required. The strategy builds the action plan according to weights: a set of actions with a higher weight is scheduled before the others. There are two configuration options: action weights (action_weights) and parallelization (parallelization).

Restrictions: Adjustment of action weights and parallelization is required.

Strategy parameters:

parameter
type
by default
description

compute_nodes
array
none
Compute nodes for migration.

storage_pools
array
none
Storage nodes for migration.

parallel_total
integer
6
The total number of activities to be performed in parallel.

parallel_per_node
integer
2
The number of actions performed in parallel for each compute node.

parallel_per_pool
integer
2
The number of actions to run in parallel for each storage pool.

priority
object
none
Priority list for virtual machines and disks.

with_attached_volume
boolean
False
False - Virtual machines will be migrated after all disks have been migrated. True - Virtual machines will be migrated after all attached disks have been migrated.

Elements of the compute_nodes array:

| parameter | type | default | description |
|---|---|---|---|
| src_node | string | none | The compute node from which virtual machines are migrated (required). |
| dst_node | string | none | The compute node to which virtual machines are migrated. |

Elements of the storage_pools array:

| parameter | type | default | description |
|---|---|---|---|
| src_pool | string | none | The storage pool from which volumes are migrated (required). |
| dst_pool | string | none | The storage pool to which volumes are migrated. |
| src_type | string | none | The source volume type (required). |
| dst_type | string | none | The destination volume type (required). |

Elements of the priority object:

| parameter | type | default | description |
|---|---|---|---|
| project | array | none | Project names. |
| compute_node | array | none | Compute node names. |
| storage_pool | array | none | Storage pool names. |
| compute | enum | none | VM attributes: ["vcpu_num", "mem_size", "disk_size", "created_at"]. |
| storage | enum | none | Volume attributes: ["size", "created_at"]. |

The methods used are virtual machine migration and volume migration.
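
To show how the tables above fit together, here is a hedged sketch of the input parameters for a Zone migration audit. The structure simply mirrors the parameter tables; every host, pool, volume type, and project name is a placeholder:

```python
# Hypothetical Zone migration parameters; node, pool, volume-type and project
# names are placeholders, the structure follows the tables above.
zone_migration_parameters = {
    'compute_nodes': [
        {'src_node': 'compute-01',   # node being drained (required)
         'dst_node': 'compute-02'},  # optional destination node
    ],
    'storage_pools': [
        {'src_pool': 'cinder-01@lvm#pool1',  # pool being drained (required)
         'dst_pool': 'cinder-02@lvm#pool1',
         'src_type': 'lvm',
         'dst_type': 'lvm'},
    ],
    'parallel_total': 6,
    'parallel_per_node': 2,
    'parallel_per_pool': 2,
    'priority': {
        'project': ['project-a'],
        'compute_node': ['compute-01'],
        'storage_pool': ['cinder-01@lvm#pool1'],
        'compute': ['vcpu_num', 'mem_size', 'disk_size', 'created_at'],
        'storage': ['size', 'created_at'],
    },
    'with_attached_volume': False,
}

# Passed as audit parameters in the same way as in the earlier sketches, e.g.:
# watcher.audit.create(audit_template_uuid=..., audit_type='ONESHOT',
#                      parameters=zone_migration_parameters)
```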

Unclassified — an auxiliary goal used to facilitate strategy development. It has no efficacy specification and can be used whenever a strategy is not yet associated with an existing goal. It can also serve as a transitional stage. The strategy associated with this goal is Actuator.

Create a new goal

Watcher Decision Engine has an “external goal” plugin interface that allows you to integrate an external goal that can be achieved with a strategy.

Before you create a new goal, you should make sure that none of the existing goals match your needs.

Creating a new plugin

To create a new goal, extend the Goal base class and implement the class method get_name() so that it returns the unique ID of the new goal. This unique identifier must match the entry point name you declare later.

Next, implement the class method get_display_name() to return the translated display name of the goal (do not use a variable to return the translated string, so that it can be picked up automatically by the translation tool).

Implement the class method get_translatable_display_name() to return the translation key (in fact, the English display name) of your new goal. The return value must match the string translated by get_display_name().

Implement the method get_efficacy_specification() to return the efficacy specification for your goal. Returning an Unclassified() instance provided by Watcher is useful while the goal is under development, because it corresponds to an empty specification.
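
Putting the four methods together, a minimal goal plugin might look like the sketch below. The goal and class names are hypothetical, and the module paths follow the upstream Watcher plugin guide, so verify them against the Watcher release you are developing for.

```python
# Minimal sketch of an external goal plugin; module paths follow the upstream
# plugin guide, the goal name "my_goal" and the class name are hypothetical.
from watcher._i18n import _
from watcher.decision_engine.goal import base
from watcher.decision_engine.goal.efficacy import specs


class MyGoal(base.Goal):

    @classmethod
    def get_name(cls):
        # Unique ID of the goal; must match the entry point name declared
        # later in setup.cfg (watcher_goals entry point group, per the guide).
        return "my_goal"

    @classmethod
    def get_display_name(cls):
        # Translated display name; return the _() call directly so the string
        # can be collected by the translation tooling.
        return _("My Goal")

    @classmethod
    def get_translatable_display_name(cls):
        # Translation key: the English display name passed to _() above.
        return "My Goal"

    @classmethod
    def get_efficacy_specification(cls):
        # Empty specification provided by Watcher; convenient while the goal
        # is still under development.
        return specs.Unclassified()
```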

Read more here

Watcher architecture (more here).


Components


Watcher API - the component that exposes the REST API provided by Watcher. Interaction mechanisms: CLI, Horizon plugin, Python SDK.

Watcher DB - Watcher database.

Watcher Applier - the component that executes the action plans created by the Watcher Decision Engine component.

Watcher Decision Engine — the component responsible for computing a set of potential optimization actions to meet the audit objective. If no strategy is specified, the component chooses the most appropriate one on its own.

Watcher Metrics Publisher - a component that collects and calculates some metrics or events and publishes them to the CEP endpoint. The functionality of the component can also be provided by the Ceilometer publisher.

Complex Event Processing (CEP) Engine - the complex event processing engine. For performance reasons, several CEP engine instances can run at the same time, each handling a specific type of metric or event. In Watcher, the CEP engine triggers two kinds of actions: writing the corresponding events/metrics to the time series database, and sending the relevant events to the Watcher Decision Engine component whenever such an event can affect the result of the current optimization strategy, since an OpenStack cluster is not a static system.

The interaction of components is carried out using the AMQP protocol.

Watcher Configuration

Scheme of interaction with Watcher


Watcher test results

  1. On the Optimization - Action plans page, a 500 error appears (both on vanilla Queens and on the stand with Tionix modules); it shows up only after an audit has been started and an action plan generated, while an empty page opens normally.
  2. On the Action details tab an error occurs: the goal and strategy of the audit cannot be retrieved (both on vanilla Queens and on the stand with Tionix modules).
  3. Audits with the Dummy (test) goal are created and run normally; action plans are generated.
  4. Audits with the Unclassified goal are not generated, because the goal is not functional and is intended as an intermediate setting when creating new strategies.
  5. Audits with the Workload Balancing goal (Storage Capacity Balance Strategy) are created successfully, but no action plan is generated (storage pool optimization was not required).
  6. Audits with the Workload Balancing goal (Workload Balance Migration Strategy) are created successfully, but no action plan is generated.
  7. Audits with the Workload Balancing goal (Workload Stabilization Strategy) fail.
  8. Noisy Neighbor audits are created successfully, but no action plan is generated.
  9. Audits with the Hardware maintenance goal are created successfully, but the action plan is not generated in full (the efficacy indicators are generated, but the list of actions itself is not).
  10. Editing nova.conf (compute_monitors = cpu.virt_driver in the [DEFAULT] section) on the compute and controller nodes does not fix the errors.
  11. Audits with the Server Consolidation goal (Basic strategy) also fail.
  12. Audits with the Server Consolidation goal (VM workload consolidation strategy) fail. The logs show an error while obtaining the source data. The error is discussed here.
    We tried specifying the data source for Watcher in the config file (it did not help: it resulted in an error on all Optimization pages, and reverting the config file to its original contents did not fix the situation):

    [watcher_strategies.basic]
    datasource = ceilometer, gnocchi
  13. Audits with the Saving Energy goal fail. Judging by the logs, the problem is the absence of Ironic: the strategy will not work without the bare metal service.
  14. Audits with the Thermal Optimization goal fail. The traceback is the same as for Server Consolidation (VM workload consolidation strategy): a source data error.
  15. Audits with the Airflow Optimization goal fail.

The following audit completion errors also occur: a traceback in the decision-engine.log log (the cluster state is not defined).

→ Bug discussion here

Conclusion

The result of our two months of research was the unequivocal conclusion that, to obtain a full-fledged, working load balancing system, we will have to invest serious effort of our own in finishing this part of the tooling for the OpenStack platform.

Watcher has shown itself to be a serious, rapidly developing product with huge potential, but exploiting it fully will require a great deal of serious work.

But more on that in the next articles in the series.

Source: habr.com
