Metric storage: how we moved from Graphite+Whisper to Graphite+ClickHouse

Hi all! In my last article I wrote about organizing a modular monitoring system for a microservice architecture. Nothing stands still: our project keeps growing, and so does the number of stored metrics. Under the cut, read how we organized the transition from Graphite + Whisper to Graphite + ClickHouse under high load, what we expected from it, and what the migration results were.


Before I tell you how we organized the transition from storing metrics in Graphite + Whisper to Graphite + ClickHouse, I would like to give information about the reasons for making such a decision and about the disadvantages of Whisper that we lived with for a long time.

Graphite+Whisper problems

1. High load on the disk subsystem

At the time of the transition, roughly 1.5 million metrics per minute were arriving. With that flow, disk utilization on the servers was about 30%. Overall this was quite acceptable: everything worked stably, writes were fast, reads were fast... until one of the development teams rolled out a new feature and started sending us 10 million metrics per minute. That was when the disk subsystem choked and we saw 100% utilization. The problem was quickly solved, but it left a bad aftertaste.

2. Lack of replication and consistency

Most likely, like everyone who uses or has used Graphite + Whisper, we poured the same stream of metrics into several Graphite servers at once to create fault tolerance. And that caused no particular problems - until one of the servers went down for some reason. Sometimes we managed to bring the fallen server back up quickly enough that carbon-c-relay could backfill it with metrics from its cache, and sometimes not. In the latter case there was a hole in the metrics, which we patched with rsync. The procedure was quite long. We were saved only by the fact that this happened very rarely. We also periodically took a random set of metrics and compared them with similar ones on neighboring nodes of the cluster. In about 5% of cases several values differed, which did not make us happy.

3. Large amount of space occupied

Since we write not only infrastructure metrics to Graphite, but also business metrics (and now also metrics from Kubernetes), we quite often end up in a situation where a metric contains only a few values, while its .wsp file is created for the entire retention period and takes up a pre-allocated amount of space - in our case about 2 MB. The problem is aggravated by the fact that many such files accumulate over time, and when building reports on them, reading empty points takes a lot of time and resources.
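
For a rough sense of where those ~2 MB come from: Whisper pre-allocates 12 bytes per data point for every archive defined by the retention schema, plus a small header, no matter how many points are actually written. A minimal sketch of the arithmetic (the retention schema below is hypothetical, not our exact storage-schemas.conf):

    # Rough size estimate for a pre-allocated Whisper (.wsp) file:
    # 16-byte metadata header + 12 bytes of archive info per archive + 12 bytes per data slot.
    def whisper_file_size(archives):
        """archives: list of (seconds_per_point, retention_seconds) tuples."""
        header = 16 + 12 * len(archives)
        slots = sum(retention // step for step, retention in archives)
        return header + 12 * slots

    # Hypothetical schema: 30s resolution for 30 days, then 5min for a year.
    size = whisper_file_size([(30, 30 * 86400), (300, 365 * 86400)])
    print(round(size / 1024 / 1024, 1), "MB per metric, even if it holds only a few values")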

I would like to note right away that the problems described above can be tackled by various methods and with varying degrees of effectiveness, but the more data you receive, the worse they become.

Given all of the above (and taking into account the previous article), as well as the constant growth in the number of received metrics and the desire to move all metrics to a 30-second storage interval (down to 10 seconds if necessary), we decided to try Graphite+ClickHouse as a promising alternative to Whisper.

Graphite+ClickHouse. Expectations

Having attended several meetups of the guys from Yandex, read a couple of articles on Habr, gone through the documentation and found sane components for hooking ClickHouse up under Graphite, we decided to act!

We wanted to achieve the following:

  • reduce disk subsystem utilization from 30% to 5%;
  • reduce the amount of space occupied from 1TB to 100GB;
  • be able to receive 100 million metrics per minute to the server;
  • data replication and fault tolerance out of the box;
  • not spend a year on this project, but make the transition within some sane timeframe;
  • switch without downtime.

Pretty ambitious, right?

Graphite+ClickHouse. Components

To receive data via the Graphite protocol and then write it to ClickHouse, we chose carbon-clickhouse (golang).

As the database for storing time series, we chose the latest stable ClickHouse release at the time, 1.1.54253. We ran into problems with it: the logs filled up with a mountain of errors, and it was not entirely clear what to do with them. In a discussion with Roman Lomonosov (the author of carbon-clickhouse, graphite-clickhouse and much more), the older release 1.1.54236 was chosen. The errors disappeared, and everything started working with a bang.

To read data from ClickHouse, we selected graphite-clickhouse (golang). As the API for Graphite - carbonapi (golang). ZooKeeper was used to coordinate replication between the ClickHouse tables. For routing metrics we kept our beloved carbon-c-relay (C) (see the previous article).
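
To sanity-check the whole chain, it is handy to push a test point into carbon-clickhouse over the plaintext Graphite protocol and read it back through carbonapi with the usual render API. A rough sketch of such a check (host names, ports, the metric name and the sleep interval here are assumptions, not our real setup):

    import json
    import socket
    import time

    import requests

    # Push one test point to carbon-clickhouse (assumed to listen on the standard plaintext port 2003).
    now = int(time.time())
    with socket.create_connection(("127.0.0.1", 2003)) as sock:
        sock.sendall(f"test.clickhouse.pipeline 1 {now}\n".encode())

    time.sleep(5)  # give carbon-clickhouse time to flush its buffer into ClickHouse (depends on its config)

    # Read the point back through carbonapi, which forwards the request to graphite-clickhouse.
    resp = requests.get(
        "http://127.0.0.1:8081/render",
        params={"target": "test.clickhouse.pipeline", "from": "-5min", "format": "json"},
    )
    print(json.dumps(resp.json(), indent=2))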

Graphite+ClickHouse. Table structure

“graphite” is a database we created for monitoring tables.

“graphite.metrics” is a table with the ReplicatedReplacingMergeTree engine (replicated ReplacingMergeTree). This table stores the names of the metrics and the paths to them.

CREATE TABLE graphite.metrics (
    Date Date,
    Level UInt32,
    Path String,
    Deleted UInt8,
    Version UInt32
) ENGINE = ReplicatedReplacingMergeTree(
    '/clickhouse/tables/replicator/graphite.metrics', 'r1',
    Date, (Level, Path), 8192, Version);

“graphite.data” is a table with the ReplicatedGraphiteMergeTree engine (replicated GraphiteMergeTree). This table stores metric values.

CREATE TABLE graphite.data (
    Path String,
    Value Float64,
    Time UInt32,
    Date Date,
    Timestamp UInt32
) ENGINE = ReplicatedGraphiteMergeTree(
    '/clickhouse/tables/replicator/graphite.data', 'r1',
    Date, (Path, Time), 8192, 'graphite_rollup');

“graphite.date_metrics” is a materialized view backed by the ReplicatedReplacingMergeTree engine, populated as data arrives. It stores the names of all metrics that were encountered during the day. The reasons for creating it are described in the "Problems" section at the end of this article.

CREATE MATERIALIZED VIEW graphite.date_metrics (
    Path String,
    Level UInt32,
    Date Date
) ENGINE = ReplicatedReplacingMergeTree(
    '/clickhouse/tables/replicator/graphite.date_metrics', 'r1',
    Date, (Level, Path, Date), 8192)
AS SELECT
    toUInt32(length(splitByChar('.', Path))) AS Level,
    Date,
    Path
FROM graphite.data;

“graphite.data_stat” is a materialized view backed by the ReplicatedAggregatingMergeTree engine (replicated AggregatingMergeTree). It records the number of incoming metrics, broken down to the 4th level of nesting.

CREATE MATERIALIZED VIEW graphite.data_stat (
    Date Date,
    Prefix String,
    Timestamp UInt32,
    Count AggregateFunction(count)
) ENGINE = ReplicatedAggregatingMergeTree(
    '/clickhouse/tables/replicator/graphite.data_stat', 'r1',
    Date, (Timestamp, Prefix), 8192)
AS SELECT
    toStartOfMonth(now()) AS Date,
    replaceRegexpOne(Path, '^([^.]+\\.[^.]+\\.[^.]+).*$', '\\1') AS Prefix,
    toUInt32(toStartOfMinute(toDateTime(Timestamp))) AS Timestamp,
    countState() AS Count
FROM graphite.data
GROUP BY Timestamp, Prefix;
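
Since data_stat stores an AggregateFunction(count) state, reading it requires countMerge() rather than a plain count(). A hedged sketch of how such a table could be polled over the ClickHouse HTTP interface (the prefix in the filter is purely illustrative):

    import requests

    # countMerge() finalizes the aggregate state written by countState() in the materialized view.
    # The prefix below is purely illustrative.
    query = """
    SELECT Prefix, toDateTime(Timestamp) AS Minute, countMerge(Count) AS Metrics
    FROM graphite.data_stat
    WHERE Prefix LIKE 'servers.web%'
    GROUP BY Prefix, Minute
    ORDER BY Minute DESC
    LIMIT 10
    FORMAT TabSeparated
    """

    resp = requests.get("http://localhost:8123/", params={"query": query})
    print(resp.text)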

Graphite+ClickHouse. Scheme of interaction of components

(diagram: scheme of interaction of the components)

Graphite+ClickHouse. Data migration

As you remember from our expectations for this project, the transition to ClickHouse had to happen without downtime; accordingly, we had to switch our entire monitoring system to the new storage as transparently as possible for our users.
We did it this way.

  • A rule was added to carbon-c-relay to send an additional stream of metrics to the carbon-clickhouse of one of the servers involved in the replication of ClickHouse tables.

  • We wrote a small python script that, using the whisper-dump library, read all the .wsp files from our storage and sent this data to the carbon-clickhouse described above in 24 threads (a sketch of the approach is shown after this list). The number of metric values accepted by carbon-clickhouse reached 125 million/min, and ClickHouse did not even break a sweat.

  • We created a separate DataSource in Grafana in order to debug the functions used in existing dashboards. We identified a list of functions we used that were not implemented in carbonapi, implemented them, and sent PRs to the carbonapi authors (special thanks to them).

  • To switch the read load, we changed the endpoints in the balancer settings from graphite-api (the API interface for Graphite+Whisper) to carbonapi.
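
For the migration script mentioned above, here is a hedged sketch of the approach - not our exact script, just an outline of how points can be read from .wsp files with the Graphite whisper library (which whisper-dump is built on) and resent to carbon-clickhouse over the plaintext Graphite protocol. The host, port and storage path are assumptions, and the real script ran in 24 threads:

    import os
    import socket
    import time

    import whisper  # the Graphite whisper library

    CARBON_CLICKHOUSE = ("127.0.0.1", 2003)     # assumed plaintext listener of carbon-clickhouse
    WHISPER_ROOT = "/var/lib/graphite/whisper"  # assumed Whisper storage root

    def wsp_to_metric_name(path):
        # /var/lib/graphite/whisper/foo/bar.wsp -> foo.bar
        rel = os.path.relpath(path, WHISPER_ROOT)
        return rel[:-len(".wsp")].replace(os.sep, ".")

    def replay_file(path, sock):
        name = wsp_to_metric_name(path)
        # fetch() returns ((start, end, step), values) for the requested window
        (start, _end, step), values = whisper.fetch(path, 0, int(time.time()))
        ts = start
        for value in values:
            if value is not None:  # skip the pre-allocated empty points
                sock.sendall(f"{name} {value} {ts}\n".encode())
            ts += step

    with socket.create_connection(CARBON_CLICKHOUSE) as sock:
        for dirpath, _dirs, files in os.walk(WHISPER_ROOT):
            for f in files:
                if f.endswith(".wsp"):
                    replay_file(os.path.join(dirpath, f), sock)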

Graphite+ClickHouse. Results

  • reduced the utilization of the disk subsystem from 30% to 1%;

  • reduced the amount of space occupied from 1 TB to 300 GB;
  • we have the ability to receive 125 million metrics per minute per server (peaks at the time of migration);
  • transferred all metrics to a thirty-second storage interval;
  • received data replication and fault tolerance;
  • switched without downtime;
  • the whole project took about 7 weeks.

Graphite+ClickHouse. Problems

In our case, there were some pitfalls. Here's what we encountered after the transition.

  1. ClickHouse does not always re-read configs on the fly; sometimes it needs a restart. For example, the description of the ZooKeeper cluster in the ClickHouse config was not applied until clickhouse-server was restarted.
  2. Large ClickHouse queries did not go through with the default limits, so in our graphite-clickhouse the ClickHouse connection string looks like this:
    url = "http://localhost:8123/?max_query_size=268435456&max_ast_elements=1000000"
  3. ClickHouse releases new stable versions quite often, and they may contain surprises: be careful.
  4. Dynamically created containers in Kubernetes send a large number of metrics with a short and random lifetime. There are not many points in such metrics, and there are no problems with disk space. But when building queries, ClickHouse pulls up a huge number of these metrics from the 'metrics' table. In 90% of cases there is no data for them outside the query window (24 hours), yet the time to look for that data in the 'data' table is still spent, and the query ultimately runs into a timeout. To solve this problem, we began maintaining a separate view with information on the metrics encountered during the day. Thus, when building reports (graphs) for dynamically created containers, we query only the metrics seen within the given window rather than over all time, which greatly sped up building reports on them. For this solution we built a fork of graphite-clickhouse that implements work with the date_metrics table (a sketch of the idea follows below).
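
To illustrate the idea (the exact query the fork builds may differ, this is just a sketch), the name lookup for a 24-hour window can be restricted to metrics recorded for the relevant dates, for example:

    import requests

    # Sketch only: take metric names from the daily index instead of the full
    # 'metrics' table, so 'data' is not scanned for long-dead container metrics.
    # The metric mask below is purely illustrative.
    query = """
    SELECT DISTINCT Path
    FROM graphite.date_metrics
    WHERE Date >= today() - 1
      AND Path LIKE 'k8s.pods.%'
    FORMAT TabSeparated
    """

    paths = requests.get("http://localhost:8123/", params={"query": query}).text.splitlines()
    print(len(paths), "metric names seen within the last day")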

Graphite+ClickHouse. Tags

Since version 1.1.0, Graphite has officially supported tags, and we are actively thinking about what to do and how in order to support this initiative in the graphite+clickhouse stack.

Graphite+ClickHouse. Anomaly detector

Based on the infrastructure described above, we have implemented a prototype anomaly detector, and it works! But more about that in the next article.

Subscribe, press the up arrow and be happy!

Source: habr.com
