Metrics storage: how we switched from Graphite+Whisper to Graphite+ClickHouse

Hi everyone! In my previous article I wrote about organizing a modern monitoring system for a microservice architecture. Nothing stands still: our project keeps growing, and so does the number of stored metrics. Read on below the cut for how we organized the migration from Graphite+Whisper to Graphite+ClickHouse under high load, what we expected from it, and what the migration actually delivered.


Before I tell you how we organized the transition from storing metrics in Graphite+Whisper to Graphite+ClickHouse, I would like to explain the reasons behind this decision and the drawbacks of Whisper that we lived with for a long time.

Problems with Graphite+Whisper

1. High load on the disk subsystem

At the time of the transition, roughly 1.5 million metrics per minute were arriving. With this flow, disk utilization on the servers was ~30%. On the whole this was quite acceptable: everything worked stably, writes were fast, reads were fast... Until one of the development teams rolled out a new feature and started sending us 10 million metrics per minute. That is when the disk subsystem choked and we saw 100% utilization. The problem was solved quickly, but the aftertaste remained.

2. Lack of replication and consistency

Most likely, like everyone who uses or has used Graphite+Whisper, we sent the same stream of metrics to several Graphite servers at once to get fault tolerance. There were no particular problems with this, until one of the servers crashed for some reason. Sometimes we managed to bring the fallen server back quickly enough for carbon-c-relay to flush the metrics from its cache into it, and sometimes we did not. Then there was a hole in the metrics, which we filled with rsync. The procedure took quite a long time. The only saving grace was that this happened very rarely. We also periodically took a random set of metrics and compared them with the same metrics on neighboring nodes of the cluster. In about 5% of cases several values differed, which did not make us very happy.
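The periodic comparison described above can be sketched roughly as follows. This is a minimal illustration, not the check we actually ran: the two dictionaries are hypothetical in-memory stand-ins for series fetched from two Graphite nodes, and the function name is invented for this example.

```python
import random


def mismatch_ratio(node_a, node_b, sample_size, seed=None):
    """Return the fraction of sampled metrics whose series differ between two nodes.

    node_a and node_b map a metric name to a list of values
    (hypothetical stand-ins for data fetched from two servers).
    """
    rng = random.Random(seed)
    # Only metrics present on both nodes can be compared.
    common = sorted(set(node_a) & set(node_b))
    if not common:
        return 0.0
    sample = rng.sample(common, min(sample_size, len(common)))
    differing = sum(1 for name in sample if node_a[name] != node_b[name])
    return differing / len(sample)
```

A result around 0.05 would correspond to the roughly 5% of cases mentioned above.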

3. Large disk footprint

Since we write to Graphite not only infrastructure metrics but also business metrics (and now metrics from Kubernetes as well), we quite often end up with a metric that contains only a few values, while its .wsp file is created for the entire retention period and takes up the preallocated amount of space, which for us was ~2 MB. The problem is aggravated by the fact that many such files accumulate over time, and when building reports on them, reading the empty points takes a lot of time and resources.
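To see where a figure of this order comes from, here is a small calculation based on the Whisper file layout: a 16-byte header, 12 bytes of metadata per archive, and 12 bytes per point. The retention schema used below is an assumption chosen for illustration; the article does not state the real one.

```python
# Whisper on-disk layout: 16-byte file header, 12 bytes of metadata per
# archive, and 12 bytes per stored point (4-byte timestamp + 8-byte value).
HEADER_BYTES = 16
ARCHIVE_INFO_BYTES = 12
POINT_BYTES = 12


def whisper_file_size(archives):
    """archives: list of (seconds_per_point, retention_seconds) tuples."""
    points = sum(retention // step for step, retention in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points


DAY = 86400
# Hypothetical schema "60s:30d,5m:1y"; not taken from the article.
schema = [(60, 30 * DAY), (300, 365 * DAY)]
size = whisper_file_size(schema)
print(f"{size} bytes ({size / 1e6:.2f} MB), allocated even for a single point")
```

The size depends only on the retention schema, never on how many points were actually written, which is why short-lived metrics are so expensive in Whisper.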

I would like to note right away that the problems described above can be tackled with various methods and with varying degrees of success, but the more data you receive, the worse they get.

Given all of the above (and taking the previous article into account), as well as the constant growth in the number of incoming metrics and the desire to move all metrics to a 30-second storage interval (down to 10 seconds if necessary), we decided to try Graphite+ClickHouse as a promising alternative to Whisper.

Graphite+ClickHouse. Expectations

Having attended several meetups held by the Yandex team, read a couple of articles on Habr, gone through the documentation and found sane components for wiring ClickHouse up under Graphite, we decided to act!

Here is what we wanted to get:

  • reduce disk subsystem utilization from 30% to 5%;
  • reduce the disk space used from 1 TB to 100 GB;
  • be able to receive 100 million metrics per minute per server;
  • data replication and fault tolerance out of the box;
  • not sit on this project for a year, but make the transition in a reasonable time;
  • switch without downtime.

Pretty ambitious, right?

Graphite+ClickHouse. Components

To receive data over the Graphite protocol and then write it to ClickHouse, I chose carbon-clickhouse (golang).
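For readers who have not seen it, the Graphite plaintext protocol is line-based: each data point is sent as "path value timestamp" followed by a newline. The sketch below formats and sends such lines; the host, the port and the function names are assumptions for illustration and should be adapted to the actual listener.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Render one data point as a Graphite plaintext line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="127.0.0.1", port=2003):
    """Send pre-formatted lines over TCP. Host and port are assumptions."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, format_metric("servers.web01.cpu.user", 42.5, 1500000000) produces the line "servers.web01.cpu.user 42.5 1500000000".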

The latest ClickHouse release at the time, stable version 1.1.54253, was chosen as the database for storing time series. It gave us trouble: a mountain of errors poured into the logs, and it was not entirely clear what to do about them. After a discussion with Roman Lomonosov (author of carbon-clickhouse, graphite-clickhouse and much, much more), we chose an older release, 1.1.54236. The errors disappeared and everything started working with a bang.

graphite-clickhouse (golang) was chosen to read data from ClickHouse, and carbonapi (golang) to serve as the API for Graphite. ClickHouse uses ZooKeeper to organize replication between tables. For routing metrics we kept our beloved carbon-c-relay (C) (see the previous article).

Graphite+ClickHouse. Table structure

"graphite" is the database we created for the monitoring tables.

"graphite.metrics" is a table with the ReplicatedReplacingMergeTree engine (a replicated ReplacingMergeTree). This table stores metric names and the paths to them.

CREATE TABLE graphite.metrics ( Date Date, Level UInt32, Path String, Deleted UInt8, Version UInt32 ) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/replicator/graphite.metrics', 'r1', Date, (Level, Path), 8192, Version);

"graphite.data" is a table with the ReplicatedGraphiteMergeTree engine (a replicated GraphiteMergeTree). This table stores metric values.

CREATE TABLE graphite.data ( Path String, Value Float64, Time UInt32, Date Date, Timestamp UInt32 ) ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/replicator/graphite.data', 'r1', Date, (Path, Time), 8192, 'graphite_rollup')

"graphite.date_metrics" is a conditionally populated table (a materialized view) with the ReplicatedReplacingMergeTree engine. It records the names of all metrics encountered during the day. The reasons for creating it are described in the "Problems" section at the end of this article.

CREATE MATERIALIZED VIEW graphite.date_metrics ( Path String,  Level UInt32,  Date Date) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/replicator/graphite.date_metrics', 'r1', Date, (Level, Path, Date), 8192) AS SELECT toUInt32(length(splitByChar('.', Path))) AS Level, Date, Path FROM graphite.data

"graphite.data_stat" is a conditionally populated table (a materialized view) with the ReplicatedAggregatingMergeTree engine (a replicated AggregatingMergeTree). It records the number of incoming metrics, broken down to 4 nesting levels.

CREATE MATERIALIZED VIEW graphite.data_stat ( Date Date,  Prefix String,  Timestamp UInt32,  Count AggregateFunction(count)) ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/replicator/graphite.data_stat', 'r1', Date, (Timestamp, Prefix), 8192) AS SELECT toStartOfMonth(now()) AS Date, replaceRegexpOne(Path, '^([^.]+\\.[^.]+\\.[^.]+).*$', '\\1') AS Prefix, toUInt32(toStartOfMinute(toDateTime(Timestamp))) AS Timestamp, countState() AS Count FROM graphite.data  GROUP BY Timestamp, Prefix
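To make the expressions in the two materialized views easier to follow, here is a rough Python equivalent of what they compute per row. It is only an illustration with the dots in the pattern escaped; the function names are invented for this example and ClickHouse itself does the real work.

```python
import re

# Same pattern as in the data_stat view, with literal dots escaped.
PREFIX_RE = re.compile(r"^([^.]+\.[^.]+\.[^.]+).*$")


def level(path):
    """Number of dot-separated components, like length(splitByChar('.', Path))."""
    return len(path.split("."))


def prefix(path):
    """Keep the leading components captured by the pattern.

    Paths that do not match (fewer components) are returned unchanged,
    which is also what replaceRegexpOne does when nothing matches.
    """
    return PREFIX_RE.sub(r"\1", path, count=1)


def minute_bucket(timestamp):
    """Round a Unix timestamp down to the start of its minute (UTC)."""
    return timestamp - timestamp % 60
```

Rows in graphite.data_stat are then grouped by the pair (minute_bucket, prefix), and the count state accumulates how many points arrived for each pair.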

Graphite+ClickHouse. Component interaction diagram

[Figure: diagram of the interaction between the Graphite+ClickHouse components]

Graphite+ClickHouse. Data migration

As we remember from the expectations for this project, the switch to ClickHouse had to happen without downtime; accordingly, we had to somehow move our entire monitoring system to the new storage as transparently as possible for our users.
Here is how we did it.

  • We added a rule to carbon-c-relay to send an additional stream of metrics to the carbon-clickhouse instance on one of the servers participating in the replication of the ClickHouse tables.

  • We wrote a small Python script that used the whisper-dump library to read all the .wsp files from our storage and sent this data to the carbon-clickhouse instance described above in 24 threads. The number of metric values accepted by carbon-clickhouse reached 125 million per minute, and ClickHouse did not even break a sweat.

  • We created a separate DataSource in Grafana to debug the functions used in existing dashboards. We identified a list of functions that we used but that were not implemented in carbonapi. We added these functions and sent PRs to the carbonapi authors (special thanks to them).

  • To switch the read load, we changed the endpoints in the balancer settings from graphite-api (the API interface for Graphite+Whisper) to carbonapi.
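The replay script from the second step is not reproduced in this article, so the following is only a hedged sketch of its general shape: map each .wsp file to a metric name, turn its points into plaintext lines, and push them through a pool of 24 worker threads. The callables read_points and send are hypothetical placeholders for a Whisper reader and a TCP sender pointed at carbon-clickhouse.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def metric_name(wsp_path, root):
    """Map '<root>/servers/web01/cpu.wsp' to 'servers.web01.cpu'."""
    rel = os.path.relpath(wsp_path, root)
    return os.path.splitext(rel)[0].replace(os.sep, ".")


def to_lines(name, points):
    """Render (timestamp, value) pairs as plaintext lines, skipping empty slots."""
    return [f"{name} {value} {ts}\n" for ts, value in points if value is not None]


def migrate(wsp_paths, root, read_points, send, workers=24):
    """Replay every file through `send` using a pool of worker threads.

    read_points(path) -> iterable of (timestamp, value) and send(lines) are
    hypothetical callables supplied by the caller.
    Returns the total number of points sent.
    """
    def handle(path):
        lines = to_lines(metric_name(path, root), read_points(path))
        if lines:
            send(lines)
        return len(lines)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(handle, wsp_paths))
```

Skipping empty slots matters here: a sparse Whisper file is mostly unwritten points, and replaying them would only inflate the stream.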

Graphite+ClickHouse. Results

  • reduced disk subsystem utilization from 30% to 1%;
  • reduced the disk space used from 1 TB to 300 GB;
  • we are able to receive 125 million metrics per minute per server (the peak during migration);
  • moved all metrics to a 30-second storage interval;
  • got data replication and fault tolerance;
  • switched without downtime;
  • it took about 7 weeks to complete everything.

Graphite+ClickHouse. Problems

In our case there were some pitfalls. Here is what we ran into after the transition.

  1. ClickHouse does not always reread its configs on the fly; sometimes it needs a restart. For example, the description of the ZooKeeper cluster in the ClickHouse config was not applied until clickhouse-server was restarted.
  2. Large ClickHouse queries did not go through, so in graphite-clickhouse our ClickHouse connection string looks like this:
    url = "http://localhost:8123/?max_query_size=268435456&max_ast_elements=1000000"
  3. ClickHouse releases new stable versions quite often; they may contain surprises, so be careful.
  4. Dynamically created containers in Kubernetes send a large number of metrics with a short and random lifetime. There are not many points for such metrics, and disk space is not a problem. But when building queries, ClickHouse pulls a huge number of these metrics from the 'metrics' table. In 90% of cases there is no data for them within the window (24 hours). Yet time is still spent searching for this data in the 'data' table, and in the end the query runs into a timeout. To solve this problem, we started maintaining a separate view with information about the metrics encountered during the day. Thus, when building reports (graphs) for dynamically created containers, we query only those metrics that were encountered within the given window rather than over all time, which sped up building reports on them many times over. For the solution described above, I put together a graphite-clickhouse (fork) that includes an implementation of working with the date_metrics table.
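The idea behind the date_metrics view in item 4 can be illustrated with a small in-memory model: keep a per-day set of metric names and resolve a pattern only against the days inside the requested window. This is a simplified sketch, not the code of the fork; the names are invented, and fnmatch is used for brevity even though its "*" also matches dots, unlike Graphite globs.

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase


def index_by_day(points):
    """points: iterable of (day, path). Returns {day: set of paths seen that day}."""
    seen = {}
    for day, path in points:
        seen.setdefault(day, set()).add(path)
    return seen


def resolve(pattern, seen_by_day, start, end):
    """Return paths matching a glob that were seen between start and end inclusive."""
    matched = set()
    day = start
    while day <= end:
        for path in seen_by_day.get(day, ()):
            if fnmatchcase(path, pattern):
                matched.add(path)
        day += timedelta(days=1)
    return sorted(matched)
```

Metrics from containers that died before the window never enter the result, so no time is wasted looking for their points in the 'data' table.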

Graphite+ClickHouse. Tags

Since version 1.1.0, Graphite officially supports tags. We are actively thinking about what to do, and how, in order to support this initiative in the graphite+clickhouse stack.

Graphite+ClickHouse. Anomaly detection

Based on the infrastructure described above, we implemented a prototype anomaly detector, and it works! But more about it in the next article.

Subscribe, hit the up arrow and be happy!

source: www.habr.com
