
Molweni, habr.
Ukuba umntu uxhaphaza inkqubo kwaye wadibana nomba wokusebenza kogcino (IO, indawo yedisk edliwe), ke ithuba lokuba iClickHouse iphoswe njengendawo yokutshintsha kufuneka ithambekele kwenye. Le nkcazo ithetha ukuba ukuphunyezwa komntu wesithathu sele kusetyenziswa njengedaemon efumana i-metrics, umzekelo. okanye .
ClickHouse isombulula iingxaki ezichazwe kakuhle. Ngokomzekelo, emva kokudlulisa i-2TiB yedatha kwi-whisper, ingena kwi-300GiB. Andiyi kuhlala kuthelekiso ngokweenkcukacha; kukho amanqaku amaninzi kwesi sihloko. Ukongeza, kude kube mva nje, ayizizo zonke izinto ezazigqibelele ngokugcinwa kwethu kweClickHouse.
Iingxaki ngendawo esetyenzisiweyo
Ekuqaleni, yonke into kufuneka isebenze kakuhle. Ukulandela , yenza uqwalaselo lwesikimu sogcino lweemetrics (ukuqhubela phambili retention), emva koko yenza itafile ngokwengcebiso ye-backend ekhethiweyo ye-graphite-web: + okanye , kuxhomekeke ekubeni yeyiphi istaki esisetyenzisiweyo. Kwaye ... ixesha ibhombu iyahamba.
Ukuze uqonde ukuba yeyiphi, kufuneka wazi ukuba ufaka njani umsebenzi kunye nendlela yobomi eyongezelelweyo yedatha kwiitafile zeenjini zosapho *MergeTree ClickHouse (iitshati ezithatyathwe kwi Alexey Zatelepin):
- Ifakiwe
Π±Π»ΠΎΠΊidatha. Kwimeko yethu, yayiyimilinganiselo efikileyo.

- Ibhloko nganye enjalo ihlelwa ngokwesitshixo phambi kokuba ibhalwe kwidisk.
ORDER BYicacisiwe xa kuyilwa itheyibhile. - Emva kokuhlelwa,
ΠΊΡΡΠΎΠΊ(part) idatha ibhalwe kwidisk.

- Umncedisi ubeka iliso ngasemva ukuze kungabikho maninzi amaqhekeza anjalo, kwaye iqalise ngasemva
ΡΠ»ΠΈΡΠ½ΠΈΡ(merge, ukudibanisa emva koku).


- Umncedisi uyayeka ukusebenza udibaniso lodwa nje ukuba idatha iyeka ngokusebenzayo ukungena kwi
ΠΏΠ°ΡΡΠΈΡΠΈΡ(partition), kodwa ungaqala inkqubo ngesandla ngomyaleloOPTIMIZE. - Ukuba kukho iqhekeza elinye kuphela eliseleyo kwisahlulelo, ngoko awuyi kukwazi ukuqhuba ukudibanisa usebenzisa umyalelo oqhelekileyo omele uwusebenzise
OPTIMIZE ... FINAL
Ke, iimethrikhi zokuqala zifika. Kwaye bathatha indawo ethile. Iziganeko ezilandelayo zinokwahluka noko ngokuxhomekeke kwizinto ezininzi:
- Iqhosha lokwahlula linokuba lincinci kakhulu (ngosuku) okanye libe likhulu kakhulu (iinyanga ezininzi).
- Ubumbeko logcino lusenokungena kwimithamo emininzi ebalulekileyo yodityaniso lwedatha ngaphakathi kwesahlulelo esisebenzayo (apho iimetrics zirekhodwa), okanye mhlawumbi akunjalo.
- Ukuba kukho idatha eninzi, ngoko ke iinqununu zokuqala, ezithi ngenxa yokudibana ngasemva zisenokuba zikhulu (ukuba ukhetha isitshixo sokwahlula esingasisona silungileyo), azizuzidibanisa neziqendwana ezincinci ezintsha.
Kwaye ihlala iphela ngokufanayo. Indawo ehlala iimetrics kwiClickHouse yonyuka kuphela ukuba:
- musa ukufaka isicelo
OPTIMIZE ... FINALngesandla okanye - musa ukufaka idatha kuzo zonke izahlulelo ngokuqhubekayo, ukuze kungekudala okanye kamva udibaniso olungasemva luqale
Indlela yesibini ibonakala ilula kakhulu ukuyisebenzisa kwaye, ngoko ke, ayichanekanga kwaye yazanywa kuqala.
Ndibhale iskripthi se-python esilula esithumele iimethrikhi ze-dummy yonke imihla kwiminyaka eyi-4 edlulileyo kwaye iqhube i-cron ngeyure nganye.
Ukusukela ekubeni yonke imisebenzi yeClickHouse DBMS isekwe kwinto yokuba le nkqubo kungekudala okanye kamva yenza wonke umsebenzi ongasemva, kodwa akwaziwa ukuba nini, andikwazanga ukulinda ixesha apho amaqhekeza amakhulu amadala aqala ukudibanisa kunye. amatsha amancinci. Kuye kwacaca ukuba kufuneka sijonge indlela yokwenza usetyenziso olunyanzelweyo.

Ulwazi kwiitafile zeClickHouse inkqubo
Makhe sijonge kwisakhiwo setafile . Olu lwazi olubanzi malunga neqhekeza ngalinye lazo zonke iitafile kumncedisi we ClickHouse. Iqulethe, phakathi kwezinye izinto, le miqolo ilandelayo:
- igama le-db (
database); - igama letafile (
table); - igama lesahlulelo kunye nesazisi (
partition&partition_id); - xa iqhekeza ladalwa (
modification_time); - Ubuncinci kunye nowona mhla kwiqhekeza (ukwahlulwa kwenziwa ngemini) (
min_date&max_date);
Kukwakho netafile , enemihlaba elandelayo enomdla:
- igama le-db (
Tables.database); - igama letafile (
Tables.table); - Ubudala bemetric xa udibaniso olulandelayo kufuneka lusetyenziswe (
age);
Ngoko:
- Sinetafile yee-chunks kunye netheyibhile yemithetho yokudibanisa.
- Sidibanisa ukuhlangana kwabo kwaye sifumane zonke iitafile *GraphiteMergeTree.
- Sijonge zonke izahlulo apho:
- ngaphezu kwesinye isiqwenga
- okanye ixesha lifikile lokusebenzisa umgaqo olandelayo wokudibanisa, kunye
modification_timemdala kunalomzuzu.
Ukuphunyezwa
Esi sicelo
SELECT
concat(p.database, '.', p.table) AS table,
p.partition_id AS partition_id,
p.partition AS partition,
-- Π‘Π°ΠΌΠΎΠ΅ "ΡΡΠ°ΡΠΎΠ΅" ΠΏΡΠ°Π²ΠΈΠ»ΠΎ, ΠΊΠΎΡΠΎΡΠΎΠ΅ ΠΌΠΎΠΆΠ΅Ρ Π±ΡΡΡ ΠΏΡΠΈΠΌΠ΅Π½Π΅Π½ΠΎ Π΄Π»Ρ
-- ΠΏΠ°ΡΡΠΈΡΠΈΠΈ, Π½ΠΎ Π½Π΅ Π² Π±ΡΠ΄ΡΡΠ΅ΠΌ, ΡΠΌ (*)
max(g.age) AS age,
-- ΠΠΎΠ»ΠΈΡΠ΅ΡΡΠ²ΠΎ ΠΊΡΡΠΊΠΎΠ² Π² ΠΏΠ°ΡΡΠΈΡΠΈΠΈ
countDistinct(p.name) AS parts,
-- ΠΠ° ΡΠ°ΠΌΡΡ ΡΡΠ°ΡΡΡΡ ΠΌΠ΅ΡΡΠΈΠΊΡ Π² ΠΏΠ°ΡΡΠΈΡΠΈΠΈ ΠΏΡΠΈΠ½ΠΈΠΌΠ°Π΅ΡΡΡ 00:00:00 ΡΠ»Π΅Π΄ΡΡΡΠ΅Π³ΠΎ Π΄Π½Ρ
toDateTime(max(p.max_date + 1)) AS max_time,
-- ΠΠΎΠ³Π΄Π° ΠΏΠ°ΡΡΠΈΡΠΈΡ Π΄ΠΎΠ»ΠΆΠ½Π° Π±ΡΡΡ ΠΎΠΏΡΠΈΠΌΠΈΠ·ΠΈΡΠΎΠ²Π°Π½Π½Π°
max_time + age AS rollup_time,
-- ΠΠΎΠ³Π΄Π° ΡΠ°ΠΌΡΠΉ ΡΡΠ°ΡΡΠΉ ΠΊΡΡΠΎΠΊ Π² ΠΏΠ°ΡΡΠΈΡΠΈΠΈ Π±ΡΠ» ΠΎΠ±Π½ΠΎΠ²Π»ΡΠ½
min(p.modification_time) AS modified_at
FROM system.parts AS p
INNER JOIN
(
-- ΠΡΠ΅ ΠΏΡΠ°Π²ΠΈΠ»Π° Π΄Π»Ρ Π²ΡΠ΅Ρ
ΡΠ°Π±Π»ΠΈΡ *GraphiteMergeTree
SELECT
Tables.database AS database,
Tables.table AS table,
age
FROM system.graphite_retentions
ARRAY JOIN Tables
GROUP BY
database,
table,
age
) AS g ON
(p.table = g.table)
AND (p.database = g.database)
WHERE
-- Π’ΠΎΠ»ΡΠΊΠΎ Π°ΠΊΡΠΈΠ²Π½ΡΠ΅ ΠΊΡΡΠΊΠΈ
p.active
-- (*) Π ΡΠΎΠ»ΡΠΊΠΎ ΡΡΡΠΎΠΊΠΈ, Π³Π΄Π΅ ΠΏΡΠ°Π²ΠΈΠ»Π° Π°Π³Π³ΡΠ΅Π³Π°ΡΠΈΠΈ ΡΠΆΠ΅ Π΄ΠΎΠ»ΠΆΠ½Ρ Π±ΡΡΡ ΠΏΡΠΈΠΌΠ΅Π½Π΅Π½Ρ
AND ((toDateTime(p.max_date + 1) + g.age) < now())
GROUP BY
table,
partition
HAVING
-- Π’ΠΎΠ»ΡΠΊΠΎ ΠΏΠ°ΡΡΠΈΡΠΈΠΈ, ΠΊΠΎΡΠΎΡΡΠ΅ ΠΌΠ»Π°Π΄ΡΠ΅ ΠΌΠΎΠΌΠ΅Π½ΡΠ° ΠΎΠΏΡΠΈΠΌΠΈΠ·Π°ΡΠΈΠΈ
(modified_at < rollup_time)
-- ΠΠ»ΠΈ Ρ Π½Π΅ΡΠΊΠΎΠ»ΡΠΊΠΈΠΌΠΈ ΠΊΡΡΠΊΠ°ΠΌΠΈ
OR (parts > 1)
ORDER BY
table ASC,
partition ASC,
age ASCibuyisela ulwahlulo lwetafile nganye *GraphiteMergeTree enodibaniso olunokukhulula isithuba sedisk. Ekuphela kwento esele yenziwe kukuhamba kubo bonke ngesicelo OPTIMIZE ... FINAL. Ukuphunyezwa kokugqibela kwakhona kuthathela ingqalelo into yokuba akukho mfuneko yokubamba izahlulelo ngokurekhoda okusebenzayo.
Yile nto kanye eyenziwa yiprojekthi . Oogxa bangaphambili baseYandex.Market bazama kwimveliso, umphumo womsebenzi unokubonwa ngezantsi.

Ukuba uqhuba inkqubo kumncedisi nge ClickHouse, iya kuqalisa ukusebenza kwimo yedaemon. Kanye ngeyure isicelo siya kwenziwa, kujongwa ukuba ingaba izahlulelo ezitsha ezindala kuneentsuku ezintathu ziye zavela na ezinokwandiswa.
Izicwangciso zethu zangoku kukubonelela ubuncinane iipakethe zedeb, kwaye ukuba kuyenzeka kwakhona rpm.
Endaweni yesiphelo
Kwiinyanga ezidlulileyo ze-9+ bendingaphakathi kwinkampani yam uchithe ixesha elininzi encwina ekudibaneni kweClickHouse kunye negraphite-web. Yayingamava alungileyo, akhokelela kutshintsho olukhawulezayo ukusuka kwi-whisper ukuya kwiClickHouse njengendawo yokugcina iimetriki. Ndiyathemba ukuba eli nqaku liyinto yokuqala kuthotho malunga nokuba loluphi uphuculo esilwenzile kwiindawo ezahlukeneyo zesitaki, kwaye kuya kwenziwa ntoni kwixesha elizayo.
Iilitha ezininzi zebhiya kunye neentsuku zolawulo zichithwe ekuphuhliseni isicelo, kunye , endifuna ukubonisa umbulelo wam kuye. Kwaye kwakhona ukuphonononga eli nqaku.
umthombo: www.habr.com




