Moving to ClickHouse: Three Years Later

Three years ago, Viktor Tarnavsky and Alexey Milovidov of Yandex told the HighLoad++ audience how good ClickHouse is and how it does not slow down. On the next stage, Alexander Zaitsev gave a talk about migrating to ClickHouse from another analytical DBMS, concluding that ClickHouse is good, of course, but not very convenient. Back in 2016, when LifeStreet, the company where Alexander then worked, was moving a multi-petabyte analytical system to ClickHouse, it was a fascinating "yellow brick road" full of unknown dangers: ClickHouse back then looked like a minefield.

Three years later, ClickHouse has become much better. In that time Alexander founded Altinity, which not only helps dozens of projects move to ClickHouse but also improves the product itself together with colleagues from Yandex. ClickHouse is still not a carefree walk, but it is no longer a minefield.

Alexander has worked with distributed systems since 2003, building large projects on MySQL, Oracle and Vertica. At the last HighLoad++ 2019, Alexander, one of the pioneers of ClickHouse use, explained what this DBMS is today. We will learn about the main features of ClickHouse: how it differs from other systems and in which scenarios it is most effective. Using examples, we will look at recent, project-proven practices for building systems based on ClickHouse.


Retrospective: what happened three years ago

Three years ago we migrated LifeStreet to ClickHouse from another analytical database, and the migration of the ad network analytics went like this:

  • June 2016. ClickHouse appeared in open source and our project started;
  • August. Proof of concept: a large ad network, infrastructure, and 200-300 terabytes of data;
  • October. First production data;
  • December. Full production load: 10-50 billion events per day;
  • June 2017. Successful migration of users to ClickHouse: 2.5 petabytes of data on a cluster of 60 servers.

During the migration it became increasingly clear that ClickHouse is a good system that is pleasant to work with, but that it is an internal Yandex project. That brings nuances: Yandex serves its internal customers first and only then the community and the needs of external users, and ClickHouse at that time fell short of enterprise level in many functional areas. That is why we founded Altinity in March 2017: to make ClickHouse even faster and more convenient, not only for Yandex but for other users too. And now we:

  • train and help build ClickHouse-based solutions so that customers avoid pitfalls and the solution ends up working;
  • provide 24/7 support for ClickHouse installations;
  • develop our own ecosystem projects;
  • actively commit to ClickHouse itself, responding to requests from users who want particular features.

And of course, we help with migration to ClickHouse from MySQL, Vertica, Oracle, Greenplum, Redshift and other systems. We have taken part in a variety of migrations, and all of them succeeded.

Why move to ClickHouse

It does not slow down! That is the main reason. ClickHouse is a very fast database across a range of scenarios, as quotes from people who have worked with it for a long time attest.

Scalability. On some other databases you can get good performance on a single piece of hardware, but ClickHouse scales not only vertically but also horizontally, simply by adding servers. Not everything works as smoothly as we would like, but it works. You can grow the system as your business grows. What matters is that we are not limited by today's solution and there is always room to grow.

Portability. There is no lock-in to a single vendor. With Amazon Redshift, for example, it is hard to migrate anywhere else. ClickHouse, by contrast, can be installed on your laptop or a server, deployed to the cloud, or run in Kubernetes: there are no restrictions on the infrastructure. This is convenient for everyone, and it is a big advantage that many similar databases cannot boast.

Flexibility. ClickHouse is not tied to a single use case such as Yandex.Metrica; it keeps evolving and is used in more and more projects and industries. It can be extended with new capabilities to solve new problems. For example, storing logs in a database is considered bad practice, which is why Elasticsearch was invented. But thanks to its flexibility, ClickHouse can store logs too, and often does so better than Elasticsearch: ClickHouse needs about 10 times less hardware for it.

Free and open source. You do not have to pay for anything. There is no need to negotiate permission to install the system on your laptop or server. There are no hidden fees. At the same time, no other open source database technology can compete with ClickHouse on speed. MySQL, MariaDB, Greenplum: they are all much slower.

Community, drive and fun. ClickHouse has an excellent community: meetups, chats, and Alexey Milovidov, who charges us all with his energy and optimism.

Moving to ClickHouse

To move to ClickHouse, for whatever reason, you need only three things:

  • Understand the limitations of ClickHouse and what it is not suited for.
  • Take advantage of the technology and its greatest strengths.
  • Experiment. Even when you understand how ClickHouse works, it is not always possible to predict when it will be faster, when slower, when better and when worse. So try it.

The migration problem

There is just one "but": if you move to ClickHouse from something else, something usually goes wrong. We are used to certain practices and features that work in our favorite database. For example, anyone who works with SQL databases considers the following set of features mandatory:

  • transactions;
  • constraints;
  • consistency;
  • indexes;
  • UPDATE/DELETE;
  • NULLs;
  • milliseconds;
  • automatic type casts;
  • multiple joins;
  • arbitrary partitions;
  • cluster management tools.

These features are considered mandatory, yet three years ago ClickHouse had none of them! Today less than half remain unimplemented: transactions, constraints, consistency, milliseconds and type casts.

And the main point is that in ClickHouse some standard practices and approaches do not work, or work differently from what we are used to. Everything that appears in ClickHouse follows the "ClickHouse way", i.e. features differ from those of other databases. For example:

  • Indexes are not used to select rows but to skip data.
  • UPDATE/DELETE are not synchronous but asynchronous.
  • There are multiple joins, but no query planner. How they are executed is generally not obvious to people from the database world.
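As a rough illustration of the "ClickHouse way", the statements below sketch an asynchronous mutation and a data-skipping index. The table `events`, its columns and the index parameters are hypothetical names and values chosen only for this example; they do not come from the talk.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    status     String,
    amount     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_date);

-- UPDATE and DELETE are mutations: by default the statement returns
-- quickly and the affected data parts are rewritten in the background.
ALTER TABLE events UPDATE status = 'archived' WHERE event_date < '2019-01-01';
ALTER TABLE events DELETE WHERE user_id = 42;

-- Progress of mutations can be checked in a system table.
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE table = 'events';

-- A data-skipping index does not locate individual rows; it lets the
-- server skip blocks of granules that cannot match the condition.
ALTER TABLE events ADD INDEX idx_amount amount TYPE minmax GRANULARITY 4;
```

Because a mutation rewrites whole data parts, it is best treated as an occasional maintenance operation rather than a per-row update path.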

ClickHouse scenarios

In 1960 the Hungarian-born American mathematician E. P. Wigner wrote the article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences", observing that the world around us is, for some reason, well described by mathematical laws. Mathematics is an abstract science, yet physical laws expressed in mathematical form are not trivial, and Wigner emphasized that this is very strange.

In my view, ClickHouse is just as strange. Paraphrasing Wigner, we can say: the unreasonable effectiveness of ClickHouse in a wide variety of analytical applications is astonishing!

For example, take a real-time data warehouse, into which data is loaded almost continuously. We want to get query results from it with about a second of latency. By all means, use ClickHouse, because this is the scenario it was designed for. ClickHouse is used this way not only on the web but also in marketing and financial analytics, AdTech, and fraud detection. A real-time data warehouse uses a complex structured schema such as a "star" or "snowflake", many tables with JOINs (sometimes multiple), and the data is usually stored and changed in other systems.

Take another scenario, time series: monitoring of devices, networks, usage statistics, the Internet of Things. Here we deal with fairly simple events ordered in time. ClickHouse was not originally designed for this, but it has proven to work well, which is why large companies use ClickHouse as a store for monitoring data. To find out whether ClickHouse is suitable for time series, we ran a benchmark based on the approach and results of InfluxDB and TimescaleDB, which are specialized time-series databases. It turned out that ClickHouse, even without optimization for such tasks, wins on foreign ground too.

In time series, a narrow table is typically used: a few small columns. A lot of data can come from monitoring, millions of records per second, and it usually arrives in small bursts (real-time streaming). Therefore a different insertion approach is needed, and the queries themselves have their own specifics.

Log management. Collecting logs in a database is usually a bad idea, but with ClickHouse it can be done, with the caveats described above. Many companies use ClickHouse for exactly this purpose. In this case we use a flat wide table in which we store entire logs (for example, as JSON) or split them into pieces. Data is usually loaded in large batches (files), and we search by some field.

For each of these tasks, specialized databases are normally used. ClickHouse alone can do them all, and so well that it outperforms them. Let us now take a closer look at the time-series scenario and how to "cook" ClickHouse properly for it.

Time series

At present this is the main scenario for which ClickHouse is considered a standard solution. A time series is a set of events ordered in time that represent changes in some process over time. For example, it could be the heart rate over a day or the number of processes in a system. Anything that produces time ticks with some measurements is a time series.

Most events of this kind come from monitoring. This can be monitoring not only of the web but also of real devices: cars, industrial systems, IoT, factories, or driverless taxis, in whose trunk Yandex already installs a ClickHouse server.

For example, there are companies that collect data from ships. Every few seconds, sensors on a container ship send hundreds of different measurements. Engineers study them, build models and try to understand how efficiently the vessel is used, because a container ship must not sit idle for a second. Any downtime is a loss of money, so it is important to predict the route so that stops are minimal.

Specialized databases for time series are on the rise. The DB-Engines site ranks databases and lets you view them by type.

The fastest-growing type is time series. Graph databases are growing too, but time-series databases have grown faster over the past few years. Typical representatives of this family are InfluxDB, Prometheus, KDB, TimescaleDB (built on PostgreSQL), and solutions from Amazon. ClickHouse can be used here as well, and it is. Let me give a few public examples.

One of the pioneers is CloudFlare (a CDN provider). They monitor their CDN through ClickHouse (DNS queries, HTTP queries) under a huge load: 6 million events per second. Everything goes through Kafka into ClickHouse, which makes it possible to view dashboards of events in the system in real time.

Comcast is one of the leaders in US telecommunications: Internet, digital TV, telephony. They built a similar CDN management system as part of the open source project Apache Traffic Control to work with their huge data. ClickHouse is used as the analytics backend.

Percona has built ClickHouse into its PMM to store monitoring data from various MySQL instances.

Specific requirements

Time-series databases have their own specific requirements.

  • Fast inserts from many agents. We need to insert data from many streams very quickly. ClickHouse does this well because none of its inserts are blocking. Every insert is a new file on disk, and small inserts can be buffered in one way or another. In ClickHouse it is better to insert data in large batches rather than one row at a time.
  • Flexible schema. In time series we usually do not know the structure of the data in full. It is possible to build a monitoring system for a specific application, but then it is hard to use it for another one. This calls for a more flexible schema. ClickHouse allows this, even though it is a strongly typed database.
  • Efficient storage and "forgetting" of data. Time series usually involve huge volumes of data, so it must be stored as efficiently as possible. For InfluxDB, for example, good compression is its main feature. But besides storing, you also need to be able to "forget" old data and do some kind of downsampling: automatic calculation of aggregates.
  • Fast queries on aggregated data. Sometimes it is interesting to look at the last 5 minutes with millisecond precision, but for monthly data minute or second granularity may be unnecessary: general statistics are enough. Support of this kind is needed, otherwise a query over 3 months will take a very long time even in ClickHouse.
  • Queries such as "last point" and "as of". These are typical time-series queries: look at the last measurement or the state of the system at time t. They are not very pleasant queries for a database, but you need to be able to run them too.
  • "Gluing" time series. A time series is a sequence in time. If there are two time series, they often need to be joined and correlated. This is not convenient in every database, especially with unaligned time series: some timestamps here, others there. You can compute averages, but suddenly there will still be a gap, so it is unclear.
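The first requirement, batched and buffered inserts, can be sketched roughly as follows. The tables `metrics_raw` and `metrics_buffer` are hypothetical, and the Buffer thresholds are arbitrary values picked for illustration, not recommendations from the talk.

```sql
-- Hypothetical target table for raw measurements.
CREATE TABLE metrics_raw
(
    created_at DateTime,
    tags_id    UInt32,
    value      Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (tags_id, created_at);

-- Preferred: one INSERT carrying many rows (in practice thousands),
-- since each INSERT creates a new data part on disk.
INSERT INTO metrics_raw (created_at, tags_id, value) VALUES
    ('2019-11-07 10:00:00', 1, 12.5),
    ('2019-11-07 10:00:00', 2, 48.1),
    ('2019-11-07 10:00:01', 1, 12.9);

-- One way to absorb many small inserts: a Buffer table that keeps rows
-- in memory and flushes them to metrics_raw once any max threshold or
-- all min thresholds are reached.
-- Arguments: database, table, num_layers,
--            min_time, max_time, min_rows, max_rows, min_bytes, max_bytes.
CREATE TABLE metrics_buffer AS metrics_raw
ENGINE = Buffer(currentDatabase(), metrics_raw, 16, 10, 100, 10000, 1000000, 10000000, 100000000);

INSERT INTO metrics_buffer (created_at, tags_id, value)
VALUES ('2019-11-07 10:00:02', 1, 13.0);
```

Rows held in a Buffer table live in memory until flushed, so this trades some durability for insert throughput; batching on the client side avoids that trade-off.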

Let us see how these requirements are met in ClickHouse.

Schema

In ClickHouse, a schema for time series can be designed in different ways, depending on how regular the data is. A system can be built on regular data when we know all metrics in advance. For example, CloudFlare did this for CDN monitoring, and it is a well-optimized system. You can also build a more general system that monitors the entire infrastructure and various services. With irregular data, we do not know in advance what we are monitoring, and this is probably the most common case.

Regular data. Columns. The schema is simple: columns with the required types:

CREATE TABLE cpu (
  created_date Date DEFAULT today(),  
  created_at DateTime DEFAULT now(),  
  time String,  
  tags_id UInt32,  /* join to dim_tag */
  usage_user Float64,  
  usage_system Float64,  
  usage_idle Float64,  
  usage_nice Float64,  
  usage_iowait Float64,  
  usage_irq Float64,  
  usage_softirq Float64,  
  usage_steal Float64,  
  usage_guest Float64,  
  usage_guest_nice Float64
) ENGINE = MergeTree(created_date, (tags_id, created_at), 8192);

This is a regular table that monitors some kind of system load activity (user, system, idle, nice). It is simple and convenient, but not flexible. If we want a more flexible schema, we can use arrays.

Irregular data. Arrays:

CREATE TABLE cpu_alc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metrics Nested(
    name LowCardinality(String),  
    value Float64
  )
) ENGINE = MergeTree(created_date, (tags_id, created_at), 8192);

The Nested structure is two arrays: metrics.name and metrics.value. Here you can store monitoring data as an array of names and an array of measurements for each event. For further optimization, instead of one such structure you can create several: for example, one for Float values and another for Int values, because Int values can be stored more efficiently.
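To make the layout concrete, here is a small sketch of how rows might be written into the cpu_alc table defined above. The sample values are invented for illustration.

```sql
-- Each row carries two parallel arrays of equal length:
-- metric names and the corresponding values.
-- Different rows may carry different sets of metrics.
INSERT INTO cpu_alc
    (created_date, created_at, time, tags_id, metrics.name, metrics.value)
VALUES
    ('2019-11-07', '2019-11-07 10:00:00', '2019-11-07 10:00:00', 1,
     ['usage_user', 'usage_system', 'usage_idle'], [12.5, 3.1, 84.4]),
    ('2019-11-07', '2019-11-07 10:00:10', '2019-11-07 10:00:10', 1,
     ['usage_user', 'usage_idle'], [14.0, 86.0]);
```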

But such a structure is harder to access. You have to use a special construct, with special functions that first find the index and then extract the value from the array:

SELECT max(metrics.value[indexOf(metrics.name,'usage_user')]) FROM ...

But it still works quite fast. Another way to store irregular data is by rows.

Irregular data. Rows. In this traditional approach, without arrays, names and values are stored together. If 5,000 measurements arrive from one device at once, 5,000 rows are generated in the database:

CREATE TABLE cpu_rlc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metric_name LowCardinality(String),  
  metric_value Float64
) ENGINE = MergeTree(created_date, (metric_name, tags_id, created_at), 8192);


SELECT 
    maxIf(metric_value, metric_name = 'usage_user'),
    ... 
FROM cpu_rlc
WHERE metric_name IN ('usage_user', ...)

ClickHouse copes with this: it has special ClickHouse SQL extensions. For example, maxIf is a special function that computes the maximum of a metric when some condition is met. You can write several such expressions in one query and compute values for several metrics at once.

Let us compare the three approaches in detail.

For this comparison I added "data size on disk" for a test data set. With columns we have the smallest data size: maximum compression, maximum query speed, but we pay by having to fix everything up front.

With arrays, everything is slightly worse. The data is still compressed well, and an irregular schema can be stored. But ClickHouse is a columnar database, and when we start storing everything in an array it turns into a row store, so we pay for flexibility with efficiency. For any operation you have to read the whole array into memory and then find the required element in it, and as the array grows, speed degrades.

At one company that uses this approach (for example, Uber), arrays are cut into chunks of 128 elements. Data from several thousand metrics, with a volume of 200 TB of data per day, is stored not in one array but in 10 or 30 arrays with special storage logic.

The simplest approach is rows. But the data compresses poorly, the table is large, and even when queries involve several metrics, ClickHouse does not perform optimally.

Hybrid schema

Suppose we have chosen the array schema. But if we know that most of our dashboards show only the user and system metrics, we can additionally materialize these metrics from the arrays into columns at the table level, like this:

CREATE TABLE cpu_alc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metrics Nested(
    name LowCardinality(String),  
    value Float64
  ),
  usage_user Float64 
             MATERIALIZED metrics.value[indexOf(metrics.name,'usage_user')],
  usage_system Float64 
             MATERIALIZED metrics.value[indexOf(metrics.name,'usage_system')]
) ENGINE = MergeTree(created_date, (tags_id, created_at), 8192);

On insert, ClickHouse computes them automatically. This way you can combine business with pleasure: the schema is flexible and general, but we have pulled out the most frequently used columns. Note that this did not require changing the inserts or the ETL, which keeps inserting arrays into the table. We simply ran ALTER TABLE, added a couple of columns, and got a hybrid and faster schema that can be used right away.
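The ALTER TABLE step mentioned above might look roughly like this when applied to an existing cpu_alc table that has only the Nested structure. This is a sketch of the idea, not the exact statement used in the project.

```sql
-- Add materialized columns computed from the Nested arrays.
-- Inserts keep writing only metrics.name / metrics.value.
ALTER TABLE cpu_alc
    ADD COLUMN usage_user Float64
        MATERIALIZED metrics.value[indexOf(metrics.name, 'usage_user')],
    ADD COLUMN usage_system Float64
        MATERIALIZED metrics.value[indexOf(metrics.name, 'usage_system')];

-- Dashboards can then read the plain column instead of searching the array.
SELECT tags_id, max(usage_user) AS max_usage_user
FROM cpu_alc
WHERE created_date = today()
GROUP BY tags_id;
```

Materialized columns are not returned by `SELECT *` and cannot be set in an INSERT, which is why the existing ETL keeps working unchanged.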

Codecs and compression

For time series it matters how well you pack the data, because the volume of information can be very large. ClickHouse has a set of tools for achieving compression ratios of 1:10, 1:20 and sometimes more. This means that 1 TB of uncompressed data takes 50-100 GB on disk. A smaller size is good: the data can be read and processed faster.

To achieve a high compression ratio, ClickHouse supports a number of codecs, among them DoubleDelta, Gorilla, LZ4 and ZSTD, which appear in the examples below.

Example table:

CREATE TABLE benchmark.cpu_codecs_lz4 (
    created_date Date DEFAULT today(), 
    created_at DateTime DEFAULT now() Codec(DoubleDelta, LZ4), 
    tags_id UInt32, 
    usage_user Float64 Codec(Gorilla, LZ4), 
    usage_system Float64 Codec(Gorilla, LZ4), 
    usage_idle Float64 Codec(Gorilla, LZ4), 
    usage_nice Float64 Codec(Gorilla, LZ4), 
    usage_iowait Float64 Codec(Gorilla, LZ4), 
    usage_irq Float64 Codec(Gorilla, LZ4), 
    usage_softirq Float64 Codec(Gorilla, LZ4), 
    usage_steal Float64 Codec(Gorilla, LZ4), 
    usage_guest Float64 Codec(Gorilla, LZ4), 
    usage_guest_nice Float64 Codec(Gorilla, LZ4), 
    additional_tags String DEFAULT ''
)
ENGINE = MergeTree(created_date, (tags_id, created_at), 8192);

Here we define the DoubleDelta codec in one case and Gorilla in another, and we always add LZ4 compression on top. As a result, the size of the data on disk is greatly reduced. The comparison shows how much space the same data takes with different codecs and compression:

  • in a GZIP file on disk;
  • in ClickHouse without codecs, but with ZSTD compression;
  • in ClickHouse with codecs and LZ4 or ZSTD compression.

Tables with codecs take up considerably less space.
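One way to check the effect of codecs on your own data is to compare compressed and uncompressed sizes per column in the system tables. The query below is a sketch that assumes the benchmark.cpu_codecs_lz4 table from the example above exists and contains data.

```sql
-- Per-column storage footprint and compression ratio.
SELECT
    name,
    compression_codec,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = 'benchmark' AND table = 'cpu_codecs_lz4'
ORDER BY data_compressed_bytes DESC;
```

Running the same query against a copy of the table created without codecs gives a direct column-by-column comparison.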

Size matters

It is no less important to choose the right data type.

In all the examples above I used Float64. But had we chosen Float32, it would have been even better. The folks at Percona demonstrated this well in their article on the subject. It is important to use the most compact type that fits the task: it matters even less for disk size than for query speed. ClickHouse is very sensitive to this.

If you can use Int32 instead of Int64, expect almost a twofold increase in performance. The data takes less memory, and all the arithmetic runs faster. Internally ClickHouse is a strictly typed system; it makes the most of everything modern hardware offers.

Aggregation and materialized views

Aggregation and materialized views let you build aggregates for different scenarios.

For example, you may have non-aggregated source data, and you can attach various materialized views to it with automatic summation through a special engine, SummingMergeTree (SMT). SMT is a special aggregating data structure that computes aggregates automatically. Raw data is inserted into the database, it is aggregated automatically, and dashboards can be built on it right away.
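A minimal sketch of this pattern, assuming the cpu table from the schema section is the raw source. The view name cpu_by_hour and its column names are hypothetical.

```sql
-- Hourly rollup maintained automatically on every insert into cpu.
CREATE MATERIALIZED VIEW cpu_by_hour
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(ts_hour)
ORDER BY (tags_id, ts_hour)
AS
SELECT
    tags_id,
    toStartOfHour(created_at) AS ts_hour,
    sum(usage_user) AS usage_user_sum,
    count()         AS samples
FROM cpu
GROUP BY tags_id, ts_hour;

-- Rows with the same key are summed during background merges, so
-- queries should still aggregate to combine not-yet-merged parts.
SELECT
    tags_id,
    ts_hour,
    sum(usage_user_sum) / sum(samples) AS avg_usage_user
FROM cpu_by_hour
GROUP BY tags_id, ts_hour
ORDER BY tags_id, ts_hour;
```

Storing a sum and a count rather than an average keeps the rollup additive, so the average can be derived correctly at any coarser granularity.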

TTL: "forgetting" old data

How do you "forget" data that is no longer needed? ClickHouse knows how to do this. When creating tables you can specify TTL expressions: for example, that we keep minute data for one day, daily data for 30 days, and never touch weekly or monthly data:

CREATE TABLE aggr_by_minute
…
TTL time + interval 1 day

CREATE TABLE aggr_by_day
…
TTL time + interval 30 day

CREATE TABLE aggr_by_week
…
/* no TTL */
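The statements above elide the column definitions. A complete variant could look like the sketch below, where the columns, partitioning and sort key are assumptions added only to make the example self-contained.

```sql
-- Minute-level aggregates that expire one day after their timestamp.
CREATE TABLE aggr_by_minute
(
    time       DateTime,
    tags_id    UInt32,
    usage_user Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(time)
ORDER BY (tags_id, time)
TTL time + INTERVAL 1 DAY;

-- The retention period can be changed later without recreating the table.
ALTER TABLE aggr_by_minute MODIFY TTL time + INTERVAL 2 DAY;
```

Expired rows are removed during background merges, so they may remain visible for some time after the TTL has passed.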

Multi-tier: splitting data across disks

Taking this idea further, data in ClickHouse can be stored in different places. Suppose we want to keep hot data for the last week on a very fast local SSD and put older historical data elsewhere. In ClickHouse this is now possible.

You can configure a storage policy so that ClickHouse automatically moves data to another storage when certain conditions are met.

But that is not all. At the level of an individual table, you can define rules for exactly when data moves to cold storage. For example, data stays on a very fast disk for 7 days, and everything older is moved to a slower one. This is good because it lets you keep the system at maximum performance while controlling costs and not wasting money on cold data:

CREATE TABLE 
... 
TTL date + INTERVAL 7 DAY TO VOLUME 'cold_volume', 
    date + INTERVAL 180 DAY DELETE
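Put together, a tiered table might be declared as in the sketch below. It assumes the server configuration already defines a storage policy, here hypothetically named 'hot_cold', that contains a volume called 'cold_volume'. The table and column names are also illustrative.

```sql
-- Lists the configured policies with their volumes and disks.
SELECT policy_name, volume_name, disks
FROM system.storage_policies;

-- Data older than 7 days moves to the cold volume; after 180 days it is deleted.
CREATE TABLE cpu_tiered
(
    date       Date,
    created_at DateTime,
    tags_id    UInt32,
    usage_user Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(date)
ORDER BY (tags_id, created_at)
TTL date + INTERVAL 7 DAY TO VOLUME 'cold_volume',
    date + INTERVAL 180 DAY DELETE
SETTINGS storage_policy = 'hot_cold';
```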

Unique ClickHouse features

Almost everything in ClickHouse has such "highlights", but they are offset by exclusivity: something that no other database has. For example, here are some of ClickHouse's unique features:

  • Arrays. ClickHouse has very good support for arrays, as well as the ability to run complex computations on them.
  • Aggregating data structures. This is one of the "killer features" of ClickHouse. Even though the folks at Yandex say that we do not want to aggregate data, everything gets aggregated in ClickHouse, because it is fast and convenient.
  • Materialized views. Together with aggregating data structures, materialized views make real-time aggregation convenient.
  • ClickHouse SQL. This is an extension of the SQL language with additional and exclusive features available only in ClickHouse. It used to be an extension on one hand and a disadvantage on the other. Now we have removed almost all the shortcomings relative to SQL-92, so it is only an extension.
  • Lambda expressions. Does any other database have them yet?
  • ML support. This exists in various databases, better in some and worse in others.
  • Open source. We can extend ClickHouse together. ClickHouse now has about 500 contributors, and the number keeps growing.

Tricky queries

In ClickHouse there are many different ways to do the same thing. For example, you can return the last value from the cpu table in three different ways (there is also a fourth, but it is even more exotic).

The first shows how convenient it is in ClickHouse to write queries when you want to check that a tuple is contained in a subquery. This is something I have personally missed in other databases. When I want to compare something with a subquery, other databases allow only a scalar to be compared with it, and for several columns I have to write a JOIN. In ClickHouse you can use a tuple:

SELECT *
  FROM cpu 
 WHERE (tags_id, created_at) IN 
    (SELECT tags_id, max(created_at)
        FROM cpu 
        GROUP BY tags_id)

The second way does the same thing but uses the aggregate function argMax:

SELECT 
    argMax(usage_user, created_at),
    argMax(usage_system, created_at),
...
 FROM cpu 

ClickHouse has several dozen aggregate functions, and if you use combinators, then by the laws of combinatorics you get about a thousand of them. argMax is one of the functions that computes a maximum: the query returns the value of usage_user at which the maximum created_at is reached. The third way uses ASOF JOIN:

SELECT now() as created_at,
       cpu.*
  FROM (SELECT DISTINCT tags_id from cpu) base 
  ASOF LEFT JOIN cpu USING (tags_id, created_at)

ASOF JOIN "glues" rows with different timestamps. This is a unique database feature otherwise available only in kdb+. If there are two time series with different timestamps, ASOF JOIN lets you shift and join them in one query. For each value in one time series, the closest value in the other is found, and they are returned in one row.
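As an illustration of gluing two unaligned series, consider the sketch below. The tables cpu_load and net_load and their columns are hypothetical and serve only to show the join condition.

```sql
-- Two hypothetical series sampled at different moments.
CREATE TABLE cpu_load (tags_id UInt32, ts DateTime, usage_user Float64)
ENGINE = MergeTree ORDER BY (tags_id, ts);

CREATE TABLE net_load (tags_id UInt32, ts DateTime, bytes_in UInt64)
ENGINE = MergeTree ORDER BY (tags_id, ts);

-- For every cpu_load row, pick the net_load row of the same tags_id
-- with the latest ts that is not later than the cpu_load ts.
SELECT
    c.tags_id,
    c.ts AS cpu_ts,
    c.usage_user,
    n.ts AS net_ts,
    n.bytes_in
FROM cpu_load AS c
ASOF LEFT JOIN net_load AS n
    ON c.tags_id = n.tags_id AND c.ts >= n.ts
ORDER BY c.tags_id, c.ts;
```

The equality conditions select the series, and the single inequality on the time column selects the closest match, which is what distinguishes ASOF JOIN from an ordinary join.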

Analytic functions

In the SQL:2003 standard you can write this:

SELECT origin,
       timestamp,
       timestamp - LAG(timestamp, 1) OVER (PARTITION BY origin ORDER BY timestamp) AS duration,
       timestamp - MIN(timestamp) OVER (PARTITION BY origin ORDER BY timestamp) AS startseq_duration,
       ROW_NUMBER() OVER (PARTITION BY origin ORDER BY timestamp) AS sequence,
       COUNT() OVER (PARTITION BY origin ORDER BY timestamp) AS nb
  FROM mytable
ORDER BY origin, timestamp;

In ClickHouse you cannot do that: it does not support the SQL:2003 standard and probably never will. Instead, in ClickHouse the same result is usually obtained with arrays and lambda functions.

I promised lambdas, and here they are!

The ClickHouse version is an analog of the analytic query in the SQL:2003 standard: it computes the difference between two timestamps, the duration, the sequence number, everything we usually think of as analytic functions. In ClickHouse we compute them through arrays: first we collapse the data into an array, then we do whatever we want on the array, and then we expand it back. It is not very convenient and requires at least some love for functional programming, but it is very flexible.
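The collapse, transform and expand steps can be sketched as follows. This is an illustrative reconstruction of the approach, not the query shown in the talk. It assumes mytable has the columns origin and timestamp (a DateTime) from the SQL:2003 example, and it works on Unix seconds for simplicity.

```sql
SELECT
    origin,
    ts,
    duration,
    ts - ts_min AS startseq_duration,
    sequence,
    nb
FROM
(
    SELECT
        origin,
        -- collapse each group's timestamps into one sorted array
        arraySort(groupArray(toUInt32(timestamp))) AS ts_arr,
        -- lambda: difference between each element and its predecessor
        arrayMap((cur, prev) -> cur - prev,
                 ts_arr,
                 arrayPushFront(arrayPopBack(ts_arr), ts_arr[1])) AS dur_arr,
        -- 1-based position of every element
        arrayEnumerate(ts_arr) AS seq_arr,
        ts_arr[1]      AS ts_min,
        length(ts_arr) AS nb
    FROM mytable
    GROUP BY origin
)
-- expand the arrays back into one row per original event
ARRAY JOIN ts_arr AS ts, dur_arr AS duration, seq_arr AS sequence
ORDER BY origin, ts;
```

Two details differ from the window-function version: the first row of each group gets a duration of 0 rather than NULL, and nb is the total number of rows per origin rather than a running count.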

Special functions

In addition, ClickHouse has many specialized functions. For example, how do you determine how many sessions are running concurrently? A typical monitoring task is to determine the maximum load with a single query. ClickHouse has a special function for this purpose.

In general, ClickHouse has special functions for many purposes:

  • runningDifference, runningAccumulate, neighbor;
  • sumMap(key, value);
  • timeSeriesGroupSum(uid, timestamp, value);
  • timeSeriesGroupRateSum(uid, timestamp, value);
  • skewPop, skewSamp, kurtPop, kurtSamp;
  • WITH FILL / WITH TIES;
  • simpleLinearRegression, stochasticLinearRegression.

This is not a complete list of functions; there are 500-600 in total. Hint: all functions in ClickHouse are listed in a system table (not all are documented, but all are interesting):

select * from system.functions order by name

ClickHouse stores a lot of information about itself, including log tables: query_log, trace_log, a log of operations on data parts (part_log), a metrics log, and the system log, which it usually writes to disk. The metrics log is a time series inside ClickHouse about ClickHouse itself: the database can play the role of a time-series database for itself, thus "devouring" itself.
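For instance, the server's own logs can be queried like any other time series. The sketch below assumes that metric_log and query_log are enabled in the server configuration, which is not guaranteed on every installation.

```sql
-- Number of concurrently running queries, per minute, for today.
SELECT
    toStartOfMinute(event_time) AS minute,
    avg(CurrentMetric_Query)    AS avg_running_queries,
    max(CurrentMetric_Query)    AS max_running_queries
FROM system.metric_log
WHERE event_date = today()
GROUP BY minute
ORDER BY minute;

-- The ten slowest queries that finished today.
SELECT query_duration_ms, read_rows, query
FROM system.query_log
WHERE type = 'QueryFinish' AND event_date = today()
ORDER BY query_duration_ms DESC
LIMIT 10;
```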

This is also a unique thing: since we do a good job with time series, why not store everything we need in ourselves? We do not need Prometheus; we keep everything in ClickHouse. Connect Grafana and we monitor ourselves. However, if ClickHouse goes down we will not see why, so usually people do not do that.

One big cluster or many small ClickHouses

Which is better: one big cluster or many small ClickHouses? The traditional approach to DWH is a big cluster in which schemas are allocated to each application. We came to the database administrator, asked for a schema, and were given one.

In ClickHouse you can do it differently. You can give each application its own ClickHouse.

We no longer need the big scary DWH and intractable admins. We can give each application its own ClickHouse, and the developer can do it themselves, since ClickHouse is very easy to install and does not require complex administration.

But if we have many ClickHouses and have to install them often, we want to automate the process. For this we can use, for example, Kubernetes and clickhouse-operator. In Kubernetes, ClickHouse can be deployed "in one click": I press a button, apply a manifest, and the database is ready. I can immediately create a schema, start loading metrics into it, and in 5 minutes I have a dashboard ready in Grafana. It is that simple!

What is the bottom line?

So, ClickHouse is:

  • Fast. Everyone knows this.
  • Simple. A bit controversial, but I believe that hard in training means easy in battle. Once you understand how ClickHouse works, everything is very simple.
  • Universal. It suits different scenarios: DWH, time series, log storage. But it is not an OLTP database, so do not try to do short inserts and reads there.
  • Interesting. Probably anyone who works with ClickHouse has experienced many interesting moments, in both the good and the bad sense. For example, a new release came out and everything stopped working. Or you struggled with a task for two days, but after asking a question in the Telegram chat the task was solved in two minutes. Or, as at the conference during Lesha Milovidov's talk, a screenshot from ClickHouse broke the HighLoad++ broadcast. Things like this happen all the time and make our life with ClickHouse bright and interesting!

You can view the presentation here.

The long-awaited meeting of developers of high-load systems at HighLoad++ will take place on November 9 and 10 in Skolkovo. Finally, this will be an offline conference (with all precautions in place), since the energy of HighLoad++ cannot be packed into an online format.

For the conference we find and show you case studies at the cutting edge of technology: HighLoad++ was, is and will be the only place where in two days you can learn how Facebook, Yandex, VKontakte, Google and Amazon work.

Having held the conference without interruption since 2007, this year we will meet for the 14th time. Over that time the conference has grown tenfold; last year the industry's key event brought together 3339 participants and 165 speakers of talks and meetups, with several tracks running in parallel.
Last year there were 20 buses, 5280 liters of tea and coffee, 1650 liters of fruit drinks and 10200 bottles of water. Plus 2640 kilograms of food, 16 000 plates and 25 000 cups. By the way, with the money raised from recycled paper we planted 100 oak seedlings :)

You can buy tickets here, find conference news here, and join the conversation on all the social networks: Telegram, Facebook, VKontakte and Twitter.

Source: www.habr.com
