Moving to ClickHouse: 3 Years Later

Three years ago Viktor Tarnavsky and Alexey Milovidov of Yandex told the HighLoad++ audience how good ClickHouse is and how it does not slow down. On the next stage Alexander Zaitsev gave a talk about moving to ClickHouse from another analytical DBMS and concluded that ClickHouse is, of course, good, but not very convenient. When LifeStreet, the company where Alexander worked at the time, migrated a multi-petabyte analytical system to ClickHouse in 2016, it was a fascinating "yellow brick road" full of unknown dangers: back then ClickHouse looked like a minefield.

Three years later ClickHouse has become much better. In that time Alexander founded Altinity, a company that not only helps people move to ClickHouse in dozens of projects but also improves the product itself together with colleagues from Yandex. ClickHouse is still not a carefree stroll, but it is no longer a minefield.

Alexander has been working with distributed systems since 2003, building large projects on MySQL, Oracle and Vertica. At HighLoad++ 2019 Alexander, one of the pioneers of using ClickHouse, described what this DBMS is today. We will learn about the main features of ClickHouse: how it differs from other systems and in which cases it pays off most. Using examples, we will look at recent, project-proven practices for building systems on ClickHouse.


Retrospective: what happened 3 years ago

Three years ago we moved LifeStreet to ClickHouse from another analytical database, and the migration of the ad network analytics looked like this:

  • June 2016. ClickHouse appeared as Open Source and our project started;
  • August. Proof of Concept: a large ad network, infrastructure and 200-300 terabytes of data;
  • October. First production data;
  • December. Full production load: 10-50 billion events per day;
  • June 2017. Successful migration of users to ClickHouse, 2.5 petabytes of data on a cluster of 60 servers.

During the migration we came to understand that ClickHouse is a good system that is pleasant to work with, but that it is an internal Yandex project. That brings nuances: Yandex deals with its own internal customers first and only then with the community and the needs of external users, and at that time ClickHouse had not yet reached enterprise level in many functional areas. That is why we founded Altinity in March 2017: to make ClickHouse faster and more convenient not only for Yandex but for other users too. Today we:

  • Train customers and help them build solutions on ClickHouse so that they avoid pitfalls and end up with a solution that works;
  • Provide 24/7 support for ClickHouse installations;
  • Develop our own ecosystem projects;
  • Actively commit to ClickHouse itself, responding to requests from users who want to see particular features.

And, of course, we help with moving to ClickHouse from MySQL, Vertica, Oracle, Greenplum, Redshift and other systems. We have taken part in a variety of migrations, and all of them have been successful.

Why move to ClickHouse at all

It doesn't slow down! This is the main reason. ClickHouse is a very fast database in a variety of scenarios, as random quotes from people who have worked with it for a long time attest.

Scalability. With some other databases you can get good performance on a single piece of hardware, but ClickHouse scales not only vertically but also horizontally, simply by adding servers. Not everything works as smoothly as we would like, but it works. You can grow the system as your business grows. It matters that the solution does not constrain us today and that there is always room to grow.

Portability. There is no lock-in to any one thing. With Amazon Redshift, for example, it is hard to move anywhere else. ClickHouse, by contrast, can be installed on your laptop or on a server, deployed to the cloud, or run in Kubernetes: there are no restrictions on the infrastructure it runs on. This is convenient for everyone, and it is a big advantage that many similar databases cannot boast of.

Flexibility. ClickHouse is not limited to a single use case such as Yandex.Metrica; it keeps evolving and is used in more and more projects and industries. It can be extended with new capabilities to solve new problems. For example, storing logs in a database is considered bad form, which is why Elasticsearch was invented. But thanks to its flexibility, ClickHouse can store logs too, and often does it better than Elasticsearch: ClickHouse needs about 10 times less hardware for this.

Free and Open Source. You do not have to pay for anything. There is no need to negotiate permission to install the system on your laptop or server. There are no hidden fees. At the same time, no other Open Source database technology can compete with ClickHouse on speed. MySQL, MariaDB, Greenplum: all of them are much slower.

Community, drive and fun. ClickHouse has an excellent community: meetups, chats, and Alexey Milovidov, who charges us all with his energy and optimism.

Moving to ClickHouse

To move to ClickHouse, whatever your reason, you only need three things:

  • Understand the limitations of ClickHouse and what it is not suited for.
  • Take advantage of the technology and its greatest strengths.
  • Experiment. Even if you understand how ClickHouse works, it is not always possible to predict when it will be faster, when slower, when better and when worse. So try it.

The problem with moving

There is just one "but": if you move to ClickHouse from something else, something usually goes wrong. We are used to certain practices and features that work in our favorite database. For example, anyone who works with SQL databases considers the following set of features essential:

  • transactions;
  • constraints;
  • consistency;
  • indexes;
  • UPDATE/DELETE;
  • NULLs;
  • milliseconds;
  • automatic type casts;
  • multiple joins;
  • arbitrary partitions;
  • cluster management tools.

The set is mandatory, yet three years ago ClickHouse had none of these features! Now fewer than half remain unimplemented: transactions, constraints, consistency, milliseconds and type casting.

More importantly, some standard practices and approaches do not work in ClickHouse, or work differently from what we are used to. Everything that appears in ClickHouse follows the "ClickHouse way", that is, features differ from their counterparts in other databases. For example:

  • Indexes are not used to select rows but to skip them.
  • UPDATE/DELETE are not synchronous but asynchronous.
  • There are multiple joins, but there is no query planner. How they are executed is generally not very clear to people from the database world.
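A minimal sketch of the first two points, assuming a hypothetical table `events_example` that is not part of the talk. A data-skipping index lets the server skip blocks of rows that cannot match a filter, and `ALTER TABLE ... UPDATE/DELETE` starts a background mutation instead of changing rows synchronously. Depending on the server version, skipping indexes may first have to be enabled through an experimental setting.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE events_example (
    event_date Date,
    event_time DateTime,
    user_id UInt32,
    duration_ms UInt32,
    -- minmax index: keeps min/max of duration_ms per block of 4 granules,
    -- so blocks whose range cannot match the filter are skipped.
    INDEX idx_duration duration_ms TYPE minmax GRANULARITY 4
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_time);

-- The index does not locate rows; it only lets the server skip blocks.
SELECT count() FROM events_example WHERE duration_ms > 10000;

-- Mutations return quickly and are applied in the background.
ALTER TABLE events_example UPDATE duration_ms = 0 WHERE user_id = 42;
ALTER TABLE events_example DELETE WHERE event_date < '2019-01-01';

-- Mutations that have not finished yet.
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE table = 'events_example' AND is_done = 0;
```

Checking `system.mutations` is the usual way to find out whether an asynchronous UPDATE or DELETE has actually been applied.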

ClickHouse scenarios

In 1960 the Hungarian-born American mathematician E. P. Wigner wrote the article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences", observing that the world around us is, for some reason, described well by mathematical laws. Mathematics is an abstract science, yet physical laws expressed in mathematical form are far from trivial, and Wigner stressed that this is very strange.

In my view, ClickHouse is the same kind of oddity. To paraphrase Wigner: the unreasonable effectiveness of ClickHouse in a wide variety of practical applications is astonishing!

Take, for example, a Real-Time Data Warehouse, into which data is loaded almost continuously. We want to get query results from it with a delay of about a second. Go ahead and use ClickHouse, because this is the scenario it was designed for. ClickHouse is used this way not only on the web but also in marketing and financial analytics, AdTech, and Fraud Detection. A Real-Time Data Warehouse uses a complex structured schema such as a "star" or "snowflake", many tables with JOINs (sometimes multiple), and the data is usually stored and modified in some other systems.

Let's take another scenario, Time Series: monitoring of devices and networks, usage statistics, the Internet of Things. Here we deal with fairly simple events ordered in time. ClickHouse was not originally designed for this, but it has proven itself well, which is why large companies use ClickHouse as a store for monitoring data. To find out whether ClickHouse is a good fit for time series, we built a benchmark based on the approach and results of InfluxDB and TimescaleDB, which are specialized time-series databases. It turned out that ClickHouse, even without optimization for such workloads, wins on their home turf as well.

Time series typically use a narrow table with a few small columns. A lot of data can come from monitoring, millions of records per second, and it usually arrives in small real-time inserts (streaming). This calls for a different insertion approach, and the queries have their own specifics.

Log Management. Collecting logs in a database is usually a bad idea, but in ClickHouse it can be done, with the caveats described above. Many companies use ClickHouse precisely for this purpose. In this case we use a flat, wide table where we store the whole log entry (for example, as JSON) or split it into fields. Data is usually loaded in large batches (files), and we search by some field.

Specialized databases are usually used for each of these tasks. ClickHouse alone can handle all of them, and well enough to outperform the specialized systems. Let's now take a closer look at the time-series scenario and at how to "cook" ClickHouse properly for it.

Time Series

At the moment this is the main scenario for which ClickHouse is considered a standard solution. A time series is a set of events ordered in time that represent how some process changes over time. For example, it could be the heart rate over a day or the number of processes in a system. Anything that yields time ticks with some measurements is a time series.

Most events of this kind come from monitoring. It can be not only web monitoring but also real devices: cars, industrial systems, IoT, factories, or driverless taxis, in whose trunk Yandex already puts a ClickHouse server.

For example, there are companies that collect data from ships. Every few seconds, sensors on a container ship send many different measurements. Engineers study them, build models and try to understand how efficiently the vessel is being used, because a container ship must not stand idle even for a minute. Any downtime is lost money, so it is important to plan the route so that stops are minimal.

Specialized time-series databases are on the rise today. The DB-Engines site ranks different databases, and the ranking can be viewed by type.

The fastest-growing type is time series. Graph databases are also growing, but time-series databases have grown faster over the past few years. Typical representatives of this family are InfluxDB, Prometheus, KDB, TimescaleDB (built on PostgreSQL), and solutions from Amazon. ClickHouse can be used here too, and it is. Let me give a few public examples.

One of the pioneers is CloudFlare (a CDN provider). They monitor their CDN through ClickHouse (DNS requests, HTTP requests) under a huge load: 6 million events per second. Everything goes through Kafka into ClickHouse, which makes it possible to see dashboards of events in the system in real time.

Comcast is one of the telecommunications leaders in the USA: internet, digital television, telephony. They built a similar CDN management system as part of the Open Source project Apache Traffic Control to handle their huge volumes of data. ClickHouse is used as the backend for analytics.

Percona built ClickHouse into its PMM to store monitoring data for various MySQL instances.

Specific requirements

Time-series databases have their own specific requirements.

  • Fast inserts from many agents. We need to insert data from many streams very quickly. ClickHouse does this well because all of its inserts are non-blocking. Every insert is a new file on disk, and small inserts can be buffered in one way or another. In ClickHouse it is better to insert data in large batches than one row at a time.
  • Flexible schema. With time series we usually do not know the data structure completely. It is possible to build a monitoring system for a specific application, but then it is hard to reuse it for another application. This calls for a more flexible schema. ClickHouse lets you do this even though it is a strongly typed database.
  • Efficient storage and "forgetting" of data. Time series usually involve huge volumes of data, so it has to be stored as efficiently as possible. In InfluxDB, for example, good compression is the main feature. But besides storage, you also need to be able to "forget" old data and do some kind of downsampling, that is, automatic computation of aggregates.
  • Fast queries on aggregated data. Sometimes it is interesting to look at the last 5 minutes with millisecond precision, but on monthly data, minute or second granularity may be unnecessary: general statistics are enough. Support of this kind is needed, otherwise a query over 3 months will take a very long time even in ClickHouse.
  • Queries like "last point, as of". These are typical time-series queries: look at the last measurement, or at the state of the system at a moment in time t. They are not very pleasant queries for a database, but you have to be able to run them too.
  • "Gluing" time series. A time series is a sequence in time. If there are two time series, they often need to be joined and correlated. This is not convenient in every database, especially with unaligned time series, where the timestamps in one differ from the timestamps in the other. You can compute averages, but a gap may still remain, so it is not straightforward.

Let's see how these requirements are met in ClickHouse.

Schema

In ClickHouse the schema for time series can be built in different ways, depending on how regular the data is. You can build the schema on regular data when all the metrics are known in advance. This is what CloudFlare did, for example, with CDN monitoring, and the result is a well-optimized system. You can also build a more general system that monitors the whole infrastructure and various services. With irregular data we do not know in advance what we are monitoring, and this is probably the most common case.

Regular data. Columns. The schema is simple: columns with the required types:

CREATE TABLE cpu (
  created_date Date DEFAULT today(),  
  created_at DateTime DEFAULT now(),  
  time String,  
  tags_id UInt32,  /* join to dim_tag */
  usage_user Float64,  
  usage_system Float64,  
  usage_idle Float64,  
  usage_nice Float64,  
  usage_iowait Float64,  
  usage_irq Float64,  
  usage_softirq Float64,  
  usage_steal Float64,  
  usage_guest Float64,  
  usage_guest_nice Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at)
SETTINGS index_granularity = 8192;

This is an ordinary table that monitors some kind of system load activity (user, system, idle, nice). Simple and convenient, but not flexible. If we want a more flexible schema, we can use arrays.
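As a sketch of how this table might be used (the sample values are invented for illustration), rows are inserted in a batch and then aggregated per host and minute. Columns that are not listed in the INSERT take their defaults.

```sql
-- Batch insert: several rows in one INSERT rather than one row at a time.
-- created_date and created_at are filled by their DEFAULT expressions.
INSERT INTO cpu (time, tags_id, usage_user, usage_system, usage_idle) VALUES
    ('2019-11-01 12:00:00', 1, 12.5, 3.1, 84.4),
    ('2019-11-01 12:00:10', 1, 14.0, 2.9, 83.1),
    ('2019-11-01 12:00:00', 2, 55.2, 8.7, 36.1);

-- Per-host, per-minute averages over the last hour.
SELECT
    tags_id,
    toStartOfMinute(created_at) AS minute,
    avg(usage_user) AS avg_user,
    avg(usage_system) AS avg_system
FROM cpu
WHERE created_at >= now() - INTERVAL 1 HOUR
GROUP BY tags_id, minute
ORDER BY tags_id, minute;
```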

Irregular data. Arrays:

CREATE TABLE cpu_alc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metrics Nested(
    name LowCardinality(String),  
    value Float64
  )
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at)
SETTINGS index_granularity = 8192;

SELECT max(metrics.value[indexOf(metrics.name,'usage_user')]) FROM ...

The Nested structure is two arrays: metrics.name and metrics.value. Here you can store arbitrary monitoring data as an array of names and an array of measurements for each event. For further optimization you can create several such structures instead of one, for example one for Float values and another for Int values, because Int can be stored more efficiently.

But such a structure is harder to query. You have to use a special construct with special functions that first find the index and then pull the value out of the array:

SELECT max(metrics.value[indexOf(metrics.name,'usage_user')]) FROM ...

It still works fairly fast, though. Another way to store irregular data is by rows.
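A short sketch of working with the Nested structure, assuming the `cpu_alc` table defined above and invented sample values. Nested columns are written as parallel arrays, and `ARRAY JOIN` is one way to unfold them back into one row per metric.

```sql
-- Nested columns are inserted as parallel arrays of equal length.
INSERT INTO cpu_alc (created_date, created_at, time, tags_id, metrics.name, metrics.value) VALUES
    ('2019-11-01', '2019-11-01 12:00:00', '', 1, ['usage_user', 'usage_system'], [12.5, 3.1]),
    ('2019-11-01', '2019-11-01 12:00:10', '', 1, ['usage_user', 'usage_idle'], [14.0, 83.1]);

-- ARRAY JOIN unfolds each (name, value) pair into its own row.
SELECT
    tags_id,
    metrics.name AS metric,
    max(metrics.value) AS max_value
FROM cpu_alc
ARRAY JOIN metrics
GROUP BY tags_id, metric
ORDER BY tags_id, metric;
```

Note that the two events carry different sets of metrics, which is exactly what the flexible schema is for.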

Irregular data. Rows. In this traditional approach, without arrays, names and values are stored together. If 5,000 measurements arrive from one device at once, 5,000 rows are generated in the database:

CREATE TABLE cpu_rlc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metric_name LowCardinality(String),  
  metric_value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_date)
ORDER BY (metric_name, tags_id, created_at)
SETTINGS index_granularity = 8192;


SELECT 
    maxIf(metric_value, metric_name = 'usage_user'),
    ... 
FROM cpu_rlc
WHERE metric_name IN ('usage_user', ...)

ClickHouse copes with this: it has special ClickHouse SQL extensions. For example, maxIf is a special function that computes the maximum of a metric when some condition is met. You can write several such expressions in one query and compute values for several metrics at once.
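A filled-in version of such a query might look like the sketch below. It assumes the `cpu_rlc` table defined above; the choice of metrics and the hourly grouping are illustrative, and `avgIf` is another function built with the same `-If` combinator.

```sql
-- Pivot several metrics out of the row-per-metric table in one pass.
SELECT
    tags_id,
    toStartOfHour(created_at) AS hour,
    maxIf(metric_value, metric_name = 'usage_user') AS max_usage_user,
    maxIf(metric_value, metric_name = 'usage_system') AS max_usage_system,
    avgIf(metric_value, metric_name = 'usage_idle') AS avg_usage_idle
FROM cpu_rlc
WHERE metric_name IN ('usage_user', 'usage_system', 'usage_idle')
GROUP BY tags_id, hour
ORDER BY tags_id, hour;
```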

Let's compare the three approaches.

The comparison also includes "Data size on disk" for a test data set. With columns we get the smallest data size: maximum compression and maximum query speed, but we pay for it by having to fix the whole schema up front.

With arrays everything is slightly worse. The data still compresses well, and an irregular schema can be stored. But ClickHouse is a columnar database, and when we start storing everything in an array it turns into a row-oriented one, so we pay for flexibility with efficiency. For any operation you have to read the whole array into memory and then find the element you need in it, and if the array grows, speed degrades.

At one of the companies that use this approach (for example, Uber), arrays are cut into chunks of 128 elements. Data from several thousand metrics, with a volume of 200 TB per day, is stored not in one array but in 10 or 30 arrays with special storage logic.

The simplest approach is rows. But the data compresses poorly, the table is large, and when queries involve several metrics at once, ClickHouse does not perform optimally.

Hybrid schema

Suppose we have chosen the array schema. If we know that most of our dashboards show only the user and system metrics, we can additionally materialize these metrics from the array into columns at the table level, like this:

CREATE TABLE cpu_alc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metrics Nested(
    name LowCardinality(String),  
    value Float64
  ),
  usage_user Float64 
             MATERIALIZED metrics.value[indexOf(metrics.name,'usage_user')],
  usage_system Float64 
             MATERIALIZED metrics.value[indexOf(metrics.name,'usage_system')]
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at)
SETTINGS index_granularity = 8192;

On insert, ClickHouse computes them automatically. This way you get the best of both worlds: the schema is flexible and general, yet we have pulled out the most frequently used columns. Note that this did not require changing the insert path or the ETL, which keeps inserting arrays into the table. We simply ran ALTER TABLE, added a couple of columns, and got a hybrid, faster schema that can be used right away.
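The ALTER TABLE step could be sketched as follows, assuming `cpu_alc` was originally created with only the Nested `metrics` structure and without the two materialized columns.

```sql
-- Add materialized columns to an existing table; inserts stay unchanged
-- because MATERIALIZED columns are computed by the server, not supplied.
ALTER TABLE cpu_alc
    ADD COLUMN usage_user Float64
        MATERIALIZED metrics.value[indexOf(metrics.name, 'usage_user')],
    ADD COLUMN usage_system Float64
        MATERIALIZED metrics.value[indexOf(metrics.name, 'usage_system')];

-- Dashboards can now read plain columns instead of scanning the arrays.
SELECT tags_id, max(usage_user), max(usage_system)
FROM cpu_alc
GROUP BY tags_id;
```

Materialized columns are not returned by `SELECT *`, so they have to be named explicitly in queries.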

Codecs and compression

For time series it matters how well you pack the data, because the volume of information can be very large. ClickHouse has a set of tools for achieving compression ratios of 1:10, 1:20 and sometimes more. This means that 1 TB of uncompressed data takes 50-100 GB on disk. A smaller size is good: the data can be read and processed faster.

To achieve a high level of compression, ClickHouse supports a number of codecs, including DoubleDelta, Gorilla, LZ4 and ZSTD.

Example table:

CREATE TABLE benchmark.cpu_codecs_lz4 (
    created_date Date DEFAULT today(), 
    created_at DateTime DEFAULT now() Codec(DoubleDelta, LZ4), 
    tags_id UInt32, 
    usage_user Float64 Codec(Gorilla, LZ4), 
    usage_system Float64 Codec(Gorilla, LZ4), 
    usage_idle Float64 Codec(Gorilla, LZ4), 
    usage_nice Float64 Codec(Gorilla, LZ4), 
    usage_iowait Float64 Codec(Gorilla, LZ4), 
    usage_irq Float64 Codec(Gorilla, LZ4), 
    usage_softirq Float64 Codec(Gorilla, LZ4), 
    usage_steal Float64 Codec(Gorilla, LZ4), 
    usage_guest Float64 Codec(Gorilla, LZ4), 
    usage_guest_nice Float64 Codec(Gorilla, LZ4), 
    additional_tags String DEFAULT ''
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at)
SETTINGS index_granularity = 8192;

Here we define the DoubleDelta codec in one case and Gorilla in the other, and always add LZ4 compression on top. As a result, the size of the data on disk is greatly reduced. The comparison covers the space taken by the same data with different codecs and compression:

  • in a GZIP file on disk;
  • in ClickHouse without codecs, but with ZSTD compression;
  • in ClickHouse with codecs and LZ4 or ZSTD compression.

Tables with codecs take up much less space.
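One way to check the effect on your own data is to compare compressed and uncompressed sizes per column in the `system.columns` table. The sketch below assumes the `benchmark.cpu_codecs_lz4` table from the example has already been filled with data.

```sql
-- Per-column storage footprint and compression ratio.
SELECT
    name AS column_name,
    compression_codec,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = 'benchmark' AND table = 'cpu_codecs_lz4'
ORDER BY data_compressed_bytes DESC;
```

Running the same query against a copy of the table with different codecs gives a like-for-like comparison.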

Size matters

Choosing the right data type is no less important.

In all the examples above I used Float64. Had we chosen Float32, the result would have been even better. The folks at Percona demonstrated this well in the article referenced above. It is important to use the most compact type that fits the task: this matters even more for query speed than for size on disk. ClickHouse is very sensitive to this.

If you can use Int32 instead of Int64, expect an almost twofold increase in performance. The data takes less memory, and all the "arithmetic" runs much faster. Internally ClickHouse is a very strictly typed system, and it makes maximum use of everything modern hardware offers.

Aggregation and Materialized Views

Aggregation and materialized views let you build aggregates for different cases.

For example, you can have non-aggregated source data and attach various materialized views to it that use the special SummingMergeTree (SMT) engine. SMT is a special aggregating data structure that computes aggregates automatically. Raw data is inserted into the database, it is aggregated automatically, and dashboards can be built on top of it right away.
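A minimal sketch of this setup, assuming the raw `cpu` table from earlier; the view name `cpu_by_minute_mv` and its columns are hypothetical. Because SummingMergeTree adds up numeric columns, an average is stored as a sum plus a sample count.

```sql
-- Per-minute rollup that is maintained automatically on every insert into cpu.
CREATE MATERIALIZED VIEW cpu_by_minute_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (tags_id, minute)
AS SELECT
    tags_id,
    toStartOfMinute(created_at) AS minute,
    sum(usage_user) AS usage_user_sum,
    count() AS samples
FROM cpu
GROUP BY tags_id, minute;

-- Rows are summed during background merges, so queries should still
-- aggregate to combine parts that have not been merged yet.
SELECT
    tags_id,
    minute,
    sum(usage_user_sum) / sum(samples) AS avg_usage_user
FROM cpu_by_minute_mv
GROUP BY tags_id, minute
ORDER BY tags_id, minute;
```

A view created this way only sees rows inserted after it was created.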

TTL: "forgetting" old data

How do you "forget" data that is no longer needed? ClickHouse knows how to do this. When creating tables you can specify TTL expressions: for example, that we keep minute data for one day, daily data for 30 days, and never touch weekly or monthly data:

CREATE TABLE aggr_by_minute
…
TTL time + interval 1 day

CREATE TABLE aggr_by_day
…
TTL time + interval 30 day

CREATE TABLE aggr_by_week
…
/* no TTL */
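Spelled out in full, one of these tables might look like the sketch below. The table name, columns and engine are assumptions made for illustration; only the TTL clause comes from the example above.

```sql
-- Minute-level aggregates that are dropped one day after their timestamp.
CREATE TABLE aggr_by_minute_example (
    time DateTime,
    tags_id UInt32,
    usage_user_sum Float64,
    samples UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMMDD(time)
ORDER BY (tags_id, time)
TTL time + INTERVAL 1 DAY;

-- The retention period can be changed later without recreating the table.
ALTER TABLE aggr_by_minute_example MODIFY TTL time + INTERVAL 2 DAY;
```

Expired rows are removed in the background during merges rather than at the exact moment they expire.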

Multi-tier: splitting data across disks

Taking this idea further, data in ClickHouse can be stored in different places. Suppose we want to keep hot data for the last week on a very fast local SSD and put older historical data somewhere else. In ClickHouse this is now possible.

You can configure a storage policy so that ClickHouse automatically moves data to another storage when certain conditions are met.

But that is not all. At the level of a specific table you can define rules for exactly when data goes to cold storage. For example, data stays on a very fast disk for 7 days, and everything older is moved to a slower one. This is good because it lets you keep the system at maximum performance while controlling costs and not spending money on cold data:

CREATE TABLE 
... 
TTL date + INTERVAL 7 DAY TO VOLUME 'cold_volume', 
    date + INTERVAL 180 DAY DELETE
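A fuller sketch of such a table follows. The table, the policy name `hot_to_cold` and the volume `cold_volume` are hypothetical: the policy and its volumes must already be defined in the server configuration, which is not shown here.

```sql
-- Storage policies and volumes known to the server.
SELECT policy_name, volume_name, disks FROM system.storage_policies;

CREATE TABLE cpu_tiered_example (
    date Date,
    created_at DateTime,
    tags_id UInt32,
    usage_user Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (tags_id, created_at)
-- Move to the cold volume after 7 days, delete after 180 days.
TTL date + INTERVAL 7 DAY TO VOLUME 'cold_volume',
    date + INTERVAL 180 DAY DELETE
SETTINGS storage_policy = 'hot_to_cold';

-- Which disk each active data part currently lives on.
SELECT name, disk_name
FROM system.parts
WHERE table = 'cpu_tiered_example' AND active;
```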

Unique ClickHouse features

Almost everything in ClickHouse has quirks like these, but they are offset by exclusives: things that other databases do not have. For example, here are some of the unique features of ClickHouse:

  • Arrays. ClickHouse has very good support for arrays, as well as the ability to perform complex computations on them.
  • Aggregating data structures. This is one of the "killer features" of ClickHouse. Although the folks at Yandex say that we do not want to aggregate data, everything gets aggregated in ClickHouse because it is fast and convenient.
  • Materialized views. Together with aggregating data structures, materialized views make real-time aggregation convenient.
  • ClickHouse SQL. This is an extension of SQL with some additional and exclusive features that are available only in ClickHouse. Previously it was an extension on the one hand and a drawback on the other. Now almost all the shortcomings relative to SQL 92 have been removed, so it is purely an extension.
  • Lambda expressions. Does any other database have them yet?
  • ML support. This exists in various databases; in some it is better, in others worse.
  • Open source. We can extend ClickHouse together. ClickHouse now has about 500 contributors, and the number keeps growing.

Tricky queries

In ClickHouse there are many ways to do the same thing. For example, you can return the last value from the cpu table in three different ways (there is also a fourth, but it is even more exotic).

The first shows how convenient it is in ClickHouse to write a query that checks whether a tuple is contained in a subquery. This is something I personally missed in other databases. If I want to compare something with a subquery, in other databases only a scalar can be compared with it, and for several columns I have to write a JOIN. In ClickHouse you can use a tuple:

SELECT *
  FROM cpu 
 WHERE (tags_id, created_at) IN 
    (SELECT tags_id, max(created_at)
        FROM cpu 
        GROUP BY tags_id)

The second way does the same thing but uses the aggregate function argMax:

SELECT 
    argMax(usage_user, created_at),
    argMax(usage_system, created_at),
...
 FROM cpu 
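With the elided parts filled in, such a query might look like the sketch below. The grouping by `tags_id` and the selected columns are assumptions made for illustration.

```sql
-- Last known values per host: argMax(x, t) returns x from the row with the largest t.
SELECT
    tags_id,
    max(created_at) AS last_seen,
    argMax(usage_user, created_at) AS last_usage_user,
    argMax(usage_system, created_at) AS last_usage_system
FROM cpu
GROUP BY tags_id
ORDER BY tags_id;
```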

ClickHouse has several dozen aggregate functions, and if you use combinators, then by the laws of combinatorics you get about a thousand of them. argMax is one of the functions that work with a maximum: the query returns the value of usage_user at which created_at reaches its maximum. The third way uses ASOF JOIN:

SELECT now() as created_at,
       cpu.*
  FROM (SELECT DISTINCT tags_id from cpu) base 
  ASOF LEFT JOIN cpu USING (tags_id, created_at)

ASOF JOIN "glues" together series with different timestamps. This is a unique database feature that is otherwise available only in kdb+. If there are two time series with different timestamps, ASOF JOIN lets you shift and join them in a single query. For each value in one time series, the nearest value in the other is found, and both are returned on the same row.
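A sketch of gluing two series, assuming the `cpu` table from earlier and a hypothetical `mem` table sampled on its own schedule. With the `USING` form, the last column listed is the one matched approximately: each `cpu` row is paired with the closest `mem` row whose `created_at` is not later than its own.

```sql
-- Hypothetical second series with its own timestamps.
CREATE TABLE mem (
    created_date Date DEFAULT today(),
    created_at DateTime,
    tags_id UInt32,
    used_bytes UInt64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at);

-- tags_id must match exactly; created_at is matched to the closest earlier sample.
SELECT
    tags_id,
    created_at,
    usage_user,
    used_bytes
FROM cpu
ASOF LEFT JOIN mem USING (tags_id, created_at)
ORDER BY tags_id, created_at;
```

With the LEFT variant, `cpu` rows that have no earlier `mem` sample are kept and `used_bytes` takes its default value.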

Analytic functions

In the SQL:2003 standard you can write this:

SELECT origin,
       timestamp,
       timestamp -LAG(timestamp, 1) OVER (PARTITION BY origin ORDER BY timestamp) AS duration,
       timestamp -MIN(timestamp) OVER (PARTITION BY origin ORDER BY timestamp) AS startseq_duration,
       ROW_NUMBER() OVER (PARTITION BY origin ORDER BY timestamp) AS sequence,
       COUNT() OVER (PARTITION BY origin ORDER BY timestamp) AS nb
  FROM mytable
ORDER BY origin, timestamp;

At the time of the talk you could not do this in ClickHouse: it did not support this part of the SQL:2003 standard and probably never would. Instead, in ClickHouse it is customary to write such queries with arrays and lambda expressions, and this is where the promised lambdas come in.

The array-based query is the equivalent of the analytic query in the SQL:2003 standard: it computes the difference between two timestamps, the duration, the ordinal number, everything we usually think of as analytic functions. In ClickHouse we compute them through arrays: first we collapse the data into an array, then we do whatever we want on the array, and then we expand it back. It is not very convenient and requires at least a minimal love of functional programming, but it is very flexible.
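A sketch of the array-based approach for the same `mytable`, written as an illustration rather than as the exact query from the talk. It differs from the window version in two details: the first row of each group gets a duration of 0 instead of NULL, and `nb` is the total number of rows in the group rather than a running count.

```sql
SELECT
    origin,
    timestamp,
    duration,
    timestamp - ts_min AS startseq_duration,
    sequence,
    ts_cnt AS nb
FROM
(
    SELECT
        origin,
        -- collapse each group into a time-ordered array
        arraySort(groupArray(timestamp)) AS ts_a,
        -- lambda: each timestamp minus the previous one (0 for the first)
        arrayMap((x, y) -> x - y, ts_a, arrayPushFront(arrayPopBack(ts_a), ts_a[1])) AS ts_diff,
        min(timestamp) AS ts_min,
        -- 1, 2, 3, ... plays the role of ROW_NUMBER()
        arrayEnumerate(ts_a) AS ts_row,
        count() AS ts_cnt
    FROM mytable
    GROUP BY origin
)
-- expand the arrays back into rows
ARRAY JOIN ts_a AS timestamp, ts_diff AS duration, ts_row AS sequence
ORDER BY origin, timestamp;
```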

Special functions

In addition, ClickHouse has many specialized functions. For example, how do you determine how many sessions are running concurrently? A typical monitoring task is to determine the peak load with a single query. ClickHouse has a special function for this purpose.
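One function that fits this description is the aggregate function `maxIntersections`; whether it is the one shown in the talk is an assumption. The sketch below uses a hypothetical `sessions_example` table with one row per session.

```sql
-- Hypothetical table: one row per session with its start and end time.
CREATE TABLE sessions_example (
    user_id UInt32,
    started_at DateTime,
    finished_at DateTime
) ENGINE = MergeTree()
ORDER BY started_at;

-- Largest number of sessions whose [start, end] intervals overlap at one moment.
SELECT
    maxIntersections(toUnixTimestamp(started_at), toUnixTimestamp(finished_at)) AS peak_concurrent_sessions
FROM sessions_example;
```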

In general, ClickHouse has special functions for many purposes:

  • runningDifference, runningAccumulate, neighbor;
  • sumMap(key, value);
  • timeSeriesGroupSum(uid, timestamp, value);
  • timeSeriesGroupRateSum(uid, timestamp, value);
  • skewPop, skewSamp, kurtPop, kurtSamp;
  • WITH FILL / WITH TIES;
  • simpleLinearRegression, stochasticLinearRegression.

This is not a complete list of functions; there are 500-600 in total. Hint: all functions in ClickHouse are listed in a system table (not all of them are documented, but all of them are interesting):

select * from system.functions order by name

Dinani Nyumba imasunga zambiri zokhudza iyokha, kuphatikizapo log tables, query_log, chipika chotsatira, chipika cha ntchito ndi midadada ya data (gawo_logi), chipika cha metrics, ndi chipika chadongosolo, chomwe nthawi zambiri chimalemba ku diski. Log metrics ndi nthawi-mndandanda в Dinani Nyumba Pamenepo Dinani Nyumba: Nawonso database yokha imatha kuchitapo kanthu nthawi-mndandanda database, motero "kudziwononga" palokha.


This is also a unique feature: since we handle time series well, why not store everything we need in ourselves? We do not need Prometheus; we keep everything in ourselves. Connect Grafana and we monitor ourselves. However, if ClickHouse goes down, we will not see why, so this is usually not done.

One large cluster or many small ClickHouses

Which is better: one large cluster or many small ClickHouses? The traditional approach to DWH is one large cluster in which a schema is allocated for each application. We went to the database administrator, asked for a schema, and were given one:

[Image: slide from the talk]

In ClickHouse you can do it differently. You can give every application its own ClickHouse:

[Image: slide from the talk]

We no longer need a big, monstrous DWH and uncooperative admins. We can give each application its own ClickHouse, and the developer can set it up himself, since ClickHouse is easy to install and does not require complex administration:

[Image: slide from the talk]

But if we have many ClickHouses and need to deploy them often, we will want to automate this. For that we can use, for example, Kubernetes and clickhouse-operator. In Kubernetes, ClickHouse can be deployed "with one click": I press a button, apply the manifest, and the database is ready. I can create a schema right away, start loading metrics into it, and in 5 minutes I have a Grafana dashboard ready. It is that simple!
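One way to picture "a ClickHouse per application" is a small generator that emits one manifest per application. The Python sketch below is an assumption-laden illustration: the `apiVersion`, `kind`, and `spec` field names follow the ClickHouseInstallation resource of clickhouse-operator as I recall it and should be checked against the operator's documentation, while the function name, application names, and naming scheme are hypothetical.

```python
import json

def clickhouse_manifest(app_name, shards=1, replicas=1):
    """Build a manifest dict for one application's own ClickHouse.

    Field names are assumed to match the operator's ClickHouseInstallation
    resource; verify them against the operator documentation before use.
    """
    return {
        "apiVersion": "clickhouse.altinity.com/v1",
        "kind": "ClickHouseInstallation",
        "metadata": {"name": f"{app_name}-clickhouse"},
        "spec": {
            "configuration": {
                "clusters": [
                    {
                        "name": app_name,
                        "layout": {
                            "shardsCount": shards,
                            "replicasCount": replicas,
                        },
                    }
                ]
            }
        },
    }

# Hypothetical applications, each getting a separate small installation.
manifests = [clickhouse_manifest(app) for app in ("billing", "metrics")]
print(json.dumps(manifests[0], indent=2))
```

The printed JSON could then be saved to a file and applied to a cluster where the operator is installed; the point of the sketch is only that each application's database becomes a few lines of declarative configuration.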

What is the bottom line?

So, ClickHouse is:

  • Fast. Everyone knows this.
  • Simple. This is a little controversial, but I believe it is hard in training and easy in battle. Once you understand how ClickHouse works, everything is very simple.
  • Universal. It suits a variety of scenarios: DWH, Time Series, Log Storage. But it is not an OLTP database, so do not try to do short inserts and reads there.
  • Interesting. Probably everyone who works with ClickHouse has lived through many interesting moments, in both the good and the bad sense. For example, a new release came out and everything stopped working. Or you struggled with a problem for two days, and after you asked in the Telegram chat it was solved in two minutes. Or, as at the conference during Lesha Milovidov's talk, an image from ClickHouse brought down the HighLoad++ broadcast. Things like this happen all the time; they make our lives harder, but they also make ClickHouse bright and interesting!

You can watch the presentation here.


The long-awaited meeting of developers of high-load systems at HighLoad++ will take place on November 9 and 10 in Skolkovo. At last it will be an offline conference (with all precautions in place), since the energy of HighLoad++ cannot be packed into an online format.

For the conference we find and show you cases at the peak of what technology can do: HighLoad++ was, is, and will be the only place where in two days you can learn how Facebook, Yandex, VKontakte, Google, and Amazon work.

We have held our conferences without interruption since 2007, so this year we meet for the 14th time. Over that time the conference has grown 10 times; last year the key industry event brought together 3339 participants and 165 speakers of talks and meetups, with 16 tracks running in parallel.
Last year there were 20 buses, 5280 liters of tea and coffee, 1650 liters of fruit drinks, and 10200 bottles of water, plus 2640 kilograms of food and thousands of plates and cups. By the way, with the money raised from recycled paper we planted oak seedlings :)

You can buy tickets here, get conference news here, and talk to us on all social networks: Telegram, Facebook, Vkontakte, and Twitter.

Source: www.habr.com
