Moving to ClickHouse: 3 Years Later

Three years ago, Viktor Tarnavsky and Alexey Milovidov from Yandex stood on the HighLoad++ stage and explained how good ClickHouse is and how it does not slow down. On the neighboring stage was Alexander Zaitsev, with a talk about moving to ClickHouse from another analytical DBMS and the conclusion that ClickHouse is, of course, good, but not very convenient. In 2016, when LifeStreet, the company where Alexander then worked, was moving a multi-petabyte analytical system to ClickHouse, it was a fascinating "yellow brick road" full of unknown dangers: back then ClickHouse looked like a minefield.

Three years later ClickHouse has become much better. In that time Alexander founded Altinity, which not only helps dozens of projects move to ClickHouse, but also improves the product itself together with colleagues from Yandex. ClickHouse is still not a carefree stroll, but it is no longer a minefield.

Alexander has worked with distributed systems since 2003, building large projects on MySQL, Oracle and Vertica. At HighLoad++ 2019 Alexander, one of the pioneers of ClickHouse adoption, explained what this DBMS looks like today. We will learn about the main features of ClickHouse: how it differs from other systems and in which cases it is most effective. Using examples, we will look at recent, project-tested practices for building systems on ClickHouse.


Retrospective: what happened 3 years ago

Three years ago we moved LifeStreet to ClickHouse from another analytical database, and the migration of the ad network's analytics looked like this:

  • June 2016. ClickHouse appeared in open source and our project started;
  • August. Proof of Concept: a large ad network, the infrastructure, and 200-300 terabytes of data;
  • October. First production data;
  • December. Full production load: 10-50 billion events per day;
  • June 2017. Successful migration of users to ClickHouse: 2.5 petabytes of data on a cluster of 60 servers.

During the migration we came to understand that ClickHouse is a good system that is pleasant to work with, but that it is an internal Yandex project. Hence the nuances: Yandex deals with its own internal customers first and only then with the community and the needs of external users, and at that time ClickHouse fell short of enterprise level in many functional areas. That is why we founded Altinity in March 2017: to make ClickHouse even faster and more convenient, not only for Yandex but for other users as well. And now we:

  • Train people and help build solutions on ClickHouse, so that customers do not run into trouble and the solution works in the end;
  • Provide 24/7 support for ClickHouse installations;
  • Develop our own ecosystem projects;
  • Actively commit to ClickHouse itself, responding to requests from users who want to see particular features.

And of course we help with moving to ClickHouse from MySQL, Vertica, Oracle, Greenplum, Redshift and other systems. We have taken part in a wide variety of migrations, and all of them were successful.


Why move to ClickHouse

It does not slow down! That is the main reason. ClickHouse is a very fast database for a wide range of scenarios. The slide here showed random quotes from people who have worked with ClickHouse for a long time.

Scalability. On some other databases you can get good performance on a single piece of hardware, but ClickHouse scales not only vertically but also horizontally, simply by adding servers. Not everything works as smoothly as we would like, but it works. You can grow the system as your business grows. It is important that the solution does not limit us today and that there is always room for growth.

Portability. There is no lock-in to any one thing. With Amazon Redshift, for example, it is hard to move anywhere else. ClickHouse can be installed on a laptop or a server, deployed to the cloud, or run in Kubernetes: there are no restrictions on the infrastructure. This is convenient for everyone, and it is a big advantage that many similar databases cannot boast.

Flexibility. ClickHouse is not tied to a single use case such as Yandex.Metrica; it keeps developing and is used in more and more projects and industries. It can be extended with new capabilities to solve new problems. For example, storing logs in a database is considered bad form, which is why Elasticsearch was invented. But thanks to its flexibility you can store logs in ClickHouse too, and this is often even better than Elasticsearch: ClickHouse needs 10 times less hardware for it.

Free Open Source. You do not have to pay for anything. There is no need to negotiate permission to install the system on your laptop or server. There are no hidden fees. At the same time, no other Open Source database technology can compete with ClickHouse on speed. MySQL, MariaDB, Greenplum: all of them are much slower.

Community, drive and fun. ClickHouse has an excellent community: meetups, chats, and Alexey Milovidov, who charges us all with his energy and optimism.

Moving to ClickHouse

To move to ClickHouse, whatever your reasons, you need only three things:

  • Understand the limitations of ClickHouse and what it is not suited for.
  • Use the strengths of the technology.
  • Experiment. Even if you understand how ClickHouse works, it is not always possible to predict when it will be faster, when slower, when better and when worse. So try it.

The problem of moving

There is only one "but": if you move to ClickHouse from something else, something usually goes wrong. We are used to certain practices and to things that work in our favorite database. For example, anyone who works with SQL databases considers the following set of features mandatory:

  • transactions;
  • constraints;
  • consistency;
  • indexes;
  • UPDATE/DELETE;
  • NULLs;
  • milliseconds;
  • automatic type casts;
  • multiple joins;
  • arbitrary partitions;
  • cluster management tools.

The set is mandatory, yet three years ago ClickHouse had none of these features! Now less than half of them remain unimplemented: transactions, constraints, consistency, milliseconds and type casts.

The main thing is that in ClickHouse some standard practices and approaches either do not work or work differently from what we are used to. Everything that appears in ClickHouse follows the "ClickHouse way", that is, the features differ from those of other databases. For example:

  • Indexes do not select rows; they skip data.
  • UPDATE/DELETE are not synchronous but asynchronous.
  • Multiple joins exist, but there is no query planner. How they are executed is generally not very clear to people from the database world.

ClickHouse scenarios

In 1960 the Hungarian-born American physicist and mathematician Eugene P. Wigner wrote the article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences", about the fact that the world around us is, for some reason, well described by mathematical laws. Mathematics is an abstract science, yet physical laws expressed in mathematical form are far from trivial, and Wigner stressed that this is very strange.

From my point of view, ClickHouse is the same kind of oddity. To paraphrase Wigner: the unreasonable effectiveness of ClickHouse in a wide variety of analytical applications is astonishing!

Take, for example, a Real-Time Data Warehouse into which data is loaded almost continuously. We want query results from it with a delay of about a second. Go ahead and use ClickHouse, because this is the scenario it was designed for. ClickHouse is used in exactly this way not only on the web, but also in marketing and financial analytics, AdTech, and fraud detection. A Real-Time Data Warehouse uses a complex structured schema such as a "star" or a "snowflake", with many tables and JOINs (sometimes multiple ones), and the data is usually stored and modified in some other systems.

Take another scenario, Time Series: monitoring of devices and networks, usage statistics, the Internet of Things. Here we deal with fairly simple events ordered in time. ClickHouse was not originally built for this, but it has proven itself well, which is why large companies use ClickHouse as a store for monitoring data. To check whether ClickHouse is suitable for time series, we ran a benchmark based on the approach and results of InfluxDB and TimescaleDB, which are specialized time-series databases. It turned out that ClickHouse, even without being optimized for such tasks, wins on foreign turf as well.

In time series a narrow table is normally used: a handful of small columns. Monitoring can produce a lot of data, millions of records per second, and it usually arrives in small inserts (real-time streaming). So a different insert scenario is needed, and the queries themselves have their own specifics.

Log Management. Collecting logs in a database is usually a bad idea, but with ClickHouse it can be done, with the caveats described above. Many companies use ClickHouse for exactly this purpose. In this case we use a flat wide table where we store the logs whole (for example, as JSON) or cut into pieces. The data is usually loaded in large batches (files), and we search by some field.

Specialized databases are normally used for each of these functions. ClickHouse alone can handle all of them, and so well that it outperforms them. Let's now take a closer look at the time-series scenario and at how to "cook" ClickHouse properly for it.

Time Series

At the moment this is the main scenario for which ClickHouse is considered a standard solution. A time series is a set of time-ordered events that represent changes in some process over time. For example, it can be a heart rate over the course of a day, or the number of processes in a system. Anything that produces timestamped ticks with some dimensions is a time series.

Most events of this kind come from monitoring. This can be monitoring not only of the web but also of real devices: cars, industrial systems, IoT, factories, or driverless taxis, in whose trunk Yandex already puts a ClickHouse server.

For example, there are companies that collect data from ships. Every few seconds the sensors on a container ship send hundreds of different measurements. Engineers study them, build models, and try to understand how efficiently the vessel is being used, because a container ship should not stand idle for even a second. Any downtime is lost money, so it is important to plan the route so that stops are minimal.

There is currently a boom in specialized databases for time series. The DB-Engines site ranks databases of all kinds, and you can view them by type.

The fastest-growing type is time-series databases. Graph databases are growing too, but time-series databases have grown faster over the last few years. Typical members of this family are InfluxDB, Prometheus, KDB, TimescaleDB (built on PostgreSQL), and solutions from Amazon. ClickHouse can be used here too, and it is. Let me give a few public examples.

One of the pioneers is CloudFlare (a CDN provider). They monitor their CDN through ClickHouse (DNS requests, HTTP requests) under an enormous load: 6 million events per second. Everything goes through Kafka into ClickHouse, which makes it possible to see dashboards of events in the system in real time.

Comcast is one of the telecommunications leaders in the USA: Internet, digital television, telephony. They created a similar CDN control system as part of the Open Source project Apache Traffic Control, to work with their huge volumes of data. ClickHouse is used as the backend for analytics.

Percona built ClickHouse into its PMM to store monitoring data from various MySQL instances.

Specific requirements

Time-series databases have their own specific requirements.

  • Fast inserts from many agents. We have to insert data from many streams very quickly. ClickHouse does this well because all of its inserts are non-blocking. Every insert is a new file on disk, and small inserts can be buffered in one way or another. In ClickHouse it is better to insert data in large batches rather than one row at a time.
  • Flexible schema. In time series we usually do not know the full structure of the data. You can build a monitoring system for a specific application, but then it is hard to reuse it for another application. That calls for a more flexible schema. ClickHouse allows this, even though it is a strongly typed database.
  • Efficient storage and "forgetting" of data. Time series usually involve huge volumes of data, so it must be stored as efficiently as possible. In InfluxDB, for example, good compression is the main feature. But besides storing data, you also need to be able to "forget" old data and do some kind of downsampling, that is, automatic calculation of aggregates.
  • Fast queries on aggregated data. Sometimes it is interesting to look at the last 5 minutes with millisecond precision, but for monthly data minute or second granularity may not be needed: general statistics are enough. Support for this is essential, otherwise a query over 3 months will take a very long time even in ClickHouse.
  • Queries like "last point, as of". These are typical time-series queries: look at the last measurement, or at the state of the system at a moment in time t. These are not very pleasant queries for a database, but you need to be able to run them too.
  • "Gluing" time series. A time series is a sequence in time. If there are two time series, they often need to be joined and correlated. Not every database makes this convenient, especially with unaligned time series: one has one set of timestamps, the other has different ones. You can take averages, but then there may suddenly be a gap anyway, so it is not obvious what to do.
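The remark above that small inserts "can be buffered in one way or another" could be realized, for instance, with ClickHouse's Buffer table engine, which collects rows in memory and flushes them to a target table in batches. What follows is only a sketch under assumptions: the table names `metrics_raw` and `metrics_buffer`, the columns and the flush thresholds are hypothetical and were not part of the talk.

```sql
-- Hypothetical target table for raw measurements.
CREATE TABLE metrics_raw
(
    created_at DateTime,
    tags_id UInt32,
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (tags_id, created_at);

-- Buffer(database, table, num_layers,
--        min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
-- Rows are flushed to metrics_raw when all "min" thresholds
-- or any "max" threshold is reached. The values here are illustrative.
CREATE TABLE metrics_buffer AS metrics_raw
ENGINE = Buffer(currentDatabase(), metrics_raw, 16,
                10, 100, 10000, 1000000, 10000000, 100000000);

-- Agents write small inserts to the buffer table...
INSERT INTO metrics_buffer VALUES ('2019-11-07 12:00:00', 1, 0.42);

-- ...and reading from it returns both buffered and already flushed rows.
SELECT count() FROM metrics_buffer;
```

Batching on the client side or in a queue such as Kafka is an equally valid choice; the Buffer engine is just one option that needs no extra components.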

Let's see how these requirements are met in ClickHouse.

Schema

In ClickHouse the schema for time series can be designed in different ways, depending on how regular the data is. You can build the system on regular data, when all the metrics are known in advance. CloudFlare, for example, did this for CDN monitoring, and it is a well-optimized system. You can build a more general system that monitors the whole infrastructure and a variety of services. In the case of irregular data we do not know in advance what we are monitoring, and this is probably the most common case.

Regular data. Columns. The schema is simple: columns with the required types:

CREATE TABLE cpu (
  created_date Date DEFAULT today(),  
  created_at DateTime DEFAULT now(),  
  time String,  
  tags_id UInt32,  /* join to dim_tag */
  usage_user Float64,  
  usage_system Float64,  
  usage_idle Float64,  
  usage_nice Float64,  
  usage_iowait Float64,  
  usage_irq Float64,  
  usage_softirq Float64,  
  usage_steal Float64,  
  usage_guest Float64,  
  usage_guest_nice Float64
) ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at);

This is an ordinary table that monitors some kind of system load activity (user, system, idle, nice). It is simple and convenient, but not flexible. If we want a more flexible schema, we can use arrays.
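With regular columns, a typical dashboard query stays plain SQL. As a rough illustration (the one-day window and the choice of aggregates are assumptions, not from the talk), hourly statistics per host over the `cpu` table might look like this:

```sql
-- Hourly average and peak user CPU per host (tags_id) for the last day.
SELECT
    tags_id,
    toStartOfHour(created_at) AS hour,
    avg(usage_user) AS avg_user,
    max(usage_user) AS max_user
FROM cpu
WHERE created_at >= now() - INTERVAL 1 DAY
GROUP BY tags_id, hour
ORDER BY tags_id, hour;
```

Only the `tags_id`, `created_at` and `usage_user` columns are read from disk, which is where the columnar layout pays off.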

Irregular data. Arrays:

CREATE TABLE cpu_alc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metrics Nested(
    name LowCardinality(String),  
    value Float64
  )
) ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at);


The Nested structure is two arrays: metrics.name and metrics.value. Here you can store arbitrary monitoring data as an array of names and an array of measurements for each event. For further optimization you can create several such structures instead of one. For example, one for float values and another for int values, because int values can be stored more efficiently.
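To make the two parallel arrays more concrete, here is a small sketch of writing one event into `cpu_alc` and unfolding it back into rows. The sample values are invented for illustration; the statement relies on the column order of the `cpu_alc` definition above.

```sql
-- One event carrying two metrics; the last two values fill
-- metrics.name and metrics.value, which must have the same length.
INSERT INTO cpu_alc VALUES
    ('2019-11-07', '2019-11-07 12:00:00', '', 1,
     ['usage_user', 'usage_system'], [12.5, 3.1]);

-- ARRAY JOIN turns each (name, value) pair into its own row.
SELECT
    tags_id,
    created_at,
    m.name AS metric_name,
    m.value AS metric_value
FROM cpu_alc
ARRAY JOIN metrics AS m;
```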

But such a structure is harder to access. You have to use a special construct, with special functions that first find the index of the metric and then pull its value out of the array:

SELECT max(metrics.value[indexOf(metrics.name,'usage_user')]) FROM ...

But it still works fast enough. Another way to store irregular data is by rows.

Irregular data. Rows. In this traditional approach, without arrays, names and values are stored together. If 5,000 metrics arrive from one device at once, 5,000 rows are generated in the database:

CREATE TABLE cpu_rlc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metric_name LowCardinality(String),  
  metric_value Float64
) ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
ORDER BY (metric_name, tags_id, created_at);


SELECT 
    maxIf(metric_value, metric_name = 'usage_user'),
    ... 
FROM cpu_rlc
WHERE metric_name IN ('usage_user', ...)

ClickHouse copes with this: it has special extensions in ClickHouse SQL. For example, maxIf is a special function that computes the maximum of a metric over the rows for which a condition holds. You can write several such expressions in one query and compute the values for several metrics at once.
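A fuller version of such a query, written against the `cpu_rlc` table, might look like the sketch below. The selection of metrics, the one-hour window and the grouping by `tags_id` are illustrative assumptions.

```sql
-- Several metrics pivoted into columns in a single pass over the rows.
SELECT
    tags_id,
    maxIf(metric_value, metric_name = 'usage_user')   AS max_user,
    maxIf(metric_value, metric_name = 'usage_system') AS max_system,
    avgIf(metric_value, metric_name = 'usage_idle')   AS avg_idle
FROM cpu_rlc
WHERE metric_name IN ('usage_user', 'usage_system', 'usage_idle')
  AND created_at >= now() - INTERVAL 1 HOUR
GROUP BY tags_id
ORDER BY tags_id;
```

The `-If` suffix is a combinator, so the same pattern works with other aggregate functions, as `avgIf` shows here.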

Let's compare the three approaches, including the size of the data on disk for a test dataset.

With columns we get the smallest data size: maximum compression and maximum query speed, but we pay for it by having to fix the whole schema up front.

With arrays everything is slightly worse. The data still compresses well and an irregular schema can be stored. But ClickHouse is a columnar database, and once we start storing everything in an array it effectively becomes row-oriented, so we pay for flexibility with efficiency. For any operation the whole array has to be read into memory and then searched for the required element, and as the array grows, speed degrades.

At one of the companies that uses this approach (for example, Uber), arrays are cut into chunks of 128 elements. Data for several thousand metrics, with a volume of 200 TB of data per day, is stored not in one array but in 10 or 30 arrays with special storage logic.

The simplest approach is rows. But the data compresses poorly, the table is large, and even when queries touch only a few metrics ClickHouse does not perform optimally.

Hybrid schema

Suppose we have chosen the array schema. If we know that most of our dashboards show only the user and system metrics, we can additionally materialize these metrics from the arrays into columns at the table level, like this:

CREATE TABLE cpu_alc (
  created_date Date,  
  created_at DateTime,  
  time String,  
  tags_id UInt32,  
  metrics Nested(
    name LowCardinality(String),  
    value Float64
  ),
  usage_user Float64 
             MATERIALIZED metrics.value[indexOf(metrics.name,'usage_user')],
  usage_system Float64 
             MATERIALIZED metrics.value[indexOf(metrics.name,'usage_system')]
) ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at);

On insert, ClickHouse computes them automatically. This way you can combine business with pleasure: the schema is flexible and general, but the most frequently used columns have been pulled out. Note that this did not require any change to the inserts or to the ETL, which keep inserting arrays into the table. We simply ran ALTER TABLE, added a couple of columns, and got a hybrid, faster schema that can be used right away.
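The ALTER TABLE step mentioned above could look roughly like this, assuming the starting point is the array-only `cpu_alc` table. `IF NOT EXISTS` is added here only so that the sketch is harmless if the columns are already present.

```sql
-- Pull two frequently used metrics out of the Nested arrays
-- into ordinary columns that are computed on insert.
ALTER TABLE cpu_alc
    ADD COLUMN IF NOT EXISTS usage_user Float64
        MATERIALIZED metrics.value[indexOf(metrics.name, 'usage_user')],
    ADD COLUMN IF NOT EXISTS usage_system Float64
        MATERIALIZED metrics.value[indexOf(metrics.name, 'usage_system')];

-- Dashboards can now read the plain columns instead of scanning arrays.
SELECT tags_id, max(usage_user) AS max_user, max(usage_system) AS max_system
FROM cpu_alc
GROUP BY tags_id;
```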

Codecs and compression

For time series it matters how well the data is packed, because the volume can be very large. ClickHouse has a set of tools for achieving compression ratios of 1:10, 1:20, and sometimes more. This means that 1 TB of uncompressed data takes 50-100 GB on disk. A smaller size is good: the data can be read and processed faster.

To achieve a high level of compression, ClickHouse supports a set of specialized codecs.

Example table:

CREATE TABLE benchmark.cpu_codecs_lz4 (
    created_date Date DEFAULT today(), 
    created_at DateTime DEFAULT now() Codec(DoubleDelta, LZ4), 
    tags_id UInt32, 
    usage_user Float64 Codec(Gorilla, LZ4), 
    usage_system Float64 Codec(Gorilla, LZ4), 
    usage_idle Float64 Codec(Gorilla, LZ4), 
    usage_nice Float64 Codec(Gorilla, LZ4), 
    usage_iowait Float64 Codec(Gorilla, LZ4), 
    usage_irq Float64 Codec(Gorilla, LZ4), 
    usage_softirq Float64 Codec(Gorilla, LZ4), 
    usage_steal Float64 Codec(Gorilla, LZ4), 
    usage_guest Float64 Codec(Gorilla, LZ4), 
    usage_guest_nice Float64 Codec(Gorilla, LZ4), 
    additional_tags String DEFAULT ''
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at);

Here we define the DoubleDelta codec in one case and Gorilla in the other, and we always add LZ4 compression on top. As a result, the size of the data on disk shrinks considerably.

The comparison covered how much space the same data takes with different codecs and compression:

  • in a GZIP file on disk;
  • in ClickHouse without codecs, but with ZSTD compression;
  • in ClickHouse with codecs and LZ4 and ZSTD compression.

Tables with codecs take up noticeably less space.
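To check the effect of codecs on your own data, one option is to compare compressed and uncompressed sizes per column in the `system.columns` table. This is a sketch that assumes the `benchmark.cpu_codecs_lz4` table above has already been filled with data.

```sql
-- Per-column storage footprint and compression ratio.
SELECT
    name,
    compression_codec,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = 'benchmark' AND table = 'cpu_codecs_lz4'
ORDER BY data_compressed_bytes DESC;
```

Running the same query against a copy of the table without codecs gives a direct before-and-after comparison.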

Size matters

Choosing the right data type is no less important.

In all the examples above I used Float64. But if we had chosen Float32, it would have been even better. The guys from Percona demonstrated this well in the article linked above. It is important to use the most compact type that fits the task, and even less for the sake of disk size than for query speed. ClickHouse is very sensitive to this.

If you can use int32 instead of int64, expect an almost twofold increase in performance. The data takes up less memory, and all the "arithmetic" runs much faster. Internally ClickHouse is a very strictly typed system, and it makes the most of everything that modern hardware offers.

Aggregation and Materialized Views

Aggregation and materialized views let you build aggregates for different cases.

For example, you may have non-aggregated source data, and you can attach various materialized views to it with automatic summation through a special engine, SummingMergeTree (SMT). SMT is a special aggregating data structure that computes aggregates automatically. Raw data is inserted into the database, it is aggregated automatically, and dashboards can be built on top of it right away.
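As a hedged sketch of this pattern, an hourly rollup over the `cpu` table from the schema section could be defined as follows. The view name `cpu_by_hour` and its columns are hypothetical.

```sql
-- Every insert into cpu is also aggregated into hourly sums.
CREATE MATERIALIZED VIEW cpu_by_hour
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (tags_id, hour)
AS SELECT
    tags_id,
    toStartOfHour(created_at) AS hour,
    sum(usage_user) AS usage_user_sum,
    count() AS samples
FROM cpu
GROUP BY tags_id, hour;

-- Rows with the same key are summed during background merges, which happen
-- at an unspecified time, so queries should still aggregate explicitly.
SELECT
    tags_id,
    hour,
    sum(usage_user_sum) / sum(samples) AS avg_usage_user
FROM cpu_by_hour
GROUP BY tags_id, hour
ORDER BY tags_id, hour;
```

Storing a sum and a count rather than an average keeps the rollup correct when partial rows are merged later.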

TTL: "forgetting" old data

How do you "forget" data that is no longer needed? ClickHouse knows how. When creating tables you can specify TTL expressions: for example, we keep minute data for one day, daily data for 30 days, and never touch weekly or monthly data:

CREATE TABLE aggr_by_minute
…
TTL time + interval 1 day

CREATE TABLE aggr_by_day
…
TTL time + interval 30 day

CREATE TABLE aggr_by_week
…
/* no TTL */
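The listings above leave out the column definitions. A complete version of the first table might look like the sketch below, where the columns and the partitioning are assumptions made only for illustration.

```sql
-- Minute-level aggregates that expire one day after their timestamp.
CREATE TABLE aggr_by_minute
(
    time DateTime,
    tags_id UInt32,
    usage_user_avg Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(time)
ORDER BY (tags_id, time)
TTL time + INTERVAL 1 DAY;

-- The retention period can be changed later without recreating the table.
ALTER TABLE aggr_by_minute MODIFY TTL time + INTERVAL 2 DAY;
```

Expired rows are removed during background merges, so they may remain visible for some time after the TTL has passed.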

Multi-tier: splitting data across disks

Taking this idea further, data in ClickHouse can be stored in different places. Suppose we want to keep hot data for the last week on a very fast local SSD and put older, historical data somewhere else. This is now possible in ClickHouse.

You can configure a storage policy so that ClickHouse automatically moves data to another storage when certain conditions are met.

But that is not all. At the level of a specific table you can define rules for exactly when data moves to cold storage. For example, data stays on a very fast disk for 7 days, and everything older is moved to a slower one. This is good because it keeps the system at maximum performance while controlling costs and not spending money on cold data:

CREATE TABLE 
... 
TTL date + INTERVAL 7 DAY TO VOLUME 'cold_volume', 
    date + INTERVAL 180 DAY DELETE
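Put together, a table with both rules could be declared as in the sketch below. It assumes that the server configuration already defines a storage policy, here called `hot_cold`, which contains a volume named `cold_volume`; the policy name, table name and columns are hypothetical.

```sql
CREATE TABLE cpu_tiered
(
    date Date,
    created_at DateTime,
    tags_id UInt32,
    usage_user Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(date)
ORDER BY (tags_id, created_at)
TTL date + INTERVAL 7 DAY TO VOLUME 'cold_volume',
    date + INTERVAL 180 DAY DELETE
SETTINGS storage_policy = 'hot_cold';

-- Check which disk each active data part currently lives on.
SELECT partition, name, disk_name, formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE table = 'cpu_tiered' AND active
ORDER BY partition, name;
```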

Unique capabilities of ClickHouse

Almost everything in ClickHouse has quirks like these, but they are offset by exclusive features, things that no other database has. For example, here are some of ClickHouse's unique capabilities:

  • Arrays. ClickHouse has very good support for arrays, as well as the ability to run complex calculations on them.
  • Aggregating data structures. This is one of ClickHouse's "killer features". Even though the guys from Yandex say that we do not want to aggregate data, everything gets aggregated in ClickHouse, because it is fast and convenient.
  • Materialized views. Together with aggregating data structures, materialized views make convenient real-time aggregation possible.
  • ClickHouse SQL. This is an extension of SQL with additional and exclusive features that are available only in ClickHouse. It used to be an extension on the one hand and a shortcoming on the other. Now almost all the shortcomings relative to SQL 92 have been removed, so it is purely an extension.
  • Lambda expressions. Does any other database have them yet?
  • ML support. Various databases have this, some better, some worse.
  • Open source. We can extend ClickHouse together. ClickHouse now has about 500 contributors, and the number keeps growing.

Tricky queries

In ClickHouse there are many different ways to do the same thing. For example, there are three different ways to return the last value from the cpu table (there is also a fourth, but it is even more exotic).

The first shows how convenient it is in ClickHouse to write queries that check whether a tuple is contained in a subquery. This is something I personally missed in other databases. If I want to compare something against a subquery, other databases only allow comparing a scalar, and for several columns I have to write a JOIN. In ClickHouse you can use a tuple:

SELECT *
  FROM cpu 
 WHERE (tags_id, created_at) IN 
    (SELECT tags_id, max(created_at)
        FROM cpu 
        GROUP BY tags_id)

The second way does the same thing but uses the aggregate function argMax:

SELECT 
    argMax(usage_user, created_at),
    argMax(usage_system, created_at),
...
 FROM cpu 

ClickHouse has several dozen aggregate functions, and if you add combinators then, by the laws of combinatorics, you get about a thousand of them. argMax is one of the functions that returns a value at a maximum: the query returns the usage_user value at which created_at is at its maximum. The third way uses ASOF JOIN:

SELECT now() as created_at,
       cpu.*
  FROM (SELECT DISTINCT tags_id from cpu) base 
  ASOF LEFT JOIN cpu USING (tags_id, created_at)

ASOF JOIN "glues" together rows with different timestamps. For databases this is a unique feature that is otherwise available only in kdb+. If there are two time series with different timestamps, ASOF JOIN lets you shift and join them in a single query. For each value in one time series, the nearest value in the other is found, and both are returned on the same row.
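A small self-contained sketch of gluing two unaligned series follows. The tables `temperature_readings` and `pressure_readings` and their sample rows are invented for illustration. With the `>=` condition used here, each temperature row is matched with the latest pressure row at or before its timestamp, which is one specific notion of "nearest".

```sql
CREATE TABLE temperature_readings (tags_id UInt32, ts DateTime, temperature Float64)
ENGINE = MergeTree ORDER BY (tags_id, ts);

CREATE TABLE pressure_readings (tags_id UInt32, ts DateTime, pressure Float64)
ENGINE = MergeTree ORDER BY (tags_id, ts);

-- The two sensors report at different moments.
INSERT INTO temperature_readings VALUES
    (1, '2019-11-07 12:00:05', 21.5), (1, '2019-11-07 12:00:15', 21.7);
INSERT INTO pressure_readings VALUES
    (1, '2019-11-07 12:00:00', 1012.0), (1, '2019-11-07 12:00:10', 1013.0);

-- Equality on tags_id plus one inequality on the timestamp.
SELECT
    t.tags_id,
    t.ts,
    t.temperature,
    p.ts AS pressure_ts,
    p.pressure
FROM temperature_readings AS t
ASOF LEFT JOIN pressure_readings AS p
    ON t.tags_id = p.tags_id AND t.ts >= p.ts
ORDER BY t.tags_id, t.ts;
```

With this data the 12:00:05 temperature should be paired with the 12:00:00 pressure, and the 12:00:15 temperature with the 12:00:10 pressure.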

Analytic functions

In the SQL-2003 standard you can write this:

SELECT origin,
       timestamp,
       timestamp -LAG(timestamp, 1) OVER (PARTITION BY origin ORDER BY timestamp) AS duration,
       timestamp -MIN(timestamp) OVER (PARTITION BY origin ORDER BY timestamp) AS startseq_duration,
       ROW_NUMBER() OVER (PARTITION BY origin ORDER BY timestamp) AS sequence,
       COUNT() OVER (PARTITION BY origin ORDER BY timestamp) AS nb
  FROM mytable
ORDER BY origin, timestamp;

In ClickHouse you cannot write it this way: at the time of this talk it did not support this part of the SQL-2003 standard, and the expectation was that it probably never would. Instead, in ClickHouse it is customary to write such queries with arrays and lambda expressions. I promised lambdas, and this is where they come in!

The array-based query is an analogue of the analytic query in the SQL-2003 standard: it computes the difference between two timestamps, the duration, the ordinal number, everything we normally think of as analytic functions. In ClickHouse we compute them through arrays: first we collapse the data into an array, then we do whatever we want with the array, and then we expand it back. It is not very convenient and requires at least a fondness for functional programming, but it is very flexible.
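The slide with the original ClickHouse query is not reproduced here, so the following is a reconstruction of the general technique rather than the author's exact query. It assumes `mytable` has an `origin` column and a `timestamp` column of type DateTime, and it covers only the duration, startseq_duration and sequence columns of the SQL-2003 example.

```sql
SELECT
    origin,
    toDateTime(ts) AS timestamp,
    duration,
    startseq_duration,
    sequence
FROM
(
    -- Step 1: collapse each origin into sorted arrays and compute on them.
    SELECT
        origin,
        arraySort(groupArray(toUnixTimestamp(timestamp))) AS ts_arr,
        arrayDifference(ts_arr)              AS duration_arr,   -- like LAG
        arrayMap(x -> x - ts_arr[1], ts_arr) AS startseq_arr,   -- like MIN() OVER
        arrayEnumerate(ts_arr)               AS sequence_arr    -- like ROW_NUMBER
    FROM mytable
    GROUP BY origin
)
-- Step 2: expand the arrays back into one row per event.
ARRAY JOIN
    ts_arr       AS ts,
    duration_arr AS duration,
    startseq_arr AS startseq_duration,
    sequence_arr AS sequence
ORDER BY origin, timestamp;
```

One difference from the window-function version: `arrayDifference` yields 0 for the first element of each group, where `LAG` would yield NULL.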

Special functions

Besides that, ClickHouse has many specialized functions. For example, how do you determine how many sessions are running at the same time? A typical monitoring task is to find the maximum load with a single query. ClickHouse has a special function for this purpose.

In general, ClickHouse has special functions for many purposes:

  • runningDifference, runningAccumulate, neighbor;
  • sumMap(key, value);
  • timeSeriesGroupSum(uid, timestamp, value);
  • timeSeriesGroupRateSum(uid, timestamp, value);
  • skewPop, skewSamp, kurtPop, kurtSamp;
  • WITH FILL / WITH TIES;
  • simpleLinearRegression, stochasticLinearRegression.

This is not a complete list of functions; there are 500-600 in total. Hint: all functions in ClickHouse are listed in a system table (not all of them are documented, but all of them are interesting):

select * from system.functions order by name
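The same table can be filtered to explore a particular family of functions. The search pattern below is just an example.

```sql
-- Find functions whose name mentions "regression".
SELECT name, is_aggregate
FROM system.functions
WHERE lower(name) LIKE '%regression%'
ORDER BY name;

-- Count regular versus aggregate functions on this server.
SELECT is_aggregate, count() AS functions
FROM system.functions
GROUP BY is_aggregate;
```

The counts depend on the server version, so they will not necessarily match the 500-600 figure quoted in the talk.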

ClickHouse stores a lot of information about itself inside itself, including log tables: query_log, the trace log, the log of operations on data parts (part_log), the metric log, and the system log that it normally writes to disk. The metric log is a time series in ClickHouse about ClickHouse itself: the database can act as its own time-series database, thus "devouring" itself.

This is also a unique thing: since we do a good job with time series, why not store everything we need inside ourselves? We do not need Prometheus; we keep everything in-house. We hooked up Grafana and monitor ourselves. However, if ClickHouse goes down we will not see why, so people usually do not do this.
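As an illustration of this self-monitoring, a Grafana panel could be fed by a query like the one below. It assumes that `metric_log` is enabled in the server configuration, and the two metric columns are examples whose availability may depend on the server version.

```sql
-- Queries started and peak tracked memory, per minute, over the last hour.
SELECT
    toStartOfMinute(event_time) AS minute,
    sum(ProfileEvent_Query) AS queries_started,
    max(CurrentMetric_MemoryTracking) AS peak_tracked_memory
FROM system.metric_log
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;
```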

One big cluster or many small ClickHouses

Which is better: one big cluster or many small ClickHouses? The traditional DWH approach is a large cluster in which schemas are allocated for each application. We go to the database administrator and say "give us a schema", and we are given one.

In ClickHouse you can do it differently. Each application can get its own ClickHouse.

We no longer need the big scary DWH and its intractable admins. We can give each application its own ClickHouse, and the developer can set it up alone, since ClickHouse is very easy to install and does not require complex administration.

But if we have many ClickHouses and need to install them often, we will want to automate the process. For that we can use, for example, Kubernetes and clickhouse-operator. In Kubernetes, ClickHouse can be made "click-to-run": I click a button, apply a manifest, and the database is ready. I can create a schema straight away, start loading metrics into it, and 5 minutes later I have a dashboard ready in Grafana. It is that simple!

What is the bottom line?

So, ClickHouse is:

  • Fast. Everyone knows this.
  • Simple. This is somewhat debatable, but I believe it is a case of hard in training, easy in battle. Once you understand how ClickHouse works, everything is very simple.
  • Universal. It suits different scenarios: DWH, Time Series, Log Storage. But it is not an OLTP database, so do not try to do short inserts and reads with it.
  • Interesting. Probably everyone who works with ClickHouse has been through many interesting moments, in both the good and the bad sense. For example, a new release comes out and everything stops working. Or you struggle with a task for two days, and after you ask a question in the Telegram chat it is solved in two minutes. Or, as happened at the conference during Lyosha Milovidov's talk, a screenshot from ClickHouse broke the HighLoad++ broadcast. Things like this happen all the time and make our life with ClickHouse bright and interesting!

You can watch the presentation here.


source: www.habr.com
