Using ClickHouse as a Replacement for ELK, Big Query and TimescaleDB

ClickHouse is an open-source columnar database management system for online analytical processing (OLAP), developed by Yandex. It is used by Yandex, Cloudflare, VK.com, Badoo and other services around the world to store very large amounts of data (inserting thousands of rows per second, or petabytes of data stored on disk).

In ordinary "row-oriented" DBMSs, such as MySQL, Postgres and MS SQL Server, data is stored in the following order:

[Figure: row-oriented layout, with the values of each row stored together]

In this case, the values belonging to one row are stored physically next to each other. In columnar DBMSs, values from different columns are stored separately, and the data of a single column is stored together:

[Figure: column-oriented layout, with the values of each column stored together]
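To make the two layouts concrete, here is a minimal Python sketch (an illustration only, not how ClickHouse is implemented internally) that stores the same small table row by row and column by column. The table and its column names are hypothetical. Summing one column in the columnar layout touches only that column's list, which is why analytical queries benefit.

```python
# A hypothetical three-row table: (id, country, clicks).
ROWS = [(1, "RU", 10), (2, "US", 7), (3, "RU", 3)]
NAMES = ["id", "country", "clicks"]


def to_row_layout(rows):
    """Row-oriented: every value of row 1, then every value of row 2, and so on."""
    return [value for row in rows for value in row]


def to_column_layout(rows, names):
    """Column-oriented: one contiguous list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def sum_column(columns, name):
    """An aggregate over one column reads only that column's values."""
    return sum(columns[name])


columns = to_column_layout(ROWS, NAMES)
print(to_row_layout(ROWS))            # [1, 'RU', 10, 2, 'US', 7, 3, 'RU', 3]
print(columns["clicks"])              # [10, 7, 3]
print(sum_column(columns, "clicks"))  # 20
```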

Examples of columnar DBMSs are Vertica, Paraccel (Actian Matrix, Amazon Redshift), Sybase IQ, Exasol, Infobright, InfiniDB, MonetDB (VectorWise, Actian Vector), LucidDB, SAP HANA, Google Dremel, Google PowerDrill, Druid and kdb+.

The mail forwarding company Qwintry started using ClickHouse for reporting in 2018 and was very impressed by its simplicity, scalability, SQL support and speed. The speed of this DBMS bordered on magic.

Simplicity

ClickHouse installs on Ubuntu with a single command. If you know SQL, you can immediately start using ClickHouse for your needs. However, this does not mean you can run "SHOW CREATE TABLE" in MySQL and copy-paste the resulting SQL into ClickHouse.

Compared to MySQL, there are important data type differences in table schema definitions, so you will still need some time to rewrite the schema definitions and to learn the table engines before you feel comfortable.
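As a rough illustration of those differences, the sketch below translates a few MySQL column types into ClickHouse types. The mapping table, the helper names and the `events` table are hypothetical and deliberately incomplete (DECIMAL, ENUM, JSON and others are left out). Two real differences it does reflect: ClickHouse columns cannot hold NULL unless wrapped in `Nullable(...)`, and a MergeTree table needs an `ENGINE` clause with a sort key (`ORDER BY`).

```python
import re

# Hypothetical, intentionally partial MySQL -> ClickHouse type mapping.
MYSQL_TO_CLICKHOUSE = {
    "tinyint": "Int8",
    "smallint": "Int16",
    "int": "Int32",
    "bigint": "Int64",
    "float": "Float32",
    "double": "Float64",
    "char": "String",
    "varchar": "String",
    "text": "String",
    "date": "Date",
    "datetime": "DateTime",
}


def convert_type(mysql_type: str, nullable: bool = False) -> str:
    """Map e.g. 'int(11) unsigned' to 'UInt32'. Unknown types raise KeyError."""
    lowered = mysql_type.lower()
    base = re.match(r"\s*([a-z]+)", lowered).group(1)
    ch_type = MYSQL_TO_CLICKHOUSE[base]
    if "unsigned" in lowered and ch_type.startswith("Int"):
        ch_type = "U" + ch_type  # Int32 -> UInt32, and so on
    # ClickHouse columns cannot hold NULL unless declared Nullable(...).
    return f"Nullable({ch_type})" if nullable else ch_type


def create_table(name, columns, order_by):
    """Build a MergeTree CREATE TABLE statement from (column, ClickHouse type) pairs."""
    body = ",\n    ".join(f"{col} {typ}" for col, typ in columns)
    return f"CREATE TABLE {name} (\n    {body}\n) ENGINE = MergeTree() ORDER BY ({order_by})"


print(create_table("events", [
    ("id", convert_type("bigint unsigned")),
    ("created_at", convert_type("DATETIME")),
    ("email", convert_type("varchar(255)", nullable=True)),
], order_by="created_at, id"))
```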

ClickHouse works well without any additional software, but if you want to use replication you will need to install ZooKeeper. Query performance analysis gives excellent results: the system tables contain all the information, and all the data can be retrieved with plain old, boring SQL.
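For example, here is a sketch of pulling the slowest of today's queries out of the `system.query_log` table over the HTTP interface. It assumes query logging is enabled (the `log_queries` setting) and that the server listens on the default HTTP port 8123 on localhost; the helper name is hypothetical. The code only builds the request URL, so it runs without a server.

```python
import urllib.parse

SLOW_QUERIES_SQL = """
SELECT query_duration_ms, read_rows, memory_usage, query
FROM system.query_log
WHERE type = 'QueryFinish' AND event_date = today()
ORDER BY query_duration_ms DESC
LIMIT 10
FORMAT TabSeparatedWithNames
"""


def query_url(sql: str, base_url: str = "http://localhost:8123/") -> str:
    """Read-only queries can be sent as a GET request with the SQL in ?query=."""
    return base_url + "?" + urllib.parse.urlencode({"query": sql.strip()})


url = query_url(SLOW_QUERIES_SQL)
# Against a live server:
# import urllib.request; print(urllib.request.urlopen(url).read().decode())
```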

Performance

  • Benchmark comparison of ClickHouse with Vertica and MySQL on a server configuration of two Intel® Xeon® CPU E5-2650 v2 @ 2.60GHz sockets, 128 GiB RAM, md RAID-5 on 8 × 6 TB SATA HDDs, ext4.
  • Benchmark comparison of ClickHouse with Amazon RedShift cloud storage.
  • Cloudflare blog posts on ClickHouse performance.


The ClickHouse database has a very simple design: all nodes in a cluster have the same functionality and use only ZooKeeper for coordination. We built a small cluster of several nodes and ran tests, during which we found that the system has quite impressive performance, consistent with the advantages claimed in analytical DBMS benchmarks. We decided to take a closer look at the concepts behind ClickHouse. The first obstacle was the lack of tooling and the small size of the ClickHouse community, so we dug into the design of this DBMS to understand how it works.

ClickHouse does not support ingesting data directly from Kafka, since it is just a database, so we wrote our own adapter service in Go. It read Cap'n Proto-encoded messages from Kafka, converted them to TSV and inserted them into ClickHouse in batches via the HTTP interface. Later we rewrote this service to use a Go library together with ClickHouse's native interface to improve performance. While evaluating ingestion performance, we discovered something important: for ClickHouse, this performance depends heavily on the batch size, that is, on the number of rows inserted at once. To understand why, we looked at how ClickHouse stores data.
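The batching step described above (convert each message to a TSV line, then send many lines in one HTTP INSERT) can be sketched in Python as follows. This is not the Go service itself: Kafka consumption and Cap'n Proto decoding are left out, rows are assumed to be plain tuples without NULLs, and the `logs` table name and localhost URL are placeholders. The request is built but not sent.

```python
import urllib.parse
import urllib.request


def tsv_escape(value) -> str:
    """Escape one value for ClickHouse's TabSeparated input format."""
    text = str(value)
    # Escape the backslash first so the escapes added below are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_tsv(rows) -> bytes:
    """One line per row, tab-separated columns; assumes no NULL values."""
    lines = ("\t".join(tsv_escape(v) for v in row) + "\n" for row in rows)
    return "".join(lines).encode("utf-8")


def build_insert_request(rows, table="logs", base_url="http://localhost:8123/"):
    """Build (but do not send) one HTTP POST that inserts the whole batch."""
    query = f"INSERT INTO {table} FORMAT TabSeparated"
    url = base_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, data=rows_to_tsv(rows), method="POST")


batch = [("2018-11-08 10:00:00", "GET /", 200),
         ("2018-11-08 10:00:01", "GET /a\tb", 404)]
request = build_insert_request(batch)
# urllib.request.urlopen(request) would send it to a running ClickHouse server.
```

Sending one request per batch rather than per message is the point: the per-request cost on the ClickHouse side is paid once for all rows in the body.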

The main engine, or rather family of table engines, that ClickHouse uses for storing data is MergeTree. This engine is conceptually similar to the LSM algorithm used in Google BigTable or Apache Cassandra, but it avoids building an intermediate in-memory table and writes data directly to disk. This gives it excellent write throughput, since each inserted batch is only sorted by the primary key, compressed and written to disk to form a segment.
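A toy model of that write path, assuming each batch arrives as a list of dicts: the batch is sorted by the primary key, split into columns, and each column is compressed separately into a part that is written once and never modified afterwards. `zlib` stands in for ClickHouse's real codecs (LZ4 by default), and the function names are hypothetical.

```python
import json
import zlib


def write_part(batch, columns, key):
    """Toy version of one INSERT batch turning into a MergeTree-style part."""
    # 1. Sort only this batch by the primary key (no in-memory table across inserts).
    rows = sorted(batch, key=lambda row: row[key])
    # 2. Store each column separately, compressed (zlib stands in for LZ4/ZSTD).
    return {
        col: zlib.compress(json.dumps([row[col] for row in rows]).encode("utf-8"))
        for col in columns
    }


def read_column(part, col):
    """Decompress a single column; other columns are never touched."""
    return json.loads(zlib.decompress(part[col]).decode("utf-8"))


batch = [{"ts": 3, "url": "/c"}, {"ts": 1, "url": "/a"}, {"ts": 2, "url": "/b"}]
part = write_part(batch, columns=["ts", "url"], key="ts")
print(read_column(part, "url"))  # ['/a', '/b', '/c']
```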

The absence of an in-memory table or any notion of data "freshness" also means that data can only be appended; the system does not support modifying or deleting it. Currently, the only way to delete data is to drop it by calendar month, since segments never cross a month boundary. The ClickHouse team is actively working on making this configurable. On the other hand, writing and merging segments is contention-free, so ingestion throughput scales linearly with the number of parallel inserts until I/O or CPU cores are saturated.
However, this also means that the system is not suited to small batches, which is why Kafka and the inserter services are used for buffering. Furthermore, ClickHouse keeps merging segments in the background, so many small pieces of data get merged and rewritten several times, which increases write amplification. Too many unmerged parts also trigger aggressive throttling of inserts while merging is in progress. We found that the best compromise between real-time ingestion and ingestion performance is to accept only a limited number of inserts per second into a table.
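The trade-off between small inserts and background merging can be illustrated with a deliberately simplified model; it does not reproduce ClickHouse's real merge selection, which merges chosen subsets of parts rather than all of them. Every batch becomes a sorted part; when more than `max_parts` parts exist, all of them are combined with a k-way merge, and every merged row is written again. Counting written rows shows the write amplification: in the usage below, 10 single-row inserts cause 24 row writes, while one 10-row batch causes 10.

```python
import heapq


def merge_parts(parts, key=lambda row: row[0]):
    """K-way merge of parts that are each already sorted by the primary key."""
    return list(heapq.merge(*parts, key=key))


def ingest(batches, max_parts=4):
    """Toy model: returns the remaining parts and the total number of row writes."""
    parts, rows_written = [], 0
    for batch in batches:
        part = sorted(batch)             # each insert is sorted and written once
        parts.append(part)
        rows_written += len(part)
        if len(parts) > max_parts:       # too many parts: merge them all
            merged = merge_parts(parts)
            rows_written += len(merged)  # merged rows are written again
            parts = [merged]
    return parts, rows_written


single_row_batches = [[(k,)] for k in [5, 3, 9, 1, 7, 2, 8, 4, 6, 0]]
parts, written = ingest(single_row_batches)
print([len(p) for p in parts], written)  # [9, 1] 24
```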

The key to table read performance is indexing and the layout of data on disk. No matter how fast the processing is, if the engine has to scan terabytes of data from disk and then uses only a fraction of it, that takes time. ClickHouse is a columnar store, so each segment contains a file for each column, with the values sorted by row. This way, entire columns that a query does not use can be skipped, and then many cells can be processed in parallel with vectorized execution. To avoid full scans, each segment also has a small index file.

Given that all columns are sorted by the "primary key", the index file contains only the marks (captured rows) of every Nth row, so that it can be kept in memory even for very large tables. For example, with the default setting of "mark every 8192nd row", the sparse index of a table with 1 trillion rows needs only about 122 million marks (10^12 / 8192 ≈ 1.22 × 10^8), which still fits in memory.
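A minimal sketch of such a sparse index, assuming a single sorted integer key and ClickHouse's default granularity of 8192 rows; the helper names are hypothetical. Only the first key of each granule is kept, and a range query uses binary search over those marks to decide which granules to read. The last helper computes how many marks a table of 10^12 rows would need: ⌈10^12 / 8192⌉ = 122,070,313.

```python
import bisect

GRANULARITY = 8192  # ClickHouse's default index_granularity


def build_sparse_index(sorted_keys, granularity=GRANULARITY):
    """Keep one mark per granule: the key of every `granularity`-th row."""
    return sorted_keys[::granularity]


def granules_for_range(marks, lo, hi):
    """Return (first, last) granule numbers that may contain keys in [lo, hi]."""
    # Last mark strictly below lo: that granule may still contain lo
    # (including duplicates of lo that spill over a granule boundary).
    first = max(bisect.bisect_left(marks, lo) - 1, 0)
    # Last granule whose first key is <= hi.
    last = max(bisect.bisect_right(marks, hi) - 1, 0)
    return first, last


def marks_needed(rows, granularity=GRANULARITY):
    return -(-rows // granularity)  # ceiling division


keys = list(range(0, 100_000, 2))  # 50,000 sorted even keys
marks = build_sparse_index(keys)
print(len(marks))                                  # 7
print(granules_for_range(marks, 20_000, 40_000))   # (1, 2)
print(marks_needed(10**12))                        # 122070313
```

Only granules 1 and 2 (16,384 rows) are read for that range instead of all 50,000 rows.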

Development

The development and improvement of ClickHouse can be followed in its GitHub repo, where you can see for yourself that it is "growing up" at an impressive pace.


Popularity

ClickHouse's popularity seems to be growing rapidly, especially in the Russian-speaking community. Last year's HighLoad 2018 conference (Moscow, November 8-9, 2018) showed that giants such as vk.com and Badoo use ClickHouse, inserting data (for example, logs) into it from tens of thousands of servers simultaneously. In a 40-minute video, Yuri Nasretdinov from the VKontakte team explains how this is done. We will soon publish a transcript on Habr to make the material easier to work with.

Use cases

After spending some time researching, I think there are areas where ClickHouse can be useful or can even completely replace other, more traditional and popular solutions such as MySQL, PostgreSQL, ELK, Google Big Query, Amazon RedShift, TimescaleDB, Hadoop, MapReduce, Pinot and Druid. The following sections describe our experience of using ClickHouse to upgrade or completely replace the systems listed above.

Extending MySQL and PostgreSQL

Recently we partially replaced MySQL with ClickHouse for our Mautic newsletter platform. The problem was that, due to a poorly thought-out design, every email sent and every link in it was logged in MySQL with a base64 hash, which produced a huge table (email_stats). After sending only 10 million emails to the service's subscribers, this table took up 150 GB of disk space, and MySQL started to get "dumb" on simple queries. To fix the disk space problem, we successfully used InnoDB table compression, which reduced it by a factor of 4. However, it still makes no sense to keep more than 20-30 million emails in MySQL just for the sake of reading history, since any simple query that for some reason needs a full scan results in swapping and heavy I/O load, which Zabbix regularly warned us about.


ClickHouse uses two compression algorithms, which reduce data volume by roughly 3-4 times, but in this particular case the data was especially "compressible".
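How much a column shrinks depends on how repetitive it is. The sketch below uses Python's `zlib` as a stand-in for ClickHouse's LZ4/ZSTD codecs (so the exact ratios will differ) and compares a highly repetitive, hypothetical status-code column with pseudo-random bytes of the same length.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


repetitive = b"200\n" * 10_000                 # a very "compressible" column
rng = random.Random(0)                         # seeded for reproducibility
noise = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

print(compression_ratio(repetitive))  # a very high ratio
print(compression_ratio(noise))       # close to 1.0: nothing to gain
```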


Replacing ELK

In my experience, the ELK stack (ElasticSearch, Logstash and Kibana, in this particular case ElasticSearch) needs far more resources to run than storing logs actually requires. ElasticSearch is a great engine if you need good full-text log search (which I don't think you really need), but I wonder why it has become the de facto standard logging engine. Its ingestion performance combined with Logstash gave us trouble even under fairly light loads and required us to keep adding RAM and disk space. As a database, ClickHouse is better than ElasticSearch for the following reasons:

  • SQL dialect support;
  • Better compression of stored data;
  • Support for regular expression (regex) search instead of full-text search;
  • Improved query scheduling and higher overall performance.
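As an illustration of the regex point, here is a hypothetical helper that builds a ClickHouse query using the `match(haystack, pattern)` function (RE2 syntax) over an assumed `logs` table with `ts` and `message` columns, plus an equivalent local filter with Python's `re` for quick testing. Backslashes and single quotes must be escaped inside a ClickHouse string literal, which is why `\d` becomes `\\d` in the generated SQL.

```python
import re


def sql_string_literal(s: str) -> str:
    """Quote a Python string as a ClickHouse string literal."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def regex_log_query(pattern: str, table: str = "logs", limit: int = 100) -> str:
    return (f"SELECT ts, message FROM {table} "
            f"WHERE match(message, {sql_string_literal(pattern)}) "
            f"ORDER BY ts DESC LIMIT {limit}")


def local_regex_filter(lines, pattern):
    """Same idea with Python's re module (compatible with RE2 for simple patterns)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


print(regex_log_query(r"upstream timed out after \d+ms"))
print(local_regex_filter(["upstream timed out after 30ms", "GET / 200"],
                         r"upstream timed out after \d+ms"))
```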

At present, the main problem when comparing ClickHouse with ELK is the lack of log shipping solutions, as well as the lack of documentation and tutorials on the subject. Moreover, any user can set up ELK by following the Digital Ocean manual, which matters a lot for the rapid adoption of such technologies. The database engine is there, but there is no Filebeat for ClickHouse yet. Yes, there is fluentd and the loghouse log management system, and there is the clicktail tool for inserting log file data into ClickHouse, but all of this takes more time. Still, ClickHouse remains the leader thanks to its simplicity: even beginners can easily install it and start using it fully in just 10 minutes.

Preferring minimalist solutions, I tried to use FluentBit, a log shipping tool with a very small memory footprint, together with ClickHouse, while trying to avoid Kafka. However, minor incompatibilities, such as date format issues, have to be resolved before this can work without a proxy layer that converts data from FluentBit to ClickHouse.
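A minimal sketch of such a proxy-layer conversion, under stated assumptions: the shipper is assumed to emit one JSON object per record with the timestamp as floating-point Unix seconds in a `date` key, while the target ClickHouse column `ts` is assumed to be a `DateTime` that accepts `YYYY-MM-DD hh:mm:ss` text. The output lines suit `INSERT ... FORMAT JSONEachRow`. The field names are hypothetical; adjust them to the actual shipper configuration.

```python
import json
from datetime import datetime, timezone


def to_clickhouse_row(record_json: str) -> str:
    """Convert one shipped JSON record into a JSONEachRow line for ClickHouse."""
    record = json.loads(record_json)
    # Assumed input: Unix time in seconds as a float, e.g. 1541635200.5.
    epoch = record.pop("date")
    # Formatted in UTC (assumed to match the server's time zone); DateTime
    # has one-second resolution, so the fractional part is dropped.
    record["ts"] = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return json.dumps(record)


print(to_clickhouse_row('{"date": 1541635200.5, "log": "hello"}'))
# {"log": "hello", "ts": "2018-11-08 00:00:00"}
```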

As an alternative to Kibana, Grafana can be used with ClickHouse as a backend. As far as I understand, this can cause performance problems when rendering a huge number of data points, especially with older versions of Grafana. We haven't tried this at Qwintry yet, but complaints about it appear from time to time in the ClickHouse support channel on Telegram.

Replacing Google Big Query and Amazon RedShift (a solution for large companies)

The ideal use case for BigQuery is loading 1 TB of JSON data and running analytical queries on it. Big Query is a great product whose scalability is hard to overstate. It is much more complex software than ClickHouse, which runs on an in-house cluster, but from the client's point of view it has a lot in common with ClickHouse. BigQuery can quickly get expensive once you start paying for every SELECT, so it is a true SaaS solution with all its pros and cons.

ClickHouse is the best choice when you run lots of computationally expensive queries. The more SELECT queries you run every day, the more sense it makes to replace Big Query with ClickHouse, because such a replacement can save you thousands of dollars when many terabytes of data are being processed. This does not apply to stored data, which is cheap to keep in Big Query.
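The break-even argument can be made concrete with back-of-the-envelope arithmetic. All numbers below are hypothetical inputs, not quoted prices; check current BigQuery on-demand pricing and your own server costs before drawing conclusions. The model compares paying per scanned terabyte with a fixed monthly cost for a self-hosted ClickHouse server.

```python
def monthly_scan_cost(queries_per_day, tb_scanned_per_query, price_per_tb, days=30):
    """Pay-per-query model: cost grows linearly with the number of SELECTs."""
    return queries_per_day * tb_scanned_per_query * price_per_tb * days


def break_even_queries_per_day(server_cost_per_month, tb_scanned_per_query,
                               price_per_tb, days=30):
    """Queries per day above which a fixed-cost server becomes cheaper."""
    return server_cost_per_month / (tb_scanned_per_query * price_per_tb * days)


# Hypothetical inputs: 0.1 TB scanned per query, $5 per TB, $200/month server.
print(monthly_scan_cost(100, 0.1, 5.0))           # 1500.0
print(break_even_queries_per_day(200, 0.1, 5.0))  # about 13.3
```

With these made-up inputs, anything above roughly 13 such queries per day favours the fixed-cost server; storage-only costs are deliberately left out, matching the point that storage itself is cheap in Big Query.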

Altinity co-founder Alexander Zaitsev's article "Migrating to ClickHouse" describes the benefits of such a DBMS migration.

Replacing TimescaleDB

TimescaleDB is a PostgreSQL extension that makes it possible to work with time series in a regular database (https://docs.timescale.com/v1.0/introduction, https://habr.com/ru/company/zabbix/blog/458530/).

Although ClickHouse is not a serious competitor in the time series niche, its columnar structure and vectorized query execution make it much faster than TimescaleDB in most analytical query processing cases. At the same time, ClickHouse's batch data ingestion performance is roughly 3 times higher, and it also uses 3 times less disk space, which really matters when processing large volumes of historical data:
https://www.altinity.com/blog/ClickHouse-for-time-series.

Unlike with ClickHouse, the only way to save some disk space in TimescaleDB is to use ZFS or a similar file system.

Upcoming ClickHouse releases will probably introduce delta compression, which will make it even better suited to processing and storing time series data. TimescaleDB may be a better choice than bare ClickHouse in the following cases:

  • small installations with little RAM (<3 GB);
  • a large number of small INSERTs that you don't want to buffer into large chunks;
  • stronger consistency, uniformity and ACID requirements;
  • PostGIS support;
  • joins with existing PostgreSQL tables, since TimescaleDB is essentially PostgreSQL.
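Delta compression, mentioned above as a likely addition to ClickHouse, stores the difference between neighbouring values instead of the values themselves. For regularly sampled timestamps the deltas are constant, which a general-purpose compressor then squeezes almost to nothing. This is a minimal illustration using `zlib` and 64-bit little-endian packing, not ClickHouse's codec implementation; the 10-second sampling interval is an assumption.

```python
import struct
import zlib
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then only the differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """A running sum restores the original values."""
    return list(accumulate(deltas))


def packed(values):
    """Serialize as signed 64-bit little-endian integers."""
    return struct.pack(f"<{len(values)}q", *values)


timestamps = [1_541_635_200 + 10 * i for i in range(1000)]  # one sample every 10 s
deltas = delta_encode(timestamps)
print(deltas[:4])                         # [1541635200, 10, 10, 10]
print(delta_decode(deltas) == timestamps)  # True
print(len(zlib.compress(packed(timestamps))), len(zlib.compress(packed(deltas))))
```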

Competing with Hadoop and MapReduce systems

Hadoop and other MapReduce products can perform complex computations, but they usually run with high latency. ClickHouse, by contrast, excels at fast, interactive analysis, which should appeal to data scientists.

Competing with Pinot and Druid

ClickHouse's closest competitors are the columnar, linearly scalable open-source products Pinot and Druid. An excellent comparison of these systems was published in Roman Leventov's article of February 1, 2018.


That article needs updating: it states that ClickHouse does not support UPDATE and DELETE operations, which is not entirely true for the latest versions.
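For reference, newer ClickHouse versions express these operations as "mutations" through `ALTER TABLE ... DELETE WHERE` and `ALTER TABLE ... UPDATE ... WHERE`. Mutations run asynchronously and rewrite the affected data parts, so they suit occasional bulk corrections rather than OLTP-style row edits. Whole partitions can also be dropped cheaply with `ALTER TABLE ... DROP PARTITION`. The sketch only builds the statements; the `logs` table, its columns and the monthly partition ID `201811` (as produced by `PARTITION BY toYYYYMM(date)`) are hypothetical.

```python
def drop_month(table: str, yyyymm: int) -> str:
    """Cheapest deletion: remove a whole monthly partition."""
    return f"ALTER TABLE {table} DROP PARTITION {yyyymm}"


def delete_where(table: str, condition: str) -> str:
    """Asynchronous mutation that rewrites every part containing matching rows."""
    return f"ALTER TABLE {table} DELETE WHERE {condition}"


def update_where(table: str, assignments: dict, condition: str) -> str:
    """Asynchronous mutation; key columns cannot be updated this way."""
    sets = ", ".join(f"{col} = {expr}" for col, expr in assignments.items())
    return f"ALTER TABLE {table} UPDATE {sets} WHERE {condition}"


print(drop_month("logs", 201811))
print(delete_where("logs", "user_id = 42"))
print(update_where("logs", {"status": "0"}, "status = 500"))
```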

We don't have much experience with these databases, but I don't like the complexity of the infrastructure needed to run Druid and Pinot: it is a whole set of moving parts surrounded by Java on every side.

Druid and Pinot are Apache incubator projects, and Apache covers their progress in detail on their GitHub project pages. Pinot entered the incubator in October 2018, and Druid 8 months earlier, in February.

The lack of information about how the ASF (Apache Software Foundation) works raises some, perhaps naive, questions for me. I wonder whether the Pinot authors noticed that the Apache Foundation is more favourably disposed toward Druid, and whether this attitude toward a competitor caused some envy? Will Druid's development slow down and Pinot's speed up if the sponsors of the former suddenly become interested in the latter?

Disadvantages of ClickHouse

Immaturity: obviously, this is not yet a boring technology, but in any case, nothing like this is observed in other columnar DBMSs.

Small inserts do not perform well at high speed: inserts have to be split into large batches, because the performance of small inserts degrades in proportion to the number of columns in each row. This follows from how ClickHouse stores data on disk: each column is represented by one or more files, so to insert a single row with 100 columns you have to open and write at least 100 files. This is why insert buffering requires an intermediary (unless the client provides buffering itself), usually Kafka or some other queue management system. You can also use the Buffer table engine to later copy large chunks of data into MergeTree tables.
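A client-side buffer of the kind described can be sketched as a small class that collects rows and flushes them as one batch once either a row limit or an age limit is reached, loosely mirroring the row and time thresholds of ClickHouse's Buffer table engine. The class and its parameters are hypothetical; `flush` would normally send one INSERT, and a production version would also need a timer so that an idle buffer still gets flushed.

```python
import time


class InsertBuffer:
    """Collect rows and hand them to `flush` in large batches."""

    def __init__(self, flush, max_rows=100_000, max_age_s=1.0, clock=time.monotonic):
        self.flush = flush          # callable receiving a list of rows (e.g. one INSERT)
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock
        self.rows = []
        self.first_added = None

    def add(self, row):
        if not self.rows:
            self.first_added = self.clock()
        self.rows.append(row)
        too_many = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.first_added >= self.max_age_s
        if too_many or too_old:
            self.flush_now()

    def flush_now(self):
        if self.rows:
            batch, self.rows = self.rows, []
            self.flush(batch)


batches = []
# A frozen clock keeps this demo deterministic: only the row limit triggers.
buf = InsertBuffer(batches.append, max_rows=3, clock=lambda: 0.0)
for i in range(7):
    buf.add(i)
buf.flush_now()  # e.g. on shutdown
print(batches)   # [[0, 1, 2], [3, 4, 5], [6]]
```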

Table joins are limited by server RAM, but at least they exist! Druid and Pinot, for example, have no such joins at all, since they are hard to implement directly in distributed systems that do not support moving large chunks of data between nodes.
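The RAM limit follows from how the join is executed: by default ClickHouse loads the right-hand table of a JOIN into an in-memory hash table and streams the left-hand table past it, so the right side has to fit in memory. A minimal Python sketch of that hash-join idea (inner join only; the tables are hypothetical):

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner join: build a hash table on `right`, then stream `left` through it."""
    table = defaultdict(list)
    for rrow in right:                 # the whole right side is held in memory
        table[rrow[right_key]].append(rrow)
    for lrow in left:                  # the left side is only streamed
        for rrow in table.get(lrow[left_key], ()):
            yield {**lrow, **rrow}


orders = [{"order_id": 1, "user_id": 7}, {"order_id": 2, "user_id": 8}]
users = [{"user_id": 7, "country": "RU"}]
print(list(hash_join(orders, users, "user_id", "user_id")))
# [{'order_id': 1, 'user_id': 7, 'country': 'RU'}]
```

Putting the smaller table on the right keeps the in-memory hash table small, which is the usual advice for this kind of join.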

Conclusions

We plan to use ClickHouse more extensively at Qwintry in the coming years, since this DBMS offers an excellent balance of performance, low overhead, scalability and simplicity. I am sure it will start spreading rapidly once the ClickHouse community comes up with more ways to use it in small and medium-sized installations.


Source: www.habr.com
