Using ClickHouse as a replacement for ELK, Big Query and TimescaleDB

ClickHouse is an open-source columnar database management system for online analytical processing (OLAP), developed by Yandex. It is used by Yandex, CloudFlare, VK.com, Badoo and other services around the world to store really large amounts of data (inserting thousands of rows per second, or petabytes of data stored on disk).

In a conventional row-oriented DBMS, such as MySQL, Postgres or MS SQL Server, data is stored row by row.

In that layout, the values belonging to one row are physically stored side by side. In a columnar DBMS, values from different columns are stored separately, and the data of a single column is stored together.
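
The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration: the table, column names and values are made up, and a real DBMS stores bytes on disk rather than Python lists.

```python
# Three rows of a hypothetical "visits" table (names and values are made up).
rows = [
    {"user_id": 1, "country": "NL", "duration_ms": 120},
    {"user_id": 2, "country": "DE", "duration_ms": 340},
    {"user_id": 3, "country": "NL", "duration_ms": 95},
]

def to_row_layout(rows):
    """Row-oriented: all values of one row sit next to each other."""
    return [tuple(r.values()) for r in rows]

def to_column_layout(rows):
    """Columnar: all values of one column sit next to each other."""
    return {name: [r[name] for r in rows] for name in rows[0]}

row_store = to_row_layout(rows)
column_store = to_column_layout(rows)

# An aggregate over one column only has to touch that column's values.
total_duration = sum(column_store["duration_ms"])
```

In the columnar form, the aggregate never looks at `user_id` or `country`, which is the property analytical queries benefit from.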

Examples of columnar DBMSs include Vertica, Paraccel (Actian Matrix, Amazon Redshift), Sybase IQ, Exasol, Infobright, InfiniDB, MonetDB (VectorWise, Actian Vector), LucidDB, SAP HANA, Google Dremel, Google PowerDrill, Druid and kdb+.

Qwintry, a mail forwarding company, started using ClickHouse in 2018 for reporting, and I was very impressed by its simplicity, scalability, SQL support and speed. The speed of this DBMS bordered on magic.

Simplicity

ClickHouse installs on Ubuntu with a single command. If you know SQL, you can start using ClickHouse for your needs right away. However, this does not mean that you can run "show create table" in MySQL and paste the resulting SQL into ClickHouse.

Compared to MySQL, this DBMS has important data type differences in table schema definitions, so you will still need some time to adapt your table definitions and to learn the table engines before you feel comfortable.
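
To give a feeling for the kind of rewriting involved, here is a minimal sketch that turns a list of MySQL column types into a ClickHouse table definition. The mapping is an assumption made for illustration and is deliberately incomplete (it ignores unsigned types, nullability, defaults and indexes), and the table and column names are hypothetical.

```python
# Hypothetical, deliberately incomplete mapping from MySQL column types
# to ClickHouse types; real migrations need per-column judgment.
MYSQL_TO_CLICKHOUSE = {
    "tinyint": "Int8",
    "smallint": "Int16",
    "int": "Int32",
    "bigint": "Int64",
    "float": "Float32",
    "double": "Float64",
    "varchar": "String",
    "text": "String",
    "date": "Date",
    "datetime": "DateTime",
}

def clickhouse_ddl(table, columns, order_by):
    """Build a CREATE TABLE statement for a MergeTree table.

    columns: list of (name, mysql_type) pairs, e.g. ("email", "varchar(255)").
    order_by: list of column names used as the sorting key.
    """
    lines = []
    for name, mysql_type in columns:
        # Drop the length/precision suffix: "varchar(255)" -> "varchar".
        base = mysql_type.lower().split("(")[0].strip()
        lines.append(f"    {name} {MYSQL_TO_CLICKHOUSE[base]}")
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {table}\n(\n{body}\n)\n"
        f"ENGINE = MergeTree()\n"
        f"ORDER BY ({', '.join(order_by)})"
    )

ddl = clickhouse_ddl(
    "email_stats_demo",  # hypothetical table name
    [("id", "bigint(20)"), ("email", "varchar(255)"), ("sent_at", "datetime")],
    order_by=["sent_at", "id"],
)
print(ddl)
```

Note that the sorting key and the table engine have no direct MySQL counterpart, which is exactly the part that takes time to learn.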

ClickHouse works fine without any additional software, but if you want to use replication you will need to install ZooKeeper. Analyzing query performance is a pleasant experience: the system tables contain all the information, and all of it can be retrieved with plain old boring SQL.

Performance

  • A benchmark comparing ClickHouse with Vertica and MySQL on a server with the following configuration: two-socket Intel® Xeon® CPU E5-2650 v2 @ 2.60GHz; 128 GiB RAM; md RAID-5 on 8 6TB SATA HDD, ext4.
  • A benchmark comparing ClickHouse with the Amazon RedShift cloud data warehouse.
  • Excerpts from the Cloudflare blog on ClickHouse performance:

The ClickHouse database has a very simple design: all nodes in the cluster have the same functionality and use only ZooKeeper for coordination. We built a small cluster of several nodes and ran tests, during which we found that the system has quite impressive performance, matching the advantages claimed in analytical DBMS benchmarks. We decided to take a closer look at the concept behind ClickHouse. The first obstacle to our research was the lack of tools and the small size of the ClickHouse community, so we dug into the design of this DBMS to understand how it works.

ClickHouse does not support ingesting data directly from Kafka, since it is only a database, so we wrote our own adapter service in Go. It read Cap'n Proto encoded messages from Kafka, converted them to TSV, and inserted them into ClickHouse in batches via the HTTP interface. We later rewrote this service to use a Go library together with ClickHouse's native interface to improve performance. While evaluating the performance of batch inserts, we discovered an important thing: for ClickHouse, this performance depends heavily on the batch size, that is, on the number of rows inserted at once. To understand why this happens, we studied how ClickHouse stores data.
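
The first version of such an adapter can be sketched roughly as follows. This is not Cloudflare's code (theirs was written in Go and read from Kafka); it is a minimal Python illustration of "convert rows to TSV and insert them in batches over HTTP". The server address, table name and batch size are assumptions, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

def escape_tsv(value):
    """Escape a value for a tab-separated payload (backslash, tab, newline)."""
    text = str(value)
    return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")

def rows_to_tsv(rows):
    """Serialize an iterable of row tuples into one TSV payload."""
    return "".join("\t".join(escape_tsv(v) for v in row) + "\n" for row in rows)

def batches(rows, batch_size):
    """Yield lists of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def build_insert_request(base_url, table, batch):
    """Build (but do not send) an HTTP POST that inserts one batch."""
    query = f"INSERT INTO {table} FORMAT TabSeparated"
    url = base_url + "/?" + urllib.parse.urlencode({"query": query})
    payload = rows_to_tsv(batch).encode("utf-8")
    return urllib.request.Request(url, data=payload, method="POST")

# Usage (hypothetical server and table; not executed here):
# for batch in batches(all_rows, 100_000):
#     urllib.request.urlopen(build_insert_request("http://localhost:8123", "logs", batch))
```

The only tunable that matters for the discussion that follows is `batch_size`: the larger the batch, the fewer separate inserts the server has to absorb.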

The main engine, or rather the family of table engines that ClickHouse uses for data storage, is MergeTree. This engine is conceptually similar to the LSM algorithm used in Google BigTable or Apache Cassandra, but it avoids building an intermediate memory table and writes data directly to disk. This gives it excellent write throughput, since each inserted batch is only sorted by the "primary key", compressed, and written to disk to form a segment.
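
As a toy model of that write path, the sketch below sorts one inserted batch by its key, splits it into columns and compresses each column separately. It only illustrates the idea: `zlib` and JSON are stand-ins chosen for convenience, not the codec or on-disk format ClickHouse actually uses, and all names are hypothetical.

```python
import json
import zlib

def make_part(batch, key_columns):
    """Turn one inserted batch into an immutable, sorted, compressed part.

    batch: list of dict rows; key_columns: names forming the sort key.
    Returns a dict with one compressed blob per column.
    """
    ordered = sorted(batch, key=lambda r: tuple(r[k] for k in key_columns))
    columns = {name: [r[name] for r in ordered] for name in ordered[0]}
    return {
        "rows": len(ordered),
        "columns": {
            name: zlib.compress(json.dumps(values).encode("utf-8"))
            for name, values in columns.items()
        },
    }

def read_column(part, name):
    """Decompress and return a single column of a part."""
    return json.loads(zlib.decompress(part["columns"][name]))

part = make_part(
    [{"ts": 30, "host": "b"}, {"ts": 10, "host": "a"}, {"ts": 20, "host": "a"}],
    key_columns=["ts"],
)
```

Nothing is kept in memory between inserts in this model: every batch becomes its own part, which is why the batch size matters so much.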

The absence of a memory table or any notion of data "freshness" also means that data can only be appended; the system does not support changing or deleting it. As of today, the only way to delete data is to drop it by calendar month, since segments never cross a month boundary. The ClickHouse team is actively working on making this configurable. On the other hand, it makes writing and merging segments contention-free, so ingestion throughput scales linearly with the number of parallel inserts until I/O or the cores are saturated.
However, this also means that the system is not suited to small batches, so Kafka services and inserters are used for buffering. Furthermore, ClickHouse keeps merging segments in the background, so many small pieces of data get merged and rewritten more times, which increases write amplification. At the same time, too many unmerged parts cause aggressive throttling of inserts for as long as merging continues. We found that the best compromise between real-time ingestion and ingestion performance is to accept a limited number of inserts per second per table.
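
The cost of many small inserts can be illustrated with a toy model of background merging. The merge policy below (always merge the two smallest parts) is an assumption made for the sake of the example, not ClickHouse's actual merge selection, and it expects `max_parts` to be at least 1.

```python
import heapq

def merge_two(left, right, key):
    """Merge two already-sorted parts into one sorted part."""
    return list(heapq.merge(left, right, key=key))

def merge_until(parts, key, max_parts):
    """Merge the two smallest parts until at most max_parts remain.

    Returns the remaining parts and the number of rows that had to be
    rewritten along the way (a rough measure of write amplification).
    """
    parts = [list(p) for p in parts]
    rows_rewritten = 0
    while len(parts) > max_parts:
        parts.sort(key=len)
        merged = merge_two(parts[0], parts[1], key)
        rows_rewritten += len(merged)
        parts = [merged] + parts[2:]
    return parts, rows_rewritten

# Four small inserts, each of which produced its own tiny sorted part.
small_parts = [[(1, "a"), (7, "b")], [(3, "c")], [(2, "d"), (9, "e")], [(5, "f")]]
merged_parts, rows_rewritten = merge_until(
    small_parts, key=lambda row: row[0], max_parts=1
)
```

Six rows arrive in four inserts, yet twelve row writes are needed before they end up in a single part; had they arrived as one batch, no merge would have been needed at all.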

The key to table read performance is indexing and the location of data on disk. No matter how fast the processing is, when the engine has to scan terabytes of data from disk and use only a small fraction of it, that takes time. ClickHouse is a column store, so each segment contains a file for every column, holding the sorted values for each row. Entire columns that are not present in the query can therefore be skipped, and multiple cells can then be processed in parallel with vectorized execution. To avoid a full scan, each segment also has a small index file.

Given that all columns are sorted by the "primary key", the index file contains only the marks (captured rows) of every Nth row, so that they can be kept in memory even for very large tables. For example, with the default setting of "mark every 8192nd row", the sparse index of a table with 1 trillion rows needs roughly 122 million marks (10^12 / 8192 ≈ 1.22 × 10^8), which still fits in memory.
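
Both the arithmetic and the lookup can be sketched briefly. This is a simplified model that assumes unique, sorted integer keys; the data and the granularity of 1000 used in the lookup are made up to keep the example small.

```python
import bisect

INDEX_GRANULARITY = 8192  # "mark every 8192nd row", as in the text

def marks_needed(total_rows, granularity=INDEX_GRANULARITY):
    """Number of index marks for a table (ceiling division)."""
    return -(-total_rows // granularity)

def build_sparse_index(sorted_keys, granularity):
    """Keep only the key of every granularity-th row."""
    return sorted_keys[::granularity]

def candidate_range(index, granularity, total_rows, key):
    """Return the [start, stop) row range that may contain key."""
    mark = bisect.bisect_right(index, key) - 1
    if mark < 0:
        return (0, 0)  # key is smaller than every stored key
    start = mark * granularity
    return (start, min(start + granularity, total_rows))

keys = list(range(0, 100_000, 2))       # 50,000 sorted even keys (made-up data)
index = build_sparse_index(keys, 1000)  # 50 marks instead of 50,000 entries
start, stop = candidate_range(index, 1000, len(keys), 12_346)
```

Only the rows in `keys[start:stop]` have to be read and checked; everything else in the part is skipped without touching the disk.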

System development

The development and maintenance of ClickHouse can be followed in its GitHub repo, where it is easy to see that the project is "growing up" at an impressive pace.

Popularity

ClickHouse's popularity appears to be growing rapidly, especially in the Russian-speaking community. Last year's HighLoad 2018 conference (Moscow, November 8-9, 2018) showed that giants such as vk.com and Badoo use ClickHouse to insert data (for example, logs) from tens of thousands of servers simultaneously. In a 40-minute video, Yuri Nasretdinov of the VKontakte team talks about how this is done. We will soon post a transcript on Habr to make the material easier to work with.

Use cases

After spending some time on research, I think there are areas where ClickHouse can be useful or can completely replace other, more traditional and popular solutions such as MySQL, PostgreSQL, ELK, Google Big Query, Amazon RedShift, TimescaleDB, Hadoop, MapReduce, Pinot and Druid. Below are the details of using ClickHouse to augment or completely replace the DBMSs listed above.

Extending MySQL and PostgreSQL

Recently, we partially replaced MySQL with ClickHouse for the Mautic newsletter platform. The problem was that MySQL, because of a poor design, logged every email sent and every link in that email with a base64 hash, creating a huge MySQL table (email_stats). After sending only 10 million emails to the service's subscribers, this table took up 150 GB of file space, and MySQL began to "choke" on simple queries. To fix the file space problem, we successfully used InnoDB table compression, which shrank the table by a factor of 4. Even so, it does not make sense to keep more than 20-30 million emails in MySQL just for the sake of reading history, because any simple query that for some reason has to do a full scan ends in swapping and heavy I/O overhead, about which we regularly received Zabbix alerts.

ClickHouse uses two compression algorithms that reduce the amount of data by about 3-4 times, but in this particular case the data was especially "compressible".

Replacing ELK

In my experience, the ELK stack (ElasticSearch, Logstash and Kibana, and in this case ElasticSearch in particular) needs far more resources to run than storing logs should require. ElasticSearch is a great engine if you want good full-text search over your logs (and I do not think you really need it), but I wonder why it has become the de facto standard logging engine. Its ingestion performance, combined with Logstash, gave us problems even under fairly light workloads and required adding more and more RAM and disk space. As a database, ClickHouse is better than ElasticSearch for the following reasons:

  • SQL dialect support;
  • A better compression ratio for stored data;
  • Support for Regex search instead of full-text search;
  • Better query scheduling and higher overall performance.

Currently, the biggest problem that comes up when comparing ClickHouse with ELK is the lack of solutions for shipping logs, as well as the lack of documentation and tutorials on the topic. At the same time, any user can set up ELK by following the Digital Ocean guide, which matters a great deal for the rapid adoption of such technologies. There is a database engine here, but there is no Filebeat for ClickHouse yet. Yes, there is fluentd and the loghouse system for working with logs, and there is the clicktail tool for feeding log file data into ClickHouse, but all of this takes more time. Nevertheless, ClickHouse still leads the way thanks to its simplicity, so even beginners can install it and start using it fully in just 10 minutes.

Preferring minimalist solutions, I tried using FluentBit, a log shipping tool with a very small memory footprint, together with ClickHouse, while trying to avoid Kafka. However, minor incompatibilities such as date format issues have to be resolved before this can be done without a proxy layer that converts data from FluentBit for ClickHouse.

As an alternative to Kibana, you can use Grafana with ClickHouse as the backend. As far as I understand, this can cause performance problems when rendering a huge number of data points, especially with older versions of Grafana. At Qwintry we have not tried this yet, but complaints about it appear from time to time in the ClickHouse support channel on Telegram.

Replacing Google Big Query and Amazon RedShift (a solution for large companies)

The ideal use case for BigQuery is to load 1 TB of JSON data and run analytical queries on it. Big Query is a great product whose scalability is hard to overstate. It is far more complex software than ClickHouse running on an in-house cluster, but from the client's point of view it has a lot in common with ClickHouse. BigQuery can quickly get expensive once you start paying for every SELECT, so it is a true SaaS solution with all its pros and cons.

ClickHouse is the best choice when you run a lot of computationally expensive queries. The more SELECT queries you run every day, the more sense it makes to replace Big Query with ClickHouse, because such a replacement can save you thousands of dollars when many terabytes of data are being processed. This does not apply to stored data, which is quite cheap to keep in Big Query.
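
The trade-off is simple arithmetic, sketched below. Every figure in the example is a placeholder chosen to make the numbers round, not a real price quote; plug in your own query volume, scan size and prices.

```python
def monthly_scan_cost(queries_per_day, tb_scanned_per_query, price_per_tb, days=30):
    """Pay-per-query cost: grows linearly with the amount of data scanned."""
    return queries_per_day * tb_scanned_per_query * price_per_tb * days

def break_even_queries_per_day(fixed_monthly_cost, tb_scanned_per_query,
                               price_per_tb, days=30):
    """Daily query count above which a fixed-cost cluster is cheaper."""
    return fixed_monthly_cost / (tb_scanned_per_query * price_per_tb * days)

# All figures below are placeholders, not real price quotes.
cost = monthly_scan_cost(
    queries_per_day=200, tb_scanned_per_query=0.5, price_per_tb=5.0
)
threshold = break_even_queries_per_day(
    fixed_monthly_cost=1500.0, tb_scanned_per_query=0.5, price_per_tb=5.0
)
```

With these made-up inputs the pay-per-query bill is ten times the fixed cost, and the break-even point is only 20 queries per day, which is the sense in which heavy query workloads favor a self-hosted cluster.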

The article "Moving to ClickHouse" by Alexander Zaitsev, co-founder of Altinity, describes the benefits of such a DBMS migration.

Replacing TimescaleDB

TimescaleDB is a PostgreSQL extension that optimizes working with time series in a regular database (https://docs.timescale.com/v1.0/introduction, https://habr.com/ru/company/zabbix/blog/458530/).

Although ClickHouse is not a serious competitor in the time-series niche, its columnar structure and vectorized query execution make it much faster than TimescaleDB for most analytical queries. At the same time, ClickHouse's batch ingestion performance is about 3 times higher, and it uses about 3 times less disk space, which really matters when processing large volumes of historical data:
https://www.altinity.com/blog/ClickHouse-for-time-series.

Unlike ClickHouse, the only way to save some disk space in TimescaleDB is to use ZFS or a similar file system.

Upcoming ClickHouse updates are likely to introduce delta compression, which will make it even better suited to processing and storing time-series data. TimescaleDB may be a better choice than bare ClickHouse in the following cases:

  • small installations with very little RAM (<3 GB);
  • a large number of small INSERTs that you do not want to buffer into large chunks;
  • better consistency, uniformity and ACID requirements;
  • PostGIS support;
  • joins with existing PostgreSQL tables, since Timescale DB is essentially PostgreSQL.

Competition with Hadoop and MapReduce systems

Hadoop and other MapReduce products can perform a lot of complex computations, but they tend to run with high latency. ClickHouse is therefore much more effective for fast, interactive analytical research, which should be of interest to data scientists.

Competition with Pinot and Druid

ClickHouse's closest competitors are the columnar open-source products Pinot and Druid. An excellent comparison of these systems was published in an article by Roman Leventov dated February 1, 2018.

That article needs updating: it says that ClickHouse does not support UPDATE and DELETE operations, which is not entirely true for the latest versions.

We do not have much experience with these DBMSs, but I do not like the complexity of the infrastructure required to run Druid and Pinot: it is a whole bunch of "moving parts" surrounded by Java on all sides.

Druid and Pinot are Apache incubator projects, whose progress Apache covers in detail on their GitHub project pages. Pinot entered the incubator in October 2018, and Druid was born 8 months earlier, in February.

The lack of information about how the Apache Software Foundation (ASF) works raises some, perhaps silly, questions for me. I wonder whether the authors of Pinot noticed that the Apache Foundation is more favorably disposed toward Druid, and whether such an attitude toward a competitor caused any envy. Will Druid's development slow down and Pinot's speed up if the sponsors backing the former suddenly become interested in the latter?

Disadvantages of ClickHouse

Immaturity: obviously, this is not yet a boring technology, but in any case nothing like it can be seen in other columnar DBMSs.

Small inserts do not perform well at high rates: inserts have to be grouped into large batches, because the performance of small inserts degrades in proportion to the number of columns in each row. This follows from how ClickHouse stores data on disk: each column means one file or more, so to insert a single row containing N columns, at least N files have to be opened and written. That is why insert buffering requires an intermediary (unless the client itself provides buffering), usually Kafka or some kind of queue system. You can also use the Buffer table engine and later copy large chunks of data into MergeTree tables.
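
Client-side buffering of the kind mentioned in parentheses can be as simple as the sketch below. The class and function names are hypothetical, the flush callback stands in for a real INSERT call, and a production buffer would also flush on a timer so that rows do not wait indefinitely.

```python
class InsertBuffer:
    """Collect rows and hand them to flush_fn in large batches."""

    def __init__(self, flush_fn, max_rows):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []

def files_touched(inserts, columns):
    """Lower bound on column files written: one per column per insert."""
    return inserts * columns

flushed = []  # stands in for the real INSERT call
buf = InsertBuffer(flushed.append, max_rows=1000)
for i in range(2500):
    buf.add((i, "event"))
buf.flush()  # push the remaining 500 rows
```

For a table with 100 columns, inserting these 2500 rows one at a time touches at least `files_touched(2500, 100)` column files, while the three buffered batches touch `files_touched(3, 100)`.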

Table joins are limited by the server's RAM, but at least they are there! Druid and Pinot, for example, have no such joins at all, since they are hard to implement directly in distributed systems that do not support moving large chunks of data between nodes.

Conclusions

In the coming years we plan to make extensive use of ClickHouse at Qwintry, since this DBMS offers an excellent balance of performance, low overhead, scalability and simplicity. I am pretty sure it will spread quickly once the ClickHouse community comes up with more ways to use it in small and medium-sized installations.

Source: www.habr.com
