Using ClickHouse as a replacement for ELK, Big Query and TimescaleDB

ClickHouse is an open-source columnar database management system for online analytical processing (OLAP), created by Yandex. It is used by Yandex, CloudFlare, VK.com, Badoo and other services around the world to store really large amounts of data (inserting thousands of rows per second, or petabytes of data stored on disk).

In a conventional, row-oriented DBMS, examples of which are MySQL, Postgres and MS SQL Server, data is stored row by row.

In this case, the values that belong to one row are physically stored next to each other. In a columnar DBMS, values from different columns are stored separately, and the data of a single column is stored together.
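To make the difference concrete, here is a minimal sketch of the two layouts in Python. The table, its column names and the helper functions are made up for illustration; a real engine works with files on disk, not with lists in memory.

```python
# Toy illustration of row-oriented vs column-oriented layout.
# The table and its column names are hypothetical.
rows = [
    {"user_id": 1, "country": "DE", "bytes_sent": 512},
    {"user_id": 2, "country": "US", "bytes_sent": 2048},
    {"user_id": 3, "country": "DE", "bytes_sent": 128},
]

# Row store: all values of one row sit next to each other.
row_store = [tuple(r.values()) for r in rows]

# Column store: all values of one column sit next to each other.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_bytes_row_store(store):
    # Every full row is touched even though only one field is needed.
    return sum(row[2] for row in store)


def total_bytes_column_store(store):
    # Only the "bytes_sent" column is read; the other columns are skipped.
    return sum(store["bytes_sent"])
```

Both functions return the same sum, but the columnar version never looks at `user_id` or `country`, which is the reason analytical queries over a few columns are cheap in a column store.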

Examples of columnar DBMSs are Vertica, Paraccel (Actian Matrix, Amazon Redshift), Sybase IQ, Exasol, Infobright, InfiniDB, MonetDB (VectorWise, Actian Vector), LucidDB, SAP HANA, Google Dremel, Google PowerDrill, Druid and kdb+.

The mail forwarding company Qwintry started using ClickHouse in 2018 for reporting and was very impressed by its simplicity, scalability, SQL support and speed. The speed of this DBMS borders on magic.

Simplicity

ClickHouse installs on Ubuntu with a single command. If you know SQL, you can start using ClickHouse for your needs right away. However, this does not mean that you can run "show create table" in MySQL and copy-paste the resulting SQL into ClickHouse.

Compared to MySQL, this DBMS has important differences in the data types used in table schema definitions, so you will still need some time to rewrite your table definitions and to learn the table engines before you feel comfortable.
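As a rough illustration of that conversion work, the sketch below maps a few MySQL column types to ClickHouse types and builds a CREATE TABLE statement. The mapping table and the `clickhouse_ddl` helper are hypothetical and deliberately incomplete (no unsigned, nullable or decimal handling), and the `ENGINE = MergeTree() ORDER BY` form assumes a reasonably recent ClickHouse version.

```python
# Hypothetical, deliberately incomplete mapping from MySQL column types
# to ClickHouse types; extend it for your own schema.
MYSQL_TO_CLICKHOUSE = {
    "tinyint": "Int8",
    "smallint": "Int16",
    "int": "Int32",
    "bigint": "Int64",
    "float": "Float32",
    "double": "Float64",
    "varchar": "String",
    "text": "String",
    "date": "Date",
    "datetime": "DateTime",
}


def clickhouse_ddl(table, columns, order_by):
    """Build a CREATE TABLE statement for a MergeTree table.

    columns: list of (name, mysql_type) pairs, e.g. ("id", "bigint").
    order_by: list of column names used as the sorting key.
    """
    lines = []
    for name, mysql_type in columns:
        # "varchar(255)" -> "varchar"
        base = mysql_type.lower().split("(")[0].strip()
        if base not in MYSQL_TO_CLICKHOUSE:
            raise ValueError(f"no mapping for MySQL type {mysql_type!r}")
        lines.append(f"    {name} {MYSQL_TO_CLICKHOUSE[base]}")
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {table}\n(\n{body}\n)\n"
        f"ENGINE = MergeTree()\nORDER BY ({', '.join(order_by)})"
    )
```

Note that the length in `varchar(255)` is simply dropped, because a ClickHouse `String` has no declared length, and that you have to pick a sorting key yourself: MySQL's table definition does not tell you which one suits your queries.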

ClickHouse works fine without any additional software, but if you want to use replication you will need to install ZooKeeper. Query performance analysis is a pleasure: the system tables contain all the information, and all of it can be retrieved with plain old boring SQL.

Performance

  • Benchmark comparing ClickHouse with Vertica and MySQL on the following server configuration: two-socket Intel® Xeon® CPU E5-2650 v2 @ 2.60GHz; 128 GiB RAM; md RAID-5 on 8 6TB SATA HDD, ext4.
  • Benchmark comparing ClickHouse with the Amazon RedShift cloud data warehouse.
  • Excerpts from the Cloudflare blog about ClickHouse performance:

The ClickHouse database has a very simple design: all nodes in a cluster have the same functionality and use only ZooKeeper for coordination. We built a small cluster of several nodes and ran tests, during which we found that the system delivers quite impressive performance, in line with the advantages claimed in analytical DBMS benchmarks. We decided to take a closer look at the concept behind ClickHouse. The first obstacle to our research was the lack of tools and the small size of the ClickHouse community, so we dug into the design of this DBMS to understand how it works.

ClickHouse does not support ingesting data directly from Kafka, since it is only a database, so we wrote our own adapter service in Go. It read Cap'n Proto encoded messages from Kafka, converted them to TSV, and inserted them into ClickHouse in batches over the HTTP interface. Later we rewrote this service to use a Go library together with the native ClickHouse interface to improve performance. While evaluating insert performance we discovered an important thing: for ClickHouse, this performance depends heavily on the batch size, that is, on the number of rows inserted at once. To understand why, we studied how ClickHouse stores data.
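The first version of such an adapter (rows converted to TSV and inserted in batches over HTTP) can be sketched in a few lines. This is not Cloudflare's service, which was written in Go; it is a minimal Python sketch, and the function names, the table name and the URL in the usage note are placeholders. The request is only built here, not sent.

```python
from itertools import islice
from urllib.parse import urlencode
from urllib.request import Request


def tsv_escape(value):
    """Escape one value for the TabSeparated (TSV) format."""
    if value is None:
        return r"\N"  # TSV representation of NULL
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def rows_to_tsv(rows):
    """Encode an iterable of row tuples as TSV bytes, one line per row."""
    lines = ("\t".join(tsv_escape(v) for v in row) for row in rows)
    return "".join(line + "\n" for line in lines).encode("utf-8")


def batches(iterable, size):
    """Yield lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_insert_request(base_url, table, batch):
    """Build (but do not send) an HTTP POST that inserts one batch."""
    query = urlencode({"query": f"INSERT INTO {table} FORMAT TabSeparated"})
    return Request(f"{base_url}/?{query}", data=rows_to_tsv(batch), method="POST")
```

With a server listening on the default HTTP port, each batch would be sent with something like `urllib.request.urlopen(build_insert_request("http://localhost:8123", "logs", batch))`. The batch size passed to `batches` is the knob the text talks about: fewer, larger requests are what the storage engine prefers.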

The main engine, or rather family of table engines, that ClickHouse uses to store data is MergeTree. This engine is conceptually similar to the LSM algorithm used in Google BigTable or Apache Cassandra, but it avoids building an intermediate memory table and writes data directly to disk. This gives it excellent write throughput, because each inserted batch is simply sorted by the primary key, compressed, and written to disk to form a segment.
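The write path described above (sort the batch by primary key, compress it column by column, store it as an immutable segment, merge segments later) can be mimicked with a toy model. This is only a sketch under simplifying assumptions: the "segment" is a Python dict kept in memory, compression is plain zlib over a text representation, and `write_part` and `merge_parts` are names invented for this example, not ClickHouse internals.

```python
import heapq
import zlib


def write_part(rows, key_index=0):
    """Turn one inserted batch into an immutable, sorted, compressed part.

    rows: list of equal-length tuples; key_index: position of the
    primary-key column. A real engine writes one file per column to disk.
    """
    ordered = sorted(rows, key=lambda r: r[key_index])
    columns = list(zip(*ordered))  # transpose rows into columns
    return {
        "rows": ordered,  # kept uncompressed only to keep merging simple
        "files": [zlib.compress(repr(col).encode()) for col in columns],
    }


def merge_parts(parts, key_index=0):
    """Background merge: combine several sorted parts into one sorted part."""
    merged = list(heapq.merge(*(p["rows"] for p in parts),
                              key=lambda r: r[key_index]))
    return write_part(merged, key_index)
```

Because every part is already sorted, the merge is a cheap streaming operation, but it does rewrite all the data of the parts involved, which is why many tiny inserts translate into a lot of extra disk writes.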

The absence of a memory table, or of any notion of "freshness" of data, also means that data can only be appended: the system does not support changing or deleting it. As of today, the only way to delete data is to drop it by calendar month, since segments never cross a month boundary. The ClickHouse team is actively working on making this configurable. On the other hand, it makes writing and merging segments contention-free, so ingestion throughput scales linearly with the number of parallel inserts until I/O or the cores are saturated.
However, this also means that the system is not suited to small batches, which is why Kafka services and inserters are used for buffering. Furthermore, ClickHouse keeps merging segments in the background, so many small pieces of data get merged and rewritten several times, which increases the write load. At the same time, too many unmerged segments will cause aggressive throttling of inserts for as long as the merge continues. We found that the best compromise between real-time ingestion and ingestion performance is to accept a limited number of inserts per second into a table.

The key to table read performance is indexing and the placement of data on disk. No matter how fast the processing is, if the engine has to scan terabytes of data from disk and use only a fraction of it, it will take time. ClickHouse is a column store, so each segment contains a file for each column with the sorted values for each row. Entire columns that are not present in the query can therefore be skipped up front, and multiple cells can then be processed in parallel with vectorized execution. To avoid full scans, each segment has a small index file.

Given that all columns are sorted by the "primary key", the index file contains only the marks (sampled rows) of every Nth row, so that they can be kept in memory even for very large tables. For example, with the default setting of "mark every 8192nd row", the "sparse" index of a table with 1 trillion rows needs roughly 122 million marks (10^12 / 8192), which still fits in memory.
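The idea of a sparse index is easy to show in code. The sketch below is an illustration of the technique, not ClickHouse's actual index format: it keeps the key of every Nth row and uses binary search over those marks to narrow a lookup down to a small row range. All function names are invented for this example.

```python
from bisect import bisect_left, bisect_right

GRANULARITY = 8192  # rows per mark, the default mentioned above


def mark_count(total_rows, granularity=GRANULARITY):
    """Number of index marks needed for a table (ceiling division)."""
    return -(-total_rows // granularity)


def build_sparse_index(sorted_keys, granularity=GRANULARITY):
    """Keep only the key of every Nth row of an already sorted column."""
    return sorted_keys[::granularity]


def candidate_rows(marks, key, total_rows, granularity=GRANULARITY):
    """Return the (start, stop) row range that may contain `key`.

    The range is conservative: it may include a granule that turns out
    not to hold the key, but it never misses one that does.
    """
    first = max(bisect_left(marks, key) - 1, 0)
    last = bisect_right(marks, key)
    return first * granularity, min(last * granularity, total_rows)
```

For a table of 10^12 rows, `mark_count(10**12)` gives 122,070,313 marks, and a point lookup only has to read one or two granules of 8192 rows instead of the whole column.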

Evolution of the system

The development and improvement of ClickHouse can be followed in its GitHub repo, where you can see for yourself that the "growing up" is happening at an impressive pace.

Popularity

ClickHouse's popularity seems to be growing rapidly, especially in the Russian-speaking community. Last year's High Load conference (Moscow, November 8-9, 2018) showed that monsters like vk.com and Badoo use ClickHouse to insert data (for example, logs) from tens of thousands of servers simultaneously. In a 40-minute video, Yuri Nasretdinov from the VKontakte team talks about how this is done. We will soon post a transcript on Habr to make the material easier to work with.

Use cases

After spending some time on research, I think there are areas where ClickHouse can be useful, or can completely replace other, more traditional and popular solutions such as MySQL, PostgreSQL, ELK, Google Big Query, Amazon RedShift, TimescaleDB, Hadoop, MapReduce, Pinot and Druid. Below are the details of using ClickHouse to extend or completely replace the DBMSs listed above.

Extending MySQL and PostgreSQL

Recently we replaced MySQL with ClickHouse for the Mautic newsletter platform. The problem was that MySQL, because of a poorly thought-out design, logged every email sent and every link in that email with hash64, creating a huge MySQL table (email_stats). After sending only 10 million emails to the service's subscribers, this table occupied 150 GB of disk space, and MySQL started to "choke" on simple queries. To fix the disk space problem we successfully used InnoDB table compression, which reduced the table by a factor of 4. Even so, it still makes no sense to keep more than 20-30 million emails in MySQL just for the sake of reading history, because any simple query that for some reason has to do a full scan ends in swapping and heavy I/O overhead, about which we regularly received Zabbix warnings.

ClickHouse uses two compression algorithms that reduce the amount of data by about 3-4 times, but in this particular case the data was especially "compressible".

Replacing ELK

In my own experience, the ELK stack (ElasticSearch, Logstash and Kibana; in this case ElasticSearch in particular) needs far more resources to run than storing logs should require. ElasticSearch is a great engine if you want good full-text log search (and I do not think you really need it), but I wonder why it has become the de facto standard logging engine. Its ingestion performance, combined with Logstash, gave us problems even under moderate workloads and demanded more and more RAM and disk space. As a database, ClickHouse is better than ElasticSearch for the following reasons:

  • SQL dialect support;
  • Better compression of stored data;
  • Support for regex search instead of full-text search;
  • Better query scheduling and better overall performance.

At the moment, the biggest problem that comes up when comparing ClickHouse with ELK is the lack of ready-made solutions for shipping logs, as well as the lack of documentation and tutorials on the topic. Meanwhile, anyone can set up ELK by following the Digital Ocean guide, which matters a lot for the rapid adoption of such technologies. The database engine is there, but there is no Filebeat for ClickHouse yet. Yes, there is fluentd and the loghouse system for working with logs, and there is the clicktail tool for feeding log file data into ClickHouse, but all of this takes more time. Still, ClickHouse leads the way thanks to its simplicity, so even beginners can easily install it and start using it fully in just 10 minutes.

Preferring minimalistic solutions, I tried using FluentBit, a log shipping tool with a very small memory footprint, together with ClickHouse, while trying to avoid Kafka. However, minor incompatibilities such as date format issues have to be resolved before this can be done without a proxy layer that converts the data from FluentBit to ClickHouse.
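To give an idea of what such a proxy layer has to do, here is a minimal sketch of one date conversion. It assumes, purely for illustration, that the shipper emits ISO 8601 timestamps with fractional seconds and a UTC offset, and that the target column is a ClickHouse DateTime, whose text form is `YYYY-MM-DD hh:mm:ss`. The actual format your FluentBit configuration produces may differ, and the function name is made up.

```python
from datetime import datetime, timezone


def to_clickhouse_datetime(iso_timestamp):
    """Convert e.g. '2019-07-01T12:34:56.789Z' to '2019-07-01 12:34:56' (UTC).

    Assumes fractional seconds of at most 6 digits and an explicit offset
    ('Z' or '+hh:mm'); requires Python 3.7+ for '%z' to accept both.
    """
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

The sub-second part is dropped here because a plain DateTime column has one-second resolution; if you need it, keep it in a separate column.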

As an alternative to Kibana, you can use ClickHouse as a backend for Grafana. As far as I understand, this can cause performance problems when rendering a huge number of data points, especially with older versions of Grafana. At Qwintry we have not tried this yet, but complaints about it appear from time to time in the ClickHouse support channel on Telegram.

Replacing Google Big Query and Amazon RedShift (a solution for large companies)

The ideal use case for BigQuery is to load 1 TB of JSON data and run analytical queries on it. Big Query is a great product whose scalability is hard to overestimate. It is far more complex software than ClickHouse running on an in-house cluster, but from the client's point of view it has a lot in common with ClickHouse. BigQuery can get expensive quickly once you start paying for each SELECT, so it is a true SaaS solution with all its pros and cons.

ClickHouse is the best choice when you run a lot of computationally expensive queries. The more SELECT queries you run every day, the more sense it makes to replace Big Query with ClickHouse, because such a replacement can save you thousands of dollars when many terabytes of data are being processed. This does not apply to stored data, which is quite cheap to keep in Big Query.
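The "thousands of dollars" argument is simple arithmetic, sketched below. The price per scanned terabyte is a parameter you must fill in from the provider's current price list; the figures in the usage note are placeholders, not quoted prices, and the function name is made up.

```python
def monthly_scan_cost(queries_per_day, tb_scanned_per_query,
                      price_per_tb, days=30):
    """Estimated monthly bill for a pay-per-scanned-terabyte service."""
    return queries_per_day * tb_scanned_per_query * price_per_tb * days
```

For example, with an assumed 200 queries a day, each scanning half a terabyte, at an assumed 5 dollars per terabyte, `monthly_scan_cost(200, 0.5, 5.0)` comes to 15,000 dollars a month. On a self-hosted ClickHouse cluster the same queries cost nothing extra beyond the hardware you already pay for.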

In the article "Moving to ClickHouse", Alexander Zaitsev, co-founder of Altinity, describes the benefits of such a DBMS migration.

Replacing TimescaleDB

TimescaleDB is a PostgreSQL extension that optimizes work with time series in a regular database (https://docs.timescale.com/v1.0/introduction, https://habr.com/ru/company/zabbix/blog/458530/).

Although ClickHouse is not a serious competitor in the time series niche, its columnar structure and vectorized query execution make it much faster than TimescaleDB in most analytical query workloads. At the same time, ClickHouse's batch ingestion performance is about 3 times higher, and it also uses 20 times less disk space, which really matters when processing large volumes of historical data:
https://www.altinity.com/blog/ClickHouse-for-time-series.

Unlike ClickHouse, the only way to save some disk space in TimescaleDB is to use ZFS or a similar file system.

Upcoming ClickHouse updates are likely to introduce delta compression, which will make it even better suited to processing and storing time series data. TimescaleDB may be a better choice than bare ClickHouse in the following cases:

  • small installations with very little RAM (<3 GB);
  • a large number of small INSERTs that you do not want to buffer into large chunks;
  • better consistency, uniformity and ACID requirements;
  • PostGIS support;
  • joins with existing PostgreSQL tables, since Timescale DB is essentially PostgreSQL.
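As for the delta compression mentioned before the list, the reason it helps with time series is easy to demonstrate. The sketch below shows the general technique, not ClickHouse's codec: regularly spaced timestamps are replaced by their differences, which are nearly constant and therefore compress much better. The function names and the sample series are made up.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def compressed_size(values):
    """Bytes needed after packing as 64-bit ints and compressing with zlib."""
    return len(zlib.compress(struct.pack(f"<{len(values)}q", *values)))


# One data point every 10 seconds, 1000 points in total.
timestamps = list(range(1546300800, 1546300800 + 10 * 1000, 10))
```

Comparing `compressed_size(timestamps)` with `compressed_size(delta_encode(timestamps))` shows the effect: the delta-encoded series is almost entirely the value 10 repeated, which a general-purpose compressor shrinks to a tiny fraction of the original.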

Competing with Hadoop and MapReduce systems

Hadoop and other MapReduce products can perform a lot of complex computations, but they tend to run with high latency. ClickHouse is therefore much better suited to fast, interactive analytical research, which should be of interest to data scientists.

Competing with Pinot and Druid

ClickHouse's closest competitors are the columnar, linearly scalable open-source products Pinot and Druid. An excellent comparison of these systems was published in an article by Roman Leventov on February 1, 2018.

That article needs updating: it says that ClickHouse does not support UPDATE and DELETE operations, which is not entirely true for recent versions.

We do not have much experience with these DBMSs, but I do not like the complexity of the infrastructure required to run Druid and Pinot: it is a whole bunch of "moving parts" surrounded by Java on all sides.

Druid and Pinot are Apache incubator projects, and Apache covers their progress in detail on their GitHub project pages. Pinot entered the incubator in October 2018, and Druid was born 8 months earlier, in February.

The lack of information about how the ASF (Apache Software Foundation) works raises some, perhaps stupid, questions for me. I wonder whether the authors of Pinot noticed that the Apache Foundation is more favorably disposed toward Druid, and whether this attitude toward a competitor caused any envy. Will Druid's development slow down and Pinot's speed up if the sponsors backing the former suddenly become interested in the latter?

Disadvantages of ClickHouse

Immaturity: obviously, this is still an exciting rather than a boring technology, but in any case nothing like it can be found among other columnar DBMSs.

Small inserts perform poorly at high rates: inserts have to be grouped into large chunks, because the performance of small inserts degrades in proportion to the number of columns in each row. This follows from how ClickHouse stores data on disk: each column means 1 file or more, so to insert 1 row containing 100 columns you need to open and write at least 100 files. This is why buffering inserts requires an intermediary (unless the client itself provides buffering), usually Kafka or some kind of queueing system. You can also use the Buffer table engine to later copy large chunks of data into MergeTree tables.

Table joins are limited by the server's RAM, but at least they are there! Druid and Pinot, for example, have no such joins at all, since they are hard to implement directly in distributed systems that do not support moving large chunks of data between nodes.

Conclusions

In the coming years we plan to make extensive use of ClickHouse at Qwintry, because this DBMS offers an excellent balance of performance, low overhead, scalability and simplicity. I am sure it will spread quickly once the ClickHouse community comes up with more ways of using it in small and medium-sized installations.

source: www.habr.com
