Effective use of ClickHouse. Alexey Milovidov (Yandex)

Since ClickHouse is a specialized system, it is important to take the specifics of its architecture into account when using it. In this talk, Alexey will go through examples of typical mistakes made when using ClickHouse that can lead to inefficient operation. Practical examples will show how the choice of one data processing scheme or another can change performance by orders of magnitude.

Hello everyone! My name is Alexey, and I make ClickHouse.

First of all, let me hasten to please you right away: today I will not tell you what ClickHouse is. To be honest, I am tired of it. I explain what it is every time. And probably everyone already knows.

Instead, I will tell you what rakes you can step on, that is, how to use ClickHouse incorrectly. In fact, there is no need to be afraid, because we develop ClickHouse as a system that is simple, convenient, and works out of the box. You install it, and there are no problems.

Still, you have to keep in mind that this is a specialized system, and you can easily stumble on an unusual usage scenario that takes it out of its comfort zone.

So, what are the rakes? Mostly I will talk about obvious things. Everything is obvious to everyone, everyone understands everything and can be glad that they are so smart, and those who do not understand will learn something new.

The first and simplest example, which unfortunately comes up often, is a large number of inserts with small batches, that is, a large number of small inserts.

If we look at how ClickHouse performs an insert, you can send at least a terabyte of data in a single request. That is not a problem.

Let's see what typical performance looks like. For example, we have a table with Yandex.Metrica data: hits, 105 columns of some kind, 700 bytes per row uncompressed. And we will insert the right way, in batches of one million rows.

Inserting into a MergeTree table gives about half a million rows per second. Great. Into a replicated table it is a bit less, about 400,000 rows per second.

And if you turn on quorum insert, you get a little less, but still decent performance: 250,000 rows per second. Quorum insert is an undocumented feature in ClickHouse*.

* As of 2020, it is documented.

What happens if you do it wrong? We insert one row at a time into a MergeTree table and get 59 rows per second. That is roughly 10,000 times slower. In ReplicatedMergeTree it is only a few rows per second. And if quorum is turned on, you get two rows per second. In my opinion, this is absolutely terrible. How can you slow down like that? It even says on my T-shirt that ClickHouse should not slow down. But nevertheless it sometimes happens.

In fact, this is our shortcoming. We could have made it work well, but we did not. And we did not because our own scenario did not need it. We already had batches. We simply received batches at the input, and there were no problems. Plug it in and everything works fine. But, of course, all kinds of scenarios are possible. For example, you have a bunch of servers that generate data. They do not insert data all that often, but taken together the inserts are still frequent. And you need to avoid that somehow.

From a technical point of view, the bottom line is that when you do an insert in ClickHouse, the data does not go into any memtable. We do not even have a real log-structured MergeTree, just a MergeTree, because there is neither a log nor a memtable. We simply write the data to the file system right away, already split into columns. And if you have 100 columns, more than 200 files have to be written to a separate directory. All of this is very heavy.

And the question arises: "How do you do it right?" What if you are in such a situation and still need to write data to ClickHouse somehow?

Method 1. This is the simplest way. Use some kind of distributed queue, for example Kafka. You pull data out of Kafka and batch it once a second. And everything will be fine: you write it, and everything works.

The downside is that Kafka is yet another cumbersome distributed system. I understand it if you already have Kafka in your company. That is good, that is convenient. But if you do not, you should think three times before dragging another distributed system into your project. So it is worth considering the alternatives.

Method 2. Here is an old-school and at the same time very simple alternative. You have some server that generates your logs, and it simply writes them to a file. Once a second, for example, we rename this file and open a new one. And a separate script, run by cron or as some daemon, takes the oldest file and writes it to ClickHouse. If you write logs once a second, everything will be fine.

The downside of this method is that if the server that generates the logs disappears, the data disappears with it.

Method 3. There is another interesting way that does without temporary files at all. For example, you have some ad-serving engine or some other interesting daemon that generates data. You can accumulate a bunch of data right in RAM, in a buffer. When enough time has passed, you set this buffer aside, create a new one, and insert what has already accumulated into ClickHouse in a separate thread.
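
The buffer swap described in Method 3 can be sketched roughly as follows. This is only an illustration under assumptions: the `BatchBuffer` class and its `sink` callback are hypothetical names, and `sink` stands in for whatever code performs one INSERT into ClickHouse per batch.

```python
import threading
from typing import Callable, List


class BatchBuffer:
    """Accumulates rows in RAM and hands whole batches to a sink on a background thread."""

    def __init__(self, sink: Callable[[List[str]], None], flush_interval: float = 1.0):
        self._sink = sink  # hypothetical: would perform one INSERT per call
        self._flush_interval = flush_interval
        self._rows: List[str] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, row: str) -> None:
        with self._lock:
            self._rows.append(row)

    def flush(self) -> None:
        # Swap the filled buffer for a fresh one under the lock,
        # then do the slow insert outside the lock.
        with self._lock:
            batch, self._rows = self._rows, []
        if batch:
            self._sink(batch)

    def _run(self) -> None:
        # Wake up once per interval until close() is called.
        while not self._stop.wait(self._flush_interval):
            self.flush()

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
        self.flush()  # push whatever is left
```

The sketch does not bound the buffer size and does not retry failed inserts, so it has the weaknesses of holding data only in memory: anything still in the buffer is gone if the process dies.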

On the other hand, the data is also lost on kill -9. If your server goes down, you lose this data. Another problem is that if you cannot write to the database, your data accumulates in RAM. And either the RAM runs out, or you simply lose the data.

Method 4. Another interesting way. You have some server process, and it can send data to ClickHouse right away, but over a single connection. For example, I send an HTTP request with Transfer-Encoding: chunked containing an insert. It produces chunks fairly often; you can even send every row, although there will be some overhead for framing the data.

In this case, however, the data is sent to ClickHouse immediately, and ClickHouse buffers it itself.

But there are problems too. Now you lose data both when your process is killed and when the ClickHouse process is killed, because the insert will be incomplete. And in ClickHouse, inserts are atomic only up to a certain threshold in the number of rows. In principle, this is an interesting way. It can also be used.

Method 5. Here is another interesting way. It is a server developed by the community for batching data. I have not looked at it myself, so I cannot guarantee anything. Then again, there are no guarantees for ClickHouse itself either. It is also open source, but on the other hand you may have become used to a certain quality standard that we try to provide. As for this thing, I do not know: go to GitHub and look at the code. Maybe they wrote something good.

* As of 2020, KittenHouse is also worth considering.

Method 6. Another way is to use Buffer tables. The advantage of this method is that it is very easy to start using: create a Buffer table and insert into it.

The downside is that it does not solve the problem completely. If with MergeTree you have to group data into about one batch per second, then with a Buffer table you need to group at least down to a few thousand inserts per second. If there are more than 10,000 per second, it will still be bad. And if you insert in batches, you have seen that you get a hundred thousand rows per second there. And that is on fairly heavy data.

Buffer tables also have no log. If something goes wrong with your server, the data will be lost.

And as a bonus, we recently got the ability to pull data from Kafka into ClickHouse. There is a table engine called Kafka. You simply create it, and you can attach materialized views to it. In that case, it pulls the data out of Kafka by itself and inserts it into the tables you need.

What is especially pleasing about this feature is that we did not build it. It is a community feature. And when I say "community feature", I say it without any contempt. We read the code and reviewed it, so it should work fine.

* As of 2020, there is similar support for RabbitMQ.

What else can be inconvenient or unexpected when inserting data? Running an INSERT ... VALUES query with computed expressions in VALUES. For example, now() is also a computed expression. In this case, ClickHouse is forced to run the expression interpreter for every row, and performance drops by orders of magnitude. It is better to avoid this.

* By now the problem has been fully solved; there is no longer any performance regression when using expressions in VALUES.

Another example where problems can arise is when the data in one batch belongs to a bunch of partitions. By default, ClickHouse partitions by month. If you insert a batch of a million rows that covers several years of data, you end up with a few dozen partitions. That is equivalent to having batches that many times smaller, because internally the batch is always split by partition first.
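
To make the effect of partitioning on batch size concrete, here is a small sketch that counts how many rows of one batch fall into each monthly partition. The function name and the synthetic dates are made up for illustration; only the idea of grouping by month comes from the talk.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def rows_per_partition(event_dates: Iterable[date]) -> Counter:
    """Count how many rows of one batch land in each monthly partition (YYYYMM)."""
    return Counter(d.year * 100 + d.month for d in event_dates)


# A synthetic batch of 1,000,000 rows spread evenly over 24 months (two years).
batch = [date(2016 + (i % 24) // 12, (i % 12) + 1, 1) for i in range(1_000_000)]
parts = rows_per_partition(batch)

# One insert of a million rows becomes 24 separate pieces of about 41,667 rows each.
print(len(parts), max(parts.values()))
```

With data spanning more years, the same million rows would be cut into correspondingly more and smaller pieces.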

* Recently, ClickHouse added experimental support for a compact part format and for in-memory parts with a write-ahead log, which solves the problem almost completely.

Now let's look at the second kind of problem: data typing.

Data typing can be strict, and it can be stringly. Stringly typed is when you simply declare that all your fields are of type String. That sucks. Do not do that.

Let's figure out how to do it properly in the cases where you feel like saying: we have some field, let it be a string, ClickHouse will sort it out on its own, and I am not going to bother. It is still worth making a bit of an effort.

For example, we have an IP address. In one case we store it as a string, for example 192.168.1.1. In the other case it is a number of type UInt32*. 32 bits are enough for an IPv4 address.
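
As an illustration of the numeric representation, the following sketch converts an IPv4 address between its text form and the 32-bit unsigned integer that would go into a UInt32 column. It uses Python's standard `ipaddress` module; the helper names are chosen here for the example.

```python
import ipaddress


def ipv4_to_uint32(text: str) -> int:
    """Convert dotted-quad text to the 32-bit unsigned integer form."""
    return int(ipaddress.IPv4Address(text))


def uint32_to_ipv4(value: int) -> str:
    """Convert the 32-bit integer back to dotted-quad text."""
    return str(ipaddress.IPv4Address(value))


number = ipv4_to_uint32("192.168.1.1")
print(number)                  # 3232235777
print(uint32_to_ipv4(number))  # 192.168.1.1
```

The text form of this address takes 11 bytes plus a length prefix, while the number always takes 4 bytes.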

First, oddly enough, the data compresses to about the same size. There is a difference, of course, but not a big one. So there are no particular problems with disk I/O.

But there is a serious difference in CPU time and in query execution time.

Let's count the number of unique IP addresses when they are stored as numbers. That comes out at 137 million rows per second. The same with strings gives 37 million rows per second. I do not know why this coincidence happened. I ran these queries myself. Either way, it is about 4 times slower.

If you compare the disk space, there is a difference too. It is modest, because there are quite a lot of unique IP addresses. If these were strings with a small number of distinct values, they would be compressed by the dictionary to roughly the same size without any trouble.

And a fourfold difference in time is not something you find lying on the road. Maybe you do not care, of course, but when I see a difference like that, I feel sad.

Let's look at different cases.

1. The first case is when you have few distinct values. Here we use a simple practice that you probably know and can use with any DBMS. It makes sense for more than just ClickHouse. Simply write numeric identifiers to the database, and convert to strings and back on the application side.

For example, you have a region, and you are trying to store it as a string. It will say: Moscow and Moscow Region. When I see "Moscow" written there, that is still tolerable, but when it is "Moscow Region", it gets really sad. That is a lot of bytes.

Instead, we simply write a UInt32 number: 250. For us at Yandex it is 250, but yours may be different. Just in case, I will mention that ClickHouse has a built-in ability to work with a geobase. You simply write a directory with the regions, including the hierarchy, so there will be Moscow, Moscow Region, and everything you need. And you can convert at query level.

The second option is roughly the same, but with support inside ClickHouse. It is the Enum data type. You simply list all the values you need inside the Enum. For example, the device type: desktop, mobile, tablet, TV. Only 4 options.

The downside is that you need to alter it from time to time. Just one option gets added, and we do an ALTER TABLE. In fact, ALTER TABLE in ClickHouse is free. It is especially free for Enum, because the data on disk does not change. Nevertheless, ALTER acquires a lock* on the table and has to wait until all running SELECTs have finished. Only after that is the ALTER executed, so there is still some inconvenience.

* In recent versions of ClickHouse, ALTER is completely non-blocking.

Another option that is quite unique to ClickHouse is connecting external dictionaries. You can write numbers to ClickHouse and keep your directories in any system that is convenient for you, for example MySQL, Mongo, or Postgres. You can even create your own microservice that serves this data over HTTP. And at the ClickHouse level you write a function that converts this data from numbers to strings.

This is a specialized but very efficient way to perform a join with an external table. There are two variants. In one, the data is held entirely in RAM and refreshed at some interval. In the other, if the data does not fit in RAM, you can cache it partially.

Here is an example. There is Yandex.Direct, with ad campaigns and banners. There are many millions of ad campaigns, and that fits in RAM. Banners number in the billions, and they do not fit. So we use a cached dictionary on top of MySQL.

The only problem is that a cached dictionary works well only if the hit rate is close to 100%. If it is lower, then while processing queries, for each block of data ClickHouse actually has to take the missing keys and go fetch the data from MySQL. For ClickHouse I can still guarantee that, yes, it does not slow down; I will not speak for other systems.

And as a bonus, dictionaries are a very simple way to update data in ClickHouse retroactively. That is, you had a report on ad campaigns, the user simply renamed an ad campaign, and in all the old data, in all the reports, this data changed as well. If you write the strings directly into the table, you cannot update them.

Another way, for when you do not know where to get identifiers for your strings: you can simply hash them. The simplest option is to take a 64-bit hash.

The only problem is that with a 64-bit hash you will almost certainly have collisions, because with a billion strings the probability is already noticeable.
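
How noticeable is "noticeable"? A rough estimate can be made with the standard birthday-bound approximation, assuming the hash behaves like a uniformly random 64-bit function. The function name below is chosen for this example.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation: P = 1 - exp(-n(n-1) / (2 * 2^bits))."""
    space = 2.0 ** hash_bits
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


# Probability that at least two of a billion distinct strings share a 64-bit hash.
print(collision_probability(1_000_000_000, 64))
```

Under this approximation, a billion distinct strings give a probability of a few percent that at least one pair collides, and the probability grows quickly as the number of strings increases.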

And it would not be great to hash ad campaign names this way. If ad campaigns of different companies get mixed up, the result will be something incomprehensible.

There is a simple trick. True, it is not well suited to critical data either, but if something is not too important, simply add the client identifier to the dictionary key. Then you will still have collisions, but only within a single client. We use this method for the link map in Yandex.Metrica. We have URLs there, and we store hashes. And we know that there are collisions, of course. But when a page is displayed, the probability that some URLs on one page of one user collide, and that someone notices it, can be neglected.
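
The trick of scoping hashes to a client can be sketched like this. The hash function here is BLAKE2b truncated to 8 bytes from Python's standard library, used purely as a stand-in; it is an assumption of this example and not the hash used in Yandex.Metrica, and the function names are hypothetical.

```python
import hashlib
from typing import Tuple


def hash64(text: str) -> int:
    """64-bit hash of a string (BLAKE2b truncated to 8 bytes, as a stand-in)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def url_key(client_id: int, url: str) -> Tuple[int, int]:
    """Composite key: two URLs can only clash if they belong to the same client."""
    return (client_id, hash64(url))


# The same URL under two clients produces two different keys.
print(url_key(1, "https://example.com/") != url_key(2, "https://example.com/"))  # True
```

The number of strings that can collide with each other is now the number of URLs of one client rather than the number of URLs overall, which makes a visible collision far less likely.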

As a bonus, for many operations hashes alone are enough, and the strings themselves need not be stored anywhere.

Another case is when the strings are short, for example website domains. They can be stored as is. Or, for example, the browser language ru is 2 bytes. Of course I feel sorry for the bytes, but do not worry, 2 bytes is not a pity. Please store it as is, do not worry.

Another case is when, on the contrary, there are a great many strings, many of them are unique, and the set is potentially unbounded. Typical examples are search phrases or URLs. Search phrases are like that partly because of typos. Let's see how many unique search phrases there are per day. It turns out to be almost half of all events. In this case you might think that you need to normalize the data, assign identifiers, and put them in a separate table. But you should not do that. Just store these strings as is.

It is better not to invent anything, because if you store them separately you will need to do a join. And that join is, at best, random access to memory, if it even fits in memory. If it does not fit, there will be real problems.

Whereas if the data is stored in place, it is simply read in the right order from the file system and everything is fine.

If you have URLs or some other complex long string, consider computing some extract of it in advance and writing it into a separate column.

For URLs, for example, you can store the domain separately. If you really only need the domain, just use that column, and the URLs will lie there untouched.
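
Precomputing the domain before the insert could look roughly like this on the application side. The sketch uses `urllib.parse` from the Python standard library; the row layout with `url` and `domain` fields is a hypothetical example, not a schema from the talk.

```python
from typing import Dict
from urllib.parse import urlsplit


def extract_domain(url: str) -> str:
    """Return the lower-cased host part of a URL, or an empty string if there is none."""
    return urlsplit(url).hostname or ""


def with_domain(row: Dict[str, str]) -> Dict[str, str]:
    """Add a precomputed 'domain' field next to the original 'url' field."""
    return {**row, "domain": extract_domain(row["url"])}


print(with_domain({"url": "https://Example.com:8080/path?utm_source=x"}))
# {'url': 'https://Example.com:8080/path?utm_source=x', 'domain': 'example.com'}
```

Queries that only need the domain can then read the short column and never touch the long one.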

Let's see what the difference is. ClickHouse has a specialized function that computes the domain. It is very fast; we have optimized it. And, to be honest, it does not even comply with the RFC, but it still computes everything we need.

In one case we simply fetch the URLs and compute the domain. That takes 166 milliseconds. If you take the precomputed domain, it takes only 67 milliseconds, which is almost three times faster. And it is faster not because we have to do less computation, but because we read less data.

For some reason, the slower query shows a higher speed in gigabytes per second, because it reads more gigabytes. That is completely redundant data. The query appears to run faster, but it takes longer to complete.

And if you look at the amount of data on disk, the URL column is 126 megabytes and the domain column is only 5 megabytes. That is 25 times less. Yet the query is only 4 times faster. But that is because the data is hot. If it were cold, it would probably be 25 times faster because of disk I/O.

By the way, if you estimate how much shorter a domain is than a URL, it is about 4 times. But for some reason the data takes up 25 times less space on disk. Why? Because of compression. The URL is compressed, and the domain is compressed. But a URL often contains a pile of garbage.

And of course it pays to use the right data types, the ones designed specifically for the values in question or that fit them. If you have IPv4, store UInt32*. If IPv6, then FixedString(16), because an IPv6 address is 128 bits, that is, stored directly in binary form.

But what if you sometimes have IPv4 addresses and sometimes IPv6? Yes, you can store both: one column for IPv4, another for IPv6. Of course, there is also the option of mapping IPv4 into IPv6. That works too, but if you often need the IPv4 address in your queries, it is nicer to put it in a separate column.
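
A sketch of the "map IPv4 into IPv6" option: every address becomes exactly 16 bytes, which is the shape a FixedString(16) column expects. IPv4 addresses are written in the standard IPv4-mapped form `::ffff:a.b.c.d`. The function name is made up for this example.

```python
import ipaddress


def to_fixed16(text: str) -> bytes:
    """Return the 16-byte binary form of an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv4Address):
        # IPv4-mapped IPv6 address: ten zero bytes, ff ff, then the four IPv4 bytes.
        addr = ipaddress.IPv6Address(f"::ffff:{addr}")
    return addr.packed


print(to_fixed16("192.168.1.1").hex())  # 00000000000000000000ffffc0a80101
print(len(to_fixed16("2001:db8::1")))   # 16
```

The cost of this layout is that a query interested only in IPv4 still reads 16 bytes per row, which is the reason the talk suggests a separate IPv4 column when IPv4 is queried often.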

* ClickHouse now has separate IPv4 and IPv6 data types that store data as efficiently as numbers but represent it as conveniently as strings.

It is also worth noting that it pays to preprocess the data in advance. For example, some raw logs come to you. Perhaps you should not put them into ClickHouse right away, although it is very tempting to do nothing and have everything work. It is still worth performing whatever computations are possible.

For example, the browser version. In a certain neighboring department, which I do not want to point a finger at, the browser version is stored like this, as a string: 12.3. Then, to build a report, they take this string, split it into an array, and take the first element of the array. Naturally, everything slows down. I asked why they do this. They told me they do not like premature optimization. And I do not like premature pessimization.

So in this case it would be more correct to split it into 4 columns. Do not be afraid here, because this is ClickHouse. ClickHouse is a columnar database, and the more neat little columns, the better. If there are 5 BrowserVersion parts, make 5 columns. That is fine.
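
Splitting the version string once, before the insert, might look like the sketch below. The function name and the choice to fill missing or non-numeric parts with 0 are assumptions of this example.

```python
from typing import Tuple


def split_version(version: str, parts: int = 4) -> Tuple[int, ...]:
    """Split a dotted version string into a fixed number of integer components."""
    numbers = []
    for piece in version.split(".")[:parts]:
        # Non-numeric or empty pieces become 0 instead of raising.
        numbers.append(int(piece) if piece.isascii() and piece.isdigit() else 0)
    # Pad with zeros so every row has the same number of columns.
    numbers.extend([0] * (parts - len(numbers)))
    return tuple(numbers)


print(split_version("12.3"))  # (12, 3, 0, 0)
```

Each element of the tuple would go into its own numeric column, so a report on major versions reads a single small column instead of parsing strings row by row.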

Now let's look at what to do if you have lots of very long strings, really long ones. They do not need to be stored in ClickHouse at all. Instead, you can store just some identifier in ClickHouse, and push these long strings into some other system.

For example, one of our analytics services has event parameters. If a lot of parameters arrive for an event, we simply keep the first 512 that come. Because 512 is not a pity.

And if you cannot decide on your data types, you can still write the data to ClickHouse, but into a temporary table of type Log, which is meant specifically for temporary data. After that you can analyze what distribution of values you have there and what is in there in general.

* ClickHouse now has the LowCardinality data type, which lets you store strings efficiently with little effort.

Now let's look at another interesting case. Sometimes things work in a strange way for people. I come in and see this. And it immediately looks like the work of some very experienced, wise admin with extensive experience in configuring MySQL version 3.23.

Here we see a thousand tables, each of which holds the remainder of dividing who knows what by a thousand.

In principle, I respect other people's experience, including an understanding of the suffering through which that experience was gained.

And the reasons are more or less clear. These are old stereotypes that may have accumulated while working with other systems. For example, MyISAM tables do not have a clustered primary key. And this way of splitting data may be a desperate attempt to get the same functionality.

Another reason is that it is hard to perform any ALTER operations on large tables. Everything gets locked. Although in modern versions of MySQL this problem is no longer so serious.

Or, for example, microsharding, but more on that later.

In ClickHouse you do not need to do this, because, first, the primary key is clustered: the data is ordered by primary key.

People sometimes ask me: "How does the performance of range queries in ClickHouse change with table size?" I say that it does not change at all. For example, you have a table with a billion rows and you read a range of one million rows. Everything is fine. If the table has a trillion rows and you read one million rows, it will be almost the same.

And, second, no such things as manual partitions are needed. If you go in and look at what is in the file system, you will see that a table is quite a serious thing. Inside there is something like partitions. That is, ClickHouse does everything for you and you do not need to suffer.

ALTER in ClickHouse is free when it is an ALTER that adds or drops a column.

And you should not create small tables, because whether your table has 10 rows or 10,000 rows makes no difference at all. ClickHouse is a system that optimizes throughput, not latency, so it makes no sense to process a handful of rows at a time.

The right thing is to use one big table. Get rid of the old stereotypes, and everything will be fine.

And as a bonus, in the latest version we have the ability to define an arbitrary partitioning key in order to perform all kinds of maintenance operations on individual partitions.

For example, you may need many small tables when you have to process some intermediate data: you receive chunks and need to transform them before writing to the final table. For this case there is a wonderful table engine, StripeLog. It is like TinyLog, only better.

* ClickHouse now also has the input table function.

Another anti-pattern is microsharding. For example, you need to shard data, you have 5 servers, and tomorrow there will be 6. And you are thinking about how to rebalance this data. So instead you split it not into 5 shards but into 1,000 shards. Then you map each of these microshards to a separate server. And you end up with, for example, 200 ClickHouse instances on a single server: separate instances on separate ports, or separate databases.

But in ClickHouse this is not a good idea, because even a single ClickHouse instance tries to use all available server resources to process one query. Say you have a server with, for example, 56 processor cores. You run a query that takes one second, and it uses 56 cores. If you put 200 ClickHouse instances on that one server, it turns out that on the order of 10,000 threads will start. All in all, everything will be very bad.

Another reason is that work will be distributed unevenly across these instances. Some will finish earlier, some later. If all of this happened within a single instance, ClickHouse itself would figure out how to distribute the data properly among the threads.

And another reason is that you will have interprocess communication over TCP. The data has to be serialized and deserialized, and this is a huge number of microshards. It simply will not work efficiently.

Another anti-pattern, although it is hard to call it an anti-pattern: a large amount of pre-aggregation.

In general, pre-aggregation is good. You had a billion rows, you aggregated them into far fewer rows, and now the query runs instantly. Everything is great. You can do it that way. ClickHouse even has a special table type for this, AggregatingMergeTree, which performs incremental aggregation as data is inserted.

But there are times when you think: we will aggregate the data this way, and also aggregate it that way. And in a certain neighboring department, again I do not want to say which one, they use SummingMergeTree tables to sum by primary key, with about 20 columns used as the primary key. Just in case, I changed the names of some columns for secrecy, but that is about it.

And problems like these arise. First, your data volume does not shrink by much. For example, it shrinks threefold. Threefold would be a good price to pay for the unlimited analytics you get when your data is not aggregated. If the data is aggregated, then instead of analytics you get only meager statistics.

And what is especially nice? These people from the neighboring department sometimes come and ask to add one more column to the primary key. That is, we aggregated the data this way, and now we want a little more. But ClickHouse has no ALTER for the primary key. So you have to write some scripts in C++. And I do not like scripts, even when they are in C++.

If you look at what ClickHouse was created for, non-aggregated data is exactly the scenario it was born for. If you use ClickHouse for non-aggregated data, you are doing everything right. If you aggregate, that is sometimes forgivable.

Another interesting case is queries in an infinite loop. Sometimes I go to some production server and look at SHOW PROCESSLIST there. And every time I discover that something terrible is going on.

For example, this. It is immediately clear that everything could have been done in a single query. Just write url IN (...) with the list there.
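
Replacing a loop of single-URL queries with one query could be sketched as below. The table name `hits` and the selected columns are hypothetical, and the manual escaping is only there to keep the example self-contained; in real code the parameter binding of your ClickHouse client library would be the safer choice.

```python
from typing import Iterable


def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_in_query(urls: Iterable[str]) -> str:
    """Build one query covering all URLs instead of one query per URL."""
    url_list = ", ".join(quote_literal(u) for u in urls)
    if not url_list:
        raise ValueError("at least one URL is required")
    # 'hits' is a hypothetical table name used only for illustration.
    return f"SELECT url, count() FROM hits WHERE url IN ({url_list}) GROUP BY url"


print(build_in_query(["https://a.example/", "https://b.example/"]))
```

One pass over the data then answers the question for every URL in the list.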

Why is a large number of such queries in an infinite loop bad? If an index is not used, you get many passes over the same data. But suppose an index is used: for example, you have a primary key on url and you write url = something. You expect that a single URL will be read from the table in a pinpoint way and everything will be fine. But no, because ClickHouse does everything in batches.

When it needs to read some range of data, it reads a little more, because the index in ClickHouse is sparse. This index does not let you find a single row in the table, only some range. And the data is compressed in blocks. To read one row, you have to take the whole block and decompress it. And if you run a bunch of queries, many of these ranges will overlap, and you will do a lot of the same work over and over again.
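
The behavior of a sparse index can be illustrated with a toy model. This is a simplification under stated assumptions: keys are unique integers sorted in ascending order, the index keeps the first key of every granule, and the granule size of 8192 rows is used as an illustrative value. It is not ClickHouse's actual implementation.

```python
import bisect
from typing import List, Tuple


def build_sparse_index(sorted_keys: List[int], granule: int) -> List[int]:
    """Keep only the first key of every `granule` rows."""
    return sorted_keys[::granule]


def rows_to_read(index: List[int], granule: int, total_rows: int, key: int) -> Tuple[int, int]:
    """Return the [start, end) row range that must be read to look up `key`."""
    # Last mark whose first key is <= key; with unique keys the row can only be there.
    mark = max(bisect.bisect_right(index, key) - 1, 0)
    start = mark * granule
    return start, min(start + granule, total_rows)


keys = list(range(1_000_000))
index = build_sparse_index(keys, 8192)
print(rows_to_read(index, 8192, len(keys), 20_000))  # (16384, 24576)
```

Looking up one key means reading all 8192 rows of its granule, so a thousand point queries whose keys fall into the same granule read and decompress the same rows a thousand times.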

And as a bonus, note that in ClickHouse you should not be afraid to pass even megabytes, or even hundreds of megabytes, into the IN section. I remember from our practice that if we pass a bunch of values into the IN section in MySQL, say 100 megabytes of numbers, then MySQL eats 10 gigabytes of memory and nothing else happens; everything works badly.

The second thing is that in ClickHouse, if your queries use the index, it is never slower than a full scan. That is, if you need to read almost the whole table, it will go sequentially and read the whole table. In general, it will figure it out.

Still, there are some difficulties. For example, IN with a subquery does not use the index. But that is our problem and we need to fix it. There is nothing fundamental here. We will fix it*.

And another interesting thing: if you have a very long query and distributed query processing is going on, then this very long query will be sent to every server without compression. For example, 100 megabytes and 500 servers. Accordingly, 50 gigabytes will be transferred over the network. It will be transferred, and then everything will execute successfully.
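
The arithmetic in this example, spelled out as a quick check (decimal units are assumed here, 1 GB = 1000 MB):

```python
query_size_mb = 100  # size of the query text, from the example above
servers = 500        # number of servers that each receive the uncompressed query

total_gb = query_size_mb * servers / 1000  # 1 GB = 1000 MB
print(f"{total_gb:.0f} GB sent over the network")
```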

* It already does use it; everything was fixed as promised.

A fairly common case is when queries come from an API. For example, you built some service. And if someone needs your service, you open up an API, and literally two days later you see that something incomprehensible is going on. Everything is overloaded and some terrible queries are coming in that should never have been made.

And there is only one solution. If you opened an API, you will have to cut it back. For example, introduce some quotas. There are no other reasonable options. Otherwise someone will immediately write a script and there will be problems.

And ClickHouse has a special feature for this: quota calculation. Moreover, you can pass your own quota key. This is, for example, the internal user ID. And quotas will be calculated independently for each of them.
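
A minimal sketch of passing a quota key over the HTTP interface, assuming the server listens on `localhost:8123` and that the quota for this user is configured as keyed on the server side. As far as I know the HTTP interface accepts a `quota_key` URL parameter; the helper name `clickhouse_url` and the key `user-42` are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def clickhouse_url(base: str, query: str, quota_key: str) -> str:
    """Build a request URL that carries the query and a per-client quota key."""
    params = {"query": query, "quota_key": quota_key}
    return f"{base}/?{urlencode(params)}"


# The API layer would pass its own internal user ID as the quota key.
url = clickhouse_url("http://localhost:8123", "SELECT 1", "user-42")
print(url)
```

With a keyed quota, each distinct key gets its own counters, so one noisy API client exhausts only its own budget.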

Now another interesting thing. This is manual replication.

I know many cases where, even though ClickHouse has built-in replication support, people replicate ClickHouse manually.

What is the principle? You have a data processing pipeline. And it runs independently, for example, in different data centers. You write the same data in the same way into ClickHouse, more or less. True, practice shows that the data will still diverge because of some peculiarities in your code. I hope that it is in your code.

And from time to time you still have to synchronize manually. For example, once a month the admins run rsync.

In fact, it is much easier to use the replication built into ClickHouse. But there may be some contraindications, because for this you need to use ZooKeeper. I will not say anything bad about ZooKeeper; in principle, the system works. But it happens that people do not use it because of Java-phobia, because ClickHouse is such a nice system written in C++ that you can just use and everything will be fine. And ZooKeeper is in Java. And somehow you do not even want to look at it, and in that case you can use manual replication.

ClickHouse is a practical system. It takes your needs into account. If you have manual replication, you can create a Distributed table that looks at your manual replicas and does failover between them. And there is even a special option that lets you avoid flapping, even if your replicas systematically diverge.

Further, problems can arise if you use primitive table engines. ClickHouse is a construction kit with a bunch of different table engines. For all serious cases, as the documentation says, use tables of the MergeTree family. And all the rest are just extras, for special cases or for tests.

In a MergeTree table, you do not have to have a date and time. You can still use it. If there is no date and time, write that the default is 2000. It will work and will not require any resources.

And in the new version of the server, you can even specify custom partitioning without a partition key. It will be the same.

On the other hand, primitive table engines can be used. For example, load the data once, look at it, play around with it, and delete it. You can use Log.

Or, for storing small volumes for intermediate processing, there is StripeLog or TinyLog.

Memory can be used if the amount of data is small and you just want to churn something in RAM.

ClickHouse does not really like over-normalized data.

Here is a typical example. This is a huge number of URLs. You put them in a neighboring table. And then you decide to JOIN with them, but this will not work, as a rule, because ClickHouse only supports Hash JOIN. If there is not enough RAM for the amount of data that has to be joined, then the JOIN will not work*.

If the data has high cardinality, then do not worry: store it in denormalized form, with the URLs right in place in the main table.

* ClickHouse now has merge join as well, and it works in conditions where the intermediate data does not fit into RAM. But this is inefficient, and the recommendation remains in force.

A couple more examples, but I already doubt whether they are anti-patterns or not.

ClickHouse has one well-known drawback. It does not know how to update*. In a sense, this is even good. If you have some important data, for example, accounting, then nobody will be able to alter it, because there are no updates.

* Support for update and delete in batch mode was added long ago.

But there are some special ways that allow updates to appear, as it were, in the background. For example, tables of the ReplacingMergeTree type. They perform updates during background merges. You can force this with OPTIMIZE TABLE. But do not do that too often, because it will completely rewrite the partition.
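
The idea of "updates during background merges" can be illustrated with a toy model. This is not ClickHouse code; it is a simplified Python sketch in which a table consists of parts, rows with the same key are collapsed only when parts are merged, and the row from the later part wins. The function name and the row layout are assumptions made for the illustration.

```python
def replacing_merge(parts, key=lambda row: row[0]):
    """Merge parts; for equal keys the row from the later part replaces the earlier one."""
    latest = {}
    for part in parts:          # parts are given in insertion order
        for row in part:
            latest[key(row)] = row
    return sorted(latest.values(), key=key)


# Two inserts: the second one "updates" the row with key 1.
parts = [[(1, "old"), (2, "b")], [(1, "new")]]
merged = replacing_merge(parts)
print(merged)
```

Until the merge happens, both versions of key 1 sit in separate parts and a plain read would see both, which is why such tables give updates only eventually.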

Distributed JOINs in ClickHouse: this is also handled poorly by the query planner.

Bad, but sometimes OK.

Using ClickHouse only to read the data back with SELECT *.

I would not recommend using ClickHouse for heavy computations. But this is not entirely true, because we are already moving away from this recommendation. We recently added the ability to apply machine learning models in ClickHouse: CatBoost. And it bothers me, because I think: "What a horror. How many cycles per byte that comes out to!" I feel sorry to spend clock cycles on bytes.

But do not be afraid: install ClickHouse, and everything will be fine. If anything, we have a community. By the way, the community is you. And if you have any problems, you can at least go to our chat, and I hope you will be helped.

Questions

Thanks for the talk! Where can I complain about ClickHouse crashing?

You can complain to me personally right now.

I recently started using ClickHouse. I immediately crashed the CLI interface.

What a score.

A little later, I crashed the server with a small SELECT.

You have talent.

I opened a bug on GitHub, but it was ignored.

We will see.

Alexey tricked me into attending the talk by promising to tell me how you squeeze the data inside.

Very simply.

That much I realized yesterday. More specifics, please.

There are no terrible tricks. Just block-by-block compression. The default is LZ4, and you can enable ZSTD*. Blocks are from 64 kilobytes to 1 megabyte.

* There is also support for specialized compression codecs that can be used in a chain with other algorithms.
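
Block-by-block compression can be sketched as follows. Python's standard library has no LZ4, so `zlib` is used here purely as a stand-in codec, and the 64 KB block size is the lower bound mentioned in the answer. The function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB, the lower bound mentioned above


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a column into fixed-size blocks and compress each one independently."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_byte(blocks, offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Read one byte: the whole block containing it has to be decompressed first."""
    block = zlib.decompress(blocks[offset // block_size])
    return block[offset % block_size]


column = bytes(i % 251 for i in range(300_000))  # toy column data
blocks = compress_blocks(column)
print(len(blocks), read_byte(blocks, 200_000))
```

Because blocks are independent, a reader can skip straight to the block it needs, but it can never decompress less than one block.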

Are the blocks just raw data?

Not quite raw. There are arrays. If you have a numeric column, then consecutive numbers are laid out in an array.

Understood.

Alexey, about the example with uniqExact over IPs, i.e. the fact that uniqExact takes longer to count over strings than over numbers, and so on. What if we pull a trick and cast at read time? That is, you seemed to say that on disk it does not differ much. If we read strings from disk and cast them, will the aggregates be faster or not? Or will we still gain only marginally here? It seems you tested it, but for some reason did not show it in the benchmark.

I think it will be slower than without the cast. In this case, the IP address has to be parsed from the string. Of course, in ClickHouse, IP address parsing is optimized too. We tried very hard, but still, there you have numbers written in decimal form. Very inconvenient. On the other hand, the uniqExact function will work slower on strings, not only because they are strings, but also because a different specialization of the algorithm is chosen. Strings are simply processed differently.

And if we take a more primitive data type? For example, we wrote down the user id, stored it as a string, and then cast it. Will that be more fun or not?

I doubt it. I think it will be even sadder, because after all, parsing numbers is a serious problem. It seems to me that this colleague even gave a talk about how hard it is to parse numbers in decimal form, but maybe not.

Alexey, thank you very much for the talk! And thank you very much for ClickHouse! I have a question about plans. Is there a feature in the plans for incomplete updating of dictionaries?

That is, a partial reload?

Yes, yes. Like the ability to set a MySQL field there, i.e. "updated after", so that only that data is loaded if the dictionary is very large.

A very interesting feature. And, it seems to me, somebody suggested it in our chat. Maybe it was even you.

I do not think so.

Great, now it turns out there are two requests. And we can slowly start doing it. But I want to warn you right away that this feature is quite simple to implement. That is, in theory, you just need to write the version number in the table and then write: version is less than such-and-such. And this means that, most likely, we will offer it to enthusiasts. Are you an enthusiast?

Yes, but unfortunately not in C++.

Can your colleagues write in C++?

I will find someone.

Great*.

* The feature was added two months after the talk; the author of the question developed it and submitted a pull request.

Thanks!

Hello! Thanks for the talk! You mentioned that ClickHouse consumes all the resources available to it very well. And the speaker next door from Luxoft talked about his solution for Russian Post. He said that they really liked ClickHouse, but they did not use it instead of their main competitor precisely because it ate the whole CPU. And they could not fit it into their architecture, into ZooKeeper with Docker containers. Is it possible to somehow limit ClickHouse so that it does not consume everything that becomes available to it?

Yes, it is possible and very easy. If you want it to consume fewer cores, then just write SET max_threads = 1. And that is it, it will execute the query on a single core. Moreover, you can specify different settings for different users. So no problem. And tell your colleagues from Luxoft that it is not good that they did not find this setting in the documentation.
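
Besides a session-level SET, the same limit can be attached to an individual query. The sketch below assumes the `SETTINGS` clause of a ClickHouse SELECT is available in your server version; the helper `with_settings` and the table `hits` are hypothetical, and the helper only handles plain numeric values.

```python
def with_settings(query: str, **settings) -> str:
    """Append a SETTINGS clause so the limits apply to this query only."""
    clause = ", ".join(f"{name} = {value}" for name, value in settings.items())
    # Drop trailing whitespace and a trailing semicolon before appending.
    return f"{query.rstrip().rstrip(';')} SETTINGS {clause}"


limited = with_settings("SELECT count() FROM hits;", max_threads=1)
print(limited)
```

Per-user limits, as mentioned in the answer, are a separate mechanism configured on the server and are not shown here.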

Alexey, hello! I would like to ask this question. This is not the first time I have heard that many people are starting to use ClickHouse as storage for logs. In the talk, you said not to do this, that is, not to store long strings. What do you think about it?

First, logs are usually not long strings. There are, of course, exceptions. For example, some service written in Java throws an exception, and it gets logged. And so on in an infinite loop, until the hard disk space runs out. The solution is very simple. If the lines are very long, cut them. What does long mean? Tens of kilobytes is bad*.

* In recent versions of ClickHouse, "adaptive index granularity" is enabled, which for the most part eliminates the problem of storing long strings.
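
Cutting long lines before they are inserted could look like the following sketch. The 1024-byte limit is an assumption for the example, chosen to sit well below the "tens of kilobytes" called bad above; the function name is hypothetical.

```python
def truncate_utf8(line: str, max_bytes: int = 1024) -> str:
    """Cut a log line to at most `max_bytes` bytes of UTF-8 without splitting a character."""
    encoded = line.encode("utf-8")
    if len(encoded) <= max_bytes:
        return line
    # errors="ignore" drops a multi-byte character that was cut in half at the end.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


stack_trace = "java.lang.IllegalStateException: " + "x" * 50_000
print(len(truncate_utf8(stack_trace)))
```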

Is a kilobyte normal?

It is normal.

Hello! Thanks for the talk! I already asked about this in the chat, but I do not remember whether I got an answer. Are there any plans to extend the WITH section in the manner of CTE?

Not yet. Our WITH section is somewhat frivolous. For us it is like a small feature.

I see. Thanks!

Thanks for the talk! Very interesting! A global question. Is it planned to implement, perhaps through some kind of stubs, modification and deletion of data?

Definitely. This is the first task in our queue. We are now actively thinking about how to do everything right. And we should start pressing the keys on the keyboard*.

* The keys on the keyboard were pressed and everything was done.

Will it somehow affect system performance or not? Will insertion be as fast as it is now?

Perhaps the deletes themselves and the updates themselves will be very heavy, but this will not affect the performance of SELECTs or the performance of INSERTs in any way.

And one more small question. In the presentation, you talked about the primary key. Accordingly, we have partitioning, which is monthly by default, right? And when we set a date range that fits within one month, only that partition is read, right?

Yes.
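
To make the monthly partitioning in this exchange concrete, here is a small Python sketch that lists which monthly partitions a date range touches. It models only the idea from the question (one partition per calendar month, named YYYYMM); the function name is hypothetical and this is not how ClickHouse itself is implemented.

```python
from datetime import date


def monthly_partitions(start: date, end: date) -> list[str]:
    """List the YYYYMM partitions that the closed date range [start, end] touches."""
    parts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        parts.append(f"{year}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return parts


print(monthly_partitions(date(2017, 3, 5), date(2017, 3, 20)))
print(monthly_partitions(date(2017, 12, 25), date(2018, 1, 3)))
```

A range inside one month maps to a single partition, while a range that crosses a month boundary touches two, even if it is only a few days long.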

A question. If we cannot pick any primary key, is it right to build it on the "Date" field, so that in the background there is less re-sorting of this data and it is laid out in a more orderly way? If you do not have range queries and cannot pick any primary key at all, is it worth putting the date in the primary key?

Yes.

Perhaps it makes sense to put into the primary key a field by which the data will compress better if sorted by that field. For example, user ID. A user, for example, keeps going to the same site. In that case, put the user id and the time. Then your data will compress better. As for the date, if you really do not have and never will have range queries on dates, then you may leave the date out of the primary key.
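
The claim that sorted data compresses better can be checked with a toy experiment. This sketch uses row-oriented text and `zlib` as stand-ins (ClickHouse is columnar and uses LZ4 by default), and the data set, 200 users who each always visit the same site, is made up for the illustration.

```python
import random
import zlib

random.seed(0)

# Hypothetical event log: 200 users, 50 events each, every user sticks to one site.
events = [(user, f"site-{user % 20}.example") for user in range(200) for _ in range(50)]
random.shuffle(events)  # arrival order mixes the users together


def compressed_size(rows) -> int:
    """Serialize rows as tab-separated text and return the compressed size in bytes."""
    payload = "\n".join(f"{user}\t{site}" for user, site in rows).encode()
    return len(zlib.compress(payload))


unsorted_size = compressed_size(events)
sorted_size = compressed_size(sorted(events))  # as if user id led the primary key
print(unsorted_size, sorted_size)
```

Sorting by user id puts identical values next to each other, which is exactly what a general-purpose compressor exploits.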

OK, thank you very much!

source: www.habr.com
