Theory and practice of using ClickHouse in real applications. Alexander Zaitsev (2018)

Even though data is now plentiful almost everywhere, analytical databases are still fairly exotic. They are poorly known, and even fewer people know how to use them effectively. Many keep "eating the cactus" with MySQL or PostgreSQL, which were designed for other scenarios, struggle with NoSQL, or overpay for commercial solutions. ClickHouse changes the rules of the game and significantly lowers the barrier to entry into the world of analytical DBMSs.

This is a talk from BackEnd Conf 2018, published with the speaker's permission.


Who am I and why am I talking about ClickHouse? I am the director of development at LifeStreet, which uses ClickHouse. I am also the founder of Altinity, a Yandex partner that promotes ClickHouse and helps Yandex make ClickHouse more successful. I am also happy to share knowledge about ClickHouse.

And I am not Petya Zaitsev's brother. I get asked about this a lot. No, we are not brothers.

"Everyone knows" that ClickHouse is:

  • Very fast,
  • Very convenient,
  • Used at Yandex.

Much less is known about which companies use it and how.

I will tell you why, where, and how ClickHouse is used outside Yandex.

I will show how specific problems are solved with ClickHouse at different companies, which ClickHouse tools you can use for your own tasks, and how those tools were used at different companies.

I picked three examples that show ClickHouse from different angles. I think it will be interesting.

The first question is: "Why do we need ClickHouse?" It seems like an obvious question, but there is more than one answer.

  • The first answer is performance. ClickHouse is very fast, and analytics on ClickHouse is very fast too. It can often be used where something else is slow or works poorly.
  • The second answer is cost, and first of all the cost of scaling. Vertica, for example, is an excellent database. It works very well as long as you do not have many terabytes of data. But once you get to hundreds of terabytes or petabytes, license and support costs add up to a substantial sum. That is expensive. ClickHouse is free.
  • The third answer is operating cost. This is a slightly different angle. RedShift is a great analogue. On RedShift you can get a solution running very quickly. It will work well, but every hour, every day, and every month you will pay Amazon quite a lot, because it is a very expensive service. The same goes for Google BigQuery. Anyone who has used it knows that you can run a few queries and suddenly get a bill for hundreds of dollars.

ClickHouse does not have these problems.

Where is ClickHouse used today? Besides Yandex, ClickHouse is used in a whole range of different businesses and companies.

  • First of all, web application analytics, i.e. the use case that came from Yandex.
  • Many AdTech companies use ClickHouse.
  • Numerous companies that need to analyze operational logs from different sources.
  • Several companies use ClickHouse to monitor security logs. They load them into ClickHouse, build reports, and get the results they need.
  • Companies are starting to use it for financial analysis, i.e. big business is gradually coming to ClickHouse as well.
  • Cloudflare. If you follow ClickHouse, you have probably heard of this company. It is one of the most significant contributors from the community, and it runs a very serious ClickHouse installation. For example, they built the Kafka engine for ClickHouse.
  • Telecom companies have started using it. Several companies use ClickHouse either as a proof of concept or already in production.
  • One company uses ClickHouse to monitor production processes. They test microchips and record a bunch of parameters, about 2,000 characteristics. Then they analyze whether a batch is good or bad.
  • Blockchain analytics. There is a Russian company called Bloxy.info that analyzes the Ethereum network. They built that on ClickHouse too.

And size does not matter. Many companies use a single small server, and it lets them solve their problems. Even more companies use large clusters of many servers or dozens of servers.

And if you look at the records:

  • Yandex: 500+ servers, storing 25 billion records a day.
  • LifeStreet: 60 servers, about 75 billion records a day. Fewer servers, more records than Yandex.
  • CloudFlare: 36 servers, storing 200 billion records a day. Even fewer servers and even more data.
  • Bloomberg: 102 servers, about a trillion records a day. The record holder.

Geographically, the spread is also wide. This map is a heatmap of where ClickHouse is used around the world. Russia, China, and America stand out clearly. There are a few European countries. And there are 4 clusters.

This is a comparative analysis, so there is no point in looking for absolute numbers. It is an analysis of visitors who read English-language materials on the Altinity website, because there are no Russian-language materials there. Even so, Russia, Ukraine, and Belarus, i.e. the Russian-speaking part of the community, are the most numerous users. Then come the USA and Canada. China is catching up fast. Six months ago there was almost no China, and now China has already overtaken Europe and keeps growing. Old Europe is not far behind either, and the leader in ClickHouse usage there is, oddly enough, France.

Why am I saying all this? To show that ClickHouse is becoming a standard solution for big data analytics and is already used in many places. If you use it, you are on trend. If you do not use it yet, you need not fear being left alone with nobody to help you, because many people are already doing this.

These are examples of real ClickHouse use at several companies.

  • The first example is an ad network: a migration from Vertica to ClickHouse. I know of a few companies that have moved off Vertica or are in the process of moving.
  • The second example is transactional storage on ClickHouse. This example is built on antipatterns. Everything the developers advise against doing in ClickHouse is done here, and it is done so effectively that it works. It works much better than a typical transactional solution.
  • The third example is distributed computing on ClickHouse. There was a question about how ClickHouse can be integrated into the Hadoop ecosystem. I will show how one company built something like a map-reduce container on ClickHouse, keeping track of data locality and so on, to compute a very non-trivial task.

  • LifeStreet is an AdTech company with all the technology that comes with an ad network.
  • It does ad optimization and programmatic bidding.
  • A lot of data: about 10 billion events a day, and each event can be split into several sub-events.
  • There are many consumers of this data, and they are not only people. Far more of them are various algorithms that do programmatic bidding.

The company has come a long and thorny way. I talked about it at HighLoad. First LifeStreet moved from MySQL (with a brief stop at Oracle) to Vertica. You can find that story online.

Everything was very good, but it quickly became clear that the data was growing and Vertica was expensive. So we looked for alternatives. Some of them are listed here. In fact, we did a proof of concept, and sometimes performance testing, of almost every database that was available on the market from 2013 to 2016 and was roughly suitable in terms of functionality. I talked about some of them at HighLoad as well.

The task was first of all to migrate off Vertica, because the data was growing. It grew very fast for several years. Then it plateaued, but still. Forecasting that growth, and given the business requirements for how much data needed some kind of analytics, it was clear that we would soon be talking about petabytes. Paying for petabytes is very expensive, so we were looking for an alternative to move to.

Where to go? For a long time it was completely unclear, because on the one hand there are commercial databases, and they seem to work well. Some work almost as well as Vertica, some worse. But they are all expensive; nothing cheaper and better could be found.

On the other hand, there are open source solutions, and there are not many of them: for analytics you can count them on your fingers. They are free or cheap, but slow, and they often lack necessary and useful functionality.

And there was nothing that combined the good qualities of commercial databases with everything free that open source has to offer.

There was nothing until, unexpectedly, Yandex pulled ClickHouse out like a magician pulling a rabbit out of a hat. It was an unexpected decision, and people still ask "Why?", but nevertheless.

Right away, in the summer of 2016, we started looking at what ClickHouse is. It turned out that it can sometimes be faster than Vertica. We tested different scenarios on different queries. If a query used only one table, i.e. with no joins at all, ClickHouse was twice as fast as Vertica.

I was not too lazy to look at Yandex's own tests the other day. It is the same there: ClickHouse is twice as fast as Vertica, so they often talk about it.

But if a query has joins, the picture is not so clear-cut. ClickHouse can be twice as slow as Vertica. If you tweak and rewrite the query a bit, they come out roughly equal. Not bad. And free.

Having obtained the test results and looked at them from different angles, LifeStreet moved to ClickHouse.

This was 2016, I remind you. It was like the joke about the mice that cried and pricked themselves but kept eating the cactus. This was described in detail before, there is a video about it, and so on.

So I will not go into the details. I will only talk about the results and a few interesting things I did not cover back then.

The results are:

  • A successful migration, and the system has been running in production for over a year.
  • Performance and flexibility have increased. From the 10 billion records a day that we could afford to store, and only for a short time, LifeStreet now stores 75 billion records a day and can keep them for 3 months or more. At peak that is up to a million events per second. More than a million SQL queries a day hit this system, mostly from various robots.
  • Even though more servers are used for ClickHouse than were used for Vertica, we saved on hardware too, because Vertica ran on fairly expensive SAS disks and ClickHouse runs on SATA. Why? Because in Vertica an insert is synchronous. Synchronization requires that the disks do not slow down too much and that the network does not slow down too much either, i.e. it is a fairly expensive operation. In ClickHouse an insert is asynchronous. Moreover, you can always write everything locally at no extra cost, so data can be inserted into ClickHouse much faster than into Vertica, even on slower disks. Reads are about the same. Reads from SATA disks in RAID are fast enough.
  • No license limits, i.e. 3 petabytes of data on 60 servers (20 servers make up one replica) and 6 trillion records in facts and aggregates. Nothing like that would have been affordable on Vertica.

Now I turn to the practical takeaways from this example.

  • The first is an efficient schema. A lot depends on the schema.
  • The second is generating efficient SQL.

A typical OLAP query is a select. Some of the columns go into group by, and some go into aggregate functions. There is a where clause, which can be thought of as a slice of a cube. The whole group by can be thought of as a projection. That is why it is called multidimensional data analysis.

This is often modeled as a star schema, with a central fact and the characteristics of that fact around it, along the rays.

In terms of physical design, i.e. how this maps onto tables, people usually use a normalized representation. You can denormalize, but that is expensive on disk and not very efficient for queries. So the usual choice is a normalized representation, i.e. a fact table and many dimension tables.

But that does not work well in ClickHouse. There are two reasons:

  • The first is that ClickHouse does not have very good joins, i.e. joins exist, but they are bad. For now.
  • The second is that tables are not updated. Usually something in those dimension tables around the star needs to change: a customer name, a company name, and so on. And that does not work.

There is a way out of this in ClickHouse. Two, in fact:

  • The first is to use dictionaries. External dictionaries are what solves 99% of the problem with the star schema, with updates, and so on.
  • The second is to use arrays. Arrays also help you get rid of joins and of normalization problems.

What dictionaries give you:

  • No join is needed.
  • They are updatable. Since March 2018 there has been an undocumented feature (you will not find it in the documentation) for updating dictionaries partially, i.e. only the entries that have changed. In practice it behaves like a table.
  • They are always in memory, so a join against a dictionary is faster than a join against a table that sits on disk and may or may not be in the cache, most likely not.

What arrays give you:

  • No joins are needed either.
  • They are a compact 1-to-many representation.
  • And, in my opinion, arrays are made for geeks: lambda functions and so on.

That is not just a figure of speech. It is very powerful functionality that lets you do many things in a very simple and elegant way.

Typical problems that arrays help solve. These examples are simple and clear enough:

  • Search by tags. Say you have hashtags and want to find posts by hashtag.
  • Search by key-value pairs. These are also attributes that have values.
  • Storing lists of keys that you need to translate into something else.

All of these problems can be solved without arrays. Tags can be stored in a string and selected with a regular expression, or kept in a separate table, but then you have to do joins.

In ClickHouse you do not need any of that. It is enough to declare an array of strings for the hashtags, or a nested structure for key-value data.

"Nested structure" may not be the best name. It is two arrays that share a common prefix in their names and hold related values.

Searching by tag is very simple. There is the has function, which checks whether an array contains an element. That is it: we have found all the records related to our conference.

Searching by subid is a bit more involved. First we find the index of the key, then we take the element at that index and check that its value is the one we need. Still, it is very simple and compact.

The regular expression you would have to write if you kept all of this in a single string would be, first, clumsy, and second, much slower than the two arrays.
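
The two lookups described above can be modeled in a few lines of Python. This is only a rough sketch of the logic, not ClickHouse code: the row layout and the names `tags`, `params_key`, and `params_value` are made up for illustration, and `has` here is a plain Python function that mimics an array-membership check.

```python
# Hypothetical row: a tag array plus a "nested structure" stored as two
# parallel arrays that share the prefix "params" (all names are illustrative).
row = {
    "tags": ["clickhouse", "backendconf", "olap"],
    "params_key": ["subid", "campaign"],
    "params_value": ["abc123", "summer"],
}


def has(array, element):
    """Array-membership check: True if element occurs in array."""
    return element in array


def nested_value(keys, values, key):
    """Find the position of key in keys, then return the value at that position."""
    if key not in keys:
        return None
    return values[keys.index(key)]


# Search by tag: one membership test.
tag_match = has(row["tags"], "backendconf")

# Search by subid: locate the key, read the value at the same index, compare.
subid_match = nested_value(row["params_key"], row["params_value"], "subid") == "abc123"
```

The point of the parallel-array layout is that the key lookup and the value lookup are both simple index operations, with no string parsing involved.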

Another example. You have an array in which you store IDs, and you can translate them into names with the arrayMap function. It is a typical lambda function: you pass a lambda expression to it, and it pulls the name for each ID out of a dictionary.

Search can be done the same way. You pass a predicate function that checks which elements match.
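
As a rough Python analogue of the idea (not ClickHouse syntax), mapping a lambda over an array of IDs and filtering with a predicate look like this. The dictionary contents and the variable names are hypothetical.

```python
# Hypothetical in-memory dictionary: ID -> name.
names_by_id = {1: "alpha", 2: "beta", 3: "gamma"}
ids = [3, 1, 2]

# Map step: apply a lambda to every element; here the lambda is a dictionary lookup.
to_name = lambda i: names_by_id.get(i, "")
names = [to_name(i) for i in ids]

# Search step: pass a predicate and keep only the elements that satisfy it.
starts_with_g = lambda name: name.startswith("g")
matches = [n for n in names if starts_with_g(n)]
```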

These things simplify the schema a lot and solve a whole range of problems.

The next problem we ran into, and one I want to mention, is efficient queries.

  • ClickHouse has no query planner. None at all.
  • Nevertheless, complex queries still need to be planned. In which cases?
  • If the query has several joins, which you wrap in subselects. The order in which they are executed matters.
  • And second, if the query is distributed. In a distributed query only the innermost subselect is executed in a distributed way, and everything else is sent to the one server you connected to and executed there. So if you have distributed queries with many joins, you have to choose the order.

Even in simpler cases you sometimes have to do the planner's job and rewrite queries a little.

Here is an example. On the left is a query that shows the top 5 countries. It takes 2.5 seconds, I think. On the right is the same query, slightly rewritten. Instead of grouping by a string, we group by the key (an int), which is faster, and then attach the dictionary to the result. Instead of 2.5 seconds the query takes 1.5 seconds. That is good.
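
The rewrite can be pictured with a small Python sketch: aggregate on the integer key first, cut down to the top groups, and only then translate keys to names. The data, the dictionary, and the function name `top_countries` are hypothetical and only illustrate the order of operations, not the actual query.

```python
from collections import Counter

# Hypothetical dictionary: integer country key -> country name.
country_names = {1: "Russia", 2: "USA", 3: "China", 4: "France", 5: "Germany", 6: "Canada"}

# Hypothetical fact rows: each row carries only the integer key.
rows = [1, 2, 1, 3, 1, 2, 4, 5, 6, 3, 1, 2]


def top_countries(country_keys, names, limit=5):
    # Step 1: aggregate on the compact integer key.
    counts = Counter(country_keys)
    # Step 2: keep only the largest groups.
    top = counts.most_common(limit)
    # Step 3: look up names for the few surviving groups only.
    return [(names[key], count) for key, count in top]


result = top_countries(rows, country_names)
```

With this ordering the name lookup runs once per result row rather than once per input row, which is the same reason the rewritten query is cheaper.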

A similar example with rewriting filters. Here is a query for Russia. It runs for 5 seconds. If we rewrite it so that, again, we compare not a string but numbers against the set of keys that correspond to Russia, it becomes much faster.

There are many tricks like this. They let you significantly speed up queries that you think already run fast or, on the contrary, that run slowly. They can be made even faster.

The main techniques:

  • Do as much work as possible in distributed mode.
  • Sort by the smallest possible types, as I did with ints.
  • If there are joins or dictionaries, do them last, when the data is already at least partially grouped. Then the join or the dictionary call is invoked fewer times and runs faster.
  • Replace filters.

There are other techniques too, not only the ones I showed. All of them can sometimes speed up query execution significantly.

Let us move on to the next example. Company X from the USA. What does it do?

It had the following task:

  • Offline linking of advertising transactions.
  • Modeling different attribution models.

What is the scenario?

An ordinary visitor comes to a site, say, 20 times a month from different ads, and sometimes comes with no ad at all because he remembers the site. He looks at some products, puts them in the cart, takes them out of the cart. And in the end he buys something.

Reasonable questions: "Who should be paid for the advertising, if anyone?" and "Which ad influenced him, if any?" In other words, why did he buy, and how do we get people like him to buy as well?

To solve this problem you need to link the events that happen on the website in the right way, i.e. somehow build the relationships between them. Then they are sent to the DWH for analysis. And based on that analysis you build models of whom to show which ads.

An advertising transaction is a set of related user events that starts with an ad impression, then something happens, then maybe a purchase, and then there may be purchases within the purchase. For example, if it is a mobile app or a mobile game, installing the app is usually free, but doing something inside it may cost money. The more a person spends in the app, the more valuable he is. But to see that you have to link everything together.

There are many attribution models.

The most popular are:

  • Last Interaction, where the interaction is either a click or an impression.
  • First Interaction, i.e. the first thing that brought the person to the site.
  • Linear combination: all interactions count equally.
  • Decay.
  • And so on.
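
A minimal Python sketch of the four models listed above, to make the differences concrete. It assumes a touch is an `(ad, timestamp)` pair and that credit is a share between 0 and 1; the function names, the data shape, and the 7-day half-life in the decay model are assumptions made for illustration, not details from the talk.

```python
from datetime import datetime, timedelta


def last_interaction(touches):
    """All credit goes to the most recent touch."""
    last = max(touches, key=lambda t: t[1])
    return {last[0]: 1.0}


def first_interaction(touches):
    """All credit goes to the earliest touch."""
    earliest = min(touches, key=lambda t: t[1])
    return {earliest[0]: 1.0}


def linear(touches):
    """Every touch gets an equal share."""
    share = 1.0 / len(touches)
    credit = {}
    for ad, _ in touches:
        credit[ad] = credit.get(ad, 0.0) + share
    return credit


def time_decay(touches, purchase_time, half_life=timedelta(days=7)):
    """A touch's weight halves for every half_life that separates it from the purchase."""
    weights = [(ad, 0.5 ** ((purchase_time - ts) / half_life)) for ad, ts in touches]
    total = sum(w for _, w in weights)
    credit = {}
    for ad, w in weights:
        credit[ad] = credit.get(ad, 0.0) + w / total
    return credit
```

All four take the same linked list of touches as input, which is why the linking step has to be solved first regardless of which model is applied afterwards.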

How did it all work originally? There was Runtime and there was Cassandra. Cassandra was used as the transaction store, i.e. all related transactions were kept in it. When an event arrived in Runtime, for example a page view or something else, a request was made to Cassandra: is there such a person or not? Then the transactions related to that person were fetched, and the linking was done.

If you are lucky and the request carries a transaction id, it is easy. But usually you are not lucky. So you had to find the last transaction, or the transaction with the last click, and so on.

It all worked very well as long as attribution was to the last click. Say there are 10 million clicks a day, which is 300 million a month if we set a one-month window. And since with Cassandra all of it has to be in memory to be fast, because Runtime needs to respond quickly, it took about 10-15 servers.

When they wanted to attribute transactions to impressions, it immediately became much less fun. Why? Because about 30 times more events need to be stored, and so you need 30 times more servers. That turns out to be an astronomical figure. Keeping up to 500 servers just to do the linking, while Runtime itself has far fewer servers, is simply the wrong number. So they started thinking about what to do.

And they went to ClickHouse. How do you do this on ClickHouse? At first glance it looks like a collection of antipatterns.

  • A transaction grows: we keep attaching more and more events to it, i.e. it is mutable, and ClickHouse does not work very well with mutable objects.
  • When a visitor comes to us, we need to pull his transactions by key, by his visit id. That is a point query, and people do not do those in ClickHouse. Usually ClickHouse does large …scans, but here we need to fetch a few records. Also an antipattern.
  • On top of that, the transaction was in json, and they did not want to rewrite that, so they wanted to store the json unstructured and pull things out of it when needed. That is an antipattern too.

In other words, a collection of antipatterns.

And yet it turned out to be a system that worked very well.

What was done? ClickHouse was introduced, and the logs, split into records, were dumped into it. An attribution service was introduced that read the logs from ClickHouse. Then, for each record, by visit id, it fetched the transactions that might not have been processed yet, plus the snapshots, i.e. transactions that were already linked, the result of earlier work. It applied its logic to them, chose the right transaction, and attached the new events. It wrote that to the log again, and the log went back into ClickHouse, i.e. it is a continuously cycling system. In addition, the data went to the DWH to be analyzed there.

In exactly that form it did not work very well. To make it easier on ClickHouse, when requests came in by visit id, they grouped those requests into blocks of 1,000-2,000 visit ids and pulled all the transactions for 1,000-2,000 people at once. And then it all worked.
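
The batching step can be sketched in Python as follows. This only illustrates grouping lookups into blocks; `run_query` is a hypothetical callable standing in for whatever issues the actual ClickHouse query for a list of visit ids, and the default block size of 1,000 is just one value from the 1,000-2,000 range mentioned above.

```python
def batches(visit_ids, batch_size=1000):
    """Yield consecutive blocks of at most batch_size visit ids."""
    for start in range(0, len(visit_ids), batch_size):
        yield visit_ids[start:start + batch_size]


def fetch_transactions(visit_ids, run_query, batch_size=1000):
    """Issue one lookup per block instead of one lookup per visit id.

    run_query is a hypothetical callable: it takes a list of visit ids and
    returns the list of transactions for all of them.
    """
    results = []
    for block in batches(visit_ids, batch_size):
        results.extend(run_query(block))
    return results
```

For 2,500 visit ids this issues 3 queries instead of 2,500, which trades many tiny point lookups for a few larger reads, a pattern that suits a scan-oriented store better.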

If you look inside ClickHouse, there are only 3 main tables that serve all of this.

The first is the table the logs are loaded into, and they are loaded almost without processing.

The second table. Through a materialized view, the events that have not been attributed yet, i.e. the unlinked ones, are carved out of those logs. And through another materialized view, transactions are pulled out of those logs to build the snapshot. That is, a dedicated materialized view builds the snapshot, namely the last accumulated state of each transaction.

Here is the SQL. I would like to comment on a few important things in it.

The first important thing is the ability to pull columns and fields out of json in ClickHouse. ClickHouse has some methods for working with json. They are very, very primitive.

visitParamExtractInt lets you pull attributes out of json, i.e. the first match wins. This way you can extract the transaction id or the visit id. That is the first point.
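
To make "the first match wins" concrete, here is a rough Python model of that kind of primitive extraction. It is a sketch of the idea only, not the ClickHouse implementation: it assumes the JSON has no spaces around the colon and that a missing or non-numeric field yields 0, and the function name `extract_int` is hypothetical.

```python
DIGITS = "0123456789"


def extract_int(json_text, name):
    """Return the integer that follows the first '"name":' in json_text, or 0.

    The text is scanned as a plain string; nesting is ignored, so the first
    occurrence of the field at any depth is the one that is used.
    """
    marker = '"' + name + '":'
    start = json_text.find(marker)
    if start == -1:
        return 0
    pos = start + len(marker)
    end = pos
    # Optional minus sign, then a run of decimal digits.
    if end < len(json_text) and json_text[end] == "-":
        end += 1
    while end < len(json_text) and json_text[end] in DIGITS:
        end += 1
    digits = json_text[pos:end]
    return int(digits) if digits not in ("", "-") else 0
```

Because nothing is parsed, this style of extraction is cheap, but it only works when the field name is unambiguous in the document.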

Second, a materialized field is used here. What does that mean? It means you cannot insert into it, i.e. it is not inserted; it is computed and stored at insert time. On insert, ClickHouse does the work for you, and what you will need later has already been extracted from the json.

In this case the materialized view is over the raw rows, and the first table, with the practically raw logs, is simply used as the source. What does the view do? First, it changes the sorting, i.e. the sort order is now by visit id, because we need to pull the transactions for a specific person quickly.

The second important thing is index_granularity. If you have seen MergeTree, the default index_granularity is usually 8192. What is it? It is the sparseness parameter of the index. In ClickHouse the index is sparse; it never indexes every record. It does so once every 8,192 records. That is good when you need to process a lot of data, but bad when you need only a little, because the overhead is large. If we reduce the index granularity, we reduce the overhead. You cannot reduce it all the way to one, because there may not be enough memory: the index is always kept in memory.
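
The trade-off can be put into numbers with a back-of-the-envelope Python sketch. The table size of one billion rows and the alternative granularity of 256 are hypothetical values chosen for illustration; the talk does not say which value was actually used. The read estimate is a best case that assumes the matching rows sit next to each other.

```python
def index_marks(total_rows, index_granularity):
    """Number of sparse index entries (one per granule) kept in memory."""
    return -(-total_rows // index_granularity)  # integer ceiling division


def rows_read_for_lookup(matching_rows, index_granularity):
    """Best-case rows read: whole granules are read even for a single row."""
    return -(-matching_rows // index_granularity) * index_granularity


total_rows = 1_000_000_000  # hypothetical table size

# (index entries in memory, rows read to fetch 10 adjacent rows)
default_cost = (index_marks(total_rows, 8192), rows_read_for_lookup(10, 8192))
small_cost = (index_marks(total_rows, 256), rows_read_for_lookup(10, 256))
```

Under these assumptions, going from 8192 to 256 cuts the rows read for a small lookup from 8,192 to 256, while the index grows from about 122 thousand entries to about 3.9 million. That growth is the memory limit on how far the granularity can be reduced.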

The snapshot also uses some other interesting ClickHouse features.

First, it is an AggregatingMergeTree. The AggregatingMergeTree stores argMax, i.e. the state of the transaction that corresponds to the latest timestamp. Transactions are generated all the time for a given visitor. We take the latest state of a transaction, add an event, and get a new state, which goes into ClickHouse again. Through argMax in this materialized view we can always get the current state.
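
What "the state with the latest timestamp" means can be shown with a small Python sketch. It models only the selection logic, keeping for each transaction the state whose timestamp is greatest; the row shape `(transaction_id, timestamp, state)` and the function name `latest_states` are assumptions for illustration.

```python
def latest_states(rows):
    """rows: iterable of (transaction_id, timestamp, state).

    Returns {transaction_id: state}, where state is the one that has the
    greatest timestamp for that transaction.
    """
    best = {}
    for tx_id, ts, state in rows:
        if tx_id not in best or ts > best[tx_id][0]:
            best[tx_id] = (ts, state)
    return {tx_id: state for tx_id, (ts, state) in best.items()}
```

Because every new state is appended rather than updated in place, the mutable transaction is represented by immutable rows, and the current state is always recoverable by this selection.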

The results:

  • The linking is "decoupled" from Runtime.
  • Up to 3 billion transactions a month are stored and processed. That is an order of magnitude more than in Cassandra, i.e. in a typical transactional system.
  • A cluster of 2x5 ClickHouse servers: 5 servers, each with a replica. That is even fewer than Cassandra needed for click-based attribution, and here we have impression-based attribution. So instead of increasing the number of servers 30-fold, they managed to reduce it.

And the last example is financial company Y, which analyzed correlations between changes in stock prices.

The task was:

  • There are about 5,000 stocks.
  • Quotes are known for every 100 milliseconds.
  • The data has been accumulated over 10 years. Apparently more for some companies and less for others.
  • About 100 billion rows in total.

And the correlation of the changes had to be computed.

Anan akwai hannun jari guda biyu da maganganunsu. Idan ɗaya ya hau ɗayan kuma ya hau, to wannan haɗin gwiwa ne mai kyau, watau ɗaya ya hau, ɗayan kuma ya hau. Idan ɗaya ya hau, kamar a ƙarshen jadawali, ɗayan kuma ya sauka, to wannan dangantaka mara kyau ce, watau lokacin da ɗaya ya tashi, ɗayan ya faɗi.

By analyzing these mutual changes, one can make predictions in the financial market.


But the task is hard. What does it take? We have 100 billion records, each with a time, a stock, and a price. First we need to compute, 100 billion times, the runningDifference of the logarithm of the price. runningDifference is a ClickHouse function that computes the difference between two consecutive rows.
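As a rough illustration of this step, here is a minimal Python sketch of a running difference over log prices for a single stock. The function name `log_returns` and the sample prices are assumptions for the example; the sketch simply omits the first row, which has no predecessor.

```python
import math
from typing import List


def log_returns(prices: List[float]) -> List[float]:
    """Difference between the logs of consecutive prices of one stock."""
    logs = [math.log(p) for p in prices]
    # Pair each value with its predecessor and subtract.
    return [curr - prev for prev, curr in zip(logs, logs[1:])]


# Hypothetical quotes of one stock, ordered by time.
returns = log_returns([100.0, 110.0, 99.0])
```

A convenient property of log returns is that they add up: the sum over a period equals the log of the ratio between the last and the first price.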

After that we need to compute the correlation, and it has to be computed for every pair of stocks. For 5,000 stocks that is 12.5 million pairs. That is a lot: the correlation function has to be evaluated 12.5 million times.
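The pair count follows from the number of unordered pairs among n items, n(n-1)/2. A quick check in Python, assuming exactly 5,000 stocks (the talk gives only an approximate figure):

```python
import math
from itertools import combinations

n_stocks = 5000  # assumed exact value for the "about 5,000 stocks"
n_pairs = math.comb(n_stocks, 2)  # n * (n - 1) / 2 unordered pairs

# The same enumeration on a tiny example with made-up tickers.
small_pairs = list(combinations(["A", "B", "C"], 2))
```

With these assumptions `n_pairs` is 12,497,500, which the talk rounds to 12.5 million.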

And in case anyone has forgotten, x̄ and ȳ are the sample means. That is, we need to compute not only roots and sums, but also one more sum inside those sums. This whole pile of calculations has to be done 12.5 million times, and grouped by hour as well. And we have a lot of hours, too. And it all has to be done in 60 seconds. Just kidding.
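For readers who want the formula spelled out, here is a minimal sketch of the sample Pearson correlation, where `mean_x` and `mean_y` play the role of x̄ and ȳ. It is a textbook implementation written for illustration, not the query used in the project.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


rising_together = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
moving_apart = pearson([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```

Two series that rise together give a coefficient close to 1, and series that move in opposite directions give a coefficient close to -1.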


It had to be done in at least some reasonable time, because before ClickHouse came along all of this ran very, very slowly.


They tried to compute it on Hadoop, on Spark, on Greenplum. All of it was either slow or expensive. That is, it could be computed somehow, but then it was expensive.


Then ClickHouse came along and things got much better.

Let me remind you that we have a problem with data locality, because correlations cannot be localized. We cannot put part of the data on one server and part on another and then compute; every server must have all the data.

What did they do? Initially the data is localized. Each server stores price data for a certain subset of stocks, and the subsets do not overlap. So logReturn can be computed in parallel and independently; all of this happens in a parallel, distributed fashion.

Then they decided to shrink this data without losing any of its expressiveness. They shrank it with arrays: for each point in time, build an array of stocks and an array of prices. The data then takes up much less space and is somewhat easier to work with. These are almost parallel operations, i.e. we read part of the data in parallel and then write it to a server.
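The reshaping into arrays can be pictured as a group-by on the timestamp that collects two parallel arrays. Below is a minimal Python sketch under assumed names: the row layout `(timestamp, stock, value)` and the function `pack_by_time` are illustrative, not taken from the project.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (timestamp, stock, value).
Row = Tuple[int, str, float]


def pack_by_time(rows: Iterable[Row]) -> Dict[int, Tuple[List[str], List[float]]]:
    """Collect, for every timestamp, parallel arrays of stocks and values."""
    grouped: Dict[int, Tuple[List[str], List[float]]] = defaultdict(lambda: ([], []))
    for ts, stock, value in rows:
        stocks, values = grouped[ts]
        # Appending to both lists together keeps the positions aligned.
        stocks.append(stock)
        values.append(value)
    return dict(grouped)


rows = [(1, "A", 0.1), (1, "B", -0.2), (2, "A", 0.0)]
packed = pack_by_time(rows)
```

One wide row per timestamp replaces many narrow rows, which is where the space saving described above comes from.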

After that, the data can be replicated. The letter "r" means that we have replicated this data. That is, we have the same data on all three servers: these are the arrays.

Then a special script splits this set of 12.5 million correlations into batches: 2,500 tasks with 5,000 pairs each. Each task is computed on a specific ClickHouse server. The server has all the data, because the data is the same everywhere, and it can work through the pairs one after another.
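The batching script itself is not shown in the talk. A minimal sketch of the idea, with an assumed helper `batch_pairs` and made-up ticker names, could look like this:

```python
import math
from itertools import combinations, islice
from typing import Iterator, List, Sequence, Tuple


def batch_pairs(symbols: Sequence[str], batch_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield all unordered pairs of symbols in chunks of at most batch_size."""
    pairs = combinations(symbols, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


# Tiny demonstration: 10 symbols give 45 pairs, split into tasks of 10.
symbols = [f"S{i}" for i in range(10)]
batches = list(batch_pairs(symbols, 10))

# Task count for the sizes quoted in the talk, assuming exactly 5,000 stocks.
task_count = math.ceil(math.comb(5000, 2) / 5000)
```

Each batch can then be sent to any server, since every server holds the full replicated arrays.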


Once again, this is what it looks like. First we have all the data in this structure: time, stock, price. Then we computed logReturn, i.e. data of the same structure, but with logReturn instead of the price. Then the data was reshaped, i.e. we got the time plus groupArray over stocks and prices. Then it was replicated. After that we generated a bunch of tasks and fed them to ClickHouse to compute. And it works.


For the proof of concept, the task was a subtask, i.e. a smaller amount of data was taken. And only three servers were used.

The first two stages, computing Log_return and packing the data into arrays, took about an hour.

Computing the correlations took about 50 hours. But 50 hours is not much, because before that the job used to run for weeks. It was a big success. And if you do the math, this cluster computed about 70 correlations per second.
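The throughput figure can be checked with simple arithmetic, assuming the rounded numbers from the talk (12.5 million correlations in 50 hours):

```python
pairs = 12_500_000        # correlations to compute (rounded figure from the talk)
hours = 50                # reported duration of the correlation stage
seconds = hours * 3600    # 180,000 seconds

rate = pairs / seconds    # correlations per second across the whole cluster
```

This gives roughly 69.4 correlations per second, consistent with the "about 70" quoted above.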

But the most important thing is that this system has practically no bottlenecks, i.e. it scales almost linearly. They verified this and scaled it successfully.


  • The right schema is half the battle. And the right schema means using all the necessary ClickHouse technologies.
  • Summing/AggregatingMergeTrees are technologies that let you aggregate, or treat a state snapshot as a special case of aggregation. This simplifies a lot of things.
  • Materialized Views let you get around the one-index limit. Maybe I did not say this very clearly, but when we loaded logs, the raw logs were in a table with one index, and the attribute logs were in another table, i.e. the same data, only filtered, but with a completely different index. It looks like the same data, but sorted differently. Materialized Views let you get around this ClickHouse limitation when you need to.
  • Reduce index granularity for point queries.
  • Distribute the data wisely and try to localize it within a server as much as possible. And try to make sure that queries also take advantage of locality wherever possible.


To sum up this short talk, we can say that ClickHouse has now firmly occupied the territory of both commercial and open-source databases, specifically for analytics. It fits this landscape perfectly. What is more, it is slowly starting to crowd out the others, because when you have ClickHouse you do not need InfiniDB. Vertica may soon not be needed either, if they add proper SQL support. Enjoy!


-Thank you for the talk! Very interesting! Have there been any comparisons with Apache Phoenix?

No, I have not heard of anyone making that comparison. We and Yandex try to keep track of all comparisons of ClickHouse with other databases. Because if something suddenly turns out to be faster than ClickHouse, Lesha Milovidov cannot sleep at night and quickly starts speeding it up. I have not heard of such a comparison.

  • (Aleksey Milovidov) Apache Phoenix is an SQL engine on top of HBase. HBase is mainly meant for key-value workloads. There, each row can hold an arbitrary number of columns with arbitrary names. The same can be said of systems such as HBase and Cassandra. It is precisely heavy analytical queries that do not run well on them. Or you may think they run fine if you have no experience with ClickHouse.

  • Thank you.

    • Good afternoon! I have been interested in this topic for a while, because I have an analytics subsystem. But when I look at ClickHouse, I get the feeling that it is very well suited for event analysis, for immutable data. And if I need to analyze a lot of business data with a bunch of large tables, then ClickHouse, as I understand it, is not a good fit for me? Especially if the tables change. Is that right, or are there examples that refute this?

    • That is right. And it is true of most specialized analytical databases. They are tailored to having one or a few large tables that change, plus many small ones that change slowly. That is, ClickHouse is not like Oracle, where you can put everything in and build some very complex queries. To use ClickHouse effectively, you need to design the schema in a way that works well in ClickHouse. That is, avoid excessive normalization, use dictionaries, and try to avoid long joins. If the schema is built this way, then the same kind of business tasks can be solved on ClickHouse much more efficiently than on a traditional relational database.

Thank you for the talk! I have a question about the last case, the financial one. They had analytics. They needed to compare how stocks go up and down. And as I understand it, you built the system specifically for this analytics? If tomorrow, say, they need some other report on this data, do they have to rebuild the schema and reload the data? That is, do some kind of preprocessing to be able to run the query?

Of course, this is a use of ClickHouse for a very specific task. It could be solved more traditionally within Hadoop. For Hadoop, this is an ideal task. But on Hadoop it is very slow. My goal was to show that ClickHouse can solve tasks that are usually solved by entirely different means, and do it much more efficiently. This solution is tailored to a specific task. Clearly, if there is a similar problem, it can be solved in a similar way.

That is clear. You said the processing took 50 hours. Is that counted from the very beginning, from when you loaded the data, until the results were obtained?

Yes.

OK, thank you very much.

That was on a 3-server cluster.

Greetings! Thank you for the talk! Everything is very interesting. My question is not so much about functionality as about using ClickHouse in terms of stability. That is, have you had any incidents where you had to restore something? How does ClickHouse behave in that case? And has it happened that a replica went down as well? For example, we ran into a problem where ClickHouse still goes beyond its limit and crashes.

Of course, there are no perfect systems, and ClickHouse has its own problems too. But have you ever heard of Yandex.Metrica being down for a long time? Probably not. It has been running reliably on ClickHouse since 2012-2013. I can say the same about my own experience. We have never had a complete failure. Some partial issues could happen, but they were never critical enough to seriously affect the business. ClickHouse is quite reliable and does not crash at random. You do not have to worry about it. It is not a raw product. Many companies have proven this.

Hello! You said that you need to think the data schema through right away. What if it has already happened? My data keeps pouring in. Six months pass, and I realize that I cannot live like this: I need to reload the data and do something with it.

That of course depends on your system. There are several ways to do this with virtually no downtime. For example, you can create a Materialized View in which you build a different data structure, provided the data can be mapped to it unambiguously. That is, if the mapping can be done by means of ClickHouse, i.e. extracting some fields, changing the primary key, changing the partitioning, then you can create a Materialized View. Write your old data there, and new data will be written automatically. Then simply switch to reading from the Materialized View, then switch the writes over and drop the old table. On the whole, this is a non-stop method.

Thank you.

source: www.habr.com
