What we do in Open Source databases and why. Andrey Borodin (Yandex.Cloud)

The talk reviews Yandex's contributions to the following databases and tools.

  • ClickHouse
  • Odyssey
  • Point-in-time recovery (WAL-G)
  • PostgreSQL (including logerrors, amcheck, heapcheck)
  • Greenplum

Hello, world! My name is Andrey Borodin. At Yandex.Cloud I develop open source relational databases in the interests of Yandex.Cloud and its customers.

In this talk we will discuss the challenges that open source databases face at scale. Why does this matter? Because small problems, tiny as mosquitoes, turn into elephants. They grow when you have many clusters.

But that is not the main point. Incredible things happen. Things that occur in one case in a million. In a cloud environment you have to be ready for them, because incredible things become quite probable when something exists at scale.

But! What is the advantage of open source databases? In theory, you are able to deal with any problem. You have the source code and you have programming knowledge. Combine the two and it works.

What approaches are there to working on open source software?

  • The most obvious approach is to use the software. If you use its protocols, its standards and its formats, if you write queries against open source software, then you already support it. You make its ecosystem bigger. You make early detection of a bug more likely. You increase the reliability of the system. You increase the availability of developers on the market. You improve the software. You are already a contributor if you simply took the trouble to try something out in it.
  • Another obvious approach is sponsoring open source software. One example is the well-known Google Summer of Code program, in which Google pays a large number of students from all over the world a reasonable amount of money to develop open source projects that meet certain licensing requirements. This is a very interesting approach because it lets software evolve without shifting the focus away from the community. Google, as a technology giant, does not say "we want this feature, we want this bug fixed, and this is where you need to dig". Google says: "Do what you do. Just keep working the way you have been working and everything will be fine."
  • The next approach is participation itself. When you have a problem in open source software and you have developers, your developers start solving the problems. They start making your infrastructure more efficient and your programs faster and more reliable.

One of Yandex's best-known open source projects is ClickHouse. It is a database that was born as a response to the challenges facing Yandex.Metrica.

It was released as open source in order to create an ecosystem and develop it together with other developers, not only those inside Yandex. It is now a large project in which many different companies are involved.

In Yandex.Cloud we built ClickHouse on top of Yandex Object Storage, i.e. on top of cloud storage.

Why does this matter in the cloud? Because any database works within this triangle, this pyramid, this hierarchy of memory types. At the top you have fast but small registers; at the bottom you have cheap, large but slow SSDs, hard drives and other block devices. If you are efficient at the top of the pyramid, you have a fast database. If you are efficient at the bottom of the pyramid, you have a scalable database. In that respect, adding another layer at the bottom is a logical way to increase the scalability of the database.

How could this be done? This is another important point of the talk.

  • We could implement ClickHouse over MDS. MDS is the internal Yandex cloud storage interface. It is more complex than the common S3 protocol, but it is closer to a block device and better suited to writing data. It requires more programming. Programmers will program; that is even a good thing, it is interesting.
  • S3 is a more widespread interface that makes things simpler, at the cost of being less well adapted to certain types of workload.

Naturally, since we wanted to provide functionality for the whole ClickHouse ecosystem while doing the job needed inside Yandex.Cloud, we decided to make sure the entire ClickHouse community would benefit from it. We implemented ClickHouse over S3, not ClickHouse over MDS. And that was a lot of work.
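The idea of putting a storage abstraction between the engine and the medium can be sketched roughly as follows. This is a Python illustration only, not ClickHouse's actual C++ `IDisk` interface: the names `Disk`, `LocalDisk`, `ObjectStorageDisk` and `store_and_load` are hypothetical, and a plain dict stands in for an S3 bucket.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class Disk(ABC):
    """Hypothetical storage interface that a table engine codes against."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        """Store data under the given relative path."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the bytes previously stored under path."""

    @abstractmethod
    def exists(self, path: str) -> bool:
        """Report whether anything is stored under path."""


class LocalDisk(Disk):
    """Keeps files under a root directory on a local block device."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, path: str, data: bytes) -> None:
        full = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path: str) -> bytes:
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()

    def exists(self, path: str) -> bool:
        return os.path.exists(os.path.join(self.root, path))


class ObjectStorageDisk(Disk):
    """Maps paths to object keys; a dict stands in for a real S3 bucket."""

    def __init__(self, bucket: Dict[str, bytes]) -> None:
        self.bucket = bucket

    def write(self, path: str, data: bytes) -> None:
        self.bucket[path] = data  # a real backend would issue a PUT request

    def read(self, path: str) -> bytes:
        return self.bucket[path]  # a real backend would issue a GET request

    def exists(self, path: str) -> bool:
        return path in self.bucket


def store_and_load(disk: Disk) -> bytes:
    """Engine-side code: identical whichever backend is plugged in."""
    disk.write("part_1/data.bin", b"column data")
    return disk.read("part_1/data.bin")


if __name__ == "__main__":
    for disk in (LocalDisk(tempfile.mkdtemp()), ObjectStorageDisk({})):
        print(type(disk).__name__, store_and_load(disk))
```

The point of the design is that `store_and_load` never learns which backend it is talking to, which is what lets one engine run on local disks and on object storage alike.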

Links:

https://github.com/ClickHouse/ClickHouse/pull/7946 "Filesystem abstraction layer"
https://github.com/ClickHouse/ClickHouse/pull/8011 "AWS SDK S3 integration"
https://github.com/ClickHouse/ClickHouse/pull/8649 "Base implementation of IDisk interface for S3"
https://github.com/ClickHouse/ClickHouse/pull/8356 "Integration of log storage engines with IDisk interface"
https://github.com/ClickHouse/ClickHouse/pull/8862 "Log engine support for S3 and SeekableReadBuffer"
https://github.com/ClickHouse/ClickHouse/pull/9128 "Storage Stripe Log S3 support"
https://github.com/ClickHouse/ClickHouse/pull/9415 "Storage MergeTree initial support for S3"
https://github.com/ClickHouse/ClickHouse/pull/9646 "MergeTree full support for S3"
https://github.com/ClickHouse/ClickHouse/pull/10126 "Support ReplicatedMergeTree over S3"
https://github.com/ClickHouse/ClickHouse/pull/11134 "Add default credentials and custom headers for s3 storage"
https://github.com/ClickHouse/ClickHouse/pull/10576 "S3 with dynamic proxy configuration"
https://github.com/ClickHouse/ClickHouse/pull/10744 "S3 with proxy resolver"

This is the list of pull requests that implement a virtual file system in ClickHouse. It is a large number of pull requests.

Links:

https://github.com/ClickHouse/ClickHouse/pull/9760 "DiskS3 hardlinks optimal implementation"
https://github.com/ClickHouse/ClickHouse/pull/11522 "S3 HTTP client - Avoid copying response stream into memory"
https://github.com/ClickHouse/ClickHouse/pull/11561 "Avoid copying whole response stream into memory in S3 HTTP client"
https://github.com/ClickHouse/ClickHouse/pull/13076 "Ability to cache mark and index files for S3 disk"
https://github.com/ClickHouse/ClickHouse/pull/13459 "Move parts from DiskLocal to DiskS3 in parallel"

But the work did not end there. Once the feature was done, more work was needed to optimize it.

Links:

https://github.com/ClickHouse/ClickHouse/pull/12638 "Add SelectedRows and SelectedBytes events"
https://github.com/ClickHouse/ClickHouse/pull/12464 "Add events from S3 requests to system.events"
https://github.com/ClickHouse/ClickHouse/pull/13028 "Add QueryTimeMicroseconds, SelectQueryTimeMicroseconds and InsertQueryTimeMicroseconds"

Then it was necessary to provide diagnostics, set up monitoring and make the feature manageable.

And all of this was done so that the entire community, the entire ClickHouse ecosystem, would get the result of this work.

Let's move on to transactional databases, to OLTP databases, which are closer to me personally.

This is the open source DBMS development department. These guys do street magic to improve open source transactional databases.

One of the projects that illustrates what we do and how we do it is a connection pooler for Postgres.

Postgres is a process-based database. This means the database should have as few network connections handling transactions as possible.

On the other hand, in a cloud environment the typical situation is a thousand connections arriving at one cluster at once. The connection pooler's job is to pack those thousand client connections into a small number of server connections.
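The multiplexing described above can be sketched as a toy model. This is not how Odyssey or PgBouncer is implemented; `ServerConnection`, `Pool` and `simulate` are hypothetical names, threads stand in for clients, and no real database is involved.

```python
import queue
import threading
from typing import List


class ServerConnection:
    """Stand-in for one backend (server-side) database connection."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id
        self.transactions = 0

    def run(self, sql: str) -> str:
        # Only one client holds this connection at a time, so no lock is needed.
        self.transactions += 1
        return f"conn {self.conn_id}: {sql}"


class Pool:
    """Lends a small, fixed set of server connections to many clients."""

    def __init__(self, size: int) -> None:
        self.connections: List[ServerConnection] = [
            ServerConnection(i) for i in range(size)
        ]
        self._idle: "queue.Queue[ServerConnection]" = queue.Queue()
        for conn in self.connections:
            self._idle.put(conn)

    def execute(self, sql: str) -> str:
        conn = self._idle.get()  # blocks until a server connection is free
        try:
            return conn.run(sql)
        finally:
            self._idle.put(conn)  # hand the connection back for the next client


def simulate(clients: int, pool_size: int) -> Pool:
    """Run one transaction per client thread through a shared pool."""
    pool = Pool(pool_size)
    threads = [
        threading.Thread(target=pool.execute, args=(f"SELECT {i}",))
        for i in range(clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool
```

Calling `simulate(1000, 4)` pushes a thousand client transactions through four server connections; the database side only ever sees four.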

You could say that a connection pooler is a telephone operator that rearranges bytes so that they reach the database efficiently.

Unfortunately, there is no good Russian word for "connection pooler". Sometimes it is called a connection multiplexer. If you know what a connection pooler should be called, be sure to tell me; I will be very happy to speak correct Russian technical language.

https://pgconf.ru/2017/92899

We surveyed the connection poolers that were suitable for a managed Postgres cluster, and PgBouncer was the best choice for us. But we ran into a number of problems with PgBouncer. Several years ago Volodya Borodin gave a talk saying that we use PgBouncer and like everything about it, but there are nuances, there is something to work on.

https://pgconf.ru/media/2017/04/03/20170316H1_V.Borodin.pdf

And we worked on it. We fixed the problems we ran into, patched Bouncer and tried to push pull requests upstream. But its fundamentally single-threaded design was hard to work with.

We had to assemble cascades of patched Bouncers. When we have many single-threaded Bouncers, connections on the top layer are handed over to an inner layer of Bouncers. It is a poorly manageable system that is hard to build and hard to scale up and down.

We came to the conclusion that we should create our own connection pooler, which is called Odyssey. We wrote it from scratch.

https://www.pgcon.org/2019/schedule/events/1312.en.html

In 2019, at the PgCon conference, I presented this pooler to the developer community. We now have slightly fewer than 2,000 stars on GitHub, i.e. the project is alive and popular.

And if you create a Postgres cluster in Yandex.Cloud, it will be a cluster with a built-in Odyssey, which is reconfigured when the cluster is scaled up or down.

What did we learn from this project? Launching a competing project is always an aggressive step. It is an extreme measure, taken when we say that there are problems that are not being solved fast enough, not within time frames that would suit us. But it is an effective measure.

PgBouncer started developing faster.

And now other projects have appeared. For example, pgagroal, which is developed by Red Hat developers. They pursue similar goals and implement similar ideas, but, of course, with their own specifics, which are closer to the pgagroal developers.

Another case of working with the Postgres community is point-in-time recovery. This is recovery after a failure, recovery from a backup.

There are many backup tools and they are all different. Almost every Postgres vendor has its own backup solution.

If you take all the backup systems, build a feature matrix and, as a joke, compute the determinant of that matrix, it will be nonzero. What does that mean? That if you take a specific backup system, it cannot be assembled from pieces of all the others. It is unique in its implementation, unique in its purpose, unique in the ideas behind it. And each of them is specific.

https://www.citusdata.com/blog/2017/08/18/introducing-wal-g-faster-restores-for-postgres/

While we were working on this problem, Citus Data launched the WAL-G project. It is a backup system built with an eye to the cloud environment. Citus Data is now part of Microsoft. At that time we really liked the ideas laid out in the early WAL-G releases, and we started contributing to the project.

https://github.com/wal-g/wal-g/graphs/contributors

There are now many dozens of developers in this project, but the top 10 WAL-G contributors include 6 Yandexoids. We brought many of our ideas there. And, of course, we implemented them ourselves, tested them ourselves, rolled them out to production ourselves, use them ourselves and figure out for ourselves where to go next, all while interacting with the large WAL-G community.

From our point of view, this backup system, partly thanks to our efforts, has now become optimal for the cloud environment. It offers the best cost of backing up Postgres in the cloud.

What does that mean? We promoted a fairly big idea: a backup should be secure, cheap to operate and as fast as possible to restore.

Why should it be cheap to operate? When nothing is broken, you should not even notice that you have backups. Everything works fine, you spend as little CPU as possible, you use as little of your disk resources as possible, and you send as few bytes over the network as possible, so as not to interfere with the payload of your valuable services.

And when everything breaks, for example an admin dropped the data or something went wrong and you urgently need to go back in time, you restore using every resource you have, because you want your data back quickly and intact.

We promoted this simple idea. And, it seems, we managed to implement it.

But that is not all. We wanted one more small thing. We wanted many different databases. Not all of our customers use Postgres. Some use MySQL or MongoDB. In the community, other developers have added support for FoundationDB. And this list keeps growing.

The community likes the idea of a database running in a managed environment in the cloud. And developers maintain support for their own databases, which can be backed up with our backup system in the same uniform way as Postgres.

What did we learn from this story? Our product, as a development department, is not lines of code, not statements, not files. Our product is not pull requests. It is the ideas that we convey to the community. It is technological expertise and the movement of technology toward the cloud environment.

There is such a database as Postgres. What I like most is the Postgres core. I spend a lot of time developing the Postgres core together with the community.

But here it must be said that Yandex.Cloud has an internal installation of managed databases. It started long ago in Yandex.Mail. The expertise that has now led to managed Postgres was accumulated when Mail wanted to move to Postgres.

Mail has requirements very similar to those of the cloud. It needs you to be able to scale to unexpected exponential growth at any point in your data. And Mail already had a load of some hundreds of millions of mailboxes belonging to a huge number of users who constantly make many requests.

That was quite a serious challenge for the team developing Postgres. Back then, every problem we ran into was reported to the community. Those problems were fixed, and in some places the community fixed them at the level of paid support for some other databases, or even better. That is, you can send an email to pgsql-hackers and get an answer within 40 minutes. Paid support for some databases may decide that there are higher-priority things than your bug.

The internal Postgres installation now holds several petabytes of data. It serves several million requests per second. It comprises thousands of clusters. It is very large in scale.

But there is a nuance. It lives not on fancy network drives but on fairly simple hardware. And there is a test environment specifically for interesting new things.

At some point in the test environment we received a message indicating that the internal invariants of database indexes had been violated.

An invariant is a relationship that we expect always to hold.

This was a very critical situation for us. It indicated that some data might have been lost. And data loss is an outright catastrophe.

The general idea we follow in managed databases is that even with effort it should be hard to lose data. Even if you delete it on purpose, you would still need to ignore its absence for a long time. Data safety is a religion we follow quite diligently.

And here a situation arose suggesting that there might be a scenario we were not prepared for. So we started preparing for it.

https://commitfest.postgresql.org/23/2171/

The first thing we did was dig through the logs from those thousands of clusters. We found which of the clusters were on disks with problematic firmware that was losing data page updates. We went through all of the Postgres code and tagged the messages that indicate violations of internal invariants with the error code intended for detecting data corruption.

This patch was accepted by the community with practically no discussion, because in each particular case it was obvious that something bad had happened and that it needed to be reported to the log.
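Once corruption messages carry a dedicated error code, monitoring can match on the code instead of on message text. A minimal sketch, assuming log records have already been parsed into dicts: the record layout, the function `corruption_alerts` and the sample messages are made up for illustration, while `XX001` (data_corrupted) and `XX002` (index_corrupted) are SQLSTATE values from PostgreSQL's error code table.

```python
from typing import Dict, Iterable, List

# SQLSTATE codes from PostgreSQL's error code table (class XX).
CORRUPTION_SQLSTATES = {
    "XX001": "data_corrupted",
    "XX002": "index_corrupted",
}

# Made-up sample records; real log lines would need parsing first.
SAMPLE_RECORDS = [
    {"sqlstate": "23505", "message": "duplicate key value"},
    {"sqlstate": "XX002", "message": "index invariant violated"},
    {"sqlstate": "XX001", "message": "invalid page in block"},
]


def corruption_alerts(records: Iterable[Dict[str, str]]) -> List[str]:
    """Return one alert line per record whose SQLSTATE signals corruption."""
    alerts = []
    for record in records:
        name = CORRUPTION_SQLSTATES.get(record["sqlstate"])
        if name is not None:
            alerts.append(f"{name}: {record['message']}")
    return alerts
```

With the sample above, only the two `XX` records produce alerts; the unique-violation record (`23505`) is ignored because it is an ordinary application error.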

After that, we arrived at monitoring that scans the logs. When it sees suspicious messages, it wakes up the on-call engineer, and the on-call engineer fixes things.

But! Scanning logs is a cheap operation on one cluster and catastrophically expensive on a thousand clusters.

We wrote an extension called logerrors. It creates a database view from which you can cheaply and quickly select statistics on past errors. And if we need to wake the on-call engineer, we find that out not by scanning gigabyte files but by pulling a few bytes out of a hash table.
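The trade-off can be illustrated with a small in-memory counter keyed by severity and error code. This is only a sketch of the idea, not the logerrors extension or its SQL interface; the class `ErrorStats` and its methods are hypothetical.

```python
from collections import Counter


class ErrorStats:
    """Counts errors as they happen so nobody has to re-read the log later."""

    def __init__(self) -> None:
        # Hash table keyed by (severity, sqlstate); values are occurrence counts.
        self._counts: Counter = Counter()

    def record(self, severity: str, sqlstate: str) -> None:
        """Called once per emitted message; a constant-time update."""
        self._counts[(severity, sqlstate)] += 1

    def count(self, severity: str, sqlstate: str) -> int:
        """Constant-time lookup; unseen pairs report zero."""
        return self._counts[(severity, sqlstate)]

    def total(self, severity: str) -> int:
        """Sum over all error codes of one severity."""
        return sum(n for (sev, _), n in self._counts.items() if sev == severity)
```

A monitoring probe would then ask `stats.count("ERROR", "XX001")` on every check, which costs the same whether the log behind it is a kilobyte or many gigabytes.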

This extension has been accepted, for example, into the repository for CentOS. If you want to use it, you can install it yourself. Of course, it is open source.

https://www.postgresql.org/message-id/flat/[email kariya]

But that is not all. We started using amcheck, an extension built by the community, to find invariant violations in indexes.

And we found that if you operate it at scale, there are bugs. We started fixing them. Our fixes were accepted.

https://www.postgresql.org/message-id/flat/[email kariya]

We found that this extension cannot analyze GiST and GIN indexes. We added support for them. But this support is still being discussed by the community, because it is relatively new functionality and there are a lot of details involved.

https://commitfest.postgresql.org/29/2667/

We also found that when indexes are checked for violations on the replication leader, on the master, everything works well, but on replicas, on the followers, the search for corruption is not as effective. Not all invariants are checked. And one invariant bothered us a great deal. We spent a year and a half talking with the community to make this check possible on replicas.

We wrote code that has to follow all the ... protocols. We discussed this patch for quite a while with Peter Geoghegan from Crunchy Data. He had to slightly modify the existing B-tree in Postgres to accept this patch. It was accepted. And now checking indexes on replicas has also become effective enough to detect the violations we encountered. That is, violations that can be caused by bugs in disk firmware, bugs in Postgres, bugs in the Linux kernel, and hardware problems. It is a rather extensive list of problem sources that we were preparing for.
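To make "checking index invariants" concrete, here is a toy check over the leaf level of a B-tree-like structure. It is far simpler than what amcheck verifies in PostgreSQL; the `LeafPage` layout and the three invariants chosen are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeafPage:
    keys: List[int]
    # Upper bound for keys on this page; None on the rightmost page.
    high_key: Optional[int]


def check_leaf_level(pages: List[LeafPage]) -> List[str]:
    """Return a description of every invariant violation found, left to right."""
    problems = []
    for i, page in enumerate(pages):
        # Invariant 1: keys inside a page are in non-decreasing order.
        if any(a > b for a, b in zip(page.keys, page.keys[1:])):
            problems.append(f"page {i}: keys out of order")
        # Invariant 2: no key exceeds the page's high key.
        if page.high_key is not None and any(k > page.high_key for k in page.keys):
            problems.append(f"page {i}: key above high key")
        # Invariant 3: the right sibling starts at or above this page's high key.
        if i + 1 < len(pages) and page.high_key is not None:
            right = pages[i + 1]
            if right.keys and right.keys[0] < page.high_key:
                problems.append(
                    f"page {i + 1}: first key below left sibling's high key"
                )
    return problems
```

A healthy pair such as `[LeafPage([1, 2, 3], 3), LeafPage([4, 5], None)]` yields no problems, while a page whose update was lost by the disk typically breaks one of the cross-page relationships.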

https://www.postgresql.org/message-id/flat/38AF687F-8F6B-48B4-AB9E-A60CFD6CC261%40enterprisedb.com#0e86a12c01d967bac04a9bf83cd337cb

But besides indexes there is also the heap, i.e. the place where the data itself is stored. And there are not many invariants there that can be checked.

We have an extension called Heapcheck. We started developing it. In parallel with us, the company EnterpriseDB also started writing a module, which they likewise called Heapcheck. Only we called ours PgHeapcheck, and they simply called theirs Heapcheck. Theirs has similar functions and a slightly different signature, but the same ideas. They implemented them somewhat better in places. And they published it as open source earlier.

And now we are developing their extension, because it is no longer their extension but the community's. In the future it will be part of the core that ships to everyone, so that people can learn about upcoming problems in advance.

https://www.postgresql.org/message-id/flat/fe9b3722df94f7bdb08768f50ee8fe59%40postgrespro.ru

In some places we even came to the conclusion that we had false positives in our monitoring systems. Take the 1C system, for example. When it uses a database, Postgres sometimes writes data into it that Postgres itself can read but pg_dump cannot.

To our problem detection system this situation looked like corruption. The on-call engineer was woken up. The on-call engineer looked at what was happening. After a while the customer came and said, "I have a problem." The on-call engineer explained what the problem was. But the problem was in the Postgres core.

I found a discussion about this feature. And I wrote that we had run into this feature and that it was unpleasant: a person was woken up at night to figure out what it was.

https://www.postgresql.org/message-id/flat/fe9b3722df94f7bdb08768f50ee8fe59%40postgrespro.ru

The community replied, "Oh, we really do need to fix this."

I have a simple analogy. If you are walking in a shoe with a grain of sand in it, then, in principle, you can keep going; no problem. If you sell shoes to thousands of people, then let's make shoes with no sand in them at all. And if one of your shoe users is going to run a marathon, then you want to make very good shoes and then scale that to all your users. Such unexpected users are always present in a cloud environment. There are always users who use a cluster in some original way. You always have to be prepared for that.

What did we learn here? We learned a simple thing: the most important thing is to explain to the community that a problem exists. Once the community has acknowledged the problem, natural competition arises to solve it, because everyone wants to solve an important problem. All vendors and all hackers understand that they could step on this rake themselves, so they want to eliminate it.

If you are working on a problem that bothers no one but you, but you work on it systematically and it is eventually recognized as a problem, then your pull request will certainly be accepted. Your patch will be accepted, and your improvements, or even your requests for improvements, will be reviewed by the community. At the end of the day, we make databases better for each other.

An interesting database is Greenplum. It is a massively parallel database based on the Postgres codebase, which I know well.

https://greenplum.org/greenplum-database-tables-compression/

Greenplum has an interesting piece of functionality: append-optimized tables. These are tables that you can append to quickly. They can be either columnar or row-oriented.

But there was no clustering, i.e. no functionality for arranging the data in a table according to the order found in one of its indexes.
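Why clustering helps can be shown with a toy model in which a table is a list of keys split into fixed-size blocks. This is only an illustration of the concept, not Greenplum's implementation; `blocks_touched`, `cluster` and the sample data in `ROWS` are hypothetical.

```python
from typing import Callable, List


def blocks_touched(
    rows: List[int], block_size: int, wanted: Callable[[int], bool]
) -> int:
    """Count the distinct fixed-size blocks holding at least one wanted row."""
    blocks = {pos // block_size for pos, row in enumerate(rows) if wanted(row)}
    return len(blocks)


def cluster(rows: List[int]) -> List[int]:
    """Rewrite the table with rows in index order (here: ascending key)."""
    return sorted(rows)


# 20 distinct keys stored in arrival order, which is unrelated to key order.
ROWS = [(i * 7) % 20 for i in range(20)]
```

For the range predicate `lambda k: k < 5` and a block size of 5, the scan touches 3 blocks on `ROWS` and only 1 block on `cluster(ROWS)`, because clustering places neighbouring keys next to each other on disk.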

The guys from Taxi came to me and said: "Andrey, you know Postgres. And this is almost the same thing. It will take 20 minutes to switch. Just take it and do it." I thought: yes, I know Postgres, a 20-minute switch, I need to do this.

https://github.com/greenplum-db/gpdb/commit/179feb77a034c2547021d675082aae0911be40f7

But no, it was not 20 minutes; I spent months writing it. At the PgConf.Russia conference I approached Heikki Linnakangas from Pivotal and asked: "Are there any problems with this? Why is there no clustering for append-optimized tables?" He said: "You take the data. You sort it, you rewrite it. It is just a job." Me: "Oh, yes, you just need to take it and do it." He said: "Yes, we need free hands to do this." I thought that I definitely needed to do this.

A few months later I submitted a pull request implementing this functionality. Pivotal reviewed the pull request together with the community. Of course, there were bugs.

https://github.com/greenplum-db/gpdb/issues/10150

But the most interesting thing was that when this pull request was merged, bugs were found in Greenplum itself. We found that heap tables sometimes break transactionality when they are clustered. That is something that needs to be fixed. And it was in the very place I had just touched. My natural reaction was: okay, let me do this as well.

https://github.com/greenplum-db/gpdb/pull/10290

I fixed that bug and sent a pull request to the maintainers. It was merged.

https://github.com/greenplum-db/gpdb-postgres-merge/pull/53

After that it turned out that this functionality needed to be brought into the Greenplum version for PostgreSQL 12. That is, the 20-minute adventure continued with new interesting adventures. It was interesting to touch current development, where the community is building the newest and most important features. It was merged.

https://github.com/greenplum-db/gpdb/pull/10565

But the matter did not end there. After all that, it turned out that we needed to write documentation for everything.

I started writing the documentation. Fortunately, documentation writers from Pivotal came along. English is their native language. They helped me with the documentation. In fact, they themselves rewrote what I had drafted into real English.

And here, it would seem, the adventure ended. And do you know what happened then? The guys from Taxi came to me and said: "There are two more adventures, 10 minutes each." And what should I tell them? I said that I would now give a talk about scale and then we would look at their adventures, because this is interesting work.

What did we learn from this case? That working with open source is always working with a specific person, always working with the community. At every stage I worked with some developer, some tester, some hacker, some documentation writer, some architect. I did not work with Greenplum; I worked with the people around Greenplum.

But! There is another important point: it is just work. That is, you come in, drink coffee, write code. All sorts of simple invariants hold. Do it properly and it will be fine! And it is quite interesting work. There is demand for this work from Yandex.Cloud customers, from users of our clusters both inside Yandex and outside. And I think the number of projects we take part in will grow, and the depth of our involvement will grow too.

That is all. Let's move on to questions.

Q&A session

Hello! We have another question-and-answer session, and Andrey Borodin is in the studio. This is the person who has just told you about the contributions of Yandex.Cloud and Yandex to open source. Our talk right now is not entirely about the Cloud, but at the same time we are built on these technologies. Without what you did inside Yandex, there would be no service in Yandex.Cloud, so thank you from me personally. And the first question from the broadcast: "What is each of the projects you mentioned written in?"

The WAL-G backup system is written in Go. It is one of the newer projects we have worked on; it is literally only 3 years old. A database is mostly about reliability. That means databases are quite old, and they are usually written in C. The Postgres project started about 30 years ago. Back then C89 was the right choice, and Postgres is written in it. More modern databases such as ClickHouse are usually written in C++. All systems development is based on C and C++.

A question from our financial manager, who is responsible for expenses in the Cloud: "Why does the Cloud spend money on supporting open source?"

There is a simple answer for the financial manager here. We do this to make our services better. In what ways can we do better? We can make things more efficient, faster and more scalable. But for us this story is primarily about reliability. For example, in the backup system we review 100% of the patches that go into it. We know what the code is. And we are more comfortable rolling out new versions to production. That is, first of all, it is about confidence, about readiness for development and about reliability.

Another question: "Do the requirements of external users who live in Yandex.Cloud differ from those of internal users who live in the internal Cloud?"

The load profile is, of course, different. But from my department's point of view, all the rare and interesting cases arise under non-standard load. Developers with imagination, developers who do the unexpected, can be found both inside and outside. In that respect we are all roughly the same. Probably the only important feature of running databases at Yandex is that inside Yandex we have drills. At some point an availability zone goes completely dark, and all Yandex services must keep working regardless. That is a small difference, but it drives a lot of research and development at the junction of the database and the network stack. Otherwise, external and internal installations generate the same feature requests and similar demands for better reliability and performance.

Next question: "How do you feel about the fact that much of what you do is used by other clouds?" We won't name names, but many of the projects built in Yandex.Cloud are used in other companies' clouds.

That's great. First, it is a sign that we did something right. It flatters the ego, and it makes us more confident that we made the right decision. On the other hand, there is the hope that in the future this will bring us new ideas and new requests from third-party users. Most issues on GitHub are created by individual system administrators, individual DBAs, individual architects, individual engineers. But sometimes people with systematic experience come and say: in 30% of cases we have this problem, so let's think about how to solve it. That is what we look forward to most. We look forward to sharing experience with other cloud platforms.

You talked a lot about the marathon. I know you ran a marathon in Moscow. Why? To outrun the PostgreSQL guys?

No, Oleg Bartunov runs very fast. He finished an hour ahead of me. Overall, I am happy with how far I got. For me, just finishing was a victory. It is actually surprising how many runners there are in the Postgres community. It seems to me there is some kind of correlation between aerobic sports and an interest in systems programming.

Are you saying there are no runners at ClickHouse?

I know for certain that there are. ClickHouse is a database too. By the way, Oleg is writing to me right now: "Shall we go for a run after the talk?" That's a great idea.

Another question from the broadcast, from Nikita: "Why did you fix the bug in Greenplum yourself instead of giving it to the juniors?" Admittedly, it isn't clear exactly which bug and in which service, but it probably means the one you talked about.

Yes, in principle it could have been handed to someone else. It was simply code that I had just changed, so it was natural to follow through right away. In principle, sharing expertise with the team is a good idea. We will definitely share Greenplum tasks among all members of our department.

Since we're talking about juniors, here is a question. Someone has decided to make their first commit to Postgres. What should they do to get that first commit in?

This is an interesting question: "Where do I start?" It is usually quite hard to start with something in the core. Postgres, for example, has a to-do list. But in reality it is a list of things people tried to do and did not succeed at. Those are hard things. Usually you can find some utilities in the ecosystem, some extensions that can be improved and that get less attention from the core developers. Accordingly, there are more growth points there.

In the Google Summer of Code program, the Postgres community puts forward many different topics every year that can be worked on. This year we had, I think, three students. One even worked on WAL-G, on topics that matter to Yandex.

In Greenplum, everything is simpler than in the Postgres community, because the Greenplum hackers treat pull requests very well and start reviewing right away. Sending a patch to Postgres is a matter of months, but Greenplum folks will come within a day and look at what you did. The other thing is that Greenplum needs current problems solved. Greenplum is not widely used, so finding a problem of your own is quite hard. And first of all, of course, problems need to be solved.

source: www.habr.com