Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB

Kwanan nan na gaya muku yadda, ta amfani da daidaitattun girke-girke haɓaka aikin tambayoyin karanta SQL daga bayanan bayanan PostgreSQL. A yau za mu yi magana game da yadda ana iya yin rikodi da inganci a cikin ma'ajin bayanai ba tare da yin amfani da kowane "karkatar da" ba a cikin saitin - kawai ta hanyar tsara tsarin tafiyar da bayanai daidai.

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB

#1. Sashe

Labari game da yadda da kuma dalilin da ya sa ya kamata a tsara shi amfani da partitioning "a theory" ya riga ya kasance, a nan za mu yi magana game da al'adar yin amfani da wasu hanyoyi a cikin mu sabis na saka idanu don ɗaruruwan sabar PostgreSQL.

"Abubuwan da suka wuce..."

Da farko, kamar kowane MVP, aikinmu ya fara ƙarƙashin nauyi mai sauƙi - an gudanar da sa ido ne kawai don sabobin mafi mahimmanci guda goma, duk tebur ɗin sun kasance kaɗan… , kuma mun sake ƙoƙarin yin wani abu da ɗayan Tables 1.5TB a girman, Mun gane cewa ko da yake yana yiwuwa a ci gaba da rayuwa kamar wannan, yana da matukar damuwa.

Lokutan sun kasance kamar lokutan almara, nau'ikan PostgreSQL 9.x daban-daban sun dace, don haka duk rabuwar dole ne a yi “da hannu” - ta hanyar gadon tebur da jawo kewayawa tare da tsauri EXECUTE.

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB
Maganin da aka samu ya zama isashen duniya wanda za a iya fassara shi zuwa duk teburi:

  • An bayyana teburin iyaye na “header” mara komai, wanda ya bayyana duka fihirisa masu dacewa da abubuwan da ke jawowa.
  • An yi rikodin daga ra'ayi na abokin ciniki a cikin tebur "tushen", kuma ta amfani da ciki magudanar hanya BEFORE INSERT an saka rikodin "a zahiri" a cikin sashin da ake buƙata. Idan babu irin wannan abu tukuna, mun sami banbanci kuma…
  • …da amfani CREATE TABLE ... (LIKE ... INCLUDING ...) an ƙirƙira bisa samfurin tebur na iyaye sashe tare da ƙuntatawa akan ranar da ake sota yadda idan aka dawo da bayanai, a cikinsa kawai ake yin karatu.

PG10: ƙoƙari na farko

Amma rabon gado ta tarihi bai dace da ma'amala da rafi mai aiki da rubutu ko adadi mai yawa na ɓangaren yara ba. Misali, zaku iya tuna cewa algorithm don zaɓar sashin da ake buƙata yana da hadaddun quadratic, cewa yana aiki tare da sassan 100+, ku da kanku kun fahimci yadda ...

A cikin PG10 an inganta wannan yanayin sosai ta hanyar aiwatar da tallafi na asali partitioning. Saboda haka, nan da nan muka yi ƙoƙari mu yi amfani da shi nan da nan bayan hijirar ma'ajiyar, amma ...

Kamar yadda ya faru bayan haƙa ta cikin littafin, tebur ɗin da aka raba a cikin wannan sigar shine:

  • baya goyan bayan kwatancen fihirisa
  • baya goyon bayan tsokana akansa
  • ba zai iya zama "zuriyar" kowa ba
  • kada ku goyi baya INSERT ... ON CONFLICT
  • ba zai iya samar da sashe ta atomatik ba

Bayan da aka yi masa raɗaɗi a goshi tare da rake, mun fahimci cewa ba zai yiwu ba a yi ba tare da gyara aikace-aikacen ba, kuma mun dage ci gaba da bincike na tsawon watanni shida.

PG10: dama ta biyu

Don haka, mun fara magance matsalolin da suka taso daya bayan daya:

  1. Domin jawowa da kuma ON CONFLICT Mun gano cewa har yanzu muna bukatar su nan da can, don haka muka yi tsaka-tsakin mataki don aiwatar da su tebur wakili.
  2. An rabu da "routing" a cikin jawo - wato, daga EXECUTE.
  3. Suka fita daban tebur samfuri tare da duk fihirisata yadda ba su ma kasance a cikin tebur ɗin wakili ba.

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB
A ƙarshe, bayan duk wannan, mun raba babban tebur na asali. Ƙirƙirar sabon sashe har yanzu an bar shi ga lamirin aikace-aikacen.

Kamus na "Sawing".

Kamar yadda a cikin kowane tsarin nazari, mu ma muna da "gaskiya" da "yanke" (kamus). A cikin mu, a cikin wannan matsayi sun yi aiki, misali. samfurin jiki irin wannan a hankali tambayoyin ko kuma rubutun tambayar da kansa.

"Gaskiya" an raba su da rana na dogon lokaci, don haka a hankali mun share sassan da ba su daɗe ba, kuma ba su dame mu ba (lokulan!). Amma an sami matsala game da ƙamus...

Ba wai a ce akwai su da yawa ba, amma kusan 100TB na "gaskiya" ya haifar da ƙamus na 2.5TB. Ba za ku iya dacewa da share wani abu daga irin wannan tebur ɗin ba, ba za ku iya matsa shi cikin isasshen lokaci ba, kuma a hankali rubutu zuwa gare shi ya zama mai hankali.

Kamar ƙamus ... a cikinsa, kowace shigarwa yakamata a gabatar da ita daidai sau ɗaya ... kuma wannan daidai ne, amma!.. Babu wanda ya hana mu samun. ƙamus daban na kowace rana! Haka ne, wannan yana kawo wani sakewa, amma yana ba da izini:

  • rubuta/karanta sauri saboda ƙananan girman sashe
  • cinye ƙarancin ƙwaƙwalwar ajiya ta hanyar yin aiki tare da ƙarin ƙayyadaddun bayanai
  • adana ƙasa da bayanai saboda ikon da sauri cire tsohon

A sakamakon dukkan hadaddun matakan An rage nauyin CPU da ~ 30%, nauyin faifai ta ~ 50%:

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB
A lokaci guda, mun ci gaba da rubuta ainihin abu ɗaya a cikin ma'ajin bayanai, kawai tare da ƙarancin kaya.

#2. Juyin bayanai da sake fasalin bayanai

Don haka muka daidaita a kan abin da muke da shi kowace rana tana da sashenta da data. A gaskiya, CHECK (dt = '2018-10-12'::date) - kuma akwai maɓallin rarrabawa da yanayin rikodin faɗawa cikin takamaiman sashe.

Tun da duk rahotannin da ke cikin sabis ɗinmu an gina su a cikin mahallin takamaiman kwanan wata, maƙasudin su tun “lokutan da ba a raba su” sun kasance iri-iri. (Server, Kwanan wata, Samfurin Tsari), (Server, Kwanan wata, Tsari kumburi), (Kwanan wata, Ajin Kuskure, Server)...

Amma yanzu suna rayuwa a kowane sashe kwafin ku kowane irin wannan fihirisa... Kuma a cikin kowane sashe kwanan wata na dindindin... Ya zama cewa yanzu muna cikin kowane irin wannan index kawai shigar da akai a matsayin daya daga cikin filayen, wanda ke kara yawan sautinsa da lokacin nemansa, amma ba ya kawo wani sakamako. Suka bar rake a ransu, kash...

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB
Hanyar ingantawa a bayyane yake - mai sauƙi cire filin kwanan wata daga duk fihirisa akan teburi da aka raba. Idan aka ba da kundin mu, riba tana kusa 1TB/mako!

Yanzu bari mu lura cewa wannan terabyte har yanzu dole ne a yi rikodin ko ta yaya. Wato mu ma faifan ya kamata yanzu yayi ƙasa da ƙasa! Wannan hoton yana nuna a fili tasirin da aka samu daga tsaftacewa, wanda muka sadaukar da mako guda:

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB

#3. "Yaɗa" mafi girman lodi

Ɗaya daga cikin manyan matsalolin tsarin da aka ɗora shine m aiki tare wasu ayyuka da ba sa bukatarsa. Wani lokaci "saboda ba su lura ba", wani lokacin "yana da sauƙi a haka", amma ba da daɗewa ba dole ne ku kawar da shi.

Bari mu zuƙowa a kan hoton da ya gabata mu ga cewa muna da faifai "famfofi" a ƙarƙashin kaya tare da amplitude biyu tsakanin samfuran da ke kusa, wanda a fili "ƙididdiga" bai kamata ya faru tare da adadin ayyuka ba:

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB

Wannan abu ne mai sauƙin cimma. Mun riga mun fara sa ido kusan sabobin 1000, kowanne ana sarrafa shi ta hanyar zaren ma'ana daban, kuma kowane zaren yana sake saita bayanan da aka tara don aikawa zuwa ma'ajin bayanai a wani mitar, wani abu kamar haka:

setInterval(sendToDB, interval)

Matsalar a nan ta ta'allaka ne a kan gaskiyar hakan duk zaren suna farawa a kusan lokaci guda, don haka lokutan aika su kusan koyaushe suna daidai “zuwa ma’ana.” Kash #2...

Abin farin ciki, wannan yana da sauƙin gyarawa, ƙara "bazuwar" gudu-up da lokaci:

setInterval(sendToDB, interval * (1 + 0.1 * (Math.random() - 0.5)))

#4. Muna adana abin da muke bukata

Matsala ta uku na gargajiya ta highload shine babu cache inda yake iya zama.

Misali, mun ba da damar yin nazari cikin sharuddan tsarin nodes (duk waɗannan Seq Scan on users), amma nan da nan tunanin cewa su ne, a mafi yawan bangare, iri ɗaya - sun manta.

A'a, ba shakka, ba a sake rubuta wani abu zuwa ga ma'adanar bayanai ba, wannan yana yanke abin tayar da hankali INSERT ... ON CONFLICT DO NOTHING. Amma wannan bayanan har yanzu yana isa ga ma'aunin bayanai, kuma ba lallai ba ne karanta don bincika rikici dole a yi. Kash #3...

Bambanci a cikin adadin bayanan da aka aika zuwa bayanan bayanan kafin/bayan an kunna caching a bayyane yake:

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB

Kuma wannan shine rakiyar digowar lodin ajiya:

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB

Jimlar

"Terabyte-per-day" kawai yana jin tsoro. Idan kun yi komai daidai, to wannan shine kawai 2^40 bytes / 86400 seconds = ~ 12.5MB/swanda hatta faifan IDE na tebur suna riƙe. 🙂

Amma da gaske, ko da tare da ninki goma "skew" na kaya yayin rana, zaku iya saduwa da iyawar SSDs na zamani.

Muna rubutawa a cikin PostgreSQL akan hasken rana: 1 mai watsa shiri, rana 1, 1TB

source: www.habr.com

Add a comment