Mass optimization of PostgreSQL queries. Kirill Borovikov (Tensor)

The talk presents some approaches that make it possible to monitor the performance of SQL queries when there are millions of them per day and hundreds of monitored PostgreSQL servers.

What technical solutions let us process that volume of information efficiently, and how does this make an ordinary developer's life easier?


Interested? You can also read the series of articles on this topic, which analyzes specific problems, various optimization techniques for SQL queries, and solutions to typical DBA tasks in PostgreSQL.

My name is Kirill Borovikov, and I represent the company Tensor. Specifically, I specialize in working with databases at our company.

Today I will tell you how we optimize queries when the task is not to "polish" the performance of a single query but to solve the problem en masse: when there are millions of queries and you need to find some approach to this big problem.

In general, for a million of our customers Tensor means VLSI, our application: a corporate social network, video communication solutions, internal and external document flow, accounting systems for bookkeeping and warehousing, ... In other words, it is a "mega-combine" for integrated business management, with more than 100 different internal projects.

To keep them all working and developing normally, we have 10 development centers across the country, with more than 1000 developers in them.

We have been working with PostgreSQL since 2008 and have accumulated a large volume of what we process (customer data, statistics, analytics, data from external information systems): more than 400TB. There are about 250 servers in production alone, and in total we monitor about 1000 database servers.

SQL is a declarative language. You describe not "how" something should work but "what" you want to get. The DBMS knows better how to do a JOIN: how to join your tables, which conditions to apply, what will go through an index and what will not...

Some DBMSs accept hints: "No, join these two tables in such-and-such order," but PostgreSQL cannot do this. This is a deliberate position of its leading developers: "We would rather finish improving the query optimizer than let developers use some kind of hints."

But although PostgreSQL does not allow itself to be controlled "from outside", it lets you see perfectly well what happens inside it when you run your query, and where it runs into trouble.

In general, what classic problems does a developer usually bring [to a DBA]? "We ran a query here, and everything is slow, everything hangs, something is happening... Some kind of trouble!"

The reasons are almost always the same:

  • an inefficient query algorithm
    Developer: "Now I'll give it 10 tables in SQL through JOINs..." and expects the conditions to be miraculously "untangled" so that everything comes back fast. But miracles do not happen, and any system with this much variability (10 tables in a single FROM) always produces some kind of error. [article]
  • outdated statistics
    This point is especially relevant for PostgreSQL, when you have "poured" a large dataset onto the server, run a query, and it does a Seq Scan over your table. Because yesterday there were 10 records in it, today there are 10 million, but PostgreSQL does not know that yet, and we need to tell it. [article]
  • a "plug" on resources
    You put a big, heavily loaded database on a weak server that lacks disk, memory, or CPU performance. And that's it... Somewhere there is a performance ceiling you can no longer jump above.
  • locking
    This is a difficult point, but locks matter most for the various modifying queries (INSERT, UPDATE, DELETE); it is a separate big topic.

Getting the plan

...And for everything else we need a plan! We need to see what is happening inside the server.

A query execution plan in PostgreSQL is the tree of the query execution algorithm in text form. It is exactly the algorithm that the planner, as a result of its analysis, found to be the most efficient.

Each tree node is an operation: fetching data from a table or index, building a bitmap, joining two tables, or merging, intersecting, or excluding several selections. Executing a query means walking through the nodes of this tree.

The simplest way to get a query plan is to execute the EXPLAIN statement. To get it with all the actual attributes, that is, to really execute the query on the database, use EXPLAIN (ANALYZE, BUFFERS) SELECT ....

The bad part: when you run it, it happens "here and now", so it is only suitable for local debugging. Suppose you take a heavily loaded server under a strong flow of data changes and see: "Oh! Here we had a slow query." That was half an hour or an hour ago. While you were getting this query out of the logs and bringing it back to the server, your whole dataset and statistics changed. You run it to debug, and it runs fast! And you cannot understand why it was slow.

To understand what happened at the exact moment the query was executed on the server, smart people wrote the auto_explain module. It is present in almost all of the most common PostgreSQL distributions and can simply be enabled in the configuration file.

If it notices that some query runs longer than the limit you gave it, it takes a "snapshot" of that query's plan and writes them together to the log.

Everything seems fine now, we go to the log and see there... [a wall of text]. But we can say nothing about it except that it is an excellent plan, because it took 11ms to execute.

Everything seems fine, but nothing is clear about what actually happened. Apart from the total time, we do not really see anything. Because looking at such a "wall" of plain text is generally not visual at all.

But even if it is not obvious, even if it is inconvenient, there are more fundamental problems:

  • A node shows the sum over the resources of the entire subtree beneath it. That is, you cannot simply find out how much time was spent on this particular Index Scan if there is some nested condition under it. We have to check dynamically whether there are "children" and conditional variables or CTEs inside, and subtract all of that "in our heads".
  • The second point: the time shown on a node is the time of a single execution of that node. If the node was executed several times as a result of, for example, a loop over table records, the number of loops (cycles of this node) grows in the plan, but the atomic execution time itself stays the same in the plan. So to understand how long this node took in total, you need to multiply one by the other, again "in your head".
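
The two corrections described in this list can be sketched in a few lines of Python. This is a minimal illustration, assuming the plan was obtained with EXPLAIN (ANALYZE, FORMAT JSON) and loaded into nested dictionaries. The function names are hypothetical, and the sketch deliberately ignores CTE, InitPlan, and SubPlan nodes and parallel workers, which need extra handling.

```python
from typing import Any, Dict

Plan = Dict[str, Any]


def total_time(node: Plan) -> float:
    """Time spent in the node and its subtree over all loops, in ms."""
    # "Actual Total Time" is reported per loop, so multiply by the loop count.
    return node["Actual Total Time"] * node.get("Actual Loops", 1)


def self_time(node: Plan) -> float:
    """The node's own time: its total minus the totals of its children."""
    children = node.get("Plans", [])
    return total_time(node) - sum(total_time(child) for child in children)


# A hand-made plan fragment, for illustration only.
plan = {
    "Node Type": "Nested Loop",
    "Actual Total Time": 12.0,
    "Actual Loops": 1,
    "Plans": [
        {"Node Type": "Seq Scan", "Actual Total Time": 2.0, "Actual Loops": 1},
        {"Node Type": "Index Scan", "Actual Total Time": 0.25, "Actual Loops": 20},
    ],
}

own = self_time(plan)  # 12.0 - (2.0 + 0.25 * 20)
```

Here the Index Scan looks cheap at 0.25 ms per loop, yet over 20 loops it accounts for as much time as the Nested Loop itself.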

In such conditions, understanding "Who is the weakest link?" is almost impossible. So even the developers themselves write in the manual that "Understanding a plan is an art that must be learned, experience...".

But we have 1000 developers, and you cannot pass this experience on to each of them. I know, you know, he knows, but someone over there does not. Maybe he will learn, maybe not, but he needs to work now - and where would he get this experience?

Plan visualization

So we realized that to deal with these problems we need a good visualization of the plan. [article]

First we went "through the market": let's look on the Internet for what exists at all.

But it turned out that there are very few relatively "live" solutions that are more or less being developed - literally one: explain.depesz.com by Hubert Lubaczewski. When you "feed" the text representation of a plan into the input field, it shows you a table with the parsed data:

  • the node's own processing time
  • the total time for the whole subtree
  • the number of records that were retrieved versus the number statistically expected
  • the node body itself

This service can also share an archive of links. You drop your plan there and say: "Hey, Vasya, here is a link, something is wrong there."

But there are also small problems.

First, a huge amount of copy-paste. You take a piece of the log, paste it in there, and again, and again.

Second, there is no analysis of the amount of data read: the very buffers that EXPLAIN (ANALYZE, BUFFERS) outputs are not shown here. It simply does not know how to parse them, understand them, and work with them. When you read a lot of data and realize that you may be spreading it poorly across disk and memory cache, this information is very important.

The third negative point is the very weak development of this project. The commits are very small, it is good if there is one every six months, and the code is in Perl.

But this is all "lyrics"; we could somehow live with it. There is one thing, though, that really turned us away from this service: errors in the analysis of Common Table Expressions (CTE) and various dynamic nodes such as InitPlan/SubPlan.

If you believe this picture, the total execution time of each individual node is greater than the total execution time of the whole query. The reason is simple: the generation time of this CTE was not subtracted from the CTE Scan node. So we no longer know the correct answer to how long the CTE scan itself took.

Then we realized it was time to write our own - hooray! Every developer says: "Now we'll write our own, it will be super easy!"

We took a stack typical for web services: a core on Node.js + Express, with Bootstrap and D3.js for nice charts. And our expectations were fully met: we got the first prototype in 2 weeks:

  • our own plan parser
    That is, now we can parse any plan from those generated by PostgreSQL.
  • correct analysis of dynamic nodes: CTE Scan, InitPlan, SubPlan
  • analysis of buffer distribution: where data pages are read from memory, where from the local cache, where from disk
  • visual clarity
    So that you do not have to "dig" through all this in the log but see the "weakest link" right away in the picture.

We got something like this, with syntax highlighting included. But usually our developers work not with the full representation of the plan but with a shorter one. After all, we have already parsed all the numbers and moved them to the left and right, and in the middle we left only the first line: what kind of node it is, a CTE Scan, a CTE generation, or a Seq Scan on some table.

This is the abbreviated representation that we call the plan template.

What else would be convenient? It would be convenient to see what share of the total time went to which node, so we simply "bolted on" a pie chart at the side.

We hover over a node and see that Seq Scan took less than a quarter of the total time, while the remaining 3/4 was taken by CTE Scan. Horror! This is a small note about the "rate of fire" of CTE Scan if you use them heavily in your queries. They are not very fast: they lose even to an ordinary table scan. [article] [article]

But usually such charts are more interesting and more complex, when we hover over a segment and see, for example, that some Seq Scan "ate" more than half of the time. Moreover, there was some Filter inside, and a lot of records were discarded by it... You can throw this picture straight at the developer and say: "Vasya, everything is bad here! Figure it out, look, something is wrong!"

Naturally, there were some "rakes" along the way.

The first thing we ran into was the rounding problem. The time of each individual node in the plan is given with an accuracy of 1 μs. And when the number of node loops exceeds, say, 1000, PostgreSQL has divided the time "to within that accuracy" after execution, so when calculating back we get a total time "somewhere between 0.95ms and 1.05ms". When the count is in microseconds, that is fine, but when it is already [milli]seconds, you have to take this into account when "untying" resources across the plan nodes to see "who consumed how much".

The second point, more complex, is the distribution of resources (those same buffers) among dynamic nodes. This cost us another 4 weeks on top of the first 2 weeks for the prototype.

It is quite easy to get this kind of problem: we make a CTE and supposedly read something in it. In fact, PostgreSQL is "smart" and will not read anything right there. Then we take the first record from it, and then the hundred-and-first one from the same CTE.

We look at the plan and see something strange: 3 buffers (data pages) were "consumed" in Seq Scan, 1 more in CTE Scan, and 2 more in the second CTE Scan. That is, if we simply sum everything up, we get 6, but we read only 3 from the table! CTE Scan does not read anything from anywhere; it works directly with process memory. So something is clearly wrong here!

In fact, it turns out that of those 3 pages of data requested from Seq Scan, 1 was requested by the 1st CTE Scan, and then the 2nd CTE Scan asked for its own, so 2 more were read for it. In other words, 3 pages of data were read in total, not 6.

And this picture led us to understand that plan execution is no longer a tree but simply some kind of acyclic graph. And we got a diagram like this, so that we could understand "what came from where in the first place". That is, here we created a CTE from pg_class and requested it twice, and almost all our time went to the branch where we requested it the 2nd time. Clearly, reading the 101st record is much more expensive than reading just the 1st record from the table.

We breathed out for a while. We said: "Now, Neo, you know kung fu! Now our experience is right on your screen. Now you can use it." [article]

Log consolidation

Our 1000 developers breathed a sigh of relief. But we understood that we have only hundreds of "combat" servers, and all this copy-paste on the developers' part is not convenient at all. We realized we had to collect the data ourselves.

In general, there is a standard module that can collect statistics, although it also has to be enabled in the config: the pg_stat_statements module. But it did not suit us.

First, it assigns different QueryIds to the same queries that use different schemas within the same database. That is, if you first do SET search_path = '01'; SELECT * FROM user LIMIT 1; and then SET search_path = '02'; and the same query, the statistics of this module will contain different records, and I will not be able to collect overall statistics in the context of this query profile regardless of schemas.

The second point that prevented us from using it is the lack of plans. That is, there is no plan, only the query itself. We see what was slow, but we do not understand why. And here we return to the problem of a rapidly changing dataset.

And the last point is the lack of "facts". That is, you cannot refer to a specific instance of a query execution: there is none, there are only aggregated statistics. Although it is possible to work with this, it is just very difficult.

So we decided to fight copy-paste and started writing a collector.

The collector connects via SSH, establishes a secure connection to the database server using a certificate, and "clings" to the log file with tail -F. So in this session we get a complete "mirror" of the entire log file that the server generates. The load on the server itself is minimal, because we do not parse anything there; we simply mirror the traffic.
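
As a rough illustration of the mirroring idea, here is a minimal Python sketch. The collector described in this talk is written in Node.js, so this is a stand-in rather than the authors' code. The host name and log path used with it would be your own, and key-based SSH authentication is assumed to be configured already.

```python
import shlex
import subprocess
from typing import Iterator, List


def tail_command(host: str, log_path: str) -> List[str]:
    """Build the ssh invocation that follows the remote log file."""
    # tail -F follows the file by name, so it keeps working after log rotation.
    return ["ssh", host, "tail -F " + shlex.quote(log_path)]


def follow_log(host: str, log_path: str) -> Iterator[str]:
    """Yield log lines as they appear; nothing is parsed on the server side."""
    proc = subprocess.Popen(
        tail_command(host, log_path),
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.terminate()
```

All parsing then happens on the collector side, on the lines that follow_log yields.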

Since we had already started writing the interface in Node.js, we went on to write the collector in it as well. And this technology has proven itself, because JavaScript is very convenient for working with loosely formatted text data, which is what a log is. And the Node.js infrastructure itself, as a backend platform, lets you work easily and conveniently with network connections, and indeed with any data streams.

Accordingly, we "stretch" two connections: the first to "listen" to the log itself and bring it to us, and the second to periodically query the database. "The log shows that the relation with oid 123 is locked," but that means nothing to a developer, and it would be good to ask the database, "What is OID = 123 anyway?" So we periodically ask the database about what we do not yet know ourselves.

"There is only one thing you did not take into account: there is a species of elephant-like bees!.." We started developing this system when we wanted to monitor 10 servers: the most critical ones in our understanding, where problems arose that were hard to deal with. But during the first quarter we got a hundred servers to monitor, because the system took off, everyone wanted it, everyone found it convenient.

All of this has to be put somewhere, and the data flow is large and active. In fact, what we monitor and know how to handle is what we use: we use PostgreSQL as the data store too. And nothing is faster at "pouring" data into it than the COPY operator yet.

But simply "pouring" data is not really our technology. Because if you have roughly 50k queries per second on a hundred servers, that generates 100-150GB of logs per day. So we had to "cut" the database carefully.

First, we partitioned by day, because, by and large, nobody is interested in correlation between days. What difference does it make what you had yesterday if tonight you rolled out a new version of the application, and with it some new statistics.

Second, we learned (were forced to) write very, very fast using COPY. That is, not just COPY because it is faster than INSERT, but even faster.

The third point: we had to give up triggers and, accordingly, foreign keys. That is, we have no referential integrity at all. Because if you have a table with a couple of FKs, and you say in the database structure that "this log record refers by FK to, for example, a group of records", then when you insert it, PostgreSQL has nothing left but to honestly run SELECT 1 FROM master_fk1_table WHERE ... with the identifier you are trying to insert, just to check that this record exists there and that your insert does not "break" this foreign key.

Instead of one write to the target table and its indexes, we get the added bonus of reads from all the tables it refers to. But we do not need this at all: our task is to write as much as possible, as fast as possible, with the least load. So FKs are out!

The next point is aggregation and hashing. Initially we implemented them in the database: after all, it is convenient, when a record arrives, to do a "plus one" in some table right in a trigger. It is convenient, but bad for the same reason: you insert one record, yet you are forced to read and write something else in another table. Moreover, you do not just read and write, you do it every time.

Now imagine that you have a table in which you simply count the number of queries that passed through a particular host: +1, +1, +1, ..., +1. In principle, you do not need this: it can all be summed in memory on the collector and sent to the database in one go as +10.
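
A minimal sketch of that idea in Python (the real collector is written in Node.js, and the class and host names here are hypothetical): increments are summed in memory, and only the accumulated delta is handed over for writing.

```python
from collections import Counter
from typing import Dict


class HitAggregator:
    """Accumulates per-host increments in memory between flushes."""

    def __init__(self) -> None:
        self._pending: Counter = Counter()

    def hit(self, host: str, count: int = 1) -> None:
        self._pending[host] += count

    def flush(self) -> Dict[str, int]:
        """Return the accumulated deltas and start a new accumulation window."""
        batch = dict(self._pending)
        self._pending.clear()
        return batch


aggregator = HitAggregator()
for _ in range(10):
    aggregator.hit("bl-host-01")

# One "+10" to send to the database instead of ten separate "+1" updates.
deltas = aggregator.flush()
```

How often to flush is a trade-off: a longer window means fewer writes but more counts lost if the collector dies before flushing.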

Yes, in case of some failures your logical integrity may "fall apart", but this is an almost unrealistic case: you have a normal server, it has a battery in the controller, you have a transaction log, a log on the file system... In general, it is not worth it. The loss of performance you get from running triggers/FKs is not worth the cost you incur.

It is the same with hashing. A query flies in, you compute some identifier from it in the database, write it to the database, and then give it out to everyone. Everything is fine until, at the moment of writing, a second one comes along wanting to write the same thing, and you get blocked, which is already bad. So if you can move the generation of some IDs to the client (relative to the database), it is better to do so.

It was ideal for us to use MD5 of the text (query, plan, template, ...). We compute it on the collector side and "pour" the ready-made ID into the database. The length of MD5 and daily partitioning let us not worry about possible collisions.
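
Computing such an ID on the client takes one call. A minimal Python sketch, with a hypothetical function name and an illustrative query text:

```python
import hashlib


def text_id(text: str) -> str:
    """Return the MD5 of the text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The same text always yields the same ID, so two writers never need to
# coordinate through the database to agree on it.
query_id = text_id("SELECT * FROM users WHERE login = $1")
```

MD5 is used here purely as a content fingerprint, not for anything security-related.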

But to write all this quickly, we needed to modify the write procedure itself.

How do you usually write data? We have some dataset, we split it across several tables, and then COPY it: first into the first, then into the second, into the third... It is inconvenient, because we seem to be writing one data stream in three steps, sequentially. Unpleasant. Can it be done faster? It can!

To do this, it is enough to split these flows so that they run in parallel with each other. It turns out that errors, queries, templates, locks, ... fly in separate streams, and we write them all in parallel. For this it is enough to keep a COPY channel constantly open for each target table.

That is, the collector always has a stream into which I can write the data I need. But for the database to see this data, and so that nobody gets stuck waiting for it to be written, COPY must be interrupted at certain intervals. For us, the most efficient period turned out to be about 100ms: we close it and immediately open it again to the same table. And if one stream is not enough during some peaks, we do pooling up to a certain limit.
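
The close-and-reopen cycle can be sketched as follows. This is a simplified Python model, not the authors' Node.js code. ListChannel is a hypothetical in-memory stand-in for a driver's COPY ... FROM STDIN stream, and the rotation here is checked only when the next row arrives, whereas a real collector would also close the stream on a timer.

```python
import time
from typing import Callable, List, Optional


class ListChannel:
    """In-memory stand-in for one open COPY stream (hypothetical)."""

    def __init__(self, committed: List[List[str]]) -> None:
        self._committed = committed
        self._rows: List[str] = []

    def write(self, row: str) -> None:
        self._rows.append(row)

    def close(self) -> None:
        # Ending the COPY is what makes its rows visible to other sessions.
        self._committed.append(self._rows)


class RotatingCopyWriter:
    """Keeps one channel open and reopens it once `interval` seconds pass."""

    def __init__(
        self,
        open_channel: Callable[[], ListChannel],
        interval: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._open = open_channel
        self._interval = interval
        self._clock = clock
        self._channel: Optional[ListChannel] = None
        self._opened_at = 0.0

    def write(self, row: str) -> None:
        now = self._clock()
        if self._channel is not None and now - self._opened_at >= self._interval:
            self.close()
        if self._channel is None:
            self._channel = self._open()
            self._opened_at = now
        self._channel.write(row)

    def close(self) -> None:
        if self._channel is not None:
            self._channel.close()
            self._channel = None
```

One such writer per target table gives the parallel streams described earlier. The clock is injectable only to make the behavior easy to check without sleeping.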

In addition, we found that for such a load profile any aggregation, where records are collected into batches, is evil. The classic evil is INSERT ... VALUES with 1000 or more records. Because at that moment you get a write peak on the media, and everyone else trying to write something to disk will wait.

To get rid of such anomalies, simply do not aggregate anything, do not buffer at all. And if buffering to disk does occur (fortunately, the Stream API in Node.js lets you find out), set that connection aside. When you receive an event that it is free again, write to it from the accumulated queue. And while it is busy, take the next free one from the pool and write to it.

Before introducing this approach to writing data, we had about 4K write ops, and this way we reduced the load by 4 times. Now they have grown another 6 times because of new monitored databases, up to 100MB/s. And we now store logs for the last 3 months in a volume of about 10-15TB, hoping that within just three months any developer will be able to solve any problem.

Understanding the problems

But simply collecting all this data is good, useful, and relevant, but not enough: it has to be understood. Because these are millions of different plans per day.

But millions are unmanageable, so first we have to make it "smaller". And, first of all, we need to decide how to organize this "smaller" thing.

We identified three key points:

  • who sent this query
    That is, from which application it "arrived": the web interface, the backend, the payment system, or something else.
  • where it happened
    On which specific server? Because if you have several servers under one application and one of them suddenly "goes stupid" (because "the disk is rotten", "memory leaked", or some other trouble), then you need to deal with that specific server.
  • how the problem manifested itself in one way or another

To understand "who" sent us a query, we use a standard tool, setting a session variable: SET application_name = '{bl-host}:{bl-method}'; - we send the name of the business logic host from which the query comes and the name of the method or application that initiated it.

After we have passed the "owner" of the query, it has to be output to the log; for this we configure the variable log_line_prefix = ' %m [%p:%v] [%d] %r %a'. Those interested can look up in the manual what it all means. As a result, we see in the log:

  • the time
  • the process and transaction identifiers
  • the database name
  • the IP of whoever sent the query
  • and the method name
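
On the collector side, such a prefix can be split back into fields with a regular expression. The Python sketch below is an illustration only: the sample line is made up, the field names are hypothetical, and because the prefix as quoted has no trailing space, the pattern tolerates the severity following the application name either directly or after a space.

```python
import re
from typing import Dict, Optional

_SEVERITIES = (
    r"DEBUG\d?|INFO|NOTICE|WARNING|ERROR|LOG|FATAL|PANIC"
    r"|STATEMENT|DETAIL|HINT|CONTEXT"
)

LOG_LINE_RE = re.compile(
    r"^\s*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+)"  # %m
    r" \[(?P<pid>\d+):(?P<vxid>[^\]]*)\]"  # [%p:%v]
    r" \[(?P<database>[^\]]*)\]"  # [%d]
    r" (?P<remote>\S+)"  # %r, host(port)
    r" (?P<application>.*?)"  # %a
    r"\s?(?P<severity>" + _SEVERITIES + r"):\s+(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into prefix fields, or return None if it differs."""
    match = LOG_LINE_RE.match(line)
    return match.groupdict() if match else None


# A made-up line in the shape the prefix would produce.
sample = (
    " 2019-08-01 12:00:00.123 MSK [12345:4/56] [billing]"
    " 10.0.0.5(50123) bl-01:Orders.ListLOG:  duration: 11.2 ms"
)
parsed = parse_log_line(sample)
```

Continuation lines of a multi-line message (the plan text itself) do not carry the prefix, so a real parser would attach lines that return None to the preceding record.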

Then we realized that it is not very interesting to look at the correlation for one query between different servers. It is not often that one application messes up in the same way here and there. And even if it is the same, just look at any one of those servers.

So the "one server - one day" slice turned out to be enough for us for any analysis.

The first analytical slice is that same "template": an abbreviated form of the plan representation, cleared of all numeric indicators. The second slice is the application or method, and the third is the specific plan node that caused us problems.
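
One plausible way to derive such a template from a text plan is sketched below in Python. This is an assumption about the approach, not the authors' implementation: it keeps only the node header lines and strips their cost and timing figures. The plan and the index name in it are made up.

```python
import re

# Matches the "(cost=...)", "(actual ...)" and "(never executed)" groups.
_NUMBERS_RE = re.compile(r"\s+\((?:cost=[^)]*|actual [^)]*|never executed)\)")


def plan_template(plan_text: str) -> str:
    """Keep node header lines only, with all numeric indicators removed."""
    node_lines = []
    for line in plan_text.splitlines():
        if "(cost=" in line:  # only node header lines carry cost estimates
            node_lines.append(_NUMBERS_RE.sub("", line).rstrip())
    return "\n".join(node_lines)


example_plan = """\
Limit  (cost=0.29..8.31 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=1)
  ->  Index Scan using users_login_idx on users  (cost=0.29..8.31 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=1)
        Index Cond: (login = 'Vasya'::text)
Planning Time: 0.080 ms
Execution Time: 0.040 ms
"""

template = plan_template(example_plan)
```

Two executions of the same query with different timings and row counts then collapse into one template, which is what makes grouping by template possible.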

When we moved from specific instances to templates, we got two advantages at once:

  • a many-fold reduction in the number of objects to analyze
    We have to analyze the problem not across thousands of queries or plans but across dozens of templates.
  • a timeline
    That is, by summarizing the "facts" within some slice, you can display their occurrence over the day. And here you can see that if some template occurs, for example, once an hour when it should occur once a day, you should think about what went wrong: who triggered it and why, and whether it should be here at all. This is another non-numerical, purely visual method of analysis.

The other methods are based on the indicators we extract from the plan: how many times such a template occurred, the total and average time, how much data was read from disk and how much from memory...

Because, for example, you come to the analytics page for a host and see that something is reading far too much from disk. The disk on the server cannot cope, so who is reading from it?

And you can sort by any column and decide what to deal with right now: the load on the processor or the disk, or the total number of queries... We sorted, looked at the "top", fixed it, and rolled out a new version of the application.
[video lecture]

And you can immediately see the different applications that come with the same template from a query like SELECT * FROM users WHERE login = 'Vasya'. Frontend, backend, processing... And you wonder why processing would read a user if it does not interact with them.

The reverse way is to see right away from the application what it does. For example, the frontend does this, this, this, and this once an hour (the timeline helps here). And the question immediately arises: it does not seem to be the frontend's job to do something once an hour...

After some time, we realized that we lacked aggregated statistics by plan nodes. We extracted from the plans only those nodes that do something with the data of the tables themselves (read or write them, by index or not). In fact, only one aspect is added relative to the previous picture: how many records this node brought us, and how many it discarded (Rows Removed by Filter).

You do not have a suitable index on the table, you make a query to it, it flies past the index, falls into a Seq Scan... and you have filtered out all the records except one. Why do you need 100M filtered records per day? Wouldn't it be better to roll out an index?
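
Such per-node statistics can be accumulated along these lines. The Python sketch assumes plans in EXPLAIN (ANALYZE, FORMAT JSON) form. The function names are hypothetical, and multiplying the per-loop row figures by the loop count gives only an approximation of the true totals.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Plan = Dict[str, Any]


def walk(node: Plan) -> Iterator[Plan]:
    """Yield the node and all of its descendants."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def filter_stats(plans: Iterable[Plan]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Sum returned and filtered-out rows per (table, node type)."""
    totals: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"returned": 0, "removed": 0}
    )
    for root in plans:
        for node in walk(root):
            if "Relation Name" not in node:
                continue  # only nodes that touch table data are of interest
            loops = node.get("Actual Loops", 1)
            key = (node["Relation Name"], node["Node Type"])
            totals[key]["returned"] += node.get("Actual Rows", 0) * loops
            totals[key]["removed"] += node.get("Rows Removed by Filter", 0) * loops
    return dict(totals)
```

Sorting the result by the "removed" figure brings the tables that are most in need of an index to the top.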

Having analyzed all the plans node by node, we realized that there are some typical structures in plans that are very likely to look suspicious. And it would be good to tell the developer: "Friend, here you first read by index, then sort, and then cut off" - as a rule, to a single record.

Everyone who has written queries has probably come across this pattern: "Give me Vasya's last order and its date." And if you have no index by date, or the index you used does not contain the date, you will step on exactly this "rake".

But we know that this is a "rake", so why not tell the developer right away what he should do. Accordingly, when opening a plan now, our developer immediately sees a nice picture with hints, where he is told at once: "You have problems here and there, and they are solved this way and that way."
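
A detector for the "read by index, sort, cut off" structure could look roughly like this. It is a Python sketch over EXPLAIN (FORMAT JSON) plans. The actual rules used by the tool are not shown in the talk, so the function name, the set of scan types, and the hint wording are all assumptions.

```python
from typing import Any, Dict, Iterator, List

Plan = Dict[str, Any]
SCAN_TYPES = {"Index Scan", "Index Only Scan", "Bitmap Heap Scan"}


def walk(node: Plan) -> Iterator[Plan]:
    """Yield the node and all of its descendants."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def sort_limit_hints(root: Plan) -> List[str]:
    """Find Limit -> Sort -> index read chains and describe each one."""
    hints = []
    for node in walk(root):
        if node.get("Node Type") != "Limit":
            continue
        for sort in node.get("Plans", []):
            if sort.get("Node Type") != "Sort":
                continue
            for scan in sort.get("Plans", []):
                if scan.get("Node Type") in SCAN_TYPES:
                    keys = ", ".join(sort.get("Sort Key", []))
                    hints.append(
                        f"{scan['Node Type']} on {scan.get('Relation Name', '?')} "
                        f"is sorted by ({keys}) and then cut by Limit: "
                        "consider an index that also covers the sort key"
                    )
    return hints
```

Each such rule encodes one piece of DBA experience, which is how the tool can hand that experience to developers who do not have it yet.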

As a result, the amount of experience needed to solve problems, at the beginning and now, has dropped significantly. This is the kind of tool we ended up with.

source: www.habr.com
