Mass optimization of PostgreSQL queries. Kirill Borovikov (Tensor)

The talk presents some approaches that make it possible to monitor the performance of SQL queries when there are millions of them per day and hundreds of PostgreSQL servers under observation.

What technical solutions let us process such a volume of information efficiently, and how does this make an ordinary developer's life easier?

Who is this for? If you are interested in the analysis of specific problems and various techniques for optimizing SQL queries and solving typical DBA tasks in PostgreSQL, you can also read the series of articles on this topic.

My name is Kirill Borovikov, and I represent the company Tensor. Specifically, I specialize in working with databases at our company.

Today I will tell you how we optimize queries when the task is not to "pick apart" the performance of a single query but to solve the problem en masse: when there are millions of queries and you need to find some approaches to solving this big problem.

In general, for a million of our customers Tensor means VLSI, our application: a corporate social network, video communication solutions, internal and external document flow, accounting and warehouse systems, and so on. That is, a kind of "mega-combine" for integrated business management, with more than 100 different internal projects.

To make sure they all work and develop normally, we have 10 development centers across the country, with more than 1000 developers in them.

We have been working with PostgreSQL since 2008 and have accumulated a large volume of what we process (customer data, statistics, analytics, data from external information systems): more than 400 TB. There are about 250 servers in production alone, and in total we monitor about 1000 database servers.

SQL is a declarative language. You describe not "how" something should work but "what" you want to get. The DBMS knows better how to do a JOIN: how to join your tables, which conditions to apply, what will go through an index and what will not...

Some DBMSs accept hints: "No, join these two tables in such-and-such order," but PostgreSQL cannot do this. This is a deliberate position of the core developers: "We would rather finish the query optimizer than let developers use some kind of hints."

But although PostgreSQL does not let you control it from "outside", it fully allows you to see what is happening inside it when you run your query, and where it runs into problems.

In general, what classic problems does a developer usually bring [to a DBA]? "We ran a query here, and everything is slow, everything hangs, something is happening... Some kind of trouble!"

The reasons are almost always the same:

  • an inefficient query algorithm
    The developer: "Now I'll give it 10 tables in SQL via JOIN..." and he expects his conditions to be miraculously "untangled" so that he gets everything quickly. But miracles do not happen, and any system with such variability (10 tables in one FROM) always produces some kind of error. [article]
  • outdated statistics
    This point is very relevant specifically for PostgreSQL, when you "pour" a large dataset onto the server, run a query, and it does a Seq Scan over your table. Because yesterday the table had 10 records and today it has 10 million, but PostgreSQL does not know this yet, and we need to tell it. [article]
  • a resource "bottleneck"
    You put a large, heavily loaded database on a weak server that lacks disk, memory, or CPU performance. And that's it... Somewhere there is a performance ceiling that you can no longer jump above.
  • locking
    This is a difficult point, but it is most relevant for the various modifying queries (INSERT, UPDATE, DELETE); it is a separate big topic.

Getting the plan

...And for everything else we need a plan! We need to see what is happening inside the server.

A PostgreSQL query execution plan is a tree of the query execution algorithm in text form. It is exactly the algorithm that the planner's analysis found to be the most efficient.

Each node of the tree is an operation: fetching data from a table or index, building a bitmap, joining two tables, or combining, intersecting, or excluding selections. Executing a query means walking through the nodes of this tree.

To get a query plan, the easiest way is to run the EXPLAIN statement. To get all the real attributes, that is, to actually execute the query on the database, use EXPLAIN (ANALYZE, BUFFERS) SELECT ....

The bad part: when you run it, it happens "here and now", so it is only suitable for local debugging. Suppose you take a heavily loaded server under a strong flow of data changes and see: "Oh! Here we have a slow query." That was half an hour or an hour ago; while you were finding this query in the logs and bringing it back to the server, your whole dataset and statistics changed. You run it to debug, and it runs fast! And you cannot understand why it was slow.

To understand what happened at the very moment the query was executed on the server, smart people wrote the auto_explain module. It is present in almost all common PostgreSQL distributions and can simply be enabled in the configuration file.

If it sees that some query runs longer than the threshold you gave it, it takes a "snapshot" of this query's plan and writes the two together to the log.

Everything seems fine now: we go to the log and see there... [a wall of text]. But we cannot say anything about it, except that it is an excellent plan, because it took 11 ms to execute.

Everything seems fine, but nothing is clear about what actually happened. Apart from the total time, we do not really see anything, because looking at such a jumble of plain text is simply not visual.

But even if it is not visual, even if it is inconvenient, there are more fundamental problems:

  • A node shows the sum of the resources of the whole subtree beneath it. That is, you cannot simply find out how much time was spent on this particular Index Scan if it has some nested condition under it. We have to actively check whether there are "children" and conditional variables or CTEs inside, and subtract all of this "in our heads".
  • Second point: the time shown on a node is the time of a single execution of the node. If the node was executed several times, for example as a result of a loop over table records, the number of loops (cycles of this node) increases in the plan. But the atomic execution time itself stays the same in the plan. That is, to understand how long this node ran in total, you need to multiply one by the other, again "in your head".
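
The two manual steps listed above (multiply by loops, then subtract the children) can be sketched in a few lines. This is only an illustration, not the tool from the talk: the `PlanNode` class, its field names, and the sample numbers are hypothetical, and the naive subtraction deliberately ignores CTE, InitPlan, and SubPlan nodes, which need special handling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical, simplified plan node (not a PostgreSQL structure)."""
    name: str
    actual_ms: float  # upper bound of "actual time" as printed: one loop
    loops: int = 1
    children: List["PlanNode"] = field(default_factory=list)


def inclusive_ms(node: PlanNode) -> float:
    # The printed time is per loop, so the total is time * loops.
    return node.actual_ms * node.loops


def exclusive_ms(node: PlanNode) -> float:
    # A node's figure includes its whole subtree, so subtract the
    # direct children's totals to get the node's own share.
    return inclusive_ms(node) - sum(inclusive_ms(c) for c in node.children)


inner = PlanNode("Index Scan", actual_ms=0.005, loops=1000)
outer = PlanNode("Seq Scan", actual_ms=2.0)
join = PlanNode("Nested Loop", actual_ms=10.0, children=[outer, inner])

for n in (join, outer, inner):
    print(f"{n.name}: {exclusive_ms(n):.3f} ms own time")
```

In this made-up plan the Index Scan looks cheap at 0.005 ms, yet its 1000 loops make it the largest contributor.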

In such situations, understanding "Who is the weakest link?" is almost impossible. So even the developers themselves write in the "manual" that "Understanding a plan is an art that must be learned, experience...".

But we have 1000 developers, and you cannot pass this experience on to each of them. I know, you know, he knows, but someone over there no longer knows. Maybe he will learn, maybe not, but he needs to work now, and where would he get this experience?

Plan visualization

So we realized that to deal with these problems we need a good visualization of the plan. [article]

First we went "through the market": we looked on the Internet to see what exists at all.

But it turned out that there are very few relatively "live" solutions that are still being developed; literally just one: explain.depesz.com by Hubert Lubaczewski. When you "feed" the text representation of a plan into the input field, it shows you a table with the parsed data:

  • the node's own processing time
  • the total time for the whole subtree
  • the number of records retrieved versus the number statistically expected
  • the node body itself

This service can also share an archive of links. You dropped your plan there and said: "Hey, Vasya, here's a link, something is wrong there."

But there are also small problems.

First, a huge amount of "copy-paste". You take a piece of the log, paste it in there, and again, and again.

Second, there is no analysis of the amount of data read: the very buffers that EXPLAIN (ANALYZE, BUFFERS) outputs are not visible here. The service simply does not know how to parse them, understand them, and work with them. When you read a lot of data and realize that you may be dividing it badly between disk and memory cache, this information is very important.

The third negative point is the very weak development of this project. Commits are very small, at best once every six months, and the code is in Perl.

But all of this is "lyrics"; we could somehow live with it. There is one thing, however, that really turned us away from this service: errors in the analysis of Common Table Expressions (CTE) and various dynamic nodes such as InitPlan/SubPlan.

If you believe this picture, the total execution time of each individual node is greater than the total execution time of the whole query. The reason is simple: the generation time of this CTE was not subtracted from the CTE Scan node. Therefore we no longer know the correct answer to how long the CTE Scan itself took.

Then we realized it was time to write our own, hooray! Every developer says: "Now we'll write our own, it will be so much easier!"

We took a typical web-services stack: a core based on Node.js + Express, plus Bootstrap and D3.js for pretty diagrams. And our expectations were fully justified: we got the first prototype in 2 weeks:

  • our own plan parser
    That is, now we can parse any plan among those generated by PostgreSQL.
  • correct analysis of dynamic nodes: CTE Scan, InitPlan, SubPlan
  • analysis of the distribution of buffers: where data pages are read from memory, where from the local cache, where from disk
  • visual clarity
    So as not to "dig" through all of this in the log, but to see the "weakest link" right away in the picture.

We got something like this, with syntax highlighting included. But usually our developers no longer work with the full representation of the plan but with the shortened one. After all, we have already parsed out all the numbers and moved them to the left and right, leaving only the first line in the middle: what kind of node it is, a CTE Scan, CTE generation, or a Seq Scan on some table.

This is the abbreviated representation that we call the plan template.

What else would be convenient? It would be convenient to see what share of our total time was spent on which node, and simply "stick" a pie chart next to it.

We point at a node and see: it turns out that the Seq Scan took less than a quarter of the total time, and the remaining 3/4 was taken by the CTE Scan. Horror! This is a small note about the "rate of fire" of CTE Scan if you actively use them in your queries. They are not very fast; they are slower even than an ordinary table scan. [article] [article]

But usually such diagrams are more interesting and more complex, when we point at a segment and immediately see, for example, that more than half of the time was "eaten" by some Seq Scan. Moreover, there was some kind of Filter inside, and many records were discarded by it... You can throw this picture straight to the developer and say: "Vasya, everything is bad here! Figure it out, look, something is wrong!"

Naturally, there were some "rakes" to step on along the way.

The first thing we ran into was the rounding problem. The time of each node in the plan is given with a precision of 1 μs. And when the number of node loops exceeds, for example, 1000, PostgreSQL has divided the time after execution "to within the precision", so when we calculate back we get a total time "somewhere between 0.95 ms and 1.05 ms". When the count is in microseconds, that's fine, but when it is already [milli]seconds, you have to take this into account when "untying" resources to the plan nodes to see "who consumed how much".
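
A minimal sketch of that back-calculation, under stated assumptions: the per-loop time is printed in milliseconds with three decimals and rounded to the nearest step, so the true value may differ by up to half a step. The function name and the sample node (0.010 ms shown, 100 loops) are made up for illustration.

```python
from decimal import Decimal

# Assumption: times are printed in ms with three decimals, rounded to
# the nearest step, so the true per-loop value lies within half a step.
HALF_STEP = Decimal("0.0005")


def total_time_bounds(shown_ms: str, loops: int):
    """Return (low, high) bounds in ms for a node's total time."""
    shown = Decimal(shown_ms)
    low = max(shown - HALF_STEP, Decimal("0")) * loops
    high = (shown + HALF_STEP) * loops
    return low, high


low, high = total_time_bounds("0.010", 100)  # made-up node
print(f"total is between {low} ms and {high} ms")
```

The width of the interval grows linearly with the loop count, which is why heavily looped nodes are the ones where the uncertainty starts to matter.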

The second, more complicated point is the distribution of resources (those buffers) among dynamic nodes. This cost us another 4 weeks on top of the first 2 weeks of the prototype.

It is quite easy to get this kind of problem: we make a CTE and supposedly read something in it. In fact, PostgreSQL is "smart" and will not read anything right there. Then we take the first record from it, and join to it the hundred-and-first record from the same CTE.

We look at the plan and realize: strange, we have 3 buffers (data pages) "consumed" in the Seq Scan, 1 more in a CTE Scan, and 2 more in the second CTE Scan. That is, if we simply sum everything up we get 6, but we read only 3 from the table! A CTE Scan does not read anything from anywhere; it works directly with process memory. So something is clearly wrong here!

In fact, it turns out that these are the same 3 pages of data that were requested from the Seq Scan: first the 1st CTE Scan asked for 1, and then the 2nd asked for 2 more. That is, a total of 3 pages of data were read, not 6.

And this picture led us to the understanding that plan execution is no longer a tree but simply some kind of acyclic graph. And we got a diagram like this so that we could understand "what came from where in the first place". That is, here we built a CTE from pg_class and queried it twice, and almost all our time went to the branch where we queried it the 2nd time. Clearly, reading the 101st record is much more expensive than reading just the 1st record from the table.

We exhaled for a while. We said: "Now, Neo, you know kung fu! Now our experience is right on your screen. Now you can use it." [article]

Log consolidation

Our 1000 developers breathed a sigh of relief. But we understood that we have only hundreds of "combat" servers, and all this "copy-paste" on the developers' part is not convenient at all. We realized we had to collect the logs ourselves.

In general, there is a standard module that can collect statistics, although it also needs to be enabled in the config: the pg_stat_statements module. But it did not suit us.

First, it assigns different QueryIds to the same queries that use different schemas within the same database. That is, if you first do SET search_path = '01'; SELECT * FROM user LIMIT 1; and then SET search_path = '02'; and the same query, this module's statistics will contain different records, and I will not be able to collect general statistics specifically for this query profile without regard to schemas.

The second point that prevented us from using it is the lack of plans. That is, there is no plan, only the query itself. We see what was slow, but we do not understand why. And here we return to the problem of a rapidly changing dataset.

And the last point is the lack of "facts". That is, you cannot address a specific instance of query execution: there is none, only aggregated statistics. Although it is possible to work with this, it is just very difficult.

So we decided to fight copy-paste and started writing a collector.

The collector connects via SSH, establishes a secure connection to the database server using a certificate, and "clings" to the log file with tail -F. So in this session we get a complete "mirror" of the whole log file that the server generates. The load on the server itself is minimal, because we do not parse anything there; we just mirror the traffic.

Since we had already started writing the interface in Node.js, we continued writing the collector in it. And this technology has justified itself, because it is very convenient to use JavaScript to work with formatted text data, which is what the log is. And the Node.js infrastructure itself, as a backend platform, lets you work easily and conveniently with network connections, and indeed with any data streams.

Accordingly, we "raise" two connections: the first to "listen" to the log itself and bring it to us, and the second to query the database periodically. "The log shows that the relation with oid 123 is locked," but this means nothing to the developer, and it would be nice to ask the database, "What is OID = 123 anyway?" And so we periodically ask the database for whatever we do not yet know.

"There is only one thing you did not take into account: there is a species of elephant-like bees!.." We started developing this system when we wanted to monitor 10 servers: the most critical ones in our understanding, where problems arose that were difficult to deal with. But during the first quarter we got a hundred servers to monitor, because the system worked, everyone wanted it, and everyone found it convenient.

All of this has to be added up; the data flow is large and active. In fact, what we monitor and know how to handle is what we use: we also use PostgreSQL as the data store. And nothing is faster at "pouring" data into it than the COPY operator. Not yet.

But simply "pouring" data is not really our technology. Because if you have about 50k queries per second on a hundred servers, this generates 100-150 GB of logs per day. Therefore we had to carefully "cut up" the database.

First, we did partitioning by day, because, by and large, nobody is interested in correlation between days. What difference does it make what you had yesterday if tonight you rolled out a new version of the application, and with it new statistics?

Second, we learned (were forced) to write very, very fast using COPY. That is, not just COPY because it is faster than INSERT, but even faster.

The third point: we had to abandon triggers and, accordingly, foreign keys. That is, we have no referential integrity at all. Because if you have a table with a pair of FKs, and you say in the database structure that "here is a log record that refers by FK, for example, to a group of records," then when you insert it, PostgreSQL has no choice but to honestly run SELECT 1 FROM master_fk1_table WHERE ... with the identifier you are trying to insert, just to check that this record exists there, so that you do not "break" this foreign key with your insert.

Instead of one write to the target table and its indexes, we additionally get reads from all the tables it refers to. But we do not need this at all: our task is to write as much as possible and as fast as possible with the least load. So FKs: down with them!

The next point is aggregation and hashing. Initially we implemented them in the database; after all, it is convenient, when a record arrives, to do a "plus one" in some table right in a trigger. Yes, convenient, but bad in the same way: you insert one record but are forced to read and write something else in another table. Moreover, you do not just read and write; you do it every time.

Now imagine that you have a table in which you simply count the number of requests that passed through a particular host: +1, +1, +1, ..., +1. And you, in principle, do not need this: it can all be summed in memory on the collector and sent to the database in one go as +10.
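
The idea of summing in memory and sending one "+N" can be sketched as follows. The sketch is in Python purely for illustration (the collector in the talk is written in Node.js), and the class and method names are hypothetical.

```python
from collections import Counter


class HitAggregator:
    """Sums per-host increments in memory between flushes."""

    def __init__(self):
        self._pending = Counter()

    def hit(self, host: str) -> None:
        self._pending[host] += 1

    def flush(self) -> dict:
        # Hand back the accumulated deltas and start a new window.
        batch = dict(self._pending)
        self._pending.clear()
        return batch


agg = HitAggregator()
for _ in range(10):
    agg.hit("host-a")
agg.hit("host-b")
print(agg.flush())  # one "+N" per host instead of eleven "+1" writes
```

Whatever `flush()` returns would then be applied to the database as a single increment per host.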

Yes, in case of some problems your logical integrity may "fall apart", but this is an almost unrealistic case, because you have a normal server with a battery in the controller, you have a transaction log, a journaled file system... In general, it is not worth it. The loss of performance you get from running triggers/FKs is not worth the cost.

It is the same with hashing. A query arrives, you compute some identifier from it in the database, write it to the database, and then report it to everyone. Everything is fine until, at the moment of writing, a second party comes who wants to write the same thing, and you get blocked, and that is already bad. Therefore, if you can move the generation of some IDs to the client (relative to the database), it is better to do so.

It worked out very well for us to use MD5 of the text (query, plan, template, ...). We compute it on the collector side and "pour" the ready-made ID into the database. The length of MD5 and the daily partitioning let us not worry about possible collisions.
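
A minimal sketch of computing such an ID on the client side, assuming the text is hashed as UTF-8 and the hex digest is used as the identifier. Neither detail is stated in the talk, and `text_id` is a hypothetical helper name.

```python
import hashlib


def text_id(text: str) -> str:
    """MD5 hex digest of the text, computed on the collector side."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


query = "SELECT * FROM users WHERE login = 'Vasya'"
print(text_id(query))
```

Because the same text always maps to the same ID, two writers never need to coordinate through the database to agree on it.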

But to write all this quickly, we needed to modify the writing procedure itself.

How do you usually write data? We have some dataset, we split it across several tables, and then COPY: first into the first, then into the second, then the third... It is inconvenient, because we are effectively writing one data stream in three sequential steps. Unpleasant. Can it be done faster? It can!

To do this, it is enough simply to split these streams and run them in parallel with one another. It turns out that we have errors, queries, templates, locks, ... flying in separate streams, and we write them all in parallel. It is enough to keep a COPY channel permanently open for each target table.

That is, the collector always has a stream into which I can write the data I need. But for the database to see this data, and for nobody to hang waiting for this data to be written, COPY must be interrupted at certain intervals. For us, the most effective interval was about 100 ms: we close it and immediately open it again to the same table. And if one stream is not enough for us during some peaks, we do pooling up to a certain limit.
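
The "close and immediately reopen" cycle can be sketched without a real database. Everything here is an assumption made for illustration: `open_channel` is a hypothetical factory that stands in for starting a COPY into one table, closing the channel stands in for finishing that COPY, and rotation is checked only when a row is written (a real collector would more likely use a timer).

```python
import time


class RotatingCopyWriter:
    """Keeps one write channel per table and reopens it periodically.

    `open_channel` is a hypothetical factory returning an object with
    write(row) and close(); closing stands for finishing the COPY so
    that the rows become visible to other sessions.
    """

    def __init__(self, open_channel, max_age=0.1, clock=time.monotonic):
        self._open_channel = open_channel
        self._max_age = max_age  # seconds; 0.1 s is the ~100 ms above
        self._clock = clock
        self._channel = None
        self._opened_at = 0.0

    def write(self, row):
        now = self._clock()
        expired = now - self._opened_at >= self._max_age
        if self._channel is not None and expired:
            self._channel.close()
            self._channel = None
        if self._channel is None:
            self._channel = self._open_channel()
            self._opened_at = now
        self._channel.write(row)

    def close(self):
        if self._channel is not None:
            self._channel.close()
            self._channel = None
```

Injecting the clock and the channel factory keeps the sketch testable; one such writer per target table corresponds to the permanently open channels described above.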

In addition, we found that for such a load profile any aggregation, when records are collected in batches, is evil. The classic evil is INSERT ... VALUES with 1000 or more records. Because at that moment you have a write peak on the media, and everyone else trying to write something to the disk will be waiting.

To get rid of such anomalies, simply do not aggregate anything, do not buffer at all. And if buffering to disk does occur (fortunately, the Stream API in Node.js lets you find out), postpone this connection. When you receive an event that it is free again, write to it from the accumulated queue. And while it is busy, take the next free one from the pool and write to it.

Before introducing this approach to data writing, we had approximately 4K write ops, and in this way we reduced the load by 4 times. Now they have grown another 6 times due to newly monitored databases, up to 100 MB/s. And now we store logs for the last 3 months in a volume of about 10-15 TB, hoping that in just three months any developer will be able to solve any problem.

Understanding the problems

Simply collecting all this data is good, useful, and appropriate, but not enough: it needs to be understood. Because these are millions of different plans per day.

But millions are unmanageable; we must first make it "smaller". And, first of all, you need to decide how you will organize this "smaller" thing.

We identified three key points:

  • who sent this query
    That is, from which application it "arrived": the web interface, the backend, the payment system, or something else.
  • where it happened
    On which specific server? Because if you have several servers under one application, and suddenly one "goes dumb" (because "the disk has rotted", "memory has leaked", or some other trouble), then you need to address that server specifically.
  • how the problem manifested itself in one way or another

To understand "who" sent us a query, we use a standard tool, setting a session variable: SET application_name = '{bl-host}:{bl-method}'; that is, we send the name of the business-logic host the query comes from and the name of the method or application that initiated it.

After we have passed the "owner" of the query, it has to be output to the log; for this we configure the variable log_line_prefix = ' %m [%p:%v] [%d] %r %a'. Those interested can look in the manual to see what it all means. It turns out that we see in the log:

  • the time
  • the process and transaction identifiers
  • the database name
  • the IP of the party that sent the query
  • and the method name
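
As an illustration of how a collector might take such a prefix apart, here is a small regular-expression sketch. The sample line, host name, and method name are invented, and the pattern only assumes the field shapes PostgreSQL documents for %m, %p, %v, %d, %r, and %a; it is not the parser from the talk.

```python
import re

# Pattern for the prefix ' %m [%p:%v] [%d] %r %a' (fields per the
# PostgreSQL docs: timestamp, PID, virtual transaction ID, database,
# remote host(port), application name).
PREFIX_RE = re.compile(
    r"^\s*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \S+)"
    r" \[(?P<pid>\d+):(?P<vxid>[^\]]*)\]"
    r" \[(?P<database>[^\]]*)\]"
    r" (?P<remote>\S+)"
    r" (?P<application>.*)$"
)


def parse_prefix(prefix: str) -> dict:
    match = PREFIX_RE.match(prefix)
    if match is None:
        raise ValueError("unexpected prefix format")
    fields = match.groupdict()
    # application_name was set as '{bl-host}:{bl-method}'
    host, _, method = fields["application"].partition(":")
    fields["bl_host"], fields["bl_method"] = host, method
    return fields


# Invented sample prefix, for illustration only.
sample = (
    " 2019-12-05 10:15:30.123 MSK [12345:7/8901] [billing]"
    " 10.0.0.15(50432) bl-host-01:Order.List"
)
print(parse_prefix(sample))
```

The sketch parses only the prefix itself; how the message text is separated from the prefix in a real log line depends on the exact configuration.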

Then we realized that it is not very interesting to look at the correlation of one query across different servers. It is rare to have a situation where one application messes things up equally here and there. But even if it is the same, look at any one of these servers.

So the slice "one server, one day" turned out to be sufficient for us for any analysis.

The first analytical slice is the "template" itself: an abbreviated form of presenting the plan, cleared of all numeric indicators. The second slice is the application or method, and the third is the specific plan node that caused us problems.
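
One possible way to derive such a template from the text of a plan is sketched below. The exact normalization rules used in the talk's tool are not given, so the rules here are assumptions: drop the parenthesized cost/actual metrics and replace every remaining numeric literal with "?".

```python
import re

# Parenthesized metric groups: (cost=...), (actual ...), (never executed)
_METRICS = re.compile(r"\s*\((?:cost=|actual |never executed)[^)]*\)")
# Any remaining standalone number, e.g. a literal or a row counter
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def plan_template(plan_text: str) -> str:
    """Strip numeric indicators so that similar plans compare equal."""
    lines = []
    for line in plan_text.splitlines():
        line = _METRICS.sub("", line)
        line = _NUMBER.sub("?", line)
        lines.append(line.rstrip())
    return "\n".join(lines)


plan = (
    "Seq Scan on users  (cost=0.00..35.50 rows=1 width=4)"
    " (actual time=0.010..0.250 rows=1 loops=1)\n"
    "  Filter: (id = 42)\n"
    "  Rows Removed by Filter: 999"
)
print(plan_template(plan))
```

Two executions of the same query with different timings and row counts then collapse into one template, which is what makes grouping possible.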

When we moved from specific instances to templates, we got two advantages at once:

  • a multiple reduction in the number of objects to analyze
    We no longer have to analyze a problem across thousands of queries or plans, only across a much smaller set of templates.
  • a timeline
    That is, by summing up the "facts" within a certain slice, you can display their occurrence during the day. And here you can see that if you have some pattern that occurs, for example, once an hour but should occur once a day, you should think about what went wrong: who caused it and why, and whether it should be here at all. This is another non-numeric, purely visual method of analysis.

The remaining methods are based on the indicators that we extract from the plan: how many times such a pattern occurred, the total and average time, how much data was read from disk and how much from memory...

Because, for example, you come to the analytics page for a host and look: something has started reading from disk too much. The disk on the server cannot cope, so who is reading from it?

And you can sort by any column and decide what you will deal with right now: the load on the processor or the disk, or the total number of queries... We sorted it, looked at the "top" ones, fixed them, and rolled out a new version of the application.
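
The "aggregate, sort by any column, look at the top" workflow can be sketched like this. The shape of a "fact" record (keys `template`, `duration_ms`, `disk_pages`) and the sample values are hypothetical; the talk does not describe the actual storage schema.

```python
from collections import defaultdict


def aggregate_by_template(facts):
    """Sum per-execution facts into per-template indicators."""
    stats = defaultdict(
        lambda: {"count": 0, "total_ms": 0.0, "disk_pages": 0}
    )
    for fact in facts:
        row = stats[fact["template"]]
        row["count"] += 1
        row["total_ms"] += fact["duration_ms"]
        row["disk_pages"] += fact["disk_pages"]
    for row in stats.values():
        row["avg_ms"] = row["total_ms"] / row["count"]
    return dict(stats)


def top(stats, column, n=10):
    """Templates ordered by the chosen column, largest first."""
    ranked = sorted(stats.items(), key=lambda kv: kv[1][column], reverse=True)
    return ranked[:n]


facts = [  # made-up sample data
    {"template": "t1", "duration_ms": 12.0, "disk_pages": 0},
    {"template": "t1", "duration_ms": 8.0, "disk_pages": 0},
    {"template": "t2", "duration_ms": 15.0, "disk_pages": 900},
]
stats = aggregate_by_template(facts)
print(top(stats, "total_ms", n=1))
print(top(stats, "disk_pages", n=1))
```

Sorting the same aggregates by a different column answers a different question: which template costs the most time overall versus which one is hammering the disk.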
[video of the talk]

And you can immediately see the different applications that arrive with the same template from a query like SELECT * FROM users WHERE login = 'Vasya'. Frontend, backend, processing... And you wonder why processing would read the user if it does not interact with him.

The opposite approach is to see immediately, from the application, what it does. For example, the frontend does this, this, this, and this once an hour (this is where the timeline helps). And the question immediately arises: it does not seem to be the frontend's job to do something once an hour...

After some time we realized that we lacked aggregated statistics by plan nodes. We isolated from the plans only those nodes that do something with the table data themselves (read or write it, by index or not). In fact, only one aspect is added relative to the previous picture: how many records this node brought us, and how many it discarded (Rows Removed by Filter).

You do not have a suitable index on the table, you make a query to it, it flies past the index, falls into a Seq Scan... and you have filtered out all the records except one. Why do you need 100M filtered records per day? Isn't it better to roll out an index?
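
A rough sketch of turning these two counters into a warning. The 99% threshold and the function names are arbitrary assumptions chosen for the example, not values from the talk.

```python
def filtered_share(rows_returned: int, rows_removed: int) -> float:
    """Fraction of the scanned records that the filter threw away."""
    total = rows_returned + rows_removed
    return rows_removed / total if total else 0.0


def looks_like_missing_index(node_type: str, rows_returned: int,
                             rows_removed: int,
                             threshold: float = 0.99) -> bool:
    # Heuristic: a Seq Scan that discards almost everything it reads
    # is a candidate for an index on the filtered columns.
    if node_type != "Seq Scan":
        return False
    return filtered_share(rows_returned, rows_removed) >= threshold


print(looks_like_missing_index("Seq Scan", 1, 999_999))
print(looks_like_missing_index("Seq Scan", 500, 500))
```

Summed per template over a day, the same counters give the "100M filtered records" figure mentioned above.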

After analyzing all the plans node by node, we realized that there are some typical structures in plans that very likely look suspicious. And it would be nice to tell the developer: "Friend, here you first read by index, then sort, and then cut off"; as a rule, only one record is left.

Everyone who has written queries has probably encountered this pattern: "Give me the last order for Vasya, and its date." And if you do not have an index by date, or the index you used does not include the date, you will step on exactly this kind of "rake".
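
Detecting that "read by index, then sort, then cut off" shape can be sketched as a walk over a plan tree. The sketch assumes plans in the dictionary shape produced by EXPLAIN (FORMAT JSON), with the keys "Node Type" and "Plans"; the function name and the sample plan are hypothetical, and the talk's tool may detect this pattern differently.

```python
INDEX_READS = ("Index Scan", "Index Only Scan")


def has_index_sort_limit(node: dict) -> bool:
    """True if the tree contains Limit -> Sort -> index read."""
    if node.get("Node Type") == "Limit":
        for child in node.get("Plans", []):
            if child.get("Node Type") != "Sort":
                continue
            for grandchild in child.get("Plans", []):
                if grandchild.get("Node Type") in INDEX_READS:
                    return True
    # Otherwise keep looking deeper in the tree.
    return any(has_index_sort_limit(c) for c in node.get("Plans", []))


plan = {  # made-up plan for "the last order for Vasya"
    "Node Type": "Limit",
    "Plans": [{
        "Node Type": "Sort",
        "Plans": [{"Node Type": "Index Scan", "Plans": []}],
    }],
}
print(has_index_sort_limit(plan))
```

A match is only a hint that an index covering both the filter and the sort key might let the plan skip the Sort; whether that is true still depends on the query.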

But we know that this is a "rake", so why not tell the developer right away what he should do? Accordingly, when opening a plan now, our developer immediately sees a nice picture with hints, where he is told right away: "You have problems here and here, and they are solved this way and that way."

As a result, the amount of experience needed to solve problems has dropped significantly compared with the beginning. This is the kind of tool we have.

Source: www.habr.com
