Mass optimization of PostgreSQL queries. Kirill Borovikov (Tensor)

The talk presents some approaches that make it possible to monitor the performance of SQL queries when there are millions of them per day and hundreds of PostgreSQL servers under monitoring.

What technical solutions let us process that volume of information efficiently, and how does this make an ordinary developer's life easier?

If you are interested in analyses of specific problems and various techniques for optimizing SQL queries and solving typical DBA tasks in PostgreSQL, you can also read the series of articles on this topic.

My name is Kirill Borovikov and I represent the company Tensor. Specifically, I specialize in working with databases at our company.

Today I will tell you how we optimize queries when you do not need to "polish" the performance of a single query but have to solve the problem en masse: when there are millions of queries and you need to find some approaches to solving this big problem.

In general, for a million of our clients Tensor means VLSI, our application: a corporate social network, video communication solutions, internal and external document flow, accounting and warehouse systems, ... In other words, a "mega-combine" for integrated business management, with more than 100 different internal projects.

To keep all of them working and developing normally, we have 10 development centers across the country, with more than 1000 developers in them.

We have been working with PostgreSQL since 2008 and have accumulated a large volume of what we process (client data, statistics, analytics, data from external information systems): more than 400TB. There are about 250 servers in production alone, and in total we monitor about 1000 database servers.

SQL is a declarative language. You describe not "how" something should work but "what" you want to get. The DBMS knows better how to do a JOIN: how to join your tables, which conditions to apply, what will go through an index and what will not ...

Some DBMSs accept hints: "No, join these two tables in such-and-such an order," but PostgreSQL cannot do this. This is the deliberate position of the lead developers: "We would rather finish the query optimizer than let developers use some kind of hints."

But although PostgreSQL does not let you control it from the "outside", it does a great job of letting you see what is going on "inside" it when you run your query, and where it runs into problems.

In general, what classic problems does a developer usually bring [to the DBA]? "We ran a query here, and everything is slow, everything hangs, something is going on... Some kind of trouble!"

The causes are almost always the same:

  • an inefficient query algorithm
    The developer: "Now I'll give it 10 tables in SQL via JOIN..." and expects his conditions to miraculously "untangle" themselves efficiently so that he gets everything quickly. But miracles do not happen, and any system with that much variability (10 tables in one FROM) always produces some kind of error. [article]
  • stale statistics
    This point is especially relevant for PostgreSQL: you "pour" a large dataset onto the server, run a query, and it "seq-scans" your table. Yesterday there were 10 records in it and today there are 10 million, but PostgreSQL does not know this yet, and we need to tell it. [article]
  • a "plug" on resources
    You put a large, heavily loaded database on a weak server that lacks disk, memory, or CPU performance. And that's it... Somewhere there is a performance ceiling that you can no longer jump above.
  • locks
    This is a difficult point, but it matters most for the various modifying queries (INSERT, UPDATE, DELETE); it is a big topic of its own.

Getting the plan

... And for everything else we need a plan! We need to see what is happening inside the server.

A PostgreSQL query execution plan is the tree of the query execution algorithm in a text representation. It is exactly the algorithm that the planner's analysis found to be the most efficient.

Each node of the tree is an operation: fetching data from a table or index, building a bitmap, joining two tables, or merging, intersecting, or excluding selections. Executing the query means walking through the nodes of this tree.

The easiest way to get the query plan is to execute the EXPLAIN statement. To get it with all the real attributes, that is, to actually execute the query on the database, use EXPLAIN (ANALYZE, BUFFERS) SELECT ....

The bad part: when you run it, it happens "here and now", so it is only suitable for local debugging. Suppose you take a heavily loaded server under a heavy flow of data changes and you see: "Oh! Here we have a slow query." That was half an hour or an hour ago; while you were running around, fetching this query from the log and bringing it back to the server, your whole dataset and statistics changed. You run it to debug, and it runs quickly! And you cannot understand why it was slow.

To understand what happened at the exact moment the query was executed on the server, smart people wrote the auto_explain module. It is present in almost all common PostgreSQL distributions and can simply be enabled in the configuration file.

If it notices that some query runs longer than the limit you set, it takes a "snapshot" of that query's plan and writes both to the log together.

Everything seems fine now; we go to the log and see there ... [a wall of text]. But we cannot say anything about it, other than that it is an excellent plan because it took 11ms to execute.

Everything seems to be fine, but nothing is clear about what actually happened. Apart from the total time, we see almost nothing, because staring at such a "wall" of plain text is not very illuminating.

But even if it is not obvious, even if it is inconvenient, there are more fundamental problems:

  • A node shows the total resources of the entire subtree beneath it. That is, you cannot simply find out how much time was spent on this particular Index Scan if there is some nested condition beneath it. We have to check dynamically whether there are "children" and conditional variables, CTEs inside, and subtract all of this "in our heads".
  • The second point: the time shown on a node is the time of a single execution of the node. If this node was executed several times, for example in a loop over table records, then the loops count of the node increases in the plan, but the atomic execution time itself stays the same. That is, to understand how long this node took in total, you need to multiply one by the other, again "in your head".
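
Both corrections are simple arithmetic once the plan is parsed. A minimal sketch, assuming a hypothetical node shape (actualTime as the per-loop time in ms, loops, children) that is neither PostgreSQL's own format nor the format of the tool described in this talk; it ignores CTE, InitPlan and SubPlan nodes, which need separate handling:

```javascript
// Hypothetical, simplified plan node: actualTime is the per-loop time in ms
// as printed by EXPLAIN ANALYZE, loops is the number of executions.
function totalTime(node) {
  // Total time of the node including its subtree: per-loop time x loops.
  return node.actualTime * node.loops;
}

function ownTime(node) {
  // Exclusive time: the node's total minus the totals of its direct children.
  const childrenTotal = (node.children || []).reduce(
    (sum, child) => sum + totalTime(child),
    0
  );
  return totalTime(node) - childrenTotal;
}

// Invented sample plan, for illustration only.
const plan = {
  type: "Nested Loop",
  actualTime: 12,
  loops: 1,
  children: [
    { type: "Seq Scan", actualTime: 2, loops: 1 },
    { type: "Index Scan", actualTime: 0.5, loops: 16 },
  ],
};

console.log(totalTime(plan.children[1]), ownTime(plan));
```

In the sample, the Index Scan accounts for 0.5 ms times 16 loops, and the Nested Loop's own time is what remains after both children are subtracted.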

In such situations, answering "Who is the weakest link?" is almost impossible. That is why even the developers themselves write in the "manual" that "Understanding a plan is an art that must be learned, experience ...".

But we have 1000 developers, and you cannot pass this experience on to each of them. I know, you know, he knows, but someone over there does not. Maybe he will learn, or maybe not, but he needs to work now; and where would he get this experience?

Plan visualization

So we realized that to deal with these problems we need good visualization of the plan. [article]

First we went "to the market": let's look on the Internet to see what exists at all.

But it turned out that there are very few relatively "live" solutions that are more or less being developed; literally just one: explain.depesz.com by Hubert Lubaczewski. When you "feed" the text representation of the plan into the input field, it shows you a table with the parsed data:

  • the node's own processing time
  • the total time for the entire subtree
  • the number of records retrieved versus the number expected from statistics
  • the node body itself

This service also lets you share an archive of links. You threw your plan in there and said: "Hey, Vasya, here's a link, something is wrong there."

But there are small problems.

First, a huge amount of "copy-paste". You take a piece of the log, paste it in there, and again, and again.

Second, there is no analysis of the amount of data read: the very buffers that EXPLAIN (ANALYZE, BUFFERS) outputs are not visible here. It simply does not know how to parse them, understand them, and work with them. When you read a lot of data and realize that you may be misallocating disk and memory cache, this information is very important.

The third negative point is the very weak development of this project. Commits are very small, it is good if there is one every six months, and the code is in Perl.

But all of that is "lyrics"; we could somehow live with it. There is one thing, though, that really turned us away from this service: errors in the analysis of Common Table Expressions (CTE) and of various dynamic nodes such as InitPlan/SubPlan.

If you believe this picture, then the total execution time of the individual nodes is greater than the total execution time of the whole query. The reason is simple: the generation time of this CTE was not subtracted from the CTE Scan node. Therefore, we no longer know the correct answer to how long the CTE scan itself took.

Then we realized it was time to write our own, hurray! Every developer says: "Now we'll write our own, it will be super easy!"

We took a stack typical for web services: a core based on Node.js + Express, with Bootstrap and D3.js for nice diagrams. And our expectations were fully justified: we got the first prototype in 2 weeks:

  • our own plan parser
    That is, now we can parse any plan of those generated by PostgreSQL.
  • correct analysis of dynamic nodes: CTE Scan, InitPlan, SubPlan
  • analysis of buffer distribution: where data pages are read from memory, where from the local cache, where from disk
  • visual clarity
    So that you do not have to "dig" through all this in the log, but can see the "weakest link" right away in the picture.

We got something like this, with syntax highlighting included. But usually our developers no longer work with the full representation of the plan, but with a shorter one. After all, we have already parsed all the numbers and moved them to the left and right, and in the middle we left only the first line, which says what kind of node it is: CTE Scan, CTE generation, or Seq Scan on some table.

This is the abbreviated representation that we call the plan template.
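
The idea of a template can be approximated in a few lines. This is only a rough sketch under stated assumptions, not the parser described in the talk: the hypothetical planTemplate function keeps the root line and the lines that start with "->", and strips the "(cost=...)" and "(actual ...)" groups; CTE, InitPlan and SubPlan header lines would need extra rules.

```javascript
// Reduce a text plan to a "template": node header lines without numbers.
function planTemplate(planText) {
  return planText
    .split("\n")
    // Keep the root node line and every line that introduces a child node.
    .filter((line, index) => index === 0 || /^\s*->/.test(line))
    // Drop the "(cost=...)" and "(actual ...)" groups with their numbers.
    .map((line) => line.replace(/\s*\((cost|actual)[^)]*\)/g, "").trimEnd())
    .join("\n");
}

// Invented sample plan text, for illustration only.
const sample = [
  "Limit  (cost=0.00..0.05 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1)",
  "  ->  Seq Scan on pg_class  (cost=0.00..16.00 rows=300 width=4) (actual time=0.008..0.009 rows=1 loops=1)",
  "        Filter: (relkind = 'r'::\"char\")",
  "        Rows Removed by Filter: 42",
].join("\n");

console.log(planTemplate(sample));
```

Two executions of the same query with different timings and row counts then collapse into the same template string.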

What else would be convenient? It would be convenient to see what share of the total time goes to which node, so we simply "stuck" a pie chart on the side.

We point at a node and see that Seq Scan took less than a quarter of the total time, and the remaining 3/4 was taken by CTE Scan. Horror! This is a small note about the "rate of fire" of CTE Scan if you actively use them in your queries. They are not very fast; they lose even to ordinary table scans. [article] [article]

But usually such diagrams are more interesting and more complex, when we point at a segment and immediately see, for example, that more than half of the time was "eaten" by some Seq Scan. Moreover, there was some kind of Filter inside, and a lot of records were discarded by it... You can throw this picture straight at the developer and say: "Vasya, everything is bad here! Figure it out, look, something is wrong!"

Naturally, there were some "rakes" to step on.

The first thing we ran into is the rounding problem. The time of each node in the plan is shown with an accuracy of 1 μs. And when the number of node loops exceeds, for example, 1000, PostgreSQL divides after execution "to within that accuracy", so when we calculate back we get a total time "somewhere between 0.95ms and 1.05ms". When the count is in microseconds, that is fine, but when it is already [milli]seconds, you have to take this into account when "untying" resources to the plan nodes to see "who consumed how much".

The second, more complex point is the distribution of resources (those buffers) among dynamic nodes. This cost us another 4 weeks on top of the first 2 weeks of the prototype.

It is quite easy to get this kind of problem: we make a CTE and supposedly read something in it. In fact, PostgreSQL is "smart" and will not read anything right there. Then we take the first record from it, and to it the hundred-and-first record from the same CTE.

We look at the plan and realize: strange, we have 3 buffers (data pages) "consumed" in Seq Scan, 1 more in CTE Scan, and 2 more in the second CTE Scan. That is, if we simply sum everything up, we get 6, but we only read 3 from the table! CTE Scan does not read anything from anywhere; it works directly with process memory. So something is clearly wrong here!

In fact, it turns out that these are the same 3 data pages that were requested from Seq Scan: first the 1st CTE Scan asked for 1, and then the 2nd one came, and 2 more were read for it. That is, 3 pages of data were read in total, not 6.
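
The arithmetic of this example can be written down explicitly. A minimal sketch, assuming a hypothetical flattened node list (the field names are invented for illustration): buffers shown on CTE Scan nodes are treated as pages they caused the producing node to read, not as reads of their own.

```javascript
// Hypothetical flattened view of the plan from the example above.
const nodes = [
  { type: "Seq Scan", producesCte: "cl", buffers: 3 },
  { type: "CTE Scan", readsCte: "cl", buffers: 1 },
  { type: "CTE Scan", readsCte: "cl", buffers: 2 },
];

// Naive approach: add up every number printed in the plan.
const naiveTotal = nodes.reduce((sum, node) => sum + node.buffers, 0);

// Corrected approach: only nodes that touch table or index data count as reads.
const realTotal = nodes
  .filter((node) => node.type !== "CTE Scan")
  .reduce((sum, node) => sum + node.buffers, 0);

// Pages requested through each CTE Scan, i.e. who triggered the producer's reads.
const triggeredByConsumers = nodes
  .filter((node) => node.type === "CTE Scan")
  .reduce((sum, node) => sum + node.buffers, 0);

console.log({ naiveTotal, realTotal, triggeredByConsumers });
```

The consumers' numbers add up to the producer's 3 pages, which is the consistency check that shows the same pages were counted twice.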

And this picture led us to understand that plan execution is no longer a tree, but some kind of acyclic graph. So we got a diagram like this, to understand "what came from where in the first place". Here we created a CTE from pg_class and requested it twice, and almost all of our time was spent on the branch where we requested it the 2nd time. It is clear that reading the 101st record is much more expensive than just reading the 1st record from the table.

We exhaled for a while. We said: "Now, Neo, you know kung fu! Now our experience is right on your screen. Now you can use it." [article]

Log consolidation

Our 1000 developers breathed a sigh of relief. But we understood that we have only hundreds of "combat" servers, and all this "copy-paste" on the developers' side is not convenient at all. We realized that we have to collect the plans ourselves.

In general, there is a standard module that can collect statistics, although it also needs to be enabled in the config: the pg_stat_statements module. But it did not suit us.

First, it assigns different QueryIds to the same queries that use different schemas within the same database. That is, if you first do SET search_path = '01'; SELECT * FROM user LIMIT 1; and then SET search_path = '02'; and the same query, the statistics of this module will contain different records, and I will not be able to collect overall statistics for this query profile without regard to the schemas.

The second point that prevented us from using it is the lack of plans. That is, there is no plan, there is only the query itself. We see what was slow, but we do not understand why. And here we return to the problem of a rapidly changing dataset.

And the last point is the lack of "facts". That is, you cannot address a specific instance of query execution: there is none, there are only aggregated statistics. It is possible to work with this, but it is very difficult.

Therefore, we decided to fight copy-paste and started writing a collector.

The collector connects via SSH, establishes a secure connection to the database server using a certificate, and "clings" to the log file with tail -F. So in this session we get a complete "mirror" of the entire log file that the server generates. The load on the server itself is minimal, because we do not parse anything there, we just mirror the traffic.

Since we had already started writing the interface in Node.js, we continued to write the collector in it. And this technology has justified itself, because JavaScript is very convenient for working with weakly formatted text data, which is what the log is. And the Node.js infrastructure itself, as a backend platform, lets you work easily and conveniently with network connections, and indeed with any data streams.

Accordingly, we "stretch" two connections: the first to "listen" to the log itself and take it to ourselves, and the second to periodically query the database. "The log shows that the relation with oid 123 is locked," but this means nothing to the developer, and it would be nice to ask the database, "What is OID = 123 anyway?" And so we periodically ask the database about what we do not yet know.


"There is only one thing you did not take into account: there is a species of elephant-like bees!" We started developing this system when we wanted to monitor 10 servers, the most critical ones in our understanding, where problems arose that were difficult to deal with. But during the first quarter we got a hundred servers to monitor, because the system worked, everyone wanted it, everyone was comfortable.

All of this has to be stored, and the data flow is large and active. In fact, what we monitor and know how to handle is what we use: we also use PostgreSQL as the data store. And nothing "pours" data into it faster than the COPY operator. Not yet.

But simply "pouring" data is not really our technology. Because if you have about 50k queries per second on a hundred servers, this generates 100-150GB of logs per day. Therefore, we had to "cut" the database carefully.

First, we did partitioning by day, because, by and large, no one is interested in correlation between days. What difference does it make what you had yesterday, if tonight you rolled out a new version of the application and already have new statistics?

Second, we learned (were forced) to write very, very fast using COPY. That is, not just COPY because it is faster than INSERT, but even faster.

The third point: we had to abandon triggers and, accordingly, foreign keys. That is, we have no referential integrity at all. Because if you have a table with a couple of FKs, and you declare in the database structure that "here is a log record that references, by FK, for example, a group of records", then when you insert it, PostgreSQL has nothing left to do but honestly execute SELECT 1 FROM master_fk1_table WHERE ... with the identifier that you are trying to insert, just to check that this record is present there and that you do not "break" this Foreign Key with your insertion.

Instead of one write to the target table and its indexes, we get the added "bonus" of reading from all the tables it references. But we do not need this at all: our task is to record as much as possible, as quickly as possible, with the least load. So FK, down!

The next point is aggregation and hashing. Initially, we implemented them in the database: after all, it is convenient, when a record arrives, to do "plus one" in some table right in a trigger. It is convenient, but the same bad thing happens: you insert one record, but are forced to read and write something else in another table. Moreover, you do not just read and write, you do it every time.

Now imagine that you have a table in which you simply count the number of queries that have passed through a particular host: +1, +1, +1, ..., +1. And you, in principle, do not need this: it can all be summed in memory on the collector and sent to the database in one go as +10.
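
A minimal sketch of such in-memory summation, assuming a hypothetical CounterBuffer class on the collector side; how and when the flushed batch is written to the database is left out:

```javascript
// Accumulates "+1" events per host in memory and hands them out as one delta.
class CounterBuffer {
  constructor() {
    this.counts = new Map(); // host -> number of queries seen since last flush
  }

  hit(host) {
    this.counts.set(host, (this.counts.get(host) || 0) + 1);
  }

  // Returns the accumulated deltas and resets the buffer.
  flush() {
    const batch = [...this.counts.entries()].map(([host, delta]) => ({
      host,
      delta,
    }));
    this.counts.clear();
    return batch;
  }
}

const counters = new CounterBuffer();
for (let i = 0; i < 10; i++) counters.hit("bl-host-01"); // invented host name
const batch = counters.flush();
console.log(batch);
```

Ten separate increments reach the database as a single row with a delta of 10, which is the trade-off discussed here: fewer writes at the price of losing the unflushed counts if the collector crashes.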

Yes, in case of some problems your logical integrity may "fall apart", but this is an almost unrealistic case: you have a normal server, it has a battery in the controller, you have a transaction log, a log on the file system... In general, it is not worth it. The loss of performance that you get from running triggers/FKs is not worth the cost you incur.

It is the same with hashing. A certain query flies to you, you compute a certain identifier from it in the database, write it to the database and then tell everyone. Everything is fine until, at the moment of recording, a second client comes along wanting to record the same thing, and you get blocked, and this is already bad. Therefore, if you can move the generation of some IDs to the client (relative to the database), it is better to do so.

For us it was just perfect to use MD5 of the text: the query, the plan, the template, ... We calculate it on the collector side and "pour" the ready-made ID into the database. The length of MD5 and daily partitioning let us not worry about possible collisions.
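
Computing such an identifier on the collector needs only the standard Node.js crypto module. A minimal sketch; the function name textId is an assumption made for illustration:

```javascript
const { createHash } = require("node:crypto");

// Derive a fixed-length identifier from any text (query, plan, template).
function textId(text) {
  return createHash("md5").update(text, "utf8").digest("hex");
}

const id = textId("SELECT * FROM users WHERE login = $1");
console.log(id.length); // 32 hexadecimal characters
```

Because the identifier depends only on the text, two collectors that see the same query produce the same ID without asking the database, so nothing has to be looked up or locked before the insert.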

But in order to record all this quickly, we needed to modify the recording procedure itself.

How do you usually write data? We have some kind of dataset, we split it into several tables, and then COPY: first into the first, then into the second, into the third ... It is inconvenient, because we seem to be writing one data stream in three sequential steps. Unpleasant. Can it be done faster? Yes!

To do this, it is enough to split these streams so that they run in parallel. It turns out that we have errors, queries, templates, locks, ... flying in separate streams, and we write them all in parallel. It is enough to keep a COPY channel constantly open for each individual target table.

That is, the collector always has a stream into which I can write the data I need. But in order for the database to see this data, and for no one to get stuck waiting for it to be written, COPY must be interrupted at certain intervals. For us, the most effective period was about 100ms: we close it and immediately open it again to the same table. And if one stream is not enough during some peaks, we do pooling up to a certain limit.

Additionally, we found out that for such a load profile any aggregation, when records are collected in batches, is evil. The classic evil is INSERT ... VALUES with 1000 or more records. Because at that moment you have a write peak on the media, and everyone else trying to write something to the disk will be waiting.

To get rid of such anomalies, simply do not aggregate anything, do not buffer at all. And if buffering to disk does occur (fortunately, the Stream API in Node.js lets you find out), postpone this connection. When you receive the event that it is free again, write to it from the accumulated queue. And while it is busy, take the next free one from the pool and write to it.
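
This rule maps onto the return value of write() and the 'drain' event of Node.js writable streams. A minimal sketch, assuming a hypothetical WriterPool class; plain Writable objects stand in for the per-table COPY channels, which in a real collector would come from a PostgreSQL driver:

```javascript
const { Writable } = require("node:stream");

class WriterPool {
  constructor(streams) {
    this.free = [...streams]; // streams that currently accept data
    this.queue = [];          // chunks waiting for a stream to become free
  }

  write(chunk) {
    const stream = this.free.shift();
    if (!stream) {
      this.queue.push(chunk); // every stream is busy: remember the chunk
      return;
    }
    if (stream.write(chunk)) {
      this.free.push(stream); // no backpressure: stream stays available
    } else {
      // Backpressure: park the stream until it reports 'drain',
      // then replay whatever accumulated in the queue.
      stream.once("drain", () => {
        this.free.push(stream);
        for (const pending of this.queue.splice(0)) this.write(pending);
      });
    }
  }
}

// Stand-in for a slow destination: a tiny buffer and a delayed write.
const slowSink = () =>
  new Writable({
    highWaterMark: 1,
    write(chunk, encoding, callback) {
      setImmediate(callback);
    },
  });

const pool = new WriterPool([slowSink(), slowSink()]);
pool.write("row-1\n");
pool.write("row-2\n");
pool.write("row-3\n"); // both sinks are busy, so this one is queued
```

The third row waits in the queue only until the first sink drains; nothing is batched deliberately, which is the point made above about avoiding write peaks.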

Before introducing this approach to data recording, we had about 4K write ops, and in this way we reduced the load by a factor of 4. Now they have grown another 6 times due to new monitored databases, up to 100MB/s. And now we store logs for the last 3 months in a volume of about 10-15TB, hoping that in just three months any developer will be able to solve any problem.

Understanding the problems

But simply collecting all this data is good, useful, relevant, but not enough: it needs to be understood. Because these are millions of different plans per day.

But millions are unmanageable; we must first make it "smaller". And, first of all, you need to decide how you will organize this "smaller" thing.

We identified three key points:

  • who sent this query
    That is, from which application it "arrived": web interface, backend, payment system, or something else.
  • where it happened
    On which specific server? Because if you have several servers under one application, and suddenly one "goes stupid" (because the "disk is rotten", "memory leaked", some other problem), you need to address that server specifically.
  • how the problem manifested itself in one way or another

To understand "who" sent us a query, we use a standard tool, setting a session variable: SET application_name = '{bl-host}:{bl-method}'; that is, we send the name of the business logic host from which the query comes, and the name of the method or application that initiated it.

After we have passed the "owner" of the query, it must be output to the log; for this we configure the variable log_line_prefix = ' %m [%p:%v] [%d] %r %a'. Those interested can look in the manual for what it all means. As a result, we see in the log:

  • the time
  • the process and transaction identifiers
  • the database name
  • the IP of whoever sent this query
  • and the method name
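
On the collector side, a prefix of this shape can be taken apart with a single regular expression. A minimal sketch: the sample line is invented, the function name parsePrefix is hypothetical, and the pattern assumes that application_name contains no spaces and that the timestamp consists of a date, a time and a time zone token.

```javascript
// Matches a prefix produced by log_line_prefix = ' %m [%p:%v] [%d] %r %a'.
const PREFIX_RE =
  /^\s*(\S+ \S+ \S+) \[(\d+):([^\]]*)\] \[([^\]]*)\] (\S+) (\S+)/;

function parsePrefix(line) {
  const match = PREFIX_RE.exec(line);
  if (!match) return null;
  const [, time, pid, vxid, db, remote, app] = match;
  // application_name was set as '{bl-host}:{bl-method}'.
  const sep = app.indexOf(":");
  return {
    time,
    pid: Number(pid),
    vxid,   // %v: virtual transaction ID
    db,
    remote, // %r: remote host and port
    host: sep === -1 ? app : app.slice(0, sep),
    method: sep === -1 ? null : app.slice(sep + 1),
  };
}

// Invented sample prefix, for illustration only.
const parsed = parsePrefix(
  " 2019-06-01 12:00:00.123 MSK [12345:3/4567] [mydb] 10.0.0.5(54321) bl-host-01:User.Read"
);
console.log(parsed);
```

Splitting the application name at the first colon recovers the business logic host and the method separately, so both can serve as grouping keys later.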

Then we realized that it is not very interesting to look at the correlation for one query between different servers. It is rare to have a situation where one application misbehaves equally here and there. But even if it does, look at any one of these servers.

So the slice "one server, one day" turned out to be enough for us for any analysis.

The first analytical slice is that same "template": an abbreviated form of the plan representation, cleared of all numerical indicators. The second slice is the application or method, and the third slice is the specific plan node that caused us problems.

When we moved from specific instances to templates, we got two advantages at once:

  • a manifold reduction in the number of objects to analyze
    We have to analyze the problem no longer by thousands of queries or plans, but by dozens of templates.
  • a timeline
    That is, by summarizing the "facts" within a certain slice, you can display their occurrence during the day. And here you can see that if you have some kind of pattern that happens, for example, once an hour, but it should happen once a day, you should think about what went wrong: who caused it and why, and maybe it should not be here at all. This is another non-numerical, purely visual method of analysis.
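
The timeline for one template amounts to counting its "facts" per hour of the day. A minimal sketch, assuming hypothetical fact records with a templateId and an ISO 8601 timestamp ts; hours are taken in UTC to keep the example deterministic:

```javascript
// Count how many times a template occurred in each hour of the day (UTC).
function hourlyTimeline(facts, templateId) {
  const buckets = new Array(24).fill(0);
  for (const fact of facts) {
    if (fact.templateId === templateId) {
      buckets[new Date(fact.ts).getUTCHours()] += 1;
    }
  }
  return buckets;
}

// Invented sample facts, for illustration only.
const facts = [
  { templateId: "a1", ts: "2019-06-01T00:05:00Z" },
  { templateId: "a1", ts: "2019-06-01T01:05:00Z" },
  { templateId: "a1", ts: "2019-06-01T01:30:00Z" },
  { templateId: "b2", ts: "2019-06-01T01:45:00Z" },
];

const timeline = hourlyTimeline(facts, "a1");
console.log(timeline.slice(0, 3));
```

A template that is expected once a day but fills every hourly bucket stands out immediately when the buckets are drawn as a bar chart.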

The remaining methods are based on the indicators that we extract from the plan: how many times such a pattern occurred, the total and average time, how much data was read from disk, and how much from memory ...

Because, for example, you come to the analytics page for a host and see that something is starting to read too much from disk. The disk on the server cannot cope; so who is reading from it?

And you can sort by any column and decide what you will deal with right now: the load on the processor or on the disk, or the total number of queries ... We sorted, looked at the "top", fixed it, and rolled out a new version of the application.
[video of the talk]

And right away you can see the different applications that arrive with the same template from a query like SELECT * FROM users WHERE login = 'Vasya'. Frontend, backend, processing... And you wonder why processing would read the user if it does not interact with him.

The opposite way is to see immediately, from the application, what it does. For example, the frontend does this, this, this, and also this once an hour (the timeline helps here). And the question immediately arises: it does not seem to be the frontend's job to do something once an hour ...

After some time, we realized that we lacked aggregated statistics by plan nodes. We isolated from the plans only those nodes that do something with the data of the tables themselves (read or write them, by index or not). In fact, only one aspect is added relative to the previous picture: how many records this node brought us, and how many it discarded (Rows Removed by Filter).

You do not have a suitable index on the table, you make a query to it, it flies past the index, falls into Seq Scan ... and you have filtered out all the records except one. Why would you need 100M filtered records per day?
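
Aggregating this per table is a small reduction over the extracted nodes. A minimal sketch, assuming hypothetical node records with table, rows (records returned) and removed (Rows Removed by Filter); the result is sorted so that the most wasteful tables come first:

```javascript
// Sum returned and discarded rows per table across many plan nodes.
function aggregateByTable(nodes) {
  const stats = new Map();
  for (const { table, rows, removed } of nodes) {
    const entry = stats.get(table) || { rows: 0, removed: 0 };
    entry.rows += rows;
    entry.removed += removed;
    stats.set(table, entry);
  }
  return [...stats.entries()]
    .map(([table, entry]) => ({ table, ...entry }))
    .sort((a, b) => b.removed - a.removed);
}

// Invented sample nodes, for illustration only.
const report = aggregateByTable([
  { table: "users", rows: 1, removed: 999 },
  { table: "orders", rows: 10, removed: 0 },
  { table: "users", rows: 1, removed: 1001 },
]);
console.log(report);
```

A table that returns 2 rows while discarding 2000 is a candidate for a missing index, which is the kind of finding this slice is meant to surface.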

After analyzing all the plans node by node, we realized that there are some typical structures in plans that are very likely to look suspicious. And it would be nice to tell the developer: "Friend, here you first read by index, then sort, and then cut off"; as a rule, only one record is left.

Everyone who has written queries has probably encountered this pattern: "Give me the last order for Vasya, and its date." And if you do not have an index by date, or the index you used does not contain the date, you will step on exactly this "rake".

But we know that this is a "rake", so why not tell the developer right away what he should do? Accordingly, when opening a plan now, our developer immediately sees a nice picture with hints, where he is told right away: "You have problems here and there, and they are solved this way and that way."
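
One such structural rule can be expressed as a walk over the parsed plan tree. This is only an illustration of the idea, not the rule set of the tool described here; the node shape ({ type, children }) and the function name findSortUnderLimit are assumptions:

```javascript
// Report every Limit node whose direct child is a Sort: the
// "read, then sort, then cut off" structure.
function findSortUnderLimit(node, path = [], hits = []) {
  const children = node.children || [];
  const here = [...path, node.type];
  if (node.type === "Limit" && children.some((child) => child.type === "Sort")) {
    hits.push(here.join(" > "));
  }
  for (const child of children) findSortUnderLimit(child, here, hits);
  return hits;
}

// Invented sample trees, for illustration only.
const suspicious = {
  type: "Limit",
  children: [
    { type: "Sort", children: [{ type: "Index Scan", children: [] }] },
  ],
};
const clean = {
  type: "Limit",
  children: [{ type: "Index Scan", children: [] }],
};

console.log(findSortUnderLimit(suspicious), findSortUnderLimit(clean));
```

Each hit carries the path to the offending node, so a hint such as "consider an index that already provides the sort order" can be attached to the exact place in the picture.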

As a result, the amount of experience needed to solve problems has dropped significantly compared with the beginning. This is the kind of tool we have.

source: www.habr.com
