PostgreSQL query optimization. Kirill Borovikov (Tensor)

The talk presents some approaches that make it possible to monitor the performance of SQL queries when there are millions of them per day and hundreds of PostgreSQL servers to watch.

What technical solutions let us process that volume of information efficiently, and how does this make an ordinary developer's life easier?

If you are interested in analysis of specific problems, various SQL query optimization techniques and solutions to typical DBA tasks in PostgreSQL, you can also read an article on this topic.

My name is Kirill Borovikov, and I represent the company Tensor. Specifically, I specialize in working with databases at our company.

Today I will tell you how we optimize queries when you need not to "pick apart" the performance of a single query, but to solve the problem en masse. When there are millions of queries, you need to find some approaches to solving this big problem.

In general, for a million of our clients Tensor means VLSI, our application: a corporate social network, solutions for video communication, for internal and external document flow, accounting and warehouse systems, ... That is, a "mega-combine" for integrated business management, with more than 100 different internal projects.

To make sure all of them work and develop normally, we have 10 development centers across the country, with more than 1000 developers in them.

We have been working with PostgreSQL since 2008 and have accumulated a large volume of what we process - client data, statistical and analytical data, data from external information systems - more than 400TB. There are about 250 servers in production alone, and in total we monitor about 1000 database servers.

SQL is a declarative language. You describe not "how" something should work, but "what" you want to get. The DBMS knows better how to do a JOIN - how to join your tables, which conditions to apply, what will go through an index and what won't...

Some DBMSs accept hints: "No, join these two tables in such-and-such order," but PostgreSQL cannot do this. This is a deliberate position of the leading developers: "We would rather finish improving the query optimizer than let developers use some kind of hints."

But although PostgreSQL does not let the "outside" control it, it does an excellent job of letting you see what is happening inside it when you run your query, and where it runs into problems.

In general, what classic problems does a developer usually bring [to the DBA]? "Here we ran a query, and everything is slow, everything hangs, something is happening... Some kind of trouble!"

The reasons are almost always the same:

  • inefficient query algorithm
    Developer: "Now I'll give it 10 tables in SQL via JOIN..." - and hopes that his conditions will miraculously be "untied" efficiently and he will get everything quickly. But miracles don't happen, and any system with such variability (10 tables in one FROM) always introduces some error. [article]
  • outdated statistics
    This point is especially relevant for PostgreSQL: you "pour" a large dataset onto the server, make a query, and it "seq-scans" your table. Because yesterday there were 10 records in it, and today there are 10 million, but PostgreSQL doesn't yet know about this, and we need to tell it. [article]
  • a "plug" on resources
    You have put a large and heavily loaded database on a weak server that does not have enough disk, memory, or CPU performance. And that's it... Somewhere there is a performance ceiling above which you can no longer jump.
  • locking
    This is a complex topic, but locks are most relevant for the various modifying queries (INSERT, UPDATE, DELETE) - this is a separate large topic.

Getting the plan

...And for everything else we need a plan! We need to see what is happening inside the server.

A query execution plan in PostgreSQL is the tree of the query execution algorithm in text form. It is precisely the algorithm that the planner, as a result of its analysis, found to be the most efficient.

Each tree node is an operation: fetching data from a table or index, building a bitmap, joining two tables, or merging, intersecting, or excluding selections. Executing a query means walking through the nodes of this tree.

To get a query plan, the easiest way is to execute the EXPLAIN statement. To get it with all the real attributes, that is, to actually execute the query on the database - EXPLAIN (ANALYZE, BUFFERS) SELECT ....

The downside: when you run it, it happens "here and now", so it is only suitable for local debugging. Suppose you take a heavily loaded server that is under a strong flow of data changes, and you see: "Oh! Here we have a slow query." Half an hour or an hour passes - while you were running around, fetching this query from the logs and bringing it back to the server, your entire dataset and statistics have changed. You run it to debug - and it runs fast! And you can't understand why it was slow.

To understand what exactly was happening at the moment the query was executed on the server, smart people wrote the auto_explain module. It is present in almost all of the most common PostgreSQL distributions, and can simply be enabled in the config file.

If it notices that some query runs longer than the threshold you gave it, it takes a "snapshot" of this query's plan and writes them together to the log.

Everything seems fine now, we go to the log and see there... [a wall of text]. But we can't say anything about it, other than the fact that it is an excellent plan because it took 11ms to execute.

Everything seems fine - but nothing is clear about what actually happened. Apart from the total time, we don't really see anything. Because looking at such a "mess" of plain text is generally not visual.

But even if it's not visual, even if it's inconvenient, there are more fundamental problems:

  • The node shows the sum of the resources of the entire subtree beneath it. That is, you can't simply find out how much time was spent on this particular Index Scan if there is some nested condition under it. We have to check dynamically whether there are "children" and conditional variables, CTEs inside - and subtract all of this "in our heads".
  • The second point: the time shown on the node is the time of a single node execution. If the node was executed several times as a result of, for example, a loop over table records, then the number of loops of this node increases in the plan. But the atomic execution time itself stays the same in the plan. That is, to understand how long this node ran in total, you need to multiply one by the other - again, "in your head".
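
The two mental calculations above can be sketched in a few lines. This is a minimal illustration, not the code of any tool mentioned in the talk: the `PlanNode` class and its field names are hypothetical, and the sketch deliberately ignores CTE, InitPlan and SubPlan nodes, which need extra handling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical, simplified plan node (names are illustrative)."""
    name: str
    actual_total_ms: float          # per-loop time as printed in the plan
    loops: int = 1                  # how many times the node was executed
    children: List["PlanNode"] = field(default_factory=list)


def inclusive_ms(node: PlanNode) -> float:
    """Time of the node plus its subtree: per-loop time multiplied by loops."""
    return node.actual_total_ms * node.loops


def exclusive_ms(node: PlanNode) -> float:
    """Time attributable to the node alone: subtract the children's totals."""
    return inclusive_ms(node) - sum(inclusive_ms(c) for c in node.children)


# A join that took 10 ms in total, over a 2 ms scan and an index scan
# that ran 100 times at 0.05 ms per loop.
join = PlanNode("Nested Loop", 10.0, 1, [
    PlanNode("Seq Scan", 2.0, 1),
    PlanNode("Index Scan", 0.05, 100),
])
print(round(exclusive_ms(join), 3))
```

Here the join itself accounts for only part of its printed 10 ms; the rest belongs to its children once the loop count is taken into account.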

In such situations, understanding "Who is the weakest link?" is nearly impossible. That is why even the developers themselves write in the "manual" that "Understanding a plan is an art that must be learned, experience...".

But we have 1000 developers, and you can't pass this experience on to each of them. I, you, he know, but someone over there no longer knows. Maybe he will learn, or maybe not, but he needs to work now - and where would he get this experience?

Plan visualization

So we realized that to deal with these problems, we need a good visualization of the plan. [article]

We first went "through the market" - let's look on the Internet to see what exists at all.

But it turned out that there are very few relatively "live" solutions that are more or less being developed - literally just one: explain.depesz.com by Hubert Lubaczewski. When you "feed" a text representation of the plan into the input field, it shows you a table with the parsed data:

  • the node's own processing time
  • the total time for the entire subtree
  • the number of records actually retrieved versus what the statistics predicted
  • the node body itself

This service can also share an archive of links. You threw your plan in there and said: "Hey, Vasya, here's a link, something is wrong there."

But there are also small problems.

Firstly, a huge amount of "copy-paste". You take a piece of the log, stick it in there, and again, and again.

Secondly, there is no analysis of the amount of data read - the same buffers that EXPLAIN (ANALYZE, BUFFERS) outputs, we don't see here. It simply doesn't know how to take them apart, understand them and work with them. When you read a lot of data and realize that you may be misallocating the disk and memory cache, this information is very important.

The third negative point is the very weak development of this project. The commits are very small, it's good if there is one every six months, and the code is in Perl.

But this is all "lyrics", we could somehow live with this, but there is one thing that greatly turned us away from this service. These are errors in the analysis of Common Table Expressions (CTE) and various dynamic nodes like InitPlan/SubPlan.

If you believe this picture, then the total execution time of the individual nodes is greater than the total execution time of the entire query. It's simple - the generation time of this CTE was not subtracted from the CTE Scan node. Therefore, we no longer know the correct answer to how long the CTE scan itself took.

Then we realized that it was time to write our own - hurray! Every developer says: "Now we'll write our own, it'll be super easy!"

We took a stack typical for web services: a core based on Node.js + Express, with Bootstrap and D3.js for beautiful diagrams. And our expectations were fully justified - we got the first prototype in 2 weeks:

  • custom plan parser
    That is, now we can parse any plan from those generated by PostgreSQL.
  • correct analysis of dynamic nodes - CTE Scan, InitPlan, SubPlan
  • analysis of buffer distribution - where data pages are read from memory, where from the local cache, where from disk
  • gained clarity
    So as not to "dig" through all this in the log, but to see the "weakest link" right away in the picture.

We got something like this, with syntax highlighting included. But usually our developers no longer work with the complete representation of the plan, but with a shorter one. After all, we have already parsed all the numbers and thrown them to the left and right, and in the middle we left only the first line, saying what kind of node it is: CTE Scan, CTE generation or Seq Scan on some table.

This is the abbreviated representation we call the plan template.
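
As a rough illustration of the idea, a template can be derived from a text plan by keeping only the first line of each node and dropping the numeric groups. The sketch below is an assumption about how this could be done, not the actual parser of the service; the function name `plan_template` and the sample plan are made up for the example.

```python
import re

# Parenthesized metric groups that EXPLAIN prints after the node name.
_METRICS = re.compile(r"\s*\((?:cost|actual|never executed)[^)]*\)")


def plan_template(plan_text: str) -> str:
    """Keep only node header lines, with cost/actual numbers removed."""
    lines = []
    for line in plan_text.splitlines():
        # Detail lines ("Filter: ...", "Buffers: ...") carry no metric group.
        if "(cost=" not in line and "(actual" not in line:
            continue
        lines.append(_METRICS.sub("", line).rstrip())
    return "\n".join(lines)


sample_plan = (
    "Limit  (cost=0.00..1.00 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=1)\n"
    "  ->  Seq Scan on users  (cost=0.00..35.50 rows=10 width=4) (actual time=0.019..0.019 rows=1 loops=1)\n"
    "        Filter: (login = 'Vasya'::text)\n"
    "        Rows Removed by Filter: 99\n"
)
print(plan_template(sample_plan))
```

Two executions of the same query with different timings and row counts then collapse into one template, which is what makes grouping possible later on.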

What else would be convenient? It would be convenient to see what share of our total time goes to which node - and just "stick" a pie chart on the side.

We point at the node and see - it turns out that Seq Scan took less than a quarter of the total time, and the remaining 3/4 was taken by CTE Scan. Horror! This is a small note about the "rate of fire" of CTE Scan if you actively use them in your queries. They are not very fast - they are slower even than a regular table scan. [article] [article]

But usually such diagrams are more interesting and more complex, when we immediately point at a segment and see, for example, that some Seq Scan "ate" more than half of the time. Moreover, there was some Filter inside, and a lot of records were discarded by it... You can throw this picture directly to the developer and say: "Vasya, everything is bad here! Figure it out, look - something is wrong!"

Naturally, there were some "rakes" to step on.

The first thing we came across was the rounding problem. The time of each individual node in the plan is given with an accuracy of 1 μs. And when the number of node loops exceeds, for example, 1000, PostgreSQL after execution has divided "to within that accuracy", so when calculating back we get a total time "somewhere between 0.95ms and 1.05ms". While the count is in microseconds, that's fine, but when it is already [milli]seconds, you have to take this into account when "untying" resources across the plan nodes to see who consumed how much.
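
The size of this uncertainty is easy to estimate. The sketch below assumes the per-loop time is rounded to the nearest 0.001 ms (1 μs) before being printed; the function name and the sample numbers are illustrative only.

```python
def total_time_bounds(per_loop_ms: float, loops: int, precision_ms: float = 0.001):
    """Range of possible total times, given a per-loop time rounded to precision_ms."""
    half_step = precision_ms / 2
    low = max(per_loop_ms - half_step, 0.0) * loops
    high = (per_loop_ms + half_step) * loops
    return low, high


# A node reported as 0.010 ms per loop, executed 100 times:
low, high = total_time_bounds(0.010, 100)
print(f"between {low:.2f} ms and {high:.2f} ms")
```

The more loops a node has, the wider the interval becomes, so the subtraction of child time from parent time can be off by a visible amount.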

The second, more complex, point is the distribution of resources (those buffers) among dynamic nodes. This cost us another 4 weeks on top of the first 2 weeks of the prototype.

It's quite easy to get this kind of problem - we make a CTE and supposedly read something in it. In fact, PostgreSQL is "smart" and won't read anything right there. Then we take the first record from it, and then the hundred-and-first one from the same CTE.

We look at the plan and understand - it's strange, we have 3 buffers (data pages) "consumed" in Seq Scan, 1 more in CTE Scan, and 2 more in the second CTE Scan. That is, if we simply sum everything up, we get 6, but from the table we only read 3! CTE Scan doesn't read anything from anywhere, but works directly with the process memory. That is, something is clearly wrong here!

In fact, it turns out that these are all the same 3 pages of data that were requested from Seq Scan: first 1 was requested by the 1st CTE Scan, and then the 2nd one asked, and 2 more were read for it. That is, a total of 3 pages of data were read, not 6.
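
The arithmetic from this example can be written down explicitly. This is only a sketch of the bookkeeping under the assumption stated in the text, namely that pages shown on CTE Scan nodes are reads performed by the CTE's source node on their behalf; the function and key names are hypothetical.

```python
def cte_buffer_summary(producer_pages: int, consumer_pages: list) -> dict:
    """Compare naive summing with the pages that were actually read."""
    attributed = sum(consumer_pages)
    return {
        "naive_total": producer_pages + attributed,   # double counting
        "actual_total": producer_pages,               # pages really read
        "attributed_to_consumers": attributed,
        "unattributed": producer_pages - attributed,
    }


# Seq Scan inside the CTE shows 3 pages; the two CTE Scans show 1 and 2.
print(cte_buffer_summary(3, [1, 2]))
```

The consumer figures say who caused the reads, while the producer figure says how much was read, so the two should not be added together.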

And this picture led us to the understanding that plan execution is no longer a tree, but simply some kind of acyclic graph. And we got a diagram like this, so that we understand "what came from where in the first place". That is, here we created a CTE from pg_class and requested it twice, and almost all of our time was spent on the branch where we requested it the 2nd time. It is clear that reading the 101st record is much more expensive than just reading the 1st record from the table.

We exhaled for a while. We said: "Now you know kung fu! Now our experience is right on your screen. Now you can use it." [article]

Log consolidation

Our 1000 developers breathed a sigh of relief. But we understood that we have hundreds of "combat" servers, and all this "copy-paste" on the part of developers is not convenient at all. We realized that we had to collect the plans ourselves.

In general, there is a standard module that can collect statistics, although it also needs to be enabled in the config - the pg_stat_statements module. But it didn't suit us.

Firstly, it assigns different QueryIds to the same queries using different schemas within the same database. That is, if you first do SET search_path = '01'; SELECT * FROM user LIMIT 1; and then SET search_path = '02'; and the same query, then the statistics of this module will have different records, and I will not be able to collect general statistics specifically in the context of this query profile, without taking the schemas into account.

The second point that prevented us from using it is the lack of plans. That is, there is no plan, only the query itself. We see what was slow, but we don't understand why. And here we return to the problem of a rapidly changing dataset.

And the last point - the lack of "facts". That is, you can't address a specific instance of query execution - there is none, there are only aggregated statistics. Although it is possible to work with this, it is just very difficult.

Therefore, we decided to fight copy-paste and began to write a collector.

The collector connects via SSH, establishes a secure connection to the database server using a certificate, and "clings" to the log file on it with tail -F. So in this session we get a complete "mirror" of the entire log file that the server generates. The load on the server itself is minimal, because we don't parse anything there, we just mirror the traffic.

Since we had already started writing the interface in Node.js, we continued to write the collector in it. And this technology has justified itself, because it is very convenient to use JavaScript to work with weakly formatted text data, which is what the log is. And the Node.js infrastructure itself, as a backend platform, lets you work easily and conveniently with network connections, and indeed with any data streams.

Accordingly, we "stretch" two connections: the first to "listen" to the log itself and take it to ourselves, and the second to periodically query the database. The log may show that the table with oid 123 is locked, but this doesn't mean anything to the developer, and it would be nice to ask the database, "What is OID = 123 anyway?" And so we periodically ask the database about what we don't yet know ourselves.

"There's only one thing you didn't take into account: there's a species of elephant-like bees!.." We started developing this system when we wanted to monitor 10 servers. The most critical ones in our understanding, where some problems arose that were difficult to deal with. But during the third quarter, we received a hundred for monitoring - because the system worked, everyone wanted it, everyone found it convenient.

All this needs to be stored, and the data flow is large and active. Actually, what we monitor, what we know how to deal with, is what we use. We also use PostgreSQL as the data storage. And so far nothing is faster for "pouring" data into it than the COPY operator.

But simply "pouring" data is not really our technology. Because if you have approximately 50k queries per second on a hundred servers, then this will generate 100-150GB of logs per day. Therefore, we had to carefully "cut" the database.

Firstly, we did partitioning by day, because, by and large, no one is interested in the correlation between days. What difference does it make what you had yesterday, if tonight you rolled out a new version of the application - and already have some new statistics.

Secondly, we learned (were forced) to write very, very fast using COPY. That is, not just COPY because it is faster than INSERT, but even faster.

The third point - we had to abandon triggers and, accordingly, foreign keys. That is, we have no referential integrity at all. Because if you have a table that has a pair of FKs, and you say in the database structure that "here is a log record that refers by FK, for example, to a group of records," then when you insert it, PostgreSQL has nothing left but to honestly execute SELECT 1 FROM master_fk1_table WHERE ... with the identifier that you are trying to insert - just to check that this record is present there, that you do not "break" this Foreign Key with your insertion.

Instead of one write into the target table and its indexes, we get the added bonus of reading from all the tables it references. But we don't need this at all - our task is to record as much as possible and as quickly as possible with the least load. So FK - down!

The next point is aggregation and hashing. Initially, we implemented them in the database - after all, it is convenient, as soon as a record arrives, to do "plus one" in some table right in the trigger. Well, it's convenient, but the same bad thing happens - you insert one record, but are forced to read and write something else from another table. Moreover, not only do you read and write, you also do it every time.

Now imagine that you have a table in which you simply count the number of requests that have passed through a specific host: +1, +1, +1, ..., +1. And you, in principle, don't need this - all of it can be summed in memory on the collector and sent to the database in one go: +10.
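
A minimal sketch of this kind of in-memory summing is shown below. The talk's collector is written in Node.js; Python is used here only for illustration, and the class and method names are hypothetical.

```python
from collections import Counter


class HostCounter:
    """Accumulates per-host increments in memory between database writes."""

    def __init__(self):
        self._pending = Counter()

    def hit(self, host: str) -> None:
        # Cheap in-memory "+1" instead of a database round trip.
        self._pending[host] += 1

    def flush(self) -> dict:
        """Return accumulated increments and reset; the caller writes them once."""
        batch = dict(self._pending)
        self._pending.clear()
        return batch


counter = HostCounter()
for _ in range(10):
    counter.hit("host-a")
batch = counter.flush()   # one "+10" for host-a instead of ten "+1" writes
```

If the collector crashes between flushes, the pending increments are lost, which is the trade-off discussed in the next paragraph.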

Yes, in case of some problems, your logical integrity may "fall apart", but this is an almost unrealistic case - because you have a normal server, it has a battery in the controller, you have a transaction log, a file system journal... In general, it's not worth it. The loss of performance that you get from running triggers/FK is not worth the cost.

It's the same with hashing. A certain query flies to you, you calculate a certain identifier from it in the database, write it to the database and then tell it to everyone. Everything is fine until, at the moment of writing, a second client comes along wanting to write the same thing - and you get blocked, and this is already bad. Therefore, if you can move the generation of some IDs to the client (relative to the database), it is better to do so.

It was just perfect for us to use MD5 of the text - query, plan, template, ... We calculate it on the collector side, and "pour" the ready-made ID into the database. The length of MD5 and daily partitioning allow us not to worry about possible collisions.
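
Computing such an identifier on the client side takes one call. The sketch below uses Python's standard `hashlib`; the function name `text_id` and the choice of UTF-8 encoding are assumptions made for the example.

```python
import hashlib


def text_id(text: str) -> str:
    """Client-side identifier: MD5 hex digest of a query, plan or template text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


query_id = text_id("SELECT * FROM users WHERE login = $1")
print(len(query_id))
```

Because the same text always yields the same digest, any number of collectors can generate the ID independently without asking the database for it.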

But in order to record all this quickly, we needed to modify the recording procedure itself.

How do you usually write data? We have some kind of dataset, we split it into several tables, and then COPY it - first into the first, then into the second, then into the third... It's inconvenient, because we end up writing one data stream in three sequential steps. Unpleasant. Can it be done faster? It can!

To do this, it is enough to split these flows so that they run in parallel with each other. It turns out that we have errors, queries, templates, locks, ... flying in separate streams - and we write them all in parallel. For this it is enough to keep a COPY channel constantly open for each individual target table.

That is, the collector always has a stream into which I can write the data I need. But so that the database sees this data, and nobody gets stuck waiting for it to be written, COPY must be interrupted at certain intervals. For us, the most effective period was about 100ms - we close it and immediately open it again to the same table. And if one stream is not enough during some peaks, we do pooling up to a certain limit.

Additionally, we found out that for such a load profile, any aggregation, when records are collected in batches, is evil. The classic evil is INSERT ... VALUES followed by 1000 records. Because at that moment you have a write peak on the media, and everyone else trying to write something to the disk will be waiting.

To get rid of such anomalies, simply don't aggregate anything, don't buffer at all. And if buffering to disk does occur (fortunately, the Stream API in Node.js lets you find out) - set this connection aside. When you receive an event that it is free again, write to it from the accumulated queue. And while it is busy, take the next free one from the pool and write to it.
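
The pooling logic described here can be modelled without a database. The sketch below is a simulation under stated assumptions: `Channel` stands in for one COPY connection, its `capacity` models the point at which the connection starts buffering, and `on_drain` plays the role of the "free again" event. None of these names come from the talk.

```python
from collections import deque


class Channel:
    """Stand-in for one COPY connection (simulated, no real I/O)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffered = []
        self.busy = False

    def write(self, record) -> bool:
        self.buffered.append(record)
        if len(self.buffered) >= self.capacity:
            self.busy = True          # connection signals back-pressure
        return not self.busy

    def drain(self) -> list:
        flushed, self.buffered = self.buffered, []
        self.busy = False
        return flushed


class Pool:
    """Writes to the first free channel; queues records when all are busy."""

    def __init__(self, channels):
        self.channels = channels
        self.queue = deque()

    def write(self, record):
        for channel in self.channels:
            if not channel.busy:
                channel.write(record)
                return channel
        self.queue.append(record)     # nobody is free: wait for a drain event
        return None

    def on_drain(self, channel) -> list:
        flushed = channel.drain()
        while self.queue and not channel.busy:
            channel.write(self.queue.popleft())
        return flushed


pool = Pool([Channel(2), Channel(2)])
for record in ["r1", "r2", "r3", "r4", "r5"]:
    pool.write(record)
```

After these five writes both channels are busy and the fifth record waits in the queue until one of them reports that it has drained.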

Before introducing this approach to data recording, we had approximately 4K write ops, and in this way we reduced the load by 4 times. Now they have grown another 6 times due to new monitored databases - up to 100MB/s. And now we store logs for the last 3 months in a volume of about 10-15TB, hoping that in just three months any developer will be able to solve any problem.

Understanding the problems

But simply collecting all this data is good, useful, relevant, but not enough - it needs to be understood. Because these are millions of different plans per day.

But millions are unmanageable, so we must first make the problem "smaller". And, first of all, you need to decide how you will organize this "smaller" thing.

We have identified three key points:

  • who sent this query
    That is, from which application it "arrived": web interface, backend, payment system or something else.
  • where it happened
    On which specific server? Because if you have several servers under one application, and suddenly one "goes stupid" (because the "disk is rotten", "memory leaked", or some other problem), then you need to address that server specifically.
  • how the problem manifested itself in one way or another

To understand "who" sent us a query, we use a standard tool - setting a session variable: SET application_name = '{bl-host}:{bl-method}'; - we send the name of the business logic host from which the query is coming, and the name of the method or application that initiated it.

After we have passed the "owner" of the query, it must be output to the log - for this we configure the variable log_line_prefix = ' %m [%p:%v] [%d] %r %a'. Those interested can look in the manual to see what it all means. As a result, we see in the log:

  • time
  • process and transaction identifiers
  • database name
  • IP of the client that sent this query
  • and method name
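
A prefix in this format can be taken apart with one regular expression. The sketch below is an assumption about how a collector might do it, not the talk's actual code; it parses only the prefix itself, and the sample line, host names and method name are invented for the example.

```python
import re

# Mirrors ' %m [%p:%v] [%d] %r %a': timestamp with milliseconds and time zone,
# process ID and virtual transaction ID, database, host(port), application name.
PREFIX = re.compile(
    r"^\s*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \S+)"
    r" \[(?P<pid>\d+):(?P<vxid>[^\]]*)\]"
    r" \[(?P<database>[^\]]*)\]"
    r" (?P<host>[^\s(]+)\((?P<port>\d+)\)"
    r" (?P<application>.*)$"
)


def parse_prefix(prefix: str):
    """Return the prefix fields as a dict, or None if the text does not match."""
    match = PREFIX.match(prefix)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["port"] = int(fields["port"])
    # application_name was set as '{bl-host}:{bl-method}'
    bl_host, _, bl_method = fields["application"].partition(":")
    fields["bl_host"], fields["bl_method"] = bl_host, bl_method
    return fields


sample = " 2019-10-01 12:34:56.789 MSK [12345:7/891] [billing] 10.0.0.5(54321) bl-host-1:Orders.List"
parsed = parse_prefix(sample)
```

Connections over a Unix socket report the remote host differently, so a production parser would need an extra branch for that case.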

Then we realized that it is not very interesting to look at the correlation for one query between different servers. It's not often that one application misbehaves in the same way here and there. But even if it does, look at any one of these servers.

So the cut "one server - one day" turned out to be enough for us for any analysis.

The first analytical cut is the same "template" - an abbreviated form of the plan, cleared of all numerical indicators. The second cut is the application or method, and the third cut is the specific plan node that caused us problems.

When we moved from specific instances to templates, we got two advantages at once:

  • a manifold reduction in the number of objects to analyze
    We have to analyze the problem no longer by thousands of queries or plans, but by dozens of templates.
  • timeline
    That is, by summarizing the "facts" within a certain cut, you can display their occurrence during the day. And here you can see that if some template occurs, for example, once an hour, but should occur once a day, you should think about what went wrong - who caused it and why, and whether it should be here at all. This is another non-numerical, purely visual, method of analysis.
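
A timeline of this kind is just a count of occurrences per time bucket. The sketch below assumes hourly buckets within a single day; the function name and the sample timestamps are illustrative.

```python
from collections import Counter
from datetime import datetime


def hourly_timeline(timestamps) -> list:
    """Number of occurrences of one template in each hour of the day."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


facts = [
    datetime(2019, 10, 1, 0, 5),
    datetime(2019, 10, 1, 1, 5),
    datetime(2019, 10, 1, 1, 45),
]
timeline = hourly_timeline(facts)
active_hours = sum(1 for count in timeline if count > 0)
```

A template that is expected once a day but shows many active hours stands out immediately when the list is drawn as a bar chart.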

The remaining methods are based on the indicators that we extract from the plan: how many times such a template occurred, the total and average time, how much data was read from disk, and how much from memory...

Because, for example, you come to the analytics page for a host and see that something has started reading too much from the disk. The disk on the server can't cope - so who is reading from it?

And you can sort by any column and decide what you will deal with right now - the load on the processor or the disk, or the total number of queries ... rolled out a new version of the application.
[video lecture]
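
The per-template indicators and the "sort by any column" step might look roughly like this. The field names (`duration_ms`, `disk_pages`, `cache_pages`) and both function names are assumptions for the sake of the sketch, not the service's real schema.

```python
from collections import defaultdict


def aggregate_by_template(facts) -> list:
    """Sum up count, time and page reads for each plan template."""
    stats = defaultdict(lambda: {"count": 0, "total_ms": 0.0,
                                 "disk_pages": 0, "cache_pages": 0})
    for fact in facts:
        row = stats[fact["template"]]
        row["count"] += 1
        row["total_ms"] += fact["duration_ms"]
        row["disk_pages"] += fact["disk_pages"]
        row["cache_pages"] += fact["cache_pages"]
    return [
        {"template": template, **row, "avg_ms": row["total_ms"] / row["count"]}
        for template, row in stats.items()
    ]


def top_by(rows, column: str, limit: int = 10) -> list:
    """Sort the aggregated rows by any column, largest first."""
    return sorted(rows, key=lambda row: row[column], reverse=True)[:limit]


sample_facts = [
    {"template": "t1", "duration_ms": 10.0, "disk_pages": 5, "cache_pages": 1},
    {"template": "t1", "duration_ms": 30.0, "disk_pages": 0, "cache_pages": 4},
    {"template": "t2", "duration_ms": 5.0, "disk_pages": 100, "cache_pages": 0},
]
rows = aggregate_by_template(sample_facts)
worst_disk = top_by(rows, "disk_pages", limit=1)
```

Sorting the same rows by `total_ms` or `count` instead answers the processor-load and query-count questions from the same data.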

And immediately you can see the different applications that come with the same template from a query like SELECT * FROM users WHERE login = 'Vasya'. Frontend, backend, processing... And you wonder why processing would read the user if it doesn't interact with him.

The opposite way is to immediately see from the application what it does. For example, the frontend does this, this, this, and this once an hour (the timeline helps here). And the question immediately arises: it doesn't seem to be the frontend's job to do something once an hour...

After some time, we realized that we lacked aggregated statistics by plan nodes. We isolated from the plans only those nodes that do something with the data of the tables themselves (read/write them, by index or not). In fact, only one aspect is added relative to the previous picture - how many records this node brought us, and how many it discarded (Rows Removed by Filter).

You do not have a suitable index on the table, you make a query to it, it flies past the index, falls into Seq Scan... and you have filtered out all the records except one. Why do you need 100M filtered records per day?
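
Extracting the discarded-row counts from a text plan can be sketched as follows. This is a simplified illustration, not the service's parser: it ignores the loop count, and scans over CTEs or functions are counted under their own names alongside real tables.

```python
import re
from collections import Counter

# Scan node headers such as "Seq Scan on users" or "Index Scan using i on users".
# Bitmap Index Scan is skipped because it names an index, not a table.
_SCAN = re.compile(
    r"(?<!Bitmap Index )Scan(?: Backward)?(?: using \S+)? on (?P<table>\S+)"
)
_REMOVED = re.compile(r"Rows Removed by Filter: (?P<rows>\d+)")


def rows_removed_by_table(plan_text: str) -> Counter:
    """Sum 'Rows Removed by Filter' per scanned relation in a text plan."""
    removed = Counter()
    current = None
    for line in plan_text.splitlines():
        if "(cost=" in line or "(actual" in line:      # a node header line
            scan = _SCAN.search(line)
            current = scan.group("table") if scan else None
            continue
        hit = _REMOVED.search(line)
        if hit and current is not None:
            removed[current] += int(hit.group("rows"))
    return removed


plan = (
    "Limit  (cost=0.00..1.00 rows=1 width=4) (actual time=0.020..0.031 rows=1 loops=1)\n"
    "  ->  Seq Scan on users  (cost=0.00..35.50 rows=1 width=4) (actual time=0.019..0.030 rows=1 loops=1)\n"
    "        Filter: (login = 'Vasya'::text)\n"
    "        Rows Removed by Filter: 99\n"
)
removed = rows_removed_by_table(plan)
```

Summing these counters over a day of plans per table gives the kind of figure quoted above.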

Having analyzed all the plans node by node, we realized that there are some typical structures in the plans that are very likely to look suspicious. And it would be nice to tell the developer: "Friend, here you first read by index, then sort, and then cut off" - as a rule, down to one record.

Everyone who has written queries has probably encountered this pattern: "Give me the last order for Vasya, and its date." And if you don't have an index by date, or the date is not in the index you used, then you will step on exactly this "rake".

But we know that this is a "rake" - so why not immediately tell the developer what he should do? Accordingly, when opening a plan now, our developer immediately sees a beautiful picture with hints, which tell him right away: "You have problems here and there, and they are solved this way and that way."
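
One such structural check could be sketched as a recursive walk over a plan in the shape produced by EXPLAIN (FORMAT JSON), where nodes carry "Node Type", "Plans" and "Relation Name" keys. The function name, the single rule and the wording of the hint are illustrative assumptions, not the service's actual rule set.

```python
def limit_sort_scan_hints(node: dict) -> list:
    """Find Limit -> Sort -> Index Scan chains and suggest a fix for each."""
    hints = []
    children = node.get("Plans", [])
    if node.get("Node Type") == "Limit" and len(children) == 1:
        sort = children[0]
        below_sort = sort.get("Plans", [])
        if (sort.get("Node Type") == "Sort" and len(below_sort) == 1
                and below_sort[0].get("Node Type") == "Index Scan"):
            relation = below_sort[0].get("Relation Name", "?")
            hints.append(
                f"Limit over Sort over Index Scan on {relation}: "
                "consider an index that also covers the sort key"
            )
    for child in children:
        hints.extend(limit_sort_scan_hints(child))
    return hints


last_order_plan = {
    "Node Type": "Limit",
    "Plans": [{
        "Node Type": "Sort",
        "Plans": [{"Node Type": "Index Scan", "Relation Name": "orders"}],
    }],
}
hints = limit_sort_scan_hints(last_order_plan)
```

Each additional "rake" becomes another small rule of this kind, and the hints are attached to the matching nodes in the picture.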

As a result, the amount of experience needed to solve problems has dropped significantly compared with where we started. This is the kind of tool we have.


Source: www.hab.com
