PostgreSQL Antipatterns: a tale of iteratively refining search by name, or "Optimization there and back again"

Thousands of managers from sales offices across the country log tens of thousands of contacts in our CRM system every day: records of communication with prospective or existing customers. To do that, you first have to find the customer, and preferably very quickly. Most often that happens by name.

So it is no surprise that, while once again analyzing "heavy" queries on one of our most heavily loaded databases, our own VLSI corporate account, I found "at the top" a query for "quick" search by name for organization cards.

Moreover, further investigation revealed an interesting example of first optimization and then performance degradation of a query as it was successively refined by several teams, each of which acted solely with the best of intentions.

0: What did the user want?


What does a user usually mean when talking about "quick" search by name? It almost never turns out to be an "honest" substring search like ... LIKE '%роза%', because then the result includes not only 'Розалия' and 'Магазин Роза' but also 'Гроза' and even 'Дом Деда Мороза'.

At an everyday level, the user assumes that you will search by the beginning of a word in the name, and that you will rank higher the results that begin with what was typed. And that you will do it almost instantly, fast enough for as-you-type suggestions.
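The difference between the two behaviours can be checked on the four names above. This is a minimal sketch, assuming a UTF8 database whose locale lets lower() fold Cyrillic letters; the column aliases are illustrative only:

```sql
SELECT
  name
, lower(name) LIKE '%роза%' AS substring_match    -- "honest" substring search
, lower(name) ~ '(^|\s)роза' AS word_start_match  -- match only at the start of a word
FROM
  (VALUES
    ('Розалия')
  , ('Магазин Роза')
  , ('Гроза')
  , ('Дом Деда Мороза')
  ) AS t(name);
```

Under those assumptions all four names pass the substring test, while only 'Розалия' and 'Магазин Роза' pass the word-start test.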

1: Narrowing the task

Moreover, a person will not deliberately type 'роз магаз' so that you have to search every word by prefix. No, it is much easier for the user to react to a quick hint for the last word than to deliberately "underspecify" the preceding ones; look at how any search engine handles this.

In general, correctly formulating the requirements for a problem is more than half of the solution. Sometimes a careful use-case analysis can significantly affect the result.

What does an abstract developer do?

1.0: External search engine

Oh, search is hard, I don't want to do anything at all. Let's hand it over to devops! Let them deploy a search engine external to the database: Sphinx, ElasticSearch, ...

A workable option, although labor-intensive in terms of synchronization and how quickly changes become visible. But not in our case, since the search for each client runs only within that client's own account data. And the data is highly volatile: if a manager has just entered the card 'Магазин Роза', then 5-10 seconds later they may remember that they forgot to put the email there, and will want to find the card and fix it.

So let's search "directly in the database". Fortunately, PostgreSQL lets us do this, and in more than one way. Let's look at them.

1.1: "Honest" substring

We latch onto the word "substring". And for index search by substring (and even by regular expressions!) there is the excellent pg_trgm module! Only afterwards the results still have to be sorted correctly.

Let's take the following table to keep the model simple:

CREATE TABLE firms(
  id
    serial
      PRIMARY KEY
, name
    text
);

We load 7.8 million records of real organizations into it and index them:

CREATE EXTENSION pg_trgm;
CREATE INDEX ON firms USING gin(lower(name) gin_trgm_ops);

Let's fetch the first 10 records for an as-you-type search:

SELECT
  *
FROM
  firms
WHERE
  lower(name) ~ ('(^|\s)' || 'роза')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC -- first those that "start with"
, lower(name) -- the rest alphabetically
LIMIT 10;

[view the plan on explain.tensor.ru]

Well, so-so... 26 ms, 31 MB of data read and more than 1.7K records filtered out, all for 10 rows found. The overhead is too high; is there something more efficient?
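Figures like these come from the executed plan. A minimal sketch of how to collect them yourself, assuming the firms table and trigram index defined above; node names and numbers will differ with your data and PostgreSQL version, and the page count in the second statement is a made-up value, not a measurement:

```sql
-- ANALYZE runs the query and reports actual timings;
-- BUFFERS adds, per plan node, the pages found in cache or read from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  *
FROM
  firms
WHERE
  lower(name) ~ ('(^|\s)' || 'роза')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC
, lower(name)
LIMIT 10;

-- Convert a page count taken from the plan into a size.
-- 3968 is a hypothetical "shared hit" value used only for illustration.
SELECT pg_size_pretty(3968 * current_setting('block_size')::bigint);
```

Rows that were fetched and then discarded appear in the plan output on its "Rows Removed by ..." lines.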

1.2: Search by text? That's FTS!

Indeed, PostgreSQL provides a very powerful full-text search engine (Full Text Search), including the ability to search by prefix. An excellent option, and you don't even need to install an extension! Let's try:

CREATE INDEX ON firms USING gin(to_tsvector('simple'::regconfig, lower(name)));

SELECT
  *
FROM
  firms
WHERE
  to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC
, lower(name)
LIMIT 10;

[view the plan on explain.tensor.ru]

Here parallel query execution helped us a little, cutting the time in half to 11 ms. And we had to read 1.5 times less, 20 MB in total. But here less is better: the more data we read, the higher the chance of a cache miss, and every extra page read from disk is a potential "brake" on the query.
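The `:*` prefix form used in the query above gives word-start semantics rather than substring semantics. A small sketch on the sample names from section 0, assuming a UTF8 database whose locale lets the default text search parser recognize Cyrillic words; the alias is illustrative:

```sql
SELECT
  name
, to_tsvector('simple', lower(name)) @@ to_tsquery('simple', 'роза:*') AS fts_prefix_match
FROM
  (VALUES
    ('Розалия')
  , ('Магазин Роза')
  , ('Гроза')
  , ('Дом Деда Мороза')
  ) AS t(name);
```

Under those assumptions only 'Розалия' and 'Магазин Роза' should match, because the prefix is compared against whole lexemes and not against arbitrary positions inside a word.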

1.3: LIKE after all?

The previous query is good in every way, except that if you run it a hundred thousand times a day, it adds up to 2 TB of data read. At best from memory, but if you are unlucky, from disk. So let's try to make it smaller.
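The estimate is simple multiplication of the per-query volume from the FTS variant by the daily call count. As a quick check, using only the figures quoted in the text:

```sql
-- 100,000 executions per day, about 20 MB read by each one
SELECT pg_size_pretty(100000::bigint * 20 * 1024 * 1024) AS read_per_day;
```

The result comes out just under 2 TB per day.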

Recall that the user wants to see first whatever "starts with ...". That is prefix search in its purest form, with the help of text_pattern_ops! And only if that is "not enough" to reach the 10 records we are looking for will we have to read the rest using FTS search:

CREATE INDEX ON firms(lower(name) text_pattern_ops);

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
LIMIT 10;

[view the plan on explain.tensor.ru]

Excellent performance: just 0.05 ms and a little over 100 KB read! Only we forgot to sort by name, so that the user does not get lost in the results:

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
ORDER BY
  lower(name)
LIMIT 10;

[view the plan on explain.tensor.ru]

Oh, something is not so pretty anymore: the index seems to be there, but the sort flies right past it... It is, of course, already many times more efficient than the previous option, but...

1.4: "Finish it off with a file"

But there is an index that lets you search by range and still use sorting normally: a regular btree!

CREATE INDEX ON firms(lower(name));

Only the query for it will have to be "assembled by hand":

SELECT
  *
FROM
  firms
WHERE
  lower(name) >= 'роза' AND
  lower(name) <= ('роза' || chr(65535)) -- for UTF8; for single-byte encodings use chr(255)
ORDER BY
   lower(name)
LIMIT 10;

[view the plan on explain.tensor.ru]

Excellent: the sort works, and resource consumption remains "microscopic", thousands of times more efficient than "pure" FTS! All that remains is to put it together into a single query:

(
  SELECT
    *
  FROM
    firms
  WHERE
    lower(name) >= 'роза' AND
    lower(name) <= ('роза' || chr(65535)) -- for UTF8; for single-byte encodings use chr(255)
  ORDER BY
     lower(name)
  LIMIT 10
)
UNION ALL
(
  SELECT
    *
  FROM
    firms
  WHERE
    to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*') AND
    lower(name) NOT LIKE ('роза' || '%') -- those that "start with" were already found above
  ORDER BY
    lower(name) ~ ('^' || 'роза') DESC -- use the same sort so as NOT to go via the btree index
  , lower(name)
  LIMIT 10
)
LIMIT 10;

Note that the second subquery is executed only if the first one returned fewer rows than the final LIMIT requires. I have already written about this method of query optimization before.
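This short-circuit can be observed without the firms table. The sketch below is an illustration under stated assumptions: generate_series stands in for both search branches, and the union is wrapped in a derived table with the hypothetical alias u purely to keep the statement simple.

```sql
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT
  *
FROM
  (
    (
      SELECT i FROM generate_series(1, 100) AS i LIMIT 10    -- stand-in for the cheap btree branch
    )
    UNION ALL
    (
      SELECT i FROM generate_series(101, 200) AS i LIMIT 10  -- stand-in for the expensive FTS branch
    )
  ) AS u
LIMIT 10;
```

Because the first branch already supplies the 10 rows the outer LIMIT needs, the plan node for the second branch should be marked "(never executed)".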

So yes, we now have both btree and gin on the table, but statistically it turns out that fewer than 10% of queries ever reach the second block. That is, with such typical constraints on the task known in advance, we were able to reduce the total consumption of server resources by almost a thousand times!

1.5*: We can do without the file

Above, the wrong sort order kept us from using LIKE. But the sort can be "set on the right path" by specifying a USING operator:

By default ASC is assumed. Alternatively, a specific ordering operator name can be specified in the USING clause. An ordering operator must be a less-than or greater-than member of some B-tree operator family. ASC is usually equivalent to USING < and DESC is usually equivalent to USING >.

In our case, "less than" is ~<~:

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
ORDER BY
  lower(name) USING ~<~
LIMIT 10;

[view the plan on explain.tensor.ru]
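The ~<~ operator belongs to the text_pattern_ops operator class and compares strings byte by byte instead of using the database collation, which is why the pattern index can serve both the LIKE filter and this sort. A small sketch on made-up values, assuming a UTF8 database:

```sql
SELECT
  x
FROM
  (VALUES ('б'), ('Z'), ('a'), ('B')) AS t(x)
ORDER BY
  x USING ~<~; -- byte-wise ordering, independent of the collation
```

In a UTF8 database this should return the values in byte order: B, Z, a, б.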

2: How queries turn sour

Now we leave our query to "simmer" for six months or a year, and are surprised to find it "at the top" again, with a total daily memory "pumping" (buffers shared hit) of 5.5 TB, which is even more than it was at the very beginning.

No, of course, our business has grown and the load has increased, but not by that much! So something is fishy here; let's figure it out.

2.1: The birth of paging

At some point, another development team wanted to make it possible to "jump" from the quick as-you-type search into a registry with the same results, only expanded. And what is a registry without page navigation? Let's bolt it on!

( ... LIMIT <N> + 10)
UNION ALL
( ... LIMIT <N> + 10)
LIMIT 10 OFFSET <N>;

Now the registry of search results could be shown with "page-by-page" loading without any strain on the developer.

Of course, in reality each successive page reads more and more data (everything from the previous pages, which we discard, plus the required "tail"), and this is an unambiguous antipattern. It would be more correct to start the search at the next iteration from a key saved in the interface, but more on that another time.
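The article defers the key-based approach, so the following is only a rough sketch of the general idea for the first (btree) branch, not the author's implementation. It assumes an additional composite index, and the key values are hypothetical placeholders for whatever the interface remembered from the last row it displayed:

```sql
-- Assumption: a composite index so that (lower(name), id) defines a total order.
CREATE INDEX ON firms(lower(name), id);

SELECT
  *
FROM
  firms
WHERE
  lower(name) >= 'роза' AND
  lower(name) <= ('роза' || chr(65535)) AND
  (lower(name), id) > ('розалия', 12345) -- hypothetical key of the last row already shown
ORDER BY
  lower(name)
, id
LIMIT 10;
```

With this shape each page starts reading from the saved key, so the cost of a page does not grow with its number the way it does with OFFSET.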

2.2: I want something exotic

At some point the developer wanted to diversify the resulting sample with data from another table, for which the entire previous query was moved into a CTE:

WITH q AS (
  ...
  LIMIT <N> + 10
)
SELECT
  *
, (SELECT ...) sub_query -- some query against a related table
FROM
  q
LIMIT 10 OFFSET <N>;

And even so it is not bad, since the subquery is evaluated only for the 10 returned records, if not for...

2.3: DISTINCT, senseless and merciless

Somewhere in the course of this evolution, the NOT LIKE condition got lost from the 2nd subquery. Clearly, after that UNION ALL started returning some records twice: first found at the beginning of the string, and then again at the beginning of the first word of that same string. In the limit, all records of the 2nd subquery could coincide with the records of the first.
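The duplication is easy to reproduce on a toy data set. In this sketch the CTE named firms and its two rows are invented stand-ins for the real table, and a UTF8 database with Cyrillic-aware lower() and text search parsing is assumed:

```sql
WITH firms(id, name) AS (
  VALUES
    (1, 'Роза Ветров')  -- starts with the prefix, so both branches find it
  , (2, 'Магазин Роза') -- only the word-prefix (FTS) branch finds it
)
(
  SELECT * FROM firms WHERE lower(name) LIKE ('роза' || '%')
)
UNION ALL
(
  SELECT * FROM firms
  WHERE to_tsvector('simple', lower(name)) @@ to_tsquery('simple', 'роза:*')
  -- the NOT LIKE ('роза' || '%') guard is deliberately missing here
);
```

Under those assumptions the row with id 1 comes back twice and the row with id 2 once; restoring the NOT LIKE condition in the second branch removes the duplicate.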

What does a developer do instead of looking for the cause? No question!

  • double the size of the original samples
  • apply DISTINCT to get only a single instance of each row

WITH q AS (
  ( ... LIMIT <2 * N> + 10)
  UNION ALL
  ( ... LIMIT <2 * N> + 10)
  LIMIT <2 * N> + 10
)
SELECT DISTINCT
  *
, (SELECT ...) sub_query
FROM
  q
LIMIT 10 OFFSET <N>;

That is, the result is obviously exactly the same in the end, but the chance of "flying" into the 2nd CTE subquery has become much higher, and even without that, clearly more data is read.

But that is not the saddest part. Since the developer asked for DISTINCT not on specific fields but on all fields of the record at once, the sub_query field (the result of the subquery) was automatically included as well. Now, to execute DISTINCT, the database had to execute not 10 subqueries, but all <2 * N> + 10!
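The effect of DISTINCT on the subquery can be seen on synthetic data. This is a sketch under stated assumptions: generate_series stands in for the CTE contents and for the related-table lookup, and the row counts are arbitrary.

```sql
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
WITH q AS (
  SELECT i FROM generate_series(1, 1000) AS i -- stand-in for the <2 * N> + 10 rows
)
SELECT DISTINCT
  i
, (SELECT count(*) FROM generate_series(1, q.i)) AS sub_query -- stand-in correlated subquery
FROM
  q
LIMIT 10 OFFSET 100;
```

Since every output column takes part in the duplicate check, the SubPlan node in the plan should report a loop count equal to the full 1000 input rows, even though only 10 rows are returned.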

2.4: Cooperation above all!

So the developers lived on without a care, because the user clearly did not have the patience to "scroll" the registry to significant values of N, given the chronic slowdown in receiving each next "page".

Until developers from another department came to them and wanted to use such a convenient method for iterative search: take a chunk of some sample, filter it by additional conditions, render the result, then take the next chunk (which in our case is achieved by increasing N), and so on until the screen is filled.

In the end, in the specimen we caught, N reached values of almost 17K, and in just one day at least 4K such queries were executed "along the chain". The last of them boldly scanned 1 GB of memory per iteration...

Summary


Source: www.hab.com
