PostgreSQL Antipatterns: a tale of iteratively refining search by name, or “Optimization there and back again”

Thousands of managers in sales offices across the country record tens of thousands of contacts in our CRM system every day: facts of communication with potential or existing customers. To do that, you first have to find the customer, and preferably very quickly. Most often this is done by name.

So it is no surprise that, once again analyzing the "heavy" queries on one of our most heavily loaded databases, that of our own corporate VLSI account, I found a "quick" search by name over organization cards at the top of the list.

Further investigation revealed an interesting example of first optimization and then performance degradation of a query as it was successively reworked by several teams, each acting with only the best of intentions.

0: what did the user want?

What does a user usually mean by a "quick" search by name? It almost never turns out to be an "honest" substring search like ... LIKE '%роза%', because then the result includes not only 'Розалия' and 'Магазин Роза', but also 'Гроза' and even 'Дом Деда Мороза'.

At an everyday level, the user assumes that you will provide a search by the beginning of a word in the name, and that you will show as more relevant whatever starts with the entered text. And that you will do it almost instantly, as they type.
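The difference between the two interpretations can be checked on a few literal values. This is only a sketch: it assumes a UTF8 database whose locale lets lower() fold Cyrillic letters, and the column aliases are made up for illustration.

```sql
-- Compare an "honest" substring match with a "beginning of a word" match
-- on sample names taken from the text above.
SELECT
  name
, lower(name) LIKE '%роза%'  AS substring_match   -- matches anywhere in the string
, lower(name) ~ '(^|\s)роза' AS word_prefix_match -- matches only at the start of a word
FROM
  (VALUES
    ('Розалия')
  , ('Магазин Роза')
  , ('Дом Деда Мороза')
  ) AS t(name);
```

Under those assumptions all three names pass the substring test, while 'Дом Деда Мороза' should fail the word-prefix test, because there 'роза' sits in the middle of a word.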

1: narrowing the task

Even less likely is that a person will deliberately type 'роз магаз', forcing you to search for each word by prefix. No, it is much easier for the user to react to a quick suggestion for the last word than to deliberately leave the previous words unfinished; just look at how any search engine handles this.

In general, formulating the requirements for a problem correctly is more than half of the solution. Sometimes a careful analysis of the use case can significantly affect the result.

What does an abstract developer do?

1.0: an external search engine

Oh, search is hard, I don't want to deal with it at all, let's hand it over to devops! Let them deploy a search engine external to the database: Sphinx, ElasticSearch, ...

This is a workable option, albeit a labor-intensive one in terms of synchronization and the speed at which changes become visible. But not in our case, since the search is performed for each client only within the data of their own account. And the data is quite volatile: if a manager has just entered the card 'Магазин Роза', then 5-10 seconds later they may remember that they forgot to specify an email there, and want to find the card and fix it.

So let's search "right in the database". Fortunately, PostgreSQL lets us do this, and in more than one way; let's look at them.

1.1: an "honest" substring

We latch on to the word "substring". And precisely for indexed search by substring (and even by regular expressions!) there is the excellent pg_trgm module! Only afterwards the results will have to be sorted correctly.

To keep the model simple, let's take the following table:

CREATE TABLE firms(
  id
    serial
      PRIMARY KEY
, name
    text
);

We load 7.8 million records of real organizations into it and index them:

CREATE EXTENSION pg_trgm;
CREATE INDEX ON firms USING gin(lower(name) gin_trgm_ops);

Let's look for the first 10 records for a search-as-you-type query:

SELECT
  *
FROM
  firms
WHERE
  lower(name) ~ ('(^|\s)' || 'роза')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC -- first the names that "start with"
, lower(name) -- the rest alphabetically
LIMIT 10;

[query plan: view on explain.tensor.ru]

Well, so-so... 26ms, 31MB of data read and more than 1.7K records filtered out, all for the 10 we were looking for. The overhead is too high; isn't there something more efficient?
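Figures like these can be obtained from the executor itself. A minimal sketch, assuming the firms table and the trigram index defined above: EXPLAIN with the ANALYZE and BUFFERS options runs the query and reports the actual time, the rows removed by filters and rechecks, and the number of buffers touched. Each buffer is one page, 8kB in a default build.

```sql
-- Run the query for real and report timing and buffer usage per plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  *
FROM
  firms
WHERE
  lower(name) ~ ('(^|\s)' || 'роза')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC
, lower(name)
LIMIT 10;

-- Page size needed to convert "Buffers: shared hit=... read=..." into bytes
SHOW block_size;
```

Multiplying the buffer counts of the top plan node by the block size gives the volume of data the query went through.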

1.2: text search? That's FTS!

Indeed, PostgreSQL provides a very powerful full-text search engine (Full Text Search), including support for prefix search. An excellent option, you don't even need to install any extensions! Let's try:

CREATE INDEX ON firms USING gin(to_tsvector('simple'::regconfig, lower(name)));

SELECT
  *
FROM
  firms
WHERE
  to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC
, lower(name)
LIMIT 10;

[query plan: view on explain.tensor.ru]

Here parallel query execution helped us a little, cutting the time in half to 11ms. We also had to read 1.5 times less: 20MB in total. And here, the less the better, because the larger the volume we read, the higher the chance of a cache miss, and every extra data page read from disk is a potential "brake" on the query.

1.3: still LIKE?

The previous query is good in every way, except that if you run it a hundred thousand times a day, it adds up to 2TB of data read (100,000 × 20MB). In the best case from memory, but if you are unlucky, from disk. So let's try to make it smaller.

Recall that the user wants to see first the names "that start with ...". But that is prefix search in its purest form, done with text_pattern_ops! And only if we "fall short" of the 10 records we are looking for will we have to read the remainder using FTS search:

CREATE INDEX ON firms(lower(name) text_pattern_ops);

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
LIMIT 10;

[query plan: view on explain.tensor.ru]

Excellent performance: just 0.05ms and a little over 100KB read! Only we forgot to sort by name so that the user does not get lost in the results:

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
ORDER BY
  lower(name)
LIMIT 10;

[query plan: view on explain.tensor.ru]

Oh, something is not so pretty anymore: there seems to be an index, but the sort flies right past it... Of course, it is already many times more efficient than the previous option, but...

1.4: "finishing with a file"

But there is an index that allows both searching by range and using the sort normally: the regular btree!

CREATE INDEX ON firms(lower(name));

Only the query for it will have to be "assembled by hand":

SELECT
  *
FROM
  firms
WHERE
  lower(name) >= 'роза' AND
  lower(name) <= ('роза' || chr(65535)) -- for UTF8; for single-byte encodings use chr(255)
ORDER BY
   lower(name)
LIMIT 10;

[query plan: view on explain.tensor.ru]

Excellent: the sort works, and resource consumption remains "microscopic", thousands of times more efficient than "pure" FTS! All that remains is to combine everything into a single query:

(
  SELECT
    *
  FROM
    firms
  WHERE
    lower(name) >= 'роза' AND
    lower(name) <= ('роза' || chr(65535)) -- for UTF8; for single-byte encodings use chr(255)
  ORDER BY
     lower(name)
  LIMIT 10
)
UNION ALL
(
  SELECT
    *
  FROM
    firms
  WHERE
    to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*') AND
    lower(name) NOT LIKE ('роза' || '%') -- the rows that "start with" were already found above
  ORDER BY
    lower(name) ~ ('^' || 'роза') DESC -- use the same sort so as NOT to go through the btree index
  , lower(name)
  LIMIT 10
)
LIMIT 10;

Note that the second subquery is executed only if the first one returned fewer rows than the final LIMIT expects. I have already written about this way of optimizing queries.
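One way to check this behavior, sketched here under the assumption that the firms table and the btree and gin indexes above exist: run the combined query under EXPLAIN (ANALYZE). When the first branch alone supplies all 10 rows, the plan nodes of the second branch should be marked "(never executed)".

```sql
-- The outer LIMIT stops pulling rows as soon as it has 10 of them,
-- so the second branch of the UNION ALL may never be started.
EXPLAIN (ANALYZE, BUFFERS)
(
  SELECT
    *
  FROM
    firms
  WHERE
    lower(name) >= 'роза' AND
    lower(name) <= ('роза' || chr(65535))
  ORDER BY
    lower(name)
  LIMIT 10
)
UNION ALL
(
  SELECT
    *
  FROM
    firms
  WHERE
    to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*') AND
    lower(name) NOT LIKE ('роза' || '%')
  ORDER BY
    lower(name) ~ ('^' || 'роза') DESC
  , lower(name)
  LIMIT 10
)
LIMIT 10;
```

For a rarer prefix that yields fewer than 10 "starts with" rows, the same plan should instead show actual timings on the FTS branch.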

So yes, we now have both a btree and a gin index on the table, but statistically it turned out that fewer than 10% of queries ever reach the second block. That is, with such typical constraints known in advance for the task, we were able to reduce the total consumption of server resources almost a thousandfold!

1.5*: we can do without the file

Above, an unsuitable sort order kept us from using LIKE. But it can be "set on the right path" by specifying a USING operator:

ASC is assumed by default. Alternatively, a specific ordering operator name can be specified in the USING clause. An ordering operator must be a less-than or greater-than member of some B-tree operator family. ASC is usually equivalent to USING < and DESC is usually equivalent to USING >.

In our case, "less than" is ~<~:

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
ORDER BY
  lower(name) USING ~<~
LIMIT 10;

[query plan: view on explain.tensor.ru]
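As a cross-check, the system catalogs can be asked which operators belong to the btree operator family behind text_pattern_ops. This is a sketch relying on the standard pg_amop, pg_opfamily and pg_am catalogs; in a btree family strategy 1 is the less-than member, so ~<~ is expected on that row.

```sql
-- List the members of the btree operator family "text_pattern_ops"
SELECT
  amop.amopstrategy               -- btree strategies: 1 "<", 2 "<=", 3 "=", 4 ">=", 5 ">"
, amop.amopopr::regoperator AS op
FROM
  pg_amop amop
JOIN
  pg_opfamily opf
    ON opf.oid = amop.amopfamily
JOIN
  pg_am am
    ON am.oid = amop.amopmethod
WHERE
  am.amname = 'btree' AND
  opf.opfname = 'text_pattern_ops'
ORDER BY
  amop.amopstrategy;
```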

2: how queries go sour

Now we leave our query to "steep" for six months or a year, and are surprised to find it "at the top" again, with a total daily "pumping" of memory (buffers shared hit) of 5.5TB, that is, even more than it was originally.

No, of course, our business has grown and the load has increased, but not by that much! So something is fishy here; let's figure it out.

2.1: the birth of paging

At some point, another development team wanted to make it possible to "jump" from the quick search-as-you-type to a registry with the same, but extended, results. And what is a registry without page navigation? Let's bolt it on!

( ... LIMIT <N> + 10)
UNION ALL
( ... LIMIT <N> + 10)
LIMIT 10 OFFSET <N>;

Now the registry of search results could be shown with "page-by-page" loading at no strain to the developer.

Of course, in reality more and more data is read for each subsequent page (everything from the previous pages, which we discard, plus the required "tail"), so this is an unambiguous antipattern. It would be more correct to start the search at the next iteration from a key stored in the interface, but more on that another time.
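A sketch of that alternative for the "starts with" branch only, and not the author's implementation: instead of OFFSET, the client passes back the last (lower(name), id) pair it was shown, and the next page starts right after it. The composite index, the statement name next_page and the sample argument values are assumptions made for illustration.

```sql
-- Hypothetical index: lets the row comparison and the ORDER BY share one btree scan
CREATE INDEX ON firms(lower(name), id);

-- $1 = prefix typed by the user, $2/$3 = name and id of the last row already shown
PREPARE next_page(text, text, integer) AS
SELECT
  *
FROM
  firms
WHERE
  (lower(name), id) > ($2, $3) AND  -- strictly after the last row already shown
  lower(name) <= ($1 || chr(65535)) -- still inside the prefix range
ORDER BY
  lower(name)
, id
LIMIT 10;

-- first page: start from the prefix itself (serial ids are positive)
EXECUTE next_page('роза', 'роза', 0);

-- following page: name and id of the last row received (sample values)
EXECUTE next_page('роза', 'роза ветров', 12345);
```

With this shape each page should touch about the same small number of index entries, however far the user has scrolled.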

2.2: I want something exotic

At some point a developer wanted to enrich the resulting sample with data from another table, for which the entire previous query was moved into a CTE:

WITH q AS (
  ...
  LIMIT <N> + 10
)
SELECT
  *
, (SELECT ...) sub_query -- some query against a related table
FROM
  q
LIMIT 10 OFFSET <N>;

And even so, it is not bad, since the subquery is evaluated only for the 10 returned records, if it weren't for...

2.3: DISTINCT, senseless and merciless

Somewhere in the course of this evolution, the NOT LIKE condition was lost from the 2nd subquery. Clearly, after that UNION ALL started returning some records twice: first found by the beginning of the string, and then again by the beginning of the first word of that string. In the limit, all records of the 2nd subquery could coincide with the records of the first.

What does a developer do instead of looking for the cause?.. No problem!

  • double the size of the original samples
  • apply DISTINCT to get only a single instance of each row

WITH q AS (
  ( ... LIMIT <2 * N> + 10)
  UNION ALL
  ( ... LIMIT <2 * N> + 10)
  LIMIT <2 * N> + 10
)
SELECT DISTINCT
  *
, (SELECT ...) sub_query
FROM
  q
LIMIT 10 OFFSET <N>;

That is, the result is, in the end, exactly the same, but the chance of "falling through" to the 2nd CTE subquery has become much higher, and even without that, noticeably more data is read.

But this is not the saddest part. Since the developer asked for DISTINCT not on specific fields but on all fields of the record at once, the sub_query field, the result of the subquery, was automatically included there too. Now, to perform DISTINCT, the database had to execute not 10 subqueries, but all <2 * N> + 10!
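For contrast, a sketch of the shape the query could have kept: with the NOT LIKE condition back in the second branch, the two branches can no longer overlap, so DISTINCT is unnecessary and the correlated subquery runs only for the rows actually returned. The contacts table and its firm_id column are hypothetical stand-ins for the "related table", and the paging part is left out.

```sql
-- Hypothetical related table, for illustration only
CREATE TABLE contacts(
  id
    serial
      PRIMARY KEY
, firm_id
    integer
      REFERENCES firms(id)
);

WITH q AS (
  (
    SELECT
      *
    FROM
      firms
    WHERE
      lower(name) >= 'роза' AND
      lower(name) <= ('роза' || chr(65535))
    ORDER BY
      lower(name)
    LIMIT 10
  )
  UNION ALL
  (
    SELECT
      *
    FROM
      firms
    WHERE
      to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*') AND
      lower(name) NOT LIKE ('роза' || '%') -- keeps the two branches disjoint
    ORDER BY
      lower(name) ~ ('^' || 'роза') DESC
    , lower(name)
    LIMIT 10
  )
  LIMIT 10
)
SELECT
  q.*
, (SELECT count(*) FROM contacts c WHERE c.firm_id = q.id) sub_query -- evaluated once per returned row
FROM
  q;
```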

2.4: cooperation above all!

So the developers lived on without a care, because the user clearly did not have enough patience to "scroll" the registry to significant values of N, given the steadily growing delay in getting each next "page".

Until developers from another department came along and wanted to use such a convenient method for iterative search: that is, we take a chunk from some sample, filter it by additional conditions, draw the result, then the next chunk (which in our case is achieved by increasing N), and so on until the screen is filled.

All in all, in the specimen we caught, N reached values of almost 17K, and in just one day at least 4K such queries were executed "along the chain". The last of them were already boldly scanning 1GB of memory per iteration...

In total


Source: www.habr.com
