PostgreSQL Antipatterns: a tale of iteratively refining search by name, or "Optimization there and back again"

Every day, thousands of managers in sales offices across the country record tens of thousands of contacts in our CRM system: facts of communication with potential or existing clients. To do that, you first have to find the client, and preferably very quickly. Most often this is done by name.

So it is no surprise that, while once again analyzing the "heavy" queries on one of our busiest databases, our own VLSI corporate account, I found near the "top" a query for "quick" search by name over organization cards.

Moreover, further investigation revealed an interesting example of a query whose performance was first optimized and then degraded as it was successively refined by several teams, each acting with the best of intentions.

0: what did the user want?


What does a user usually mean by a "quick" search by name? It almost never turns out to be an "honest" substring search like ... LIKE '%роза%', because then the results include not only 'Розалия' and 'Магазин Роза', but also 'Гроза' and even 'Дом Деда Мороза'.

At an everyday level, the user assumes that you will provide search by the beginning of a word in the name, and that names starting with the entered text will be ranked as more relevant. And that you will do it almost instantly, as the text is being typed.
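
To make the difference concrete, here is a small self-contained check on the four example names. It is only a sketch: it assumes a UTF8 database whose locale lets lower() fold Cyrillic letters and the default standard_conforming_strings = on, and the column aliases are made up for illustration.

```sql
-- Compare an "honest" substring match with a match at the start of a word.
SELECT
  name
, lower(name) LIKE '%роза%'  AS substring_match   -- matches anywhere in the name
, lower(name) ~ '(^|\s)роза' AS word_start_match  -- matches only at the start of a word
FROM
  (VALUES
    ('Розалия')
  , ('Магазин Роза')
  , ('Гроза')
  , ('Дом Деда Мороза')
  ) AS t(name);
```

Under those assumptions all four names pass the substring test, while only 'Розалия' and 'Магазин Роза' pass the word-start test, which is the behaviour the user actually expects.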

1: limiting the task

Even more so, nobody will deliberately type 'роз магаз' so that every word is searched by prefix. No, it is much easier for the user to respond to a quick hint for the last word than to deliberately "under-type" the previous ones; look at how any search engine handles this.

In general, formulating the requirements of a problem correctly is more than half of the solution. Sometimes a careful use-case analysis can significantly affect the result.

What does an abstract developer do?

1.0: external search engine

Oh, search is hard, I don't want to deal with it at all, let's hand it over to devops! Let them deploy a search engine external to the database: Sphinx, ElasticSearch, ...

A workable option, albeit labor-intensive in terms of synchronization and how quickly changes become visible. But not in our case, because the search is performed for each client only within the data of their own account. And the data is quite volatile: if a manager has just entered the card 'Магазин Роза', then 5-10 seconds later they may already remember that they forgot to specify the email there, and will want to find the card and fix it.

So let's search "directly in the database". Fortunately, PostgreSQL allows us to do this, and in more than one way; we will look at each of them.

1.1: "honest" substring

We cling to the word "substring". But for indexed search by substring (and even by regular expressions!) there is the excellent pg_trgm module! Only afterwards we will have to sort the results correctly.

Let's take the following table to keep the model simple:

CREATE TABLE firms(
  id
    serial
      PRIMARY KEY
, name
    text
);

We load 7.8 million records of real organizations into it and index them:

CREATE EXTENSION pg_trgm;
CREATE INDEX ON firms USING gin(lower(name) gin_trgm_ops);

Let's look for the first 10 records for the as-you-type search:

SELECT
  *
FROM
  firms
WHERE
  lower(name) ~ ('(^|\s)' || 'роза')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC -- first, names "starting with" the input
, lower(name) -- the rest alphabetically
LIMIT 10;


Well, that's... 26ms, 31MB of data read and more than 1.7K records filtered out, all for the 10 we were looking for. The overhead is too high; isn't there something more efficient?
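
Figures like these can be read from the query plan. A minimal sketch, assuming the firms table and trigram index defined above: EXPLAIN with the ANALYZE and BUFFERS options reports the actual execution time and the number of buffers touched, and the "Rows Removed by ..." lines show how many candidate rows were discarded.

```sql
-- Run the query for real and report timing plus buffer usage per plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  *
FROM
  firms
WHERE
  lower(name) ~ ('(^|\s)' || 'роза')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC
, lower(name)
LIMIT 10;

-- Buffers are counted in pages; multiply by the page size to get bytes.
SHOW block_size; -- 8192 on a default build
```

The volume read is then roughly the sum of shared hit and read buffers multiplied by block_size.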

1.2: text search? That's FTS!

Indeed, PostgreSQL provides a very powerful full-text search engine (Full Text Search), including support for prefix search. A great option, and you don't even need to install an extension! Let's try:

CREATE INDEX ON firms USING gin(to_tsvector('simple'::regconfig, lower(name)));

SELECT
  *
FROM
  firms
WHERE
  to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*')
ORDER BY
  lower(name) ~ ('^' || 'роза') DESC
, lower(name)
LIMIT 10;
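
As a quick sanity check of what the prefix query 'роза:*' matches, the four sample names from the beginning of the article can be run through the 'simple' configuration. This is a sketch under the assumption of a UTF8 database with a locale in which lower() and the default text search parser handle Cyrillic; the alias fts_prefix_match is illustrative.

```sql
-- Prefix matching in FTS is applied per lexeme (per word), not per string.
SELECT
  name
, to_tsvector('simple', lower(name)) @@ to_tsquery('simple', 'роза:*') AS fts_prefix_match
FROM
  (VALUES
    ('Розалия')
  , ('Магазин Роза')
  , ('Гроза')
  , ('Дом Деда Мороза')
  ) AS t(name);
```

A name qualifies when any of its words begins with the prefix, so 'Гроза' and 'Дом Деда Мороза', which merely contain it, should not match.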


Here parallel query execution helped us a little, halving the time to 11ms. And we had to read 1.5 times less: 20MB in total. Less is better here, because the larger the volume we read, the higher the chance of a cache miss, and every extra page of data read from disk is a potential "brake" on the query.

1.3: still LIKE?

The previous query is good in every respect, but if you pull it a hundred thousand times a day, it adds up to 2TB of data read. From memory in the best case, but from disk if you are unlucky. So let's try to make it smaller.

Recall that what the user wants to see first are the names "that start with...". That is prefix search in its purest form, which text_pattern_ops can serve! Only if we "come up short" of the 10 records we are looking for will we have to read the remainder using FTS search:

CREATE INDEX ON firms(lower(name) text_pattern_ops);

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
LIMIT 10;


Excellent performance: 0.05ms in total and less than 100KB read! Only we forgot to sort by name, so that the user does not get lost in the results:

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
ORDER BY
  lower(name)
LIMIT 10;


Oh, something is no longer so pretty: there seems to be an index, but the sort bypasses it... It is, of course, already many times more efficient than the previous option, but...

1.4: "finish it with a file"

But there is an index that lets you search by range and still use sorting normally: the ordinary btree!

CREATE INDEX ON firms(lower(name));

Only the query for it will have to be "assembled by hand":

SELECT
  *
FROM
  firms
WHERE
  lower(name) >= 'роза' AND
  lower(name) <= ('роза' || chr(65535)) -- for UTF8; for single-byte encodings use chr(255)
ORDER BY
   lower(name)
LIMIT 10;
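
Because the range form depends on how the database collation orders strings, it may be worth confirming on real data that it selects as many rows as the LIKE prefix form. A hedged sketch (it scans the whole table, so it is a one-off check rather than something to run routinely; the aliases by_like and by_range are illustrative):

```sql
-- Count the rows selected by each formulation of the prefix condition.
SELECT
  count(*) FILTER (
    WHERE lower(name) LIKE ('роза' || '%')
  ) AS by_like
, count(*) FILTER (
    WHERE lower(name) >= 'роза'
      AND lower(name) <= ('роза' || chr(65535))
  ) AS by_range
FROM
  firms;
```

If the two counts differ, the collation does not place chr(65535) after every possible continuation of the prefix, and the upper bound needs adjusting for that database.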


Excellent: the sort works, and resource consumption is still "microscopic", thousands of times more efficient than "pure" FTS! All that remains is to combine everything into a single query:

(
  SELECT
    *
  FROM
    firms
  WHERE
    lower(name) >= 'роза' AND
    lower(name) <= ('роза' || chr(65535)) -- for UTF8; for single-byte encodings use chr(255)
  ORDER BY
     lower(name)
  LIMIT 10
)
UNION ALL
(
  SELECT
    *
  FROM
    firms
  WHERE
    to_tsvector('simple'::regconfig, lower(name)) @@ to_tsquery('simple', 'роза:*') AND
    lower(name) NOT LIKE ('роза' || '%') -- names "starting with" the input were already found above
  ORDER BY
    lower(name) ~ ('^' || 'роза') DESC -- keep the same sort so the planner does NOT walk the btree index
  , lower(name)
  LIMIT 10
)
LIMIT 10;

Note that the second subquery is executed only if the first one returned fewer rows than the final LIMIT expects. I have written about this query optimization technique before.
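
The short-circuit can be observed on a toy query that needs no tables. A sketch with generate_series standing in for the two searches (the SUMMARY option assumes PostgreSQL 10 or later): the outer LIMIT is satisfied by the first branch, so in the plan the second branch should be marked "(never executed)".

```sql
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
(
  SELECT i FROM generate_series(1, 100) AS i LIMIT 10    -- yields all 10 rows needed
)
UNION ALL
(
  SELECT i FROM generate_series(101, 200) AS i LIMIT 10  -- only needed if the first branch falls short
)
LIMIT 10;
```

Shrinking the first branch to fewer than 10 rows, for example generate_series(1, 5), makes the second branch run to supply the remainder.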

So yes, we now have both btree and gin on the table, but statistically it turned out that fewer than 10% of queries ever reach the second block. That is, with such typical constraints of the task known in advance, we were able to reduce the total consumption of server resources almost a thousandfold!

1.5*: we can do without the file

Above, what kept us from using LIKE was the "wrong" sort order. But the sort can be "set on the right path" by specifying an operator with USING:

By default ASC is assumed. Alternatively, you can specify the name of a specific ordering operator in the USING clause. The ordering operator must be a less-than or greater-than member of some B-tree operator family. ASC is usually equivalent to USING < and DESC is usually equivalent to USING >.

In our case, "less than" is ~<~:

SELECT
  *
FROM
  firms
WHERE
  lower(name) LIKE ('роза' || '%')
ORDER BY
  lower(name) USING ~<~
LIMIT 10;
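
One caveat worth checking before relying on this: ~<~ compares strings byte by byte and ignores the collation, so the resulting order can differ from what a plain ORDER BY gives. A tiny illustration on ASCII letters:

```sql
-- Order by the pattern-ops "less than" operator instead of the collation-aware one.
SELECT
  x
FROM
  (VALUES ('a'), ('B'), ('b'), ('A')) AS t(x)
ORDER BY
  x USING ~<~;
-- returns A, B, a, b: uppercase codes sort before lowercase ones
```

For lowercased names this mostly coincides with alphabetical order, but letters whose codes lie outside the main alphabet range (Cyrillic 'ё', for example) are placed by code rather than by their position in the alphabet.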


2: how queries turn sour

Now we leave our query to "stew" for six months or a year, and are surprised to find it at the "top" again, with a total daily memory "pumping" figure (buffers shared hit) of 5.5TB, that is, even more than it was originally.

No, of course, our business has grown and the load has increased, but not by that much! So something is fishy here; let's figure it out.

2.1: the birth of paging

At some point, another development team wanted to make it possible to "jump" from the quick as-you-type search into a registry with the same, but expanded, results. And what is a registry without page navigation? Let's bolt it on!

( ... LIMIT <N> + 10)
UNION ALL
( ... LIMIT <N> + 10)
LIMIT 10 OFFSET <N>;

Now the registry of search results could be shown with "page-by-page" loading, with no strain on the developer.

Of course, in reality each subsequent page of data reads more and more (everything from the previous pages, which we throw away, plus the required "tail"), so this is an unambiguous antipattern. The more correct approach would be to start the search on the next iteration from a key stored in the interface, but that is a topic for another time.
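
A rough sketch of that key-based approach for the prefix branch only. The assumptions are loud ones: the client remembers the (lower(name), id) pair of the last row it displayed (the values below are hypothetical), and a composite index on (lower(name), id) is added so that the row comparison can be served by the index. Neither is part of the original setup.

```sql
-- Hypothetical supporting index for keyset pagination.
CREATE INDEX ON firms(lower(name), id);

SELECT
  *
FROM
  firms
WHERE
  lower(name) >= 'роза' AND
  lower(name) <= ('роза' || chr(65535)) AND
  (lower(name), id) > ('розалия', 12345) -- last key shown on the previous page (hypothetical values)
ORDER BY
  lower(name)
, id
LIMIT 10;
```

Each page then reads roughly its own 10 rows instead of everything before it. The FTS branch would need its own continuation key, which this sketch does not cover.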

2.2: I want something exotic

At some point a developer wanted to enrich the resulting sample with data from another table, and for that the whole previous query was moved into a CTE:

WITH q AS (
  ...
  LIMIT <N> + 10
)
SELECT
  *
, (SELECT ...) sub_query -- some query against a related table
FROM
  q
LIMIT 10 OFFSET <N>;

Even so, this is not bad, since the subquery is evaluated only for the 10 returned records, if it weren't for...

2.3: DISTINCT, senseless and merciless

Somewhere in the course of this evolution, the NOT LIKE condition got lost from the 2nd subquery. Clearly, after that UNION ALL began returning some records twice: first found at the beginning of the line, and then again at the beginning of the first word of that same line. In the limit, all records of the 2nd subquery could coincide with the records of the first.

What does a developer do instead of looking for the cause?.. No problem!

  • double the size of the original samples
  • apply DISTINCT to get only one instance of each row

WITH q AS (
  ( ... LIMIT <2 * N> + 10)
  UNION ALL
  ( ... LIMIT <2 * N> + 10)
  LIMIT <2 * N> + 10
)
SELECT DISTINCT
  *
, (SELECT ...) sub_query
FROM
  q
LIMIT 10 OFFSET <N>;

That is, the result clearly ends up exactly the same, but the chance of "flying" into the 2nd CTE subquery has become much higher, and even without that, clearly more data is read.

But that is not the saddest part. Since the developer asked for DISTINCT not on specific fields but on all fields of the record at once, the sub_query field, the result of the subquery, was automatically included as well. Now, to execute DISTINCT, the database had to execute not 10 subqueries, but all <2 * N> + 10 of them!
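
The effect is visible on a toy query. A sketch with generate_series standing in for the real tables (all names and sizes are illustrative; the SUMMARY option assumes PostgreSQL 10 or later): because DISTINCT must compare complete rows, including sub_query, the scalar subquery has to be computed for every row of q before the first row can be returned.

```sql
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
WITH q AS (
  SELECT i FROM generate_series(1, 1000) AS i -- stand-in for the search results
)
SELECT DISTINCT
  i
, (SELECT count(*) FROM generate_series(1, q.i)) AS sub_query -- stand-in for the related-table lookup
FROM
  q
LIMIT 10 OFFSET 20;
```

In the plan, the SubPlan node should show loops=1000, the full size of q, even though only 10 rows are returned. Applying DISTINCT before the subquery column is attached would avoid that.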

2.4: cooperation above all!

So the developers lived on without a care, because the user clearly never had enough patience to "scroll" the registry to significant values of N, given the chronic slowdown in fetching each next "page".

Until developers from another department came along and wanted to use such a convenient method for iterative search: take a chunk of some sample, filter it by additional conditions, render the result, then take the next chunk (which in our case is achieved by increasing N), and so on until the screen is filled.

All in all, in the specimen we caught, N reached values of almost 17K, and in a single day at least 4K such queries were executed "along the chain". The last of them boldly scanned 1GB of memory per iteration...

Summary


Source: www.habr.com
