Ish yo'nalishim tufayli, men ishlab chiquvchi so'rov yozgan va o'ylaydigan vaziyatlarni hal qilishim kerak "Baza aqlli, u hamma narsani o'zi hal qila oladi!Β«
Ba'zi hollarda (qisman ma'lumotlar bazasi imkoniyatlarini bilmaslik, qisman muddatidan oldin optimallashtirish tufayli) bu yondashuv "Frankenshteynlar" paydo bo'lishiga olib keladi.
Birinchidan, men bunday so'rovga misol keltiraman:
-- Π΄Π»Ρ ΠΊΠ°ΠΆΠ΄ΠΎΠΉ ΠΊΠ»ΡΡΠ΅Π²ΠΎΠΉ ΠΏΠ°ΡΡ Π½Π°Ρ
ΠΎΠ΄ΠΈΠΌ Π°ΡΡΠΎΡΠΈΠΈΡΠΎΠ²Π°Π½Π½ΡΠ΅ Π·Π½Π°ΡΠ΅Π½ΠΈΡ ΠΏΠΎΠ»Π΅ΠΉ
WITH RECURSIVE cte_bind AS (
SELECT DISTINCT ON (key_a, key_b)
key_a a
, key_b b
, fld1 bind_fld1
, fld2 bind_fld2
FROM
tbl
)
-- Π½Π°Ρ
ΠΎΠ΄ΠΈΠΌ min/max Π·Π½Π°ΡΠ΅Π½ΠΈΠΉ Π΄Π»Ρ ΠΊΠ°ΠΆΠ΄ΠΎΠ³ΠΎ ΠΏΠ΅ΡΠ²ΠΎΠ³ΠΎ ΠΊΠ»ΡΡΠ°
, cte_max AS (
SELECT
a
, max(bind_fld1) bind_fld1
, min(bind_fld2) bind_fld2
FROM
cte_bind
GROUP BY
a
)
-- ΡΠ²ΡΠ·ΡΠ²Π°Π΅ΠΌ ΠΏΠΎ ΠΏΠ΅ΡΠ²ΠΎΠΌΡ ΠΊΠ»ΡΡΡ ΠΊΠ»ΡΡΠ΅Π²ΡΠ΅ ΠΏΠ°ΡΡ ΠΈ min/max-Π·Π½Π°ΡΠ΅Π½ΠΈΡ
, cte_a_bind AS (
SELECT
cte_bind.a
, cte_bind.b
, cte_max.bind_fld1
, cte_max.bind_fld2
FROM
cte_bind
INNER JOIN
cte_max
ON cte_max.a = cte_bind.a
)
SELECT * FROM cte_a_bind;
So'rov sifatini to'liq baholash uchun keling, o'zboshimchalik bilan ma'lumotlar to'plamini yarataylik:
CREATE TABLE tbl AS
SELECT
(random() * 1000)::integer key_a
, (random() * 1000)::integer key_b
, (random() * 10000)::integer fld1
, (random() * 10000)::integer fld2
FROM
generate_series(1, 10000);
CREATE INDEX ON tbl(key_a, key_b);
Ma'lum bo'ladiki ma'lumotlarni o'qish vaqtning chorak qismidan kamroq vaqtni oldi so'rovning bajarilishi:
Uni qismlarga bo'lib ajratish
Keling, so'rovni batafsil ko'rib chiqaylik va hayron qolamiz:
- Agar rekursiv CTElar bo'lmasa, nima uchun WITH RECURSIVE bu erda?
- Nima uchun min/maksimal qiymatlarni alohida CTEda guruhlash kerak, agar ular asl namunaga bog'langan bo'lsa?
+25% vaqt - Nima uchun oldingi CTE ni takrorlash uchun oxirida shartsiz "SELECT * FROM" dan foydalanish kerak?
+14% vaqt
Bunday holda, ulanish uchun Nested Loop emas, balki Hash Join tanlangani biz uchun juda omadli edi, chunki u holda biz faqat bitta CTE Scan yo'lini emas, balki 10K ga ega bo'lardik!
CTE Scan haqida bir ozBu erda biz buni eslashimiz kerak CTE Scan Seq Scanga o'xshaydi - ya'ni indekslash yo'q, faqat to'liq qidirish kerak bo'ladi 10K x 0.3ms = 3000ms cte_max tomonidan tsikllar uchun yoki 1K x 1.5ms = 1500ms cte_bind tomonidan aylanish paytida!
Aslida, natijada nimani olishni xohladingiz? Ha, odatda bu savol "uch qavatli" so'rovlarni tahlil qilishning 5-daqiqasida paydo bo'ladi.
Biz har bir noyob kalit juftligi uchun chiqishni xohladik key_a tomonidan guruhdan min/maks.
Shunday qilib, keling, buning uchun foydalanaylik
SELECT DISTINCT ON(key_a, key_b)
key_a a
, key_b b
, max(fld1) OVER(w) bind_fld1
, min(fld2) OVER(w) bind_fld2
FROM
tbl
WINDOW
w AS (PARTITION BY key_a);
Ikkala variantda ham ma'lumotlarni o'qish bir xil taxminan 4-5 ms davom etganligi sababli, bizning barcha vaqtimiz oshadi -32% - bu eng sof shaklda asosiy protsessordan yuk olib tashlandi, agar bunday so'rov etarlicha tez-tez bajarilsa.
Umuman olganda, siz bazani "dumaloqni ko'taring, kvadratni aylantiring" deb majburlamasligingiz kerak.
Manba: www.habr.com