Vaqti-vaqti bilan kalitlar to'plamidan foydalangan holda tegishli ma'lumotlarni qidirish vazifasi paydo bo'ladi. biz kerakli umumiy yozuvlar sonini olmaguncha.
Eng "haqiqiy hayot" misoli ko'rsatishdir 20 ta eng qadimgi muammolar, sanab o'tilgan xodimlar ro'yxatida (masalan, bitta bo'linma ichida). Ish sohalari haqida qisqacha ma'lumotga ega bo'lgan turli xil boshqaruv "boshqaruv paneli" uchun shunga o'xshash mavzu juda tez-tez talab qilinadi.
Ushbu maqolada biz PostgreSQL-da bunday muammoning "sodda" yechimini, "aqlliroq" va juda murakkab algoritmni amalga oshirishni ko'rib chiqamiz. Topilgan ma'lumotlardan chiqish sharti bilan SQLda "loop", bu umumiy rivojlanish uchun ham, boshqa shunga o'xshash holatlarda foydalanish uchun ham foydali bo'lishi mumkin.
dan test ma'lumotlar to'plamini olaylik
CREATE INDEX ON task(owner_id, task_date, id);
-- Π° ΡΡΠ°ΡΡΠΉ - ΡΠ΄Π°Π»ΠΈΠΌ
DROP INDEX task_owner_id_task_date_idx;
Qanday eshitilsa, shunday yoziladi
Birinchidan, ijrochilarning identifikatorlarini o'tkazib, so'rovning eng oddiy versiyasini chizamiz
SELECT
*
FROM
task
WHERE
owner_id = ANY('{1,2,4,8,16,32,64,128,256,512}'::integer[])
ORDER BY
task_date, id
LIMIT 20;
Biroz achinarli - biz atigi 20 ta yozuvga buyurtma berdik, ammo Index Scan uni bizga qaytarib berdi 960 qator, keyin ham saralanishi kerak edi... Keling, kamroq o'qishga harakat qilaylik.
unnest + ARRAY
Bizga yordam beradigan birinchi narsa, agar kerak bo'lsa faqat 20 ta saralangan yozuvlar, keyin shunchaki o'qing har biri uchun bir xil tartibda tartiblangan 20 tadan ko'p bo'lmagan kalit. Yaxshi, mos indeks (egasi_identifikatori, vazifa_sanasi, identifikatori) bizda mavjud.
Keling, olish va "ustunlarga yoyish" uchun xuddi shu mexanizmdan foydalanamiz integral jadval yozuvi, kabi ARRAY()
:
WITH T AS (
SELECT
unnest(ARRAY(
SELECT
t
FROM
task t
WHERE
owner_id = unnest
ORDER BY
task_date, id
LIMIT 20 -- ΠΎΠ³ΡΠ°Π½ΠΈΡΠΈΠ²Π°Π΅ΠΌ ΡΡΡ...
)) r
FROM
unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[])
)
SELECT
(r).*
FROM
T
ORDER BY
(r).task_date, (r).id
LIMIT 20; -- ... ΠΈ ΡΡΡ - ΡΠΎΠΆΠ΅
Oh, allaqachon yaxshiroq! 40% tezroq va 4.5 baravar kam ma'lumot Men uni o'qishim kerak edi.
CTE orqali jadval yozuvlarini materiallashtirishE'tiboringizni shunga qarataman ba'zi hollarda Yozuvni pastki so'rovda qidirgandan so'ng, uni CTE-ga "o'ramasdan" darhol uning maydonlari bilan ishlashga urinish: InitPlanni "ko'paytirish" bir xil maydonlar soniga mutanosib:
SELECT
((
SELECT
t
FROM
task t
WHERE
owner_id = 1
ORDER BY
task_date, id
LIMIT 1
).*);
Result (cost=4.77..4.78 rows=1 width=16) (actual time=0.063..0.063 rows=1 loops=1)
Buffers: shared hit=16
InitPlan 1 (returns $0)
-> Limit (cost=0.42..1.19 rows=1 width=48) (actual time=0.031..0.032 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan using task_owner_id_task_date_id_idx on task t (cost=0.42..387.57 rows=500 width=48) (actual time=0.030..0.030 rows=1 loops=1)
Index Cond: (owner_id = 1)
Buffers: shared hit=4
InitPlan 2 (returns $1)
-> Limit (cost=0.42..1.19 rows=1 width=48) (actual time=0.008..0.009 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan using task_owner_id_task_date_id_idx on task t_1 (cost=0.42..387.57 rows=500 width=48) (actual time=0.008..0.008 rows=1 loops=1)
Index Cond: (owner_id = 1)
Buffers: shared hit=4
InitPlan 3 (returns $2)
-> Limit (cost=0.42..1.19 rows=1 width=48) (actual time=0.008..0.008 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan using task_owner_id_task_date_id_idx on task t_2 (cost=0.42..387.57 rows=500 width=48) (actual time=0.008..0.008 rows=1 loops=1)
Index Cond: (owner_id = 1)
Buffers: shared hit=4"
InitPlan 4 (returns $3)
-> Limit (cost=0.42..1.19 rows=1 width=48) (actual time=0.009..0.009 rows=1 loops=1)
Buffers: shared hit=4
-> Index Scan using task_owner_id_task_date_id_idx on task t_3 (cost=0.42..387.57 rows=500 width=48) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (owner_id = 1)
Buffers: shared hit=4
Xuddi shu yozuv 4 marta "ko'zdan kechirildi" ... PostgreSQL 11gacha bu xatti-harakatlar muntazam ravishda sodir bo'ladi va yechim uni CTE-ga "o'rash" dir, bu esa ushbu versiyalarda optimallashtiruvchi uchun mutlaq chegaradir.
Rekursiv akkumulyator
Oldingi versiyada jami biz o'qiymiz 200 qator talab uchun 20. 960 emas, balki undan ham kamroq - bu mumkinmi?
Keling, kerakli bilimlardan foydalanishga harakat qilaylik jami xnumx yozuvlar. Ya'ni, biz kerakli miqdorga yetguncha ma'lumotlarni o'qishni takrorlaymiz.
1-qadam: Boshlang'ich ro'yxat
Shubhasiz, 20 ta yozuvdan iborat "maqsadli" ro'yxatimiz egasi_id kalitlarimizdan biri uchun "birinchi" yozuvlardan boshlanishi kerak. Shuning uchun, birinchi navbatda, biz buni topamiz Har bir kalit uchun "juda birinchi" va biz xohlagan tartibda saralab, ro'yxatga qo'shing - (task_date, id).
2-qadam: "Keyingi" yozuvlarni toping
Endi ro'yxatimizdan birinchi yozuvni olib, boshlasak Indeks bo'ylab "qadam" owner_id kalitini saqlab qolgan holda, barcha topilgan yozuvlar natijada olingan tanlovda aynan keyingisi bo'ladi. Albatta, faqat biz dumba kalitini kesib o'tgunimizcha ro'yxatdagi ikkinchi yozuv.
Agar biz ikkinchi rekordni "kesib o'tganimiz" aniqlansa ro'yxatga birinchi o'rniga oxirgi o'qilgan yozuv qo'shilishi kerak (xuddi shu owner_id bilan), shundan so'ng biz ro'yxatni yana saralaymiz.
Ya'ni, biz har doim ro'yxatda har bir tugma uchun bittadan ortiq yozuv mavjud emasligini bilib olamiz (agar yozuvlar tugasa va biz "kesmasa" ro'yxatdagi birinchi yozuv shunchaki yo'qoladi va hech narsa qo'shilmaydi. ), va ular har doim tartiblangan ilova kalitining o'sish tartibida (task_date, id).
3-qadam: yozuvlarni filtrlang va "kengaytiring"
Bizning rekursiv tanlovimizning ba'zi qatorlarida, ba'zi yozuvlar rv
takrorlanadi - avval biz "ro'yxatning 2-chi yozuvining chegarasini kesib o'tish" kabi narsalarni topamiz va keyin uni ro'yxatdagi 1-o'ringa almashtiramiz. Shunday qilib, birinchi hodisani filtrlash kerak.
Qo'rqinchli yakuniy so'rov
WITH RECURSIVE T AS (
-- #1 : Π·Π°Π½ΠΎΡΠΈΠΌ Π² ΡΠΏΠΈΡΠΎΠΊ "ΠΏΠ΅ΡΠ²ΡΠ΅" Π·Π°ΠΏΠΈΡΠΈ ΠΏΠΎ ΠΊΠ°ΠΆΠ΄ΠΎΠΌΡ ΠΈΠ· ΠΊΠ»ΡΡΠ΅ΠΉ Π½Π°Π±ΠΎΡΠ°
WITH wrap AS ( -- "ΠΌΠ°ΡΠ΅ΡΠΈΠ°Π»ΠΈΠ·ΡΠ΅ΠΌ" record'Ρ, ΡΡΠΎΠ±Ρ ΠΎΠ±ΡΠ°ΡΠ΅Π½ΠΈΠ΅ ΠΊ ΠΏΠΎΠ»ΡΠΌ Π½Π΅ Π²ΡΠ·ΡΠ²Π°Π»ΠΎ ΡΠΌΠ½ΠΎΠΆΠ΅Π½ΠΈΡ InitPlan/SubPlan
WITH T AS (
SELECT
(
SELECT
r
FROM
task r
WHERE
owner_id = unnest
ORDER BY
task_date, id
LIMIT 1
) r
FROM
unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[])
)
SELECT
array_agg(r ORDER BY (r).task_date, (r).id) list -- ΡΠΎΡΡΠΈΡΡΠ΅ΠΌ ΡΠΏΠΈΡΠΎΠΊ Π² Π½ΡΠΆΠ½ΠΎΠΌ ΠΏΠΎΡΡΠ΄ΠΊΠ΅
FROM
T
)
SELECT
list
, list[1] rv
, FALSE not_cross
, 0 size
FROM
wrap
UNION ALL
-- #2 : Π²ΡΡΠΈΡΡΠ²Π°Π΅ΠΌ Π·Π°ΠΏΠΈΡΠΈ 1-Π³ΠΎ ΠΏΠΎ ΠΏΠΎΡΡΠ΄ΠΊΡ ΠΊΠ»ΡΡΠ°, ΠΏΠΎΠΊΠ° Π½Π΅ ΠΏΠ΅ΡΠ΅ΡΠ°Π³Π½Π΅ΠΌ ΡΠ΅ΡΠ΅Π· Π·Π°ΠΏΠΈΡΡ 2-Π³ΠΎ
SELECT
CASE
-- Π΅ΡΠ»ΠΈ Π½ΠΈΡΠ΅Π³ΠΎ Π½Π΅ Π½Π°ΠΉΠ΄Π΅Π½ΠΎ Π΄Π»Ρ ΠΊΠ»ΡΡΠ° 1-ΠΉ Π·Π°ΠΏΠΈΡΠΈ
WHEN X._r IS NOT DISTINCT FROM NULL THEN
T.list[2:] -- ΡΠ±ΠΈΡΠ°Π΅ΠΌ Π΅Π΅ ΠΈΠ· ΡΠΏΠΈΡΠΊΠ°
-- Π΅ΡΠ»ΠΈ ΠΌΡ ΠΠ ΠΏΠ΅ΡΠ΅ΡΠ΅ΠΊΠ»ΠΈ ΠΏΡΠΈΠΊΠ»Π°Π΄Π½ΠΎΠΉ ΠΊΠ»ΡΡ 2-ΠΉ Π·Π°ΠΏΠΈΡΠΈ
WHEN X.not_cross THEN
T.list -- ΠΏΡΠΎΡΡΠΎ ΠΏΡΠΎΡΡΠ³ΠΈΠ²Π°Π΅ΠΌ ΡΠΎΡ ΠΆΠ΅ ΡΠΏΠΈΡΠΎΠΊ Π±Π΅Π· ΠΌΠΎΠ΄ΠΈΡΠΈΠΊΠ°ΡΠΈΠΉ
-- Π΅ΡΠ»ΠΈ Π² ΡΠΏΠΈΡΠΊΠ΅ ΡΠΆΠ΅ Π½Π΅Ρ 2-ΠΉ Π·Π°ΠΏΠΈΡΠΈ
WHEN T.list[2] IS NULL THEN
-- ΠΏΡΠΎΡΡΠΎ Π²ΠΎΠ·Π²ΡΠ°ΡΠ°Π΅ΠΌ ΠΏΡΡΡΠΎΠΉ ΡΠΏΠΈΡΠΎΠΊ
'{}'
-- ΠΏΠ΅ΡΠ΅ΡΠΎΡΡΠΈΡΠΎΠ²ΡΠ²Π°Π΅ΠΌ ΡΠ»ΠΎΠ²Π°ΡΡ, ΡΠ±ΠΈΡΠ°Ρ 1-Ρ Π·Π°ΠΏΠΈΡΡ ΠΈ Π΄ΠΎΠ±Π°Π²Π»ΡΡ ΠΏΠΎΡΠ»Π΅Π΄Π½ΡΡ ΠΈΠ· Π½Π°ΠΉΠ΄Π΅Π½Π½ΡΡ
ELSE (
SELECT
coalesce(T.list[2] || array_agg(r ORDER BY (r).task_date, (r).id), '{}')
FROM
unnest(T.list[3:] || X._r) r
)
END
, X._r
, X.not_cross
, T.size + X.not_cross::integer
FROM
T
, LATERAL(
WITH wrap AS ( -- "ΠΌΠ°ΡΠ΅ΡΠΈΠ°Π»ΠΈΠ·ΡΠ΅ΠΌ" record
SELECT
CASE
-- Π΅ΡΠ»ΠΈ Π²ΡΠ΅-ΡΠ°ΠΊΠΈ "ΠΏΠ΅ΡΠ΅ΡΠ°Π³Π½ΡΠ»ΠΈ" ΡΠ΅ΡΠ΅Π· 2-Ρ Π·Π°ΠΏΠΈΡΡ
WHEN NOT T.not_cross
-- ΡΠΎ Π½ΡΠΆΠ½Π°Ρ Π·Π°ΠΏΠΈΡΡ - ΠΏΠ΅ΡΠ²Π°Ρ ΠΈΠ· ΡΠΏΠΏΠΈΡΠΊΠ°
THEN T.list[1]
ELSE ( -- Π΅ΡΠ»ΠΈ Π½Π΅ ΠΏΠ΅ΡΠ΅ΡΠ΅ΠΊΠ»ΠΈ, ΡΠΎ ΠΊΠ»ΡΡ ΠΎΡΡΠ°Π»ΡΡ ΠΊΠ°ΠΊ Π² ΠΏΡΠ΅Π΄ΡΠ΄ΡΡΠ΅ΠΉ Π·Π°ΠΏΠΈΡΠΈ - ΠΎΡΡΠ°Π»ΠΊΠΈΠ²Π°Π΅ΠΌΡΡ ΠΎΡ Π½Π΅Π΅
SELECT
_r
FROM
task _r
WHERE
owner_id = (rv).owner_id AND
(task_date, id) > ((rv).task_date, (rv).id)
ORDER BY
task_date, id
LIMIT 1
)
END _r
)
SELECT
_r
, CASE
-- Π΅ΡΠ»ΠΈ 2-ΠΉ Π·Π°ΠΏΠΈΡΠΈ ΡΠΆΠ΅ Π½Π΅Ρ Π² ΡΠΏΠΈΡΠΊΠ΅, Π½ΠΎ ΠΌΡ Ρ
ΠΎΡΡ ΡΡΠΎ-ΡΠΎ Π½Π°ΡΠ»ΠΈ
WHEN list[2] IS NULL AND _r IS DISTINCT FROM NULL THEN
TRUE
ELSE -- Π½ΠΈΡΠ΅Π³ΠΎ Π½Π΅ Π½Π°ΡΠ»ΠΈ ΠΈΠ»ΠΈ "ΠΏΠ΅ΡΠ΅ΡΠ°Π³Π½ΡΠ»ΠΈ"
coalesce(((_r).task_date, (_r).id) < ((list[2]).task_date, (list[2]).id), FALSE)
END not_cross
FROM
wrap
) X
WHERE
T.size < 20 AND -- ΠΎΠ³ΡΠ°Π½ΠΈΡΠΈΠ²Π°Π΅ΠΌ ΡΡΡ ΠΊΠΎΠ»ΠΈΡΠ΅ΡΡΠ²ΠΎ
T.list IS DISTINCT FROM '{}' -- ΠΈΠ»ΠΈ ΠΏΠΎΠΊΠ° ΡΠΏΠΈΡΠΎΠΊ Π½Π΅ ΠΊΠΎΠ½ΡΠΈΠ»ΡΡ
)
-- #3 : "ΡΠ°Π·Π²ΠΎΡΠ°ΡΠΈΠ²Π°Π΅ΠΌ" Π·Π°ΠΏΠΈΡΠΈ - ΠΏΠΎΡΡΠ΄ΠΎΠΊ Π³Π°ΡΠ°Π½ΡΠΈΡΠΎΠ²Π°Π½ ΠΏΠΎ ΠΏΠΎΡΡΡΠΎΠ΅Π½ΠΈΡ
SELECT
(rv).*
FROM
T
WHERE
not_cross; -- Π±Π΅ΡΠ΅ΠΌ ΡΠΎΠ»ΡΠΊΠΎ "Π½Π΅ΠΏΠ΅ΡΠ΅ΡΠ΅ΠΊΠ°ΡΡΠΈΠ΅" Π·Π°ΠΏΠΈΡΠΈ
Shunday qilib, biz bajarilish vaqtining 50% ββuchun ma'lumotlarni o'qishning 20% ni sotdi. Ya'ni, agar siz o'qish uzoq vaqt talab qilishi mumkinligiga ishonish uchun asoslaringiz bo'lsa (masalan, ma'lumotlar ko'pincha keshda bo'lmaydi va buning uchun siz diskka kirishingiz kerak), unda siz o'qishga kamroq bog'lanishingiz mumkin. .
Qanday bo'lmasin, ijro muddati "sodda" birinchi variantga qaraganda yaxshiroq bo'ldi. Ammo ushbu 3 variantdan qaysi birini ishlatish sizga bog'liq.
Manba: www.habr.com