PostgreSQL ์•ˆํ‹ฐํŒจํ„ด: CTE x CTE

์—…๋ฌด์ƒ ๊ฐœ๋ฐœ์ž๊ฐ€ ์š”์ฒญ์„ ์ž‘์„ฑํ•˜๊ณ  ''๋ผ๊ณ  ์ƒ๊ฐํ•˜๋Š” ์ƒํ™ฉ์„ ์ฒ˜๋ฆฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.๋ฒ ์ด์Šค๋Š” ์Šค๋งˆํŠธํ•ด์„œ ๋ชจ๋“  ๊ฒƒ์„ ์ž์ฒด์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!ยซ

์–ด๋–ค ๊ฒฝ์šฐ์—๋Š”(๋ถ€๋ถ„์ ์œผ๋กœ๋Š” ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๊ธฐ๋Šฅ์— ๋Œ€ํ•œ ๋ฌด์ง€, ๋ถ€๋ถ„์ ์œผ๋กœ๋Š” ์กฐ๊ธฐ ์ตœ์ ํ™”๋กœ ์ธํ•ด) ์ด๋Ÿฌํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์œผ๋กœ ์ธํ•ด "ํ”„๋ž‘์ผ„์Šˆํƒ€์ธ"์ด ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค.

๋จผ์ € ๊ทธ๋Ÿฌํ•œ ์š”์ฒญ์˜ ์˜ˆ๋ฅผ ๋“ค์–ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

-- ะดะปั ะบะฐะถะดะพะน ะบะปัŽั‡ะตะฒะพะน ะฟะฐั€ั‹ ะฝะฐั…ะพะดะธะผ ะฐััะพั†ะธะธั€ะพะฒะฐะฝะฝั‹ะต ะทะฝะฐั‡ะตะฝะธั ะฟะพะปะตะน
WITH RECURSIVE cte_bind AS (
  SELECT DISTINCT ON (key_a, key_b)
    key_a a
  , key_b b
  , fld1 bind_fld1
  , fld2 bind_fld2
  FROM
    tbl
)
-- ะฝะฐั…ะพะดะธะผ min/max ะทะฝะฐั‡ะตะฝะธะน ะดะปั ะบะฐะถะดะพะณะพ ะฟะตั€ะฒะพะณะพ ะบะปัŽั‡ะฐ
, cte_max AS (
  SELECT
    a
  , max(bind_fld1) bind_fld1
  , min(bind_fld2) bind_fld2
  FROM
    cte_bind
  GROUP BY
    a
)
-- ัะฒัะทั‹ะฒะฐะตะผ ะฟะพ ะฟะตั€ะฒะพะผัƒ ะบะปัŽั‡ัƒ ะบะปัŽั‡ะตะฒั‹ะต ะฟะฐั€ั‹ ะธ min/max-ะทะฝะฐั‡ะตะฝะธั
, cte_a_bind AS (
  SELECT
    cte_bind.a
  , cte_bind.b
  , cte_max.bind_fld1
  , cte_max.bind_fld2
  FROM
    cte_bind
  INNER JOIN
    cte_max
      ON cte_max.a = cte_bind.a
)
SELECT * FROM cte_a_bind;

์š”์ฒญ ํ’ˆ์งˆ์„ ์‹ค์งˆ์ ์œผ๋กœ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ์ž„์˜์˜ ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋งŒ๋“ค์–ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

CREATE TABLE tbl AS
SELECT
  (random() * 1000)::integer key_a
, (random() * 1000)::integer key_b
, (random() * 10000)::integer fld1
, (random() * 10000)::integer fld2
FROM
  generate_series(1, 10000);
CREATE INDEX ON tbl(key_a, key_b);

๊ทธ๊ฒƒ์€ ๋ฐํ˜€์กŒ๋‹ค ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ๋Š” ๋ฐ XNUMX/XNUMX๋„ ์ฑ„ ๊ฑธ๋ฆฌ์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ์ฟผ๋ฆฌ ์‹คํ–‰:

PostgreSQL ์•ˆํ‹ฐํŒจํ„ด: CTE x CTE[explain.tensor.ru ์ฐธ์กฐ]

ํ•œ์กฐ๊ฐ์”ฉ ๋ถ„ํ•ดํ•ด์„œ

์š”์ฒญ์„ ์ž์„ธํžˆ ์‚ดํŽด๋ณด๊ณ  ๋‹นํ™ฉํ•ด ๋ด…์‹œ๋‹ค.

  1. ์žฌ๊ท€์  CTE๊ฐ€ ์—†๋Š”๋ฐ ์™œ ์—ฌ๊ธฐ์— WITH RECURSIVE๊ฐ€ ์žˆ๋‚˜์š”?
  2. ์–ด์จŒ๋“  ์›๋ณธ ์ƒ˜ํ”Œ์— ์—ฐ๊ฒฐ๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ ์ตœ์†Œ/์ตœ๋Œ€ ๊ฐ’์„ ๋ณ„๋„์˜ CTE๋กœ ๊ทธ๋ฃนํ™”ํ•˜๋Š” ์ด์œ ๋Š” ๋ฌด์—‡์ž…๋‹ˆ๊นŒ?
    +25% ์‹œ๊ฐ„
  3. ์ด์ „ CTE๋ฅผ ๋ฐ˜๋ณตํ•˜๊ธฐ ์œ„ํ•ด ๋์— ๋ฌด์กฐ๊ฑด 'SELECT * FROM'์„ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ ๋Š” ๋ฌด์—‡์ž…๋‹ˆ๊นŒ?
    +14% ์‹œ๊ฐ„

์ด ๊ฒฝ์šฐ ์—ฐ๊ฒฐ์— Nested Loop๊ฐ€ ์•„๋‹Œ Hash Join์ด ์„ ํƒ๋˜์—ˆ๋‹ค๋Š” ์ ์€ ๋งค์šฐ ์šด์ด ์ข‹์•˜์Šต๋‹ˆ๋‹ค. ์™œ๋ƒํ•˜๋ฉด CTE ์Šค์บ” ํŒจ์Šค ํ•˜๋‚˜๋งŒ์ด ์•„๋‹ˆ๋ผ 10K๋ฅผ ๋ฐ›์•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค!

CTE ์Šค์บ”์— ๋Œ€ํ•ด ์กฐ๊ธˆ์—ฌ๊ธฐ์„œ ์šฐ๋ฆฌ๋Š” ๊ทธ๊ฒƒ์„ ๊ธฐ์–ตํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. CTE ์Šค์บ”์€ Seq ์Šค์บ”๊ณผ ์œ ์‚ฌํ•ฉ๋‹ˆ๋‹ค. - ์ฆ‰, ์ƒ‰์ธ์„ ์ƒ์„ฑํ•˜์ง€ ์•Š๊ณ  ์ „์ฒด ๊ฒ€์ƒ‰๋งŒ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. 10K x 0.3ms = 3000ms cte_max์˜ ์‚ฌ์ดํด์šฉ ๋˜๋Š” 1K x 1.5ms = 1500ms cte_bind๋กœ ๋ฐ˜๋ณตํ•  ๋•Œ!
์‚ฌ์‹ค, ๊ทธ ๊ฒฐ๊ณผ ๋ฌด์—‡์„ ์–ป๊ณ  ์‹ถ์—ˆ๋‚˜์š”? ๋„ค, ๋ณดํ†ต "5์ธต" ์ฟผ๋ฆฌ๋ฅผ ๋ถ„์„ํ•˜๋Š” XNUMX๋ถ„ ์ •๋„์— ์ด๋Ÿฐ ์งˆ๋ฌธ์ด ๋‚˜์˜ต๋‹ˆ๋‹ค.

์šฐ๋ฆฌ๋Š” ๊ฐ๊ฐ์˜ ๊ณ ์œ ํ•œ ํ‚ค ์Œ์— ๋Œ€ํ•ด ์ถœ๋ ฅ์„ ์›ํ–ˆ์Šต๋‹ˆ๋‹ค. key_a๋ณ„ ๊ทธ๋ฃน์˜ ์ตœ์†Œ/์ตœ๋Œ€.
๊ทธ๋Ÿผ ์ด๊ฑธ๋กœ ์จ๋ณด์ž ์ฐฝ ๊ธฐ๋Šฅ:

SELECT DISTINCT ON(key_a, key_b)
	key_a a
,	key_b b
,	max(fld1) OVER(w) bind_fld1
,	min(fld2) OVER(w) bind_fld2
FROM
	tbl
WINDOW
	w AS (PARTITION BY key_a);

PostgreSQL ์•ˆํ‹ฐํŒจํ„ด: CTE x CTE
[explain.tensor.ru ์ฐธ์กฐ]

๋‘ ์˜ต์…˜ ๋ชจ๋‘์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ๋Š” ๋ฐ ์•ฝ 4-5ms๊ฐ€ ๊ฑธ๋ฆฌ๋ฏ€๋กœ ๋ชจ๋“  ์‹œ๊ฐ„์ด ๋Š˜์–ด๋‚ฉ๋‹ˆ๋‹ค. -32 % - ์ด๊ฒƒ์€ ๊ฐ€์žฅ ์ˆœ์ˆ˜ํ•œ ํ˜•ํƒœ์ด๋‹ค. ๊ธฐ๋ณธ CPU์—์„œ ๋ถ€ํ•˜๊ฐ€ ์ œ๊ฑฐ๋จ, ๊ทธ๋Ÿฌํ•œ ์š”์ฒญ์ด ์ถฉ๋ถ„ํžˆ ์ž์ฃผ ์‹คํ–‰๋˜๋Š” ๊ฒฝ์šฐ.

์ผ๋ฐ˜์ ์œผ๋กœ ๋ฒ ์ด์Šค์— "๋‘ฅ๊ทผ ๊ฒƒ์€ ๋“ค๊ณ  ๋„ค๋ชจ๋Š” ๊ตด๋ฆฐ๋‹ค"๊ณ  ๊ฐ•์š”ํ•ด์„œ๋Š” ์•ˆ ๋ฉ๋‹ˆ๋‹ค.

์ถœ์ฒ˜ : habr.com

์ฝ”๋ฉ˜ํŠธ๋ฅผ ์ถ”๊ฐ€