Using all the index features in PostgreSQL

In the Postgres world, indexes are essential for navigating a table's data storage (called the "heap") efficiently. Postgres does not keep the heap clustered in index order, and its MVCC architecture means you end up with many versions of the same tuple lying around. It is therefore very important to be able to create and maintain efficient indexes to support your applications.

Below are some tips for optimizing and improving your use of indexes.

Note: the queries shown below work unmodified on the pagila sample database.

Using Covering Indexes

Let's look at a query that fetches the email addresses of inactive customers. The customer table has an active column, and the query is straightforward:

pagila=# EXPLAIN SELECT email FROM customer WHERE active=0;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on customer  (cost=0.00..16.49 rows=15 width=32)
   Filter: (active = 0)
(2 rows)

The query calls for a full sequential scan of the customer table. Let's create an index on the active column:

pagila=# CREATE INDEX idx_cust1 ON customer(active);
CREATE INDEX
pagila=# EXPLAIN SELECT email FROM customer WHERE active=0;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using idx_cust1 on customer  (cost=0.28..12.29 rows=15 width=32)
   Index Cond: (active = 0)
(2 rows)

That helped: the sequential scan became an "Index Scan". This means Postgres scans the index idx_cust1 and then goes on to look up the table heap to read the other column values (in this case, the email column) that the query needs.

Covering indexes were introduced in PostgreSQL 11. They let you include one or more additional columns in the index itself, so that their values are stored alongside the index entries.

If we take advantage of this feature and add the email value to the index, Postgres no longer needs to look up the table heap for the value of email. Let's see whether this works:

pagila=# CREATE INDEX idx_cust2 ON customer(active) INCLUDE (email);
CREATE INDEX
pagila=# EXPLAIN SELECT email FROM customer WHERE active=0;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Only Scan using idx_cust2 on customer  (cost=0.28..12.29 rows=15 width=32)
   Index Cond: (active = 0)
(2 rows)

The "Index Only Scan" tells us that the query can now be answered from the index alone, which helps avoid the disk I/O needed to read the table heap.

In PostgreSQL 11, covering indexes are available only for B-tree indexes. Keep in mind, too, that a covering index costs more to maintain than a regular one.
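
Whether an index-only scan really skips the heap depends on the table's visibility map, so it can be worth checking the plan with actual execution statistics. The following is a minimal sketch, assuming the idx_cust2 index created above and a pagila database on which you are free to run VACUUM; on a table this small the planner may also pick a different plan.

```sql
-- Refresh the visibility map so that index-only scans can skip heap pages.
VACUUM (ANALYZE) customer;

-- ANALYZE executes the query for real. For an Index Only Scan node the
-- output includes a "Heap Fetches" line; BUFFERS adds page-level I/O counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT email FROM customer WHERE active = 0;
```

A "Heap Fetches" value of 0 means no heap page had to be visited. A large value on a heavily updated table suggests that the table needs vacuuming before the index-only scan pays off.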

Using Partial Indexes

Partial indexes index only a subset of the rows in a table. This keeps the index small and makes scans faster.

For example, suppose we want a list of the email addresses of our customers in California. The query would look like this:

SELECT c.email FROM customer c
JOIN address a ON c.address_id = a.address_id
WHERE a.district = 'California';

The query plan involves scanning both of the joined tables:

pagila=# EXPLAIN SELECT c.email FROM customer c
pagila-# JOIN address a ON c.address_id = a.address_id
pagila-# WHERE a.district = 'California';
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=15.65..32.22 rows=9 width=32)
   Hash Cond: (c.address_id = a.address_id)
   ->  Seq Scan on customer c  (cost=0.00..14.99 rows=599 width=34)
   ->  Hash  (cost=15.54..15.54 rows=9 width=4)
         ->  Seq Scan on address a  (cost=0.00..15.54 rows=9 width=4)
               Filter: (district = 'California'::text)
(6 rows)

Here is what a regular index gives us:

pagila=# CREATE INDEX idx_address1 ON address(district);
CREATE INDEX
pagila=# EXPLAIN SELECT c.email FROM customer c
pagila-# JOIN address a ON c.address_id = a.address_id
pagila-# WHERE a.district = 'California';
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Hash Join  (cost=12.98..29.55 rows=9 width=32)
   Hash Cond: (c.address_id = a.address_id)
   ->  Seq Scan on customer c  (cost=0.00..14.99 rows=599 width=34)
   ->  Hash  (cost=12.87..12.87 rows=9 width=4)
         ->  Bitmap Heap Scan on address a  (cost=4.34..12.87 rows=9 width=4)
               Recheck Cond: (district = 'California'::text)
               ->  Bitmap Index Scan on idx_address1  (cost=0.00..4.34 rows=9 width=0)
                     Index Cond: (district = 'California'::text)
(8 rows)

The sequential scan of address has been replaced by a scan of the index idx_address1, followed by a scan of the address heap.

Since this is a frequent query that needs to be optimized, we can use a partial index that indexes only those rows of address where the district is 'California':

pagila=# CREATE INDEX idx_address2 ON address(address_id) WHERE district='California';
CREATE INDEX
pagila=# EXPLAIN SELECT c.email FROM customer c
pagila-# JOIN address a ON c.address_id = a.address_id
pagila-# WHERE a.district = 'California';
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Hash Join  (cost=12.38..28.96 rows=9 width=32)
   Hash Cond: (c.address_id = a.address_id)
   ->  Seq Scan on customer c  (cost=0.00..14.99 rows=599 width=34)
   ->  Hash  (cost=12.27..12.27 rows=9 width=4)
         ->  Index Only Scan using idx_address2 on address a  (cost=0.14..12.27 rows=9 width=4)
(5 rows)

The query now reads only idx_address2 and does not touch the address table.
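
To see the trade-off, it may help to compare the two indexes directly. The sketch below assumes that both idx_address1 and idx_address2 from the examples above exist; 'Texas' is only an illustrative district value.

```sql
-- Compare the on-disk size of the regular index and the partial index.
SELECT pg_size_pretty(pg_relation_size('idx_address1')) AS regular_index,
       pg_size_pretty(pg_relation_size('idx_address2')) AS partial_index;

-- A different district does not match the predicate of idx_address2,
-- so the planner cannot use the partial index for this query.
EXPLAIN SELECT c.email FROM customer c
JOIN address a ON c.address_id = a.address_id
WHERE a.district = 'Texas';
```

On a table as small as address the size difference is likely to be negligible. The saving grows with the share of rows that the predicate excludes.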

Using Multi-Value Indexes

Some columns that need to be indexed may not hold a scalar data type. Column types such as jsonb, arrays and tsvector contain composite or multiple values. If you want to index such columns, you usually need to search through all the individual values inside them.

Let's try to find the titles of all films that come with behind-the-scenes outtakes. The film table has a text array column called special_features. If a film has this "special feature", the column contains the element "Behind The Scenes" in its text array. To find all such films, we need to select every row that has "Behind The Scenes" as any of the values of the special_features array:

SELECT title FROM film WHERE special_features @> '{"Behind The Scenes"}';

The containment operator @> checks whether the right-hand side is a subset of the left-hand side.

The query plan:

pagila=# EXPLAIN SELECT title FROM film
pagila-# WHERE special_features @> '{"Behind The Scenes"}';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on film  (cost=0.00..67.50 rows=5 width=15)
   Filter: (special_features @> '{"Behind The Scenes"}'::text[])
(2 rows)

This calls for a full heap scan, at a cost of 67.50.

Let's see whether a regular B-tree index helps:

pagila=# CREATE INDEX idx_film1 ON film(special_features);
CREATE INDEX
pagila=# EXPLAIN SELECT title FROM film
pagila-# WHERE special_features @> '{"Behind The Scenes"}';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on film  (cost=0.00..67.50 rows=5 width=15)
   Filter: (special_features @> '{"Behind The Scenes"}'::text[])
(2 rows)

The index is not even considered. A B-tree index has no notion of the individual elements inside the values it indexes.

What we need is a GIN index.

pagila=# CREATE INDEX idx_film2 ON film USING GIN(special_features);
CREATE INDEX
pagila=# EXPLAIN SELECT title FROM film
pagila-# WHERE special_features @> '{"Behind The Scenes"}';
                                QUERY PLAN
---------------------------------------------------------------------------
 Bitmap Heap Scan on film  (cost=8.04..23.58 rows=5 width=15)
   Recheck Cond: (special_features @> '{"Behind The Scenes"}'::text[])
   ->  Bitmap Index Scan on idx_film2  (cost=0.00..8.04 rows=5 width=0)
         Index Cond: (special_features @> '{"Behind The Scenes"}'::text[])
(4 rows)

GIN indexes map individual values to the composite values that contain them, which here cuts the cost of the query plan by more than half (from 67.50 to 23.58).
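
The same GIN index can serve other array operators as well. As a small sketch, assuming idx_film2 exists and that "Trailers" and "Commentaries" are values that occur in special_features (an assumption about the sample data), the overlap operator finds films that have at least one of several features:

```sql
-- && is the array "overlap" operator: it is true when the two arrays share
-- at least one element. GIN's default array operator class supports it.
EXPLAIN SELECT title FROM film
WHERE special_features && '{"Trailers","Commentaries"}';
```

Whether the planner actually chooses the index depends on how selective the condition is. If most films match, a sequential scan may still be cheaper.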

Removing duplicate indexes

Indexes accumulate over time, and sometimes a new index ends up with the same definition as an earlier one. You can use the catalog view pg_indexes to get human-readable SQL definitions of indexes. It also makes identical definitions easy to find:

 SELECT array_agg(indexname) AS indexes, replace(indexdef, indexname, '') AS defn
    FROM pg_indexes
GROUP BY defn
  HAVING count(*) > 1;

And here's the result when run on the stock pagila database:

pagila=#   SELECT array_agg(indexname) AS indexes, replace(indexdef, indexname, '') AS defn
pagila-#     FROM pg_indexes
pagila-# GROUP BY defn
pagila-#   HAVING count(*) > 1;
                                indexes                                 |                                defn
------------------------------------------------------------------------+------------------------------------------------------------------
 {payment_p2017_01_customer_id_idx,idx_fk_payment_p2017_01_customer_id} | CREATE INDEX  ON public.payment_p2017_01 USING btree (customer_id
 {payment_p2017_02_customer_id_idx,idx_fk_payment_p2017_02_customer_id} | CREATE INDEX  ON public.payment_p2017_02 USING btree (customer_id
 {payment_p2017_03_customer_id_idx,idx_fk_payment_p2017_03_customer_id} | CREATE INDEX  ON public.payment_p2017_03 USING btree (customer_id
 {idx_fk_payment_p2017_04_customer_id,payment_p2017_04_customer_id_idx} | CREATE INDEX  ON public.payment_p2017_04 USING btree (customer_id
 {payment_p2017_05_customer_id_idx,idx_fk_payment_p2017_05_customer_id} | CREATE INDEX  ON public.payment_p2017_05 USING btree (customer_id
 {idx_fk_payment_p2017_06_customer_id,payment_p2017_06_customer_id_idx} | CREATE INDEX  ON public.payment_p2017_06 USING btree (customer_id
(6 rows)
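
Comparing definition strings depends on how the index name appears in the text. An alternative sketch, which is not part of the original article, groups on the structural columns of pg_catalog.pg_index instead. It treats two indexes as duplicates when they are on the same table and have the same key columns, operator classes, ordering options, uniqueness, expressions and predicate:

```sql
-- Group indexes by table and structural definition; any group with more
-- than one member is a set of candidate duplicates.
SELECT indrelid::regclass              AS table_name,
       array_agg(indexrelid::regclass) AS indexes
FROM   pg_catalog.pg_index
GROUP  BY indrelid, indisunique,
          indkey::text, indclass::text, indoption::text,
          coalesce(pg_get_expr(indexprs, indrelid), ''),
          coalesce(pg_get_expr(indpred, indrelid), '')
HAVING count(*) > 1;
```

Treat the result as a list of candidates to review. An index that backs a constraint or belongs to a partitioned index hierarchy may not be droppable on its own.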

Superset Indexes

It can happen that you end up with many indexes, one of which indexes a superset of the columns indexed by another. This may or may not be desirable: the superset can enable index-only scans, which is good, but it may also take up too much space, or the query it was meant to optimize may no longer be in use.

If you need to automate the detection of such indexes, the pg_index table in pg_catalog is a good place to start.
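
As a starting point, here is a rough sketch of such a query. It is a simplifying heuristic rather than an established recipe: it only looks at plain column indexes (no expressions, no partial indexes) that use the same access method, and it reports pairs where the column list of one index is a leading prefix of the other.

```sql
-- indkey lists the indexed column numbers separated by spaces, so a
-- text prefix match finds indexes whose columns lead another index.
SELECT a.indrelid::regclass   AS table_name,
       a.indexrelid::regclass AS narrower_index,
       b.indexrelid::regclass AS wider_index
FROM   pg_catalog.pg_index a
JOIN   pg_catalog.pg_index b
       ON  b.indrelid = a.indrelid
       AND b.indexrelid <> a.indexrelid
JOIN   pg_catalog.pg_class ca ON ca.oid = a.indexrelid
JOIN   pg_catalog.pg_class cb ON cb.oid = b.indexrelid
WHERE  ca.relam = cb.relam                      -- same access method
  AND  a.indpred  IS NULL AND b.indpred  IS NULL -- no partial indexes
  AND  a.indexprs IS NULL AND b.indexprs IS NULL -- no expression indexes
  AND  b.indkey::text LIKE a.indkey::text || ' %';
```

The query does not consider uniqueness, operator classes, sort order or INCLUDE columns, so each reported pair still needs a manual look before anything is dropped.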

Unused indexes

As the applications that use a database evolve, so do the queries they run. Indexes added earlier may no longer be used by any query. Every time an index is scanned, the statistics manager records it, and the system catalog view pg_stat_user_indexes exposes the value as idx_scan, which is a cumulative counter. Tracking this value over a period of time (say, a month) gives a good idea of which indexes are unused and can be dropped.

Here is a query to get the current scan counts of all indexes in the 'public' schema:

SELECT relname, indexrelname, idx_scan
FROM   pg_catalog.pg_stat_user_indexes
WHERE  schemaname = 'public';

The output looks like this:

pagila=# SELECT relname, indexrelname, idx_scan
pagila-# FROM   pg_catalog.pg_stat_user_indexes
pagila-# WHERE  schemaname = 'public'
pagila-# LIMIT  10;
    relname    |    indexrelname    | idx_scan
---------------+--------------------+----------
 customer      | customer_pkey      |    32093
 actor         | actor_pkey         |     5462
 address       | address_pkey       |      660
 category      | category_pkey      |     1000
 city          | city_pkey          |      609
 country       | country_pkey       |      604
 film_actor    | film_actor_pkey    |        0
 film_category | film_category_pkey |        0
 film          | film_pkey          |    11043
 inventory     | inventory_pkey     |    16048
(10 rows)
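
To narrow this down to likely candidates for removal, one option is to list only the indexes that have never been scanned, together with their size. The sketch below leaves out unique indexes, on the assumption that those enforce constraints and must stay even if no query scans them.

```sql
-- Never-scanned, non-unique indexes in the public schema, largest first.
SELECT s.relname,
       s.indexrelname,
       s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM   pg_catalog.pg_stat_user_indexes s
JOIN   pg_catalog.pg_index i ON i.indexrelid = s.indexrelid
WHERE  s.schemaname = 'public'
  AND  s.idx_scan = 0
  AND  NOT i.indisunique
ORDER  BY pg_relation_size(s.indexrelid) DESC;
```

Because idx_scan is cumulative since the last statistics reset, a zero only means something if the counters have been collecting for long enough to cover the application's normal workload.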

Rebuilding indexes with less locking

Indexes often need to be rebuilt, for example when they become bloated, and a rebuild can speed up scans. Indexes can also become corrupted. Changing index parameters may require a rebuild as well.

Enabling parallel index creation

In PostgreSQL 11, B-tree index builds can run in parallel: several parallel workers can be used to speed up index creation. However, make sure these configuration options are set appropriately:

SET max_parallel_workers = 32;
SET max_parallel_maintenance_workers = 16;

The default values are too small. Ideally, these numbers should grow with the number of CPU cores. Read more in the documentation.
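
Before raising these settings, it can be useful to check what the server currently allows. The following sketch shows the related knobs; the memory value and the worker count are illustrative assumptions, and film is used only as an example table.

```sql
-- Inspect the current limits. max_worker_processes caps the total number of
-- background workers and can only be changed with a server restart.
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_maintenance_workers;

-- Index builds sort their input within maintenance_work_mem, so a larger
-- value for this session may help (the value here is illustrative).
SET maintenance_work_mem = '1GB';

-- Optionally request a specific number of workers for index builds on one
-- table. Note that this setting also influences parallel scans of the table.
ALTER TABLE film SET (parallel_workers = 4);
```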

Creating indexes in the background

You can create an index in the background using the CONCURRENTLY option of the CREATE INDEX command:

pagila=# CREATE INDEX CONCURRENTLY idx_address1 ON address(district);
CREATE INDEX

This way of creating an index differs from the regular one in that it does not take a lock that blocks writes to the table. On the other hand, it takes longer and consumes more resources.
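
The same option offers one way to rebuild an existing index without blocking writes: build a replacement concurrently, then swap it in. The sketch below uses idx_address1 from the earlier examples; the name idx_address1_new is hypothetical. Note that the CONCURRENTLY commands cannot run inside a transaction block.

```sql
-- 1. Build a replacement next to the old index without blocking writes.
CREATE INDEX CONCURRENTLY idx_address1_new ON address(district);

-- 2. A failed concurrent build leaves an invalid index behind; check for any.
SELECT indexrelid::regclass AS invalid_index
FROM   pg_catalog.pg_index
WHERE  NOT indisvalid;

-- 3. Drop the old index, again without blocking writes, then rename.
DROP INDEX CONCURRENTLY idx_address1;
ALTER INDEX idx_address1_new RENAME TO idx_address1;

-- On PostgreSQL 12 and later, a single command does the same job:
-- REINDEX INDEX CONCURRENTLY idx_address1;
```

If step 2 reports the new index as invalid, drop it and repeat step 1 before touching the old index.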

Postgres offers a great deal of flexibility in creating indexes and handling special cases, as well as ways to manage the database when your application grows rapidly. We hope these tips help you get your queries running fast and your database ready to scale.

Source: www.habr.com
