Amfani da duk fasalulluka na fihirisa a cikin PostgreSQL

Amfani da duk fasalulluka na fihirisa a cikin PostgreSQL
A cikin duniyar Postgres, fihirisa suna da mahimmanci don ingantaccen kewayawa na ma'ajiyar bayanai (wanda ake kira "tuni"). Postgres baya goyan bayan tari don shi, kuma tsarin gine-ginen MVCC yana sa ku ƙare da yawancin nau'ikan tuple iri ɗaya. Sabili da haka, yana da matukar mahimmanci don samun damar ƙirƙira da kiyaye ingantattun fihirisa don tallafawa aikace-aikace.

Anan akwai wasu shawarwari don haɓakawa da haɓaka amfani da fihirisa.

Lura: tambayoyin da aka nuna a ƙasa suna aiki akan wanda ba a gyara ba pagila sample database.

Amfani da Rufe Fihirisa

Bari mu dubi buƙatar cire adiresoshin imel don masu amfani da ba su da aiki. Tebur customer akwai shafi active, kuma tambayar mai sauki ce:

pagila=# EXPLAIN SELECT email FROM customer WHERE active=0;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on customer  (cost=0.00..16.49 rows=15 width=32)
   Filter: (active = 0)
(2 rows)

Tambayar tana kiran cikakken jerin binciken tebur customer. Bari mu ƙirƙiri fihirisa akan ginshiƙi active:

pagila=# CREATE INDEX idx_cust1 ON customer(active);
CREATE INDEX
pagila=# EXPLAIN SELECT email FROM customer WHERE active=0;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using idx_cust1 on customer  (cost=0.28..12.29 rows=15 width=32)
   Index Cond: (active = 0)
(2 rows)

Ya taimaka, binciken na gaba ya juya zuwa "index scan". Wannan yana nufin cewa Postgres zai duba ma'anar "idx_cust1", sannan ci gaba da bincika tulin tebur don karanta ƙimar sauran ginshiƙai (a cikin wannan yanayin, shafi email) abin da ake bukata.

An gabatar da fihirisar rufewa a cikin PostgreSQL 11. Suna ba ka damar haɗa ɗaya ko fiye da ƙarin ginshiƙai a cikin index kanta - ana adana ƙimar su a cikin ma'ajin bayanai.

Idan muka yi amfani da wannan fasalin kuma muka ƙara ƙimar imel a cikin fihirisar, to Postgres ba zai buƙaci bincika tarin tebur don ƙimar ba. email. Bari mu ga ko wannan zai yi aiki:

pagila=# CREATE INDEX idx_cust2 ON customer(active) INCLUDE (email);
CREATE INDEX
pagila=# EXPLAIN SELECT email FROM customer WHERE active=0;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Only Scan using idx_cust2 on customer  (cost=0.28..12.29 rows=15 width=32)
   Index Cond: (active = 0)
(2 rows)

«Index Only Scan' yana gaya mana cewa tambayar yanzu tana buƙatar maƙasudi kawai, wanda ke taimakawa guje wa duk I/O diski don karanta tulin tebur.

A halin yanzu ana samun maƙasudin rufewa don bishiyoyin B. Duk da haka, a wannan yanayin, ƙoƙarin kulawa zai kasance mafi girma.

Amfani da Partial Index

Fihirisar juzu'i tana baje koli kawai juzu'in layuka a cikin tebur. Wannan yana adana girman fihirisa kuma yana sa bincike cikin sauri.

Bari mu ce muna son samun jerin adiresoshin imel na abokan cinikinmu a California. Buqatar za ta kasance kamar haka:

SELECT c.email FROM customer c
JOIN address a ON c.address_id = a.address_id
WHERE a.district = 'California';
which has a query plan that involves scanning both the tables that are joined:
pagila=# EXPLAIN SELECT c.email FROM customer c
pagila-# JOIN address a ON c.address_id = a.address_id
pagila-# WHERE a.district = 'California';
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=15.65..32.22 rows=9 width=32)
   Hash Cond: (c.address_id = a.address_id)
   ->  Seq Scan on customer c  (cost=0.00..14.99 rows=599 width=34)
   ->  Hash  (cost=15.54..15.54 rows=9 width=4)
         ->  Seq Scan on address a  (cost=0.00..15.54 rows=9 width=4)
               Filter: (district = 'California'::text)
(6 rows)

Waɗanne fihirisa na yau da kullun za su ba mu:

pagila=# CREATE INDEX idx_address1 ON address(district);
CREATE INDEX
pagila=# EXPLAIN SELECT c.email FROM customer c
pagila-# JOIN address a ON c.address_id = a.address_id
pagila-# WHERE a.district = 'California';
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Hash Join  (cost=12.98..29.55 rows=9 width=32)
   Hash Cond: (c.address_id = a.address_id)
   ->  Seq Scan on customer c  (cost=0.00..14.99 rows=599 width=34)
   ->  Hash  (cost=12.87..12.87 rows=9 width=4)
         ->  Bitmap Heap Scan on address a  (cost=4.34..12.87 rows=9 width=4)
               Recheck Cond: (district = 'California'::text)
               ->  Bitmap Index Scan on idx_address1  (cost=0.00..4.34 rows=9 width=0)
                     Index Cond: (district = 'California'::text)
(8 rows)

Duba address an maye gurbinsu da index scan idx_address1sannan ta leka tulin address.

Tun da yake wannan tambaya ce akai-akai kuma yana buƙatar ingantawa, za mu iya amfani da fihirisar juzu'i, wanda ke nuna waɗancan layuka ne kawai tare da adiresoshin da gundumar. ‘California’:

pagila=# CREATE INDEX idx_address2 ON address(address_id) WHERE district='California';
CREATE INDEX
pagila=# EXPLAIN SELECT c.email FROM customer c
pagila-# JOIN address a ON c.address_id = a.address_id
pagila-# WHERE a.district = 'California';
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Hash Join  (cost=12.38..28.96 rows=9 width=32)
   Hash Cond: (c.address_id = a.address_id)
   ->  Seq Scan on customer c  (cost=0.00..14.99 rows=599 width=34)
   ->  Hash  (cost=12.27..12.27 rows=9 width=4)
         ->  Index Only Scan using idx_address2 on address a  (cost=0.14..12.27 rows=9 width=4)
(5 rows)

Yanzu tambayar tana karantawa kawai idx_address2 kuma baya taba teburin address.

Amfani da Multi-Value Index

Wasu ginshiƙan da za a yi maƙasudin ƙila ba su ƙunshi nau'in bayanai mai ma'ana ba. Nau'in ginshiƙi kamar jsonb, arrays и tsvector ya ƙunshi ƙima ko ƙima mai yawa. Idan kana buƙatar fiɗa irin waɗannan ginshiƙai, yawanci dole ne ka bincika duk ƙimar daidaitattun ginshiƙan.

Mu yi ƙoƙarin nemo sunayen duk fina-finan da ke ɗauke da yanke daga abubuwan da ba su yi nasara ba. Tebur film akwai ginshiƙin rubutu da ake kira special_features. Idan fim din yana da wannan "kayayyakin musamman", to, ginshiƙi ya ƙunshi kashi a matsayin tsararrun rubutu Behind The Scenes. Don nemo duk irin waɗannan fina-finai, muna buƙatar zaɓar duk layuka tare da "Bayan The Scenes" lokacin wani tsararru dabi'u special_features:

SELECT title FROM film WHERE special_features @> '{"Behind The Scenes"}';

Ma'aikacin gida @> yana bincika idan gefen dama yanki ne na gefen hagu.

Shirin nema:

pagila=# EXPLAIN SELECT title FROM film
pagila-# WHERE special_features @> '{"Behind The Scenes"}';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on film  (cost=0.00..67.50 rows=5 width=15)
   Filter: (special_features @> '{"Behind The Scenes"}'::text[])
(2 rows)

Wanda ke buƙatar cikakken sikanin tsibiri tare da farashin 67.

Bari mu ga idan ma'aunin bishiyar B na yau da kullun yana taimaka mana:

pagila=# CREATE INDEX idx_film1 ON film(special_features);
CREATE INDEX
pagila=# EXPLAIN SELECT title FROM film
pagila-# WHERE special_features @> '{"Behind The Scenes"}';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on film  (cost=0.00..67.50 rows=5 width=15)
   Filter: (special_features @> '{"Behind The Scenes"}'::text[])
(2 rows)

Ba a ma la'akari da fihirisar ba. Fihirisar itacen B ba ta san da wanzuwar abubuwa guda ɗaya a cikin ƙididdiga masu ƙima ba.

Muna buƙatar ma'aunin GIN.

pagila=# CREATE INDEX idx_film2 ON film USING GIN(special_features);
CREATE INDEX
pagila=# EXPLAIN SELECT title FROM film
pagila-# WHERE special_features @> '{"Behind The Scenes"}';
                                QUERY PLAN
---------------------------------------------------------------------------
 Bitmap Heap Scan on film  (cost=8.04..23.58 rows=5 width=15)
   Recheck Cond: (special_features @> '{"Behind The Scenes"}'::text[])
   ->  Bitmap Index Scan on idx_film2  (cost=0.00..8.04 rows=5 width=0)
         Index Cond: (special_features @> '{"Behind The Scenes"}'::text[])
(4 rows)

Fihirisar GIN tana goyan bayan yin taswirar ƙima guda ɗaya a kan ƙimar ƙima mai ƙima, yana haifar da farashin shirin tambaya wanda ya wuce rabi.

Cire kwafin fihirisa

Fihirisa suna taruwa cikin lokaci, kuma wani lokacin sabon fihirisar na iya ƙunsar ma'anar ɗaya da ɗaya daga cikin waɗanda suka gabata. Kuna iya amfani da ra'ayin kasidar don samun ma'anar ma'anar SQL-mai karantawa na ɗan adam. pg_indexes. Hakanan zaka iya samun ma'anoni iri ɗaya cikin sauƙi:

 SELECT array_agg(indexname) AS indexes, replace(indexdef, indexname, '') AS defn
    FROM pg_indexes
GROUP BY defn
  HAVING count(*) > 1;
And here’s the result when run on the stock pagila database:
pagila=#   SELECT array_agg(indexname) AS indexes, replace(indexdef, indexname, '') AS defn
pagila-#     FROM pg_indexes
pagila-# GROUP BY defn
pagila-#   HAVING count(*) > 1;
                                indexes                                 |                                defn
------------------------------------------------------------------------+------------------------------------------------------------------
 {payment_p2017_01_customer_id_idx,idx_fk_payment_p2017_01_customer_id} | CREATE INDEX  ON public.payment_p2017_01 USING btree (customer_id
 {payment_p2017_02_customer_id_idx,idx_fk_payment_p2017_02_customer_id} | CREATE INDEX  ON public.payment_p2017_02 USING btree (customer_id
 {payment_p2017_03_customer_id_idx,idx_fk_payment_p2017_03_customer_id} | CREATE INDEX  ON public.payment_p2017_03 USING btree (customer_id
 {idx_fk_payment_p2017_04_customer_id,payment_p2017_04_customer_id_idx} | CREATE INDEX  ON public.payment_p2017_04 USING btree (customer_id
 {payment_p2017_05_customer_id_idx,idx_fk_payment_p2017_05_customer_id} | CREATE INDEX  ON public.payment_p2017_05 USING btree (customer_id
 {idx_fk_payment_p2017_06_customer_id,payment_p2017_06_customer_id_idx} | CREATE INDEX  ON public.payment_p2017_06 USING btree (customer_id
(6 rows)

Fihirisar Superset

Yana iya faruwa cewa kun ƙare da firikwensin da yawa, ɗaya daga cikinsu yana nuna manyan ginshiƙai waɗanda ke nuna wasu fihirisa. Wannan na iya yiwuwa ko ba za a so ba — superset ɗin na iya haifar da sikanin fihirisa kawai, wanda yake da kyau, amma yana iya ɗaukar sarari da yawa, ko kuma ba a daina amfani da tambayar da aka yi niyya don ingantawa.

Idan kuna buƙatar sarrafa ma'anar irin waɗannan fihirisa, zaku iya farawa da pg_index daga tebur pg_catalog.

Fihirisar da ba a yi amfani da su ba

Kamar yadda aikace-aikacen da ke amfani da bayanan bayanai ke tasowa, haka kuma tambayoyin da suke amfani da su. Ba za a iya amfani da fihirisar da aka ƙara a baya ta kowace tambaya ba. Duk lokacin da aka duba fihirisa, mai sarrafa kididdiga ya yi masa alama, kuma a cikin duba kundin tsarin pg_stat_user_indexes kana iya ganin darajar idx_scan, wanda shine ma'auni na tarawa. Bibiyar wannan ƙimar na ɗan lokaci (a ce wata ɗaya) zai ba da kyakkyawan ra'ayi na waɗanne fihirisa ba a amfani da su kuma ana iya jefar da su.

Anan akwai tambaya don samun ƙidayar sikanin duk fihirisar da ke cikin tsarin ‘public’:

SELECT relname, indexrelname, idx_scan
FROM   pg_catalog.pg_stat_user_indexes
WHERE  schemaname = 'public';
with output like this:
pagila=# SELECT relname, indexrelname, idx_scan
pagila-# FROM   pg_catalog.pg_stat_user_indexes
pagila-# WHERE  schemaname = 'public'
pagila-# LIMIT  10;
    relname    |    indexrelname    | idx_scan
---------------+--------------------+----------
 customer      | customer_pkey      |    32093
 actor         | actor_pkey         |     5462
 address       | address_pkey       |      660
 category      | category_pkey      |     1000
 city          | city_pkey          |      609
 country       | country_pkey       |      604
 film_actor    | film_actor_pkey    |        0
 film_category | film_category_pkey |        0
 film          | film_pkey          |    11043
 inventory     | inventory_pkey     |    16048
(10 rows)

Sake gina fihirisa tare da ƴan makullai

Sau da yawa ana buƙatar sake gina fihirisa, misali lokacin da suka yi kumbura, kuma sake ginawa na iya hanzarta binciken. Hakanan fihirisa na iya lalacewa. Canza sigogin fihirisa na iya buƙatar sake gina shi.

Kunna ƙirar layi ɗaya

A cikin PostgreSQL 11, ƙirƙirar jigon B-Tree yana tare. Don hanzarta aikin ƙirƙira, ana iya amfani da ma'aikata iri ɗaya da yawa. Koyaya, tabbatar an saita waɗannan zaɓuɓɓukan daidaitawa daidai:

SET max_parallel_workers = 32;
SET max_parallel_maintenance_workers = 16;

Ma'auni na asali sun yi ƙanƙanta sosai. Da kyau, waɗannan lambobi yakamata su haɓaka tare da adadin abubuwan sarrafawa. Kara karantawa a ciki takardun.

Ƙirƙirar fihirisar bango

Kuna iya ƙirƙirar fihirisa a bango ta amfani da zaɓi CONCURRENTLY umarni CREATE INDEX:

pagila=# CREATE INDEX CONCURRENTLY idx_address1 ON address(district);
CREATE INDEX

Wannan tsarin ƙirƙirar ƙididdiga ya bambanta da wanda aka saba da shi saboda baya buƙatar kulle akan tebur, sabili da haka baya toshe ayyukan rubutu. A gefe guda, yana ɗaukar lokaci mai yawa kuma yana cinye ƙarin albarkatu.

Postgres yana ba da sassauci mai yawa don ƙirƙirar fihirisa da hanyoyin magance kowane yanayi na musamman, da kuma hanyoyin sarrafa bayanan bayanai idan aikace-aikacenku ya girma sosai. Muna fatan waɗannan shawarwari za su taimaka muku samun tambayoyinku cikin sauri da shirye-shiryen bayananku don daidaitawa.

source: www.habr.com

Add a comment