The story of one SQL query

Last December I received an interesting bug report from the VWO support team. The load time of one of the analytics reports for a large enterprise customer seemed prohibitively long. Since this is my area of responsibility, I immediately focused on solving the problem.

Background

To make clear what I'm talking about, I'll tell you a little about VWO. It is a platform for launching various targeted campaigns on your websites: running A/B experiments, tracking visitors and conversions, analyzing the sales funnel, displaying heatmaps and playing back visit recordings.

But the most important thing about the platform is reporting. All of the features above are interconnected. And for enterprise customers, a huge amount of information would simply be useless without a powerful platform that presents it as analytics.

Using the platform, you can run an arbitrary query against a large data set. Here is a simple example:

Show all clicks on the page "abc.com" FROM <date d1> TO <date d2> for people who used Chrome OR (were in Europe AND used an iPhone)

Note the Boolean operators. They are available to customers in the query interface, so they can build arbitrarily complex queries to get their samples.
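To make the Boolean composition concrete, here is a minimal Python sketch of how a segment like the example above could be represented as an expression tree and evaluated against one visit. The node types (`Eq`, `And`, `Or`) and the field names are hypothetical and are not VWO's actual implementation; the page and date-range conditions would simply be ANDed on top.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Eq:
    field: str
    value: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Eq, And, Or]


def evaluate(expr: Expr, row: dict) -> bool:
    """Recursively evaluate a Boolean filter tree against one visit (a dict)."""
    if isinstance(expr, Eq):
        return row.get(expr.field) == expr.value
    if isinstance(expr, And):
        return evaluate(expr.left, row) and evaluate(expr.right, row)
    if isinstance(expr, Or):
        return evaluate(expr.left, row) or evaluate(expr.right, row)
    raise TypeError(f"unknown node: {expr!r}")


# Chrome OR (Europe AND iPhone): the parentheses become a nested And node.
segment = Or(Eq("browser", "Chrome"), And(Eq("region", "Europe"), Eq("device", "iPhone")))
```

A Safari user in Europe on an iPhone matches through the right-hand branch, while a Safari iPhone user in Asia does not match at all.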

A slow query

The customer in question was trying to do something that should have been fast:

Show all session recordings for users who visited any page with a URL containing "/jobs"

This site had a lot of traffic, and we were storing more than a million unique URLs for it alone. They wanted to find a fairly simple URL pattern relevant to their business model.

Initial investigation

Let's look at what's happening in the database. Below is the original slow SQL query:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND sessions.referrer_id = recordings_urls.id 
    AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   ) 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0 ;

And here are the timings:

Planning time: 1.480 ms
Execution time: 1431924.650 ms

The query scanned 150,000 rows. The query plan showed a couple of interesting details, but no obvious bottlenecks.

Let's dig into the query further. As you can see, it JOINs three tables:

  1. sessions: session information such as browser, user agent, country, and so on.
  2. recording_data: recorded URLs, pages, visit duration.
  3. urls: to avoid duplicating extremely long URLs, we store them in a separate table.
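As a rough mental model of that layout, here is a Python sketch with invented toy data: each URL string is stored once, and a recording keeps only an array of URL ids. The ids are strings because the queries compare the `urls` column against a `::text[]` array; that, and all the values below, are assumptions for illustration.

```python
# urls: one entry per unique URL, so long strings are stored only once.
urls_table = {
    "1": "https://enterprise_customer.com/jobs/123",
    "2": "https://enterprise_customer.com/about",
    "3": "https://enterprise_customer.com/jobs/456",
}

# recording_data: each recording keeps only an array of url ids.
recording_data = [
    {"usp_id": 10, "urls": ["2", "1"], "duration": 42, "num_of_pages": 2},
    {"usp_id": 11, "urls": ["2"], "duration": 7, "num_of_pages": 1},
]

# sessions: per-visitor metadata, linked to recordings through usp_id.
sessions = {10: {"browser": "Chrome"}, 11: {"browser": "Safari"}}


def resolve_urls(recording):
    """Turn a recording's url ids back into full URLs (what a join with urls does)."""
    return [urls_table[uid] for uid in recording["urls"]]
```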

Also note that all our tables are already partitioned by account_id. This rules out situations where one particularly large account causes problems for the others.
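The queries in this article address each account's tables through an `acc_{account_id}` prefix. The article doesn't show how those templates are rendered, so the following is only a sketch, assuming integer account ids and a fixed whitelist of table names. Validating the inputs matters here because a schema or table name cannot be passed as a bind parameter the way a value can.

```python
ALLOWED_TABLES = {"urls", "recording_data", "sessions"}


def qualified_table(account_id: int, table: str) -> str:
    """Render acc_{account_id}.<table> for one account's partition (hypothetical helper)."""
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(account_id, int) or isinstance(account_id, bool) or account_id < 0:
        raise ValueError("account_id must be a non-negative integer")
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"acc_{account_id}.{table}"


sql = "SELECT count(*) FROM {} WHERE duration >= 5".format(
    qualified_table(42, "recording_data")
)
```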

Looking for clues

On closer inspection, we see that something is off with one particular condition. This line deserves a closer look:

urls && array(
	select id from acc_{account_id}.urls 
	where url  ILIKE  '%enterprise_customer.com/jobs%'
)::text[]

My first thought was that ILIKE over all these long URLs (we have more than 1.4 million unique URLs collected for this account) might be what was hurting performance.

But no, that's not it!

SELECT id FROM urls WHERE url ILIKE '%enterprise_customer.com/jobs%';
  id
--------
 ...
(198661 rows)

Time: 5231.765 ms

The pattern search on its own takes only about 5 seconds. Searching for a pattern across a million unique URLs is clearly not the problem.
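For reference, `ILIKE` is the case-insensitive form of `LIKE`: `%` matches any run of characters and `_` matches exactly one. Here is a small Python model of those semantics, a sketch that leaves out `ESCAPE` handling. It also surfaces a detail about the pattern used here: the underscore in `enterprise_customer.com` is itself a single-character wildcard.

```python
import re


def ilike(value: str, pattern: str) -> bool:
    """Approximate SQL `value ILIKE pattern` (no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any sequence, including an empty one
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    # LIKE must match the whole string, hence fullmatch; ILIKE ignores case.
    return re.fullmatch("".join(parts), value, flags=re.IGNORECASE | re.DOTALL) is not None


pattern = "%enterprise_customer.com/jobs%"
urls = [
    "https://Enterprise_Customer.com/jobs/123",
    "https://enterprise_customer.com/about",
    "https://enterpriseXcustomer.com/jobs/9",  # also matches: _ is a wildcard
]
matches = [u for u in urls if ilike(u, pattern)]
```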

The next suspect on the list was the multiple JOINs. Maybe overusing them had caused the slowdown? JOINs are usually the most obvious candidates for performance problems, but I didn't believe our case was a typical one.

analytics_db=# SELECT
    count(*)
FROM
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data_0 as recording_data,
    acc_{account_id}.sessions_0 as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND sessions.referrer_id = recordings_urls.id
    AND r_time > to_timestamp(1542585600)
    AND r_time < to_timestamp(1545177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
-------
  8086
(1 row)

Time: 147.851 ms

That wasn't our problem either. The JOINs turned out to be quite fast.

Narrowing down the suspects

I was ready to start changing the query to squeeze out any performance improvement I could. My team and I came up with two main ideas:

  • Use EXISTS for the URL subquery: we wanted to check again whether the URL subquery was the problem. One way to do that is simply to use EXISTS. EXISTS can greatly improve performance, because it stops as soon as it finds a single row that satisfies the condition.

SELECT
	count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data as recording_data,
    acc_{account_id}.sessions as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND  (  1 = 1  )
    AND sessions.referrer_id = recordings_urls.id
    AND  (exists(select id from acc_{account_id}.urls where url  ILIKE '%enterprise_customer.com/jobs%'))
    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
 32519
(1 row)
Time: 1636.637 ms

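Well, yes. Wrapped in EXISTS, the subquery makes everything very fast. Note, though, that this version is not equivalent to the original: the subquery does not reference the outer row, so EXISTS only checks whether any matching URL exists at all (and the query also covers a different time range). The next logical question is: if the query with the JOINs and the subquery are each fast on their own, why are they terribly slow together?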
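A toy Python model illustrates why an EXISTS check can be so cheap. An EXISTS-style scan stops at the first matching row, whereas counting every match has to visit all of them. And because this particular subquery never refers to the outer row, its single answer applies to every row alike. The data is invented for illustration.

```python
def exists_scan(rows, predicate):
    """EXISTS-style scan: returns (found, rows_examined)."""
    examined = 0
    for row in rows:
        examined += 1
        if predicate(row):
            return True, examined  # stop at the first match
    return False, examined


def is_match(uid):
    # Hypothetical stand-in for the ILIKE test: every 1000th id "matches".
    return uid % 1000 == 7


url_ids = range(1_000_000)
found, examined = exists_scan(url_ids, is_match)            # stops after 8 rows
total_matches = sum(1 for uid in url_ids if is_match(uid))  # must visit all rows

# Uncorrelated: the answer does not depend on the recording, so all of them pass.
recordings = [["1", "2"], ["3"], ["4"]]
kept = [rec for rec in recordings if found]
```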

  • Move the subquery into a CTE: if the subquery is fast on its own, we can compute its result first and then feed it into the main query.

WITH matching_urls AS (
    select id::text from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%'
)

SELECT 
    count(*) FROM acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions,
    matching_urls
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id
    AND (urls && array(SELECT id from matching_urls)::text[])
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545107599)
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0;

But it was still very slow.
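The CTE idea is the familiar "compute once, then probe" pattern. Here it is as a Python sketch with hypothetical toy data and a substring test standing in for ILIKE. Note that the Python version probes a hash set, which is cheap, whereas the SQL version above still turns the precomputed ids back into `array(...)` and compares them with `&&` on every row.

```python
def precompute_then_filter(recordings, urls_table, pattern_match):
    # Step 1 (the CTE): evaluate the pattern once over the whole urls table.
    matching_ids = {uid for uid, url in urls_table.items() if pattern_match(url)}
    # Step 2 (the main query): reuse that result for every recording.
    return [rec for rec in recordings if not matching_ids.isdisjoint(rec)]


urls_table = {"1": "https://x.com/jobs/1", "2": "https://x.com/about", "3": "https://x.com/jobs/2"}
recordings = [["2", "1"], ["2"], ["3"]]
kept = precompute_then_filter(recordings, urls_table, lambda url: "/jobs" in url)
```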

Finding the culprit

All this time, one thing had been flickering in front of my eyes, and I kept brushing it aside. But since nothing else was left, I decided to look at it too. I'm talking about the && operator. The EXISTS version was fast, and && was the only element still common to every version of the slow query.

Looking at the documentation, we see that && is used to check whether two arrays have any elements in common.
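In PostgreSQL, `a && b` on arrays is true when the two arrays share at least one element; the documentation's example `ARRAY[1,4,3] && ARRAY[2,1]` is true. Leaving aside how NULL elements are treated, the equivalent Python check is a one-liner (a model of the semantics, not of how PostgreSQL evaluates it):

```python
def overlaps(a, b) -> bool:
    """Model of PostgreSQL's `a && b` for arrays: do they share any element?"""
    return not set(a).isdisjoint(b)


print(overlaps([1, 4, 3], [2, 1]))  # True
print(overlaps([1, 4, 3], [2, 5]))  # False
```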

In the original query, it appears here:

AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   )

This means we run the pattern search over our URLs and then intersect the resulting list of IDs with the URLs of each recording. It is a little confusing, because "urls" here does not refer to the table holding all the URLs but to the "urls" column of the recording_data table.

With my suspicion of && growing, I tried to find confirmation in the query plan produced by EXPLAIN ANALYZE. (I already had a saved plan, but I'm usually more comfortable experimenting in SQL than trying to see through the opacity of query planners.)

Filter: ((urls && ($0)::text[]) AND (r_time > '2018-12-17 12:17:23+00'::timestamp with time zone) AND (r_time < '2018-12-18 23:59:59+00'::timestamp with time zone) AND (duration >= '5'::double precision) AND (num_of_pages > 0))
                           Rows Removed by Filter: 52710

The filter containing && was discarding a large number of rows (52,710 in the plan node above). This meant the operation was not only expensive but was also being executed many times, once for every row checked.

I tested this by isolating the condition:

SELECT 1
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data_30 as recording_data_30, 
    acc_{account_id}.sessions_30 as sessions_30 
WHERE 
	urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]

This query was slow. Since the JOINs are fast and the subqueries are fast, the only thing left was the && operator.

But this is a key operation. We always need to search the entire underlying urls table for the pattern, and we always need to find the intersections. We can't search the recordings' URLs directly, because they are just IDs pointing into urls.

Towards a solution

&& is slow because both arrays are huge. The operation would be quite fast if I replaced urls with { "http://google.com/", "http://wingify.com/" }.
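To get a feel for why array size matters, here is a back-of-the-envelope Python model that counts comparisons for a naive pairwise overlap check. It illustrates worst-case scaling and is not a claim about PostgreSQL's internal algorithm. The three-element recording array is hypothetical; 198,661 is the number of matching URL ids found earlier, and 150,000 is the row count mentioned above.

```python
def overlaps_pairwise(a, b):
    """Naive overlap check that also reports how many comparisons it made."""
    comparisons = 0
    for x in a:
        for y in b:
            comparisons += 1
            if x == y:
                return True, comparisons
    return False, comparisons


row_urls = ["u1", "u2", "u3"]                          # one recording's ids (hypothetical)
small = ["http://google.com/", "http://wingify.com/"]  # the small array from the text
big = [f"id{i}" for i in range(198_661)]               # ids matching the pattern

small_cost = overlaps_pairwise(row_urls, small)  # (False, 6)
big_cost = overlaps_pairwise(row_urls, big)      # (False, 595983)

# Repeated for every candidate row in the scan:
estimated = big_cost[1] * 150_000
```

Under these assumptions, a single non-matching row costs 595,983 comparisons instead of 6, and the whole scan lands on the order of 10^11 comparisons.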

I started looking for a way to compute set intersections in Postgres without using &&, but without much success.

In the end, we decided to solve the problem in isolation: give me all the urls rows whose URL matches the pattern. Without the additional conditions, it would look like this:

SELECT urls.url
FROM 
	acc_{account_id}.urls as urls,
	(SELECT unnest(recording_data.urls) AS id) AS unrolled_urls
WHERE
	urls.id = unrolled_urls.id AND
	urls.url  ILIKE  '%jobs%'

Instead of the JOIN syntax, I simply used a subquery and unnested the recording_data.urls array so that the condition could be applied directly in the WHERE clause.
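In Python terms, `unnest` turns one recording's array of ids into a sequence of rows, the join looks each id up in the urls table, and the `WHERE` clause keeps the URLs that match. The sketch below uses invented data and a case-insensitive substring test as a stand-in for `ILIKE '%jobs%'`.

```python
def matching_urls_for_recording(url_ids, urls_table, pattern_match):
    unrolled = list(url_ids)  # unnest(recording_data.urls): one row per array element
    # JOIN on urls.id = unrolled_urls.id
    joined = [(uid, urls_table[uid]) for uid in unrolled if uid in urls_table]
    return [url for _, url in joined if pattern_match(url)]  # WHERE url ILIKE ...


urls_table = {"1": "https://x.com/jobs/1", "2": "https://x.com/about", "3": "https://x.com/Jobs/2"}
result = matching_urls_for_recording(["2", "1", "3"], urls_table, lambda u: "jobs" in u.lower())
```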

The key point here is that && is being used to check whether a given recording contains a matching URL. If you squint a little, you can see that this operation walks through the elements of an array (or the rows of a table) and stops as soon as the condition (a match) is met. Remind you of anything? Yes, EXISTS.

Since recording_data.urls can be referenced from outside the subquery context, so that the subquery is tied to the current outer row, we can fall back on our old friend EXISTS and wrap the subquery in it.
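Wrapping that per-recording lookup in EXISTS means each row only needs to find one matching URL among its own ids and can then stop. A Python sketch of the correlated version, with toy data and a substring test standing in for ILIKE:

```python
def recording_has_match(url_ids, urls_table, pattern_match):
    """Correlated EXISTS for one recording: returns (found, lookups performed)."""
    lookups = 0
    for uid in url_ids:            # unnest(recording_data.urls) for this row only
        lookups += 1
        url = urls_table.get(uid)  # join with the urls table
        if url is not None and pattern_match(url):
            return True, lookups   # EXISTS is satisfied; skip the remaining ids
    return False, lookups


urls_table = {"1": "https://x.com/jobs/1", "2": "https://x.com/about", "3": "https://x.com/jobs/2"}
recordings = [["1", "2", "3"], ["2"], ["2", "3"]]
results = [recording_has_match(rec, urls_table, lambda u: "/jobs" in u) for rec in recordings]
count = sum(1 for found, _ in results if found)
```

Each row does work proportional to its own (small) array, never to the ~200,000 ids that match the pattern.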

Putting it all together, we get the final optimized query:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0
    AND EXISTS(
        SELECT urls.url
        FROM 
            acc_{account_id}.urls as urls,
            (SELECT unnest(recording_data.urls) AS rec_url_id)
            AS unrolled_urls
        WHERE
            urls.id = unrolled_urls.rec_url_id AND
            urls.url  ILIKE  '%enterprise_customer.com/jobs%'
    );

And the final execution time:

Time: 1898.717 ms

Time to celebrate?!?

Not so fast! First we need to verify correctness. I was extremely suspicious of the EXISTS optimization, because it changes the logic so that the query finishes early. We had to make sure we hadn't introduced a subtle bug into the query.

A simple test was to run count(*) with both the slow and the fast query on a large number of different data sets. Then, for a small subset of the data, I verified by hand that all the results were correct.
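The same idea can be sketched as a small randomized differential test. It generates many data sets, counts matches with the original shape (build the full list of matching ids, then test each recording with an overlap check) and with the rewritten shape (per-recording EXISTS), and requires the two counts to agree. Everything here, including the data generator and the fixed seed, is a hypothetical Python stand-in for running the two SQL queries.

```python
import random


def count_via_overlap(recordings, urls_table, pattern_match):
    matching = [uid for uid, url in urls_table.items() if pattern_match(url)]
    return sum(1 for rec in recordings if not set(rec).isdisjoint(matching))


def count_via_exists(recordings, urls_table, pattern_match):
    return sum(
        1 for rec in recordings
        if any(uid in urls_table and pattern_match(urls_table[uid]) for uid in rec)
    )


def random_dataset(rng):
    paths = ["/jobs/1", "/about", "/pricing", "/jobs/2", "/blog"]
    urls_table = {str(i): "https://x.com" + rng.choice(paths) for i in range(rng.randint(1, 30))}
    ids = list(urls_table)
    recordings = [
        rng.sample(ids, k=rng.randint(0, min(5, len(ids))))
        for _ in range(rng.randint(0, 50))
    ]
    return recordings, urls_table


def differential_test(trials=500, seed=0):
    rng = random.Random(seed)

    def is_jobs(url):
        return "/jobs" in url

    for _ in range(trials):
        recordings, urls_table = random_dataset(rng)
        slow = count_via_overlap(recordings, urls_table, is_jobs)
        fast = count_via_exists(recordings, urls_table, is_jobs)
        assert slow == fast, (slow, fast, recordings, urls_table)
    return trials
```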

All the tests consistently came back positive. We had fixed it!

Lessons learned

There are many lessons to take away from this story:

  1. Query plans don't tell the whole story, but they can offer clues.
  2. The main suspects are not always the real culprits.
  3. Slow queries can be broken apart to isolate bottlenecks.
  4. Not all optimizations are reductive in nature.
  5. Using EXISTS where possible can lead to dramatic performance gains.

Conclusion

We went from a query time of ~24 minutes to 2 seconds: a huge performance gain! Although this article turned out long, all the experiments we did happened in a single day, and we estimate they took between 1.5 and 2 hours of optimization and testing.
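A quick check of that arithmetic, using the two execution times reported above:

```python
slow_ms = 1431924.650  # execution time of the original query
fast_ms = 1898.717     # execution time of the final query

slow_minutes = slow_ms / 60_000
fast_seconds = fast_ms / 1_000
speedup = slow_ms / fast_ms

print(f"{slow_minutes:.1f} min -> {fast_seconds:.1f} s, about {speedup:.0f}x faster")
# 23.9 min -> 1.9 s, about 754x faster
```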

SQL is a wonderful language once you stop being afraid of it and start learning and using it. With a good understanding of how SQL queries are executed, how the database generates query plans, how indexes work, and how big the data you're dealing with is, you can be very successful at optimizing queries. It is just as important, though, to keep trying different approaches and to break the problem down step by step until you find the bottlenecks.

The best part of getting results like these is the tangible, visible speed-up: a report that previously wouldn't even load now loads almost instantly.

Special thanks to my teammates Aditya Mishra, Aditya Gauru and Varun Malhotra for brainstorming, and to Dinkar Pandir for catching a critical error in our final query before we finally said goodbye to it!

Source: www.habr.com
