Ibali lophando olunye lweSQL

KuDisemba ophelileyo ndifumene ingxelo ye-bug enomdla kwiqela lenkxaso ye-VWO. Ixesha lokulayisha enye yeengxelo zohlalutyo kumxhasi omkhulu wenkampani lalibonakala lingavumelekanga. Kwaye kuba le iyindawo yam yoxanduva, ngoko nangoko ndagxila ekusombululeni ingxaki.

ukubuzwa

Ukuyenza icace le nto ndithetha ngayo, ndiza kukuxelela kancinci ngeVWO. Eli liqonga onokuthi uqalise ngalo amaphulo ahlukeneyo ekujoliswe kuwo kwiiwebhusayithi zakho: ukuqhuba iimvavanyo ze-A / B, ukulandelela iindwendwe kunye nokuguqulwa, ukuhlalutya i-funnel yokuthengisa, ukubonisa iimephu zokushisa kunye nokudlala ukurekhoda kokutyelela.

Kodwa eyona nto ibalulekileyo ngeqonga kukunika ingxelo. Yonke le misebenzi ingentla ithungelana. Kwaye kubathengi beshishini, isixa esikhulu solwazi siya kuba yinto engenamsebenzi ngaphandle kweqonga elinamandla elibonisa kwifomu yohlalutyo.

Ukusebenzisa iqonga, unokwenza umbuzo ongaqhelekanga kwiseti enkulu yedatha. Nanku umzekelo olula:

Bonisa konke ukucofa kwiphepha "abc.com" UKUSUKA <umhla d1> UKUYA KU- <date d2> kubantu abasebenzisa i-Chrome OKANYE (ifumaneka eYurophu KWAYE basebenzise i-iPhone)

Nika ingqalelo kubasebenzisi beBoolean. Ziyafumaneka kubathengi kwi-interface yombuzo ukwenza imibuzo enzima ngokungenasizathu ukufumana iisampulu.

Isicelo esicothayo

Umxhasi ekuthethwa ngaye ebezama ukwenza into enokuthi isebenze ngokukhawuleza:

Bonisa zonke iirekhodi zeseshoni zabasebenzisi abandwendwele naliphi na iphepha eline URL equlethe "/imisebenzi"

Le ndawo yayinetoni yetrafikhi kwaye besigcina ngaphezulu kwesigidi ii-URL ezizodwa ngenxa yayo. Kwaye babefuna ukufumana itemplate elula ye-URL ehambelana nemodeli yabo yeshishini.

Uphando lokuqala

Makhe sijonge ukuba kuqhubeka ntoni kwi-database. Apha ngezantsi ngumbuzo wokuqala ocothayo weSQL:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND sessions.referrer_id = recordings_urls.id 
    AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   ) 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0 ;

Kwaye nazi amaxesha:

Ixesha elicwangcisiweyo: 1.480 ms Ixesha lokwenziwa: 1431924.650 ms

Umbuzo ukhase imiqolo eyi-150 lamawaka. Umcwangcisi wemibuzo ubonise iinkcukacha ezimbalwa ezinomdla, kodwa akukho miqobo icacileyo.

Masifunde ngakumbi ngesicelo. Njengoko ubona, wenza njalo JOIN iitafile ezintathu:

  1. iiseshoni: ukubonisa ulwazi lweseshoni: isikhangeli, iarhente yomsebenzisi, ilizwe, njalo njalo.
  2. idata_yokurekhoda: ii-URL ezirekhodiweyo, amaphepha, ixesha lokutyelela
  3. urls: Ukunqanda ukuphinda-phinda ii-URL ezinkulu kakhulu, sizigcina kwitafile eyahlukileyo.

Kwakhona qaphela ukuba zonke iitafile zethu sele zahlulwe ngo account_id. Ngale ndlela, imeko apho iakhawunti enkulu ngokukodwa ibangela iingxaki kwabanye ayibandakanywanga.

Ukukhangela imikhondo

Xa siqwalasela ngenyameko, siyabona ukuba kukho into engalunganga ngesicelo esithile. Kufanelekile ukujonga ngakumbi lo mgca:

urls && array(
	select id from acc_{account_id}.urls 
	where url  ILIKE  '%enterprise_customer.com/jobs%'
)::text[]

Ingcinga yokuqala yaba kukuba mhlawumbi ngenxa ILIKE kuzo zonke ezi URLs zinde (singaphezu kwe-1,4 yezigidi ngokwahlukileyo Ii-URL eziqokelelweyo zale akhawunti) ukusebenza kungachaphazeleka.

Kodwa hayi, oku akunjalo!

SELECT id FROM urls WHERE url ILIKE '%enterprise_customer.com/jobs%';
  id
--------
 ...
(198661 rows)

Time: 5231.765 ms

Isicelo sokukhangela itemplate ngokwayo sithatha imizuzwana emi-5 kuphela. Ukukhangela ipateni kwisigidi see-URL ezizodwa ngokucacileyo akuyona ingxaki.

Umrhanelwa olandelayo kuluhlu luliqela JOIN. Mhlawumbi ukusetyenziswa kwabo ngokugqithiseleyo kubangele ukucotha? Ngesiqhelo JOIN's ngabona bagqatswa bacacileyo kwiingxaki zokusebenza, kodwa andizange ndikholelwe ukuba imeko yethu iqhelekile.

analytics_db=# SELECT
    count(*)
FROM
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data_0 as recording_data,
    acc_{account_id}.sessions_0 as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND sessions.referrer_id = recordings_urls.id
    AND r_time > to_timestamp(1542585600)
    AND r_time < to_timestamp(1545177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
-------
  8086
(1 row)

Time: 147.851 ms

Kwaye oku kwakungeyomeko yethu. JOINiye yakhawuleza kakhulu.

Ukunciphisa isangqa sabarhanelwa

Ndandikulungele ukuqalisa ukutshintsha umbuzo ukuze ndifezekise naluphi na uphuculo olunokwenzeka. Iqela lam kunye nam siphuhlise izimvo eziphambili ezi-2:

  • Sebenzisa i-EXISTS ye-URL yombuzo ongaphantsi: Besifuna ukujonga kwakhona ukuba bekukho naziphi na iingxaki nge-subquery yee-URLs. Enye indlela yokufezekisa oku kukusebenzisa ngokulula EXISTS. EXISTS kungaba iphucula kakhulu ukusebenza ekubeni iphela ngokukhawuleza nje ukuba ifumane umtya kuphela ohambelana nemeko.

SELECT
	count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data as recording_data,
    acc_{account_id}.sessions as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND  (  1 = 1  )
    AND sessions.referrer_id = recordings_urls.id
    AND  (exists(select id from acc_{account_id}.urls where url  ILIKE '%enterprise_customer.com/jobs%'))
    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
 32519
(1 row)
Time: 1636.637 ms

Ewe, ewe. I-subquery xa isongelwe EXISTS, yenza yonke into ikhawuleze kakhulu. Umbuzo olandelayo onengqiqo kukuba kutheni isicelo nge JOIN-ami kunye ne-subquery ngokwayo iyakhawuleza nganye, kodwa icotha kakhulu kunye?

  • Ukuhambisa i-subquery kwi-CTE : Ukuba umbuzo ukhawuleza ngokwawo, singabala isiphumo esikhawulezayo kuqala kwaye sinikezele kumbuzo ongundoqo.

WITH matching_urls AS (
    select id::text from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%'
)

SELECT 
    count(*) FROM acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions,
    matching_urls
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id
    AND (urls && array(SELECT id from matching_urls)::text[])
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545107599)
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0;

Kodwa kwakusacotha kakhulu.

Ukufumana umoni

Ngalo lonke eli xesha, into encinci yayidanyaza phambi kwamehlo am, endandisoloko ndiyisusa ecaleni. Kodwa ekubeni kwakungekho nto yimbi, ndagqiba ekubeni nam ndimjonge. Ndithetha ngayo && umsebenzisi. Uxolo EXISTS ngcono nje ukusebenza && ibikuphela kwento eqhelekileyo kuzo zonke iinguqulelo zombuzo ocothayo.

Ejonge e uxwebhu, siyayibona loo nto && isetyenziswe xa ufuna ukufumana izinto eziqhelekileyo phakathi kweendlela ezimbini.

Kwisicelo sokuqala nantsi:

AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   )

Nto leyo ethetha ukuba senza uphendlo lwepateni kwii-URL zethu, emva koko sifumane ukudibana kunye nazo zonke ii-URL ezinezithuba eziqhelekileyo. Oku kuyabhidisa kancinane kuba "urls" apha ayibhekiseli kwitheyibhile equlathe zonke ii-URLs, kodwa kuluhlu lwe "urls" kwitafile. recording_data.

Ngokukhula kokukrokra malunga &&, Ndazama ukufumana isiqinisekiso kubo kwisicwangciso sombuzo owenziweyo EXPLAIN ANALYZE (Besele ndinesicwangciso esigciniweyo, kodwa ndihlala ndikhululekile ngakumbi xa ndizama kwi-SQL kunokuzama ukuqonda ukungafihli kwabacwangcisi bemibuzo).

Filter: ((urls && ($0)::text[]) AND (r_time > '2018-12-17 12:17:23+00'::timestamp with time zone) AND (r_time < '2018-12-18 23:59:59+00'::timestamp with time zone) AND (duration >= '5'::double precision) AND (num_of_pages > 0))
                           Rows Removed by Filter: 52710

Kwakukho imigca emininzi yezihluzi kuphela ukusuka &&. Oku kuthetha ukuba lo msebenzi wawungabizi nje kuphela, kodwa wenza izihlandlo ezininzi.

Ndivavanye oku ngokuyibekela bucala imeko

SELECT 1
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data_30 as recording_data_30, 
    acc_{account_id}.sessions_30 as sessions_30 
WHERE 
	urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]

Lo mbuzo ubucotha. Ngenxa yokuba JOIN-s ziyakhawuleza kwaye ii-subqueries zikhawuleza, ekuphela kwento eseleyo && umqhubi.

Lo ngumsebenzi ongundoqo. Sisoloko sifuna ukukhangela yonke i-theyibhile esezantsi yee-URL ukukhangela ipateni, kwaye sihlala sifuna ukufumana iindlela zokuhlangana. Asikwazi ukukhangela ngeerekhodi ze-URL ngokuthe ngqo, kuba ezi zii-ID ezibhekiselele kuzo urls.

Endleleni eya kwisisombululo

&& ucotha kuba zombini iiseti zinkulu. Utyando luzakukhawuleza xa ndibuyisela urls phezu { "http://google.com/", "http://wingify.com/" }.

Ndaqala ukukhangela indlela yokwenza i-intersection e-Postgres ngaphandle kokusebenzisa &&, kodwa ngaphandle kwempumelelo ingako.

Ekugqibeleni, sagqiba ekubeni sisombulule ingxaki sisodwa: ndinike yonke into urls imigca apho i-URL ihambelana nepateni. Ngaphandle kwemiqathango eyongezelelweyo iya kuba - 

SELECT urls.url
FROM 
	acc_{account_id}.urls as urls,
	(SELECT unnest(recording_data.urls) AS id) AS unrolled_urls
WHERE
	urls.id = unrolled_urls.id AND
	urls.url  ILIKE  '%jobs%'

Endaweni yoko JOIN I-syntax ndisebenzise i-subquery kwaye ndandise recording_data.urls uluhlu ukuze ukwazi ukufaka imeko ngokuthe ngqo WHERE.

Eyona nto ibalulekileyo apha kukuba && isetyenziswa ukujonga ukuba ngaba ungeno olunikiweyo luqulathe i-URL ehambelanayo. Ukuba ukrwaqula kancinane, ungabona lo msebenzi uhamba ngapha kwezakhi zoluhlu (okanye kwimiqolo yetafile) kwaye ime xa imeko (umdlalo) ihlangatyeziwe. Ayikukhumbuzi nganto? Ewe, EXISTS.

Ukususela ngoko recording_data.urls inokubhekiselwa ngaphandle komxholo we-subquery, xa oku kusenzeka sinokubuyela kumhlobo wethu wakudala EXISTS kwaye usonge i-subquery ngayo.

Ukudibanisa yonke into, sifumana umbuzo wokugqibela olungiselelweyo:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0
    AND EXISTS(
        SELECT urls.url
        FROM 
            acc_{account_id}.urls as urls,
            (SELECT unnest(urls) AS rec_url_id FROM acc_{account_id}.recording_data) 
            AS unrolled_urls
        WHERE
            urls.id = unrolled_urls.rec_url_id AND
            urls.url  ILIKE  '%enterprise_customer.com/jobs%'
    );

Kwaye ixesha lokugqibela lokuhamba Time: 1898.717 ms Ixesha lokubhiyozela?!?

Hayi ngokukhawuleza! Okokuqala kufuneka uhlolisise ukuchaneka. Ndandikrokra kakhulu EXISTS ulungiselelo njengoko itshintsha ingqiqo ukugqiba kwangaphambili. Kufuneka siqiniseke ukuba asifakanga mpazamo icacileyo kwisicelo.

Uvavanyo olulula yayikukubaleka count(*) kuyo yomibini imibuzo ecothayo nekhawulezayo kwinani elikhulu leseti zedatha ezahlukeneyo. Emva koko, kwiseti encinci yedatha, ndaqinisekisa ngesandla ukuba zonke iziphumo zichanekile.

Zonke iimvavanyo ziye zanika iziphumo eziqinisekisayo. Silungise yonke into!

Izifundo Ezifundiweyo

Zininzi izinto esinokuzifunda kweli bali:

  1. Izicwangciso zemibuzo azichazi lonke ibali, kodwa zinokunika imikhondo
  2. Abona barhanelwa abaphambili abasoloko bengabo nobangela bokwenene
  3. Imibuzo ecothayo inokwahlulwa ukubekelwa bucala imiqobo
  4. Ayilulo lonke ulungiselelo olucuthayo ngokwendalo
  5. Sebenzisa EXIST, apho kunokwenzeka, kunokukhokelela ekwandeni okumangalisayo kwimveliso

isiphelo

Sihambile kwixesha lombuzo we- ~ 24 imizuzu ukuya kwimizuzwana emi-2 - ukonyuka okubalulekileyo kokusebenza! Nangona eli nqaku laphuma likhulu, zonke iimvavanyo esizenzileyo zenzeke ngosuku olunye, kwaye kuqikelelwa ukuba bathatha phakathi kwe-1,5 kunye ne-2 iiyure zokuphucula kunye nokuvavanya.

I-SQL lulwimi oluhle kakhulu ukuba awuloyiki, kodwa zama ukufunda kwaye ulusebenzise. Ngokuqonda kakuhle indlela imibuzo yeSQL ephunyezwa ngayo, indlela isiseko sedatha esivelisa ngayo izicwangciso zemibuzo, indlela izalathi ezisebenza ngayo, kunye nobukhulu bedatha ojongene nayo, ungaphumelela kakhulu ekwandiseni imibuzo. Kubalulekile ngokulinganayo, nangona kunjalo, ukuqhubeka uzama iindlela ezahlukeneyo kunye nokucotha ingxaki, ukufumana imiqobo.

Elona candelo lilungileyo malunga nokufikelela kwiziphumo ezinje kukubonakala, ukuphuculwa kwesantya esibonakalayo - apho ingxelo ebikade ingade ilayishe ngoku ilayisha phantse ngoko nangoko.

Umbulelo okhethekileyo ku amaqabane am ngokomyalelo ka-Aditya MishraAditya Gauru ΠΈ Varun Malhotra ukudibanisa ingqondo kunye Dinkar Pandir ngokufumana impazamo ebalulekileyo kwisicelo sethu sokugqibela ngaphambi kokuba sithi sala kakuhle kuyo!

umthombo: www.habr.com

Yongeza izimvo