The story of one SQL investigation

Last December I received an alarming bug report from the VWO support team. The loading time of one of the analytics reports for a large enterprise customer looked prohibitive. Since this is my area of responsibility, I immediately focused on solving the problem.

Background

To make it clear what I am talking about, I will tell you a little about VWO. It is a platform for running various targeted campaigns on your website: you can run A/B experiments, track visitors and conversions, analyze the sales funnel, display heatmaps, and play back visit recordings.

But the most important part of the platform is reporting. All of the features above are interconnected. For enterprise customers, a huge volume of data would simply be useless without a powerful platform that presents it in an analytical form.

Using the platform, you can run arbitrary queries over a large data set. Here is a simple example:

Show all clicks on the page "abc.com" FROM <date d1> TO <date d2> for users who used Chrome OR (were located in Europe AND used an iPhone)

Pay attention to the Boolean operators. They are available to customers in the query interface, so customers can build arbitrarily complex queries to obtain samples.
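The example filter above can be read as an ordinary Boolean predicate. Here is a minimal sketch in Python, assuming a hypothetical `Click` record with `page`, `day`, `browser`, `region`, and `device` fields. These names are illustrative only and are not VWO's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Click:
    # Hypothetical record shape, for illustration only.
    page: str
    day: date
    browser: str
    region: str
    device: str


def matches(click: Click, d1: date, d2: date) -> bool:
    """Chrome OR (Europe AND iPhone), restricted to abc.com and [d1, d2]."""
    in_scope = click.page == "abc.com" and d1 <= click.day <= d2
    segment = click.browser == "Chrome" or (
        click.region == "Europe" and click.device == "iPhone"
    )
    return in_scope and segment


clicks = [
    Click("abc.com", date(2018, 12, 1), "Chrome", "Asia", "Pixel"),
    Click("abc.com", date(2018, 12, 2), "Safari", "Europe", "iPhone"),
    Click("abc.com", date(2018, 12, 3), "Safari", "Europe", "iPad"),
    Click("abc.com", date(2019, 1, 5), "Chrome", "Europe", "iPhone"),
]
# Keep only the clicks inside December 2018 that satisfy the segment.
selected = [c for c in clicks if matches(c, date(2018, 12, 1), date(2018, 12, 31))]
```

The parentheses matter: without them, AND would bind tighter than OR anyway, but spelling the grouping out keeps the intent of the segment obvious.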

The slow query

The customer in question was trying to do something that, intuitively, should have been fast:

Show all session recordings for users who visited any page with a URL containing "/jobs"

This site had a ton of traffic, and we were storing over a million unique URLs for it alone. The customer wanted to find a fairly simple URL pattern that related to their business model.

Preliminary investigation

Let's look at what was happening in the database. Below is the original slow SQL query:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND sessions.referrer_id = recordings_urls.id 
    AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   ) 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0 ;

And here are the timings:

Planning time: 1.480 ms
Execution time: 1431924.650 ms

The query scanned 150 thousand rows. The query plan showed a couple of interesting details, but no obvious bottlenecks.

Let's study the query further. As you can see, it JOINs three tables:

  1. sessions: session information such as browser, user agent, country, and so on.
  2. recording_data: recorded URLs, pages, and visit duration.
  3. urls: to avoid duplicating extremely long URLs, we store them in a separate table.

Also note that all three tables are already partitioned by account_id. This rules out the situation where one particularly large account causes problems for the others.

Looking for clues

On closer inspection, something is off in this particular query. This line deserves a careful look:

urls && array(
	select id from acc_{account_id}.urls 
	where url  ILIKE  '%enterprise_customer.com/jobs%'
)::text[]

My first thought was that performance might suffer because of ILIKE running over all these long URLs (we have more than 1.4 million unique URLs collected for this account).

But no, that is not the issue!

SELECT id FROM urls WHERE url ILIKE '%enterprise_customer.com/jobs%';
  id
--------
 ...
(198661 rows)

Time: 5231.765 ms

The pattern search itself takes only 5 seconds. Searching for a pattern across a million unique URLs is clearly not the problem.

The next suspect on the list is the set of JOINs. Could their overuse have caused the slowdown? JOINs are usually the most obvious candidates for performance problems, but I did not believe our case was typical.

analytics_db=# SELECT
    count(*)
FROM
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data_0 as recording_data,
    acc_{account_id}.sessions_0 as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND sessions.referrer_id = recordings_urls.id
    AND r_time > to_timestamp(1542585600)
    AND r_time < to_timestamp(1545177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
-------
  8086
(1 row)

Time: 147.851 ms

And that was not it either. The JOINs turned out to be very fast.

Narrowing the circle of suspects

I was ready to start changing the query to achieve any possible performance improvement. My team and I came up with 2 main ideas:

  • Use EXISTS for the URL subquery: we wanted to double-check whether the subquery over the URLs was the problem. One way to do this is simply to use EXISTS. EXISTS can greatly improve performance, because it terminates as soon as it finds a single row that matches the condition.

SELECT
	count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data as recording_data,
    acc_{account_id}.sessions as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND  (  1 = 1  )
    AND sessions.referrer_id = recordings_urls.id
    AND  (exists(select id from acc_{account_id}.urls where url  ILIKE '%enterprise_customer.com/jobs%'))
    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
 32519
(1 row)
Time: 1636.637 ms

Well, yes. Wrapped in EXISTS, the subquery makes everything super fast. The next logical question: why are the query with the JOINs and the subquery each fast on their own, but terribly slow together?

  • Move the subquery into a CTE: if the subquery is fast on its own, we can simply compute its result first and then feed it to the main query.

WITH matching_urls AS (
    select id::text from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%'
)

SELECT 
    count(*) FROM acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions,
    matching_urls
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id
    AND (urls && array(SELECT id from matching_urls)::text[])
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545107599)
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0;

But it was still very slow.

Finding the culprit

All this time, one small thing kept flashing before my eyes, and I kept brushing it aside. But since nothing else was left, I decided to look at it too. I am talking about the && operator. While EXISTS merely improved performance, && was the only remaining factor common to every version of the slow query.

Looking at the documentation, we see that && is used when you need to find common elements between two arrays.

In the original query, this is:

AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   )

This means that we run a pattern search over our URLs and then look for an intersection with the URLs of every recording. It is a little confusing, because "urls" here does not refer to the table that holds all the URLs, but to the "urls" column of the recording_data table.
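The two steps can be sketched outside the database. The following Python only illustrates the semantics, assuming a small hypothetical `urls_table` mapping (id to URL) and recordings whose `urls` field holds URL ids as text. It says nothing about how Postgres actually executes the query.

```python
# Hypothetical stand-ins for the urls table and the recording_data.urls column.
urls_table = {
    1: "https://enterprise_customer.com/Jobs/42",
    2: "https://enterprise_customer.com/pricing",
    3: "https://enterprise_customer.com/jobs",
}
recordings = {
    "rec_a": ["1", "2"],
    "rec_b": ["2"],
    "rec_c": ["3"],
}


def ilike_contains(value: str, needle: str) -> bool:
    """Rough stand-in for: value ILIKE '%needle%'.

    Case-insensitive substring match only; LIKE wildcards inside the
    needle (such as _) are treated as literal characters here.
    """
    return needle.lower() in value.lower()


# Step 1: pattern search over every URL, producing the matching ids as text.
matching_ids = {
    str(url_id)
    for url_id, url in urls_table.items()
    if ilike_contains(url, "enterprise_customer.com/jobs")
}

# Step 2: the && check, "do the two arrays share at least one element?"
overlapping = [
    rec for rec, url_ids in recordings.items() if matching_ids & set(url_ids)
]
```

Step 1 runs once, but step 2 has to be answered again for every recording row that survives the other filters.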

With my suspicion of && growing, I tried to find confirmation in the query plan produced by EXPLAIN ANALYZE (I already had a saved plan, but I am usually more comfortable experimenting in SQL than trying to decipher the opacity of query planners).

Filter: ((urls && ($0)::text[]) AND (r_time > '2018-12-17 12:17:23+00'::timestamp with time zone) AND (r_time < '2018-12-18 23:59:59+00'::timestamp with time zone) AND (duration >= '5'::double precision) AND (num_of_pages > 0))
                           Rows Removed by Filter: 52710

Several filter lines in the plan came from && alone. This meant the operation was not only expensive, but was also performed several times.

I tested this by isolating the condition:

SELECT 1
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data_30 as recording_data_30, 
    acc_{account_id}.sessions_30 as sessions_30 
WHERE 
	urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]

This query was slow. Since the JOINs are fast and the subqueries are fast, the only thing left was the && operator.

But this is a key operation. We always need to search the whole underlying urls table for the pattern, and we always need to find intersections. We cannot search the URL entries of a recording directly, because they are only IDs that refer to urls.

On the way to a solution

&& is slow because both sets are huge. The operation would be relatively fast if I replaced urls with { "http://google.com/", "http://wingify.com/" }.

I started looking for a way to do set intersection in Postgres without using &&, but without much success.

In the end, we decided to solve the problem in isolation: give me all the urls rows whose URL matches the pattern. Without any additional conditions, that would be:

SELECT urls.url
FROM 
	acc_{account_id}.urls as urls,
	(SELECT unnest(recording_data.urls) AS id) AS unrolled_urls
WHERE
	urls.id = unrolled_urls.id AND
	urls.url  ILIKE  '%jobs%'

Instead of JOIN syntax, I just used a subquery and unrolled the recording_data.urls array, so that the condition can be applied directly in WHERE.

The key point here is that && is used to check whether a given recording contains a matching URL. If you squint a little, you can see this operation walking through the elements of an array (or the rows of a table) and stopping as soon as the condition (a match) is met. Does that remind you of anything? Yes, EXISTS.

Since recording_data.urls can be referenced from outside the subquery's context, we can fall back on our old friend EXISTS and wrap the subquery in it.
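The same idea as a minimal Python sketch: `any()` over a generator stops at the first match, much like EXISTS, and for each recording it answers the same question as the overlap check. The data and helper names are hypothetical, and the set-based `overlaps` is only one way to write the && semantics, not a description of Postgres internals.

```python
from typing import Iterable, List, Set

matching_ids: Set[str] = {"1", "3"}  # assumed result of the pattern search


def overlaps(url_ids: Iterable[str]) -> bool:
    """&&-style check, written as a full set intersection."""
    return bool(matching_ids & set(url_ids))


def exists_match(url_ids: Iterable[str], seen: List[str]) -> bool:
    """EXISTS-style check: walk the ids and stop at the first match."""

    def walk():
        for url_id in url_ids:
            seen.append(url_id)  # record how far the scan went
            yield url_id in matching_ids

    return any(walk())


recording_urls = ["1", "2", "7", "9"]
scanned: List[str] = []
found = exists_match(recording_urls, scanned)
# `scanned` now holds only the ids visited before the first match.
```

Both helpers return the same answer for a given recording; the difference is only in how much work is done before that answer is known.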

Putting it all together, we get the final optimized query:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0
    AND EXISTS(
        SELECT urls.url
        FROM 
            acc_{account_id}.urls as urls,
            (SELECT unnest(urls) AS rec_url_id FROM acc_{account_id}.recording_data) 
            AS unrolled_urls
        WHERE
            urls.id = unrolled_urls.rec_url_id AND
            urls.url  ILIKE  '%enterprise_customer.com/jobs%'
    );

And the final execution time:

Time: 1898.717 ms

Time to celebrate?!?

Not so fast! First you have to check correctness. I was extremely suspicious of the EXISTS optimization, because it changes the logic so that it terminates earlier. We had to be sure we had not introduced a subtle error into the query.

A simple test was to run count(*) on both the slow and the fast query for a large number of different data sets. Then, for a small subset of the data, I manually verified that all the results were correct.
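The shape of such a check can be sketched in Python. This is not the test that was run against the database; it is a hypothetical stand-in that compares an overlap-based count with an early-exit count on randomly generated data, and all names in it are illustrative.

```python
import random


def slow_count(recordings, matching_ids):
    """Reference logic: count recordings whose ids overlap the matching set."""
    return sum(1 for ids in recordings if set(ids) & matching_ids)


def fast_count(recordings, matching_ids):
    """Rewritten logic: count recordings where some id matches (early exit)."""
    return sum(1 for ids in recordings if any(i in matching_ids for i in ids))


rng = random.Random(0)  # fixed seed so the check is repeatable
mismatches = 0
for _ in range(200):
    # Random "pattern search" result and random recordings, including empty ones.
    matching_ids = {rng.randrange(50) for _ in range(rng.randrange(10))}
    recordings = [
        [rng.randrange(50) for _ in range(rng.randrange(6))]
        for _ in range(rng.randrange(30))
    ]
    if slow_count(recordings, matching_ids) != fast_count(recordings, matching_ids):
        mismatches += 1
```

Matching counts do not prove the two queries are equivalent, which is why a manual check on a small subset is still worthwhile.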

All the checks gave consistently positive results. We fixed it!

Lessons learned

There are many lessons to take from this story:

  1. Query plans do not tell the whole story, but they can provide hints
  2. The prime suspects are not always the real culprits
  3. Slow queries can be broken down to isolate bottlenecks
  4. Not all optimizations are reductive in nature
  5. Using EXISTS, where possible, can lead to dramatic gains in performance

Conclusion

We went from a query time of ~24 minutes to 2 seconds, a very significant performance gain! Although this article turned out long, all the experiments happened in a single day, and by our estimate the optimization and testing took between 1.5 and 2 hours.
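As a quick sanity check of those figures, the two timings reported earlier can be converted and compared directly. The variable names are only for illustration.

```python
slow_ms = 1431924.650  # execution time of the original query
fast_ms = 1898.717     # execution time of the rewritten query

slow_minutes = slow_ms / 1000 / 60
fast_seconds = fast_ms / 1000
speedup = slow_ms / fast_ms

print(f"{slow_minutes:.1f} min -> {fast_seconds:.1f} s, about {speedup:.0f}x faster")
```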

SQL is a wonderful language if you are not afraid of it and make the effort to learn and use it. With a good understanding of how SQL queries are executed, how the database builds query plans, how indexes work, and simply how large the data you are dealing with is, you can become very good at optimizing queries. It is equally important, however, to keep trying different approaches and to break the problem down slowly, looking for the bottlenecks.

The best part of achieving results like this is the visible, noticeable speed improvement: a report that previously would not even load now loads almost instantly.

Special thanks to my teammates Aditya Mishra, Aditya Gauru and Varun Malhotra for brainstorming, and to Dinkar Pandir for finding an important bug in our final query before we finally said goodbye to it!

Source: www.hab.com
