The Story of One SQL Investigation

Last December I received an interesting bug report from the VWO support team. The loading time of one of the analytics reports for a large enterprise customer seemed prohibitive. Since this is my area of responsibility, I immediately focused on solving the problem.

Background

To make it clear what I am talking about, let me tell you a little about VWO. It is a platform from which you can run various targeted campaigns on your websites: run A/B experiments, track visitors and conversions, analyze the sales funnel, display heatmaps and play back visit recordings.

But the most important thing about the platform is reporting. All of the features above are interconnected. For enterprise customers, a huge amount of information would simply be useless without a powerful platform that presents it in a form suitable for analysis.

Using the platform, you can run arbitrary queries against a large data set. Here is a simple example:

Show all clicks on the page "abc.com" FROM <date d1> TO <date d2> for people who used Chrome OR (were located in Europe AND used an iPhone)

Note the Boolean operators. They are available to customers in the query interface so that they can build arbitrarily complex queries to obtain samples.

The slow query

The customer in question was trying to do something that intuitively should work quickly:

Show all session recordings for users who visited any page with a URL containing "/jobs"

This site had a lot of traffic, and we were storing more than a million unique URLs for it alone. The customer wanted to find a fairly simple URL pattern related to their business model.

Preliminary investigation

Let's look at what is going on in the database. Below is the original slow SQL query:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND sessions.referrer_id = recordings_urls.id 
    AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   ) 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0 ;

And here are the timings:

Planning time: 1.480 ms
Execution time: 1431924.650 ms

The query crawled through 150 thousand rows. The query planner showed a couple of interesting details, but no obvious bottlenecks.

Let's examine the query further. As you can see, it JOINs three tables:

  1. sessions: session information such as browser, user agent, country, and so on.
  2. recording_data: recorded URLs, pages, and the duration of visits.
  3. urls: to avoid duplicating extremely long URLs, we store them in a separate table.

Also note that all of our tables are already partitioned by account_id. This rules out a situation in which one particularly large account causes problems for the others.

Looking for clues

On closer inspection, we see that something is wrong with one particular part of the query. It is worth taking a closer look at this line:

urls && array(
	select id from acc_{account_id}.urls 
	where url  ILIKE  '%enterprise_customer.com/jobs%'
)::text[]

The first thought was that performance might suffer because of ILIKE over all those long URLs (we have more than 1.4 million unique URLs collected for this account).

But no, that is not the point!

SELECT id FROM urls WHERE url ILIKE '%enterprise_customer.com/jobs%';
  id
--------
 ...
(198661 rows)

Time: 5231.765 ms

The pattern search query itself takes only 5 seconds. Matching a pattern against a million unique URLs is clearly not the problem.

The next suspect on the list is the multiple JOINs. Perhaps their overuse caused the slowdown? JOINs are usually the most obvious candidates for performance problems, but I did not believe our case was typical.

analytics_db=# SELECT
    count(*)
FROM
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data_0 as recording_data,
    acc_{account_id}.sessions_0 as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND sessions.referrer_id = recordings_urls.id
    AND r_time > to_timestamp(1542585600)
    AND r_time < to_timestamp(1545177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
-------
  8086
(1 row)

Time: 147.851 ms

This was not our case either. The JOINs turned out to be quite fast.

Narrowing down the circle of suspects

I was ready to start changing the query to achieve any possible performance improvement. My team and I came up with 2 main ideas:

  • Use EXISTS for the URL subquery: We wanted to double-check whether there was any problem with the subquery for the URLs. One way to do this is simply to use EXISTS. EXISTS can greatly improve performance because it terminates as soon as it finds a single row that matches the condition.

SELECT
	count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data as recording_data,
    acc_{account_id}.sessions as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND  (  1 = 1  )
    AND sessions.referrer_id = recordings_urls.id
    AND  (exists(select id from acc_{account_id}.urls where url  ILIKE '%enterprise_customer.com/jobs%'))
    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
 32519
(1 row)
Time: 1636.637 ms

Well, yes. Wrapping the subquery in EXISTS makes everything very fast. The next logical question is why the query with the JOINs and the subquery itself are each fast on their own, but terribly slow together?

  • Move the subquery into a CTE: If the subquery is fast on its own, we can simply compute its result first and then pass it to the main query.

WITH matching_urls AS (
    select id::text from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%'
)

SELECT 
    count(*) FROM acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions,
    matching_urls
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id
    AND (urls && array(SELECT id from matching_urls)::text[])
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545107599)
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0;

But it was still very slow.

Finding the culprit

All this time, one small thing kept flashing before my eyes, and I kept brushing it aside. But since nothing else was left, I decided to look at it too. I am talking about the && operator. While EXISTS merely improved performance, && was the only remaining factor common to all versions of the slow query.

Looking at the documentation, we see that && is used when you need to find common elements between two arrays.
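To make the meaning of the operator concrete, here is a minimal Python sketch of what an overlap test between two arrays returns. It models only the result of `&&` (true when at least one element is shared), not how PostgreSQL computes it internally; the function name `arrays_overlap` is made up for illustration.

```python
from typing import Hashable, Iterable


def arrays_overlap(left: Iterable[Hashable], right: Iterable[Hashable]) -> bool:
    """Return True if the two collections share at least one element.

    This mirrors the result of an array-overlap test, ignoring NULL handling.
    """
    right_set = set(right)
    return any(item in right_set for item in left)


# URL ids are kept as text here, as in the text[] cast used by the query.
print(arrays_overlap(["1", "2", "3"], ["3", "4"]))  # True
print(arrays_overlap(["1", "2"], ["3", "4"]))       # False
print(arrays_overlap([], ["3", "4"]))               # False
```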

In the original query this is:

AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   )

This means that we run a pattern search over our URLs and then find the intersection with all the URLs in the recordings. It is a bit confusing, because "urls" here does not refer to the table containing all the URLs, but to the "urls" column of the recording_data table.

With growing suspicions about &&, I tried to find confirmation for them in the query plan generated by EXPLAIN ANALYZE (I already had a saved plan, but I am usually more comfortable experimenting in SQL than trying to make sense of opaque query plans).

Filter: ((urls && ($0)::text[]) AND (r_time > '2018-12-17 12:17:23+00'::timestamp with time zone) AND (r_time < '2018-12-18 23:59:59+00'::timestamp with time zone) AND (duration >= '5'::double precision) AND (num_of_pages > 0))
                           Rows Removed by Filter: 52710

There were several filter lines that came from && alone. This meant that the operation was not only expensive, but was also performed several times.

I tested this by isolating the condition:

SELECT 1
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data_30 as recording_data_30, 
    acc_{account_id}.sessions_30 as sessions_30 
WHERE 
	urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]

This query was slow. Since the JOINs are fast and the subqueries are fast, the only thing left was the && operator.

But this is a key operation. We always need to search the entire underlying table of URLs to match a pattern, and we always need to find intersections. We cannot search the URL entries of a recording directly, because they are just IDs that refer to urls.

On the way to a solution

&& is slow because both sets are huge. The operation would be relatively fast if I replaced urls with { "http://google.com/", "http://wingify.com/" }.
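One way to picture why the size of both sides matters is to count comparisons in a naive pairwise overlap check. The sketch below is only a cost model under that assumption; it is not a description of PostgreSQL internals, and the recording with 10 URL ids and the id values are hypothetical. Only the figure 198661 comes from the pattern search shown earlier.

```python
from typing import List, Sequence, Tuple


def count_pairwise_comparisons(left: Sequence[str], right: Sequence[str]) -> Tuple[bool, int]:
    """Naive nested-loop overlap check that also reports how many comparisons it made."""
    comparisons = 0
    for a in left:
        for b in right:
            comparisons += 1
            if a == b:
                return True, comparisons
    return False, comparisons


# Hypothetical recording that references 10 URL ids.
recording_urls: List[str] = [str(i) for i in range(10)]

# A tiny right-hand side, as in the two-element example above.
small_rhs = ["http://google.com/", "http://wingify.com/"]

# A right-hand side as large as the 198661 ids returned by the pattern search
# (the id values themselves are invented and do not overlap the recording).
large_rhs = [str(i) for i in range(1000, 1000 + 198661)]

print(count_pairwise_comparisons(recording_urls, small_rhs))
print(count_pairwise_comparisons(recording_urls, large_rhs))
```

With no shared element, the work in this model is the product of the two sizes, and it is repeated for every recording row that passes through the filter.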

I started looking for a way to do set intersection in Postgres without using &&, but without much success.

In the end, we decided to solve the problem in isolation: give me every urls row whose URL matches the pattern. Without additional conditions, it would be:

SELECT urls.url
FROM 
	acc_{account_id}.urls as urls,
	(SELECT unnest(recording_data.urls) AS id) AS unrolled_urls
WHERE
	urls.id = unrolled_urls.id AND
	urls.url  ILIKE  '%jobs%'

Instead of JOIN syntax, I simply used a subquery and expanded the recording_data.urls array so that the condition could be applied directly in WHERE.

The most important thing here is that && is used to check whether a given recording contains a matching URL. If you squint a little, you can see that this operation steps through the elements of an array (or the rows of a table) and stops when a condition (a match) is met. Does that remind you of anything? Yes, EXISTS.

Since recording_data.urls can be referenced from outside the subquery context, we can fall back on our old friend EXISTS and wrap the subquery in it.
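The idea of stepping through one recording's URL ids and stopping at the first match can be sketched in Python as follows. This is an illustrative model, assuming the check runs once per recording row; the function name and the sample data are invented, and a case-insensitive substring test stands in for ILIKE '%...%' (it ignores LIKE wildcards such as `_`).

```python
from typing import Dict, Iterable


def recording_has_matching_url(
    recording_url_ids: Iterable[str],
    url_table: Dict[str, str],
    needle: str,
) -> bool:
    """Return True as soon as one URL id of the recording resolves to a matching URL."""
    needle = needle.lower()
    for url_id in recording_url_ids:                    # like unnest(recording_data.urls)
        url = url_table.get(url_id)                     # like urls.id = unrolled_urls.id
        if url is not None and needle in url.lower():   # rough stand-in for ILIKE
            return True                                 # stop at the first match, like EXISTS
    return False


# Hypothetical contents of the urls table: id -> url.
url_table = {
    "1": "enterprise_customer.com/jobs/42",
    "2": "enterprise_customer.com/about",
    "3": "enterprise_customer.com/Jobs/7",
}

print(recording_has_matching_url(["2", "3"], url_table, "/jobs"))  # True
print(recording_has_matching_url(["2"], url_table, "/jobs"))       # False
print(recording_has_matching_url([], url_table, "/jobs"))          # False
```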

Putting everything together, we get the final optimized query:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0
    AND EXISTS(
        SELECT urls.url
        FROM 
            acc_{account_id}.urls as urls,
            (SELECT unnest(urls) AS rec_url_id FROM acc_{account_id}.recording_data) 
            AS unrolled_urls
        WHERE
            urls.id = unrolled_urls.rec_url_id AND
            urls.url  ILIKE  '%enterprise_customer.com/jobs%'
    );

And the final execution time:

Time: 1898.717 ms

Time to celebrate?!?

Not so fast! First we need to check correctness. I was extremely suspicious of the EXISTS optimization, since it changes the logic so that it terminates early. We needed to be sure that we had not introduced a non-obvious error into the query.

A simple test was to run count(*) on both the slow and the fast queries for a large number of different data sets. Then, for a small subset of the data, I manually verified that all the results were correct.
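The same style of check can be sketched outside the database: implement the "overlap" logic and the "early exit" logic separately, then compare their counts over many generated data sets. Everything below (function names, data sizes, the 10% share of "/jobs" URLs) is an assumption made for illustration, not the procedure we actually ran.

```python
import random
from typing import Dict, List, Tuple


def count_with_overlap(recordings: List[List[int]], url_table: Dict[int, str], needle: str) -> int:
    """Slow-query logic: collect all matching ids first, then test each recording for overlap."""
    matching_ids = {url_id for url_id, url in url_table.items() if needle in url.lower()}
    return sum(1 for rec in recordings if any(url_id in matching_ids for url_id in rec))


def count_with_exists(recordings: List[List[int]], url_table: Dict[int, str], needle: str) -> int:
    """Fast-query logic: per recording, stop at the first URL id that matches."""
    def has_match(rec: List[int]) -> bool:
        for url_id in rec:
            url = url_table.get(url_id)
            if url is not None and needle in url.lower():
                return True
        return False

    return sum(1 for rec in recordings if has_match(rec))


def make_dataset(rng: random.Random, n_urls: int = 200, n_recordings: int = 500) -> Tuple[List[List[int]], Dict[int, str]]:
    """Generate a hypothetical urls table and recordings that reference it by id."""
    url_table = {}
    for url_id in range(n_urls):
        section = "jobs" if rng.random() < 0.1 else "page"
        url_table[url_id] = "example.com/" + section + "/" + str(url_id)
    recordings = [
        [rng.randrange(n_urls) for _ in range(rng.randint(0, 8))]
        for _ in range(n_recordings)
    ]
    return recordings, url_table


def check_equivalence(trials: int = 20, seed: int = 0) -> bool:
    """Return True if both counting strategies agree on every generated data set."""
    rng = random.Random(seed)
    for _ in range(trials):
        recordings, url_table = make_dataset(rng)
        slow = count_with_overlap(recordings, url_table, "/jobs")
        fast = count_with_exists(recordings, url_table, "/jobs")
        if slow != fast:
            return False
    return True


print(check_equivalence())
```

A check like this only shows agreement on the cases it generates, which is why a manual review of a small subset is still worthwhile.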

All the checks consistently came back positive. We fixed it!

Lessons Learned

There are many lessons to be learned from this story:

  1. Query plans do not tell the whole story, but they can give clues
  2. The main suspects are not always the real culprits
  3. Slow queries can be broken down to isolate bottlenecks
  4. Not all optimizations are reductive in nature
  5. Using EXISTS, where possible, can lead to dramatic performance gains

Conclusion

We went from a query time of ~24 minutes to 2 seconds: a very significant performance gain! Although this article came out long, all the experiments happened in a single day, and we estimate that optimization and testing took between 1.5 and 2 hours.
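As a quick sanity check on these figures, the two timings reported in this article can be converted directly. The snippet is plain arithmetic on the numbers quoted above; the helper names are made up for illustration.

```python
SLOW_MS = 1431924.650  # execution time of the original query, as reported above
FAST_MS = 1898.717     # execution time of the optimized query, as reported above


def ms_to_minutes(ms: float) -> float:
    """Convert milliseconds to minutes."""
    return ms / 1000.0 / 60.0


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times shorter the second timing is than the first."""
    return before_ms / after_ms


print(f"before:  {ms_to_minutes(SLOW_MS):.1f} min")
print(f"after:   {FAST_MS / 1000.0:.1f} s")
print(f"speedup: about {speedup(SLOW_MS, FAST_MS):.0f}x")
```

That works out to roughly 23.9 minutes before, 1.9 seconds after, and a speedup of about 754 times.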

SQL is a wonderful language if you are not afraid of it and instead try to learn and use it. With a good understanding of how SQL queries are executed, how the database generates query plans, how indexes work, and simply how large the data you are dealing with is, you can be very successful at optimizing queries. It is equally important, however, to keep trying different approaches and to break the problem down step by step until you find the bottlenecks.

The best part about achieving results like these is the noticeable, visible speed improvement: a report that previously would not even load now loads almost instantly.

Special thanks to my teammates Aditya Mishra, Aditya Gauru and Varun Malhotra for brainstorming, and to Dinkar Pandir for finding an important bug in our final query before we finally said goodbye to it!

Source: www.habr.com
