Iyo nyaya yeimwe SQL yekuferefeta

Zvita apfuura ndakagamuchira inonakidza bug report kubva kuVWO rutsigiro timu. Nguva yekurodha yeimwe yeanalytics mishumo yemutengi mukuru wekambani yairatidzika kunge isingaite. Uye sezvo iyi iri nharaunda yangu yebasa, ndakabva ndatarisa pakugadzirisa dambudziko.

prehistory

Kuti zvijeke zvandiri kutaura nezvazvo, ini ndichakuudza zvishoma nezveVWO. Iyi ipuratifomu yaunogona kutanga nayo akasiyana akatariswa mishandirapamwe pawebhusaiti yako: ita zviedzo zveA/B, tarisa vashanyi uye shanduko, ongorora furani yekutengesa, ratidza mamepu ekupisa uye kutamba marekodhi ekushanya.

Asi chinhu chinonyanya kukosha pamusoro pepuratifomu ndechekushuma. Zvose zviri pamusoro apa zvakabatana. Uye kune vatengi vemakambani, huwandu hukuru hweruzivo hungangove husina basa pasina chikuva chine simba chinouratidza mune analytics fomu.

Uchishandisa chikuva, iwe unogona kuita zvisina tsarukano mubvunzo pane yakakura data set. Heino muenzaniso wakapfava:

Ratidza kudzvanya kwese papeji "abc.com" KUBVA <date d1> KUSVIKIRA <date d2> kune vanhu vaishandisa Chrome OR (iri kuEurope UYE vaishandisa iPhone)

Teerera kune vashandisi veBoolean. Iwo anowanikwa kune vatengi mune yemubvunzo interface kuti vabvunze zvisina tsarukano yakaoma kuti vawane masampuli.

Chikumbiro chinononoka

Mutengi ari mubvunzo aiedza kuita chimwe chinhu icho intuitively chinofanira kushanda nekukurumidza:

Ratidza ese marekodhi echikamu chevashandisi vakashanyira chero peji ine URL ine "/mabasa"

Saiti iyi yaive netraffic yakawanda uye isu taichengeta anopfuura miriyoni akasiyana ma URLs chete. Uye ivo vaida kutsvaga yakapusa URL template ine chekuita nebhizinesi ravo modhi.

Kuongorora kwekutanga

Ngatitarisei zviri kuitika mudhatabhesi. Pazasi pane yekutanga inononoka SQL mubvunzo:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND sessions.referrer_id = recordings_urls.id 
    AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   ) 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0 ;

Uye pano ndiyo nguva:

Yakarongwa nguva: 1.480 ms Nguva yekuuraya: 1431924.650 ms

Mubvunzo wakakambaira zviuru zana nemakumi mashanu emitsara. Murongi wemubvunzo airatidza akati wandei anonakidza ruzivo, asi pasina mabhodhoro ari pachena.

Ngatidzidzei chikumbiro mberi. Sezvaunogona kuona, anodaro JOIN matafura matatu:

  1. muzvirongwa: kuratidza ruzivo rwesesheni: browser, mushandisi mumiriri, nyika, zvichingodaro.
  2. recording_data: ma URL akanyorwa, mapeji, nguva yekushanya
  3. urls: Kudzivirira kudzokorora maURL akakura zvakanyanya, tinoachengeta mune imwe tafura.

Ziva zvakare kuti matafura edu ese akatogovaniswa ne account_id. Nenzira iyi, mamiriro ezvinhu apo imwe yakakurisa account inokonzera matambudziko kune vamwe inobviswa.

Kutsvaga zviratidzo

Pakunyatsoongorora, tinoona kuti chimwe chinhu chakaipa nechikumbiro chakati. Zvakakodzera kunyatsotarisisa mutsetse uyu:

urls && array(
	select id from acc_{account_id}.urls 
	where url  ILIKE  '%enterprise_customer.com/jobs%'
)::text[]

Pfungwa yekutanga yaive yekuti pamwe nekuti ILIKE pane ese maURL marefu aya (tine anopfuura miriyoni 1,4 akasiyana MaURL akaunganidzwa eakaunti ino) kuita kunganetsa.

Asi kwete, iyo haisi iyo pfungwa!

SELECT id FROM urls WHERE url ILIKE '%enterprise_customer.com/jobs%';
  id
--------
 ...
(198661 rows)

Time: 5231.765 ms

Iyo template yekutsvaga chikumbiro pachayo inotora chete 5 masekonzi. Kutsvaga pateni mune miriyoni akasiyana maURL zviri pachena kuti harisi dambudziko.

Anotevera anofungirwa pane rondedzero akati wandei JOIN. Zvichida kushandiswa kwavo zvakanyanya kwakakonzera kuderera? Kazhinji JOIN's ndivo vanonyanya kukwikwidza kumatambudziko ekuita, asi ini handina kutenda kuti nyaya yedu yaive yakafanana.

analytics_db=# SELECT
    count(*)
FROM
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data_0 as recording_data,
    acc_{account_id}.sessions_0 as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND sessions.referrer_id = recordings_urls.id
    AND r_time > to_timestamp(1542585600)
    AND r_time < to_timestamp(1545177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
-------
  8086
(1 row)

Time: 147.851 ms

Uye iyi yakanga isiriwo nyaya yedu. JOIN's yakange ichimhanya chaizvo.

Kudzikisa denderedzwa revanofungira

Ndakanga ndakagadzirira kutanga kushandura mubvunzo kuti ndiwane chero zvingaita kuvandudzwa kwekuita. Chikwata changu neni takagadzira mazano makuru maviri:

  • Shandisa EXISTS kubvunza URL: Taida kutarisa zvakare kana paine matambudziko neiyo subquery yemaURL. Imwe nzira yekuita izvi ndeyekungoshandisa EXISTS. EXISTS may inonatsiridza zvakanyanya kuita sezvo inopera nekukurumidza kana ichinge yawana chete tambo inofanana nemamiriro acho.

SELECT
	count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data as recording_data,
    acc_{account_id}.sessions as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND  (  1 = 1  )
    AND sessions.referrer_id = recordings_urls.id
    AND  (exists(select id from acc_{account_id}.urls where url  ILIKE '%enterprise_customer.com/jobs%'))
    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
 32519
(1 row)
Time: 1636.637 ms

Zvakanaka, hongu. Subquery kana yakaputirwa mukati EXISTS, inoita kuti zvinhu zvese zvive nekukurumidza. Mubvunzo unotevera une musoro ndewekuti sei chikumbiro ne JOIN-ami uye iyo subquery pachayo inokurumidza mumwe nemumwe, asi inononoka pamwe chete?

  • Kufambisa iyo subquery kuCTE : Kana mubvunzo wacho uchikasira wega, tinogona kungoverenga mhedzisiro yekutanga tozopa kumubvunzo mukuru.

WITH matching_urls AS (
    select id::text from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%'
)

SELECT 
    count(*) FROM acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions,
    matching_urls
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id
    AND (urls && array(SELECT id from matching_urls)::text[])
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545107599)
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0;

Asi zvakanga zvichiri zvishoma.

Kutsvaga ane mhosva

Nguva yese iyi, kanhu kadiki kaipenya pamberi pemaziso angu, chandaigara ndakachibvisa. Asi sezvo pakanga pasisina chimwe chinhu, ndakasarudza kumutarisawo. Ndiri kutaura nezvazvo && opareta. Chisarai EXISTS kungovandudza kuita && ndiyo chete yakasara yakajairika pane ese mavhezheni emubvunzo unononoka.

Kutarisa zvinyorwa, tinozviona && rinoshandiswa kana iwe uchida kutsvaga zvakajairika zvinhu pakati pezvirongwa zviviri.

Muchikumbiro chepakutanga ndeichi:

AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   )

Zvinoreva kuti tinoita tsvakiridzo yemapateni pamaURL edu, tobva tawana mharadzano ine maURL ese ane zvakajairika. Izvi zvinoti vhiringa nekuti "urls" pano hairevi tafura ine ma URL ese, asi kune "urls" column iri patafura. recording_data.

Nekuwedzera kufungirana maererano &&, ndakaedza kuwana simbiso kwavari muchirongwa chemubvunzo chakagadzirwa EXPLAIN ANALYZE (Ndanga ndatove nechirongwa chakachengetedzwa, asi ini ndinowanzo kusununguka kuyedza muSQL pane kuedza kunzwisisa kusajeka kwevarongi vemibvunzo).

Filter: ((urls && ($0)::text[]) AND (r_time > '2018-12-17 12:17:23+00'::timestamp with time zone) AND (r_time < '2018-12-18 23:59:59+00'::timestamp with time zone) AND (duration >= '5'::double precision) AND (num_of_pages > 0))
                           Rows Removed by Filter: 52710

Paive nemitsara yakawanda yemasefa chete kubva &&. Izvo zvaireva kuti kuvhiya uku kwaisangodhura chete, asi kwakaitwawo kakawanda.

Ndakaedza izvi nekuzvitsaura mamiriro acho

SELECT 1
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data_30 as recording_data_30, 
    acc_{account_id}.sessions_30 as sessions_30 
WHERE 
	urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]

Mubvunzo uyu wakanonoka. Nokuti iyo JOIN-s inokurumidza uye subqueries inokurumidza, chinhu chega chakasara chaive && opareta.

Uku kungoshanda kwakakosha. Isu tinogara tichida kutsvaga tafura yese yepasi pema URL kuti titsvage patani, uye tinogara tichida kutsvaga mharadzano. Hatigone kutsvaga nemarekodhi eURL zvakananga, nekuti aya angori maID arikureva urls.

Panzira yekugadziriswa

&& inononoka nekuti ese ese akakura. Oparesheni yacho inokurumidza kana ndikatsiva urls pamusoro { "http://google.com/", "http://wingify.com/" }.

Ndakatanga kutsvaga nzira yekuita set intersection muPostgres ndisingashandise &&, asi pasina kubudirira kukuru.

Pakupedzisira, takasarudza kungogadzirisa dambudziko tiri toga: ndipe zvese urls mitsara iyo URL inofananidzira patani. Pasina mamwe mamiriro zvichave - 

SELECT urls.url
FROM 
	acc_{account_id}.urls as urls,
	(SELECT unnest(recording_data.urls) AS id) AS unrolled_urls
WHERE
	urls.id = unrolled_urls.id AND
	urls.url  ILIKE  '%jobs%'

Panzvimbo iyoyo JOIN syntax Ini ndichangoshandisa subquery uye yakawedzera recording_data.urls array kuitira kuti iwe ugone kushandisa zvakananga mamiriro mukati WHERE.

Chinonyanya kukosha apa ndechekuti && inoshandiswa kutarisa kuti chakapihwa chine URL inoenderana here. Kana iwe ukatsveta zvishoma, unoona oparesheni iyi ichifamba nepakati pezvinhu zvehurongwa (kana mitsara yetafura) uye inomira kana mamiriro (mechi) asangana. Hapana chinokuyeuchidza here? Ehe, EXISTS.

Kubva zvichienda mberi recording_data.urls inogona kutaurwa kubva kunze kweiyo subquery mamiriro, kana izvi zvikaitika tinogona kuwira kumashure kushamwari yedu yekare EXISTS uye putira iyo subquery nayo.

Kuisa zvese pamwechete, tinowana yekupedzisira yakagadziridzwa mubvunzo:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0
    AND EXISTS(
        SELECT urls.url
        FROM 
            acc_{account_id}.urls as urls,
            (SELECT unnest(urls) AS rec_url_id FROM acc_{account_id}.recording_data) 
            AS unrolled_urls
        WHERE
            urls.id = unrolled_urls.rec_url_id AND
            urls.url  ILIKE  '%enterprise_customer.com/jobs%'
    );

Uye nguva yekupedzisira yekutungamira Time: 1898.717 ms Inguva yekupemberera?!?

Kwete nekukurumidza! Kutanga iwe unoda kutarisa kururama. Ndainyumwa zvakanyanya EXISTS optimization sezvainoshandura pfungwa yekumisa kare. Tinofanira kuva nechokwadi chekuti hatina kuwedzera kukanganisa kusiri pachena kuchikumbiro.

Muedzo wakapfava waiva wokumhanya count(*) pamibvunzo yese inononoka uye inokurumidza yenhamba huru yemaseti edata akasiyana. Zvadaro, kune diki diki re data, ini ndakasimbisa nemaoko kuti zvese zvabuda zvaive zvechokwadi.

Maedzo ese airamba achipa mhinduro dzakanaka. Takagadzirisa zvese!

Zvidzidzo Zvakadzidzwa

Pane zvakawanda zvidzidzo zvekudzidza kubva munyaya iyi:

  1. Zvirongwa zvemubvunzo hazvitauri nyaya yese, asi vanogona kupa zviratidzo
  2. Vafungidziri vakuru havasirivo vapari vechokwadi nguva dzose
  3. Mibvunzo inononoka inogona kudimburirwa pasi kuti iparadzanise mabhodhoro
  4. Haasi ese ma optimizations anoderedza hunhu
  5. Shandisa EXIST, pazvinobvira, zvinogona kutungamirira kukuwedzera kunoshamisa mukubudirira

mhedziso

Takabva panguva yekubvunza ye ~ maminitsi makumi maviri nemana kusvika kumasekonzi maviri - kuwedzera kwakakosha kwekuita! Kunyangwe chinyorwa ichi chakabuda chikuru, zviyedzo zvese zvatakaita zvakaitika muzuva rimwe, uye zvaifungidzirwa kuti zvakatora pakati pe24 nemaawa maviri ekugadzirisa uye kuyedzwa.

SQL mutauro unoshamisa kana usingautye, asi edza kudzidza nekuishandisa. Nekuva nekunzwisisa kwakanaka kwemaitirwo emibvunzo yeSQL, kuti dhatabhesi rinogadzira sei hurongwa hwemibvunzo, mashandiro anoita indexes, uye nehukuru hwe data rauri kubata naro, unogona kubudirira zvakanyanya pakugadzirisa mibvunzo. Izvo zvakakosha zvakaenzana, zvisinei, kuramba uchiedza nzira dzakasiyana uye zvishoma nezvishoma kuputsa dambudziko, kutsvaga mabhodhoro.

Chikamu chakanakisa pamusoro pekuwana mhedzisiro senge iyi ndiyo inooneka, inooneka kukurumidza kuvandudza - uko chirevo chaisatombo takura zvino chinotakura kanenge ipapo.

Kutenda kwakakosha kuna shamwari dzangu pakuraira kwaAditya MishraAditya Gauru ΠΈ Varun Malhotra zvekuongorora pfungwa uye Dinkar Pandir nekutsvaga kukanganisa kwakakosha muchikumbiro chedu chekupedzisira tisati tazoonekana nazvo!

Source: www.habr.com

Voeg