Awọn itan ti ọkan SQL iwadi

Oṣu Kejila to kọja Mo gba ijabọ kokoro ti o nifẹ lati ọdọ ẹgbẹ atilẹyin VWO. Akoko ikojọpọ fun ọkan ninu awọn ijabọ atupale fun alabara ile-iṣẹ nla kan dabi ẹni pe o jẹ idiwọ. Ati pe nitori eyi ni agbegbe ti ojuse mi, Mo dojukọ lẹsẹkẹsẹ lori ipinnu iṣoro naa.

prehistory

Lati jẹ ki o ye ohun ti Mo n sọrọ nipa, Emi yoo so fun o kekere kan nipa VWO. Eyi jẹ pẹpẹ pẹlu eyiti o le ṣe ifilọlẹ ọpọlọpọ awọn ipolongo ifọkansi lori awọn oju opo wẹẹbu rẹ: ṣe awọn adanwo A/B, tọpa awọn alejo ati awọn iyipada, ṣe itupalẹ awọn eefin tita, ṣafihan awọn maapu ooru ati mu awọn gbigbasilẹ ibẹwo ṣiṣẹ.

Ṣugbọn ohun pataki julọ nipa pẹpẹ jẹ ijabọ. Gbogbo awọn iṣẹ ti o wa loke wa ni asopọ. Ati fun awọn alabara ile-iṣẹ, iye nla ti alaye yoo jẹ asan lasan laisi pẹpẹ ti o lagbara ti o ṣafihan ni fọọmu itupalẹ.

Lilo Syeed, o le ṣe ibeere laileto lori ṣeto data nla kan. Eyi ni apẹẹrẹ ti o rọrun:

Ṣe afihan gbogbo awọn titẹ ni oju-iwe "abc.com" LATI <ọjọ d1> TO <ọjọ d2> fun awọn eniyan ti o lo Chrome OR (ti o wa ni Yuroopu ATI lo iPhone kan)

San ifojusi si awọn oniṣẹ Boolean. Wọn wa fun awọn alabara ni wiwo ibeere lati ṣe awọn ibeere idiju lainidii lati gba awọn ayẹwo.

Ibere ​​​​lọra

Onibara ti o wa ni ibeere n gbiyanju lati ṣe nkan ti o yẹ ki o ṣiṣẹ ni oye ni iyara:

Ṣe afihan gbogbo awọn igbasilẹ igba fun awọn olumulo ti o ṣabẹwo si oju-iwe eyikeyi pẹlu URL ti o ni "/ awọn iṣẹ" ninu

Aaye yii ni pupọ ti ijabọ ati pe a n tọju awọn URL alailẹgbẹ miliọnu kan fun rẹ nikan. Ati pe wọn fẹ lati wa awoṣe URL ti o rọrun ti o ni ibatan si awoṣe iṣowo wọn.

Iwadi alakoko

Jẹ ki a wo ohun ti n ṣẹlẹ ninu ibi ipamọ data. Ni isalẹ ni ibeere SQL atilẹba ti o lọra:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND sessions.referrer_id = recordings_urls.id 
    AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   ) 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0 ;

Ati pe eyi ni awọn akoko:

Akoko ti a gbero: 1.480 ms Akoko ipaniyan: 1431924.650 ms

Ibeere naa ra 150 ẹgbẹrun awọn ori ila. Alakoso ibeere ṣe afihan tọkọtaya awọn alaye ti o nifẹ, ṣugbọn ko si awọn igo ti o han gbangba.

Jẹ ki a ṣe iwadi siwaju sii. Bi o ti le ri, o ṣe JOIN tabili mẹta:

  1. akoko: lati ṣafihan alaye igba: ẹrọ aṣawakiri, aṣoju olumulo, orilẹ-ede, ati bẹbẹ lọ.
  2. gbigbasilẹ_data: URL ti o gbasilẹ, awọn oju-iwe, iye akoko awọn abẹwo
  3. awọn url: Lati yago fun pidánpidán awọn URL ti o tobi pupọ, a tọju wọn sinu tabili lọtọ.

Tun ṣe akiyesi pe gbogbo awọn tabili wa ti pin tẹlẹ nipasẹ account_id. Nitorinaa, ipo kan nibiti akọọlẹ nla kan paapaa fa awọn iṣoro fun awọn miiran ni a yọkuro.

Nwa fun awọn amọran

Ni ayewo ti o sunmọ, a rii pe nkan kan jẹ aṣiṣe pẹlu ibeere kan pato. O tọ lati wo laini yii ni pẹkipẹki:

urls && array(
	select id from acc_{account_id}.urls 
	where url  ILIKE  '%enterprise_customer.com/jobs%'
)::text[]

Ni igba akọkọ ti ero ni wipe boya nitori ILIKE lori gbogbo awọn URL gigun wọnyi (a ni ju 1,4 milionu alailẹgbẹ Awọn URL ti a gba fun akọọlẹ yii) iṣẹ ṣiṣe le jiya.

Ṣugbọn rara, iyẹn kii ṣe aaye naa!

SELECT id FROM urls WHERE url ILIKE '%enterprise_customer.com/jobs%';
  id
--------
 ...
(198661 rows)

Time: 5231.765 ms

Ibeere wiwa awoṣe funrararẹ gba iṣẹju-aaya 5 nikan. Wiwa apẹrẹ kan ni awọn URL alailẹgbẹ miliọnu kan jẹ kedere kii ṣe iṣoro kan.

Nigbamii ti ifura lori awọn akojọ ni orisirisi awọn JOIN. Boya ilokulo wọn ti fa idinku? Nigbagbogbo JOIN's jẹ awọn oludije ti o han julọ fun awọn iṣoro iṣẹ, ṣugbọn Emi ko gbagbọ pe ọran wa jẹ aṣoju.

analytics_db=# SELECT
    count(*)
FROM
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data_0 as recording_data,
    acc_{account_id}.sessions_0 as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND sessions.referrer_id = recordings_urls.id
    AND r_time > to_timestamp(1542585600)
    AND r_time < to_timestamp(1545177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
-------
  8086
(1 row)

Time: 147.851 ms

Ati pe eyi tun kii ṣe ọran wa. JOIN's wa ni jade lati wa ni oyimbo sare.

Dinku Circle ti awọn ifura

Mo ti ṣetan lati bẹrẹ iyipada ibeere lati ṣaṣeyọri eyikeyi awọn ilọsiwaju iṣẹ ṣiṣe. Emi ati ẹgbẹ mi ni idagbasoke awọn imọran akọkọ meji:

  • Lo EXISTS fun URL abẹlẹ: A fẹ lati ṣayẹwo lẹẹkansi ti awọn iṣoro eyikeyi ba wa pẹlu abẹlẹ fun awọn URL. Ọna kan lati ṣaṣeyọri eyi ni lati lo nirọrun EXISTS. EXISTS le mu ilọsiwaju pọ si niwọn igba ti o pari lẹsẹkẹsẹ ni kete ti o rii okun kan ṣoṣo ti o baamu ipo naa.

SELECT
	count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data as recording_data,
    acc_{account_id}.sessions as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND  (  1 = 1  )
    AND sessions.referrer_id = recordings_urls.id
    AND  (exists(select id from acc_{account_id}.urls where url  ILIKE '%enterprise_customer.com/jobs%'))
    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
 32519
(1 row)
Time: 1636.637 ms

O dara, bẹẹni. Ibere ​​nigba ti a we sinu EXISTS, mu ki ohun gbogbo ni iyara pupọ. Nigbamii ti mogbonwa ibeere ni idi ti awọn ìbéèrè pẹlu JOIN-ami ati awọn subquery ara wa ni sare leyo, sugbon ni o wa lasan o lọra jọ?

  • Gbigbe subquery si CTE : Ti ibeere naa ba yara lori tirẹ, a le jiroro ni iṣiro abajade iyara ni akọkọ ati lẹhinna pese si ibeere akọkọ

WITH matching_urls AS (
    select id::text from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%'
)

SELECT 
    count(*) FROM acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions,
    matching_urls
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id
    AND (urls && array(SELECT id from matching_urls)::text[])
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545107599)
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0;

Sugbon o tun jẹ o lọra pupọ.

Wiwa ẹlẹṣẹ

Ní gbogbo àkókò yìí, ohun kékeré kan tàn níwájú ojú mi, èyí tí mo máa ń fọ́ sí ẹ̀gbẹ́ rẹ̀. Ṣùgbọ́n níwọ̀n bí kò ti sí ohun mìíràn tí ó ṣẹ́ kù, mo pinnu láti wò ó pẹ̀lú. Mo n sọrọ nipa && onišẹ. Kabiyesi EXISTS o kan dara si išẹ && jẹ ifosiwewe wọpọ nikan ti o ku kọja gbogbo awọn ẹya ti ibeere ti o lọra.

Ti nwo iwe aṣẹ, a ri pe && lo nigbati o nilo lati wa awọn eroja ti o wọpọ laarin awọn ọna meji.

Ninu ibeere atilẹba eyi ni:

AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   )

Eyi ti o tumọ si pe a ṣe wiwa apẹrẹ lori Awọn URL wa, lẹhinna wa ikorita pẹlu gbogbo awọn URL pẹlu awọn ifiweranṣẹ ti o wọpọ. Eyi jẹ iruju diẹ nitori “urls” nibi ko tọka si tabili ti o ni gbogbo awọn URL ninu, ṣugbọn si iwe “urls” ninu tabili recording_data.

Pẹlu dagba awọn ifura nipa &&, Mo gbiyanju lati wa ifẹsẹmulẹ fun wọn ninu ero ibeere ti ipilẹṣẹ EXPLAIN ANALYZE (Mo ti ni eto ti o ti fipamọ tẹlẹ, ṣugbọn Mo ni itunu diẹ sii ni idanwo ni SQL ju igbiyanju lati ni oye opacity ti awọn oluṣeto ibeere).

Filter: ((urls && ($0)::text[]) AND (r_time > '2018-12-17 12:17:23+00'::timestamp with time zone) AND (r_time < '2018-12-18 23:59:59+00'::timestamp with time zone) AND (duration >= '5'::double precision) AND (num_of_pages > 0))
                           Rows Removed by Filter: 52710

Nibẹ wà orisirisi awọn ila ti Ajọ nikan lati &&. Eyi ti o tumọ si pe iṣiṣẹ yii kii ṣe gbowolori nikan, ṣugbọn tun ṣe ni igba pupọ.

Mo ṣe idanwo eyi nipa yiya sọtọ ipo naa

SELECT 1
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data_30 as recording_data_30, 
    acc_{account_id}.sessions_30 as sessions_30 
WHERE 
	urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]

Ibeere yi lọra. Nitori awọn JOIN-s ni o wa sare ati subqueries ni sare, awọn nikan ni ohun ti o kù wà && onišẹ.

Eyi jẹ iṣẹ bọtini kan. Nigbagbogbo a nilo lati wa gbogbo tabili ti o wa labẹ awọn URL lati wa apẹrẹ kan, ati pe a nilo nigbagbogbo lati wa awọn ikorita. A ko le wa nipasẹ awọn igbasilẹ URL taara, nitori iwọnyi jẹ awọn ID ti o tọka si urls.

Lori ọna lati lọ si ojutu kan

&& o lọra nitori awọn mejeeji tosaaju ni o wa tobi. Awọn isẹ yoo jẹ jo awọn ọna ti o ba ti mo ti ropo urls on { "http://google.com/", "http://wingify.com/" }.

Mo bẹrẹ si wa ọna lati ṣe ikorita ni Postgres laisi lilo &&, ṣugbọn laisi aṣeyọri pupọ.

Ni ipari, a pinnu lati kan yanju iṣoro naa ni ipinya: fun mi ni ohun gbogbo urls awọn ila fun eyiti URL baamu apẹrẹ naa. Laisi awọn ipo afikun yoo jẹ - 

SELECT urls.url
FROM 
	acc_{account_id}.urls as urls,
	(SELECT unnest(recording_data.urls) AS id) AS unrolled_urls
WHERE
	urls.id = unrolled_urls.id AND
	urls.url  ILIKE  '%jobs%'

Dipo JOIN sintasi Mo ti o kan lo a subquery ati ki o gbooro sii recording_data.urls orun ki o le taara lo majemu ni WHERE.

Ohun pataki julọ nibi ni iyẹn && ti a lo lati ṣayẹwo boya titẹsi ti a fun ni URL ti o baamu. Ti o ba squint kekere kan, o le rii iṣiṣẹ yii n lọ nipasẹ awọn eroja ti orun (tabi awọn ori ila ti tabili) ati duro nigbati ipo kan (baramu) ba pade. Ko ṣe iranti rẹ ohunkohun? Bẹẹni, EXISTS.

Niwon lori recording_data.urls le ṣe itọkasi lati ita aaye ipilẹ-ọrọ, nigbati eyi ba ṣẹlẹ a le ṣubu pada lori ọrẹ wa atijọ EXISTS ati ki o fi ipari si awọn subquery pẹlu rẹ.

Ni fifi ohun gbogbo papọ, a gba ibeere iṣapeye ikẹhin:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0
    AND EXISTS(
        SELECT urls.url
        FROM 
            acc_{account_id}.urls as urls,
            (SELECT unnest(urls) AS rec_url_id FROM acc_{account_id}.recording_data) 
            AS unrolled_urls
        WHERE
            urls.id = unrolled_urls.rec_url_id AND
            urls.url  ILIKE  '%enterprise_customer.com/jobs%'
    );

Ati akoko ipari ipari Time: 1898.717 ms Akoko lati ayeye?!?

Ko ki sare! Ni akọkọ o nilo lati ṣayẹwo deede. Mo ti wà lalailopinpin ifura nipa EXISTS iṣapeye bi o ṣe yipada ọgbọn lati pari ni iṣaaju. A nilo lati ni idaniloju pe a ko ṣafikun aṣiṣe ti kii ṣe kedere si ibeere naa.

A o rọrun igbeyewo je lati ṣiṣe count(*) lori mejeeji o lọra ati awọn ibeere iyara fun nọmba nla ti awọn eto data oriṣiriṣi. Lẹhinna, fun ipin kekere ti data, Mo jẹrisi pẹlu ọwọ pe gbogbo awọn abajade jẹ deede.

Gbogbo awọn idanwo fun awọn abajade rere nigbagbogbo. A ṣe atunṣe ohun gbogbo!

Awọn ẹkọ ti a Kọ

Awọn ẹkọ pupọ lo wa lati kọ lati inu itan yii:

  1. Awọn ero ibeere ko sọ gbogbo itan naa, ṣugbọn wọn le pese awọn amọran
  2. Awọn ifura akọkọ kii ṣe nigbagbogbo awọn ẹlẹṣẹ gidi
  3. Awọn ibeere ti o lọra le fọ lulẹ lati ya sọtọ awọn igo
  4. Kii ṣe gbogbo awọn iṣapeye jẹ idinku ninu iseda
  5. Lo EXIST, nibiti o ti ṣee ṣe, o le ja si awọn ilọsiwaju nla ni iṣelọpọ

ipari

A lọ lati akoko ibeere ti ~ iṣẹju 24 si awọn aaya 2 - ilosoke iṣẹ ṣiṣe pataki! Botilẹjẹpe nkan yii wa jade nla, gbogbo awọn idanwo ti a ṣe ṣẹlẹ ni ọjọ kan, ati pe a ṣe iṣiro pe wọn gba laarin awọn wakati 1,5 ati 2 fun awọn iṣapeye ati idanwo.

SQL jẹ ede iyanu ti o ko ba bẹru rẹ, ṣugbọn gbiyanju lati kọ ẹkọ ati lo. Nipa nini oye ti o dara ti bii awọn ibeere SQL ṣe n ṣiṣẹ, bawo ni data data ṣe n ṣe agbekalẹ awọn ero ibeere, bawo ni awọn atọka ṣe n ṣiṣẹ, ati ni irọrun iwọn data ti o n ṣe, o le ṣaṣeyọri pupọ ni mimuju awọn ibeere. O tun ṣe pataki, sibẹsibẹ, lati tẹsiwaju lati gbiyanju awọn ọna oriṣiriṣi ati laiyara fọ iṣoro naa, wiwa awọn igo.

Apakan ti o dara julọ nipa iyọrisi awọn abajade bii eyi ni akiyesi, ilọsiwaju iyara ti o han - nibiti ijabọ kan ti iṣaaju ko paapaa fifuye ni bayi o fẹrẹẹ lesekese.

Ọpẹ pataki si awọn ẹlẹgbẹ mi ni aṣẹ Aditya MishraAditya Gauru и Varun Malhotra fun brainstorming ati Dinkar Pandir fun wiwa aṣiṣe pataki ninu ibeere ikẹhin wa ṣaaju ki a to sọ o dabọ si rẹ nikẹhin!

orisun: www.habr.com

Fi ọrọìwòye kun