ʻO ka moʻolelo o kahi hoʻokolokolo SQL

I ka lā Dekemaba i hala, ua loaʻa iaʻu kahi hōʻike bug hoihoi mai ka hui kākoʻo VWO. ʻO ka manawa hoʻouka ʻana no kekahi o nā hōʻike hōʻike no kahi mea kūʻai aku ʻoihana nui he mea paʻa. A no ka mea ʻo kēia kaʻu wahi kuleana, ua nānā koke wau i ka hoʻoponopono ʻana i ka pilikia.

prehistory

I mea e maopopo ai kaʻu e kamaʻilio nei, e haʻi iki wau iā ʻoe e pili ana iā VWO. He kahua kēia e hiki ai iā ʻoe ke hoʻomaka i nā hoʻolaha like ʻole i manaʻo ʻia ma kāu mau pūnaewele: hana i nā hoʻokolohua A/B, nānā i nā malihini a me nā hoʻololi ʻana, ka nānā ʻana i ka funnel kūʻai, hōʻike i nā palapala wela a me ka pāʻani ʻana i nā leo kipa.

Akā ʻo ka mea nui loa e pili ana i ka paepae ka hōʻike. Hoʻopili ʻia nā hana a pau i luna. A no nā mea kūʻai aku ʻoihana, he mea ʻole ka nui o ka ʻike me ka ʻole o kahi kahua ikaika e hōʻike ana iā ia ma ke ʻano analytics.

Ke hoʻohana nei i ka paepae, hiki iā ʻoe ke hana i kahi nīnau maʻamau ma kahi hoʻonohonoho ʻikepili nui. Eia kekahi laʻana maʻalahi:

Hōʻike i nā kaomi a pau ma ka ʻaoʻao "abc.com" MAI <lā d1> a hiki i <lā d2> no ka poʻe i hoʻohana iā Chrome OR (aia ma ʻEulopa A hoʻohana i ka iPhone)

E nānā pono i nā mea hoʻohana Boolean. Loaʻa iā lākou i nā mea kūʻai aku ma ka ʻaoʻao hulina e hana i nā nīnau paʻakikī paʻakikī e kiʻi i nā laʻana.

Noi lohi

Ke hoʻāʻo nei ka mea kūʻai aku e hana i kahi mea e hana wikiwiki ai:

Hōʻike i nā moʻolelo kau a pau no nā mea hoʻohana i kipa i kekahi ʻaoʻao me kahi URL me "/jobs"

He ton o kaʻa kaʻa kēia pūnaewele a ke mālama nei mākou ma luna o hoʻokahi miliona mau URL kūʻokoʻa nona wale nō. A makemake lākou e ʻimi i kahi template URL maʻalahi e pili ana i kā lākou ʻoihana ʻoihana.

Hoʻomā ma mua

E nānā kākou i nā mea e hana nei ma ka waihona. Aia ma lalo ka nīnau SQL lohi mua:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND sessions.referrer_id = recordings_urls.id 
    AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   ) 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0 ;

A eia nā manawa:

Ka manawa i hoʻolālā ʻia: 1.480 ms Ka manawa hoʻokō: 1431924.650 ms

Ua kolo ka nīnau i 150 tausani lālani. Ua hōʻike ka mea hoʻolālā nīnau i ʻelua mau kikoʻī hoihoi, akā ʻaʻohe bottlenecks maopopo.

E aʻo hou kāua i ke noi. E like me kāu e ʻike ai, hana ʻo ia JOIN ʻekolu papa:

  1. kau: e hōʻike i ka ʻike o ka hālāwai: polokalamu kele, mea hoʻohana, ʻāina, a pēlā aku.
  2. ʻikepili_hoʻopaʻa: nā URL i hoʻopaʻa ʻia, nā ʻaoʻao, ka lōʻihi o nā kipa
  3. nā pule: I mea e pale aku ai i ka kope kope ʻana i nā URL nui loa, mālama mākou iā lākou i loko o kahi papa ʻokoʻa.

E hoʻomaopopo hoʻi ua hoʻokaʻawale ʻia kā mākou papa ʻaina āpau e account_id. Ma kēia ala, hoʻokaʻawale ʻia kahi kūlana kahi e hoʻopilikia ai kekahi moʻokāki nui.

Ke ʻimi nei i nā hōʻailona

Ma ka nānā pono ʻana, ʻike mākou ua hewa kekahi mea i kekahi noi. Pono e nānā pono i kēia laina:

urls && array(
	select id from acc_{account_id}.urls 
	where url  ILIKE  '%enterprise_customer.com/jobs%'
)::text[]

ʻO ka manaʻo mua, no ka mea paha ILIKE ma kēia mau URL lōʻihi (ʻoi aku ma mua o 1,4 miliona ʻokoʻa Hiki ke pilikia nā URL i hōʻiliʻili ʻia no kēia moʻokāki).

Akā ʻaʻole, ʻaʻole ia ka manaʻo!

SELECT id FROM urls WHERE url ILIKE '%enterprise_customer.com/jobs%';
  id
--------
 ...
(198661 rows)

Time: 5231.765 ms

He 5 kekona wale no ke noi huli ana. ʻAʻole pilikia ka ʻimi ʻana i kahi ʻano ma kahi miliona URL kūʻokoʻa.

ʻO ka mea hoʻopiʻi hou ma ka papa inoa he nui JOIN. Ua hoʻolohi paha ko lākou hoʻohana nui ʻana? ʻO ka maʻamau JOIN'O nā moho maopopo loa no nā pilikia hana, akā, 'a'ole au i mana'o he ma'amau kā mākou hihia.

analytics_db=# SELECT
    count(*)
FROM
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data_0 as recording_data,
    acc_{account_id}.sessions_0 as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND sessions.referrer_id = recordings_urls.id
    AND r_time > to_timestamp(1542585600)
    AND r_time < to_timestamp(1545177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
-------
  8086
(1 row)

Time: 147.851 ms

A ʻaʻole kēia ʻo kā mākou hihia. JOIN'o ka wikiwiki loa.

Hoʻemi i ka pōʻai o nā mea kānalua

Ua mākaukau wau e hoʻomaka e hoʻololi i ka nīnau e hoʻokō i nā hoʻomaikaʻi hana. Ua hoʻomohala wau me kaʻu hui i 2 manaʻo nui:

  • E hoʻohana i ka EXISTS no ka huli ʻana i ka URL: Makemake mākou e nānā hou inā loaʻa kekahi pilikia me ka subquery no nā URL. ʻO kahi ala e hoʻokō ai i kēia ʻo ka hoʻohana wale ʻana EXISTS. EXISTS e hiki hoʻomaikaʻi nui i ka hana ʻoiai e hoʻopau koke ia i ka wā e loaʻa ai ke kaula hoʻokahi i kūpono i ke kūlana.

SELECT
	count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls,
    acc_{account_id}.recording_data as recording_data,
    acc_{account_id}.sessions as sessions
WHERE
    recording_data.usp_id = sessions.usp_id
    AND  (  1 = 1  )
    AND sessions.referrer_id = recordings_urls.id
    AND  (exists(select id from acc_{account_id}.urls where url  ILIKE '%enterprise_customer.com/jobs%'))
    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
    AND recording_data.duration >=5
    AND recording_data.num_of_pages > 0 ;
 count
 32519
(1 row)
Time: 1636.637 ms

ʻAe, ʻae. Subquery i ka wā i ʻōwili ʻia ai EXISTS, hana wikiwiki nā mea a pau. ʻO ka nīnau kūpono aʻe ke kumu o ka noi me JOIN-ami a me ka subquery pono'ī wikiwiki i kēlā me kēia, akā lohi loa?

  • Ke neʻe nei i ka subquery i ka CTE : Inā wikiwiki ka nīnau iā ia iho, hiki iā mākou ke helu mua i ka hopena wikiwiki a laila hāʻawi i ka nīnau nui.

WITH matching_urls AS (
    select id::text from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%'
)

SELECT 
    count(*) FROM acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions,
    matching_urls
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id
    AND (urls && array(SELECT id from matching_urls)::text[])
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545107599)
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0;

Akā, ua lohi loa.

Ke ʻimi nei i ka mea hewa

I kēia mau manawa a pau, hoʻokahi mea liʻiliʻi i ʻānaʻi i mua o koʻu mau maka, aʻu i kāhili mau ai. Akā no ka mea ʻaʻohe mea ʻē aʻe i koe, ua hoʻoholo wau e nānā pū iā ia. Ke kamaʻilio nei au && mea hoʻohana. Aloha EXISTS hoʻomaikaʻi wale i ka hana && ʻo ia wale nō ke kumu maʻamau i koe ma nā ʻano āpau o ka nīnau lohi.

Ke nana nei palapala, ʻike mākou i kēlā && hoʻohana ʻia inā pono ʻoe e ʻimi i nā mea maʻamau ma waena o ʻelua arrays.

Ma ka noi kumu, penei:

AND  (  urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]   )

ʻO ia ka mea e hana mākou i kahi hulina kumu ma kā mākou mau URL, a laila e ʻimi i ke kikowaena me nā URL āpau me nā pou maʻamau. He mea huikau kēia no ka mea, ʻaʻole pili nā "urls" i ka papaʻaina i loaʻa nā URL āpau, akā i ke kolamu "urls" i ka papaʻaina. recording_data.

Me ka ulu ʻana o nā kānalua e pili ana &&, Ua ho'āʻo wau e ʻimi i ka hōʻoia no lākou ma ka hoʻolālā nīnau i hana ʻia EXPLAIN ANALYZE (Ua loaʻa iaʻu kahi hoʻolālā i mālama ʻia, akā ʻoi aku ka ʻoluʻolu o ka hoʻokolohua ma SQL ma mua o ka hoʻāʻo ʻana e hoʻomaopopo i ka opacity o nā mea hoʻolālā nīnau).

Filter: ((urls && ($0)::text[]) AND (r_time > '2018-12-17 12:17:23+00'::timestamp with time zone) AND (r_time < '2018-12-18 23:59:59+00'::timestamp with time zone) AND (duration >= '5'::double precision) AND (num_of_pages > 0))
                           Rows Removed by Filter: 52710

He mau laina kānana wale nō mai &&. ʻO ia hoʻi, ʻaʻole i uku wale kēia hana, akā ua hana ʻia i nā manawa he nui.

Ua hoʻāʻo au i kēia ma ka hoʻokaʻawale ʻana i ke kūlana

SELECT 1
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data_30 as recording_data_30, 
    acc_{account_id}.sessions_30 as sessions_30 
WHERE 
	urls &&  array(select id from acc_{account_id}.urls where url  ILIKE  '%enterprise_customer.com/jobs%')::text[]

Ua lohi kēia nīnau. No ka mea JOIN-s wikiwiki a wikiwiki nā subqueries, ʻo ka mea wale nō i koe && mea hoʻohana.

He hana kī wale nō kēia. Pono mākou e ʻimi i ka papaʻaina o lalo o nā URL no ka ʻimi ʻana i kahi kumu, a pono mākou e ʻimi mau i nā kikowaena. ʻAʻole hiki iā mākou ke ʻimi pololei ma nā moʻolelo URL, no ka mea, he mau ID wale nō kēia urls.

Ma ke ala i kahi hoʻonā

&& lohi no ka mea nui nā pūʻulu ʻelua. E wikiwiki ana ka hana inā pani au urls maluna o { "http://google.com/", "http://wingify.com/" }.

Ua hoʻomaka wau e ʻimi i kahi ala e hana ai i ka intersection ma Postgres me ka hoʻohana ʻole &&, aka, me ka holomua ole.

I ka hopena, ua hoʻoholo mākou e hoʻoponopono i ka pilikia ma kahi kaʻawale: hāʻawi mai iaʻu i nā mea āpau urls nā laina e pili ana ka URL i ke kumu. Me ka ʻole o nā kūlana ʻē aʻe e - 

SELECT urls.url
FROM 
	acc_{account_id}.urls as urls,
	(SELECT unnest(recording_data.urls) AS id) AS unrolled_urls
WHERE
	urls.id = unrolled_urls.id AND
	urls.url  ILIKE  '%jobs%'

Ma kahi o JOIN syntax Ua hoʻohana wale au i kahi subquery a hoʻonui recording_data.urls array i hiki iā ʻoe ke hoʻopili pololei i ke kūlana ma WHERE.

ʻO ka mea nui ma ʻaneʻi ʻo ia && hoʻohana ʻia e nānā inā loaʻa i kahi helu i hāʻawi ʻia kahi URL pili. Inā ʻoe e wili iki, hiki iā ʻoe ke ʻike i ka neʻe ʻana o kēia hana ma nā mea o kahi ʻano (a i ʻole nā ​​lālani o ka papaʻaina) a kū i ka wā e hālāwai ai kahi kūlana (match). ʻAʻole hoʻomanaʻo iā ʻoe i kekahi mea? ʻAe, EXISTS.

Mai ka manawa mai recording_data.urls hiki ke kuhikuhi ʻia ma waho o ka pōʻaiapili subquery, ke hiki mai kēia hiki iā mākou ke hāʻule hou i kā mākou hoa kahiko EXISTS a hoʻopili i ka subquery me ia.

I ka hui pū ʻana i nā mea a pau, loaʻa iā mākou ka nīnau i hoʻopaʻa ʻia:

SELECT 
    count(*) 
FROM 
    acc_{account_id}.urls as recordings_urls, 
    acc_{account_id}.recording_data as recording_data, 
    acc_{account_id}.sessions as sessions 
WHERE 
    recording_data.usp_id = sessions.usp_id 
    AND  (  1 = 1  )  
    AND sessions.referrer_id = recordings_urls.id 
    AND r_time > to_timestamp(1542585600) 
    AND r_time < to_timestamp(1545177599) 
    AND recording_data.duration >=5 
    AND recording_data.num_of_pages > 0
    AND EXISTS(
        SELECT urls.url
        FROM 
            acc_{account_id}.urls as urls,
            (SELECT unnest(urls) AS rec_url_id FROM acc_{account_id}.recording_data) 
            AS unrolled_urls
        WHERE
            urls.id = unrolled_urls.rec_url_id AND
            urls.url  ILIKE  '%enterprise_customer.com/jobs%'
    );

A ʻo ka manawa alakaʻi hope loa Time: 1898.717 ms ʻO ka manawa e hoʻolauleʻa ai?!?

ʻAʻole wikiwiki loa! Pono mua ʻoe e nānā i ka pololei. Ua kānalua loa au EXISTS optimization e like me ka hoʻololi ʻana i ka loiloi e hoʻopau ma mua. Pono mākou e hōʻoia ʻaʻole mākou i hoʻohui i kahi hewa ʻike ʻole i ka noi.

He ho'āʻo maʻalahi ke holo count(*) ma nā nīnau lohi a wikiwiki no ka nui o nā pūʻulu ʻikepili like ʻole. A laila, no kahi ʻāpana liʻiliʻi o ka ʻikepili, ua hōʻoia lima wau ua pololei nā hopena āpau.

Hāʻawi nā hoʻokolohua a pau i nā hopena maikaʻi mau. Hoʻoponopono mākou i nā mea a pau!

Nā haʻawina i aʻo ʻia

Nui nā haʻawina e aʻo ʻia mai kēia moʻolelo.

  1. ʻAʻole hōʻike nā hoʻolālā nīnau i ka moʻolelo holoʻokoʻa, akā hiki iā lākou ke hāʻawi i nā hōʻailona
  2. ʻAʻole mau ka poʻe hoʻopiʻi nui i nā hewa maoli
  3. Hiki ke wāwahi ʻia nā nīnau lohi e hoʻokaʻawale i nā bottlenecks
  4. ʻAʻole hoʻohaʻahaʻa nā loiloi āpau
  5. E hoʻohana EXIST, inā hiki, hiki ke alakaʻi i ka piʻi nui o ka huahana

hopena

Hele mākou mai kahi manawa nīnau o ~24 mau minuke i 2 kekona - he piʻi nui ka hana! ʻOiai ua puka nui kēia ʻatikala, ua hana ʻia nā hoʻokolohua āpau a mākou i hana ai i ka lā hoʻokahi, a ua manaʻo ʻia ua lawe lākou ma waena o 1,5 a me 2 mau hola no ka loiloi a me ka hoʻāʻo.

He ʻōlelo kupanaha ʻo SQL inā ʻaʻole ʻoe makaʻu iā ia, akā e hoʻāʻo e aʻo a hoʻohana. Ma ka ʻike maikaʻi ʻana i ke ʻano o ka hoʻokō ʻana i nā nīnau SQL, pehea ka hana ʻana o ka waihona i nā hoʻolālā nīnau, pehea ka hana ʻana o nā kuhikuhi, a me ka nui o ka ʻikepili āu e kamaʻilio nei, hiki iā ʻoe ke kūleʻa i ka hoʻonui ʻana i nā nīnau. He mea nui nō naʻe, e hoʻomau i ka hoʻāʻo ʻana i nā ala like ʻole a wāwahi mālie i ka pilikia, e ʻimi ana i nā bottlenecks.

ʻO ka ʻāpana maikaʻi loa e pili ana i ka loaʻa ʻana o nā hopena e like me kēia, ʻo ia ka hoʻomaikaʻi ʻana i ka wikiwiki a ʻike ʻia - kahi i hoʻouka ʻia ai kahi hōʻike i hoʻouka ʻole ʻia ma mua i kēia manawa.

Mahalo nui iā ko'u mau hoa ma ke kauoha a Aditya MishraʻO Aditya Gauru и Varun Malhotra no ka noonoo ana a ʻO Dinkar Pandir no ka loaʻa ʻana o kahi hewa koʻikoʻi i kā mākou noi hope ma mua o ko mākou haʻi ʻana aku!

Source: www.habr.com

Pākuʻi i ka manaʻo hoʻopuka