Kev ua haujlwm analytics hauv microservice architecture: pab thiab sai Postgres FDW

Microservice architecture, zoo li txhua yam hauv ntiaj teb no, muaj nws qhov zoo thiab qhov tsis zoo. Qee cov txheej txheem ua tau yooj yim nrog nws, lwm qhov nyuaj dua. Thiab rau lub hom phiaj ntawm kev hloov pauv sai thiab kev ua kom zoo dua qub, koj yuav tsum tau ua kev txi. Ib tug ntawm lawv yog cov complexity ntawm analytics. Yog hais tias nyob rau hauv ib tug monolith tag nrho cov kev ua hauj lwm analytics yuav txo tau rau SQL queries mus rau ib tug analytical replica, ces nyob rau hauv ib tug multiservice architecture txhua qhov kev pab cuam muaj nws tus kheej database thiab nws zoo nkaus li tias ib qho lus nug tsis tuaj yeem ua tau (los yog tej zaum nws ua tau?). Rau cov neeg uas xav paub yuav ua li cas peb daws qhov teeb meem ntawm kev ua haujlwm analytics nyob rau hauv peb lub tuam txhab thiab yuav ua li cas peb kawm nyob rau hauv cov kev daws teeb meem no - txais tos.

Kev ua haujlwm analytics hauv microservice architecture: pab thiab sai Postgres FDW
Kuv lub npe yog Pavel Sivash, ntawm DomClick Kuv ua haujlwm hauv pab pawg uas yog lub luag haujlwm tswj xyuas cov ntaub ntawv tshuaj ntsuam xyuas. Raws li txoj cai, peb cov dej num tuaj yeem muab faib ua cov ntaub ntawv engineering, tab sis, qhov tseeb, cov haujlwm ntau dua. Muaj ETL / ELT tus qauv rau cov ntaub ntawv engineering, kev txhawb nqa thiab hloov kho cov cuab yeej rau kev txheeb xyuas cov ntaub ntawv thiab kev txhim kho koj tus kheej cov cuab yeej. Tshwj xeeb, rau kev tshaj tawm kev ua haujlwm, peb tau txiav txim siab "ua txuj" tias peb muaj ib qho monolith thiab muab cov kws tshuaj ntsuam ib qho ntaub ntawv uas yuav muaj tag nrho cov ntaub ntawv lawv xav tau.

Feem ntau, peb xav txog kev xaiv sib txawv. Nws yog ua tau los tsim kom tau ib tug tag nrho-fledged repository - peb txawm sim, tab sis, kom ncaj ncees, peb tsis muaj peev xwm los sib txuas ncaj qha kev hloov nyob rau hauv logic nrog lub es qeeb txheej txheem ntawm tsim ib tug repository thiab hloov mus rau nws (yog hais tias ib tug neeg ua tiav. , sau rau hauv cov lus qhia li cas). Nws muaj peev xwm hais qhia rau cov kws tshuaj ntsuam: "Cov txiv neej, kawm python thiab mus rau kev tshuaj ntsuam replicas," tab sis qhov no yog qhov yuav tsum tau ua ntxiv rau kev nrhiav neeg ua haujlwm, thiab zoo li qhov no yuav tsum zam yog tias ua tau. Peb txiav txim siab los sim siv FDW (Foreign Data Wrapper) thev naus laus zis: qhov tseem ceeb, qhov no yog tus qauv dblink, uas yog nyob rau hauv tus qauv SQL, tab sis nrog nws tus kheej ntau yooj yim interface. Raws li nws, peb tau ua ib qho kev daws teeb meem, uas thaum kawg ntes tau, thiab peb txiav txim siab rau nws. Nws cov ntsiab lus yog lub ntsiab lus ntawm ib tsab xov xwm cais, thiab tej zaum ntau tshaj ib qho, vim kuv xav tham txog ntau yam: los ntawm synchronizing database schemas kom nkag mus tswj thiab depersonalization ntawm tus kheej cov ntaub ntawv. Nws tseem yog ib qho tsim nyog yuav tau txais kev tshwj tseg tias qhov kev daws teeb meem no tsis yog hloov pauv rau cov ntaub ntawv txheeb xyuas tiag tiag thiab cov chaw khaws cia; nws tsuas yog daws qhov teeb meem tshwj xeeb xwb.

Nyob rau theem sab saum toj nws zoo li no:

Kev ua haujlwm analytics hauv microservice architecture: pab thiab sai Postgres FDW
Muaj PostgreSQL database qhov twg cov neeg siv tuaj yeem khaws lawv cov ntaub ntawv ua haujlwm, thiab qhov tseem ceeb tshaj plaws, kev tshuaj ntsuam xyuas ntawm txhua qhov kev pabcuam txuas nrog rau cov ntaub ntawv no ntawm FDW. Qhov no ua rau nws muaj peev xwm sau cov lus nug rau ntau lub databases, thiab nws tsis muaj teeb meem dab tsi nws yog: PostgreSQL, MySQL, MongoDB lossis lwm yam (cov ntaub ntawv, API, yog dheev tsis muaj wrapper tsim nyog, koj tuaj yeem sau koj tus kheej). Zoo, txhua yam zoo li zoo! Puas yog peb tawg?

Yog tias txhua yam xaus sai thiab yooj yim, ces, tej zaum, yuav tsis muaj ib tsab xov xwm.

Nws yog ib qho tseem ceeb kom paub meej txog yuav ua li cas Postgres cov txheej txheem thov rau cov chaw nyob deb. Qhov no zoo li muaj laj thawj, tab sis feem ntau tib neeg tsis xyuam xim rau nws: Postgres faib qhov kev thov mus rau hauv qhov chaw uas tau ua tiav ntawm nws tus kheej ntawm cov chaw taws teeb tswj, khaws cov ntaub ntawv no, thiab ua tiav qhov kev suav kawg ntawm nws tus kheej, yog li qhov ceev ntawm kev nug ua tiav yuav nyob ntawm seb. nws sau li cas. Nws tseem yuav tsum tau sau tseg: thaum cov ntaub ntawv tuaj txog ntawm lub chaw taws teeb chaw taws teeb, nws tsis muaj kev ntsuas ntxiv lawm, tsis muaj dab tsi uas yuav pab tau tus neeg teem sijhawm, yog li ntawd, tsuas yog peb tus kheej tuaj yeem pab thiab qhia nws. Thiab qhov no yog qhov kuv xav tham txog ntau yam ntxiv.

Ib qho lus nug yooj yim thiab npaj nrog nws

Txhawm rau qhia tias Postgres nug txog 6 lab kab lus ntawm lub chaw ua haujlwm nyob deb li cas, cia peb saib cov phiaj xwm yooj yim.

explain analyze verbose  
SELECT count(1)
FROM fdw_schema.table;

Aggregate  (cost=418383.23..418383.24 rows=1 width=8) (actual time=3857.198..3857.198 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0) (actual time=4.874..3256.511 rows=6406868 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
Planning time: 0.986 ms
Execution time: 3857.436 ms

Siv cov lus VERBOSE tso cai rau peb pom cov lus nug uas yuav raug xa mus rau cov chaw taws teeb chaw taws teeb thiab cov txiaj ntsig uas peb yuav tau txais rau kev ua haujlwm ntxiv (RemoteSQL kab).

Cia peb mus ntxiv me ntsis thiab ntxiv ob peb lub lim rau peb qhov kev thov: ib qho rau boolean teb, ib qho tshwm sim timestamp nyob rau hauv lub sij hawm thiab ib tug los ntawm jsonb.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=577487.69..577487.70 rows=1 width=8) (actual time=27473.818..25473.819 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0) (actual time=31.369..25372.466 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
Planning time: 0.665 ms
Execution time: 27474.118 ms

Qhov no yog qhov chaw uas koj yuav tsum tau them sai sai rau thaum sau cov lus nug dag. Cov lim dej tsis tau hloov mus rau cov chaw taws teeb tswj, uas txhais tau hais tias kom ua tiav nws, Postgres rub tawm tag nrho 6 lab kab nyob rau hauv thiaj li yuav lim hauv zos (Lim kab) thiab ua kev sib sau ua ke. Tus yuam sij rau kev vam meej yog sau cov lus nug kom cov ntxaij lim dej xa mus rau lub tshuab tej thaj chaw deb, thiab peb tau txais thiab sib sau ua ke tsuas yog cov kab tsim nyog.

Nov yog qee qhov booleanshit

Nrog boolean teb txhua yam yog yooj yim. Hauv thawj qhov kev thov, qhov teeb meem yog vim tus neeg teb xov tooj is. Yog koj hloov nws nrog =, tom qab ntawd peb tau txais cov txiaj ntsig hauv qab no:

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table
WHERE is_active = True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=508010.14..508010.15 rows=1 width=8) (actual time=19064.314..19064.314 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..507988.44 rows=8679 width=0) (actual time=33.035..18951.278 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: ((("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 3567989
        Remote SQL: SELECT created_dt, meta FROM fdw_schema.table WHERE (is_active)
Planning time: 0.834 ms
Execution time: 19064.534 ms

Raws li koj tuaj yeem pom, lub lim tau ya mus rau cov chaw taws teeb chaw taws teeb, thiab lub sijhawm ua tiav raug txo los ntawm 27 mus rau 19 vib nas this.

Nws yog tsim nyog sau cia tias tus neeg teb xov tooj is txawv ntawm tus neeg teb xov tooj = vim nws tuaj yeem ua haujlwm nrog tus nqi Null. Nws txhais tau tias tsis muaj tseeb yuav tawm qhov tseem ceeb False thiab Null nyob rau hauv lub lim, whereas != Muaj tseeb yuav tso qhov tsis tseeb xwb. Yog li ntawd, thaum hloov tus neeg teb xov tooj yog tsis ob qho xwm txheej nrog OR tus neeg teb xov tooj yuav tsum raug xa mus rau lub lim, piv txwv li, THAUM (col != True) OR (col is null).

Peb tau daws nrog boolean, cia peb mus. Txog tam sim no, cia peb rov qab Boolean lim rau nws daim ntawv qub txhawm rau txhawm rau txiav txim siab txog cov txiaj ntsig ntawm lwm yam kev hloov pauv.

timestamptz? hz

Feem ntau, koj feem ntau yuav tsum sim ua kom raug sau ib daim ntawv thov uas cuam tshuam nrog cov chaw taws teeb tswj, thiab tsuas yog tom qab ntawd nrhiav kev piav qhia vim li cas qhov no tshwm sim. Cov ntaub ntawv me me txog qhov no tuaj yeem pom hauv Internet. Yog li, hauv kev sim peb pom tias lub lim tiam tas mus rau cov neeg rau zaub mov tej thaj chaw deb nrog lub suab nrov, tab sis thaum peb xav teem hnub dynamically, piv txwv li, tam sim no() lossis CURRENT_DATE, qhov no tsis tshwm sim. Hauv peb qhov piv txwv, peb ntxiv cov lim kom cov kab tsim_at muaj cov ntaub ntawv raws nraim 1 lub hlis dhau los (BETWEEN CURRENT_DATE - INTERVAL '7 hli' THIAB CURRENT_DATE - INTERVAL '6 hli'). Peb ua li cas rau qhov no?

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta->>'source' = 'test';

Aggregate  (cost=306875.17..306875.18 rows=1 width=8) (actual time=4789.114..4789.115 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..306874.86 rows=105 width=0) (actual time=23.475..4681.419 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 76934
        Remote SQL: SELECT is_active, meta FROM fdw_schema.table WHERE ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone))
Planning time: 0.703 ms
Execution time: 4789.379 ms

Peb hais rau tus neeg npaj kom suav hnub nyob rau hauv cov lus nug ua ntej thiab dhau qhov kev npaj ua tiav rau lub lim. Thiab cov lus qhia no tau muab peb cov txiaj ntsig zoo, qhov kev thov tau dhau los yuav luag 6 zaug sai dua!

Ib zaug ntxiv, nws yog ib qho tseem ceeb uas yuav tsum tau ceev faj ntawm no: cov ntaub ntawv hom hauv cov lus nug yuav tsum yog tib yam li ntawm daim teb uas peb tau lim, txwv tsis pub tus neeg npaj yuav txiav txim siab tias txij li cov hom sib txawv, nws yog ib qho tsim nyog yuav tsum xub tau txais tag nrho. cov ntaub ntawv thiab lim nws hauv zos.

Cia peb rov qab lub hnub lim rau nws tus nqi qub.

Freddy vs. Jsonb

Feem ntau, Boolean teb thiab hnub tim twb tau ua kom peb cov lus nug kom txaus, tab sis muaj ib hom ntaub ntawv ntxiv ntxiv. Kev sib ntaus sib tua nrog kev lim dej los ntawm nws, kom ncaj ncees, tseem tsis dhau, txawm hais tias muaj kev vam meej ntawm no thiab. Yog li, qhov no yog li cas peb tswj kom dhau lub lim los ntawm jsonb teb rau cov chaw taws teeb server.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=245463.60..245463.61 rows=1 width=8) (actual time=6727.589..6727.590 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=1100.00..245459.90 rows=1478 width=0) (actual time=16.213..6634.794 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 619961
        Remote SQL: SELECT created_dt, is_active FROM fdw_schema.table WHERE ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.747 ms
Execution time: 6727.815 ms

Hloov chaw lim dej, koj yuav tsum siv lub xub ntiag ntawm ib tus neeg teb xov tooj jsonb nyob rau hauv ib qho txawv. 7 vib nas this es tsis txhob ntawm tus thawj 29. Txog tam sim no qhov no tsuas yog txoj kev vam meej rau kev xa cov lim dej ntawm jsonb mus rau cov neeg rau zaub mov tej thaj chaw deb, tab sis ntawm no nws yog ib qho tseem ceeb uas yuav tsum tau coj mus rau hauv tus account ib qho kev txwv: peb tab tom siv version 9.6 ntawm cov ntaub ntawv, tab sis thaum kawg ntawm lub Plaub Hlis peb npaj ua kom tiav qhov kev xeem kawg thiab txav mus rau version 12. Thaum peb hloov kho, peb yuav sau txog qhov nws cuam tshuam li cas, vim tias muaj ntau qhov kev hloov pauv uas muaj ntau qhov kev cia siab: json_path, CTE tus cwj pwm tshiab, thawb (tsim txij li version 10). Kuv yeej xav sim sai sai no.

Ua kom tiav nws

Peb tau sim seb txhua qhov kev hloov pauv cuam tshuam li cas thov ceev ib tus zuj zus. Tam sim no cia saib yuav ua li cas tshwm sim thaum tag nrho peb cov lim tau sau raug.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=322041.51..322041.52 rows=1 width=8) (actual time=2278.867..2278.867 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..322041.41 rows=25 width=0) (actual time=8.597..2153.809 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table WHERE (is_active) AND ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone)) AND ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.820 ms
Execution time: 2279.087 ms

Yog lawm, qhov kev thov zoo li nyuaj dua, qhov no yog tus nqi yuam kev, tab sis qhov kev ua tiav nrawm yog 2 vib nas this, uas yog ntau dua 10 npaug sai dua! Thiab peb tab tom tham txog cov lus nug yooj yim tiv thaiv cov ntaub ntawv me me. Ntawm qhov kev thov tiag tiag, peb tau txais kev nce mus txog ntau pua zaus.

Los xaus: yog tias koj siv PostgreSQL nrog FDW, nco ntsoov xyuas tias tag nrho cov ntxaij lim dej raug xa mus rau cov chaw taws teeb tswj, thiab koj yuav zoo siab ... Tsawg kawg kom txog thaum koj tuaj yeem koom nrog cov rooj los ntawm cov servers sib txawv. Tab sis qhov ntawd yog ib zaj dab neeg rau lwm tsab xov xwm.

Ua tsaug rau koj mloog! Kuv xav hnov ​​​​cov lus nug, cov lus pom, thiab cov dab neeg txog koj cov kev paub hauv cov lus.

Tau qhov twg los: www.hab.com

Ntxiv ib saib