Operational analytics in a microservice architecture: helping and advising Postgres FDW

A microservice architecture, like everything in this world, has its pros and cons. Some processes become easier with it, others harder. For the sake of faster change and better scalability, you have to make sacrifices, and one of them is that analytics gets more complicated. In a monolith, all operational analytics can be reduced to SQL queries against an analytical replica. In a multiservice architecture, every service has its own database, and it seems that a single query will no longer do (or maybe it will?). For those interested in how we solved the problem of operational analytics in our company and how we learned to live with this solution: welcome.

My name is Pavel Sivash, and at DomClick I work in the team responsible for maintaining the analytical data warehouse. Formally, our work can be classified as data engineering, but in practice the range of tasks is much wider. There is the ETL/ELT that is standard for data engineering, support and adaptation of data analysis tools, and development of our own tools. In particular, for operational reporting we decided to "pretend" that we have a monolith and give analysts a single database that contains all the data they need.

In general, we considered different options. We could have built a full-fledged data warehouse, and we even tried, but, to be honest, we could not reconcile fairly frequent changes in logic with the rather slow process of building a warehouse and making changes to it (if anyone has succeeded, write in the comments how). We could have told the analysts: "Guys, learn python and go to the analytical replicas," but that is an additional hiring requirement, and it seemed worth avoiding if possible. We decided to try FDW (Foreign Data Wrapper) technology: in essence, it is a standard dblink that is part of the SQL standard, but with a much more convenient interface. We built a solution on top of it, the solution eventually took root, and we settled on it. Its details are a topic for a separate article, and perhaps more than one, since there is a lot I want to talk about: from database schema synchronization to access control and depersonalization of personal data. It should also be noted that this solution does not replace a real analytical database and warehouse; it solves only one specific problem.

At a high level, it looks like this:

There is a PostgreSQL database where users can store their working data and, most importantly, the analytical replicas of all services are connected to this database via FDW. This makes it possible to write a query across several databases, and it does not matter which ones: PostgreSQL, MySQL, MongoDB, or something else (a file, an API, and if there is suddenly no suitable wrapper, you can write your own). Well, everything seems great! Are we done here?
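
As a rough illustration of how such a hub database might be wired to one service replica with postgres_fdw, here is a minimal sketch. The server name, host, database, roles, and password are all hypothetical placeholders, and the remote schema name public is an assumption; only the local schema name fdw_schema is borrowed from the queries later in this article. The local role analyst is assumed to exist already.

```sql
-- Sketch only: every name and credential below is a hypothetical placeholder.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- One foreign server per service replica.
CREATE SERVER service_a_replica
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'service-a-replica.internal', port '5432', dbname 'service_a');

-- Map the local analyst role to a read-only role on the remote side.
CREATE USER MAPPING FOR analyst
    SERVER service_a_replica
    OPTIONS (user 'analytics_ro', password 'change-me');

-- Expose the remote tables locally as foreign tables.
CREATE SCHEMA IF NOT EXISTS fdw_schema;
IMPORT FOREIGN SCHEMA public
    FROM SERVER service_a_replica
    INTO fdw_schema;
```

Repeating the server, user mapping, and import steps for each service gives analysts one place to query, which is the "pretend monolith" idea described above.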

If everything had ended that quickly and simply, this article would probably not exist.

It is important to be clear about how postgres handles queries to remote servers. It seems logical, but people often overlook it: postgres splits the query into parts that are executed independently on the remote servers, collects that data, and performs the final computation itself, so the speed of a query depends heavily on how it is written. Note also that once data arrives from a remote server it no longer has indexes, and there is nothing to help the planner, so only we ourselves can help it with hints. That is what I want to talk about in more detail.

A simple query and its plan

To show how Postgres queries a table of 6 million rows on a remote server, let's look at a simple plan.

explain analyze verbose  
SELECT count(1)
FROM fdw_schema.table;

Aggregate  (cost=418383.23..418383.24 rows=1 width=8) (actual time=3857.198..3857.198 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0) (actual time=4.874..3256.511 rows=6406868 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
Planning time: 0.986 ms
Execution time: 3857.436 ms

The VERBOSE keyword lets us see the query that will be sent to the remote server, and whose results we will receive for further processing (the Remote SQL line).

Let's go a little further and add several filters to our query: one on a boolean field, one on a timestamp falling within an interval, and one on jsonb.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=577487.69..577487.70 rows=1 width=8) (actual time=27473.818..27473.819 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0) (actual time=31.369..25372.466 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
Planning time: 0.665 ms
Execution time: 27474.118 ms

This is the point you need to pay attention to when writing queries. The filters were not passed to the remote server, which means that to execute the query postgres pulls all 6 million rows in order to filter them locally (the Filter line) and aggregate afterwards. The key to success is to write the query so that the filters are passed to the remote machine, and we receive and aggregate only the rows we need.
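
One cheap way to check this before paying for a full run is plain EXPLAIN with VERBOSE, which plans the query without executing it. A sketch, reusing the table from the examples above: conditions that show up inside Remote SQL are evaluated on the remote server, while anything listed under Filter on the Foreign Scan node is applied locally after the rows have been transferred.

```sql
-- No ANALYZE: the query is planned but not executed, so this returns quickly.
EXPLAIN (VERBOSE)
SELECT count(1)
FROM fdw_schema.table
WHERE is_active IS TRUE
  AND meta->>'source' = 'test';
-- In the output, compare the "Filter:" line (local work)
-- with the WHERE clause of "Remote SQL:" (remote work).
```

This shows where each condition lands but not how long it takes; timings still require EXPLAIN ANALYZE as in the plans in this article.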

This is some booleanshit

With boolean fields, everything is simple. In the original query, the problem was caused by the operator is. If we replace it with =, we get the following result:

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table
WHERE is_active = True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=508010.14..508010.15 rows=1 width=8) (actual time=19064.314..19064.314 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..507988.44 rows=8679 width=0) (actual time=33.035..18951.278 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: ((("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 3567989
        Remote SQL: SELECT created_dt, meta FROM fdw_schema.table WHERE (is_active)
Planning time: 0.834 ms
Execution time: 19064.534 ms

As you can see, the filter flew off to the remote server, and execution time dropped from 27 to 19 seconds.

Note that the operator is differs from the operator = in that it can work with Null values. This means that is not True in a filter keeps both False and Null values, whereas != True keeps only False values. Therefore, when replacing the operator is not, you should pass two conditions joined with OR to the filter, for example WHERE (col != True) OR (col is null).
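
The difference is easy to see on three literal rows. The following sketch needs no foreign tables; the column name col and the output aliases are made up for illustration.

```sql
-- Self-contained: runs on any PostgreSQL 9.4+ instance.
WITH t(col) AS (
    VALUES (TRUE), (FALSE), (NULL::boolean)
)
SELECT
    count(*) FILTER (WHERE col IS NOT TRUE)            AS is_not_true, -- FALSE and NULL
    count(*) FILTER (WHERE col != TRUE)                AS not_equal,   -- FALSE only
    count(*) FILTER (WHERE col != TRUE OR col IS NULL) AS rewritten    -- FALSE and NULL
FROM t;
-- Result: is_not_true = 2, not_equal = 1, rewritten = 2
```

The rewritten form returns the same rows as is not True while using only = style comparisons and an is null test.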

Boolean is sorted out, so let's move on. In the meantime, let's return the boolean filter to its original form so that we can look at the effect of the other changes independently.

timestamptz? Who knows

In general, you often have to experiment to find the right way to write a query that involves remote servers, and only then look for an explanation of why it behaves that way. Very little information about this can be found on the Internet. In our experiments we found that a filter on a fixed date flies off to the remote server with a bang, but when we want to set the date dynamically, for example with now() or CURRENT_DATE, this does not happen. In our example, we added a filter so that the created_dt column covers exactly one month in the past (BETWEEN CURRENT_DATE - INTERVAL '7 month' AND CURRENT_DATE - INTERVAL '6 month'). What did we do in this case?

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta->>'source' = 'test';

Aggregate  (cost=306875.17..306875.18 rows=1 width=8) (actual time=4789.114..4789.115 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..306874.86 rows=105 width=0) (actual time=23.475..4681.419 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 76934
        Remote SQL: SELECT is_active, meta FROM fdw_schema.table WHERE ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone))
Planning time: 0.703 ms
Execution time: 4789.379 ms

We hinted to the planner that it should compute the date in advance in a subquery and pass the already prepared value to the filter. This hint gave us an excellent result: the query became almost 6 times faster!

Again, it is important to be careful here: the data type in the subquery must be the same as that of the field we filter on. Otherwise the planner will decide that, since the types differ, it must first fetch all the data and filter it locally.
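
To make the type caveat concrete, here is a sketch of the two variants side by side. It assumes, as the plans above indicate, that created_dt is timestamp with time zone. In PostgreSQL, date - interval yields timestamp without time zone, so dropping the cast changes the type of the subquery result. The second variant is shown only to illustrate the caveat; what its plan looks like on a given server version is something to verify rather than take on faith.

```sql
-- Types match: the subquery returns timestamptz, the same type as created_dt.
-- This is the form used above, whose bounds appear in Remote SQL as parameters.
EXPLAIN (VERBOSE)
SELECT count(1)
FROM fdw_schema.table
WHERE created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month')
  AND created_dt <  (SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month');

-- Types differ: without the cast the subquery returns timestamp (no time zone).
-- Per the caveat above, expect these conditions under "Filter:" instead.
EXPLAIN (VERBOSE)
SELECT count(1)
FROM fdw_schema.table
WHERE created_dt >= (SELECT CURRENT_DATE - INTERVAL '7 month')
  AND created_dt <  (SELECT CURRENT_DATE - INTERVAL '6 month');

-- Quick local check of the two result types (no foreign table needed):
SELECT pg_typeof(CURRENT_DATE::timestamptz - INTERVAL '7 month') AS with_cast,
       pg_typeof(CURRENT_DATE - INTERVAL '7 month')              AS without_cast;
```

The last statement reports timestamp with time zone for the first column and timestamp without time zone for the second, which is the mismatch the paragraph above warns about.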

Let's return the date filter to its original form.

Freddy vs. jsonb

In general, boolean fields and dates have already sped up our query quite a bit, but there was one more data type left. The battle over filtering on it is, to be honest, not over yet, although there has been some success here too. So, here is how we managed to pass a filter on a jsonb field to the remote server.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=245463.60..245463.61 rows=1 width=8) (actual time=6727.589..6727.590 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=1100.00..245459.90 rows=1478 width=0) (actual time=16.213..6634.794 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 619961
        Remote SQL: SELECT created_dt, is_active FROM fdw_schema.table WHERE ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.747 ms
Execution time: 6727.815 ms

Instead of the filtering operators, you need to use the operator that tests whether one jsonb contains another (@>). That gives 7 seconds instead of the original 27. So far this is the only successful way we have found to push jsonb filters to the remote server, but there is an important limitation to keep in mind: we use version 9.6 of the database, and by the end of April we plan to finish the final tests and move to version 12. Once we upgrade, we will write about how it affected things, because there are many changes we have high hopes for: json_path, the new CTE behavior, push down (available since version 10). I really want to try it all as soon as possible.
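
Before swapping one operator for the other, it helps to confirm that they select the same rows for your data. The sketch below compares them on a few literal values and needs no foreign tables; the sample documents and output aliases are made up for illustration.

```sql
-- Self-contained: runs on any PostgreSQL instance with jsonb (9.4+).
WITH t(meta) AS (
    VALUES ('{"source": "test"}'::jsonb),
           ('{"source": "prod"}'::jsonb),
           ('{"source": 1}'::jsonb),
           ('{"other": "test"}'::jsonb)
)
SELECT meta,
       meta->>'source' = 'test'            AS text_equals, -- compares extracted text
       meta @> '{"source": "test"}'::jsonb AS contains     -- compares JSON values
FROM t;
```

In a WHERE clause both conditions keep only the first row here. They are not interchangeable in every case, though: ->> compares the text form of the value, so a numeric 1 matches ->> 'source' = '1' but does not match @> '{"source": "1"}', which looks for the JSON string "1". A missing key also gives NULL for the ->> comparison and false for @>.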

Finish him

We have looked at how each change affects the speed of the query on its own. Now let's see what happens when all three filters are written correctly.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=322041.51..322041.52 rows=1 width=8) (actual time=2278.867..2278.867 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..322041.41 rows=25 width=0) (actual time=8.597..2153.809 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table WHERE (is_active) AND ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone)) AND ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.820 ms
Execution time: 2279.087 ms

Yes, the query looks more complicated; that is the price we are forced to pay. But execution takes 2 seconds, which is more than 10 times faster! And we are talking about a simple query on a relatively small data set. On real queries, we got speedups of up to several hundred times.

To sum up: if you use PostgreSQL with FDW, always check that all filters are sent to the remote server, and you will be happy... At least until you get to joins between tables from different servers. But that is a story for another article.

Thank you for your attention! I would be glad to hear questions, comments, and stories about your own experience in the comments.

Source: www.habr.com
