Operational analytics in a microservice architecture: helping and speeding up Postgres FDW

Microservice architecture, like everything else in this world, has its pros and cons. Some processes get easier with it, others get harder. For the sake of faster change and better scalability, you have to make sacrifices, and one of them is the growing complexity of analytics. In a monolith, all operational analytics can be reduced to SQL queries against an analytical replica. In a multi-service architecture, every service has its own database, and it seems that a single query cannot be written (or maybe it can?). If you are curious how we solved the problem of operational analytics at our company and how we learned to live with that solution, read on.

My name is Pavel Sivash. At DomClick I work in the team responsible for maintaining the analytical data warehouse. Conventionally, our work can be classified as data engineering, but in practice the range of tasks is much broader. There are standard data-engineering ETL/ELT tasks, support and adaptation of data-analysis tools, and development of our own tools. In particular, for operational reporting we decided to "pretend" that we have a monolith and give analysts a single database containing all the data they need.

We considered several options. We could have built a full-fledged warehouse, and we even tried. To be honest, though, we could not reconcile fairly frequent changes in business logic with the rather slow process of building a warehouse and changing it (if anyone has managed it, tell us how in the comments). We could have told the analysts: "Guys, learn Python and go to the analytical replicas." That would have added an extra hiring requirement, and it seemed better to avoid that if possible.

We decided to try FDW (Foreign Data Wrapper) technology. FDW is PostgreSQL's implementation of the SQL/MED part of the SQL standard. It essentially does the same job as dblink, but with a much more convenient interface. We built a solution on top of it that eventually took hold, and we settled on it. Its details deserve a separate article, possibly more than one, since there is a lot to cover: from synchronizing database schemas to access control and depersonalization of personal data. It is also worth stating that this solution does not replace real analytical databases and warehouses; it solves one specific problem.

At a high level, it looks like this:

There is a PostgreSQL database where users can store their working data. Most importantly, the analytical replicas of all services are connected to this database via FDW. This makes it possible to write one query across several databases, regardless of what they are: PostgreSQL, MySQL, MongoDB or something else (a file, an API; if no suitable wrapper exists, you can write your own). Well, everything looks great! Shall we wrap up?
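
For readers who have not set up FDW before, here is a minimal sketch of connecting one service's analytical replica to the central database with the standard postgres_fdw extension. The server name orders_replica, the host, database name and credentials are hypothetical placeholders, not our actual configuration. MySQL, MongoDB and other sources need their own wrapper extensions, each with its own options.

```sql
-- Load the PostgreSQL-to-PostgreSQL wrapper (shipped in PostgreSQL contrib).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Hypothetical analytical replica of one microservice.
CREATE SERVER orders_replica
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'orders-replica.internal', port '5432', dbname 'orders');

-- Credentials used when the current user queries the remote server (placeholders).
CREATE USER MAPPING FOR CURRENT_USER
    SERVER orders_replica
    OPTIONS (user 'analytics_ro', password 'change-me');

-- Local schema that will hold the foreign table definitions.
CREATE SCHEMA IF NOT EXISTS fdw_schema;

-- Create a foreign table for every table in the remote "public" schema
-- (IMPORT FOREIGN SCHEMA is available since PostgreSQL 9.5).
IMPORT FOREIGN SCHEMA public
    FROM SERVER orders_replica
    INTO fdw_schema;
```

IMPORT FOREIGN SCHEMA saves you from writing CREATE FOREIGN TABLE by hand for every table. However, it only takes a snapshot of the remote definitions, so later schema changes on the service side still have to be synchronized separately.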

If everything ended that quickly and simply, there would probably be no article.

It is important to understand how Postgres processes queries against remote servers. This seems obvious, but people often overlook it. Postgres splits the query into parts that are executed independently on the remote servers, collects the resulting data and performs the final computation itself. As a result, query speed depends heavily on how the query is written. Note also that once data has arrived from a remote server, it no longer has any indexes, and nothing will help the planner. Only we can help it and give it hints, and that is what I want to discuss in detail.

A simple query and its plan

To show how Postgres queries a 6-million-row table on a remote server, let's look at a simple plan.

explain analyze verbose  
SELECT count(1)
FROM fdw_schema.table;

Aggregate  (cost=418383.23..418383.24 rows=1 width=8) (actual time=3857.198..3857.198 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0) (actual time=4.874..3256.511 rows=6406868 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
Planning time: 0.986 ms
Execution time: 3857.436 ms

The VERBOSE option lets us see the query that will be sent to the remote server, whose results we will receive for further processing (the Remote SQL line).

Let's go further and add a few filters to our query: one on a boolean field, one on a timestamp range and one on a jsonb field.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=577487.69..577487.70 rows=1 width=8) (actual time=27473.818..25473.819 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0) (actual time=31.369..25372.466 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
Planning time: 0.665 ms
Execution time: 27474.118 ms

This is the point to watch when writing queries. The filters were not passed to the remote server. To execute the query, Postgres therefore pulls all 6 million rows, filters them locally (the Filter line) and only then aggregates. The key to good performance is writing the query so that the filters are passed to the remote machine, and we receive and aggregate only the rows we need.
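
Plain EXPLAIN VERBOSE, without ANALYZE, already prints the Remote SQL and any local Filter lines without running the query. Checking for pushdown therefore does not have to cost 27 seconds each time.

Below is a sketch of a small PL/pgSQL helper that returns the Filter lines from a query's plan. If it returns a row for a query over a foreign table, some condition is being evaluated locally. The function name local_filters is hypothetical, and the helper is deliberately crude: it reports every Filter line in the plan, including ones on ordinary local tables, so read its output together with the full plan.

```sql
-- Hypothetical helper: list the "Filter:" lines from a query's plan.
-- EXPLAIN without ANALYZE only plans the query; no rows are fetched.
CREATE OR REPLACE FUNCTION local_filters(query text)
RETURNS SETOF text
LANGUAGE plpgsql AS $$
DECLARE
    plan_line text;
BEGIN
    FOR plan_line IN EXECUTE 'EXPLAIN (VERBOSE, COSTS OFF) ' || query LOOP
        -- Matches "Filter:" lines, but not "Join Filter:" or "Remote SQL:".
        IF plan_line ~ '^\s*Filter:' THEN
            RETURN NEXT plan_line;
        END IF;
    END LOOP;
END;
$$;
```

For example, passing the filtered count(1) query above to local_filters should return exactly the Filter line shown in its plan. An empty result means every condition left the local server. For a foreign scan, that condition then appears in Remote SQL.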

It's booleanshit

With boolean fields, everything is simple. In the original query, the problem was caused by the is operator. If we replace it with =, we get the following result:

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table
WHERE is_active = True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=508010.14..508010.15 rows=1 width=8) (actual time=19064.314..19064.314 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..507988.44 rows=8679 width=0) (actual time=33.035..18951.278 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: ((("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 3567989
        Remote SQL: SELECT created_dt, meta FROM fdw_schema.table WHERE (is_active)
Planning time: 0.834 ms
Execution time: 19064.534 ms

As you can see, the filter went to the remote server, and execution time dropped from 27 to 19 seconds.

Note that the is operator differs from = in that it can handle Null values. Comparing Null with = or != yields Null, which WHERE treats as false. As a result, is not True keeps both False and Null values, whereas != True keeps only False values. So when replacing is not, you should pass two conditions joined with OR to the filter, for example WHERE (col != True) OR (col is null).
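
The three-valued logic behind this is easy to check on any PostgreSQL instance. The query below evaluates both predicates, plus the OR rewrite, on true, false and NULL. The rewrite gives the same answer as IS NOT TRUE on every row, while != TRUE alone yields NULL for the NULL row.

```sql
-- Compare the predicates on every possible boolean value.
SELECT v,
       v IS NOT TRUE               AS is_not_true,    -- false, true, true
       v != TRUE                   AS not_equal_true, -- false, true, NULL
       (v != TRUE) OR (v IS NULL)  AS rewritten       -- false, true, true
FROM (VALUES (true), (false), (NULL::boolean)) AS t(v);
```

With the table from this article, the filter becomes WHERE is_active != TRUE OR is_active IS NULL. It is still worth confirming in the Remote SQL line of EXPLAIN VERBOSE that your PostgreSQL version actually ships the whole OR condition.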

That takes care of boolean; let's move on. For now, let's return the boolean filter to its original form so that we can evaluate the other changes independently.

timestamptz? Who knows

In general, you often have to experiment to find the right way to write a query that involves remote servers, and only afterwards look for an explanation of why it works. Very little information on this can be found online. In our experiments, we found that a constant date filter is pushed to the remote server without any trouble. When we want to set the date dynamically, for example with now() or CURRENT_DATE, this does not happen. In our example, we added a filter so that the created_dt column covers exactly one month in the past (BETWEEN CURRENT_DATE - INTERVAL '7 month' AND CURRENT_DATE - INTERVAL '6 month'). What did we do in this case?

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta->>'source' = 'test';

Aggregate  (cost=306875.17..306875.18 rows=1 width=8) (actual time=4789.114..4789.115 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..306874.86 rows=105 width=0) (actual time=23.475..4681.419 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 76934
        Remote SQL: SELECT is_active, meta FROM fdw_schema.table WHERE ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone))
Planning time: 0.703 ms
Execution time: 4789.379 ms

We told the planner to compute the date in a subquery in advance and pass the ready-made value to the filter. This hint gave us a very good result: the query became almost 6 times faster!

Again, be careful here: the data type returned by the subquery must match the type of the field being filtered. Otherwise, the planner will decide that, because the types differ, it must first fetch all the data and then filter it locally.
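
A quick way to see the types involved: subtracting an interval from a date yields timestamp without time zone. That is why the queries above cast CURRENT_DATE to timestamptz before subtracting. The sketch below assumes, as the plans suggest, that created_dt is timestamptz.

Our reading of the postgres_fdw documentation offers a plausible explanation for the whole effect. WHERE clauses are shipped only if they use built-in operators and functions that are also IMMUTABLE. CURRENT_DATE depends on the current time, and a comparison between timestamp and timestamptz depends on the session TimeZone, so both stay local. An uncorrelated subquery, by contrast, is computed once as an InitPlan and sent as a parameter ($1, $2 in the Remote SQL).

```sql
-- What does each bound expression return?
SELECT pg_typeof(CURRENT_DATE - INTERVAL '7 month')              AS date_minus_interval,        -- timestamp without time zone
       pg_typeof(CURRENT_DATE::timestamptz - INTERVAL '7 month') AS timestamptz_minus_interval; -- timestamp with time zone

-- What type is the column we filter on? (Foreign tables are listed too.)
SELECT data_type
FROM information_schema.columns
WHERE table_schema = 'fdw_schema'
  AND table_name   = 'table'
  AND column_name  = 'created_dt';
```

If the two types differ, change the cast inside the subquery rather than casting the column. Casting the column would again turn the condition into a local filter.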

Let's return the date filter to its original form.

Freddy vs. Jsonb

Fixing the boolean and date filters has already sped up our query considerably, but one more data type remains. To be honest, the battle over filtering on it is not over yet, although we have had some success here too. Here is how we managed to pass a filter on a jsonb field to the remote server.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=245463.60..245463.61 rows=1 width=8) (actual time=6727.589..6727.590 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=1100.00..245459.90 rows=1478 width=0) (actual time=16.213..6634.794 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 619961
        Remote SQL: SELECT created_dt, is_active FROM fdw_schema.table WHERE ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.747 ms
Execution time: 6727.815 ms

Instead of extracting a field with ->> and comparing it, use the containment operator @>, which checks whether one jsonb contains another. The result: 7 seconds instead of the original 27. So far, this is the only way we have found to push jsonb filters to the remote server. One limitation matters here: we use database version 9.6, but by the end of April we plan to finish the final tests and move to version 12. Once we have upgraded, we will write about how it affected things. There are quite a few changes we have high hopes for: json_path, the new CTE behavior and push down (available since version 10). I can't wait to try them.
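
Before replacing ->> with @> in existing reports, note that the two filters are not exactly equivalent. ->> compares the text rendering of the value, so a numeric 1 matches the string '1'. @> compares JSON values including their type, and it ignores extra keys in the left-hand document. The checks below illustrate both points.

```sql
SELECT
    ('{"source": "test", "x": 1}'::jsonb ->> 'source') = 'test'  AS extract_match,    -- true
    '{"source": "test", "x": 1}'::jsonb @> '{"source": "test"}'  AS contains_match,   -- true: extra keys are ignored
    ('{"source": 1}'::jsonb ->> 'source') = '1'                  AS extract_number,   -- true: compares the text '1'
    '{"source": 1}'::jsonb @> '{"source": "1"}'                  AS contains_number;  -- false: number 1 is not string "1"
```

A possible side benefit, depending on your data: once the condition runs remotely, a GIN index on the remote meta column can serve @>. Both the default jsonb_ops and the jsonb_path_ops operator classes support it. Such an index does not help ->> comparisons.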

All together

We have tested how each change affects query speed on its own. Now let's see what happens when all three filters are written correctly.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=322041.51..322041.52 rows=1 width=8) (actual time=2278.867..2278.867 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..322041.41 rows=25 width=0) (actual time=8.597..2153.809 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table WHERE (is_active) AND ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone)) AND ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.820 ms
Execution time: 2279.087 ms

Yes, the query looks more complicated; that is the unavoidable price. But it runs in 2 seconds, more than 10 times faster! And this is a simple query against a fairly small data set. On real queries, we have seen speedups of up to several hundred times.

To sum up: if you use PostgreSQL with FDW, always check that all filters are sent to the remote server, and you will be happy... At least until you get to joins between tables from different servers. But that is a story for another article.

Thank you for your attention! I would love to hear your questions, comments and stories about your own experience in the comments.

Source: www.habr.com
