Operational analytics in a microservice architecture: helping and advising Postgres FDW

Microservice architecture, like everything in this world, has its pros and cons. Some processes become easier with it, others become harder. And for the sake of faster change and better scalability, you have to make sacrifices. One of them is the growing complexity of analytics. In a monolith, all operational analytics can be reduced to SQL queries against an analytical replica, but in a multiservice architecture every service has its own database, and it seems that a single query can no longer do the job (or can it?). If you are interested in how we solved the operational analytics problem in our company and how we learned to live with that solution, welcome.

My name is Pavel Sivash. At DomClick I work in the team responsible for maintaining the analytical data warehouse. Conventionally, our work can be classified as data engineering, but in reality the range of tasks is much broader: there is standard ETL/ELT data engineering, support and adaptation of data analysis tools, and development of our own tools. In particular, for operational reporting we decided to "pretend" that we have a monolith and give analysts a single database containing all the data they need.

We considered several options. We could have built a full-fledged warehouse; we even tried, but honestly, we could not reconcile fairly frequent changes in business logic with the rather slow process of building a warehouse and making changes to it (if someone has managed it, tell us how in the comments). We could have told the analysts: "Guys, learn Python and go to the analytical replicas," but that adds a hiring requirement, and it seemed best to avoid it if possible. We decided to try FDW (Foreign Data Wrapper) technology: essentially, it is a dblink-like mechanism that is part of the SQL standard (SQL/MED), but with a much more convenient interface. We built a solution on top of it, the solution eventually caught on, and we settled on it. Its details deserve a separate article, maybe more than one, because there is a lot to talk about: from synchronizing database schemas to access control and anonymization of personal data. It is also worth stressing that this solution is not a replacement for real analytical databases and warehouses; it only solves one specific problem.
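To make the idea more concrete, here is a minimal Python sketch that renders the kind of postgres_fdw DDL such a setup relies on: one foreign server per service replica, a user mapping, and IMPORT FOREIGN SCHEMA into a local schema. It is an illustration under assumptions, not DomClick's actual configuration: the ServiceReplica class, the fdw_ddl function, the service names, hosts, roles and the `_srv`/`_fdw` naming scheme are all hypothetical, and the password is deliberately left as a placeholder.

```python
from dataclasses import dataclass


@dataclass
class ServiceReplica:
    # Hypothetical description of one microservice's analytical replica.
    name: str            # used to derive the foreign server and local schema names
    host: str
    port: int
    dbname: str
    remote_schema: str = "public"


def fdw_ddl(replicas: list[ServiceReplica], local_role: str, remote_user: str) -> str:
    """Render postgres_fdw DDL that exposes each replica as a local schema."""
    statements = ["CREATE EXTENSION IF NOT EXISTS postgres_fdw;"]
    for r in replicas:
        server = f"{r.name}_srv"        # hypothetical naming scheme
        local_schema = f"{r.name}_fdw"  # hypothetical naming scheme
        statements += [
            # postgres_fdw server options are passed as string literals.
            f"CREATE SERVER {server} FOREIGN DATA WRAPPER postgres_fdw "
            f"OPTIONS (host '{r.host}', port '{r.port}', dbname '{r.dbname}');",
            # The password is a placeholder on purpose.
            f"CREATE USER MAPPING FOR {local_role} SERVER {server} "
            f"OPTIONS (user '{remote_user}', password '<secret>');",
            f"CREATE SCHEMA IF NOT EXISTS {local_schema};",
            f"IMPORT FOREIGN SCHEMA {r.remote_schema} FROM SERVER {server} INTO {local_schema};",
        ]
    return "\n".join(statements)


if __name__ == "__main__":
    services = [
        ServiceReplica("orders", "orders-replica.internal", 5432, "orders"),
        ServiceReplica("users", "users-replica.internal", 5432, "users"),
    ]
    print(fdw_ddl(services, local_role="analyst", remote_user="readonly"))
```

The generated statements would be run once in the central analytics database. Keeping the imported foreign tables in sync with schema changes in the services is a separate problem, which the author plans to cover in another article.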

At a high level, it looks like this:

There is a PostgreSQL database where users can store their working data and, most importantly, the analytical replicas of all services are connected to this database via FDW. This makes it possible to write a single query across several databases, no matter what they are: PostgreSQL, MySQL, MongoDB or something else (a file, an API; if no suitable wrapper exists, you can write your own). Everything looks great! Are we done?

If everything ended that quickly and easily, there would probably be no article.

It is important to be clear about how Postgres processes queries against remote servers. It seems obvious, but people often overlook it: Postgres splits the query into parts that are executed independently on the remote servers, collects the resulting data, and performs the final computations itself, so query speed depends heavily on how the query is written. Note also that once data arrives from a remote server, it no longer has indexes, and nothing will help the planner; only we can help and advise it. That is exactly what I want to talk about in more detail.
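The split described above can be illustrated with a toy Python model: conditions that are pushed down are evaluated "on the remote side" and decide how many rows are transferred, while the remaining conditions are applied locally to rows that have already been shipped. The foreign_count function and the data are hypothetical and only model the idea; the real postgres_fdw planning is far more involved.

```python
from typing import Callable

Row = dict
Predicate = Callable[[Row], bool]


def foreign_count(remote_rows: list[Row],
                  pushed: list[Predicate],
                  local: list[Predicate]) -> tuple[int, int]:
    """Return (count(*) result, number of rows transferred over the network)."""
    # Pushed-down conditions are evaluated by the remote server...
    transferred = [r for r in remote_rows if all(p(r) for p in pushed)]
    # ...while the rest can only be applied after the rows have arrived.
    kept = [r for r in transferred if all(p(r) for p in local)]
    return len(kept), len(transferred)


rows = [{"id": i, "is_active": i % 4 == 0} for i in range(1_000)]


def is_active(r: Row) -> bool:
    return r["is_active"]


print(foreign_count(rows, pushed=[is_active], local=[]))  # (250, 250)
print(foreign_count(rows, pushed=[], local=[is_active]))  # (250, 1000)
```

The answer is the same either way; only the amount of data moved differs, and that difference is what the rest of this article is about.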

A simple query and its plan

To show how Postgres queries a 6-million-row table on a remote server, let's look at a simple plan.

explain analyze verbose  
SELECT count(1)
FROM fdw_schema.table;

Aggregate  (cost=418383.23..418383.24 rows=1 width=8) (actual time=3857.198..3857.198 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0) (actual time=4.874..3256.511 rows=6406868 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
Planning time: 0.986 ms
Execution time: 3857.436 ms

The VERBOSE option lets us see the query that will be sent to the remote server and whose results we will receive for further processing (the Remote SQL line).
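When comparing many plans, it can help to pull out the Remote SQL lines programmatically. A minimal Python sketch, assuming you already have the EXPLAIN VERBOSE output as plain text (for example, copied from psql); the remote_sql function is our own illustration, not part of any library.

```python
import re

# postgres_fdw labels the query it ships with "Remote SQL:" in EXPLAIN VERBOSE text output.
REMOTE_SQL = re.compile(r"^\s*Remote SQL:\s*(.+?)\s*$", re.MULTILINE)


def remote_sql(plan_text: str) -> list[str]:
    """Return every query the plan sends to a remote server."""
    return REMOTE_SQL.findall(plan_text)


# The first plan from this article (actual times omitted).
plan = """\
Aggregate  (cost=418383.23..418383.24 rows=1 width=8)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
"""
print(remote_sql(plan))  # ['SELECT NULL FROM fdw_schema.table']
```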

Let's go a bit further and add several filters to our query: one on a boolean field, one on a timestamp falling within an interval, and one on jsonb.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=577487.69..577487.70 rows=1 width=8) (actual time=27473.818..25473.819 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0) (actual time=31.369..25372.466 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
Planning time: 0.665 ms
Execution time: 27474.118 ms

This is the point you need to keep in mind when writing queries. The filters were not passed to the remote server, which means that to execute the query, Postgres pulls all 6 million rows in order to filter them locally (the Filter line) and then aggregate them. The key to success is writing the query so that the filters are passed to the remote machine, and we receive and aggregate only the rows we actually need.
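In a plan, this shows up as a Filter line (usually together with Rows Removed by Filter) under a Foreign Scan node: the condition was evaluated locally, after the rows had already been transferred. A rough Python sketch that flags such nodes in the plain-text output of EXPLAIN (ANALYZE, VERBOSE); the ForeignScanReport class and foreign_scan_reports function are hypothetical, and treating every line that contains "(cost=" as the start of a new plan node is a simplification that works for plans like the ones in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForeignScanReport:
    relation: str
    local_filter: Optional[str] = None  # condition evaluated locally, if any
    rows_removed: int = 0               # rows transferred only to be discarded


def foreign_scan_reports(plan_text: str) -> list[ForeignScanReport]:
    """Collect Filter / Rows Removed lines that belong to Foreign Scan nodes."""
    reports: list[ForeignScanReport] = []
    current: Optional[ForeignScanReport] = None
    for line in plan_text.splitlines():
        text = line.strip()
        if "(cost=" in text:
            # Every plan node line carries a cost estimate: a new node starts here.
            node = text.split("(cost=")[0].strip().removeprefix("->").strip()
            if node.startswith("Foreign Scan on "):
                current = ForeignScanReport(node[len("Foreign Scan on "):])
                reports.append(current)
            else:
                current = None
        elif current is not None and text.startswith("Filter:"):
            current.local_filter = text[len("Filter:"):].strip()
        elif current is not None and text.startswith("Rows Removed by Filter:"):
            current.rows_removed = int(text.split(":", 1)[1])
    return reports


# Abridged from the second plan above (Filter shortened, actual times omitted).
plan = """\
Aggregate  (cost=577487.69..577487.70 rows=1 width=8)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0)
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
"""

for r in foreign_scan_reports(plan):
    if r.local_filter:
        print(f"{r.relation}: filtered locally, {r.rows_removed} rows fetched for nothing")
```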

That's booleanshit

With boolean fields, everything is simple. In the original query, the problem was caused by the IS operator. If we replace it with =, we get the following result:

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table
WHERE is_active = True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=508010.14..508010.15 rows=1 width=8) (actual time=19064.314..19064.314 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..507988.44 rows=8679 width=0) (actual time=33.035..18951.278 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: ((("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 3567989
        Remote SQL: SELECT created_dt, meta FROM fdw_schema.table WHERE (is_active)
Planning time: 0.834 ms
Execution time: 19064.534 ms

As you can see, the filter flew to the remote server, and the execution time dropped from 27 to 19 seconds.

It is important to note that the IS operator differs from = because it can handle NULL values. This means that IS NOT TRUE keeps both false and NULL values in the filter, whereas != TRUE keeps only false values. Therefore, when replacing IS NOT, you need to pass two conditions joined with OR to the filter, for example, WHERE (col != True) OR (col IS NULL).
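The difference between IS NOT TRUE and != TRUE comes from SQL's three-valued logic. The following Python sketch models it, with None standing in for SQL NULL; it is a toy model of the semantics, not PostgreSQL code, and all function names are our own.

```python
from typing import Optional

# An SQL boolean is TRUE, FALSE or NULL (modelled as None).
SqlBool = Optional[bool]


def sql_ne(a: SqlBool, b: SqlBool) -> SqlBool:
    """a != b: any comparison involving NULL yields NULL."""
    if a is None or b is None:
        return None
    return a != b


def is_not_true(a: SqlBool) -> bool:
    """a IS NOT TRUE: never NULL; NULL counts as 'not true'."""
    return a is not True


def sql_or(a: SqlBool, b: SqlBool) -> SqlBool:
    """Three-valued OR."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def where(rows, predicate):
    """WHERE keeps a row only if the predicate is exactly TRUE."""
    return [r for r in rows if predicate(r) is True]


rows = [True, False, None]
print(where(rows, is_not_true))                                    # [False, None]
print(where(rows, lambda v: sql_ne(v, True)))                      # [False]
print(where(rows, lambda v: sql_or(sql_ne(v, True), v is None)))   # [False, None]
```

The last line models WHERE (col != True) OR (col IS NULL): it selects the same rows as IS NOT TRUE while using only the = / != style comparison that was pushed down above.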

We have dealt with booleans, so let's move on. For now, let's revert the boolean filter to its original form so that we can consider the effect of the other changes independently.

timestamptz? Dunno

In general, you often have to experiment to find the right way to write a query involving remote servers, and only then look for an explanation of why it works. Very little information about this can be found online. In our experiments we found that a filter on a fixed date flies to the remote server without a hitch, but when we want to set the date dynamically, for example with now() or CURRENT_DATE, this does not happen. In our example, we added a filter so that the created_dt column contains data for exactly one month in the past (BETWEEN CURRENT_DATE - INTERVAL '7 month' AND CURRENT_DATE - INTERVAL '6 month'). What did we do in this case?

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta->>'source' = 'test';

Aggregate  (cost=306875.17..306875.18 rows=1 width=8) (actual time=4789.114..4789.115 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..306874.86 rows=105 width=0) (actual time=23.475..4681.419 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 76934
        Remote SQL: SELECT is_active, meta FROM fdw_schema.table WHERE ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone))
Planning time: 0.703 ms
Execution time: 4789.379 ms

We told the planner to compute the date in a subquery beforehand and pass the ready-made value to the filter. This hint gave us an excellent result: the query became almost 6 times faster!

Again, it is important to note: the data type in the subquery must match the type of the field we filter on; otherwise the planner will decide that, since the types differ, it must first fetch all the data and filter it locally.
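Since a filter on a fixed date is pushed down without trouble, another option (our assumption, not what the authors describe) is to compute the bounds on the client and embed them as timestamptz literals, keeping the type identical to the column's as required above. A minimal Python sketch: the hypothetical month_shift helper mimics PostgreSQL's behaviour of clamping to the last day of a shorter month when subtracting an interval in months, and window_filter is likewise a hypothetical name.

```python
import calendar
from datetime import date


def month_shift(d: date, months: int) -> date:
    """Shift d by `months` (negative = back), clamping the day like PostgreSQL."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def window_filter(column: str, today: date, start_back: int, end_back: int) -> str:
    """Half-open window [today - start_back months, today - end_back months)."""
    lo = month_shift(today, -start_back)
    hi = month_shift(today, -end_back)
    # A date-only literal cast to timestamptz means midnight in the session
    # time zone, like CURRENT_DATE::timestamptz in the article's subqueries.
    return (f"{column} >= '{lo.isoformat()}'::timestamptz "
            f"AND {column} < '{hi.isoformat()}'::timestamptz")


print(window_filter("created_dt", date(2020, 3, 31), 7, 6))
```

This moves the notion of "today" to the client, so the client's calendar must agree with the database session's time zone; the subquery approach used in the article avoids that concern entirely.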

Let's revert the date filter to its original form.

Freddy vs. Jsonb

Boolean fields and dates had already sped up our query considerably, but one more data type remained. Honestly, the fight to filter on it is not over yet, although there has been some success. Here is how we managed to pass a filter on a jsonb field to the remote server.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=245463.60..245463.61 rows=1 width=8) (actual time=6727.589..6727.590 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=1100.00..245459.90 rows=1478 width=0) (actual time=16.213..6634.794 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 619961
        Remote SQL: SELECT created_dt, is_active FROM fdw_schema.table WHERE ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.747 ms
Execution time: 6727.815 ms

Instead of the filtering operators, you need to use the operator that checks whether one jsonb value contains another (@>). That gives about 7 seconds instead of the original 27. So far this is the only successful way we have found to pass jsonb filters to the remote server, but there is one limitation to keep in mind: we use version 9.6 of the database, and by the end of April we plan to finish the final tests and move to version 12. Once we upgrade, we will write about how it affected things, because many of the changes look very promising: json_path, the new CTE behaviour, push down (available since version 10). I really want to try them soon.
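Because only the containment form was pushed down in the authors' setup (PostgreSQL 9.6), it can be convenient to generate such conditions rather than write them by hand. A hedged Python sketch; jsonb_contains, arrow_text_eq and contains_string are hypothetical helpers, and the last two are toy models of the two filters, not PostgreSQL code. They illustrate one semantic difference worth knowing before rewriting filters: meta->>'source' = '...' compares the value's text form, while @> with a JSON string only matches a JSON string.

```python
import json


def jsonb_contains(column: str, conditions: dict) -> str:
    """Build `column @> '<json>'::jsonb`, the form that reached Remote SQL above."""
    literal = json.dumps(conditions, ensure_ascii=False)
    # Double single quotes so the JSON text can sit inside an SQL string literal.
    literal = literal.replace("'", "''")
    return f"{column} @> '{literal}'::jsonb"


def arrow_text_eq(doc: dict, key: str, text: str) -> bool:
    """Toy model of doc->>key = text: compares the value's text form."""
    value = doc.get(key)
    if value is None:  # missing key or JSON null -> NULL -> row filtered out
        return False
    as_text = value if isinstance(value, str) else json.dumps(value)
    return as_text == text


def contains_string(doc: dict, key: str, text: str) -> bool:
    """Toy model of doc @> {key: text}: requires a JSON string with that value."""
    value = doc.get(key)
    return isinstance(value, str) and value == text


print(jsonb_contains("meta", {"source": "test"}))  # meta @> '{"source": "test"}'::jsonb
```

For string values, as in the article, both forms select the same rows; they diverge for documents such as {"source": 1}, where meta->>'source' = '1' matches but meta @> '{"source": "1"}' does not.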

Putting it all together

We have examined how each change affects query speed on its own. Now let's see what happens when all three filters are written correctly.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=322041.51..322041.52 rows=1 width=8) (actual time=2278.867..2278.867 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..322041.41 rows=25 width=0) (actual time=8.597..2153.809 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table WHERE (is_active) AND ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone)) AND ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.820 ms
Execution time: 2279.087 ms

Yes, the query looks more complicated; that is the price you have to pay. But it runs in 2 seconds, more than 10 times faster! And this is a simple query against a relatively small dataset. On real queries, we have seen speedups of up to several hundred times.
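The speedups quoted in this article (27 to 19 seconds, almost 6 times, more than 10 times) can be checked against the Execution time lines of the plans above. A small Python calculation; the timings are copied from those plans, and the labels are ours.

```python
# Execution times in milliseconds, copied from the EXPLAIN ANALYZE output above.
baseline_ms = 27474.118  # IS TRUE, CURRENT_DATE arithmetic, ->> on jsonb

variants_ms = {
    "boolean written as = True": 19064.534,
    "dates computed in subqueries": 4789.379,
    "jsonb filter via @>": 6727.815,
    "all three rewrites": 2279.087,
}

for label, ms in variants_ms.items():
    # Speedup relative to the unoptimized query with all three filters.
    print(f"{label}: {ms / 1000:.1f} s, {baseline_ms / ms:.1f}x faster")
```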

To sum up: if you use PostgreSQL with FDW, always check that all filters are sent to the remote server, and you will be happy... But that is a story for another article.

Thank you for your attention! I would love to hear your questions, comments and stories about your own experience in the comments.

Source: www.habr.com
