Operational analytics in a microservice architecture: how to help Postgres FDW and give it hints

Microservice architecture, like everything in this world, has its pros and cons. Some processes become easier with it, others become harder. For the sake of faster change and better scalability, you have to make sacrifices, and one of them is the growing complexity of analytics. In a monolith, all operational analytics can be reduced to SQL queries against an analytical replica. In a multi-service architecture, however, every service has its own database, and it seems that a single query is impossible (or is it?). If you are curious how we solved the operational analytics problem at our company and how we learned to live with that solution, read on.

My name is Pavel Sivash. At DomClick I work in the team responsible for maintaining the analytical data warehouse. Our work can loosely be classified as data engineering, but in reality the range of tasks is much broader: there are the standard data engineering ETL/ELT tasks, support and adaptation of data analysis tools, and development of our own tools. In particular, for operational reporting we decided to "pretend" that we have a monolith and give analysts a single database containing all the data they need.

We considered several options. We could have built a full-fledged warehouse, and we even tried, but honestly, we could not reconcile fairly frequent changes in business logic with the rather slow process of building a warehouse and changing it (if anyone has managed this, tell us how in the comments). We could have told the analysts: "Guys, learn Python and go query the analytical replicas," but that adds a hiring requirement, and we felt it should be avoided if possible. We decided to try FDW (Foreign Data Wrapper) technology: essentially a standard dblink that is part of the SQL standard, but with a much more convenient interface. On top of it we built a solution that eventually caught on, and we settled on it. Its details deserve a separate article, maybe more than one, since there is a lot to talk about, from synchronizing database schemas to access control and depersonalization of personal data. I should also stress that this solution is not a replacement for real analytical databases and warehouses; it solves only one specific problem.

At a high level, the setup looks like this.

There is a PostgreSQL database where users can store their working data and, most importantly, the analytical replicas of all services are connected to this database via FDW. This makes it possible to write a single query across several databases, no matter what they are: PostgreSQL, MySQL, MongoDB or something else (a file, an API; if there happens to be no suitable wrapper, you can write your own). Well, everything seems great! Shall we call it a day?
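The article does not show how the replicas are attached. For a PostgreSQL replica, a minimal sketch with the postgres_fdw extension might look like the following; MySQL or MongoDB sources would need their own wrappers. Every server, host, database, role and schema name here is hypothetical, and in practice the credentials would not be written inline like this.

```sql
-- Hypothetical one-time setup on the shared analytics database (run as a superuser).
-- Server, host, database, role and schema names are placeholders, not the authors' setup.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- One foreign server per service replica
CREATE SERVER service_a_replica
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'service-a-replica.internal', port '5432', dbname 'service_a');

-- Credentials used when the local role "analyst" queries the replica
CREATE USER MAPPING FOR analyst
  SERVER service_a_replica
  OPTIONS (user 'readonly_user', password 'change_me');

-- Expose selected remote tables as foreign tables in a local schema
CREATE SCHEMA IF NOT EXISTS fdw_schema;
IMPORT FOREIGN SCHEMA public
  LIMIT TO ("table")
  FROM SERVER service_a_replica
  INTO fdw_schema;
```

After this, analysts query fdw_schema."table" like a local table, and postgres_fdw decides which parts of each query are sent to the replica.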

If everything ended that quickly and simply, there would probably be no article.

It is important to understand how Postgres executes queries against remote servers. This seems obvious, but people often overlook it: Postgres splits the query into parts that are executed independently on the remote servers, collects the resulting data and performs the final computations itself, so the execution speed depends heavily on how the query is written. Keep in mind, too, that once data arrives from a remote server it no longer has any indexes, and there is nothing to help the planner; only we ourselves can help it and give it hints. That is what I want to discuss in detail.

A simple query and its plan

To show how Postgres queries a table of 6 million rows on a remote server, let's look at a simple plan.

explain analyze verbose  
SELECT count(1)
FROM fdw_schema.table;

Aggregate  (cost=418383.23..418383.24 rows=1 width=8) (actual time=3857.198..3857.198 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0) (actual time=4.874..3256.511 rows=6406868 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
Planning time: 0.986 ms
Execution time: 3857.436 ms

The VERBOSE option lets us see the query that is sent to the remote server, whose results we receive for further processing (the Remote SQL line).

Let's go a bit further and add a few filters to our query: one on a boolean field, one on a timestamp range and one on a jsonb field.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=577487.69..577487.70 rows=1 width=8) (actual time=27473.818..27473.819 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0) (actual time=31.369..25372.466 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
Planning time: 0.665 ms
Execution time: 27474.118 ms

This is the point to watch when writing queries. The filters were not passed to the remote server, which means that to execute the query Postgres pulls all 6 million rows, then filters them locally (the Filter line) and performs the aggregation. The key to success is to write queries so that the filters are passed to the remote machine, and we receive and aggregate only the rows we actually need.
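Checking pushdown does not require waiting for the query to finish: EXPLAIN VERBOSE without ANALYZE already shows the Remote SQL and any local Filter. Below is a sketch of a hypothetical helper (fdw_local_filters is our own name, not part of PostgreSQL or postgres_fdw) that plans a query in JSON format and lists every Foreign Scan that still filters rows locally. It assumes the query text comes from a trusted user, since it is concatenated into EXPLAIN as-is.

```sql
-- Hypothetical helper: lists Foreign Scan nodes that still evaluate a "Filter"
-- locally, i.e. conditions that were NOT sent to the remote server.
CREATE OR REPLACE FUNCTION fdw_local_filters(query text)
RETURNS TABLE (relation text, local_filter text, remote_sql text)
LANGUAGE plpgsql AS $$
DECLARE
  explain_json json;
  explain_doc  jsonb;
BEGIN
  -- No ANALYZE: the query is only planned, never executed.
  -- The text is concatenated as-is, so use this only with trusted input.
  EXECUTE 'EXPLAIN (VERBOSE, FORMAT JSON) ' || query INTO explain_json;
  explain_doc := explain_json::jsonb;

  RETURN QUERY
  WITH RECURSIVE nodes(node) AS (
    -- Root node of the single statement's plan
    SELECT explain_doc -> 0 -> 'Plan'
    UNION ALL
    -- Child nodes (including InitPlans) are listed under "Plans"
    SELECT c.child
    FROM nodes,
         jsonb_array_elements(COALESCE(nodes.node -> 'Plans', '[]'::jsonb)) AS c(child)
  )
  SELECT n.node ->> 'Relation Name',
         n.node ->> 'Filter',
         n.node ->> 'Remote SQL'
  FROM nodes AS n
  WHERE n.node ->> 'Node Type' = 'Foreign Scan'
    AND n.node ? 'Filter';
END;
$$;
```

Usage would be something like SELECT * FROM fdw_local_filters($q$SELECT count(1) FROM fdw_schema."table" WHERE is_active IS TRUE$q$); an empty result means no Foreign Scan kept a local filter. What it reports depends on your PostgreSQL and postgres_fdw versions.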

That's some booleanshit

With boolean fields everything is simple. In the original query, the problem was caused by the IS operator. If we replace it with =, we get the following result:

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table
WHERE is_active = True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=508010.14..508010.15 rows=1 width=8) (actual time=19064.314..19064.314 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..507988.44 rows=8679 width=0) (actual time=33.035..18951.278 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: ((("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 3567989
        Remote SQL: SELECT created_dt, meta FROM fdw_schema.table WHERE (is_active)
Planning time: 0.834 ms
Execution time: 19064.534 ms

As you can see, the filter flew off to the remote server (Remote SQL now contains WHERE (is_active)), and the execution time dropped from 27 to 19 seconds.

Note that the IS operator differs from = in that it handles NULL values. This means that IS NOT TRUE keeps both FALSE and NULL values, while != TRUE keeps only FALSE values. Therefore, when replacing IS NOT TRUE, you should pass two conditions joined by OR, for example WHERE (col != TRUE) OR (col IS NULL).
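The NULL behavior is easy to check on an inline sample. The three values below are illustrative, not taken from the article's table; the query evaluates each predicate side by side so you can see that (v <> TRUE) OR (v IS NULL) selects the same rows as v IS NOT TRUE. Whether the rewritten form is actually pushed down should still be confirmed with EXPLAIN VERBOSE on your version.

```sql
-- Three-valued logic of boolean predicates on a sample of TRUE, FALSE and NULL
SELECT v                          AS is_active,
       v IS TRUE                  AS is_true,          -- NULL gives false
       v = TRUE                   AS eq_true,          -- NULL gives NULL (row dropped by WHERE)
       v IS NOT TRUE              AS is_not_true,      -- keeps FALSE and NULL
       v <> TRUE                  AS ne_true,          -- keeps FALSE only; NULL gives NULL
       (v <> TRUE) OR (v IS NULL) AS ne_true_or_null   -- same rows as IS NOT TRUE
FROM (VALUES (true), (false), (NULL::boolean)) AS sample(v);
```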

That settles booleans; let's move on. For now, let's return the boolean filter to its original form so we can look at the effect of the other changes independently.

timestamptz? Beats me

In general, you often have to experiment to find the right way to write a query involving remote servers, and only afterwards look for an explanation of why it works. Very little information on this can be found online. Through experiments, we found that a filter on a fixed date flies off to the remote server with a bang, but when we want to set the date dynamically, for example with now() or CURRENT_DATE, this does not happen. In our example, we added a filter so that the created_dt column contains data for a one-month window between seven and six months ago (BETWEEN CURRENT_DATE - INTERVAL '7 month' AND CURRENT_DATE - INTERVAL '6 month'). What did we do in this case?

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta->>'source' = 'test';

Aggregate  (cost=306875.17..306875.18 rows=1 width=8) (actual time=4789.114..4789.115 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..306874.86 rows=105 width=0) (actual time=23.475..4681.419 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 76934
        Remote SQL: SELECT is_active, meta FROM fdw_schema.table WHERE ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone))
Planning time: 0.703 ms
Execution time: 4789.379 ms

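We told the planner to compute the date in advance in a subquery and pass the ready value into the filter. In the plan, the subqueries appear as InitPlan 1 and InitPlan 2, and their results are sent to the remote server as parameters ($1 and $2 in the Remote SQL line). This hint gave us an excellent result: the query became almost 6 times faster!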

Again, care is needed here: the data type returned by the subquery must match the type of the field we filter on. Otherwise, the planner decides that, since the types differ, it must first fetch all the data and filter it locally.
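A quick way to check the types is pg_typeof. In PostgreSQL, date - interval yields timestamp without time zone, whereas casting to timestamptz first yields timestamp with time zone. The plans above suggest created_dt is timestamptz; the second query is a sketch of how one might confirm the declared type of the foreign table's column, reusing the article's schema, table and column names.

```sql
-- Type produced by each way of writing the bound
SELECT pg_typeof(CURRENT_DATE - INTERVAL '7 month')              AS plain_bound,  -- timestamp without time zone
       pg_typeof(CURRENT_DATE::timestamptz - INTERVAL '7 month') AS cast_bound;   -- timestamp with time zone

-- Declared type of the filtered column on the local foreign table
SELECT data_type
FROM information_schema.columns
WHERE table_schema = 'fdw_schema'
  AND table_name   = 'table'
  AND column_name  = 'created_dt';
```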

Let's return the date filter to its original form.

Freddy vs. Jsonb

The boolean and date fixes alone have already sped up our query considerably, but one more data type remains. Honestly, the battle over filtering on it is not over yet, although there have been some wins here too. Here is how we managed to pass the filter on the jsonb field to the remote server.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=245463.60..245463.61 rows=1 width=8) (actual time=6727.589..6727.590 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=1100.00..245459.90 rows=1478 width=0) (actual time=16.213..6634.794 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 619961
        Remote SQL: SELECT created_dt, is_active FROM fdw_schema.table WHERE ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.747 ms
Execution time: 6727.815 ms

Instead of extraction operators, use the containment operator (@>, "one jsonb contains another"). The result is about 7 seconds instead of the original 27. So far this is the only way we have found to pass jsonb filters to the remote server, but there is an important caveat: we use PostgreSQL 9.6, and by the end of April we plan to finish the final tests and move to version 12. Once we upgrade, I will write about how it affected things, because there are quite a few promising changes: json_path, the new CTE behavior, push down (available since version 10). I really want to try them soon.
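Before swapping -> / ->> comparisons for @>, it is worth checking that the two forms select the same rows for your data. The sketch below compares them on three illustrative documents (not from the article's table). For a document without a "source" key, ->> yields NULL and @> yields false; WHERE discards both, so the filters agree on these samples. Equivalence for other document shapes is not claimed here.

```sql
-- Extraction-based filter vs. containment-based filter on sample documents
SELECT meta,
       meta ->> 'source' = 'test'          AS via_extract,      -- NULL when "source" is missing
       meta @> '{"source": "test"}'::jsonb AS via_containment   -- false when "source" is missing
FROM (VALUES ('{"source": "test"}'::jsonb),
             ('{"source": "prod", "x": 1}'::jsonb),
             ('{"other": "test"}'::jsonb)) AS sample(meta);
```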

Finish him

We have tested how each change affects query speed on its own. Now let's see what happens when all three filters are written correctly.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=322041.51..322041.52 rows=1 width=8) (actual time=2278.867..2278.867 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..322041.41 rows=25 width=0) (actual time=8.597..2153.809 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table WHERE (is_active) AND ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone)) AND ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.820 ms
Execution time: 2279.087 ms

Yes, the query looks more complicated; that is the unavoidable price. But the execution time is about 2 seconds instead of 27, more than 10 times faster! And this is a simple query against a relatively small dataset. On real queries, we got speedups of up to several hundred times.

To sum up: if you use PostgreSQL with FDW, always check that all filters are sent to the remote server, and you will be happy... But that is a story for another article.

Thank you for your attention! I would love to hear your questions, comments and stories about your own experience in the comments.

Source: www.habr.com
