Nazarin aiki a cikin gine-ginen microservice: taimako da saurin Postgres FDW

Gine-gine na Microservice, kamar kowane abu na wannan duniyar, yana da fa'ida da rashin amfaninsa. Wasu matakai sun zama masu sauƙi tare da shi, wasu sun fi wuya. Kuma saboda saurin canji da mafi kyawun scalability, kuna buƙatar yin sadaukarwa. Ɗayan su shine rikitarwa na nazari. Idan a cikin monolith duk analytics na aiki za a iya rage su zuwa tambayoyin SQL zuwa kwafin nazari, to a cikin gine-ginen multiservice kowane sabis yana da nasa bayanan bayanai kuma da alama tambaya ɗaya bai isa ba (ko watakila zai yi?). Ga wadanda ke da sha'awar yadda muka magance matsalar nazarin aiki a cikin kamfaninmu da kuma yadda muka koyi rayuwa tare da wannan bayani - maraba.

Nazarin aiki a cikin gine-ginen microservice: taimako da saurin Postgres FDW
Sunana Pavel Sivash, a DomClick Ina aiki a cikin ƙungiyar da ke da alhakin kula da ajiyar bayanan nazari. A al'ada, ana iya danganta ayyukanmu ga injiniyan bayanai, amma, a zahiri, kewayon ayyuka sun fi fadi. Akwai daidaitattun injiniyan bayanai ETL / ELT, tallafi da daidaita kayan aikin nazarin bayanai da haɓaka kayan aikin nasu. Musamman, don bayar da rahoto na aiki, mun yanke shawarar “yi riya” cewa muna da monolith kuma mu ba manazarta bayanai guda ɗaya wanda zai ƙunshi duk bayanan da suke buƙata.

Gabaɗaya, mun yi la'akari da zaɓuɓɓuka daban-daban. Yana yiwuwa a gina cikakken ma'ajiya - mun ko da kokarin, amma, a gaskiya, ba mu iya yin abokai da fairly akai-akai canje-canje a cikin dabaru tare da wani wajen jinkirin aiwatar da gina ma'ajiyar da kuma yin canje-canje a gare shi ( idan wani ya yi nasara, rubuta a cikin sharhi yadda). Kuna iya ce wa manazarta: “Maza, ku koyi python kuma ku je ga layukan nazari,” amma wannan ƙarin buƙatun daukar ma’aikata ne, kuma da alama ya kamata a guji wannan idan zai yiwu. Mun yanke shawarar gwada amfani da fasahar FDW (Foreign Data Wrapper): a zahiri, wannan daidaitaccen dblink ne, wanda ke cikin ma'auni na SQL, amma tare da mafi sauƙin dubawa. A kan haka ne muka yanke shawara, wanda daga karshe ya samu gindin zama, muka daidaita a kai. Cikakkun bayanansa shine batun wani labarin dabam, kuma watakila fiye da ɗaya, saboda ina so in yi magana game da yawa: daga tsarin aiki tare da tsarin bayanai don samun damar sarrafawa da lalata bayanan sirri. Har ila yau, ya kamata a lura da cewa wannan bayani ba shine maye gurbin ainihin bayanan bayanan bincike da ma'ajin ba, yana magance wata matsala ta musamman.

A matakin farko yana kama da haka:

Nazarin aiki a cikin gine-ginen microservice: taimako da saurin Postgres FDW
Akwai bayanan PostgreSQL inda masu amfani za su iya adana bayanan aikin su, kuma mafi mahimmanci, ana haɗa kwafin duk ayyuka zuwa wannan bayanan ta hanyar FDW. Wannan yana ba da damar rubuta tambaya zuwa ɗakunan bayanai da yawa, kuma ba komai mene ne: PostgreSQL, MySQL, MongoDB ko wani abu dabam (fayil, API, idan ba zato ba tsammani babu abin da ya dace, zaku iya rubuta naku). To, duk abin da alama yana da kyau! Watsewa?

Idan komai ya ƙare da sauri da sauƙi, to, tabbas, labarin ba zai wanzu ba.

Yana da mahimmanci a bayyana game da yadda postgres ke sarrafa buƙatun zuwa sabar mai nisa. Wannan yana da ma'ana, amma sau da yawa mutane ba sa kula da shi: postgres yana raba tambayar zuwa sassa waɗanda aka aiwatar da kansu akan sabar masu nisa, tattara wannan bayanan, kuma suna yin lissafin ƙarshe da kanta, don haka saurin aiwatar da tambayar zai dogara sosai kan yadda ake aiwatar da shi. an rubuta. Har ila yau, ya kamata a lura: lokacin da bayanai suka fito daga uwar garken nesa, ba su da alamun ƙididdiga, babu wani abu da zai taimaka wa mai tsarawa, saboda haka, kawai mu kanmu za mu iya taimakawa kuma mu ba da shawara. Kuma abin da nake so in yi magana akai ke nan.

Buƙatun mai sauƙi da tsari tare da shi

Don nuna yadda Postgres ke tambayar tebirin jere miliyan 6 akan sabar mai nisa, bari mu kalli tsari mai sauƙi.

explain analyze verbose  
SELECT count(1)
FROM fdw_schema.table;

Aggregate  (cost=418383.23..418383.24 rows=1 width=8) (actual time=3857.198..3857.198 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0) (actual time=4.874..3256.511 rows=6406868 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
Planning time: 0.986 ms
Execution time: 3857.436 ms

Yin amfani da bayanin VERBOSE yana ba ku damar ganin tambayar da za a aika zuwa uwar garken nesa da sakamakon da za mu samu don ƙarin aiki (string RemoteSQL).

Bari mu ci gaba kadan kuma mu ƙara tacewa da yawa zuwa tambayar mu: ɗaya bayan bolan filin, daya ta hanyar shiga timestamp kowane tazara kuma daya bayan jsonb.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=577487.69..577487.70 rows=1 width=8) (actual time=27473.818..25473.819 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0) (actual time=31.369..25372.466 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
Planning time: 0.665 ms
Execution time: 27474.118 ms

Wannan shine inda lokacin ya ta'allaka, wanda kuke buƙatar kula da lokacin rubuta tambayoyin. Ba a tura masu tacewa zuwa uwar garken nesa ba, wanda ke nufin cewa don aiwatar da shi, postgres yana jan duk layuka miliyan 6 don tacewa a cikin gida (layin Filter) kuma a yi tari daga baya. Makullin nasara shine rubuta tambaya ta yadda za a watsa masu tacewa zuwa na'ura mai nisa, kuma muna karba kuma muna tara layukan da suka dace kawai.

Wannan shine wasu booleanshit

Tare da filayen boolean, komai yana da sauƙi. A cikin ainihin tambayar, matsalar ta samo asali ne daga mai aiki is. Idan muka maye gurbinsa da =, to muna samun sakamako mai zuwa:

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table
WHERE is_active = True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=508010.14..508010.15 rows=1 width=8) (actual time=19064.314..19064.314 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..507988.44 rows=8679 width=0) (actual time=33.035..18951.278 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: ((("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 3567989
        Remote SQL: SELECT created_dt, meta FROM fdw_schema.table WHERE (is_active)
Planning time: 0.834 ms
Execution time: 19064.534 ms

Kamar yadda kake gani, tacewa ta tashi zuwa uwar garken nesa, kuma an rage lokacin aiwatarwa daga 27 zuwa 19 seconds.

Ya kamata a lura cewa mai aiki is daban da mai aiki = wanda zai iya aiki tare da ƙimar Null. Yana nufin haka ba Gaskiya ba ne a cikin tace zai bar dabi'u Ƙarya da Ƙarya, yayin da != Gaskiya zai bar kawai Karya dabi'u. Saboda haka, lokacin maye gurbin mai aiki ba ya kamata ku wuce sharuɗɗa biyu zuwa tacewa tare da OR, misali, INA (col! = Gaskiya) KO (col ba shi da amfani).

Tare da gano boolean, ci gaba. A halin yanzu, bari mu dawo da tacewa ta ƙimar boolean zuwa sigar ta ta asali domin mu yi la'akari da tasirin wasu canje-canje.

timestamptz? hz

Gabaɗaya, sau da yawa dole ne ku yi gwaji tare da yadda ake rubuta tambaya daidai da ta ƙunshi sabobin nesa, sannan kawai nemi bayanin dalilin da yasa hakan ke faruwa. Ana iya samun bayanai kaɗan akan wannan akan Intanet. Don haka, a cikin gwaje-gwajen, mun gano cewa ƙayyadaddun matatar kwanan wata yana tashi zuwa sabar mai nisa tare da ƙararrawa, amma lokacin da muke son saita kwanan wata a hankali, misali, yanzu() ko CURRENT_DATE, wannan baya faruwa. A cikin misalinmu, mun ƙara tacewa domin ginshiƙin Create_at ya ƙunshi bayanai na tsawon wata 1 daidai a baya (TSAKANIN CURRENT_DATE - INTERVAL '7 month' DA CURRENT_DATE - INTERVAL '6 month'). Me muka yi a wannan harka?

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta->>'source' = 'test';

Aggregate  (cost=306875.17..306875.18 rows=1 width=8) (actual time=4789.114..4789.115 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..306874.86 rows=105 width=0) (actual time=23.475..4681.419 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 76934
        Remote SQL: SELECT is_active, meta FROM fdw_schema.table WHERE ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone))
Planning time: 0.703 ms
Execution time: 4789.379 ms

Mun sa mai tsarawa ya ƙididdige kwanan wata a gaba a cikin subquery kuma ya wuce canjin da aka riga aka shirya zuwa tacewa. Kuma wannan alamar ta ba mu babban sakamako, tambayar ta zama kusan sau 6 cikin sauri!

Har ila yau, yana da mahimmanci a yi hankali a nan: nau'in bayanan da ke cikin subquery dole ne ya kasance daidai da na filin da muke tacewa, in ba haka ba mai tsarawa zai yanke shawara cewa tun da nau'o'in sun bambanta kuma ya zama dole a fara samun duk abubuwan. data sannan tace a gida.

Bari mu mayar da tace ta kwanan wata zuwa ainihin ƙimarsa.

Freddy vs. jsonb

Gabaɗaya, filayen boolean da kwanan wata sun riga sun haɓaka tambayar mu, amma akwai ƙarin nau'in bayanai guda ɗaya. Yaki da tace dashi, gaskiya, har yanzu bai kare ba, duk da cewa a nan ma ana samun nasarori. To, ga yadda muka yi nasarar wuce tace jsonb filin zuwa uwar garken nesa.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=245463.60..245463.61 rows=1 width=8) (actual time=6727.589..6727.590 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=1100.00..245459.90 rows=1478 width=0) (actual time=16.213..6634.794 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 619961
        Remote SQL: SELECT created_dt, is_active FROM fdw_schema.table WHERE ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.747 ms
Execution time: 6727.815 ms

Maimakon tace masu aiki, dole ne ka yi amfani da kasancewar mai aiki ɗaya. jsonb a cikin wani daban. 7 seconds maimakon ainihin 29. Ya zuwa yanzu, wannan shine kawai zaɓi mai nasara don canja wurin tacewa jsonb zuwa uwar garken nesa, amma a nan yana da mahimmanci a yi la'akari da iyakance ɗaya: muna amfani da sigar 9.6 na bayanan bayanai, amma a ƙarshen Afrilu muna shirin kammala gwaje-gwaje na ƙarshe kuma mu matsa zuwa sigar 12. Da zaran mun sabunta, za mu rubuta yadda abin ya shafa, saboda akwai canje-canje da yawa waɗanda ke da bege da yawa: json_path, sabon halayen CTE, tura ƙasa (wanda yake daga sigar 10). Ina so in gwada shi nan ba da jimawa ba.

Kammala shi

Mun duba yadda kowane canji ke shafar saurin tambaya daban-daban. Bari yanzu mu ga abin da zai faru lokacin da aka rubuta duk matattara guda uku daidai.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=322041.51..322041.52 rows=1 width=8) (actual time=2278.867..2278.867 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..322041.41 rows=25 width=0) (actual time=8.597..2153.809 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table WHERE (is_active) AND ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone)) AND ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.820 ms
Execution time: 2279.087 ms

Ee, tambayar ta fi rikitarwa, farashin tilas ne, amma saurin aiwatarwa shine 2 seconds, wanda ya fi sau 10 sauri! Kuma muna magana ne game da tambaya mai sauƙi akan ƙaramin saitin bayanai. A kan buƙatun gaske, mun sami karuwa har zuwa sau ɗari da yawa.

Don taƙaita shi: idan kuna amfani da PostgreSQL tare da FDW, koyaushe bincika idan an aika duk masu tacewa zuwa uwar garken nesa kuma za ku yi farin ciki ... Aƙalla har sai kun sami shiga tsakanin tebur daga sabar daban-daban. Amma wannan labari ne don wani labarin.

Na gode da kulawar ku! Ina so in ji tambayoyi, sharhi, da labarai game da abubuwan da kuka samu a cikin sharhin.

source: www.habr.com

Add a comment