Operational analytics in a microservice architecture: helping and prompting Postgres FDW

Microservice architecture, like everything in this world, has its pros and cons. Some processes get easier with it, others harder. And for the sake of speed of change and better scalability, you have to make sacrifices. One of them is the complexity of analytics. In a monolith, all operational analytics can be reduced to SQL queries against an analytical replica, but in a multi-service architecture each service has its own database, and it seems that a single query will not do (or maybe it will?). For those interested in how we solved the problem of operational analytics at our company and how we learned to live with this solution: welcome.

My name is Pavel Sivash. At DomClick I work in the team responsible for maintaining the analytical data warehouse. Conventionally, our work can be classified as data engineering, but in reality the range of tasks is much wider. There is standard data-engineering ETL/ELT, support and adaptation of data analysis tools, and development of our own tools. In particular, for operational reporting, we decided to "pretend" that we have a monolith and give analysts a single database that contains all the data they need.

In general, we considered various options. We could have built a full-fledged warehouse; we even tried, but, to be honest, we could not reconcile fairly frequent changes in logic with the rather slow process of building a warehouse and changing it (if anyone has succeeded, write in the comments how). We could have told the analysts: "Guys, learn Python and go to the analytical replicas," but this is an additional hiring requirement, and it seemed this should be avoided if possible. We decided to try FDW (Foreign Data Wrapper) technology: in essence, it is a standard dblink that is part of the SQL standard, but with a much more convenient interface. We built a solution on top of it, which eventually took root, and we settled on it. Its details are a topic for a separate article, and perhaps more than one, since there is a lot to talk about: from database schema synchronization to access control and depersonalization of personal data. It should also be noted that this solution does not replace real analytical databases and warehouses; it solves only a specific problem.

At a high level, it looks like this:

There is a PostgreSQL database where users can store their working data, and most importantly, the analytical replicas of all services are connected to this database via FDW. This makes it possible to write a query against several databases, and it does not matter what they are: PostgreSQL, MySQL, MongoDB or something else (a file, an API; if there is suddenly no suitable wrapper, you can write your own). Well, everything looks great! Are we done here?

If everything had ended that quickly and simply, there would probably be no article.
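For orientation, connecting one service replica to the shared database with postgres_fdw might look roughly like the sketch below. It is only an illustration, not our actual setup: the server name, host, database, role, credentials, and schema names are all hypothetical, and the role named in the user mapping is assumed to exist already.

```sql
-- Hypothetical names throughout: orders_replica, orders-replica.internal,
-- orders, analyst, readonly_user, fdw_orders.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Describe the remote analytical replica of one service.
CREATE SERVER orders_replica
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'orders-replica.internal', port '5432', dbname 'orders');

-- Map a local role to credentials on the remote side.
CREATE USER MAPPING FOR analyst
    SERVER orders_replica
    OPTIONS (user 'readonly_user', password 'secret');

-- Expose the remote tables as foreign tables in a local schema.
CREATE SCHEMA IF NOT EXISTS fdw_orders;

IMPORT FOREIGN SCHEMA public
    FROM SERVER orders_replica
    INTO fdw_orders;
```

After this, the remote tables can be queried like local ones, which is exactly where the performance questions discussed in the rest of the article begin.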

It is important to be clear about how Postgres processes queries to remote servers. It seems logical, but people often do not pay attention to it: Postgres splits the query into parts that are executed independently on the remote servers, collects the data, and performs the final computations itself, so query speed depends heavily on how the query is written. It should also be noted that when data arrives from a remote server, it no longer has indexes and nothing helps the planner, so only we ourselves can help and guide it. That is what I want to discuss in more detail.

A simple query and its plan

To show how Postgres queries a 6-million-row table on a remote server, let's look at a simple plan.

explain analyze verbose  
SELECT count(1)
FROM fdw_schema.table;

Aggregate  (cost=418383.23..418383.24 rows=1 width=8) (actual time=3857.198..3857.198 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..402376.14 rows=6402838 width=0) (actual time=4.874..3256.511 rows=6406868 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table
Planning time: 0.986 ms
Execution time: 3857.436 ms

Using the VERBOSE keyword lets us see the query that will be sent to the remote server and whose results we will receive for further processing (the Remote SQL line).

Let's go a little further and add several filters to our query: one on a boolean field, one on a timestamp falling within an interval, and one on jsonb.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=577487.69..577487.70 rows=1 width=8) (actual time=27473.818..27473.819 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..577469.21 rows=7390 width=0) (actual time=31.369..25372.466 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 5046843
        Remote SQL: SELECT created_dt, is_active, meta FROM fdw_schema.table
Planning time: 0.665 ms
Execution time: 27474.118 ms

This is where the catch lies that you need to watch for when writing queries. The filters were not pushed to the remote server, which means that to execute the query Postgres pulls all 6 million rows in order to filter them locally (the Filter line) and aggregate afterwards. The key to success is to write the query so that the filters are sent to the remote machine, and we receive and aggregate only the rows we need.

It's false

With boolean fields, everything is simple. In the original query, the problem was caused by the is operator. If we replace it with =, we get the following result:

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table
WHERE is_active = True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta->>'source' = 'test';

Aggregate  (cost=508010.14..508010.15 rows=1 width=8) (actual time=19064.314..19064.314 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.00..507988.44 rows=8679 width=0) (actual time=33.035..18951.278 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: ((("table".meta ->> 'source'::text) = 'test'::text) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 3567989
        Remote SQL: SELECT created_dt, meta FROM fdw_schema.table WHERE (is_active)
Planning time: 0.834 ms
Execution time: 19064.534 ms

As you can see, the filter flew off to the remote server, and execution time dropped from 27 to 19 seconds.

It is worth noting that the is operator differs from the = operator in that it can work with the Null value. This means that is not True in a filter keeps both False and Null values, while != True keeps only False values. Therefore, when replacing the is not operator, you should pass two conditions joined with the OR operator to the filter, for example, WHERE (col != True) OR (col is null).
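The difference is easy to check locally, without any remote server. The sketch below runs both forms over the three possible values of a boolean column; the column name col and the aliases are purely illustrative.

```sql
-- Compare IS NOT TRUE with its rewrite over TRUE, FALSE and NULL.
SELECT col,
       col IS NOT TRUE                AS is_not_true,
       col != TRUE                    AS not_equal_only,
       (col != TRUE) OR (col IS NULL) AS rewritten
FROM (VALUES (TRUE), (FALSE), (NULL::boolean)) AS t(col);
-- col = TRUE : is_not_true = false, not_equal_only = false, rewritten = false
-- col = FALSE: is_not_true = true,  not_equal_only = true,  rewritten = true
-- col = NULL : is_not_true = true,  not_equal_only = NULL,  rewritten = true
```

A WHERE clause keeps only rows where the condition is true, so the bare != True form silently drops the Null rows, while the rewritten form matches is not True on all three values.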

Boolean is sorted out; moving on. For now, let's return the boolean filter to its original form so that we can consider the effect of the other changes independently.

timestamptz? Who knows

In general, you often have to experiment to find out how to correctly write a query involving remote servers, and only then look for an explanation of why it behaves that way. Very little information about this can be found on the Internet. In our experiments we found that a filter on a fixed date flies off to the remote server with a bang, but when we want to set the date dynamically, for example with now() or CURRENT_DATE, this does not happen. In our example, we added a filter so that the created_dt column contains data for exactly one month in the past (BETWEEN CURRENT_DATE - INTERVAL '7 month' AND CURRENT_DATE - INTERVAL '6 month'). What did we do in this case?

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta->>'source' = 'test';

Aggregate  (cost=306875.17..306875.18 rows=1 width=8) (actual time=4789.114..4789.115 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..306874.86 rows=105 width=0) (actual time=23.475..4681.419 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND (("table".meta ->> 'source'::text) = 'test'::text))
        Rows Removed by Filter: 76934
        Remote SQL: SELECT is_active, meta FROM fdw_schema.table WHERE ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone))
Planning time: 0.703 ms
Execution time: 4789.379 ms

We hinted to the planner that it should compute the date in advance in a subquery and pass the already prepared value to the filter. This hint gave us an excellent result: the query became almost 6 times faster!

Again, it is important to be careful here: the data type in the subquery must be the same as that of the field we are filtering on; otherwise the planner will decide that, since the types differ, it first needs to fetch all the data and filter it locally.
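One way to see why the explicit cast matters is to ask Postgres for the type of each expression. The sketch below assumes that created_dt is a timestamptz column, which is what the casts in the Remote SQL line suggest; the column aliases are illustrative.

```sql
-- date - interval yields a timestamp WITHOUT time zone,
-- while timestamptz - interval stays timestamptz.
SELECT
    pg_typeof(CURRENT_DATE - INTERVAL '7 month')              AS without_cast,
    pg_typeof(CURRENT_DATE::timestamptz - INTERVAL '7 month') AS with_cast;
-- without_cast: timestamp without time zone
-- with_cast:    timestamp with time zone
```

Without the cast, the subquery returns a type that differs from the column's, which is the mismatch described above.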

Let's return the date filter to its original form.

Freddy vs. jsonb

In general, boolean fields and dates have already sped up our query considerably, but there was one more data type left. The battle with filtering on it is, to be honest, still not over, although there are successes here too. So, here is how we managed to pass a jsonb filter to the remote server.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active is True
AND created_dt BETWEEN CURRENT_DATE - INTERVAL '7 month' 
AND CURRENT_DATE - INTERVAL '6 month'
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=245463.60..245463.61 rows=1 width=8) (actual time=6727.589..6727.590 rows=1 loops=1)
  Output: count(1)
  ->  Foreign Scan on fdw_schema."table"  (cost=1100.00..245459.90 rows=1478 width=0) (actual time=16.213..6634.794 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Filter: (("table".is_active IS TRUE) AND ("table".created_dt >= (('now'::cstring)::date - '7 mons'::interval)) AND ("table".created_dt <= ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)))
        Rows Removed by Filter: 619961
        Remote SQL: SELECT created_dt, is_active FROM fdw_schema.table WHERE ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.747 ms
Execution time: 6727.815 ms

Instead of the filtering operators, you need to use the operator that checks whether one jsonb contains another. 7 seconds instead of the original 27. So far this is the only successful way of passing jsonb filters to the remote server, but there is one limitation to keep in mind: we are using version 9.6 of the database, and by the end of April we plan to finish the last tests and move to version 12. Once we upgrade, we will write about how it affected things, because there are quite a few changes we have high hopes for: json_path, the new CTE behavior, aggregate push down (available since version 10). I really want to try it soon.
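Before rewriting a filter from ->> to @>, it is worth checking that the two forms select the same rows for the values in question. The sketch below does this locally on a few made-up documents; the sample values and aliases are hypothetical.

```sql
-- Compare text extraction with containment on sample documents.
SELECT meta,
       meta->>'source' = 'test'           AS via_extract,
       meta @> '{"source":"test"}'::jsonb AS via_containment
FROM (VALUES
        ('{"source":"test","n":1}'::jsonb),
        ('{"source":"prod"}'::jsonb),
        ('{}'::jsonb)
     ) AS t(meta);
-- source = "test": via_extract = true,  via_containment = true
-- source = "prod": via_extract = false, via_containment = false
-- key missing:     via_extract = NULL,  via_containment = false
```

In a WHERE clause both NULL and false drop the row, so for a string value like this the two filters agree. They can diverge for non-string values: ->> compares the text form, so meta->>'n' = '1' matches the number 1, whereas meta @> '{"n":"1"}' matches only the string "1".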

Malizitsani iye

We have seen how each change affects query speed individually. Now let's see what happens when all three filters are written correctly.

explain analyze verbose
SELECT count(1)
FROM fdw_schema.table 
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month') 
AND created_dt <(SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;

Aggregate  (cost=322041.51..322041.52 rows=1 width=8) (actual time=2278.867..2278.867 rows=1 loops=1)
  Output: count(1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '7 mons'::interval)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
          Output: ((('now'::cstring)::date)::timestamp with time zone - '6 mons'::interval)
  ->  Foreign Scan on fdw_schema."table"  (cost=100.02..322041.41 rows=25 width=0) (actual time=8.597..2153.809 rows=1360025 loops=1)
        Output: "table".id, "table".is_active, "table".meta, "table".created_dt
        Remote SQL: SELECT NULL FROM fdw_schema.table WHERE (is_active) AND ((created_dt >= $1::timestamp with time zone)) AND ((created_dt < $2::timestamp with time zone)) AND ((meta @> '{"source": "test"}'::jsonb))
Planning time: 0.820 ms
Execution time: 2279.087 ms

Yes, the query looks more complicated, and that is a forced price, but execution time is 2 seconds, which is more than 10 times faster! And we are talking about a simple query against a relatively small data set. On real queries, we got speedups of up to several hundred times.

To summarize: if you use PostgreSQL with FDW, always check that all filters are sent to the remote server, and you will be happy... At least until you get to joins between tables from different servers. But that is a story for another article.
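A cheap way to do this check is to run EXPLAIN with VERBOSE but without ANALYZE: the plan is printed without actually executing the query. The sketch below reuses the final query from this article; the table and column names are the same anonymized ones used throughout.

```sql
-- Plan only: the query is not executed.
-- All filters are pushed down if every condition appears in the
-- "Remote SQL" line and the Foreign Scan node has no "Filter" line.
EXPLAIN (VERBOSE)
SELECT count(1)
FROM fdw_schema."table"
WHERE is_active = True
AND created_dt >= (SELECT CURRENT_DATE::timestamptz - INTERVAL '7 month')
AND created_dt < (SELECT CURRENT_DATE::timestamptz - INTERVAL '6 month')
AND meta @> '{"source":"test"}'::jsonb;
```

If a Filter line does show up under the Foreign Scan, the conditions listed there are the ones still being evaluated locally and are the candidates for rewriting.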

Thank you for your attention! I would be glad to hear questions, comments, and stories about your experience in the comments.

Source: www.habr.com
