When you have Sber scale. Using Ab Initio with Hive and GreenPlum

Some time ago we faced the question of choosing an ETL tool for working with Big Data. The Informatica BDM solution we had been using did not suit us because of its limited functionality. Its use had been reduced to a framework for launching spark-submit commands. There were not many alternatives on the market that were, in principle, capable of handling the volume of data we deal with every day. In the end we chose Ab Initio. During pilot demonstrations the product showed very high data processing speed. There is almost no information about Ab Initio in Russian, so we decided to share our experience on Habr.

Ab Initio has many classic and unusual transformations, whose code can be extended using its own language, PDL. For a small business such a powerful tool is likely to be overkill, and most of its capabilities may turn out to be expensive and unused. But if your scale is close to Sber's, Ab Initio may be of interest to you.

It helps a business accumulate knowledge globally and develop its ecosystem. It helps a developer improve their ETL skills, brush up on shell, learn the PDL language, get a visual picture of the loading processes, and simplify development thanks to the abundance of functional components.

In this post I will talk about the capabilities of Ab Initio and give comparative characteristics of its work with Hive and GreenPlum.

  • Description of the MDW framework and the work on customizing it for GreenPlum
  • Comparison of Ab Initio performance with Hive and GreenPlum
  • Ab Initio working with GreenPlum in Near Real Time mode


The functionality of this product is very broad and takes considerable time to learn. However, with the right skills and the right performance settings, the data processing results are very impressive. Using Ab Initio can give a developer an interesting experience. It is a new take on ETL development, a hybrid between a visual environment and developing loads in a script-like language.

Businesses are developing their ecosystems, and this tool is more useful to them than ever. With Ab Initio you can accumulate knowledge about your current business and use it to expand existing businesses and open new ones. Alternatives to Ab Initio include Informatica BDM among visual development environments and Apache Spark among non-visual ones.

Description of Ab Initio

Ab Initio, like other ETL tools, is a suite of products.


Ab Initio GDE (Graphical Development Environment) is the developer's environment, in which data transformations are configured and connected by data flows drawn as arrows. Such a set of transformations is called a graph.


The input and output connections of functional components are ports, and they contain the fields computed inside the transformations. Several graphs connected by flows (arrows) in the order of their execution are called a plan.

There are several hundred functional components, which is a lot. Many of them are highly specialized. The capabilities of the classic transformations in Ab Initio are broader than in other ETL tools. For example, Join has several outputs. In addition to the result of joining the datasets, you can get on output the records of the input datasets whose keys could not be joined. You can also get rejects, errors and a log of the transformation's work, which can be read in the same graph as a text file and processed by other transformations.
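As a rough illustration of what a join with several outputs gives you, the idea can be mimicked in plain Python. This is not Ab Initio code: the function name `join_with_unused` and the record layout are assumptions made only for this sketch.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def join_with_unused(
    left: List[Record], right: List[Record], key: str
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Inner-join two datasets on `key` and also return the unmatched records.

    Hypothetical helper: it imitates a join with one main output and two
    "unused" outputs, and assumes `key` is unique in `right`.
    """
    right_by_key = {rec[key]: rec for rec in right}
    matched_keys = set()
    joined, unused_left = [], []
    for rec in left:
        other = right_by_key.get(rec[key])
        if other is None:
            unused_left.append(rec)  # key not found on the other side
        else:
            matched_keys.add(rec[key])
            joined.append({**other, **rec})
    unused_right = [rec for rec in right if rec[key] not in matched_keys]
    return joined, unused_left, unused_right


accounts = [{"id": 1, "owner": "A"}, {"id": 2, "owner": "B"}]
balances = [{"id": 2, "amount": 50}, {"id": 3, "amount": 70}]
joined, unused_accounts, unused_balances = join_with_unused(accounts, balances, "id")
```

Here `joined` holds the matched pair, while `unused_accounts` and `unused_balances` hold the records that found no partner and can be routed to further processing.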


Or, for example, you can materialize a data receiver as a table and read data from it in the same graph.

There are original transformations as well. For example, the Scan transformation has functionality similar to analytic functions. There are transformations with self-explanatory names: Create Data, Read Excel, Normalize, Sort within Groups, Run Program, Run SQL, Join with DB, and others. Graphs can use run-time parameters, including passing parameters from or to the operating system. Files with a ready-made set of parameters passed to a graph are called parameter sets (psets).
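To make the comparison with analytic functions concrete, here is a plain-Python analogy of a running total per key. It only illustrates the idea of producing one output record per input record with accumulated state; it is not the Scan component's actual interface, and the names are assumptions.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_total(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int, int]]:
    """Yield (key, value, cumulative sum of value within key) for each row.

    Comparable to SUM(value) OVER (PARTITION BY key) evaluated in arrival
    order: unlike an aggregate, every input row produces an output row.
    """
    totals: Dict[str, int] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
        yield key, value, totals[key]


payments = [("acc1", 10), ("acc2", 5), ("acc1", 7), ("acc2", 1)]
result = list(running_total(payments))
```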

As expected, Ab Initio GDE has its own repository, called EME (Enterprise Meta Environment). Developers can work with local versions of the code and check their work in to the central repository.

During or after the execution of a graph, you can click on any flow connecting transformations and look at the data that passed between them.


You can also click on any flow and see the tracking details: how many parallel partitions the transformation ran in, and how many rows and bytes were loaded in each of them.


You can split the execution of a graph into phases and mark that some transformations must be executed first (in phase zero), the next ones in phase one, the next in phase two, and so on.

For each transformation you can choose the so-called layout (where it will be executed): without parallelism, or in parallel threads whose number can be specified. The temporary files that Ab Initio creates while transformations run can be placed either in the server file system or in HDFS.

In each transformation, based on the default template, you can create your own script in PDL, which is somewhat reminiscent of shell.

With PDL you can extend the functionality of transformations and, in particular, dynamically (at run time) generate arbitrary code fragments depending on run-time parameters.

Ab Initio also has well-developed integration with the OS via the shell. Sberbank specifically uses Linux ksh. You can exchange variables with the shell and use them as graph parameters. You can call the execution of Ab Initio graphs from the shell and administer Ab Initio.

In addition to Ab Initio GDE, many other products are included in the delivery. There is its own Co>Operating System, which claims to be called an operating system. There is Control>Center, where you can schedule and monitor loading flows. There are products for doing development at a more primitive level than Ab Initio GDE allows.

Description of the MDW framework and the work on customizing it for GreenPlum

Together with its products, the vendor supplies the MDW (Metadata Driven Warehouse) product, a graph configurator designed to help with typical tasks of populating data warehouses or data vaults.

It contains custom (project-specific) metadata parsers and ready-made code generators out of the box.

As input, MDW receives a data model, a configuration file for setting up a connection to a database (Oracle, Teradata or Hive), and some other settings. The project-specific part, for example, deploys the model to the database. The out-of-the-box part of the product generates graphs and their configuration files for loading data into the model tables. Graphs (and psets) are created for several modes of initializing and incremental work on updating entities.

For Hive and for RDBMSs, different graphs are generated for initializing and incremental data updates.

In the case of Hive, the incoming delta data is joined, via Ab Initio Join, with the data that was in the table before the update. Data loaders in MDW (both for Hive and for RDBMSs) not only insert new data from the delta but also close the validity periods of the data whose primary keys received a delta. In addition, the unchanged part of the data has to be rewritten. This has to be done because Hive has no delete or update operations.
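The full-rewrite approach described for Hive can be sketched in a few lines of Python. This is only an illustration of the logic, not the graph MDW generates: the column names `pk`, `valid_from_ts`, `valid_to_ts` and the `OPEN_END` marker are assumptions for the example.

```python
from typing import Dict, List

OPEN_END = "9999-12-31"  # assumed marker for "still valid"; not from the source


def rewrite_with_delta(table: List[Dict], delta: List[Dict], load_ts: str) -> List[Dict]:
    """Build the complete new contents of a history table without update/delete."""
    delta_keys = {row["pk"] for row in delta}
    new_table = []
    for row in table:
        if row["pk"] in delta_keys and row["valid_to_ts"] == OPEN_END:
            # Current version of a changed key: close its validity period.
            new_table.append({**row, "valid_to_ts": load_ts})
        else:
            # Unchanged row: it still has to be written out again.
            new_table.append(dict(row))
    for row in delta:
        # New version of each key that arrived in the delta.
        new_table.append({**row, "valid_from_ts": load_ts, "valid_to_ts": OPEN_END})
    return new_table


table = [
    {"pk": 1, "value": "a", "valid_from_ts": "2020-06-01", "valid_to_ts": OPEN_END},
    {"pk": 2, "value": "b", "valid_from_ts": "2020-06-01", "valid_to_ts": OPEN_END},
]
delta = [{"pk": 2, "value": "b2"}]
new_table = rewrite_with_delta(table, delta, "2020-06-04")
```

Note that every row of the old table passes through the loop, even though only one key changed. That is the cost that grows with the size of the target table.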


In the case of an RDBMS, the graphs for incremental data updates look more optimal, because RDBMSs have real update capabilities.


The received delta is loaded into an intermediate table in the database. After that, the delta is joined with the data that was in the table before the update. This is done by means of SQL, using a generated SQL query. Next, using the SQL commands delete + insert, new data from the delta is inserted into the target table and the validity periods of the data whose primary keys received a delta are closed.
There is no need to rewrite unchanged data.
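The delete + insert sequence can be tried out on a small scale with Python's built-in sqlite3 module standing in for the RDBMS. This is a sketch under stated assumptions, not the SQL that MDW generates: the table names `t`, `delta` and `closed`, the column layout and the `OPEN_END` marker are all hypothetical.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed "still valid" marker, not from the source

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table t (pk integer, value text, valid_from_ts text, valid_to_ts text);
    create table delta (pk integer, value text);
    create table closed (pk integer, value text, valid_from_ts text, valid_to_ts text);
    insert into t values (1, 'a', '2020-06-01', '9999-12-31'),
                         (2, 'b', '2020-06-01', '9999-12-31');
    insert into delta values (2, 'b2');
    """
)


def apply_delta(conn: sqlite3.Connection, load_ts: str) -> None:
    """Apply the staged delta to t with delete + insert, touching only changed keys."""
    with conn:  # run all steps in one transaction
        # 1. Join the delta with the current rows and prepare their closed versions.
        conn.execute(
            "insert into closed "
            "select t.pk, t.value, t.valid_from_ts, ? from t "
            "join delta on delta.pk = t.pk where t.valid_to_ts = ?",
            (load_ts, OPEN_END),
        )
        # 2. Delete the current versions of the changed keys.
        conn.execute(
            "delete from t where valid_to_ts = ? "
            "and exists (select * from delta where delta.pk = t.pk)",
            (OPEN_END,),
        )
        # 3. Insert the closed versions and the new versions from the delta.
        conn.execute("insert into t select * from closed")
        conn.execute(
            "insert into t select pk, value, ?, ? from delta", (load_ts, OPEN_END)
        )
        conn.execute("delete from closed")


apply_delta(conn, "2020-06-04")
rows = conn.execute("select * from t order by pk, valid_from_ts").fetchall()
```

The row for key 1 is never read or written by `apply_delta`, which is the point of the approach: the work is proportional to the delta, not to the target table.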

So we came to the conclusion that in the case of Hive, MDW has to rewrite the entire table, because Hive has no update function. Nothing better than a complete rewrite of the data on update has been invented. In the case of an RDBMS, on the contrary, the creators of the product found it appropriate to entrust the joining and updating of tables to SQL.

For a project at Sberbank, we created a new, reusable implementation of the database loader for GreenPlum. It was based on the version that MDW generates for Teradata. It was Teradata, not Oracle, that turned out to be the closest fit, because it is also an MPP system. The working methods, as well as the syntax, of Teradata and GreenPlum turned out to be similar.

Examples of differences between RDBMSs that are critical for MDW are as follows. In GreenPlum, unlike Teradata, when creating tables you need to write the clause

distributed by

In Teradata one writes

delete <table> all

whereas in GreenPlum one writes

delete from <table>

In Oracle, for optimization purposes, one writes

delete from t where rowid in (<join of t with the delta>)

whereas in Teradata and GreenPlum one writes

delete from t where exists (select * from delta where delta.pk=t.pk)
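A code generator has to hide such differences behind per-dialect templates. The following Python sketch shows one possible shape of that dispatch, using only the statements quoted above. The function names and the dialect labels are hypothetical and do not come from MDW.

```python
def clear_table_sql(dialect: str, table: str) -> str:
    """Statement that removes every row from `table` in the given dialect."""
    if dialect == "teradata":
        return f"delete {table} all"
    if dialect == "greenplum":
        return f"delete from {table}"
    raise ValueError(f"no template for dialect {dialect!r}")


def delete_changed_keys_sql(dialect: str, table: str, delta: str, key: str) -> str:
    """Statement that deletes the rows of `table` whose key is present in `delta`."""
    if dialect in ("teradata", "greenplum"):
        return (
            f"delete from {table} where exists "
            f"(select * from {delta} where {delta}.{key}={table}.{key})"
        )
    raise ValueError(f"no template for dialect {dialect!r}")


print(clear_table_sql("teradata", "t"))
print(clear_table_sql("greenplum", "t"))
print(delete_changed_keys_sql("greenplum", "t", "delta", "pk"))
```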

Note also that for Ab Initio to work with GreenPlum, the GreenPlum client had to be installed on all nodes of the Ab Initio cluster. This is because we connected to GreenPlum simultaneously from all nodes of our cluster. For reading from GreenPlum to be parallel, with each parallel Ab Initio thread reading its own portion of data, we had to place a construct that Ab Initio understands in the "where" section of the SQL queries

where ABLOCAL()

and define the value of this construct by specifying the following parameter for the transformation that reads from the database

ablocal_expr=«string_concat("mod(t.", string_filter_out("{$TABLE_KEY}","{}"), ",", (decimal(3))(number_of_partitions()),")=", (decimal(3))(this_partition()))»

which compiles into something like

mod(sk,10)=3

That is, GreenPlum has to be given an explicit filter for each partition. For other databases (Teradata, Oracle), Ab Initio can perform this parallelization automatically.
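To see what the expression expands to for every partition, the string building can be reproduced in Python. This is only a sketch: the function name `ablocal_filter` is hypothetical, and it assumes partitions are numbered from 0 to N-1.

```python
def ablocal_filter(table_key: str, number_of_partitions: int, this_partition: int) -> str:
    """Build the per-partition predicate that stands in for ABLOCAL() in this sketch."""
    # Strip the braces, as string_filter_out("{$TABLE_KEY}", "{}") does in the text.
    column = table_key.replace("{", "").replace("}", "")
    return f"mod(t.{column},{number_of_partitions})={this_partition}"


# Assumption: 10 parallel threads, numbered 0..9, and a key column named sk.
filters = [ablocal_filter("{sk}", 10, p) for p in range(10)]
print(filters[3])
```

Because each key value has exactly one remainder modulo the number of partitions, the ten filters split the table into non-overlapping portions that together cover every row with an integer key.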

Comparison of Ab Initio performance with Hive and GreenPlum

Sberbank conducted an experiment to compare the performance of MDW-generated graphs with Hive and with GreenPlum. In the experiment, Hive had 5 nodes on the same cluster as Ab Initio, while GreenPlum had 4 nodes on a separate cluster. That is, Hive had some hardware advantage over GreenPlum.

We considered two pairs of graphs performing the same task of updating data in Hive and in GreenPlum. The graphs generated by the MDW configurator were run:

  • initializing load + incremental load of randomly generated data into a Hive table
  • initializing load + incremental load of randomly generated data into the same kind of table in GreenPlum

In both cases (Hive and GreenPlum) the loads were run in 10 parallel threads on the same Ab Initio cluster. Ab Initio stored intermediate data for calculations in HDFS (in Ab Initio terms, an MFS layout using HDFS was used). One row of randomly generated data took 200 bytes in both cases.

The results were as follows:

Hive:

Initial load into Hive

Rows inserted                                                     6 000 000   60 000 000  600 000 000
Initial load duration, seconds                                           41          203        1 601

Incremental load into Hive

Rows in the target table at the start of the experiment           6 000 000   60 000 000  600 000 000
Delta rows applied to the target table during the experiment      6 000 000    6 000 000    6 000 000
Incremental load duration, seconds                                       88          299        2 541

GreenPlum:

Initial load into GreenPlum

Rows inserted                                                     6 000 000   60 000 000  600 000 000
Initial load duration, seconds                                           72          360        3 631

Incremental load into GreenPlum

Rows in the target table at the start of the experiment           6 000 000   60 000 000  600 000 000
Delta rows applied to the target table during the experiment      6 000 000    6 000 000    6 000 000
Incremental load duration, seconds                                      159          199          321

We see that the duration of the initializing load in both Hive and GreenPlum grows roughly in proportion to the amount of data and, thanks to the better hardware, the load is somewhat faster for Hive than for GreenPlum.

The incremental load into Hive also depends on the amount of previously loaded data in the target table and slows down considerably as the volume grows. This is caused by the need to rewrite the target table completely. It means that applying small changes to huge tables is not a good use case for Hive.

The incremental load into GreenPlum depends only weakly on the amount of previously loaded data in the target table and runs quite fast. This is thanks to SQL joins and the GreenPlum architecture, which allows the delete operation.

So, GreenPlum applies the delta using the delete + insert method, whereas Hive has no delete or update operations, so the entire data set had to be rewritten completely during an incremental update. The most telling comparison is the largest case, a delta of 6 000 000 rows applied to a table of 600 000 000 rows, since it corresponds to the most common scenario for resource-intensive loads. In that test GreenPlum beat Hive by a factor of about 8 (321 seconds against 2 541).
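The ratio can be checked directly from the measured incremental load times. The short Python calculation below uses only the figures reported in the results; the variable names are illustrative.

```python
# Incremental load durations in seconds for target tables of
# 6 000 000, 60 000 000 and 600 000 000 rows (delta is 6 000 000 rows each time).
hive_incremental_s = [88, 299, 2541]
greenplum_incremental_s = [159, 199, 321]

# How many times faster GreenPlum was than Hive for each table size.
speedup = [round(h / g, 2) for h, g in zip(hive_incremental_s, greenplum_incremental_s)]
print(speedup)
```

On the smallest table Hive is actually faster (ratio below 1). GreenPlum's advantage appears only as the target table grows, which matches the explanation that Hive's cost is dominated by rewriting existing data.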

Ab Initio working with GreenPlum in Near Real Time mode

In this experiment we will test the ability of Ab Initio to update a GreenPlum table with randomly generated portions of data in near real time. Consider the GreenPlum table dev42_1_db_usl.TESTING_SUBJ_org_finval, which we will work with.

We will use three Ab Initio graphs to work with it:

1) Graph Create_test_data.mp - creates data files in HDFS with 6 000 000 rows in 10 parallel threads. The data is random, and its structure is organized for insertion into our table


2) Graph mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset - an MDW-generated graph for the initializing insert of data into our table in 10 parallel threads (the test data generated by graph (1) is used)


3) Graph mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset - an MDW-generated graph for the incremental update of our table in 10 parallel threads, using a portion of freshly received data (delta) generated by graph (1)


Let's run the following scenario in NRT mode:

  • generate 6 000 000 test rows
  • perform the initializing load: insert 6 000 000 test rows into an empty table
  • repeat the incremental load 5 times
    • generate 6 000 000 test rows
    • perform an incremental insert of 6 000 000 test rows into the table (the old data gets its validity expiration time valid_to_ts set, and fresher data with the same primary key is inserted)

This scenario emulates the real operation of some business system: a fairly large portion of new data appears in real time and is immediately poured into GreenPlum.
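The control flow of such a scenario is simple enough to sketch in Python. How a graph is actually launched is deliberately left out: `run_pset` is a hypothetical callback that executes one parameter set, and the Start/Finish log wording is an assumption about the log format.

```python
from datetime import datetime
from typing import Callable, List

GENERATE = "Create_test_data.input.pset"
INITIAL = "mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset"
INCREMENTAL = "mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset"


def run_scenario(run_pset: Callable[[str], None], increments: int = 5) -> List[str]:
    """Run the scenario and return Start/Finish log lines.

    `run_pset` is a hypothetical callback that executes one parameter set.
    """
    log: List[str] = []

    def timed(pset: str) -> None:
        log.append(f"Start {pset} at {datetime.now():%Y-%m-%d %H:%M:%S}")
        run_pset(pset)
        log.append(f"Finish {pset} at {datetime.now():%Y-%m-%d %H:%M:%S}")

    timed(GENERATE)  # generate test rows
    timed(INITIAL)   # initializing load into the empty table
    for _ in range(increments):
        timed(GENERATE)     # generate a fresh delta
        timed(INCREMENTAL)  # apply it to the table
    return log


# Dry run: record which psets would be executed instead of launching anything.
executed: List[str] = []
log = run_scenario(executed.append)
```

With five increments the dry run records 12 executions and 24 log lines.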

Now let's look at the log of the scenario run:

Start Create_test_data.input.pset at 2020-06-04 11:49:11
Finish Create_test_data.input.pset at 2020-06-04 11:49:37
Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37
Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42
Start Create_test_data.input.pset at 2020-06-04 11:50:42
Finish Create_test_data.input.pset at 2020-06-04 11:51:06
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:51:06
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:53:41
Start Create_test_data.input.pset at 2020-06-04 11:53:41
Finish Create_test_data.input.pset at 2020-06-04 11:54:04
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:54:04
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:56:51
Start Create_test_data.input.pset at 2020-06-04 11:56:51
Finish Create_test_data.input.pset at 2020-06-04 11:57:14
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:57:14
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:59:55
Start Create_test_data.input.pset at 2020-06-04 11:59:55
Finish Create_test_data.input.pset at 2020-06-04 12:00:23
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:00:23
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:23
Start Create_test_data.input.pset at 2020-06-04 12:03:23
Finish Create_test_data.input.pset at 2020-06-04 12:03:49
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:49
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:06:46

The resulting picture is as follows:

Graph                                                                 Start time           Finish time          Duration
Create_test_data.input.pset                                           04.06.2020 11:49:11  04.06.2020 11:49:37  00:00:26
mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:49:37  04.06.2020 11:50:42  00:01:05
Create_test_data.input.pset                                           04.06.2020 11:50:42  04.06.2020 11:51:06  00:00:24
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:51:06  04.06.2020 11:53:41  00:02:35
Create_test_data.input.pset                                           04.06.2020 11:53:41  04.06.2020 11:54:04  00:00:23
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:54:04  04.06.2020 11:56:51  00:02:47
Create_test_data.input.pset                                           04.06.2020 11:56:51  04.06.2020 11:57:14  00:00:23
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:57:14  04.06.2020 11:59:55  00:02:41
Create_test_data.input.pset                                           04.06.2020 11:59:55  04.06.2020 12:00:23  00:00:28
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 12:00:23  04.06.2020 12:03:23  00:03:00
Create_test_data.input.pset                                           04.06.2020 12:03:23  04.06.2020 12:03:49  00:00:26
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 12:03:49  04.06.2020 12:06:46  00:02:57
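The durations of the five incremental loads can be recomputed from their start and finish timestamps. The Python sketch below does this with the standard library; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Start and finish timestamps of the five incremental loads, taken from the log.
incremental_runs = [
    ("2020-06-04 11:51:06", "2020-06-04 11:53:41"),
    ("2020-06-04 11:54:04", "2020-06-04 11:56:51"),
    ("2020-06-04 11:57:14", "2020-06-04 11:59:55"),
    ("2020-06-04 12:00:23", "2020-06-04 12:03:23"),
    ("2020-06-04 12:03:49", "2020-06-04 12:06:46"),
]

durations_s = [
    int((datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds())
    for start, end in incremental_runs
]
average_s = sum(durations_s) / len(durations_s)
print(durations_s, average_s)
```

The average comes out at 168 seconds, a little under three minutes per incremental load.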

We see that 6 000 000 incremental rows are processed in about 3 minutes, which is quite fast.
The data in the target table turned out to be distributed as follows:

select valid_from_ts, valid_to_ts, count(1), min(sk), max(sk) from dev42_1_db_usl.TESTING_SUBJ_org_finval group by valid_from_ts, valid_to_ts order by 1,2;

You can see that the inserted data corresponds to the moments when the graphs were launched.
This means that you can run incremental loading of data into GreenPlum from Ab Initio at a very high frequency and observe a high speed of inserting this data into GreenPlum. Of course, it will not be possible to launch once a second, since Ab Initio, like any ETL tool, needs some time to "warm up" at start.

Conclusion

Ab Initio is currently used at Sberbank to build the Unified Semantic Data Layer (ESS). This project involves building a single version of the state of various banking business entities. Information comes from various sources, replicas of which are prepared on Hadoop. Based on business needs, a data model is prepared and data transformations are described. Ab Initio loads information into the ESS, and the loaded data is not only of interest to the business in itself but also serves as a source for building data marts. At the same time, the functionality of the product allows various systems to be used as a receiver (Hive, Greenplum, Teradata, Oracle), which makes it possible to prepare data for the business, without much effort, in the various formats it requires.

The capabilities of Ab Initio are broad. For example, the included MDW framework makes it possible to build technical and business historicity of data out of the box. For developers, Ab Initio makes it possible not to reinvent the wheel but to use the many existing functional components, which are essentially libraries needed when working with data.

The author is an expert in the Sberbank professional community SberProfi DWH/BigData. The SberProfi DWH/BigData professional community is responsible for developing competencies in areas such as the Hadoop ecosystem, Teradata, Oracle DB, GreenPlum, as well as the BI tools Qlik, SAP BO, Tableau, and others.

Source: www.habr.com
