When you have Sber scale. Using Ab Initio with Hive and GreenPlum

Some time ago we faced the question of choosing an ETL tool for working with Big Data. The Informatica BDM solution we had been using did not suit us because of its limited functionality. Its use had been reduced to a framework for launching spark-submit commands. There were not many alternatives on the market that were, in principle, capable of handling the volume of data we deal with every day. In the end we chose Ab Initio. During pilot demonstrations the product showed very high data processing speed. There is almost no information about Ab Initio in Russian, so we decided to share our experience on Habr.

Ab Initio has many classic and unusual transformations, whose code can be extended using its own PDL language. For a small business such a powerful tool is likely to be overkill, and most of its capabilities may turn out to be expensive and unused. But if your scale is close to Sber's, Ab Initio may be of interest to you.

It helps the business accumulate knowledge globally and develop its ecosystem, and it helps the developer improve ETL skills, brush up on shell, learn the PDL language, get a visual picture of loading processes, and simplify development thanks to the abundance of functional components.

In this post I will talk about the capabilities of Ab Initio and give comparative characteristics of how it works with Hive and GreenPlum.

  • Description of the MDW framework and the work on customizing it for GreenPlum
  • Comparative performance of Ab Initio with Hive and GreenPlum
  • Running Ab Initio with GreenPlum in Near Real Time mode


The functionality of this product is very broad and takes considerable time to learn. However, with the right skills and correct performance settings, the data processing results are quite impressive. Using Ab Initio can be an interesting experience for a developer. It is a new take on ETL development, a hybrid between a visual environment and developing loads in a script-like language.

Businesses are developing their ecosystems, and this tool comes in handy more than ever. With Ab Initio you can accumulate knowledge about your current business and use it to expand old businesses and open new ones. Alternatives to Ab Initio include Informatica BDM among visual development environments and Apache Spark among non-visual ones.

Description of Ab Initio

Ab Initio, like other ETL tools, is a set of products.


Ab Initio GDE (Graphical Development Environment) is the developer's environment, in which data transformations are configured and connected by data flows drawn as arrows. Such a set of transformations is called a graph.


The input and output connections of functional components are ports, and they contain the fields computed inside the transformations. Several graphs connected by flows, drawn as arrows in the order of their execution, are called a plan.

There are several hundred functional components, which is a lot. Many of them are highly specialized. The capabilities of the classic transformations in Ab Initio are broader than in other ETL tools. For example, Join has several outputs. Besides the result of joining the datasets, you can get the records of the input datasets whose keys could not be matched. You can also get rejects, errors, and a log of the transformation's work, which can be read in the same graph as a text file and processed by other transformations.
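The idea of a join with several outputs can be sketched in plain Python. This is only an illustration of the concept, not Ab Initio's Join component or its API; the function name `join_with_unused` and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def join_with_unused(
    left: List[Record], right: List[Record], key: str
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Inner-join two record lists and also return the unmatched records of each input."""
    # Index the right input by key so each left record can find its partners.
    right_by_key: Dict[object, List[Record]] = {}
    for rec in right:
        right_by_key.setdefault(rec[key], []).append(rec)

    joined: List[Record] = []
    unused_left: List[Record] = []
    matched_keys = set()
    for rec in left:
        partners = right_by_key.get(rec[key])
        if partners is None:
            unused_left.append(rec)  # key not found on the right side
            continue
        matched_keys.add(rec[key])
        for partner in partners:
            joined.append({**partner, **rec})

    # Right-side records whose key never matched any left record.
    unused_right = [rec for rec in right if rec[key] not in matched_keys]
    return joined, unused_left, unused_right


accounts = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
balances = [{"id": 2, "amount": 10}, {"id": 3, "amount": 7}]
joined, unused_accounts, unused_balances = join_with_unused(accounts, balances, "id")
```

Here `joined` holds the matched record for `id` 2, while the two "unused" lists hold the records for `id` 1 and `id` 3, which is roughly what the extra output ports provide.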


Or, for example, you can materialize a data receiver as a table and read data from it in the same graph.

There are original transformations. For example, the Scan transformation has functionality similar to analytic functions. There are transformations with self-explanatory names: Create Data, Read Excel, Normalize, Sort within Groups, Run Program, Run SQL, Join with DB, and others. Graphs can use run-time parameters, including parameters passed from or to the operating system. Files containing a ready-made set of parameters passed to a graph are called parameter sets (psets).
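To give a feel for what "functionality similar to analytic functions" means, here is a minimal Python sketch of a running total per key, comparable to a SQL `sum(...) over (partition by ...)`. It is not the Scan component itself; the function name `running_total` and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_total(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int, int]]:
    """Yield (key, value, cumulative sum of value within that key), in input order."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        totals[key] += value  # state carried from record to record, per key
        yield key, value, totals[key]


rows = [("a", 5), ("a", 3), ("b", 10), ("a", 2), ("b", 1)]
result = list(running_total(rows))
```

Each output record carries both the original value and the accumulated state for its key, which is the essence of a scan-style transformation.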

As you would expect, Ab Initio GDE has its own repository, called EME (Enterprise Meta Environment). Developers can work with local versions of the code and check their work in to the central repository.

During or after the execution of a graph, you can click on any flow connecting transformations and look at the data that passed between them.


You can also click on any flow and see the tracking details: how many parallel threads the transformation ran in, and how many rows and bytes were loaded in each of them.


You can split the execution of a graph into phases and mark that some transformations must run first (in phase zero), the next ones in phase one, the next in phase two, and so on.

For each transformation you can choose a so-called layout (where it will run): without parallelism or in parallel threads, the number of which you can specify. The temporary files that Ab Initio creates while transformations run can be placed either in the server's file system or in HDFS.

In each transformation, starting from the default template, you can create your own script in PDL, which is somewhat reminiscent of shell.

With PDL you can extend the functionality of transformations and, in particular, dynamically (at run time) generate arbitrary code fragments depending on run-time parameters.

Ab Initio also has well-developed integration with the OS through the shell. Sberbank specifically uses Linux ksh. You can exchange variables with the shell and use them as graph parameters. You can launch Ab Initio graphs from the shell and administer Ab Initio.

Besides Ab Initio GDE, the delivery includes many other products. There is its own Co>Operating System, which lays claim to being called an operating system. There is Control>Center, where you can schedule and monitor load flows. There are products for doing development at a more primitive level than Ab Initio GDE allows.

Description of the MDW framework and the work on customizing it for GreenPlum

Along with its products, the vendor supplies MDW (Metadata Driven Warehouse), a graph configurator designed to help with the typical tasks of populating data warehouses or data vaults.

It contains custom (project-specific) metadata parsers and ready-made code generators out of the box.

As input, MDW receives a data model, a configuration file for setting up the database connection (Oracle, Teradata or Hive) and some other settings. The project-specific part, for example, deploys the model in the database. The out-of-the-box part of the product generates graphs, and configuration files for them, that load data into the model's tables. Graphs (and psets) are created for several modes of initial and incremental entity updates.

For Hive and for RDBMSs, different graphs are generated for the initial and incremental data updates.

In the case of Hive, the incoming delta is joined, by means of Ab Initio Join, with the data that was in the table before the update. The data loaders in MDW (for both Hive and RDBMSs) not only insert new data from the delta but also close the validity periods of the data whose primary keys arrived in the delta. In addition, the unchanged part of the data has to be rewritten. This is unavoidable because Hive has no delete or update operations.
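The full-rewrite approach described for Hive can be sketched as follows. This is a simplified illustration under stated assumptions, not the graph MDW generates: the column names `pk` and `value`, the `OPEN_END` marker for still-valid rows, and the function `rewrite_table` are all hypothetical.

```python
from typing import Dict, List

OPEN_END = "9999-12-31 00:00:00"  # assumed marker for "still valid" rows


def rewrite_table(existing: List[Dict], delta: List[Dict], load_ts: str) -> List[Dict]:
    """Build the complete new table contents: without update or delete, every row is written again."""
    delta_keys = {row["pk"] for row in delta}
    rewritten: List[Dict] = []
    for row in existing:
        if row["pk"] in delta_keys and row["valid_to_ts"] == OPEN_END:
            # A newer version arrived: close the validity period of the old one.
            rewritten.append({**row, "valid_to_ts": load_ts})
        else:
            # Unchanged rows still have to be copied into the new table.
            rewritten.append(dict(row))
    for row in delta:
        rewritten.append(
            {"pk": row["pk"], "value": row["value"],
             "valid_from_ts": load_ts, "valid_to_ts": OPEN_END}
        )
    return rewritten


existing = [
    {"pk": 1, "value": "old-1", "valid_from_ts": "2020-01-01 00:00:00", "valid_to_ts": OPEN_END},
    {"pk": 2, "value": "old-2", "valid_from_ts": "2020-01-01 00:00:00", "valid_to_ts": OPEN_END},
]
delta = [{"pk": 2, "value": "new-2"}, {"pk": 3, "value": "new-3"}]
new_table = rewrite_table(existing, delta, "2020-06-04 11:51:06")
```

Note that the cost of this function is proportional to the size of `existing`, not to the size of `delta`, which is the property that matters for large target tables.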


In the case of an RDBMS, the graphs for incremental data updates look more optimal, because RDBMSs have real update capabilities.


The incoming delta is loaded into an intermediate table in the database. After that, the delta is joined with the data that was in the table before the update. This is done in SQL, by means of a generated SQL query. Next, using the SQL commands delete+insert, the new data from the delta is inserted into the target table and the validity periods of the data whose primary keys arrived in the delta are closed.
There is no need to rewrite the unchanged data.
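A minimal sketch of the delete+insert pattern, using Python's built-in `sqlite3` module as a stand-in database. The table layout (`target`, `delta`, and a helper table `closed`), the column names, and the `OPEN_END` marker are assumptions for illustration; the SQL that MDW actually generates is not shown in this article.

```python
import sqlite3

OPEN_END = "9999-12-31 00:00:00"  # assumed marker for "still valid" rows


def apply_delta(conn: sqlite3.Connection, load_ts: str) -> None:
    """Merge the staged delta into target with delete+insert, closing old versions."""
    with conn:  # one transaction: commit on success, roll back on error
        # Join: closed copies of the current rows whose primary key arrived in the delta.
        conn.execute(
            "insert into closed "
            "select pk, value, valid_from_ts, :load_ts from target "
            "where valid_to_ts = :open_end "
            "and exists (select * from delta where delta.pk = target.pk)",
            {"load_ts": load_ts, "open_end": OPEN_END},
        )
        # Delete only the rows that are being replaced.
        conn.execute(
            "delete from target where valid_to_ts = :open_end "
            "and exists (select * from delta where delta.pk = target.pk)",
            {"open_end": OPEN_END},
        )
        # Insert the closed old versions and the new versions from the delta.
        conn.execute("insert into target select * from closed")
        conn.execute(
            "insert into target select pk, value, :load_ts, :open_end from delta",
            {"load_ts": load_ts, "open_end": OPEN_END},
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table target (pk integer, value text, valid_from_ts text, valid_to_ts text);
    create table closed (pk integer, value text, valid_from_ts text, valid_to_ts text);
    create table delta (pk integer, value text);
    insert into target values (1, 'old-1', '2020-01-01 00:00:00', '9999-12-31 00:00:00');
    insert into target values (2, 'old-2', '2020-01-01 00:00:00', '9999-12-31 00:00:00');
    insert into delta values (2, 'new-2'), (3, 'new-3');
    """
)
apply_delta(conn, "2020-06-04 11:51:06")
rows = conn.execute(
    "select pk, value, valid_from_ts, valid_to_ts from target order by pk, valid_from_ts"
).fetchall()
```

The row with `pk` 1 is never touched, which illustrates why this approach scales with the size of the delta rather than with the size of the target table.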

Thus, we came to the conclusion that in the case of Hive, MDW has to rewrite the whole table, because Hive does not support an update function, and nothing better than a complete rewrite of the data on update has been invented. In the case of an RDBMS, on the contrary, the creators of the product saw fit to entrust the joining and updating of tables to SQL.

For a project at Sberbank, we created a new, reusable implementation of the database loader for GreenPlum. It was based on the version that MDW generates for Teradata. It was Teradata, not Oracle, that turned out to be the best and closest fit for this, because it is also an MPP system. The ways of working, as well as the syntax, of Teradata and GreenPlum turned out to be similar.

Examples of differences between RDBMSs that are critical for MDW are as follows. In GreenPlum, unlike Teradata, you must write the clause

distributed by

when creating tables. In Teradata you write

delete <table> all

whereas in GreenPlum you write

delete from <table>

In Oracle, for optimization purposes, you write

delete from t where rowid in (<join of t with the delta>)

whereas in Teradata and GreenPlum you write

delete from t where exists (select * from delta where delta.pk=t.pk)
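One way a code generator might account for such dialect differences is sketched below. The function names and the dialect strings are hypothetical and are not part of MDW; only the SQL forms quoted above are taken from the article.

```python
def create_table_sql(dialect: str, table: str, columns: str, distribution_key: str) -> str:
    """Build a CREATE TABLE statement; only GreenPlum gets a 'distributed by' clause here."""
    ddl = f"create table {table} ({columns})"
    if dialect == "greenplum":
        ddl += f" distributed by ({distribution_key})"
    return ddl


def clear_table_sql(dialect: str, table: str) -> str:
    """Build the statement that removes all rows from a table."""
    if dialect == "teradata":
        return f"delete {table} all"
    if dialect == "greenplum":
        return f"delete from {table}"
    raise ValueError(f"unsupported dialect: {dialect}")


def delete_matched_sql(table: str, delta: str, pk: str) -> str:
    """Build the 'where exists' form used for Teradata and GreenPlum."""
    return (
        f"delete from {table} where exists "
        f"(select * from {delta} where {delta}.{pk}={table}.{pk})"
    )


gp_ddl = create_table_sql("greenplum", "t", "pk bigint, value text", "pk")
td_clear = clear_table_sql("teradata", "t")
gp_clear = clear_table_sql("greenplum", "t")
delete_stmt = delete_matched_sql("t", "delta", "pk")
```

Keeping each difference behind a small function means a new target database only needs its own branches, which matches the idea of deriving the GreenPlum loader from the Teradata one.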

We also note that for Ab Initio to work with GreenPlum, the GreenPlum client had to be installed on all nodes of the Ab Initio cluster. This is because we connected to GreenPlum simultaneously from all nodes of our cluster. And for reading from GreenPlum to be parallel, with each parallel Ab Initio thread reading its own portion of data from GreenPlum, we had to place a construct that Ab Initio understands in the "where" section of the SQL queries

where ABLOCAL()

and define the value of this construct by specifying a parameter on the transformation that reads from the database

ablocal_expr=«string_concat("mod(t.", string_filter_out("{$TABLE_KEY}","{}"), ",", (decimal(3))(number_of_partitions()),")=", (decimal(3))(this_partition()))»

, which compiles into something like

mod(sk,10)=3

, i.e. you have to give GreenPlum an explicit filter for each partition. For other databases (Teradata, Oracle), Ab Initio can perform this parallelization automatically.
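The effect of such a per-partition filter can be checked with a small Python sketch. The helper names `ablocal_filter` and `keys_for_partition` are hypothetical, and the keys are assumed to be non-negative integers so that Python's `%` behaves like SQL `mod`.

```python
from typing import List


def ablocal_filter(key: str, partitions: int, partition: int) -> str:
    """Render the filter text that one parallel reader would send to the database."""
    return f"mod(t.{key},{partitions})={partition}"


def keys_for_partition(keys: List[int], partitions: int, partition: int) -> List[int]:
    """Emulate the filter locally: keep the keys this partition is responsible for."""
    return [k for k in keys if k % partitions == partition]


keys = list(range(1, 101))  # sample surrogate keys, non-negative by assumption
filters = [ablocal_filter("sk", 10, p) for p in range(10)]
slices = [keys_for_partition(keys, 10, p) for p in range(10)]
all_read = sorted(k for part in slices for k in part)
```

Because every key has exactly one remainder, the ten slices are disjoint and together cover the whole table, so each row is read by exactly one thread.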

Comparative performance of Ab Initio with Hive and GreenPlum

Sberbank ran an experiment to compare the performance of MDW-generated graphs for Hive and for GreenPlum. In the experiment, Hive had 5 nodes on the same cluster as Ab Initio, while GreenPlum had 4 nodes on a separate cluster. That is, Hive had some hardware advantage over GreenPlum.

We considered two pairs of graphs performing the same task of updating data in Hive and in GreenPlum. The graphs run were those generated by the MDW configurator:

  • initial load + incremental load of randomly generated data into a Hive table
  • initial load + incremental load of randomly generated data into an identical GreenPlum table

In both cases (Hive and GreenPlum) the loads ran in 10 parallel threads on the same Ab Initio cluster. Ab Initio stored intermediate data for the calculations in HDFS (in Ab Initio terms, an MFS layout using HDFS was used). One row of randomly generated data occupied 200 bytes in both cases.

The results were as follows:

Hive:

Initial load into Hive

| Rows inserted | 6,000,000 | 60,000,000 | 600,000,000 |
|---|---|---|---|
| Initial load duration, seconds | 41 | 203 | 1,601 |

Incremental load into Hive

| Rows in the target table at the start of the experiment | 6,000,000 | 60,000,000 | 600,000,000 |
|---|---|---|---|
| Delta rows applied to the target table during the experiment | 6,000,000 | 6,000,000 | 6,000,000 |
| Incremental load duration, seconds | 88 | 299 | 2,541 |

GreenPlum:

Initial load into GreenPlum

| Rows inserted | 6,000,000 | 60,000,000 | 600,000,000 |
|---|---|---|---|
| Initial load duration, seconds | 72 | 360 | 3,631 |

Incremental load into GreenPlum

| Rows in the target table at the start of the experiment | 6,000,000 | 60,000,000 | 600,000,000 |
|---|---|---|---|
| Delta rows applied to the target table during the experiment | 6,000,000 | 6,000,000 | 6,000,000 |
| Incremental load duration, seconds | 159 | 199 | 321 |

We see that the initial load time for both Hive and GreenPlum grows approximately linearly with the amount of data and, because of the better hardware, the load is somewhat faster for Hive than for GreenPlum.

The incremental load into Hive also depends linearly on the volume of previously loaded data in the target table and becomes quite slow as the volume grows. The reason is the need to rewrite the target table completely. This means that applying small changes to huge tables is not a good use case for Hive.

The incremental load into GreenPlum depends only weakly on the volume of previously loaded data in the target table and runs quite fast. This is thanks to SQL joins and the GreenPlum architecture, which allows the delete operation.

So, GreenPlum merges the delta using the delete+insert method, whereas Hive has no delete or update operations, so the entire data set had to be rewritten during an incremental update. The most telling comparison is the incremental load into the 600,000,000-row table (2,541 seconds for Hive versus 321 seconds for GreenPlum), since it corresponds to the most common way resource-intensive loads are run. GreenPlum beat Hive in this test by a factor of about 8.
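The ratio can be recomputed directly from the incremental load durations reported above. The dictionary names are ours; the numbers are taken from the results tables.

```python
# Incremental load duration in seconds, keyed by rows already in the target table.
hive_incremental = {6_000_000: 88, 60_000_000: 299, 600_000_000: 2541}
greenplum_incremental = {6_000_000: 159, 60_000_000: 199, 600_000_000: 321}

# How many times longer the Hive load took than the GreenPlum load.
ratios = {
    rows: hive_incremental[rows] / greenplum_incremental[rows]
    for rows in hive_incremental
}

for rows, ratio in ratios.items():
    print(f"{rows:>11,} rows in target: Hive/GreenPlum = {ratio:.2f}")
```

For the smallest table Hive is actually faster (a ratio below 1), and the advantage of GreenPlum appears only as the target table grows, which is consistent with the full-rewrite explanation.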

Running Ab Initio with GreenPlum in Near Real Time mode

In this experiment we will test Ab Initio's ability to update a GreenPlum table with randomly generated portions of data in a mode close to real time. Consider the GreenPlum table dev42_1_db_usl.TESTING_SUBJ_org_finval, which we will work with.

We will use three Ab Initio graphs to work with it:

1) The Create_test_data.mp graph: creates data files in HDFS with 6,000,000 rows in 10 parallel threads. The data is random, and its structure is organized for insertion into our table.


2) The mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset graph: an MDW-generated graph for the initial insertion of data into our table in 10 parallel threads (it uses the test data generated by graph (1)).


3) The mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset graph: an MDW-generated graph for the incremental update of our table in 10 parallel threads, using a portion of freshly received data (the delta) generated by graph (1).


Let us run the following scenario in NRT mode:

  • generate 6,000,000 test rows
  • perform the initial load: insert 6,000,000 test rows into an empty table
  • repeat the incremental load 5 times
    • generate 6,000,000 test rows
    • perform an incremental insert of 6,000,000 test rows into the table (the old data gets its validity expiry time valid_to_ts set, and more recent data with the same primary key is inserted)

This scenario emulates the real operating mode of some business system: a fairly large portion of new data appears in real time and is immediately loaded into GreenPlum.
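The scenario above can be expressed as a small driver loop. This is only a sketch: `run_graph` is a hypothetical callback standing in for whatever actually launches a pset, and here it simply records the names so the order of runs can be inspected.

```python
from typing import Callable, List

CREATE_DATA = "Create_test_data.input.pset"
DAY_ONE = "mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset"
REGULAR = "mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset"


def run_scenario(run_graph: Callable[[str], None], increments: int = 5) -> None:
    """Generate data and load it: one initial load, then a number of incremental loads."""
    run_graph(CREATE_DATA)  # generate test rows
    run_graph(DAY_ONE)      # initial load into the empty table
    for _ in range(increments):
        run_graph(CREATE_DATA)  # generate a fresh delta
        run_graph(REGULAR)      # incremental load of that delta


executed: List[str] = []
run_scenario(executed.append)  # stand-in launcher: just record what would run
```

With five increments the driver performs twelve runs in total: six data generations, one initial load, and five incremental loads.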

Now let us look at the scenario log:

Start Create_test_data.input.pset at 2020-06-04 11:49:11
Finish Create_test_data.input.pset at 2020-06-04 11:49:37
Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37
Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42
Start Create_test_data.input.pset at 2020-06-04 11:50:42
Finish Create_test_data.input.pset at 2020-06-04 11:51:06
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:51:06
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:53:41
Start Create_test_data.input.pset at 2020-06-04 11:53:41
Finish Create_test_data.input.pset at 2020-06-04 11:54:04
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:54:04
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:56:51
Start Create_test_data.input.pset at 2020-06-04 11:56:51
Finish Create_test_data.input.pset at 2020-06-04 11:57:14
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:57:14
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:59:55
Start Create_test_data.input.pset at 2020-06-04 11:59:55
Finish Create_test_data.input.pset at 2020-06-04 12:00:23
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:00:23
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:23
Start Create_test_data.input.pset at 2020-06-04 12:03:23
Finish Create_test_data.input.pset at 2020-06-04 12:03:49
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:49
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:06:46

This gives the following picture:

| Graph | Start time | Finish time | Duration |
|---|---|---|---|
| Create_test_data.input.pset | 04.06.2020 11:49:11 | 04.06.2020 11:49:37 | 00:00:26 |
| mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:49:37 | 04.06.2020 11:50:42 | 00:01:05 |
| Create_test_data.input.pset | 04.06.2020 11:50:42 | 04.06.2020 11:51:06 | 00:00:24 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:51:06 | 04.06.2020 11:53:41 | 00:02:35 |
| Create_test_data.input.pset | 04.06.2020 11:53:41 | 04.06.2020 11:54:04 | 00:00:23 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:54:04 | 04.06.2020 11:56:51 | 00:02:47 |
| Create_test_data.input.pset | 04.06.2020 11:56:51 | 04.06.2020 11:57:14 | 00:00:23 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:57:14 | 04.06.2020 11:59:55 | 00:02:41 |
| Create_test_data.input.pset | 04.06.2020 11:59:55 | 04.06.2020 12:00:23 | 00:00:28 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 12:00:23 | 04.06.2020 12:03:23 | 00:03:00 |
| Create_test_data.input.pset | 04.06.2020 12:03:23 | 04.06.2020 12:03:49 | 00:00:26 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 12:03:49 | 04.06.2020 12:06:46 | 00:02:57 |
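Durations like these can be derived mechanically from the scenario log. The sketch below assumes log lines of the form `Start <pset> at <timestamp>` and `Finish <pset> at <timestamp>`; the function name `durations` and the regular expression are ours.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

LINE = re.compile(r"^(Start|Finish) (\S+) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$")


def durations(lines: List[str]) -> List[Tuple[str, timedelta]]:
    """Pair each Start line with the next Finish line of the same pset."""
    started: Dict[str, datetime] = {}
    result: List[Tuple[str, timedelta]] = []
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a Start/Finish line
        event, name, stamp = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event == "Start":
            started[name] = when
        elif name in started:
            result.append((name, when - started.pop(name)))
    return result


log = [
    "Start Create_test_data.input.pset at 2020-06-04 11:49:11",
    "Finish Create_test_data.input.pset at 2020-06-04 11:49:37",
    "Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37",
    "Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42",
]
result = durations(log)
```

For the first two runs this yields 26 seconds and 1 minute 5 seconds, matching the first two rows of the table.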

We see that the 6,000,000 increment rows are processed in about 3 minutes, which is quite fast.
The data in the target table turned out to be distributed as follows:

select valid_from_ts, valid_to_ts, count(1), min(sk), max(sk) from dev42_1_db_usl.TESTING_SUBJ_org_finval group by valid_from_ts, valid_to_ts order by 1,2;

You can see that the inserted data corresponds to the moments when the graphs were launched.
This means that you can run incremental data loads into GreenPlum from Ab Initio at a very high frequency and observe a high speed of inserting this data into GreenPlum. Of course, it will not be possible to launch once a second, because Ab Initio, like any ETL tool, needs some time to "warm up" at startup.

Conclusion

Ab Initio is currently used at Sberbank to build the Unified Semantic Data Layer (ESS). This project involves building a single version of the state of the bank's various business entities. Information comes from various sources, replicas of which are prepared on Hadoop. Based on business needs, a data model is prepared and data transformations are described. Ab Initio loads the information into the ESS, and the loaded data is not only of interest to the business in itself but also serves as a source for building data marts. At the same time, the product's functionality allows various systems to be used as the receiver (Hive, Greenplum, Teradata, Oracle), which makes it easy to prepare data for the business in the various formats it requires.

The capabilities of Ab Initio are broad. For example, the bundled MDW framework makes it possible to build technical and business historicity of data out of the box. For developers, Ab Initio makes it possible not to reinvent the wheel but to use the many existing functional components, which are essentially libraries needed when working with data.

The author is an expert of Sberbank's professional community SberProfi DWH/BigData. The SberProfi DWH/BigData professional community is responsible for developing competencies in areas such as the Hadoop ecosystem, Teradata, Oracle DB, GreenPlum, and the BI tools Qlik, SAP BO, Tableau, and others.

Source: www.habr.com
