When you have Sber scale. Using Ab Initio with Hive and GreenPlum

Some time ago we faced the question of choosing an ETL tool for working with Big Data. The Informatica BDM solution we had used before did not suit us because of its limited functionality: its use had been reduced to a framework for launching spark-submit commands. There were few products on the market that were, in principle, capable of handling the data volumes we deal with every day. In the end we chose Ab Initio. During the pilot demonstrations the product showed very high data processing speed. There is almost no information about Ab Initio in Russian, so we decided to share our experience on Habr.

Ab Initio has many classic and unusual transformations whose code can be extended with its own PDL language. For a small business such a powerful tool is likely to be overkill, and most of its capabilities would be expensive and go unused. But if your scale is close to Sber's, Ab Initio may well interest you.

It helps a business accumulate knowledge globally and develop its ecosystem, and it helps a developer sharpen ETL skills and shell knowledge. It offers the chance to master the PDL language, gives a visual picture of loading processes, and simplifies development thanks to its abundance of functional components.

In this post I will describe Ab Initio's capabilities and compare how it works with Hive and with GreenPlum.

  • Description of the MDW framework and our work on adapting it to GreenPlum
  • Comparison of Ab Initio performance with Hive and GreenPlum
  • Running Ab Initio with GreenPlum in near-real-time mode


The product's functionality is very broad and takes a lot of time to learn. However, with the right working skills and proper performance tuning, the data processing results are very impressive. Working with Ab Initio can be an interesting experience for a developer. It is a fresh take on ETL development: a hybrid of a visual environment and load development in a script-like language.

Businesses are developing their ecosystems, and this tool comes in handier than ever. With Ab Initio you can accumulate knowledge about your current business and use it to expand existing businesses and open new ones. Alternatives to Ab Initio include the visual development environment Informatica BDM and the non-visual development environment Apache Spark.

Description of Ab Initio

Ab Initio, like other ETL tools, is a suite of products.


Ab Initio GDE (Graphical Development Environment) is the developer's environment, in which data transformations are configured and connected to each other by data flows drawn as arrows. Such a set of transformations is called a graph.


The input and output connections of functional components are called ports; they contain the fields computed inside the transformations. Several graphs connected by arrow-shaped flows in the order of their execution are called a plan.

There are several hundred functional components, which is a lot, and many of them are highly specialized. Classic transformations in Ab Initio are more capable than in other ETL tools. For example, Join has several outputs. Besides the result of joining the datasets, you can get the records of the input datasets whose keys found no match. You can also get rejects, errors, and a log of the transformation's run, which can be read in the same graph as a text file and processed by other transformations.
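To make the idea of a multi-output Join concrete, here is a small Python sketch. It is not Ab Initio code: `join_with_unused` and its three return values are hypothetical stand-ins for the main Join output and the outputs that carry the unmatched records of each input.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def join_with_unused(
    left: List[Record], right: List[Record], key: str
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Inner join on `key` that also returns the records that found no partner."""
    # Index the right-hand input by key so each left record is matched in O(1).
    right_by_key: Dict[object, List[Record]] = {}
    for rec in right:
        right_by_key.setdefault(rec[key], []).append(rec)

    joined: List[Record] = []
    unused_left: List[Record] = []
    matched_keys = set()
    for rec in left:
        partners = right_by_key.get(rec[key])
        if not partners:
            unused_left.append(rec)  # key absent on the right-hand side
            continue
        matched_keys.add(rec[key])
        for other in partners:
            joined.append({**rec, **other})  # merged record, right-hand fields win

    # Right-hand records whose key never appeared on the left.
    unused_right = [rec for rec in right if rec[key] not in matched_keys]
    return joined, unused_left, unused_right


clients = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
accounts = [{"id": 1, "balance": 100}, {"id": 3, "balance": 50}]
joined, unused0, unused1 = join_with_unused(clients, accounts, "id")
print(joined)   # [{'id': 1, 'name': 'A', 'balance': 100}]
print(unused0)  # [{'id': 2, 'name': 'B'}]
print(unused1)  # [{'id': 3, 'balance': 50}]
```

In an ordinary inner join the last two lists would simply be lost; keeping them lets a graph route unmatched records to their own processing branch.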


Or, for example, you can materialize a data receiver as a table and read data from it within the same graph.

There are also original transformations. For example, the Scan transformation offers functionality similar to analytic functions. There are transformations with self-explanatory names: Create Data, Read Excel, Normalize, Sort within Groups, Run Program, Run SQL, Join with DB, and others. Graphs can use runtime parameters, including passing parameters from or to the operating system. Files with a prepared set of parameters passed to a graph are called parameter sets (psets).
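The analogy between Scan and analytic functions can be illustrated with a running total per key, the kind of result `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` gives in SQL. The Python below is only an illustration, not Ab Initio's Scan: `scan_running_sum` and the field names are made up, and it assumes the input is already sorted by key.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List


def scan_running_sum(records: Iterable[Dict], key: str, value: str) -> List[Dict]:
    """Emit every input record plus the cumulative sum of `value` within its key group.

    Assumes `records` is already sorted by `key`: groupby only merges adjacent records.
    """
    out: List[Dict] = []
    for _, group in groupby(records, key=itemgetter(key)):
        running = 0  # restart the accumulator for each key group
        for rec in group:
            running += rec[value]
            out.append({**rec, "running_" + value: running})
    return out


payments = [
    {"acc": "a", "amt": 10},
    {"acc": "a", "amt": 5},
    {"acc": "b", "amt": 7},
]
for row in scan_running_sum(payments, "acc", "amt"):
    print(row)
# {'acc': 'a', 'amt': 10, 'running_amt': 10}
# {'acc': 'a', 'amt': 5, 'running_amt': 15}
# {'acc': 'b', 'amt': 7, 'running_amt': 7}
```

Unlike a group-by aggregate, this produces one output record per input record, which is what makes it resemble an analytic (window) function.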

As you would expect, Ab Initio GDE has its own repository called EME (Enterprise Meta Environment). Developers can work with local versions of the code and check their changes into the central repository.

During or after a graph run, you can click on any flow connecting transformations and view the data that passed between them.


You can also click on any flow and see tracking details: how many parallel threads the transformation ran in, and how many rows and bytes were loaded in each of the parallel threads.


Graph execution can be divided into phases, marking that some transformations must run first (in phase zero), the next ones in phase one, the next in phase two, and so on.
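The phase rule can be pictured as a simple scheduler: components run phase by phase, and a phase starts only after every component of the previous one has finished. The sketch below only orders hypothetical component names by phase number; it does not model how Ab Initio actually runs components inside a phase.

```python
from collections import defaultdict
from typing import Dict, List


def execution_waves(component_phase: Dict[str, int]) -> List[List[str]]:
    """Group components into waves ordered by phase number (0, 1, 2, ...).

    Components within one wave may run concurrently; wave N+1 waits for wave N.
    """
    by_phase: Dict[int, List[str]] = defaultdict(list)
    for name, phase in component_phase.items():
        by_phase[phase].append(name)
    return [sorted(by_phase[p]) for p in sorted(by_phase)]


graph = {"Read input": 0, "Create lookup": 0, "Join": 1, "Write output": 2}
print(execution_waves(graph))
# [['Create lookup', 'Read input'], ['Join'], ['Write output']]
```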

For each transformation you can choose a so-called layout (where it will run): either without parallelism or in parallel threads, whose number can be specified. The temporary files that Ab Initio creates while transformations run can be placed either on the server file system or in HDFS.
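A parallel layout means each record is routed to one of N partitions that are processed side by side. The Python sketch below assumes integer keys and uses key modulo N, the same arithmetic as the `mod(sk,10)` filter used for GreenPlum further on; it is not Ab Initio's own partitioner, and `partition_by_key` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def partition_by_key(records: Sequence[Dict], key: str, n_partitions: int) -> List[List[Dict]]:
    """Route each record to partition key % n_partitions (integer keys assumed).

    Records with equal keys always land in the same partition, so a keyed
    operation such as a join or rollup can run independently in each one.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    parts: List[List[Dict]] = [[] for _ in range(n_partitions)]
    for rec in records:
        parts[rec[key] % n_partitions].append(rec)
    return parts


rows = [{"sk": sk} for sk in range(25)]
parts = partition_by_key(rows, "sk", 10)
print([len(p) for p in parts])      # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print([r["sk"] for r in parts[3]])  # [3, 13, 23]
```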

In every transformation, starting from the default template, you can write your own script in PDL, which somewhat resembles shell.

With PDL you can extend the functionality of transformations and, in particular, dynamically (at runtime) generate arbitrary code fragments depending on runtime parameters.

Ab Initio also has well-developed integration with the OS through the shell. Sberbank, specifically, uses Linux ksh. You can exchange variables with the shell and use them as graph parameters. You can launch Ab Initio graphs from the shell and administer Ab Initio.

Besides Ab Initio GDE, the delivery includes many other products. There is its own Co>Operation System, which claims the title of an operating system. There is a Control>Center in which you can schedule and monitor loading flows. There are also products for doing development at a more primitive level than Ab Initio GDE allows.

Description of the MDW framework and our work on adapting it to GreenPlum

Along with its products, the vendor supplies MDW (Metadata Driven Warehouse), a graph configurator designed to help with typical tasks of populating data warehouses or data vaults.

Out of the box it contains custom (project-specific) metadata parsers and ready-made code generators.

As input, MDW receives a data model, a configuration file for setting up the database connection (Oracle, Teradata or Hive), and some other settings. The project-specific part, for example, deploys the model to a database. The out-of-the-box part of the product generates graphs, and parameter files for them, that load data into the model tables. Graphs (and psets) are generated for several modes: initialization and incremental updating of entities.

For Hive and for RDBMSs, different graphs are generated for initialization and for incremental data updates.

In the case of Hive, the incoming delta data is joined, via an Ab Initio Join, with the data that was in the table before the update. MDW loaders (both for Hive and for RDBMSs) not only insert the new data from the delta but also close the validity periods of the records whose primary keys appear in the delta. In addition, the unchanged part of the data has to be rewritten. This is unavoidable because Hive has no delete or update operations.
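The Hive approach can be sketched in a few lines of Python. This is an illustration of the idea, not MDW's generated graph: `rewrite_with_delta`, the `pk`/`value` fields and the `OPEN_END` marker for "still valid" are assumptions; only the `valid_from_ts`/`valid_to_ts` column names come from this post.

```python
from typing import Dict, List

OPEN_END = "9999-12-31"  # hypothetical valid_to_ts value meaning "current version"


def rewrite_with_delta(table: List[Dict], delta: List[Dict], load_ts: str) -> List[Dict]:
    """Return the full new contents of a versioned table after applying `delta`.

    Every existing row is written out again (there is no in-place UPDATE/DELETE),
    the current version of each key present in the delta is closed at load_ts,
    and the delta rows are appended as the new current versions.
    """
    delta_keys = {d["pk"] for d in delta}
    new_table: List[Dict] = []
    for row in table:
        if row["pk"] in delta_keys and row["valid_to_ts"] == OPEN_END:
            new_table.append({**row, "valid_to_ts": load_ts})  # close the old version
        else:
            new_table.append(dict(row))  # unchanged, but still rewritten
    for d in delta:
        new_table.append(
            {"pk": d["pk"], "value": d["value"], "valid_from_ts": load_ts, "valid_to_ts": OPEN_END}
        )
    return new_table


before = [
    {"pk": 1, "value": "a", "valid_from_ts": "2020-06-01", "valid_to_ts": OPEN_END},
    {"pk": 2, "value": "b", "valid_from_ts": "2020-06-01", "valid_to_ts": OPEN_END},
]
after = rewrite_with_delta(before, [{"pk": 2, "value": "b2"}], "2020-06-04")
for row in after:
    print(row)
```

The loop visits every row of `table` even when only one key changed, so the cost grows with the size of the target table rather than with the size of the delta.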


In the case of RDBMSs, the graphs for incremental data updates look more efficient, because RDBMSs have real update capabilities.


The received delta is loaded into an intermediate table in the database. The delta is then joined with the data that was in the table before the update; this is done in SQL, with a generated SQL query. Next, SQL delete+insert commands write the new data from the delta into the target table and close the validity periods of the records whose primary keys appear in the delta.
The unchanged data does not need to be rewritten.
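Here is the same scenario on an RDBMS, sketched with Python's built-in `sqlite3` so that it runs anywhere. It is not MDW's generated SQL, which the post does not show: the table names `target`, `delta` and `changes` and the `OPEN_END` marker are assumptions, and the extra `valid_to_ts = ?` condition in the DELETE is added here so that already-closed history rows are kept.

```python
import sqlite3
from typing import Iterable, Tuple

OPEN_END = "9999-12-31"  # hypothetical valid_to_ts value meaning "current version"


def apply_delta(conn: sqlite3.Connection, delta_rows: Iterable[Tuple[int, str]], load_ts: str) -> None:
    """Apply a delta to `target` via a staging table and SQL delete + insert."""
    cur = conn.cursor()
    # 1. Load the delta into an intermediate (staging) table.
    cur.execute("DROP TABLE IF EXISTS delta")
    cur.execute("CREATE TABLE delta (pk INTEGER PRIMARY KEY, value TEXT)")
    cur.executemany("INSERT INTO delta (pk, value) VALUES (?, ?)", delta_rows)

    # 2. Join the delta with the current rows: closed old versions + new versions.
    cur.execute("DROP TABLE IF EXISTS changes")
    cur.execute("CREATE TABLE changes (pk INTEGER, value TEXT, valid_from_ts TEXT, valid_to_ts TEXT)")
    cur.execute(
        """
        INSERT INTO changes
        SELECT t.pk, t.value, t.valid_from_ts, ?
          FROM target t JOIN delta d ON d.pk = t.pk
         WHERE t.valid_to_ts = ?
        UNION ALL
        SELECT d.pk, d.value, ?, ? FROM delta d
        """,
        (load_ts, OPEN_END, load_ts, OPEN_END),
    )

    # 3. Delete only the current versions that the delta touches ...
    cur.execute(
        "DELETE FROM target WHERE valid_to_ts = ? "
        "AND EXISTS (SELECT * FROM delta WHERE delta.pk = target.pk)",
        (OPEN_END,),
    )
    # 4. ... and insert their closed copies plus the new versions.
    cur.execute("INSERT INTO target SELECT pk, value, valid_from_ts, valid_to_ts FROM changes")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (pk INTEGER, value TEXT, valid_from_ts TEXT, valid_to_ts TEXT)")
conn.executemany(
    "INSERT INTO target VALUES (?, ?, ?, ?)",
    [(1, "a", "2020-06-01", OPEN_END), (2, "b", "2020-06-01", OPEN_END)],
)
apply_delta(conn, [(2, "b2")], "2020-06-04")
for row in conn.execute("SELECT * FROM target ORDER BY pk, valid_from_ts"):
    print(row)
# (1, 'a', '2020-06-01', '9999-12-31')
# (2, 'b', '2020-06-01', '2020-06-04')
# (2, 'b2', '2020-06-04', '9999-12-31')
```

The row for key 1 is never touched: only the rows matched by the delta are deleted and re-inserted.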

So we can conclude that for Hive, MDW has to rewrite the entire table, because Hive has no update function, and nothing better than a complete rewrite of the data has been devised for updates. For RDBMSs, on the contrary, the product's creators considered it appropriate to entrust joining and updating tables to SQL.

For a Sberbank project, we created a new, reusable database loader implementation for GreenPlum. It was based on the version MDW generates for Teradata. Teradata, rather than Oracle, was the closest and best starting point, because it is also an MPP system. The working methods and the syntax of Teradata and GreenPlum turned out to be similar.

Examples of differences between RDBMSs that matter for MDW are as follows. In GreenPlum, unlike Teradata, creating a table requires the clause

distributed by

In Teradata one writes

delete <table> all

, while in GreenPlum one writes

delete from <table>

In Oracle, for optimization purposes, one writes

delete from t where rowid in (<join of t with the delta>)

, while Teradata and GreenPlum use

delete from t where exists (select * from delta where delta.pk=t.pk)
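These dialect differences are the kind of thing a loader's code generator has to switch on. The hypothetical Python helpers below emit exactly the statement forms quoted above for Teradata and GreenPlum; they are not MDW's code. The Oracle `rowid` variant is left out because the post does not spell out its subquery.

```python
from typing import Sequence


def create_table_sql(dialect: str, table: str, columns: Sequence[str], dist_key: str) -> str:
    """CREATE TABLE text; GreenPlum additionally needs a distribution clause."""
    ddl = f"create table {table} ({', '.join(columns)})"
    if dialect == "greenplum":
        return ddl + f" distributed by ({dist_key})"
    if dialect == "teradata":
        return ddl  # dist_key is not used for Teradata in this sketch
    raise ValueError(f"unsupported dialect: {dialect}")


def delete_all_sql(dialect: str, table: str) -> str:
    """Statement that empties a table, in each dialect's form."""
    if dialect == "teradata":
        return f"delete {table} all"
    if dialect == "greenplum":
        return f"delete from {table}"
    raise ValueError(f"unsupported dialect: {dialect}")


def delete_by_delta_sql(dialect: str, table: str, delta: str, pk: str) -> str:
    """Delete target rows whose key appears in the delta (Teradata/GreenPlum form)."""
    if dialect in ("teradata", "greenplum"):
        return f"delete from {table} where exists (select * from {delta} where {delta}.{pk}={table}.{pk})"
    raise ValueError(f"unsupported dialect: {dialect}")


for d in ("teradata", "greenplum"):
    print(create_table_sql(d, "t", ["sk bigint", "val text"], "sk"))
    print(delete_all_sql(d, "t"))
    print(delete_by_delta_sql(d, "t", "delta", "pk"))
```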

Note also that for Ab Initio to work with GreenPlum, the GreenPlum client must be installed on every node of the Ab Initio cluster, because we connect to GreenPlum from all nodes of our cluster simultaneously. And to make reading from GreenPlum parallel, with each parallel Ab Initio thread reading its own portion of the data, we had to put a construct that Ab Initio understands into the "where" section of SQL queries

where ABLOCAL()

and define the value of this construct by setting the ablocal_expr parameter of the transformation that reads from the database

ablocal_expr=«string_concat("mod(t.", string_filter_out("{$TABLE_KEY}","{}"), ",", (decimal(3))(number_of_partitions()),")=", (decimal(3))(this_partition()))»

, which compiles to something like

mod(sk,10)=3

, i.e. you have to give GreenPlum an explicit filter for each partition. For other databases (Teradata, Oracle), Ab Initio can parallelize the read automatically.
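The ablocal_expr above only builds a string. To show what it evaluates to, and why ten such filters split a table cleanly between ten readers, here is a Python re-creation of the same concatenation. It is a sketch, not Ab Initio code: `ablocal_filter` and `partition_queries` are hypothetical helpers, the `{sk}` key and alias `t` are taken from the examples in this post, and the real `(decimal(3))` casts may format the numbers differently (for example, with padding). For non-negative integer keys every row satisfies exactly one of the generated predicates, so the parallel readers together cover the whole table without overlap.

```python
from typing import List


def ablocal_filter(table_key: str, number_of_partitions: int, this_partition: int) -> str:
    """Per-partition predicate with the same structure as ablocal_expr."""
    if not 0 <= this_partition < number_of_partitions:
        raise ValueError("this_partition must be in [0, number_of_partitions)")
    key = "".join(ch for ch in table_key if ch not in "{}")  # like string_filter_out(..., "{}")
    return f"mod(t.{key},{number_of_partitions})={this_partition}"


def partition_queries(query: str, table_key: str, number_of_partitions: int) -> List[str]:
    """One query per parallel reader, with ABLOCAL() replaced by that reader's filter."""
    return [
        query.replace("ABLOCAL()", ablocal_filter(table_key, number_of_partitions, p))
        for p in range(number_of_partitions)
    ]


queries = partition_queries(
    "select * from dev42_1_db_usl.TESTING_SUBJ_org_finval t where ABLOCAL()", "{sk}", 10
)
print(queries[3])
# select * from dev42_1_db_usl.TESTING_SUBJ_org_finval t where mod(t.sk,10)=3
```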

Comparison of Ab Initio performance with Hive and GreenPlum

Sberbank ran an experiment comparing the performance of MDW-generated graphs against Hive and against GreenPlum. In the experiment, Hive ran on 5 nodes of the same cluster as Ab Initio, while GreenPlum ran on 4 nodes of a separate cluster. That is, Hive had some hardware advantage over GreenPlum.

We looked at two pairs of graphs performing the same data update task in Hive and in GreenPlum. We launched the graphs generated by the MDW configurator:

  • initial load + incremental load of randomly generated data into a Hive table
  • initial load + incremental load of randomly generated data into the same table in GreenPlum

In both cases (Hive and GreenPlum), the loads ran in 10 parallel threads on the same Ab Initio cluster. Ab Initio stored intermediate data for computations in HDFS (in Ab Initio terms, an MFS layout on HDFS was used). One row of randomly generated data took 200 bytes in both cases.

The results were as follows:

Hive:

Initial load in Hive

Rows inserted                          6,000,000   60,000,000   600,000,000
Initial load duration, s                      41          203         1,601

Incremental load in Hive

Rows in target table before the test   6,000,000   60,000,000   600,000,000
Delta rows applied during the test     6,000,000    6,000,000     6,000,000
Incremental load duration, s                  88          299         2,541

GreenPlum:

Initial load in GreenPlum

Rows inserted                          6,000,000   60,000,000   600,000,000
Initial load duration, s                      72          360         3,631

Incremental load in GreenPlum

Rows in target table before the test   6,000,000   60,000,000   600,000,000
Delta rows applied during the test     6,000,000    6,000,000     6,000,000
Incremental load duration, s                 159          199           321

We see that the initial load time in both Hive and GreenPlum grows linearly with the data volume and, thanks to better hardware, is somewhat faster for Hive than for GreenPlum.

Incremental loading in Hive also grows linearly with the volume of data previously loaded into the target table, and becomes very slow as that volume grows. This is caused by the need to rewrite the target table completely. It means that applying small changes to huge tables is not a good use case for Hive.

Incremental loading in GreenPlum depends only weakly on the volume of data previously loaded into the target table and runs quite fast. This is thanks to SQL joins and the GreenPlum architecture, which supports the delete operation.

So GreenPlum merges in the delta with the delete+insert method, whereas Hive has no delete or update operations, so the entire data set has to be rewritten during every incremental update. The most telling comparison is the incremental load into a 600,000,000-row table, since it corresponds to the most common scenario for resource-intensive loads. Here GreenPlum beat Hive by a factor of 8.
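The "factor of 8" and the scaling claims can be checked directly from the measured durations. The numbers below are copied from the tables above; nothing else is assumed.

```python
# Incremental load durations in seconds, keyed by rows already in the target table
# (6,000,000 delta rows applied in every case).
hive_incremental = {6_000_000: 88, 60_000_000: 299, 600_000_000: 2541}
greenplum_incremental = {6_000_000: 159, 60_000_000: 199, 600_000_000: 321}


def ratios(slow: dict, fast: dict) -> dict:
    """slow/fast duration ratio for each table size present in both dicts."""
    return {rows: slow[rows] / fast[rows] for rows in sorted(slow.keys() & fast.keys())}


for rows, r in ratios(hive_incremental, greenplum_incremental).items():
    print(f"{rows:>11,} rows in target: Hive takes {r:.1f}x as long as GreenPlum")
#   6,000,000 rows in target: Hive takes 0.6x as long as GreenPlum
#  60,000,000 rows in target: Hive takes 1.5x as long as GreenPlum
# 600,000,000 rows in target: Hive takes 7.9x as long as GreenPlum

# Growth when the target table grows 100-fold (6M -> 600M rows) with the same delta:
print(f"Hive: {2541 / 88:.1f}x longer, GreenPlum: {321 / 159:.1f}x longer")
# Hive: 28.9x longer, GreenPlum: 2.0x longer
```

Note that for the smallest table Hive is actually faster; GreenPlum's advantage appears only as the target table grows.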

Running Ab Initio with GreenPlum in near-real-time mode

In this experiment we test Ab Initio's ability to update a GreenPlum table with data generated in near real time. We will work with the GreenPlum table dev42_1_db_usl.TESTING_SUBJ_org_finval.

We will use three Ab Initio graphs to work with it:

1) Graph Create_test_data.mp: creates a data file in HDFS with 6,000,000 rows in 10 parallel threads. The data is random, and its structure is laid out for insertion into our table.


2) Graph mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset: an MDW-generated graph for the initial insertion of data into our table in 10 parallel threads (it uses the test data generated by graph (1))


3) Graph mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset: an MDW-generated graph for the incremental update of our table in 10 parallel threads, using a portion of freshly received data (delta) generated by graph (1)


Let's run the following scenario in NRT mode:

  • generate 6,000,000 test rows
  • perform an initial load, inserting the 6,000,000 test rows into an empty table
  • repeat the incremental load 5 times:
    • generate 6,000,000 test rows
    • perform an incremental insert of the 6,000,000 test rows into the table (the valid_to_ts expiry time is set on the old data, and the newer data with the same primary key is inserted)

This scenario mimics the real operating mode of a certain business system: a fairly large portion of new data appears in real time and is immediately poured into GreenPlum.

Now let's look at the run log:

Start Create_test_data.input.pset at 2020-06-04 11:49:11
Finish Create_test_data.input.pset at 2020-06-04 11:49:37
Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37
Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42
Start Create_test_data.input.pset at 2020-06-04 11:50:42
Finish Create_test_data.input.pset at 2020-06-04 11:51:06
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:51:06
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:53:41
Start Create_test_data.input.pset at 2020-06-04 11:53:41
Finish Create_test_data.input.pset at 2020-06-04 11:54:04
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:54:04
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:56:51
Start Create_test_data.input.pset at 2020-06-04 11:56:51
Finish Create_test_data.input.pset at 2020-06-04 11:57:14
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:57:14
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:59:55
Start Create_test_data.input.pset at 2020-06-04 11:59:55
Finish Create_test_data.input.pset at 2020-06-04 12:00:23
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:00:23
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:23
Start Create_test_data.input.pset at 2020-06-04 12:03:23
Finish Create_test_data.input.pset at 2020-06-04 12:03:49
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:49
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:06:46

This gives the following picture:

Pset                                                                  Start time           End time             Duration
Create_test_data.input.pset                                           04.06.2020 11:49:11  04.06.2020 11:49:37  00:00:26
mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:49:37  04.06.2020 11:50:42  00:01:05
Create_test_data.input.pset                                           04.06.2020 11:50:42  04.06.2020 11:51:06  00:00:24
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:51:06  04.06.2020 11:53:41  00:02:35
Create_test_data.input.pset                                           04.06.2020 11:53:41  04.06.2020 11:54:04  00:00:23
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:54:04  04.06.2020 11:56:51  00:02:47
Create_test_data.input.pset                                           04.06.2020 11:56:51  04.06.2020 11:57:14  00:00:23
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:57:14  04.06.2020 11:59:55  00:02:41
Create_test_data.input.pset                                           04.06.2020 11:59:55  04.06.2020 12:00:23  00:00:28
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 12:00:23  04.06.2020 12:03:23  00:03:00
Create_test_data.input.pset                                           04.06.2020 12:03:23  04.06.2020 12:03:49  00:00:26
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 12:03:49  04.06.2020 12:06:46  00:02:57
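The durations in this table can be derived from the run log mechanically. The Python sketch below embeds the log as `Start <pset> at <timestamp>` / `Finish <pset> at <timestamp>` lines, pairs each start with the matching finish, and summarizes the five incremental runs; `run_durations` is a hypothetical helper, not part of Ab Initio.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

RUN_LOG = """\
Start Create_test_data.input.pset at 2020-06-04 11:49:11
Finish Create_test_data.input.pset at 2020-06-04 11:49:37
Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37
Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42
Start Create_test_data.input.pset at 2020-06-04 11:50:42
Finish Create_test_data.input.pset at 2020-06-04 11:51:06
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:51:06
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:53:41
Start Create_test_data.input.pset at 2020-06-04 11:53:41
Finish Create_test_data.input.pset at 2020-06-04 11:54:04
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:54:04
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:56:51
Start Create_test_data.input.pset at 2020-06-04 11:56:51
Finish Create_test_data.input.pset at 2020-06-04 11:57:14
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:57:14
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:59:55
Start Create_test_data.input.pset at 2020-06-04 11:59:55
Finish Create_test_data.input.pset at 2020-06-04 12:00:23
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:00:23
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:23
Start Create_test_data.input.pset at 2020-06-04 12:03:23
Finish Create_test_data.input.pset at 2020-06-04 12:03:49
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:49
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:06:46
"""

LINE_RE = re.compile(r"^(Start|Finish) (\S+) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$")


def run_durations(log: str) -> List[Tuple[str, timedelta]]:
    """Pair every Start line with the next Finish line of the same pset."""
    started: Dict[str, datetime] = {}
    runs: List[Tuple[str, timedelta]] = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a start/finish record
        event, pset, stamp = match.groups()
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event == "Start":
            started[pset] = moment
        else:
            runs.append((pset, moment - started.pop(pset)))
    return runs


runs = run_durations(RUN_LOG)
incremental = [d.total_seconds() for pset, d in runs if pset.startswith("mdw_load.regular.")]
print(incremental)  # [155.0, 167.0, 161.0, 180.0, 177.0]
print(f"average {sum(incremental) / len(incremental):.0f} s, longest {max(incremental):.0f} s")
# average 168 s, longest 180 s
```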

We see that 6,000,000 increment rows are processed in about 3 minutes, which is quite fast.
The data in the target table ended up distributed as follows:

select valid_from_ts, valid_to_ts, count(1), min(sk), max(sk) from dev42_1_db_usl.TESTING_SUBJ_org_finval group by valid_from_ts, valid_to_ts order by 1,2;
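To see why this query shows one band of rows per load, here is a toy reproduction with Python's `sqlite3`. Everything in it is an assumption made for illustration: the table is created without the `dev42_1_db_usl` schema (SQLite has no such schema here), only four keys are used, every load is assumed to re-version all keys, `valid_from_ts` is assumed to equal the graph start time from the run log, and `OPEN_END` is a made-up "still valid" marker.

```python
import sqlite3

OPEN_END = "9999-12-31 00:00:00"  # hypothetical valid_to_ts for current versions
# Assumed load moments: initial load, then five incremental loads (start times from the run log).
LOAD_TIMES = [
    "2020-06-04 11:49:37",
    "2020-06-04 11:51:06",
    "2020-06-04 11:54:04",
    "2020-06-04 11:57:14",
    "2020-06-04 12:00:23",
    "2020-06-04 12:03:49",
]

QUERY = """
select valid_from_ts, valid_to_ts, count(1), min(sk), max(sk)
  from testing_subj_org_finval
 group by valid_from_ts, valid_to_ts
 order by 1, 2
"""


def build_history(conn: sqlite3.Connection, keys, load_times) -> None:
    """Create a toy versioned table in which every load re-versions all `keys`."""
    conn.execute("CREATE TABLE testing_subj_org_finval (sk INTEGER, valid_from_ts TEXT, valid_to_ts TEXT)")
    for i, start in enumerate(load_times):
        # A version stays current until the next load closes it.
        end = load_times[i + 1] if i + 1 < len(load_times) else OPEN_END
        conn.executemany(
            "INSERT INTO testing_subj_org_finval VALUES (?, ?, ?)",
            [(sk, start, end) for sk in keys],
        )


conn = sqlite3.connect(":memory:")
build_history(conn, range(1, 5), LOAD_TIMES)
for row in conn.execute(QUERY):
    print(row)
```

Each output row corresponds to one load: its `valid_from_ts` is the load moment, and its `valid_to_ts` is the moment of the next load (or the open marker for the latest one).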

You can see how the inserted data lines up with the moments the graphs were launched.
This means that in Ab Initio you can run incremental loads into GreenPlum at a very high frequency and still get a high insertion speed. Of course, running once per second is not possible, since Ab Initio, like any ETL tool, needs some time to "warm up" at startup.

Conclusion

Ab Initio is currently used at Sberbank to build the Unified Semantic Data Layer (ESS). This project builds a unified version of the state of various banking business entities. Data comes from various sources whose replicas are prepared on Hadoop. Based on business needs, a data model is prepared and the data transformations are described. Ab Initio loads information into the ESS, and the loaded data is not only valuable to the business in its own right but also serves as the source for building data marts. At the same time, the product lets you use various systems as the receiver (Hive, Greenplum, Teradata, Oracle), which makes it easy to prepare data for the business in whatever formats it needs.

Ab Initio's capabilities are broad; for example, the bundled MDW framework lets you build technical and business data history out of the box. For developers, Ab Initio makes it possible not to reinvent the wheel but to use many existing functional components, which are essentially the libraries you need when working with data.

The author is an expert of the Sberbank professional community SberProfi DWH/BigData. The SberProfi DWH/BigData professional community is responsible for developing competencies in areas such as the Hadoop ecosystem, Teradata, Oracle DB, and GreenPlum, as well as the BI tools Qlik, SAP BO, Tableau, and others.

Source: www.hab.com
