When you operate at Sber scale: using Ab Initio with Hive and GreenPlum

Some time ago we had to choose an ETL tool for working with Big Data. The Informatica BDM solution we had been using did not suit us because of its limited functionality; in practice its use had been reduced to a framework for launching spark-submit commands. There were few alternatives on the market that could, in principle, handle the volume of data we deal with every day. In the end we chose Ab Initio. During pilot demonstrations the product showed very high data processing speed. There is almost no information about Ab Initio in Russian, so we decided to share our experience on Habr.

Ab Initio offers many classic and unusual transformations, whose code can be extended with its own language, PDL. For a small business such a powerful tool is likely to be overkill, and most of its capabilities may prove expensive and go unused. But if your scale is close to Sber's, Ab Initio may be of interest to you.

It helps the business accumulate knowledge globally and develop its ecosystem. It helps developers sharpen their ETL skills and improve their shell knowledge, gives them a chance to learn PDL, provides a visual picture of loading processes, and simplifies development thanks to the abundance of functional components.

In this post I will describe the capabilities of Ab Initio and give comparative figures for its work with Hive and GreenPlum.

  • Description of the MDW framework and the work to customize it for GreenPlum
  • Performance comparison of Ab Initio working with Hive and with GreenPlum
  • Ab Initio working with GreenPlum in Near Real Time mode


The functionality of this product is very broad and takes considerable time to learn. However, with proper skills and correct performance settings, the data processing results are quite impressive. Working with Ab Initio can be an interesting experience for a developer: it is a new take on ETL development, a hybrid between a visual environment and developing loads in a script-like language.

Businesses are developing their ecosystems, and this tool comes in handier than ever. With Ab Initio you can accumulate knowledge about your current business and use it to expand existing businesses and open new ones. Alternatives to Ab Initio include Informatica BDM among visual development environments and Apache Spark among non-visual ones.

Description of Ab Initio

Ab Initio, like other ETL tools, is a collection of products.

Ab Initio GDE (Graphical Development Environment) is the developer's environment, in which the developer configures data transformations and connects them with data flows drawn as arrows. Such a set of transformations is called a graph.

The input and output connections of functional components are ports and contain the fields computed inside the transformations. Several graphs connected by flows, drawn as arrows in the order of their execution, are called a plan.

There are several hundred functional components, which is a lot. Many of them are highly specialized. The classic transformations in Ab Initio are more capable than in other ETL tools. For example, Join has several outputs. Besides the result of joining the datasets, you can get as output the records of the input datasets whose keys could not be matched. You can also get rejects, errors, and the log of the transformation's run, which can be read in the same graph as a text file and processed by other transformations.

Or, for example, you can materialize a data sink as a table and read data from it in the same graph.
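To make the multi-output Join described above more concrete, here is a minimal Python sketch. It is not Ab Initio code: it joins two in-memory record lists and, besides the joined result, returns the records from each input whose keys found no match. The function name join_with_unused and the sample records are illustrative assumptions.

```python
from collections import defaultdict


def join_with_unused(left, right, key):
    """Inner-join two lists of dicts on `key`.

    Returns (joined, unused_left, unused_right), where the last two lists
    hold the input records whose key had no partner on the other side.
    """
    right_by_key = defaultdict(list)
    for rec in right:
        right_by_key[rec[key]].append(rec)

    matched_keys = set()
    joined, unused_left = [], []
    for rec in left:
        partners = right_by_key.get(rec[key])
        if partners:
            matched_keys.add(rec[key])
            # Merge the fields of both records into one output record.
            joined.extend({**partner, **rec} for partner in partners)
        else:
            unused_left.append(rec)

    unused_right = [rec for rec in right if rec[key] not in matched_keys]
    return joined, unused_left, unused_right


# Hypothetical sample data.
accounts = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
balances = [{"id": 2, "amount": 10}, {"id": 3, "amount": 7}]
joined, unused_accounts, unused_balances = join_with_unused(accounts, balances, "id")
print(joined, unused_accounts, unused_balances)
```

Keeping the unmatched records as separate outputs means they can be routed to further processing in the same pass instead of being recomputed with a second anti-join.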

There are also original transformations. For example, the Scan transformation offers functionality similar to analytic functions. There are transformations with self-explanatory names: Create Data, Read Excel, Normalize, Sort within Groups, Run Program, Run SQL, Join with DB, and others. Graphs can use run-time parameters, including passing parameters from or to the operating system. Files with a ready-made set of parameters passed to a graph are called parameter sets (psets).
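As an illustration of what "similar to analytic functions" can mean for Scan, the following Python sketch computes a running total within each key group, roughly what a SQL window sum partitioned by key would return. It is only an analogy under stated assumptions: the function name, field names, and sample rows are hypothetical and say nothing about how Scan is actually configured.

```python
from itertools import groupby
from operator import itemgetter


def scan_running_total(records, key, value):
    """Yield each record extended with a running total of `value` per `key`.

    The input must already be sorted by `key`, as groupby only merges
    adjacent records.
    """
    for _, group in groupby(records, key=itemgetter(key)):
        total = 0
        for rec in group:
            total += rec[value]
            yield {**rec, "running_total": total}


# Hypothetical sample rows, sorted by account.
rows = [
    {"account": "A", "amount": 5},
    {"account": "A", "amount": 7},
    {"account": "B", "amount": 3},
]
result = list(scan_running_total(rows, "account", "amount"))
print([r["running_total"] for r in result])
```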

As you would expect, Ab Initio GDE has its own repository, called EME (Enterprise Meta Environment). Developers can work with local versions of the code and check their work in to the central repository.

During or after the execution of a graph, you can click any flow connecting transformations and view the data that passed between them.

You can also click any flow to see the tracking details: how many parallel partitions the transformation ran in, and how many rows and bytes were loaded in each of them.

You can split the execution of a graph into phases and mark that some transformations must run first (in phase zero), the next ones in phase one, the next in phase two, and so on.

For each transformation you can choose a so-called layout (where it will run): without parallelism or in parallel threads, the number of which can be specified. The temporary files that Ab Initio creates while transformations run can be placed either in the server's file system or in HDFS.

In each transformation you can create your own script in PDL, based on the default template; PDL is somewhat reminiscent of shell.

With PDL you can extend the functionality of transformations and, in particular, dynamically (at run time) generate arbitrary code fragments depending on run-time parameters.

Ab Initio also has well-developed integration with the OS through the shell. Sberbank specifically uses Linux ksh. You can exchange variables with the shell and use them as graph parameters. From the shell you can launch Ab Initio graphs and administer Ab Initio.

Besides Ab Initio GDE, the delivery includes many other products. There is the Co>Operating System, which lays claim to being called an operating system. There is Control>Center, where you can schedule and monitor load flows. There are also products for doing development at a more primitive level than Ab Initio GDE allows.

Description of the MDW framework and the work to customize it for GreenPlum

Along with its products, the vendor supplies MDW (Metadata Driven Warehouse), a graph configurator designed to help with typical tasks of populating data warehouses or data vaults.

It contains custom (project-specific) metadata parsers and ready-made, out-of-the-box code generators.

As input, MDW receives a data model, a configuration file for setting up the database connection (Oracle, Teradata, or Hive), and some other settings. The project-specific part, for example, deploys the model to the database. The out-of-the-box part of the product generates graphs and their configuration files for loading data into the model's tables. Graphs (and psets) are created for several modes of initializing and incremental work on updating entities.

For Hive and for RDBMSs, different graphs are generated for the initializing and incremental data updates.

In the case of Hive, the incoming delta is joined, using Ab Initio Join, with the data that was in the table before the update. The data loaders in MDW (for both Hive and RDBMS) not only insert the new data from the delta but also close the validity periods of the records whose primary keys appear in the delta. In addition, the unchanged part of the data has to be rewritten. This is unavoidable because Hive has no delete or update operations.
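The full-rewrite update described above can be sketched in a few lines of Python. This is a minimal illustration, not the MDW-generated graph: the record layout (pk, value, valid_from_ts, valid_to_ts), the OPEN_END marker for a still-valid record, and the function name are all assumptions made for the example.

```python
OPEN_END = "9999-12-31 00:00:00"  # assumed marker for "still valid"


def rewrite_with_delta(table, delta, load_ts):
    """Return a completely new copy of `table` with `delta` applied.

    Every existing row is written again: rows whose key appears in the
    delta get their validity period closed, all other rows are copied
    unchanged, and the delta rows are appended as new open versions.
    """
    delta_keys = {row["pk"] for row in delta}
    new_table = []
    for row in table:
        if row["pk"] in delta_keys and row["valid_to_ts"] == OPEN_END:
            new_table.append({**row, "valid_to_ts": load_ts})  # close period
        else:
            new_table.append(dict(row))  # unchanged, but still rewritten
    for row in delta:
        new_table.append({
            "pk": row["pk"],
            "value": row["value"],
            "valid_from_ts": load_ts,
            "valid_to_ts": OPEN_END,
        })
    return new_table


# Hypothetical sample data.
table = [
    {"pk": 1, "value": "a", "valid_from_ts": "2020-01-01 00:00:00", "valid_to_ts": OPEN_END},
    {"pk": 2, "value": "b", "valid_from_ts": "2020-01-01 00:00:00", "valid_to_ts": OPEN_END},
]
delta = [{"pk": 2, "value": "B"}]
updated = rewrite_with_delta(table, delta, "2020-06-04 11:51:06")
print(len(updated))
```

Note that the loop visits every row of the existing table regardless of how small the delta is, which is why the cost of this approach grows with the size of the target table.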

In the case of an RDBMS, the graphs for incremental data updates look more efficient, because RDBMSs have real update capabilities.

The incoming delta is loaded into an intermediate table in the database. The delta is then joined with the data that was in the table before the update; this is done in SQL, by means of a generated SQL query. Next, using the SQL commands delete+insert, the new data from the delta is inserted into the target table and the validity periods of the records whose primary keys appear in the delta are closed.
There is no need to rewrite the unchanged data.
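The delete+insert approach can be tried out with a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in for an RDBMS; the table and column names, the OPEN_END marker, and the exact statements are assumptions for illustration and are not the SQL that MDW generates.

```python
import sqlite3

OPEN_END = "9999-12-31 00:00:00"  # assumed marker for an open validity period


def apply_delta(conn, load_ts):
    """Apply the staged `delta` table to `target` using delete+insert."""
    cur = conn.cursor()
    # Keep closed copies of the open rows whose keys appear in the delta.
    cur.execute("create temp table closed (pk, value, valid_from_ts, valid_to_ts)")
    cur.execute(
        "insert into closed "
        "select t.pk, t.value, t.valid_from_ts, ? from target t "
        "where t.valid_to_ts = ? "
        "and exists (select * from delta where delta.pk = t.pk)",
        (load_ts, OPEN_END),
    )
    # delete: remove only the open versions superseded by the delta.
    cur.execute(
        "delete from target where valid_to_ts = ? "
        "and exists (select * from delta where delta.pk = target.pk)",
        (OPEN_END,),
    )
    # insert: put back the closed versions and add the new open versions.
    cur.execute("insert into target select * from closed")
    cur.execute("insert into target select pk, value, ?, ? from delta", (load_ts, OPEN_END))
    cur.execute("drop table closed")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("create table target (pk, value, valid_from_ts, valid_to_ts)")
conn.execute("create table delta (pk, value)")  # the intermediate (staging) table
conn.executemany(
    "insert into target values (?, ?, ?, ?)",
    [(1, "a", "2020-01-01 00:00:00", OPEN_END), (2, "b", "2020-01-01 00:00:00", OPEN_END)],
)
conn.execute("insert into delta values (2, 'B')")
apply_delta(conn, "2020-06-04 11:51:06")
rows = conn.execute(
    "select pk, value, valid_from_ts, valid_to_ts from target order by pk, valid_from_ts"
).fetchall()
print(rows)
```

The row with pk 1 is never read or written by apply_delta, which is the practical difference from the full-rewrite approach.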

So we came to the conclusion that in the case of Hive, MDW has to rewrite the entire table because Hive does not support updates, and nothing better than rewriting the data in full on every update has been devised. In the case of an RDBMS, by contrast, the creators of the product saw fit to entrust the joining and updating of tables to SQL.

For a project at Sberbank we created a new, reusable implementation of the database loader for GreenPlum. It was based on the version that MDW generates for Teradata. It was Teradata, rather than Oracle, that proved the closest and best fit, because it is also an MPP system. The ways of working, as well as the syntax, of Teradata and GreenPlum turned out to be similar.

Examples of differences between RDBMSs that matter for MDW are as follows. In GreenPlum, unlike Teradata, when creating tables you need to write the clause

distributed by

In Teradata you write

delete <table> all

whereas in GreenPlum you write

delete from <table>

In Oracle, for optimization purposes, you write

delete from t where rowid in (<join of t with delta>)

whereas in Teradata and GreenPlum you write

delete from t where exists (select * from delta where delta.pk=t.pk)
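One way a loader could keep such dialect differences in a single place is a small statement generator. The sketch below is hypothetical and covers only the Teradata and GreenPlum variants quoted above; the function names and the dialect labels are assumptions, and the Oracle form is deliberately left out because its join expression is not spelled out in the text.

```python
def clear_table_sql(dialect, table):
    """Return the statement that removes all rows from `table`."""
    if dialect == "teradata":
        return f"delete {table} all"
    if dialect == "greenplum":
        return f"delete from {table}"
    raise ValueError(f"unsupported dialect: {dialect}")


def delete_matching_delta_sql(dialect, table, delta, pk):
    """Return the statement that deletes rows of `table` present in `delta`."""
    if dialect in ("teradata", "greenplum"):
        return (
            f"delete from {table} where exists "
            f"(select * from {delta} where {delta}.{pk}={table}.{pk})"
        )
    raise ValueError(f"unsupported dialect: {dialect}")


statements = [
    clear_table_sql("teradata", "stage_t"),
    clear_table_sql("greenplum", "stage_t"),
    delete_matching_delta_sql("greenplum", "t", "delta", "pk"),
]
for statement in statements:
    print(statement)
```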

We also note that for Ab Initio to work with GreenPlum, the GreenPlum client had to be installed on all nodes of the Ab Initio cluster, because we connect to GreenPlum simultaneously from all nodes of our cluster. And to make reading from GreenPlum parallel, with each parallel Ab Initio thread reading its own portion of data, we had to put a construct that Ab Initio understands into the "where" section of the SQL queries

where ABLOCAL()

and define the value of this construct by setting a parameter of the transformation that reads from the database

ablocal_expr=«string_concat("mod(t.", string_filter_out("{$TABLE_KEY}","{}"), ",", (decimal(3))(number_of_partitions()),")=", (decimal(3))(this_partition()))»

which compiles into something like

mod(sk,10)=3

That is, you have to give GreenPlum an explicit filter for each partition. For other databases (Teradata, Oracle), Ab Initio can perform this parallelization automatically.
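The effect of the per-partition filter can be checked with a short Python sketch. It builds the same kind of string as the ablocal_expr expression above (including the "t." prefix that the expression concatenates) and verifies that the resulting filters split a set of keys into disjoint parts that together cover everything. The function name and the sample keys are assumptions; non-negative integer keys are assumed so that Python's % agrees with SQL mod.

```python
def partition_filter(key_column, partitions, this_partition):
    """Build the filter one parallel reader would put in its where clause."""
    return f"mod(t.{key_column},{partitions})={this_partition}"


PARTITIONS = 10
filters = [partition_filter("sk", PARTITIONS, p) for p in range(PARTITIONS)]

# Simulate which readers would pick up each key (hypothetical keys 1..1000).
keys = range(1, 1001)
readers_per_key = {
    key: [p for p in range(PARTITIONS) if key % PARTITIONS == p] for key in keys
}

# Every key is read by exactly one partition: no duplicates, no gaps.
exactly_once = all(len(readers) == 1 for readers in readers_per_key.values())
print(filters[3], exactly_once)
```

A modulo on the key is a simple choice that needs no knowledge of the key distribution; it balances well when keys are dense integers and less well when they are skewed.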

Performance comparison of Ab Initio working with Hive and with GreenPlum

Sberbank ran an experiment comparing the performance of MDW-generated graphs against Hive and against GreenPlum. In the experiment, Hive had 5 nodes on the same cluster as Ab Initio, while GreenPlum had 4 nodes on a separate cluster. That is, Hive had a certain hardware advantage over GreenPlum.

We considered two pairs of graphs performing the same task of updating data in Hive and in GreenPlum. The graphs that were run had been generated by the MDW configurator:

  • initializing load + incremental load of randomly generated data into a Hive table
  • initializing load + incremental load of randomly generated data into an identical GreenPlum table

In both cases (Hive and GreenPlum) the loads were run in 10 parallel threads on the same Ab Initio cluster. Ab Initio stored intermediate data for its calculations in HDFS (in Ab Initio terms, an MFS layout using HDFS was used). One row of the randomly generated data occupied 200 bytes in both cases.

The results were as follows:

Hive:

Initializing load into Hive

Rows inserted                                           6 000 000     60 000 000    600 000 000
Duration of initializing load, seconds                  41            203           1 601

Incremental load into Hive

Rows in target table at start of experiment             6 000 000     60 000 000    600 000 000
Delta rows applied to target table during experiment    6 000 000     6 000 000     6 000 000
Duration of incremental load, seconds                   88            299           2 541

GreenPlum:

Initializing load into GreenPlum

Rows inserted                                           6 000 000     60 000 000    600 000 000
Duration of initializing load, seconds                  72            360           3 631

Incremental load into GreenPlum

Rows in target table at start of experiment             6 000 000     60 000 000    600 000 000
Delta rows applied to target table during experiment    6 000 000     6 000 000     6 000 000
Duration of incremental load, seconds                   159           199           321

We see that the duration of the initializing load into both Hive and GreenPlum grows roughly linearly with the data volume and, because of the better hardware, the load is somewhat faster for Hive than for GreenPlum.

The incremental load into Hive also depends linearly on the volume of previously loaded data in the target table and becomes rather slow as that volume grows. This is caused by the need to rewrite the target table completely. It means that applying small changes to huge tables is not a good use case for Hive.

The incremental load into GreenPlum, on the other hand, depends only weakly on the volume of previously loaded data in the target table and runs fairly quickly. This is thanks to SQL joins and to the GreenPlum architecture, which allows the delete operation.

So GreenPlum applies the delta with the delete+insert method, whereas Hive has no delete or update operations, so the whole data set had to be rewritten in full during the incremental update. The most telling comparison is the incremental load into the largest table (2 541 seconds for Hive versus 321 seconds for GreenPlum), because it corresponds to the most common way resource-intensive loads are run. GreenPlum beat Hive in this test by a factor of about 8.
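The ratios behind these conclusions can be recomputed directly from the measured durations. The short Python sketch below uses only the figures reported in the results; the variable and function names are illustrative.

```python
# Measured durations in seconds, in order of target table size
# (6 000 000, 60 000 000 and 600 000 000 rows).
table_rows = [6_000_000, 60_000_000, 600_000_000]
hive_initial = [41, 203, 1601]
greenplum_initial = [72, 360, 3631]
hive_incremental = [88, 299, 2541]
greenplum_incremental = [159, 199, 321]


def ratios(slower, faster):
    """Element-wise ratio of two duration lists, rounded to two decimals."""
    return [round(s / f, 2) for s, f in zip(slower, faster)]


# How many times longer the Hive incremental load takes than GreenPlum's.
incremental_speedup = ratios(hive_incremental, greenplum_incremental)

# Growth of incremental duration when the target table grows 100-fold.
hive_growth = round(hive_incremental[-1] / hive_incremental[0], 1)
greenplum_growth = round(greenplum_incremental[-1] / greenplum_incremental[0], 1)

print(incremental_speedup, hive_growth, greenplum_growth)
```

For the smallest table the ratio is below 1, that is, Hive is actually faster there; the advantage of GreenPlum appears as the target table grows while the delta stays at 6 000 000 rows.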

Ab Initio working with GreenPlum in Near Real Time mode

In this experiment we test Ab Initio's ability to update a GreenPlum table with randomly generated portions of data in a mode close to real time. Consider the GreenPlum table dev42_1_db_usl.TESTING_SUBJ_org_finval, which we will work with.

We will use three Ab Initio graphs to work with it:

1) Graph Create_test_data.mp - creates data files in HDFS with 6 000 000 rows in 10 parallel threads. The data is random, and its structure is organized for insertion into our table

2) Graph mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset - an MDW-generated graph for the initializing insert of data into our table in 10 parallel threads (it uses the test data generated by graph (1))

3) Graph mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset - an MDW-generated graph for the incremental update of our table in 10 parallel threads, using a portion of freshly received data (the delta) generated by graph (1)

Let us run the following scenario in NRT mode:

  • generate 6 000 000 test rows
  • perform the initializing load, inserting 6 000 000 test rows into an empty table
  • repeat the incremental load 5 times
    • generate 6 000 000 test rows
    • perform the incremental insert of 6 000 000 test rows into the table (the old data gets its validity expiry time valid_to_ts set, and more recent data with the same primary key is inserted)

This scenario emulates the real operation of some business system: a fairly large portion of new data appears in real time and is immediately loaded into GreenPlum.
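The order of runs in this scenario can be written down as a small driver. The Python sketch below is an illustration under stated assumptions: the pset names are taken from the text, but run_pset is a hypothetical callable standing in for whatever command actually launches a pset, the clock is injected, and the "Start ... at" / "Finish ... at" line format is assumed.

```python
CREATE = "Create_test_data.input.pset"
DAY_ONE = "mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset"
REGULAR = "mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset"


def nrt_scenario(increments=5):
    """Return the ordered list of psets: one initial load, then N increments."""
    steps = [CREATE, DAY_ONE]
    for _ in range(increments):
        steps += [CREATE, REGULAR]
    return steps


def run_scenario(steps, run_pset, now):
    """Run each pset in order and return log lines with start/finish times.

    `run_pset` and `now` are supplied by the caller; both are hypothetical
    hooks, not part of any Ab Initio API.
    """
    log = []
    for pset in steps:
        log.append(f"Start {pset} at {now()}")
        run_pset(pset)
        log.append(f"Finish {pset} at {now()}")
    return log


# Dry run with a stub runner and a fake clock.
executed = []
ticks = iter(range(1000))
log = run_scenario(nrt_scenario(), executed.append, lambda: next(ticks))
print(len(executed), len(log))
```

Running the steps strictly one after another means each incremental load sees exactly one fresh delta, which keeps the validity periods in the target table non-overlapping.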

Now let us look at the scenario's log:

Start Create_test_data.input.pset at 2020-06-04 11:49:11
Finish Create_test_data.input.pset at 2020-06-04 11:49:37
Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37
Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42
Start Create_test_data.input.pset at 2020-06-04 11:50:42
Finish Create_test_data.input.pset at 2020-06-04 11:51:06
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:51:06
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:53:41
Start Create_test_data.input.pset at 2020-06-04 11:53:41
Finish Create_test_data.input.pset at 2020-06-04 11:54:04
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:54:04
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:56:51
Start Create_test_data.input.pset at 2020-06-04 11:56:51
Finish Create_test_data.input.pset at 2020-06-04 11:57:14
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:57:14
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:59:55
Start Create_test_data.input.pset at 2020-06-04 11:59:55
Finish Create_test_data.input.pset at 2020-06-04 12:00:23
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:00:23
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:23
Start Create_test_data.input.pset at 2020-06-04 12:03:23
Finish Create_test_data.input.pset at 2020-06-04 12:03:49
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:49
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:06:46

The result is the following picture:

Graph                                                                 Start time           Finish time          Duration
Create_test_data.input.pset                                           04.06.2020 11:49:11  04.06.2020 11:49:37  00:00:26
mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:49:37  04.06.2020 11:50:42  00:01:05
Create_test_data.input.pset                                           04.06.2020 11:50:42  04.06.2020 11:51:06  00:00:24
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:51:06  04.06.2020 11:53:41  00:02:35
Create_test_data.input.pset                                           04.06.2020 11:53:41  04.06.2020 11:54:04  00:00:23
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:54:04  04.06.2020 11:56:51  00:02:47
Create_test_data.input.pset                                           04.06.2020 11:56:51  04.06.2020 11:57:14  00:00:23
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 11:57:14  04.06.2020 11:59:55  00:02:41
Create_test_data.input.pset                                           04.06.2020 11:59:55  04.06.2020 12:00:23  00:00:28
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 12:00:23  04.06.2020 12:03:23  00:03:00
Create_test_data.input.pset                                           04.06.2020 12:03:23  04.06.2020 12:03:49  00:00:26
mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset  04.06.2020 12:03:49  04.06.2020 12:06:46  00:02:57
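The durations in this table follow directly from the start and finish timestamps. As a minimal sketch, the Python snippet below pairs the two kinds of log lines and computes the elapsed time for each run. It assumes log lines of the form "Start <pset> at <YYYY-MM-DD HH:MM:SS>" and "Finish <pset> at ..."; the function name is illustrative, and only the first four log entries are used as sample input.

```python
from datetime import datetime


def run_durations(log_lines):
    """Pair Start/Finish lines and return (pset, timedelta) per completed run."""
    started = {}
    result = []
    for line in log_lines:
        action, name, _, date, time = line.split()
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        if action == "Start":
            started[name] = stamp
        else:
            result.append((name, stamp - started.pop(name)))
    return result


sample_log = [
    "Start Create_test_data.input.pset at 2020-06-04 11:49:11",
    "Finish Create_test_data.input.pset at 2020-06-04 11:49:37",
    "Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37",
    "Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42",
]
durations = run_durations(sample_log)
for name, elapsed in durations:
    print(name, elapsed)
```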

We see that 6 000 000 incremental rows are processed in about 3 minutes, which is quite fast.
The data in the target table turned out to be distributed as follows:

select valid_from_ts, valid_to_ts, count(1), min(sk), max(sk) from dev42_1_db_usl.TESTING_SUBJ_org_finval group by valid_from_ts, valid_to_ts order by 1,2;

You can see that the inserted data corresponds to the moments when the graphs were launched.
This means you can run incremental loads into GreenPlum from Ab Initio at a very high frequency and observe a high insertion speed. Of course, launching once a second will not work, since Ab Initio, like any ETL tool, needs time to "warm up" at launch.

Conclusion

Ab Initio is currently used at Sberbank to build the Unified Semantic Data Layer (ESS). This project involves building a single version of the state of various banking business entities. Information comes from various sources, replicas of which are prepared on Hadoop. Based on business needs, a data model is prepared and data transformations are described. Ab Initio loads the information into the ESS, and the loaded data is not only of interest to the business in itself but also serves as a source for building data marts. The product's functionality allows various systems to be used as the target (Hive, Greenplum, Teradata, Oracle), which makes it possible to prepare data for the business in the various formats it requires without much effort.

Ab Initio's capabilities are broad; for example, the bundled MDW framework makes it possible to build technical and business historicity of data out of the box. For developers, Ab Initio makes it possible not to reinvent the wheel but to use the many existing functional components, which are essentially libraries needed when working with data.

The author is an expert in Sberbank's professional community SberProfi DWH/BigData. The SberProfi DWH/BigData professional community is responsible for developing competencies in areas such as the Hadoop ecosystem, Teradata, Oracle DB, GreenPlum, and the BI tools Qlik, SAP BO, Tableau, and others.

source: www.habr.com
