Some time ago we faced the question of choosing an ETL tool for working with Big Data. The Informatica BDM solution we had been using did not suit us because of its limited functionality; in practice its use had been reduced to a framework for launching spark-submit commands. There were few comparable products on the market that were, in principle, capable of handling the volume of data we deal with every day. In the end we chose Ab Initio. During pilot demonstrations the product showed very high data processing speed. There is almost no information about Ab Initio in Russian, so we decided to share our experience on Habr.
Ab Initio has many classic and unusual transformations, whose code can be extended with its own language, PDL. For a small business such a powerful tool is likely to be overkill, and most of its capabilities may turn out to be expensive and unused. But if your scale is close to Sberbank's, Ab Initio may be of interest to you.
It helps the business accumulate knowledge globally and develop its ecosystem. It helps developers improve their ETL skills, brush up their shell knowledge, and learn the PDL language; it gives a visual picture of the loading processes and simplifies development thanks to its abundance of functional components.
In this post I describe the capabilities of Ab Initio and give comparative characteristics of how it works with Hive and GreenPlum.
- Description of the MDW framework and the work to customize it for GreenPlum
- Comparison of Ab Initio performance with Hive and GreenPlum
- Ab Initio working with GreenPlum in near-real-time mode
The functionality of this product is very broad and takes considerable time to learn. However, with the right skills and the right performance settings, the data processing results are very impressive. Using Ab Initio can be an interesting experience for a developer: it is a new take on ETL development, a hybrid between a visual environment and developing loads in a script-like language.
Businesses are developing their ecosystems, and this tool is more useful to them than ever. With Ab Initio you can accumulate knowledge about the current business and use it to expand existing businesses and open new ones. Alternatives to Ab Initio include Informatica BDM among visual development environments and Apache Spark among non-visual ones.
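Description of Ab Initio
Ab Initio, like other ETL tools, is a collection of products.
Ab Initio GDE (Graphical Development Environment) is the environment in which a developer configures data transformations and connects them with data flows drawn as arrows. Such a set of transformations is called a graph.
The input and output connections of functional components are ports, and they contain the fields computed inside the transformations. Several graphs connected by flows, drawn as arrows in the order of their execution, are called a plan.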
There are several hundred functional components, which is a great many. Many of them are highly specialized. The classic transformations in Ab Initio offer more than their counterparts in other ETL tools. For example, Join has several outputs: besides the result of joining the datasets, you can get the records of the input datasets whose keys could not be matched. You can also get rejects, errors, and a log of the transformation's work, which can be read as a text file in the same graph and processed by other transformations.
Or, for example, you can materialize a data receiver as a table and read the data back from it in the same graph.
There are also original transformations. For example, the Scan transformation offers functionality similar to analytic functions. There are transformations with self-explanatory names: Create Data, Read Excel, Normalize, Sort within Groups, Run Program, Run SQL, Join with DB, and others. Graphs can use run-time parameters, including parameters passed from or to the operating system. Files containing a ready-made set of parameters passed to a graph are called parameter sets (psets).
As you would expect, Ab Initio GDE has its own repository, called EME (Enterprise Meta Environment). Developers can work with local versions of the code and check their work in to the central repository.
During or after graph execution, you can click any flow connecting transformations and look at the data that passed between them.
You can also click any flow and view tracking details: how many parallel partitions the transformation ran in, and how many rows and bytes were loaded in each of them.
You can split graph execution into phases and mark that some transformations must run first (in phase zero), the next ones in phase one, the next in phase two, and so on.
For each transformation you can choose a so-called layout (where it will run): without parallelism, or in parallel threads whose number you can specify. The temporary files that Ab Initio creates while transformations run can be placed either in the server file system or in HDFS.
In each transformation you can write your own script in PDL, which is somewhat reminiscent of shell, starting from a default template.
With PDL you can extend the functionality of transformations and, in particular, dynamically (at run time) generate arbitrary code fragments depending on run-time parameters.
Ab Initio also has well-developed integration with the OS through the shell. Sberbank specifically uses Linux ksh. You can exchange variables with the shell and use them as graph parameters. You can invoke Ab Initio graphs from the shell and administer Ab Initio.
Besides Ab Initio GDE, many other products are included in the delivery. There is the Co>Operating System, which lays claim to being called an operating system. There is Control>Center, where you can schedule and monitor load flows. There are also products for development at a more primitive level than Ab Initio GDE allows.
Description of the MDW framework and the work to customize it for GreenPlum
Along with its products, the vendor supplies MDW (Metadata Driven Warehouse), a graph configurator designed to help with typical tasks of populating data warehouses or data vaults.
It contains custom (project-specific) metadata parsers and ready-made, out-of-the-box code generators.
As input, MDW receives a data model, a configuration file for setting up the connection to a database (Oracle, Teradata, or Hive), and some other settings. The project-specific part, for example, deploys the model to the database. The out-of-the-box part of the product generates graphs, and their configuration files, for loading data into the model tables. Graphs (and psets) are created for several modes of initial and incremental work on updating entities.
For Hive and for RDBMS, different graphs are generated for initial and incremental data updates.
In the case of Hive, the incoming delta data is joined, by means of Ab Initio Join, with the data that was in the table before the update. The data loaders in MDW (for both Hive and RDBMS) not only insert new data from the delta but also close the validity periods of the data whose primary keys arrived in the delta. In addition, the unchanged part of the data has to be rewritten. This is unavoidable because Hive has no delete or update operations.
In the case of RDBMS, the graphs for incremental data updates look more optimal, because an RDBMS has real update capabilities.
The incoming delta is loaded into an intermediate table in the database. The delta is then joined with the data that was in the table before the update; this is done in SQL by means of a generated SQL query. Next, the SQL commands delete+insert are used to insert the new data from the delta into the target table and to close the validity periods of the data whose primary keys arrived in the delta.
There is no need to rewrite unchanged data.
So we concluded that in the case of Hive, MDW has to rewrite the whole table because Hive has no update function, and nothing better than a complete rewrite of the data on update has been devised. In the case of RDBMS, on the contrary, the creators of the product saw fit to entrust the joining and updating of tables to SQL.
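To make the RDBMS strategy more concrete, here is a minimal sketch of applying a delta with insert and delete statements only. It illustrates the idea under assumptions and is not the SQL that MDW generates: the table names `target` and `delta`, the `payload` column, the open-ended marker `OPEN_END`, and the use of SQLite as a stand-in database are all hypothetical.

```python
import sqlite3

# Assumed marker for "still valid" rows; the article does not say which value MDW uses.
OPEN_END = "9999-12-31 00:00:00"


def apply_delta(conn: sqlite3.Connection, load_ts: str) -> None:
    """Merge the staged `delta` table into `target` with insert and delete only."""
    params = {"load_ts": load_ts, "open_end": OPEN_END}
    with conn:  # commit all three steps together, or roll back on error
        # 1. Write closed copies of the open versions whose keys arrived in the delta.
        conn.execute(
            """
            insert into target (pk, payload, valid_from_ts, valid_to_ts)
            select t.pk, t.payload, t.valid_from_ts, :load_ts
            from target t
            where t.valid_to_ts = :open_end
              and exists (select 1 from delta d where d.pk = t.pk)
            """,
            params,
        )
        # 2. Delete the open versions that were just superseded.
        conn.execute(
            """
            delete from target
            where valid_to_ts = :open_end
              and exists (select 1 from delta d where d.pk = target.pk)
            """,
            params,
        )
        # 3. Insert the new versions from the delta as open rows.
        conn.execute(
            """
            insert into target (pk, payload, valid_from_ts, valid_to_ts)
            select d.pk, d.payload, :load_ts, :open_end from delta d
            """,
            params,
        )


def demo() -> list:
    """Two existing keys; the delta changes key 2 and adds key 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table target (pk, payload, valid_from_ts, valid_to_ts)")
    conn.execute("create table delta (pk, payload)")
    conn.executemany(
        "insert into target values (?, ?, ?, ?)",
        [(1, "a", "2020-06-01 00:00:00", OPEN_END),
         (2, "b", "2020-06-01 00:00:00", OPEN_END)],
    )
    conn.executemany("insert into delta values (?, ?)", [(2, "b2"), (3, "c")])
    apply_delta(conn, "2020-06-04 11:51:06")
    return conn.execute(
        "select pk, payload, valid_from_ts, valid_to_ts "
        "from target order by pk, valid_from_ts"
    ).fetchall()
```

In this sketch the row for key 1 is never touched, which is the point of the RDBMS approach: only rows whose keys arrive in the delta are deleted and re-inserted. A Hive-style load would instead have to write all rows, changed or not, into a new copy of the table.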
For the project at Sberbank we created a new, reusable implementation of the database loader for GreenPlum. It was based on the version that MDW generates for Teradata. It was Teradata, not Oracle, that turned out to be the closest and best fit for this, because it is also an MPP system. The working methods and the syntax of Teradata and GreenPlum turned out to be similar.
Examples of differences between RDBMSs that are critical for MDW are as follows. In GreenPlum, unlike Teradata, you need to write the following clause when creating tables:
distributed by
In Teradata you write
delete <table> all
whereas in GreenPlum you write
delete from <table>
In Oracle, for optimization purposes, you write
delete from t where rowid in (<join of t with the delta>)
whereas in Teradata and GreenPlum you write
delete from t where exists (select * from delta where delta.pk=t.pk)
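The dialect differences listed above could be captured in a small statement builder along these lines. This is a sketch, not MDW code: the function names and the dialect labels are hypothetical, and the Oracle subquery is only one assumed way of spelling out the join of t with the delta.

```python
def delete_all_sql(dialect: str, table: str) -> str:
    """Statement that empties a table in the given dialect."""
    if dialect == "teradata":
        return f"delete {table} all"
    if dialect == "greenplum":
        return f"delete from {table}"
    raise ValueError(f"no delete-all form recorded for dialect: {dialect}")


def delete_matching_delta_sql(dialect: str, table: str, delta: str, pk: str) -> str:
    """Statement that removes target rows whose key is present in the delta."""
    if dialect in ("teradata", "greenplum"):
        return (
            f"delete from {table} where exists "
            f"(select * from {delta} where {delta}.{pk}={table}.{pk})"
        )
    if dialect == "oracle":
        # Assumption: the article only says "<join of t with the delta>";
        # this subquery is one possible expansion of that placeholder.
        return (
            f"delete from {table} where rowid in "
            f"(select {table}.rowid from {table} "
            f"join {delta} on {delta}.{pk}={table}.{pk})"
        )
    raise ValueError(f"unsupported dialect: {dialect}")


if __name__ == "__main__":
    for name in ("teradata", "greenplum", "oracle"):
        print(name, "->", delete_matching_delta_sql(name, "t", "delta", "pk"))
```

Keeping the per-dialect text in one place like this is what makes it practical to derive a GreenPlum loader from the Teradata one: most statements are shared, and only a few branches differ.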
Note also that for Ab Initio to work with GreenPlum, the GreenPlum client had to be installed on all nodes of the Ab Initio cluster, because we connected to GreenPlum from all nodes of our cluster simultaneously. And to make reading from GreenPlum parallel, with each parallel Ab Initio thread reading its own portion of data from GreenPlum, we had to place a construct understood by Ab Initio in the "where" section of the SQL queries:
where ABLOCAL()
and define the value of this construct by specifying the following parameter on the transformation that reads from the database:
ablocal_expr=«string_concat("mod(t.", string_filter_out("{$TABLE_KEY}","{}"), ",", (decimal(3))(number_of_partitions()),")=", (decimal(3))(this_partition()))»
which compiles into something like
mod(sk,10)=3
In other words, you have to give GreenPlum an explicit filter for each partition. For other databases (Teradata, Oracle), Ab Initio can perform this parallelization automatically.
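As a rough illustration of what the ablocal_expr parameter achieves, the Python sketch below builds the same kind of predicate for every partition and shows which partition a given key falls into. The function names and the sample key `sk` are assumptions made for the example, as is the zero-based partition numbering; the real expression is evaluated by Ab Initio in PDL, not in Python.

```python
def ablocal_predicate(table_key: str, partitions: int, this_partition: int) -> str:
    """Build the filter that one parallel reader would substitute for ABLOCAL()."""
    # Mirrors string_filter_out(..., "{}"): drop the braces around the key name.
    key = table_key.replace("{", "").replace("}", "")
    return f"mod(t.{key},{partitions})={this_partition}"


def predicates(table_key: str, partitions: int) -> list:
    """One predicate per partition, assuming partitions are numbered from 0."""
    return [ablocal_predicate(table_key, partitions, p) for p in range(partitions)]


def owner(key_value: int, partitions: int) -> int:
    """Partition whose predicate matches a non-negative key value."""
    return key_value % partitions


if __name__ == "__main__":
    for predicate in predicates("{sk}", 10):
        print("where", predicate)
```

Because every non-negative key has exactly one remainder modulo the partition count, the ten readers together see each row once and no row twice, provided the key column is an integer.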
Comparison of Ab Initio performance with Hive and GreenPlum
Sberbank ran an experiment comparing the performance of MDW-generated graphs for Hive and for GreenPlum. In the experiment, Hive had 5 nodes on the same cluster as Ab Initio, while GreenPlum had 4 nodes on a separate cluster. That is, Hive had a certain hardware advantage over GreenPlum.
We considered two pairs of graphs performing the same task of updating data in Hive and in GreenPlum. The graphs generated by the MDW configurator were run:
- initial load + incremental load of randomly generated data into a Hive table
- initial load + incremental load of randomly generated data into an identical GreenPlum table
In both cases (Hive and GreenPlum) the loads ran in 10 parallel threads on the same Ab Initio cluster. Ab Initio stored intermediate data for the calculations in HDFS (in Ab Initio terms, an MFS layout using HDFS was used). One row of randomly generated data occupied 200 bytes in both cases.
The results were as follows:
Hive:

| Initial load into Hive | | | |
|---|---|---|---|
| Rows inserted | 6,000,000 | 60,000,000 | 600,000,000 |
| Duration of initial load, seconds | 41 | 203 | 1,601 |

| Incremental load into Hive | | | |
|---|---|---|---|
| Rows in the target table at the start of the experiment | 6,000,000 | 60,000,000 | 600,000,000 |
| Delta rows applied to the target table during the experiment | 6,000,000 | 6,000,000 | 6,000,000 |
| Duration of incremental load, seconds | 88 | 299 | 2,541 |

GreenPlum:

| Initial load into GreenPlum | | | |
|---|---|---|---|
| Rows inserted | 6,000,000 | 60,000,000 | 600,000,000 |
| Duration of initial load, seconds | 72 | 360 | 3,631 |

| Incremental load into GreenPlum | | | |
|---|---|---|---|
| Rows in the target table at the start of the experiment | 6,000,000 | 60,000,000 | 600,000,000 |
| Delta rows applied to the target table during the experiment | 6,000,000 | 6,000,000 | 6,000,000 |
| Duration of incremental load, seconds | 159 | 199 | 321 |

We can see that the initial load time for both Hive and GreenPlum grows roughly linearly with the data volume, and that, thanks to the better hardware, the initial load is somewhat faster for Hive than for GreenPlum.
Incremental load into Hive also depends linearly on the volume of previously loaded data in the target table, and becomes quite slow as the volume grows. This is caused by the need to rewrite the target table completely. It means that applying small changes to huge tables is not a good use case for Hive.
Incremental load into GreenPlum, by contrast, depends only weakly on the volume of previously loaded data in the target table and runs quite fast. This is thanks to SQL joins and to the GreenPlum architecture, which supports the delete operation.
So GreenPlum applies the delta with the delete+insert method, whereas Hive has no delete or update operations, so the entire data set had to be rewritten in full during an incremental update. The most telling comparison is the incremental load into the 600,000,000-row table (2,541 seconds for Hive against 321 seconds for GreenPlum), since it corresponds to the most common scenario for resource-intensive loads. GreenPlum beat Hive in this test by a factor of about 8.
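The factor of about 8 can be checked directly from the incremental-load durations reported above. The short sketch below does only that arithmetic; the variable and function names are made up for the example.

```python
# Incremental-load durations in seconds, keyed by rows already in the target table.
HIVE_INCREMENTAL = {6_000_000: 88, 60_000_000: 299, 600_000_000: 2541}
GREENPLUM_INCREMENTAL = {6_000_000: 159, 60_000_000: 199, 600_000_000: 321}
DELTA_ROWS = 6_000_000  # delta size applied in every run


def speedup(existing_rows: int) -> float:
    """How many times faster GreenPlum applied the delta than Hive."""
    return HIVE_INCREMENTAL[existing_rows] / GREENPLUM_INCREMENTAL[existing_rows]


def delta_rows_per_second(durations: dict) -> dict:
    """Delta rows applied per second for each target-table size."""
    return {rows: DELTA_ROWS / seconds for rows, seconds in durations.items()}


if __name__ == "__main__":
    for rows in HIVE_INCREMENTAL:
        print(f"{rows:>11,} existing rows: speedup {speedup(rows):.2f}x")
```

On these figures Hive is actually faster for the smallest table (88 seconds against 159), and the advantage shifts to GreenPlum as the target table grows, which is consistent with the full-rewrite explanation.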
Ab Initio working with GreenPlum in near-real-time mode
In this experiment we test Ab Initio's ability to update a GreenPlum table with randomly generated portions of data in a mode close to real time. Consider the GreenPlum table dev42_1_db_usl.TESTING_SUBJ_org_finval, which we will work with.
We will use three Ab Initio graphs to work with it:
1) Graph Create_test_data.mp: creates data files in HDFS with 6,000,000 rows in 10 parallel threads. The data is random, and its structure is organized for insertion into our table.
2) Graph mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset: an MDW-generated graph for the initial insertion of data into our table in 10 parallel threads (it uses the test data generated by graph (1)).
3) Graph mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset: an MDW-generated graph for the incremental update of our table in 10 parallel threads, using a portion of freshly arrived data (the delta) generated by graph (1).
Let us run the following scenario in NRT mode:
- generate 6,000,000 test rows
- perform the initial load, inserting 6,000,000 test rows into the empty table
- repeat the incremental load 5 times:
  - generate 6,000,000 test rows
  - perform an incremental insert of 6,000,000 test rows into the table (the old data gets its validity expiry time valid_to_ts set, and fresher data with the same primary key is inserted)
This scenario emulates the real operating mode of some business system: a fairly large portion of new data appears in real time and is immediately loaded into GreenPlum.
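The scenario is essentially a small driver loop. The sketch below shows that control flow only: `run_graph` is a hypothetical callable that launches a pset and blocks until it finishes, the log line format is an assumption, and the pset names should be treated as placeholders for the ones used in the experiment.

```python
from datetime import datetime
from typing import Callable, List

# Placeholder pset names for the three graphs described in the article.
GENERATE = "Create_test_data.input.pset"
INITIAL = "mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset"
INCREMENTAL = "mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset"


def run_scenario(run_graph: Callable[[str], None], repeats: int = 5) -> List[str]:
    """Run the NRT scenario and return log lines with start and finish times.

    How `run_graph` actually starts Ab Initio is outside this sketch.
    """
    log: List[str] = []

    def timed(pset: str) -> None:
        log.append(f"Start {pset} at {datetime.now():%Y-%m-%d %H:%M:%S}")
        run_graph(pset)
        log.append(f"Finish {pset} at {datetime.now():%Y-%m-%d %H:%M:%S}")

    timed(GENERATE)  # generate the initial test rows
    timed(INITIAL)   # initial load into the empty table
    for _ in range(repeats):
        timed(GENERATE)     # generate a fresh delta
        timed(INCREMENTAL)  # apply the delta incrementally
    return log


if __name__ == "__main__":
    # Dry run with a stand-in launcher that only prints the pset name.
    for line in run_scenario(lambda pset: print("would run", pset), repeats=1):
        print(line)
```

Passing the launcher in as a parameter keeps the loop testable without an Ab Initio installation; in the experiment itself this role is played by a shell script.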
Now let us look at the log of the scenario run:
Start Create_test_data.input.pset at 2020-06-04 11:49:11
Finish Create_test_data.input.pset at 2020-06-04 11:49:37
Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37
Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42
Start Create_test_data.input.pset at 2020-06-04 11:50:42
Finish Create_test_data.input.pset at 2020-06-04 11:51:06
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:51:06
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:53:41
Start Create_test_data.input.pset at 2020-06-04 11:53:41
Finish Create_test_data.input.pset at 2020-06-04 11:54:04
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:54:04
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:56:51
Start Create_test_data.input.pset at 2020-06-04 11:56:51
Finish Create_test_data.input.pset at 2020-06-04 11:57:14
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:57:14
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:59:55
Start Create_test_data.input.pset at 2020-06-04 11:59:55
Finish Create_test_data.input.pset at 2020-06-04 12:00:23
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:00:23
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:23
Start Create_test_data.input.pset at 2020-06-04 12:03:23
Finish Create_test_data.input.pset at 2020-06-04 12:03:49
Start mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:03:49
Finish mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 12:06:46
This gives the following picture:

| Graph | Start time | Finish time | Duration |
|---|---|---|---|
| Create_test_data.input.pset | 04.06.2020 11:49:11 | 04.06.2020 11:49:37 | 00:00:26 |
| mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:49:37 | 04.06.2020 11:50:42 | 00:01:05 |
| Create_test_data.input.pset | 04.06.2020 11:50:42 | 04.06.2020 11:51:06 | 00:00:24 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:51:06 | 04.06.2020 11:53:41 | 00:02:35 |
| Create_test_data.input.pset | 04.06.2020 11:53:41 | 04.06.2020 11:54:04 | 00:00:23 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:54:04 | 04.06.2020 11:56:51 | 00:02:47 |
| Create_test_data.input.pset | 04.06.2020 11:56:51 | 04.06.2020 11:57:14 | 00:00:23 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 11:57:14 | 04.06.2020 11:59:55 | 00:02:41 |
| Create_test_data.input.pset | 04.06.2020 11:59:55 | 04.06.2020 12:00:23 | 00:00:28 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 12:00:23 | 04.06.2020 12:03:23 | 00:03:00 |
| Create_test_data.input.pset | 04.06.2020 12:03:23 | 04.06.2020 12:03:49 | 00:00:26 |
| mdw_load.regular.current.dev42_1_db_usl_testing_subj_org_finval.pset | 04.06.2020 12:03:49 | 04.06.2020 12:06:46 | 00:02:57 |

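Durations like the ones in this table can be derived mechanically from a run log. The sketch below assumes log lines of the form `Start <pset> at <timestamp>` and `Finish <pset> at <timestamp>`; that line format, the function name, and the two-pair sample are assumptions made for the example.

```python
import re
from datetime import datetime

LINE = re.compile(
    r"^(Start|Finish) (\S+) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$"
)

SAMPLE = [
    "Start Create_test_data.input.pset at 2020-06-04 11:49:11",
    "Finish Create_test_data.input.pset at 2020-06-04 11:49:37",
    "Start mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:49:37",
    "Finish mdw_load.day_one.current.dev42_1_db_usl_testing_subj_org_finval.pset at 2020-06-04 11:50:42",
]


def durations(lines):
    """Return (graph, seconds) pairs in the order the graphs finished."""
    started = {}
    result = []
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a Start/Finish line
        event, graph, stamp = match.groups()
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event == "Start":
            started[graph] = ts
        elif graph in started:
            elapsed = ts - started.pop(graph)
            result.append((graph, int(elapsed.total_seconds())))
    return result


if __name__ == "__main__":
    for graph, seconds in durations(SAMPLE):
        print(f"{seconds // 60:02d}:{seconds % 60:02d}  {graph}")
```

Matching each Finish to the most recent Start of the same graph is enough here because the scenario runs the graphs strictly one after another.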
We can see that 6,000,000 rows of increment are processed in about 3 minutes, which is quite fast.
The data in the target table turned out to be distributed as follows:
select valid_from_ts, valid_to_ts, count(1), min(sk), max(sk) from dev42_1_db_usl.TESTING_SUBJ_org_finval group by valid_from_ts, valid_to_ts order by 1,2;
You can see the correspondence between the inserted data and the moments the graphs were launched.
This means that you can run incremental data loads into GreenPlum in Ab Initio at a very high frequency and observe a high insertion speed into GreenPlum. Of course, you cannot launch a load once per second, because Ab Initio, like any ETL tool, needs some time to warm up at startup.
Conclusion
Ab Initio is currently used at Sberbank to build the Unified Semantic Data Layer (ESS). This project involves building a single version of the state of various banking business entities. Information comes from various sources, replicas of which are prepared on Hadoop. Based on business needs, a data model is prepared and data transformations are described. Ab Initio loads the information into the ESS, and the loaded data is not only of interest to the business in itself but also serves as a source for building data marts. At the same time, the product's functionality allows various systems (Hive, GreenPlum, Teradata, Oracle) to be used as the receiver, which makes it possible to prepare data for the business in the various formats it requires without much effort.
The capabilities of Ab Initio are broad; for example, the bundled MDW framework makes it possible to build technical and business historization of data out of the box. For developers, Ab Initio makes it possible not to reinvent the wheel but to use the many existing functional components, which are essentially libraries needed when working with data.
The author is an expert of Sberbank's professional community SberProfi DWH/BigData. The SberProfi DWH/BigData professional community is responsible for developing competencies in areas such as the Hadoop ecosystem, Teradata, Oracle DB, and GreenPlum, as well as the BI tools Qlik, SAP BO, Tableau, and others.
Source: habr.com