…according to …, the market for distributed computing and big data…
Why does an ordinary business need distributed computing at all? The answer is both simple and complicated. Simple, because in most cases relatively simple computations are performed on each unit of information. Complicated, because there is a lot of such information. A great deal of it. As a result, …
A recent example: Dodo Pizza…
Another example:
Choosing a tool
The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr alone has many detailed articles on the subject) that comes with a whole set of utilities and libraries. You can feed it huge volumes of both structured and unstructured data, and the system itself distributes them across the available computing capacity. Moreover, that capacity can be added or taken offline at any time: this is horizontal scalability in action.
In 2017, the influential consulting firm Gartner…
Hadoop rests on several pillars, the most notable of which are MapReduce (a system for distributing data for computation across servers) and the HDFS file system. HDFS is designed specifically for storing information distributed across cluster nodes: files are stored as fixed-size blocks, each block can be placed on several nodes, and thanks to this replication the system tolerates the failure of individual nodes. Instead of a file table, a dedicated server called the NameNode keeps track of where the blocks are stored.
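To make block storage and replication more concrete, here is a minimal Python sketch of the idea. It is a toy model, not HDFS code: the `NameNode` class and its methods are hypothetical, placement is plain round-robin (real HDFS uses rack-aware placement), and the 128 MB block size and replication factor of 3 are assumptions borrowed from common HDFS defaults.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumption: 128 MB, a common HDFS default block size
REPLICATION = 3                 # assumption: 3 replicas per block, the usual HDFS default


@dataclass
class NameNode:
    """Toy metadata server: tracks which blocks make up a file and where each replica lives."""
    nodes: list[str]
    files: dict[str, list[int]] = field(default_factory=dict)
    block_locations: dict[int, list[str]] = field(default_factory=dict)
    next_block_id: int = field(default=0, init=False)
    next_start_node: int = field(default=0, init=False)

    def add_file(self, name: str, size_bytes: int) -> list[int]:
        """Cut a file into fixed-size blocks and place each block's replicas on distinct nodes."""
        n_blocks = max(1, -(-size_bytes // BLOCK_SIZE))  # ceiling division
        replicas_per_block = min(REPLICATION, len(self.nodes))
        block_ids = []
        for _ in range(n_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Round-robin: start at a different node for each block so the load spreads evenly.
            self.block_locations[block_id] = [
                self.nodes[(self.next_start_node + i) % len(self.nodes)]
                for i in range(replicas_per_block)
            ]
            self.next_start_node = (self.next_start_node + 1) % len(self.nodes)
            block_ids.append(block_id)
        self.files[name] = block_ids
        return block_ids

    def fail_node(self, node: str) -> None:
        """Forget every replica stored on a node that has gone down."""
        for replicas in self.block_locations.values():
            if node in replicas:
                replicas.remove(node)

    def is_readable(self, name: str) -> bool:
        """A file stays readable while every one of its blocks has at least one surviving replica."""
        return all(self.block_locations[b] for b in self.files[name])


nn = NameNode(["node1", "node2", "node3", "node4"])
blocks = nn.add_file("logs.txt", 300 * 1024 * 1024)  # 300 MB -> 3 blocks of up to 128 MB
nn.fail_node("node2")
print(len(blocks), nn.is_readable("logs.txt"))
```

Losing one node leaves at least two replicas of every block, so the file remains readable; only when all replicas of some block are gone does `is_readable` return `False`.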
以äžã®å³ã¯ãMapReduce ãã©ã®ããã«æ©èœãããã瀺ããŠããŸãã 第 XNUMX 段éã§ã¯ãããŒã¿ã¯ç¹å®ã®å±æ§ã«åŸã£ãŠåå²ããã第 XNUMX 段éã§ã¯èšç®èœåã«ãã£ãŠåæ£ããã第 XNUMX 段éã§ã¯èšç®ãè¡ãããŸãã
MapReduce was originally created by Google for the needs of its search engine. The idea was later made public, and the Apache project developed an open-source implementation of it. Google itself gradually moved on to other solutions. An interesting nuance: Google now has a project called Google Cloud Dataflow, positioned as the next step after Hadoop and a fast replacement for it.
詳ããèŠããšãGoogle Cloud Dataflow 㯠Apache Beam ã®ããªãšãŒã·ã§ã³ã«åºã¥ããŠããããšãããããŸãããApache Beam ã«ã¯ååã«ææžåããã Apache Spark ãã¬ãŒã ã¯ãŒã¯ãå«ãŸããŠãããããã«ãããœãªã¥ãŒã·ã§ã³ã®å®è¡é床ã¯ã»ãŒåãã«ãªããŸãã Apache Spark 㯠HDFS ãã¡ã€ã« ã·ã¹ãã äžã§æ£åžžã«åäœãããããHadoop ãµãŒããŒã«ãããã€ã§ããŸãã
Add to this the wealth of documentation and ready-made solutions available for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Moreover, engineers can decide for themselves which code to run on Hadoop and which on Spark, depending on the task, their experience, and their skills.
Cloud or on-premises servers
The general trend toward the cloud has even produced terms such as Hadoop-as-a-service. In such a scenario, administering the connected servers becomes critical. Sadly, despite its popularity, vanilla Hadoop is a fairly difficult tool to configure, because a lot has to be done by hand: you configure servers individually, monitor their performance, and fine-tune many parameters. In short, an amateur setup carries a high risk of breaking something or overlooking something along the way.
That is why distributions that ship with convenient deployment and administration tools out of the box have become very popular. One of the most popular distributions that supports Spark and makes the work easier is Cloudera. It comes in both paid and free editions, and the free edition offers all the main features without a limit on the number of nodes.
During setup, Cloudera Manager connects to your servers via SSH. An interesting detail: installation uses so-called parcels, special packages, each of which contains all the required components already configured to work with one another. In effect, this is an improved package manager.
After installation, you get a cluster management console where you can view cluster telemetry and the installed services, add or remove resources, and edit the cluster configuration.
As a result, the cockpit of this rocket is right in front of you, ready to carry you toward a bright Big Data future. But before we say "Let's go!", let's take a quick look under the hood.
Hardware requirements
On its website, Cloudera describes a range of possible configurations. The general principles on which they are built are shown in the diagram.
MapReduce can cloud this optimistic picture. Looking again at the diagram in the previous section, you can see that in most cases a MapReduce job is likely to hit a bottleneck when reading data from disk or from the network. The Cloudera blog points this out as well. As a result, I/O speed is critical for any fast computation, including Spark, which is often used for real-time processing. So when working with Hadoop, it is very important that the cluster is built from balanced, fast machines, which, to put it mildly, cloud infrastructure does not always provide.
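A rough back-of-the-envelope model shows why I/O rather than CPU often limits such jobs: a full scan cannot finish faster than the aggregate disk bandwidth allows, or than the network allows for the share of reads that are not local. The sketch below computes that lower bound under simplifying assumptions (perfectly even spreading of reads, no caching); the function name and all hardware figures in the example are hypothetical, not measurements of any real cluster.

```python
def scan_lower_bound(data_gb: float, nodes: int, disks_per_node: int,
                     disk_mb_s: float, nic_gbit_s: float,
                     remote_fraction: float) -> dict[str, float]:
    """Lower bound, in seconds, on reading a dataset once across a cluster.

    Assumes reads are spread evenly over all disks and nodes, and that only
    `remote_fraction` of the data has to travel over the network.
    """
    data_mb = data_gb * 1000
    disk_bw = nodes * disks_per_node * disk_mb_s  # aggregate disk throughput, MB/s
    net_bw = nodes * nic_gbit_s * 1000 / 8        # aggregate NIC throughput, MB/s
    disk_s = data_mb / disk_bw
    net_s = data_mb * remote_fraction / net_bw
    # The slower resource sets the floor; adding CPU cores does not lower it.
    return {"disk_s": disk_s, "net_s": net_s, "bound_s": max(disk_s, net_s)}


# Hypothetical cluster: 9 nodes, 4 disks each at 150 MB/s, 10 Gbit/s NICs, 30% non-local reads
est = scan_lower_bound(1000, nodes=9, disks_per_node=4, disk_mb_s=150,
                       nic_gbit_s=10, remote_fraction=0.3)
print({k: round(v, 1) for k, v in est.items()})
```

With these made-up figures the disks, not the network, set the floor (roughly 185 s versus 27 s); on slower networking or with more non-local reads, the network side takes over, which is the imbalance the paragraph warns about.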
Balanced load distribution is achieved with OpenStack virtualization on servers with powerful multi-core CPUs. Each data node is allocated its own processor resources and specific disks. Our solution, Atos Codex Data Lake Engine, makes extensive use of virtualization, so we gain both in performance (the impact on the network infrastructure is minimized) and in TCO (extra physical servers are eliminated).
With BullSequana S200 servers, several bottlenecks disappear and the load is very even. The minimal configuration includes three BullSequana S200 servers, each with JBOD storage, and further S200 servers carrying additional data nodes can optionally be connected. An example of the load in the TeraGen test is shown below.
Tests with different data volumes and replication factors show the same results in terms of load distribution across cluster nodes. Below is a graph of the distribution of disk access during the performance tests.
The figures are based on the minimal configuration of three BullSequana S200 servers. It includes nine data nodes and three master nodes, plus reserved virtual machines in case protection based on OpenStack virtualization is deployed. TeraSort test result with encryption enabled and a 512 MB block size: 23.1 minutes.
How can the system be expanded? Data Lake Engine supports several kinds of extensions:
- Data nodes: one for every 40 TB of usable space
- Analytics nodes that can be fitted with GPUs
- Other options depending on business needs (for example, if you need Kafka or similar)
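The data-node rule in the first item lends itself to a quick sizing estimate. The sketch below assumes that each data node contributes 40 TB of usable space and that the nine data nodes of the minimal configuration described above are the floor; both are readings of the text rather than official sizing rules, and `data_nodes_needed` is a hypothetical helper.

```python
import math

USABLE_TB_PER_DATA_NODE = 40  # from the list above: one data node per 40 TB of usable space
MIN_DATA_NODES = 9            # minimal configuration: 3 BullSequana S200 servers, 9 data nodes


def data_nodes_needed(usable_tb: float) -> int:
    """Hypothetical helper: number of data nodes for a target usable capacity (assumed rule)."""
    if usable_tb < 0:
        raise ValueError("capacity must be non-negative")
    return max(MIN_DATA_NODES, math.ceil(usable_tb / USABLE_TB_PER_DATA_NODE))


for target in (100, 360, 500, 1000):
    nodes = data_nodes_needed(target)
    print(f"{target} TB usable -> {nodes} data nodes ({nodes - MIN_DATA_NODES} beyond the minimum)")
```

How many extra S200 servers that translates into depends on how many data nodes each additional server carries, which the text does not specify.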
The Atos Codex Data Lake Engine package includes both the servers themselves and preinstalled software, including a licensed Cloudera kit: Hadoop itself, OpenStack with virtual machines based on Red Hat Enterprise Linux, and data replication and backup systems (including backup nodes and Cloudera BDR, Backup and Disaster Recovery). Atos Codex Data Lake Engine is the first certified virtualization solution.
詳现ã«ãèå³ãããããŸããããã³ã¡ã³ãæ¬ã§ã質åã«ãçãããããŸãã
Source: habr.com