We live in an amazing time, when you can quickly and easily connect several ready-made open-source tools, set them up "with your consciousness switched off" by following stackoverflow advice, without digging into the "wall of text", and put them into commercial operation. And then, when you need to update or expand, or when someone accidentally reboots a couple of machines, you realize that some kind of obsessive bad dream has begun in real life: everything has suddenly become complicated beyond recognition, there is no way back, the future is vague, and it would be safer to breed bees and make cheese instead of programming.
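It is no accident that more experienced colleagues, their heads sprinkled with bugs and already gray because of it, smile modestly as they watch packs of "containers" in "cubes" being deployed implausibly fast across dozens of servers in "fashionable languages" with built-in asynchronous non-blocking I/O. They silently go on rereading "man ps", dig into the "nginx" source code until their eyes bleed, and write, write, write unit tests. These colleagues know that the most interesting part comes later, when "all of this" seizes up one night on New Year's Eve. Only a deep understanding of the nature of UNIX, a memorized TCP/IP state table, and the basic sorting and searching algorithms will help them then, so that they can bring the system back to life while the chimes ring.
Oh yes, I got a little sidetracked, but I hope I managed to convey the sense of anticipation.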
Today I want to share our experience of deploying a convenient and inexpensive stack for a DataLake, one that solves most of the analytical tasks in the company for completely different structural divisions.
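Some time ago we came to understand that the company increasingly needs the fruits of both product and technical analytics (not to mention the icing on the cake in the form of machine learning), and that to understand trends and risks we need to collect and analyze more and more metrics.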
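Basic technical analytics in Bitrix24
Several years ago, simultaneously with the launch of the Bitrix24 service, we actively invested time and resources in building a simple and reliable analytics platform that would help us quickly spot problems in the infrastructure and plan the next step. Naturally, we wanted to take ready-made tools that were as simple and understandable as possible. As a result, nagios was chosen for monitoring and munin for analytics and visualization. We now have thousands of checks in nagios and hundreds of charts in munin, and our colleagues use them successfully every day. The metrics are clear, the graphs are clear, the system has been working reliably for several years, and new tests and graphs are added to it regularly: when we put a new service into operation, we add a few tests and graphs. Off we go.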
Finger on the pulse - advanced technical analytics
The desire to receive information about problems "as soon as possible" led us to active experiments with simple and understandable tools: pinba and xhprof.
Pinba sent us statistics, in UDP packets, on how fast parts of web pages in PHP were running, and we could see a short list of problems online in MySQL storage (Pinba ships with its own MySQL engine for fast event analytics) and respond to them. And xhprof let us automatically collect execution graphs of the slowest PHP pages from clients and analyze what might have led to them, calmly, after pouring some tea or something stronger.
Some time ago the toolkit was replenished with another fairly simple and understandable engine based on the inverted indexing algorithm, superbly implemented in the legendary Lucene library: Elastic/Kibana. The simple idea of multi-threaded writing of documents into an inverted Lucene index based on events in the logs, plus fast search over them using faceted division, turned out to be really useful.
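To make the idea of an inverted index with facets more tangible, here is a minimal single-threaded sketch in plain Python. It is only an illustration of the principle, not how Lucene or Elastic are implemented; the class name `TinyIndex`, the event fields (`portal`, `level`, `message`), and the sample events are all made up for the example.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphanumeric terms."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


class TinyIndex:
    """Toy inverted index: term -> ids of the documents that contain it."""

    def __init__(self):
        self.docs = {}                    # doc id -> original event (dict)
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, event):
        """Store one log event and index the terms of its message."""
        doc_id = len(self.docs)
        self.docs[doc_id] = event
        for term in tokenize(event["message"]):
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return ids of documents that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def facet(self, doc_ids, field):
        """Count the matching documents per value of one field (a facet)."""
        return Counter(self.docs[i][field] for i in doc_ids)


# Hypothetical log events, invented for the example.
index = TinyIndex()
index.add({"portal": "p1", "level": "error", "message": "PHP Fatal error: out of memory"})
index.add({"portal": "p2", "level": "error", "message": "PHP Warning: undefined index"})
index.add({"portal": "p1", "level": "info", "message": "video call started"})

hits = index.search("php error")          # documents containing both terms
by_portal = index.facet(hits, "portal")   # how many hits per portal
```

The search never scans the messages themselves; it only intersects the posting sets of the query terms, which is what keeps lookups fast as the number of events grows.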
Despite the rather technical look of the visualizations in Kibana, with low-level concepts such as "bucket" "leaking upward" and a reinvented language of the not yet entirely forgotten relational algebra, the tool began to help us well with the following tasks:
- How many PHP errors did a Bitrix24 client have on the p1 portal over the past 24 hours, and which ones? Understand, forgive, and fix quickly.
- How many video calls were made on portals in Germany over the past 24 hours, with what quality, and were there any difficulties with the channel/network?
- How well does the system functionality (our C extension for PHP), compiled from source in the latest service update and rolled out to clients, work? Are there any segfaults?
- Does client data fit into PHP memory? Are there any errors about exceeding the memory allocated to processes ("out of memory")? Find and neutralize.
Here is a concrete example. Despite thorough, multi-level testing, a client with a very non-standard case and corrupted input data hit an annoying and unexpected error; a siren sounded and the process of quickly fixing it began.
In addition, kibana lets you set up notifications for specified events, and within a short time the tool was being used by dozens of employees from different departments in the company, from technical support and development to QA.
The activity of any department within the company has become convenient to track and measure: instead of manually analyzing logs on servers, it is enough to set up log parsing once and send the logs to the elastic cluster in order to enjoy, for example, contemplating in a kibana dashboard the number of two-headed kittens printed on a 3D printer and sold last month.
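"Set up log parsing once and send it to the elastic cluster" could look roughly like the sketch below. The log layout and the index name `app-logs` are assumptions invented for the example (our real formats are not shown in this post); the output follows the newline-delimited JSON shape that the Elasticsearch bulk API expects, one action line followed by one document line.

```python
import json
import re

# Assumed (hypothetical) log layout: "[timestamp] LEVEL portal=<name> message"
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"portal=(?P<portal>\S+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def to_bulk_body(lines, index_name="app-logs"):
    """Build an NDJSON bulk body: an action line plus a document line per event."""
    out = []
    for line in lines:
        doc = parse_line(line)
        if doc is None:
            continue  # skip lines that are not in the expected layout
        out.append(json.dumps({"index": {"_index": index_name}}))
        out.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(out) + "\n" if out else ""


# Sample input, invented for the example.
sample = [
    "[2019-07-15T10:00:00Z] ERROR portal=p1 PHP Fatal error: Allowed memory size exhausted",
    "not a log line",
]
body = to_bulk_body(sample)
```

The resulting `body` is what would be posted to the cluster's bulk endpoint; the sending step is left out here because it depends on the client library and cluster setup in use.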
Basic business analytics
Everyone knows that business analytics in companies often begins with the extremely active use of, yes, Excel. The main thing is that it does not end there. Cloud-based Google Analytics adds fuel to the fire too: you quickly get used to good things.
In our harmoniously developing company, "prophets" of more intensive work with larger data began to appear here and there. The need for deeper and more multifaceted reports began to arise regularly, and through the efforts of people from different departments a simple and practical solution was organized some time ago: a combination of ClickHouse and PowerBI.
For quite a long time this flexible solution helped a lot, but gradually the understanding came that ClickHouse is not made of rubber and cannot be abused like that.
Here it is important to understand well that ClickHouse, like Druid, like Vertica, like Amazon RedShift (which is based on postgres), is an analytical engine optimized for fairly convenient analytics (sums, aggregations, minimum and maximum per column, and a few possible joins), because, unlike MySQL and the other (row-oriented) databases known to us, it is organized to store the columns of relational tables efficiently.
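The difference between row-oriented and column-oriented storage can be shown with a toy in-memory example. This is only a sketch of the principle, not of how ClickHouse or any of the other engines lay out data on disk; the field names and values are invented.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"portal": "p1", "status": 200, "bytes": 512},
    {"portal": "p2", "status": 500, "bytes": 128},
    {"portal": "p1", "status": 200, "bytes": 2048},
]


def to_columns(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# Row store: every record has to be visited to aggregate a single field.
total_row_store = sum(record["bytes"] for record in rows)

# Column store: only the "bytes" column is read; the others are never touched.
total_column_store = sum(columns["bytes"])
max_bytes = max(columns["bytes"])
```

Both layouts give the same totals, but an aggregate over one column in the columnar layout reads a single contiguous list, which is why sums and min/max per column are cheap there while inserting one record at a time is not.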
In essence, ClickHouse is just a more capacious "database", with not very convenient point-by-point insertion (that is by design, all is fine), but with pleasant analytics and a set of interesting, powerful functions for working with data. Yes, you can even create a cluster, but you understand that hammering nails with a microscope is not entirely right, and we began looking for other solutions.
Demand for Python and analysts
Our company has many developers who have been writing code almost every day for 10 to 20 years in PHP, JavaScript, C#, C/C++, Java, Go, Rust, Python, and Bash. There are also many experienced system administrators who have lived through more than one absolutely incredible disaster that does not fit the laws of statistics (for example, when most of the disks in a RAID-10 are destroyed by a strong lightning strike). Under such circumstances, it was unclear for a long time what a "Python analyst" was. Python is like PHP, only the name is a little longer and there are slightly fewer traces of mind-altering substances in the interpreter's source code. However, as more and more analytical reports were created, the experienced developers came to understand ever more deeply the importance of narrow specialization in tools such as numpy, pandas, matplotlib, and seaborn.
But the decisive role was most likely played by employees suddenly fainting at the combination of the words "logistic regression", and by a demonstration of effective reporting on large data using, yes, pyspark.
Apache Spark, its functional paradigm onto which relational algebra fits perfectly, and its capabilities made such an impression on developers accustomed to MySQL that the need to strengthen the ranks with experienced analysts became clear as day.
Further attempts of Apache Spark/Hadoop to take off, and what did not go quite according to the script
However, it soon became clear that something was systemically not quite right with Spark, or that we simply needed to wash our hands better. While the Hadoop/MapReduce/Lucene stack was made by fairly experienced programmers (which is obvious if you look closely at the Java source code or at Doug Cutting's ideas in Lucene), Spark, suddenly, is written in the exotic language Scala, which in our view is very controversial from a practical standpoint and was not evolving. And the regular crashes of computations on the Spark cluster, caused by illogical and not very transparent handling of memory allocation for reduce operations (many keys arriving at once), created around it the aura of something that still has room to grow. The situation was made worse by a large number of strange open ports, temporary files growing in the most incomprehensible places, and a hell of jar dependencies, all of which gave the system administrators one feeling well known since childhood: fierce hatred (or perhaps they just needed to wash their hands with soap).
As a result, we "survived" several internal analytics projects that actively use Apache Spark (including Spark Streaming and Spark SQL) and the Hadoop ecosystem (and so on and so forth). Although over time we learned to prepare and monitor "it" quite well, and "it" practically stopped crashing suddenly because of changes in the nature of the data and imbalance in the uniform hashing of RDDs, the desire to take something ready-made, updated and administered somewhere in the cloud, grew stronger and stronger. It was at this point that we tried to use a ready-made cloud build from Amazon Web Services.
"Rubber" file storage for analytics is an urgent need
The experience of "cooking" Hadoop/Spark, with burns to various parts of the body, was not in vain. The need to create a single, inexpensive, and reliable file storage became ever clearer: one that would be resistant to hardware failures, where files in different formats from different systems could be stored, and from which efficient, time-reasonable samples could be made for reports.
We also wanted software updates for this platform not to turn into a New Year's nightmare of reading 20-page Java traces and analyzing kilometer-long detailed cluster logs using the Spark History Server and a backlit magnifying glass. We wanted a simple and transparent tool that does not require regularly diving under the hood whenever a developer's standard MapReduce request stops executing because a reduce worker ran out of memory owing to a poorly chosen partitioning algorithm for the source data.
Is Amazon S3 a candidate for the DataLake?
Experience with Hadoop/MapReduce taught us that we need both a scalable, reliable file system and, on top of it, scalable workers that "come" closer to the data so that the data is not driven over the network. The workers should be able to read data in different formats but preferably not read unnecessary information, and it should be possible to store the data in advance in formats convenient for the workers.
Once again, the basic idea. There is no desire to "pour" big data into a single clustered analytical engine, which sooner or later will choke and have to be sharded in an ugly way. We want to store files, just files, in an understandable format and run efficient analytical queries over them with different but understandable tools. And there will be more and more files in different formats. And it is better to shard the source data rather than the engine. We need an extensible and universal DataLake, we decided...
What if we store files in the familiar and well-known scalable cloud storage Amazon S3, without having to prepare our own chops from Hadoop?
Clearly, personal data is off limits, but what about other data, if we move it there and "drive it efficiently"?
The cluster, bigdata, and analytics ecosystem of Amazon Web Services - in very simple words
Judging by our experience with AWS, Apache Hadoop/MapReduce has long been actively used there under various sauces, for example in the DataPipeline service (I envy our colleagues, they really have learned how to cook it properly). This is where we have backups set up from various services' DynamoDB tables.
And they have been running regularly on the built-in Hadoop/MapReduce clusters, like clockwork, for several years now. "Set it and forget it."
You can also engage effectively in data satanism by setting up Jupyter notebooks in the cloud for analysts and using the AWS SageMaker service to train AI models and deploy them into battle. That is how it works for us.
And yes, you can spin up a notebook in the cloud for yourself or for an analyst, attach it to a Hadoop/Spark cluster, do the calculations, and then shut everything down.
It is really convenient for individual analytical projects, and for some we have successfully used the EMR service for large-scale calculations and analytics. But what about a systemic solution for the DataLake, will it work? At that moment we were on the verge of hope and despair, and we continued the search.
AWS Glue - neatly packaged Apache Spark on steroids
It turned out that AWS has its own version of the "Hive/Pig/Spark" stack. The role of Hive, that is, the catalog of files and their types in the DataLake, is performed by the "Data catalog" service, which does not hide its compatibility with the Apache Hive format. You need to add information to this service about where your files are located and what format they are in. The data can live not only in s3 but also in a database, but that is not the subject of this post. This is how our DataLake data catalog is organized.
The files are registered, great. If the files have been updated, we launch crawlers, either manually or on a schedule, which update the information about them from the lake and save it. The data from the lake can then be processed and the results uploaded somewhere. In the simplest case, we upload to s3 as well. Data processing can be done anywhere, but it is suggested that you configure the processing on an Apache Spark cluster using the advanced capabilities available through the AWS Glue API. In effect, you can take good old familiar Python code that uses the pyspark library and configure it to run on N nodes of a cluster of some capacity, with monitoring, without digging into the guts of Hadoop, dragging docker-moker containers around, or eliminating dependency conflicts.
Once again, a simple idea. There is no need to configure Apache Spark; you just write Python code for pyspark, test it locally on your desktop, and then run it on a large cluster in the cloud, specifying where the source data is and where to put the result. Sometimes this is necessary and useful, and this is how we have it set up.
So, if you need to compute something on a Spark cluster over data in s3, we write code in Python/pyspark, test it, and send it off to the cloud with our blessing.
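A job of the "read from one place, compute, write to another" kind might be sketched as follows. This is plain pyspark rather than our actual Glue job; the column name `status`, the application name, and the paths are hypothetical placeholders. The Spark import is done inside the function so that the argument handling can be tested on a desktop even where Spark is not installed.

```python
import argparse


def parse_args(argv):
    """Read the source and target locations from the command line."""
    parser = argparse.ArgumentParser(description="Count rows per status code")
    parser.add_argument("--source", required=True,
                        help="input location with parquet files (placeholder)")
    parser.add_argument("--target", required=True,
                        help="output location for the aggregated result (placeholder)")
    return parser.parse_args(argv)


def run(source, target):
    """Read parquet data, count rows per status, and write the result back."""
    # Imported lazily: pyspark is only needed when the job actually runs.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("status-counts").getOrCreate()
    try:
        df = spark.read.parquet(source)
        counts = df.groupBy("status").count()  # "status" is a hypothetical column
        counts.write.mode("overwrite").parquet(target)
    finally:
        spark.stop()


# Example of wiring the two together (paths are placeholders):
#   args = parse_args(["--source", "s3://bucket/in/", "--target", "s3://bucket/out/"])
#   run(args.source, args.target)
```

The same `run` function can be pointed at a small local directory during testing and at the lake in the cloud afterwards; only the two locations change.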
What about orchestration? What if a task crashed and disappeared? Yes, it is suggested that you build a beautiful pipeline in the Apache Pig style, and we even tried that, but for now we have decided to use our own deeply customized orchestration in PHP and JavaScript (I understand there is cognitive dissonance here, but it works, for years and without errors).
The format of the files stored in the lake is the key to performance
It is very, very important to understand two more key points. For queries over the file data in the lake to run as fast as possible, and for performance not to degrade as new information is added, you need to:
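- Store the columns of the files separately (so that you do not have to read every row to find out what is in the columns). For this we chose the parquet format with compression.
- It is very important to shard the files into folders along the lines of: language, year, month, day, week. Engines that understand this type of sharding will look only into the folders they need, without sifting through all the data in a row.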
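The folder sharding from the second point can be sketched like this. The `key=value` folder naming is an assumption borrowed from the common Hive-style convention, not necessarily the exact layout we use, and the week level is left out for brevity; the function names and sample dates are invented for the example.

```python
from datetime import date


def partition_prefix(lang, day):
    """Folder prefix for one language and one calendar day."""
    return (f"lang={lang}/year={day.year:04d}/"
            f"month={day.month:02d}/day={day.day:02d}/")


def parse_prefix(prefix):
    """Recover the partition values from a folder prefix."""
    parts = dict(item.split("=", 1) for item in prefix.strip("/").split("/"))
    return {
        "lang": parts["lang"],
        "year": int(parts["year"]),
        "month": int(parts["month"]),
        "day": int(parts["day"]),
    }


def prune(prefixes, **wanted):
    """Keep only the folders whose partition values match the filter."""
    selected = []
    for prefix in prefixes:
        values = parse_prefix(prefix)
        if all(values[key] == value for key, value in wanted.items()):
            selected.append(prefix)
    return selected


# Sample folders, invented for the example.
prefixes = [
    partition_prefix("ru", date(2019, 7, 14)),
    partition_prefix("ru", date(2019, 7, 15)),
    partition_prefix("de", date(2019, 7, 15)),
    partition_prefix("ru", date(2019, 8, 1)),
]

# A query for Russian-language data in July 2019 only needs two folders.
july_ru = prune(prefixes, lang="ru", year=2019, month=7)
```

An engine that understands this layout does the equivalent of `prune` before reading anything, so the files in the other folders are never opened at all.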
In essence, this way you lay out the source data in the most efficient form for the analytical engines hung on top, which can selectively enter the sharded folders and read only the necessary columns from the files. It turns out you do not need to "pour" the data anywhere (the storage would simply burst); just put it sensibly into the file system right away, in the correct format. Of course, it should be clear here that storing a huge CSV file in the DataLake, which the cluster must first read in full, line by line, in order to extract the columns, is not very advisable. If it is still not clear why all this is needed, think about the two points above once more.
AWS Athena - the jack-in-the-box
And then, while creating the lake, we somehow stumbled upon Amazon Athena in passing. It suddenly turned out that by carefully arranging our huge log files into folder shards in the correct (parquet) columnar format, we could very quickly make extremely informative selections over them and build reports WITHOUT an Apache Spark/Glue cluster.
The Athena engine, which works on data in s3, is itself built on a legendary engine.
The pricing of requests to Athena is also interesting.
And by requesting only the necessary columns from correctly sharded folders, it turned out that the Athena service costs us tens of dollars a month. That's great, almost free compared with analytics on clusters!
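Why reading fewer folders and fewer columns translates into a smaller bill can be estimated with simple arithmetic, assuming the service charges in proportion to the volume of data scanned. Everything numeric below is a placeholder: the per-terabyte rate should be checked against current AWS pricing, and the data sizes are invented. The estimate also assumes, unrealistically, that all partitions and all columns are the same size.

```python
TB = 1024 ** 4  # bytes in one terabyte (binary)


def estimate_bytes_scanned(total_bytes, partitions_total, partitions_read,
                           columns_total, columns_read):
    """Rough volume read, assuming evenly sized partitions and columns."""
    partition_share = partitions_read / partitions_total
    column_share = columns_read / columns_total
    return total_bytes * partition_share * column_share


def scan_cost(bytes_scanned, usd_per_tb):
    """Cost of one query when billing is proportional to data scanned."""
    return bytes_scanned / TB * usd_per_tb


RATE = 5.0  # placeholder rate in USD per TB scanned, an assumption

# Invented example: 10 TB of logs, one folder per day for a year, 20 columns.
full_scan = scan_cost(10 * TB, RATE)

# The same query restricted to 30 daily folders and 2 of the 20 columns.
pruned = scan_cost(
    estimate_bytes_scanned(10 * TB, partitions_total=365, partitions_read=30,
                           columns_total=20, columns_read=2),
    RATE,
)
```

Under these assumptions the restricted query reads under one percent of the data, and its cost shrinks by the same factor, which matches the "almost free" impression described above.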
By the way, this is how we shard our data in s3.
As a result, within a short time completely different departments in the company, from information security to analytics, began actively making requests to Athena and quickly, within seconds, receiving useful answers from "big" data over fairly long periods spanning months.
But we went further and started going to the cloud for answers.
As a result, having decided to store data in s3, in an efficient columnar format and with sensible sharding of data into folders, we got a DataLake and a fast, cheap analytical engine for free. And it became very popular in the company, because it understands SQL and works orders of magnitude faster than going through starting, stopping, and setting up clusters. "And if the result is the same, why pay more?"
A request to Athena looks roughly like this. Of course, more elaborate queries can be built when needed.
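As an illustration of what such a request can look like, the sketch below builds a simple grouping query and submits it through the boto3 Athena client. The table name, the columns (`status`, `portal`, `dt`), the database, and the result location are all hypothetical, and the query text is assembled from trusted constants only. The boto3 import is deferred so that the query builder can be tried without AWS credentials.

```python
def build_status_query(table, portal, date_from, date_to):
    """SQL that groups web-server log rows by HTTP status for one portal.

    The arguments are assumed to be trusted constants, not user input.
    """
    return (
        f"SELECT status, COUNT(*) AS hits\n"
        f"FROM {table}\n"
        f"WHERE portal = '{portal}'\n"
        f"  AND dt BETWEEN '{date_from}' AND '{date_to}'\n"
        f"GROUP BY status\n"
        f"ORDER BY hits DESC"
    )


def submit(query, database, output_location):
    """Start the query in Athena and return its execution id."""
    import boto3  # imported lazily; needs AWS credentials to actually run

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical table and dates, invented for the example.
query = build_status_query("weblogs", "p1", "2019-07-01", "2019-07-07")
```

Filtering on the partition column (`dt` here) is what lets the engine skip the folders outside the requested range, and naming only `status` keeps the read limited to that one column.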
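Conclusions
Having traveled a path that was, not to say long, but painful, constantly and adequately assessing the risks, the level of complexity, and the cost of support, we found a solution for the DataLake and analytics that never ceases to please us with both its speed and its cost of ownership.
It turned out that building a DataLake that is effective, fast, and cheap to operate, for the needs of completely different departments of the company, is entirely within the capabilities of even experienced developers who have never worked as architects, cannot draw squares on squares with arrows, and do not know 50 terms from the Hadoop ecosystem.
At the beginning of the journey, my head was splitting from the many wild zoos of open and closed software and from an awareness of the burden of responsibility to our descendants. Just start building your DataLake with simple tools: nagios/munin -> elastic/kibana -> Hadoop/Spark/s3..., collecting feedback and deeply understanding the physics of the processes taking place. Everything complicated and murky, give to your enemies and competitors.
If you do not want to go to the cloud and you like to support, update, and patch open-source projects, you can build a scheme similar to ours locally, on inexpensive office machines with Hadoop and Presto on top. The main thing is not to stop but to keep moving forward, to measure, and to look for simple and clear solutions, and everything will definitely work out! Good luck to everyone, and see you again!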
Source: habr.com