To understand which technology skills are most popular, I analyzed job listings for data engineer positions posted in 2020. I then compared the results with statistics on data scientist listings, and some interesting differences emerged.
Without further ado, here are the technologies mentioned most often in the job listings.
Technology mentions in data engineer job listings, 2020
Let's take a closer look.
Data engineer responsibilities
Data engineers play a very important role in organizations today. They are responsible for storing information and converting it into a form that other employees can work with. Data engineers build pipelines that stream or batch-process data from multiple sources. Those pipelines then run extract, transform, and load operations (the ETL process) to make the data more suitable for use. The data is then passed to analysts and data scientists for more detailed processing. Finally, the data ends its journey in dashboards, reports, and machine learning models.
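The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the CSV contents, the `signups` table, and the function names are all hypothetical, and a real pipeline would read from external sources and load into a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from files, APIs, or streams.
RAW_CSV = """user_id,signup_date,country
1,2020-01-05,us
2,2020-01-06,DE
3,,fr
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows without a signup date and upper-case country codes."""
    cleaned = []
    for row in rows:
        if not row["signup_date"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["user_id"]), row["signup_date"], row["country"].upper())
        )
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a table that analysts can query."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS signups "
        "(user_id INTEGER, signup_date TEXT, country TEXT)"
    )
    connection.executemany("INSERT INTO signups VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline(raw_text, connection):
    """Run the three stages in order."""
    load(transform(extract(raw_text)), connection)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
run_pipeline(RAW_CSV, conn)
rows = conn.execute(
    "SELECT user_id, signup_date, country FROM signups ORDER BY user_id"
).fetchall()
print(rows)
```

Keeping each stage in its own function makes it easy to swap the source or the destination without touching the cleaning logic.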
I went looking for data that would let me draw conclusions about which technologies are currently most in demand in data engineering work.
Method
I collected information from multiple job sites.
For each keyword, I calculated the percentage of listings on each site that mentioned it, then averaged those percentages across the sources.
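The calculation described above can be sketched as follows. The site names and counts here are hypothetical placeholders, not the figures from the study; only the arithmetic (a per-site percentage, then an unweighted mean across sites) follows the description.

```python
def keyword_share(hits, total_listings):
    """Percentage of one site's listings that mention the keyword."""
    return 100.0 * hits / total_listings


def average_share(hits_by_site, totals_by_site):
    """Unweighted mean of the per-site percentages."""
    shares = [
        keyword_share(hits_by_site[site], totals_by_site[site])
        for site in totals_by_site
    ]
    return sum(shares) / len(shares)


# Hypothetical counts for illustration only.
totals = {"site_a": 1000, "site_b": 2000, "site_c": 500}
sql_hits = {"site_a": 700, "site_b": 1300, "site_c": 360}

avg_sql = average_share(sql_hits, totals)
print(avg_sql)
```

Averaging the percentages gives every site equal weight regardless of how many listings it carries, which differs from pooling all listings into a single total.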
Results
Below are the data engineering technology terms that scored highest across all of the job sites.
Here are the same figures in table form.
Let's go through them in order.
Overview of the results
Both SQL and Python appear in more than half of the listings surveyed. These are the two technologies to learn first.
Spark is named in about half of the listings.
AWS appears in about 45% of the listings. It is the cloud computing platform built by Amazon, and it has the largest market share of any cloud platform.
Java and Hadoop come next, each with a share slightly above 40%.
It feels a bit like riding in a time machine.
Next come Hive, Scala, Kafka, and NoSQL, each mentioned in a smaller share of the listings. Apache Hive is data warehouse software that "facilitates reading, writing, and managing large datasets residing in distributed storage using SQL."
Comparison with data scientist job requirements
Here are the technology terms most common among employers hiring for data science. I obtained this list in the same way as described above for data engineering.
Technology mentions in data scientist job listings, 2020
In terms of totals, there were 28% more data scientist listings than listings for the data engineer positions examined earlier (12 vs. 013). Let's look at which technologies are less common in data scientist listings than in data engineer listings.
More popular in data engineering
The chart below shows the keywords whose average difference is greater than 10% or less than -10%.
Largest differences in keyword frequency between data engineers and data scientists
AWS shows the most pronounced increase: it appears 25 percentage points more often in data engineering listings than in data science listings (about 45% and 20% of all listings, respectively). That is a striking difference.
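The comparison above amounts to subtracting one set of percentages from the other and keeping only the gaps beyond 10 points in either direction. Here is a minimal sketch, assuming the percentages are held in plain dictionaries. The AWS values are the approximate figures quoted above; `ExampleTool` and `OtherTool` are made-up entries included only to exercise the filter.

```python
def notable_differences(engineer_pct, scientist_pct, threshold=10.0):
    """Return keywords whose engineer-minus-scientist gap exceeds the threshold
    in either direction, sorted from largest positive gap to largest negative."""
    diffs = {}
    for keyword in engineer_pct.keys() & scientist_pct.keys():
        diff = engineer_pct[keyword] - scientist_pct[keyword]
        if abs(diff) > threshold:
            diffs[keyword] = diff
    return dict(sorted(diffs.items(), key=lambda item: item[1], reverse=True))


# AWS uses the approximate shares from the text; the other rows are hypothetical.
engineer = {"AWS": 45.0, "ExampleTool": 12.0, "OtherTool": 5.0}
scientist = {"AWS": 20.0, "ExampleTool": 8.0, "OtherTool": 30.0}

result = notable_differences(engineer, scientist)
print(result)
```

Positive values mark keywords that are more common in data engineer listings, and negative values mark those more common in data scientist listings.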
Here is the same data in a slightly different presentation: the chart places the results for the same keywords in data engineer and data scientist listings side by side.
Largest differences in keyword frequency between data engineers and data scientists
The next biggest jump I noticed was Spark, which makes sense because data engineers often work with big data.
Less popular in data engineering
Next, let's see which technologies are less popular in data engineer listings.
The sharpest decline compared with the data science field was seen in …
Valued in both data engineering and data science
Note that the top of both lists largely overlaps. Eight technologies (SQL, Python, Spark, AWS, Java, Hadoop, Hive, and Scala) rank among the leaders in both data engineering and data science. The chart below shows the technologies most popular with data engineer employers next to their share of data scientist listings.
Recommendations
If you are interested in data engineering, I recommend learning the following technologies, listed in approximate order of priority.
Learn SQL. I lean toward PostgreSQL because it is open source, very popular in the community, and still growing. You can learn how to use the language from the book My Memorable SQL; a pilot version is available.
Master Python, even if not at the most hardcore level. My Memorable Python is designed specifically for beginners and is available for purchase.
Once you are comfortable with Python, move on to pandas, a Python library used for cleaning and processing data. If you are aiming to work at a company that requires Python skills (and most of them do), you can be sure that knowledge of pandas will be assumed by default. I am currently finishing an introductory guide to working with pandas.
Master AWS. If you want to become a data engineer, a cloud platform is essential, and AWS is the most popular of them. Courses are very helpful here.
If you have already worked through this whole list and want to grow further in employers' eyes as a data engineer, I recommend adding Apache Spark for working with big data. My research on data scientist listings showed declining interest in it, but among data engineers it still appears in roughly every second listing.
In closing
I hope this overview of the most in-demand technologies for data engineers is useful. If you are curious about how things stand for analyst jobs, see the related article.
Source: habr.com