The professions of data scientist and data engineer are often confused. Each company has its own specifics of working with data, different goals for its analysis, and different ideas about which specialist should handle which part of the work, so each company has its own requirements.
Let's look at how these specialists differ, what business problems they solve, what skills they have, and how much they earn. The material turned out to be large, so we have split it across separate publications.
In the first article, faculty head Elena Gerasimova covers the questions below.
How the roles of engineers and scientists differ
A data engineer is, on the one hand, a specialist who develops, tests, and maintains data infrastructure: databases, storage, and large-scale processing systems. On the other hand, this is the person who cleans and "combs" data so that analysts and data scientists can use it, that is, the person who builds data processing pipelines.
A data scientist builds and trains predictive (and other) models using machine learning algorithms and neural networks, helping the business discover hidden patterns, forecast developments, and optimize key business processes.
The main difference between a data scientist and a data engineer is usually that they have different goals. Both work to ensure that data is accessible and of high quality. But a data scientist finds answers to questions and tests hypotheses in a data ecosystem (for example, one based on Hadoop), while a data engineer builds the pipeline that serves the machine learning algorithm written by the data scientist on a Spark cluster within the same ecosystem.
A data engineer brings value to the business by working as part of a team. The job is to act as an important link between participants, from developers to business consumers of reports, and to raise the productivity of analysts, from marketing and product to BI.
A data scientist, by contrast, takes an active part in the company's strategy: extracting insights, making decisions, implementing automation algorithms, and modeling data to generate value.
Working with data is subject to the GIGO (garbage in, garbage out) principle: if analysts and data scientists work with unprepared and potentially incorrect data, the results will be incorrect even with the most sophisticated analysis algorithms.
Data engineers solve this problem by building pipelines that process, clean, and transform data, so that data scientists can work with high-quality data.
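To make the cleaning step concrete, here is a minimal sketch of such a stage in Python. The field names (`user_id`, `amount`, `country`) and the sample rows are invented for illustration and do not come from any real system. A real pipeline would read from actual sources and would probably log why each row was rejected.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    {"user_id": "17", "amount": " 120.50 ", "country": "de"},
    {"user_id": "17", "amount": " 120.50 ", "country": "de"},  # exact duplicate
    {"user_id": "", "amount": "15", "country": "FR"},          # missing key
    {"user_id": "42", "amount": "n/a", "country": "us"},       # unparseable amount
    {"user_id": "8", "amount": "7", "country": " gb"},
]


def parse_row(row: dict) -> Optional[dict]:
    """Convert one raw row to typed values, or return None if it is unusable."""
    user_id = row.get("user_id", "").strip()
    if not user_id.isdecimal():
        return None
    try:
        amount = float(row.get("amount", "").strip())
    except ValueError:
        return None
    return {
        "user_id": int(user_id),
        "amount": amount,
        "country": row.get("country", "").strip().upper(),
    }


def clean(rows: Iterable[dict]) -> Tuple[List[dict], int]:
    """Return (typed rows without duplicates, number of rejected rows)."""
    seen = set()
    cleaned: List[dict] = []
    rejected = 0
    for row in rows:
        parsed = parse_row(row)
        if parsed is None:
            rejected += 1
            continue
        key = (parsed["user_id"], parsed["amount"], parsed["country"])
        if key in seen:
            continue  # duplicates are dropped, not counted as rejected
        seen.add(key)
        cleaned.append(parsed)
    return cleaned, rejected


cleaned_rows, rejected_count = clean(RAW_ROWS)
print(cleaned_rows)
print("rejected:", rejected_count)
```

Returning the rejected count alongside the clean rows is one possible design choice: it lets whoever consumes the data see how much input was discarded instead of trusting the output blindly.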
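There are many tools on the market for working with data, covering every stage from the moment data appears to its output on a dashboard for the board of directors. It is important that the decision to use them is made by an engineer, not because a tool is fashionable, but because it really helps the other participants in the process do their work.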
For example, if a company needs to connect BI and ETL (loading data and updating reports), that is a typical legacy foundation a data engineer will have to deal with (it helps if the team also has an architect).
Data engineer responsibilities
- Developing, building, and maintaining data processing infrastructure.
- Handling errors and building reliable data processing pipelines.
- Bringing unstructured data from various dynamic sources into the form analysts need for their work.
- Providing recommendations for improving data consistency and quality.
- Providing and maintaining the data architecture used by data scientists and data analysts.
- Processing and storing data consistently and efficiently in a distributed cluster of tens or hundreds of servers.
- Evaluating the technical trade-offs of tools in order to build simple but robust architectures that can withstand disruptions.
- Controlling and supporting data flows and related systems (setting up monitoring and alerts).
Within the data engineer track there is another specialization: the ML engineer. In short, these engineers specialize in bringing machine learning models into production deployment and use. A model received from a data scientist is often part of a study and may not work under production conditions.
Data scientist responsibilities
- Extracting features from data in order to apply machine learning algorithms.
- Using various machine learning tools to predict and classify patterns in data.
- Improving the performance and accuracy of machine learning algorithms by fine-tuning and optimizing them.
- Forming "strong" hypotheses, in line with the company's strategy, that need to be tested.
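As a rough illustration of the tuning item above, the following sketch picks the hyperparameter `k` of a one-dimensional k-nearest-neighbors classifier by accuracy on a held-out validation set. The data points are invented, and the pure-Python classifier is a teaching stand-in. A real project would most likely rely on an established machine learning library.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, class label)

# Hypothetical labeled data; two training points are deliberately noisy.
TRAIN: List[Sample] = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1),
                       (6.0, 1), (7.0, 1), (2.5, 1), (5.5, 0)]
VALID: List[Sample] = [(1.5, 0), (2.8, 0), (5.0, 1), (6.5, 1)]


def knn_predict(train: Sequence[Sample], x: float, k: int) -> int:
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda sample: abs(sample[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train: Sequence[Sample], data: Sequence[Sample], k: int) -> float:
    """Share of samples in data whose label is predicted correctly."""
    hits = sum(1 for x, label in data if knn_predict(train, x, k) == label)
    return hits / len(data)


def tune_k(train: Sequence[Sample], valid: Sequence[Sample],
           candidates: Sequence[int]) -> Tuple[int, float]:
    """Grid search: return the candidate k with the best validation accuracy."""
    best_k = max(candidates, key=lambda k: accuracy(train, valid, k))
    return best_k, accuracy(train, valid, best_k)


best_k, best_accuracy = tune_k(TRAIN, VALID, candidates=[1, 3, 5])
print("best k:", best_k, "validation accuracy:", best_accuracy)
```

Evaluating on data that was not used for fitting is the point of the exercise: with `k = 1` the noisy training points leak straight into predictions, while a larger `k` smooths them out.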
Both data engineers and data scientists make a tangible contribution to the development of a data culture, through which a company can generate additional profit or reduce costs.
What languages and tools do engineers and scientists use?
Expectations of data specialists have changed. Previously, engineers assembled large SQL queries, wrote MapReduce by hand, and processed data with tools such as Informatica ETL, Pentaho ETL, and Talend.
In 2020, a specialist cannot do without knowledge of Python and modern computation tools (for example, Airflow), or without understanding the principles of working with cloud platforms (using them to save on hardware while observing security principles).
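SAP, Oracle, MySQL, and Redis are the traditional tools of a data engineer in large companies. They are good, but licenses for the commercial products among them are so expensive that learning to work with them only makes sense in industrial projects. At the same time there is a free alternative, Postgres: it costs nothing and is suitable for more than just training.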
Historically, job requirements often included Java and Scala, but as technologies and approaches evolve, these languages are fading into the background.
However, hardcore BigData (Hadoop, Spark, and the rest of the zoo) is no longer a prerequisite for a data engineer. It is rather one kind of tool for problems that traditional ETL cannot solve.
The trend is toward services that let you use tools without knowing the language they are written in (for example, Hadoop without knowing Java), and toward ready-made services for processing streaming data (such as speech recognition or image recognition in video).
Industrial solutions from SAS and SPSS are popular, while Tableau, Rapidminer, Stata, and Julia are also widely used by data scientists for local tasks.
Analysts and data scientists gained the ability to build pipelines themselves only a couple of years ago: for example, it is already possible to send data to PostgreSQL-based storage with relatively simple scripts.
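A "relatively simple script" of this kind might look like the sketch below. To keep it runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for a PostgreSQL driver; this substitution is an assumption of the example, not something the article prescribes. The `events` table, its columns, and the CSV content are also hypothetical. With a PostgreSQL driver such as `psycopg2` the overall shape would be similar, but the connection setup and the parameter placeholder (`%s` instead of `?`) would differ.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; in practice this would be a file or an API response.
CSV_TEXT = """event_id,user_id,event_type
1,17,signup
2,17,purchase
3,42,signup
"""


def load_events(connection: sqlite3.Connection, csv_text: str) -> int:
    """Create the target table if needed, insert all CSV rows, return the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["event_id"]), int(r["user_id"]), r["event_type"]) for r in reader]
    with connection:  # commits on success, rolls back if an insert fails
        connection.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "event_id INTEGER PRIMARY KEY, "
            "user_id INTEGER NOT NULL, "
            "event_type TEXT NOT NULL)"
        )
        connection.executemany(
            "INSERT INTO events (event_id, user_id, event_type) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
loaded = load_events(conn, CSV_TEXT)
signup_count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE event_type = 'signup'"
).fetchone()[0]
print(loaded, signup_count)
```

Passing values as query parameters rather than formatting them into the SQL string keeps the script safe against malformed input, whichever database sits behind it.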
As a rule, working with pipelines and integrated data structures remains the data engineer's responsibility. But today the trend toward T-shaped specialists, with broad competencies in adjacent fields, is stronger than ever, because the tools keep getting simpler.
Why data engineers and data scientists work together
By working closely with engineers, data scientists can focus on the research side and create production-ready machine learning algorithms.
Engineers, in turn, should focus on scalability, data reuse, and making sure that each project's data input and output pipelines conform to the global architecture.
This separation of responsibilities ensures consistency across teams working on different machine learning projects.
Collaboration helps create new products efficiently. Speed and quality come from a balance between building a service for everyone (global storage or dashboard integration) and implementing each specific need or project (a highly specialized pipeline, connecting external sources).
Working closely with data scientists and analysts helps engineers develop the analytical and research skills needed to write better code. Knowledge sharing among warehouse and data lake users improves, which makes projects more agile and delivers more sustainable long-term results.
In companies that aim to develop a culture of working with data and building business processes on it, data scientists and data engineers complement each other and create a complete data analysis system.
In the next article, we will discuss what education data engineers and data scientists should have, what skills they need to develop, and how the market works.
From the editors of Netology
If you are considering a career as a data engineer or data scientist, we invite you to look at our course programs:
- Profession "Data Engineer"
- Profession "Data Scientist"
Source: habr.com