Hello everyone! Following on from yesterday, let me tell you a little about how we use Spark. We run a months-long training program on big data.
What makes our usage unusual is that the number of people working with Spark at the same time can be the whole group at once: for example, during a seminar, when everyone tries something simultaneously, repeating after the instructor. And that's not a small number; at times it's up to 40 people. There probably aren't many companies in the world that face this kind of use case.
Next, I'll explain how and why we chose particular configuration parameters.
Let's start from the beginning. Spark has three options for running on a cluster: standalone, on Mesos, and on YARN. We decided to go with the third option, since it made sense for us: we already have a Hadoop cluster, and the participants are already familiar with its architecture. So, YARN it is.
spark.master=yarn
It gets more interesting. Each of these three options has two deploy modes: client and cluster. We settled on client mode.
spark.submit.deployMode=client
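For reference, a minimal sketch of how these two choices can be passed to a job; the helper and `job.py` are hypothetical, and the canonical Spark property name for the deploy mode is `spark.submit.deployMode` (the `--deploy-mode` flag on the command line):

```python
# Hypothetical helper: render spark-submit --conf flags from a dict of settings.
def spark_submit_cmd(app, conf):
    flags = " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))
    return f"spark-submit {flags} {app}"

cmd = spark_submit_cmd("job.py", {
    "spark.master": "yarn",
    "spark.submit.deployMode": "client",
})
print(cmd)
```

In practice the same keys can simply live in `conf/spark-defaults.conf`, one per line, exactly as shown throughout this post.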
With that, Spark was up and running on YARN in some form, but that wasn't enough for us. Since our program is about big data, static resource slicing sometimes left participants without enough resources. So we discovered an interesting feature: dynamic resource allocation. In a nutshell: if you have a heavy task and the cluster is free (say, in the morning), this option lets Spark be granted additional resources. Need is calculated there by a reasonable formula; we won't go into the details, but it works well.
spark.dynamicAllocation.enabled=true
We set this parameter, and at startup Spark crashed and wouldn't launch. It turned out we still had to read the documentation, which says you must also set:
spark.shuffle.service.enabled=true
Why is this needed? When our job no longer requires so many resources, Spark should return them to the common pool. The most time-consuming stage of almost any MapReduce job is the shuffle stage. This parameter lets the data produced at that stage be preserved so that the executors can be released accordingly. And an executor is the process that computes everything on a worker; it carries a certain number of processor cores and a certain amount of memory.
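The crash we hit comes precisely from enabling dynamic allocation without the external shuffle service. A sketch of the sanity check we could have run before submitting (our own illustration, not part of Spark):

```python
# Hypothetical pre-submit check: dynamic allocation only works together
# with the external shuffle service.
def check_dynamic_allocation(conf):
    """Raise if dynamic allocation is enabled without the shuffle service."""
    if (conf.get("spark.dynamicAllocation.enabled") == "true"
            and conf.get("spark.shuffle.service.enabled") != "true"):
        raise ValueError("spark.dynamicAllocation.enabled=true requires "
                         "spark.shuffle.service.enabled=true")

# the combination that crashed for us:
try:
    check_dynamic_allocation({"spark.dynamicAllocation.enabled": "true"})
except ValueError as e:
    print("rejected:", e)
```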
With this parameter added, everything seemed to work. It became noticeable that participants got more resources when they actually needed them. But another problem appeared: at some point other participants would wake up and also want to use Spark, only to find everything busy, and they were unhappy. Understandably. We started digging through the documentation and found that there are many more parameters for influencing the process. For example: if an executor is sitting idle, after how long can its resources be taken back?
spark.dynamicAllocation.executorIdleTimeout=120s
In our case: if an executor does nothing for two minutes, please return it to the common pool. But this parameter alone wasn't always enough. It was obvious that someone hadn't done anything for a long time, and yet the resources weren't being freed. It turned out there is a separate parameter that sets the time for reclaiming executors that hold cached data. By default, this parameter was infinity! We corrected that.
spark.dynamicAllocation.cachedExecutorIdleTimeout=600s
That is, if executors holding cached data do nothing for ten minutes, hand them back to the common pool. In this mode, resources got released and re-issued to our many users noticeably faster, and the amount of grumbling went down. But we went further and decided to cap the maximum number of executors per application, that is, per program participant.
spark.dynamicAllocation.maxExecutors=19
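Taken together, the three dynamic-allocation settings describe when an idle executor goes back to the pool. A toy model of that rule (our own illustration, not Spark internals):

```python
# Toy model of the reclamation rules, not Spark's actual code.
IDLE_TIMEOUT_S = 120          # spark.dynamicAllocation.executorIdleTimeout
CACHED_IDLE_TIMEOUT_S = 600   # spark.dynamicAllocation.cachedExecutorIdleTimeout

def should_release(idle_seconds, holds_cached_data):
    """Is an idle executor due to be returned to the common pool?"""
    limit = CACHED_IDLE_TIMEOUT_S if holds_cached_data else IDLE_TIMEOUT_S
    return idle_seconds >= limit

print(should_release(180, holds_cached_data=False))  # True: idle over 2 minutes
print(should_release(180, holds_cached_data=True))   # False: cached data, under 10 minutes
```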
Of course, this produced unhappy people on the other side: "the cluster is idle, and I only get 19 executors." But what could we do? You need some kind of reasonable balance; you can't make everyone happy.
And one more small story tied to the specifics of our case. Somehow several people were late for a hands-on lesson, and for some reason their Spark wouldn't start. We checked the amount of free resources: they seemed to be there, so Spark should start. Luckily, by then the documentation had lodged itself somewhere in the back of our minds, and we remembered that at launch Spark looks for a port to start on. If the first port in the range is busy, it moves on to the next one, and so on; if a port is free, it grabs it. And there is a parameter for the maximum number of attempts. The default is 16, which is fewer than the number of people in our group. Accordingly, after 16 attempts Spark gave up and said it couldn't start. We fixed this setting.
spark.port.maxRetries=50
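The port hunt described above is easy to picture. A toy version of the mechanism (an illustration, not Spark's code), where each person's session occupies a port and the next person has to probe further:

```python
import socket

def find_free_port(base_port=4040, max_retries=50):
    """Try base_port, base_port + 1, ... up to max_retries extra attempts."""
    for offset in range(max_retries + 1):
        port = base_port + offset
        with socket.socket() as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # busy: move on to the next port
            return port
    raise RuntimeError(f"no free port after {max_retries} retries")

# with ~40 people starting sessions at once, a 16-retry budget runs out
print(find_free_port(4040, 50))
```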
Next, a few settings that aren't really tied to the specifics of our case.
To make Spark start faster, it's recommended to archive the jars folder located in the SPARK_HOME home directory and put the archive on HDFS. Then it won't waste time shipping these jars to the workers on every launch.
spark.yarn.archive=hdfs:///tmp/spark-archive.zip
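Building that archive is a one-off step. A sketch of it in Python (the paths are from our setup, adjust SPARK_HOME; the upload itself is the usual `hdfs dfs -put`):

```python
import os
import zipfile

def archive_spark_jars(spark_home, out_path="spark-archive.zip"):
    """Zip every jar from $SPARK_HOME/jars into one flat archive."""
    jars_dir = os.path.join(spark_home, "jars")
    with zipfile.ZipFile(out_path, "w") as zf:
        for name in sorted(os.listdir(jars_dir)):
            zf.write(os.path.join(jars_dir, name), arcname=name)
    return out_path

# afterwards: hdfs dfs -put spark-archive.zip /tmp/spark-archive.zip
```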
To speed things up further, it's recommended to use kryo as the serializer; it's more optimized than the default one.
spark.serializer=org.apache.spark.serializer.KryoSerializer
Spark also has a long-standing problem of crashing with out-of-memory errors. This often happens at the moment when the workers have computed everything and are sending the result to the driver. We bumped this parameter up for ourselves: the default is 1 GB, we made it 3.
spark.driver.maxResultSize=3072
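A note on units: the 3072 above is meant as megabytes, i.e. 3 GB against the 1 GB default mentioned in the text. A tiny converter for double-checking such Spark-style size strings (our own helper, assuming the usual k/m/g suffixes):

```python
def size_to_mib(value):
    """Convert a Spark-style size string such as '1g' or '3072m' to MiB."""
    units = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 ** 2}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # a bare number: assume it is already in MiB

print(size_to_mib("1g"))  # 1024 -> the default
print(size_to_mib("3g"))  # 3072 -> what we set
```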
And finally, for dessert: how we updated Spark to version 2.1 on the HortonWorks distribution, HDP 2.5.3.0. This version of HDP comes with Spark 2.0 preinstalled, but we had decided for ourselves at some point that Spark is being developed very actively, and every new version fixes a bunch of bugs and adds features, including in the Python API. So we concluded that what we needed was an upgrade.
We downloaded the build for Hadoop 2.7 from the official site, unpacked it, and put it into the HDP folder. We set up the symlinks we needed. We launch it: it doesn't start, and prints a very cryptic error.
java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
After some googling, we found out that Spark had decided not to wait for Hadoop to catch up and had switched to a newer version of jersey; the projects even argue with each other about this topic in JIRA. The solution was to download the older jersey.
That error was bypassed, but a new, much more original one appeared:
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master
At the same time, running version 2.0 worked without any problems. Try guessing what was going on! We dug into that application's logs and saw something like this:
/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar
In short, for some reason hdp.version wasn't being resolved. Googling turned up a solution: in Ambari, go to the YARN settings and add a parameter there to the custom yarn-site:
hdp.version=2.5.3.0-37
That bit of magic helped, and Spark took off. We tested several Jupyter notebooks: everything works. We're ready for the first Spark lesson on Saturday (tomorrow)!
UPD: On Saturday we discovered another problem. At some point YARN stopped handing out containers for Spark. We had to fix a YARN parameter that defaults to 0.2:
yarn.scheduler.capacity.maximum-am-resource-percent=0.8
That is, only 20% of the cluster's resources could go to Application Masters; since every application gets its own AM, this capped the number of concurrently running applications. After changing the parameter, we reloaded YARN. The problem was solved, and the rest of the participants were able to launch a Spark context.
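To see why 0.2 locked people out, here is some back-of-envelope arithmetic. Each YARN application needs its own Application Master, and maximum-am-resource-percent caps the share of cluster memory all AMs together may occupy; the cluster and AM sizes below are made up for illustration:

```python
def max_concurrent_apps(cluster_mem_gb, am_mem_gb, am_percent):
    """How many Application Masters (= applications) fit under the cap?"""
    return int(cluster_mem_gb * am_percent // am_mem_gb)

# hypothetical numbers: 100 GB cluster, 2 GB per Application Master
print(max_concurrent_apps(100, 2, 0.2))  # 10 apps at the 0.2 default
print(max_concurrent_apps(100, 2, 0.8))  # 40 apps after the change
```

With a group of around 40 people each holding a Spark context, the default cap is exhausted long before everyone gets a container.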
Source: habr.com