Hello, Habr! Data sets for big data and machine learning are growing rapidly, and we need to keep up with them. This post is about another innovative technology in high performance computing (HPC) that was shown at the Kingston booth.
GPU performance outpaces data loading
Since CUDA, the GPU-based hardware and software parallel computing architecture for developing general-purpose applications, was created in 2007, the hardware capabilities of GPUs themselves have grown enormously. Today GPUs are increasingly used in HPC applications such as big data, machine learning (ML), and deep learning (DL).
Note that, despite the similar terms, the last two are algorithmically different tasks. ML trains a computer on structured data, whereas DL trains a computer on feedback from a neural network. A simple example helps to show the difference. Suppose a computer has to distinguish photos of cats and dogs that are loaded from a storage system. For ML, you have to submit a set of images carrying many tags, each tag defining one specific feature of the animal. For DL, it is enough to upload a much larger number of images, but with only one tag each: "this is a cat" or "this is a dog".
DL closely resembles the way young children are taught. They are simply shown pictures of dogs and cats in books and in everyday life (usually without any explanation of the detailed differences), and after a certain critical number of pictures to compare, the child's brain starts to determine the type of animal by itself (by some estimates, only a relatively small number of exposures over the whole of early childhood). DL algorithms are not that good yet: for a neural network to identify images just as successfully, millions of images have to be fed to the GPU and processed.
To sum up the introduction: GPUs make it possible to build HPC applications for big data, ML, and DL, but there is a problem. The data sets are so large that the time spent loading data from the storage system to the GPU starts to reduce the overall performance of the application. In other words, fast GPUs remain underutilized because of slow I/O from the other subsystems. The I/O speed of the GPU and that of the bus to the CPU and storage system can differ by an order of magnitude.
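The effect of slow I/O on GPU utilization can be illustrated with a minimal model. The sketch below assumes ideal pipelining, where loading the next batch fully overlaps with computing the current one; the function name and the timing values are hypothetical and were chosen only to mimic an order-of-magnitude gap.

```python
def gpu_utilization(compute_s: float, load_s: float) -> float:
    """Fraction of time the GPU is busy when loading and compute overlap ideally.

    With perfect pipelining each batch takes max(compute_s, load_s) seconds,
    and the GPU does useful work for compute_s of that interval.
    """
    if compute_s <= 0 or load_s < 0:
        raise ValueError("compute_s must be positive and load_s non-negative")
    return compute_s / max(compute_s, load_s)


# Hypothetical timings, not measurements from the article.
COMPUTE_S = 0.01      # GPU time needed to process one batch
SLOW_LOAD_S = 0.10    # storage delivers a batch ten times slower than that
FAST_LOAD_S = 0.005   # storage delivers a batch faster than the GPU consumes it

for label, load_s in (("slow storage", SLOW_LOAD_S), ("fast storage", FAST_LOAD_S)):
    print(f"{label}: GPU busy {gpu_utilization(COMPUTE_S, load_s):.0%} of the time")
```

In this model the GPU cannot be busier than the ratio of compute time to load time, which is why a tenfold I/O gap leaves most of the accelerator idle.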
How does GPUDirect Storage technology work?
The I/O process is controlled by the CPU, and so is the process of loading data from storage into the GPU for further processing. This created demand for a technology that gives the GPU and NVMe drives direct access to each other so that they can exchange data quickly. NVIDIA was the first to offer such a technology and called it GPUDirect Storage. It is in fact a variation of the GPUDirect RDMA (Remote Direct Memory Access) technology that the company developed earlier.
NVIDIA CEO Jensen Huang presents GPUDirect Storage as a variation of GPUDirect RDMA at SC-19. Source: NVIDIA
GPUDirect RDMA and GPUDirect Storage differ in the devices between which the addressing takes place. GPUDirect RDMA is used to move data directly between a front-end network interface card (NIC) and GPU memory, while GPUDirect Storage provides a direct data path between GPU memory and local or remote storage, such as NVMe or NVMe over Fabric (NVMe-oF).
Both GPUDirect RDMA and GPUDirect Storage avoid unnecessary data movement through a buffer in CPU memory. They let the direct memory access (DMA) mechanism move data from the network card or storage straight to or from GPU memory, without loading the CPU. For GPUDirect Storage the location of the storage does not matter: it can be an NVMe disk inside the GPU unit, inside the rack, or connected over the network as NVMe-oF.
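To make the saving concrete, here is a minimal accounting sketch of how much traffic CPU memory sees on each path. It is a simplified model, not NVIDIA's implementation: it assumes the conventional path uses exactly one bounce buffer in CPU memory (real software stacks may add further copies), and the function name and payload size are hypothetical.

```python
def host_memory_traffic(payload_bytes: int, via_bounce_buffer: bool) -> int:
    """Bytes written to or read from CPU memory while delivering a payload to the GPU.

    Bounce-buffer path: the drive's DMA writes the payload into CPU memory and a
    second DMA reads it back out towards the GPU, so CPU memory sees 2x the payload.
    Direct path: the payload goes straight to GPU memory and never lands in CPU memory.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    return 2 * payload_bytes if via_bounce_buffer else 0


PAYLOAD_BYTES = 1_000_000_000  # hypothetical 1 GB batch

for label, bounce in (("bounce buffer", True), ("direct DMA", False)):
    traffic_gb = host_memory_traffic(PAYLOAD_BYTES, bounce) / 1e9
    print(f"{label}: {traffic_gb:.1f} GB of CPU-memory traffic")
```

Even in this optimistic model the bounce buffer doubles the load on CPU memory for every byte delivered, and the direct path removes it entirely.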
How GPUDirect Storage works. Source: NVIDIA
High-end NVMe storage systems are in demand in the HPC application market
Kingston realized that with the arrival of GPUDirect Storage, large customers would turn their attention to storage systems whose I/O speed matches GPU throughput. At the SC-19 exhibition the company therefore showed a demo system, made up of a storage system based on NVMe disks and a unit with GPUs, that analyzed thousands of satellite images per second. We have already written about this storage system, which is based on 10 DC1000M U.2 NVMe drives.
A storage system based on 10 DC1000M U.2 NVMe drives is a fitting complement to a server with graphics accelerators. Source: Kingston
This storage system is designed as a rack unit of 1U or larger and scales with the number of DC1000M U.2 NVMe drives, each with a capacity of 3.84 to 7.68 TB. The DC1000M is the first NVMe SSD model in the U.2 form factor in Kingston's data center drive line. Its endurance rating (DWPD, drive writes per day) allows the drive to be rewritten to its full capacity once a day for the whole of its warranted life.
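An endurance rating in DWPD can be converted into a total write budget with simple arithmetic. The sketch below assumes a rating of one drive write per day and a five-year warranty; the warranty length is not given in the article and is used here only as an illustrative input, and the function name is hypothetical.

```python
def total_writes_tb(capacity_tb: float, dwpd: float, warranty_years: float) -> float:
    """Total data, in TB, that can be written over the warranty period.

    DWPD (drive writes per day) counts full-capacity rewrites per day,
    so the budget is capacity x DWPD x number of days under warranty.
    """
    days = 365 * warranty_years
    return capacity_tb * dwpd * days


DWPD = 1.0            # assumption: one full drive write per day
WARRANTY_YEARS = 5.0  # assumption: not stated in the article

for capacity_tb in (3.84, 7.68):  # the capacity range named in the article
    budget = total_writes_tb(capacity_tb, DWPD, WARRANTY_YEARS)
    print(f"{capacity_tb} TB drive: about {budget:,.0f} TB written over the warranty")
```

The budget grows linearly with capacity, so under the same rating the larger drive tolerates twice the total writes of the smaller one.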
In the fio v3.13 test on Ubuntu 18.04.3 LTS with Linux kernel 5.0.0-31-generic, the exhibition storage sample showed a sustained read speed of 5.8 million IOPS at a sustained bandwidth of 23.8 Gbit/s.
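The two figures can be cross-checked against each other, because bandwidth is the IOPS rate multiplied by the request size. The article does not state the block size used in the test, so the 4 KiB value below is purely an assumption, and the helper names are hypothetical.

```python
def bandwidth_gb_per_s(iops: float, block_bytes: int) -> float:
    """Bandwidth in decimal gigabytes per second for a given IOPS rate and block size."""
    return iops * block_bytes / 1e9


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbit_per_s / 8


IOPS = 5.8e6         # sustained read rate quoted in the article
BLOCK_BYTES = 4096   # assumption: 4 KiB requests, not stated in the article
QUOTED_GBIT = 23.8   # sustained bandwidth quoted in the article

print(f"5.8M IOPS at 4 KiB: {bandwidth_gb_per_s(IOPS, BLOCK_BYTES):.2f} GB/s")
print(f"23.8 Gbit/s:        {gbit_to_gbyte(QUOTED_GBIT):.3f} GB/s")
```

Under the 4 KiB assumption the IOPS figure corresponds to roughly 23.8 gigabytes per second, while 23.8 Gbit/s is only about 3 GB/s. The quoted pair is therefore consistent only if the bandwidth is read as gigabytes per second or if the requests were much smaller than 4 KiB; the article does not settle which.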
Ariel Perez, SSD business manager at Kingston, said of the new storage systems: "We are ready to equip the next generation of servers with U.2 NVMe SSD solutions and remove many of the data transfer bottlenecks that have traditionally been associated with storage. The combination of NVMe SSD drives and our premium Server Premier DRAM makes Kingston one of the most comprehensive end-to-end data solution providers in the industry."
The gfio v3.13 test showed a throughput of 23.8 Gbps for the demo storage system based on DC1000M U.2 NVMe drives. Source: Kingston
What would a typical system for HPC applications look like if it used GPUDirect Storage or a similar technology? It is an architecture in which the functional units are physically separated within a rack: one or two units for RAM, several more for GPU and CPU compute nodes, and one or more units for storage systems.
With the announcement of GPUDirect Storage and the possible arrival of similar technologies from other GPU vendors, Kingston is seeing growing demand for storage systems designed for high performance computing. The benchmark is the speed of reading data from the storage system, which has to be comparable to the throughput of the 40 or 100 Gbit network cards at the input of a compute unit with GPUs. Ultra-fast storage systems, including external NVMe over Fabric, will therefore move from exotic to mainstream for HPC applications. Besides scientific and financial calculations, they will find use in many other practical areas, such as Safe City security systems at the scale of a large metropolis, or traffic monitoring centers, which need to recognize and identify millions of HD images per second. This is the market niche for top-end storage systems.
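To put the 40 and 100 Gbit line rates in perspective, the sketch below estimates how long a data set takes to cross such a link when the storage system can keep it saturated. The 1 TB data set size and the function name are hypothetical, and protocol overhead is ignored.

```python
def transfer_seconds(dataset_bytes: float, link_gbit_per_s: float) -> float:
    """Time to move a data set over a link running at its full line rate."""
    if link_gbit_per_s <= 0:
        raise ValueError("link_gbit_per_s must be positive")
    bytes_per_s = link_gbit_per_s * 1e9 / 8  # 8 bits per byte, decimal units
    return dataset_bytes / bytes_per_s


DATASET_BYTES = 1e12  # hypothetical 1 TB data set

for gbit in (40, 100):  # the network card speeds named in the article
    seconds = transfer_seconds(DATASET_BYTES, gbit)
    print(f"{gbit} Gbit/s link: {gbit / 8:.1f} GB/s, about {seconds:.0f} s per TB")
```

A storage system that reads more slowly than the link rate becomes the bottleneck itself, which is why the read speed is measured against the network card.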
For more information about Kingston products, see the following website.
Source: habr.com