The translation of this article was prepared on the eve of the course launch.
Distributed training on multiple high-performance computing instances can cut the training time of modern deep neural networks on large datasets from weeks to hours or even minutes, which has made this technique widespread in practical deep learning applications. Users need to understand how to share and synchronize data across multiple instances, because this has a large impact on scaling efficiency. Users also need to know how to deploy a training script that runs on a single instance to multiple instances.
This article describes a fast and easy way to distribute training using the open deep learning library Apache MXNet and the Horovod distributed training framework. We demonstrate the performance benefits of the Horovod framework and show how to write an MXNet training script so that it runs in a distributed fashion with Horovod.
What is Apache MXNet
Distributed training in MXNet with a parameter server
What is Horovod
MXNet and Horovod integration
MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather(), and horovod.allreduce() are implemented using asynchronous callbacks of the MXNet engine, as part of its task graph. This way, data dependencies between communication and computation are handled easily by the MXNet engine, which avoids performance losses due to synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends the MXNet Optimizer so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
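To make the three communication primitives named above more concrete, here is a toy, single-process sketch of what each one leaves on every worker. It is plain Python and does not call Horovod or MPI; the function names only mirror the API names from the text, and their signatures here (including the average flag) are assumptions made for illustration.

```python
# Toy, single-process model of three collective operations.
# Each "worker" is just one entry in a list; no Horovod or MPI is involved.

def broadcast(values, root_rank=0):
    """Every worker ends up with the root worker's value."""
    return [values[root_rank] for _ in values]

def allgather(values):
    """Every worker ends up with the list of all workers' values."""
    gathered = list(values)
    return [list(gathered) for _ in values]

def allreduce(values, average=True):
    """Every worker ends up with the sum (or mean) of all workers' values."""
    total = sum(values)
    result = total / len(values) if average else total
    return [result for _ in values]

# Four hypothetical workers, each holding one number.
per_worker = [1.0, 2.0, 3.0, 6.0]

print(broadcast(per_worker))      # [1.0, 1.0, 1.0, 1.0]
print(allgather(per_worker)[0])   # [1.0, 2.0, 3.0, 6.0]
print(allreduce(per_worker))      # [3.0, 3.0, 3.0, 3.0]
```

In a real run each worker is a separate process and the values are tensors, but the end state on each worker follows the same pattern.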
Quick start
You can quickly start training a small convolutional neural network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:
pip install mxnet
pip install horovod
Note: If pip install horovod fails with an error, you may need to set the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your macOS version. For example, on macOS Sierra you would run: MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod
Next, install OpenMPI.
Finally, download the test script mxnet_mnist.py and run it:
mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py
This runs training on two cores of your processor. The output looks like this:
INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec accuracy=0.870000
Performance demo
When training the ResNet50-v1 model on the ImageNet dataset on 64 GPUs across eight p3.16xlarge EC2 instances in the AWS cloud, each containing 8 NVIDIA Tesla V100 GPUs, we achieved a training throughput of 45,000 images/sec (that is, the number of samples trained per second). Training completed in 44 minutes after 90 epochs, with a best accuracy of 75.7%.
We compared this with MXNet's distributed training approach using parameter servers on 8, 16, 32, and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1, respectively. The results are shown in Figure 1 below. The bars on the left y-axis show the number of images trained per second, and the lines on the right y-axis show the scaling efficiency (that is, the ratio of actual to ideal throughput). As you can see, the choice of the number of servers affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To reach the same scaling efficiency as Horovod, you need to double the number of servers relative to the number of workers.
Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server
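The scaling efficiency plotted in Figure 1 is the ratio of actual to ideal throughput. A minimal sketch of that arithmetic follows, assuming that "ideal" means single-GPU throughput multiplied by the number of GPUs. The throughput numbers in the example are hypothetical and are not measurements from this article.

```python
def scaling_efficiency(actual_throughput, single_gpu_throughput, num_gpus):
    """Ratio of measured throughput to the ideal (linear) throughput."""
    ideal_throughput = single_gpu_throughput * num_gpus
    return actual_throughput / ideal_throughput

# Hypothetical numbers, chosen only to illustrate the arithmetic.
single_gpu = 100.0   # images/sec on 1 GPU (assumed)
measured = 5120.0    # images/sec on 64 GPUs (assumed)

eff = scaling_efficiency(measured, single_gpu, 64)
print(f"{eff:.0%}")  # 80%
```

An efficiency of 100% would mean that adding GPUs increases throughput perfectly linearly.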
Table 1 below compares the total cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod gives the best throughput at the lowest cost.
Table 1. Cost comparison between Horovod and the parameter server with a server-to-worker ratio of 2 to 1.
Steps to reproduce
The following steps show how to reproduce the distributed training results using MXNet and Horovod. To learn more about distributed training with MXNet, see here.
Step 1
To use distributed training, create a cluster of homogeneous instances with MXNet version 1.4.0 or later and Horovod version 0.16.0 or later. You also need to install the libraries for GPU training. For our instances we chose Ubuntu 16.04 Linux with GPU driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator, and OpenMPI 3.1.1. You can also use a prebuilt machine image that already has these libraries installed.
Step 2
Add support for the Horovod API to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. If you already have a training script, the lines you need to add are the ones that follow the "# Horovod:" comments. Here are the key changes required to train with Horovod:
- Set the context according to the local Horovod rank (line 8) so that training runs on the correct GPU.
- Broadcast the initial parameters from one worker to all workers (line 18) so that every worker starts from the same initial parameters.
- Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed fashion.
For the complete script, see the Horovod-MXNet examples.
1 import mxnet as mx
2 import horovod.mxnet as hvd
3
4 # Horovod: initialize Horovod
5 hvd.init()
6
7 # Horovod: pin a GPU to be used to local rank
8 context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33     ...
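Why the template broadcasts parameters and wraps the optimizer can be seen in a toy simulation of synchronous data-parallel SGD. The sketch below uses plain Python in place of MXNet and Horovod, a one-parameter model, and made-up data. All names in it are hypothetical, and it illustrates the idea rather than Horovod's implementation.

```python
# Toy data-parallel SGD on a 1-parameter model y = w * x.
# Plain Python stands in for MXNet and Horovod; all names are hypothetical.

def local_gradient(w, shard):
    """Mean gradient of the squared error (w*x - y)**2 over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train(shards, initial_weights, lr=0.05, steps=50):
    # "Broadcast": every replica starts from rank 0's parameters.
    weights = [initial_weights[0]] * len(shards)
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # "Allreduce": every replica sees the same averaged gradient.
        avg_grad = sum(grads) / len(grads)
        weights = [w - lr * avg_grad for w in weights]
    return weights

# Two workers, each with its own shard of data generated from y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
final = train(shards, initial_weights=[0.0, 5.0])

print(final[0] == final[1])   # True: replicas stay in sync
print(round(final[0], 3))     # 3.0
```

Without the initial broadcast the two replicas would start from 0.0 and 5.0 and drift apart. Without gradient averaging each replica would only fit its own shard.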
Step 3
Log in to one of the workers and start distributed training using the MPI directive. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The stochastic gradient descent (SGD) optimizer is used with the following hyperparameters:
- mini-batch size: 256
- learning rate: 0.1
- momentum: 0.9
- weight decay: 0.0001
When scaling from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs) while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters did not change as the number of GPUs increased. To speed up the float16 computation supported by NVIDIA Tesla GPUs, we used mixed-precision training with the float16 data type for the forward pass and float32 for gradients.
$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
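The linear scaling of learning rate and global batch size used in this step can be sketched as a small helper. The base values come from the hyperparameter list above. The function name and the dictionary layout are assumptions made for illustration and are not part of the training script.

```python
BASE_LR = 0.1          # learning rate for 1 GPU
PER_GPU_BATCH = 256    # images per GPU, held constant

def scaled_hyperparams(num_gpus):
    """Linear scaling: learning rate and global batch grow with GPU count."""
    return {
        "learning_rate": BASE_LR * num_gpus,
        "global_batch_size": PER_GPU_BATCH * num_gpus,
        "momentum": 0.9,         # unchanged as GPUs are added
        "weight_decay": 0.0001,  # unchanged as GPUs are added
    }

for n in (1, 16, 64):
    hp = scaled_hyperparams(n)
    print(n, round(hp["learning_rate"], 4), hp["global_batch_size"])
```

For 64 GPUs this gives a learning rate of 6.4 and a global batch of 16,384 images, matching the figures quoted in the text.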
Conclusion
In this article we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We demonstrated its scaling efficiency and cost-effectiveness compared with the parameter server approach when training the ResNet50-v1 model on the ImageNet dataset. We also included steps you can use to modify an existing script to run multi-instance training with Horovod.
If you are just getting started with MXNet and deep learning, go to the MXNet installation page.
If you have already used MXNet and want to try distributed training with Horovod, take a look at the following.
*Costs are calculated based on the following.
Source: habr.com