This translation was prepared ahead of the start of the course
Distributed training on multiple high-performance computing instances can reduce the training time of modern deep neural networks on large amounts of data from weeks to hours or even minutes, which has made this technique widespread in practical deep learning. Users need to understand how to share and synchronize data across multiple instances, which in turn has a major impact on scaling efficiency. In addition, users need to know how to take a training script that runs on a single instance and deploy it on multiple instances.
In this article we describe a quick and easy way to run distributed training using the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We show the performance benefits of Horovod and demonstrate how to write an MXNet training script so that it runs in a distributed fashion with Horovod.
What is Apache MXNet
Distributed training in MXNet with a parameter server
What is Horovod
Integrating MXNet with Horovod
MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather() and horovod.allreduce() are implemented using asynchronous callbacks of the MXNet engine, as part of its task graph. This way, data dependencies between communication and computation are handled by the MXNet engine, which avoids performance losses caused by synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends Optimizer in MXNet so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
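To make the three collectives named above more concrete, here is a minimal pure-Python sketch of what each one leaves on every worker. It only simulates the semantics on plain lists, where one inner list stands for one worker's tensor. It does not call Horovod or MXNet, and the function names are hypothetical. Averaging in the allreduce step is an assumption made for this illustration.

```python
from typing import List

Vector = List[float]


def allreduce_average(per_worker: List[Vector]) -> List[Vector]:
    """Simulated allreduce: every worker receives the element-wise average."""
    n = len(per_worker)
    averaged = [sum(column) / n for column in zip(*per_worker)]
    return [list(averaged) for _ in range(n)]


def broadcast(per_worker: List[Vector], root_rank: int = 0) -> List[Vector]:
    """Simulated broadcast: every worker receives a copy of the root's vector."""
    root_value = per_worker[root_rank]
    return [list(root_value) for _ in per_worker]


def allgather(per_worker: List[Vector]) -> List[Vector]:
    """Simulated allgather: every worker receives all vectors concatenated in rank order."""
    gathered = [x for vector in per_worker for x in vector]
    return [list(gathered) for _ in per_worker]


# Two simulated workers, each holding a two-element "gradient".
gradients = [[1.0, 2.0], [3.0, 6.0]]
print(allreduce_average(gradients))   # [[2.0, 4.0], [2.0, 4.0]]
print(broadcast(gradients, 0))        # [[1.0, 2.0], [1.0, 2.0]]
print(allgather(gradients))           # [[1.0, 2.0, 3.0, 6.0], [1.0, 2.0, 3.0, 6.0]]
```

In a training run, the allreduce pattern is what keeps gradient updates consistent across workers, while the broadcast pattern is what gives all workers the same starting parameters.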
Quick start
You can quickly start training a small neural network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:
pip install mxnet
pip install horovod
Note: If you run into an error during pip install horovod, you may need to set the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your macOS version. For example, on macOS Sierra you would run MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod
Then install OpenMPI
Finally, download the example script mxnet_mnist.py and run the following command:
mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py
This runs training on two cores of your processor. The output will look like this:
INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec accuracy=0.870000
Performance demo
When training a ResNet50-v1 model on the ImageNet dataset on 64 GPUs across eight p3.16xlarge EC2 instances, each with 8 NVIDIA Tesla V100 GPUs, on the AWS cloud, we achieved a training throughput of 45,000 images/sec (that is, the number of training samples processed per second). Training completed in 44 minutes after 90 epochs with a top accuracy of 75.7%.
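As a rough sanity check, the reported throughput and training time can be related with simple arithmetic. The sketch below assumes the standard ILSVRC-2012 training split of 1,281,167 images, which the article does not state, and it ignores validation passes, data loading stalls and startup overhead. The function name is hypothetical.

```python
# Assumption: standard ILSVRC-2012 training set size (not stated in the article).
IMAGENET_TRAIN_IMAGES = 1_281_167


def estimated_training_minutes(images_per_sec: float,
                               epochs: int,
                               dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Lower-bound training time in minutes from pure throughput."""
    total_images = dataset_size * epochs
    return total_images / images_per_sec / 60.0


minutes = estimated_training_minutes(images_per_sec=45_000, epochs=90)
print(f"{minutes:.1f} minutes")  # 42.7 minutes
```

The estimate of roughly 43 minutes is in line with the reported 44 minutes once per-epoch overhead is taken into account.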
We compared these results with MXNet's distributed training approach using parameter servers on 8, 16, 32 and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1, respectively. The results are shown in Figure 1 below. The bars, read against the left y-axis, show the number of images trained per second. The lines, read against the right y-axis, show the scaling efficiency (that is, the ratio of actual to ideal throughput). As you can see, the choice of the number of servers affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To achieve the same scaling efficiency as Horovod, you need twice as many servers as workers.
Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server
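The scaling efficiency plotted in Figure 1 is the ratio of measured throughput to ideal throughput, where ideal throughput is the single-GPU throughput multiplied by the number of GPUs. A minimal sketch of that calculation follows. The function name is hypothetical, and the numbers in the example are made up purely for illustration. They are not measurements from this article.

```python
def scaling_efficiency(measured_throughput: float,
                       single_gpu_throughput: float,
                       num_gpus: int) -> float:
    """Ratio of actual throughput to ideal (linearly scaled) throughput."""
    ideal_throughput = single_gpu_throughput * num_gpus
    return measured_throughput / ideal_throughput


# Illustrative numbers only (NOT from the article):
# one GPU processes 100 images/sec, 64 GPUs together process 3200 images/sec.
efficiency = scaling_efficiency(measured_throughput=3200.0,
                                single_gpu_throughput=100.0,
                                num_gpus=64)
print(f"{efficiency:.0%}")  # 50%
```

An efficiency of 100% would mean that adding GPUs increases throughput perfectly linearly. Communication and synchronization costs are what pull the value below that.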
In Table 1 below, we compare the total cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod provides the best throughput at the lowest cost.
Table 1. Cost comparison between Horovod and the parameter server with a server-to-worker ratio of 2 to 1.
Steps to reproduce
In the following steps, we show how to reproduce the distributed training results using MXNet and Horovod. To learn more about distributed training with MXNet, read
Step 1
Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher to use distributed training. You will also need to install the libraries for GPU training. For our instances, we chose Ubuntu 16.04 Linux with GPU Driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator and OpenMPI 3.1.1. You can also use
Step 2
Add Horovod API support to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. If you already have a training script, the lines you need to add are the ones marked with a "# Horovod:" comment, along with the horovod.mxnet import. These are the key changes required to train with Horovod:
- Set the context according to the Horovod local rank (line 8) so that training runs on the correct GPU.
- Broadcast the initial parameters from one worker to all the others (line 18) so that all workers start from the same initial parameters.
- Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed manner.
For the full script, please see the Horovod-MXNet examples
1 import mxnet as mx
2 import horovod.mxnet as hvd
3
4 # Horovod: initialize Horovod
5 hvd.init()
6
7 # Horovod: pin a GPU to be used to local rank
8 context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33     ...
Step 3
Log in to one of the workers to start distributed training using the MPI directive. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The Stochastic Gradient Descent (SGD) optimizer is used with the following hyperparameters:
- mini-batch size: 256
- learning rate: 0.1
- momentum: 0.9
- weight decay: 0.0001
As we scaled from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs) while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters were kept unchanged as the number of GPUs increased. We used mixed-precision training, with the float16 data type for the forward pass and float32 for the gradients, to speed up computation on NVIDIA Tesla GPUs, which support float16.
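The linear scaling described above can be written out as a short calculation. This is a minimal sketch with hypothetical names. It takes the base learning rate of 0.1 and the per-GPU batch size of 256 from the text and multiplies both by the number of GPUs.

```python
BASE_LEARNING_RATE = 0.1   # learning rate for a single GPU (from the text)
PER_GPU_BATCH_SIZE = 256   # images per GPU, kept constant (from the text)


def scaled_hyperparameters(num_gpus: int) -> dict:
    """Linearly scale the learning rate and global batch size with the GPU count."""
    return {
        "learning_rate": BASE_LEARNING_RATE * num_gpus,
        "global_batch_size": PER_GPU_BATCH_SIZE * num_gpus,
    }


for num_gpus in (1, 8, 16, 32, 64):
    params = scaled_hyperparameters(num_gpus)
    print(num_gpus, round(params["learning_rate"], 4), params["global_batch_size"])
# The last line printed is: 64 6.4 16384
```

For the 16-GPU run launched in this step, the same rule gives a learning rate of 1.6 and a global batch size of 4,096.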
$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
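To see how the -np and -H arguments relate to the GPU that each process is pinned to, the sketch below builds both arguments from a host-to-slots mapping and lists one plausible rank layout. The helper names are hypothetical. The layout assumes that ranks fill each host's slots in order before moving on to the next host, which is an assumption about the launcher's behavior rather than something the article states. Under that assumption, the local rank is the value a script would pass to mx.gpu().

```python
from typing import Dict, List, Tuple


def mpirun_host_args(hosts: Dict[str, int]) -> Tuple[int, str]:
    """Return the total process count (-np) and the host list string (-H)."""
    total_processes = sum(hosts.values())
    host_list = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    return total_processes, host_list


def rank_layout(hosts: Dict[str, int]) -> List[Tuple[int, str, int]]:
    """List (global rank, host, local rank), assuming hosts are filled slot by slot."""
    layout = []
    rank = 0
    for name, slots in hosts.items():
        for local_rank in range(slots):
            layout.append((rank, name, local_rank))
            rank += 1
    return layout


# Four instances with 4 GPUs each, as in the command above.
cluster = {"server1": 4, "server2": 4, "server3": 4, "server4": 4}
num_processes, host_list = mpirun_host_args(cluster)
print(f"mpirun -np {num_processes} -H {host_list} ...")
for rank, host, local_rank in rank_layout(cluster)[:6]:
    print(f"rank {rank} -> {host}, GPU {local_rank}")
```

With this layout, global rank 5 would run on server2 and use GPU 1, which is why the training script pins its context by local rank and not by global rank.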
Conclusion
In this article, we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We demonstrated its scaling efficiency and cost-effectiveness compared with the parameter server approach when training a ResNet50-v1 model on the ImageNet dataset. We also included the steps you can follow to modify an existing script so that it runs multi-instance training with Horovod.
If you are just getting started with MXNet and deep learning, go to the installation page
If you have already worked with MXNet and want to try distributed training with Horovod, take a look at
* cost is calculated based on
Source: www.habr.com