This translation was prepared on the eve of the course launch.
Distributed training on multiple high-performance computing instances can cut the training time of modern deep neural networks on large datasets from weeks to hours or even minutes, which has made this technique prevalent in practical deep learning applications. Users need to understand how data is shared and synchronized across multiple instances, because this has a major impact on scaling efficiency. In addition, users should know how to deploy a training script that runs on a single instance to multiple instances.
In this article we describe a quick and easy way to distribute training using the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We show the performance benefits of the Horovod framework and demonstrate how to write an MXNet training script so that it runs in a distributed manner with Horovod.
What is Apache MXNet
Distributed training in MXNet with a parameter server
What is Horovod
MXNet and Horovod integration
MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather() and horovod.allreduce() are implemented using asynchronous callbacks of the MXNet engine, as part of its task graph. As a result, data dependencies between communication and computation are handled by the MXNet engine, which avoids performance losses caused by synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends Optimizer in MXNet so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
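To make the three communication primitives named above more concrete, here is a toy single-process model of what each one leaves on every worker. It is only a sketch: the function names and signatures are hypothetical stand-ins, each "worker" is a plain Python list, and nothing here reflects how Horovod actually moves tensors between processes.

```python
# Toy, single-process model of three collectives. Each "worker" is just one
# entry in a list; the real framework runs one process per worker.
# All function names below are hypothetical and exist only in this sketch.

def broadcast(values, root_rank=0):
    """Every worker ends up with a copy of the root worker's value."""
    return [list(values[root_rank]) for _ in values]

def allgather(values):
    """Every worker ends up with the concatenation of all workers' values."""
    gathered = [x for worker in values for x in worker]
    return [list(gathered) for _ in values]

def allreduce(values, average=True):
    """Every worker ends up with the element-wise sum (or mean) of all values."""
    n = len(values)
    reduced = [sum(column) for column in zip(*values)]
    if average:
        reduced = [s / n for s in reduced]
    return [list(reduced) for _ in values]

# Two workers, each holding a two-element "tensor".
per_worker = [[1.0, 2.0], [3.0, 4.0]]

print(broadcast(per_worker))   # both workers hold [1.0, 2.0]
print(allgather(per_worker))   # both workers hold [1.0, 2.0, 3.0, 4.0]
print(allreduce(per_worker))   # both workers hold [2.0, 3.0]
```

In data-parallel training, the averaging form of allreduce is typically the one applied to gradients, while broadcast is used once at start-up to synchronize initial parameters.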
Quick start
You can quickly start training a small convolutional neural network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:
pip install mxnet
pip install horovod
Note: If pip install horovod fails with an error, you may need to set the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your MacOS version. For example, on MacOSX Sierra you would run MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod
Then install OpenMPI.
Finally, download the test script mxnet_mnist.py and run it with the following command:
mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py
This runs training on two cores of your processor. The output will look like this:
INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec accuracy=0.870000
Performance Demo
When training the ResNet50-v1 model on the ImageNet dataset on 64 GPUs with eight p3.16xlarge EC2 instances, each containing 8 NVIDIA Tesla V100 GPUs in the AWS cloud, we achieved a training throughput of 45000 images/sec (that is, the number of samples trained per second). Training completed in 44 minutes after 90 epochs with a best accuracy of 75.7%.
We compared this with MXNet's distributed training approach using parameter servers on 8, 16, 32 and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1, respectively. You can see the results in Figure 1 below. On the left y-axis, the bars show the number of images trained per second; the lines show scaling efficiency (that is, the ratio of actual to ideal throughput) on the right y-axis. As you can see, the choice of the number of servers affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To achieve the same scaling efficiency as with Horovod, you need to double the number of servers relative to the number of workers.
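Scaling efficiency, as defined above, is simply measured throughput divided by the throughput that perfectly linear scaling would give. The short sketch below shows the arithmetic; the helper name and the throughput numbers are hypothetical and are not measurements from this article.

```python
def scaling_efficiency(measured_throughput, single_gpu_throughput, num_gpus):
    """Ratio of measured throughput to ideal (perfectly linear) throughput."""
    ideal_throughput = single_gpu_throughput * num_gpus
    return measured_throughput / ideal_throughput

# Hypothetical numbers, chosen only to illustrate the calculation.
single_gpu = 1000.0    # images/sec on 1 GPU (assumed)
measured = 48000.0     # images/sec on 64 GPUs (assumed)

efficiency = scaling_efficiency(measured, single_gpu, num_gpus=64)
print(f"{efficiency:.0%}")  # 75%
```

An efficiency of 100% would mean that 64 GPUs deliver exactly 64 times the single-GPU throughput; communication overhead is what usually pushes the value lower.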
Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server
In Table 1 below, we compare the total cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod gives the best throughput at the lowest cost.
Table 1. Cost comparison between Horovod and Parameter Server with a server-to-worker ratio of 2 to 1.
Steps to reproduce
In the following steps, we show how to reproduce the distributed training results using MXNet and Horovod. To learn more about distributed training with MXNet, read
Step 1
Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher to use distributed training. You will also need to install the libraries for GPU training. For our instances, we chose Ubuntu 16.04 Linux with GPU Driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator and OpenMPI 3.1.1. You can also use
Step 2
Add Horovod API support to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. The Horovod-specific lines, marked with # Horovod: comments, are the ones you need to add if you already have a training script. Here are the few critical changes you need to make to train with Horovod:
- Set the context according to the Horovod local rank (line 8) so that training runs on the correct GPU.
- Broadcast the initial parameters from one worker to all the others (line 18) to ensure that all workers start with the same initial parameters.
- Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed manner.
For the full script, please refer to the Horovod-MXNet examples
1 import mxnet as mx
2 import horovod.mxnet as hvd
3
4 # Horovod: initialize Horovod
5 hvd.init()
6
7 # Horovod: pin a GPU to be used to local rank
8 context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33     ...
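To see why the broadcast and the distributed optimizer matter, the following toy simulation mimics the same pattern in plain Python: every worker starts from one shared weight, computes a gradient on its own data shard, and applies the average of all gradients. This is only a sketch under simplifying assumptions (a one-parameter linear model, made-up data, hypothetical helper names); it does not use MXNet or Horovod and is not how either library is implemented.

```python
def gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)**2 over one worker's data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def train_data_parallel(shards, w_init, lr, steps):
    # "Broadcast": every worker starts from the same initial weight.
    weights = [w_init for _ in shards]
    for _ in range(steps):
        # Each worker computes a gradient on its own shard only.
        grads = [gradient(w, shard) for w, shard in zip(weights, shards)]
        # "Allreduce": average the gradients so all workers see one value.
        avg_grad = sum(grads) / len(grads)
        # Every worker applies the same update, so the replicas stay identical.
        weights = [w - lr * avg_grad for w in weights]
    return weights

# Hypothetical data for y = 2x, split evenly across two workers.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
weights = train_data_parallel(shards, w_init=0.0, lr=0.05, steps=200)
print(weights)  # both replicas end up close to 2.0
```

If the workers started from different initial weights, or applied only their local gradients, the replicas would drift apart; the shared start and the averaged update are what keep them in step.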
Step 3
Log in to one of the workers to start distributed training using the MPI command. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The Stochastic Gradient Descent (SGD) optimizer is used with the following hyperparameters:
- mini-batch size: 256
- learning rate: 0.1
- momentum: 0.9
- weight decay: 0.0001
As we scaled from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs), while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). Weight decay and momentum did not change as the number of GPUs increased. We used mixed-precision training, with the float16 data type for the forward pass and float32 for gradients, to speed up the float16 computations supported by NVIDIA Tesla GPUs.
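The linear scaling described above can be written down in a few lines. The sketch below uses the base values quoted in this article (learning rate 0.1 and 256 images per GPU); the helper name is hypothetical, and in a real Horovod script the GPU count would typically come from hvd.size() rather than being passed in by hand.

```python
BASE_LR = 0.1          # learning rate for 1 GPU (from the text)
BATCH_PER_GPU = 256    # images per GPU, kept constant (from the text)

def scaled_hyperparameters(num_gpus):
    """Linear scaling: learning rate and global batch grow with GPU count."""
    return {
        "learning_rate": BASE_LR * num_gpus,
        "global_batch_size": BATCH_PER_GPU * num_gpus,
    }

for n in (1, 8, 16, 32, 64):
    print(n, scaled_hyperparameters(n))
```

For 64 GPUs this gives a learning rate of 6.4 and a global batch of 16384 images, while momentum and weight decay are left at their single-GPU values.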
$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
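The value passed to -np has to match the total number of slots listed after -H, which is easy to get wrong when the cluster changes. As a small illustration, the hypothetical helper below assembles the same argument list from a mapping of host names to GPU counts; the flags are copied from the command above, and the helper itself is not part of MXNet, Horovod or OpenMPI.

```python
def build_mpirun_command(hosts, script):
    """Build the mpirun argument list; hosts maps hostname -> GPU slots."""
    total_slots = sum(hosts.values())
    host_list = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    return [
        "mpirun", "-np", str(total_slots),   # one process per GPU slot
        "-H", host_list,
        "-bind-to", "none", "-map-by", "slot",
        "-mca", "pml", "ob1", "-mca", "btl", "^openib",
        "python", script,
    ]

# Host names and slot counts taken from the example command above.
hosts = {"server1": 4, "server2": 4, "server3": 4, "server4": 4}
command = build_mpirun_command(hosts, "mxnet_imagenet_resnet50.py")
print(" ".join(command))
```

The resulting list can be handed to subprocess.run without going through a shell, which sidesteps any quoting concerns around the ^openib argument.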
Conclusion
In this article, we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We demonstrated scaling efficiency and cost-effectiveness compared with the parameter server approach on the ImageNet dataset, on which the ResNet50-v1 model was trained. We also included the steps you can use to modify an existing script to run multi-instance training with Horovod.
If you are just getting started with MXNet and deep learning, go to the installation page
If you have already worked with MXNet and want to try distributed training with Horovod, then take a look at
* cost is calculated based on
Source: www.habr.com