Distributed Training with Apache MXNet and Horovod

This translation was prepared ahead of the start of the course "Industrial ML on Big Data".

Distributed training across multiple high-performance computing instances can cut the time needed to train modern deep neural networks on large datasets from weeks to hours or even minutes, which makes this technique essential for practical deep learning. Users have to understand how data is shared and synchronized across instances, because this has a large effect on scaling efficiency. They also need to know how to take a training script that runs on a single instance and deploy it to many instances.

In this article we walk through a quick and easy way to run distributed training using the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We show the performance benefits of Horovod and demonstrate how to write an MXNet training script so that it runs in a distributed fashion with Horovod.

What is Apache MXNet

Apache MXNet is an open-source deep learning framework used to build, train, and deploy deep neural networks. MXNet abstracts away much of the complexity of implementing neural networks, is highly performant and scalable, and offers APIs for popular programming languages such as Python, C++, Clojure, Java, Julia, R, Scala, and others.

Distributed training in MXNet with a parameter server

The standard distributed training module in MXNet uses the parameter server approach. A set of parameter servers collects gradients from each worker, aggregates them, and sends the updated gradients back to the workers for the next optimization iteration. Choosing the right ratio of servers to workers is the key to effective scaling. With only one parameter server, that server can become a computational bottleneck. Conversely, with too many servers, the many-to-many communication can saturate all the network connections.
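
The push, aggregate, and pull cycle described above can be sketched as a single-process toy simulation. This is only an illustration under stated assumptions: the function names are hypothetical and are not MXNet's KVStore API, aggregation is assumed to be a plain average, and the message count assumes every worker exchanges data with every server.

```python
# Toy, single-process simulation of one parameter-server step.
# All names are hypothetical; this is NOT MXNet's KVStore API.

def aggregate_on_server(worker_gradients):
    """Average per-worker gradients element-wise (assumed aggregation rule)."""
    num_workers = len(worker_gradients)
    return [sum(values) / num_workers for values in zip(*worker_gradients)]


def parameter_server_step(worker_gradients):
    """Push gradients to one server, aggregate, and pull a copy back to each worker."""
    aggregated = aggregate_on_server(worker_gradients)   # push + aggregate
    return [list(aggregated) for _ in worker_gradients]  # pull: one copy per worker


def messages_per_step(num_workers, num_servers):
    """Count push and pull messages, assuming every worker talks to every server."""
    return 2 * num_workers * num_servers


gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pulled = parameter_server_step(gradients)
print(pulled[0])
print(messages_per_step(num_workers=64, num_servers=1))
print(messages_per_step(num_workers=64, num_servers=64))
```

With a single server, every message lands on one machine; with as many servers as workers, the number of messages grows with the product of the two counts, which is the trade-off the paragraph above describes.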

What is Horovod

Horovod is an open-source distributed deep learning framework developed at Uber. It leverages efficient inter-GPU and inter-node communication technologies such as the NVIDIA Collective Communications Library (NCCL) and the Message Passing Interface (MPI) to distribute and aggregate model parameters across workers. It optimizes the use of network bandwidth and scales well with dense deep neural network models. It currently supports several popular machine learning frameworks, namely MXNet, TensorFlow, Keras, and PyTorch.
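
The article does not describe how NCCL or MPI implement their collectives. As an illustration of how workers can aggregate values without a central server, here is a single-process simulation of ring allreduce, one well-known bandwidth-efficient allreduce algorithm. The function name and the list-based "workers" are hypothetical and are not part of any Horovod, NCCL, or MPI API.

```python
# Single-process simulation of ring allreduce (sum).
# Hypothetical helper for illustration; not a Horovod, NCCL, or MPI call.

def ring_allreduce(worker_data):
    """Return per-worker vectors after every worker has received the element-wise sum."""
    n = len(worker_data)
    length = len(worker_data[0])
    # Split each vector into n contiguous chunks: chunk c covers bounds[c]:bounds[c + 1].
    bounds = [(k * length) // n for k in range(n + 1)]
    data = [list(vector) for vector in worker_data]

    # Phase 1 (reduce-scatter): after n - 1 steps, worker i holds the full sum
    # for chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all sends in a step happen "simultaneously".
        outgoing = []
        for i in range(n):
            chunk = (i - step) % n
            outgoing.append((chunk, data[i][bounds[chunk]:bounds[chunk + 1]]))
        for i, (chunk, payload) in enumerate(outgoing):
            receiver = (i + 1) % n
            start = bounds[chunk]
            for offset, value in enumerate(payload):
                data[receiver][start + offset] += value

    # Phase 2 (allgather): circulate the completed chunks around the ring.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            chunk = (i + 1 - step) % n
            outgoing.append((chunk, data[i][bounds[chunk]:bounds[chunk + 1]]))
        for i, (chunk, payload) in enumerate(outgoing):
            receiver = (i + 1) % n
            start = bounds[chunk]
            data[receiver][start:start + len(payload)] = payload

    return data


result = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
print(result)
```

Each worker only ever sends to its right-hand neighbour, and each step moves one chunk per worker, so no single machine has to receive data from everyone at once.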

MXNet and Horovod integration

MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather(), and horovod.allreduce() are implemented using asynchronous callbacks of the MXNet engine, as part of its task graph. This way, data dependencies between communication and computation are handled by the MXNet engine, which avoids performance losses due to synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends Optimizer in MXNet so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
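
To make the roles of broadcast and the distributed optimizer concrete, the following toy simulation keeps two model replicas in sync. It is a sketch under assumptions: the helpers are hypothetical stand-ins written in plain Python, not the Horovod API, and the distributed update is assumed to average gradients across workers before a plain SGD step.

```python
# Toy simulation of broadcast + gradient averaging keeping replicas identical.
# These helpers are hypothetical stand-ins, not horovod.broadcast()/allreduce().

def broadcast(replicas, root_rank=0):
    """Give every replica a copy of the root replica's parameters."""
    root = replicas[root_rank]
    return [list(root) for _ in replicas]


def allreduce_average(per_worker_values):
    """Return, for every worker, the element-wise mean over all workers."""
    n = len(per_worker_values)
    mean = [sum(values) / n for values in zip(*per_worker_values)]
    return [list(mean) for _ in per_worker_values]


def distributed_sgd_step(replicas, local_gradients, learning_rate):
    """Average gradients across workers, then apply the same SGD update everywhere."""
    averaged = allreduce_average(local_gradients)
    return [
        [weight - learning_rate * grad for weight, grad in zip(weights, grads)]
        for weights, grads in zip(replicas, averaged)
    ]


replicas = [[0.5, -1.0], [9.0, 9.0]]            # workers start out different
replicas = broadcast(replicas, root_rank=0)     # now identical to rank 0
local_gradients = [[1.0, 2.0], [3.0, 6.0]]      # each worker saw different data
replicas = distributed_sgd_step(replicas, local_gradients, learning_rate=0.5)
print(replicas)
```

Because every worker starts from the same parameters and applies the same averaged gradient, the replicas stay identical after each step even though each worker computed its gradient on different data.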

Quick start

You can quickly start training a small convolutional neural network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:

pip install mxnet
pip install horovod

Note: If you hit an error during pip install horovod, you may need to set the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your macOS version. For example, on macOS Sierra you would run MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod

Next, install OpenMPI.

Finally, download the test script mxnet_mnist.py and run the following command in a MacBook terminal from your working directory:

mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py

This runs training on two cores of your processor. The output will look like this:

INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec      accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec      accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec      accuracy=0.870000
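
If you want to track throughput and accuracy across a run, log lines in this format can be parsed with a small helper. This is a minimal sketch that assumes the exact line layout shown above; the pattern and function name are hypothetical and not part of MXNet or Horovod.

```python
import re

# Pattern written against the sample log lines above (an assumption about the format).
LOG_PATTERN = re.compile(
    r"Epoch\[(?P<epoch>\d+)\]\s+Batch\s+\[(?P<start>\d+)-(?P<end>\d+)\]\s+"
    r"Speed:\s+(?P<speed>[\d.]+)\s+samples/sec\s+accuracy=(?P<accuracy>[\d.]+)"
)


def parse_log_line(line):
    """Return a dict with epoch, batch range, speed, and accuracy, or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match.group("epoch")),
        "batches": (int(match.group("start")), int(match.group("end"))),
        "speed": float(match.group("speed")),
        "accuracy": float(match.group("accuracy")),
    }


sample = "INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec      accuracy=0.882812"
record = parse_log_line(sample)
print(record)
```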

Performance Demo

When training a ResNet50-v1 model on the ImageNet dataset on 64 GPUs, using eight p3.16xlarge EC2 instances with 8 NVIDIA Tesla V100 GPUs each on the AWS cloud, we achieved a training throughput of 45,000 images/sec (that is, the number of training samples processed per second). Training completed in 44 minutes after 90 epochs with a best accuracy of 75.7%.

We compared this with MXNet's distributed training approach based on parameter servers on 8, 16, 32, and 64 GPUs, using a single parameter server and server-to-worker ratios of 1 to 1 and 2 to 1, respectively. The results are shown in Figure 1 below. The bars, read against the left y-axis, show the number of images trained per second; the lines, read against the right y-axis, show the scaling efficiency (that is, the ratio of actual to ideal throughput). As you can see, the choice of the number of servers affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To reach the same scaling efficiency as Horovod, you need twice as many servers as workers.
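
Scaling efficiency as defined above, the ratio of actual to ideal throughput, is straightforward to compute. In this minimal sketch the function name is hypothetical, and the single-GPU throughput is a made-up placeholder because the article does not report one; only the 45,000 images/sec figure and the GPU count come from the text.

```python
def scaling_efficiency(actual_throughput, single_device_throughput, num_devices):
    """Ratio of measured throughput to ideal (perfectly linear) throughput."""
    ideal_throughput = single_device_throughput * num_devices
    return actual_throughput / ideal_throughput


# 45000 images/sec on 64 GPUs is from the article.
# 750 images/sec per GPU is a HYPOTHETICAL placeholder, not a measured value.
efficiency = scaling_efficiency(
    actual_throughput=45000.0,
    single_device_throughput=750.0,
    num_devices=64,
)
print(f"scaling efficiency: {efficiency:.2%}")
```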

Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server

In Table 1 below, we compare the total instance cost of running the experiments on 64 GPUs. Using MXNet with Horovod gives the best throughput at the lowest cost.

Table 1. Cost comparison between Horovod and a parameter server with a server-to-worker ratio of 2 to 1.
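
A cost comparison of this kind comes down to multiplying instance counts by hourly rates and training time. The following is a minimal sketch of that arithmetic; the function name is hypothetical and the hourly rate is a placeholder, not an actual AWS price, so substitute the current rate for your instance type and region.

```python
def training_cost(num_instances, hourly_rate_usd, training_minutes):
    """Total cost of a run: instances x hourly rate x duration in hours."""
    return num_instances * hourly_rate_usd * (training_minutes / 60.0)


# 8 instances and 44 minutes are from the article.
# The hourly rate is a HYPOTHETICAL placeholder, not a real AWS price.
PLACEHOLDER_HOURLY_RATE_USD = 10.0
cost = training_cost(
    num_instances=8,
    hourly_rate_usd=PLACEHOLDER_HOURLY_RATE_USD,
    training_minutes=44,
)
print(f"estimated cost: ${cost:.2f}")
```

For a parameter-server setup, the server instances would be added to the total in the same way, which is why doubling the number of servers raises the cost of reaching comparable throughput.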

Steps to reproduce

In the following steps, we show how to reproduce the distributed training results using MXNet with Horovod. To learn more about distributed training with MXNet, read this post.

Step 1

Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher to use distributed training. You will also need to install the libraries for GPU training. For our instances, we chose Ubuntu 16.04 Linux with GPU driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator, and OpenMPI 3.1.1. Alternatively, you can use the Amazon Deep Learning AMI, where these libraries are preinstalled.

Step 2

Add Horovod API support to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. If you already have a training script, the Horovod-specific lines (the import on line 2 and the lines marked with a "# Horovod:" comment) are the ones you need to add. Here are the key changes required to train with Horovod:

  • Set the context according to the Horovod local rank (line 8) so that training runs on the correct GPU.
  • Broadcast the initial parameters from one worker to all the others (line 18) to make sure every worker starts from the same initial parameters.
  • Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed manner.

For the complete scripts, see the Horovod-MXNet examples for MNIST and ImageNet.

1  import mxnet as mx
2  import horovod.mxnet as hvd
3
4  # Horovod: initialize Horovod
5  hvd.init()
6
7  # Horovod: pin a GPU to be used to local rank
8  context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33    ...

Step 3

Log in to one of the workers and start distributed training with the MPI command. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The Stochastic Gradient Descent (SGD) optimizer is used with the following hyperparameters:

  • mini-batch size: 256
  • learning rate: 0.1
  • momentum: 0.9
  • weight decay: 0.0001

As we scaled from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs) while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters changed as the number of GPUs increased. We used mixed-precision training, with the float16 data type for the forward pass and float32 for gradients, to take advantage of the fast float16 computation supported by NVIDIA Tesla GPUs.
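
The linear scaling described above can be written as a small helper. This is a minimal sketch: the base values come from the hyperparameter list in this step, while the function and constant names are hypothetical.

```python
# Base values from the hyperparameter list above (per-GPU settings).
BASE_LEARNING_RATE = 0.1
IMAGES_PER_GPU = 256


def scaled_hyperparameters(num_gpus):
    """Scale learning rate and global batch size linearly with the number of GPUs."""
    return {
        "learning_rate": BASE_LEARNING_RATE * num_gpus,
        "global_batch_size": IMAGES_PER_GPU * num_gpus,
    }


for gpus in (1, 16, 64):
    print(gpus, scaled_hyperparameters(gpus))
```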

$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
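
To see how the 16 processes line up with GPUs, the sketch below reproduces the placement implied by the -H host list. It assumes by-slot mapping fills all slots on one host before moving to the next, and that a process's local rank is its index on its host; the helper names are hypothetical and are not part of OpenMPI or Horovod.

```python
def parse_hosts(host_spec):
    """Parse an mpirun-style 'host:slots,host:slots' string into (host, slots) pairs."""
    hosts = []
    for entry in host_spec.split(","):
        name, _, slots = entry.partition(":")
        hosts.append((name, int(slots) if slots else 1))
    return hosts


def assign_ranks(host_spec, num_processes):
    """Assign global and local ranks, filling each host's slots in order (assumed by-slot mapping)."""
    placements = []
    for name, slots in parse_hosts(host_spec):
        for local_rank in range(slots):
            if len(placements) == num_processes:
                return placements
            placements.append(
                {"rank": len(placements), "host": name, "local_rank": local_rank}
            )
    return placements


placements = assign_ranks("server1:4,server2:4,server3:4,server4:4", 16)
for placement in placements:
    print(placement)
```

Under these assumptions, each process's local rank (0 to 3) selects one of the four GPUs on its host, which is what line 8 of the template script relies on.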

Conclusion

In this article, we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We demonstrated its scaling efficiency and cost-effectiveness compared with the parameter server approach, training the ResNet50-v1 model on the ImageNet dataset. We also included the steps you can follow to modify an existing script to run multi-instance training with Horovod.

If you are just getting started with MXNet and deep learning, go to the MXNet installation page and build MXNet first. We also strongly recommend reading the article "MXNet in 60 minutes" to get started.

If you have already worked with MXNet and want to try distributed training with Horovod, take a look at the Horovod installation page, build it with MXNet support, and follow the MNIST or ImageNet example.

* Cost is calculated based on AWS hourly rates for EC2 instances.


Source: www.habr.com
