Distributed Training with Apache MXNet and Horovod

This translation was prepared ahead of the start of the course "Industrial ML on Big Data"

Distributed training on multiple high-performance computing instances can reduce the time needed to train modern deep neural networks on large datasets from weeks to hours or even minutes, which makes this technique central to practical deep learning. Users need to understand how to share and synchronize data across multiple instances, which in turn has a major impact on scaling efficiency. In addition, users should know how to deploy a training script that runs on a single instance to multiple instances.

In this article we describe a fast and simple way to do distributed training using the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We will show the performance benefits of Horovod and demonstrate how to write an MXNet training script so that it runs in a distributed fashion with Horovod.

What is Apache MXNet

Apache MXNet is an open-source deep learning framework used to build, train, and deploy deep neural networks. MXNet abstracts away the complexity of implementing neural networks, is highly performant and scalable, and offers APIs for popular programming languages such as Python, C++, Clojure, Java, Julia, R, Scala, and others.

Distributed training in MXNet with a parameter server

The standard distributed training module in MXNet uses a parameter server approach. It uses a set of parameter servers to collect gradients from each worker, aggregate them, and send the updated gradients back to the workers for the next optimization iteration. Choosing the right ratio of servers to workers is the key to efficient scaling. If there is only one parameter server, it can become a bottleneck in the computation. Conversely, if too many servers are used, many-to-many communication can saturate all network connections.
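To make the parameter-server flow concrete, here is a minimal single-process sketch in plain Python. It is not MXNet's implementation: the names ParameterServer, push, pull_average, and aggregate are hypothetical and exist only for illustration. Parameters are sharded across servers by key, each worker pushes its gradient, and every worker would then pull the same averaged result.

```python
from collections import defaultdict


class ParameterServer:
    """Toy stand-in for one parameter server (hypothetical, not MXNet's code)."""

    def __init__(self):
        self._pending = defaultdict(list)  # key -> gradients pushed by workers

    def push(self, key, gradient):
        """A worker sends its gradient for one parameter."""
        self._pending[key].append(gradient)

    def pull_average(self, key):
        """Return the element-wise mean of all gradients pushed for `key`."""
        grads = self._pending[key]
        return [sum(column) / len(grads) for column in zip(*grads)]


def aggregate(worker_grads, num_servers):
    """Shard keys across servers, push every worker's gradients, pull the means."""
    servers = [ParameterServer() for _ in range(num_servers)]
    keys = sorted(worker_grads[0])
    # Round-robin sharding: each key is owned by exactly one server.
    owner = {key: servers[i % num_servers] for i, key in enumerate(keys)}
    for grads in worker_grads:
        for key in keys:
            owner[key].push(key, grads[key])
    return {key: owner[key].pull_average(key) for key in keys}


# Two workers, two parameters, gradients as plain lists of floats.
worker_grads = [
    {"bias": [1.0], "weight": [1.0, 2.0]},
    {"bias": [3.0], "weight": [3.0, 6.0]},
]
averaged = aggregate(worker_grads, num_servers=2)
print(averaged)  # {'bias': [2.0], 'weight': [2.0, 4.0]}
```

With num_servers=1 every key lands on the same server, which is the bottleneck case described above; with many servers every worker talks to every server.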

What is Horovod

Horovod is an open-source distributed deep learning framework developed at Uber. It uses efficient cross-GPU and cross-node technologies such as the NVIDIA Collective Communications Library (NCCL) and the Message Passing Interface (MPI) to distribute and aggregate model parameters across workers. It optimizes the use of network bandwidth and scales well with deep neural network models. It currently supports several popular machine learning frameworks, namely MXNet, TensorFlow, Keras, and PyTorch.

Integrating MXNet and Horovod

MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather() and horovod.allreduce() are implemented using asynchronous callbacks of the MXNet engine, as part of its task graph. This way, data dependencies between communication and computation are handled by the MXNet engine, which avoids performance losses due to synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends Optimizer in MXNet so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
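The three collective operations named above can be described by what each worker holds afterwards. The following is a plain-Python simulation of those semantics, with one list entry per worker. These functions are illustrative stand-ins, not the Horovod API; the argument names (root_rank, average) are assumptions made for this sketch.

```python
def broadcast(values, root_rank=0):
    """Every worker ends up with the root worker's value."""
    return [values[root_rank] for _ in values]


def allgather(values):
    """Every worker ends up with the list of all workers' values."""
    return [list(values) for _ in values]


def allreduce(values, average=True):
    """Every worker ends up with the sum (or mean) of all workers' values."""
    total = sum(values)
    result = total / len(values) if average else total
    return [result for _ in values]


# One entry per worker (ranks 0..3); in real training these would be tensors.
per_worker = [1.0, 2.0, 3.0, 6.0]
print(broadcast(per_worker))     # [1.0, 1.0, 1.0, 1.0]
print(allgather(per_worker)[0])  # [1.0, 2.0, 3.0, 6.0]
print(allreduce(per_worker))     # [3.0, 3.0, 3.0, 3.0]
```

Broadcast corresponds to synchronizing initial parameters, and an averaging allreduce corresponds to combining gradients before each update.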

Quick start

You can quickly start training a small convolutional neural network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:

pip install mxnet
pip install horovod

Note: If you hit an error during pip install horovod, you may need to add the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your macOS version. For example, on macOS Sierra you would run MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod

Next, install OpenMPI.

Finally, download the example script mxnet_mnist.py and run the following command in a MacBook terminal from the working directory:

mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py

This runs training on two cores of your processor. The output will look like this:

INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec      accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec      accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec      accuracy=0.870000

Performance demonstration

When training a ResNet50-v1 model on the ImageNet dataset on 64 GPUs across eight p3.16xlarge EC2 instances, each with 8 NVIDIA Tesla V100 GPUs on the AWS cloud, we achieved a training throughput of 45,000 images/sec (i.e., the number of training samples processed per second). Training completed in 44 minutes after 90 epochs with a best accuracy of 75.7%.
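As a rough consistency check on these figures, the sketch below estimates how long 90 epochs would take at a sustained 45,000 images per second. The training-set size of 1,281,167 images (the ILSVRC-2012 training split) is an assumption; the article does not state which ImageNet split was used.

```python
# Assumption: ILSVRC-2012 training split; not stated in the article.
IMAGENET_TRAIN_IMAGES = 1_281_167
EPOCHS = 90                         # from the article
THROUGHPUT_IMAGES_PER_SEC = 45_000  # from the article


def training_minutes(num_images, epochs, images_per_sec):
    """Wall-clock minutes if the whole run sustained `images_per_sec`."""
    return num_images * epochs / images_per_sec / 60


estimate = training_minutes(IMAGENET_TRAIN_IMAGES, EPOCHS, THROUGHPUT_IMAGES_PER_SEC)
print(f"{estimate:.1f} minutes")  # 42.7 minutes
```

Under that assumption the estimate lands just below the reported 44 minutes, which is what one would expect if a small part of the run is spent outside the steady-state training loop.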

We compared this with MXNet's distributed training approach using parameter servers on 8, 16, 32 and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1, respectively. You can see the results in Figure 1 below. The bars, read against the left y-axis, show the number of images trained per second; the lines, read against the right y-axis, show the scaling efficiency (that is, the ratio of actual to ideal throughput). As you can see, the choice of the number of servers affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To match the scaling efficiency of Horovod, you need twice as many servers as workers.

Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server
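Scaling efficiency, as defined above, is the measured throughput divided by the ideal throughput, where ideal means the single-GPU throughput multiplied by the number of GPUs. The sketch below shows the calculation. The throughput numbers are hypothetical and chosen only so that the result matches the 38% case; the article does not report the single-GPU baseline.

```python
def scaling_efficiency(measured_throughput, single_gpu_throughput, num_gpus):
    """Ratio of measured throughput to ideal (perfectly linear) throughput."""
    ideal = single_gpu_throughput * num_gpus
    return measured_throughput / ideal


# Hypothetical values for illustration only (not from the article).
single_gpu = 100.0    # images/sec on 1 GPU
measured_64 = 2432.0  # images/sec on 64 GPUs

efficiency = scaling_efficiency(measured_64, single_gpu, 64)
print(f"{efficiency:.1%}")  # 38.0%
```

An efficiency of 100% would mean that 64 GPUs deliver exactly 64 times the single-GPU throughput.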

In Table 1 below, we compare the final cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod gives the best throughput at the lowest cost.

Table 1. Cost comparison between Horovod and the parameter server with a server-to-worker ratio of 2 to 1.

Steps to reproduce

In the following steps, we show how to reproduce the distributed training results using MXNet and Horovod. To learn more about distributed training with MXNet, read this post.

Step 1

Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher in order to use distributed training. You will also need to install the libraries for GPU training. For our instances, we chose Ubuntu 16.04 Linux with GPU Driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator, and OpenMPI 3.1.1. You can also use the Amazon Deep Learning AMI, where these libraries are already installed.

Step 2

Add the ability to work with the Horovod API to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. The Horovod-specific lines (those that use hvd) are the ones to add if you already have a comparable training script. Here are the critical changes you need to make to train with Horovod:

  • Set the context according to the Horovod local rank (line 8) so that training runs on the correct GPU.
  • Broadcast the initial parameters from one worker to all the others (line 18) to ensure that all workers start with the same initial parameters.
  • Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed manner.

For the complete scripts, please see the Horovod-MXNet MNIST and ImageNet examples.

1  import mxnet as mx
2  import horovod.mxnet as hvd
3
4  # Horovod: initialize Horovod
5  hvd.init()
6
7  # Horovod: pin a GPU to be used to local rank
8  context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33    ...

Step 3

Log in to one of the workers to launch distributed training using the MPI directive. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The Stochastic Gradient Descent (SGD) optimizer is used with the following hyperparameters:

  • mini-batch size: 256
  • learning rate: 0.1
  • momentum: 0.9
  • weight decay: 0.0001

As we scaled from 1 GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs) while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters were left unchanged as the number of GPUs increased. We used mixed-precision training, with the float16 data type for the forward pass and float32 for gradients, to take advantage of the fast float16 computation supported by NVIDIA Tesla GPUs.
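The linear scaling rule described above can be written as a small helper. This is a sketch of the arithmetic only; the function name scaled_hyperparameters is hypothetical, while the base values come from the hyperparameter list above.

```python
BASE_LEARNING_RATE = 0.1  # learning rate for 1 GPU (from the article)
BATCH_PER_GPU = 256       # images per GPU, held constant (from the article)


def scaled_hyperparameters(num_gpus):
    """Return (learning_rate, global_batch_size) for `num_gpus` workers."""
    learning_rate = BASE_LEARNING_RATE * num_gpus
    global_batch = BATCH_PER_GPU * num_gpus
    return learning_rate, global_batch


for n in (1, 8, 16, 32, 64):
    lr, batch = scaled_hyperparameters(n)
    print(f"{n:>2} GPUs: lr={lr:.1f}, global batch={batch}")
```

In a Horovod script the worker count would typically come from hvd.size(), so the same script can compute its own learning rate regardless of how many processes mpirun starts.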

$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
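The command above starts 16 processes over four hosts with four slots each. The sketch below illustrates how global ranks could correspond to hosts and local ranks, assuming slots are filled host by host in the order listed; the actual placement is decided by mpirun, and rank_layout is a hypothetical helper written for this illustration. The local rank is what the training script passes to mx.gpu() to pick a GPU on each host.

```python
def rank_layout(hosts):
    """Map global ranks to (host, local_rank), filling each host's slots in order.

    `hosts` is a list of (hostname, slots) pairs, mirroring the -H argument.
    """
    layout = {}
    rank = 0
    for hostname, slots in hosts:
        for local_rank in range(slots):
            layout[rank] = (hostname, local_rank)
            rank += 1
    return layout


hosts = [("server1", 4), ("server2", 4), ("server3", 4), ("server4", 4)]
layout = rank_layout(hosts)
for rank in (0, 3, 4, 15):
    host, local_rank = layout[rank]
    print(f"rank {rank:>2} -> {host}, local_rank {local_rank}")
```

Under this assumption, ranks 0 to 3 run on server1 and rank 15 runs on server4 with local rank 3, so each host uses its GPUs 0 to 3.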

Conclusion

In this article, we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We demonstrated its scaling efficiency and cost-effectiveness compared with the parameter server approach when training a ResNet50-v1 model on the ImageNet dataset. We also included the steps you can use to modify an existing script to run multi-instance training with Horovod.

If you are just getting started with MXNet and deep learning, go to the MXNet installation page to build MXNet first. We also strongly recommend reading the article MXNet in 60 minutes to get started.

If you have already worked with MXNet and want to try distributed training with Horovod, take a look at the Horovod installation page, build it with MXNet support, and follow the MNIST or ImageNet example.

* cost is calculated based on AWS hourly rates for EC2 instances

Learn more about the course "Industrial ML on Big Data"

Source: www.habr.com
