This translation of the article was prepared ahead of the start of the course
Distributed training on multiple high-performance computing instances can cut the training time of modern deep neural networks on large datasets from weeks to hours or even minutes, which has made this technique widespread in practical deep learning applications. Users need to understand how to share and synchronize data across multiple instances, which in turn has a large impact on scaling efficiency. Users also need to know how to deploy a training script that runs on a single instance to multiple instances.
In this article we discuss a quick and easy way to run distributed training using the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We demonstrate the performance benefits of Horovod and show how to write an MXNet training script so that it runs in a distributed manner with Horovod.
What is Apache MXNet
Distributed training in MXNet with a parameter server
What is Horovod
MXNet and Horovod integration
MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather() and horovod.allreduce() are implemented using asynchronous callbacks of the MXNet engine, as part of its task graph. This way, data dependencies between communication and computation are handled by the MXNet engine, which avoids performance losses caused by synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends Optimizer in MXNet so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
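To make the semantics of the three collective operations concrete, here is a minimal pure-Python sketch that simulates them over a list of per-worker values. It does not use Horovod or MPI. The functions `broadcast`, `allgather` and `allreduce` below are illustrative stand-ins written for this example, and averaging by default in `allreduce` is an assumption of the sketch.

```python
# Simulation only: each list element stands for the tensor held by one worker.
# These helpers are hypothetical stand-ins, not Horovod's implementation.


def broadcast(worker_values, root_rank=0):
    """Every worker ends up with a copy of the root worker's value."""
    root = worker_values[root_rank]
    return [list(root) for _ in worker_values]


def allgather(worker_values):
    """Every worker ends up with the concatenation of all workers' values."""
    gathered = [x for value in worker_values for x in value]
    return [list(gathered) for _ in worker_values]


def allreduce(worker_values, average=True):
    """Every worker ends up with the element-wise sum (or mean) of all values."""
    num_workers = len(worker_values)
    reduced = [sum(column) for column in zip(*worker_values)]
    if average:
        reduced = [total / num_workers for total in reduced]
    return [list(reduced) for _ in worker_values]


if __name__ == "__main__":
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three simulated workers
    print("broadcast:", broadcast(values))
    print("allgather:", allgather(values))
    print("allreduce:", allreduce(values))
```

In each case the result has one entry per worker, and all entries are identical. That is the property distributed training relies on to keep workers in step.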
Quick start
You can quickly start training a small convolutional neural network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:
pip install mxnet
pip install horovod
Note: If you run into an error during pip install horovod, you may need to set the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your macOS version number. For example, on macOS Sierra you would run MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod
Then install OpenMPI.
Finally, download the test script mxnet_mnist.py and run the following command from the working directory:
mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py
This runs training on two cores of your processor. The output will look like this:
INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec accuracy=0.870000
Performance demo
When training a ResNet50-v1 model on the ImageNet dataset on 64 GPUs across eight p3.16xlarge EC2 instances, each with 8 NVIDIA Tesla V100 GPUs on the AWS cloud, we reached a training throughput of 45000 images/sec (that is, the number of training samples processed per second). Training completed in 44 minutes after 90 epochs with a best accuracy of 75.7%.
We compared this with MXNet's distributed training approach based on parameter servers on 8, 16, 32 and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1, respectively. You can see the result in Figure 1 below. The bars, read against the left y-axis, show the number of images trained per second. The lines, read against the right y-axis, show the scaling efficiency (that is, the ratio of actual to ideal throughput). As you can see, the number of servers affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To reach the same scaling efficiency as Horovod, you need twice as many servers as workers.
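As a small illustration of the scaling efficiency metric described above, the following sketch computes the ratio of actual to ideal throughput. It assumes that "ideal" means perfectly linear scaling from a single-GPU baseline. The function name and the numbers in the usage example are hypothetical and are not measurements from this article.

```python
def scaling_efficiency(measured_throughput, single_gpu_throughput, num_gpus):
    """Return measured throughput divided by ideal (linear) throughput.

    Assumption of this sketch: ideal throughput is the single-GPU
    throughput multiplied by the number of GPUs.
    """
    if num_gpus <= 0 or single_gpu_throughput <= 0:
        raise ValueError("num_gpus and single_gpu_throughput must be positive")
    ideal_throughput = single_gpu_throughput * num_gpus
    return measured_throughput / ideal_throughput


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    efficiency = scaling_efficiency(
        measured_throughput=720.0, single_gpu_throughput=100.0, num_gpus=8
    )
    print(f"scaling efficiency: {efficiency:.0%}")
```

A value close to 1.0 means that adding GPUs increases throughput almost proportionally. A lower value means a larger share of time goes to communication or waiting.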
Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server
In Table 1 below, we compare the final cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod gives the best throughput at the lowest cost.
Table 1. Cost comparison between Horovod and the parameter server with a server-to-worker ratio of 2 to 1.
Steps to reproduce
In the following steps, we show how to reproduce the distributed training result using MXNet and Horovod. For more information about distributed training with MXNet, read
Step 1
Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher to use distributed training. You also need to install the libraries for GPU training. For our instances we chose Ubuntu 16.04 Linux with GPU driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator and OpenMPI 3.1.1. You can also use
Step 2
Add Horovod API support to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. If you already have a suitable training script, the lines to add are the ones marked with "# Horovod:" comments. These are the key changes you need to make to train with Horovod:
- Set the context according to the Horovod local rank (line 8) so that training runs on the correct GPU.
- Broadcast the initial parameters from one worker to all the others (line 18) so that all workers start from the same initial parameters.
- Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed manner.
For the full script, please see the Horovod-MXNet examples
1 import mxnet as mx
2 import horovod.mxnet as hvd
3
4 # Horovod: initialize Horovod
5 hvd.init()
6
7 # Horovod: pin a GPU to be used to local rank
8 context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33     ...
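The effect of the parameter broadcast (line 18) and the distributed optimizer (line 25) can be illustrated without a cluster. The following is a minimal pure-Python simulation of synchronous data-parallel SGD on a toy one-parameter model. It is not Horovod or MXNet code. The helpers `shard`, `local_gradient` and `train_step` are hypothetical names for this sketch, and the strided sharding scheme is an assumption.

```python
def shard(data, rank, size):
    """Give worker `rank` every `size`-th sample, so shards do not overlap."""
    return data[rank::size]


def local_gradient(w, samples):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def train_step(weights, shards, lr):
    """One synchronous step: local gradients, averaged, same update on every worker."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    avg_grad = sum(grads) / len(grads)  # plays the role of an averaging allreduce
    return [w - lr * avg_grad for w in weights]


if __name__ == "__main__":
    data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]  # true weight is 3.0
    size = 2  # number of simulated workers
    shards = [shard(data, rank, size) for rank in range(size)]

    initial = [0.5, -1.0]  # workers start with different random weights
    weights = [initial[0]] * size  # plays the role of a broadcast from rank 0

    for _ in range(200):
        weights = train_step(weights, shards, lr=0.01)
    print(weights)
```

Because every worker starts from the same weight and applies the same averaged gradient, the per-worker weights stay identical after every step. Without the initial broadcast they would start apart and never agree.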
Step 3
Log in to one of the workers to start distributed training using the MPI command. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The Stochastic Gradient Descent (SGD) optimizer is used with the following hyperparameters:
- mini-batch size: 256
- learning rate: 0.1
- momentum: 0.9
- weight decay: 0.0001
As we scaled from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs) while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters were not changed as the number of GPUs increased. We used mixed-precision training with the float16 data type for the forward pass and float32 for gradients, to speed up the float16 computations supported by NVIDIA Tesla GPUs.
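The linear scaling described above is simple arithmetic, sketched below. The function name `scaled_hyperparameters` is hypothetical. The base learning rate of 0.1 and the 256 images per GPU are the values given in this article.

```python
def scaled_hyperparameters(num_gpus, base_lr=0.1, per_gpu_batch=256):
    """Scale the learning rate and global batch size linearly with GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {
        "learning_rate": base_lr * num_gpus,
        "global_batch_size": per_gpu_batch * num_gpus,
    }


if __name__ == "__main__":
    for num_gpus in (1, 8, 16, 32, 64):
        print(num_gpus, scaled_hyperparameters(num_gpus))
```

The per-GPU batch stays fixed, so each GPU does the same amount of work per step regardless of cluster size. Only the global batch and the learning rate grow with the number of GPUs.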
$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
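For clusters of a different size, the process count and host list in the command above have to be kept consistent by hand. The sketch below assembles the same argument list from a list of host names and a GPU count per host. The helper `build_mpirun_command` is hypothetical. The flags are copied from the command above, not derived independently.

```python
import shlex


def build_mpirun_command(hosts, gpus_per_host, script):
    """Assemble the mpirun argument list for one process per GPU."""
    total_processes = len(hosts) * gpus_per_host
    host_list = ",".join(f"{host}:{gpus_per_host}" for host in hosts)
    return [
        "mpirun", "-np", str(total_processes),
        "-H", host_list,
        "-bind-to", "none", "-map-by", "slot",
        "-mca", "pml", "ob1", "-mca", "btl", "^openib",
        "python", script,
    ]


if __name__ == "__main__":
    command = build_mpirun_command(
        ["server1", "server2", "server3", "server4"],
        gpus_per_host=4,
        script="mxnet_imagenet_resnet50.py",
    )
    # shlex.join (Python 3.8+) quotes arguments where the shell requires it.
    print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues with arguments such as `^openib`.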
Conclusion
In this article, we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We demonstrated its scaling efficiency and cost-effectiveness compared with the parameter server approach, training the ResNet50-v1 model on the ImageNet dataset. We also described the steps you can use to modify an existing script so that it runs multi-instance training with Horovod.
If you are just getting started with MXNet and deep learning, go to the installation page
If you have already worked with MXNet and want to try distributed training with Horovod, take a look at
* Costs are calculated based on
Source: www.habr.com