This translation of the article was prepared on the eve of the start of the course.
Distributed training across multiple high-performance computing instances can cut the training time of modern deep neural networks on large datasets from weeks to hours or even minutes, which is why this technique has become widespread in practical deep learning. Users need to understand how data is shared and synchronized across instances, because this has a major effect on scaling efficiency. They also need to know how to deploy a training script that runs on a single instance to multiple instances.
In this article, we discuss a quick and simple way to do distributed training using the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We demonstrate the performance benefits of Horovod and show how to write an MXNet training script that works in a distributed manner with Horovod.
What is Apache MXNet?
MXNet is an open-source deep learning framework used to define, train, and deploy deep neural networks. MXNet abstracts away the complexity of implementing neural networks, offers high performance and scalability, and provides APIs for several popular programming languages.
Distributed training in MXNet with a parameter server
Distributed training in MXNet uses the parameter server approach. A set of parameter servers collects gradients from every worker, aggregates them, and sends the updated gradients back to the workers for the next optimization iteration. Choosing the right ratio of servers to workers is the key to efficient scaling. With only one parameter server, that server can become a computational bottleneck. Conversely, with too many servers, the many-to-many communication can saturate every network connection.
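The aggregation step described above can be pictured with a toy, single-process simulation. This is only a sketch: the function names, the four "workers", and the plain SGD update are illustrative assumptions, not MXNet APIs, and no real network communication takes place.

```python
# Toy simulation of one synchronous parameter-server step.
# All names here are hypothetical; nothing below is part of the MXNet API.

def aggregate_on_server(worker_grads):
    """Average the gradients pushed by every worker, element by element."""
    num_workers = len(worker_grads)
    return [sum(values) / num_workers for values in zip(*worker_grads)]


def sgd_update(weights, avg_grad, lr):
    """Apply the aggregated gradient that the server sent back."""
    return [w - lr * g for w, g in zip(weights, avg_grad)]


# Each inner list is the gradient computed by one worker on its own mini-batch.
worker_grads = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
]
weights = [0.5, 0.5, 0.5]

avg_grad = aggregate_on_server(worker_grads)          # done on the server
new_weights = sgd_update(weights, avg_grad, lr=0.1)   # done on every worker
print(avg_grad, new_weights)
```

With a single server, every worker's gradient passes through `aggregate_on_server`, which is where the bottleneck mentioned above comes from.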
What is Horovod?
Horovod is an open-source distributed deep learning framework developed at Uber. It uses efficient cross-GPU and cross-node technologies such as the NVIDIA Collective Communications Library (NCCL) and the Message Passing Interface (MPI) to distribute and aggregate model parameters across workers. It optimizes the use of network bandwidth and scales well with deep neural network models. It currently supports several popular machine learning frameworks, namely MXNet, TensorFlow, Keras, and PyTorch.
MXNet and Horovod integration
MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather(), and horovod.allreduce() are implemented using asynchronous callbacks of the MXNet engine, as part of its task graph. This way, data dependencies between communication and computation are handled by the MXNet engine, which avoids performance losses due to synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends Optimizer in MXNet so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
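To make the idea of allreduce more concrete, here is a small pure-Python simulation of a ring allreduce that averages gradients. The article does not say which algorithm sits behind horovod.allreduce() in this setup, so ring allreduce is used here only as one well-known way to implement the operation. The function name and the restriction to one vector element per worker are simplifying assumptions.

```python
# Toy ring allreduce (average) simulated in one process.
# Assumption: each worker's vector has exactly one element per worker,
# so element c plays the role of "chunk c". Not Horovod code.

def ring_allreduce_average(worker_grads):
    n = len(worker_grads)
    if any(len(g) != n for g in worker_grads):
        raise ValueError("this toy version needs one element per worker")
    data = [list(g) for g in worker_grads]

    # Phase 1 (reduce-scatter): in each step, worker r sends one chunk to its
    # right neighbour, which adds it to its own copy. Messages are collected
    # first so that all sends in a step happen "at the same time".
    for step in range(n - 1):
        messages = [((r + 1) % n, (r - step) % n, data[r][(r - step) % n])
                    for r in range(n)]
        for dest, chunk, value in messages:
            data[dest][chunk] += value

    # Phase 2 (allgather): worker r now owns the full sum of chunk (r + 1) % n
    # and passes completed chunks around the ring until everyone has all of them.
    for step in range(n - 1):
        messages = [((r + 1) % n, (r + 1 - step) % n, data[r][(r + 1 - step) % n])
                    for r in range(n)]
        for dest, chunk, value in messages:
            data[dest][chunk] = value

    # Turn sums into averages.
    return [[value / n for value in row] for row in data]


grads = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
result = ring_allreduce_average(grads)
print(result)  # every worker ends up with the same averaged vector
```

Unlike the parameter-server pattern, no single process receives all gradients: each worker only talks to its neighbour in the ring.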
Quick start
You can quickly start training a small convolutional neural network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:
pip install mxnet
pip install horovod
Note: If you encounter an error during pip install horovod, you may need to set the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your macOS version. For example, for macOS Sierra you would write MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod
Then install OpenMPI.
Finally, download the test script mxnet_mnist.py and run the following command in the MacBook terminal from the working directory:
mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py
This runs training on two cores of your processor. The output will be:
INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec accuracy=0.870000
Performance demo
When training a ResNet50-v1 model on the ImageNet dataset on 64 GPUs, using eight p3.16xlarge EC2 instances in the AWS cloud, each with eight NVIDIA Tesla V100 GPUs, we achieved a training throughput of 45,000 images per second (that is, the number of training samples processed per second). Training completed in 44 minutes after 90 epochs, with a best accuracy of 75.7%.
We compared this with MXNet distributed training using parameter servers on 8, 16, 32, and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1:1 and 2:1, respectively. The results are shown in Figure 1 below. The bars, read against the left y-axis, show the number of images trained per second; the lines, read against the right y-axis, show scaling efficiency (that is, the ratio of actual to ideal throughput). As you can see, the choice of the number of servers affects scaling efficiency. With a single parameter server, scaling efficiency drops to 38% on 64 GPUs. To reach the same scaling efficiency as Horovod, the number of servers has to be double the number of workers.

Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server.
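Scaling efficiency, defined above as the ratio of actual to ideal throughput, can be computed as in the sketch below. The throughput numbers in it are made up purely for illustration and are not measurements from this article; the function name is also hypothetical.

```python
# Scaling efficiency = actual throughput / ideal throughput, where the ideal
# is taken to be (number of GPUs) x (throughput of a single GPU).

def scaling_efficiency(throughput, num_gpus, single_gpu_throughput):
    ideal = num_gpus * single_gpu_throughput
    return throughput / ideal


# Hypothetical numbers, NOT results from the article:
single_gpu = 100.0   # images/sec on 1 GPU
on_8_gpus = 720.0    # images/sec on 8 GPUs

efficiency = scaling_efficiency(on_8_gpus, 8, single_gpu)
print(f"scaling efficiency: {efficiency:.0%}")
```

An efficiency of 100% would mean that throughput grows exactly in proportion to the number of GPUs.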
Table 1 below compares the total cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod delivers the best throughput at the lowest cost.

Table 1. Cost comparison between Horovod and a parameter server with a 2:1 server-to-worker ratio.
Steps to reproduce
In the following steps, we show you how to reproduce the distributed training result using MXNet and Horovod.
Step 1
Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher in order to use distributed training. You will also need to install the libraries for GPU training. For our instances, we chose Ubuntu 16.04 Linux with GPU Driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator, and OpenMPI 3.1.1. Alternatively, you can use an image on which these libraries are already preinstalled.
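One way to check the version requirements on each instance is sketched below. It assumes the packages are installed under the distribution names mxnet and horovod; GPU builds of MXNet are often published under other names, in which case the lookup simply reports the package as not found. The helper names are hypothetical.

```python
# Check installed package versions against the minimums named in Step 1.
# Assumption: distribution names are exactly "mxnet" and "horovod".
from importlib import metadata

MINIMUMS = {"mxnet": "1.4.0", "horovod": "0.16.0"}


def parse(version):
    """Turn '1.4.0' into (1, 4, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.4' and '1.4.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return {package: True/False, or None if the package is not installed}."""
    report = {}
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = None
        else:
            report[package] = meets_minimum(installed, minimum)
    return report


print(check_environment())
```

Running the same check on every instance is a cheap way to confirm that the cluster really is homogeneous before starting a long training job.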
Step 2
Add the Horovod API to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. The lines introduced by a "# Horovod:" comment, together with the horovod.mxnet import, are the ones to add if you already have a training script. Here are the critical changes you need to make to train with Horovod:
- Set the context according to the Horovod local rank (line 8) so that training runs on the correct GPU.
- Broadcast the initial parameters from one worker to all the others (line 18) so that all workers start with the same initial parameters.
- Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed manner.
For a complete script, see the Horovod-MXNet examples.
1 import mxnet as mx
2 import horovod.mxnet as hvd
3
4 # Horovod: initialize Horovod
5 hvd.init()
6
7 # Horovod: pin a GPU to be used to local rank
8 context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33     ...
Step 3
Log in to one of the workers to start distributed training using the MPI command. In this example, distributed training runs on four instances with four GPUs each, for a total of 16 GPUs in the cluster. The stochastic gradient descent (SGD) optimizer is used with the following hyperparameters:
- mini-batch size: 256
- learning rate: 0.1
- momentum: 0.9
- weight decay: 0.0001
As we scaled from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs) while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters were changed as the number of GPUs increased. We used mixed-precision training, with the float16 data type for the forward pass and float32 for gradients, to speed up the float16 computations supported by NVIDIA Tesla GPUs.
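The linear scaling of the learning rate and global batch size described above amounts to the following arithmetic. This is a minimal sketch with hypothetical names; in a real Horovod script the number of workers would typically come from hvd.size() rather than being passed in by hand.

```python
# Linear scaling: the learning rate and global batch size grow in proportion
# to the number of GPUs, while the per-GPU batch size stays fixed.
BASE_LEARNING_RATE = 0.1   # learning rate for 1 GPU (from the list above)
PER_GPU_BATCH_SIZE = 256   # images per GPU (from the list above)


def scaled_hyperparameters(num_gpus):
    return {
        "learning_rate": BASE_LEARNING_RATE * num_gpus,
        "global_batch_size": PER_GPU_BATCH_SIZE * num_gpus,
    }


for num_gpus in (1, 16, 64):
    print(num_gpus, scaled_hyperparameters(num_gpus))
```

For 64 GPUs this gives the learning rate of 6.4 and the global batch of 16,384 images quoted in the text.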
$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
Conclusion
In this article, we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We demonstrated its scaling efficiency and cost effectiveness compared with the parameter server approach on the ImageNet dataset, on which the ResNet50-v1 model was trained. We also described the steps you can take to modify an existing script so that it runs multi-instance training using Horovod.
If you are just getting started with MXNet and deep learning, go to the MXNet installation page to build MXNet first. We also strongly recommend reading an introductory MXNet tutorial to get started.
If you already work with MXNet and want to try distributed training with Horovod, see the Horovod installation page, build Horovod with MXNet support, and follow the Horovod-MXNet examples.
*Cost is calculated based on AWS pricing for EC2 instances.
Source: www.habr.com
