Distributed Training with Apache MXNet and Horovod

This translation was prepared ahead of the start of the course "Industrial ML on Big Data".

Distributed training across multiple high-performance compute instances can cut the training time of modern deep neural networks on large datasets from weeks to hours or even minutes, which is why this technique is now common in practical deep learning. Users need to understand how to share and synchronize data across multiple instances, which in turn has a major impact on scaling efficiency. They also need to know how to take a training script that runs on a single instance and deploy it on multiple instances.

In this article we discuss a quick and simple way to do distributed training with the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We show the performance benefits of the Horovod framework and demonstrate how to write an MXNet training script so that it runs in distributed mode with Horovod.

What is Apache MXNet

Apache MXNet is an open-source deep learning framework used to build, train, and deploy deep neural networks. MXNet abstracts away much of the complexity of implementing neural networks, is highly performant and scalable, and offers APIs for popular programming languages such as Python, C++, Clojure, Java, Julia, R, Scala, and others.

Distributed training in MXNet with a parameter server

The standard distributed training module in MXNet uses the parameter server approach. A set of parameter servers collects gradients from every worker, aggregates them, and sends the updated gradients back to the workers for the next optimization iteration. Choosing the right ratio of servers to workers is key to effective scaling. With only one parameter server, that server can become the bottleneck of the computation. Conversely, with too many servers, many-to-many communication can saturate all network connections.
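The collect, aggregate, and send-back cycle described above can be illustrated with a toy, single-process simulation. This is only a sketch of the idea, not MXNet's actual parameter server code; the function names `aggregate_gradients` and `sgd_step`, the gradient values, and the learning rate are all hypothetical.

```python
# Toy simulation of one parameter-server step (no networking involved).
# All names and numbers here are illustrative assumptions.

def aggregate_gradients(worker_grads):
    """Average gradients element-wise across workers, as a server would."""
    num_workers = len(worker_grads)
    return [sum(values) / num_workers for values in zip(*worker_grads)]


def sgd_step(weights, grad, lr):
    """Apply one plain SGD update using the aggregated gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


# Three workers push gradients for a two-parameter model.
worker_grads = [[0.2, -0.4], [0.4, 0.0], [0.6, -0.2]]
avg_grad = aggregate_gradients(worker_grads)

# The server sends the aggregate back; every worker applies the same update.
weights = sgd_step([1.0, 1.0], avg_grad, lr=0.5)
print(avg_grad, weights)
```

With a single server, every worker's gradients pass through `aggregate_gradients` on one machine, which is where the bottleneck mentioned above comes from.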

What is Horovod

Horovod is an open-source distributed deep learning framework developed at Uber. It leverages efficient cross-GPU and cross-node technologies such as the NVIDIA Collective Communications Library (NCCL) and the Message Passing Interface (MPI) to distribute and aggregate model parameters across workers. It optimizes the use of network bandwidth and scales well with deep neural network models. It currently supports several popular machine learning frameworks, namely MXNet, TensorFlow, Keras, and PyTorch.

MXNet and Horovod integration

MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather(), and horovod.allreduce() are implemented as asynchronous callbacks of the MXNet engine, as part of its task graph. This lets the MXNet engine handle the data dependencies between communication and computation and avoid performance losses caused by synchronization. The distributed optimizer object defined in Horovod, horovod.DistributedOptimizer, extends Optimizer in MXNet so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
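To make the allreduce operation more concrete, here is a small pure-Python simulation of a ring-style allreduce, in which every worker ends up holding the element-wise sum of all workers' vectors. A ring is one common way collective libraries implement this operation; the sketch does not claim to mirror what NCCL, MPI, or Horovod do internally, and `ring_allreduce_sum` is a hypothetical name.

```python
def ring_allreduce_sum(buffers):
    """Sum equal-length vectors across workers with a simulated ring.

    buffers[r] is worker r's vector. For simplicity its length must equal
    the number of workers, so each chunk is a single element.
    """
    n = len(buffers)
    assert all(len(b) == n for b in buffers)
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: after n-1 steps worker r holds the complete
    # sum of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot what each worker sends before anyone receives.
        sent = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sent):
            data[(r + 1) % n][chunk] += value

    # Phase 2, allgather: pass the completed chunks around the ring.
    for step in range(n - 1):
        sent = [((r + 1 - step) % n, data[r][(r + 1 - step) % n])
                for r in range(n)]
        for r, (chunk, value) in enumerate(sent):
            data[(r + 1) % n][chunk] = value

    return data


result = ring_allreduce_sum([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)
```

Dividing each element by the number of workers turns the sum into an average, which is what gradient averaging needs. Each worker only ever talks to its ring neighbor, so no single node has to receive traffic from all the others.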

Quick start

You can quickly start training a small convolutional neural network on the MNIST dataset with MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:

pip install mxnet
pip install horovod

Note: If you run into an error during pip install horovod, you may need to add the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is your macOS version. For example, on macOS Sierra you would write MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod

Then install OpenMPI from here.

Finally, download the test script mxnet_mnist.py from here and run the following command in the MacBook terminal from the working directory:

mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py

This runs training on two cores of your processor. The output will look like this:

INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec      accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec      accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec      accuracy=0.870000

Performance demo

When training a ResNet50-v1 model on the ImageNet dataset on 64 GPUs, using eight p3.16xlarge EC2 instances on AWS with 8 NVIDIA Tesla V100 GPUs each, we reached a training throughput of 45,000 images/sec (that is, the number of training samples processed per second). Training completed in 44 minutes after 90 epochs with a best accuracy of 75.7%.

We compared this with MXNet's distributed training approach using parameter servers on 8, 16, 32, and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1, respectively. The results are shown in Figure 1 below. The bars, read against the left y-axis, show the number of images trained per second; the lines, read against the right y-axis, show the scaling efficiency (that is, the ratio of actual to ideal throughput). As you can see, the choice of the number of servers affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To reach the same scaling efficiency as with Horovod, you need to double the number of servers relative to the number of workers.
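Scaling efficiency as defined above (actual throughput divided by ideal, perfectly linear throughput) can be computed as in the following sketch. The 45,000 images/sec on 64 GPUs is the figure reported above; the single-GPU baseline of 750 images/sec is a made-up placeholder used only to make the example run, not a measured result.

```python
def scaling_efficiency(measured_throughput, single_gpu_throughput, num_gpus):
    """Ratio of measured throughput to ideal (perfectly linear) throughput."""
    ideal_throughput = single_gpu_throughput * num_gpus
    return measured_throughput / ideal_throughput


# 45000 images/sec on 64 GPUs comes from the text above.
# 750 images/sec per GPU is a hypothetical baseline, NOT a measurement.
eff = scaling_efficiency(45000, 750, 64)
print(f"scaling efficiency: {eff:.1%}")
```

An efficiency of 1.0 would mean that throughput grows exactly in proportion to the number of GPUs.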

Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server

In Table 1 below, we compare the final cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod delivers the best throughput at the lowest cost.

Table 1. Cost comparison between Horovod and the parameter server with a server-to-worker ratio of 2 to 1.
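As a rough illustration of how such a cost figure can be derived from an hourly instance price, consider the sketch below. The instance count (8) and training time (44 minutes) come from the text; the hourly rate of 10.0 USD is a placeholder and not an actual AWS price, and `training_cost` is a hypothetical helper.

```python
def training_cost(hourly_rate_usd, num_instances, training_minutes):
    """Total cost of a run billed per instance-hour (fractional hours)."""
    return hourly_rate_usd * num_instances * training_minutes / 60.0


# hourly_rate_usd=10.0 is a placeholder, NOT a real AWS price.
cost = training_cost(hourly_rate_usd=10.0, num_instances=8, training_minutes=44)
print(f"estimated cost: ${cost:.2f}")
```

Under this simple model, a setup that needs extra instances or more time to reach the same accuracy costs proportionally more.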

Steps to reproduce

In the following steps, we show how to reproduce the distributed training results using MXNet and Horovod. To learn more about distributed training with MXNet, read this post.

Step 1

Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher to use distributed training. You will also need to install the libraries for GPU training. For our instances, we chose Ubuntu 16.04 Linux with GPU Driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator, and OpenMPI 3.1.1. You can also use the Amazon Deep Learning AMI, where these libraries are pre-installed.

Step 2

Add Horovod API support to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. The lines under the "# Horovod:" comments are the ones you need to add if you already have a training script. Here are the key changes you need to make to train with Horovod:

  • Set the context according to the Horovod local rank (line 8) so that training runs on the correct GPU.
  • Broadcast the initial parameters from one worker to all the others (line 18) to make sure every worker starts from the same initial parameters.
  • Create a Horovod DistributedOptimizer (line 25) to update the parameters in a distributed manner.

For the complete scripts, see the Horovod-MXNet examples for MNIST and ImageNet.

1  import mxnet as mx
2  import horovod.mxnet as hvd
3
4  # Horovod: initialize Horovod
5  hvd.init()
6
7  # Horovod: pin a GPU to be used to local rank
8  context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33    ...

Step 3

Log in to one of the workers to start distributed training using the MPI launcher. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The Stochastic Gradient Descent (SGD) optimizer is used with the following hyperparameters:

  • mini-batch size: 256
  • learning rate: 0.1
  • momentum: 0.9
  • weight decay: 0.0001

As we scaled from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs), while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters stayed the same as the number of GPUs increased. We used mixed-precision training, with the float16 data type for the forward pass and float32 for gradients, to speed up the float16 computations supported by NVIDIA Tesla GPUs.
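The linear scaling rule described above is simple arithmetic, sketched below. The base learning rate (0.1) and per-GPU batch size (256) come from the text; the helper name `scaled_hyperparams` is hypothetical.

```python
BASE_LR = 0.1        # learning rate for a single GPU (from the text)
BATCH_PER_GPU = 256  # images per GPU, kept constant (from the text)


def scaled_hyperparams(num_gpus):
    """Scale learning rate and global batch size linearly with GPU count."""
    return {
        "learning_rate": BASE_LR * num_gpus,
        "global_batch_size": BATCH_PER_GPU * num_gpus,
    }


for n in (1, 8, 16, 32, 64):
    print(n, scaled_hyperparams(n))
```

In a Horovod script, the value passed as `num_gpus` would typically come from `hvd.size()`, the total number of worker processes.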

$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py
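If the host list changes often, the same command line can be assembled programmatically. The sketch below only rebuilds the flags already shown in the command above; `build_mpirun_command` is a hypothetical helper, and the host names are the ones from the example.

```python
def build_mpirun_command(hosts, script):
    """Build the mpirun argument list.

    hosts maps each hostname to the number of processes (GPUs) on it.
    """
    total_processes = sum(hosts.values())
    host_list = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    return [
        "mpirun", "-np", str(total_processes),
        "-H", host_list,
        "-bind-to", "none", "-map-by", "slot",
        "-mca", "pml", "ob1", "-mca", "btl", "^openib",
        "python", script,
    ]


hosts = {"server1": 4, "server2": 4, "server3": 4, "server4": 4}
cmd = build_mpirun_command(hosts, "mxnet_imagenet_resnet50.py")
print(" ".join(cmd))
```

Keeping the command as a list of arguments means it can be handed to `subprocess.run(cmd)` without going through a shell.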

Conclusion

In this article, we looked at a scalable approach to distributed model training with Apache MXNet and Horovod. We demonstrated its scaling efficiency and cost-effectiveness compared with the parameter server approach, training the ResNet50-v1 model on the ImageNet dataset. We also covered the steps you can follow to modify an existing script to run multi-instance training with Horovod.

If you are just getting started with MXNet and deep learning, go to the MXNet installation page to build MXNet first. We also strongly recommend reading the article MXNet in 60 minutes to get started.

If you have already worked with MXNet and want to try distributed training with Horovod, take a look at the Horovod installation page, build it with MXNet support, and follow the MNIST or ImageNet example.

* cost is calculated based on AWS hourly rates for EC2 instances

Learn more about the course "Industrial ML on Big Data"

Source: www.hab.com
