Distributed Learning with Apache MXNet and Horovod

This translation of the article was prepared ahead of the start of the course "Industrial ML on Big Data".

Distributed training on multiple high-performance compute instances can cut the training time of modern deep neural networks on large datasets from weeks down to hours or even minutes, which makes this training technique dominant in practical applications of deep learning. Users have to understand how data is shared and synchronized across multiple instances, which in turn has a major impact on scaling efficiency. In addition, users also need to know how to take a training script that runs on a single instance and deploy it to multiple instances.

In this article we will look at a fast and easy way to distribute training using the open-source deep learning library Apache MXNet and the Horovod distributed training framework. We will clearly show the performance benefits of the Horovod framework and demonstrate how to write an MXNet training script so that it runs in a distributed fashion with Horovod.

What is Apache MXNet

Apache MXNet is an open-source deep learning framework used to create, train, and deploy deep neural networks. MXNet abstracts away the complexity involved in implementing neural networks, performs well, scales well, and offers APIs for popular programming languages such as Python, C++, Clojure, Java, Julia, R, Scala, and others.

Distributed training in MXNet with a parameter server

The standard distributed training module in MXNet uses a parameter-server approach. It relies on a set of parameter servers to collect the gradients from every worker, perform the aggregation, and send the updated gradients back to the workers for the next optimization iteration. Determining the right ratio of servers to workers is the key to effective scaling. With only a single parameter server, the server can become a computational bottleneck. Conversely, if too many servers are used, the many-to-many communication can saturate all the network connections.
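For context, here is a minimal sketch of what this parameter-server path looks like with the Gluon API. It assumes the script is started through MXNet's launch.py tool so that the scheduler, server, and worker roles are assigned via environment variables; the model and hyperparameters are only placeholders.

import mxnet as mx
from mxnet import gluon

# Placeholder model; any Gluon network works here
net = gluon.nn.Dense(10)
net.initialize(ctx=mx.cpu())

# 'dist_sync' creates a distributed key-value store: each worker pushes its
# gradients to the parameter servers and pulls the aggregated update back
kv = mx.kvstore.create('dist_sync')

# Passing the kvstore to the Trainer routes every parameter update through
# the parameter servers instead of updating locally
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore=kv)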

What is Horovod

Horovod is an open-source distributed deep learning framework developed at Uber. It leverages efficient cross-GPU and cross-node communication technologies such as the NVIDIA Collective Communications Library (NCCL) and the Message Passing Interface (MPI) to distribute and aggregate model parameters across workers. It makes optimal use of network bandwidth and scales well when working with deep neural network models. It currently supports several popular machine learning frameworks, namely MXNet, TensorFlow, Keras, and PyTorch.

Integrating MXNet with Horovod

MXNet integrates with Horovod through the distributed training APIs defined in Horovod. The Horovod communication APIs horovod.broadcast(), horovod.allgather(), and horovod.allreduce() are implemented as asynchronous callbacks of the MXNet engine, as part of its task graph. This way, the data dependencies between communication and computation are handled by the MXNet engine itself, which avoids performance losses due to synchronization. The distributed optimizer defined in Horovod, horovod.DistributedOptimizer, extends MXNet's Optimizer so that it calls the corresponding Horovod APIs for distributed parameter updates. All of these implementation details are transparent to end users.
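As a small illustration of these communication primitives, the hypothetical sketch below (run under mpirun, as in the quick start that follows) averages a tensor across workers with allreduce; the call is enqueued on MXNet's asynchronous engine and only has to be complete when the result is actually read.

import mxnet as mx
import horovod.mxnet as hvd

hvd.init()

# Each worker holds its own copy of a tensor (e.g. a local gradient)
local_grad = mx.nd.ones((2, 2)) * hvd.rank()

# allreduce is scheduled as an asynchronous operation on the MXNet engine;
# with average=True the result is the mean over all workers
avg_grad = hvd.allreduce(local_grad, average=True, name='grad')
avg_grad.wait_to_read()
print(hvd.rank(), avg_grad.asnumpy())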

Quick start

You can quickly start training a small convolutional network on the MNIST dataset using MXNet and Horovod on your MacBook.
First, install mxnet and horovod from PyPI:

pip install mxnet
pip install horovod

Note: if you encounter an error during pip install horovod, you may need to add the variable MACOSX_DEPLOYMENT_TARGET=10.vv, where vv is the version of your MacOS release; for example, for MacOSX Sierra you would write MACOSX_DEPLOYMENT_TARGET=10.12 pip install horovod

Then install OpenMPI from here.

Finally, download the test script mxnet_mnist.py from here and run the following command in a MacBook terminal, in the working directory:

mpirun -np 2 -H localhost:2 -bind-to none -map-by slot python mxnet_mnist.py

This will run training on two of your CPU cores. The output will look like this:

INFO:root:Epoch[0] Batch [0-50] Speed: 2248.71 samples/sec      accuracy=0.583640
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.89 samples/sec      accuracy=0.882812
INFO:root:Epoch[0] Batch [50-100] Speed: 2273.39 samples/sec      accuracy=0.870000

Performance demonstration

When training a ResNet50-v1 model on the ImageNet dataset on 64 GPUs, using eight p3.16xlarge EC2 instances, each carrying 8 NVIDIA Tesla V100 GPUs, in the AWS cloud, we achieved a training throughput of 45,000 images/sec (that is, the number of training samples processed per second). Training completed in 44 minutes after 90 epochs with a best validation accuracy of 75.7%.

We compared this with MXNet's parameter-server approach to distributed training on 8, 16, 32, and 64 GPUs, with a single parameter server and with server-to-worker ratios of 1 to 1 and 2 to 1, respectively. You can see the results in Figure 1 below. On the left y-axis, the bars represent the number of images trained per second; the lines show the scaling efficiency (that is, the ratio of actual to ideal throughput) on the right y-axis. As you can see, the choice of the number of servers directly affects scaling efficiency. With only one parameter server, scaling efficiency drops to 38% on 64 GPUs. To achieve the same scaling efficiency as with Horovod, the number of servers has to be doubled relative to the number of workers.
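The scaling efficiency plotted on the right axis is simply the measured throughput divided by the ideal throughput (single-GPU throughput multiplied by the number of GPUs). A tiny illustration with made-up numbers, not the measured values behind Figure 1:

# Hypothetical throughput figures, for illustration only
single_gpu_throughput = 700.0    # images/sec on 1 GPU
measured_throughput = 40000.0    # images/sec on 64 GPUs
n_gpus = 64

ideal_throughput = n_gpus * single_gpu_throughput
efficiency = measured_throughput / ideal_throughput
print(f"scaling efficiency: {efficiency:.0%}")   # ~89% in this made-up case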

Figure 1. Comparison of distributed training using MXNet with Horovod and with a parameter server

In Table 1 below, we compare the final cost per instance when running the experiments on 64 GPUs. Using MXNet with Horovod provides the best throughput at the lowest cost.

Table 1. Cost comparison between Horovod and Parameter Server with a server-to-worker ratio of 2 to 1.

Steps to reproduce

In the following steps, we will show you how to reproduce the distributed training result using MXNet and Horovod. To learn more about distributed training with MXNet, read this post.

Step 1

Create a cluster of homogeneous instances with MXNet version 1.4.0 or higher and Horovod version 0.16.0 or higher installed in order to use distributed training. You will also need to install the libraries for GPU training. For our instances we chose Ubuntu 16.04 Linux, with GPU Driver 396.44, CUDA 9.2, the cuDNN 7.2.1 library, the NCCL 2.2.13 communicator, and OpenMPI 3.1.1. You can also use the Amazon Deep Learning AMI, where these libraries come pre-installed.
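Before launching anything distributed, it can help to check the installation on every node. The following sanity check is only a sketch, under the assumption that MXNet and Horovod were installed as described above.

# Run on every node to confirm the versions required for this walkthrough
import mxnet as mx
import horovod
import horovod.mxnet as hvd

print("MXNet version:  ", mx.__version__)        # expect 1.4.0 or higher
print("Horovod version:", horovod.__version__)   # expect 0.16.0 or higher
print("GPUs visible:   ", mx.context.num_gpus())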

Step 2

Add support for the Horovod API to your MXNet training script. The script below, based on the MXNet Gluon API, can be used as a simple template. The lines in bold are the ones you need if you already have a corresponding training script. Here are a few key changes you need to make in order to train with Horovod:

  • Set the context according to the local Horovod rank (line 8) so that training is performed on the correct GPU.
  • Broadcast the initial parameters from one worker to all of them (line 18) to make sure every worker starts with the same initial parameters.
  • Create a Horovod DistributedOptimizer (line 25) so that the parameters are updated in a distributed fashion.

For the complete scripts, please refer to the Horovod-MXNet MNIST and ImageNet examples.

1  import mxnet as mx
2  import horovod.mxnet as hvd
3
4  # Horovod: initialize Horovod
5  hvd.init()
6
7  # Horovod: pin the GPU to be used according to the local rank
8  context = mx.gpu(hvd.local_rank())
9
10 # Build model
11 model = ...
12
13 # Initialize parameters
14 model.initialize(initializer, ctx=context)
15 params = model.collect_params()
16
17 # Horovod: broadcast parameters
18 hvd.broadcast_parameters(params, root_rank=0)
19
20 # Create optimizer
21 optimizer_params = ...
22 opt = mx.optimizer.create('sgd', **optimizer_params)
23
24 # Horovod: wrap optimizer with DistributedOptimizer
25 opt = hvd.DistributedOptimizer(opt)
26
27 # Create trainer and loss function
28 trainer = mx.gluon.Trainer(params, opt, kvstore=None)
29 loss_fn = ...
30
31 # Train model
32 for epoch in range(num_epoch):
33    ...
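The training loop itself is elided in the template. A minimal, hypothetical body in the standard Gluon style might look like the sketch below, where the data iterator, loss function, and batch size are placeholders.

from mxnet import autograd

for epoch in range(num_epoch):
    for data, label in train_data:                 # assumed DataLoader
        data = data.as_in_context(context)
        label = label.as_in_context(context)
        with autograd.record():
            output = model(data)
            loss = loss_fn(output, label)
        loss.backward()
        # trainer.step() applies the update; with DistributedOptimizer this
        # is where Horovod's allreduce on the gradients happens
        trainer.step(data.shape[0])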

Step 3

Log in to one of the worker nodes to start distributed training with the MPI command. In this example, distributed training runs on four instances with 4 GPUs each, for a total of 16 GPUs in the cluster. The Stochastic Gradient Descent (SGD) optimizer is used with the following hyperparameters:

  • mini-batch size: 256
  • learning rate: 0.1
  • momentum: 0.9
  • weight decay: 0.0001

As we scaled from one GPU to 64 GPUs, we scaled the learning rate linearly with the number of GPUs (from 0.1 for 1 GPU to 6.4 for 64 GPUs), while keeping the number of images per GPU at 256 (from a batch of 256 images for 1 GPU to 16,384 for 64 GPUs). The weight decay and momentum parameters were adjusted as the number of GPUs grew. We used mixed precision training with the float16 data type for the forward pass and float32 for the gradients, to speed up the float16 computations supported by NVIDIA Tesla GPUs.
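As a sketch of how these scaled hyperparameters can be wired into the template from Step 2 (assuming the same values listed above), the optimizer parameters might be built like this; multi_precision is MXNet SGD's flag for keeping a float32 master copy of the weights when training in float16.

import mxnet as mx
import horovod.mxnet as hvd

hvd.init()

# Linear learning-rate scaling: 256 images per GPU, so the base rate of
# 0.1 on a single GPU becomes 0.1 * hvd.size() (6.4 on 64 workers)
base_lr = 0.1
optimizer_params = {
    'learning_rate': base_lr * hvd.size(),
    'momentum': 0.9,
    'wd': 0.0001,
    # keep a float32 master copy of the weights when training in float16
    'multi_precision': True,
}
opt = mx.optimizer.create('sgd', **optimizer_params)
opt = hvd.DistributedOptimizer(opt)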

$ mpirun -np 16 \
    -H server1:4,server2:4,server3:4,server4:4 \
    -bind-to none -map-by slot \
    -mca pml ob1 -mca btl ^openib \
    python mxnet_imagenet_resnet50.py

Conclusion

In this article we looked at a scalable approach to distributed model training using Apache MXNet and Horovod. We showed its scaling efficiency and cost effectiveness compared to the parameter-server approach on the ImageNet dataset, on which a ResNet50-v1 model was trained. We also included the steps you can use to modify an existing script so that it runs multi-instance training with Horovod.

If you are just getting started with MXNet and deep learning, go to the MXNet installation page to build MXNet first. We also highly recommend the article MXNet in 60 minutes as a starting point.

If you already work with MXNet and want to try distributed training with Horovod, then take a look at the Horovod installation page, build Horovod with MXNet support, and follow the MNIST or ImageNet example.

*Costs are calculated based on AWS hourly pricing for EC2 instances

Learn more about the course "Industrial ML on Big Data"

source: www.habr.com
