OpenVINO hackathon: recognizing voice and emotions on Raspberry Pi

On November 30 - December 1, the OpenVINO hackathon was held in Nizhny Novgorod. Participants were asked to build a prototype of a product solution using the Intel OpenVINO toolkit. The organizers proposed a list of sample topics to help teams choose a task, but the final decision was left to the teams. In addition, the use of models not included in the product was encouraged.


In this article we describe how we built our product prototype, with which we ended up taking first place.

More than 10 teams took part in the hackathon. It was nice that some of them came from other regions. The venue was the "Kremlinsky on Pochain" complex, where old photographs of Nizhny Novgorod were hung in a row inside! (A reminder: Intel's central office is currently located in Nizhny Novgorod.) Participants were given 26 hours to write code, and at the end they had to present their solution. A separate advantage was the demo session, which made sure that everything planned was actually implemented and did not remain an idea on a slide. Merchandise, snacks, meals: all of that was there too!

In addition, Intel optionally provided cameras, Raspberry Pi boards and Neural Compute Stick 2 devices.

Choosing a task

One of the hardest parts of preparing for a free-form hackathon is choosing a challenge. We decided right away to come up with something that was not yet in the product, since the announcement said this was highly welcome.

Having looked through the models included in the current release of the product, we concluded that most of them solve various computer vision problems. Moreover, it is very hard to come up with a computer vision problem that cannot be solved with OpenVINO, and even if one can be invented, it is hard to find pretrained models for it in the public domain. We decided to dig in another direction: speech processing and analytics. Consider the interesting task of recognizing emotions from speech. It must be said that OpenVINO already has a model that determines a person's emotions from their face, but:

  • In theory, it is possible to build a combined algorithm that works on both sound and image, which should increase accuracy.
  • Cameras usually have a narrow viewing angle, so more than one camera is needed to cover a large area; sound has no such limitation.

Let's develop the idea, taking the retail segment as a basis. You can measure customer satisfaction at store checkouts. If a customer is unhappy with the service and starts raising their voice, an administrator can be called for help immediately.
In this case we also need to recognize the person by voice: this lets us tell store employees apart from customers and provide analytics for each individual. On top of that, it becomes possible to analyze the behavior of the store employees themselves and assess the atmosphere in the team. Sounds good!

We set the requirements for our solution:

  • Small size of the target device
  • Real-time operation
  • Low price
  • Easy scalability

As a result, we chose a Raspberry Pi 3 with an Intel NCS 2 as the target device.

One important feature of the NCS is worth noting here: it works best with standard CNN architectures, but if you need to run a model with custom layers, expect to do low-level optimization.

There is just one small thing left: we need a microphone. A regular USB microphone would work, but it would not look good next to the RPi. Here too the solution was literally lying nearby. To record voice, we decided to use the Voice Bonnet board from the Google AIY Voice Kit, which has a wired stereo microphone.

Download Raspbian from the AIY projects repository and write it to a flash drive, then check that the microphone works with the following command (it records 5 seconds of audio and saves it to a file):

arecord -d 5 -r 16000 test.wav
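To confirm that the test recording came out as expected, the file header can be read back from Python. This is only a sketch using the standard `wave` module; the helper name `wav_info` is ours and is not part of the project.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, sample_width_bytes, duration_s) of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, wav.getsampwidth(), frames / rate
```

Calling `wav_info("test.wav")` after the recording above should report a 16000 Hz sample rate and a duration of about 5 seconds.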

I should note right away that the microphone is very sensitive and picks up noise well. To fix this, open alsamixer, select the capture devices and lower the input signal level to 50-60%.

We file down the case and everything fits; it can even be closed with the lid

Adding an indicator button

While taking the AIY Voice Kit apart, we remembered that it includes an RGB button whose backlight can be controlled from software. We searched for "Google AIY Led" and found the documentation: https://aiyprojects.readthedocs.io/en/latest/aiy.leds.html
Why not use this button to display the recognized emotion? We have only 7 classes and the button has 8 colors, just enough!

We connect the button to the Voice Bonnet via GPIO and load the required libraries (they are already installed in the distribution from the AIY projects):

from aiy.leds import Leds, Color
from aiy.leds import RgbLeds

Let's create a dict in which each emotion has a corresponding color as an RGB tuple, and an object of class aiy.leds.Leds through which we will update the color:

led_dict = {'neutral': (255, 255, 255), 'happy': (0, 255, 0), 'sad': (0, 255, 255), 'angry': (255, 0, 0), 'fearful': (0, 0, 0), 'disgusted':  (255, 0, 255), 'surprised':  (255, 255, 0)} 
leds = Leds()

Finally, after each new emotion prediction, we update the button color to match it (by key).

leds.update(Leds.rgb_on(led_dict.get(classes[prediction])))

Button, light up!
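The "8 colors" are presumably the combinations obtained by switching each of the three RGB channels fully on or off. The sketch below checks that the seven emotion colors fit into that set and adds a lookup with a fallback, since a plain `dict.get` returns `None` for an unknown label. The names `BASIC_COLORS`, `LED_COLORS` and `color_for` are ours, introduced only for illustration.

```python
from itertools import product

# The eight colours obtainable by switching each RGB channel fully on or off.
BASIC_COLORS = set(product((0, 255), repeat=3))

# Same mapping as led_dict in the article.
LED_COLORS = {
    'neutral': (255, 255, 255),
    'happy': (0, 255, 0),
    'sad': (0, 255, 255),
    'angry': (255, 0, 0),
    'fearful': (0, 0, 0),
    'disgusted': (255, 0, 255),
    'surprised': (255, 255, 0),
}


def color_for(emotion, default=(0, 0, 0)):
    """Return the RGB tuple for an emotion label, or `default` if the label is unknown."""
    return LED_COLORS.get(emotion, default)
```

With this helper the update call could pass `color_for(classes[prediction])` to `Leds.rgb_on` instead of the raw `led_dict.get(...)` result.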

Working with voice

We use pyaudio to capture the stream from the microphone and webrtcvad to filter out noise and detect voice. In addition, we create a queue to which voice fragments are added and from which they are removed asynchronously.

webrtcvad limits the size of the fragment it accepts to 10, 20 or 30 ms, and the emotion recognition model (as we will see later) was trained on a 48 kHz dataset, so we capture chunks of 48000 × 20 ms / 1000 × 1 (mono) = 960 samples. For each of these chunks webrtcvad returns True or False, corresponding to the presence or absence of voice in the chunk.
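The chunk-size arithmetic can be written out as a small helper. This is a sketch; the function name `frame_size` is ours. Note that 960 is a count of samples: with 16-bit audio each sample takes 2 bytes, so one chunk is 1920 bytes of raw data.

```python
def frame_size(sample_rate_hz, frame_ms, channels=1, sample_width_bytes=2):
    """Return (samples, bytes) for one audio frame of the given duration."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad accepts only 10, 20 or 30 ms frames")
    samples = sample_rate_hz * frame_ms // 1000 * channels
    return samples, samples * sample_width_bytes
```

For the settings used here, `frame_size(48000, 20)` gives 960 samples (1920 bytes).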

We implement the following logic:

  • Chunks that contain voice are appended to a list; if a chunk has no voice, we increment a counter of empty chunks.
  • If the counter of empty chunks is >= 30 (600 ms), we look at the size of the list of accumulated chunks; if it is > 250, we put it in the queue; if not, we consider the recording too short to feed to the model for speaker identification.
  • If the counter of empty chunks is still < 30 and the list of accumulated chunks exceeds 300, we put the fragment in the queue anyway, for a more accurate prediction (because emotions tend to change over time).

import queue

import numpy as np
import pyaudio
import webrtcvad

process = True  # set to False to stop the capture thread
framesQueue = queue.Queue()

def to_queue(frames):
    # Join the raw 16-bit chunks into a single int16 array
    return np.frombuffer(b''.join(frames), dtype=np.int16)

def framesThreadBody():
    CHUNK = 960              # samples per 20 ms frame at 48 kHz
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000

    p = pyaudio.PyAudio()
    vad = webrtcvad.Vad()
    vad.set_mode(2)          # aggressiveness, from 0 (least) to 3 (most)
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    false_counter = 0        # consecutive chunks without voice
    audio_frame = []         # voiced chunks collected so far
    while process:
        data = stream.read(CHUNK)
        if vad.is_speech(data, RATE):
            false_counter = 0
            audio_frame.append(data)
            if len(audio_frame) > 300:
                # Long utterance: queue what we have and keep listening
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
        else:
            false_counter += 1
            if false_counter >= 30 and len(audio_frame) > 250:
                # About 600 ms of silence after enough speech: queue it
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
                false_counter = 0
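The grouping rules can be tried out without a microphone by replaying them over a plain sequence of True/False voice decisions. The sketch below mirrors the thresholds listed above (30, 250 and 300 chunks, that is 0.6 s, 5 s and 6 s at 20 ms per chunk); the function name `segment_lengths` and its parameters are ours.

```python
def segment_lengths(speech_flags, silence_limit=30, min_len=250, max_len=300):
    """Replay the chunk-grouping rules over a sequence of VAD decisions.

    Returns the lengths (in chunks) of the segments that would be queued.
    """
    queued = []
    silent = 0      # consecutive chunks without voice
    collected = 0   # voiced chunks gathered so far
    for is_speech in speech_flags:
        if is_speech:
            silent = 0
            collected += 1
            if collected > max_len:
                # Long utterance: flush early
                queued.append(collected)
                collected = 0
        else:
            silent += 1
            if silent >= silence_limit and collected > min_len:
                # Enough speech followed by a long enough pause
                queued.append(collected)
                collected = 0
                silent = 0
    return queued
```

For example, 260 voiced chunks followed by 30 silent ones produce one queued segment, while 100 voiced chunks followed by the same pause produce none.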

It is time to look for pretrained models in the public domain: go to GitHub, search Google, but keep in mind that we are restricted in the architectures we can use. This is a rather difficult part, because you have to test the models on your own input data and, on top of that, convert them to OpenVINO's internal format, IR (Intermediate Representation). We tried about 5-7 different solutions from GitHub, and while the emotion recognition model worked right away, voice recognition took us longer, since those models use more complex architectures.

We settled on the following:

Next we will talk about converting models, starting with a bit of theory. OpenVINO includes several modules:

  • Open Model Zoo, whose models can be used and included in your product
  • Model Optimizer, which converts a model from various framework formats (TensorFlow, ONNX, etc.) into the Intermediate Representation format that we will work with from here on
  • Inference Engine, which runs models in IR format on Intel processors, Myriad chips and Neural Compute Stick accelerators
  • The most efficient build of OpenCV (with Inference Engine support)

Each model in IR format is described by two files: .xml and .bin.
Models are converted to IR format with Model Optimizer as follows:

    python /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model speaker.hdf5.pb --data_type=FP16 --input_shape [1,512,1000,1]

--data_type selects the data type the model will work with. OpenVINO supports FP32, FP16 and INT8 (FP16 is used here). Choosing the right data type can give a good performance boost.
--input_shape specifies the dimensions of the input data. The ability to change it dynamically seems to be available in the C++ API, but we did not dig that deep and simply fixed it for one of the models.
Next, let's try to load the model already converted to IR format through the DNN module in OpenCV and run a forward pass through it.

    import cv2 as cv
    emotionsNet = cv.dnn.readNet('emotions_model.bin',
                              'emotions_model.xml')
    emotionsNet.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)

The last line redirects the computation to the Neural Compute Stick. By default the computation is done on the CPU, but on a Raspberry Pi that will not work; you need the stick.

The logic from here is as follows: we split our audio into windows of a fixed size (0.4 s in our case), convert each window to MFCC features and feed those to the network:

    emotionsNet.setInput(MFCC_from_window)
    result = emotionsNet.forward()
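The steps around this call, cutting the signal into windows and then reducing the per-window outputs to a single label, might look roughly like the sketch below. It assumes non-overlapping windows and a simple majority vote; the helper names `split_windows` and `majority_class` are ours, and the MFCC computation itself is left out.

```python
from collections import Counter


def split_windows(signal, sample_rate_hz, window_s=0.4):
    """Cut a signal into consecutive, non-overlapping windows.

    Works on lists and NumPy arrays alike; a trailing remainder shorter
    than one window is dropped.
    """
    size = int(round(sample_rate_hz * window_s))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def majority_class(class_indices):
    """Return the most frequent class index among per-window predictions."""
    return Counter(class_indices).most_common(1)[0][0]
```

At 48 kHz a 0.4 s window is 19200 samples, so one second of audio yields two full windows and the remaining 0.2 s is discarded.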

Next, we take the most frequent class across all windows. It is a simple solution, but for a hackathon there is no need to invent anything too sophisticated unless you have the time. We still have a lot of work ahead, so let's move on to voice recognition. We need some kind of database that stores spectrograms of pre-recorded voices. Since little time is left, we solve this as best we can.

Namely, we create a script for recording a voice sample (it works the same way as described above, except that when interrupted from the keyboard it saves the voice to a file).

Let's try it:

    python3 voice_db/record_voice.py test.wav

We record the voices of several people (in our case, three team members).
Then, for each recorded voice, we perform a fast Fourier transform, obtain a spectrogram and save it as a numpy array (.npy):

    for file in glob.glob("voice_db/*.wav"):
            spec = get_fft_spectrum(file)
            np.save(file[:-4] + '.npy', spec)

More details are in the file create_base.py.
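For readers without the repository at hand, a magnitude spectrogram can be sketched with NumPy alone. This is not the project's `get_fft_spectrum`; the function name `magnitude_spectrogram`, the frame length and the hop size are assumptions chosen for illustration, and the real preprocessing may differ.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Short-time FFT magnitudes, shape (frame_len // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Slice overlapping frames and taper each one before the FFT
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

Each column of the result is the spectrum of one frame, so the array can be saved with `np.save` in the same way as above.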
As a result, when we run the main script, we compute embeddings from these spectrograms right at the start:

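    enroll_embs = []  # one embedding per enrolled voice
    for file in glob.glob("voice_db/*.npy"):
        spec = np.load(file)
        spec = spec.astype('float32')
        # Shape the spectrogram as a (batch, channel, height, width) blob
        spec_reshaped = spec.reshape(1, 1, spec.shape[0], spec.shape[1])
        srNet.setInput(spec_reshaped)
        pred = srNet.forward()
        emb = np.squeeze(pred)
        enroll_embs.append(emb)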

After obtaining an embedding from an audio segment, we can determine whose voice it is by taking the cosine distance from the segment to every voice in the database (the smaller, the more likely). For the demo we set the threshold to 0.3:

            dist_list = cdist(emb, enroll_embs, metric="cosine")
            distances = pd.DataFrame(dist_list, columns = df.speaker)
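The matching step can be sketched with NumPy alone, as below. How exactly the 0.3 threshold is applied in the project is an assumption here: the sketch treats any best match farther than the threshold as an unknown speaker. The function name `identify_speaker` and its arguments are ours.

```python
import numpy as np


def identify_speaker(emb, enroll_embs, names, threshold=0.3):
    """Return (name, distance) of the closest enrolled voice.

    The name is None when even the best match is farther than `threshold`.
    """
    emb = np.asarray(emb, dtype=np.float64).ravel()
    enrolled = np.atleast_2d(np.asarray(enroll_embs, dtype=np.float64))
    # Cosine distance = 1 - cosine similarity
    sims = enrolled @ emb / (np.linalg.norm(enrolled, axis=1) * np.linalg.norm(emb))
    dists = 1.0 - sims
    best = int(np.argmin(dists))
    name = names[best] if dists[best] <= threshold else None
    return name, float(dists[best])
```

Flattening the embedding first also sidesteps the shape question: a squeezed one-dimensional vector has to be made two-dimensional before it can be passed to `cdist`.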

Finally, I would like to note that inference was fast enough to leave room for 1-2 more models (for a 7-second sample, inference took 2.5 s). We no longer had time to add new models, so we focused on writing a prototype of the web application.

Web application

An important point: we brought a router from home and set up our own local network, which makes it easy to connect the device and the laptops over the network.

The backend is an end-to-end message channel between the front end and the Raspberry Pi, based on WebSocket technology (it runs over TCP and starts with an HTTP handshake).

The first stage is receiving processed information from the Raspberry Pi, that is, predictions packed into JSON. On their way through, they are saved to a database so that statistics on the user's emotional background over a period can be generated. The packet is then sent on to the front end, which subscribes to and receives packets from the WebSocket endpoint. The whole backend is written in Go; it was chosen because it is well suited to asynchronous tasks, which goroutines handle well.
When a client connects to the endpoint, the user is registered and added to a structure, and then their message is received. Both the user and the message go into a common hub, from which messages are sent on (to the subscribed front end). If a user closes the connection (the Raspberry Pi or the front end), their subscription is cancelled and they are removed from the hub.
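The hub idea can be illustrated with a small in-memory sketch. The real backend is written in Go on top of WebSockets; the Python below only shows the register, broadcast and unregister flow, and the class name `Hub`, the client ids and the JSON field names are all hypothetical.

```python
import json


class Hub:
    """In-memory stand-in for the hub: tracks subscribers and fans messages out."""

    def __init__(self):
        self._subscribers = {}  # client id -> callable that delivers one message

    def register(self, client_id, send):
        self._subscribers[client_id] = send

    def unregister(self, client_id):
        # Called when a client closes its connection
        self._subscribers.pop(client_id, None)

    def broadcast(self, sender_id, payload):
        # Serialize once and deliver to everyone except the sender
        message = json.dumps(payload)
        for client_id, send in list(self._subscribers.items()):
            if client_id != sender_id:
                send(message)
```

In this picture the Raspberry Pi and the front end are both registered clients: a prediction broadcast by the device reaches every subscribed front end until that front end disconnects.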

Waiting for a connection from the backend

The front end is a web application written in JavaScript using the React library to speed up and simplify development. Its purpose is to visualize the data obtained from the algorithms running on the backend and directly on the Raspberry Pi. The application has section routing implemented with react-router, but the page of interest is the main page, where a continuous stream of data is received from the server in real time over WebSocket. The Raspberry Pi detects a voice, determines whether it belongs to a specific person from the registered database, and sends a list of probabilities to the client. The client displays the latest relevant data, showing the avatar of the person who most likely spoke into the microphone together with the emotion with which they spoke.

Home page with updated predictions

Conclusion

We did not manage to finish everything as planned; we simply ran out of time, so our main hope was the demo, that everything would work. In the presentation we explained how everything works, which models we took and which problems we ran into. Next came the demo part: the experts walked around the room in random order and approached each team to look at the working prototype. They asked us questions too, each of us answered for his own part, we left the web page open on a laptop, and everything really did work as expected.

Let me note that the total cost of our solution was $150:

• Raspberry Pi 3 ~ $35
• Google AIY Voice Bonnet (a ReSpeaker board can be used instead) ~ $15
• Intel NCS 2 ~ $100

How to improve it:

• Use registration from the client: ask the user to read randomly generated text
• Add a few more models: gender and age can be determined from the voice
• Separate voices that sound at the same time (diarization)

Repository: https://github.com/vladimirwest/OpenEMO

Tired but happy

In conclusion, I would like to thank the organizers and the participants. Among the other teams' projects, we personally liked the solution for monitoring free parking spaces. For us it was a great experience of immersion in the product and in development. I hope more and more interesting events will be held in the regions, including on AI topics.

Source: www.habr.com
