OpenVINO hackathon: recognizing voice and emotion on Raspberry Pi

On November 30 - December 1, an OpenVINO hackathon took place in Nizhny Novgorod. Participants were asked to build a prototype of a product solution using the Intel OpenVINO toolkit. The organizers offered a list of approximate topics that could guide the choice of task, but the final decision remained with the teams. In addition, using models that are not included in the product was encouraged.

In this article we'll tell you how we built our prototype of the product, with which we ultimately took first place.

More than 10 teams took part in the hackathon. It's nice that some of them came from other regions. The venue was the "Kremlinsky on Pochain" complex, where old photographs of Nizhny Novgorod were hung inside, very fitting! (A reminder that at the moment Intel's central office is located in Nizhny Novgorod.) Participants were given 26 hours to write code, and at the end they had to present their solution. A separate plus was the demo session, which made sure that everything planned was actually implemented and did not remain just ideas on slides. Merch, snacks, food, everything was there too!

In addition, Intel provided cameras, Raspberry Pi boards and Neural Compute Stick 2 devices.

Choosing a task

One of the hardest parts of preparing for a free-form hackathon is choosing the task. We immediately decided to come up with something that was not yet in the product, since the announcement said this was highly welcome.

Having analyzed the models included in the current release of the product, we came to the conclusion that most of them solve various computer vision problems. Moreover, it is very hard to come up with a computer vision problem that cannot be solved using OpenVINO, and even if one can be invented, it is hard to find pre-trained models for it in the public domain. We decided to dig in another direction: speech processing and analytics. Let's consider the interesting task of recognizing emotion from speech. It must be said that OpenVINO already has a model that determines a person's emotions from their face, but:

  • In theory, it is possible to build a combined algorithm that works on both sound and image, which should give a gain in accuracy.
  • Cameras usually have a narrow viewing angle, so more than one camera is needed to cover a large area; sound has no such limitation.

Let's develop the idea and take the retail segment as a basis. You can measure customer satisfaction at store checkouts. If one of the customers is dissatisfied with the service and starts raising their voice, you can immediately call the administrator for help.
In this case we need to add speaker recognition, which will let us distinguish store employees from customers and provide analytics for each individual. On top of that, it will be possible to analyze the behavior of the store employees themselves and assess the atmosphere in the team. Sounds good!

We formulate the requirements for our solution:

  • Small size of the target device
  • Real-time operation
  • Low price
  • Easy scalability

As a result, we choose a Raspberry Pi 3 with an Intel NCS 2 as the target device.

Here it is important to note one feature of the NCS: it works best with standard CNN architectures, but if you need to run a model with custom layers on it, be prepared for low-level optimization work.

There's just one small thing left: we need a microphone. A regular USB microphone will do, but it won't look good together with the RPi. Even here, though, the solution literally "lies nearby." To record voice, we decided to use the Voice Bonnet board from the Google AIY Voice Kit, which has a wired stereo microphone.

Download Raspbian from the AIY projects repository and write it to a flash drive, then test that the microphone works using the following command (it records 5 seconds of audio and saves it to a file):

arecord -d 5 -r 16000 test.wav

I should note right away that the microphone is very sensitive and picks up noise well. To fix this, let's go to alsamixer, select the Capture devices and reduce the input signal level to 50-60%.

We modify the case with a file and everything fits, you can even close it with the lid

Adding an indicator button

While taking the AIY Voice Kit apart, we remember that it has an RGB button whose backlight can be controlled from software. We search for "Google AIY Led" and find the documentation: https://aiyprojects.readthedocs.io/en/latest/aiy.leds.html
Why not use this button to display the recognized emotion? We have only 7 classes, and the button has 8 colors, just enough!

We connect the button via GPIO to the Voice Bonnet and load the necessary libraries (they are already installed in the distribution from the AIY projects)

from aiy.leds import Leds, Color
from aiy.leds import RgbLeds

Let's create a dict in which each emotion has a corresponding color as an RGB tuple, and an object of class aiy.leds.Leds through which we'll update the color:

led_dict = {'neutral': (255, 255, 255), 'happy': (0, 255, 0), 'sad': (0, 255, 255), 'angry': (255, 0, 0), 'fearful': (0, 0, 0), 'disgusted':  (255, 0, 255), 'surprised':  (255, 255, 0)} 
leds = Leds()

And finally, after each new emotion prediction, we'll update the button color to match it (by key).

leds.update(Leds.rgb_on(led_dict.get(classes[prediction])))
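One detail worth guarding against: led_dict.get returns None for a label that is not in the table, and that value would then be passed on to Leds.rgb_on. Below is a minimal sketch of a lookup with a fallback color. The helper name color_for and the constant UNKNOWN_COLOR are our own illustrative names, not part of the AIY library; blue is picked only because it is the one on/off RGB combination the table above does not use.

```python
# Emotion-to-colour table copied from the article.
LED_COLORS = {
    'neutral': (255, 255, 255),
    'happy': (0, 255, 0),
    'sad': (0, 255, 255),
    'angry': (255, 0, 0),
    'fearful': (0, 0, 0),      # all channels off: the button stays dark
    'disgusted': (255, 0, 255),
    'surprised': (255, 255, 0),
}

# Hypothetical fallback colour (blue) for labels missing from the table.
UNKNOWN_COLOR = (0, 0, 255)


def color_for(label):
    """Return the RGB tuple for an emotion label, or the fallback colour."""
    return LED_COLORS.get(label, UNKNOWN_COLOR)
```

With such a helper the update call would read leds.update(Leds.rgb_on(color_for(classes[prediction]))).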

Button, light up!

Working with voice

We'll use pyaudio to capture the stream from the microphone and webrtcvad to filter out noise and detect voice. In addition, we'll create a queue to which we'll asynchronously add voice fragments and from which we'll remove them.

Since webrtcvad limits the size of the fragment it is given, which must be exactly 10, 20 or 30 ms long, and the emotion recognition model was trained (as we'll see later) on a 48 kHz dataset, we'll capture chunks of 48000 × 20 ms / 1000 × 1 (mono) = 960 samples, which is 1920 bytes of 16-bit audio. Webrtcvad returns True/False for each of these chunks, which corresponds to the presence or absence of voice in the chunk.
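The chunk-size arithmetic can be checked with a few lines of Python. This is only an illustrative helper (vad_frame_size is our own name, not a webrtcvad function); it assumes 16-bit PCM samples, as in pyaudio.paInt16.

```python
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM, matching pyaudio.paInt16


def vad_frame_size(rate_hz, frame_ms, channels=1):
    """Return (samples, bytes) for one VAD frame of the given duration."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad accepts only 10, 20 or 30 ms frames")
    samples = rate_hz * frame_ms // 1000 * channels
    return samples, samples * SAMPLE_WIDTH_BYTES


samples, size_bytes = vad_frame_size(48000, 20)
print(samples, size_bytes)  # 960 1920
```

The first number is what goes into frames_per_buffer and stream.read, since pyaudio counts frames (samples per channel) rather than bytes.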

Let's implement the following logic:

  • We add to a list the chunks that contain voice; if a chunk has no voice, we increment the counter of empty chunks.
  • If the counter of empty chunks is >= 30 (600 ms), we look at the size of the list of accumulated chunks; if it is > 250, we add it to the queue; if not, we consider the recording too short to feed to the speaker identification model.
  • If the counter of empty chunks is still < 30 and the size of the list of accumulated chunks exceeds 300, we add the fragment to the queue for a more accurate prediction (because emotions tend to change over time).

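import queue

import numpy as np
import pyaudio
import webrtcvad

process = True  # set to False from another thread to stop capturing


def to_queue(frames):
    # Join the raw 16-bit chunks into a single int16 sample array.
    return np.frombuffer(b''.join(frames), dtype=np.int16)


framesQueue = queue.Queue()


def framesThreadBody():
    CHUNK = 960              # samples per read: 20 ms at 48 kHz
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000

    p = pyaudio.PyAudio()
    vad = webrtcvad.Vad()
    vad.set_mode(2)          # aggressiveness, from 0 (least) to 3 (most)
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    false_counter = 0        # consecutive chunks without voice
    audio_frame = []         # voiced chunks collected so far
    while process:
        data = stream.read(CHUNK)
        if vad.is_speech(data, RATE):
            false_counter = 0
            audio_frame.append(data)
            # Flush long utterances so predictions keep up with changing emotion.
            if len(audio_frame) > 300:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
        else:
            false_counter += 1
            # After 600 ms of silence, queue the fragment if it is long enough.
            if false_counter >= 30 and len(audio_frame) > 250:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
                false_counter = 0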
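The segmentation rules are easier to check away from the microphone. Here is a minimal sketch that applies the same three rules to a pre-labelled stream of (chunk, is_speech) pairs, so it needs neither pyaudio nor webrtcvad. The function name segment and its parameter names are ours; the thresholds default to the values used above.

```python
def segment(chunks_with_flags, max_silence=30, min_len=250, max_len=300):
    """Yield lists of voiced chunks following the three rules above."""
    silence = 0   # consecutive chunks without voice
    voiced = []   # voiced chunks collected so far
    for chunk, is_speech in chunks_with_flags:
        if is_speech:
            silence = 0
            voiced.append(chunk)
            # Rule 3: flush a long utterance even without a pause.
            if len(voiced) > max_len:
                yield voiced
                voiced = []
        else:
            silence += 1
            # Rule 2: a long enough pause closes a long enough fragment.
            if silence >= max_silence and len(voiced) > min_len:
                yield voiced
                voiced = []
                silence = 0


# 260 voiced chunks followed by 600 ms of silence give one fragment.
stream = [(i, True) for i in range(260)] + [(None, False)] * 30
print([len(s) for s in segment(stream)])  # [260]
```

As in the capture loop, a fragment that is too short is not discarded: it stays in the buffer and is extended by the next voiced chunks.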

It's time to look for pre-trained models in the public domain: we go to github and Google, keeping in mind that we are limited in the architectures we can use. This is a rather difficult part, because you have to test the models on your own input data and, in addition, convert them to OpenVINO's internal format, IR (Intermediate Representation). We tried about 5-7 different solutions from github, and while the emotion recognition model worked right away, speaker recognition took longer: those models use more complex architectures.

We settled on the following:

Next, let's talk about converting models, starting with the theory. OpenVINO includes several modules:

  • Open Model Zoo, whose models can be used and included in your product
  • Model Optimizer, which converts a model from various framework formats (TensorFlow, ONNX, etc.) into the Intermediate Representation format that we'll work with from here on
  • Inference Engine, which runs models in IR format on Intel processors, Myriad chips and Neural Compute Stick accelerators
  • The most efficient build of OpenCV (with Inference Engine support)

Each model in IR format is described by two files: .xml and .bin.
Models are converted to IR format with Model Optimizer as follows:

python /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model speaker.hdf5.pb --data_type=FP16 --input_shape [1,512,1000,1]

--data_type lets you choose the data format the model will work with. FP32, FP16 and INT8 are supported. Choosing the optimal data type can give a good performance boost.
--input_shape specifies the dimensions of the input data. The ability to change it dynamically seems to be available in the C++ API, but we didn't dig that far and simply fixed it for one of the models.
Next, let's try loading the model already converted to IR format through the DNN module in OpenCV and run a forward pass through it.

import cv2 as cv
emotionsNet = cv.dnn.readNet('emotions_model.bin',
                             'emotions_model.xml')
emotionsNet.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)

The last line redirects the computations to the Neural Compute Stick. By default the computations run on the processor, but on the Raspberry Pi that won't work, so you need the stick.

Next, the logic is as follows: we split our audio into windows of a fixed size (0.4 s in our case), convert each of these windows to MFCC, and feed the result to the network:

emotionsNet.setInput(MFCC_from_window)
result = emotionsNet.forward()
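For illustration, here is a rough sketch of the windowing step and of one simple way to reduce per-window outputs to a single label, namely a majority vote over the per-window argmax. The function names are ours, the windows are assumed not to overlap, and each row of scores is assumed to hold one window's class scores. The MFCC extraction itself is left out, since it depends on the audio library in use.

```python
import numpy as np


def split_into_windows(signal, rate_hz, window_s=0.4):
    """Cut a 1-D signal into non-overlapping windows, dropping the remainder."""
    window = int(round(rate_hz * window_s))
    n = len(signal) // window
    return np.asarray(signal[:n * window]).reshape(n, window)


def majority_class(per_window_scores):
    """Return the class index predicted most often across windows."""
    winners = np.argmax(per_window_scores, axis=1)
    return int(np.bincount(winners).argmax())


# One second of audio at 48 kHz gives two full 0.4 s windows.
windows = split_into_windows(np.zeros(48000, dtype=np.int16), 48000)
print(windows.shape)  # (2, 19200)
```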

Next, we take the most common class over all windows. It's a simple solution, but for a hackathon you don't need to invent anything too sophisticated unless you have time to spare. We still have a lot of work to do, so let's move on and deal with speaker recognition. We need to create some kind of database that stores spectrograms of pre-recorded voices. Since there's little time left, we'll solve this as best we can.

Namely, we create a script for recording a voice fragment (it works the same way as described above, except that when interrupted from the keyboard it saves the voice to a file).

Let's try:

python3 voice_db/record_voice.py test.wav

We record the voices of several people (in our case, three team members).
Next, for each recorded voice we perform a fast Fourier transform, obtain a spectrogram and save it as a numpy array (.npy):

for file in glob.glob("voice_db/*.wav"):
    spec = get_fft_spectrum(file)
    np.save(file[:-4] + '.npy', spec)

More details are in the file create_base.py
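To give an idea of what such a transform involves, below is a minimal numpy-only magnitude spectrogram. It is not the get_fft_spectrum from the repository, whose framing and normalization may differ; the name simple_spectrogram and the defaults frame_len, hop and n_fft are assumptions chosen for the example. The signal must be at least frame_len samples long.

```python
import numpy as np


def simple_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Return a magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=np.float32)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft zero-pads each frame to n_fft and keeps the non-negative frequencies.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T


# One second of a 1 kHz tone sampled at 16 kHz.
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
print(simple_spectrogram(tone).shape)  # (257, 98)
```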
As a result, when we run the main script, we obtain embeddings from these spectrograms at the very start:

for file in glob.glob("voice_db/*.npy"):
    spec = np.load(file)
    spec = spec.astype('float32')
    spec_reshaped = spec.reshape(1, 1, spec.shape[0], spec.shape[1])
    srNet.setInput(spec_reshaped)
    pred = srNet.forward()
    emb = np.squeeze(pred)

After obtaining an embedding from a voiced segment, we can determine whose it is by taking the cosine distance from that segment to every voice in the database (the smaller, the more likely). For the demo we set the threshold to 0.3:

dist_list = cdist(emb, enroll_embs, metric="cosine")
distances = pd.DataFrame(dist_list, columns=df.speaker)
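As a sketch of how the distances and the threshold might be combined, here is a numpy-only version. The function names are ours, and treating anything farther than the threshold as an unknown speaker is our assumption about how the 0.3 value was applied.

```python
import numpy as np


def cosine_distances(emb, enrolled):
    """Cosine distance from one embedding to each row of `enrolled`."""
    emb = np.asarray(emb, dtype=np.float64)
    enrolled = np.asarray(enrolled, dtype=np.float64)
    norms = np.linalg.norm(enrolled, axis=1) * np.linalg.norm(emb)
    return 1.0 - enrolled @ emb / norms


def identify(emb, enrolled, names, threshold=0.3):
    """Return the closest enrolled name, or None if nothing is within threshold."""
    dists = cosine_distances(emb, enrolled)
    best = int(np.argmin(dists))
    return names[best] if dists[best] <= threshold else None


# Toy 2-D "embeddings" for two enrolled speakers (made-up values).
enrolled = [[1.0, 0.0], [0.0, 1.0]]
names = ['alice', 'bob']
print(identify([0.9, 0.1], enrolled, names))   # alice
print(identify([-1.0, 0.0], enrolled, names))  # None
```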

Finally, I'd like to note that inference was fast enough to leave room for 1-2 more models (for a 7-second sample, inference took 2.5 seconds). We no longer had time to add new models, though, and focused on writing a prototype of the web application.

Web application

An important point: we brought a router from home and set up our own local network, which helps to connect the device and the laptops over the network.

The backend is an end-to-end message channel between the front end and the Raspberry Pi, based on websocket technology (http over tcp protocol).

The first stage is receiving processed information from the Raspberry Pi, that is, predictions packed in json, which are saved to a database midway through their journey so that statistics can be built about the user's emotional state over a period. The packet is then sent to the frontend, which subscribes to and receives packets from the websocket endpoint. The whole backend is written in golang; it was chosen because it is well suited for asynchronous tasks, which goroutines handle well.
When the endpoint is accessed, the user is registered and entered into the structure, then their message is received. Both the user and the message are entered into a common hub, from which messages are sent further (to the subscribed front end), and if the user closes the connection (the Raspberry Pi or the front end), their subscription is cancelled and they are removed from the hub.
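The hub idea itself is small. The real backend is written in Go and talks over websockets; the sketch below only illustrates the register / broadcast / unregister cycle in plain synchronous Python, with no network or database. The class name Hub, its method names and the payload fields speaker and emotion are hypothetical.

```python
import json


class Hub:
    """Keeps track of subscribers and fans each message out to all of them."""

    def __init__(self):
        self._subscribers = {}

    def register(self, client_id, send):
        """Subscribe a client; `send` is a callable that delivers one message."""
        self._subscribers[client_id] = send

    def unregister(self, client_id):
        """Drop a client, for example after its connection closes."""
        self._subscribers.pop(client_id, None)

    def broadcast(self, payload):
        """Serialize the payload to JSON and deliver it to every subscriber."""
        message = json.dumps(payload)
        for send in list(self._subscribers.values()):
            send(message)


hub = Hub()
inbox = []                            # stands in for a front-end connection
hub.register('front', inbox.append)
hub.broadcast({'speaker': 'alice', 'emotion': 'happy'})
hub.unregister('front')
hub.broadcast({'speaker': 'bob', 'emotion': 'sad'})  # nobody is listening
print(len(inbox))  # 1
```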

Waiting for a connection from the back end

The front end is a web application written in JavaScript using the React library to speed up and simplify development. Its purpose is to visualize the data obtained by the algorithms running on the back end and directly on the Raspberry Pi. The page has section routing implemented with react-router, but the page of most interest is the main one, where a continuous stream of data is received from the server in real time over WebSocket. The Raspberry Pi detects a voice, determines whether it belongs to a specific person from the registered database, and sends a list of probabilities to the client. The client displays the latest relevant data: the avatar of the person who most likely spoke into the microphone, as well as the emotion with which they pronounced the words.

Home page with updated predictions

Conclusion

It wasn't possible to complete everything as planned, we simply didn't have time, so our main hope was the demo: that everything would work. In the presentation we talked about how everything works, which models we took and what problems we ran into. Next came the demo part: the experts walked around the room in random order and approached each team to look at the working prototype. They asked us questions too, everyone answered for their own part, we left the web application open on the laptop, and everything really worked as expected.

Let me note that the total cost of our solution was $150:

  • Raspberry Pi 3 ~ $35
  • Google AIY Voice Bonnet (a ReSpeaker board is an alternative) ~ $15
  • Intel NCS 2 ~ $100

How to improve:

  • Use registration from the client: ask the user to read a randomly generated text
  • Add a few more models: gender and age can be determined from the voice
  • Separate voices that sound at the same time (diarization)

Repository: https://github.com/vladimirwest/OpenEMO

Tired but happy

In conclusion, I'd like to say thank you to the organizers and the participants. Among the other teams' projects, we personally liked the solution for monitoring free parking spaces. For us it was an incredibly cool experience of immersion in the product and in development. I hope that more and more interesting events will be held in the regions, including on AI topics.

Source: www.habr.com
