OpenVINO hackathon: recognizing voice and emotions on Raspberry Pi

On November 30 - December 1, an OpenVINO hackathon was held in Nizhny Novgorod. Participants were asked to build a prototype of a product solution using the Intel OpenVINO toolkit. The organizers proposed a list of approximate topics to guide the choice of task, but the final decision remained with the teams. In addition, the use of models not included in the product was encouraged.

In this article we describe how we built our product prototype, with which we eventually took first place.

More than 10 teams took part in the hackathon. It was nice that some of them came from other regions. The venue was the "Kremlinsky on Pochain" complex, with old photographs of Nizhny Novgorod hung inside to set the mood. (As a reminder, Intel's central office is currently located in Nizhny Novgorod.) Participants were given 26 hours to write code, and at the end they had to present their solution. A distinct advantage was the demo session, which made sure that everything planned had actually been implemented and had not remained just ideas in a presentation. Merchandise and food were all there too!

In addition, Intel optionally provided cameras, Raspberry Pi boards, and Neural Compute Stick 2 devices.

Choosing a task

One of the hardest parts of preparing for a free-form hackathon is choosing a problem. We immediately decided to come up with something that was not yet in the product, since the announcement said this was strongly welcomed.

Having analyzed the models included in the product in the current release, we concluded that most of them solve various computer vision problems. Moreover, it is very hard to come up with a computer vision problem that cannot be solved using OpenVINO, and even if one can be invented, it is hard to find pre-trained models for it in the public domain. We decided to dig in another direction: speech processing and analytics. Let's consider the interesting task of recognizing emotions from speech. It should be said that OpenVINO already has a model that determines a person's emotions from their face, but:

  • In theory, you can build a combined algorithm that works on both sound and image, which should improve accuracy.
  • Cameras usually have a narrow viewing angle, so more than one camera is needed to cover a large area; sound has no such limitation.

Let's develop the idea, taking a retail scenario as the basis. You can measure customer satisfaction at store checkouts. If a customer is dissatisfied with the service and starts raising their voice, you can immediately call a manager for help.
In this case we need to add speaker recognition, which lets us distinguish store employees from customers and provide analytics for each individual. In addition, it becomes possible to analyze the behavior of the store employees themselves and assess the atmosphere in the team. Sounds good!

We formulate the requirements for our solution:

  • Small size of the target device
  • Real-time operation
  • Low price
  • Easy scalability

As a result, we choose a Raspberry Pi 3 with an Intel NCS 2 as the target device.

Here it is important to note one feature of the NCS: it works best with standard CNN architectures, but if you need to run a model with custom layers on it, be prepared for low-level optimization.

There is only one small thing left to do: get a microphone. A regular USB microphone will do, but it won't look good together with the RPi. Even here, though, the solution lies close at hand. To record voice, we decide to use the Voice Bonnet board from the Google AIY Voice Kit, which has a wired stereo microphone.

Download Raspbian from the AIY projects repository and write it to a flash drive, then test that the microphone works using the following command (it records 5 seconds of audio and saves it to a file):

arecord -d 5 -r 16000 test.wav

I should note right away that the microphone is very sensitive and picks up noise readily. To fix this, open alsamixer, select the Capture devices, and reduce the input signal level to 50-60%.
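
To confirm that the test recording has the expected parameters, a short check with Python's standard wave module can help. This is only a sketch: the helper name wav_info is ours, and it assumes test.wav is the file produced by the arecord command above.

```python
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Number of frames divided by frames per second gives the duration.
        duration = wav.getnframes() / rate
    return rate, channels, duration
```

Calling wav_info("test.wav") after the recording should report a 16000 Hz sample rate and a duration close to 5 seconds.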

We modify the case with a file and everything fits; you can even close the lid

Adding an indicator button

While taking the AIY Voice Kit apart, we remember that it has an RGB button whose backlight can be controlled from software. We search for "Google AIY Led" and find the documentation: https://aiyprojects.readthedocs.io/en/latest/aiy.leds.html
Why not use this button to display the recognized emotion? We have only 7 classes, and the button has 8 colors, which is just enough!

We connect the button via GPIO to the Voice Bonnet and load the required libraries (they are already installed in the distribution from AIY projects):

from aiy.leds import Leds, Color
from aiy.leds import RgbLeds

Let's create a dict in which each emotion has a corresponding color as an RGB tuple, and an object of class aiy.leds.Leds, through which we will update the color:

led_dict = {'neutral': (255, 255, 255), 'happy': (0, 255, 0), 'sad': (0, 255, 255), 'angry': (255, 0, 0), 'fearful': (0, 0, 0), 'disgusted':  (255, 0, 255), 'surprised':  (255, 255, 0)} 
leds = Leds()

Finally, after each new emotion prediction, we update the button color according to it (by key).

leds.update(Leds.rgb_on(led_dict.get(classes[prediction])))
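
One detail worth guarding against: dict.get returns None for a label that is missing from the dict. A minimal sketch of a lookup with a fallback color follows; the helper color_for and the choice of white as the default are our assumptions and are not part of the AIY API.

```python
# Emotion -> RGB color, copied from the dict above.
led_dict = {'neutral': (255, 255, 255), 'happy': (0, 255, 0),
            'sad': (0, 255, 255), 'angry': (255, 0, 0),
            'fearful': (0, 0, 0), 'disgusted': (255, 0, 255),
            'surprised': (255, 255, 0)}

DEFAULT_COLOR = (255, 255, 255)  # assumption: fall back to white


def color_for(label, colors=led_dict, default=DEFAULT_COLOR):
    """Return the RGB tuple for an emotion label, or the default for unknown labels."""
    return colors.get(label, default)
```

The update call would then read leds.update(Leds.rgb_on(color_for(classes[prediction]))).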

Button, light up!

Working with voice

We will use pyaudio to capture the stream from the microphone and webrtcvad to filter out noise and detect voice. In addition, we will create a queue to which voice excerpts are added and from which they are removed asynchronously.

Since webrtcvad limits the size of the fragment it accepts (it must be 10, 20, or 30 ms long), and the emotion recognition model (as we will see later) was trained on a 48 kHz dataset, we capture chunks of 48000 × 20 ms / 1000 × 1 (mono) = 960 samples, which is 1920 bytes of 16-bit audio. Webrtcvad returns True or False for each chunk, indicating whether voice is present in it.
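
The chunk-size arithmetic can be checked with a few lines of Python. This is a sketch, and the helper name frame_sizes is ours. Note that 960 is the number of samples in a 20 ms chunk, the value passed to pyaudio as frames_per_buffer; with 16-bit samples that corresponds to 1920 bytes.

```python
RATE = 48000        # sampling rate in Hz, as in the text
SAMPLE_WIDTH = 2    # bytes per sample for 16-bit PCM
CHANNELS = 1        # mono


def frame_sizes(rate, frame_ms, channels=CHANNELS, sample_width=SAMPLE_WIDTH):
    """Return (samples, bytes) contained in one frame of the given duration."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad accepts only 10, 20 or 30 ms frames")
    samples = rate * frame_ms // 1000 * channels
    return samples, samples * sample_width


for ms in (10, 20, 30):
    print(ms, frame_sizes(RATE, ms))
```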

We implement the following logic:

  • Chunks that contain voice are appended to a list; if a chunk has no voice, we increment a counter of empty chunks.
  • If the counter of empty chunks is >= 30 (600 ms), we look at the size of the list of accumulated chunks; if it is > 250, we add it to the queue; if not, we consider the recording too short to feed to the model for speaker identification.
  • If the counter of empty chunks is still < 30 and the size of the list of accumulated chunks exceeds 300, we add the fragment to the queue for a more accurate prediction (because emotions tend to change over time).

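import queue

import numpy as np
import pyaudio
import webrtcvad

process = True  # the capture loop runs while this flag is True


def to_queue(frames):
    # Join the raw 16-bit PCM chunks into a single int16 array.
    return np.frombuffer(b''.join(frames), dtype=np.int16)


framesQueue = queue.Queue()


def framesThreadBody():
    CHUNK = 960  # samples per 20 ms chunk at 48 kHz
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000

    p = pyaudio.PyAudio()
    vad = webrtcvad.Vad()
    vad.set_mode(2)  # aggressiveness, from 0 (least) to 3 (most)
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    false_counter = 0  # consecutive chunks without voice
    audio_frame = []   # accumulated chunks with voice
    while process:
        data = stream.read(CHUNK)
        if vad.is_speech(data, RATE):
            false_counter = 0
            audio_frame.append(data)
            # Long utterance: flush so the emotion is re-estimated regularly.
            if len(audio_frame) > 300:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
        else:
            false_counter += 1
            # 30 empty chunks = 600 ms pause: flush if enough voice was collected.
            if false_counter >= 30 and len(audio_frame) > 250:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
                false_counter = 0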
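
The buffering rules can also be tried without a microphone. The sketch below replays the same decisions on a plain sequence of booleans (one per 20 ms chunk, True meaning voice was detected) and reports the length of every fragment that would be queued. The function name segment and its parameter names are ours; the thresholds are the ones used above.

```python
def segment(vad_flags, silence_limit=30, min_chunks=250, max_chunks=300):
    """Simulate the buffering logic on a sequence of VAD decisions.

    Returns the lengths (in chunks) of the fragments that would be queued.
    """
    queued = []
    voiced = 0   # number of buffered chunks with voice
    silent = 0   # consecutive chunks without voice
    for is_speech in vad_flags:
        if is_speech:
            silent = 0
            voiced += 1
            if voiced > max_chunks:
                queued.append(voiced)
                voiced = 0
        else:
            silent += 1
            if silent >= silence_limit and voiced > min_chunks:
                queued.append(voiced)
                voiced = 0
                silent = 0
    return queued


# 260 voiced chunks followed by a 600 ms pause produce one fragment.
print(segment([True] * 260 + [False] * 30))
```

With 20 ms chunks, 250 chunks correspond to 5 seconds of voice and 300 chunks to 6 seconds, so every queued fragment is between roughly 5 and 6 seconds long.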

It's time to look for pre-trained models in the public domain: go to GitHub and Google, but remember that we are limited in the architectures we can use. This is a rather difficult part, because you have to test the models on your own input data and, in addition, convert them to OpenVINO's internal format, IR (Intermediate Representation). We tried about 5-7 different solutions from GitHub, and while the emotion recognition model worked right away, speaker recognition took longer, because those models use more complex architectures.

We settled on the following:

Next we will talk about converting models, starting with some theory. OpenVINO includes several modules:

  • Open Model Zoo, whose models can be used and included in your product
  • Model Optimizer, which converts a model from various framework formats (TensorFlow, ONNX, etc.) into the Intermediate Representation format that we will work with from here on
  • Inference Engine, which runs models in IR format on Intel processors, Myriad chips, and Neural Compute Stick accelerators
  • An optimized build of OpenCV (with Inference Engine support)

Each model in IR format is described by two files: .xml and .bin.
Models are converted to IR format with the Model Optimizer as follows:

    python /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model speaker.hdf5.pb --data_type=FP16 --input_shape [1,512,1000,1]

--data_type lets you choose the data format the model will work with. FP32, FP16, and INT8 are supported. Choosing the optimal data type can give a good performance boost.
--input_shape specifies the dimensions of the input data. The ability to change it dynamically seems to be available in the C++ API, but we did not dig that far and simply fixed it for one of the models.
Next, let's try loading the already converted IR model through the DNN module in OpenCV and running a forward pass through it.

    import cv2 as cv
    emotionsNet = cv.dnn.readNet('emotions_model.bin',
                              'emotions_model.xml')
    emotionsNet.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)

The last line in this case redirects the computation to the Neural Compute Stick. By default computation runs on the processor, but on the Raspberry Pi that does not work, so you will need the stick.

Next, the logic is as follows: we split our audio into windows of a certain size (0.4 s in our case), convert each window into MFCC features, and feed those to the network:

    emotionsNet.setInput(MFCC_from_window)
    result = emotionsNet.forward()
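
The windowing step itself can be sketched in a few lines. This is an illustration under our own assumptions: the windows here do not overlap and a trailing remainder shorter than one window is dropped, which the text does not specify. The helper name split_windows is ours, and MFCC extraction is left out because it depends on the feature library used.

```python
def split_windows(samples, rate=48000, window_s=0.4):
    """Split a sample sequence into consecutive non-overlapping windows.

    A trailing remainder shorter than one window is dropped (assumption).
    """
    size = int(round(rate * window_s))  # 19200 samples for 0.4 s at 48 kHz
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


one_second = [0] * 48000
print(len(split_windows(one_second)))  # number of full 0.4 s windows
```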

Next, we take the most common class across all windows. It is a simple solution, but at a hackathon there is no need to invent anything too abstruse unless you have time to spare. We still have a lot of work to do, so let's move on to speaker recognition. We need to build some kind of database that stores spectrograms of pre-recorded voices. Since little time is left, we solve this as best we can.
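
Taking the most common class across windows amounts to a majority vote. A minimal sketch, assuming each window yields one list of per-class scores and that classes lists the labels in the same order as the scores; the function names are ours.

```python
from collections import Counter


def argmax(scores):
    """Index of the largest score in a sequence."""
    return max(range(len(scores)), key=lambda i: scores[i])


def majority_class(window_scores, classes):
    """Pick the label predicted most often across windows.

    Assumes at least one window is given.
    """
    votes = Counter(classes[argmax(scores)] for scores in window_scores)
    return votes.most_common(1)[0][0]


classes = ['neutral', 'happy', 'angry']
scores = [[0.1, 0.7, 0.2], [0.2, 0.2, 0.6], [0.0, 0.9, 0.1]]
print(majority_class(scores, classes))
```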

Namely, we create a script for recording a voice excerpt (it works the same way as described above, except that when interrupted from the keyboard it saves the voice to a file).

Let's try:

    python3 voice_db/record_voice.py test.wav

We record the voices of several people (in our case, three team members).
Next, for each recorded voice we perform a fast Fourier transform, obtain a spectrogram, and save it as a numpy array (.npy):

    for file in glob.glob("voice_db/*.wav"):
        spec = get_fft_spectrum(file)
        np.save(file[:-4] + '.npy', spec)
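
To make the idea of a spectrogram concrete, here is a small standard-library sketch that computes magnitude spectra of overlapping frames with a naive DFT. It is not the get_fft_spectrum function used above: the frame length, hop size, Hann window, and function names are our assumptions, and a real implementation would use an FFT routine instead.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=64, hop=32):
    """One magnitude spectrum per overlapping frame of the signal."""
    return [
        magnitude_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

For a pure tone, every frame of the result has its largest value in the bin that matches the tone's frequency.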

More details are in the file create_base.py.
As a result, when we run the main script, we compute embeddings from these spectrograms at the very start:

    for file in glob.glob("voice_db/*.npy"):
        spec = np.load(file)
        spec = spec.astype('float32')
        spec_reshaped = spec.reshape(1, 1, spec.shape[0], spec.shape[1])
        srNet.setInput(spec_reshaped)
        pred = srNet.forward()
        emb = np.squeeze(pred)

After obtaining the embedding of a spoken segment, we can determine who it belongs to by taking the cosine distance from the segment to every voice in the database (the smaller the distance, the more likely the match). For the demo we set the threshold to 0.3:

    dist_list = cdist(emb, enroll_embs, metric="cosine")
    distances = pd.DataFrame(dist_list, columns=df.speaker)
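
The matching step can be illustrated without any third-party libraries. In this sketch the cosine distance is 1 minus the cosine similarity, the same definition the "cosine" metric uses. The dict of enrolled embeddings, the function names, and the strict "less than" comparison against the threshold are our assumptions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def identify(emb, enrolled, threshold=0.3):
    """Return the closest enrolled speaker, or None if nobody is within the threshold.

    enrolled: dict mapping speaker name -> embedding (hypothetical structure).
    """
    name, dist = min(
        ((n, cosine_distance(emb, e)) for n, e in enrolled.items()),
        key=lambda item: item[1],
    )
    return name if dist < threshold else None


enrolled = {'alice': [1.0, 0.0], 'bob': [0.0, 1.0]}
print(identify([0.9, 0.1], enrolled))
```

Returning None for distant embeddings lets the caller treat the voice as an unknown speaker, for example a customer rather than an enrolled employee.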

Finally, I would like to note that inference was fast and would have made it possible to add 1-2 more models (for a 7-second sample, inference took 2.5 seconds). We no longer had time to add new models, so we focused on writing a prototype of the web application.

Web application

An important point: we brought a router from home and set up our own local network, which makes it easy to connect the device and the laptops to each other.

The backend is an end-to-end message channel between the frontend and the Raspberry Pi, built on websocket technology (http over tcp protocol).

The first step is to receive processed information from the Raspberry Pi, that is, predictions packed in JSON, which are saved to a database midway through their journey so that statistics can be generated about the user's emotional state over a period of time. This packet is then sent to the frontend, which uses a subscription and receives packets from the websocket endpoint. The entire backend is written in Go; it was chosen because it is well suited to asynchronous tasks, which goroutines handle well.
When the endpoint is accessed, the user is registered and entered into a structure, and then their message is received. Both the user and the message go into a common hub, from which messages are sent onward (to the subscribed frontend). If a user closes the connection (Raspberry Pi or frontend), their subscription is cancelled and they are removed from the hub.
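
The hub pattern described here can be sketched in a few lines. The actual backend is written in Go and uses websockets; the Python version below is only an in-memory, synchronous illustration of registering, broadcasting, and unregistering. The class name Hub, the client id, and the JSON field names are all hypothetical.

```python
import json


class Hub:
    """Keeps track of subscribers and fans each message out to all of them."""

    def __init__(self):
        self._subscribers = {}  # client id -> callback taking one str message

    def subscribe(self, client_id, callback):
        self._subscribers[client_id] = callback

    def unsubscribe(self, client_id):
        # Called when a client closes its connection.
        self._subscribers.pop(client_id, None)

    def publish(self, message):
        """Deliver a message to every subscriber; return how many received it."""
        for callback in list(self._subscribers.values()):
            callback(message)
        return len(self._subscribers)


hub = Hub()
received = []
hub.subscribe("frontend-1", received.append)

# Hypothetical packet layout; the real field names are not given in the article.
packet = json.dumps({"speaker": "alice", "emotion": "happy"})
hub.publish(packet)
hub.unsubscribe("frontend-1")
hub.publish(packet)  # nobody is subscribed any more, so nothing is delivered
print(received)
```

In the Go backend the callbacks would be replaced by websocket connections, each served by its own goroutine.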

Waiting for a connection from the backend

The frontend is a web application written in JavaScript using the React library to speed up and simplify development. Its purpose is to visualize the data produced by the algorithms running on the backend and directly on the Raspberry Pi. The application has page routing implemented with react-router, but the page of most interest is the home page, where a continuous stream of data arrives in real time from the server over WebSocket. The Raspberry Pi detects a voice, determines whether it belongs to a specific person from the registered database, and sends a list of probabilities to the client. The client displays the latest relevant data: the avatar of the person who most likely spoke into the microphone and the emotion with which they spoke.

Home page with updated predictions

Conclusion

We did not manage to finish everything as planned because we simply ran out of time, so our main hope was the demo, that everything would work. In the presentation we explained how everything works, which models we took, and what problems we ran into. Next came the demo part: the experts walked around the room in random order and approached each team to look at the working prototype. They asked us questions too, everyone answered for their own part, we left the web application open on a laptop, and everything really worked as expected.

Let me note that the total cost of our solution was $150:

  • Raspberry Pi 3 ~ $35
  • Google AIY Voice Bonnet (a ReSpeaker board can be used instead) ~ $15
  • Intel NCS 2 ~ $100

Ways to improve:

  • Use registration on the client side: ask the user to read a randomly generated text
  • Add a few more models: gender and age can be determined from the voice
  • Separate voices that sound at the same time (diarization)

Repository: https://github.com/vladimirwest/OpenEMO

Tired but happy

In conclusion, I would like to say thank you to the organizers and participants. Among the other teams' projects, we personally liked the solution for monitoring free parking spaces. For us, it was a really cool experience of immersion in the product and in development. I hope that more and more interesting events will be held in the regions, including on AI topics.

Source: www.habr.com
