OpenVINO hackathon: recognizing voice and emotions on Raspberry Pi

On November 30 - December 1, the OpenVINO hackathon was held in Nizhny Novgorod. Participants were asked to build a prototype of a product solution using the Intel OpenVINO toolkit. The organizers proposed a list of approximate topics that teams could use as a guide when choosing a task, but the final decision was left to the teams. In addition, using models that are not included in the product was encouraged.

In this article we describe how we built our product prototype, with which we eventually took first place.

More than 10 teams took part in the hackathon. It was nice to see that some of them came from other regions. The venue was the "Kremlinsky na Pochain" complex, decorated inside with old photographs of Nizhny Novgorod, which added to the atmosphere! (A reminder that Intel's central office is currently located in Nizhny Novgorod.) Participants had 26 hours to write code, and at the end they had to present their solution. A separate advantage was the demo session, which made sure that everything planned had actually been implemented and did not remain just an idea on a slide. Merch, snacks, food: everything was there too!

In addition, Intel provided cameras, Raspberry Pi boards, and Neural Compute Stick 2 devices on request.

Choosing a task

One of the hardest parts of preparing for a free-form hackathon is choosing the challenge. We decided right away to build something that was not yet in the product, since the announcement said this was highly welcome.

Having analyzed the models included in the product's current release, we concluded that most of them solve various computer vision problems. Moreover, it is hard to come up with a computer vision problem that cannot be solved with OpenVINO, and even if you can invent one, it is hard to find pretrained models for it in the public domain. We decided to dig in a different direction: speech processing and analytics. Consider the interesting task of recognizing emotions from speech. It must be said that OpenVINO already has a model that determines a person's emotions from their face, but:

  • In theory, you can build a combined algorithm that works on both sound and image, which should improve accuracy.
  • Cameras usually have a narrow viewing angle, so more than one camera is needed to cover a large area; sound has no such limitation.

Let's develop the idea, taking the retail segment as a basis. You can measure customer satisfaction at store checkouts. If a customer is dissatisfied with the service and starts raising their voice, you can immediately call the administrator for help.
In this case we need to add recognition of a person by voice, which lets us tell store employees from customers and provide analytics for each individual. In addition, it becomes possible to analyze the behavior of the store employees themselves and assess the atmosphere in the team. Sounds good!

We formulated the requirements for our solution:

  • Small size of the target device
  • Real-time operation
  • Low price
  • Easy scalability

As a result, we chose the Raspberry Pi 3 with an Intel NCS 2 as the target device.

One important feature of the NCS should be noted here: it works best with standard CNN architectures, but if you need to run a model with custom layers on it, be prepared for low-level optimization work.

Only one small thing remained: we needed a microphone. A regular USB microphone would do, but it would not look good together with the RPi. Even here the solution was literally "lying nearby". To record voice, we decided to use the Voice Bonnet board from the Google AIY Voice Kit, which has a wired stereo microphone.

Download Raspbian from the AIY projects repository and write it to a flash drive, then check that the microphone works with the following command (it records 5 seconds of audio and saves it to a file):

arecord -d 5 -r 16000 test.wav

I should note right away that the microphone is very sensitive and picks up noise well. To fix this, open alsamixer, select the Capture device, and reduce the input signal level to 50-60%.

We reshape the case with a file and everything fits; you can even close it with the lid

Adding an indicator button

While taking the AIY Voice Kit apart, we remembered that it has an RGB button whose backlight can be controlled from software. We searched for "Google AIY Led" and found the documentation: https://aiyprojects.readthedocs.io/en/latest/aiy.leds.html
Why not use this button to display the recognized emotion? We have only 7 classes and the button has 8 colors, which is just enough!

We connect the button to the Voice Bonnet via GPIO and load the necessary libraries (they are already installed in the distribution from the AIY projects):

from aiy.leds import Leds, Color
from aiy.leds import RgbLeds

Let's create a dict in which each emotion has a corresponding color as an RGB tuple, and an object of class aiy.leds.Leds through which we will update the color:

led_dict = {'neutral': (255, 255, 255), 'happy': (0, 255, 0), 'sad': (0, 255, 255), 'angry': (255, 0, 0), 'fearful': (0, 0, 0), 'disgusted':  (255, 0, 255), 'surprised':  (255, 255, 0)} 
leds = Leds()

Finally, after each new emotion prediction, we update the button color to match it (looked up by key).

leds.update(Leds.rgb_on(led_dict.get(classes[prediction])))
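One detail worth keeping in mind with the lookup above: `dict.get` returns `None` for a label that is missing from the dict, and that `None` would then be passed on to `Leds.rgb_on`. A minimal sketch of a safer lookup follows. The helper name `color_for` and the blue fallback color are assumptions made for illustration and are not part of the original project.

```python
# Hypothetical helper: the name and the fallback color are assumptions.
LED_COLORS = {
    'neutral': (255, 255, 255),
    'happy': (0, 255, 0),
    'sad': (0, 255, 255),
    'angry': (255, 0, 0),
    'fearful': (0, 0, 0),
    'disgusted': (255, 0, 255),
    'surprised': (255, 255, 0),
}
FALLBACK_COLOR = (0, 0, 255)  # blue is not used by any emotion above


def color_for(label, colors=LED_COLORS, fallback=FALLBACK_COLOR):
    """Return the RGB tuple for a predicted label, or the fallback color."""
    return colors.get(label, fallback)
```

With such a helper, the update call could be written as `leds.update(Leds.rgb_on(color_for(classes[prediction])))`.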

Button, light up!

Working with voice

We use pyaudio to capture the stream from the microphone and webrtcvad to filter out noise and detect voice. In addition, we create a queue to which voice segments are added and from which they are removed asynchronously.

webrtcvad restricts the size of the fragment it accepts: it must be 10, 20, or 30 ms long. Since the emotion recognition model (as we will see later) was trained on a 48 kHz dataset, we capture chunks of 48000 × 20 ms / 1000 × 1 (mono) = 960 samples, which is 1920 bytes of 16-bit audio. For each such chunk webrtcvad returns True or False, corresponding to the presence or absence of voice in the chunk.
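The arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the function name `vad_frame_size` is made up here, and 16-bit samples (2 bytes each) are assumed, which matches the `paInt16` format used for capture. Note that 960 is the number of samples per frame; the size in bytes is twice that.

```python
def vad_frame_size(rate_hz, frame_ms, channels=1, sample_width=2):
    """Return (samples, bytes) for one VAD frame of PCM audio."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad accepts only 10, 20 or 30 ms frames")
    samples = rate_hz * frame_ms // 1000 * channels
    return samples, samples * sample_width


samples, size_bytes = vad_frame_size(48000, 20)
print(samples, size_bytes)  # 960 1920
```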

Let's implement the following logic:

  • We add chunks that contain voice to a list; if there is no voice, we increment a counter of empty chunks.
  • If the counter of empty chunks is >= 30 (600 ms), we look at the size of the list of accumulated chunks; if it is > 250, we add the list to the queue; if not, we consider the recording too short to feed to the model for speaker identification.
  • If the counter of empty chunks is still < 30 and the list of accumulated chunks grows beyond 300, we add the fragment to the queue anyway, for a more accurate prediction (because emotions tend to change over time).

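import queue

import numpy as np
import pyaudio
import webrtcvad

# Stop flag; assumed to be set to False elsewhere to end the capture loop
process = True

def to_queue(frames):
    # Join the raw 16-bit PCM chunks into one int16 NumPy array
    return np.frombuffer(b''.join(frames), dtype=np.int16)

framesQueue = queue.Queue()

def framesThreadBody():
    CHUNK = 960  # samples per 20 ms frame at 48 kHz
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000

    p = pyaudio.PyAudio()
    vad = webrtcvad.Vad()
    vad.set_mode(2)  # aggressiveness, from 0 (least) to 3 (most)
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    false_counter = 0  # consecutive chunks without speech
    audio_frame = []   # accumulated speech chunks
    while process:
        data = stream.read(CHUNK)
        if vad.is_speech(data, RATE):
            false_counter = 0
            audio_frame.append(data)
            # Long utterance: flush early, since emotions change over time
            if len(audio_frame) > 300:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
        else:
            false_counter += 1
            # 600 ms of silence ends the utterance if it is long enough
            if false_counter >= 30 and len(audio_frame) > 250:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
                false_counter = 0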
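The buffering rules are easier to reason about when they are separated from the audio hardware. The sketch below replays the same thresholds over a plain list of per-chunk voice flags and reports how many chunks each queued utterance would contain. The function name `segment` and its parameters are hypothetical and exist only for illustration.

```python
def segment(flags, max_silence=30, min_len=250, max_len=300):
    """Group per-chunk VAD flags into utterances.

    flags: iterable of booleans, one per 20 ms chunk (True means speech).
    Returns the lengths, in chunks, of the utterances that would be queued.
    """
    queued = []
    voiced = 0   # speech chunks accumulated so far
    silence = 0  # consecutive chunks without speech
    for is_speech in flags:
        if is_speech:
            silence = 0
            voiced += 1
            if voiced > max_len:
                # Long utterance: flush early
                queued.append(voiced)
                voiced = 0
        else:
            silence += 1
            if silence >= max_silence and voiced > min_len:
                # Enough silence after a long enough utterance
                queued.append(voiced)
                voiced = 0
                silence = 0
    return queued


# About 5.2 s of speech followed by 600 ms of silence yields one utterance
print(segment([True] * 260 + [False] * 30))  # [260]
```

A run of speech that is too short (for example 100 chunks followed by silence) produces nothing, which mirrors the "too short for the model" rule.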

Now it is time to look for pretrained models in the public domain: go to GitHub and Google, but remember that we are limited in the architectures we can use. This is a rather hard part, because you have to test the models on your own input data and, in addition, convert them to OpenVINO's internal format, IR (Intermediate Representation). We tried about 5-7 different solutions from GitHub. The emotion recognition model worked right away, but voice recognition took longer, because those models use more complex architectures.

We settled on the following:

  • Emotion from voice - https://github.com/alexmuhr/Voice_Emotion
    It works on the following principle: the audio is cut into segments of a fixed size, for each segment we extract MFCC features, and then we feed them as input to a CNN.
  • Voice recognition - https://github.com/linhdvu14/vggvox-speaker-identification
    Here, instead of MFCC, we work with a spectrogram: after the FFT we feed the signal to a CNN, and at the output we get a vector representation of the voice.

Next we will talk about converting models, starting with the theory. OpenVINO includes several modules:

  • Open Model Zoo, whose models can be used and included in your product
  • Model Optimizer, which lets you convert a model from various framework formats (TensorFlow, ONNX, etc.) into the Intermediate Representation format that we will work with from here on
  • Inference Engine, which lets you run models in IR format on Intel processors, Myriad chips, and Neural Compute Stick accelerators
  • The most efficient build of OpenCV (with Inference Engine support)

Each model in IR format is described by two files: .xml and .bin.
Models are converted to IR format with Model Optimizer as follows:

    python /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model speaker.hdf5.pb --data_type=FP16 --input_shape [1,512,1000,1]

--data_type lets you choose the data format the model will work with. FP32, FP16, and INT8 are supported. Choosing the optimal data type can give a good performance boost.
--input_shape specifies the dimensions of the input data. The ability to change it dynamically seems to exist in the C++ API, but we did not dig that far and simply fixed it for one of the models.
Next, let's load the model, already converted to IR format, into OpenCV through its DNN module and run a forward pass on it.

    import cv2 as cv
    emotionsNet = cv.dnn.readNet('emotions_model.bin',
                              'emotions_model.xml')
    emotionsNet.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)

The last line here redirects the computation to the Neural Compute Stick. By default the computation runs on the CPU, but on the Raspberry Pi that will not work, so you need the stick.

Next, the logic is as follows: we split our audio into windows of a fixed size (0.4 s in our case), convert each window into MFCC features, and feed them to the network:

    emotionsNet.setInput(MFCC_from_window)
    result = emotionsNet.forward()
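To make the windowing step concrete, here is a minimal sketch that cuts a recording into consecutive 0.4 s windows. It assumes non-overlapping windows and drops an incomplete tail; whether the original script uses overlap or padding is not stated, so treat both choices, and the function name `split_into_windows`, as assumptions.

```python
def split_into_windows(samples, rate_hz, window_s=0.4):
    """Split a sequence of samples into consecutive full-length windows.

    Works on lists and on NumPy arrays; an incomplete tail is dropped.
    """
    size = int(round(rate_hz * window_s))
    last_start = len(samples) - size
    return [samples[i:i + size] for i in range(0, last_start + 1, size)]


# One second of audio at 48 kHz gives two full 0.4 s windows
windows = split_into_windows(list(range(48000)), 48000)
print(len(windows), len(windows[0]))  # 2 19200
```

Each window would then be converted to MFCC features and passed to `emotionsNet.setInput`.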

Next, we take the most frequent class across all windows. It is a simple solution, but at a hackathon you should not invent anything too elaborate unless you have time to spare. We still had a lot of work to do, so let's move on to voice recognition. We need some kind of database that stores spectrograms of pre-recorded voices. Since little time was left, we solved this as best we could.
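Going back to the voting step for a moment: picking the most frequent class over the windows fits in a few lines. This is a sketch using the standard library; the function name `most_frequent_class` is hypothetical, and it assumes that each window has already been reduced to a single predicted label.

```python
from collections import Counter


def most_frequent_class(window_predictions):
    """Return the label predicted for the largest number of windows."""
    if not window_predictions:
        raise ValueError("no window predictions to vote on")
    return Counter(window_predictions).most_common(1)[0][0]


print(most_frequent_class(['happy', 'neutral', 'happy', 'sad']))  # happy
```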

Namely, we wrote a script for recording a voice sample (it works the same way as described above, except that when interrupted from the keyboard it saves the voice to a file).

Let's try it:

    python3 voice_db/record_voice.py test.wav

We record the voices of several people (in our case, three team members).
Next, for each recorded voice we perform a fast Fourier transform, obtain a spectrogram, and save it as a NumPy array (.npy):

    import glob
    import numpy as np

    for file in glob.glob("voice_db/*.wav"):
        spec = get_fft_spectrum(file)  # spectrogram helper from the project
        np.save(file[:-4] + '.npy', spec)
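For readers who want to see what such a spectrogram step involves, here is a generic magnitude spectrogram in NumPy. It is not the project's `get_fft_spectrum`: the function name, the Hamming window, and the frame length and step (400 and 160 samples) are assumptions chosen for illustration. Only the 512-point FFT is motivated by the source, since the model input has 512 rows.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=400, frame_step=160, n_fft=512):
    """Return an (n_fft, n_frames) array of FFT magnitudes."""
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // frame_step
    window = np.hamming(frame_len)
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([
        signal[i * frame_step:i * frame_step + frame_len] * window
        for i in range(n_frames)
    ])
    # Zero-pad each frame to n_fft points and keep the magnitudes
    spectrum = np.abs(np.fft.fft(frames, n=n_fft, axis=1))
    return spectrum.T.astype(np.float32)


# One second of a 1 kHz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
spec = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (512, 98)
```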

More details are in the file create_base.py.
As a result, when we run the main script, we obtain embeddings from these spectrograms at the very start:

    for file in glob.glob("voice_db/*.npy"):
        spec = np.load(file)
        spec = spec.astype('float32')
        spec_reshaped = spec.reshape(1, 1, spec.shape[0], spec.shape[1])
        srNet.setInput(spec_reshaped)
        pred = srNet.forward()
        emb = np.squeeze(pred)

After obtaining the embedding from a sound segment, we can determine whose voice it is by taking the cosine distance from the segment to every voice in the database (the smaller the distance, the more likely the match). For the demo we set the threshold to 0.3:

            dist_list = cdist(emb, enroll_embs, metric="cosine")
            distances = pd.DataFrame(dist_list, columns = df.speaker)
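The matching rule can be illustrated without SciPy or pandas. The sketch below computes the cosine distance by hand and returns the closest enrolled speaker, or `None` when nobody is within the threshold. The function names, the speaker names, and the choice to return `None` for an unknown voice are assumptions; the source only states that the distance is cosine and the demo threshold is 0.3.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def identify(emb, enrolled, threshold=0.3):
    """Return the name of the closest enrolled speaker, or None.

    enrolled: dict mapping a speaker name to a reference embedding.
    """
    best_name, best_dist = None, float('inf')
    for name, ref in enrolled.items():
        dist = cosine_distance(emb, ref)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


# Toy two-dimensional embeddings with hypothetical speaker names
enrolled = {'alice': [1.0, 0.0], 'bob': [0.0, 1.0]}
print(identify([0.9, 0.1], enrolled))    # alice
print(identify([-1.0, -1.0], enrolled))  # None
```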

Finally, I would like to note that inference was fast enough to leave room for 1-2 more models (for a 7-second sample, inference took 2.5 seconds). We no longer had time to add new models, so we focused on writing a prototype of the web application.

Web application

An important point: we brought a router from home and set up our own local network, which makes it easy to connect the device and the laptops to each other.

The backend is an end-to-end message channel between the frontend and the Raspberry Pi, based on WebSocket technology (a protocol that runs over TCP and is established through an HTTP upgrade handshake).

The first stage is receiving processed information from the Raspberry Pi, that is, predictions packed in JSON. Midway through their journey they are saved to a database, so that statistics about the user's emotional state over a period can be generated. The packet is then sent to the frontend, which subscribes to and receives packets from the WebSocket endpoint. The entire backend is written in Go; it was chosen because it is well suited to asynchronous tasks, which goroutines handle well.
When a client connects to the endpoint, it is registered and added to the structure, and then its message is received. Both the client and the message go into a common hub, from which messages are forwarded (to the subscribed frontend). If a client closes the connection (the Raspberry Pi or the frontend), its subscription is cancelled and it is removed from the hub.
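The hub pattern described here is small enough to sketch in a few lines. The real backend is written in Go with WebSockets; the Python below only illustrates the register, publish, and unregister flow in memory. The class name, the client identifiers, the JSON field names, and the list standing in for the database are all assumptions.

```python
import json


class Hub:
    """In-memory publish/subscribe hub (illustration only)."""

    def __init__(self):
        self._subscribers = {}  # client id -> callback
        self.history = []       # stand-in for the database

    def register(self, client_id, callback):
        self._subscribers[client_id] = callback

    def unregister(self, client_id):
        # Called when a client closes its connection
        self._subscribers.pop(client_id, None)

    def publish(self, packet):
        """Store a JSON packet and forward it to every subscriber."""
        message = json.loads(packet)
        self.history.append(message)
        for callback in list(self._subscribers.values()):
            callback(message)


hub = Hub()
received = []
hub.register('frontend-1', received.append)
hub.publish(json.dumps({'speaker': 'alice', 'emotion': 'happy'}))
hub.unregister('frontend-1')
hub.publish(json.dumps({'speaker': 'bob', 'emotion': 'sad'}))
print(len(received), len(hub.history))  # 1 2
```

After the frontend unsubscribes, new packets are still stored but no longer forwarded to it.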

Waiting for a connection from the backend

The frontend is a web application written in JavaScript, using the React library to speed up and simplify development. Its purpose is to visualize the data produced by the algorithms running on the backend and directly on the Raspberry Pi. The app has sectional routing implemented with react-router, but the page of main interest is the home page, which receives a continuous stream of data from the server in real time over WebSocket. The Raspberry Pi detects a voice, determines whether it belongs to a specific person from the registered database, and sends a list of probabilities to the client. The client displays the latest relevant data: the avatar of the person who is most likely speaking into the microphone, and the emotion with which they are speaking.

Home page with updated predictions

Conclusion

We did not manage to complete everything as planned because we simply ran out of time, so our main hope was the demo, and that everything would work. In the presentation we talked about how everything works, which models we took, and which problems we ran into. Then came the demo part: experts walked around the room in random order and approached each team to look at the working prototype. They asked us questions too, everyone answered for their own part, we left the web app open on a laptop, and everything really did work as expected.

Let me note that the total cost of our solution was $150:

  • Raspberry Pi 3 ~ $35
  • Google AIY Voice Bonnet (a comparable microphone board would also work) ~ $15
  • Intel NCS 2 ~ $100

How to improve it:

  • Use registration from the client: ask the user to read a randomly generated text
  • Add a few more models: gender and age can be determined from voice
  • Separate voices that sound at the same time (diarization)

Repository: https://github.com/vladimirwest/OpenEMO

Tired but happy

In conclusion, I would like to thank the organizers and the participants. Among the other teams' projects, we personally liked the solution for monitoring free parking spaces. For us it was a great experience of immersion in the product and in development. I hope that more and more interesting events will be held in the regions, including events on AI topics.

Source: www.habr.com
