OpenVINO hackathon: recognizing voice and emotions on a Raspberry Pi

On November 30 and December 1, the OpenVINO hackathon was held in Nizhny Novgorod. Participants were asked to build a prototype of a product solution using the Intel OpenVINO toolkit. The organizers suggested a list of sample topics that teams could use as a guide when choosing a task, but the final decision was left to the teams. In addition, using models that are not shipped with the product was encouraged.

In this article we describe how we built our product prototype, with which we ended up taking first place.

More than 10 teams took part in the hackathon. It was nice to see that some of them came from other regions. The venue was the "Kremlinsky on Pochain" complex, decorated inside with old photographs of Nizhny Novgorod, which set the mood. (A reminder that Intel's central office in Russia is currently located in Nizhny Novgorod.) Participants were given 26 hours to write code, and at the end they had to present their solution. A separate advantage was the demo session, which made sure that everything planned was actually implemented and did not remain just ideas on slides. Merchandise, snacks, food: everything was there too!

In addition, Intel optionally provided cameras, Raspberry Pi boards and Neural Compute Stick 2 devices.

Choosing a task

One of the hardest parts of preparing for a free-form hackathon is choosing a challenge. We decided right away to come up with something that was not yet in the product, since the announcement said this was highly welcome.

After analyzing the models shipped with the product in the current release, we concluded that most of them solve various computer vision problems. Moreover, it is very hard to come up with a computer vision problem that cannot be solved with OpenVINO, and even if you can invent one, it is hard to find pre-trained models for it in the public domain. We decided to dig in a different direction: speech processing and analytics. Consider the interesting task of recognizing emotions from speech. It must be said that OpenVINO already has a model that determines a person's emotions from their face, but:

  • In theory, you can build a combined algorithm that works on both audio and video, which should improve accuracy.
  • Cameras usually have a narrow viewing angle, so covering a large area takes more than one camera; audio has no such limitation.

Let's develop the idea, taking the retail segment as a basis. You can measure customer satisfaction at store checkouts. If a customer is unhappy with the service and starts raising their voice, you can immediately call an administrator for help.
In that case we also need speaker recognition, which lets us distinguish store employees from customers and produce analytics for each individual. On top of that, it becomes possible to analyze the behavior of the store employees themselves and assess the atmosphere in the team. Sounds good!

We formulated the requirements for our solution:

  • Small size of the target device
  • Real-time operation
  • Low price
  • Easy scalability

As a result, we chose a Raspberry Pi 3 with an Intel NCS 2 as the target device.

One important feature of the NCS is worth noting here: it works best with standard CNN architectures, but if you need to run a model with custom layers on it, be prepared for low-level optimization work.

Only one small thing remains: you need a microphone. An ordinary USB microphone would do, but it would not look good next to the RPi. Even here, though, the solution was literally "lying nearby." To record voice, we decided to use the Voice Bonnet board from the Google AIY Voice Kit, which has a wired stereo microphone.

We download Raspbian from the AIY projects repository and write it to a flash drive, then test that the microphone works with the following command (it records 5 seconds of audio and saves it to a file):

arecord -d 5 -r 16000 test.wav

I should note right away that the microphone is very sensitive and picks up noise readily. To fix this, open alsamixer, select the capture devices and lower the input signal level to 50-60%.

We trim the case with a file and everything fits; you can even close it with the lid

Adding an indicator button

While taking the AIY Voice Kit apart, we remembered that it includes an RGB button whose backlight can be controlled from software. We search for "Google AIY Led" and find the documentation: https://aiyprojects.readthedocs.io/en/latest/aiy.leds.html
Why not use this button to display the recognized emotion? We have only 7 classes and the button has 8 colors, just enough!

We connect the button to the Voice Bonnet via GPIO and load the necessary libraries (they are already installed in the distribution from AIY projects):

from aiy.leds import Leds, Color
from aiy.leds import RgbLeds

Let's create a dict in which each emotion has a corresponding color as an RGB tuple, and an object of the class aiy.leds.Leds through which we will update the color:

led_dict = {'neutral': (255, 255, 255), 'happy': (0, 255, 0), 'sad': (0, 255, 255), 'angry': (255, 0, 0), 'fearful': (0, 0, 0), 'disgusted':  (255, 0, 255), 'surprised':  (255, 255, 0)} 
leds = Leds()

Finally, after each new emotion prediction, we update the button color to match it (looked up by key).

leds.update(Leds.rgb_on(led_dict.get(classes[prediction])))
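As an illustration of the lookup step only, here is a minimal sketch that does not touch the AIY hardware. The order of the class list and the `color_for` helper are assumptions made for this example, not part of the original project; an unknown class falls back to "off" so that the LED call never receives None.

```python
# Same emotion-to-color mapping as led_dict above.
LED_COLORS = {
    'neutral': (255, 255, 255),
    'happy': (0, 255, 0),
    'sad': (0, 255, 255),
    'angry': (255, 0, 0),
    'fearful': (0, 0, 0),
    'disgusted': (255, 0, 255),
    'surprised': (255, 255, 0),
}

# Assumed class order; the real order depends on how the model was trained.
CLASSES = ['neutral', 'happy', 'sad', 'angry', 'fearful', 'disgusted', 'surprised']


def color_for(prediction, classes=CLASSES, colors=LED_COLORS, default=(0, 0, 0)):
    """Return the RGB tuple for a predicted class index (hypothetical helper)."""
    if 0 <= prediction < len(classes):
        return colors.get(classes[prediction], default)
    return default


print(color_for(1))   # color for 'happy'
print(color_for(99))  # out-of-range index falls back to the default
```

On the device, the returned tuple would be passed to Leds.rgb_on exactly as in the line above.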

Button, light up!

Working with voice

We will use pyaudio to capture the stream from the microphone and webrtcvad to filter out noise and detect voice. In addition, we create a queue to which we asynchronously add voice fragments and from which we remove them.

webrtcvad limits the size of the fragment it accepts: it must be exactly 10, 20 or 30 ms long. Since the emotion recognition model (as we will learn later) was trained on a 48 kHz dataset, we capture chunks of 48000 × 20 ms / 1000 × 1 (mono) = 960 samples, which is 1920 bytes of 16-bit audio. For each of these chunks webrtcvad returns True or False, corresponding to the presence or absence of a voice in the chunk.
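The chunk-size arithmetic can be checked with a few lines of Python. This is a minimal sketch; the `vad_frame_size` helper is a hypothetical name introduced for the example, and it assumes 16-bit samples (2 bytes each), which matches the paInt16 format used in the capture code.

```python
def vad_frame_size(sample_rate_hz, frame_ms, channels=1, bytes_per_sample=2):
    """Return (samples, bytes) for one VAD frame of the given duration."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad only accepts 10, 20 or 30 ms frames")
    samples = sample_rate_hz * frame_ms // 1000 * channels
    return samples, samples * bytes_per_sample


samples, n_bytes = vad_frame_size(48000, 20)
print(samples, n_bytes)  # 960 1920
```

The first number is what gets passed to pyaudio as frames_per_buffer; the second is the length of the byte string that each read returns.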

Let's implement the following logic:

  • We add to a list the chunks that contain voice; if a chunk has no voice, we increment the counter of empty chunks.
  • If the counter of empty chunks is >= 30 (600 ms), we look at the size of the accumulated list; if it is > 250, we add it to the queue; if not, we consider the recording too short to feed to the speaker identification model.
  • If the counter of empty chunks is still < 30 and the accumulated list has grown beyond 300 chunks, we add the fragment to the queue for a more accurate prediction (because emotions tend to change over time).

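import queue

import numpy as np
import pyaudio
import webrtcvad

process = True  # set to False from another thread to stop capturing
framesQueue = queue.Queue()

def to_queue(frames):
    # Join the raw 16-bit PCM chunks into a single int16 sample array
    return np.frombuffer(b''.join(frames), dtype=np.int16)

def framesThreadBody():
    CHUNK = 960  # 20 ms of mono audio at 48 kHz, in samples
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000

    p = pyaudio.PyAudio()
    vad = webrtcvad.Vad()
    vad.set_mode(2)  # aggressiveness from 0 (least) to 3 (most)
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    false_counter = 0  # consecutive chunks without voice
    audio_frame = []   # accumulated chunks with voice
    while process:
        data = stream.read(CHUNK)
        if vad.is_speech(data, RATE):
            false_counter = 0
            audio_frame.append(data)
            # Long utterance: flush it so predictions follow changing emotions
            if len(audio_frame) > 300:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
        else:
            false_counter += 1
            # 30 empty chunks (600 ms) mark the end of an utterance
            if false_counter >= 30 and len(audio_frame) > 250:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []
                false_counter = 0
    # Release the audio device once capturing stops
    stream.stop_stream()
    stream.close()
    p.terminate()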
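The other side of the queue, the consumer that takes segments out and hands them to the models, is not shown in the article. A minimal sketch of that producer/consumer pattern, using only the standard library, might look like the following. The `producer`, `consumer` and `STOP` names are hypothetical, the fake segments stand in for real audio, and `len` stands in for model inference.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer that no more segments will arrive


def producer(q, segments):
    """Stand-in for the capture thread: pushes finished voice segments."""
    for segment in segments:
        q.put(segment)
    q.put(STOP)


def consumer(q, handle):
    """Main-loop side: blocks until a segment arrives, then processes it."""
    results = []
    while True:
        segment = q.get()
        if segment is STOP:
            break
        results.append(handle(segment))
    return results


segments_queue = queue.Queue()
fake_segments = [[0, 1, 2], [3, 4]]  # placeholder data instead of real audio
worker = threading.Thread(target=producer, args=(segments_queue, fake_segments))
worker.start()
results = consumer(segments_queue, handle=len)  # len stands in for inference
worker.join()
print(results)  # [3, 2]
```

Because queue.Queue is thread-safe, the capture thread never has to wait for inference to finish; it only appends segments, and the consumer processes them in arrival order.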

It is time to look for pre-trained models in the public domain: we go to GitHub and Google, keeping in mind the limitation on the architectures we can use. This is a fairly difficult part, because you have to test the models on your own input data and, on top of that, convert them to OpenVINO's internal format, IR (Intermediate Representation). We tried about 5-7 different solutions from GitHub, and while the emotion recognition model worked right away, speaker recognition took longer because those models use more complex architectures.

We settled on one model for emotion recognition and one for speaker recognition.

Next we will talk about converting models, starting with some theory. OpenVINO includes several modules:

  • Open Model Zoo, whose models can be used and included in your product
  • Model Optimizer, which converts a model from various framework formats (TensorFlow, ONNX and so on) to the Intermediate Representation format that we will work with from here on
  • Inference Engine, which runs models in IR format on Intel processors, Myriad chips and Neural Compute Stick accelerators
  • The most efficient build of OpenCV (with Inference Engine support)

Each model in IR format is described by two files: .xml and .bin.
Models are converted to IR format with Model Optimizer as follows:

python /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model speaker.hdf5.pb --data_type=FP16 --input_shape [1,512,1000,1]

--data_type selects the data format the model will work with. FP32, FP16 and INT8 are supported. Choosing the optimal data type can give a good performance boost.
--input_shape specifies the dimensions of the input data. The ability to change it dynamically appears to exist in the C++ API, but we did not dig that far and simply fixed it for one of the models.
Next, let's load the model already converted to IR format into OpenCV through its DNN module and run a forward pass on it.

import cv2 as cv
emotionsNet = cv.dnn.readNet('emotions_model.bin',
                             'emotions_model.xml')
emotionsNet.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)

The last line redirects the computation to the Neural Compute Stick. By default, computation runs on the CPU, but on the Raspberry Pi that will not work, so you need the stick.

From there the logic is as follows: we split our audio into windows of a fixed size (0.4 s in our case), convert each window to MFCC features, and feed them to the network:

emotionsNet.setInput(MFCC_from_window)
result = emotionsNet.forward()

Next, we take the most common class across all windows. It is a simple solution, but for a hackathon you do not need to invent anything too elaborate unless you have the time. We still have a lot of work to do, so let's move on to speaker recognition. We need some kind of database that stores spectrograms of pre-recorded voices. Since little time is left, we solve this as best we can.
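Returning briefly to the emotion pipeline, the windowing and the "most common class" step can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the windows do not overlap, and an incomplete trailing window is dropped; the original project may handle these details differently.

```python
from collections import Counter


def split_into_windows(samples, sample_rate_hz, window_s=0.4):
    """Cut a sample sequence into consecutive, non-overlapping full windows."""
    size = round(sample_rate_hz * window_s)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def majority_class(window_predictions):
    """Return the label predicted for the largest number of windows."""
    return Counter(window_predictions).most_common(1)[0][0]


one_second = list(range(48000))  # placeholder for 1 s of 48 kHz audio
windows = split_into_windows(one_second, 48000)
print(len(windows), len(windows[0]))  # 2 19200

print(majority_class(['angry', 'neutral', 'angry']))  # angry
```

In the real pipeline each window would first be converted to MFCC features and classified by the network; the list passed to majority_class would hold those per-window labels.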

We start with a script for recording a voice excerpt (it works as described above, except that when interrupted from the keyboard it saves the voice to a file).

Let's try it:

python3 voice_db/record_voice.py test.wav

We record the voices of several people (in our case, three team members).
Next, for each recorded voice we perform a fast Fourier transform, obtain a spectrogram and save it as a numpy array (.npy):

import glob
import numpy as np

for file in glob.glob("voice_db/*.wav"):
    spec = get_fft_spectrum(file)
    np.save(file[:-4] + '.npy', spec)
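For readers unfamiliar with the FFT step, here is a generic sketch of a magnitude spectrogram built with numpy. It is not the project's get_fft_spectrum: the function name, the frame length, the hop size and the Hann window are all assumptions chosen for the example, and the signal is a synthetic 1 kHz tone rather than a recorded voice.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Return an array of shape (frequency_bins, time_frames).

    Assumes len(signal) >= frame_len. Each frame is multiplied by a Hann
    window before the real-input FFT; only magnitudes are kept.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic test signal: 1 second of a 1 kHz tone sampled at 16 kHz
t = np.arange(16000) / 16000
tone = np.sin(2 * np.pi * 1000 * t)
spec = magnitude_spectrogram(tone)
print(spec.shape)  # (201, 98)
```

With a 400-sample frame at 16 kHz each frequency bin is 40 Hz wide, so the energy of the 1 kHz tone concentrates in bin 25.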

More details are in the file create_base.py.
As a result, when we run the main script, we compute embeddings from these spectrograms at the very beginning:

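enroll_embs = []  # one embedding per enrolled voice
for file in glob.glob("voice_db/*.npy"):
    spec = np.load(file)
    spec = spec.astype('float32')
    # Reshape to the 4D (batch, channel, height, width) layout the network expects
    spec_reshaped = spec.reshape(1, 1, spec.shape[0], spec.shape[1])
    srNet.setInput(spec_reshaped)
    pred = srNet.forward()
    emb = np.squeeze(pred)
    enroll_embs.append(emb)  # keep every embedding for later comparison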

After obtaining an embedding from the captured voice segment, we can determine whose voice it is by taking the cosine distance from the segment to every voice in the database (the smaller, the more likely). For the demo we set the threshold to 0.3:

from scipy.spatial.distance import cdist
import pandas as pd

dist_list = cdist(emb, enroll_embs, metric="cosine")
distances = pd.DataFrame(dist_list, columns = df.speaker)
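To make the matching rule concrete, here is a dependency-free sketch of cosine-distance identification. The `identify` helper, the toy two-dimensional embeddings and the speaker names are made up for the example, and treating a distance above the threshold as "unknown speaker" is an assumption about how the 0.3 threshold is applied.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def identify(embedding, enrolled, threshold=0.3):
    """Return (speaker or None, distance) for the closest enrolled voice."""
    best_name, best_dist = None, float('inf')
    for name, reference in enrolled.items():
        dist = cosine_distance(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist  # nobody in the database is close enough
    return best_name, best_dist


# Toy 2-D embeddings; real ones come from the speaker recognition network
enrolled = {'alice': [1.0, 0.0], 'bob': [0.0, 1.0]}
print(identify([0.9, 0.1], enrolled)[0])   # alice
print(identify([-1.0, 0.0], enrolled)[0])  # None
```

cdist with metric="cosine" computes the same quantity for whole matrices of embeddings at once.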

Finally, I would like to note that inference was fast enough to leave room for 1-2 more models (inference on a 7-second sample took 2.5 seconds). We no longer had time to add new models, so we focused on writing a prototype of the web application.

Web application

An important point: we brought a router from home and set up our own local network, which made it easy to connect the device and the laptops to each other.

The backend is an end-to-end message channel between the frontend and the Raspberry Pi, built on WebSocket technology (a protocol that runs over TCP and starts with an HTTP handshake).

The first stage is receiving processed information from the Raspberry Pi, that is, predictions packed in JSON, which are saved to a database halfway through their journey so that statistics on the user's emotional state over a period can be generated. The packet is then sent on to the frontend, which uses a subscription and receives packets from the WebSocket endpoint. The entire backend is written in Go; it was chosen because it suits asynchronous tasks, which goroutines handle well.
When the endpoint is accessed, the user is registered and entered into a structure, and then their message is received. Both the user and the message go into a common hub, from which messages are sent onward (to the subscribed frontend), and if a user closes the connection (the Raspberry Pi or the frontend), their subscription is cancelled and they are removed from the hub.
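The hub behaviour described above (register, broadcast, remove on disconnect) can be illustrated with a small sketch. The team's backend is written in Go; the version below is Python only to stay consistent with the rest of the examples, it replaces WebSocket connections with in-memory queues, and the `Hub` class and the JSON field names are hypothetical.

```python
import json
import queue


class Hub:
    """Keeps track of subscribers and fans every message out to all of them."""

    def __init__(self):
        self._subscribers = set()

    def register(self):
        inbox = queue.Queue()  # stands in for one client connection
        self._subscribers.add(inbox)
        return inbox

    def unregister(self, inbox):
        self._subscribers.discard(inbox)

    def broadcast(self, message):
        for inbox in self._subscribers:
            inbox.put(message)


hub = Hub()
front = hub.register()  # the frontend subscribes

# A prediction packet as the Raspberry Pi might send it (field names assumed)
packet = json.dumps({"speaker": "alice", "emotion": "happy"})
hub.broadcast(packet)
received = json.loads(front.get_nowait())
print(received["emotion"])  # happy

hub.unregister(front)  # connection closed: no further deliveries
hub.broadcast(packet)
print(front.empty())  # True
```

In the Go implementation the same roles would typically be played by goroutines and channels, with one reader and one writer per WebSocket connection.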

Waiting for a connection from the backend

The frontend is a web application written in JavaScript using the React library to speed up and simplify development. Its purpose is to visualize the data produced by the algorithms running on the backend side and directly on the Raspberry Pi. The site has a section structure implemented with a router, but the page of most interest is the main page, which receives a continuous real-time stream of data from the server over WebSocket. The Raspberry Pi detects a voice, determines whether it belongs to a specific person from the enrolled database, and sends a list of probabilities to the client. The client displays the latest relevant data: the avatar of the person most likely speaking into the microphone and the emotion with which they are speaking.

Home page with continuously updated predictions

Conclusion

We did not manage to finish everything as planned, we simply ran out of time, so our main hope was the demo: that everything would work. In the presentation we talked about how everything works, which models we took and what problems we ran into. Then came the demo part: the experts walked around the room in random order and approached each team to look at the working prototype. They asked us questions too, each of us answered for their own part, we left the web application open on the laptop, and everything really did work as expected.

Let me note that the total cost of our solution was $150:

  • Raspberry Pi 3 ~ $35
  • Google AIY Voice Bonnet (a ReSpeaker board is an alternative) ~ $15
  • Intel NCS 2 ~ $100

How to improve:

  • Use enrollment from the client: ask them to read a randomly generated text
  • Add a few more models: gender and age can be determined from the voice
  • Separate voices that sound at the same time (diarization)

Repository: https://github.com/vladimirwest/OpenEMO

Tired but happy

In closing, I would like to thank the organizers and the participants. Among the other teams' projects, we personally liked the solution for monitoring free parking spaces. For us it was a great experience of diving into the product and into development. I hope that more and more interesting events will be held in the regions, including on AI topics.

source: www.habr.com
