Cameras usually have a narrow field of view, so covering a large area takes more than one camera; sound has no such limitation.
Let's develop the idea, taking a retail scenario as the basis. You can measure customer satisfaction at the store checkouts. If a customer is unhappy with the service and starts raising their voice, you can immediately call a manager for help.
In this case we need to add speaker recognition, which will let us tell store employees apart from customers and provide analytics for each individual. On top of that, it becomes possible to analyze the behavior of the store employees themselves and assess the atmosphere in the team. Sounds good!
We formulate the requirements for our solution:
Small size of the target device
Real-time operation
Low price
Easy scalability
As a result, we choose a Raspberry Pi 3 with an Intel NCS 2 as the target device.
Here it is important to note one key feature of the NCS: it works best with standard CNN architectures, but if you need to run a model with custom layers, expect to do some low-level optimization.
There is just one small thing left to do: find a microphone. An ordinary USB microphone would do, but it would not look good next to the RPi. Even here the solution is literally "lying nearby". To record voice, we decided to use the Voice Bonnet board from the Google AIY Voice Kit, which has a wired stereo microphone.
Download Raspbian from the AIY projects repository and write it to a flash drive, then check that the microphone works with the following command (it records 5 seconds of audio and saves it to a file):
arecord -d 5 -r 16000 test.wav
I should note right away that the microphone is very sensitive and picks up noise well. To fix this, open alsamixer, select the Capture devices and reduce the input signal level to 50-60%.
We adjust the case with a file and everything fits; you can even close it with the lid
Adding an indicator button
While taking the AIY Voice Kit apart, we remember that it has an RGB button whose backlight can be controlled from software. We search for "Google AIY Led" and find the documentation: https://aiyprojects.readthedocs.io/en/latest/aiy.leds.html
Why not use this button to show the recognized emotion? We have only 7 classes and the button has 8 colors, which is just enough!
We connect the button to the Voice Bonnet via GPIO and load the necessary libraries (they are already installed in the distribution from the AIY projects)
from aiy.leds import Leds, Color
from aiy.leds import RgbLeds
Let's create a dict in which each emotion has a corresponding color as an RGB tuple, and an object of class aiy.leds.Leds through which we will update the color:
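The listing for this step is not reproduced here. As a minimal sketch, assuming seven emotion labels and colors picked purely for illustration (neither the label names nor the color assignments come from the model or from the AIY documentation), the mapping and the color update could look like this. The `aiy` import is guarded so the snippet also loads on a machine without the AIY image.

```python
# Hypothetical emotion -> RGB mapping; labels and colors are illustrative assumptions.
EMOTION_COLORS = {
    'neutral': (255, 255, 255),
    'happy': (0, 255, 0),
    'sad': (0, 0, 255),
    'angry': (255, 0, 0),
    'fearful': (255, 0, 255),
    'disgusted': (0, 255, 255),
    'surprised': (255, 255, 0),
}

try:
    from aiy.leds import Leds  # available on the AIY Raspbian image
    leds = Leds()
except (ImportError, OSError):
    Leds = None
    leds = None  # no AIY library or hardware: colors are only looked up


def show_emotion(emotion):
    """Look up the color for an emotion and, if possible, light the button."""
    color = EMOTION_COLORS.get(emotion, (0, 0, 0))  # unknown label -> LED off
    if leds is not None:
        leds.update(Leds.rgb_on(color))
    return color
```

Calling `show_emotion('angry')` after each prediction would then switch the button to the color assigned to that class.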
Since webrtcvad restricts the size of the fragment it accepts (it must be 10, 20 or 30 ms long), and the emotion recognition model (as we will see later) was trained on a 48 kHz dataset, we capture chunks of 48000 × 20 ms / 1000 × 1 (mono) = 960 samples, which is 1920 bytes at 16 bits per sample. Webrtcvad returns True/False for each of these chunks, corresponding to the presence or absence of voice in the chunk.
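import queue

import numpy as np
import pyaudio
import webrtcvad


def to_queue(frames):
    # Join the raw byte chunks into a single array of int16 samples.
    return np.frombuffer(b''.join(frames), dtype=np.int16)


framesQueue = queue.Queue()
process = True  # set to False from another thread to stop the capture loop


def framesThreadBody():
    CHUNK = 960  # 20 ms of mono audio at 48 kHz
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000

    p = pyaudio.PyAudio()
    vad = webrtcvad.Vad()
    vad.set_mode(2)
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    false_counter = 0  # consecutive chunks without speech
    audio_frame = []   # speech chunks collected for the current segment
    while process:
        data = stream.read(CHUNK)
        if not vad.is_speech(data, RATE):
            false_counter += 1
            # After 30 silent chunks (0.6 s), queue the segment if it is long enough.
            if false_counter >= 30:
                if len(audio_frame) > 250:
                    framesQueue.put(to_queue(audio_frame))
                    audio_frame = []
                    false_counter = 0
        else:
            false_counter = 0
            audio_frame.append(data)
            # Cap the segment length so that long speech is processed in parts.
            if len(audio_frame) > 300:
                framesQueue.put(to_queue(audio_frame))
                audio_frame = []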
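To check the thresholds without a microphone, the same segmentation rule can be replayed on a prepared list of VAD decisions. This is only a sketch: `segment_lengths` is a hypothetical helper, and its default values simply mirror the constants in the listing above (30 silent chunks, 250 and 300 speech chunks).

```python
def segment_lengths(flags, silence_limit=30, min_chunks=250, max_chunks=300):
    """Replay the segmentation rule on a sequence of VAD decisions.

    flags: iterable of booleans, one per 20 ms chunk (True = speech).
    Returns the length, in chunks, of every segment that would be queued.
    """
    emitted = []
    voiced = 0      # speech chunks collected so far
    silent_run = 0  # consecutive non-speech chunks
    for is_speech in flags:
        if is_speech:
            silent_run = 0
            voiced += 1
            if voiced > max_chunks:
                emitted.append(voiced)
                voiced = 0
        else:
            silent_run += 1
            if silent_run >= silence_limit and voiced > min_chunks:
                emitted.append(voiced)
                voiced = 0
                silent_run = 0
    return emitted


print(segment_lengths([True] * 260 + [False] * 30))  # [260]
print(segment_lengths([True] * 301))                 # [301]
```

With 20 ms chunks, a segment is queued either after 0.6 s of silence following more than 5 s of speech, or as soon as it exceeds 6 s.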
It is time to look for pretrained models in the public domain: go to github, Google, but remember that we are limited in the architectures we can use. This is a rather difficult part, because you have to test the models on your own input data and, on top of that, convert them to OpenVINO's internal format, IR (Intermediate Representation). We tried about 5-7 different solutions from github, and while the emotion recognition model worked right away, speaker recognition took longer because those models use more complex architectures.
We settle on the following:
Emotions from voice - https://github.com/alexmuhr/Voice_Emotion
It works on the following principle: the audio is cut into fragments of a fixed size, for each fragment we extract the MFCC and feed them as input to a CNN
Voice recognition - https://github.com/linhdvu14/vggvox-speaker-identification
Here, instead of MFCC, we work with a spectrogram: after the FFT we feed the signal to a CNN, and at the output we get a vector representation of the voice.
Next we will talk about converting the models, starting with the theory. OpenVINO includes several modules:
Open Model Zoo, whose models can be used and included in your product
Model Optimizer, which converts a model from various framework formats (Tensorflow, ONNX etc) into the Intermediate Representation format that we will work with from here on
Inference Engine, which runs models in IR format on Intel processors, Myriad chips and Neural Compute Stick accelerators
The most efficient build of OpenCV (with Inference Engine support)
Each model in IR format is described by two files: .xml and .bin.
Models are converted to IR format with Model Optimizer as follows:
--data_type lets you choose the data type the model will work with. FP32, FP16, INT8 are supported. Choosing the right data type can give a good performance boost. --input_shape specifies the dimensions of the input data. The ability to change the shape dynamically seems to exist in the C++ API, but we did not dig that far and simply fixed it for one of the models.
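The conversion command itself is not shown above. As a hedged sketch, the call can be assembled from Python: `--input_model`, `--data_type`, `--input_shape` and `--output_dir` are Model Optimizer options, while the script location `mo.py`, the model file name and the input shape below are placeholders that depend on your installation and your model.

```python
import subprocess


def build_mo_command(model_path, input_shape, data_type='FP16',
                     output_dir='ir', mo_script='mo.py'):
    """Assemble a Model Optimizer command line as a list of arguments.

    mo_script is a placeholder: point it at the Model Optimizer script
    of your OpenVINO installation.
    """
    shape = '[' + ','.join(str(dim) for dim in input_shape) + ']'
    return ['python3', mo_script,
            '--input_model', model_path,
            '--data_type', data_type,
            '--input_shape', shape,
            '--output_dir', output_dir]


# Placeholder file name and shape, for illustration only.
cmd = build_mo_command('emotions_model.pb', (1, 20, 40, 1))
print(' '.join(cmd))
# To actually run the conversion (requires OpenVINO to be installed):
# subprocess.run(cmd, check=True)
```

FP16 is the default in this sketch because the Myriad chip in the Neural Compute Stick runs FP16 models.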
Next, let's load the model already converted to IR format through the DNN module in OpenCV and run a forward pass through it.
import cv2 as cv

emotionsNet = cv.dnn.readNet('emotions_model.bin',
                             'emotions_model.xml')
emotionsNet.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)
The last line here redirects the computation to the Neural Compute Stick. By default the computation runs on the processor, but on the Raspberry Pi that will not work, so you will need the stick.
Next, the logic is as follows: we split our audio into windows of a fixed size (0.4 s in our case), convert each window to MFCC and feed the result to the network:
emotionsNet.setInput(MFCC_from_window)
result = emotionsNet.forward()
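A minimal sketch of the window split, assuming non-overlapping windows and dropping the incomplete tail (the text does not say how either is handled). The MFCC step is left out, and `most_common_class` is a hypothetical helper for combining the per-window results into one label by majority vote.

```python
from collections import Counter

import numpy as np


def split_into_windows(samples, rate=48000, window_s=0.4):
    """Cut a 1-D sample array into non-overlapping windows of window_s seconds.

    A trailing remainder shorter than one window is dropped.
    """
    size = int(round(rate * window_s))
    count = len(samples) // size
    return samples[:count * size].reshape(count, size)


def most_common_class(window_predictions):
    """Return the class index predicted most often across the windows."""
    return Counter(window_predictions).most_common(1)[0][0]


segment = np.zeros(48000, dtype=np.int16)  # 1 s of silence as dummy input
windows = split_into_windows(segment)
print(windows.shape)                       # (2, 19200)
print(most_common_class([3, 1, 3, 3, 0]))  # 3
```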
Next, we take the most common class across all windows. It is a simple solution, but for a hackathon you do not need to come up with anything too sophisticated unless you have the time. We still have a lot of work ahead, so let's move on to speaker recognition. We need some kind of database that stores spectrograms of prerecorded voices. Since little time is left, we will solve this as best we can.
Namely, we create a script for recording a voice sample (it works in the same way as described above, except that when interrupted from the keyboard it saves the voice to a file).
Let's try:
python3 voice_db/record_voice.py test.wav
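The recording script itself is not listed here. As a sketch of its saving step only, the collected chunks can be written with the standard `wave` module; `save_wav` is a hypothetical helper, and the defaults assume the same 48 kHz, mono, 16-bit capture settings as above.

```python
import wave


def save_wav(path, frames, rate=48000, channels=1, sample_width=2):
    """Write a list of raw PCM byte chunks to a WAV file.

    sample_width=2 corresponds to 16-bit samples (pyaudio.paInt16).
    """
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(rate)
        wav_file.writeframes(b''.join(frames))
```

In a recording loop, the chunks returned by `stream.read` would be appended to a list, and `save_wav` would be called from an `except KeyboardInterrupt:` handler.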
We record the voices of several people (in our case, three team members)
Next, for each recorded voice we perform a fast Fourier transform, obtain a spectrogram and save it as a numpy array (.npy):
import glob
import numpy as np

# get_fft_spectrum is the spectrogram helper (see create_base.py)
for file in glob.glob("voice_db/*.wav"):
    spec = get_fft_spectrum(file)
    np.save(file[:-4] + '.npy', spec)
More details are in the file create_base.py
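`get_fft_spectrum` is not shown in this article. Purely to illustrate the idea, a bare-bones magnitude spectrogram can be computed with NumPy as below; `simple_spectrogram` and its frame parameters are assumptions and do not replicate the preprocessing the speaker model expects.

```python
import numpy as np


def simple_spectrogram(samples, frame_len=512, hop=160):
    """Magnitude spectrogram: one FFT per overlapping, Hamming-windowed frame.

    Expects at least frame_len samples.
    Returns an array of shape (frame_len // 2 + 1, number_of_frames).
    """
    samples = np.asarray(samples, dtype=np.float32)
    n_frames = 1 + (len(samples) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([samples[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


# One second of a 440 Hz tone sampled at 16 kHz as dummy input.
t = np.arange(16000) / 16000.0
spec = simple_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, 97)
```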
As a result, when we run the main script, we compute embeddings from these spectrograms at the very start:
for file in glob.glob("voice_db/*.npy"):
    spec = np.load(file)
    spec = spec.astype('float32')
    spec_reshaped = spec.reshape(1, 1, spec.shape[0], spec.shape[1])
    srNet.setInput(spec_reshaped)
    pred = srNet.forward()
    emb = np.squeeze(pred)
After obtaining an embedding from an audio segment, we can determine whose voice it is by taking the cosine distance from the segment to every voice in the database (the smaller the distance, the more likely the match). For the demo we set the threshold to 0.3:
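The matching code is not included above. A minimal sketch of the comparison, assuming the database is a dict of name to embedding and that a segment whose best distance exceeds the threshold is treated as unknown (both are assumptions; `identify` is a hypothetical helper):

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def identify(embedding, database, threshold=0.3):
    """Return (name, distance) of the closest voice, or (None, distance)
    when even the best match is farther than the threshold."""
    best_name, best_dist = None, float('inf')
    for name, reference in database.items():
        dist = cosine_distance(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist


# Toy 3-dimensional embeddings, for illustration only.
db = {'alice': [1.0, 0.0, 0.0], 'bob': [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], db)[0])  # alice
print(identify([0.0, 0.0, 1.0], db)[0])  # None
```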