Google has published Lyra V2, an audio codec that uses machine learning techniques to achieve high voice quality over very slow communication channels. The new version moves to a new neural network architecture and adds support for additional platforms, extended bitrate control, improved performance and higher audio quality. The reference implementation is written in C++ and distributed under the Apache 2.0 license.
In terms of the quality of voice data transmitted at low bitrates, Lyra is considerably ahead of traditional codecs that rely on digital signal processing methods. To achieve high-quality voice transmission when only a very limited amount of information can be sent, Lyra supplements conventional audio compression and signal transformation methods with a speech model based on a machine learning system, which makes it possible to reconstruct missing information from typical speech characteristics.
The codec consists of an encoder and a decoder. The encoder extracts voice parameters every 20 milliseconds, compresses them and transmits them over the network to the recipient at a bitrate of 3.2 kbps to 9.2 kbps. On the recipient side, the decoder uses a generative model to recreate the original speech signal from the transmitted audio parameters, which include log mel spectrograms that capture the energy of speech in different frequency bands and are shaped according to a model of human auditory perception.
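Because the encoder emits one set of parameters every 20 ms, each bitrate translates directly into a fixed bit budget per frame. The following is a back-of-the-envelope sketch that assumes a constant bitrate; the helper name `bits_per_frame` is hypothetical and not part of Lyra.

```python
# Hypothetical helper: bits available per frame at a constant bitrate.
FRAME_MS = 20  # the encoder emits parameters every 20 ms (per the text)


def bits_per_frame(bitrate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Bits that fit in one frame: bitrate (bit/s) * frame length (s)."""
    total = bitrate_bps * frame_ms  # bit*ms/s; divide by 1000 to get bits
    if total % 1000:
        raise ValueError("bitrate does not fill a whole number of bits per frame")
    return total // 1000


if __name__ == "__main__":
    frames_per_second = 1000 // FRAME_MS  # 50 frames per second
    for bps in (3200, 6000, 9200):
        bits = bits_per_frame(bps)
        print(f"{bps / 1000:g} kbps: {bits} bits ({bits / 8:g} bytes) per frame, "
              f"{frames_per_second} frames/s")
```

At 3.2 kbps each 20 ms frame carries 64 bits (8 bytes), at 6 kbps 120 bits and at 9.2 kbps 184 bits. Transport headers (RTP/UDP/IP) come on top of this payload and are ignored here.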
Lyra V2 uses a new generative model based on the SoundStream convolutional neural network, which has low computational requirements and allows real-time decoding even on low-power systems. The model used to synthesize sound was trained on several thousand hours of voice recordings in more than 90 languages. TensorFlow Lite is used to run the model. The performance of the proposed implementation is sufficient for encoding and decoding speech on low-end smartphones.
Besides using a different generative model, the new version is notable for adding a stage with an RVQ (Residual Vector Quantizer) to the codec architecture; it is applied on the sender side before data is transmitted and on the receiver side after data is received. The quantizer converts the parameters produced by the codec into sets of packets, encoding the information according to the selected bitrate. To provide different quality levels, quantizers are offered for three bitrates (3.2 kbps, 6 kbps and 9.2 kbps): the higher the bitrate, the better the quality, but the greater the bandwidth requirements.
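A toy sketch of residual vector quantization helps show why one quantizer can serve several bitrates: each stage picks the entry of its own codebook closest to whatever the previous stages failed to capture (the residual), and the receiver adds up the chosen entries. Sending more stages costs more bits and gives a closer approximation. This illustrates the general RVQ technique only, not Lyra's code; the codebook size (16 entries, so 4 bits per stage), the 8-dimensional vectors, the number of stages and the random codebooks are all assumptions, whereas a real codec learns its codebooks during training.

```python
import random

Vector = list[float]


def nearest(codebook: list[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)))


def rvq_encode(x: Vector, codebooks: list[list[Vector]], num_stages: int) -> list[int]:
    """Each stage quantizes the residual left over by the previous stages."""
    residual = list(x)
    indices = []
    for book in codebooks[:num_stages]:
        i = nearest(book, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, book[i])]
    return indices


def rvq_decode(indices: list[int], codebooks: list[list[Vector]]) -> Vector:
    """Reconstruction is the sum of the chosen entries, one per transmitted stage."""
    out = [0.0] * len(codebooks[0][0])
    for book, i in zip(codebooks, indices):  # zip stops after len(indices) stages
        out = [o + c for o, c in zip(out, book[i])]
    return out


def make_codebooks(num_stages: int, size: int, dim: int, seed: int = 0) -> list[list[Vector]]:
    """Hypothetical random codebooks; a real codec learns them during training."""
    rng = random.Random(seed)
    books = []
    for stage in range(num_stages):
        scale = 0.5 ** stage  # later stages refine ever smaller residuals
        book = [[0.0] * dim]  # zero entry: a stage can never increase the error
        book += [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(size - 1)]
        books.append(book)
    return books


def error(x: Vector, y: Vector) -> float:
    """Euclidean distance between the original and the reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


if __name__ == "__main__":
    dim, size, stages = 8, 16, 6
    books = make_codebooks(stages, size, dim)
    rng = random.Random(42)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    bits_per_stage = size.bit_length() - 1  # log2(16) = 4 bits per index
    for n in range(1, stages + 1):
        idx = rvq_encode(x, books, n)
        print(f"{n} stage(s), {n * bits_per_stage} bits: "
              f"error {error(x, rvq_decode(idx, books)):.3f}")
```

In this toy setup each extra stage adds 4 bits and never makes the reconstruction worse, because every codebook contains a zero entry; transmitting fewer stages lowers the bitrate at the cost of a coarser approximation.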

The new architecture reduced signal transmission latency from 100 to 20 milliseconds. For comparison, WebRTC's Opus codec showed latencies of 26.5 ms, 46.5 ms and 66.5 ms at the bitrates tested. Encoder and decoder performance also increased significantly, running up to 5 times faster than the previous version. For example, on a Pixel 6 Pro smartphone the new codec encodes and decodes a 20 ms sample in 0.57 ms, which is 35 times faster than required for real-time transmission.
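The "35 times faster" figure is the ratio of the audio duration in a frame to the time needed to process it. A minimal sketch that reproduces it from the two numbers quoted above; the function names are hypothetical, and a single-frame timing like this ignores scheduling, I/O and jitter-buffer delays.

```python
FRAME_MS = 20.0    # duration of audio in one frame (from the text)
PROCESS_MS = 0.57  # encode + decode time per frame on a Pixel 6 Pro (from the text)


def realtime_factor(frame_ms: float, process_ms: float) -> float:
    """How many times faster than real time a frame is processed (>1 keeps up)."""
    return frame_ms / process_ms


def frame_time_share(frame_ms: float, process_ms: float) -> float:
    """Fraction of each frame interval spent encoding and decoding."""
    return process_ms / frame_ms


if __name__ == "__main__":
    print(f"real-time factor: {realtime_factor(FRAME_MS, PROCESS_MS):.1f}x")
    print(f"time share per frame: {frame_time_share(FRAME_MS, PROCESS_MS):.2%}")
```

A factor of about 35 means that encoding and decoding occupy under 3% of each 20 ms interval on that device.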
Beyond performance, the quality of sound reconstruction was also improved: on the MUSHRA scale, speech quality at bitrates of 3.2 kbps, 6 kbps and 9.2 kbps with the Lyra V2 codec corresponds to that of the Opus codec at 10 kbps, 13 kbps and 14 kbps respectively.
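The MUSHRA comparison can be restated as bandwidth saved at roughly equal perceived quality. This sketch only rearranges the three pairs of figures above (the mapping and function name are illustrative); it inherits the caveats of the listening test and says nothing about other bitrates.

```python
# Lyra V2 bitrate -> Opus bitrate with comparable MUSHRA score (from the text), in kbps
EQUIVALENT_OPUS_KBPS = {3.2: 10.0, 6.0: 13.0, 9.2: 14.0}


def bandwidth_saving(lyra_kbps: float, opus_kbps: float) -> float:
    """Fraction of bandwidth saved by choosing Lyra over Opus at similar quality."""
    return 1.0 - lyra_kbps / opus_kbps


if __name__ == "__main__":
    for lyra, opus in EQUIVALENT_OPUS_KBPS.items():
        print(f"Lyra {lyra:g} kbps vs Opus {opus:g} kbps: "
              f"{bandwidth_saving(lyra, opus):.0%} less bandwidth")
```

The savings come to about 68%, 54% and 34% respectively, so the relative advantage is largest at the lowest bitrate.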
Source: opennet.ru
