Facebook publishes EnCodec, an audio codec based on machine learning

Meta/Facebook (banned in the Russian Federation) has introduced a new audio codec, EnCodec, which uses machine learning methods to increase the compression ratio without loss of quality. The codec can be used both for streaming audio in real time and for encoding audio that will be stored in files. The EnCodec reference implementation is written in Python using the PyTorch framework and is licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial), which permits non-commercial use only.

Two ready-made models are available for download:

  • A causal model that uses a 24 kHz sampling rate, supports mono audio only, and was trained on diverse audio data (suitable for speech coding). The model can be used to pack audio data for transmission at bitrates of 1.5, 3, 6, 12 and 24 kbps.
  • A non-causal model that uses a 48 kHz sampling rate, supports stereo audio, and was trained on music only. The model supports bitrates of 3, 6, 12 and 24 kbps.
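
The two models listed above can be summarised in a small lookup table. The following is a minimal sketch: the names `Preset`, `PRESETS` and `payload_bytes` are hypothetical and are not part of the EnCodec package. It only records the figures from the list and estimates how many bytes a clip would occupy at a constant bitrate, ignoring any container overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Hypothetical description of one published model (not an EnCodec API)."""
    sample_rate_hz: int
    channels: int
    causal: bool
    bitrates_kbps: tuple


# Values copied from the list of models in the article.
PRESETS = {
    "24khz": Preset(24_000, 1, True, (1.5, 3, 6, 12, 24)),
    "48khz": Preset(48_000, 2, False, (3, 6, 12, 24)),
}


def payload_bytes(preset_name: str, bitrate_kbps: float, seconds: float) -> int:
    """Size of the compressed stream at a constant bitrate, without container overhead."""
    preset = PRESETS[preset_name]
    if bitrate_kbps not in preset.bitrates_kbps:
        raise ValueError(f"{preset_name} does not list {bitrate_kbps} kbps")
    # 1 kbps is 1000 bits per second; there are 8 bits in a byte.
    return int(bitrate_kbps * 1000 * seconds / 8)


if __name__ == "__main__":
    print(payload_bytes("24khz", 1.5, 60))  # one minute of speech at 1.5 kbps
    print(payload_bytes("48khz", 6, 180))   # three minutes of music at 6 kbps
```

Rejecting bitrates that a model does not list mirrors the article: 1.5 kbps is given only for the 24 kHz model.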

For each model, an additional language model has been prepared, which makes it possible to increase the compression ratio substantially (by up to 40%) without loss of quality. Unlike earlier projects that applied machine learning methods to audio compression, EnCodec can be used not only for packing speech but also for compressing music at a 48 kHz sampling rate, which corresponds to audio CD level. According to the developers of the new codec, compared with MP3 at a bitrate of 64 kbps they were able to increase the degree of audio compression roughly tenfold while keeping the same level of quality (for example, where MP3 needs 64 kbps of bandwidth, 6 kbps is enough to transmit audio of the same quality with EnCodec).
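
The "roughly tenfold" figure can be checked with simple arithmetic. This sketch uses only the two bitrates quoted above (64 kbps for MP3 and 6 kbps for EnCodec); the function name `stream_bytes` and the three-minute clip length are illustrative assumptions.

```python
def stream_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Bytes needed for a constant-bitrate stream of the given length."""
    return bitrate_kbps * 1000 * seconds / 8


MP3_KBPS = 64        # bitrate quoted for MP3 in the article
ENCODEC_KBPS = 6     # bitrate quoted for EnCodec at the same quality
CLIP_SECONDS = 180   # illustrative clip length, not from the article

ratio = MP3_KBPS / ENCODEC_KBPS
mp3_size = stream_bytes(MP3_KBPS, CLIP_SECONDS)
encodec_size = stream_bytes(ENCODEC_KBPS, CLIP_SECONDS)

if __name__ == "__main__":
    print(f"compression gain: {ratio:.2f}x")
    print(f"MP3: {mp3_size:.0f} bytes, EnCodec: {encodec_size:.0f} bytes")
```

The ratio depends only on the two bitrates, so it is the same for a clip of any length.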

The codec is built on a neural network with a "transformer" architecture and consists of four components: an encoder, a quantizer, a decoder and a discriminator. The encoder extracts the parameters of the audio data and converts the stream into a representation with a lower frame rate. The quantizer (RVQ, Residual Vector Quantizer) converts the encoder's output stream into sets of packets, compressing the information according to the selected bitrate. The output of the quantizer is a compressed representation of the data that is suitable for transmission over a network or for saving to disk.
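
The idea of a residual vector quantizer can be sketched in a few lines of plain Python. This is a toy illustration, not the EnCodec implementation: the codebooks here are random rather than learned, all function names are hypothetical, and the frame rate of 75 frames per second and codebook size of 1024 entries used in `bitrate_bps` are illustrative assumptions that are not stated in the article. Each stage quantizes whatever error the previous stages left behind, so the number of stages that are kept determines the bitrate.

```python
import math
import random


def nearest(codebook, vec):
    """Index of the codeword with the smallest squared distance to vec."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codewords; a shorter index list gives a coarser result."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_codebooks(stages, size, dim, seed=0):
    """Random toy codebooks (a real codec learns them during training).
    Entry 0 is the zero vector, so an extra stage can never increase the error."""
    rng = random.Random(seed)
    books = []
    for stage in range(stages):
        scale = 0.5 ** stage  # later stages refine smaller residuals
        book = [[0.0] * dim]
        book += [[rng.uniform(-scale, scale) for _ in range(dim)]
                 for _ in range(size - 1)]
        books.append(book)
    return books


def bitrate_bps(frame_rate_hz, stages, codebook_size):
    """Bits per second when every frame stores one index per stage."""
    return frame_rate_hz * stages * math.log2(codebook_size)


if __name__ == "__main__":
    books = make_codebooks(stages=4, size=16, dim=4)
    frame = [0.3, -0.7, 0.1, 0.5]
    codes = rvq_encode(frame, books)
    for k in range(1, len(codes) + 1):
        approx = rvq_decode(codes[:k], books)
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(approx, frame)))
        print(k, "stages, error", round(err, 4))
    print(bitrate_bps(75, 8, 1024), "bps with the assumed frame rate and codebook size")
```

Under these assumed parameters, halving the number of stages halves the bitrate, which is one way a single model could serve several of the bitrates listed earlier.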

The decoder decodes the compressed representation of the data and reconstructs the original sound wave. The discriminator improves the quality of the generated samples by taking a model of human auditory perception into account. Regardless of the quality level and bitrate, the models used for encoding and decoding have fairly modest resource requirements (the computations needed for real-time operation run on a single CPU core).


source: budenet.ru
