Facebook publishes EnCodec, an audio codec that uses machine learning

Meta/Facebook (banned in the Russian Federation) has introduced a new audio codec, EnCodec, which uses machine learning methods to increase the compression ratio without loss of quality. The codec can be used both for streaming audio in real time and for encoding audio to be stored in files. The EnCodec reference implementation is written in Python using the PyTorch framework and is released under the CC BY-NC 4.0 license (Creative Commons Attribution-NonCommercial), for non-commercial use only.

Two pretrained models are available for download:

  • A causal model that uses a 24 kHz sampling rate, supports monophonic audio only, and was trained on a variety of audio data (suitable for speech coding). The model can be used to pack audio data for transmission at bitrates of 1.5, 3, 6, 12 and 24 kbps.
  • A non-causal model that uses a 48 kHz sampling rate, supports stereo audio, and was trained on music only. The model supports bitrates of 3, 6, 12 and 24 kbps.
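
The difference between "causal" and "non-causal" processing can be shown with a toy filter. The sketch below is only an illustration of the term, not EnCodec code: moving_average is a hypothetical helper, and a simple averaging window stands in for the layers of a real model. A causal window reads only the current and earlier samples, which is what makes streaming possible; a centred window also reads future samples.

```python
def moving_average(signal, width, causal):
    """Average `width` samples around each position.

    causal=True  -> the window covers the current sample and earlier ones.
    causal=False -> the window is centred, so it also reads future samples.
    """
    out = []
    for t in range(len(signal)):
        if causal:
            lo, hi = t - width + 1, t + 1
        else:
            half = width // 2
            lo, hi = t - half, t + half + 1
        # Clip the window to the signal boundaries.
        window = signal[max(lo, 0):min(hi, len(signal))]
        out.append(sum(window) / len(window))
    return out


# A single impulse at position 3 in an otherwise silent signal.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

causal_out = moving_average(impulse, width=3, causal=True)
centred_out = moving_average(impulse, width=3, causal=False)

# First position at which each output becomes non-zero.
first_causal = next(i for i, v in enumerate(causal_out) if v != 0.0)
first_centred = next(i for i, v in enumerate(centred_out) if v != 0.0)
print(first_causal, first_centred)  # 3 2
```

The causal output reacts only once the impulse has arrived (position 3), while the centred output reacts one sample earlier because it looks ahead.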

For each model, an additional language model has been prepared that makes it possible to achieve a significant further increase in compression (up to 40%) without loss of quality. Unlike earlier projects that applied machine learning to audio compression, EnCodec can be used not only for packing speech but also for compressing music at a 48 kHz sampling rate, which is on par with audio CD quality. According to the developers of the new codec, compared with MP3 at 64 kbps they were able to increase the compression ratio roughly tenfold while keeping the same level of quality (for example, where MP3 requires 64 kbps of bandwidth, 6 kbps is enough to transmit audio of the same quality with EnCodec).
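
The figures above can be checked with a few lines of arithmetic. This is a rough sketch: kilobytes_per_minute is a hypothetical helper, constant bitrates are assumed, and the "up to 40%" figure is read here as the bitstream shrinking by at most 40%, which is an interpretation rather than something the article spells out.

```python
def kilobytes_per_minute(bitrate_kbps):
    """Size of one minute of audio at a constant bitrate (1 kB = 1000 bytes)."""
    bits = bitrate_kbps * 1000 * 60
    return bits / 8 / 1000


mp3_kbps = 64      # MP3 bitrate quoted in the article
encodec_kbps = 6   # EnCodec bitrate quoted for the same quality

ratio = mp3_kbps / encodec_kbps
print(round(ratio, 1))                     # 10.7
print(kilobytes_per_minute(mp3_kbps))      # 480.0
print(kilobytes_per_minute(encodec_kbps))  # 45.0

# Assumption: the language model removes at most 40% of the bitstream.
best_case_kbps = round(encodec_kbps * (1 - 0.40), 2)
print(best_case_kbps)                      # 3.6
```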

The codec is built on a neural network with a "transformer" architecture and consists of four components: an encoder, a quantizer, a decoder and a discriminator. The encoder extracts the parameters of the audio data and converts the stream to a lower frame rate. The quantizer (RVQ, Residual Vector Quantizer) converts the stream produced by the encoder into sets of packets, compressing the information according to the selected bitrate. The output of the quantizer is a compressed representation of the data, suitable for transmission over a network or for storage on disk.
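
The idea behind residual vector quantization can be sketched in plain Python. This is a toy under stated assumptions, not the EnCodec implementation: the function names, the 2-dimensional frame and the grid codebooks are all hypothetical, whereas a real codec learns its codebooks during training. Each stage picks the nearest codebook entry for whatever the previous stages left unexplained, and only the chosen indices need to be transmitted.

```python
import math


def nearest(codebook, vector):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def rvq_encode(vector, codebooks):
    """Quantize `vector` stage by stage; each stage codes the previous residual."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected entry of each stage to rebuild the vector."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def grid_codebook(step):
    """Hypothetical 9-entry codebook: a 3x3 grid with spacing `step`."""
    levels = (-step, 0.0, step)
    return [(x, y) for x in levels for y in levels]


# Three stages, each three times finer than the previous one.
codebooks = [grid_codebook(0.5 / 3 ** k) for k in range(3)]
frame = (0.31, -0.62)  # made-up stand-in for one encoder output frame

indices = rvq_encode(frame, codebooks)

# Reconstruction error when only the first n stages are used.
errors = []
for n_stages in range(1, len(codebooks) + 1):
    approx = rvq_decode(indices[:n_stages], codebooks[:n_stages])
    errors.append(math.dist(frame, approx))

# Cost of one frame: one index per stage, log2(codebook size) bits each.
bits_per_frame = len(codebooks) * math.log2(len(codebooks[0]))
print(indices, [round(e, 3) for e in errors], round(bits_per_frame, 2))
```

Dropping the later indices still yields a valid but coarser reconstruction, so the number of stages kept is one way a single quantizer can trade quality against bitrate.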

The decoder decodes the compressed representation of the data and reconstructs the original sound wave. The discriminator improves the quality of the generated samples, taking into account a model of human auditory perception. Regardless of the quality level and bitrate, the models used for encoding and decoding have modest resource requirements (the computations needed for real-time operation are performed on a single CPU core).

Source: opennet.ru
