Google releases Lyra V2 open source audio codec

Google has introduced the Lyra V2 audio codec, which uses machine learning techniques to achieve the best possible voice quality over very low-bandwidth communication channels. The new version moves to a new neural network architecture, adds support for additional platforms, improves bitrate control, and delivers better performance and higher audio quality. The reference implementation is written in C++ and distributed under the Apache 2.0 license.

At low bitrates, Lyra delivers significantly better voice quality than traditional codecs based on digital signal processing. To achieve high-quality voice transmission with a limited amount of transmitted data, Lyra supplements conventional audio compression and signal transformation methods with a machine-learning-based speech model that reconstructs the missing information from typical speech characteristics.

The codec includes an encoder and a decoder. The encoder extracts voice parameters every 20 milliseconds, compresses them, and transmits them to the recipient over the network at a bitrate between 3.2 kbps and 9.2 kbps. On the receiving side, the decoder uses a generative model to reconstruct the original speech signal from the transmitted parameters, which include log mel spectrograms that capture the speech energy in different frequency bands and are prepared with the human auditory perception model in mind.
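
The frame length and bitrates quoted above imply very small per-frame payloads. The following arithmetic sketch (illustrative only, not part of the Lyra API) shows the sizes they work out to:

```cpp
// Per-frame payload sizes implied by 20 ms frames at 3.2-9.2 kbps.
#include <cstdio>

int main() {
  const double frame_ms = 20.0;                    // encoder frame length
  const double bitrates_kbps[] = {3.2, 6.0, 9.2};  // supported bitrates
  for (double kbps : bitrates_kbps) {
    double bits_per_frame = kbps * 1000.0 * (frame_ms / 1000.0);
    std::printf("%.1f kbps -> %.0f bits (%.0f bytes) per 20 ms frame\n",
                kbps, bits_per_frame, bits_per_frame / 8.0);
  }
  return 0;
}
```

That comes out to 8, 15 and 23 bytes per frame at 3.2, 6 and 9.2 kbps respectively, which is why a generative model is needed on the receiving side to fill in the detail that cannot fit into the payload.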

Lyra V2 uses a new generative model based on the SoundStream convolutional neural network, which has low computational requirements and therefore allows real-time decoding even on low-power systems. The model used for sound generation was trained on several thousand hours of voice recordings in over 90 languages. TensorFlow Lite is used to run the model. The performance of the implementation is sufficient to encode and decode speech on low-end smartphones.
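
As a minimal sketch of what "running the model with TensorFlow Lite" looks like from C++, the snippet below loads a .tflite model and runs one inference step. The model file name and tensor layout are placeholders, not the actual Lyra decoder assets, and error handling is omitted for brevity:

```cpp
// Sketch: one inference step with the TensorFlow Lite C++ API.
#include <algorithm>
#include <memory>
#include <vector>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

std::vector<float> RunDecoderStep(const std::vector<float>& features) {
  // Load the model (placeholder path) and build an interpreter for it.
  auto model = tflite::FlatBufferModel::BuildFromFile("decoder_model.tflite");
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  interpreter->AllocateTensors();

  // Copy the decoded feature vector into the input tensor.
  float* input = interpreter->typed_input_tensor<float>(0);
  std::copy(features.begin(), features.end(), input);

  // Run inference and read back the synthesized audio samples.
  interpreter->Invoke();
  const float* output = interpreter->typed_output_tensor<float>(0);
  int samples = interpreter->output_tensor(0)->bytes / sizeof(float);
  return std::vector<float>(output, output + samples);
}
```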

In addition to using a different generative model, the new version is notable for adding an RVQ (Residual Vector Quantizer) stage to the codec architecture, applied on the sender side before transmission and on the receiver side after reception. The quantizer converts the parameters produced by the codec into packets, encoding the information according to the selected bitrate. To provide different quality levels, quantizers are supplied for three bitrates (3.2 kbps, 6 kbps and 9.2 kbps): the higher the bitrate, the better the quality, but also the higher the bandwidth requirements.
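
The snippet below is a toy illustration of residual vector quantization in general, not Lyra's actual quantizer; the codebooks are arbitrary. The key idea is that each stage quantizes the residual left by the previous one, so spending more stages (more bits) yields higher fidelity, which is how a single architecture can serve several bitrates:

```cpp
// Toy residual vector quantizer (RVQ) encoder.
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

// Index of the codebook entry closest to `v` (squared Euclidean distance).
std::size_t NearestCode(const std::vector<Vec>& codebook, const Vec& v) {
  std::size_t best = 0;
  float best_dist = std::numeric_limits<float>::max();
  for (std::size_t i = 0; i < codebook.size(); ++i) {
    float dist = 0.f;
    for (std::size_t d = 0; d < v.size(); ++d) {
      float diff = v[d] - codebook[i][d];
      dist += diff * diff;
    }
    if (dist < best_dist) { best_dist = dist; best = i; }
  }
  return best;
}

// Quantize `features` with the first `num_stages` codebooks; the returned
// indices are what would be packed into the transmitted frame.
std::vector<std::size_t> RvqEncode(const std::vector<std::vector<Vec>>& codebooks,
                                   Vec features, std::size_t num_stages) {
  std::vector<std::size_t> indices;
  for (std::size_t s = 0; s < num_stages; ++s) {
    std::size_t idx = NearestCode(codebooks[s], features);
    indices.push_back(idx);
    // Subtract the chosen code so the next stage quantizes the residual.
    for (std::size_t d = 0; d < features.size(); ++d)
      features[d] -= codebooks[s][idx][d];
  }
  return indices;
}
```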


The new architecture has reduced signal transmission latency from 100 to 20 milliseconds. For comparison, the Opus codec for WebRTC showed latencies of 26.5 ms, 46.5 ms and 66.5 ms at the tested bitrates. Encoder and decoder performance has also increased significantly, with a speed-up of up to 5x over the previous version. For example, on a Pixel 6 Pro smartphone, the new codec encodes and decodes a 20 ms sample in 0.57 ms, which is 35 times faster than required for real-time transmission.
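
The quoted real-time margin follows directly from those two figures; a quick check, using only the numbers reported above:

```cpp
// Real-time headroom implied by the Pixel 6 Pro measurement above.
#include <cstdio>

int main() {
  const double frame_ms = 20.0;       // audio covered by one frame
  const double processing_ms = 0.57;  // reported encode+decode time
  std::printf("Real-time margin: %.1fx\n", frame_ms / processing_ms);  // ~35x
  return 0;
}
```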

In addition to performance, the quality of sound reconstruction has also improved: on the MUSHRA scale, speech quality at bitrates of 3.2 kbps, 6 kbps and 9.2 kbps with the Lyra V2 codec corresponds to 10 kbps, 13 kbps and 14 kbps with the Opus codec.

Source: opennet.ru
