French mathematician Fabrice Bellard, who founded the QEMU, FFmpeg, BPG, QuickJS, TinyGL and TinyCC projects, published the TSAC audio encoding format and associated tools for compressing and decompressing audio files. The format is focused on transmitting data at very low bitrates, for example, 5.5 kb/s for mono and 7.5 kb/s for stereo, while maintaining acceptable quality of music and speech. Using TSAC allows you to pack a musical composition with a duration of 3.5 minutes and a sampling frequency of 44.1 kHz (stereo) into a 192 KB file, which will be almost indistinguishable from the original to the ear of an inexperienced layman. The project code is distributed under the MIT license.
The Descript audio codec was used as the basis for creating TSAC, which was expanded to support stereo sound and transferred to the use of another machine learning model based on a neural network with a “transformer” architecture, which made it possible to increase the compression ratio by reconstructing lost details taking into account the model of human auditory perception. The model occupies about 200 MB in compressed form and is formatted in a deterministic representation, which guarantees the same result regardless of the CPU/GPU used and the number of threads involved in the calculations.
The encoder can run using only the CPU for calculations (AVX2 instructions are supported for acceleration), but to achieve high performance it is recommended to use the GPU. In its current form, the CUDA API can be used for acceleration using NVIDIA GPUs based on Ampere, ADA and Hopper microarchitectures (RTX 3090, RTX 4090, RTX A6000, A100 and H100) with at least 4 GB of video memory. FFmpeg is used to convert audio files before encoding.
Additionally, we can note the update of the ts_zip utility developed by Bellar, designed for efficient compression of text data using a token prediction mechanism based on a machine learning system and a large language model RWKV 169M v4. When compressing a Wikipedia archive, the ts_zip utility made it possible to compress data by 7.3 times, and when compressing Linux 1.2 kernel code - by 7.8 times. For comparison, the compression levels when using the xz utility were 4.7 and 5.5 times, respectively. The price of high compression efficiency is low compression speed and high resource requirements (minimum 4 GB of RAM). On a system with an RTX 4090 GPU, compression performance is approximately 1 MB/s.
Source: opennet.ru
