Methods for Compressing/Storing Media Data in WAVE and JPEG Format, Part 1

Hello! My first series of articles covers how images and sound are compressed and stored, using JPEG (images) and WAVE (sound) as examples, and includes sample programs that work with these formats (.jpg, .wav) in practice. This part covers WAVE.

History

WAVE (Waveform Audio File Format) is a container file format for storing a recorded audio stream. The container is typically used to store uncompressed PCM audio. (Taken from Wikipedia)

It was developed and published in 1991, together with RIFF, by Microsoft and IBM (leading IT companies of the time).

File structure

The file consists of a header followed by the data itself; there is no footer. For the plain PCM case considered here, the header is 44 bytes in total.
The header stores the number of bits per sample (the bit depth), the sampling frequency, the number of channels, and other information the sound card needs. (All numeric values in the table must be written in little-endian order.)

| Field name | Size (bytes) | Description/Purpose | Value (fixed for some fields) |
|---|---|---|---|
| chunkId | 4 | Identifies the file as a RIFF media container | 0x52494646 in big-endian ("RIFF") |
| chunkSize | 4 | Size of the whole file, excluding chunkId and chunkSize | FILE_SIZE - 8 |
| format | 4 | Type of the RIFF container | 0x57415645 in big-endian ("WAVE") |
| subchunk1Id | 4 | Marks the start of the format subchunk that follows format | 0x666d7420 in big-endian ("fmt ", with a trailing space) |
| subchunk1Size | 4 | Size of the rest of this subchunk (in bytes) | 16 by default (for the case without audio stream compression) |
| audioFormat | 2 | Audio format (depends on the compression method and the audio data structure) | 1 (for PCM, which we are considering) |
| numChannels | 2 | Number of channels | 1 (mono) or 2 (stereo); the example below uses 2. Values 3, 4, 5, 6, 7… describe specific layouts, for example 4 for quadraphonic sound |
| sampleRate | 4 | Audio sampling frequency (in hertz) | The higher the value, the better the sound quality, but the more memory an audio track of the same length needs. The recommended value is 48000 |
| byteRate | 4 | Number of bytes per second of audio | sampleRate * numChannels * bitsPerSample / 8 (bitsPerSample is defined below) |
| blockAlign | 2 | Number of bytes per sample across all channels | numChannels * bitsPerSample / 8 |
| bitsPerSample | 2 | Number of bits per sample (depth) | Any multiple of 8. Higher values give better quality and a larger file; beyond 32 bits a person hears no difference |
| subchunk2Id | 4 | Marks the start of the data (other header elements may precede it, depending on audioFormat) | 0x64617461 in big-endian ("data") |
| subchunk2Size | 4 | Size of the data area in bytes | Stored as a 32-bit integer |
| data | byteRate * audio duration (in seconds) | Audio data | Arbitrary (the samples themselves) |
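The layout in the table can be sketched as a single `struct` format string. This is only an illustration, assuming the plain 44-byte PCM header described above; the names `HEADER`, `FIELDS` and `parse_header` are hypothetical helpers, and the sample header is built by hand rather than taken from a real file.

```python
import struct

# Field layout from the table: IDs as 4-byte strings, numbers as little-endian
# unsigned integers ('I' = 4 bytes, 'H' = 2 bytes).
HEADER = struct.Struct('<4sI4s4sIHHIIHH4sI')

FIELDS = ('chunkId', 'chunkSize', 'format', 'subchunk1Id', 'subchunk1Size',
          'audioFormat', 'numChannels', 'sampleRate', 'byteRate',
          'blockAlign', 'bitsPerSample', 'subchunk2Id', 'subchunk2Size')


def parse_header(raw):
    """Unpack the first 44 bytes of a plain PCM WAVE file into a dict."""
    return dict(zip(FIELDS, HEADER.unpack(raw[:HEADER.size])))


# Hypothetical example: 1 second of 8-bit mono audio at 8000 Hz,
# with 8000 zero bytes standing in for real sample data.
sample = HEADER.pack(b'RIFF', 36 + 8000, b'WAVE', b'fmt ', 16,
                     1, 1, 8000, 8000, 1, 8, b'data', 8000) + bytes(8000)
header = parse_header(sample)
print(HEADER.size)           # 44
print(header['sampleRate'])  # 8000
```

The format string adds up to exactly 44 bytes, which matches the header size given earlier.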

Example with WAVE

The previous table can easily be translated into a C structure, but our language for today is Python. The simplest thing to build with a "wave" is a noise generator. For this task we need neither a high byteRate nor compression.
First, let's import the necessary modules:

# WAV.py

from struct import pack  # converts Python objects to basic C types
from os import urandom  # returns random bytes from the OS; it also works on Windows, so no fallback is needed
from sys import argv, exit  # command-line arguments and exit

if len(argv) != 3:  # script name plus two arguments
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

Next, we need to create all the variables from the table, each with the right size. The only non-constant values here depend on numSamples (the number of samples). The more samples there are, the longer our noise will last.

numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'  # 16
audioFormat = b'\x01\x00'  # 1 = PCM
numChannels = b'\x02\x00'  # 2 channels (stereo) are enough
sampleRate = pack('<L', 1000)  # 1000 is enough, but a higher value makes the noise easier to hear. At 1000 it sounds like wind
bitsPerSample = b'\x20\x00'  # 32
byteRate = pack('<L', 1000 * 2 * 4)  # sampleRate * numChannels * bitsPerSample / 8  (32-bit sound)
blockAlign = b'\x08\x00'  # numChannels * bitsPerSample / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)  # numSamples * numChannels * bitsPerSample / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size

data = urandom(numSamples * 2 * 4)  # the noise itself; its length must match subchunk2Size
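The hand-written byte strings for the fixed-size fields can be cross-checked with `struct.pack`. The sketch below is only an illustration of little-endian versus big-endian byte order; the variable names are hypothetical and not part of the script.

```python
from struct import pack

# '<' selects little-endian, '>' big-endian.
# 'L' is an unsigned 32-bit integer, 'H' an unsigned 16-bit integer.
size_le = pack('<L', 16)   # subchunk1Size as stored in the file
size_be = pack('>L', 16)   # the same number in big-endian, for comparison
pcm_le = pack('<H', 1)     # audioFormat = 1 (PCM)
bits_le = pack('<H', 32)   # bitsPerSample = 32

print(size_le)  # b'\x10\x00\x00\x00'
print(size_be)  # b'\x00\x00\x00\x10'

# The chunk IDs are plain ASCII text, which is why the table lists them
# as big-endian hex values.
riff_hex = b'RIFF'.hex()
print(riff_hex)  # 52494646
```

Using `pack` for every numeric field avoids having to work out the byte strings by hand.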

All that remains is to write them out in the required order (as in the table):

with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID +
            subchunk1Size + audioFormat + numChannels + 
            sampleRate + byteRate + blockAlign + bitsPerSample +
            subchunk2ID + subchunk2Size + data)  # write to the file
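One way to check that a header like this is well formed is to read it back with Python's standard `wave` module. This is a minimal sketch, not part of the original script: it builds the same 2-channel, 32-bit, 1000 Hz layout in memory, and `num_samples = 100` is an arbitrary value chosen for illustration.

```python
import io
import wave
from os import urandom
from struct import pack

num_samples = 100  # hypothetical small value, for illustration only
data = urandom(num_samples * 2 * 4)  # 2 channels, 4 bytes per sample

header = (b'RIFF' + pack('<L', 36 + len(data)) + b'WAVE' + b'fmt ' +
          # subchunk1Size, audioFormat, numChannels, sampleRate,
          # byteRate, blockAlign, bitsPerSample
          pack('<LHHLLHH', 16, 1, 2, 1000, 1000 * 2 * 4, 8, 32) +
          b'data' + pack('<L', len(data)))

# wave.open accepts a file-like object, so nothing is written to disk.
with wave.open(io.BytesIO(header + data), 'rb') as wav:
    params = wav.getparams()

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
# 2 4 1000 100
```

Note that the `wave` module counts frames, where one frame holds one sample for every channel; that corresponds to numSamples in the script.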

And that's it. To run the script, pass the required command-line arguments:
python3 WAV.py [num of samples] [output]
num of samples - the number of samples to generate
output - path to the output file

Here is a link to a test audio file with noise. To save space, I lowered the sample size to 1 byte per sample and the number of channels to 1 (as a 32-bit uncompressed stereo stream at 64 kbit/s the raw .wav file came to 80 MB; this way it is only 10 MB): https://instaud.io/3Dcy
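The sizes mentioned above follow from the byteRate formula. The sketch below is a rough check, assuming the 1000 Hz sample rate used in the script and a duration of 10,000 seconds; the duration is not stated in the article and is only inferred from the 80 MB and 10 MB figures. The function `wav_size` is a hypothetical helper.

```python
def wav_size(sample_rate, num_channels, bits_per_sample, seconds, header_size=44):
    """Approximate size in bytes of an uncompressed PCM .wav file."""
    byte_rate = sample_rate * num_channels * bits_per_sample // 8
    return header_size + byte_rate * seconds


seconds = 10_000  # assumed duration, not given in the article

stereo_32 = wav_size(1000, 2, 32, seconds)  # 8000 bytes/s = 64 kbit/s
mono_8 = wav_size(1000, 1, 8, seconds)      # 1000 bytes/s = 8 kbit/s

print(stereo_32)  # 80000044
print(mono_8)     # 10000044
```

Halving the channel count and cutting the sample size from 4 bytes to 1 byte reduces the file by a factor of 8, which matches the drop from 80 MB to 10 MB.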

Full code (WAV.py). The code duplicates a lot of values; this is just a sketch:

from struct import pack  # converts Python objects to basic C types
from os import urandom  # returns random bytes from the OS; it also works on Windows, so no fallback is needed
from sys import argv, exit  # command-line arguments and exit

if len(argv) != 3:  # script name plus two arguments
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'  # 16
audioFormat = b'\x01\x00'  # 1 = PCM
numChannels = b'\x02\x00'  # 2 channels (stereo) are enough
sampleRate = pack('<L', 1000)  # 1000 is enough, but it can be higher
bitsPerSample = b'\x20\x00'  # 32
byteRate = pack('<L', 1000 * 2 * 4)  # sampleRate * numChannels * bitsPerSample / 8  (32-bit sound)
blockAlign = b'\x08\x00'  # numChannels * bitsPerSample / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)  # numSamples * numChannels * bitsPerSample / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size

data = urandom(numSamples * 2 * 4)  # the noise itself; its length must match subchunk2Size

with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID +
            subchunk1Size + audioFormat + numChannels + 
            sampleRate + byteRate + blockAlign + bitsPerSample +
            subchunk2ID + subchunk2Size + data)  # write the result to the file
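The duplicated values could be removed by deriving every dependent field from a few parameters. The following is one possible sketch, not the author's code: the function `build_wav` and its default arguments are hypothetical, chosen to match the values used in the script.

```python
from os import urandom
from struct import pack


def build_wav(num_samples, sample_rate=1000, num_channels=2, bits_per_sample=32):
    """Return the bytes of a PCM WAVE file filled with random noise."""
    block_align = num_channels * bits_per_sample // 8  # bytes per sample, all channels
    byte_rate = sample_rate * block_align              # bytes per second
    data = urandom(num_samples * block_align)

    fmt_chunk = pack('<4sLHHLLHH', b'fmt ', 16, 1, num_channels,
                     sample_rate, byte_rate, block_align, bits_per_sample)
    data_chunk = pack('<4sL', b'data', len(data)) + data
    riff_size = 4 + len(fmt_chunk) + len(data_chunk)  # 'WAVE' + both subchunks

    return pack('<4sL4s', b'RIFF', riff_size, b'WAVE') + fmt_chunk + data_chunk


wav_bytes = build_wav(num_samples=500)
print(len(wav_bytes))  # 4044 (44-byte header + 500 * 8 bytes of data)
```

With this layout, changing the number of channels or the bit depth updates byteRate, blockAlign and both size fields automatically.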

Conclusion

So now you know a little more about digital sound and how it is stored. In this post we did not use compression (audioFormat); covering each of the popular methods would take about ten articles. I hope you learned something new and that it helps you in future projects.
Thank you!

Sources

WAV file structure
WAV - Wikipedia.

Source: habr.com
