Methods of compressing/storing media data in the WAVE and JPEG formats, part 1

Hello! My first series of articles focuses on image and audio compression and storage formats such as JPEG (images) and WAVE (audio), and it also includes example programs that use these formats (.jpg, .wav) in practice. In this part we look at WAVE.

History

WAVE (Waveform Audio File Format) is a container file format for storing a recording of an audio stream. The container is usually used to store uncompressed pulse-code modulated (PCM) audio. (Taken from Wikipedia)

It was created and published in 1991 together with RIFF by Microsoft and IBM (the leading IT companies of that time).

File structure

The file consists of a header and the data itself; there is no footer. The header takes up 44 bytes in total.
The header contains fields for the number of bits per sample, the sample rate, the bit depth, and other information the sound card needs. (All numeric values must be written in little-endian order.)
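To make the byte-order remark concrete, here is a minimal sketch using Python's standard struct module. The value 16 is chosen only for illustration.

```python
from struct import pack, unpack

# '<' selects little-endian, '>' big-endian; 'L' is an unsigned 32-bit integer
little = pack('<L', 16)
big = pack('>L', 16)

print(little)  # b'\x10\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x10'

# Reading the bytes back with the matching byte order restores the number
value, = unpack('<L', little)
print(value)
```

In little-endian order the least significant byte comes first, which is why 16 (0x10) ends up at the start of the four bytes.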

Block name | Block size (B) | Description/Purpose | Value (fixed for some fields)
--- | --- | --- | ---
chunkId | 4 | Identifies the file as a RIFF container | 0x52494646 in big-endian ("RIFF")
chunkSize | 4 | Size of the whole file without chunkId and chunkSize | FILE_SIZE - 8
format | 4 | Type identifier within RIFF | 0x57415645 in big-endian ("WAVE")
subchunk1Id | 4 | Marks the continuation of the format specification | 0x666d7420 in big-endian ("fmt ", with a trailing space)
subchunk1Size | 4 | Remaining size of the format description (in bytes) | 16 by default (for the case without audio stream compression)
audioFormat | 2 | Audio format (depends on the compression method and the structure of the audio data) | 1 (for PCM, which is what we are considering)
numChannels | 2 | Number of channels | 1 or 2 (mono or stereo); the script below uses 2. Values of 3, 4, 5, 6, 7 and so on denote a specific kind of track, for example 4 for quadraphonic audio
sampleRate | 4 | Audio sampling rate (in hertz) | The higher it is, the better the sound, but the more memory is needed for a track of the same length; the recommended value is 48000 (the most acceptable sound quality)
byteRate | 4 | Number of bytes per second | sampleRate * numChannels * bitsPerSample / 8 (bitsPerSample is abbreviated BPS below)
blockAlign | 2 | Number of bytes for one sample across all channels | numChannels * bitsPerSample / 8
bitsPerSample | 2 | Number of bits per sample (depth) | Any multiple of 8. The larger the number, the better and the heavier the audio; from 32 bits upward a human hears no difference
subchunk2Id | 4 | Marks the start of the data (because other header elements may be present depending on audioFormat) | 0x64617461 in big-endian ("data")
subchunk2Size | 4 | Size of the data | Data size in bytes, as an int
data | byteRate * audio duration | The audio data | ?
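The field list above maps naturally onto a single struct format string. The sketch below is one possible way to build and check the 44-byte header; the helper name build_header and its default arguments are hypothetical and not part of the original script.

```python
from struct import pack, unpack, calcsize

# One format string for the whole header: little-endian, no padding.
# 4s = 4 raw bytes, L = unsigned 32-bit integer, H = unsigned 16-bit integer
HEADER_FORMAT = '<4sL4s4sLHHLLHH4sL'

def build_header(num_samples, sample_rate=48000, num_channels=2, bits_per_sample=16):
    """Return a 44-byte header for uncompressed PCM audio (hypothetical helper)."""
    block_align = num_channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    data_size = num_samples * block_align
    return pack(
        HEADER_FORMAT,
        b'RIFF', 36 + data_size, b'WAVE',        # chunkId, chunkSize, format
        b'fmt ', 16, 1, num_channels,            # subchunk1Id, subchunk1Size, audioFormat, numChannels
        sample_rate, byte_rate, block_align, bits_per_sample,
        b'data', data_size,                      # subchunk2Id, subchunk2Size
    )

header = build_header(num_samples=1000)
fields = unpack(HEADER_FORMAT, header)
print(calcsize(HEADER_FORMAT))  # size of the header in bytes
print(fields)
```

Unpacking the header with the same format string returns the fields in table order, which is a convenient way to inspect an existing .wav file as long as it uses this plain 44-byte layout.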

WAVE example

The table above translates easily into a struct in C, but our language today is Python. The simplest thing you can do with a "wave" is a noise generator. For this task we need neither a high byteRate nor compression.
First, let's import the necessary modules:

# WAV.py

from struct import pack  # converts Python objects into basic C types
from os import urandom  # returns random bytes from the OS (available on Windows too)
# pure-Python alternative, if you prefer not to use os.urandom:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])
from sys import argv, exit  # command-line arguments and exit

if len(argv) != 3:  # 2 arguments + 1 for the script name
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

Next, we need to create all the necessary variables from the table, each with its proper size. The values that vary depend only on numSamples (the number of samples). The more samples there are, the longer our noise will last.

numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'  # 0d16
audioFormat = b'\x01\x00'
numChannels = b'\x02\x00'  # 2 channels will be enough (stereo)
sampleRate = pack('<L', 1000)  # 1000 is enough, but with a higher value the noise is easier to hear; at 1000 it sounds like wind
bitsPerSample = b'\x20\x00'  # 0d32
byteRate = pack('<L', 1000 * 2 * 4)  # sampleRate * numChannels * bitsPerSample / 8  (32 bit sound)
blockAlign = b'\x08\x00'  # numChannels * BPS / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)  # * numChannels * BPS / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size

data = urandom(numSamples * 2 * 4)  # the noise itself; its length must match subchunk2Size
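As a rough illustration of how numSamples relates to the length of the noise, the sketch below reuses the values from the script (1000 Hz, 2 channels, 32 bits). The function name describe is made up for this example.

```python
SAMPLE_RATE = 1000      # samples per second per channel, as in the script
NUM_CHANNELS = 2
BYTES_PER_SAMPLE = 4    # 32 bits

def describe(num_samples):
    """Return (duration in seconds, data size in bytes) for a given sample count."""
    duration = num_samples / SAMPLE_RATE
    data_size = num_samples * NUM_CHANNELS * BYTES_PER_SAMPLE
    return duration, data_size

print(describe(5000))  # (5.0, 40000)
```

So passing 5000 as the number of samples should give about five seconds of noise at this sample rate.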

All that remains is to write them out in the required order (as in the table):

with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID +
             subchunk1Size + audioFormat + numChannels +
             sampleRate + byteRate + blockAlign + bitsPerSample +
             subchunk2ID + subchunk2Size + data)  # write everything to the file
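One way to check that a file written like this is well formed is to read it back with Python's standard wave module. The sketch below is self-contained: it writes a small noise file to a temporary directory (the file name noise.wav and the count of 100 samples are arbitrary choices for the example) and then prints what the parser sees.

```python
import os
import tempfile
import wave
from struct import pack

num_samples = 100
data = os.urandom(num_samples * 2 * 4)  # 2 channels, 4 bytes per sample

# Same layout as in the article: RIFF header, fmt subchunk, data subchunk
header = (b'RIFF' + pack('<L', 36 + len(data)) + b'WAVE' +
          b'fmt ' + pack('<LHHLLHH', 16, 1, 2, 1000, 1000 * 2 * 4, 8, 32) +
          b'data' + pack('<L', len(data)))

path = os.path.join(tempfile.mkdtemp(), 'noise.wav')
with open(path, 'wb') as fh:
    fh.write(header + data)

# wave parses the header and exposes the fields through getter methods
with wave.open(path, 'rb') as wav:
    channels = wav.getnchannels()
    width = wav.getsampwidth()    # bytes per sample
    rate = wav.getframerate()
    frames = wav.getnframes()

print(channels, width, rate, frames)  # 2 4 1000 100
```

If the header and the data length disagree, the frame count reported here will not match the number of samples requested, which makes this a quick sanity check.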

And that's it, done. To use the script, pass the required command-line arguments:
python3 WAV.py [num of samples] [output]
num of samples: the number of samples
output: path to the output file

Here is a link to a test audio file with the noise. To save space I lowered the BPS to 1 byte per sample and reduced the number of channels to 1 (with a 32-bit uncompressed stereo stream at 64 kB/s the result was 80 MB of pure .wav, whereas this way it is only 10 MB): https://instaud.io/3Dcy
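The eightfold difference in size follows directly from the byteRate formula. The sketch below assumes a sample rate of 8000 Hz; that figure is a guess, chosen because 8000 * 2 * 4 gives the 64 kB/s quoted above, and it is not stated in the original.

```python
def byte_rate(sample_rate, num_channels, bits_per_sample):
    """Bytes per second of uncompressed PCM audio."""
    return sample_rate * num_channels * bits_per_sample // 8

SAMPLE_RATE = 8000  # assumed value, not given in the article

stereo_32 = byte_rate(SAMPLE_RATE, 2, 32)  # 32-bit stereo
mono_8 = byte_rate(SAMPLE_RATE, 1, 8)      # 8-bit mono

print(stereo_32, mono_8, stereo_32 // mono_8)  # 64000 8000 8
```

Halving the channel count and going from 4 bytes to 1 byte per sample shrinks the stream by a factor of 8, which matches the drop from 80 to 10.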

Full code (WAV.py) (the code repeats many of the values in several places; this is only a sketch):

from struct import pack  # converts Python objects into basic C types
from os import urandom  # returns random bytes from the OS (available on Windows too)
# pure-Python alternative, if you prefer not to use os.urandom:
# from random import randint
# urandom = lambda sz: bytes([randint(0, 255) for _ in range(sz)])
from sys import argv, exit  # command-line arguments and exit

if len(argv) != 3:  # 2 arguments + 1 for the script name
    print('Usage: python3 WAV.py [num of samples] [output]')
    exit(1)

numSamples = int(argv[1])
output_path = argv[2]

chunkId = b'RIFF'
Format = b'WAVE'
subchunk1ID = b'fmt '
subchunk1Size = b'\x10\x00\x00\x00'  # 0d16
audioFormat = b'\x01\x00'
numChannels = b'\x02\x00'  # 2 channels will be enough (stereo)
sampleRate = pack('<L', 1000)  # 1000 is enough, but more is possible
bitsPerSample = b'\x20\x00'  # 0d32
byteRate = pack('<L', 1000 * 2 * 4)  # sampleRate * numChannels * bitsPerSample / 8  (32 bit sound)
blockAlign = b'\x08\x00'  # numChannels * BPS / 8
subchunk2ID = b'data'
subchunk2Size = pack('<L', numSamples * 2 * 4)  # * numChannels * BPS / 8
chunkSize = pack('<L', 36 + numSamples * 2 * 4)  # 36 + subchunk2Size

data = urandom(numSamples * 2 * 4)  # the noise itself; its length must match subchunk2Size

with open(output_path, 'wb') as fh:
    fh.write(chunkId + chunkSize + Format + subchunk1ID +
             subchunk1Size + audioFormat + numChannels +
             sampleRate + byteRate + blockAlign + bitsPerSample +
             subchunk2ID + subchunk2Size + data)  # write the result to the file

Summary

So you have learned a little more about digital audio and how it is stored. In this post we did not use compression (audioFormat), but covering each of the popular methods would take about 10 articles.
Thank you!

Sources

WAV file structure
WAV - Wikipedia

Source: www.hab.com
