Audio format (1) PCM and WAV

0. Preface

        To understand audio, you first need to understand how it is structured: how sound becomes a file, and how a file becomes sound again. File formats have evolved with changing needs and advancing technology, and each format has its own file structure. This series covers the common audio file formats, starting with the two most basic ones: PCM and WAV.

1. PCM file

        PCM (pulse-code modulation) is raw audio data: the digital signal obtained by sampling the electrical signal from a capture device such as a microphone. It is not limited to audio recording; it can be used in any scenario where an electrical signal is converted to a digital one. An audio file that consists only of this raw data is called a PCM file and usually has the extension .pcm. The size of a PCM file depends on the following parameters:

1.1. Format and parameters    

        Sampling rate: the number of times per second the electrical signal is sampled. Common audio sampling rates are 8000 Hz, 16000 Hz, 44100 Hz, 48000 Hz, 96000 Hz, etc.

        Sampling bit depth: the number of bits used to store each sample. Bit depth determines the amplitude resolution of a sample, not the frequency resolution (the range of frequencies that can be captured is set by the sampling rate). An 8-bit sample can distinguish only 256 amplitude levels, so the quantization error is large and the audio is reproduced much less faithfully, but the data is correspondingly smaller, which makes it easier to transmit. Early telephony used a relatively low sampling rate and bit depth for this reason, in exchange for more stable calls. A 16-bit sample distinguishes 65536 levels and reproduces the signal far more accurately. Bit depth is also not as simple as 8/16/32 bits: on a computer, 16-bit samples can be stored as a short (16-bit int), and 32-bit samples can be stored either as a 32-bit int or as a 32-bit float. Samples can also be signed or unsigned. All of this has to be taken into account when encoding and decoding.
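
        To make the point about signed/unsigned and int/float representations concrete, here is a minimal Python sketch. Python and the helper name interpret_bytes are choices made for this illustration only; they are not part of any audio API. It reinterprets the same raw bytes under different sample formats with the standard struct module:

```python
import struct

def interpret_bytes(raw: bytes, fmt: str):
    """Decode raw little-endian bytes with a struct format code (illustrative helper)."""
    return struct.unpack("<" + fmt, raw)[0]

# The same two bytes read as a signed or an unsigned 16-bit sample.
two_bytes = bytes([0x00, 0x80])
print(interpret_bytes(two_bytes, "h"))   # signed 16-bit:   -32768
print(interpret_bytes(two_bytes, "H"))   # unsigned 16-bit:  32768

# The same four bytes read as a 32-bit float or a 32-bit int.
four_bytes = struct.pack("<f", 1.0)
print(interpret_bytes(four_bytes, "f"))  # 32-bit float: 1.0
print(interpret_bytes(four_bytes, "i"))  # 32-bit int:   1065353216

# Number of amplitude levels available at each bit depth.
for bits in (8, 16):
    print(bits, 2 ** bits)               # 8 -> 256, 16 -> 65536
```

        The bytes never change; only the declared format does, which is why a decoder has to be told the exact sample format.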

        Channels: mono (one channel) and stereo (two channels) are the most common. Two channels let us distinguish between the sound for the left ear and the right ear; with a single channel, both ears hear the same sound. Two channels are usually used to get a stereo effect, which is why two-channel audio is also called stereo. There are also more demanding channel layouts such as 2.1, 5.1, 6.1 and 7.1, which place corresponding requirements on the recording equipment.

        Data storage layout: whether samples are stored interleaved (L R L R ...) or channel by channel (L L ... R R ..., also called planar). Interleaving only applies to multi-channel audio; single-channel audio has nothing to interleave. The number of channels and the storage layout together determine how the data is arranged.
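
        The two storage layouts can be sketched in a few lines of Python. The function names interleave and deinterleave are hypothetical helpers for this example, and the samples are plain integers rather than real recorded data:

```python
def deinterleave(samples, channels):
    """Split an interleaved sample list (L R L R ...) into one list per channel."""
    return [samples[c::channels] for c in range(channels)]

def interleave(planes):
    """Merge per-channel lists back into a single interleaved list."""
    return [sample for frame in zip(*planes) for sample in frame]

# Three stereo frames: left samples are positive, right samples negative.
interleaved = [1, -1, 2, -2, 3, -3]
planes = deinterleave(interleaved, channels=2)
print(planes)              # [[1, 2, 3], [-1, -2, -3]]
print(interleave(planes))  # [1, -1, 2, -2, 3, -3]
```

        With a single channel both functions return the data unchanged in order, which matches the remark that mono audio has nothing to interleave.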

        From the above, the disk space needed for 1 s of PCM data is:

        sampling rate * bit depth * number of channels (in bits)

        This result is the number of bits per second, that is, the bit rate.

        For example, a PCM file with a duration of 10 s, a sampling rate of 44100 Hz, a bit depth of 16 bits and two channels requires 44100*16*2*10 = 14112000 bits, which is 1764000 bytes.
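
        The same arithmetic as a small Python sketch (the function names pcm_bit_rate and pcm_size_bytes are made up for this example):

```python
def pcm_bit_rate(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Bits produced per second of uncompressed PCM audio."""
    return sample_rate * bit_depth * channels

def pcm_size_bytes(sample_rate: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size in bytes of a PCM recording of the given length."""
    return pcm_bit_rate(sample_rate, bit_depth, channels) * seconds // 8

print(pcm_bit_rate(44100, 16, 2))        # 1411200 bits per second
print(pcm_size_bytes(44100, 16, 2, 10))  # 1764000 bytes
```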

        Although PCM is the raw source data of audio, most players cannot play it directly: the file contains only sample data, so a player has no way of knowing how to parse it. Even a dedicated PCM player has to be told the sampling rate, number of channels, bit depth and byte order before it can play the data correctly (that is, convert the digital signal back into an electrical signal).
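
        As a rough illustration of why those parameters are required, the following Python sketch decodes the same raw bytes under different assumptions. decode_pcm is a hypothetical helper written for this example, and the input is assumed to be interleaved integer PCM:

```python
import struct

def decode_pcm(raw: bytes, channels: int, bit_depth: int,
               byteorder: str = "little", signed: bool = True):
    """Split raw interleaved PCM bytes into frames (one tuple of samples per frame)."""
    width = bit_depth // 8            # bytes per sample
    frame_size = width * channels     # bytes per frame
    frames = []
    for start in range(0, len(raw) - frame_size + 1, frame_size):
        frame = tuple(
            int.from_bytes(raw[start + c * width:start + (c + 1) * width],
                           byteorder, signed=signed)
            for c in range(channels)
        )
        frames.append(frame)
    return frames

# Two stereo frames of signed 16-bit little-endian samples.
raw = struct.pack("<4h", 1000, -1000, 2000, -2000)

print(decode_pcm(raw, channels=2, bit_depth=16))  # [(1000, -1000), (2000, -2000)]
print(decode_pcm(raw, channels=1, bit_depth=16))  # same bytes, read as four mono frames
print(decode_pcm(raw, channels=2, bit_depth=16, byteorder="big"))  # wrong byte order, wrong values
```

        Nothing in the bytes themselves says which interpretation is right, which is exactly the gap a file header fills.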

        This is why common audio formats such as WAV, MP3 and AAC exist. Of course, these formats were not created only to record basic audio parameters. MP3 and AAC are mainly about compression: keeping the quality as high as possible while making the file smaller, so that it is easier to transmit and store. Before getting to those, let's look at WAV, the format closest to PCM.

1.2. Byte order

        When recording PCM, there is one more thing to watch that has nothing to do with audio quality. It is simple but very important: byte order (endianness). Byte order is not specific to audio; it matters for any computer-related data exchange, storage or communication protocol. There are two kinds: big endian and little endian. Big endian stores the most significant byte first, which is the order people find more readable; little endian stores the least significant byte first, which is the order many processors (x86, for example) use natively. For example, the 32-bit value 0x01234567 is stored as the bytes 01 23 45 67 in big endian and as 67 45 23 01 in little endian. In other words, the byte positions are reversed.
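
        The 0x01234567 example can be checked directly in Python with int.to_bytes and int.from_bytes. This is only a quick demonstration; the variable names are arbitrary:

```python
import sys

value = 0x01234567

big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first

print(big.hex())     # 01234567
print(little.hex())  # 67452301

# Reading the bytes back with the matching order recovers the value.
print(int.from_bytes(little, "little") == value)  # True

# Reading them with the wrong order silently yields a different number.
print(hex(int.from_bytes(little, "big")))         # 0x67452301

# Native byte order of the machine running this script.
print(sys.byteorder)
```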

2. WAV

2.1. Format and parameters

        WAV essentially solves one problem: raw PCM data carries no description of its own format. The solution is simple and blunt: the format information is prepended to the PCM data as a header, which in its simplest (canonical PCM) form occupies a fixed 44 bytes. A WAV file is therefore not smaller than the PCM data it contains; it is 44 bytes larger. Let's see what goes into those 44 bytes, taking as an example the conversion to WAV of a PCM file with a sampling rate of 44100 Hz, a bit depth of 16 bits, two channels and a duration of 10 s. First calculate the PCM data size: 44100*16*2*10 = 14112000 bits = 1764000 bytes. In a real application we can of course simply read the file length instead.

| Block | Position (byte) | Name | Size (bytes) | Byte order | Description | Content |
|---|---|---|---|---|---|---|
| RIFF block | 0-3 | ID | 4 | big endian | Fixed content 'RIFF' (ASCII codes) | 0x52494646 |
| | 4-7 | Size | 4 | little endian | Length of the entire file (including the 44-byte header) minus the ID and Size fields, that is, file length - 8 | 1764000+44-8 = 1764036. Big endian: 0x001AEAC4. Little endian: 0xC4EA1A00 |
| | 8-11 | Type | 4 | big endian | Fixed content 'WAVE' | 0x57415645 |
| Format block | 12-15 | ID | 4 | big endian | Fixed content 'fmt '; note that the last byte is a space (0x20), not empty, so the ID is four full ASCII characters | 0x666D7420 |
| | 16-19 | Size | 4 | little endian | Length of this block (not of the whole file) minus the ID and Size fields | The block spans bytes 12-35, so 24 bytes; 24-8 = 16. Big endian: 0x00000010. Little endian: 0x10000000 |
| | 20-21 | AudioFormat | 2 | little endian | Audio format; 1 means PCM | Big endian: 0x0001. Little endian: 0x0100 |
| | 22-23 | NumChannels | 2 | little endian | Number of channels: 1 = mono, 2 = stereo | Big endian: 0x0002. Little endian: 0x0200 |
| | 24-27 | SampleRate | 4 | little endian | Sampling rate | 44100. Big endian: 0x0000AC44. Little endian: 0x44AC0000 |
| | 28-31 | ByteRate | 4 | little endian | Bytes per second, that is, bit rate / 8 | 1411200/8 = 176400. Big endian: 0x0002B110. Little endian: 0x10B10200 |
| | 32-33 | BlockAlign | 2 | little endian | Bytes per sample frame (one sample for every channel) = number of channels * bit depth / 8 | 2*16/8 = 4. Big endian: 0x0004. Little endian: 0x0400 |
| | 34-35 | BitsPerSample | 2 | little endian | Bit depth: 8/16/24/32 bits | 16. Big endian: 0x0010. Little endian: 0x1000 |
| Data block | 36-39 | DataId | 4 | big endian | Fixed content 'data' | 0x64617461 |
| | 40-43 | DataSize | 4 | little endian | Actual size of the PCM data in bytes | 1764000. Big endian: 0x001AEAA0. Little endian: 0xA0EA1A00 |
| | 44-(44+DataSize-1) | Data | DataSize | little endian | PCM data (the little endian here is the sample byte order we usually specify when recording) | The PCM data, appended directly after the header |

        Note that the header mixes the two byte orders listed in the table. The fields marked big endian are simply four ASCII characters written in reading order, while every numeric field is little endian. Each field has to be written in the order the table specifies; switching back and forth between little endian and big endian for the numeric fields will definitely not work. As for how to wrap PCM data into a WAV file, the table above already shows it clearly.
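
        The table translates almost line by line into code. The following is a minimal Python sketch, assuming the canonical 44-byte header with plain integer PCM (AudioFormat 1) and no extra chunks; build_wav_header and pcm_to_wav are names invented for this example:

```python
import struct

def build_wav_header(data_size: int, sample_rate: int = 44100,
                     bit_depth: int = 16, channels: int = 2) -> bytes:
    """Build the canonical 44-byte WAV header for uncompressed PCM data."""
    block_align = channels * bit_depth // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align    # bytes per second
    # "<" = little endian without padding; 4s = 4 raw bytes, I = uint32, H = uint16.
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",    # RIFF block: file length - 8
        b"fmt ", 16, 1, channels,            # format block, AudioFormat 1 = PCM
        sample_rate, byte_rate, block_align, bit_depth,
        b"data", data_size,                  # data block
    )

def pcm_to_wav(pcm: bytes, **params) -> bytes:
    """Prepend the header to raw PCM bytes."""
    return build_wav_header(len(pcm), **params) + pcm

header = build_wav_header(1764000)
print(len(header))           # 44
print(header[4:8].hex())     # c4ea1a00  (Size = 1764036)
print(header[28:32].hex())   # 10b10200  (ByteRate = 176400)
print(header[40:44].hex())   # a0ea1a00  (DataSize = 1764000)
```

        Writing the result of pcm_to_wav to a file with a .wav extension should give something ordinary players accept, provided the parameters passed in match how the PCM data was actually recorded.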

2.2. Summary

        Advantages: the data is lossless and the original samples are preserved; format conversion is simple and can be done in a few dozen lines of code; playback needs no transcoding, because the data is already PCM.

        Disadvantages: files are large, and the file size is limited. The field that stores the size is only 4 bytes, so it can describe at most 2^32 - 1 bytes, which is just under 4 GB.

        Application scenarios: besides general audio, WAV can also be used to store broadcast audio and other electrical signals that have been converted to digital signals.
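
        To get a feel for the 4-byte size limit mentioned under disadvantages, here is a rough Python estimate of the longest recording a single WAV file can hold. It assumes the canonical 44-byte header, so the RIFF Size field is 36 + DataSize; max_wav_seconds is a name made up for this sketch:

```python
MAX_RIFF_SIZE = 2 ** 32 - 1  # largest value a 4-byte unsigned field can hold

def max_wav_seconds(sample_rate: int, bit_depth: int, channels: int) -> float:
    """Longest duration whose size still fits the 32-bit RIFF Size field."""
    byte_rate = sample_rate * bit_depth * channels // 8
    max_data_size = MAX_RIFF_SIZE - 36  # Size field = 36 + DataSize
    return max_data_size / byte_rate

seconds = max_wav_seconds(44100, 16, 2)
print(round(seconds / 3600, 2))  # about 6.76 hours at 44100 Hz, 16-bit, stereo
```

        Higher sampling rates, bit depths or channel counts reach the limit proportionally sooner.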

3. Closing remarks

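        This post covers the simplest audio formats, PCM and WAV. Posts on the MP3 and AAC formats will follow; if you are interested, follow me and wait for the updates. I also hope you will take a look at my other blog posts and offer suggestions. Discussion, criticism and corrections are all welcome.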

Origin blog.csdn.net/qq_37841321/article/details/124533629