1. What is PCM?
PCM (Pulse Code Modulation) audio data is a raw stream of uncompressed audio samples. It is the standard digital representation of an analog signal, produced by sampling, quantization, and encoding.
Six parameters describe PCM data:
- Sample Rate: the sampling frequency, i.e. the number of samples per second per channel. Common values are 8 kHz (telephone), 44.1 kHz (CD), and 48 kHz (DVD).
- Sample Size: the number of bits used to quantize each sample, usually 16.
- Number of Channels: common audio is either mono or stereo (left and right channels). Less common layouts such as surround sound also exist.
- Sign: whether samples are signed. For 1-byte samples, for example, the range is -128 to 127 if signed and 0 to 255 if unsigned.
- Byte Ordering: whether multi-byte samples are stored little-endian or big-endian. Little-endian is the most common. See Section 5 for an explanation of byte order.
- Integer Or Floating Point: most PCM formats store samples as integers; floating-point samples are used in applications that need higher precision.
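Taken together, these parameters fix how many bytes raw PCM data occupies. The sketch below uses hypothetical helper names and assumes a sample size that is a whole number of bytes; it computes the size of one sample frame (one sample per channel) and the resulting data rate. For CD audio, 44100 Hz × 2 bytes × 2 channels gives 176400 bytes per second.

```c
#include <stdint.h>

/* Hypothetical helper: bytes in one sample frame (one sample for every
   channel), assuming bits_per_sample is a multiple of 8. */
static uint32_t pcm_frame_bytes(uint32_t bits_per_sample, uint32_t channels) {
    return (bits_per_sample / 8) * channels;
}

/* Hypothetical helper: data rate of raw PCM in bytes per second. */
static uint64_t pcm_bytes_per_second(uint32_t sample_rate,
                                     uint32_t bits_per_sample,
                                     uint32_t channels) {
    return (uint64_t)sample_rate * pcm_frame_bytes(bits_per_sample, channels);
}

/* Hypothetical helper: duration in milliseconds of `size` bytes of raw PCM
   (returns 0 if the parameters give a zero data rate). */
static uint64_t pcm_duration_ms(uint64_t size, uint32_t sample_rate,
                                uint32_t bits_per_sample, uint32_t channels) {
    uint64_t rate = pcm_bytes_per_second(sample_rate, bits_per_sample, channels);
    return rate ? size * 1000 / rate : 0;
}
```

Since a raw PCM file carries no header, these numbers are only correct if you know the parameters in advance, which is why players such as ffplay must be told them explicitly.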
Recommended tools for playing PCM data:
- ffplay, for example:
// Play PCM data in f32le format, mono, with a sample rate of 48000 Hz
ffplay -f f32le -ac 1 -ar 48000 pcm_audio
- Audacity: free, open-source, cross-platform audio editing software.
- Adobe Audition: when importing raw data, you must specify the sample rate, sample format, and byte order.
2. PCM data format
In a mono file, samples are stored one after another in chronological order (occasionally mono data is stored in LRLRLR layout with the second channel all zeros). In a two-channel file, the samples of the two channels are interleaved as LRLRLR. The byte order within each sample depends on the endianness. The big-endian layout is shown in the figure below:
[Figure: PCM.png]
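As a minimal sketch of this layout (the helper names are hypothetical), the byte offset of a sample in interleaved data follows from the channel count and sample size, and a 16-bit sample is decoded by combining its two bytes in the order the endianness dictates:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of sample `n` (0-based, counted per
   channel) of channel `ch` in interleaved (LRLR...) data. */
static size_t interleaved_offset(size_t n, size_t ch, size_t channels,
                                 size_t bytes_per_sample) {
    return (n * channels + ch) * bytes_per_sample;
}

/* Decode a signed 16-bit little-endian sample: low byte at the lower address.
   The final cast assumes two's complement, as on all mainstream compilers. */
static int16_t read_s16le(const uint8_t *p) {
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a signed 16-bit big-endian sample: high byte at the lower address. */
static int16_t read_s16be(const uint8_t *p) {
    return (int16_t)(uint16_t)((p[0] << 8) | p[1]);
}
```

For s16le stereo, sample n of the right channel therefore starts at byte 4n + 2.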
3. PCM data format supported by FFmpeg
Running the command ffmpeg -formats lists the audio and video formats that FFmpeg supports. The PCM formats among them are shown below (D means demuxing is supported, E means muxing is supported):
DE alaw PCM A-law
DE f32be PCM 32-bit floating-point big-endian
DE f32le PCM 32-bit floating-point little-endian
DE f64be PCM 64-bit floating-point big-endian
DE f64le PCM 64-bit floating-point little-endian
DE mulaw PCM mu-law
DE s16be PCM signed 16-bit big-endian
DE s16le PCM signed 16-bit little-endian
DE s24be PCM signed 24-bit big-endian
DE s24le PCM signed 24-bit little-endian
DE s32be PCM signed 32-bit big-endian
DE s32le PCM signed 32-bit little-endian
DE s8 PCM signed 8-bit
DE u16be PCM unsigned 16-bit big-endian
DE u16le PCM unsigned 16-bit little-endian
DE u24be PCM unsigned 24-bit big-endian
DE u24le PCM unsigned 24-bit little-endian
DE u32be PCM unsigned 32-bit big-endian
DE u32le PCM unsigned 32-bit little-endian
DE u8 PCM unsigned 8-bit
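In these names, s means signed integer, u unsigned integer, and f floating point, followed by the number of bits per sample.
be means big-endian and le means little-endian. alaw and mulaw are the 8-bit companded A-law and mu-law formats of G.711.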
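Because s16le and s16be differ only in byte order, converting one to the other amounts to swapping the two bytes of every sample. A minimal in-place sketch, with a hypothetical function name; the same swap converts in either direction:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert s16le <-> s16be in place by swapping the two
   bytes of every 16-bit sample. `size` is in bytes; a trailing odd byte is
   left unchanged. */
static void pcm_swap16(uint8_t *buf, size_t size) {
    for (size_t i = 0; i + 1 < size; i += 2) {
        uint8_t tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

Wider formats follow the same idea: s32le/s32be reverse groups of 4 bytes, and s24le/s24be reverse groups of 3.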
4. The difference between Packed and Planar PCM data in FFmpeg
FFmpeg stores audio data in one of two layouts: packed or planar. For two-channel audio, the packed layout interleaves the samples of the two channels, while the planar layout stores each channel in a separate buffer. Writing one sample of each channel as L and R, the layouts are:
- Packed: L R L R L R L R
- Planar: L L L L R R R R
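The two layouts hold the same samples in a different order. The sketch below assumes 16-bit samples and uses hypothetical function names to convert between them; in real FFmpeg code this conversion is usually done with libswresample (swr_convert) rather than by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave planar 16-bit audio (planes[c] holds
   nb_samples samples of channel c) into packed LRLR... order.
   `out` must have room for nb_samples * channels samples. */
static void s16_planar_to_packed(const int16_t *const *planes, int channels,
                                 int nb_samples, int16_t *out) {
    for (int n = 0; n < nb_samples; n++)
        for (int c = 0; c < channels; c++)
            out[n * channels + c] = planes[c][n];
}

/* Hypothetical helper: the reverse, splitting packed data into planes. */
static void s16_packed_to_planar(const int16_t *in, int channels,
                                 int nb_samples, int16_t *const *planes) {
    for (int n = 0; n < nb_samples; n++)
        for (int c = 0; c < channels; c++)
            planes[c][n] = in[n * channels + c];
}
```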
After decoding, FFmpeg stores the audio data in an AVFrame structure:
- Packed format: frame.data[0] (or frame.extended_data[0]) holds the audio of all channels.
- Planar format: frame.data[i] or frame.extended_data[i] holds the data of channel i (counting from 0). The AVFrame.data array has a fixed size of 8 (AV_NUM_DATA_POINTERS), so when there are more than 8 channels, the channel data must be read from frame.extended_data.
FFmpeg uses the following sample formats to store audio internally. The names of all planar formats end in P.
enum AVSampleFormat {
AV_SAMPLE_FMT_NONE = -1,
AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar
AV_SAMPLE_FMT_DBLP, ///< double, planar
AV_SAMPLE_FMT_S64, ///< signed 64 bits
AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar
AV_SAMPLE_FMT_NB ///< Number of sample formats. DO NOT USE if linking dynamically
};
Notes:
- Planar layout is widely used inside FFmpeg (many decoders output planar data), whereas raw PCM files such as those described above are normally stored packed (interleaved).
- Different decoders output different sample formats. In the author's tests, AAC decoding produced floating-point AV_SAMPLE_FMT_FLTP data, and MP3 decoding produced AV_SAMPLE_FMT_S16P data (the MP3 file used was 16-bit). The actual sample format can be read from the format member of the decoded AVFrame or from the sample_fmt member of the decoder's AVCodecContext.
- Whether data is planar or packed directly affects how it must be written when saving to a file, so always check the sample format before processing the data.
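As a sketch of the last point, the function below appends a decoded frame to a raw PCM file in packed order, whether the frame is planar or packed. To stay self-contained it uses a hypothetical struct frame_view in place of AVFrame; with real FFmpeg the same values would come from frame->extended_data, frame->nb_samples, the frame's channel count, av_sample_fmt_is_planar() and av_get_bytes_per_sample().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the AVFrame fields used here. */
struct frame_view {
    uint8_t **extended_data;  /* one plane per channel (planar) or one buffer (packed) */
    int nb_samples;           /* samples per channel */
    int channels;
    int planar;               /* nonzero for a planar sample format */
    int bytes_per_sample;
};

/* Append the frame to `fp` as packed (interleaved) PCM.
   Returns 0 on success, -1 on a short write. */
static int write_frame_packed(FILE *fp, const struct frame_view *f) {
    size_t bps = (size_t)f->bytes_per_sample;
    if (!f->planar) {
        /* Packed: all channels are already interleaved in plane 0. */
        size_t size = (size_t)f->nb_samples * (size_t)f->channels * bps;
        return fwrite(f->extended_data[0], 1, size, fp) == size ? 0 : -1;
    }
    /* Planar: write sample n of each channel in turn (L R L R ...). */
    for (int n = 0; n < f->nb_samples; n++) {
        for (int c = 0; c < f->channels; c++) {
            const uint8_t *p = f->extended_data[c] + (size_t)n * bps;
            if (fwrite(p, 1, bps, fp) != bps)
                return -1;
        }
    }
    return 0;
}
```

Writing planar data with a single fwrite of extended_data[0] would save only the first channel, which is the mistake the note above warns about.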
5. Endianness
Discussions of endianness usually involve two major CPU families: Motorola's PowerPC series and Intel's x86 series. PowerPC traditionally stores data big-endian, while x86 stores data little-endian. So what do big-endian and little-endian mean?
Big-endian means the most significant byte (MSB) is stored at the lowest address; little-endian means the least significant byte (LSB) is stored at the lowest address.
For example, the number 0x12345678 is stored as follows on CPUs with each byte order:
Big Endian
Low address ------------------------------------------> High address
| 12 | 34 | 56 | 78 |
Little Endian
Low address ------------------------------------------> High address
| 78 | 56 | 34 | 12 |
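A program can check its own byte order by storing 0x12345678 and inspecting the byte at the lowest address, exactly as in the diagram. A minimal sketch with a hypothetical function name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, where the
   lowest-addressed byte of 0x12345678 is 0x78, and 0 otherwise. */
static int host_is_little_endian(void) {
    uint32_t value = 0x12345678;
    unsigned char bytes[4];
    memcpy(bytes, &value, sizeof bytes);  /* copy the value's bytes in memory order */
    return bytes[0] == 0x78;
}
```

Copying through memcpy rather than casting the pointer keeps the code well-defined. GCC and Clang also expose the host order at compile time through the __BYTE_ORDER__ macro.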
The Internet protocols (TCP/IP) transmit multi-byte values in big-endian order, which is why big-endian is also called network byte order. When two hosts with different byte orders communicate, each converts its data to network byte order before sending it and back to host order after receiving it.
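Instead of branching on the host's byte order, portable code can serialize values with shifts, which produce network byte order on any CPU. A minimal sketch with hypothetical names; the POSIX functions htonl() and ntohl() from <arpa/inet.h> perform the same conversion for 32-bit values:

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value in network (big-endian) byte
   order, most significant byte at the lowest address, on any host. */
static void put_be32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical helper: read a big-endian 32-bit value on any host. */
static uint32_t get_be32(const uint8_t in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```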
6. PCM audio data processing
6.1 Splitting the left and right channels of two-channel PCM data
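Because two-channel PCM data is stored interleaved (LRLRLR...), the left and right channels can be separated by reading one sample frame at a time and writing its left and right samples to separate files. The function below does this for 16-bit little-endian (s16le) stereo data, where each frame is 4 bytes: 2 bytes of L followed by 2 bytes of R.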
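#include <stdio.h>

/* Split interleaved s16le stereo PCM into two mono s16le files. */
int pcm_s16le_split(const char* file, const char* out_lfile, const char* out_rfile) {
    FILE *fp = fopen(file, "rb");
    if (fp == NULL) {
        fprintf(stderr, "open %s failed\n", file);
        return -1;
    }
    FILE *fp1 = fopen(out_lfile, "wb");
    if (fp1 == NULL) {
        fprintf(stderr, "open %s failed\n", out_lfile);
        fclose(fp);
        return -1;
    }
    FILE *fp2 = fopen(out_rfile, "wb");
    if (fp2 == NULL) {
        fprintf(stderr, "open %s failed\n", out_rfile);
        fclose(fp);
        fclose(fp1);
        return -1;
    }
    unsigned char sample[4];  /* one stereo frame: 2 bytes L + 2 bytes R */
    /* Test fread's return value instead of feof(): feof() only becomes true
       after a read has already failed, so the old loop repeated the last frame. */
    while (fread(sample, 1, 4, fp) == 4) {
        fwrite(sample, 1, 2, fp1);      /* L */
        fwrite(sample + 2, 1, 2, fp2);  /* R */
    }
    fclose(fp);
    fclose(fp1);
    fclose(fp2);
    return 0;
}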
Author: smallest_one
Link: https://www.jianshu.com/p/fd43c1c82945