PCM audio data

Table of Contents

1. What is PCM?

2. PCM data format

3. PCM data format supported by FFmpeg

4. The difference between Packed and Planar PCM data in FFmpeg

5. Endianness

6. PCM audio data processing

7. Reference


1. What is PCM?

PCM (Pulse Code Modulation) audio data is a raw stream of uncompressed audio samples. It is the standard digital representation of an analog signal, produced by sampling, quantization, and encoding.

Six parameters describe PCM data:

  1. Sample Rate: The sampling frequency, for example 8 kHz (telephone), 44.1 kHz (CD), or 48 kHz (DVD).
  2. Sample Size: The number of quantization bits per sample, usually 16.
  3. Number of Channels: Common audio is either mono or stereo; stereo has a left and a right channel. Less common layouts such as surround sound also exist.
  4. Sign: Whether the sample data is signed. For example, a one-byte sample ranges from -128 to 127 if it is signed and from 0 to 255 if it is unsigned.
  5. Byte Ordering: Whether multi-byte samples are little-endian or big-endian. Little-endian is the usual choice. See Section 5 for a description of byte order.
  6. Integer Or Floating Point: Most formats represent PCM samples as integers; some applications that require high precision use floating point.
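The first three parameters determine how much data one second of audio occupies. The following is a minimal sketch of that arithmetic; the function names are made up for illustration and do not belong to any library, and it assumes sample sizes that are a whole number of bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by one second of raw PCM audio.
 * Hypothetical helper: the name and signature are illustrative only. */
uint64_t pcm_bytes_per_second(uint32_t sample_rate,
                              uint32_t bits_per_sample,
                              uint32_t channels)
{
    assert(bits_per_sample % 8 == 0);   /* whole-byte sample sizes only */
    return (uint64_t)sample_rate * (bits_per_sample / 8) * channels;
}

/* Playback duration, in seconds, of size_bytes bytes of raw PCM data. */
double pcm_duration_seconds(uint64_t size_bytes,
                            uint32_t sample_rate,
                            uint32_t bits_per_sample,
                            uint32_t channels)
{
    uint64_t rate = pcm_bytes_per_second(sample_rate, bits_per_sample, channels);
    return rate ? (double)size_bytes / (double)rate : 0.0;
}
```

For CD-quality audio (44.1 kHz, 16-bit, stereo) this gives 44100 * 2 * 2 = 176400 bytes per second, so a 1764000-byte file lasts 10 seconds.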

Recommended PCM data playback tools:

  • ffplay. A usage example:

# Play PCM data in f32le format, mono, with a 48000 Hz sample rate
ffplay -f f32le -ac 1 -ar 48000 pcm_audio

  • Audacity: a free, open-source, cross-platform audio editor.
  • Adobe Audition: when importing raw data, you need to select the sample rate, format, and endianness.
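To try these tools without an existing recording, a raw file can be generated by hand. The sketch below writes a mono 32-bit float square wave; the function names are hypothetical, and it assumes a little-endian host (x86 and most ARM systems), because the floats are written in the host's native byte order, which is only f32le on such machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Fill buf with n samples of a square wave (illustrative helper).
 * The period is rounded down to a whole number of samples, so the
 * resulting pitch is only approximately freq_hz. */
void fill_square_f32(float *buf, size_t n, unsigned sample_rate,
                     unsigned freq_hz, float amplitude)
{
    assert(sample_rate > 0 && freq_hz > 0 && freq_hz * 2 <= sample_rate);
    unsigned period = sample_rate / freq_hz;   /* samples per cycle */
    for (size_t i = 0; i < n; i++) {
        buf[i] = (i % period) < period / 2 ? amplitude : -amplitude;
    }
}

/* Write the samples as raw bytes, with no header of any kind.
 * Returns 0 on success and -1 on failure. */
int write_f32_file(const char *path, const float *buf, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        return -1;
    }
    size_t written = fwrite(buf, sizeof(float), n, fp);
    if (fclose(fp) != 0 || written != n) {
        return -1;
    }
    return 0;
}
```

Filling a 48000-element buffer with fill_square_f32(buf, 48000, 48000, 440, 0.25f) and saving it as pcm_audio yields one second of audio that the ffplay command above should be able to play.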

2. PCM data format

In a mono audio file, the samples are stored in chronological order (sometimes they are also stored in LRLRLR order, with the data of the other channel set to 0). In a two-channel audio file, the samples are stored interleaved as LRLRLR. The byte layout of each sample also depends on the endianness. The big-endian layout is shown in the figure below:

 

PCM.png
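The same layout can be expressed in code. The sketch below assumes 16-bit signed samples and uses made-up helper names: one function computes where a given channel's sample sits in an interleaved buffer, and two others assemble the sample from its bytes for each byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of one sample inside an interleaved (LRLRLR) buffer.
 * frame_index counts sampling instants; channel 0 is L, channel 1 is R. */
size_t pcm_sample_offset(size_t frame_index, unsigned channel,
                         unsigned channels, unsigned bytes_per_sample)
{
    assert(channel < channels);
    return (frame_index * channels + channel) * bytes_per_sample;
}

/* Little-endian: the low byte comes first in memory. */
int16_t read_s16le(const uint8_t *p)
{
    /* Converting 0x8000..0xFFFF to int16_t wraps to a negative value on
     * all common (two's complement) platforms. */
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Big-endian: the high byte comes first in memory. */
int16_t read_s16be(const uint8_t *p)
{
    return (int16_t)(uint16_t)((p[0] << 8) | p[1]);
}
```

Reading the bytes 0x01 0x02 gives 0x0201 (513) as little-endian and 0x0102 (258) as big-endian, which is why a file played with the wrong byte order sounds like noise.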

3. PCM data format supported by FFmpeg

Use the ffmpeg -formats command to list the audio and video formats supported by FFmpeg; the supported PCM formats are among them.

 

 DE alaw            PCM A-law
 DE f32be           PCM 32-bit floating-point big-endian
 DE f32le           PCM 32-bit floating-point little-endian
 DE f64be           PCM 64-bit floating-point big-endian
 DE f64le           PCM 64-bit floating-point little-endian
 DE mulaw           PCM mu-law
 DE s16be           PCM signed 16-bit big-endian
 DE s16le           PCM signed 16-bit little-endian
 DE s24be           PCM signed 24-bit big-endian
 DE s24le           PCM signed 24-bit little-endian
 DE s32be           PCM signed 32-bit big-endian
 DE s32le           PCM signed 32-bit little-endian
 DE s8              PCM signed 8-bit
 DE u16be           PCM unsigned 16-bit big-endian
 DE u16le           PCM unsigned 16-bit little-endian
 DE u24be           PCM unsigned 24-bit big-endian
 DE u24le           PCM unsigned 24-bit little-endian
 DE u32be           PCM unsigned 32-bit big-endian
 DE u32le           PCM unsigned 32-bit little-endian
 DE u8              PCM unsigned 8-bit

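In the format names, s means signed, u means unsigned, and f means floating point; be means big-endian and le means little-endian.
The D and E flags in the first column indicate that FFmpeg can demux (read) and mux (write) the format.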
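The naming scheme is regular enough to decode mechanically. The following sketch splits a name such as s16le into its three parts; the struct and function are invented for this illustration and are not FFmpeg APIs. The alaw and mulaw entries do not follow the pattern and are rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative description of a raw PCM format name. */
typedef struct {
    char kind;        /* 's' signed, 'u' unsigned, 'f' floating point */
    int  bits;        /* sample size in bits */
    int  big_endian;  /* 1 = be, 0 = le, -1 = not applicable (8-bit) */
} PcmFormatInfo;

/* Parse names of the form <s|u|f><bits>[be|le].
 * Returns 0 on success and -1 if the name does not match. */
int parse_pcm_format_name(const char *name, PcmFormatInfo *out)
{
    assert(name != NULL && out != NULL);
    if (name[0] != 's' && name[0] != 'u' && name[0] != 'f') {
        return -1;
    }
    if (!isdigit((unsigned char)name[1])) {
        return -1;
    }
    char *end;
    long bits = strtol(name + 1, &end, 10);
    if (bits <= 0 || bits > 64 || bits % 8 != 0) {
        return -1;
    }
    int big_endian;
    if (strcmp(end, "be") == 0) {
        big_endian = 1;
    } else if (strcmp(end, "le") == 0) {
        big_endian = 0;
    } else if (*end == '\0' && bits == 8) {
        big_endian = -1;   /* a single byte has no byte order */
    } else {
        return -1;
    }
    out->kind = name[0];
    out->bits = (int)bits;
    out->big_endian = big_endian;
    return 0;
}
```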

4. The difference between Packed and Planar PCM data in FFmpeg

FFmpeg stores audio and video data in one of two basic layouts, Packed and Planar. For two-channel audio, the Packed layout interleaves the samples of the two channels, while the Planar layout stores each channel separately. Taking L and R as one sample of the left and right channel, the data is laid out as follows:

  • Packed: L R L R L R L R
  • Planar: LLLLRRRR
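Converting between the two layouts is just a matter of index arithmetic. The sketch below works on 16-bit samples held in plain arrays; the function names are hypothetical, and the array of per-channel pointers merely imitates the idea of one buffer per channel rather than using any FFmpeg structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed -> Planar: L R L R ...  becomes  L L ... and R R ...
 * planes[c] must have room for `frames` samples of channel c. */
void s16_packed_to_planar(const int16_t *packed, int16_t *const *planes,
                          size_t frames, unsigned channels)
{
    assert(channels > 0);
    for (size_t i = 0; i < frames; i++) {
        for (unsigned c = 0; c < channels; c++) {
            planes[c][i] = packed[i * channels + c];
        }
    }
}

/* Planar -> Packed: the inverse operation.
 * packed must have room for frames * channels samples. */
void s16_planar_to_packed(int16_t *const *planes, int16_t *packed,
                          size_t frames, unsigned channels)
{
    assert(channels > 0);
    for (size_t i = 0; i < frames; i++) {
        for (unsigned c = 0; c < channels; c++) {
            packed[i * channels + c] = planes[c][i];
        }
    }
}
```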

After FFmpeg decodes audio, the data is stored in the AVFrame structure.

  • Packed format: frame.data[0] or frame.extended_data[0] contains all of the audio data.
  • Planar format: frame.data[i] or frame.extended_data[i] holds the data of the i-th channel (counting from channel 0). The AVFrame.data array has a fixed size of 8, so if there are more than 8 channels, the channel data must be read from frame.extended_data.

The following are the sample formats FFmpeg uses to store audio internally. All Planar formats end with the letter P.

 

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

Notes:

  • Planar mode is an internal storage layout of FFmpeg; the raw audio files we actually use are all in Packed mode.
  • FFmpeg's decoders do not all output the same sample format. In testing, AAC decoding produced floating-point AV_SAMPLE_FMT_FLTP data, and MP3 decoding produced AV_SAMPLE_FMT_S16P data (the MP3 file used had a 16-bit depth). The actual sample format can be read from the format member of the decoded AVFrame or from the sample_fmt member of the decoder's AVCodecContext.
  • Whether the data is Planar or Packed directly affects how it must be written when saving to a file, so check the sample format before operating on the data.
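Once the sample format is known, float output often has to be turned into 16-bit integers before it is saved as s16 data. The sketch below is one plain-C way to do that, assuming float samples with a nominal range of -1.0 to 1.0; the function names are invented, and the scaling and rounding choices are assumptions that may differ slightly from what FFmpeg's own resampling library produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one float sample (nominally -1.0 .. 1.0) to signed 16-bit.
 * Out-of-range input is clipped instead of being allowed to wrap. */
int16_t float_to_s16(float x)
{
    float v = x * 32768.0f;
    if (v != v) {
        return 0;                      /* NaN: emit silence */
    }
    if (v >= 32767.0f) {
        return INT16_MAX;
    }
    if (v <= -32768.0f) {
        return INT16_MIN;
    }
    /* Round to nearest, halves away from zero. */
    return (int16_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Convert n samples; works for one plane or for a packed buffer. */
void float_buffer_to_s16(const float *in, int16_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = float_to_s16(in[i]);
    }
}
```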

5. Endianness

Any discussion of endianness involves two major CPU camps: Motorola's PowerPC series and Intel's x86 series. PowerPC CPUs traditionally store data in big-endian order, while x86 CPUs store data in little-endian order. So what are big-endian and little-endian?

Big-endian means that the most significant byte (MSB) is stored at the lowest address; little-endian means that the least significant byte (LSB) is stored at the lowest address.

The following uses images to illustrate. For example, the storage order of the number 0x12345678 in two different endian CPUs is as follows:

Big Endian

Low address                                                      High address
----------------------------------------------------------------------------->
| 12 | 34 | 56 | 78 |

Little Endian

Low address                                                      High address
----------------------------------------------------------------------------->
| 78 | 56 | 34 | 12 |

Network protocols such as TCP/IP transmit multi-byte values in big-endian order, so big-endian is also called network byte order. When two hosts with different byte orders communicate, each must convert its data to network byte order before transmitting it.
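The byte orders described in this section can be checked directly in code. The sketch below uses illustrative helper names: it inspects how the host stores 0x12345678, reverses the bytes of a 32-bit value, and writes a value in big-endian order regardless of the host, which is what conversion to network byte order amounts to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint32_t value = 0x12345678u;
    uint8_t bytes[4];
    memcpy(bytes, &value, sizeof value);   /* look at the raw memory */
    return bytes[0] == 0x78;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap_u32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Store v in big-endian order on any host: most significant byte first. */
void store_be32(uint8_t out[4], uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}
```

Because store_be32 builds the output with shifts instead of copying memory, it yields 12 34 56 78 on both kinds of CPU.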

6. PCM audio data processing

6.1 Separate the left and right channel data of two-channel PCM audio data

Because two-channel PCM audio data is stored as LRLRLR, the left and right channel data can be separated by de-interleaving the samples.

 

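#include <stdio.h>

// Split a two-channel s16le PCM file (L R L R ...) into one file per channel.
int pcm_s16le_split(const char* file, const char* out_lfile, const char* out_rfile) {
    FILE *fp = NULL, *fp1 = NULL, *fp2 = NULL;
    unsigned char sample[4];  // one frame: 2 bytes of L followed by 2 bytes of R
    int ret = -1;

    fp = fopen(file, "rb");   // read-only access is enough for the input
    if (fp == NULL) {
        printf("open %s failed\n", file);
        goto end;
    }
    fp1 = fopen(out_lfile, "wb");
    if (fp1 == NULL) {
        printf("open %s failed\n", out_lfile);
        goto end;
    }
    fp2 = fopen(out_rfile, "wb");
    if (fp2 == NULL) {
        printf("open %s failed\n", out_rfile);
        goto end;
    }
    // Loop on fread()'s return value rather than feof(): feof() only turns true
    // after a read has already failed, which would write the last frame twice.
    while (fread(sample, 1, 4, fp) == 4) {
        // L
        if (fwrite(sample, 1, 2, fp1) != 2) goto end;
        // R
        if (fwrite(sample + 2, 1, 2, fp2) != 2) goto end;
    }
    if (!ferror(fp)) {
        ret = 0;
    }
end:
    // Close whatever was opened, on success and on every error path.
    if (fp != NULL) fclose(fp);
    if (fp1 != NULL) fclose(fp1);
    if (fp2 != NULL) fclose(fp2);
    return ret;
}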

7. Reference

  1. PCM wiki in multimedia
  2. PCM volume control
  3. PCM audio sample data processing
  4. stackoverflow/What is the difference between AV_SAMPLE_FMT_S16P and AV_SAMPLE_FMT_S16?


Author: smallest_one
Link: https://www.jianshu.com/p/fd43c1c82945
 


Origin blog.csdn.net/boonya/article/details/108658295