PCM format audio

PCM (Pulse Code Modulation) encoding, that is, the format of digital audio data produced by the pulse-code-modulation method, is a lossless encoding format and a way of digitizing an analog audio signal. The analog signal is digitized through three steps: sampling, quantization, and encoding.

  • PCM can be described from 6 aspects (a small example follows this list):
    1. Sample rate: the number of samples taken per second;
    2. Sign: whether the sample data is signed. For example, for sample data stored in one byte, the signed range is -128 … 127 and the unsigned range is 0 … 255;
    3. Byte order: either big-endian or little-endian;
    4. Sample size: how many bits each sample occupies, i.e. the quantization depth mentioned above; 16 bits is the most common;
    5. Number of channels: mono and dual-channel (stereo) are the most common;
    6. Integer or floating-point type: in most formats PCM sample data is represented by integers, but applications that require high precision may represent PCM samples as floating-point values.
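
    As a concrete illustration, most of these aspects map directly onto the standard javax.sound.sampled.AudioFormat class; the values below are just a common example (44.1 kHz, 16-bit, stereo, signed, little-endian), not something prescribed by the original text:

    import javax.sound.sampled.AudioFormat;

    public class PcmFormatExample {
        public static void main(String[] args) {
            AudioFormat fmt = new AudioFormat(
                    44100f,   // sample rate
                    16,       // sample size in bits (quantization depth)
                    2,        // number of channels
                    true,     // signed samples
                    false);   // byte order: false = little-endian
            System.out.println(fmt);
        }
    }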

  • Pitch, Volume, Timbre
    Pitch: how high or low a sound is, i.e. the degree to which human hearing can distinguish the frequency of a sound. Pitch is determined mainly by the frequency of the sound and is also related to its intensity.
    Volume (loudness): how loud a sound is perceived to be; the larger the amplitude and the smaller the distance from the source, the louder it will be. (Unit: decibel, dB)
    Timbre: also called the quality of a sound; the waveform determines the timbre. Sounds differ because of the characteristics of the objects and materials that produce them. Timbre itself is abstract, but the waveform is its intuitive expression: different timbres have different waveforms. Typical timbre waveforms are the square wave, sawtooth wave, sine wave, and pulse wave, so different timbres can be distinguished by their waveforms.

  • Signed and unsigned conversions (the values stored in memory stay the same; only the way they are interpreted changes). A runnable demo follows below.

    Signed byte to unsigned int:
    byte b = -120;
    int a = b & 0xff;    // a == 136

    Convert an unsigned value to a signed byte:
    byte a = (byte) unsignedValue;    // a direct (forced) cast
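
    The two conversions above, as a runnable sketch (class and variable names are just for illustration):

    public class SignConversion {
        public static void main(String[] args) {
            // signed byte -> unsigned int: same bits, reinterpreted
            byte b = -120;
            int a = b & 0xff;
            System.out.println(a);            // prints 136

            // unsigned value -> signed byte: a direct cast keeps the low 8 bits
            int unsignedValue = 200;
            byte s = (byte) unsignedValue;
            System.out.println(s);            // prints -56
        }
    }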

  • Sample Precision Conversion

    Convert unsigned 8-bit PCM to signed 16-bit:
    Method 1: ((byte)(val + 128)) << 8
    Method 2: (val - 128) << 8

    Convert signed 16-bit PCM to unsigned 8-bit:
    Method 1: parseInt(255 / (65535 / (val + 32768)))
    Method 2: (val >> 8) + 128
    Method 3: ((val + 32768) >> 8) & 0xFF
    A runnable sketch follows.
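
    A small runnable sketch of these conversions, using Method 2 for 8 → 16 bits and Method 3 for 16 → 8 bits (class and method names are just for illustration):

    public class PcmDepth {
        // unsigned 8-bit sample (0 … 255) -> signed 16-bit
        static short u8ToS16(int val) {
            return (short) ((val - 128) << 8);
        }
        // signed 16-bit sample -> unsigned 8-bit (0 … 255)
        static int s16ToU8(short val) {
            return ((val + 32768) >> 8) & 0xFF;
        }
        public static void main(String[] args) {
            System.out.println(u8ToS16(0));                // -32768
            System.out.println(u8ToS16(255));              //  32512
            System.out.println(s16ToU8((short) -32768));   //  0
            System.out.println(s16ToU8((short) 32767));    //  255
        }
    }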

  • Calculation of decibels
    dB = 20 * log10(A1 / A2), and conversely A1 = A2 * pow(10, dB / 20).
    Using the unsigned 16-bit full-scale value as the reference gives an approximate maximum of 20 * lg(65535) ≈ 96.32 dB.
    It follows that every increase of 2 dB expands the original amplitude by a factor of about 1.2589 (= pow(10, 2 / 20)). A small runnable example follows.
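
    A minimal sketch of this calculation for a signed 16-bit PCM frame (the peakDb helper is an illustration, not code from the original; it measures the peak relative to A2 = 1, so a full-scale sample of 32767 comes out near 90.3 dB):

    public class Decibel {
        static double peakDb(short[] samples) {
            int peak = 1;                             // avoid log(0) for silence
            for (short s : samples) {
                peak = Math.max(peak, Math.abs((int) s));
            }
            return 20.0 * Math.log10(peak);           // dB = 20 * lg(A1 / A2), with A2 = 1
        }
        public static void main(String[] args) {
            short[] frame = { 100, -2000, 32767, -15000 };
            System.out.printf("peak level: %.2f dB%n", peakDb(frame));   // about 90.31 dB
        }
    }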

  • Mixing algorithm
    1. Average after linear superposition
    Advantages: no overflow, less noise;
    Disadvantages: excessive attenuation, affecting call quality;
    2. Normalized mixing (adaptive weighted mixing algorithm)
    Idea: use more bits (32 bits) to represent a sample of audio data; after the sounds are superimposed, reduce the amplitude so that it is still distributed within the range that 16 bits can represent. This is called the normalization method.
    Method: to avoid overflow, the speech is attenuated with a variable attenuation factor. The attenuation factor also represents the weight of the voice, and it changes as the audio data changes, hence "adaptive weighted" mixing. When overflow occurs, the attenuation factor is made small so that the overflowing data falls back within the limit after attenuation; when there is no overflow, the attenuation factor is increased slowly so that the data changes more gently. A minimal sketch appears after this list.
    3. A mixing implementation of PCM pulse-coded audio signals found on newlc:
    if (data1 < 0 && data2 < 0)
        data_mix = data1 + data2 - (data1 * data2 / -(pow(2, 16 - 1) - 1));   // -(2^15 - 1) = -32767
    else
        data_mix = data1 + data2 - (data1 * data2 / (pow(2, 16 - 1) - 1));    // 2^15 - 1 = 32767
    4. Cutting time slices and resampling
    The samples of each channel can be stacked (interleaved) slice by slice, so that the sample rate of the combined sound is doubled. If the playback frequency is raised accordingly, the sound plays back normally with the channels superimposed; if you do not want to change the playback output frequency, you can resample the combined sound to the output frequency you want.
    5. Adaptive sound mixing algorithm
    The characteristics of each audio signal taking part in the mix are used as weights to determine its proportion in the synthesized output.
    For the specific principle, refer to the paper "Research on Fast Real-time Adaptive Mixing Scheme".
    For a large number of audio tracks this method should work better than averaging after linear superposition, but it may introduce noise.
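
    A minimal sketch of the "normalized / adaptive weighted" idea from item 2, for two signed 16-bit inputs with 32-bit superposition (the class, the exact attenuation-factor handling, and the recovery step size are assumptions for illustration, not code from the original post):

    public class AdaptiveMixer {
        private double factor = 1.0;                    // adaptive attenuation factor
        private static final double STEP = 1.0 / 32;    // recovery speed per sample (assumed)

        short mix(short a, short b) {
            int sum = a + b;                            // superpose in 32 bits, cannot overflow here
            int scaled = (int) (sum * factor);
            if (scaled > Short.MAX_VALUE || scaled < Short.MIN_VALUE) {
                // overflow: shrink the factor so the attenuated sum sits at the limit
                factor = (double) Short.MAX_VALUE / Math.abs(sum);
                scaled = (int) (sum * factor);
            } else if (factor < 1.0) {
                factor = Math.min(1.0, factor + STEP);  // no overflow: recover slowly
            }
            return (short) scaled;
        }

        public static void main(String[] args) {
            AdaptiveMixer m = new AdaptiveMixer();
            System.out.println(m.mix((short) 30000, (short) 20000));   // attenuated to about 32767
            System.out.println(m.mix((short) 1000, (short) 2000));     // factor recovering toward 1.0
        }
    }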

  • The difference between big-endian mode and little-endian mode
    In big-endian mode, the high byte of word data is stored at the low address and the low byte at the high address; in little-endian mode, the low byte of word data is stored at the low address and the high byte at the high address.
    When an array is defined, the addresses of its elements run contiguously from low to high. A small example follows.
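
    The same two bytes interpreted under each byte order, as a small sketch (0x12 at the lower address, 0x34 at the higher one):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class Endianness {
        public static void main(String[] args) {
            byte[] raw = { 0x12, 0x34 };
            short be = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).getShort();
            short le = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getShort();
            System.out.printf("big-endian: 0x%04X, little-endian: 0x%04X%n", be, le);
            // prints: big-endian: 0x1234, little-endian: 0x3412
        }
    }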


Origin blog.csdn.net/shuangmu9768/article/details/125145341