MP3 compression algorithm

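MP3 also has a sync word.

MP3 has an 11-bit synchronization word (all bits set to 1) at the start of every frame header.
AAC ADTS also has a header, with a 12-bit synchronization word (0xFFF) that marks the beginning of each frame.
AMR (CBR, VBR) also has a header: an AMR file begins with a magic string ("#!AMR" followed by a newline, or "#!AMR-WB" for wideband).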
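A minimal Python sketch of how a parser might locate these markers. It relies on the standard header layouts (MPEG audio: 11 sync bits, then 2 version bits, 2 layer bits and a protection bit; ADTS: 12 sync bits, then an ID bit, a 2-bit layer field that is always 00, and protection_absent). The function names are hypothetical, and a real demuxer would validate more header fields before trusting a match, because 0xFF bytes also occur inside compressed payloads.

```python
AMR_MAGIC_NB = b"#!AMR\n"     # AMR narrowband storage format (RFC 4867)
AMR_MAGIC_WB = b"#!AMR-WB\n"  # AMR wideband storage format


def is_amr_file(data: bytes) -> bool:
    """AMR files start with a magic string rather than a per-frame sync word."""
    return data.startswith(AMR_MAGIC_NB) or data.startswith(AMR_MAGIC_WB)


def find_frame_sync(data: bytes, start: int = 0):
    """Return (offset, kind) of the first MP3 or ADTS sync word, or None.

    Only the sync and layer bits are checked; a real parser would also
    validate the bitrate, sample-rate and frame-length fields.
    """
    for i in range(start, len(data) - 1):
        b0, b1 = data[i], data[i + 1]
        # The first 11 bits must all be 1: 0xFF followed by 111xxxxx.
        if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
            continue
        layer = (b1 >> 1) & 0x03  # 2-bit layer field
        if layer == 0 and (b1 & 0xF0) == 0xF0:
            return i, "aac-adts"  # 12-bit 0xFFF sync, layer always 00
        if layer == 1:
            return i, "mp3"  # MPEG audio with layer bits 01 = Layer III
    return None


if __name__ == "__main__":
    print(find_frame_sync(bytes([0x00, 0xFF, 0xFB, 0x90, 0x64])))  # (1, 'mp3')
    print(find_frame_sync(bytes([0xFF, 0xF1, 0x50, 0x80])))        # (0, 'aac-adts')
    print(is_amr_file(b"#!AMR\n\x3c"))                             # True
```

Checking the layer field as well as the sync bits is what separates the two formats here: MPEG audio never uses layer 00 (it is reserved), while ADTS always does.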

http://blog.csdn.net/sunnylgz/article/details/7615410



MP3 encoding consists of three main functional modules: the hybrid filter bank (subband filter bank and MDCT), the psychoacoustic model, and quantization and coding (bit and scale factor allocation, plus Huffman coding).
1. Hybrid filter bank. This module has two parts: the subband filter bank and the MDCT. The subband filter bank maps the sampled signal from the time domain to the frequency domain, splitting the input audio into 32 subbands with a bank of band-pass filters. These 32 subbands all have the same bandwidth, whereas the critical bands derived from the psychoacoustic model do not. To make the scale factor bands used for encoding match the critical bands, the subband signals are therefore passed through an MDCT. The MDCT subdivides each subband into 18 frequency lines, giving 32 × 18 = 576 frequency lines in total. The number of bits allocated to these 576 spectral lines is then determined from the signal-to-mask ratios computed by the psychoacoustic model.
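The 32 × 18 = 576 bookkeeping can be illustrated with a direct (O(N²)) MDCT in Python. This sketch assumes long blocks only: each subband contributes 36 samples (18 from the previous granule and 18 from the current one), shaped by a sine window, and the MDCT turns them into 18 coefficients. The function names and the toy input are hypothetical; real encoders also switch to short blocks, apply aliasing-reduction butterflies and use a fast transform, none of which is shown.

```python
import math

N = 18         # MDCT coefficients per subband for a long block
SUBBANDS = 32  # outputs of the subband filter bank


def mdct(block):
    """Direct MDCT: 2N sine-windowed input samples -> N frequency lines."""
    two_n = len(block)
    n = two_n // 2
    out = []
    for k in range(n):
        acc = 0.0
        for i, x in enumerate(block):
            window = math.sin(math.pi / two_n * (i + 0.5))  # sine window
            acc += window * x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
        out.append(acc)
    return out


def hybrid_granule(subband_blocks):
    """Concatenate the 18 MDCT lines of each of the 32 subbands (576 total)."""
    if len(subband_blocks) != SUBBANDS:
        raise ValueError("expected 32 subband blocks")
    lines = []
    for block in subband_blocks:
        if len(block) != 2 * N:
            raise ValueError("each subband block needs 36 samples")
        lines.extend(mdct(block))
    return lines


if __name__ == "__main__":
    # Toy input: a different sinusoid in each subband (stand-in for filter-bank output).
    blocks = [[math.sin(0.1 * i + sb) for i in range(2 * N)] for sb in range(SUBBANDS)]
    print(len(hybrid_granule(blocks)))  # 576
```

The 50% overlap (36 inputs, 18 outputs) is what lets the MDCT raise frequency resolution without increasing the number of coefficients per granule.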




2. Psychoacoustic model. The psychoacoustic model exploits the masking effect of the human auditory system to remove a large amount of perceptually irrelevant signal content, which is what allows the audio data to be compressed. Computing the masking threshold accurately requires good frequency resolution, so the signal is Fourier transformed before the psychoacoustic model is applied. MPEG-1 provides two psychoacoustic models. The first is simple to compute and gives adequate accuracy when encoding at high bit rates. The second is more complex and is generally used when encoding at lower bit rates; MP3 encoders generally use psychoacoustic model II. The purpose of the psychoacoustic model is to find the masking threshold of each subband and use it to control the quantization process. A typical implementation first uses an FFT to obtain the spectrum of the signal and, from it, identifies the tonal components (sometimes called musical components) and non-tonal components (noise components) at each frequency; it then uses a masking curve to determine the masking threshold that each tonal and non-tonal component produces at other frequencies; finally, it combines these into a global masking threshold at each frequency and maps it onto the coding subbands. If the noise introduced by quantizing the spectral values from the subband filter bank is kept below the masking threshold, the decoded output of the compressed data can be indistinguishable from the original signal. Because the masking ability of a signal depends on its frequency and loudness, the final output of the psychoacoustic model is the signal-to-mask ratio (SMR): the ratio of the signal strength to the masking threshold.
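The last steps of that procedure (spread each masker across frequency, sum the contributions into a global threshold, take the ratio) can be illustrated with a toy Python sketch. It assumes band energies have already been obtained from the FFT and placed on the Bark scale. The spreading curve is the one published by Schroeder et al., swapped in for illustration rather than the exact table of either MPEG model, and the fixed 10 dB `offset_db` is a hypothetical stand-in for the tonality-dependent offsets real models use.

```python
import math


def spread_db(dz: float) -> float:
    """Schroeder spreading function: level in dB at Bark distance
    dz = z_maskee - z_masker (about 0 dB at dz = 0, steeper below the masker)."""
    t = dz + 0.474
    return 15.81 + 7.5 * t - 17.5 * math.sqrt(1.0 + t * t)


def masking_thresholds(bands, offset_db=10.0):
    """bands: list of (bark, energy) pairs taken from the FFT spectrum.
    Each band's threshold is the sum of every masker's spread energy,
    lowered by a fixed (hypothetical) offset."""
    thresholds = []
    for z_j, _ in bands:
        total = 0.0
        for z_i, e_i in bands:
            total += e_i * 10.0 ** ((spread_db(z_j - z_i) - offset_db) / 10.0)
        thresholds.append(total)
    return thresholds


def smr_db(bands, offset_db=10.0):
    """Signal-to-mask ratio per band: 10 * log10(signal energy / threshold)."""
    result = []
    for (_, energy), threshold in zip(bands, masking_thresholds(bands, offset_db)):
        result.append(10.0 * math.log10(energy / threshold) if energy > 0 else float("-inf"))
    return result


if __name__ == "__main__":
    # A loud band at 5 Bark next to a quiet band at 6 Bark.
    for smr in smr_db([(5.0, 1e6), (6.0, 1.0)]):
        print(f"{smr:7.1f} dB")
```

With these toy numbers the quiet band gets a strongly negative SMR (it sits under the mask of its loud neighbour), while the loud band keeps a positive SMR and would need bits.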




3. Quantization and coding. Bit allocation and quantization use three nested iteration loops: the frame loop, the outer loop, and the inner loop. The frame loop resets all iteration variables, calculates the maximum number of bits available for each section of data, and then calls the outer loop. The outer loop first calls the inner loop, which quantizes the input vector and increases the quantization step size until the quantized output can be coded within the bit limit. Huffman coding limits the largest quantized value it can represent, so the inner loop checks whether any quantized value exceeds that limit; if one does, it increases the step size and quantizes again. It then determines the number of bits needed for Huffman coding; if that is more than the maximum the frame loop computed for this section, it again increases the step size and requantizes. The outer loop handles distortion: it compares the quantization noise in each scale factor band with the allowed distortion from the psychoacoustic model, amplifies the bands whose noise is too high by adjusting their scale factors, and runs the inner loop again. Once the quantization meets the requirements, the final scale factor values are stored, the outer loop exits, and the frame loop calculates the number of bits used to store each section of data.
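The inner (rate) loop and a frame-level bit budget can be sketched in Python. Assumptions, all labeled in the code: the quantizer uses the power-law form commonly cited from the ISO reference encoder (exponent 0.75, rounding offset 0.0946); 8191 is assumed as the largest codable value; the bit count comes from a hypothetical Exp-Golomb estimate instead of the real MP3 Huffman tables; and the budget ignores header and side-information bits. The outer distortion-control loop and scale factors are not shown.

```python
MAX_QUANT = 8191  # assumed largest quantized magnitude the Huffman/linbits scheme can code


def granule_bit_budget(bitrate: int, sample_rate: int) -> int:
    """Average bits per 576-sample granule (header and side info ignored)."""
    return bitrate * 576 // sample_rate


def quantize(xr, step):
    """Power-law quantizer: nint((|x| / 2**(step/4))**0.75 - 0.0946).
    'step' is a hypothetical step-size counter, not the real global_gain."""
    scale = 2.0 ** (step / 4.0)
    return [int((abs(x) / scale) ** 0.75 + 0.4054) for x in xr]


def estimate_bits(ix):
    """Hypothetical stand-in for Huffman table lookup: order-0 Exp-Golomb
    length per value, plus one sign bit for each nonzero value."""
    return sum(2 * (v + 1).bit_length() - 1 + (1 if v else 0) for v in ix)


def inner_loop(xr, bit_budget, step=0):
    """Rate loop: raise the step size until every value is codable and
    the estimated bit count fits the budget. Returns (step, quantized)."""
    if bit_budget < len(xr):
        raise ValueError("budget too small even for an all-zero granule")
    while True:
        ix = quantize(xr, step)
        if max(ix, default=0) <= MAX_QUANT and estimate_bits(ix) <= bit_budget:
            return step, ix
        step += 1  # coarser quantization: smaller values, fewer bits


if __name__ == "__main__":
    budget = granule_bit_budget(128_000, 44_100)  # 1671
    xr = [1000.0 / (k + 1) for k in range(576)]   # toy decaying spectrum
    step, ix = inner_loop(xr, budget)
    print(budget, step, estimate_bits(ix))
```

Because each increment of `step` shrinks every quantized value, the loop always terminates: in the worst case all values reach zero, which costs one bit per line in this estimate.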
