Audio and Video Codecs: the AAC (Advanced Audio Coding) audio codec format

AAC (Advanced Audio Coding) is a widely used audio codec that employs advanced compression techniques to deliver higher audio quality at lower bit rates.

1. Principle:
AAC is built on a psychoacoustic model and the principle of perceptual coding: it exploits the human ear's perceptual characteristics to compress the audio signal. It mainly relies on the following techniques:

1. Frequency-domain analysis: Convert the audio signal into a frequency-domain representation. AAC's filter bank uses the modified discrete cosine transform (MDCT), while the psychoacoustic model typically analyzes the spectrum with the fast Fourier transform (FFT).
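As a minimal illustration of the frequency-domain analysis step, the sketch below uses a naive O(N²) DFT (real codecs use fast transforms) to locate a pure tone in a short frame:

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform; real codecs use an FFT/MDCT."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]

# A 64-sample frame containing a pure tone at bin 8.
N = 64
frame = [math.sin(2 * math.pi * 8 * i / N) for i in range(N)]
magnitudes = [abs(c) for c in dft(frame)]

# The tone shows up as a spectral peak at bin 8 (search the first half;
# for real input the second half mirrors it).
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
print(peak_bin)  # -> 8
```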

2. Time-frequency masking: Exploit the ear's masking effect, in which a strong signal masks weaker signals nearby in time or frequency, so that fewer bits are spent encoding the masked components.
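A toy illustration of the masking idea follows. Real psychoacoustic models compute per-band masking thresholds with spreading functions; the neighbour-ratio rule here is purely an assumption for illustration:

```python
def apply_masking(magnitudes, ratio=0.05):
    """Crude simultaneous-masking sketch: drop a bin when it is far weaker
    than an immediate neighbour. Real psychoacoustic models use spreading
    functions and per-band thresholds; `ratio` here is purely illustrative."""
    out = list(magnitudes)
    for i in range(1, len(magnitudes) - 1):
        strongest_neighbour = max(magnitudes[i - 1], magnitudes[i + 1])
        if magnitudes[i] < ratio * strongest_neighbour:
            out[i] = 0.0  # masked: not worth spending bits on
    return out

masked = apply_masking([0.0, 0.2, 10.0, 0.3, 0.1, 0.0])
print(masked)  # -> [0.0, 0.0, 10.0, 0.0, 0.1, 0.0]
```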

3. Frequency linearization: Reshape the frequency-domain representation of the audio signal so that it better matches the ear's perceptual characteristics, improving coding quality at low bit rates.

Here are several common frequency linearization methods:

①. Frequency grouping:
Frequency grouping divides the spectrum of the audio signal into multiple non-overlapping sub-bands, each covering a relatively narrow frequency range. This accommodates the differences in the ear's perceptual sensitivity across frequency ranges: during encoding, the spectral coefficients within each sub-band can be processed differently to better control coding quality.
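A sketch of frequency grouping, using an assumed exponential band layout so that bands widen with frequency (real AAC uses tabulated scalefactor-band boundaries that depend on the sample rate):

```python
def band_edges(num_bins, num_bands, octaves=5):
    """Split bin indices into non-overlapping bands that widen with frequency,
    loosely imitating the ear's coarser resolution at high frequencies.
    The exponential layout is an assumption for illustration; real AAC uses
    tabulated scalefactor-band boundaries per sample rate."""
    edges = [round(num_bins * (2 ** (b * octaves / num_bands) - 1) / (2 ** octaves - 1))
             for b in range(num_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))

bands = band_edges(num_bins=32, num_bands=4)
print(bands)  # -> [(0, 1), (1, 5), (5, 13), (13, 32)]
```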

②. Perceptual weighting:
Perceptual weighting applies different weighting coefficients to the spectral coefficients to model the ear's varying sensitivity across frequencies. Following the ear's perceptual characteristics, higher-frequency components can be given lower weights and lower-frequency components higher weights. This allocates the bit budget effectively, so that both high-frequency and low-frequency components are encoded with appropriate accuracy.
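A minimal sketch of perceptual weighting; the linear taper and the 0.25 floor are arbitrary illustrative choices, not values from the AAC specification:

```python
def perceptual_weights(num_bins, high_freq_weight=0.25):
    """Toy weighting curve: full weight on the lowest bin, tapering linearly
    to `high_freq_weight` on the highest. Real encoders derive weights from a
    psychoacoustic model; the straight line here is only for illustration."""
    return [1.0 - (1.0 - high_freq_weight) * i / (num_bins - 1)
            for i in range(num_bins)]

weights = perceptual_weights(5)
spectrum = [2.0] * 5
weighted = [m * w for m, w in zip(spectrum, weights)]
print(weighted)  # low-frequency coefficients keep more of their magnitude
```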

③. Nonlinear transformation:
Nonlinear transformation changes the distribution of the spectral coefficients by applying a nonlinear function to them, commonly a logarithmic or power function. Such a transform reshapes the dynamic range of the coefficients: small-amplitude coefficients are expanded and large-amplitude coefficients are compressed, which better matches the ear's perceptual response to signals of different amplitudes.
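A small example of a logarithmic nonlinear transform, borrowing telephony's mu-law companding curve purely for illustration (AAC's own quantizer applies a |x|^(3/4) power law):

```python
import math

def compand(coeff, mu=255.0):
    """Mu-law-style logarithmic companding: expands small magnitudes and
    compresses large ones while preserving sign. The curve is borrowed from
    telephony for illustration; AAC's quantizer uses a |x|^(3/4) power law."""
    sign = 1.0 if coeff >= 0 else -1.0
    return sign * math.log1p(mu * abs(coeff)) / math.log1p(mu)

print(compand(0.01))  # a small coefficient is boosted well above 0.01
print(compand(0.9))   # a large coefficient is squeezed toward 1.0
```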

④. Dynamic bit rate control:
Dynamic bit rate control adjusts the encoder's bit rate in real time according to the characteristics of the audio signal and the encoding requirements. In frequency linearization, it can allocate different bit rates to different frequency ranges based on their importance and perceptual sensitivity: ranges to which the ear is more sensitive receive a larger share of the bit budget, preserving more detail and sound quality.
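A toy bit-allocation routine in this spirit: the bit budget is split in proportion to an assumed band-importance score (energy times sensitivity). Real encoders use iterative rate-distortion loops rather than a single division:

```python
def allocate_bits(band_energies, sensitivities, total_bits):
    """Toy dynamic bit allocation: split the budget in proportion to a
    band-importance score (energy x perceptual sensitivity)."""
    importance = [e * s for e, s in zip(band_energies, sensitivities)]
    total = sum(importance)
    bits = [int(total_bits * imp / total) for imp in importance]
    bits[0] += total_bits - sum(bits)  # hand rounding leftovers to band 0
    return bits

alloc = allocate_bits(band_energies=[4.0, 2.0, 1.0],
                      sensitivities=[1.0, 0.8, 0.4],
                      total_bits=100)
print(alloc)  # the most important band receives the largest share
```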

4. Entropy coding: Apply entropy coding to compress the frequency-domain information efficiently. Entropy coding is a widely used lossless data-compression technique that reduces the number of bits needed to represent data by exploiting its statistical characteristics. It is grounded in the information-theoretic concept of entropy: frequent symbols are represented with fewer bits, and rare symbols with more bits.

Principle of entropy coding:
The principle of entropy coding is based on Shannon entropy, a measure of the average information content of a discrete random variable. Based on the statistical characteristics of the data, frequently occurring symbols are assigned shorter codes and rare symbols longer codes, which reduces the total number of bits needed to represent the data.
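Shannon entropy can be computed directly from symbol frequencies; the short example below measures the average information content of a string:

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Average information content in bits per symbol: H = -sum(p * log2(p))."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Probabilities 1/2, 1/4, 1/8, 1/8 -> H = 1.75 bits/symbol, so an ideal
# entropy coder needs 1.75 bits on average instead of 2 fixed bits.
h = shannon_entropy("aaaabbcd")
print(h)  # -> 1.75
```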

The general steps of entropy coding are as follows:

①. Statistical symbol frequency: Perform statistical analysis on the data to be encoded and calculate the frequency of occurrence of each symbol (such as characters, pixels, etc.).

②. Construct a coding table: Construct a coding table based on symbol frequency, mapping high-frequency symbols to shorter codes, and mapping low-frequency symbols to longer codes. Common coding tables include Huffman trees, arithmetic coding tables, etc.

③. Encoding: According to the constructed encoding table, replace each symbol in the original data with the corresponding encoding.

④. Storage: Store the encoded data as a bit stream.

⑤. Decoding: Use the same coding table to decode the stored data back into the original data. Decoding is the reverse of encoding: each code is looked up in the table and replaced with its original symbol.

Advantages of entropy coding:

Entropy coding adapts to the statistical characteristics of the data, so higher-frequency symbols can be represented with fewer bits, yielding a higher compression ratio. Common entropy coding algorithms include Huffman coding, arithmetic coding, and adaptive variants; in practice, a suitable algorithm is chosen according to the data's characteristics and the compression requirements.
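The steps above can be sketched with a small Huffman coder (it assumes at least two distinct symbols; the exact codes depend on tie-breaking, though the code lengths are optimal either way):

```python
import heapq
from collections import Counter

def huffman_table(data):
    """Build a Huffman code table (steps 1-2 above): frequent symbols
    receive shorter codes. Assumes at least two distinct symbols."""
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

data = "aaaabbcd"
table = huffman_table(data)
encoded = "".join(table[s] for s in data)  # step 3: encode
print(len(encoded))  # -> 14 bits, versus 16 for fixed 2-bit codes
```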

2. Encoding steps:

1. Audio framing: Divide the audio signal into fixed-length frames; an AAC long frame contains 1024 samples, roughly 20-25 ms of audio at common sample rates.

2. Windowing: Apply a window function to each audio frame to reduce spectral leakage.

3. Spectral analysis: Transform each windowed frame into a frequency-domain representation (the MDCT in AAC's filter bank; the FFT in the psychoacoustic analysis).

4. Perceptual model: Based on the ear's perceptual characteristics, compute masking thresholds over the spectrum and reduce the bits spent on masked regions.

5. Quantization and coding: Quantize the spectral coefficients so that coefficients with smaller amplitudes are represented with fewer bits and coefficients with larger amplitudes with more bits.

6. Entropy coding: Apply entropy coding technology (such as Huffman coding, etc.) to further compress the quantized data.

7. Packaging and encapsulation: Pack the compressed audio data into an AAC format data stream, and add audio metadata and synchronization information.
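Steps 1 and 2 of the pipeline can be sketched as follows, using the sine window and 50% frame overlap employed by AAC's MDCT stage (the frame length is shrunk from AAC's 1024 samples so the output is easy to inspect):

```python
import math

def frame_and_window(samples, frame_len, hop):
    """Steps 1-2 above: cut the signal into overlapping fixed-length frames
    and apply a sine window to each. AAC's MDCT stage uses 50% overlap
    (hop = frame_len // 2)."""
    window = [math.sin(math.pi * (n + 0.5) / frame_len) for n in range(frame_len)]
    frames = [[samples[start + n] * window[n] for n in range(frame_len)]
              for start in range(0, len(samples) - frame_len + 1, hop)]
    return frames, window

frames, window = frame_and_window([1.0] * 16, frame_len=8, hop=4)
print(len(frames))  # -> 3 overlapping frames
```

The sine window satisfies w[n]² + w[n + hop]² = 1 (the Princen-Bradley condition), which is what lets the decoder's overlap-add step reconstruct the signal exactly.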

3. The decoding process of AAC is the reverse process of the encoding process, including the following steps:

1. Data decapsulation: extract audio data and metadata from the AAC data stream.

Data decapsulation is a common process in data communication and storage, used to extract data in the encapsulated format for subsequent processing or playback.

The principle of data decapsulation:

Following the specification of the encapsulation (container) format, the demuxer identifies and extracts the individual components: audio, video, subtitles, metadata, and so on. A container format typically carries information describing the media data's structure, the relationships among media streams, timeline information, and metadata; the decapsulation process parses this information and separates the media data from it.

The general steps for data decapsulation are as follows:

①. Identify the encapsulation format: First determine which container is in use, such as MP4, AVI, or MKV. A container usually begins with a specific file header or magic identifier, which can be read to determine the format.

②. Parse the encapsulation format: According to the specification of the encapsulation format, parse the structure and metadata in the encapsulation format. This includes reading the index table, timestamp information, media stream description information, etc. in the encapsulation format. During the parsing process, the data of each component needs to be extracted from the file according to the syntax rules of the encapsulation format.

③. Extract media data: Based on the information obtained through analysis, media data such as audio, video, and subtitles are extracted. This can be done by reading the media stream data chunks in the encapsulated format and sorting and organizing them according to the timestamp information to get the original media stream.

④. Decode media data: Decode the extracted media data to restore it to the original audio, video, etc. This includes decoding audio and video using appropriate decoders to obtain raw media data that can be played or processed.

⑤. Process additional information: Decapsulation can also involve handling extra metadata such as media descriptions, subtitles, and chapters, which can be used for display, search, indexing, and other functions.
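As a concrete, simplified example of pulling stream parameters out of encapsulated AAC data, the sketch below parses the fixed part of an ADTS header, the lightweight framing defined in ISO/IEC 13818-7 that AAC streams commonly use:

```python
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                     22050, 16000, 12000, 11025, 8000]

def parse_adts_header(b):
    """Parse the fixed part of an ADTS header (ISO/IEC 13818-7).
    `b` holds the first 7 header bytes."""
    if b[0] != 0xFF or (b[1] & 0xF0) != 0xF0:
        raise ValueError("not an ADTS syncword")
    profile = ((b[2] >> 6) & 0x3) + 1          # 1 = Main, 2 = LC, 3 = SSR
    sr_index = (b[2] >> 2) & 0xF
    channels = ((b[2] & 0x1) << 2) | ((b[3] >> 6) & 0x3)
    frame_length = ((b[3] & 0x3) << 11) | (b[4] << 3) | ((b[5] >> 5) & 0x7)
    return {"profile": profile,
            "sample_rate": ADTS_SAMPLE_RATES[sr_index],
            "channels": channels,
            "frame_length": frame_length}

# A hand-built example header: AAC LC, 44100 Hz, 2 channels, 1024-byte frame.
header = bytes([0xFF, 0xF1, 0x50, 0x80, 0x80, 0x00, 0xFC])
info = parse_adts_header(header)
print(info)
```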

Metadata is data that describes data. It provides information about the data, such as its structure, characteristics, attributes, and relationships. Metadata can help users understand and manage data, and support data organization, retrieval, analysis and processing.

Metadata usually includes the following aspects of information:

  1. Descriptive Metadata: Descriptive metadata provides information about the content of the data, such as the title, abstract, keywords, subject, author, creation date, etc. of the data. It can help users quickly understand the basic characteristics and content of data.

  2. Structural Metadata: Structural metadata describes the organizational structure and format of data. It can tell users how the data is organized, how to segment and link it, etc. For example, for multimedia data, structural metadata can describe the relationship and timing of audio, video, subtitles, etc.

  3. Administrative Metadata: Administrative metadata includes information related to data management and maintenance, such as data access rights, storage location, ownership, version control, data quality, etc. It helps data administrators effectively manage and maintain data resources.

  4. Technical Metadata: Technical metadata provides information related to data processing and exchange, such as data file format, encoding method, resolution, sampling rate, data source, data creation tools and parameters, etc. This information is very important for data processing, parsing and transformation.

Metadata plays a key role in data management and applications: it helps users understand and use data resources, quickly search for and locate the data they need, judge its credibility and applicability, integrate data from different sources, and support analysis, mining, and decision-making. Metadata is also important in data sharing and exchange, ensuring that data is interpreted and used correctly.

2. Entropy decoding: Entropy decoding is performed on the compressed data to restore the quantized spectral coefficients.

3. Inverse quantization: Inverse quantize the decoded spectral coefficients and restore them to their pre-quantization representation.

4. Spectrum synthesis: Reassemble the inverse-quantized spectral coefficients into the frame's frequency-domain representation.

5. Time-domain synthesis: Apply the inverse transform (the inverse MDCT in AAC) to convert the frequency-domain frame back into a time-domain signal.

6. Synthesis windowing: Apply the synthesis window to each time-domain frame to compensate for the analysis window.

7. Frame overlap and synthesis: Perform appropriate frame overlap and synthesis on the decoded audio frames to achieve smooth audio output.
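The overlap-add step can be sketched as follows. With a sine window and 50% overlap, w[n]² + w[n + hop]² = 1 (the Princen-Bradley condition), so interior samples are reconstructed exactly. This is a toy sketch; real AAC folds the windowing into the inverse MDCT:

```python
import math

def overlap_add(frames, frame_len, hop):
    """Step 7 above: apply the synthesis window to each decoded frame and
    overlap-add the 50%-overlapped results."""
    window = [math.sin(math.pi * (n + 0.5) / frame_len) for n in range(frame_len)]
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for f, frame in enumerate(frames):
        for n in range(frame_len):
            out[f * hop + n] += frame[n] * window[n]
    return out

# Analysis side: a constant signal of 1.0, windowed into overlapped frames.
frame_len, hop = 8, 4
window = [math.sin(math.pi * (n + 0.5) / frame_len) for n in range(frame_len)]
frames = [list(window) for _ in range(4)]  # each frame is 1.0 * window

out = overlap_add(frames, frame_len, hop)
interior = out[hop:len(out) - hop]
print(interior)  # every interior sample comes back as 1.0
```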

Through the above encoding and decoding steps, AAC provides high-quality audio compression and decompression at low bit rates, and it is widely used in digital audio transmission, storage, broadcasting, and other fields.


Origin blog.csdn.net/qq_42233059/article/details/135003931