Audio and video codec technology: AAC audio coding technology

Contents of this article:

  • 1. Overview of AAC encoding
  • 2. Brief description of AAC encoding specification
  • 3. Characteristics of AAC encoding
  • 4. AAC audio file format

1. Overview of AAC encoding

AAC is the abbreviation of Advanced Audio Coding. It appeared in 1997 as an audio coding technology based on MPEG-2, and it was intended to replace the MP3 format. AAC was later adopted into the MPEG-4 standard (introduced around 2000) and extended with further technologies, including SBR and PS. Today AAC is generally defined as a lossy audio compression format specified by the MPEG-4 standard.


2. Brief description of AAC encoding specification

AAC has nine profiles, each suited to different use cases:

MPEG-2 AAC LC, Low Complexity profile. Note: it is relatively simple and has no gain control. It still improves coding efficiency, and it strikes a good balance between coding efficiency and sound quality at medium bit rates.

MPEG-2 AAC Main profile

MPEG-2 AAC SSR, Scalable Sampling Rate profile

MPEG-4 AAC LC, Low Complexity profile. This is the profile most commonly used for the audio track of MP4 files, including files recorded on today's mobile phones.

MPEG-4 AAC Main profile. Note: it contains every tool except gain control and gives the best sound quality.

MPEG-4 AAC SSR, Scalable Sampling Rate profile

MPEG-4 AAC LTP, Long Term Prediction profile

MPEG-4 AAC LD, Low Delay profile

MPEG-4 AAC HE, High Efficiency profile. This profile is suited to low bit rate encoding and is supported by the Nero AAC encoder.

The popular Nero AAC encoder supports only three profiles: LC, HE and HEv2. Audio it encodes is always reported as LC. This is because HE is essentially AAC LC + SBR, and HEv2 is AAC LC + SBR + PS, so the core stream is LC in all three cases.
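The sketch below makes the "always LC" observation concrete. It maps a few MPEG-4 Audio Object Type (AOT) values to the 2-bit ADTS profile field described later, where profile = AOT - 1. Only AOT 1 to 4 fit in two bits. As a result, an HE-AAC stream (SBR, AOT 5) or an HE-AAC v2 stream (PS, AOT 29) carried in ADTS is typically signaled with the LC profile of its core. The enum and function names are hypothetical, chosen only for illustration.

```c
/* A few MPEG-4 Audio Object Type (AOT) values (illustrative subset). */
enum {
    AOT_AAC_MAIN  = 1,
    AOT_AAC_LC    = 2,
    AOT_AAC_SSR   = 3,
    AOT_AAC_LTP   = 4,
    AOT_SBR       = 5,   /* HE-AAC v1 */
    AOT_ER_AAC_LD = 23,
    AOT_PS        = 29   /* HE-AAC v2 */
};

/* The 2-bit ADTS profile field stores AOT - 1, so only AOT 1..4 fit.
 * Returns the profile value, or -1 if the object type cannot be
 * written directly into an ADTS header. */
int adts_profile_from_aot(int aot) {
    if (aot >= 1 && aot <= 4)
        return aot - 1;
    return -1;
}

/* Hypothetical helper: with implicit signaling, SBR (HE-AAC v1) and
 * PS (HE-AAC v2) streams are labeled with the profile of their AAC LC core. */
int adts_profile_for_stream(int aot) {
    if (aot == AOT_SBR || aot == AOT_PS)
        return adts_profile_from_aot(AOT_AAC_LC);
    return adts_profile_from_aot(aot);
}
```

With implicit signaling, a decoder only learns that SBR or PS is in use when it finds the extension data inside the stream itself. A tool that reads just the header will therefore report LC.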

HE and HEv2 are explained in more detail below:

HE: HE-AAC v1 (also known as aacPlus v1, or SBR) combines AAC LC and SBR in a single stream. SBR stands for Spectral Band Replication. Briefly, most of the energy in music sits in the low frequencies. The high band carries little energy but matters a great deal, because it largely determines perceived sound quality. Encoding the whole band precisely enough to keep the high frequencies encodes the low band more finely than necessary, and the file becomes large. Keeping only the dominant low-frequency content and dropping the high frequencies hurts sound quality. SBR splits the spectrum to avoid both problems. The core AAC LC coder encodes the low band. The high band is described by a small amount of side information that lets the decoder reconstruct it from the low band. This preserves perceived quality while reducing file size, which resolves the trade-off well at low bit rates.

HEv2: HE-AAC v2 combines HE-AAC v1 with PS, which stands for Parametric Stereo. Coding two independent channels takes roughly twice the data of a single channel, yet the two channels of a stereo recording are usually strongly correlated. Following Shannon's information theory, removing this redundancy reduces the data size. PS therefore fully codes a single channel (a mono downmix of the two) and spends only a few bytes on parameters that describe how the stereo channels differ from it.
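The idea can be illustrated with a deliberately simplified toy in C. It is not the MPEG-4 PS algorithm. Real PS estimates intensity, coherence and phase parameters for each time/frequency tile. This sketch keeps only a mono downmix plus one broadband level parameter for the whole block, with "level" measured as the sum of absolute sample values. The names `ps_encode` and `ps_decode` are hypothetical. Reconstruction is exact only when the right channel is a positively scaled copy of the left.

```c
#include <stddef.h>

/* Toy "parametric stereo": one mono channel plus one level parameter.
 * Not the MPEG-4 PS tool; it carries no coherence or phase parameters. */

/* Encoder: writes the downmix mono[i] = (left[i] + right[i]) / 2 and returns
 * left_share = level(left) / (level(left) + level(right)), where level()
 * is the sum of absolute sample values. */
double ps_encode(const double *left, const double *right, double *mono, size_t n) {
    double level_l = 0.0, level_r = 0.0;
    for (size_t i = 0; i < n; i++) {
        mono[i] = 0.5 * (left[i] + right[i]);
        level_l += left[i] < 0 ? -left[i] : left[i];
        level_r += right[i] < 0 ? -right[i] : right[i];
    }
    if (level_l + level_r == 0.0)
        return 0.5;                      /* silence: split evenly */
    return level_l / (level_l + level_r);
}

/* Decoder: rebuilds both channels from the mono signal and the single
 * parameter. The gains 2*s and 2*(1-s) average to 1, so (l + r) / 2 == mono. */
void ps_decode(const double *mono, double left_share,
               double *left, double *right, size_t n) {
    for (size_t i = 0; i < n; i++) {
        left[i]  = 2.0 * left_share * mono[i];
        right[i] = 2.0 * (1.0 - left_share) * mono[i];
    }
}
```

Here one number carries the whole stereo image. A real PS encoder sends a small set of such parameters for each frequency band and time slot, which is how it keeps the stereo side information so compact.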

3. Characteristics of AAC encoding

(1). AAC is an audio compression algorithm with a high compression ratio, higher than that of older algorithms such as AC-3 and MP3. At sufficient bit rates, its quality is comparable to that of uncompressed CD audio.

(2). Like other audio coding algorithms of its kind, AAC uses transform coding. Its filter bank (an MDCT with 1024 spectral coefficients per long block, compared with 576 in MP3) has higher frequency resolution, which allows a higher compression ratio.

(3). AAC uses techniques that were new at the time, such as temporal noise shaping (TNS), backward-adaptive linear prediction, joint stereo coding, and Huffman coding of the quantized coefficients. These techniques further improve the compression ratio.

(4). AAC supports more sampling rates and bit rates. It also supports 1 to 48 audio channels, up to 15 low-frequency effects (LFE) channels, multiple languages, and up to 15 embedded data streams.

(5). AAC supports a wider range of sampling rates, from 8 kHz up to 96 kHz. This is much wider than MP3's range of 16 kHz to 48 kHz.

(6). Unlike MP3 and WMA, AAC loses very little of the very high and very low frequency components. Its spectrum also stays closer to that of the original audio than WMA's, so its fidelity is better.

(7). AAC uses an optimized algorithm to achieve higher decoding efficiency, and requires less processing power when decoding.
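To put characteristic (1) in numbers, assume CD audio (44.1 kHz, 16-bit, stereo) is encoded to AAC at 128 kbit/s. The 128 kbit/s figure is an example value chosen for illustration, not a number from the article:

```latex
\begin{aligned}
R_{\text{CD}} &= 44100\ \text{Hz} \times 16\ \text{bit} \times 2 = 1\,411\,200\ \text{bit/s} \approx 1411\ \text{kbit/s} \\
\frac{R_{\text{CD}}}{R_{\text{AAC}}} &= \frac{1411.2\ \text{kbit/s}}{128\ \text{kbit/s}} \approx 11
\end{aligned}
```

At that bit rate, AAC stores the audio in roughly one eleventh of the CD data rate.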

4. AAC audio file format

1. AAC audio file format types

AAC audio files come in two formats, ADIF and ADTS:

ADIF: Audio Data Interchange Format. In this format, decoding must start at a well-defined beginning of the audio data; it cannot start in the middle of the stream. This format is commonly used for files on disk.

ADTS: Audio Data Transport Stream. This format is a bit stream with a sync word at the start of every frame, so decoding can start at any frame in the stream. In this respect it is similar to the MP3 stream format.

Simply put, ADTS can be decoded starting from any frame, because every frame carries its own header. ADIF has only one header for the whole stream, so decoding can begin only after that header has been read. The two header formats also differ. Encoded or extracted AAC audio streams are generally in ADTS format.
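Step (1) of the processing flow described later, telling ADIF from ADTS, can be sketched by inspecting the first bytes of the data. This sketch assumes that the buffer starts at the beginning of the file for ADIF, or at a frame boundary for ADTS. It relies on two facts: the ADIF header begins with the 4-byte ASCII identifier "ADIF" (`adif_id` in the specification), and an ADTS frame begins with the 12-bit syncword 0xFFF followed by layer '00'. The enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { AAC_FMT_UNKNOWN, AAC_FMT_ADIF, AAC_FMT_ADTS } AacFormat;

/* Classifies a buffer by its first bytes. */
AacFormat aac_detect_format(const unsigned char *buf, size_t len) {
    /* ADIF: the file starts with the ASCII identifier "ADIF". */
    if (len >= 4 && memcmp(buf, "ADIF", 4) == 0)
        return AAC_FMT_ADIF;
    /* ADTS: 0xFF, then a byte whose high nibble completes the 0xFFF
     * syncword and whose layer bits (mask 0x06) are 00. */
    if (len >= 2 && buf[0] == 0xFF && (buf[1] & 0xF6) == 0xF0)
        return AAC_FMT_ADTS;
    return AAC_FMT_UNKNOWN;
}
```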

The ADIF file format of AAC is as follows:

header() | raw_data_stream()

The format of a frame in an AAC ADTS file is as follows:

... | syncword | header() | error_check() | raw_data_block() | ...

The "..." on either side of the ADTS frame represent the data before and after the current frame.

2. Header structure of ADIF

The header information of ADIF is as follows:

The ADIF header is located at the beginning of the AAC file and is followed by consecutive raw data blocks.

3. Header structure of ADTS

The length of a raw AAC data block varies. An ADTS frame is formed by wrapping the raw frame in an ADTS header. The most important information in the ADTS header is the sampling rate, the number of channels, and the frame length. With this header, every frame of an AAC stream tells the decoder what it needs to parse and decode that frame. The ADTS header is normally 7 bytes long (9 bytes when a CRC is present) and is divided into two parts:

  - adts_fixed_header(): fixed header information, identical in every frame.

  - adts_variable_header(): variable header information, which can change from frame to frame.

Fixed header information for ADTS:

Syncword: 12 bits, always 0xFFF. It marks the start of an ADTS frame and is used for synchronization: the decoder locates the start of each ADTS frame by searching for 0xFFF. Because of it, decoding can start anywhere in the stream, that is, at any frame.

ID: MPEG version, 0 for MPEG-4 and 1 for MPEG-2.

Layer: always '00'.

Protection_absent: note the inverted sense. It is set to 1 if there is no CRC and to 0 if there is a CRC.

Profile: indicates which AAC profile is used. Its value equals the MPEG-4 Audio Object Type minus 1, that is, profile = MPEG-4 Audio Object Type - 1. For example, AAC LC (object type 2) is written as profile 1.

sampling_frequency_index: index into the table of sampling rates. For example, 4 means 44100 Hz.

channel_configuration: the channel configuration. For example, 2 means two-channel stereo.

The following fields belong to the variable header:

aac_frame_length: the length of the ADTS frame in bytes, including the ADTS header (and CRC, if present) and the raw AAC data.

adts_buffer_fullness: a value of 0x7FF indicates a variable bit rate stream.

number_of_raw_data_blocks_in_frame: the ADTS frame contains number_of_raw_data_blocks_in_frame + 1 raw AAC frames.
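The field list above can be turned into a small parser. The sketch below reads a 7-byte ADTS header from a byte buffer, using the bit layout given on the ADTS wiki page listed below. The fields, in order, are:

- 12-bit syncword
- 1-bit ID
- 2-bit layer
- 1-bit protection_absent
- 2-bit profile
- 4-bit sampling_frequency_index
- 1-bit private bit
- 3-bit channel_configuration
- four 1-bit flags
- 13-bit aac_frame_length
- 11-bit adts_buffer_fullness
- 2-bit number_of_raw_data_blocks_in_frame

The struct and function names are hypothetical, and the CRC is not verified.

```c
#include <stddef.h>
#include <stdint.h>

/* Sampling rates indexed by sampling_frequency_index (0..12). */
static const int kAdtsSampleRates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000,
    24000, 22050, 16000, 12000, 11025, 8000, 7350
};

typedef struct {
    int id;                /* 0 = MPEG-4, 1 = MPEG-2 */
    int protection_absent; /* 1 = no CRC, 0 = CRC present */
    int profile;           /* MPEG-4 Audio Object Type - 1 */
    int sampling_index;    /* sampling_frequency_index */
    int sample_rate;       /* Hz, looked up from sampling_index */
    int channel_config;    /* channel_configuration */
    int frame_length;      /* bytes, header included */
    int buffer_fullness;   /* 0x7FF = variable bit rate */
    int raw_blocks;        /* number_of_raw_data_blocks_in_frame + 1 */
    int header_length;     /* 7 without CRC, 9 with CRC */
} AdtsHeader;

/* Returns 0 on success, -1 if p does not start with a valid ADTS header. */
int adts_parse_header(const uint8_t *p, size_t len, AdtsHeader *h) {
    if (len < 7) return -1;
    if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0) return -1;  /* syncword 0xFFF */
    if (((p[1] >> 1) & 0x3) != 0) return -1;                /* layer must be 00 */

    h->id                = (p[1] >> 3) & 0x1;
    h->protection_absent = p[1] & 0x1;
    h->profile           = (p[2] >> 6) & 0x3;
    h->sampling_index    = (p[2] >> 2) & 0xF;
    if (h->sampling_index > 12) return -1;                  /* reserved index */
    h->sample_rate       = kAdtsSampleRates[h->sampling_index];
    /* channel_configuration spans bytes 2 and 3 (1 + 2 bits). */
    h->channel_config    = ((p[2] & 0x1) << 2) | ((p[3] >> 6) & 0x3);
    /* aac_frame_length spans bytes 3, 4 and 5 (2 + 8 + 3 bits). */
    h->frame_length      = ((p[3] & 0x3) << 11) | (p[4] << 3) | ((p[5] >> 5) & 0x7);
    /* adts_buffer_fullness spans bytes 5 and 6 (5 + 6 bits). */
    h->buffer_fullness   = ((p[5] & 0x1F) << 6) | ((p[6] >> 2) & 0x3F);
    h->raw_blocks        = (p[6] & 0x3) + 1;
    h->header_length     = h->protection_absent ? 7 : 9;
    if (h->frame_length < h->header_length) return -1;
    return 0;
}
```

For example, the header bytes FF F1 50 80 20 1F FC decode as follows:

- ID 0 (MPEG-4)
- profile 1 (AAC LC)
- 44100 Hz
- channel configuration 2
- a 256-byte frame
- buffer fullness 0x7FF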

When developing an AAC codec, and especially when wrapping frames in ADTS, the following wiki pages explain how to set each header field:

  • https://wiki.multimedia.cx/index.php?title=MPEG-4_Audio
  • https://wiki.multimedia.cx/index.php/ADTS

Note: AAC LC and HE-AAC set the sampling rate index differently. For LC, the index corresponds to the actual sampling rate. For HE-AAC with implicit SBR signaling, as in ADTS, the index corresponds to half the output sampling rate; for example, a 44100 Hz output is signaled with the 22050 Hz index. This is because HE-AAC uses SBR (Spectral Band Replication). The core AAC LC stream is coded at half the sampling rate, and the decoder's SBR stage reconstructs the high band and doubles the output rate. This is also why an HE file is smaller for the same audio content.
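Below is a minimal sketch of choosing sampling_frequency_index with this rule in mind, assuming implicit SBR signaling in ADTS. The function name and the `uses_sbr` flag are hypothetical. The rate table follows the MPEG-4 Audio wiki page listed above.

```c
/* Sampling rates indexed by sampling_frequency_index (0..12). */
static const int kSampleRates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000,
    24000, 22050, 16000, 12000, 11025, 8000, 7350
};

/* Returns the sampling_frequency_index to write into an ADTS header for a
 * given decoder output rate. For HE-AAC (SBR) the header carries the core
 * AAC LC rate, which is half the output rate. Returns -1 if there is no
 * matching table entry. */
int adts_sampling_index(int output_rate, int uses_sbr) {
    int core_rate = uses_sbr ? output_rate / 2 : output_rate;
    for (int i = 0; i < 13; i++)
        if (kSampleRates[i] == core_rate)
            return i;
    return -1;
}
```

`adts_sampling_index(44100, 0)` gives 4 (AAC LC at 44.1 kHz). `adts_sampling_index(44100, 1)` gives 7, the 22050 Hz entry, for an HE-AAC stream whose decoded output is 44.1 kHz.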

Additional notes on the ADTS header:

(1) The syncword is used to locate the frame header in the bit stream. The ADTS frame sync word is the 12-bit pattern "1111 1111 1111".

(2) The ADTS header consists of two parts: the fixed header, followed by the variable header. The fixed header is the same in every frame, while the variable header can change from frame to frame.

4. AAC file processing flow

(1). Determine the file format: ADIF or ADTS.

(2). If it is ADIF, parse the ADIF header and skip to step (6).

(3). If it is ADTS, search for the syncword.

(4). Decode the ADTS frame header.

(5). If error checking (CRC) is present, perform it.

(6). Parse the raw data block.

(7). Decode the syntax elements in the block.
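Steps (3) and (4) for ADTS can be sketched as a loop. It searches for the syncword, reads aac_frame_length from the header, and jumps to the next frame. This sketch makes simplifying assumptions: it does not check the CRC or decode the raw data, and it resynchronizes one byte at a time when a header looks invalid. The function and callback names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback invoked once per complete ADTS frame. */
typedef void (*AdtsFrameFn)(const uint8_t *frame, size_t frame_len, void *user);

/* Splits a buffer into ADTS frames and returns how many were found.
 * on_frame may be NULL when only the count is needed. */
size_t adts_split_frames(const uint8_t *buf, size_t len,
                         AdtsFrameFn on_frame, void *user) {
    size_t pos = 0, count = 0;
    while (pos + 7 <= len) {
        const uint8_t *p = buf + pos;
        /* Step (3): syncword 0xFFF with layer bits 00. */
        if (p[0] != 0xFF || (p[1] & 0xF6) != 0xF0) {
            pos++;
            continue;
        }
        /* Step (4): header size and the 13-bit aac_frame_length. */
        size_t header_len = (p[1] & 0x1) ? 7 : 9;
        size_t frame_len = ((size_t)(p[3] & 0x3) << 11)
                         | ((size_t)p[4] << 3)
                         | (size_t)(p[5] >> 5);
        if (frame_len < header_len) {   /* implausible length: resync */
            pos++;
            continue;
        }
        if (pos + frame_len > len)      /* truncated final frame */
            break;
        if (on_frame)
            on_frame(p, frame_len, user);
        count++;
        pos += frame_len;
    }
    return count;
}
```

Payload bytes can occasionally look like a syncword. A more robust splitter would also confirm that another syncword follows at the computed offset before accepting a frame.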

Note: When processing an AAC audio stream, the resulting AAC file sometimes fails to play on a PC or mobile phone. A typical example is extracting the AAC elementary stream from an FLV container and sending it to a hardware decoder. The most likely reason is that the frames lack ADTS headers. In this case, an ADTS header must be added in front of each frame.
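Below is a minimal sketch of adding such a header. It writes a 7-byte ADTS header (no CRC) to be placed in front of one raw AAC frame. It assumes one raw data block per ADTS frame, the MPEG-4 ID, and a variable bit rate (adts_buffer_fullness = 0x7FF). The function name is hypothetical. In practice, the profile, sampling index and channel configuration would come from the stream's AudioSpecificConfig, which FLV carries in its AAC sequence header.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes a 7-byte ADTS header for a raw AAC frame of raw_frame_len bytes.
 * profile: ADTS profile field (Audio Object Type - 1, e.g. 1 for AAC LC).
 * Returns 0 on success, -1 on out-of-range arguments. */
int adts_write_header(uint8_t out[7], int profile, int sampling_index,
                      int channel_config, size_t raw_frame_len) {
    size_t frame_len = raw_frame_len + 7;   /* aac_frame_length includes the header */
    if (profile < 0 || profile > 3 || sampling_index < 0 || sampling_index > 12 ||
        channel_config < 0 || channel_config > 7 || frame_len > 0x1FFF)
        return -1;
    const int fullness = 0x7FF;             /* variable bit rate */

    out[0] = 0xFF;                          /* syncword, high 8 bits */
    out[1] = 0xF1;                          /* syncword low 4 bits, ID=0, layer=00, protection_absent=1 */
    out[2] = (uint8_t)((profile << 6) | (sampling_index << 2) | (channel_config >> 2));
    out[3] = (uint8_t)(((channel_config & 0x3) << 6) | (frame_len >> 11));
    out[4] = (uint8_t)((frame_len >> 3) & 0xFF);
    out[5] = (uint8_t)(((frame_len & 0x7) << 5) | (fullness >> 6));
    out[6] = (uint8_t)((fullness & 0x3F) << 2);  /* number_of_raw_data_blocks_in_frame = 0 */
    return 0;
}
```

Take a 249-byte raw frame of AAC LC (profile 1) at 44.1 kHz (index 4) in stereo (configuration 2). The header is FF F1 50 80 20 1F FC, and the resulting ADTS frame is 256 bytes.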


Origin blog.csdn.net/m0_60259116/article/details/131294521