HiSilicon Multimedia (MPP) Development (7) - Audio Module (AUDIO)

(A) Introduction:

The AUDIO module consists of four submodules: audio input (AI), audio output (AO), audio encoding (AENC), and audio decoding (ADEC). The audio input and output submodules control the audio interfaces of the Hi35xx chip to implement audio capture and playback.

The audio encoding and decoding submodules provide codecs for the G.711, G.726, and ADPCM formats, and also support recording and playback of raw audio files in LPCM format.

(B) Audio interfaces:

nvp6134:

  • nvp6134 supports 4-channel audio plus 1 microphone input; in essence it converts analog audio signals into digital signals. This is officially called recording mode.
  • It also supports a 1-channel playback mode: digital audio data is fed into the nvp6134, which converts the digital signal into an analog audio signal for output.
  • nvp6134 supports SSP/DSP/I2S interfaces (master/slave mode)

Hi3521A:

  • Three I2S/PCM interfaces
  • - 2 inputs, each supporting 16 multiplexed inputs
  • - 1 output, dual-channel (two multiplexed) output
  • Supports 16-bit audio input and output

As can be seen above, in the NVP6134 + Hi3521A combination, audio data can only be exchanged over I2S. When debugging, the nvp6134 and the corresponding HiSilicon AI (Audio Input) module must be configured consistently: one side works in master mode, and the other must work in slave mode.
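As an illustration, a minimal AI configuration for this setup might look like the sketch below, with the HiSilicon side in I2S slave mode. The structure and function names follow the Hi35xx MPP SDK samples, but field names and enum values vary between SDK versions, so treat this as a sketch to check against your own headers rather than copy-paste code.

```c
/* Sketch: configure the Hi35xx AI device as I2S slave, with the nvp6134
 * driving the bit/frame clocks as master. Names follow the Hi35xx MPP
 * SDK samples and should be verified against the actual SDK headers. */
AIO_ATTR_S stAioAttr;
stAioAttr.enSamplerate   = AUDIO_SAMPLE_RATE_8000;  /* 8 kHz voice */
stAioAttr.enBitwidth     = AUDIO_BIT_WIDTH_16;      /* 16-bit samples */
stAioAttr.enWorkmode     = AIO_MODE_I2S_SLAVE;      /* nvp6134 is master */
stAioAttr.enSoundmode    = AUDIO_SOUND_MODE_MONO;
stAioAttr.u32EXFlag      = 0;
stAioAttr.u32FrmNum      = 30;                      /* frame buffer depth */
stAioAttr.u32PtNumPerFrm = 320;                     /* samples per frame */
stAioAttr.u32ChnCnt      = 2;

HI_MPI_AI_SetPubAttr(AiDev, &stAioAttr);
HI_MPI_AI_Enable(AiDev);
HI_MPI_AI_EnableChn(AiDev, AiChn);
```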


(C) Voice Quality Enhancement (VQE)

The Hi35xx audio input and output modules support Voice Quality Enhancement (VQE) processing of audio data. It comprises six processing modules: acoustic echo cancellation, audio noise reduction, automatic gain control, high-pass filtering, record noise reduction, and equalization.

AEC

AEC is the Acoustic Echo Cancellation module. Its main job is to remove echo in scenarios that require it. For example, during an IPC (IP camera) call, voice data from the far end is played on the local AO device while the local MIC captures audio; AEC removes the sound played by the AO device (the echo) from the recorded signal.

ANR

ANR is the Audio Noise Reduction module. Its main job is to remove ambient noise while preserving the voice input.

Compared with the RNR algorithm, ANR places more emphasis on cleanliness when processing noise. ANR filters out ambient sounds and keeps mainly the voice data, which brings some loss of detail. The ANR algorithm is therefore better suited to IPC and NVR scenarios, where we want to focus on preserving the human voice and filtering out other noise.

RNR

RNR is the Record Noise Reduction module. Its main job is to remove ambient noise while preserving small-signal input.

Compared with the ANR algorithm, RNR pays more attention to preserving detail (small signals) in the input. RNR retains small input signals while reducing noise, so its noise-reduction strength is lower, but it keeps more of the live sound and reproduces the scene more faithfully. It is suitable for scenarios such as sports DV (camcorder) recording.

HPF

HPF is the high-pass filter (high-pass filter) module, responsible for removing low-frequency noise.

Low-frequency noise usually comes from hardware noise or mains-frequency interference, and is heard as an uncomfortable humming sound. We can record a stream on the board in a quiet environment and analyze its spectrum to decide whether to enable this module. If the low-frequency noise is not obvious, and the customer needs to keep the low-frequency part of the source, enabling this module is not recommended.

The recommended configuration parameter is AUDIO_HPF_FREQ_120 or AUDIO_HPF_FREQ_150.
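To see what an HPF does, here is a minimal one-pole high-pass filter sketch (my own illustration, not the Hi35xx implementation, whose internals are not public): it passes fast changes and blocks DC and low-frequency hum. The 120 Hz cutoff mirrors the AUDIO_HPF_FREQ_120 setting.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* One-pole RC high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1]) */
typedef struct { double a, prev_in, prev_out; } hpf_t;

void hpf_init(hpf_t *f, double cutoff_hz, double sample_rate_hz) {
    double rc = 1.0 / (2.0 * M_PI * cutoff_hz);
    f->a = rc / (rc + 1.0 / sample_rate_hz);   /* closer to 1 = lower cutoff */
    f->prev_in  = 0.0;
    f->prev_out = 0.0;
}

double hpf_step(hpf_t *f, double x) {
    double y = f->a * (f->prev_out + x - f->prev_in);
    f->prev_in  = x;
    f->prev_out = y;
    return y;
}
```

Feeding a constant (0 Hz) input, the output decays toward zero within a couple of hundred samples; a 4 kHz alternating signal at an 8 kHz sample rate passes with nearly unit gain. A real VQE HPF would use a steeper filter, but the principle is the same.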

AGC

AGC is the Automatic Gain Control module, mainly responsible for controlling the output level. When the input volume varies, the output volume can be kept within a fairly consistent range. It is mainly used in scenarios where the sound must be kept neither too loud nor too quiet.

EQ

EQ is the equalizer (Equalizer) module. It mainly performs equalization on the audio data, adjusting the gain of each frequency band of the sound.

GAIN

GAIN is the volume control module, mainly used to adjust the volume after AGC processing.

RES

RES is the resampling (Resampler) module. When VQE modules are enabled on the AI uplink or AO downlink path, a resampler sits before and after the processing chain. The first converts the audio data from the input sample rate to a sample rate at which the VQE modules can work (8 kHz / 16 kHz / 48 kHz); the second converts the data from the working sample rate to the output sample rate.
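The two resampling stages can be pictured with a toy linear-interpolation resampler (an illustration only; the SDK's resampler is higher quality and includes anti-aliasing filtering):

```c
/* Resample in[0..n_in) from fs_in to fs_out by linear interpolation.
 * Returns the number of output samples written (at most max_out).
 * NOTE: a production resampler also low-pass filters before downsampling
 * to avoid aliasing; this sketch skips that for clarity. */
int resample_linear(const short *in, int n_in, int fs_in,
                    short *out, int max_out, int fs_out) {
    double step = (double)fs_in / (double)fs_out;
    double pos = 0.0;
    int n = 0;
    while (n < max_out) {
        int i = (int)pos;
        if (i >= n_in - 1) break;                    /* ran out of input */
        double frac = pos - (double)i;
        out[n++] = (short)((1.0 - frac) * in[i] + frac * in[i + 1]);
        pos += step;
    }
    return n;
}
```

Converting a 16 kHz buffer to 8 kHz keeps every other sample; going from 8 kHz to 16 kHz inserts interpolated samples between neighbors.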

(D) Audio formats

The HiSilicon audio encoding and decoding module provides audio codecs. The G.711, G.726, and ADPCM audio formats are described below:

LPCM

LPCM (Linear Pulse Code Modulation) is an uncompressed digital audio technique that reproduces sound without compression. It generally offers the highest fidelity and is widely used on audio CDs, DVDs, and other occasions with high quality requirements. All applications of LPCM (PCM) share the same principle; they differ only in sampling frequency and quantization precision.

Sound can be digitized because the frequency range the human ear can hear is not infinitely wide: it lies mainly below 20 kHz. According to the sampling theorem, the original sound can be reconstructed without distortion only if the sampling frequency is greater than 40 kHz. CDs, for example, use a sampling frequency of 44.1 kHz; other common rates are 48 kHz and 96 kHz.

PCM (Pulse Code Modulation) is a coding method that converts an analog voice signal into a digital signal. It involves three main steps: sampling, quantization, and encoding. Sampling turns the continuous-time analog signal into a discrete-time signal; quantization turns the amplitudes of the sampled signal into discrete values, yielding a discrete-time, discrete-amplitude digital signal; encoding turns the quantized signal into a binary code for output.

Quantization is divided into linear and nonlinear quantization. Linear quantization uses equal quantization intervals over the entire quantization range; nonlinear quantization uses unequal intervals. The quantization interval is determined by the number of binary bits used for encoding. For example, CDs use 16-bit linear quantization, giving L = 65536 quantization levels. The more bits n, the higher the precision, and the higher the signal-to-noise ratio SNR = 6.02n + 1.76 (dB). But the number of bits cannot grow without limit; it is determined by the required data rate. For example, a CD reaches a data rate of 2 × 44.1k × 16 = 1411.2 kbit/s.
Simply put, LPCM is the digital signal obtained by sampling the original analog sound waveform and quantizing it linearly; the data are not compressed.
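The two formulas above are easy to check in code; this small sketch computes the theoretical SNR of n-bit linear quantization and the raw LPCM data rate:

```c
/* Theoretical SNR of n-bit linear quantization: SNR = 6.02n + 1.76 dB */
double lpcm_snr_db(int bits) {
    return 6.02 * (double)bits + 1.76;
}

/* Raw LPCM data rate in bit/s: channels x sample rate x bits per sample */
long lpcm_bitrate(int channels, int sample_rate_hz, int bits) {
    return (long)channels * sample_rate_hz * bits;
}
```

lpcm_snr_db(16) gives 98.08 dB, and lpcm_bitrate(2, 44100, 16) gives 1411200 bit/s, matching the CD figures above.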

ADPCM

ADPCM: Adaptive Differential Pulse Code Modulation

To talk about ADPCM, we must start with DPCM. Differential (or Delta) PCM (DPCM) records each value as its difference from the previous value. DPCM quantizes this difference signal, so the number of quantization bits can be further reduced. Compared with plain PCM, this coding requires only about 25% of the bits. The concept is similar to some video compression, where a frame is recorded as its difference from the preceding frame for the purpose of compression.
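A minimal DPCM round trip looks like this (an illustration of the idea; with full 16-bit differences it is lossless, and the point is that for smooth signals the differences are much smaller than the samples and therefore need fewer bits):

```c
/* DPCM encode: store each sample as its difference from the previous one. */
void dpcm_encode(const short *in, short *diff, int n) {
    short prev = 0;
    for (int i = 0; i < n; i++) {
        diff[i] = (short)(in[i] - prev);
        prev = in[i];
    }
}

/* DPCM decode: accumulate the differences to rebuild the samples. */
void dpcm_decode(const short *diff, short *out, int n) {
    short prev = 0;
    for (int i = 0; i < n; i++) {
        prev = (short)(prev + diff[i]);
        out[i] = prev;
    }
}
```

For a smooth ramp such as {1000, 1020, 1040, …} every difference after the first is just 20, so the differences could be stored in far fewer bits than the original samples; real DPCM then quantizes these differences.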

ADPCM (Adaptive Differential Pulse Code Modulation) is a lossy compression algorithm aimed at 16-bit (or higher) sound waveform data. Each 16-bit sample of the audio stream is stored in 4 bits, giving a compression ratio of 4:1, and the compression/decompression algorithms are very simple, making ADPCM a low-cost way to obtain high-quality sound.

The algorithm exploits the correlation between speech samples and, targeting the non-stationary nature of speech signals, uses adaptive prediction and adaptive quantization: the parameters of the quantizer and the predictor adapt to the statistical characteristics of the input signal, staying at or near the optimal values. At a rate of 32 kbit/s @ 8 kHz it can deliver network-grade voice quality.

Characteristics: ADPCM combines the adaptive characteristics of APCM with the differential characteristics of DPCM, and is a waveform coding with good performance. Its core ideas are:
① use adaptation to change the quantization step size, i.e. use a small step size to encode small differences and a large step size to encode large differences;
② use past samples to estimate a predicted value for the next input sample, so that the difference between the actual sample value and the predicted value is always minimal.

  • Advantages: low algorithmic complexity, small compression ratio, short codec delay (relative to other techniques)
  • Disadvantages: mediocre sound quality

Simply put, ADPCM applies lossy compression to LPCM data. During compression it uses a small quantization step when the difference is small and a large step when it is large, resizing adaptively; in addition, it predicts subsequent values from the statistics of earlier data, so that the difference to be encoded stays as small as possible.
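The two core ideas (① adaptive step size, ② prediction) can be shown with a deliberately simplified 4-bit codec. This is a toy of my own, not IMA ADPCM or G.726: the predictor is just the previous reconstructed sample, and the step size doubles on large codes and shrinks on small ones. Encoder and decoder share the same update rule, so they stay in sync:

```c
/* Toy 4-bit ADPCM-style codec (illustration only, not IMA/G.726).
 * Predictor: the previous reconstructed sample.
 * Quantizer: diff / step, clamped to a signed 4-bit code. */
typedef struct { int predicted; int step; } toy_adpcm_t;

/* Shared state update: encoder and decoder apply the same rule,
 * so both sides reconstruct identical samples and step sizes. */
static void toy_update(toy_adpcm_t *s, int code) {
    s->predicted += code * s->step;
    if (s->predicted > 32767)  s->predicted = 32767;   /* 16-bit clamp */
    if (s->predicted < -32768) s->predicted = -32768;
    if (code >= 4 || code <= -5) {          /* big difference: grow step */
        s->step *= 2;
        if (s->step > 16384) s->step = 16384;
    } else if (code == 0) {                 /* tiny difference: shrink step */
        s->step /= 2;
        if (s->step < 1) s->step = 1;
    }
}

unsigned char toy_encode(toy_adpcm_t *s, short sample) {
    int code = (sample - s->predicted) / s->step;
    if (code > 7)  code = 7;                /* signed 4-bit range */
    if (code < -8) code = -8;
    toy_update(s, code);
    return (unsigned char)(code & 0x0F);    /* pack into one nibble */
}

short toy_decode(toy_adpcm_t *s, unsigned char nibble) {
    int code = nibble & 0x0F;
    if (code > 7) code -= 16;               /* sign-extend the nibble */
    toy_update(s, code);
    return (short)s->predicted;
}
```

Each 16-bit sample becomes one 4-bit nibble, the 4:1 ratio mentioned above; the price is that the reconstruction only approximates the input.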

G711

G.711 is a voice compression standard defined by the ITU-T (International Telecommunication Union), representing logarithmic PCM (logarithmic pulse-code modulation) of samples; it is mainly used in telephony. It operates on PCM audio samples at a sampling rate of 8 kHz and uses a 64 kbit/s channel to transmit the voice signal. Its compression ratio is 2:1, i.e. 16-bit data is compressed into 8 bits. G.711 is the mainstream waveform audio codec.

The G.711 standard defines two main compression algorithms. One is the μ-law algorithm (also written ulaw, u-law, mu-law), used mainly in North America and Japan; the other is the A-law algorithm, used mainly in Europe and the rest of the world. The latter was specifically designed to be easy to process on computers.

What G.711 does is encode 14-bit (μ-law) or 13-bit (A-law) PCM samples into an 8-bit data stream, and expand the 8-bit data back to 14-bit or 13-bit at playback time. Unlike MPEG, which considers whole sections of data when encoding and decoding, G.711 is a waveform codec: one sample corresponds to one code, so the compression ratio is fixed:

  • 8/14 ≈ 57% (μ-law)
  • 8/13 ≈ 62% (A-law)

Simply put, G.711 is nonlinear quantization of the analog voice signal, with a bitrate of 64 kbit/s.
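The segment-based A-law companding described above is small enough to show in full. The sketch below follows the classic public-domain g711.c reference implementation (originally from Sun Microsystems); 0xD5 for digital silence and the small round-trip error are properties of the real codec:

```c
/* G.711 A-law, after the classic public-domain g711.c reference code.
 * Encodes a 16-bit sample (top 13 bits used) into one A-law byte. */
static const short seg_end[8] = {
    0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF
};

static int search(int val, const short *table, int size) {
    for (int i = 0; i < size; i++)
        if (val <= table[i]) return i;
    return size;
}

unsigned char linear2alaw(int pcm_val) {
    int mask, seg;
    unsigned char aval;

    pcm_val = pcm_val >> 3;                 /* 16-bit down to 13-bit */
    if (pcm_val >= 0) {
        mask = 0xD5;                        /* sign bit + even-bit inversion */
    } else {
        mask = 0x55;
        pcm_val = -pcm_val - 1;
    }
    seg = search(pcm_val, seg_end, 8);      /* which logarithmic segment */
    if (seg >= 8)                           /* out of range: clip */
        return (unsigned char)(0x7F ^ mask);
    aval = (unsigned char)(seg << 4);
    if (seg < 2) aval |= (pcm_val >> 1) & 0x0F;
    else         aval |= (pcm_val >> seg) & 0x0F;
    return aval ^ mask;
}

int alaw2linear(unsigned char a_val) {
    int t, seg;

    a_val ^= 0x55;                          /* undo even-bit inversion */
    t = (a_val & 0x0F) << 4;
    seg = (a_val & 0x70) >> 4;
    switch (seg) {
    case 0:  t += 8; break;
    case 1:  t += 0x108; break;
    default: t += 0x108; t <<= seg - 1;
    }
    return (a_val & 0x80) ? t : -t;
}
```

Encoding digital silence gives 0xD5, and a round trip such as alaw2linear(linear2alaw(1000)) returns 1008, within one quantization step of the original, which shows the lossy but bounded nature of the companding.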

G726

G.726 is an audio coding algorithm defined by ITU-T, published in 1990 by the CCITT (the ITU's predecessor) on the basis of the G.721 and G.723 standards. G.726 can convert a 64 kbit/s PCM signal into a 40 kbit/s, 32 kbit/s, 24 kbit/s, or 16 kbit/s ADPCM signal.

The most common mode is 32 kbit/s; since this is only half the rate of G.711, it doubles the usable capacity of the network. G.726 specifies how a 64 kbit/s A-law or μ-law PCM signal is converted to a 40, 32, 24, or 16 kbit/s ADPCM channel. Among these, the 24 and 16 kbit/s channels are used for voice transmission in Digital Circuit Multiplication Equipment (DCME), while the 40 kbit/s channel is used for data modem signals in DCME (especially modems operating at 4800 bit/s and above).
In fact, the input to a G.726 encoder is usually the output of a G.711 encoder: 64 kbit/s A-law or μ-law. The G.726 algorithm is essentially ADPCM with adaptive quantization, compressing 64 kbit/s down to 32 kbit/s.

AAC

AAC stands for Advanced Audio Coding, a compression format designed specifically for audio data. Unlike MP3, it uses a newer coding algorithm that is more efficient and offers better value: with AAC, files can be made noticeably smaller without a perceptible loss of sound quality. Apple's iPod and Nokia phones, among others, support AAC audio files.

  • Advantages: compared with MP3, AAC offers better quality at a smaller file size.
  • Shortcomings: AAC is a lossy compression format, so its sound quality is inherently inferior to popular lossless formats such as APE and FLAC. In addition, with USB 3.0 transfer speeds and the growing popularity of large-capacity (16 GB+) MP3 players, AAC's "small size" halo has faded.

AAC is a next-generation lossy audio compression technique. Through additional coding tools (such as PS and SBR), it has derived three major profiles: LC-AAC, HE-AAC, and HE-AACv2. LC-AAC is the more traditional AAC, used mainly at high bitrates (>= 80 kbit/s); HE-AAC (equivalent to AAC + SBR) is used mainly at low bitrates (<= 80 kbit/s); and the newer HE-AACv2 (equivalent to AAC + SBR + PS) is used mainly at very low bitrates (<= 48 kbit/s). In fact, most encoders automatically enable PS at <= 48 kbit/s, while above 48 kbit/s PS brings no improvement and the result is equivalent to ordinary HE-AAC.


A note on audio channel binding: the AI and AO modules can be bound to the AENC module; this binding is independent of the video channel bindings, and neither affects the other. In other words, AI binding has no relationship with VPSS.


The audio test project for this chapter can be obtained from the address provided in the column's "Table of Contents and Preface".


The first article in this column, "Table of Contents and Preface", lists the column's complete table of contents; reading the articles in that order will aid your understanding.


Origin blog.csdn.net/li_wen01/article/details/105064076