A detailed look at the ANS speech noise reduction module in WebRTC (1)

ANS (adaptive noise suppression) is one of the core audio modules in WebRTC and is used by many companies. Since 2015 I have used WebRTC's 3A modules (AEC/ANS/AGC) in several products. Until recently I only used them and had just a preliminary understanding of the algorithms behind them. Over the past six months I have used my spare time to read two books, "Speech Enhancement: Theory and Practice" and "Real-time Speech Processing Practice Guide", and, combined with debugging, I have basically mastered the algorithm implementation. I want to write down my understanding of ANS. Since there are many details to cover, this will be a series. The ANS in WebRTC reduces noise using Wiener filtering, so this first article covers the basic principles of the Wiener filter.

As shown in Figure 1, the input signal y(n) passes through a filter to produce the output signal x(n), which we want to be as close as possible to the desired signal d(n). This is achieved by computing the estimation error e(n) and minimizing it. The optimal filter that minimizes this estimation error is called the Wiener filter.

Usually the Wiener filter is chosen to be a linear FIR filter, because an FIR filter is always stable and linearity keeps the computation simple. The filter output x(n) can then be written as Equation 1:

$$x(n) = \sum_{k=0}^{M-1} h(k)\, y(n-k) \tag{1}$$

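Here h(k) are the filter coefficients and M is the number of coefficients (taps), i.e. the filter length. x(n) can be rewritten in vector form as Equation 2:

$$x(n) = \mathbf{h}^T \mathbf{y}(n) \tag{2}$$

where $\mathbf{h}$ is the filter coefficient vector with M rows and 1 column, and $\mathbf{y}(n)$ is the input vector with M rows and 1 column that holds the most recent M samples. They are written as follows:

$$\mathbf{h} = [h(0), h(1), \ldots, h(M-1)]^T, \qquad \mathbf{y}(n) = [y(n), y(n-1), \ldots, y(n-M+1)]^T$$

So $\mathbf{h}^T \mathbf{y}(n)$ is a real scalar.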
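As a minimal sketch of Equations 1 and 2, the filter output at one time index is just an inner product between the coefficients and the most recent input samples. The function name `fir_output`, the coefficient values, and the choice to treat samples before n = 0 as zero are assumptions made for this illustration only.

```python
def fir_output(h, y, n):
    """Return x(n) = sum_{k=0}^{M-1} h(k) * y(n - k).

    Samples before n = 0 are treated as zero (an assumption of this sketch).
    """
    return sum(h[k] * y[n - k] for k in range(len(h)) if n - k >= 0)


h = [0.5, 0.25]        # illustrative coefficients, M = 2
y = [1.0, 2.0, 3.0]    # illustrative input samples y(0), y(1), y(2)

x2 = fir_output(h, y, 2)   # 0.5 * y(2) + 0.25 * y(1)
print(x2)
```

Here `x2` is 0.5 * 3.0 + 0.25 * 2.0 = 2.0.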

The estimation error e(n) can be expressed as Equation 3:

$$e(n) = d(n) - x(n) = d(n) - \mathbf{h}^T \mathbf{y}(n) \tag{3}$$

To find the optimal filter coefficients, we take the statistical mean square value of the estimation error, which is Equation 4:

$$J = E\left[e^2(n)\right] \tag{4}$$

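where E[·] denotes expectation. Because

$$e(n) = d(n) - \mathbf{h}^T \mathbf{y}(n)$$

we have

$$J = E\left[\left(d(n) - \mathbf{h}^T \mathbf{y}(n)\right)^2\right]$$

Setting the gradient of J with respect to $\mathbf{h}$ to zero,

$$\frac{\partial J}{\partial \mathbf{h}} = -2\, E\left[\mathbf{y}(n)\left(d(n) - \mathbf{h}^T \mathbf{y}(n)\right)\right] = \mathbf{0}$$

gives Equation 5:

$$E\left[\mathbf{y}(n)\, \mathbf{y}^T(n)\right] \mathbf{h} = E\left[\mathbf{y}(n)\, d(n)\right] \tag{5}$$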

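Expanding the two expectations gives the following series of formulas. Because

$$\mathbf{y}(n)\,\mathbf{y}^T(n) = \begin{bmatrix} y(n)y(n) & y(n)y(n-1) & \cdots & y(n)y(n-M+1) \\ y(n-1)y(n) & y(n-1)y(n-1) & \cdots & y(n-1)y(n-M+1) \\ \vdots & \vdots & \ddots & \vdots \\ y(n-M+1)y(n) & y(n-M+1)y(n-1) & \cdots & y(n-M+1)y(n-M+1) \end{bmatrix}$$

the element in row i, column j of $E[\mathbf{y}(n)\,\mathbf{y}^T(n)]$ is $E[y(n-i)\,y(n-j)]$, with i, j = 0, 1, ..., M-1.

Define $R_{yy}(m) = E[y(n)\,y(n-m)]$ as the autocorrelation between two input values, where m is the lag, i.e. the difference between their sample indices. Assuming the input is wide-sense stationary, $E[y(n-i)\,y(n-j)] = R_{yy}(i-j)$ and $R_{yy}(-m) = R_{yy}(m)$, so:

$$E\left[\mathbf{y}(n)\,\mathbf{y}^T(n)\right] = \begin{bmatrix} R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(M-1) \\ R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_{yy}(M-1) & R_{yy}(M-2) & \cdots & R_{yy}(0) \end{bmatrix}$$

Similarly, define $r_{yd}(m) = E[y(n-m)\,d(n)]$ as the cross-correlation between the input and the desired output, where m is again the lag. So:

$$E\left[\mathbf{y}(n)\,d(n)\right] = [r_{yd}(0), r_{yd}(1), \ldots, r_{yd}(M-1)]^T$$

Equation 5 can therefore be rewritten as Equation 6:

$$\begin{bmatrix} R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(M-1) \\ R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_{yy}(M-1) & R_{yy}(M-2) & \cdots & R_{yy}(0) \end{bmatrix} \begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(M-1) \end{bmatrix} = \begin{bmatrix} r_{yd}(0) \\ r_{yd}(1) \\ \vdots \\ r_{yd}(M-1) \end{bmatrix} \tag{6}$$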

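Writing out row m of this matrix equation in terms of the input autocorrelation $R_{yy}$ and the cross-correlation $r_{yd}$, and using $R_{yy}(-m) = R_{yy}(m)$, gives Equation 7:

$$h(0)R_{yy}(m) + h(1)R_{yy}(m-1) + \cdots + h(M-1)R_{yy}(m-M+1) = r_{yd}(m), \quad m = 0, 1, \ldots, M-1 \tag{7}$$

Rewriting it as a sum gives Equation 8:

$$\sum_{k=0}^{M-1} h(k)\, R_{yy}(m-k) = r_{yd}(m), \quad m = 0, 1, \ldots, M-1 \tag{8}$$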
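To make Equation 8 concrete, the sketch below estimates the autocorrelation $R_{yy}(m) \approx \mathrm{mean}[y(n)\,y(n-m)]$ and the cross-correlation $r_{yd}(m) \approx \mathrm{mean}[y(n-m)\,d(n)]$ from samples and then solves the resulting M linear equations for h. It is only an illustration under stated assumptions: the helper names `solve_linear` and `wiener_fir`, the synthetic white-noise input, and the "true" coefficients 0.8 and -0.3 are all made up for the example, and plain Gaussian elimination is used so that the code needs nothing beyond the standard library.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def wiener_fir(y, d, M):
    """Estimate an M-tap Wiener filter from samples of y(n) and d(n)."""
    N = len(y)
    count = N - (M - 1)
    # Sample estimates of R_yy(m) and r_yd(m) for lags m = 0 .. M-1.
    R = [sum(y[n] * y[n - m] for n in range(M - 1, N)) / count for m in range(M)]
    r = [sum(y[n - m] * d[n] for n in range(M - 1, N)) / count for m in range(M)]
    # Symmetric Toeplitz matrix with entries R_yy(|i - j|).
    A = [[R[abs(i - j)] for j in range(M)] for i in range(M)]
    return solve_linear(A, r)


random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(20000)]   # synthetic input
true_h = [0.8, -0.3]                                  # hypothetical system
d = [0.0] + [true_h[0] * y[n] + true_h[1] * y[n - 1] for n in range(1, len(y))]

h_est = wiener_fir(y, d, M=2)
print(h_est)   # should be close to true_h
```

Because d(n) here is an exact linear function of y(n) and y(n-1), the estimated coefficients land very close to the hypothetical ones. For larger M, a library routine such as `numpy.linalg.solve` or `scipy.linalg.solve_toeplitz` would be the more usual choice.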

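Equation 8 describes a finite impulse response (FIR) filter. Now consider a two-sided infinite impulse response filter, in the form of Equation 9:

$$x(n) = \sum_{k=-\infty}^{\infty} h(k)\, y(n-k) \tag{9}$$

Then Equation 8 can be written as Equation 10:

$$\sum_{k=-\infty}^{\infty} h(k)\, R_{yy}(m-k) = r_{yd}(m), \quad -\infty < m < \infty \tag{10}$$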

Written as a convolution, this gives Equation 11:

$$h(m) * R_{yy}(m) = r_{yd}(m) \tag{11}$$

Taking the Fourier transform of both sides turns the convolution in the time domain into a product in the frequency domain, which gives Equation 12:

$$H(\omega)\, P_{yy}(\omega) = P_{yd}(\omega) \tag{12}$$

where $P_{yy}(\omega)$ is the auto-power spectrum of the input, which equals the Fourier transform of the autocorrelation $R_{yy}(m)$, and $P_{yd}(\omega)$ is the cross-power spectrum of the input and the desired output, which equals the Fourier transform of the cross-correlation $r_{yd}(m)$. This gives Equation 13:

$$H(\omega) = \frac{P_{yd}(\omega)}{P_{yy}(\omega)} \tag{13}$$

The above formula is the general form of the frequency domain Wiener filter.
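The step from Equation 11 to Equation 12 relies on the convolution theorem. A small numerical check of its finite-length analogue (circular convolution and the DFT) may help make that step tangible. This is only a sketch: the helper names `dft` and `circular_convolve` and the two short sequences are arbitrary values chosen for illustration, not quantities from ANS.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    N = len(a)
    return [sum(a[k] * b[(m - k) % N] for k in range(N)) for m in range(N)]


h = [0.5, 0.3, 0.0, 0.0]   # stand-in for filter coefficients
R = [1.0, 0.4, 0.1, 0.4]   # stand-in for a symmetric autocorrelation sequence

lhs = dft(circular_convolve(h, R))                 # transform of the convolution
rhs = [H * P for H, P in zip(dft(h), dft(R))]      # product of the transforms
max_diff = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_diff)   # differs from zero only by rounding error
```

The two sides agree up to floating-point rounding, which is the discrete counterpart of dividing out $P_{yy}(\omega)$ to isolate $H(\omega)$.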

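When the Wiener filter is used for speech noise reduction, y(n) in Figure 1 is the noisy speech signal. From here on, x(n) denotes the clean speech signal, which plays the role of the desired signal d(n), and the filter output is an estimate of x(n). Let n(n) denote the noise signal. If only additive noise is considered, the noisy speech, clean speech, and noise are related by y(n) = x(n) + n(n). After the Fourier transform this becomes:

$$Y(\omega) = X(\omega) + N(\omega)$$

Assuming that the noise is uncorrelated with the speech and has zero mean, then

$$P_{yy}(\omega) = P_{xx}(\omega) + P_{nn}(\omega), \qquad P_{yd}(\omega) = P_{xx}(\omega)$$

where $P_{xx}(\omega)$ is the auto-power spectrum of the clean speech and $P_{nn}(\omega)$ is the auto-power spectrum of the noise. Substituting $P_{yy}(\omega)$ and $P_{yd}(\omega)$ into Equation 13 gives Equation 14:

$$H(\omega) = \frac{P_{xx}(\omega)}{P_{xx}(\omega) + P_{nn}(\omega)} \tag{14}$$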

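Define $\xi(\omega) = P_{xx}(\omega) / P_{nn}(\omega)$ as the prior signal-to-noise ratio (prior SNR), the power ratio of clean speech to noise. The posterior signal-to-noise ratio (post SNR), $\gamma(\omega) = P_{yy}(\omega) / P_{nn}(\omega)$, is the power ratio of noisy speech to noise. Dividing the numerator and denominator of Equation 14 by $P_{nn}(\omega)$ then gives Equation 15:

$$H(\omega) = \frac{\xi(\omega)}{1 + \xi(\omega)} \tag{15}$$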
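A short numerical sketch of Equation 15 shows how the gain behaves as the prior SNR changes. The function names and the SNR values below are illustrative assumptions and are not taken from the WebRTC source. The relation between prior and posterior SNR used in `prior_from_posterior` follows from the additive, uncorrelated-noise model (noisy power = speech power + noise power); the floor at zero is an extra guard added for this sketch.

```python
def wiener_gain(prior_snr):
    """Wiener gain H = xi / (1 + xi) for a linear (not dB) prior SNR xi >= 0."""
    return prior_snr / (1.0 + prior_snr)


def prior_from_posterior(post_snr):
    """xi = gamma - 1 under the additive, uncorrelated-noise model.

    The floor at 0 is an assumption of this sketch to avoid negative SNR.
    """
    return max(post_snr - 1.0, 0.0)


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


# Gain for a few illustrative prior SNRs in dB.
snrs_db = (-10.0, 0.0, 10.0, 20.0)
gains = [wiener_gain(db_to_linear(s)) for s in snrs_db]
for s, g in zip(snrs_db, gains):
    print(f"prior SNR {s:6.1f} dB -> gain {g:.3f}")
```

The gain approaches 1 in frequency bins where speech dominates and approaches 0 where noise dominates, so each bin of the noisy spectrum is attenuated according to how noisy it is.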

Equation 15 is the general form of the Wiener filter expressed in terms of the prior signal-to-noise ratio. The ANS in WebRTC performs speech noise reduction based on this expression. The next article will cover the processing flow of ANS and some details of converting speech signals between the time domain and the frequency domain.

Origin blog.csdn.net/david_tym/article/details/120576289