WebRTC audio algorithm

WebRTC audio algorithm with complete C code

WebRTC provides a set of audio processing engines

that includes the following algorithms:

AGC: Automatic Gain Control

ANS: Automatic Noise Suppression

AECM: acoustic echo cancellation (Acoustic Echo Canceller for Mobile)

VAD: Voice Activity Detection

This is a classic set of audio algorithms, well worth reading and learning from.

My earlier blog posts also covered some audio-related topics.

As for algorithm optimization: for historical reasons,

some of WebRTC's implementations are no longer the best approach today.

They are still classics, though.

For example:

The fast square root WebRtcSpl_Sqrt used by the AGC algorithm

can be replaced by the following inline-assembly function:

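#include <math.h>  /* sqrtf, used by the portable fallback */

/* Square root using the CPU's native instruction where one is available. */
static float fast_sqrt(float x) {
    float s;
#if defined(__x86_64__)
    /* SSE scalar single-precision square root */
    __asm__ __volatile__ ("sqrtss %1, %0" : "=x"(s) : "x"(x));
#elif defined(__i386__)
    /* x87 FPU: replaces the value on top of the FPU stack with its square root */
    s = x;
    __asm__ __volatile__ ("fsqrt" : "+t"(s));
#elif defined(__arm__) && defined(__VFP_FP__)
    /* ARM VFP single-precision square root */
    __asm__ __volatile__ ("vsqrt.f32 %0, %1" : "=w"(s) : "w"(x));
#else
    /* Portable fallback: the C library implementation */
    s = sqrtf(x);
#endif
    return s;
}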

Many modern CPUs already provide a dedicated instruction for a fast square root.

In my tests and comparisons, it was indeed much faster than WebRtcSpl_Sqrt.

For more on fast square root implementations, see:

Best Square Root Method - Algorithm - Function (Precision VS Speed) - CodeProject

If you are working on algorithm optimization, you can stop fussing over the square root.

Every algorithm has two basic metrics:

performance (speed) and quality of the result.

WebRTC focuses on real-time audio communication, so its performance requirements are extremely high.

In most cases, optimizing an algorithm's performance comes down to handling a specific case.

I made the same point in a tech talk I gave at my company.

In short, the closer the data is to the CPU, the faster it can be processed.

That is, unless it is really necessary, do not write data to the hard disk and then read it back.

The hard disk is simply too far away from the CPU.

So the direction for optimization is clear.

The storage levels, from fastest to slowest, are:

CPU registers -> CPU cache -> memory -> hard disk

So use the upper levels as much as possible, and use registers whenever you can.
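As a small illustration of favoring registers, the sketch below computes the energy of an audio frame in two ways. The function names are hypothetical and not taken from WebRTC. Whether the first version really costs a load and store per iteration depends on the compiler and its aliasing analysis, so treat it as a pattern to be aware of rather than a guaranteed slowdown.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulates through the output pointer on every iteration.
 * Depending on the compiler, the running total may be written back
 * to memory each time around the loop. */
static void frame_energy_via_pointer(const int16_t *samples, size_t count,
                                     int64_t *energy) {
    *energy = 0;
    for (size_t i = 0; i < count; ++i) {
        *energy += (int32_t)samples[i] * samples[i];
    }
}

/* Accumulates in a local variable, which the compiler can easily keep
 * in a register, and hands the result back once at the end. */
static int64_t frame_energy_local(const int16_t *samples, size_t count) {
    int64_t energy = 0;
    for (size_t i = 0; i < count; ++i) {
        energy += (int32_t)samples[i] * samples[i];
    }
    return energy;
}
```

Both versions produce the same value; the second simply gives the compiler an easier job.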

To stay within those fast CPU resources, the algorithm's data structures and working data need to be as compact as possible.
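One simple way to make a data structure more compact is to order its fields from largest to smallest so the compiler inserts less padding. The two structs below are hypothetical examples, not types from WebRTC, and the exact sizes depend on the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields in an arbitrary order: padding is typically inserted after
 * each one-byte field so that the next field is properly aligned. */
typedef struct {
    uint8_t is_speech;
    double  level_db;
    uint8_t channel;
    int32_t sample_rate_hz;
} FrameInfoLoose;

/* Same fields ordered from largest to smallest alignment: the small
 * fields share what would otherwise be padding bytes. */
typedef struct {
    double  level_db;
    int32_t sample_rate_hz;
    uint8_t is_speech;
    uint8_t channel;
} FrameInfoCompact;

/* Holds on every ABI this sketch was written with in mind; the compact
 * layout should never be larger than the loose one. */
_Static_assert(sizeof(FrameInfoCompact) <= sizeof(FrameInfoLoose),
               "reordering fields should not grow the struct");
```

On a typical 64-bit ABI the first layout usually occupies 24 bytes and the second 16, so more elements of an array of these structs fit in each cache line.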

A related resource for CPU information:

CPU-Z | Softwares | CPUID

You can use CPU-Z to inspect your own CPU.

To get to the bottom of things, we have to understand the CPU's structure and performance characteristics,

and then apply the right remedy, catering to the CPU's preferences as much as possible.

A quick primer on algorithm optimization ideas:

1. Use local variables as much as possible and write short, self-contained functions.

That way the compiler has the best chance of keeping the values in registers.

2. Make as few function calls as possible, and prefer passing parameters by pointer or reference to reduce copying.

Ideally, keep the number of parameters small as well.

3. Keep the data being processed as compact and small as possible.

Data alignment exists largely to suit the CPU and make good use of its cache.

4. Read and write sequentially whenever possible, to make good use of the cache.

5. Strength reduction. In general, multiplication costs more than addition, and division costs more than multiplication.

Floating point is often more expensive than integer arithmetic, especially on processors without a hardware floating-point unit.

Therefore, reduce multiplication to addition, division to multiplication, and floating point to integer (fixed-point).

If you are not sure why, take a look at this resource:

GitHub - ARM-software/CMSIS_5: CMSIS Version 5 Development Repository

Read some of the implementations there and you will see why.

I won't go into detail here.

6. If you can use memory, don't use the disk. This needs no further explanation.

7. Of course, you can also optimize the data structures around the specific algorithm, for example with lookup tables.
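Point 5 above can be sketched with a fixed-point gain stage. This is a minimal illustration under stated assumptions, not code from WebRTC: the names apply_gain_q15 and saturate_to_int16 are hypothetical, the gain is stored in Q15 format (the real gain multiplied by 32768), and the right shift of a negative value is assumed to be arithmetic, which is the case on the common compilers and targets but is implementation-defined in C.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate value to the 16-bit sample range. */
static int16_t saturate_to_int16(int32_t value) {
    if (value > INT16_MAX) {
        return INT16_MAX;
    }
    if (value < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)value;
}

/* Apply a gain in the range [0, 2) to one 16-bit sample.
 * gain_q15 is the real gain multiplied by 32768, so 0.5 is 16384.
 * The floating-point multiply becomes an integer multiply, and the
 * division by 32768 becomes a shift by 15 bits.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t apply_gain_q15(int16_t sample, uint16_t gain_q15) {
    int32_t product = (int32_t)sample * (int32_t)gain_q15; /* fits in 32 bits */
    return saturate_to_int16(product >> 15);
}
```

For example, apply_gain_q15(1000, 16384) halves the sample to 500 without touching the floating-point unit, and a gain close to 2 on a full-scale sample is clamped instead of wrapping around.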

That was a bit of a digression, so back to the topic.

I took the time to reorganize the algorithms mentioned above into

single-file implementations, with sample code attached.

This makes them convenient for learning or for engineering use.

Related project address:

GitHub - cpuimage/WebRTC_AECM: Acoustic Echo Canceller for Mobile Module Port From WebRTC

GitHub - cpuimage/WebRTC_NS: Noise Suppression Module Port From WebRTC

https://github.com/cpuimage/WebRTC_VAD

GitHub - cpuimage/WebRTC_AGC: Automatic Gain Control Module Port From WebRTC

The road ahead is long, and I intend to follow it all the way to the end.

The sample code can be built with CMake; see CMakeLists.txt for details.

Origin blog.csdn.net/qq_39436605/article/details/132077837