WebRTC audio algorithms with complete C code

WebRTC provides a set of audio processing engines that contain the following algorithms:

AGC: Automatic Gain Control

ANS: Automatic Noise Suppression

AECM: Acoustic Echo Canceller for Mobile

VAD: Voice Activity Detection

This is a classic set of audio algorithm resources, well worth careful reading and study.

My earlier blog posts also covered some audio-related topics.

For historical reasons, some of the optimization techniques in WebRTC's implementation are no longer the best approach today,

but they are still classics.

For example, WebRtcSpl_Sqrt, the fast square-root routine used by the AGC algorithm,

can be replaced with the following inline-assembly function:

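#include <math.h>  /* sqrtf, used by the portable fallback */

/* Square root using the CPU's native instruction where available. */
static float fast_sqrt(float x) {
    float s;
#if defined(__x86_64__)
    /* SSE scalar single-precision square root */
    __asm__ __volatile__ ("sqrtss %1, %0" : "=x"(s) : "x"(x));
#elif defined(__i386__)
    /* x87 FPU square root on the top of the register stack */
    s = x;
    __asm__ __volatile__ ("fsqrt" : "+t"(s));
#elif defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    /* 32-bit ARM VFP square root (needs hardware FP, not soft-float) */
    __asm__ __volatile__ ("vsqrt.f32 %0, %1" : "=w"(s) : "w"(x));
#else
    /* Portable fallback */
    s = sqrtf(x);
#endif
    return s;
}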

Most modern CPUs provide a hardware square-root instruction,

and in my tests it was indeed much faster than WebRtcSpl_Sqrt.

For more on fast square-root implementations, see:

https://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

If you work on algorithm optimization, you can stop worrying about square roots: the hardware already handles them well.
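To see what the hardware instruction saves you, here is a classic software method for comparison: the digit-by-digit (binary) integer square root, which runs a loop of up to 16 iterations of shifts, adds and compares. This is a minimal sketch of that textbook technique, not WebRTC's WebRtcSpl_Sqrt algorithm, and `isqrt_u32` is a hypothetical name.

```c
#include <stdint.h>

/* Digit-by-digit (binary) integer square root: returns floor(sqrt(x))
   using only shifts, additions, subtractions and comparisons. */
static uint32_t isqrt_u32(uint32_t x) {
    uint32_t result = 0;
    uint32_t bit = 1u << 30;  /* highest power of four that fits in 32 bits */

    while (bit > x)           /* skip leading all-zero bit pairs */
        bit >>= 2;

    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;            /* move to the next bit pair */
    }
    return result;
}
```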

Every algorithm is judged on two basic metrics:

performance (speed) and effect (output quality).

WebRTC is built for real-time audio communication, so its performance requirements are extremely high.

In most cases, the ideas behind optimizing an algorithm's performance come down to the same principle.

I once presented this at a tech-sharing session at my company.

In short: the closer the data is to the CPU, the faster the code runs.

That is, unless absolutely necessary, do not write data to the hard disk only to read it back,

because the hard disk is far too far from the CPU.

So the optimization strategy is obvious.

From fastest to slowest, the storage levels are:

CPU registers -> CPU cache -> main memory -> hard disk

Therefore, use the faster levels as much as possible: if something can live in a register, keep it in a register.

To make full use of the CPU's resources, the algorithm's data structures and working data need to be compact.
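As a small illustration of compact data structures, field order alone changes how much padding the compiler inserts. The struct and field names below are hypothetical, chosen only to resemble per-frame audio state:

```c
#include <stdint.h>

/* Fields in "arbitrary" order: the compiler pads after each small field
   so that the next, larger field is aligned. */
struct frame_state_loose {
    int8_t  vad_flag;
    int32_t energy;
    int8_t  saturated;
    int64_t timestamp;
    int16_t gain_q8;
};

/* Same fields ordered from largest to smallest: almost no padding,
   so more instances fit in each cache line. */
struct frame_state_compact {
    int64_t timestamp;
    int32_t energy;
    int16_t gain_q8;
    int8_t  vad_flag;
    int8_t  saturated;
};
```

On typical 64-bit ABIs (x86-64 System V, AArch64), the first struct occupies 32 bytes and the second 16, so twice as many instances fit in a 64-byte cache line.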

You can inspect your CPU's details, such as cache sizes and supported instruction sets, with CPU-Z:

https://www.cpuid.com/softwares/cpu-z.html

To get to the bottom of performance, you must understand the CPU's architecture and performance characteristics,

and then prescribe the right medicine, tailoring the code to the CPU's preferences as much as possible.

 

A quick primer on algorithm optimization ideas:

1. Prefer local variables, and write short, efficient, self-contained functions,

so that the compiler can keep values cached in registers.

2. Make as few function calls as possible, and pass parameters by pointer or reference to avoid copying.

Keep the number of parameters small where possible, too.

3. Keep the data being processed as compact and small as possible. Data alignment exists largely

to suit the CPU's preferences and make good use of its cache.

4. Read and write memory sequentially whenever possible, again to make good use of the cache.

5. Strength reduction: in general, multiplication costs more than addition, division costs more than multiplication,

and floating point costs more than integer arithmetic.

So reduce multiplication to addition, division to multiplication, and floating point to integer (fixed-point) arithmetic.

If you are not sure why, have a look at:

https://github.com/ARM-software/CMSIS_5

Reading some of the implementations there will show you why.

I won't go into detail here.

6. If memory will do, don't use the disk. I don't think this needs further explanation.

7. Of course, you can also optimize data access with algorithm-specific ideas, such as table lookup.
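Tips 1, 4 and 5 can be combined in one small sketch: applying a gain and measuring frame energy on 16-bit samples in Q15 fixed point (a format CMSIS-DSP also supports), with sequential loops and a local accumulator. The function names are hypothetical, and this illustrates the techniques rather than reproducing code from WebRTC.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the int16_t sample range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Apply a gain in Q15 (gain = gain_q15 / 32768): integer multiply
   instead of float (tip 5), one sequential pass over contiguous
   arrays (tip 4). */
static void apply_gain_q15(const int16_t *in, int16_t *out, size_t n,
                           int16_t gain_q15) {
    for (size_t i = 0; i < n; ++i) {
        int32_t prod = (int32_t)in[i] * gain_q15;
        /* Shift by 15 instead of dividing by 32768. Right-shifting a
           negative value is implementation-defined in C; mainstream
           compilers use an arithmetic shift, which rounds toward
           negative infinity rather than toward zero. */
        out[i] = sat16(prod >> 15);
    }
}

/* Frame energy with a local accumulator (tip 1): the compiler can keep
   acc in a register instead of writing through a pointer each pass. */
static int64_t frame_energy(const int16_t *x, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)x[i] * x[i];
    return acc;
}
```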

 

That was a bit of a digression; back to the topic.

I took the time to organize the algorithms mentioned above into

single-file implementations, each with sample code attached,

convenient for learning or for engineering use.

 

Project repositories:

https://github.com/cpuimage/WebRTC_AECM

https://github.com/cpuimage/WebRTC_NS

https://github.com/cpuimage/WebRTC_VAD

https://github.com/cpuimage/WebRTC_AGC

The road ahead is long, and I intend to follow it to the end.

The sample code can be built with CMake; see CMakeLists.txt for details.

 

If you have any other questions or needs, please contact me by email.

The email address is: 
[email protected]
