Android Audio Subsystem (14) ------ Headphone Noise Problem Analysis

Hello! This is Kite's blog.

Feel free to get in touch with me.


Background:
[Preconditions] OPPO analog wired headset
[Operation steps] Open National K Song (Quanmin K Ge) and play any song
[Actual result] There is a buzzing noise in the headset
[Expected result] There is no buzzing noise in the headset
[Occurrences/Recurrences] Occasional user feedback

[Preconditions] None
[Operation steps] Open QQ/WeChat, plug in a digital headset, enter a one-on-one chat, start a voice call, and listen to the ringback tone
[Actual result] The QQ/WeChat ringback tone on the digital headset has a severe pop noise
[Expected result] No such phenomenon
[Occurrences/Recurrences] 5/5

Recently there have been a lot of noise problems with digital and analog headsets. The project is about to launch, the problems are serious, and the pressure is huge...

The first noise problem occurs in the downlink of the analog headset. For noise problems in general, the first step is to analyze the dumps to confirm at which stage the noise is introduced.
Opening af_mixer_write.wav shows no noise.
Opening streamout.pcm.xx.AudioALSAPlaybackHandlerNormal.wav shows that the analog headset data is normal as well; no noise was found in this dump either.
The analog headset is driven from the PMIC, using the codec inside the MTK platform, and streamout is the HAL's data. If the HAL data is fine, the problem most likely lies in the kernel, that is, on the codec side.
Based on past experience, I suspected it was performance related. But the problem is intermittent and comes from user feedback, so for now there was no good way to debug it.

So I could only focus on the digital headset noise problem first.
Because it was reported as reproducing every time, I tested locally and found that it could indeed be reproduced, and that it always occurs with one specific headset. That looks like a problem with the headset itself, but the same headset does not show the problem on other phones.

But that is not a big deal: as long as I can reproduce it, there are many ways to debug it.

This digital headset problem is very similar to one I ran into before. That kind of situation is basically caused by a timeout when writing data to the USB digital headset. Check the log:

AudioALSAPlaybackHandlerUsb: latency_in_s,0.000142,0.000000,0.010944,0.000001,0.000000, totalTime 0.011088 > logTimeout 0.007000 TIMEOUT!! queue 512
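
To make this kind of line easier to scan in a long log, here is a minimal Python sketch that pulls the numbers out of it. The meaning of the five comma-separated values after latency_in_s is not stated here, so treating them as per-stage durations in seconds is an assumption; parse_timeout and the field names in the returned dict are hypothetical helpers, not part of the HAL.

```python
import re

# The log line quoted above, copied verbatim.
LOG_LINE = ("AudioALSAPlaybackHandlerUsb: latency_in_s,0.000142,0.000000,"
            "0.010944,0.000001,0.000000, totalTime 0.011088 > logTimeout "
            "0.007000 TIMEOUT!! queue 512")

PATTERN = re.compile(
    r"latency_in_s,(?P<stages>[\d.,]+?),\s*totalTime\s+(?P<total>[\d.]+)"
    r"\s*>\s*logTimeout\s+(?P<limit>[\d.]+)\s+TIMEOUT!!\s+queue\s+(?P<queue>\d+)"
)


def parse_timeout(line):
    """Return the numbers from a TIMEOUT!! line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Assumption: each value is the time spent in one stage of the write, in seconds.
    stages = [float(x) for x in m.group("stages").split(",")]
    total = float(m.group("total"))
    limit = float(m.group("limit"))
    return {
        "stages_s": stages,
        "total_s": total,
        "limit_s": limit,
        "over_by_s": total - limit,
        "slowest_stage": stages.index(max(stages)),  # zero-based index
        "queue": int(m.group("queue")),
    }


result = parse_timeout(LOG_LINE)
print(result)
```

For the line above, almost all of totalTime sits in the third value (0.010944), which is where I would look first.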

You can see why I hate this kind of timeout log so much.

Generally, this situation can be solved by increasing the buffer size. In prepareUsb, change
period size 256, period_count 2
to
period size 768, period_count 4.
After this change, my own testing showed that the severe pop in the QQ/WeChat voice call ringback tone on the digital headset was gone. But I was reluctant to change the buffer size, because the change has a side effect: the latency of the digital headset increases!
On a phone, headphone latency is a very important part of the experience, and I do not want to change it casually.
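
To put a number on the latency cost, here is a rough Python calculation of how much audio the whole buffer holds before and after the change. The sample rate is not given here, so 48 kHz is an assumption; buffer_latency_ms is a hypothetical helper.

```python
def buffer_latency_ms(period_size, period_count, sample_rate_hz):
    """Duration of audio held by a full buffer, in milliseconds."""
    return 1000.0 * period_size * period_count / sample_rate_hz


# Assumption: 48 kHz playback; the real rate depends on the USB headset.
ASSUMED_RATE_HZ = 48000

before_ms = buffer_latency_ms(256, 2, ASSUMED_RATE_HZ)
after_ms = buffer_latency_ms(768, 4, ASSUMED_RATE_HZ)

print(f"256 x 2: {before_ms:.2f} ms")
print(f"768 x 4: {after_ms:.2f} ms")
```

Under that assumption the original buffer holds about 10.67 ms of audio, slightly less than the 0.011088 s totalTime in the log, and the larger buffer holds 64 ms, about six times as much. More slack against late writes, but also more delay.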

But recently the headset problems have been really serious and the project pressure is high, so I could only hand the build over to the test team. The result: the QQ/WeChat problem was fixed, but other scenarios still had noise: the Douyin call ringback tone on the digital headset (you can initiate a call later), Jinri Toutiao video playback, and the downlink of team voice chat in Honor of Kings.

First of all, these problems were not introduced by my buffer size change. The change has a side effect (added latency), but it would never cause these problems. This shows that the noise problem is widespread, and that changes such as increasing the period size cannot solve it completely.

Well, let's analyze it together.

After looking at the logs, I found that these scenarios all use the fast path. I do not understand why something like Douyin needs the fast path, but for a low-latency path like fast, the period size really has to be set much smaller than in the deepbuffer scenario, so xruns happen easily.

So in essence this is still an xrun problem caused by data being written too late. I wanted to see which thread was keeping our audio from writing data in time.
Because the problem is easy to reproduce, I captured a systrace of the scenario directly:
[Figure: systrace capture]
You can see that the write thread and the mix thread are running normally, with no blocking, and the CPU load is not abnormal.

This is very strange. I could not see the cause from the systrace, but I still strongly suspected a performance problem.

In addition, the analog headset also has voice call noise in Honor of Kings. From the log, the call runs Goodix's call algorithm, which takes the downlink data from the kernel as a reference signal (AudioALSACaptureHandlerDsp). Analyzing the dump from Goodix's call algorithm:
[Figure: Goodix call algorithm dump (nxp)]
It shows that the data the algorithm gets from the kernel is faulty: there is zero padding here.

There was no other way; I still had to start from the audio side. To reduce these noise/pop problems, I modified stop_threshold. Normally stop_threshold equals the size of the whole buffer, so the DMA transfer stops when the audio data runs out. I tried setting stop_threshold to a larger value, so that the DMA does not stop and instead keeps transferring the old data, which sounds better than a stop.
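
The effect of stop_threshold can be illustrated with a toy Python model. It is not tinyalsa or kernel code: the hardware plays one period per tick, the list says how many periods the writer delivers in each tick, and the stream stops when the available (free) frames reach stop_threshold, which is the rule ALSA documents for this parameter. The function name, the tick model, and the way a late writer catches up to the hardware pointer are all simplifying assumptions.

```python
def simulate(writes_per_tick, period_size, period_count, stop_threshold):
    """Toy playback model: returns when the stream stopped and how many
    stale (already played) frames the hardware replayed."""
    buffer_size = period_size * period_count
    hw_ptr = 0              # frames consumed by the hardware
    appl_ptr = buffer_size  # frames written by the app; buffer starts full
    stale_frames = 0
    for tick, periods_written in enumerate(writes_per_tick):
        # Hardware plays one period; anything not queued is old data.
        queued = max(0, appl_ptr - hw_ptr)
        stale_frames += max(0, period_size - queued)
        hw_ptr += period_size
        avail = hw_ptr + buffer_size - appl_ptr
        if avail >= stop_threshold:
            # xrun: the stream stops, as with the default threshold.
            return {"stopped_at": tick, "stale_frames": stale_frames}
        if periods_written:
            # Simplification: a late writer resumes at the hardware pointer.
            appl_ptr = max(appl_ptr, hw_ptr) + periods_written * period_size
            appl_ptr = min(appl_ptr, hw_ptr + buffer_size)
    return {"stopped_at": None, "stale_frames": stale_frames}


# The writer stalls for three ticks in the middle.
WRITES = [1, 1, 0, 0, 0, 1, 1]

default = simulate(WRITES, 256, 2, stop_threshold=512)        # buffer size
no_stop = simulate(WRITES, 256, 2, stop_threshold=2 ** 31)    # very large
print(default)
print(no_stop)
```

With the default threshold the stream stops as soon as the buffer drains. With the large threshold it keeps running and replays two periods of old data instead, which matches the "improved, but still a slight pop" kind of result: the gap is masked, not removed.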

After the change, testing showed that the problem had indeed improved, but a slight pop remained that could not be avoided.

At this point I was stuck, and the analysis could not go any further...

Then it turned out that one test device did not have this problem. I checked: it was not a hardware difference but a version difference. The build on the good device was from a month and a half earlier, which means the older version did not have the problem. It was introduced later!

However, the gap between the two versions was too long to pin down the day on which the problem was introduced. I pulled the commits between the two versions to compare them and look for the offending commit, but Android has hundreds of repositories and there were more than 2,000 commits. It was like looking for a needle in a haystack~

So I only looked at a few audio-related repositories: commits under the kernel's sound directory and drivers/usb, and commits to the Audio HAL, AudioFlinger, audio effects, and audio parameters. After reading through every commit in these, my eyes were blurry...
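
Narrowing the commit list this way can be sketched as a simple path filter. This is only an illustration in Python: the commit records below are made-up examples, the dict layout is hypothetical, and only the two kernel path prefixes come from the text above.

```python
# Kernel paths named above; the other audio repositories would be added here.
AUDIO_PATH_PREFIXES = ("sound/", "drivers/usb/")


def audio_related(commits, prefixes=AUDIO_PATH_PREFIXES):
    """Keep commits that touch at least one file under the given prefixes."""
    return [c for c in commits
            if any(path.startswith(prefixes) for path in c["paths"])]


# Hypothetical commit records, for illustration only.
commits = [
    {"id": "c1", "paths": ["sound/soc/codecs/example.c"]},
    {"id": "c2", "paths": ["kernel/sched/example.c"]},
    {"id": "c3", "paths": ["drivers/usb/core/example.c", "README"]},
]

kept = [c["id"] for c in audio_related(commits)]
print(kept)
```

Note that a filter like this, by construction, drops anything under scheduler paths, so it can only rule audio commits in or out.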

But!!!
I still did not find the problem, and I almost broke down.

No, this still had to be dealt with. Since it was not introduced by an audio commit, it looked like a performance problem. I wondered whether the CPU frequency policy had been changed so that the frequency drops in certain scenarios. As I described in an earlier article, Android Audio Subsystem (12)------Analysis of Douyin Live Power Consumption Problems, audio adjusts the CPU frequency in a few scenarios.

I checked the log: the frequency had not been changed and the CPU was running at the preset frequency, so it had not been throttled.

After thinking it over, only one direction was left: kernel performance!

This is the hardest part. As audio engineers, we cannot really handle kernel performance issues, mainly because we do not understand them. But I had no choice except to do it myself.

I went through all the changes made to the CPU repository over the past month and a half. Thanks to my colleagues' good habits, the commit messages were clearly written even though I could not follow the content. My main goal was to check for performance-related changes, chiefly scheduling optimization and stability. And I really did find a few stability-related commits. Suspicious, very suspicious!

In the end, I picked out these patches, reverted them one by one, built and tested each time, and finally found the cause.

The kernel team had earlier added a patch related to CPU scheduling. This patch causes the interrupt handling for hot-thread collection to take a long time. After rolling back this patch, the digital headset finally had no problem, and the analog headset, tested at the same time, showed no abnormality either!!!

I'm numb. I took the blame again. This problem nearly killed me; I worked on it until very late every day, so tired...
[Figure: taking the blame]

The problem was exactly what I thought at the beginning: a performance problem. But thinking about it carefully, if I had not known that it was introduced by someone else's patch and then gone looking for that patch, solving it head-on would have been a real headache. The root cause is time-consuming low-level interrupt handling, while systrace mainly shows the threads on the AP side and is not well suited to analyzing what happens inside the kernel. To solve it head-on, you would probably need to enable tracing in the kernel: Use trace to view function call relationship | analyze Linux performance

Before this, my fallback plan if the problem could not be solved was to raise the CPU frequency. A higher frequency gives better performance but brings higher power consumption, which is another story. Fortunately, this time rolling back a patch solved it, and I can finally get a good rest~

Origin blog.csdn.net/Guet_Kite/article/details/130456684