Fluency issues in practice

Overall flow chart

1. Confirm the configurations of the test machine and the comparison machine

Find out in advance how the test machine and the comparison machine differ in hardware configuration (CPU frequency, GPU frequency, memory size, resolution, and so on) and in benchmark scores. The test machine often performs worse than the comparison machine simply because of one such hardware difference, so these differences must be clearly understood:

  • Is the CPU architecture the same, and do the CPU frequencies differ by less than 5%? If the difference is clearly larger, the two devices are not suitable for comparison.
    • In performance mode, are the AnTuTu CPU and Geekbench scores close? If the scores differ by more than 10%, the devices are not suitable for comparison.
  • For GPU performance, compare the GPU architecture, the number of shader cores, and the GPU frequency. For APKs with heavy GPU demands, the GPU information must be checked.
    • In performance mode, is the GFXBench Manhattan 3.0 fps close? If the difference exceeds 10%, the devices are not suitable for comparison.
  • Memory type and frequency also affect UX performance and power consumption. Keep the memory configuration as consistent as possible.
  • Display resolution and refresh rate directly affect UX performance and power consumption. Keep the display configuration as consistent as possible.
  • The touch IC report rate affects power consumption and touch latency. Choose a comparison machine whose touch IC report rate is close to that of the test machine.
  • How to check: run adb shell "getevent -ltr", swipe on the touch panel (TP), and compare the reported rates.
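The getevent check can be turned into a single number per device. Save the output of adb shell getevent -lt while swiping, then compute the average report rate on the host. The sketch below makes two assumptions. First, each line starts with a "[ seconds.microseconds]" timestamp. Second, the touch driver emits one EV_SYN SYN_REPORT per report. The function name touch_report_rate is hypothetical.

```shell
# Estimate the touch report rate (Hz) from saved "getevent -lt" output on stdin.
# Assumes lines look like:
#   [   49815.418094] /dev/input/event3: EV_SYN  SYN_REPORT  00000000
# and that one SYN_REPORT is emitted per touch report.
touch_report_rate() {
  awk '
    /SYN_REPORT/ {
      ts = $0
      sub(/^[ \t]*\[[ \t]*/, "", ts)   # drop the leading "[" and its padding
      sub(/\].*/, "", ts)              # drop "]" and everything after it
      ts = ts + 0                      # timestamp in seconds, as a number
      if (n == 0) first = ts
      last = ts
      n++
    }
    END {
      # Need at least two reports spanning a non-zero interval.
      if (n < 2 || last <= first) { print 0; exit }
      printf "%.0f\n", (n - 1) / (last - first)
    }'
}
```

Pass the touch device explicitly so that SYN_REPORTs from other input devices are not counted. For example, run adb shell getevent -lt /dev/input/eventN > touch.log on each device, swipe for a few seconds, then run touch_report_rate < touch.log.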

2. Confirm test preparation conditions (very important)

2.1 Application related

1. Are the app versions the same?

2. Both devices should start from the same entry point and display the same content.

3. Where possible, use the same account on the DUT (test machine) and the REF (comparison machine). This rules out differences caused by different displayed content.

4. Is the compilation mode consistent?

5. For scrolling issues, connect to the network first, then scroll manually for at least 1 minute (at least 5 passes up and down) so that the content under test is loaded.
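Here the compilation mode means how ART has compiled the app, that is, its compiler filter such as verify, speed-profile, or speed. One way to compare it is to run adb shell dumpsys package dexopt on both devices and extract the status of the package. The helper below is a sketch, and the name dexopt_status is hypothetical. It assumes the dump lists each package as a line of the form [package.name], followed by lines that contain [status=...]. This layout varies across Android releases, so check it against your own output first.

```shell
# Print the compiler filter ("status") of one package from
# "dumpsys package dexopt" output on stdin. Prints nothing if not found.
# Assumed layout: a header line "[package.name]" followed by lines with
# "[status=...]"; the exact format differs between Android releases.
dexopt_status() {
  awk -v pkg="[$1]" '
    { sub(/\r$/, "") }                        # tolerate CRLF from older adb
    NF == 1 && $1 ~ /^\[.*\]$/ && $1 !~ /=/ {  # a package header line
      in_pkg = ($1 == pkg)
      next
    }
    in_pkg && /\[status=/ {
      s = $0
      sub(/.*\[status=/, "", s)                # keep the text after "[status="
      sub(/\].*/, "", s)                       # cut at the closing "]"
      print s
      exit
    }'
}
```

For example, run adb shell dumpsys package dexopt | dexopt_status com.example.app on both devices. The package name here is a placeholder. If the filters differ, adb shell cmd package compile -m speed-profile -f <package> can recompile the app on one device so that both match.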

2.2 Equipment related

1. Is the refresh rate consistent?

2. Is the resolution consistent?

3. Is the device temperature roughly the same?

4. Are there differences in software versions?

5. Driver Only

2.3 Others

1. Turn off Bluetooth.

2. Turn off automatic brightness.

3. Restart the device and wait 5 minutes.

4. Set the screen timeout to 30 minutes.

5. Enable "Stay awake" in Developer options.

6. Close other background apps and clear notifications.

7. Turn off animations: Window animation scale, Transition animation scale, and Animator duration scale.
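Several of these preparation steps can be applied from the host with adb. The function below only prints the commands, so they can be reviewed before being piped to sh. It is a sketch that uses standard Android settings keys and svc power stayon. Bluetooth, the reboot, and closing background apps are left to be done by hand. The optional argument is a device serial for adb -s. The name prep_commands is hypothetical.

```shell
# Print adb commands for the preparation steps (review them, then pipe to sh).
# $1 (optional): device serial, passed to adb -s.
# screen_brightness_mode 0 = manual brightness (auto brightness off);
# screen_off_timeout is in milliseconds (1800000 = 30 minutes).
prep_commands() {
  local adb
  adb="adb"
  if [ -n "$1" ]; then
    adb="adb -s $1"
  fi
  cat <<EOF
$adb shell settings put system screen_brightness_mode 0
$adb shell settings put system screen_off_timeout 1800000
$adb shell svc power stayon true
$adb shell settings put global window_animation_scale 0
$adb shell settings put global transition_animation_scale 0
$adb shell settings put global animator_duration_scale 0
EOF
}
```

For example, prep_commands <serial> | sh applies the same settings to one device. Run it once per device so that the DUT and the REF start from identical settings.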

3. Analysis process

Basic time-consumption issues can be analyzed directly with the mind map below. Then continue with the CPU state analysis that follows:

4. CPU state analysis

4.1 Running state analysis:

Long Running time is mainly related to the CPU frequency and the task scheduling policy.

4.1.1 CPU frequency difference

Does the test machine run at a low frequency while the comparison machine runs at a high frequency?

① Do the platforms' CPU architectures differ? --> Evaluate the difference

② Is the CPU boost taking effect? --> Adjust the CPU and DDR parameters in power_app_list.xml as appropriate

③ Confirm that the CPU reaches its full frequency, and check whether a frequency limit is in effect --> check thermal, FPSGO, and PowerHAL
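To check for a frequency limit, compare each CPU's current cap (scaling_max_freq) with its hardware maximum (cpuinfo_max_freq). Both come from the standard cpufreq sysfs interface. The sketch below is plain shell meant to be pasted into adb shell, and root may be needed on some builds. The sysfs root is a parameter so that the function can also be pointed at a copied tree. It only shows that a cap exists. Which module set it (thermal, FPSGO, PowerHAL) still has to be found separately. The name check_freq_caps is hypothetical.

```shell
# For each CPU, compare the current cap (scaling_max_freq) with the hardware
# maximum (cpuinfo_max_freq) and flag CPUs whose cap is below the hardware max.
# $1: sysfs CPU root, defaults to /sys/devices/system/cpu.
check_freq_caps() {
  local root dir cpu hw cap cur
  root="${1:-/sys/devices/system/cpu}"
  for dir in "$root"/cpu[0-9]*; do
    # Skip offline CPUs or CPUs without a cpufreq directory.
    [ -r "$dir/cpufreq/cpuinfo_max_freq" ] || continue
    cpu="${dir##*/}"
    hw=$(cat "$dir/cpufreq/cpuinfo_max_freq")
    cap=$(cat "$dir/cpufreq/scaling_max_freq")
    cur=$(cat "$dir/cpufreq/scaling_cur_freq")
    if [ "$cap" -lt "$hw" ]; then
      echo "$cpu cur=$cur cap=$cap hw_max=$hw CAPPED"
    else
      echo "$cpu cur=$cur cap=$cap hw_max=$hw ok"
    fi
  done
}
```

Run it while the lagging scene is on screen, on both devices. A CAPPED line on the test machine that does not appear on the comparison machine points to a frequency limit.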

4.1.2 Difference in big/little core share

Does the comparison machine spend a larger share of time on big cores than the test machine?

① What is the CPU load? Compare the top 5 processes by CPU load (such as JIT, HeapTaskDaemon, and kswapd) --> Optimize the top threads

② Is big-core usage reasonable? --> Let the UI thread run preferentially on big cores, and optimize the threads that occupy the big cores

③ Are other time-consuming processes interfering? --> Reduce interference from unrelated threads
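Before judging the big/little core share, it helps to know which CPUs form each cluster. The sketch below reads the standard cpufreq policy directories (related_cpus and cpuinfo_max_freq). The cluster with the highest maximum frequency is normally the big cores, although some SoCs add a separate policy for a prime core. The root path is a parameter so the function can be tested on a copied tree. The name list_clusters is hypothetical.

```shell
# List CPU clusters (cpufreq policies) with their CPUs and maximum frequency.
# $1: cpufreq root, defaults to /sys/devices/system/cpu/cpufreq.
list_clusters() {
  local root p
  root="${1:-/sys/devices/system/cpu/cpufreq}"
  for p in "$root"/policy*; do
    [ -r "$p/related_cpus" ] || continue
    printf '%s cpus=[%s] max=%s kHz\n' "${p##*/}" \
      "$(cat "$p/related_cpus")" "$(cat "$p/cpuinfo_max_freq")"
  done
}
```

With the cluster layout known, the CPU column of the UI thread and RenderThread slices in the trace shows how much of their Running time fell on big cores.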

4.1.3 When the CPU frequency and core placement show no obvious difference, check the DDR and GPU frequency differences

Are the DDR and GPU boosts taking effect? --> Adjust the DDR and GPU parameters in power_app_list.xml as appropriate
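On many SoCs, DDR (the memory bus) and often the GPU are managed by the Linux devfreq framework. Its devices appear under /sys/class/devfreq, each with cur_freq, min_freq, and max_freq attributes. The sketch below prints those values so the two devices can be compared while the scene runs. Device names are vendor-specific, and some GPUs expose their frequency only through a vendor node. The name show_devfreq is hypothetical.

```shell
# Print current/min/max frequency of every devfreq device.
# $1: devfreq class root, defaults to /sys/class/devfreq.
show_devfreq() {
  local root d
  root="${1:-/sys/class/devfreq}"
  for d in "$root"/*; do
    [ -r "$d/cur_freq" ] || continue   # skip entries without devfreq attributes
    printf '%s cur=%s min=%s max=%s\n' "${d##*/}" \
      "$(cat "$d/cur_freq")" "$(cat "$d/min_freq")" "$(cat "$d/max_freq")"
  done
}
```

If a boost is effective, min_freq or cur_freq for the DDR/GPU device should rise while the scene runs. If the values stay at their idle levels, the boost is probably not being applied.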

4.1.4 Android version differences or customized software differences

Is it caused by Android version differences or customization differences? --> Use traceview, simpleperf and other tools to help clarify software differences

4.2 Runnable state analysis:

Long Runnable time is mainly related to CPU load and the task scheduling policy.

4.2.1 How to check whether a task is bound to specific cores

① With the taskset command

If you know the pid of the thread, you can use the following command to check the CPUs that this thread is allowed to run on:

taskset -p $pid //get the affinity of given PID

taskset -ap $pid //get the affinity of all threads of the PID

For example:

# taskset -p 23378
pid 23378's current affinity mask: f

Each bit of the mask corresponds to one CPU, with CPU0 as the lowest bit. Mask f therefore means the thread is allowed to run on CPU0-CPU3.

# taskset -ap 23378
pid 23378's current affinity mask: f
pid 23386's current affinity mask: f
pid 23387's current affinity mask: f
pid 23388's current affinity mask: f
pid 23389's current affinity mask: f
pid 23390's current affinity mask: f
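Reading masks by hand is error-prone on devices with 8 or more cores. The helper below is a sketch, and its name mask_to_cpulist is hypothetical. It converts a hex mask, as printed by taskset or Cpus_allowed, into the same range notation that Cpus_allowed_list uses. It follows the rule above that CPU0 is the lowest bit.

```shell
# Convert a hex affinity mask (as printed by taskset or Cpus_allowed) into a
# CPU list in Cpus_allowed_list notation, e.g. ff -> 0-7, 5 -> 0,2.
# Bit N of the mask stands for CPU N (CPU0 is the lowest bit).
mask_to_cpulist() {
  local mask cpu start out
  mask=$(( 0x${1#0x} ))   # accept "f0" or "0xf0"
  cpu=0
  start=-1                # first CPU of the current run of set bits, -1 if none
  out=""
  while [ "$mask" -gt 0 ] || [ "$start" -ge 0 ]; do
    if [ $(( mask & 1 )) -eq 1 ]; then
      if [ "$start" -lt 0 ]; then start=$cpu; fi
    elif [ "$start" -ge 0 ]; then
      # A run of set bits ended at cpu-1: emit "N" or "N-M".
      if [ "$start" -eq $(( cpu - 1 )) ]; then
        out="$out,$start"
      else
        out="$out,$start-$(( cpu - 1 ))"
      fi
      start=-1
    fi
    mask=$(( mask >> 1 ))
    cpu=$(( cpu + 1 ))
  done
  echo "${out#,}"
}
```

For example, mask_to_cpulist f prints 0-3, which matches the reading of the taskset output above.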

② Check Cpus_allowed in /proc/[pid]/status

For example: # cat /proc/1/status | grep Cpus_allowed

Cpus_allowed: ff
Cpus_allowed_list: 0-7

③ In systrace

If the systrace capture includes the sched_select_task_rq event (available only on kernels that provide this tracepoint), you can see information like the following:

<idle>-0 (-----) [006] d.s3 455.535618: sched_select_task_rq: pid=1272 policy=0x00080001 pre-cpu=1 target=1 util=3 boost=12 mask=0xff prefer=0 cpu_prefer = 0 flags=0
Here mask is the set of CPUs that the task with pid=1272 is allowed to run on (0xff means CPU0-CPU7).
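When a trace contains many sched_select_task_rq events, the mask and chosen target CPU for one pid can be extracted from the saved text trace. The sketch below assumes the field names shown in the example line (pid=, target=, mask=). The tracepoint is vendor-provided, so its fields may differ on other kernels. The name select_rq_masks is hypothetical.

```shell
# Print "mask=... target=..." for every sched_select_task_rq event of one pid
# in a saved text trace on stdin. $1: pid to look for.
select_rq_masks() {
  awk -v pid="$1" '
    /sched_select_task_rq:/ {
      p = ""; m = ""; t = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^pid=/)    p = substr($i, 5)   # value after "pid="
        if ($i ~ /^mask=/)   m = substr($i, 6)   # value after "mask="
        if ($i ~ /^target=/) t = substr($i, 8)   # value after "target="
      }
      if (p == pid) print "mask=" m " target=" t
    }'
}
```

If the mask keeps excluding the big cores for a UI-critical thread, look for where the core binding was set before tuning anything else.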

4.2.2 How to run a task on specified cores with a command

taskset -p cpumask $pid //set the affinity of the given PID

taskset -ap cpumask $pid //set the affinity of all threads of the PID

For example, to pin RenderThread to the big cores, pass RenderThread's TID together with the big-core mask. The experiment below instead binds pid 1 to the small cores (CPU0-CPU3):

# taskset -p 0f 1
pid 1's current affinity mask: ff
pid 1's new affinity mask: f

# cat /proc/1/status | grep Cpus_allowed
Cpus_allowed: 0f
Cpus_allowed_list: 0-3
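The inverse conversion is handy when building the cpumask argument. Given a CPU list, it computes the hex mask that taskset -p expects. This is a sketch with a hypothetical name (cpulist_to_mask). Treating CPU4-CPU7 as the big cores in the usage note below is an assumption that depends on the SoC.

```shell
# Build the hex mask that "taskset -p" expects from a CPU list such as
# "4-7", "0,2" or "0-3,6". Bit N of the result stands for CPU N.
cpulist_to_mask() {
  local mask item lo hi cpu IFS
  mask=0
  IFS=,                     # split the list on commas
  for item in $1; do
    lo=${item%-*}           # "4-7" -> 4, "2" -> 2
    hi=${item#*-}           # "4-7" -> 7, "2" -> 2
    cpu=$lo
    while [ "$cpu" -le "$hi" ]; do
      mask=$(( mask | (1 << cpu) ))
      cpu=$(( cpu + 1 ))
    done
  done
  printf '%x\n' "$mask"
}
```

For example, on a device whose big cores are CPU4-CPU7, adb shell taskset -p $(cpulist_to_mask 4-7) <RenderThread TID> would restrict that thread to those cores. The $(...) expands on the host, so only the resulting mask (f0) is sent to the device.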

4.2.3 prefer idle

When an idle CPU is available, tasks in a group with prefer_idle set are placed on that idle CPU preferentially.

a. Query the stune group the task belongs to:
# cat /proc/[task PID]/cgroup
b. Set the prefer_idle attribute for that group:
# echo 1 > /dev/stune/[cgroup]/schedtune.prefer_idle
c. Check that the setting took effect:
# cat /dev/stune/[cgroup]/schedtune.prefer_idle
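Step a can be scripted. Each line of /proc/[pid]/cgroup has the form hierarchy-ID:controller-list:cgroup-path, so the stune group is the path on the line whose controller list contains schedtune. This is a sketch with a hypothetical name (stune_group). schedtune exists only on kernels that still ship it, mainly older Android kernels. Newer kernels use uclamp instead, so on those devices the function prints nothing.

```shell
# Print the schedtune (stune) group path of a task, reading the contents of
# /proc/[pid]/cgroup on stdin. Prints nothing if there is no schedtune line.
stune_group() {
  awk -F: '
    { sub(/\r$/, "") }                      # tolerate CRLF from older adb
    {
      n = split($2, ctrl, ",")              # controller list, e.g. "schedtune"
      for (i = 1; i <= n; i++)
        if (ctrl[i] == "schedtune") { print $3; exit }
    }'
}
```

For example, adb shell cat /proc/<pid>/cgroup | stune_group might print /top-app. In that case, step b becomes echo 1 > /dev/stune/top-app/schedtune.prefer_idle.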

4.2.4 No idle CPU: the current CPU load may be heavy. Is there room for optimization?

Is the CPU boost taking effect? --> Apply an appropriate CPU boost

Are there other abnormal background processes on the test machine (such as JIT, HeapTaskDaemon, and kswapd)? --> Remove the impact of background processes

Is kswapd0 among the top 3 CPU consumers? Check kswapd0's ranking --> Handle as a low-memory issue

Is mmcqd/exe_cq among the top 3 CPU consumers? Check the mmcqd/exe_cq ranking --> Handle as an IO issue

No low-memory or IO issue --> Optimize the top threads

Task priorities and scheduling policies are either inherited or explicitly set, and can be used as a basis for judgment. Changing them extensively is generally not recommended.
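To check the ranking of kswapd0 or mmcqd/exe_cq, a two-column list of CPU% and name can be fed to a small ranking helper. The sketch below ignores non-numeric header lines, sorts by CPU usage, and prints the 1-based rank of the given name. It prints nothing if the name is absent. Producing that list with toybox top, for example adb shell top -b -n 1 -o %CPU,CMD, is an assumption, because the available -o fields depend on the toybox version. The name cpu_rank is hypothetical.

```shell
# Print the 1-based rank of process $1 by CPU usage, reading "CPU% NAME"
# lines on stdin. Header lines whose first field is not a number are ignored.
cpu_rank() {
  awk '
    { sub(/\r$/, "") }                    # tolerate CRLF from older adb
    $1 ~ /^[0-9.]+$/ { print $1, $2 }     # keep only numeric CPU% rows
  ' | LC_ALL=C sort -k1,1 -rn |
    awk -v name="$1" '$2 == name { print NR; exit }'
}
```

A result of 1 to 3 for kswapd0 points toward the low-memory branch. A top-3 result for mmcqd/exe_cq points toward the IO branch.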

4.2.5 View thread priority and scheduling information

Ordinary process - renice command

The nice value range of an ordinary process is -20~19, and the corresponding priority value is 100~139. The larger the value, the lower the priority. The default nice value is 0. You can use the ps -el command to view the NICE value.

renice -n N -p $pid //Add N to the nice value of the thread with this PID. A positive N lowers the priority; a negative N raises it.

If you need to modify the code, it is recommended to use the setpriority() function to adjust the priority of ordinary processes.
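The mapping described above is linear. The kernel priority of a normal task is 120 plus its nice value, and this is normally the prio shown in ftrace/systrace for non-real-time tasks. A sketch of the mapping, under the hypothetical name nice_to_prio:

```shell
# Map a nice value (-20..19) to the kernel priority (100..139).
nice_to_prio() {
  local nice
  nice="$1"
  if [ "$nice" -lt -20 ] || [ "$nice" -gt 19 ]; then
    echo "nice out of range: $nice" >&2
    return 1
  fi
  echo $(( 120 + nice ))
}
```

For example, after renice -n 5 -p $pid on a thread that started at nice 0, the thread's prio in a trace should read 125.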

Real-time process - chrt command

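The priority range of real-time processes (sched_priority, as passed to chrt) is 1~99, and a larger value means a higher priority. Inside the kernel this is mapped to prio = 99 - sched_priority, so in kernel space (and in the prio fields of ftrace/systrace) a smaller value means a higher priority. You can view and set it with the following commands: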

chrt -p $pid //View the scheduling priority and scheduling policy of the corresponding pid thread
chrt -p -f 10 $pid //Set the scheduling policy of the corresponding pid thread to SCHED_FIFO and set the priority to 10

If you need to modify the code, it is recommended to use sched_setscheduler to adjust the priority of the real-time process.
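The kernel converts between the two real-time priority scales with the rule prio = MAX_RT_PRIO - 1 - sched_priority. Since MAX_RT_PRIO is 100, this is 99 - sched_priority. The sketch below, under the hypothetical name rt_to_kernel_prio, translates a chrt value into the prio seen in a trace:

```shell
# Map a real-time priority as used by chrt / sched_setscheduler
# (1..99, larger = more urgent) to the kernel-internal prio
# (98..0, smaller = more urgent).
rt_to_kernel_prio() {
  local rt
  rt="$1"
  if [ "$rt" -lt 1 ] || [ "$rt" -gt 99 ]; then
    echo "rt priority out of range: $rt" >&2
    return 1
  fi
  echo $(( 99 - rt ))
}
```

For example, a thread set with chrt to SCHED_FIFO priority 10 should appear with prio 89 in sched_switch events.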

4.3 Sleeping state analysis

4.3.1 Is the wake-up source waiting for a lock?

If the trace shows the UI thread sleeping for a long time, it is usually waiting for a binder call to return. Look at what the binder call actually does on the server side, and analyze it based on the actual situation.

4.3.2 Is the wake-up source reasonable?

Check whether the wake-up source (the thread that woke it) is reasonable, then analyze that wake-up source according to its own CPU state.
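To see which threads woke a sleeping thread, count the waker of each sched_waking event. The kernel emits sched_waking from the waking context, so the task column of such a line identifies the waker. The sketch below assumes the common ftrace text layout (task-pid, an optional (tgid), [cpu], ...) and a trace saved as text. The name wakers_of is hypothetical.

```shell
# Count which tasks woke thread $1, from a saved ftrace/systrace text dump on
# stdin. Output: "<count> <waker comm-tid>", most frequent waker first.
wakers_of() {
  awk -v tid="$1" '
    /sched_waking:/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "pid=" tid) {
          waker = $0
          sub(/^[ \t]+/, "", waker)   # strip leading padding
          # Drop " (tgid) [cpu] ..." or " [cpu] ..." to keep "comm-tid".
          sub(/[ \t]+(\([^)]*\)[ \t]+)?\[[0-9]+\].*$/, "", waker)
          count[waker]++
          break
        }
      }
    }
    END { for (w in count) print count[w], w }' | sort -rn
}
```

The waker's own Running/Runnable/Sleeping state can then be examined in the same way as the original thread's.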

4.4 Uninterruptible Sleep state analysis

4.4.1 Does BlockIO exist?

Check IO data volume

The amount of IO data is normal --> IO Performance optimization

The amount of IO data is too large --> check for low memory (see 4.4.2)

No Low Memory --> Clarify the reason for large IO usage, such as other apks performing IO operations
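To judge whether the IO volume is normal, snapshot /proc/[pid]/io of the suspect process before and after the scene. Then compare read_bytes and write_bytes, which count the bytes actually fetched from or sent to storage. The sketch below has a hypothetical name (io_delta). Reading another app's io file usually requires root.

```shell
# Print how many bytes a process read from / wrote to storage between two
# snapshots of /proc/[pid]/io.
# $1: file with the "before" snapshot, $2: file with the "after" snapshot.
io_delta() {
  awk '
    FNR == NR { before[$1] = $2; next }      # first file: remember values
    $1 == "read_bytes:" || $1 == "write_bytes:" {
      key = $1
      sub(/:$/, "", key)
      printf "%s %.0f\n", key, $2 - before[$1]
    }' "$1" "$2"
}
```

For example, run adb shell cat /proc/<pid>/io > before.txt, replay the scene, capture after.txt the same way, then run io_delta before.txt after.txt. A large read_bytes delta during a short scroll is worth explaining.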

4.4.2 Low Memory?

Adjust memory-related parameters
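A first low-memory check, before adjusting parameters, is MemAvailable in /proc/meminfo. Compare it between the test machine and the comparison machine during the same scene. This is a sketch with a hypothetical name (mem_available_mib). The meminfo path is a parameter so the function can read a saved copy.

```shell
# Print MemAvailable in MiB from a meminfo file (defaults to /proc/meminfo).
mem_available_mib() {
  awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}
```

For example, run adb shell cat /proc/meminfo > meminfo.txt and then mem_available_mib meminfo.txt on each device while the scene is running.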

5. Assessment process

1. Pin down the exact time point of the lag, and make sure it corresponds to the lag reported in the test.

2. Premise: the test machine is no worse than the comparison machine.

3. Explain clearly the cause of the lag, why it cannot be improved, and whether the comparison machine has the same lag. For example, suppose the test machine lags for reasons 1, 2, and 3, but the comparison machine lags only for 1 and 2. Then explain in detail why reason 3 occurs only on the test machine. Reasons 1 and 2 must also be shown, and all supporting data must be sufficient.

4. The report must show the iq of the sliding lag, the BufferTX of the related application, overall and zoomed-in views of the trace, the big/little core placement and CPU frequency, and the frame-drop points in the trace.

5. Explain why optimization is not possible.

6. Case analysis

6.1 Lag and frame drops when scrolling the Shopee product details home page

【Analysis conclusion】

1. The test machine shows the same moderate jitter as the comparison machine, and the two perform equally well.

2. RenderThread did not run during the jitter period, so SurfaceFlinger had no buffer to compose. This is a problem in the APK itself.

[Overall trace view]

Test machine:

Comparison machine:

【Analysis process】

1. With the same APK version on both devices and both connected to the same network, the comparison machine also shows moderate jitter, and the performance is equivalent.

2. Both the test machine and the comparison machine jitter at the end of the slide. RenderThread did not run during the jitter period, so SurfaceFlinger had no buffer to compose. This is a problem in the APK itself. Because the test machine behaves the same as the comparison machine, we apply for evaluation.

Test machine:

Comparison machine:


Source: blog.csdn.net/weixin_47465999/article/details/131864575