Android UI Jank Monitoring

I. Background

Application fluency is an important measure of user experience. Android devices vary widely in hardware configuration and system version, our app has complex and varied scenarios, and the code base has a long history of iteration by many contributors, so the UI thread may contain many time-consuming operations. During testing we occasionally see jank (stutter) in some business scenarios, and users frequently report and complain about jank while using the app. We have therefore paid increasing attention to jank and to improving fluency.

II. Solution

To address this pain point, we want an effective detection mechanism that covers every scenario in which jank may occur. Once jank occurs, it should help us locate where the time was spent and record the specific details and stack information, so that developers can trace the jank directly to the code. We envision that an Android jank monitoring system needs the following basic functions:

  • Effectively detect jank in the app and, as soon as it occurs, record the app state accurately: stack information, CPU usage, memory usage, IO usage, and so on;

  • Report the collected jank information to a monitoring platform, which analyzes and classifies the reports and presents them simply and intuitively on a web page for developers to follow up.

III. How to monitor jank from the app level?

Our premise is that jank in the app interface is generally caused by the main (UI) thread doing too much drawing, a large number of IO operations, or heavy computation that occupies the CPU. If we can capture the main thread's stack and the system resource usage at the moment jank occurs, we can accurately determine which function the jank occurred in and what resources were consumed. The question then is how to detect effectively that the Android main thread is janking. Two approaches are popular and effective in the industry:

  • Matching the log printed by the UI thread's Looper;

  • Using Choreographer.FrameCallback.

3.1 Matching the log printed by the UI thread's Looper to detect jank

Android updates the UI on the main thread. If the interface refreshes fewer than 60 times per second, that is, below 60 FPS, users perceive jank. In simple terms, Android uses its message mechanism to update the UI: the UI thread has a Looper whose loop() method keeps taking messages from the queue and has the Handler bound to each message execute it on the UI thread. If a Handler's dispatchMessage method contains a time-consuming operation, jank occurs.

3.1.1 Looper.loop() source code

public static void loop() {
    final Looper me = myLooper();
    if (me == null) { 
        throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
    }
    final MessageQueue queue = me.mQueue;
    // Make sure the identity of this thread is that of the local process,
    // and keep track of what that identity token actually is.
    Binder.clearCallingIdentity();
    final long ident = Binder.clearCallingIdentity();    
    for (;;) {
        Message msg = queue.next();
        // might block
        if (msg == null) {
            // No message indicates that the message queue is quitting.
            return;
        }
        // This must be in a local variable, in case a UI event sets the logger
        Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }

        msg.target.dispatchMessage(msg);
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
        // Make sure that during the course of dispatching the
        // identity of the thread wasn't corrupted.
        final long newIdent = Binder.clearCallingIdentity();
        if (ident != newIdent) {
            Log.wtf(TAG, "Thread identity changed from 0x"
                    + Long.toHexString(ident) + " to 0x"
                    + Long.toHexString(newIdent) + " while dispatching to "
                    + msg.target.getClass().getName() + " "
                    + msg.callback + " what=" + msg.what);
        }

        msg.recycleUnchecked();
    }
}

By measuring the execution time of msg.target.dispatchMessage(msg) on line 25 of the listing, we can detect whether part of the UI thread's work is time-consuming. Note that a logging.println call sits immediately before and after this line. If logging is set, they print ">>>>> Dispatching to" and "<<<<< Finished to" respectively, so the time difference between the two log lines gives the execution time of dispatchMessage, and a threshold on that time tells us whether jank has occurred.
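The timing idea can be sketched in plain Java, independent of Android. DispatchTimer below is a hypothetical helper and not part of the article's code: its println method merely has the same shape as android.util.Printer#println, the clock is injected by the caller (on a device it could be SystemClock::uptimeMillis), and the threshold is whatever value the caller chooses.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: measures the time between the two Looper log lines. */
class DispatchTimer {
    private static final String START = ">>>>> Dispatching";
    private static final String END = "<<<<< Finished";

    private final LongSupplier clockMs; // injected clock returning milliseconds
    private final long thresholdMs;     // a dispatch longer than this counts as jank
    private long startMs = -1;
    private long lastDurationMs = -1;
    private int jankCount = 0;

    DispatchTimer(LongSupplier clockMs, long thresholdMs) {
        this.clockMs = clockMs;
        this.thresholdMs = thresholdMs;
    }

    /** Same shape as android.util.Printer#println. */
    void println(String line) {
        if (line.startsWith(START)) {
            // Remember when the dispatch began.
            startMs = clockMs.getAsLong();
        } else if (line.startsWith(END) && startMs >= 0) {
            // The difference between the two log lines is the dispatch time.
            lastDurationMs = clockMs.getAsLong() - startMs;
            startMs = -1;
            if (lastDurationMs > thresholdMs) {
                jankCount++;
            }
        }
    }

    long lastDurationMs() {
        return lastDurationMs;
    }

    int jankCount() {
        return jankCount;
    }
}
```

This variant only measures the duration after the message has finished, so it can count jank but cannot capture the stack while the main thread is still blocked; that is what the delayed task in LogMonitor is for.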

3.1.2 How do we set logging?

Line 19 of the listing reads the field me.mLogging. Its declaration in the source is:

public final class Looper {  
    private Printer mLogging;  
    public void setMessageLogging(@Nullable Printer printer) {  
        mLogging = printer;  
    }  
}  

public interface Printer {  
    void println(String x);  
}

Looper's mLogging field is private, but Looper provides the setMessageLogging(@Nullable Printer printer) method, so we can implement our own Printer and pass it in through setMessageLogging(), as follows:

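import android.os.Looper;
import android.util.Printer;

public class BlockDetectByPrinter {
    /** Installs the printer on the main Looper; call once, e.g. from Application.onCreate(). */
    public static void start() {
        Looper.getMainLooper().setMessageLogging(new Printer() {
            // Prefixes of the two lines Looper.loop() logs around dispatchMessage().
            private static final String START = ">>>>> Dispatching";
            private static final String END = "<<<<< Finished";

            @Override
            public void println(String x) {
                if (x.startsWith(START)) {
                    // A message is about to be dispatched: arm the delayed stack dump.
                    LogMonitor.getInstance().startMonitor();
                } else if (x.startsWith(END)) {
                    // The dispatch has finished: cancel the dump if it is still pending.
                    LogMonitor.getInstance().removeMonitor();
                }
            }
        });
    }
}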

Once logging is set, the loop method calls logging.println for every message, printing ">>>>> Dispatching to" before the message is executed and "<<<<< Finished to" after it. To use BlockDetectByPrinter, call BlockDetectByPrinter.start() in the onCreate method of the Application.

We can implement a simple LogMonitor to record the main thread's stack when jank occurs. When ">>>>> Dispatching" is matched, startMonitor is executed, which schedules a task to run after 1000 ms (the jank threshold). The task prints the UI thread's stack from a child (non-UI) thread. If the message finishes within 1000 ms, the "<<<<< Finished" log is matched and removeMonitor cancels the task before it starts, meaning that no jank occurred. If the message takes more than 1000 ms, we consider that jank has occurred, and the task prints the UI thread's stack.

3.1.3 How is LogMonitor implemented?

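import android.os.Handler;
import android.os.HandlerThread;
import android.os.Looper;
import android.util.Log;

public class LogMonitor {
    private static final String TAG = "LogMonitor";
    // Jank threshold in milliseconds.
    private static final long TIME_BLOCK = 1000L;
    private static final LogMonitor sInstance = new LogMonitor();

    // Worker thread with its own Looper, so the stack dump never runs on the UI thread.
    private final HandlerThread mLogThread = new HandlerThread("log");
    private final Handler mIoHandler;

    private LogMonitor() {
        mLogThread.start();
        mIoHandler = new Handler(mLogThread.getLooper());
    }

    // Dumps the main thread's current stack to logcat.
    private static final Runnable mLogRunnable = new Runnable() {
        @Override
        public void run() {
            StringBuilder sb = new StringBuilder();
            StackTraceElement[] stackTrace = Looper.getMainLooper().getThread().getStackTrace();
            for (StackTraceElement s : stackTrace) {
                sb.append(s).append('\n');
            }
            Log.e(TAG, sb.toString());
        }
    };

    public static LogMonitor getInstance() {
        return sInstance;
    }

    // True while a dump is still pending. Handler.hasCallbacks(Runnable) is public from API 29.
    public boolean isMonitor() {
        return mIoHandler.hasCallbacks(mLogRunnable);
    }

    // Schedules the dump to run after TIME_BLOCK unless removeMonitor() cancels it first.
    public void startMonitor() {
        mIoHandler.postDelayed(mLogRunnable, TIME_BLOCK);
    }

    public void removeMonitor() {
        mIoHandler.removeCallbacks(mLogRunnable);
    }
}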

Here we use a HandlerThread to construct the Handler. HandlerThread extends Thread; it is an ordinary Thread with a Looper added, and it exposes that Looper through its getLooper() method. We pass this Looper to the Handler constructor, which binds mIoHandler to the HandlerThread, a non-UI thread, so the time-consuming work it handles does not block the UI. If the UI thread is blocked for more than 1000 ms, mLogRunnable runs on the child thread and prints the UI thread's current stack. If the message is processed within 1000 ms, mLogRunnable is removed before it runs.

When jank occurs, the stack information is printed to the log, and developers can use it to locate the time-consuming code.

Advantages: Jank can be monitored from the app level while users use the app or during testing. Once jank occurs, the app's state can be recorded. Any dispatchMessage call that takes too long is recorded, and the problems and shortcomings of the adb-based approaches no longer apply.

Disadvantages: A child thread must be started to obtain the stack information, which consumes a small amount of system resources.

In practice, on different phones, Android versions, and even ROM versions, the loop function may not print exactly ">>>>> Dispatching to" and "<<<<< Finished to", which makes this approach unusable on those devices.

Optimization strategy: We know that the loop function calls println once before and once after each message is executed, so the optimized version no longer matches the log text. The first println call is treated as the start (startMonitor), the next call as the end, and so on alternately, which solves the problem.
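A minimal sketch of this alternating strategy in plain Java follows. AlternatingPrinter and its nested Monitor interface are hypothetical names introduced only for illustration; on Android the class would implement android.util.Printer and delegate to LogMonitor.

```java
/** Hypothetical content-agnostic printer: odd calls start monitoring, even calls end it. */
class AlternatingPrinter {
    /** Stand-in for LogMonitor's two methods, so the sketch runs without Android. */
    interface Monitor {
        void startMonitor();
        void removeMonitor();
    }

    private final Monitor monitor;
    // True while we believe a message is being dispatched.
    private boolean dispatching = false;

    AlternatingPrinter(Monitor monitor) {
        this.monitor = monitor;
    }

    /** Same shape as android.util.Printer#println; the text itself is ignored. */
    void println(String ignored) {
        if (!dispatching) {
            monitor.startMonitor();
        } else {
            monitor.removeMonitor();
        }
        dispatching = !dispatching;
    }
}
```

The sketch assumes that the first call it receives is a "start" line. That should hold when the printer is installed from the main thread, because Looper.loop() copies mLogging into a local variable before dispatching, so the message that installs the printer does not produce a "finished" call. If the printer were installed from another thread, the phase could be inverted.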

3.2 Using Choreographer.FrameCallback to monitor jank

Official documentation for Choreographer.FrameCallback: https://developer.android.com/reference/android/view/Choreographer.FrameCallback.html

As we know, the Android system sends a VSYNC signal roughly every 16 ms to notify the interface to redraw and render. At a 60 Hz refresh rate each synchronization cycle is 16.6 ms and corresponds to one frame. The SDK contains a related class and related callbacks. In theory, the interval between two callbacks should be about 16 ms. If it is longer, we consider that jank has occurred, so the interval between two callbacks can be used to detect jank (this approach is supported only on Android 4.1, API 16, and above).

The principle of this approach is to register a FrameCallback through the Choreographer class. Each time a frame is rendered, the callback's void doFrame(long frameTimeNanos) method is invoked. If the interval between two doFrame calls is greater than 16.6 ms, jank has occurred.

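import android.view.Choreographer;
import java.util.concurrent.TimeUnit;

public class BlockDetectByChoreographer {
    /** Must be called on a thread with a Looper; call it on the main thread to watch the UI. */
    public static void start() {
        Choreographer.getInstance().postFrameCallback(new Choreographer.FrameCallback() {
            long lastFrameTimeNanos = 0;

            @Override
            public void doFrame(long frameTimeNanos) {
                if (lastFrameTimeNanos == 0) {
                    // First frame: nothing to compare against yet.
                    lastFrameTimeNanos = frameTimeNanos;
                }
                long diffMs = TimeUnit.MILLISECONDS.convert(
                        frameTimeNanos - lastFrameTimeNanos, TimeUnit.NANOSECONDS);
                if (diffMs > 16.6f) {
                    // Number of 16.6 ms intervals covered by the gap; feed this to frame statistics.
                    int droppedCount = (int) (diffMs / 16.6f);
                }
                // Remember this frame so the next callback measures a single interval.
                lastFrameTimeNanos = frameTimeNanos;

                // Cancel the previous frame's watchdog if it is still pending, then re-arm it.
                if (LogMonitor.getInstance().isMonitor()) {
                    LogMonitor.getInstance().removeMonitor();
                }
                LogMonitor.getInstance().startMonitor();
                Choreographer.getInstance().postFrameCallback(this);
            }
        });
    }
}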

Each time a frame is rendered, the time elapsed since the previous frame is used to compute the number of dropped frames, and the data can be plotted as a fluency curve. At the same time, isMonitor() is used to determine whether LogMonitor already has a pending stack-printing task. If it does, the task is removed: if the time since the previous frame had already exceeded the threshold, the task has already run and printed the stack; if not, the task is removed in time. If isMonitor() returns false, no LogMonitor task is pending. In either case a new monitoring task is then started for the next frame.
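The frame arithmetic can be isolated from Android and checked on its own. The sketch below is a hypothetical helper (FrameStats is not an SDK class) that assumes a 60 Hz display and follows the listing's convention of dividing the gap between two frames by the frame interval. It works in nanoseconds, the unit of frameTimeNanos, to avoid losing precision to millisecond truncation.

```java
/** Hypothetical helper for the frame arithmetic, assuming a 60 Hz display. */
class FrameStats {
    /** One frame interval at 60 Hz, in nanoseconds (16,666,666 ns). */
    static final long FRAME_INTERVAL_NANOS = 1_000_000_000L / 60;

    /** Number of whole frame intervals covered by the gap between two frames, or 0 if on time. */
    static int droppedCount(long lastFrameTimeNanos, long frameTimeNanos) {
        long diffNanos = frameTimeNanos - lastFrameTimeNanos;
        if (diffNanos <= FRAME_INTERVAL_NANOS) {
            return 0;
        }
        return (int) (diffNanos / FRAME_INTERVAL_NANOS);
    }

    /** Average frame rate when frameCount frames were rendered within windowNanos. */
    static double framesPerSecond(int frameCount, long windowNanos) {
        return frameCount * 1_000_000_000.0 / windowNanos;
    }
}
```

For example, a 40 ms gap between two doFrame calls gives FrameStats.droppedCount(0, 40_000_000) == 2, and 30 frames counted over one second give a frame rate of 30.0.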

Advantages: It not only monitors jank from the app level but can also compute the frame rate and dropped frames in real time, giving real-time frame-rate data for each app page. If the frame rate is found to be too low, the stack at that moment can be saved automatically.

Disadvantages: A child thread must be started to obtain the stack information, which consumes a small amount of system resources.

3.3 Comparison of the two approaches

Capability                                     Looper.loop   Choreographer.FrameCallback
Monitors whether jank occurs                   Yes           Yes
Supports jank detection on static pages        Yes           Yes
Supports frame rate calculation                No            Yes
Supports collecting app runtime information    Yes           Yes

In the actual project we initially used both monitoring methods and processed the jank information reported by each separately. We found that the two were roughly equally effective: when jank occurred, both methods recorded it. Because Choreographer.FrameCallback not only monitors jank but also makes it easy to compute the real-time frame rate, we now use only Choreographer.FrameCallback to monitor jank in the app.

IV. How to ensure that the captured jank stack is accurate?

Observant readers may have noticed that both approaches (Looper.loop and Choreographer.FrameCallback) determine whether the main thread is janking and then dump the main thread's stack only after the elapsed time shows that jank has occurred. In effect, a child thread monitors the main thread and dumps the main thread's stack once the measured time exceeds the threshold, so the generated stack file is only a snapshot of a single moment. To use an imperfect analogy, it is like a CCTV camera that captured only the scene after a murder and not how it happened: as the police officer, seeing only the outcome, you would still find it hard to determine what happened and who the murderer was. In actual use, we found that when we checked the code and functions in stacks captured this way, the code that caused the jank had often already finished executing.

Suppose the main thread janks during the period T1 to T2. The stack obtained by the approaches above is the one at time T2. The jank may actually have been caused by a function that took too long earlier in this period, not necessarily by whatever is running at T2, so such a stack cannot accurately reflect the scene of the jank.

Let us look at how the WeChat iOS main-thread jank monitoring system previously captured stacks. WeChat iOS checks the thread once every second. If it detects that the main thread is janking, it dumps the function call stacks of all threads to memory. In essence, in the WeChat iOS scheme the starting time of the check is fixed and the number of checks is fixed. If task 1 takes too long and causes jank but the monitoring thread only sweeps once per second, it may be task N that is found and dumped, and the critical stack of task 1 is missed. This situation does exist, but with monitoring at massive scale the jank points are still caught according to their probability distribution. Even so, it is not the best capture scheme.

Therefore, the problem before us is how to obtain a more accurate jank stack. To make the jank stack accurate, we want stacks covering a period of time rather than the stack at a single point.
We use high-frequency sampling to collect multiple stacks during the jank instead of only one stack at a single point. The advantage of this scheme is that monitoring is complete: the stacks of the whole jank process are sampled, collected, and saved.

Concretely, in the child thread's monitoring process, each time monitoring starts for a round of log output or for a frame, we start high-frequency sampling to collect the main thread's stack. When the next log line arrives or the frame ends and monitoring stops, we determine whether jank occurred (whether the elapsed time exceeds the threshold) and therefore whether to write the set of stacks held in memory to a file. In other words, each time jank occurs, we record multiple high-frequency samples of the stack covering the whole jank. This accurately records the details of the whole "crime" for reporting and analysis (how the key stack is extracted from the multiple stacks of one jank is described later).
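The sampling window described above can be sketched in plain Java. StackSampler is a hypothetical class, not the article's implementation: begin() marks the start of a message or frame, sampleOnce() is meant to be called periodically by a background thread, and end() returns the collected stacks only when the window exceeded the threshold. The sampling interval and the way the elapsed time is measured are left to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sampler: collects stacks of one thread during a monitoring window. */
class StackSampler {
    private final Thread target;    // the thread being watched, e.g. the main thread
    private final long thresholdMs; // windows longer than this count as jank
    private final List<StackTraceElement[]> samples = new ArrayList<>();

    StackSampler(Thread target, long thresholdMs) {
        this.target = target;
        this.thresholdMs = thresholdMs;
    }

    /** A message or frame starts: discard whatever the previous window collected. */
    synchronized void begin() {
        samples.clear();
    }

    /** Called at high frequency from a background thread while the window is open. */
    synchronized void sampleOnce() {
        samples.add(target.getStackTrace());
    }

    /** The window ends: keep the samples only if the elapsed time shows a jank. */
    synchronized List<StackTraceElement[]> end(long elapsedMs) {
        List<StackTraceElement[]> result;
        if (elapsedMs > thresholdMs) {
            result = new ArrayList<>(samples);
        } else {
            result = Collections.emptyList();
        }
        samples.clear();
        return result;
    }
}
```

On Android, one way to drive sampleOnce() would be a Runnable that re-posts itself on the HandlerThread's Handler at a short fixed interval. The interval trades stack coverage against overhead, and the article does not state the value it uses.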

V. How to deal with the massive volume of jank stacks?

After jank stacks are reported to the platform, the reported files must be analyzed, extracted, and clustered before they are finally displayed on the jank platform. As mentioned earlier, each time jank occurs, high-frequency sampling collects multiple stacks to describe it. As a minimum estimate, suppose 2,000 user jank files are collected per day, each file contains 10 jank events that the user encountered, and each jank has 30 sampled stacks: that is 2,000 × 10 × 30 = 600,000 stacks per day. At this rate, a month produces tens of millions of stacks, each with dozens of lines of function calls. Storing, analyzing, and displaying this much information puts considerable pressure on the system: the storage layer soon overflows, the platform cannot display such a volume of data, and developers have no way to work through so many stacks. Massive jank stacks thus became another problem we faced.

Within one jank, the stall usually occurs in a call to a single function. If we hash every stack in the sampled list and then deduplicate, there is a high probability that many of the dumped stacks map to the same hash.
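A minimal sketch of hashing and deduplicating the sampled stacks, in plain Java. StackDeduper is a hypothetical name, each stack is represented as one string of joined frames, and String.hashCode() stands in for whatever hash the platform really uses. Picking the most frequent stack as the representative of a jank is an assumption made for illustration, not something the article confirms.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical deduplication of the stacks sampled during one jank. */
class StackDeduper {
    /** Counts how many sampled stacks fall on each hash value. */
    static Map<Integer, Integer> countByHash(List<String> stacks) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (String stack : stacks) {
            counts.merge(stack.hashCode(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the stack whose hash occurs most often, or null for an empty list. */
    static String dominantStack(List<String> stacks) {
        Map<Integer, Integer> counts = countByHash(stacks);
        String best = null;
        int bestCount = 0;
        for (String stack : stacks) {
            int count = counts.get(stack.hashCode());
            if (count > bestCount) {
                best = stack;
                bestCount = count;
            }
        }
        return best;
    }
}
```

With 30 samples per jank, storing one representative stack plus its count instead of all 30 would cut the volume substantially, at the cost of dropping the less frequent stacks.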

Origin blog.csdn.net/wangzhongshun/article/details/100735952