Android Crash Optimization for High-Quality Development

I have been working on this series of interview topics for a long time. For readers who are preparing for interviews and have little idea where to start, today's article gathers the material on crash optimization. I will keep updating the series; if you find it useful, please follow.

Foreword

What should an app developer do when the app crashes? Many people would say: follow the log, find the code that crashed, catch the exception, and every Java crash is "digested". Whether the program then misbehaves in other ways is left to fate. This approach does count as a solution in an emergency, but what is the real cause of the crash? Has the root cause actually been solved?

First, crashes

Crash rate is a basic indicator of application quality. How, then, can we measure it objectively, and how should we think about crashes in relation to stability?

Android has two kinds of crashes:

  • Java crash
  • Native crash

Simply put, a Java crash occurs when an uncaught exception in Java code causes the program to exit unexpectedly. A Native crash generally occurs because Native code accesses an illegal address, has an address alignment problem, or actively calls abort; each of these raises a corresponding signal that causes the program to exit unexpectedly.
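
As a minimal sketch of the Java side, the snippet below installs a process-wide handler for uncaught exceptions using the standard `Thread.setDefaultUncaughtExceptionHandler` API. The class name `CrashCatcher` and the `LAST` field are hypothetical and exist only for illustration; a real crash SDK would also persist the stack and the site information described later.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Sketch: observe uncaught Java exceptions process-wide (names are hypothetical). */
final class CrashCatcher {
    /** Last uncaught exception seen, kept only so the example can be inspected. */
    static final AtomicReference<Throwable> LAST = new AtomicReference<>();

    /** Installs a default handler that records the exception, then delegates. */
    static void install() {
        final Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            LAST.set(error);
            System.err.println("uncaught in " + thread.getName() + ": " + error);
            // Chain to the previously installed handler, if any, so the crash
            // is still reported through the normal path and not swallowed.
            if (previous != null) {
                previous.uncaughtException(thread, error);
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker-1");
        worker.start();
        worker.join();
        System.out.println("recorded: " + LAST.get());
    }
}
```

Chaining to the previous handler is a deliberate choice here: the handler observes the crash without hiding it.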

1.1 Collecting crashes

A "crash" means the program has hit an exception, and a product's crash rate depends heavily on how we capture and handle these exceptions. Many small and medium-sized companies can choose a third-party service. A variety of platforms are flourishing at present, including Alibaba's Umeng, Tencent Bugly, NetEase Cloud Catch, Google's Firebase, and so on. Learn to make use of them!

1.2 ANR

Is the crash rate fully equivalent to the stability of the application? Certainly not. Besides crashes, we often run into ANR (Application Not Responding).

When an ANR occurs, the system pops up a dialog that interrupts the user's operation, which users find intolerable.

How to detect ANR:
Use FileObserver to monitor changes to /data/anr/traces.txt. Unfortunately, on many newer ROM versions the app no longer has permission to read this file. In that case you may have to look for other routes: overseas you can rely on Google Play services, and in China WeChat uses the Hardcoder framework to obtain greater permissions from vendors. (Hardcoder is a communication framework implemented independently of the Android system; it lets an app "talk" to the ROM vendor in real time, with the goal of making full use of system resource scheduling to improve app speed and quality, and in turn everyone's phone experience.) You can also root the phone and then read the traces.txt file.

1.3 Application exit

Besides ordinary crashes, some other situations also cause the application to exit unexpectedly, for example:

  • Active suicide. Process.killProcess(), exit(), etc.
  • Crash. A Java or Native crash occurred.
  • System restart. A system fault, power loss, a user-initiated restart, and so on. We can detect this by checking whether the system uptime is less than the previously recorded value.
  • Killed by the system. Killed by the low memory killer, swiped away from the system task manager, etc.
  • ANR

We can set a flag when the application starts and update it after an active suicide or a crash. On the next start, checking this flag tells us whether an abnormal exit occurred during the previous run. Of the five exit scenarios above, we exclude active suicide and crashes (crashes are counted separately) and aim to monitor the remaining three as abnormal exits. In theory, this mechanism can reach 100% coverage.
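
The flag mechanism might be sketched as follows, assuming the flag is kept in a small file. `ExitMarker`, its method names, and the stored values are hypothetical; on Android the flag could equally live in SharedPreferences.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: detect whether the previous run ended without a known reason. */
final class ExitMarker {
    private static final String RUNNING = "RUNNING";
    private final Path flagFile;

    ExitMarker(Path flagFile) {
        this.flagFile = flagFile;
    }

    /**
     * Call once at startup. Returns true if the previous run left the flag at
     * RUNNING, meaning it ended without a recorded suicide or crash.
     */
    boolean startAndCheckPrevious() {
        try {
            boolean abnormal = Files.exists(flagFile) && RUNNING.equals(read());
            write(RUNNING);
            return abnormal;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call before an active exit, or from the crash handler. */
    void markKnownExit(String reason) {
        try {
            write(reason);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private String read() throws IOException {
        return new String(Files.readAllBytes(flagFile), StandardCharsets.UTF_8).trim();
    }

    private void write(String value) throws IOException {
        Files.write(flagFile, value.getBytes(StandardCharsets.UTF_8));
    }
}
```

If a run starts and finds the flag still at RUNNING, the previous run was ended by something the app could not observe, which is exactly the set of cases this section wants to count.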

Detecting abnormal exits in this way reflects problems that cannot be captured normally, such as ANR, the low memory killer, forced kills by the system, device freezes, and power loss. There will of course be some false positives, such as the user swiping the application away from the system task manager. With large volumes of production data, it can still help us discover hidden problems in the code.

Depending on whether the application is in the foreground or the background, abnormal exits can be divided into foreground abnormal exits and background abnormal exits. "Killed by the system" is the main cause of background abnormal exits. We naturally care more about foreground abnormal exits, which correlate more strongly with ANR, OOM, and similar problems.

Second, handling crashes

Our daily work throws up all kinds of difficult problems, and crashes are among the more common. Solving problems takes experience: the more cases we have solved, the more skilled our analysis and the faster and more accurate our diagnosis. There are, of course, also standard routines. For example, what information should we note at the "crime scene"? How do we find more "witnesses" and "clues"? What is the general process for "investigating a case"? Which method of investigation suits which type of "case"?

Believe that "there is always only one truth"; crashes are not frightening.

2.1 Crash site

The crash site is our "first scene" and retains many valuable clues. The more information we can dig out here, the clearer the direction for further analysis, instead of relying on guesswork.

Crash information

From the basic information about a crash we can make a preliminary judgment. Process name and thread name: was the crashed process a foreground or background process, and did the crash happen on the UI thread?

Crash stack and type. Is the crash a Java crash, a Native crash, or an ANR? Each type calls for attention to different points. Look especially at the top of the stack to see whether the crash happened in system code or in the app's own code.

Keywords: FATAL
 FATAL EXCEPTION: main
 Process: com.cchip.csmart, PID: 27456
 java.lang.NullPointerException: Attempt to invoke virtual method 'void android.widget.TextView.setText(int)' on a null object reference
    at com.cchip.alicsmart.activity.SplashActivity$1.handleMessage(SplashActivity.java:67)
    at android.os.Handler.dispatchMessage(Handler.java:102)
    at android.os.Looper.loop(Looper.java:179)
    at android.app.ActivityThread.main(ActivityThread.java:5672)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:674)

System information

System information sometimes carries key clues that are a great help in solving the problem.

Logcat. This includes the run logs of the application and the system. Because of system permission restrictions, the Logcat we obtain may contain only entries related to the current app. The event logcat records some basic events of the app's execution; its tags are defined in the file /system/etc/event-log-tags.

//system logcat:
10-25 17:13:47.788 21430 21430 D dalvikvm: Trying to load lib ... 

//event logcat:
10-25 17:13:47.788 21430 21430 I am_on_resume_called: lifecycle
10-25 17:13:47.788 21430 21430 I am_low_memory: system is low on memory
10-25 17:13:47.788 21430 21430 I am_destroy_activity: Activity destroyed
10-25 17:13:47.888 21430 21430 I am_anr: ANR and its reason
10-25 17:13:47.888 21430 21430 I am_kill: app killed and the reason
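
To pick such events out of a collected log, each line can be split into its fields. The sketch below assumes the lines follow the layout shown in the samples above (date, time, pid, tid, level, tag, message). `LogcatLine` and `isStabilityEvent` are hypothetical names, and the three tags checked are simply the ones from the sample.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse one logcat line laid out as "date time pid tid level tag: message". */
final class LogcatLine {
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s+(\\d+)\\s+(\\d+)"
            + "\\s+([VDIWEF])\\s+([^:]+?)\\s*:\\s?(.*)$");

    final String date;
    final String time;
    final int pid;
    final int tid;
    final char level;
    final String tag;
    final String message;

    private LogcatLine(Matcher m) {
        date = m.group(1);
        time = m.group(2);
        pid = Integer.parseInt(m.group(3));
        tid = Integer.parseInt(m.group(4));
        level = m.group(5).charAt(0);
        tag = m.group(6);
        message = m.group(7);
    }

    /** Returns the parsed line, or empty if the text does not match the layout. */
    static Optional<LogcatLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? Optional.of(new LogcatLine(m)) : Optional.empty();
    }

    /** True for the event tags from the sample that hint at stability problems. */
    boolean isStabilityEvent() {
        return tag.equals("am_anr") || tag.equals("am_kill") || tag.equals("am_low_memory");
    }
}
```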

Device model, OS version, manufacturer, CPU, ABI, Linux version, and so on. We collect as many as several dozen dimensions, which helps later when looking for what the crashes have in common.

Memory Information

OOM, ANR, virtual memory exhaustion: a great many crashes are directly related to memory. If we split users into those with "2GB of memory or less" and those with "more than 2GB", we find that the crash rate of the "2GB or less" users is several times that of the "more than 2GB" users.

System free memory. The state of system memory can be read directly from the file /proc/meminfo. When the system's available memory is very low (less than 10% of MemTotal), OOM, heavy GC, frequent process kills and restarts by the system, and similar problems become very likely.
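
A minimal sketch of this check is shown below. It parses text in the `/proc/meminfo` layout and applies the 10% threshold mentioned above. `MemInfo` and its methods are hypothetical names, and falling back from `MemAvailable` to `MemFree` on kernels that do not report `MemAvailable` is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: read /proc/meminfo style text and flag low system memory. */
final class MemInfo {
    /** Parses lines such as "MemTotal:  3882464 kB" into a field-to-value map. */
    static Map<String, Long> parseKb(List<String> lines) {
        Map<String, Long> out = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                out.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException ignored) {
                // Not a numeric field; skip it.
            }
        }
        return out;
    }

    /** True when available memory is below 10% of MemTotal. */
    static boolean isLow(Map<String, Long> kb) {
        Long total = kb.get("MemTotal");
        Long available = kb.containsKey("MemAvailable")
                ? kb.get("MemAvailable")
                : kb.get("MemFree"); // assumed fallback for older kernels
        if (total == null || available == null || total <= 0) {
            return false;
        }
        return available * 10 < total;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (Files.isReadable(meminfo)) {
            System.out.println("low memory: " + isLow(parseKb(Files.readAllLines(meminfo))));
        }
    }
}
```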

Application memory usage. This includes Java memory, RSS (Resident Set Size), and PSS (Proportional Set Size), from which we can work out the size and distribution of the application's own memory. PSS and RSS are computed from /proc/self/smaps, which also allows finer-grained statistics by category, such as apk, dex, and so files.

Virtual memory. The virtual memory size can be obtained from /proc/self/status, and its detailed layout from the /proc/self/maps file. We usually pay little attention to virtual memory, but many problems such as OOM and tgkill are caused by a shortage of virtual memory.

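Name:     com.xmamiga.name   // process name
FDSize:   800               // number of file handles allocated by the process
VmPeak:   3004628 kB        // peak virtual memory size of the process
VmSize:   2997032 kB        // current virtual memory size of the process
Threads:  600               // number of threads in the process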

In general, for a 32-bit process on a 32-bit CPU, virtual memory usage reaching 3GB may cause memory allocation failures. On a 64-bit CPU, the virtual memory available to it is generally between 3 and 4GB. Of course, if we ship a 64-bit process, virtual memory is no longer a problem. Google Play has required 64-bit support since August 2019. In China, although more than 90% of devices already support 64-bit, the app stores do not support distributing separate builds by CPU architecture, so adoption will take longer.
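
The fields shown in the status sample can be read programmatically and compared with the address-space limit. The sketch below is one way to do it; `ProcStatus` and `headroomKb` are hypothetical names, and the 3GB limit passed in `main` is the 32-bit figure from the paragraph above, not a value read from the system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Sketch: pull the virtual memory and thread fields out of /proc/self/status text. */
final class ProcStatus {
    long vmPeakKb = -1;
    long vmSizeKb = -1;
    int threads = -1;
    int fdSize = -1;

    static ProcStatus parse(List<String> lines) {
        ProcStatus status = new ProcStatus();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            // The first token after the colon is the number; a unit may follow.
            String value = line.substring(colon + 1).trim().split("\\s+")[0];
            switch (key) {
                case "VmPeak":  status.vmPeakKb = Long.parseLong(value); break;
                case "VmSize":  status.vmSizeKb = Long.parseLong(value); break;
                case "Threads": status.threads = Integer.parseInt(value); break;
                case "FDSize":  status.fdSize = Integer.parseInt(value); break;
                default: break;
            }
        }
        return status;
    }

    /** Remaining virtual address space, in kB, under the given limit. */
    long headroomKb(long addressSpaceLimitKb) {
        return addressSpaceLimitKb - vmSizeKb;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("/proc/self/status");
        if (Files.isReadable(file)) {
            ProcStatus status = parse(Files.readAllLines(file));
            long threeGbInKb = 3L * 1024 * 1024; // 32-bit figure from the text
            System.out.println("VmSize=" + status.vmSizeKb + " kB, threads=" + status.threads
                    + ", headroom=" + status.headroomKb(threeGbInKb) + " kB");
        }
    }
}
```

With the sample values above, a VmSize of 2997032 kB leaves under 150MB of headroom below 3GB, which is why such a process is close to allocation failures.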

Resource Information

Sometimes the application heap and the device memory are both ample, yet memory allocation still fails. This is likely related to a resource leak.

File handles (fd). The file handle limit can be read from /proc/self/limits; a single process is generally allowed to open at most 1024 file handles. A count above 800 is already dangerous: we should write every fd and its corresponding file name to the log and investigate further whether there is a file or thread leak.

opened files count 812:
0 -> /dev/null
1 -> /dev/log/main4 
2 -> /dev/binder
3 -> /data/data/com.xmamiga.sample/files/test.config
...
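
A listing like the one above could be produced by walking `/proc/self/fd` and resolving each symbolic link. This is a minimal sketch, assuming a Linux-style proc filesystem; `FdDump` is a hypothetical name and 800 is the danger threshold suggested in the text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: list open file descriptors and the files they point to. */
final class FdDump {
    static final int DANGER_THRESHOLD = 800;

    /** Returns one "fd -> target" entry per item in the given fd directory. */
    static List<String> list(Path fdDir) {
        List<String> out = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
            for (Path entry : entries) {
                String target;
                try {
                    target = Files.readSymbolicLink(entry).toString();
                } catch (IOException | UnsupportedOperationException e) {
                    target = "?"; // fd closed while iterating, or not a link
                }
                out.add(entry.getFileName() + " -> " + target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        Path fdDir = Paths.get("/proc/self/fd");
        if (Files.isDirectory(fdDir)) {
            List<String> fds = list(fdDir);
            System.out.println("opened files count " + fds.size() + ":");
            if (fds.size() > DANGER_THRESHOLD) {
                fds.forEach(System.out::println); // dump everything for leak hunting
            }
        }
    }
}
```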

Threads. The current thread count can be obtained from the status file above. A thread may occupy 2MB of virtual memory, so too many threads put pressure on virtual memory and file handles. In my experience, more than 400 threads is dangerous. We should write every thread id and thread name to the log and investigate further whether there is a thread-related problem.

 threads count 412:
 1820 com.xmamiga.crashsdk
 1844 ReferenceQueueD
 1869 FinalizerDaemon
 ...
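
From Java, a similar listing can be approximated with `Thread.getAllStackTraces()`. Note the assumptions: this sketch sees only threads known to the Java runtime and prints Java thread ids, not the kernel thread ids shown in the sample above. `ThreadDump` is a hypothetical name and 400 is the threshold suggested in the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: list the ids and names of all live Java threads. */
final class ThreadDump {
    static final int DANGER_THRESHOLD = 400;

    /** Returns one "id name" entry per live thread, sorted as text. */
    static List<String> list() {
        List<String> out = new ArrayList<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            out.add(thread.getId() + " " + thread.getName());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<String> threads = list();
        System.out.println("threads count " + threads.size() + ":");
        if (threads.size() > DANGER_THRESHOLD) {
            threads.forEach(System.out::println); // dump all names for leak hunting
        }
    }
}
```

Many entries sharing one name prefix usually point at a pool or component that keeps creating threads.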

JNI. When using JNI, carelessness easily leads to crashes such as invalid references and reference table overflow.

Application Information

Beyond the system, the application knows itself best and can record a lot of relevant information. Crash scene: which Activity or Fragment the crash occurred in, and in which business flow. Key operation path: unlike the detailed tracking logs used during development, we record only the user's critical operation path, which is a great help in reproducing the crash. Other custom information: different applications may care about different things.
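
One simple way to keep the key operation path is a small bounded buffer holding the most recent steps, attached to the crash report when a crash occurs. The sketch below uses hypothetical names (`OperationPath`, `record`, `snapshot`), and the capacity is an arbitrary choice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch: keep only the most recent user operations for the crash report. */
final class OperationPath {
    private final int capacity;
    private final Deque<String> steps = new ArrayDeque<>();

    OperationPath(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one step, dropping the oldest when the buffer is full. */
    synchronized void record(String step) {
        if (steps.size() == capacity) {
            steps.removeFirst();
        }
        steps.addLast(step);
    }

    /** Returns the recorded steps, oldest first. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(steps);
    }

    public static void main(String[] args) {
        OperationPath path = new OperationPath(3);
        path.record("open HomeActivity");
        path.record("tap search");
        path.record("open DetailActivity");
        path.record("tap play");
        System.out.println(path.snapshot());
    }
}
```

Bounding the buffer keeps the cost constant no matter how long the session runs.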

2.2 Crash analysis

With this much information from the scene, the real "detective work" can begin. Most "cases" can be solved as long as we are willing to put in the effort. Do not be afraid of problems: patient, careful analysis will always turn up an anomaly or a key point, and we should dare to doubt and to verify.

Step one: Determine the focus

Determining the focus means finding the important information in the logs and forming a rough judgment of the problem. In general, I recommend paying attention to the following points in this step.

  • Confirm severity. Fixing crashes is also a matter of cost-effectiveness: we prioritize the top crashes and those with a significant business impact, such as crashes in main features. Do not spend days on a crash in some corner feature that may well be removed in the next version.
  • Basic crash information. Determine the exception type and the crash description to get a rough judgment of the crash.
    In general, most simple crashes can be resolved by the end of this step.

Java crash. The type of a Java crash is fairly obvious: NullPointerException means a null pointer, and OutOfMemoryError means insufficient resources, in which case we need to look further at the "Memory Information" and "Resource Information" logs.

Native crash. We need to look at the signal, code, fault addr, and so on, as well as the Java stack at the time of the crash. For the meaning of each signal, see an introduction to crash signals. The most common are SIGSEGV and SIGABRT: the former is generally caused by a null or illegal pointer, the latter mainly by ANR or by a call to abort().

ANR. First look at the main thread's stack to see whether it is waiting on a lock. Then look at the iowait, CPU, GC, system server, and other information in the ANR log to determine whether the cause is an I/O problem, CPU contention, or a stall caused by heavy GC.

Step two: Find commonalities

If the methods above still fail to locate the problem, we can try to find what crashes of this kind have in common. Once the commonalities are found, the differences stand out, which brings us a step closer to locating the problem.

Device model, OS version, ROM, manufacturer, and ABI: all the system information collected can be used as aggregation dimensions. For example, does the problem occur only on x86 phones, only on Samsung models, or only on Android 8.0? Application information can also be used as aggregation dimensions, such as the link being opened, the video being played, the country or region, and so on.

Finding what the crashes have in common gives clearer guidance for reproducing the problem in the next step.
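
Aggregating by a dimension amounts to counting reports per value of that dimension. The sketch below assumes each crash report is a map from dimension name to value; `CrashAggregator`, the dimension names, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: count crash reports per value of one dimension. */
final class CrashAggregator {
    /** Returns value-to-count for the given dimension, sorted by value. */
    static Map<String, Long> countBy(List<Map<String, String>> reports, String dimension) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map<String, String> report : reports) {
            String value = report.getOrDefault(dimension, "unknown");
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }

    private static Map<String, String> report(String abi, String osVersion) {
        Map<String, String> fields = new HashMap<>();
        fields.put("abi", abi);
        fields.put("os", osVersion);
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, String>> reports = new ArrayList<>();
        reports.add(report("x86", "8.0"));
        reports.add(report("x86", "7.0"));
        reports.add(report("arm64-v8a", "8.0"));
        System.out.println("by abi: " + countBy(reports, "abi"));
        System.out.println("by os:  " + countBy(reports, "os"));
    }
}
```

A dimension where one value holds nearly all the reports of a given crash is a strong candidate for the commonality we are looking for.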

Step three: Try to reproduce

If we already have a rough idea of the cause of the crash, we need to try to reproduce it to confirm further details. If we have no idea at all, we should still try to reproduce it by following the user's operation path and then analyze the cause.

"As long as I can reproduce it locally, I can fix it" is, I believe, something many developers say to testers. The confidence comes from having a stable reproduction path: with one, we can add logs or use a debugger, GDB, and other tools for further analysis.

We may run into all kinds of strange problems. For example, a manufacturer has modified the underlying implementation, or a new Android version has changed an implementation. We then need to search Google and read the source code, and sometimes pull the manufacturer's ROM or flash a ROM manually. Many difficult problems require us to endure loneliness and to guess, verify, and rethink again and again. Still, we must weigh how serious the problem really is, so as not to be penny wise and pound foolish.

2.3 System crashes

System crashes often leave us feeling helpless. The cause may be a bug in some Android version or a vendor's ROM modification. In such cases the crash stack may contain none of our own code, which makes the problem hard to locate directly. What we can do:

  • Look for possible causes. Using the commonality classification above, check whether the problem belongs to a particular system version or a particular vendor ROM. Although the crash log may contain none of our own code, the operation path and logs can point to suspicious spots.

  • Try to avoid it. Review the suspicious code calls: are we using an inappropriate API, and can it be replaced with another implementation that avoids the problem?

  • Solve it with a Hook. This divides into Java Hook and Native Hook. A problem may appear only on Android 7.0; following the approach of Android 8.0, we can catch the exception directly.
    If we do all of the above, we should be able to solve or avoid most crashes, including most system crashes. There will always be some difficult problems that depend on the user's real environment; for these we need capabilities similar to dynamic tracing and debugging.

Third, summary

The fight against crashes is a long process. We should prevent crashes as early as possible and nip them in the bud. As engineers, we should not blindly chase the crash rate number; user experience comes first, and forcibly covering up problems tends to be counterproductive. We should not casually use try/catch to hide the real problem, but start from the source, understand the root cause of the crash, and make sure the subsequent flow runs correctly. When fixing a crash we should also go from the single point to the whole: not just fix this one crash, but consider how to solve and prevent this whole class of crashes.

Origin www.cnblogs.com/Androidmm/p/11357900.html