Three small cases of Android cold start optimization

Author: Zhuo Xiuwu K

Background

To improve the App's cold start time, beyond the routine optimization of time-consuming code on the business side, some purely technical explorations are needed to shorten startup further. In this article we make further optimizations around class preloading, Retrofit, and ARouter. Judging from the test data, the benefit of each of these optimizations is limited, and together they may not add up to more than 50ms on a mid-range device. However, to optimize the cold start scenario and bring users a better experience, any optimization with a measurable benefit is worth trying.

Class preloading

The complete loading process of a class includes at least the loading, linking, and initialization phases, and for a given class this process is triggered only once per process. Therefore, for the cold start scenario, we can asynchronously load, during the startup phase, the classes that would otherwise be loaded on the main thread, so that when the original code later accesses these classes on the main thread, the class loading process is no longer triggered.
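As a minimal illustration of this idea (the executor and class-name handling here are just a sketch, not code from the article), a class can be warmed up off the main thread with Class.forName, which runs loading, linking, and initialization, whereas ClassLoader.loadClass alone stops before initialization:

import java.util.concurrent.Executors

// A minimal sketch: warm a class off the main thread so its load/link/init cost
// is not paid when the main thread first touches it.
object ClassWarmup {
    fun warmUp(className: String) {
        Executors.newSingleThreadExecutor().execute {
            try {
                // initialize = true also runs static initializers (full load + link + init);
                // ClassLoader.loadClass alone would stop before initialization.
                Class.forName(className, true, ClassWarmup::class.java.classLoader)
            } catch (t: Throwable) {
                // Preloading is best-effort; never let it break startup.
            }
        }
    }
}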

Hook ClassLoader implementation

In the Android system, class loading is implemented by PathClassLoader. Based on the parent-delegation mechanism of class loading, we can replace its default parent by hooking PathClassLoader.

First, we create a MonitorClassLoader that inherits from PathClassLoader and records time-consuming class loads inside it:

import android.os.Looper
import android.os.SystemClock
import android.util.Log
import dalvik.system.PathClassLoader

class MonitorClassLoader(
    dexPath: String,
    parent: ClassLoader,
    private val onlyMainThread: Boolean = false,
) : PathClassLoader(dexPath, parent) {

    val TAG = "MonitorClassLoader"

    override fun loadClass(name: String?, resolve: Boolean): Class<*> {
        // Optionally monitor only class loads triggered on the main thread.
        if (onlyMainThread && Looper.getMainLooper().thread != Thread.currentThread()) {
            return super.loadClass(name, resolve)
        }
        val begin = SystemClock.elapsedRealtimeNanos()
        val clazz = super.loadClass(name, resolve)
        val cost = SystemClock.elapsedRealtimeNanos() - begin
        if (cost > 1_000_000) {
            Log.e(TAG, "Loading $clazz took ${cost / 1000} microseconds, thread id ${Thread.currentThread().id}")
        } else {
            Log.d(TAG, "Loading $clazz took ${cost / 1000} microseconds, thread id ${Thread.currentThread().id}")
        }
        return clazz
    }
}

After that, in the Application's attach phase, we can use reflection to replace the parent pointer of the application instance's classLoader.

The core code is as follows:

    companion object {
        @JvmStatic
        fun hook(application: Application, onlyMainThread: Boolean = false) {
            val pathClassLoader = application.classLoader
            try {
                val monitorClassLoader = MonitorClassLoader("", pathClassLoader.parent, onlyMainThread)
                val pathListField = BaseDexClassLoader::class.java.getDeclaredField("pathList")
                pathListField.isAccessible = true
                val pathList = pathListField.get(pathClassLoader)
                pathListField.set(monitorClassLoader, pathList)

                val parentField = ClassLoader::class.java.getDeclaredField("parent")
                parentField.isAccessible = true
                parentField.set(pathClassLoader, monitorClassLoader)
            } catch (throwable: Throwable) {
                Log.e("hook", throwable.stackTraceToString())
            }
        }
    }

The main logic is:

  • Use reflection to get the pathList of the original pathClassLoader
  • Create a MonitorClassLoader and set the correct pathList on it via reflection
  • Use reflection to replace the parent of the original pathClassLoader so that it points to the MonitorClassLoader instance

In this way, we can capture the classes loaded on the main thread during the startup phase.
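As a usage sketch, assuming the hook function shown above lives in MonitorClassLoader's companion object (the Application subclass name is a placeholder):

import android.app.Application
import android.content.Context

// Sketch: install the hook as early as possible (the attach phase), so that later
// class loads are routed through MonitorClassLoader and get recorded.
class MyApplication : Application() {

    override fun attachBaseContext(base: Context) {
        super.attachBaseContext(base)
        // Record only class loads triggered on the main thread.
        MonitorClassLoader.hook(this, onlyMainThread = true)
    }
}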

Implementation based on JVMTI

In addition to the Hook ClassLoader solution, we can also monitor class loading through JVMTI.

By registering a ClassPrepare callback, we get a callback during each class's prepare phase.

Of course, this solution is much more cumbersome than hooking the ClassLoader, but many other, more powerful capabilities can be built on top of JVMTI.

Class Preloading Implementation

Applications nowadays are usually multi-module, so we can design an abstract interface that each business module implements to declare the classes it needs preloaded.

/**
 * Resource preload interface
 */
public interface PreloadDemander {
    /**
     * Configure all classes that need to be preloaded
     * @return the classes to preload
     */
    Class[] getPreloadClasses();
}
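For example, a business module might implement it like this (the module and class names below are hypothetical, not taken from the article):

// Hypothetical example: a feed module declares the classes it wants preloaded.
class FeedPreloadDemander : PreloadDemander {
    override fun getPreloadClasses(): Array<Class<*>> = arrayOf(
        // Placeholders: classes that profiling showed are loaded on the main thread at startup.
        FeedRepository::class.java,
        FeedCardParser::class.java
    )
}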

Then, during the startup phase, collect all Demander instances and trigger the preload:

import androidx.annotation.WorkerThread

/**
 * Class preload executor
 */
object ClassPreloadExecutor {

    private val demanders = mutableListOf<PreloadDemander>()

    fun addDemander(classPreloadDemander: PreloadDemander) {
        demanders.add(classPreloadDemander)
    }

    /**
     * This method shouldn't run on the main thread.
     */
    @WorkerThread
    fun doPreload() {
        val classLoader = ClassPreloadExecutor::class.java.classLoader
        for (demander in demanders) {
            demander.preloadClasses.forEach {
                // Class.forName with initialize = true triggers loading, linking and initialization.
                Class.forName(it.name, true, classLoader)
            }
        }
    }
}
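Wiring it together during startup might look like this (the thread handling is just one possible arrangement; FeedPreloadDemander is the hypothetical demander above):

import java.util.concurrent.Executors

// Sketch: register each module's demander early in startup, then run the preload
// off the main thread, as doPreload() requires.
fun schedulePreload() {
    ClassPreloadExecutor.addDemander(FeedPreloadDemander())
    Executors.newSingleThreadExecutor().execute {
        ClassPreloadExecutor.doPreload()
    }
}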

Benefits

The first version is configured with about 90 classes. Test data on our devices shows that loading these classes takes about 30 ms of CPU time. The differences in loading time between classes come mainly from class complexity, such as the inheritance hierarchy and the number of fields, and from the cost of the class initialization phase, such as the eager initialization of static member variables and the execution of static blocks.

Thoughts on improving the scheme

The class list in our current scheme comes from manual configuration. The drawback is that the list has to be maintained by developers, which is costly when versions iterate quickly; and for some large apps there are a great many AB experiments, which can also cause the set of loaded classes to differ between users.

In the previous section we showed that a custom ClassLoader can collect the list of classes loaded on the main thread during the startup phase. So can we automatically collect the loaded classes on every launch on the device, add any class that is not yet in the existing list, and preload it on the next launch? Of course, the concrete strategy needs careful design, such as capping the size of the preload list, setting a minimum loading-time threshold for classes to enter the list, an eviction strategy, and so on.
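Purely as an illustration of that direction (the cap, threshold, and merge policy below are made up for the sketch), a self-maintaining preload list could look roughly like this:

// Illustrative sketch only: merge classes observed by the monitoring ClassLoader into
// a preload list, with a size cap and a minimum-cost threshold.
object PreloadListCollector {
    private const val MAX_LIST_SIZE = 150         // made-up cap
    private const val MIN_COST_MICROS = 500L      // made-up threshold

    private val observed = LinkedHashMap<String, Long>()  // class name -> load cost (microseconds)

    // Called from MonitorClassLoader (or a JVMTI callback) for each main-thread load.
    fun onClassLoaded(className: String, costMicros: Long) {
        if (costMicros >= MIN_COST_MICROS) observed[className] = costMicros
    }

    // Merge into the persisted list at the end of startup; preload it on the next launch.
    fun mergeInto(existing: MutableSet<String>): Set<String> {
        observed.entries
            .sortedByDescending { it.value }
            .forEach { if (existing.size < MAX_LIST_SIZE) existing.add(it.key) }
        return existing
    }
}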

Retrofit ServiceMethod pre-parsing injection

Background

Retrofit is currently the most commonly used networking framework. Its annotation-based way of defining requests and its adapter design pattern greatly simplify how network requests are made. However, it does not use an APT-like approach to generate request code at compile time; instead, it parses everything at runtime.

When Retrofit.create(Class<T> service) is called, a dynamic proxy instance of the abstract interface is generated.

All function calls on the interface are forwarded to the invoke function of the dynamic proxy object, which ultimately calls loadServiceMethod(method).invoke.

In the loadServiceMethod function, various pieces of meta-information on the original function have to be parsed, including function annotations, parameter annotations, parameter types, and the return type, finally producing a ServiceMethod instance. A call to the original interface function actually ends up calling the invoke function of this generated ServiceMethod.

As can be seen from the source, ServiceMethod instances are cached, and each Method corresponds to one ServiceMethod.
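To make the locking discussed below easier to follow, here is a rough Kotlin paraphrase of that caching logic (Retrofit's real source is Java; the parsing call is replaced with a placeholder):

import java.lang.reflect.Method
import java.util.concurrent.ConcurrentHashMap

// Rough paraphrase, not Retrofit's actual code: one shared map serves as both the
// cache and the lock object.
private val serviceMethodCache = ConcurrentHashMap<Method, Any>()

private fun loadServiceMethodSketch(method: Method): Any {
    serviceMethodCache[method]?.let { return it }      // fast path: already parsed
    synchronized(serviceMethodCache) {                 // every first-time parse contends on this one lock
        serviceMethodCache[method]?.let { return it }
        val parsed = parseAnnotationsSketch(method)    // stands in for ServiceMethod.parseAnnotations
        serviceMethodCache[method] = parsed
        return parsed
    }
}

// Placeholder for Retrofit's expensive reflection-based annotation parsing.
private fun parseAnnotationsSketch(method: Method): Any = Any()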

Timing test

Here I simulate a simple service method and call archiveStat to observe the cost of the first call and of subsequent calls. Note that this call does not yet trigger a network request; it only returns a Call object.
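A minimal sketch of such a measurement, assuming a hypothetical interface that declares archiveStat (the interface, path, and return type are placeholders):

import android.os.SystemClock
import android.util.Log
import okhttp3.ResponseBody
import retrofit2.Call
import retrofit2.Retrofit
import retrofit2.http.GET

// Hypothetical service; only the method name archiveStat comes from the article.
interface StatApi {
    @GET("/archive/stat")
    fun archiveStat(): Call<ResponseBody>
}

fun measureServiceMethodCost(retrofit: Retrofit) {
    val api = retrofit.create(StatApi::class.java)
    repeat(3) { i ->
        val begin = SystemClock.elapsedRealtimeNanos()
        api.archiveStat()   // no network request yet; this only builds a Call object
        val costMicros = (SystemClock.elapsedRealtimeNanos() - begin) / 1000
        Log.d("RetrofitTiming", "call #$i took $costMicros microseconds")
    }
}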

From the test results, the first call takes about 1.7 ms, while subsequent calls only take about 50 microseconds.

Optimization

Since the first call to an interface function triggers the generation of a ServiceMethod instance, and this process is time-consuming, the optimization idea is straightforward: collect the functions that will be called during the startup phase, generate their ServiceMethod instances in advance, and write them into the cache.

serviceMethodCache itself is a ConcurrentHashMap, so it is safe for concurrent access.

However, when the source checks the ServiceMethod cache, it still uses serviceMethodCache as the lock object, which leads to lock waiting when multiple threads trigger the first calls of different Methods at the same time.

First of all, we need to understand why the lock is needed here: parseAnnotations is a time-consuming operation, and the lock turns the check-then-insert into a fully atomic operation, like putIfAbsent. In fact, the lock could use the corresponding Method as its lock object, because different Methods correspond to different ServiceMethod instances; we could modify the source implementation to avoid lock contention in this scenario.

Of course, for our optimization scenario this can be done without modifying the source, because ServiceMethod.parseAnnotations itself takes no lock; after all, it is a pure function. We can therefore call parseAnnotations on a background thread to generate a ServiceMethod instance and then write it into the Retrofit instance's serviceMethodCache via reflection. The downside is that different threads may parse the same method at the same time, but since serviceMethodCache itself is thread-safe, the worst case is one redundant parse, which has no effect on the final result.

ServiceMethod.parseAnnotations is package-private, so we can create the same package in the current project and call the function directly. The core implementation is as follows:

package retrofit2

import android.os.Build
import timber.log.Timber
import java.lang.reflect.Field
import java.lang.reflect.Method
import java.lang.reflect.Modifier

object RetrofitPreloadUtil {
    private var loadServiceMethod: Method? = null
    var initSuccess: Boolean = false
    private var serviceMethodCacheField: Field? = null

    init {
        try {
            // Prefer the known field name; fall back to scanning for a Map-typed field
            // in case the field has been renamed or obfuscated.
            serviceMethodCacheField = try {
                Retrofit::class.java.getDeclaredField("serviceMethodCache")
            } catch (e: NoSuchFieldException) {
                Retrofit::class.java.declaredFields.firstOrNull { Map::class.java.isAssignableFrom(it.type) }
            }
            serviceMethodCacheField?.isAccessible = true

            loadServiceMethod = Retrofit::class.java.getDeclaredMethod("loadServiceMethod", Method::class.java)
            loadServiceMethod?.isAccessible = true
            initSuccess = serviceMethodCacheField != null
        } catch (e: Exception) {
            initSuccess = false
        }
    }

    /**
     * Pre-parse the given methods of the target service and inject them into the
     * corresponding Retrofit instance's cache.
     */
    fun preloadClassMethods(retrofit: Retrofit, service: Class<*>, methodNames: Array<String>) {
        val field = serviceMethodCacheField ?: return
        @Suppress("UNCHECKED_CAST")
        val map = field.get(retrofit) as MutableMap<Method, ServiceMethod<Any>>

        for (declaredMethod in service.declaredMethods) {
            if (!isDefaultMethod(declaredMethod) && !Modifier.isStatic(declaredMethod.modifiers)
                && methodNames.contains(declaredMethod.name)
            ) {
                try {
                    val parsedMethod = ServiceMethod.parseAnnotations<Any>(retrofit, declaredMethod) as ServiceMethod<Any>
                    map[declaredMethod] = parsedMethod
                } catch (e: Exception) {
                    Timber.e(e, "load method $declaredMethod for class $service failed")
                }
            }
        }
    }

    private fun isDefaultMethod(method: Method): Boolean {
        return Build.VERSION.SDK_INT >= 24 && method.isDefault
    }
}
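A usage sketch on a background thread (the service interface and method names are placeholders, standing in for the list collected during startup):

import java.util.concurrent.Executors

// Sketch: pre-parse the startup-phase methods off the main thread so that the first
// real call on the main thread hits the cache instead of parsing annotations.
fun preloadRetrofitMethods(retrofit: retrofit2.Retrofit) {
    if (!retrofit2.RetrofitPreloadUtil.initSuccess) return
    Executors.newSingleThreadExecutor().execute {
        retrofit2.RetrofitPreloadUtil.preloadClassMethods(
            retrofit,
            StatApi::class.java,        // hypothetical service from the timing test above
            arrayOf("archiveStat")      // methods observed on the main thread at startup
        )
    }
}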

Preload list collection

With the optimization in place, we still need to collect the list of Retrofit ServiceMethods that are called on the main thread during the startup phase. Here bytecode instrumentation is used, with the LancetX framework doing the modification.

Currently the list is collected in advance and configured in the configuration center, and at runtime the preload follows what is written in that configuration. Other configuration schemes could also be provided, for example an annotation marking that a Retrofit function needs to be pre-parsed; all the services and functions that need preloading would then be collected at compile time to generate the corresponding list. However, that solution carries some development cost and requires changes to business-module code, and since we are still verifying the benefits, it has not been implemented yet.

Benefits

The app collects about 20 methods called during the startup phase for preloading, with an expected benefit of 10~20 ms.

ARouter

Background

The ARouter framework provides route registration, navigation, and SPI capabilities. To optimize cold start speed, some service instances can be preloaded during the startup phase so that the corresponding instance objects are created ahead of time.
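As a rough sketch of what such a warm-up could look like (assuming ARouter's by-type navigation(Class) service-discovery API; the service interface is a placeholder):

import com.alibaba.android.arouter.launcher.ARouter
import java.util.concurrent.Executors

// Sketch: fetch a provider once on a worker thread during startup so that the route
// group registration, reflective construction, and init() happen before the main
// thread needs the service. HelloService is a placeholder IProvider interface.
fun prewarmRouterServices() {
    Executors.newSingleThreadExecutor().execute {
        try {
            ARouter.getInstance().navigation(HelloService::class.java)
        } catch (t: Throwable) {
            // Warming up is best-effort; ignore failures.
        }
    }
}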

ARouter's registration information is generated in the precompilation stage (based on APT), and in the compilation stage the injection code for the mapping relationships is generated through ASM.

At runtime, take obtaining a Service instance as an example: when the navigation function is called to obtain an instance, the completion function is eventually invoked.

On the first call, the corresponding RouteMeta instance has not been generated yet, so addRouteGroupDynamic is called to register it.

addRouteGroupDynamic creates the corresponding registration class generated in the precompilation stage and calls its loadInto function to register the routes. If a business module has a lot of service registration information, this loadInto call becomes that much more time-consuming.

Overall, for obtaining a Service instance, the whole completion process involves the loadInto registration, reflective creation of the Service instance, and the init function call. The completion function is synchronized, so multi-threaded registration cannot be used to shorten startup time.

Optimization

The optimization here is similar in spirit to the Retrofit ServiceMethod registration mechanism: when different Services are registered, their corresponding meta-information classes (IRouteGroup) are different, so it is enough to lock on the corresponding IRouteGroup.

In the second half of completion, the creation of the Provider instance also needs to be locked separately, to avoid calling the init function multiple times.

Benefits

According to data collected offline, more than 20 Services are configured for preloading, with an expected benefit of 10~20 ms (on a mid-range device).
