Memory leak monitoring on Flutter

1. Introduction

Dart, the language used by Flutter, has a garbage collector. Even with garbage collection, memory leaks cannot be avoided entirely. On Android, the leak detection tool LeakCanary [1] makes it easy to detect whether the current page has leaked in a debug environment. This article walks through implementing a LeakCanary-like tool for Flutter, and describes how I used it to find two leaks in the 1.9.1 Framework.

2. Weak references in Dart

In languages with garbage collection, weak references are a good way to detect leaks. We hold the observed object through a weak reference and wait for the next full GC. If the reference is null after the GC, the object was reclaimed; if it is not null, the object may have leaked.

Dart has weak references too, in the form of Expando<T>. Take a look at its API:

class Expando<T> {
  external T operator [](Object object);
  external void operator []=(Object object, T value);
}

You may be wondering where the weak reference is in the code above. It is in the assignment expando[key] = value: Expando holds key through a weak reference, and that is where the weak reference comes from.
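As a minimal sketch of that assignment in use (the names _watched, watch, and isWatched are hypothetical and not part of the tool described in this article), attaching a marker to an object through an Expando could look like this:

```dart
// Hypothetical helper names, for illustration only.
final Expando<bool> _watched = Expando<bool>('leak-watch');

/// Marks [target] as watched. The expando holds [target] weakly,
/// so this entry alone does not keep the object alive.
void watch(Object target) {
  _watched[target] = true;
}

/// Returns true if [target] was previously passed to [watch].
bool isWatched(Object target) => _watched[target] ?? false;

void main() {
  final page = Object();
  watch(page);
  print(isWatched(page)); // true
  print(isWatched(Object())); // false
}
```

Note that reading a value back always requires the key itself, which is exactly the limitation discussed next.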

So here is the problem: Expando holds key weakly, but it does not provide an API such as getKey(), so we have no way to know whether the key object has been reclaimed.

To solve this, let's look at the concrete implementation of Expando in expando_patch.dart [2]:

@patch
class Expando<T> {
  // ...
  T operator [](Object object) {
    var mask = _size - 1;
    var idx = object._identityHashCode & mask;
    // The SDK stores the key in a _data array; wp is a _WeakProperty.
    var wp = _data[idx];

    // ... some code omitted
    return wp.value;
    // ... some code omitted
  }
}

Note: this patch code does not apply to the web platform.

We can see that the key object is stored in the _data array, wrapped in a _WeakProperty. _WeakProperty is therefore the class that matters. Here is its implementation, from weak_property.dart [3]:

@pragma("vm:entry-point")
class _WeakProperty {

  get key => _getKey();
  // ... some code omitted
  _getKey() native "WeakProperty_getKey";
  // ... some code omitted
}

This class has the key we want, which can be used to determine whether the object is still alive.

How do we get at such private fields? Dart in Flutter does not support reflection (it is disabled to reduce package size). Is there any other way to read this private field?

The answer is yes. To solve this problem, let me introduce a service that ships with Dart: the Dart VM Service.

3. Dart vm_service

The Dart VM Service [4] (vm_service below) is a set of web services provided by the Dart virtual machine. Its data transfer protocol is JSON-RPC 2.0. We do not need to implement request building and response parsing ourselves, though: the Dart team already provides an SDK for that, vm_service [5].
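For illustration only, this is roughly what a raw JSON-RPC 2.0 request would look like if you built it by hand with dart:convert. The helper name buildRequest is hypothetical; getVM is one of the methods used later in this article. In practice the vm_service package does this for you.

```dart
import 'dart:convert';

/// Hypothetical helper: encodes a JSON-RPC 2.0 request for the VM service.
String buildRequest(String id, String method,
    [Map<String, Object?> params = const {}]) {
  return jsonEncode({
    'jsonrpc': '2.0',
    'id': id,
    'method': method,
    'params': params,
  });
}

void main() {
  // Prints: {"jsonrpc":"2.0","id":"1","method":"getVM","params":{}}
  print(buildRequest('1', 'getVM'));
}
```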

The roles of ObjRef, Obj, and id

First, the core concepts of vm_service: ObjRef, Obj, and id.

The data returned by vm_service falls into two main categories: ObjRef (reference type) and Obj (object instance type). An Obj contains all the data of the corresponding ObjRef plus additional information (an ObjRef contains only basic information such as id and name).

Almost all APIs return ObjRef. When the information in an ObjRef is not enough, call getObject(...) to get the Obj.

About id: both Obj and ObjRef contain an id, which is the identifier of the object instance inside vm_service. Nearly all vm_service APIs operate on ids, for example getIsolate(isolateId), getInstances(isolateId, classId, ...), and getObject(isolateId, objectId, ...).

How to use vm_service

When it starts, vm_service opens a local WebSocket service. The service URI can be obtained on each platform:

  • Android: from FlutterJNI.getObservatoryUri();

  • iOS: from FlutterEngine.observatoryUrl.

Once we have the URI, we can use vm_service. There is an official SDK to help with this, vm_service [6]: calling its vmServiceConnectUri gives us a usable VmService object.

vmServiceConnectUri requires a ws:// URI. The URI obtained above uses the http protocol by default, so it has to be converted with the convertToWebSocketUrl method.
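A minimal sketch of the connection step follows. As an assumption that differs from the platform lookups above, it reads the http URI in-process through Service.getInfo() from dart:developer; the function name connectToVmService is hypothetical. If you obtain the URI on the Android or iOS side instead, only the conversion and the connect call apply.

```dart
import 'dart:developer' as developer;

import 'package:vm_service/utils.dart';
import 'package:vm_service/vm_service.dart';
import 'package:vm_service/vm_service_io.dart';

/// Hypothetical helper: connects to the VM service of the current process.
Future<VmService> connectToVmService() async {
  // Assumption: the URI is read via dart:developer rather than via
  // FlutterJNI.getObservatoryUri() / FlutterEngine.observatoryUrl.
  final developer.ServiceProtocolInfo info =
      await developer.Service.getInfo();
  final Uri? httpUri = info.serverUri;
  if (httpUri == null) {
    throw StateError('The VM service is not available in this build.');
  }
  // Convert the http:// URI into the ws:// URI that the SDK expects.
  final Uri wsUri = convertToWebSocketUrl(serviceProtocolUrl: httpUri);
  return vmServiceConnectUri(wsUri.toString());
}
```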

4. Implementing leak detection

With vm_service, we can make up for what Expando lacks. Based on the analysis above, we need to read Expando's private field _data. For that we can use the getObject(isolateId, objectId) [7] API. Its return value is an Instance [8], whose fields field holds all the fields of the object. We can iterate over these fields to find _data, which gives us the effect of reflection.

The question now is what the isolateId and objectId parameters are. As described in the id section above, they are the identifiers of objects inside vm_service. In other words, these two parameters can only be obtained through vm_service.

Getting the isolateId

Isolate is a very important concept in Dart. An isolate is roughly equivalent to a thread, but unlike the threads we usually deal with, isolates do not share memory with each other.

Because of this, we also need to provide the isolateId when looking up objects. The getVM() API of vm_service returns the data of the virtual machine, and its isolates field lists all the isolates of the current virtual machine.

So how do we pick the isolate we want? For simplicity, we only select the main isolate here. For more on isolate filtering, see the dev_tools [9] source code: the service_manager.dart#_initSelectedIsolate [10] function.
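The lookup can be sketched as below. This is a simplification under one stated assumption: the isolate we want is the one named "main". The filtering in dev_tools is more involved.

```dart
import 'package:vm_service/vm_service.dart';

/// Returns the id of the isolate named "main".
/// Assumption: the UI isolate of the app is the one named "main".
Future<String> findMainIsolateId(VmService service) async {
  final VM vm = await service.getVM();
  for (final IsolateRef isolate in vm.isolates ?? const <IsolateRef>[]) {
    final String? id = isolate.id;
    if (isolate.name == 'main' && id != null) {
      return id;
    }
  }
  throw StateError('No isolate named "main" was found.');
}
```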

Getting the objectId

The objectId we want is the id of the expando inside vm_service. This can be generalized into the following question:

How do we obtain the vm_service id of a given object?

This is more troublesome. vm_service has no API that converts an object instance to an id. There is a getInstances(isolateId, classId, limit) API that returns all instances of a classId, including subclass instances, but leaving aside how to obtain the desired classId, both the performance of this API and its limit parameter are worrying.

Is there no good way? In fact, we can use a top-level function of a Library (a function written directly in a file rather than in a class, such as main) to achieve this.

A quick note on what a Library is: Dart's package management is built on Libraries, and class names within one Library must be unique. In general, one .dart file is one Library, although there are exceptions such as part of and export.

vm_service has an invoke(isolateId, targetId, selector, argumentIds) [11] API that can invoke regular functions (getters, setters, constructors, and private functions are not regular functions). If targetId is the id of a Library, invoke runs one of that Library's top-level functions.

With invoke and the Library's top-level functions, we can convert an object to its id as follows:

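import 'package:vm_service/vm_service.dart';

int _key = 0;

/// Top-level function used to generate a key. It must be a regular
/// function so that vm_service can invoke it.
String generateNewKey() {
  return "${++_key}";
}

final Map<String, dynamic> _objCache = {};

/// Top-level function that returns the object cached under the given key.
dynamic keyToObj(String key) {
  return _objCache[key];
}

/// Converts an object to its vm_service id.
Future<String?> obj2Id(VmService service, dynamic obj) async {
  // Find the isolateId, as described in the isolateId section above
  // (helper not shown here).
  final String isolateId = await findMainIsolateId(service);
  // Find the current Library by iterating over the isolate's libraries
  // field and filtering by uri (helper not shown here).
  final String libraryId = await findLibraryId(service, isolateId);

  // Use vm_service to invoke generateNewKey.
  final keyRef = await service.invoke(
    isolateId,
    libraryId,
    "generateNewKey",
    // No arguments, so pass an empty list.
    [],
  ) as InstanceRef;
  // Read the String value of keyRef. valueAsString is the only API
  // that turns an ObjRef into an actual value.
  final String key = keyRef.valueAsString!;

  _objCache[key] = obj;
  try {
    // Invoke the keyToObj top-level function with the key to get obj back.
    final valueRef = await service.invoke(
      isolateId,
      libraryId,
      "keyToObj",
      // Note: vm_service expects ids here, not values.
      [keyRef.id!],
    ) as InstanceRef;
    // This id is the vm_service id of obj.
    return valueRef.id;
  } finally {
    _objCache.remove(key);
  }
}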
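The Library lookup that the code above leaves out could be sketched like this. The name findLibraryIdByUri and the libraryUri parameter are hypothetical; the caller is assumed to know the uri of the file that declares generateNewKey and keyToObj.

```dart
import 'package:vm_service/vm_service.dart';

/// Hypothetical helper: returns the id of the Library whose uri matches
/// [libraryUri], for example a package: uri of the file that declares
/// the top-level functions.
Future<String> findLibraryIdByUri(
  VmService service,
  String isolateId,
  String libraryUri,
) async {
  final Isolate isolate = await service.getIsolate(isolateId);
  for (final LibraryRef library
      in isolate.libraries ?? const <LibraryRef>[]) {
    final String? id = library.id;
    if (library.uri == libraryUri && id != null) {
      return id;
    }
  }
  throw StateError('Library $libraryUri was not found in the isolate.');
}
```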

Determining whether an object has leaked

Now that we can get the vm_service id of the expando instance, the rest is simple.

First get the Instance through vm_service and iterate over its fields to find the _data field (note that _data is an ObjRef). In the same way, convert the _data field into an Instance (_data is an array, and its Obj carries the array's element information).

Then iterate over the elements of _data. If every item is null, the key object we are observing has been released. If an item is not null, convert the item into an Instance and read its propertyKey (the item is a _WeakProperty, and Instance exposes this field specifically for _WeakProperty).
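The traversal described above might be sketched as follows. This is not the tool's actual code: the function name hasLiveKey is hypothetical, and the sketch assumes the private field is still named _data and that empty slots are reported either as null or as a Null-kind InstanceRef, which depends on the SDK version.

```dart
import 'package:vm_service/vm_service.dart';

/// Returns true if any _WeakProperty stored in the expando still has a key.
Future<bool> hasLiveKey(
  VmService service,
  String isolateId,
  String expandoId,
) async {
  final expando = await service.getObject(isolateId, expandoId) as Instance;

  // Find the private _data field among the expando's fields.
  String? dataId;
  for (final BoundField field in expando.fields ?? const <BoundField>[]) {
    final dynamic value = field.value;
    if (field.decl?.name == '_data' && value is InstanceRef) {
      dataId = value.id;
    }
  }
  if (dataId == null) return false;

  // _data is a list, so its Instance carries the elements.
  final data = await service.getObject(isolateId, dataId) as Instance;
  for (final dynamic item in data.elements ?? const <dynamic>[]) {
    // Skip empty slots.
    if (item is! InstanceRef || item.kind == InstanceKind.kNull) continue;
    final String? itemId = item.id;
    if (itemId == null) continue;
    final weakProperty =
        await service.getObject(isolateId, itemId) as Instance;
    final Object? key = weakProperty.propertyKey;
    final bool released =
        key == null || (key is InstanceRef && key.kind == InstanceKind.kNull);
    if (!released) return true;
  }
  return false;
}
```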

Forcing GC

As mentioned at the beginning of the article, to determine whether an object has leaked we need to check whether the weak reference is still alive after a full GC. Is there a way to trigger GC manually?

The answer is yes. Although vm_service has no dedicated API to force GC, Dev Tools has a GC button in the upper right corner of its memory chart, and we can simply do what it does: Dev Tools calls the vm_service API getAllocationProfile(isolateId, gc: true) [12] to trigger GC manually.

The documentation does not say whether this API triggers a full GC. In my tests it always did. If you want to be certain that leak detection runs after a full GC, you can listen to the GC event stream, which vm_service also provides.
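A minimal sketch of both pieces, with hypothetical function names: one call that requests a GC the same way Dev Tools does, and one that subscribes to the GC event stream so that you can observe what actually ran. How to tell a full GC from a partial one based on the event is left out, since it depends on the data the VM attaches to the event.

```dart
import 'package:vm_service/vm_service.dart';

/// Requests a GC by asking for the allocation profile with gc: true.
Future<void> requestGc(VmService service, String isolateId) async {
  await service.getAllocationProfile(isolateId, gc: true);
}

/// Subscribes to GC events and logs them. Call this once before
/// requestGc; subscribing twice to the same stream is an error.
Future<void> logGcEvents(VmService service) async {
  service.onGCEvent.listen((Event event) {
    print('GC event for isolate ${event.isolate?.name}');
  });
  await service.streamListen(EventStreams.kGC);
}
```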

At this point we can detect leaks and obtain the vm_service id of the leaked object. Next, we analyze how to get the leak path.

5. Getting the leak path

For the leak path, vm_service provides an API called getRetainingPath(isolateId, objectId, limit) [13]. Calling it directly returns the reference chain from the leaked object to the GC roots. Sounds simple? It is not quite enough, because there are the following pitfalls:

The Expando holding problem

If the leaked object is held by an expando when getRetainingPath runs, two problems arise:

  • Because there is only one reference chain returned by the API, the returned reference chain will pass through expando, which makes it impossible to obtain the real leaked node information;

  • A native crash occurs on ARM devices; the specific error is in UTF-8 character decoding.

The fix is simple: make sure to release the expando once the leak detection described above has finished.

The id expiration problem

Unlike the ids of Class, Library, and Isolate, the id of an Instance expires. vm_service caches such temporary ids in a circular queue whose default capacity is 8192.

Because of this, when we detect a leak we cannot just save the id of the leaked object. We need to keep the original object, and we must not hold it with a strong reference. So we still use an expando to hold the leaked object we detected, and only convert the object to an id when we need to analyze its leak path.
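Putting this section together, fetching and printing a retaining path could look like the sketch below. The function name printRetainingPath is hypothetical and the limit of 500 is an arbitrary choice. The object id should be obtained right before the call, for the expiration reason described above.

```dart
import 'package:vm_service/vm_service.dart';

/// Prints the reference chain from the object to its GC root.
Future<void> printRetainingPath(
  VmService service,
  String isolateId,
  String objectId,
) async {
  // 500 is an arbitrary upper bound on the number of nodes returned.
  final RetainingPath path =
      await service.getRetainingPath(isolateId, objectId, 500);
  print('GC root type: ${path.gcRootType}, length: ${path.length}');

  for (final RetainingObject node
      in path.elements ?? const <RetainingObject>[]) {
    final ObjRef? value = node.value;
    // For instances, show the class name; otherwise fall back to the
    // Dart type of the reference object.
    final String description = value is InstanceRef
        ? '${value.classRef?.name}'
        : '${value.runtimeType}';
    print('$description field=${node.parentField} '
        'index=${node.parentListIndex}');
  }
}
```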

6. Memory leaks in the 1.9.1 Framework

With leak detection and path retrieval done, we have a simple LeakCanary-like tool. When I tested it on Framework version 1.9.1, I found that a page leaked while I was observing it!

Judging from the objects dumped by dev_tools, the page had indeed leaked!

That is, there is a leak in the 1.9.1 Framework, and this leak retains the entire page.

Next I started investigating the cause of the leak, and ran into a problem: the leak path was too long. The chain returned by getRetainingPath had more than 300 nodes, and after an afternoon of investigation I still had not found the root cause.

Conclusion: it is hard to find the source of the problem from the raw data returned by vm_service; the leak path needs further processing.

How to shorten the reference chain

First, why is the leak path so long? Looking at the returned chain, most of the nodes are Flutter UI component nodes (widget, element, state, renderObject).

In other words, the reference chain has passed Flutter's widget tree. Developers familiar with Flutter should know that the level of Flutter's widget tree is very deep. Since the reason for the length of the reference chain is that it contains the widget tree, and the widget tree basically appears in blocks, then we only need to classify and aggregate the nodes in the reference chain according to their types to greatly shorten the leak path.

Classification

According to Flutter's component types, nodes are divided into the following types:

  • element: corresponds to Element nodes;

  • widget: corresponds to Widget nodes;

  • renderObject: corresponds to RenderObject nodes;

  • state: corresponds to State<T extends StatefulWidget> nodes;

  • collection: corresponds to nodes of collection types, for example List, Map, Set;

  • other: corresponds to other nodes.

Aggregation

After the node classification is done, the nodes of the same type can be aggregated. Here is my aggregation method:

Treat collection nodes as connectors. Adjacent nodes of the same type are merged into one group. If two groups of the same type are connected by a collection node, those two groups are merged into one as well, and so on recursively.

With classification and aggregation, the original chain of 300+ nodes can be shortened to 100+.
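The aggregation rule can be sketched as below. This is my reading of the rule rather than the tool's actual code: the classes PathNode and NodeGroup and the function aggregate are hypothetical, and the nodes are assumed to be classified already (in the real tool the type would come from the class hierarchy reported by vm_service).

```dart
enum NodeType { element, widget, renderObject, state, collection, other }

/// One node of the retaining path, already classified.
class PathNode {
  const PathNode(this.name, this.type);
  final String name;
  final NodeType type;
}

/// A run of nodes that has been merged into one entry.
class NodeGroup {
  NodeGroup(this.type, this.names);
  final NodeType type;
  final List<String> names;
}

List<NodeGroup> aggregate(List<PathNode> path) {
  // Pass 1: merge adjacent nodes of the same type.
  final groups = <NodeGroup>[];
  for (final node in path) {
    if (groups.isNotEmpty && groups.last.type == node.type) {
      groups.last.names.add(node.name);
    } else {
      groups.add(NodeGroup(node.type, [node.name]));
    }
  }
  // Pass 2: when a collection group sits between two groups of the same
  // type, fold all three into the first one, and re-check the same spot.
  var i = 1;
  while (i < groups.length - 1) {
    final previous = groups[i - 1];
    final middle = groups[i];
    final next = groups[i + 1];
    if (middle.type == NodeType.collection && previous.type == next.type) {
      previous.names
        ..addAll(middle.names)
        ..addAll(next.names);
      groups.removeRange(i, i + 2);
    } else {
      i++;
    }
  }
  return groups;
}

void main() {
  final groups = aggregate(const [
    PathNode('Element#1', NodeType.element),
    PathNode('Element#2', NodeType.element),
    PathNode('List', NodeType.collection),
    PathNode('Element#3', NodeType.element),
    PathNode('State#1', NodeType.state),
    PathNode('FocusManager', NodeType.other),
  ]);
  // Six nodes collapse into three groups: element x4, state x1, other x1.
  for (final group in groups) {
    print('${group.type.name} x${group.names.length}');
  }
}
```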

Continuing the investigation of the 1.9.1 Framework leak: with the shortened path, we can roughly see that the problem is around the FocusManager node. The exact problem is still hard to pinpoint, though, mainly for two reasons:

  • Missing code locations for the references: RetainingObject only has three fields, parentField, parentListIndex, and parentMapKey, to describe how the current object is referenced by the next one, and finding the code location from this information is inefficient;

  • No information about the Flutter component node: for example, the text of a Text, the widget an element belongs to, the lifecycle status of a state, which page the component belongs to, and so on.

Given these two pain points, the information about the leaked nodes needs to be expanded:

  • Code location: to locate the code of a reference we really only need parentField. Through vm_service we parse the class, take the matching field, and find the corresponding script information. This gives us the source code;

  • Component node information: Flutter UI components all inherit from Diagnosticable, so any node of type Diagnosticable can provide very detailed information (the component tree shown when debugging with dev_tools is obtained through Diagnosticable.debugFillProperties). In addition, the route the component belongs to should also be included, which is important for determining which page the component is on.

Troubleshoot the root cause of the 1.9.1 Framework leak

After all the optimizations above, the tool produced the following output, and I found the problem in two _InkResponseState nodes:

In the leak path, the two _InkResponseState nodes belong to different routes, which means the two nodes are on two different pages. The description of the upper _InkResponseState shows lifecycle not mounted, which means the component has been destroyed but is still referenced by FocusManager! That is the problem. Look at this part of the code:

The code clearly shows a misunderstanding of the StatefulWidget lifecycle around addListener. didChangeDependencies can be called many times while dispose is called only once, so listeners are not fully removed.
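The pattern can be reproduced in a simplified model in plain Dart. This is not the framework's code: FakeFocusManager, LeakyState, and FixedState are hypothetical stand-ins that only mimic the lifecycle calls, and the guard flag is just one possible fix.

```dart
typedef Listener = void Function();

/// Stand-in for a long-lived object that keeps a list of listeners.
class FakeFocusManager {
  final List<Listener> _listeners = <Listener>[];
  void addListener(Listener listener) => _listeners.add(listener);
  // List.remove drops only the first matching entry.
  void removeListener(Listener listener) => _listeners.remove(listener);
  int get listenerCount => _listeners.length;
}

/// Registers on every didChangeDependencies, removes once in dispose.
class LeakyState {
  LeakyState(this.manager);
  final FakeFocusManager manager;
  void _onChange() {}
  void didChangeDependencies() => manager.addListener(_onChange);
  void dispose() => manager.removeListener(_onChange);
}

/// Registers at most once, so a single removal is enough.
class FixedState {
  FixedState(this.manager);
  final FakeFocusManager manager;
  bool _registered = false;
  void _onChange() {}
  void didChangeDependencies() {
    if (_registered) return;
    manager.addListener(_onChange);
    _registered = true;
  }

  void dispose() => manager.removeListener(_onChange);
}

void main() {
  final leakyManager = FakeFocusManager();
  final leaky = LeakyState(leakyManager);
  leaky.didChangeDependencies();
  leaky.didChangeDependencies();
  leaky.dispose();
  print(leakyManager.listenerCount); // 1: the state is still referenced

  final fixedManager = FakeFocusManager();
  final fixed = FixedState(fixedManager);
  fixed.didChangeDependencies();
  fixed.didChangeDependencies();
  fixed.dispose();
  print(fixedManager.listenerCount); // 0
}
```

The leftover listener is a closure over the state object, so the long-lived manager keeps the destroyed state, and everything the state references, alive.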

After fixing this leak, another one turned up. Investigation showed that its source was in TransitionRoute:

When a new page is opened, the Route of the new page (nextRoute in the code) is held by the animation of the previous page. If every page navigation uses a TransitionRoute, then all Routes will leak!

The good news is that the above leaks were all fixed after version 1.12.

After fixing both leaks I tested again: Route and Widget can now be reclaimed. This concludes the investigation of the 1.9.1 Framework.


Author: Qi Gengxin

Currently working in the Flutter team of the Kuaishou application development platform group, responsible for the development and research of the APM direction. I have been exposed to Flutter since 2018 and have a lot of experience in Flutter hybrid stack, engineering landing, UI components, etc.

Contact: [email protected]


"Flutter Chinese Community Tutorial" is contributed by community developers, and the content is simultaneously published to the flutter.cn website and various social platforms of the "Flutter Community". During the internal test of this project, submissions will be open after the preparation is completed. Please click "Read the original text" to view the links contained in the corner of the text.


Origin blog.csdn.net/weixin_43459071/article/details/106821683