Crash caused by BinderProxy leak

Problem Description

After the National Day holiday, I officially handed in my resignation. Until the handover was finished, I mostly slacked off or helped colleagues analyze some of the more serious bugs on Jira. The in-vehicle project my colleagues were responsible for had already entered small-batch trial production, and the intensity of the Monkey test was being ramped up. Sure enough, something went wrong: a service behind one of the vehicle's core functions crashed almost every time under high-intensity Monkey testing.

The faulty service handles communication for the vehicle's event-tracking data (the "data buried points"). Dozens of applications and services in the car report their tracking data to the "T-box" through this service. The IPC is implemented with classic AIDL.

The complete log is as follows:

01-01 13:36:45.936     0     0 I binder  : release 1684:1718 transaction 57988729 in, still active
01-01 13:36:45.936     0     0 I binder  : send failed reply for transaction 57988729 to 3135:3254
09-30 21:53:02.514  1684  1718 E AndroidRuntime: FATAL EXCEPTION: Binder:1684_2
09-30 21:53:02.514  1684  1718 E AndroidRuntime: Process: com.xxx.xxx.xxxservice, PID: 1684
09-30 21:53:02.514  1684  1718 E AndroidRuntime: java.lang.AssertionError: Binder ProxyMap has too many entries: 20691 (total), 20691 (uncleared), 20691 (uncleared after GC). BinderProxy leak?
09-30 21:53:02.514  1684  1718 E AndroidRuntime:         at android.os.BinderProxy$ProxyMap.set(BinderProxy.java:230)
09-30 21:53:02.514  1684  1718 E AndroidRuntime:         at android.os.BinderProxy.getInstance(BinderProxy.java:432)
09-30 21:53:02.514  1684  1718 E AndroidRuntime:         at android.os.Parcel.nativeReadStrongBinder(Native Method)
09-30 21:53:02.514  1684  1718 E AndroidRuntime:         at android.os.Parcel.readStrongBinder(Parcel.java:2483)
09-30 21:53:02.514  1684  1718 E AndroidRuntime:         at com.xxx.xxx.xxxservice.diagnostics.IDiagnosticsEvent$Stub.onTransact(IDiagnosticsEvent.java:121)
09-30 21:53:02.514  1684  1718 E AndroidRuntime:         at android.os.Binder.execTransactInternal(Binder.java:1154)
09-30 21:53:02.514  1684  1718 E AndroidRuntime:         at android.os.Binder.execTransact(Binder.java:1123)
09-30 21:53:02.221   754   754 I chatty  : uid=1000(system) Binder:754_3 identical 1 line
09-30 21:53:02.222   754   754 D NotificationService: 0|com.xxx.xxx.persontime|124|null|1000: updating permissions
09-30 21:53:02.521   754  4583 I DropBoxManagerService: add tag=system_app_crash isTagEnabled=true flags=0x2
09-30 21:53:02.523   754   962 I PackageWatchdog: Updated health check state for package com.xxx.xxx.xxxservice: INACTIVE -> INACTIVE
09-30 21:53:02.524   754  3061 W ActivityManager: Force-killing crashed app com.xxx.xxx.xxxservice at watcher's request
09-30 21:53:02.526  1684  1718 I Process : Sending signal. PID: 1684 SIG: 9

Cause Analysis

The problem is obvious from the log: Binder, the inter-process communication mechanism, is leaking BinderProxy objects.

java.lang.AssertionError: Binder ProxyMap has too many entries: 20691 (total), 20691 (uncleared), 20691 (uncleared after GC). BinderProxy leak?
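
To make the three numbers in this message concrete, here is a toy model in plain Java of a table that tracks its entries through weak references. It is not the AOSP BinderProxy code: the class name ToyProxyMap, its methods and the limit of 1000 are all invented for illustration. It only sketches why a table whose entries are all still strongly referenced elsewhere reports the same value for total, uncleared and uncleared after GC.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the real android.os.BinderProxy.ProxyMap.
class ToyProxyMap {
    // Hypothetical limit, chosen small for the demo.
    static final int CRASH_AT_SIZE = 1000;

    private final List<WeakReference<Object>> entries = new ArrayList<>();

    // Counts entries whose referent has not been garbage collected yet.
    int unclearedCount() {
        int n = 0;
        for (WeakReference<Object> ref : entries) {
            if (ref.get() != null) {
                n++;
            }
        }
        return n;
    }

    void add(Object proxy) {
        if (entries.size() >= CRASH_AT_SIZE) {
            int total = entries.size();
            int uncleared = unclearedCount();
            System.gc(); // only a hint; strongly reachable objects survive it anyway
            int unclearedAfterGc = unclearedCount();
            throw new AssertionError("Too many entries: " + total + " (total), "
                    + uncleared + " (uncleared), "
                    + unclearedAfterGc + " (uncleared after GC). Leak?");
        }
        entries.add(new WeakReference<>(proxy));
    }

    public static void main(String[] args) {
        ToyProxyMap map = new ToyProxyMap();
        // Stands in for listeners that are stored somewhere and never released.
        List<Object> leaked = new ArrayList<>();
        try {
            for (int i = 0; i <= CRASH_AT_SIZE; i++) {
                Object proxy = new Object();
                leaked.add(proxy); // the strong reference keeps the proxy alive
                map.add(proxy);
            }
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
        System.out.println("still held: " + leaked.size());
    }
}
```

If the proxies were only weakly reachable, the count after GC would normally drop; three identical numbers suggest that something still holds every proxy.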

After the uncaught exception, the process is killed with signal 9 (SIGKILL). Note that the "Sending signal" line below is logged by PID 1684 itself, so the kill is issued from user space as part of Android's crash handling rather than by the Linux kernel.

Signals are also a very useful entry point for log analysis. If a process dies without any crash stack, check whether it received a signal and trace backwards from there step by step; this often turns up unexpected findings.

09-30 21:53:02.526  1684  1718 I Process : Sending signal. PID: 1684 SIG: 9
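
When the log is long, lines like the one above can be picked out mechanically. The following is a minimal sketch in plain Java, assuming the log has already been read into a list of strings; the class name SignalLogScanner and the "pid:signal" output format are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignalLogScanner {
    // Matches the message part of lines such as:
    // "... I Process : Sending signal. PID: 1684 SIG: 9"
    private static final Pattern SIGNAL =
            Pattern.compile("Sending signal\\. PID: (\\d+) SIG: (\\d+)");

    // Returns one "pid:signal" string for every matching log line.
    static List<String> findSignals(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SIGNAL.matcher(line);
            if (m.find()) {
                hits.add(m.group(1) + ":" + m.group(2));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "09-30 21:53:02.524   754  3061 W ActivityManager: Force-killing crashed app com.xxx.xxx.xxxservice at watcher's request",
                "09-30 21:53:02.526  1684  1718 I Process : Sending signal. PID: 1684 SIG: 9");
        System.out.println(findSignals(lines)); // prints [1684:9]
    }
}
```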

Reading further, the stack trace shows that the code that actually triggers the Binder leak is at line 121 of IDiagnosticsEvent.java:

 AndroidRuntime:         at com.xxx.xxx.xxxservice.diagnostics.IDiagnosticsEvent$Stub.onTransact(IDiagnosticsEvent.java:121)

IDiagnosticsEvent.java is a class generated when the AIDL interface is compiled. It is located at:

<project MODULE>/build/generated/aidl_source_output_dir/debug/out/com/xxx/xxx/xxxservice/xxx/

case TRANSACTION_setSendListener:
{
  data.enforceInterface(descriptor);
  com.xxx.xxx.xxxservice.diagnostics.ISendEventListener _arg0;
  _arg0 = com.xxx.xxx.xxxservice.diagnostics.ISendEventListener.Stub.asInterface(data.readStrongBinder());
  com.xxx.xxx.xxxservice.diagnostics.IDiagnosticsEvent _result = this.setSendListener(_arg0);
  reply.writeNoException();
  reply.writeStrongBinder((((_result!=null))?(_result.asBinder()):(null)));
  return true;
}

Line 121 is _arg0 = com.xxx.xxx.xxxservice.diagnostics.ISendEventListener.Stub.asInterface(data.readStrongBinder()); In other words, the ISendEventListener proxies read here are never released, so we need to look at what setSendListener() does with them.

setSendListener() is a member method of DiagnosticsEvent, and DiagnosticsEvent is the concrete implementation of the Binder Stub.

// DiagnosticsEvent
@Override
public IDiagnosticsEvent setSendListener(ISendEventListener listener) throws RemoteException {
    mListener = listener;
    mEvent.SetCallback(new DiagnosticsListener());
    return this;
}

Next, let's look at the DiagnosticsListener class.

It turns out that the problem lies in this class. DiagnosticsListener is also a Binder Stub implementation, but as a non-static inner class it implicitly holds a reference to its enclosing object and, through that object's mListener field, to the ISendEventListener proxy.

At this point the problem is clear: it is a typical memory leak caused by an inner class holding a reference to its outer class.

Every call to setSendListener() creates a new DiagnosticsListener object that cannot be released, so the number of live BinderProxy objects keeps growing until the ProxyMap check throws the AssertionError shown above and the process is killed.

private class DiagnosticsListener extends ISendEventCallback.Stub {

    @Override
    public void SendStatus(hal e) throws RemoteException {
        if (mListener != null) {
            EventInfos infos = new EventInfos();
            ...
            mListener.OnSendStatus(infos);
        }
    }
}
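
The implicit reference held by a non-static inner class can be made visible with reflection. The sketch below uses plain Java with hypothetical stand-in types (Listener, Service, InnerCallback, StaticCallback) instead of the real AIDL classes; it only checks whether the compiler generated a hidden field of the outer type, which javac normally names this$0.

```java
import java.lang.reflect.Field;

class InnerClassReferenceDemo {
    // Hypothetical stand-in for the AIDL listener interface.
    interface Listener {
        void onStatus(String status);
    }

    // Hypothetical stand-in for the service implementation.
    static class Service {
        Listener mListener;

        // Non-static inner class: it reads the outer field, so it needs the outer instance.
        class InnerCallback {
            void sendStatus(String status) {
                if (mListener != null) {
                    mListener.onStatus(status);
                }
            }
        }

        // Static nested class: it only holds what is passed in explicitly.
        static class StaticCallback {
            private final Listener listener;

            StaticCallback(Listener listener) {
                this.listener = listener;
            }

            void sendStatus(String status) {
                if (listener != null) {
                    listener.onStatus(status);
                }
            }
        }
    }

    // True if cls has a compiler-generated field whose type is the outer class.
    static boolean holdsOuterInstance(Class<?> cls, Class<?> outer) {
        for (Field f : cls.getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == outer) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(holdsOuterInstance(Service.InnerCallback.class, Service.class));  // true
        System.out.println(holdsOuterInstance(Service.StaticCallback.class, Service.class)); // false
    }
}
```

As long as an InnerCallback instance is reachable, its Service instance and everything that instance references stay reachable as well.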

Solution

Now that the cause is known, the fix is simple:

1) Change the inner class to a static nested class, and hold the listener through a soft reference instead of a strong one

@Override
public IDiagnosticsEvent setSendListener(ISendEventListener listener) throws RemoteException {
    mEvent.SetCallback(new DiagnosticsListener(listener));
    return this;
}

// Static nested class: no implicit reference to the enclosing DiagnosticsEvent instance.
private static class DiagnosticsListener extends ISendEventCallback.Stub {

    private final SoftReference<ISendEventListener> mSoftReference;

    public DiagnosticsListener(ISendEventListener listener) {
        mSoftReference = new SoftReference<>(listener);
    }

    @Override
    public void SendStatus(t_hal e) throws RemoteException {
        // get() returns null once the listener has been garbage collected.
        ISendEventListener listener = mSoftReference.get();
        if (listener != null) {
            LogUtils.logD(TAG, "SendStatus");
            EventInfos infos = new EventInfos();
            ...
            listener.OnSendStatus(infos);
        }
    }
}
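
One consequence of this approach is worth seeing in isolation: once the soft reference has been cleared, the callback silently stops delivering events. The following plain-Java sketch uses hypothetical stand-in types (Listener, Callback) and a made-up simulateCollected() hook that clears the reference by hand, since real garbage collection timing cannot be forced.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

class SoftCallbackDemo {
    // Hypothetical stand-in for the AIDL listener interface.
    interface Listener {
        void onStatus(String status);
    }

    static class Callback {
        private final SoftReference<Listener> ref;

        Callback(Listener listener) {
            ref = new SoftReference<>(listener);
        }

        // Returns true if the status was delivered to the listener.
        boolean sendStatus(String status) {
            Listener listener = ref.get();
            if (listener == null) {
                return false; // referent already cleared, nothing to notify
            }
            listener.onStatus(status);
            return true;
        }

        // Demo hook: simulates the garbage collector clearing the reference.
        void simulateCollected() {
            ref.clear();
        }
    }

    // Static fields keep the listener strongly reachable for the whole demo.
    static final List<String> RECEIVED = new ArrayList<>();
    static final Listener LISTENER = RECEIVED::add;

    public static void main(String[] args) {
        Callback callback = new Callback(LISTENER);
        System.out.println(callback.sendStatus("first"));  // true
        callback.simulateCollected();
        System.out.println(callback.sendStatus("second")); // false
        System.out.println(RECEIVED);                      // [first]
    }
}
```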

2) Change the inner class to an anonymous inner class

Note that anonymous inner classes do not by themselves prevent memory leaks; used carelessly, they leak as well.

@Override
public IDiagnosticsEvent setSendListener(ISendEventListener listener) throws RemoteException {
    mEvent.SetCallback(new ISendEventCallback.Stub() {
        @Override
        public void SendStatus(t_hal e) throws RemoteException {
            EventInfos infos = new EventInfos();
            ...
            listener.OnSendStatus(infos);
        }
    });
    return this;
}

Of course, there are other options, such as creating the ISendEventCallback.Stub once as a member object instead of instantiating a new one every time it is needed.
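
The member-object variant could look roughly like the sketch below. It is written in plain Java with hypothetical stand-ins (Listener, Callback, EventSource, Service) for the AIDL types and for the layer behind mEvent, so it shows only the shape of the idea, not the project's actual code: one callback object is created up front and always forwards to whichever listener was set last.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SingleCallbackDemo {
    interface Listener {
        void onStatus(String status);
    }

    interface Callback {
        void sendStatus(String status);
    }

    // Hypothetical stand-in for the lower layer that accepts a callback.
    static class EventSource {
        final Set<Callback> distinctCallbacks = new HashSet<>(); // for the demo only
        Callback current;

        void setCallback(Callback callback) {
            distinctCallbacks.add(callback);
            current = callback;
        }
    }

    static class Service {
        private final EventSource mEvent;
        private volatile Listener mListener;

        // Created once; replacing mListener drops the reference to the previous listener.
        private final Callback mCallback = new Callback() {
            @Override
            public void sendStatus(String status) {
                Listener listener = mListener;
                if (listener != null) {
                    listener.onStatus(status);
                }
            }
        };

        Service(EventSource event) {
            mEvent = event;
        }

        void setSendListener(Listener listener) {
            mListener = listener;
            mEvent.setCallback(mCallback); // same object on every call
        }
    }

    public static void main(String[] args) {
        EventSource source = new EventSource();
        Service service = new Service(source);
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        service.setSendListener(first::add);
        service.setSendListener(second::add);
        source.current.sendStatus("ok");
        System.out.println(source.distinctCallbacks.size()); // 1
        System.out.println(first + " " + second);            // [] [ok]
    }
}
```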

After the cause was identified and the code was fixed, the Monkey test was run again and the problem did not reappear, so it should be resolved.

Origin blog.csdn.net/linkwj/article/details/127261848