[Ultimate Secret] Fps, Memory, Cpu of Android Performance Collection | Performance Monitoring Series

Background

It has been a while since my last post. After some time in development (mostly because I was busy playing Dragon Quest), I have basically finished a simple performance collection demo, so let me share what I learned along the way.

APM stands for Application Performance Monitoring. As the mobile Internet plays an ever larger role in people's lives, apps keep gaining features, and expectations for app performance rise with them. You can't passively wait for users to hit a problem, dig through the online logs, fix the bug, and ship a patch release. Actively monitoring app performance has become more and more important, and analyzing an app's power consumption, UI freezes, and network performance has become a top priority.

The APM work in our current project draws on Tencent's Matrix, 360's ArgusAPM, Didi's DoKit, and a few smaller projects. We customized those ideas to fit the project and built our own APM collection system.

Later articles will be published gradually as development progresses, so feel free to follow me. Collection of these three performance indicators is finished so far, and thread and FD information may be added later, so this article focuses on analyzing these three.

First, the questions

  1. A performance data collection system must not become a burden on the app. Collection must not delay rendering on the main thread; otherwise, integrating APM would make the app lag even more.

  2. Fps, memory, Cpu, and similar metrics all need frequent sampling. Fps, for example, refreshes 60 frames per second; if every sample were reported, the back-end folks might kill me.

  3. Collect data for key pages with minimal intrusion into business code, and bind the page data to the performance data.

Fps collection

First, let's introduce what Fps is.

Fluency is how the page feels while scrolling, rendering, and so on. On a 60 Hz display, the Android system needs each frame to be drawn within about 16 ms. For a frame to complete smoothly, all of its rendering code (including the commands the framework sends to the GPU and the CPU work of drawing into the buffer) must finish within that 16 ms. If rendering does not finish in time, the frame is dropped. Dropped frames are a core user-experience problem: the current frame is discarded and the earlier frame rate cannot be sustained, and this discontinuity easily catches the user's attention. That is what we usually call stutter or jank.

So is the app smooth as long as 60 frames are drawn in one second? Not necessarily. If jitter occurs, a few of those frames are certainly problematic. Within any sample there is a slowest frame and a fastest frame, so we need to know the average, the maximum, and the minimum.
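To make this concrete, here is a minimal Java sketch (not taken from Matrix) that summarizes a batch of frame durations. The class name FrameStats and its fields are made up for illustration, and it assumes the per-frame durations have already been measured in nanoseconds.

```java
import java.util.List;

/** Summary of one batch of frames. All names here are illustrative. */
final class FrameStats {
    final double averageFps;
    final long maxFrameNs;
    final long minFrameNs;

    private FrameStats(double averageFps, long maxFrameNs, long minFrameNs) {
        this.averageFps = averageFps;
        this.maxFrameNs = maxFrameNs;
        this.minFrameNs = minFrameNs;
    }

    /** frameDurationsNs: time spent on each frame, in nanoseconds; must not be empty. */
    static FrameStats of(List<Long> frameDurationsNs) {
        if (frameDurationsNs.isEmpty()) {
            throw new IllegalArgumentException("no frames");
        }
        long sum = 0;
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        for (long d : frameDurationsNs) {
            sum += d;
            max = Math.max(max, d);
            min = Math.min(min, d);
        }
        // Frames per second = number of frames / elapsed seconds.
        double fps = frameDurationsNs.size() * 1_000_000_000.0 / sum;
        return new FrameStats(fps, max, min);
    }
}
```

For example, three frames of 10 ms, 20 ms, and 70 ms average out to 30 fps, yet the 70 ms frame is the one the user notices, which is why the maximum is kept next to the average.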

Before discussing collection, we need to briefly cover two things: Choreographer and LooperPrinter.

Choreographer (choreographer)

Choreographer translates literally as "dance director", meaning it gracefully directs the three UI operations (input, animation, and traversal) to dance together. The word sums up the job well: if the Android system is a ballet, Choreographer choreographs the show that the Android UI puts on, with the performers on stage cooperating to deliver a wonderful performance. Google engineers seem to like dancing!

For an introduction to Choreographer, you can refer to this article. It comes up often in interviews, along with ViewRootImpl, Vsync, and so on, but here we care more about data collection and reporting, so the focus is a bit different.

Conventional Fps collection usually goes through Choreographer, which coordinates drawing on the UI thread. Choreographer is a ThreadLocal singleton that receives the vsync signal and drives rendering of the interface. By adding a callback to it, we can neatly calculate how long each frame takes to draw.
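The callback idea can be sketched in plain Java as follows. This is an illustration under assumptions, not Matrix's code: FrameIntervalTracker is a hypothetical class, and the expected frame interval is passed in by the caller. On Android, onFrame would be fed from Choreographer.FrameCallback#doFrame(long frameTimeNanos), with the callback re-posted through Choreographer.getInstance().postFrameCallback(...) on every frame.

```java
/** Counts dropped frames from consecutive vsync timestamps. Names are illustrative. */
final class FrameIntervalTracker {
    private final long frameIntervalNs; // expected gap between frames, supplied by the caller
    private boolean hasLastFrame;
    private long lastFrameTimeNs;
    private int totalDroppedFrames;

    FrameIntervalTracker(long frameIntervalNs) {
        if (frameIntervalNs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.frameIntervalNs = frameIntervalNs;
    }

    /** Records one frame and returns how many frames were skipped since the previous one. */
    int onFrame(long frameTimeNanos) {
        int dropped = 0;
        if (hasLastFrame) {
            long elapsed = frameTimeNanos - lastFrameTimeNs;
            // One interval is the normal gap; every extra whole interval is a skipped frame.
            dropped = (int) Math.max(0, elapsed / frameIntervalNs - 1);
        }
        hasLastFrame = true;
        lastFrameTimeNs = frameTimeNanos;
        totalDroppedFrames += dropped;
        return dropped;
    }

    int totalDroppedFrames() {
        return totalDroppedFrames;
    }
}
```

A plain callback like this only sees the gap between frames. It cannot split the time into input, animation, and traversal stages, which is what the Matrix hook described next is for.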

The core of Matrix's approach is to hook Choreographer's CallbackQueue: through the hook it calls addCallbackLocked to add a custom FrameCallback at the head of each type of callback queue. The cost of each system stage then follows from the timestamps:

  1. system CALLBACK_INPUT = custom CALLBACK_ANIMATION start time - custom CALLBACK_INPUT completion time
  2. system CALLBACK_ANIMATION = custom CALLBACK_TRAVERSAL start time - custom CALLBACK_ANIMATION start time
  3. system CALLBACK_TRAVERSAL = msg dispatch end time - custom CALLBACK_TRAVERSAL start time

LooperPrinter

First, one concept: everything related to Views and to lifecycles runs on the main thread. So is there a way to monitor how much time the main thread spends?

Let's review how Handler works: Looper first takes a Message from the MessageQueue, and then the Handler or Runnable inside the Message determines what is executed next.

From the analysis of ActivityThread, all main-thread operations run on the main thread's Looper. So if we add code in Looper's loop method before and after each Message is dispatched, can we monitor the execution time of every Message on the main thread?

public static void loop() {
      final Looper me = myLooper();
      if (me == null) {
          throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
      }
      final MessageQueue queue = me.mQueue;

      // Make sure the identity of this thread is that of the local process,
      // and keep track of what that identity token actually is.
      Binder.clearCallingIdentity();
      final long ident = Binder.clearCallingIdentity();

      // Allow overriding a threshold with a system prop. e.g.
      // adb shell 'setprop log.looper.1000.main.slow 1 && stop && start'
      final int thresholdOverride =
              SystemProperties.getInt("log.looper."
                      + Process.myUid() + "."
                      + Thread.currentThread().getName()
                      + ".slow", 0);

      boolean slowDeliveryDetected = false;

      for (;;) {
          Message msg = queue.next(); // might block
          if (msg == null) {
              // No message indicates that the message queue is quitting.
              return;
          }

          // This must be in a local variable, in case a UI event sets the logger
          final Printer logging = me.mLogging;
          if (logging != null) {
              logging.println(">>>>> Dispatching to " + msg.target + " " +
                      msg.callback + ": " + msg.what);
          }
          ...

          if (logging != null) {
              logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
          }
          ...
      }
  }

From the source code we can see that Looper reserves a Printer from the start, and the Printer's println method is called when a Message starts executing and again after it finishes. We can install the monitor with looper.setMessageLogging(new LooperPrinter());.
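As a rough sketch of what such a printer could look like, the Java below pairs the ">>>>> Dispatching" and "<<<<< Finished" lines to time each message. It is an assumption-laden illustration rather than Matrix's LooperPrinter: the Printer interface here is a local stand-in for android.util.Printer so the code runs off-device, and the injected clock and slow-message threshold are hypothetical parameters.

```java
import java.util.function.LongSupplier;

/** Local stand-in for android.util.Printer, so the sketch runs without Android. */
interface Printer {
    void println(String x);
}

/** Times each main-thread message by pairing the two log lines Looper emits. */
final class LooperPrinter implements Printer {
    private final LongSupplier clockNs;   // e.g. System::nanoTime
    private final long slowThresholdNs;   // hypothetical cut-off for a "slow" message
    private boolean dispatching;
    private long startNs;
    private long lastCostNs;
    private int slowMessages;

    LooperPrinter(LongSupplier clockNs, long slowThresholdNs) {
        this.clockNs = clockNs;
        this.slowThresholdNs = slowThresholdNs;
    }

    @Override
    public void println(String x) {
        if (x.startsWith(">>>>> Dispatching")) {
            // A message is about to run: remember when it started.
            startNs = clockNs.getAsLong();
            dispatching = true;
        } else if (x.startsWith("<<<<< Finished") && dispatching) {
            // The same message has finished: compute its cost.
            lastCostNs = clockNs.getAsLong() - startNs;
            dispatching = false;
            if (lastCostNs >= slowThresholdNs) {
                slowMessages++;
            }
        }
    }

    long lastCostNs() {
        return lastCostNs;
    }

    int slowMessages() {
        return slowMessages;
    }
}
```

On a device the class would implement android.util.Printer instead and be installed with Looper.getMainLooper().setMessageLogging(...). Matching on the log prefix is fragile by nature, since it depends on the exact strings shown in the loop() source above.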

IdleHandler

When the Looper's MessageQueue is empty, IdleHandler is triggered. Main-thread stutter detection is usually paired with a timer reset, so resetting the timer whenever the main thread goes idle ensures that the time-cost calculation does not go wrong.

UIThreadMonitor (main thread monitoring)

With those pieces briefly introduced, the actual sample code for our Fps collection is modeled on Matrix's UIThreadMonitor, which is itself built from a combination of the pieces above.

   private void dispatchEnd() {
        long traceBegin = 0;
        if (config.isDevEnv()) {
            traceBegin = System.nanoTime();
        }
        long startNs = token;
        long intendedFrameTimeNs = startNs;
        if (isVsyncFrame) {
            doFrameEnd(token);
            intendedFrameTimeNs = getIntendedFrameTimeNs(startNs);
        }

        long endNs = System.nanoTime();

        synchronized (observers) {
            for (LooperObserver observer : observers) {
                if (observer.isDispatchBegin()) {
                    observer.doFrame(AppMethodBeat.getVisibleScene(), startNs, endNs, isVsyncFrame, intendedFrameTimeNs, queueCost[CALLBACK_INPUT], queueCost[CALLBACK_ANIMATION], queueCost[CALLBACK_TRAVERSAL]);
                }
            }
        }

        dispatchTimeMs[3] = SystemClock.currentThreadTimeMillis();
        dispatchTimeMs[1] = System.nanoTime();

        AppMethodBeat.o(AppMethodBeat.METHOD_ID_DISPATCH);

        synchronized (observers) {
            for (LooperObserver observer : observers) {
                if (observer.isDispatchBegin()) {
                    observer.dispatchEnd(dispatchTimeMs[0], dispatchTimeMs[2], dispatchTimeMs[1], dispatchTimeMs[3], token, isVsyncFrame);
                }
            }
        }
        this.isVsyncFrame = false;

        if (config.isDevEnv()) {
            MatrixLog.d(TAG, "[dispatchEnd#run] inner cost:%sns", System.nanoTime() - traceBegin);
        }
    }

UIThreadMonitor is a little different: methods such as dispatchEnd are callbacks it receives from LooperMonitor.

LooperMonitor installs a LooperPrinter on the main thread's Looper through setMessageLogging. After a main-thread message finishes and dispatchEnd runs, the current Vsync and rendering times are read from Choreographer through reflection. Finally, when the IdleHandler fires, the LooperPrinter's timing is reset, so idle time on the main thread is not counted as method cost.

For this part of the source code, see Matrix's LooperMonitor.

Why take such a roundabout route to monitor Fps? What are the benefits of writing it this way? I went and looked at the official Matrix wiki: Matrix draws on BlockCanary's code and combines Choreographer with BlockCanary. When a frame stutters, it captures the main thread's stack at that moment and uses LooperPrinter to output the method that is stuck. That helps developers locate the problem far better than simply telling the business team that their page is janky.

Sampling analysis

The article opened by raising a question: if every collected sample is reported, the server comes under heavy pressure and most of what it receives is useless data. So what should be done?

Here we again follow Matrix's code. First, Fps data must not be reported in real time. Second, it is best to filter the problematic data out of a window of time. Matrix's Fps collection gets a few small details right:

  1. Batch before analyzing. First collect 200 frames of data (the value returned by getIntervalFrameReplay), then analyze the batch, extracting the slowest frame, the fastest frame, and the average, and keep the result in memory.
  2. Process data on a child thread. Filtering and traversal are moved off the main thread so that APM itself does not cause the app to stall.
  3. Each 200-frame batch is only one data fragment. Matrix reports over a longer window: once the accumulated time exceeds about 1 minute, the data is reported as one Issue.
  4. No data is collected while the app is switching between foreground and background.
   private void notifyListener(final String focusedActivity, final long startNs, final long endNs, final boolean isVsyncFrame,
                                final long intendedFrameTimeNs, final long inputCostNs, final long animationCostNs, final long traversalCostNs) {
        long traceBegin = System.currentTimeMillis();
        try {
            final long jiter = endNs - intendedFrameTimeNs;
            final int dropFrame = (int) (jiter / frameIntervalNs);
            droppedSum += dropFrame;
            durationSum += Math.max(jiter, frameIntervalNs);

            synchronized (listeners) {
                for (final IDoFrameListener listener : listeners) {
                    if (config.isDevEnv()) {
                        listener.time = SystemClock.uptimeMillis();
                    }
                    if (null != listener.getExecutor()) {
                        if (listener.getIntervalFrameReplay() > 0) {
                            listener.collect(focusedActivity, startNs, endNs, dropFrame, isVsyncFrame,
                                    intendedFrameTimeNs, inputCostNs, animationCostNs, traversalCostNs);
                        } else {
                            listener.getExecutor().execute(new Runnable() {
                                @Override
                                public void run() {
                                    listener.doFrameAsync(focusedActivity, startNs, endNs, dropFrame, isVsyncFrame,
                                            intendedFrameTimeNs, inputCostNs, animationCostNs, traversalCostNs);
                                }
                            });
                        }
                    } else {
                        listener.doFrameSync(focusedActivity, startNs, endNs, dropFrame, isVsyncFrame,
                                intendedFrameTimeNs, inputCostNs, animationCostNs, traversalCostNs);
                    }

                   ...
                }
            }
        }
    }

The above is Matrix source code. When listener.getIntervalFrameReplay() > 0 holds, the listener first performs a collect operation, and the follow-up logic is triggered only after a certain amount of data has accumulated. We can also see the null != listener.getExecutor() check, so the work on the collected data is executed on the listener's executor rather than inline on the main thread.
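The collect-then-process pattern can be sketched in a few lines of Java. This is a simplified illustration, not Matrix's IDoFrameListener: FrameBatcher, its constructor parameters, and the choice to store only the dropped-frame count are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Buffers per-frame data and hands full batches to a background executor. */
final class FrameBatcher {
    private final int batchSize;
    private final Executor executor;
    private final Consumer<List<Integer>> onBatch;
    private List<Integer> pending = new ArrayList<>();

    FrameBatcher(int batchSize, Executor executor, Consumer<List<Integer>> onBatch) {
        this.batchSize = batchSize;
        this.executor = executor;
        this.onBatch = onBatch;
    }

    /** Called once per frame from a single thread; the only per-frame work is a list append. */
    void collect(int droppedFrames) {
        pending.add(droppedFrames);
        if (pending.size() >= batchSize) {
            // Hand the full list over and start a fresh one, so the
            // background thread never shares a list with the caller.
            final List<Integer> batch = pending;
            pending = new ArrayList<>();
            executor.execute(() -> onBatch.accept(batch));
        }
    }
}
```

With a batch size of 200 and an executor backed by a worker thread, the main thread pays only for the append, and the analysis of each batch runs elsewhere.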

 private class FPSCollector extends IDoFrameListener {

        private Handler frameHandler = new Handler(MatrixHandlerThread.getDefaultHandlerThread().getLooper());

        Executor executor = new Executor() {
            @Override
            public void execute(Runnable command) {
                frameHandler.post(command);
            }
        };

        private HashMap<String, FrameCollectItem> map = new HashMap<>();

        @Override
        public Executor getExecutor() {
            return executor;
        }

        @Override
        public int getIntervalFrameReplay() {
            return 200;
        }

        @Override
        public void doReplay(List<FrameReplay> list) {
            super.doReplay(list);
            for (FrameReplay replay : list) {
                doReplayInner(replay.focusedActivity, replay.startNs, replay.endNs, replay.dropFrame, replay.isVsyncFrame,
                        replay.intendedFrameTimeNs, replay.inputCostNs, replay.animationCostNs, replay.traversalCostNs);
            }
        }

        public void doReplayInner(String visibleScene, long startNs, long endNs, int droppedFrames,
                                  boolean isVsyncFrame, long intendedFrameTimeNs, long inputCostNs,
                                  long animationCostNs, long traversalCostNs) {

            if (Utils.isEmpty(visibleScene)) return;
            if (!isVsyncFrame) return;

            FrameCollectItem item = map.get(visibleScene);
            if (null == item) {
                item = new FrameCollectItem(visibleScene);
                map.put(visibleScene, item);
            }

            item.collect(droppedFrames);

            if (item.sumFrameCost >= timeSliceMs) { // report
                map.remove(visibleScene);
                item.report();
            }
        }
    }

    private class FrameCollectItem {
        String visibleScene;
        long sumFrameCost;
        int sumFrame = 0;
        int sumDroppedFrames;
        // record the level of frames dropped each time
        int[] dropLevel = new int[DropStatus.values().length];
        int[] dropSum = new int[DropStatus.values().length];

        FrameCollectItem(String visibleScene) {
            this.visibleScene = visibleScene;
        }

        void collect(int droppedFrames) {
            float frameIntervalCost = 1f * UIThreadMonitor.getMonitor().getFrameIntervalNanos() / Constants.TIME_MILLIS_TO_NANO;
            sumFrameCost += (droppedFrames + 1) * frameIntervalCost;
            sumDroppedFrames += droppedFrames;
            sumFrame++;
            if (droppedFrames >= frozenThreshold) {
                dropLevel[DropStatus.DROPPED_FROZEN.index]++;
                dropSum[DropStatus.DROPPED_FROZEN.index] += droppedFrames;
            } else if (droppedFrames >= highThreshold) {
                dropLevel[DropStatus.DROPPED_HIGH.index]++;
                dropSum[DropStatus.DROPPED_HIGH.index] += droppedFrames;
            } else if (droppedFrames >= middleThreshold) {
                dropLevel[DropStatus.DROPPED_MIDDLE.index]++;
                dropSum[DropStatus.DROPPED_MIDDLE.index] += droppedFrames;
            } else if (droppedFrames >= normalThreshold) {
                dropLevel[DropStatus.DROPPED_NORMAL.index]++;
                dropSum[DropStatus.DROPPED_NORMAL.index] += droppedFrames;
            } else {
                dropLevel[DropStatus.DROPPED_BEST.index]++;
                dropSum[DropStatus.DROPPED_BEST.index] += Math.max(droppedFrames, 0);
            }
        }
    }

This code is the logic Matrix uses to process one frame fragment. The collect method sorts frames into multiple dimensions, such as drop-severity levels and dropped-frame sums, which enriches each data fragment. The more data recorded here, the better it helps developers locate a problem.

Our collection logic also follows this part of Matrix, but we found a small bug during testing. Because a report covers a relatively long time window, after the user switches pages the Fps data of the previous page can end up reported as data of the next page.

So we added an ActivityLifeCycle listener: when the page changes, a report is performed immediately. We also changed Matrix's foreground/background switching logic, replacing it with the more reliable ProcessLifecycleOwner.
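The flush-on-page-change fix can be sketched like this. It is a hypothetical illustration of the idea, not our production code: PageFpsAggregator, Report, and their fields are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Accumulates frame data per page and closes the record when the page changes. */
final class PageFpsAggregator {
    /** One reported record; the field names are hypothetical. */
    static final class Report {
        final String page;
        final int frames;
        final int droppedFrames;

        Report(String page, int frames, int droppedFrames) {
            this.page = page;
            this.frames = frames;
            this.droppedFrames = droppedFrames;
        }
    }

    private final List<Report> reports = new ArrayList<>();
    private String currentPage;
    private int frames;
    private int droppedFrames;

    /** Flush what belongs to the old page before attributing frames to the new one. */
    void onPageChanged(String newPage) {
        flush();
        currentPage = newPage;
    }

    void onFrame(int dropped) {
        if (currentPage == null) {
            return; // no visible page yet, nothing to attribute the frame to
        }
        frames++;
        droppedFrames += dropped;
    }

    /** Emits a record for the current page if it has any frames, then resets the counters. */
    void flush() {
        if (currentPage != null && frames > 0) {
            reports.add(new Report(currentPage, frames, droppedFrames));
        }
        frames = 0;
        droppedFrames = 0;
    }

    List<Report> reports() {
        return reports;
    }
}
```

On Android, onPageChanged could be driven from Application.ActivityLifecycleCallbacks#onActivityResumed, and flush could also be called when the time window described earlier expires.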

Cpu and Memory

Memory and Cpu usage help us see what online users really experience. Instead of waiting for a user's crash and working backward to the problem, we can filter the data by page to help developers analyze the corresponding issue.

With the Fps experience in hand, we added Cpu and memory collection on the same basis. Much of the collection logic carries over, and only the key data needs to change.

  1. Data is collected on a child thread, so collection never blocks the main thread.
  2. Data is sampled once per second and analyzed locally to compute the peak, the valley, and the average.
  3. Reporting is split into nodes: a record is generated after a fixed period of time or when the page switches.
  4. Cpu and memory data are merged and reported in one data structure to reduce data traffic.
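The four points above can be sketched as a small accumulator. This is a minimal illustration under assumptions: PerfWindow and Summary are hypothetical names, Cpu is assumed to arrive as a percentage and memory in MB, and the once-per-second scheduling is left to the caller.

```java
/** Accumulates Cpu and memory samples and produces one combined summary per window. */
final class PerfWindow {
    /** Immutable result for one reporting window; field names are illustrative. */
    static final class Summary {
        final int samples;
        final double cpuAvg, cpuPeak, cpuValley;
        final double memAvgMb, memPeakMb, memValleyMb;

        Summary(int samples, double cpuAvg, double cpuPeak, double cpuValley,
                double memAvgMb, double memPeakMb, double memValleyMb) {
            this.samples = samples;
            this.cpuAvg = cpuAvg;
            this.cpuPeak = cpuPeak;
            this.cpuValley = cpuValley;
            this.memAvgMb = memAvgMb;
            this.memPeakMb = memPeakMb;
            this.memValleyMb = memValleyMb;
        }
    }

    private int samples;
    private double cpuSum, cpuPeak, cpuValley;
    private double memSum, memPeak, memValley;

    /** Adds one sample; intended to be called about once per second from a worker thread. */
    synchronized void add(double cpuPercent, double memoryMb) {
        if (samples == 0) {
            cpuPeak = cpuValley = cpuPercent;
            memPeak = memValley = memoryMb;
        } else {
            cpuPeak = Math.max(cpuPeak, cpuPercent);
            cpuValley = Math.min(cpuValley, cpuPercent);
            memPeak = Math.max(memPeak, memoryMb);
            memValley = Math.min(memValley, memoryMb);
        }
        cpuSum += cpuPercent;
        memSum += memoryMb;
        samples++;
    }

    /** Returns the summary for the window and resets it, or null if no samples were added. */
    synchronized Summary summarizeAndReset() {
        if (samples == 0) {
            return null;
        }
        Summary s = new Summary(samples, cpuSum / samples, cpuPeak, cpuValley,
                memSum / samples, memPeak, memValley);
        samples = 0;
        cpuSum = 0;
        memSum = 0;
        return s;
    }
}
```

Calling summarizeAndReset on a timer or on a page switch yields one record that carries both metrics, so only a handful of numbers per window leave the device instead of one report per sample.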

Memory data collection

For memory data we referred to DoKit's code. High and low system versions differ: on higher versions the data can be read directly through Debug.MemoryInfo(), while lower versions have to obtain it from AMS through ActivityManager.

Below is the performance collection utility class, which also collects Cpu data; you can use it directly.

object PerformanceUtils {
    private var CPU_CMD_INDEX = -1
    @JvmStatic
  fun getMemory(): Float {
      val mActivityManager: ActivityManager? = Hasaki.getApplication().getSystemService(Context.ACTIVITY_SERVICE)
              as ActivityManager?
      var mem = 0.0f
      try {
          var memInfo: Debug.MemoryInfo? = null
          if (Build.VERSION.SDK_INT > 28) {
              // Collect the process's memory info (totalPss)
              memInfo = Debug.MemoryInfo()
              Debug.getMemoryInfo(memInfo)
          } else {
              //As of Android Q, for regular apps this method will only return information about the memory info for the processes running as the caller's uid;
              // no other process memory info is available and will be zero. Also of Android Q the sample rate allowed by this API is significantly limited, if called faster the limit you will receive the same data as the previous call.
              val memInfos = mActivityManager?.getProcessMemoryInfo(intArrayOf(Process.myPid()))
              memInfos?.firstOrNull()?.apply {
                  memInfo = this
              }
          }
          memInfo?.apply {
              val totalPss = totalPss
              if (totalPss >= 0) {
                  mem = totalPss / 1024.0f
              }
          }
      } catch (e: Exception) {
          e.printStackTrace()
      }
      return mem
  }

  /**
   * How to read Cpu usage below Android 8.0
   *
   * @return the raw Cpu column text from top
   */
  private fun getCPUData(): String {
      val commandResult = ShellUtils.execCmd("top -n 1 | grep ${Process.myPid()}", false)
      val msg = commandResult.successMsg
      return try {
          // Trim first so a leading space does not shift the column index.
          msg.trim().split("\\s+".toRegex())[CPU_CMD_INDEX]
      } catch (e: Exception) {
          "0.5%"
      }
  }

  @WorkerThread
  fun getCpu(): String {
      if (CPU_CMD_INDEX == -1) {
          getCpuIndex()
      }
      if (CPU_CMD_INDEX == -1) {
          return ""
      }
      return if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
          getCpuDataForO()
      } else {
          getCPUData()
      }
  }

    /**
     * How to read Cpu usage on Android 8.0 and above
     *
     * @return Cpu usage formatted as a percentage, or "" on failure
     */
    private fun getCpuDataForO(): String {
        return try {
            val commandResult = ShellUtils.execCmd("top -n 1 | grep ${Process.myPid()}", false)
            var cpu = 0F
            commandResult.successMsg.split("\n").forEach {
                // Trim first so a leading space does not shift the column index.
                val cpuTemp = it.trim().split("\\s+".toRegex())
                val cpuRate = cpuTemp[CPU_CMD_INDEX].toFloatOrNull()?.div(Runtime.getRuntime()
                        .availableProcessors())?.div(100) ?: 0F
                cpu += cpuRate
            }
            NumberFormat.getPercentInstance().format(cpu)
        } catch (e: Exception) {
            ""
        }
    }
  private fun getCpuIndex() {
      try {
          val process = Runtime.getRuntime().exec("top -n 1")
          val reader = BufferedReader(InputStreamReader(process.inputStream))
          var line: String? = null
          while (reader.readLine().also { line = it } != null) {
              line?.let {
                  line = it.trim { it <= ' ' }
                  line?.apply {
                      val tempIndex = getCPUIndex(this)
                      if (tempIndex != -1) {
                          CPU_CMD_INDEX = tempIndex
                      }
                  }
              }

          }
      } catch (e: Exception) {
          e.printStackTrace()
      }
  }

  private fun getCPUIndex(line: String): Int {
      if (line.contains("CPU")) {
          val titles = line.split("\\s+".toRegex()).toTypedArray()
          for (i in titles.indices) {
              if (titles[i].contains("CPU")) {
                  return i
              }
          }
      }
      return -1
  }
}

Cpu collection

The Cpu collection code is included above as well. The code itself is fairly simple; the complexity lies in the command line and in version adaptation, since the system version has to be distinguished when reading the value. Both high and low versions go through a shell command, which fixes the failure to obtain the Cpu index on low versions, and we tidied up DoKit's code logic along the way. For ShellUtils, refer to the utility collection written by Blankj.
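The parsing step is the fragile part, so here it is isolated as a small Java sketch. It mirrors the approach of the utility class above but is not a port of it: TopParser and its method names are hypothetical, and it assumes only that the header row of top contains a column whose title includes "CPU" and that columns are separated by whitespace.

```java
/** Helpers for locating and reading the Cpu column in top output. Names are illustrative. */
final class TopParser {
    private TopParser() {
    }

    /** Returns the index of the first column whose title contains "CPU", or -1 if none does. */
    static int findCpuColumn(String headerLine) {
        String[] titles = headerLine.trim().split("\\s+");
        for (int i = 0; i < titles.length; i++) {
            if (titles[i].contains("CPU")) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Reads the Cpu column from a process row and normalizes it by core count.
     * Returns a fraction of total capacity (0.1 means 10%), or -1 if the value cannot be parsed.
     */
    static double cpuFraction(String processLine, int cpuColumn, int cores) {
        // Trim before splitting: a leading space would otherwise create an
        // empty first token and shift every column by one.
        String[] cols = processLine.trim().split("\\s+");
        if (cpuColumn < 0 || cpuColumn >= cols.length || cores <= 0) {
            return -1;
        }
        try {
            double percent = Double.parseDouble(cols[cpuColumn].replace("%", ""));
            return percent / cores / 100.0;
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

Because the layout of top differs between Android versions and vendors, returning -1 on anything unexpected and skipping that sample is safer than reporting a guessed value.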

Summary

Fps, Cpu, and memory data are only the simplest part of APM. The real purpose of APM is to help developers locate online problems and, through early-warning mechanisms, keep them from happening. Beyond performance monitoring, more user-behavior data will need to be collected later to help developers pinpoint online problems more accurately.

Because we are standing on the shoulders of giants, this part was not especially hard to develop, but there is still room for optimization. For example, we currently only monitor Activity changes. Is there a way to track Fragments as well? Is there a way to extract more information from the data?

The next article will cover IO read/write monitoring in APM. That part of the code involves quite a bit more customization. It is basically done, but I still need to reorganize the write-up.

Original address: https://juejin.cn/post/6890754507639095303

In addition, this article is included in the open source project https://github.com/xieyuliang/Note-Android, which contains self-study programming roadmaps in different directions, a collection of interview questions and interview experiences, and a series of technical articles. The resources are continuously updated in…

That's all for this time; see you in the next article.


Origin blog.csdn.net/weixin_49559515/article/details/112235009