ART perspective | How to automatically trigger GC when native memory grows too much? How to trigger native memory recycling when Java objects are recycled?

This analysis is based on Android R (11).

Preface

It is well known that GC reclaims Java heap memory. However, some Java classes are now designed as marionettes: the Java object only holds the "strings", while the memory it really consumes lives in native memory. Bitmap is one example. For such classes, how to automatically reclaim the native memory they control has become a pressing problem.

Automatic reclamation has to rely on the GC mechanism, but the existing GC mechanism alone is not enough. Two further points need to be considered:

  1. How to automatically trigger GC when native memory grows too much
  2. How to reclaim native resources in step with the Java objects that GC reclaims

Android introduced the NativeAllocationRegistry class in N. Its early version ensures that native resources are reclaimed when GC reclaims the Java objects (point 2 above); internally it uses the Cleaner mechanism described in my previous blog post.

The early version of NativeAllocationRegistry can reclaim native resources, but it still has a defect. A Java class designed as a marionette occupies little space itself, while the native resources it references indirectly take up a lot. The Java heap therefore grows very slowly and the native heap grows very fast. In some scenarios the Java heap has not yet grown enough to trigger the next GC, while garbage in the native heap has already piled up into a mountain. Having the program call System.gc() proactively can certainly alleviate the problem, but how should developers choose the frequency? Calling it too often hurts runtime performance; calling it too rarely means native garbage is not released in time. The new version of NativeAllocationRegistry was therefore adjusted together with the GC so that a process automatically triggers GC when native memory grows too much, which is point 1 above. In other words, the GC trigger used to consider only Java heap usage and now considers the native heap as well.

Accumulated native garbage can cause serious problems, such as the native memory OOM recently encountered by many 32-bit APKs in China. ByteDance published a blog post describing their solution: an application-layer approach in which the application actively releases native resources. The fundamental fix, however, still depends on changing the underlying design. After reading the ByteDance post, I contacted the Android team and suggested that they use NativeAllocationRegistry in the CameraMetadataNative class. They quickly accepted the proposal and provided a new implementation. I believe the problem ByteDance ran into will no longer exist on S.

1. How to automatically trigger GC when native memory grows too much

When a Java class is designed as a marionette, its native memory is usually allocated in one of two ways: malloc (new usually calls malloc underneath) to allocate heap memory, or mmap to allocate anonymous pages. The main difference is that malloc is typically used for small allocations and mmap for large ones.

When we use NativeAllocationRegistry to automatically release the native memory of such a Java object, we first need to call its registerNativeAllocation method. This call informs the GC of the size of the native resource just allocated and also checks whether the GC trigger condition has been reached. The handling differs depending on how the memory was allocated.

libcore/luni/src/main/java/libcore/util/NativeAllocationRegistry.java

     // Inform the garbage collector of the allocation. We do this differently for
     // malloc-based allocations.
     private static void registerNativeAllocation(long size) {
         VMRuntime runtime = VMRuntime.getRuntime();
         if ((size & IS_MALLOCED) != 0) {     // Taken when the native memory was allocated via malloc
             final long notifyImmediateThreshold = 300000;
             if (size >= notifyImmediateThreshold) {   // Native allocation of at least 300000 bytes (~300KB)
                 runtime.notifyNativeAllocationsInternal();
             } else {                         // Native allocation smaller than 300000 bytes
                 runtime.notifyNativeAllocation();
             }
         } else {
             runtime.registerNativeAllocation(size);
         }
     }

1.1 Malloc memory

Memory allocated with malloc is checked against two conditions.

  1. Is the allocation at least 300,000 bytes? If so, CheckGCForNative is executed directly, bypassing the counter. This function adds up the total native memory allocated and determines whether it has reached the GC trigger threshold; if it has, a GC is triggered.
  2. Is this allocation the 300th since the last check? This condition limits how often CheckGCForNative runs: the check is performed only once every 300 mallocs.
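The two conditions can be sketched as a small standalone C++ model. This is an illustration, not the ART implementation: the struct `MallocNotifier` and its members are hypothetical names, the 300,000-byte threshold is the one in the Java snippet above, and the interval of 300 is simply the figure quoted in this article (the runtime keeps the real value in a constant of its own).

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the malloc notification path; not the ART source.
struct MallocNotifier {
  // Threshold taken from NativeAllocationRegistry.registerNativeAllocation.
  static constexpr std::size_t kImmediateThreshold = 300000;
  // Interval as quoted in the article text (an assumption of this sketch).
  static constexpr std::uint32_t kInterval = 300;

  std::atomic<std::uint32_t> notified{0};
  std::uint32_t checks_run = 0;  // How many times the simulated check ran.

  // Returns true when this allocation would lead to a CheckGCForNative call.
  bool OnMalloc(std::size_t size) {
    bool check = false;
    if (size >= kImmediateThreshold) {
      check = true;  // Large allocation: check right away.
    } else {
      // Small allocation: only every kInterval-th notification checks.
      std::uint32_t n = notified.fetch_add(1, std::memory_order_relaxed);
      check = (n % kInterval == kInterval - 1);
    }
    if (check) ++checks_run;
    return check;
  }
};
```

In this model a single 300,000-byte allocation triggers a check immediately, while 16-byte allocations trigger one only on every 300th call.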

Next, look at the logic inside CheckGCForNative.

It first computes the current total size of native memory, then the ratio of that size to the threshold. If the ratio is greater than or equal to 1, a new GC is requested.

art/runtime/gc/heap.cc

 inline void Heap::CheckGCForNative(Thread* self) {
   bool is_gc_concurrent = IsGcConcurrent();
   size_t current_native_bytes = GetNativeBytes();    // Get the total size of native memory
   float gc_urgency = NativeMemoryOverTarget(current_native_bytes, is_gc_concurrent); // Ratio of the current size to the threshold; >= 1 means a new GC is needed
   if (UNLIKELY(gc_urgency >= 1.0)) {
     if (is_gc_concurrent) {
       RequestConcurrentGC(self, kGcCauseForNativeAlloc, /*force_full=*/true);   // Request a new GC
       if (gc_urgency > kStopForNativeFactor
           && current_native_bytes > stop_for_native_allocs_) {
         // We're in danger of running out of memory due to rampant native allocation.
         if (VLOG_IS_ON(heap) || VLOG_IS_ON(startup)) {
           LOG(INFO) << "Stopping for native allocation, urgency: " << gc_urgency;
         }
         WaitForGcToComplete(kGcCauseForNativeAlloc, self);
       }
     } else {
       CollectGarbageInternal(NonStickyGcType(), kGcCauseForNativeAlloc, false);
     }
   }
 }
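The branches of CheckGCForNative can be summarised as a pure decision function. The sketch below only mirrors the control flow of the source above; `NativeGcAction` and `DecideNativeGc` are hypothetical names, and `stop_factor` and `stop_bytes` are parameters standing in for kStopForNativeFactor and stop_for_native_allocs_, whose actual values are not shown in this article.

```cpp
#include <cstddef>

// Hypothetical summary of what CheckGCForNative ends up doing.
enum class NativeGcAction {
  kNone,               // Urgency below 1: nothing to do.
  kRequestConcurrent,  // Request a concurrent GC and keep running.
  kRequestAndWait,     // Request a concurrent GC and block until it finishes.
  kCollectNow          // Non-concurrent collector: collect in the calling thread.
};

// stop_factor / stop_bytes stand in for kStopForNativeFactor and
// stop_for_native_allocs_ (values assumed by the caller of this sketch).
NativeGcAction DecideNativeGc(float urgency, bool gc_is_concurrent,
                              std::size_t native_bytes, float stop_factor,
                              std::size_t stop_bytes) {
  if (urgency < 1.0f) return NativeGcAction::kNone;
  if (!gc_is_concurrent) return NativeGcAction::kCollectNow;
  if (urgency > stop_factor && native_bytes > stop_bytes) {
    // Native allocation is running far ahead of the collector.
    return NativeGcAction::kRequestAndWait;
  }
  return NativeGcAction::kRequestConcurrent;
}
```

Separating the decision from the side effects makes it easy to see that the allocating thread is only stalled when both the urgency and the absolute native size are past their limits.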

Getting the current total size of native memory requires a call to GetNativeBytes. Its statistics come from two parts. One part is the total size currently allocated through malloc, obtained via mallinfo. Because the system provides a dedicated API for this information, NativeAllocationRegistry.registerNativeAllocation does not need to record the size of each individual malloc. The other part is the size of all registered mmaps, recorded in the native_bytes_registered_ field. The sum of the two roughly reflects the overall native memory consumption of the current process.

art/runtime/gc/heap.cc

 size_t Heap::GetNativeBytes() {
   size_t malloc_bytes;
 #if defined(__BIONIC__) || defined(__GLIBC__)
   IF_GLIBC(size_t mmapped_bytes;)
   struct mallinfo mi = mallinfo();
   // In spite of the documentation, the jemalloc version of this call seems to do what we want,
   // and it is thread-safe.
   if (sizeof(size_t) > sizeof(mi.uordblks) && sizeof(size_t) > sizeof(mi.hblkhd)) {
     // Shouldn't happen, but glibc declares uordblks as int.
     // Avoiding sign extension gets us correct behavior for another 2 GB.
     malloc_bytes = (unsigned int)mi.uordblks;
     IF_GLIBC(mmapped_bytes = (unsigned int)mi.hblkhd;)
   } else {
     malloc_bytes = mi.uordblks;
     IF_GLIBC(mmapped_bytes = mi.hblkhd;)
   }
   // From the spec, it appeared mmapped_bytes <= malloc_bytes. Reality was sometimes
   // dramatically different. (b/119580449 was an early bug.) If so, we try to fudge it.
   // However, malloc implementations seem to interpret hblkhd differently, namely as
   // mapped blocks backing the entire heap (e.g. jemalloc) vs. large objects directly
   // allocated via mmap (e.g. glibc). Thus we now only do this for glibc, where it
   // previously helped, and which appears to use a reading of the spec compatible
   // with our adjustment.
 #if defined(__GLIBC__)
   if (mmapped_bytes > malloc_bytes) {
     malloc_bytes = mmapped_bytes;
   }
 #endif  // GLIBC
 #else  // Neither Bionic nor Glibc
   // We should hit this case only in contexts in which GC triggering is not critical. Effectively
   // disable GC triggering based on malloc().
   malloc_bytes = 1000;
 #endif
   return malloc_bytes + native_bytes_registered_.load(std::memory_order_relaxed);
   // An alternative would be to get RSS from /proc/self/statm. Empirically, that's no
   // more expensive, and it would allow us to count memory allocated by means other than malloc.
   // However it would change as pages are unmapped and remapped due to memory pressure, among
   // other things. It seems risky to trigger GCs as a result of such changes.
 }
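The `(unsigned int)` cast in GetNativeBytes deserves a short illustration. When mallinfo reports usage in an `int` field, a value above 2 GB wraps to a negative number, and widening it directly to a 64-bit `size_t` sign-extends it into a huge bogus size. The sketch below, with the hypothetical helper names `WidenNaive` and `TotalNativeBytes`, shows the difference; it assumes a 32-bit `int`.

```cpp
#include <cstddef>

// Direct widening: a negative int is sign-extended when size_t is 64 bits wide.
inline std::size_t WidenNaive(int uordblks) {
  return static_cast<std::size_t>(uordblks);
}

// Hypothetical equivalent of the final line of GetNativeBytes: go through
// unsigned int first so the wrapped value is read as a 32-bit quantity,
// then add the bytes registered for mmap-style allocations.
inline std::size_t TotalNativeBytes(int uordblks, std::size_t registered_bytes) {
  std::size_t malloc_bytes = static_cast<unsigned int>(uordblks);
  return malloc_bytes + registered_bytes;
}
```

For example, 3,000,000,000 bytes stored in a 32-bit `int` reads back as -1,294,967,296; passing that value to `TotalNativeBytes` recovers 3,000,000,000 before the registered bytes are added.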

Once the total native memory size of the current process is known, the next step is to decide whether a new GC is needed.

The decision logic is shown below and explained in detail after the source.

art/runtime/gc/heap.cc

 // Return the ratio of the weighted native + java allocated bytes to its target value.
 // A return value > 1.0 means we should collect. Significantly larger values mean we're falling
 // behind.
 inline float Heap::NativeMemoryOverTarget(size_t current_native_bytes, bool is_gc_concurrent) {
   // Collection check for native allocation. Does not enforce Java heap bounds.
   // With adj_start_bytes defined below, effectively checks
   // <java bytes allocd> + c1*<old native allocd> + c2*<new native allocd) >= adj_start_bytes,
   // where c3 > 1, and currently c1 and c2 are 1 divided by the values defined above.
   size_t old_native_bytes = old_native_bytes_allocated_.load(std::memory_order_relaxed);
   if (old_native_bytes > current_native_bytes) {
     // Net decrease; skip the check, but update old value.
     // It's OK to lose an update if two stores race.
     old_native_bytes_allocated_.store(current_native_bytes, std::memory_order_relaxed);
     return 0.0;
   } else {
     size_t new_native_bytes = UnsignedDifference(current_native_bytes, old_native_bytes);   // (1)
     size_t weighted_native_bytes = new_native_bytes / kNewNativeDiscountFactor              // (2)
         + old_native_bytes / kOldNativeDiscountFactor;
     size_t add_bytes_allowed = static_cast<size_t>(                                         // (3)
         NativeAllocationGcWatermark() * HeapGrowthMultiplier());
     size_t java_gc_start_bytes = is_gc_concurrent                                           // (4)
         ? concurrent_start_bytes_
         : target_footprint_.load(std::memory_order_relaxed);
     size_t adj_start_bytes = UnsignedSum(java_gc_start_bytes,                               // (5)
                                          add_bytes_allowed / kNewNativeDiscountFactor);
     return static_cast<float>(GetBytesAllocated() + weighted_native_bytes)                  // (6)
          / static_cast<float>(adj_start_bytes);
   }
 }

First, the current total size of native memory is compared with the total recorded at the last GC. If it is smaller, native memory usage has gone down, so no new GC is needed.

If native memory usage has gone up, the ratio between the current value and the threshold has to be calculated. If the ratio is greater than or equal to 1, a GC is required. Steps (1) to (6) in the source are explained below.

(1) Calculate the difference between the current native memory size and the previous one. This difference is the size of the newly grown portion of native memory.

(2) Weight the two parts of native memory differently: the new portion is divided by 2 and the old portion by 65536. The old portion gets such a low weight because the native heap itself has no upper limit. The purpose of this mechanism is not to cap the size of the native heap but to prevent too much native garbage from accumulating between two GCs.

(3) The threshold is not set for native memory alone but for (Java heap size + native memory size) as a whole. add_bytes_allowed is the amount of native memory allowed on top of the original Java heap threshold. NativeAllocationGcWatermark calculates the allowed native memory size from the Java heap threshold: the larger the Java heap threshold, the larger the allowed value. HeapGrowthMultiplier is 2 for a foreground application, which means memory control is looser for foreground applications and GC is triggered less often.

(4) Under the same conditions, a concurrent GC is triggered at a lower level than a non-concurrent GC. While a concurrent GC runs, new objects are still being allocated, so the GC has to start early enough that these new allocations do not push the heap past the threshold.

(5) Add the Java heap threshold and the allowed native memory (itself divided by kNewNativeDiscountFactor, as the source shows) to obtain the new threshold.

(6) Add the allocated size of the Java heap to the weighted native memory size, then divide the sum by the threshold. The resulting ratio determines whether a GC is needed.
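A worked example may make steps (1) to (6) more concrete. The sketch below re-implements the arithmetic as a free function so it can be run outside ART. `NativeGcInputs` and `NativeUrgency` are hypothetical names, the discount factors 2 and 65536 are the values given in step (2), and the caller supplies add_bytes_allowed directly because NativeAllocationGcWatermark is not shown in this article.

```cpp
#include <cstddef>

// Inputs that the real function reads from Heap fields (hypothetical grouping).
struct NativeGcInputs {
  std::size_t java_bytes_allocated;  // GetBytesAllocated()
  std::size_t java_gc_start_bytes;   // concurrent_start_bytes_ or target_footprint_
  std::size_t old_native_bytes;      // native bytes recorded at the last GC
  std::size_t current_native_bytes;  // GetNativeBytes()
  std::size_t add_bytes_allowed;     // watermark * growth multiplier, supplied by caller
};

constexpr std::size_t kNewDiscount = 2;      // value stated in step (2)
constexpr std::size_t kOldDiscount = 65536;  // value stated in step (2)

// Returns the urgency ratio; a value >= 1 means a GC should be requested.
float NativeUrgency(const NativeGcInputs& in) {
  if (in.old_native_bytes > in.current_native_bytes) return 0.0f;  // net decrease
  std::size_t new_native = in.current_native_bytes - in.old_native_bytes;   // (1)
  std::size_t weighted = new_native / kNewDiscount
                       + in.old_native_bytes / kOldDiscount;                // (2)
  std::size_t adj_start = in.java_gc_start_bytes
                        + in.add_bytes_allowed / kNewDiscount;              // (3)-(5)
  return static_cast<float>(in.java_bytes_allocated + weighted)
       / static_cast<float>(adj_start);                                     // (6)
}
```

With 16 MiB of Java objects, a 32 MiB Java GC start level, 32 MiB of add_bytes_allowed, and native memory growing from 64 MiB to 128 MiB, the weighted native size is 32 MiB + 1 KiB and the adjusted threshold is 48 MiB, so the ratio lands just above 1 and a GC would be requested. If native memory had only grown to 96 MiB, the ratio would be about 0.67 and nothing would happen.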

The following code shows that when the ratio is ≥ 1, a new GC will be requested.

art/runtime/gc/heap.cc

   if (UNLIKELY(gc_urgency >= 1.0)) {
     if (is_gc_concurrent) {
       RequestConcurrentGC(self, kGcCauseForNativeAlloc, /*force_full=*/true);   // Request a new GC

1.2 MMap memory

mmap is handled in basically the same way as malloc: CheckGCForNative is executed when a single allocation exceeds 300,000 bytes or once every 300 mmaps. The only difference is that the size of each mmap must be added to native_bytes_registered_, because this information is not recorded in mallinfo (for the bionic library).

art/runtime/gc/heap.cc

 void Heap::RegisterNativeAllocation(JNIEnv* env, size_t bytes) {
   // Cautiously check for a wrapped negative bytes argument.
   DCHECK(sizeof(size_t) < 8 || bytes < (std::numeric_limits<size_t>::max() / 2));
   native_bytes_registered_.fetch_add(bytes, std::memory_order_relaxed);
   uint32_t objects_notified =
       native_objects_notified_.fetch_add(1, std::memory_order_relaxed);
   if (objects_notified % kNotifyNativeInterval == kNotifyNativeInterval - 1
       || bytes > kCheckImmediatelyThreshold) {
     CheckGCForNative(ThreadForEnv(env));
   }
 }
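The byte accounting for registered allocations can be modelled with two atomics. The class below is a sketch under stated assumptions, not the ART code: `RegisteredNativeBytes` is a hypothetical name, the constants 300 and 300,000 are the figures quoted in this article, and the `Free` method (including its clamping at zero) is this sketch's own guess at how a matching release could be recorded.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the native_bytes_registered_ bookkeeping.
class RegisteredNativeBytes {
 public:
  // Records an allocation; returns true when the caller should run the GC check.
  bool Register(std::size_t bytes) {
    bytes_.fetch_add(bytes, std::memory_order_relaxed);
    std::uint32_t n = count_.fetch_add(1, std::memory_order_relaxed);
    return n % kInterval == kInterval - 1 || bytes > kCheckImmediately;
  }

  // Records a release. Clamps at zero so an over-reported free cannot wrap
  // the counter (a design choice of this sketch).
  void Free(std::size_t bytes) {
    std::size_t cur = bytes_.load(std::memory_order_relaxed);
    std::size_t next;
    do {
      next = cur - std::min(cur, bytes);
    } while (!bytes_.compare_exchange_weak(cur, next, std::memory_order_relaxed));
  }

  std::size_t registered() const { return bytes_.load(std::memory_order_relaxed); }

 private:
  static constexpr std::uint32_t kInterval = 300;           // figure quoted in the article
  static constexpr std::size_t kCheckImmediately = 300000;  // figure quoted in the article
  std::atomic<std::size_t> bytes_{0};
  std::atomic<std::uint32_t> count_{0};
};
```

Note that the size test here uses a strict `>`, as in the Heap::RegisterNativeAllocation source above, whereas the Java-side malloc path compares with `>=`.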

2. How to trigger native memory recycling when Java objects are recycled

NativeAllocationRegistry mainly relies on the Cleaner mechanism to complete this process. For details about Cleaner, please refer to my previous blog post.

3. Actual case

The Bitmap class implements the automatic release of native resources through NativeAllocationRegistry. The following is part of the Bitmap constructor.

frameworks/base/graphics/java/android/graphics/Bitmap.java

         mNativePtr = nativeBitmap;         // Hold the native resource indirectly through the pointer value
 
         final int allocationByteCount = getAllocationByteCount(); // Size of the native resource; for mmap allocations it is eventually counted in native_bytes_registered_
         NativeAllocationRegistry registry;
         // Build a different NativeAllocationRegistry depending on how the native resource was
         // allocated; nativeGetNativeFinalizer() returns a pointer to the native release function.
         if (fromMalloc) {
             registry = NativeAllocationRegistry.createMalloced(
                     Bitmap.class.getClassLoader(), nativeGetNativeFinalizer(), allocationByteCount);
         } else {
             registry = NativeAllocationRegistry.createNonmalloced(
                     Bitmap.class.getClassLoader(), nativeGetNativeFinalizer(), allocationByteCount);
         }
         registry.registerNativeAllocation(this, nativeBitmap);   // Check whether a GC is needed

As the case above shows, to use NativeAllocationRegistry to automatically release native memory for a Java class, we first create a NativeAllocationRegistry object and then call its registerNativeAllocation method. These two steps are all that is needed to release native resources automatically.

Since two steps are needed, why not put registerNativeAllocation inside the NativeAllocationRegistry constructor so that one step would suffice? The reason is that keeping registerNativeAllocation separate lets you notify the GC after the native resource has actually been allocated, which is more flexible. NativeAllocationRegistry also has a corresponding registerNativeFree method, which lets the application layer notify the GC after it has released native resources ahead of time.
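On the native side, the release function handed to NativeAllocationRegistry is an ordinary function of type `void f(void* nativePtr)` whose address is passed to Java as a 64-bit integer. The sketch below shows that round trip in plain C++ without JNI. `NativeBuffer`, `FreeNativeBuffer`, `GetNativeFinalizer` and `ApplyFreeFunction` are hypothetical names chosen for illustration; only the function-pointer shape reflects the registry's contract.

```cpp
#include <cstdint>

// Hypothetical native resource; the flag lets a caller observe the release.
struct NativeBuffer {
  bool* freed;
};

// Release function with the shape the registry expects: void f(void* nativePtr).
void FreeNativeBuffer(void* native_ptr) {
  auto* buffer = static_cast<NativeBuffer*>(native_ptr);
  *buffer->freed = true;
  delete buffer;
}

// What a method like nativeGetNativeFinalizer() could return: the address of
// the release function, carried as a 64-bit integer (a Java long).
std::int64_t GetNativeFinalizer() {
  return static_cast<std::int64_t>(
      reinterpret_cast<std::uintptr_t>(&FreeNativeBuffer));
}

// What happens once the Java object is unreachable: the stored address is
// turned back into a function pointer and called with the native pointer.
void ApplyFreeFunction(std::int64_t free_function, std::int64_t native_ptr) {
  auto fn = reinterpret_cast<void (*)(void*)>(
      static_cast<std::uintptr_t>(free_function));
  fn(reinterpret_cast<void*>(static_cast<std::uintptr_t>(native_ptr)));
}
```

Because the registry stores only two integers per object, the Java wrapper does not need a finalize() method of its own, and the same free function can be shared by every instance of the class.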

Author: Lu Mid
link: https://juejin.im/post/6894153239907237902

Origin blog.csdn.net/Androiddddd/article/details/112493970