A Look at the ART Virtual Machine in Android 10 (9): The Finale

Continuing the systematic study of "Advanced Design and Implementation of Virtual Machines" in the form of reading notes. The book is in English, and reading it all the way through, carefully and thoroughly, is quite a challenge. I had only read about 60% of it before, and the further I got, the more impatient I became. After some thought, though, if you want to understand the JVM at the system level, you probably still have to work through one or two books like this. Taking reading notes is how I learn best; it is a habit that has stayed with me for at least 20 years. For now, I'll call this set of reading notes "Theory of VM".

This article introduces the last two parts:

This article is the end of the reading notes on ADIVM. Today's post will be a bit rough; after all, this is the advanced material on optimization. In some places all I can do is summarize, so that you at least know what the author is talking about (without some organization, you would basically be completely lost).

Review of GC basics

Let's first review the basics of GC; this part is excerpted from Chapter 14 of the book "In-depth Understanding of Android JVM ART".

In principle, there are four basic GC approaches:

  • Mark Sweep

  • Copying Collection

  • Mark Compact

  • Reference Counting

The following figure is a schematic diagram of the three GC approaches other than Reference Counting.

The principle of Mark Sweep is shown above: roughly speaking, find the live objects, then clean up the garbage objects in place. The method is easy to understand and requires no extra bookkeeping, but in practice it leads to memory fragmentation.
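To make the picture concrete, here is a minimal mark-sweep sketch in C++. It is illustrative only; the Obj and heap types are hypothetical, not how ART or the book implements it.

```cpp
#include <stack>
#include <vector>

struct Obj {
    bool marked = false;
    std::vector<Obj*> fields;  // outgoing references
};

// Mark phase: traverse from the roots and mark every reachable object.
void Mark(const std::vector<Obj*>& roots) {
    std::stack<Obj*> work;
    for (Obj* r : roots) work.push(r);
    while (!work.empty()) {
        Obj* o = work.top();
        work.pop();
        if (o == nullptr || o->marked) continue;
        o->marked = true;
        for (Obj* f : o->fields) work.push(f);
    }
}

// Sweep phase: free unmarked objects in place. Survivors stay where they are,
// which is exactly why fragmentation accumulates over time.
void Sweep(std::vector<Obj*>& heap) {
    std::vector<Obj*> survivors;
    for (Obj* o : heap) {
        if (o->marked) {
            o->marked = false;  // reset for the next cycle
            survivors.push_back(o);
        } else {
            delete o;
        }
    }
    heap.swap(survivors);
}
```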

Copying collection copies the live objects to a free area. During the copy, objects can be rearranged so that they end up packed tightly together, which solves the fragmentation problem. But the copy itself has a cost, there is the problem of references made to an object before and after it is copied, and a free area has to be reserved in advance, which wastes some memory.
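Again purely as an illustration (hypothetical types, not ART's collector), here is a minimal semi-space copying sketch. The forwarding pointer is what handles the "referenced before/after the copy" problem: each object is evacuated exactly once, and every later reference is redirected to the copy.

```cpp
#include <vector>

struct Obj {
    Obj* forward = nullptr;    // where this object now lives in to-space, once copied
    std::vector<Obj*> fields;  // outgoing references
};

// Evacuate one object into to-space, fixing up its outgoing references.
Obj* Evacuate(Obj* o, std::vector<Obj*>& to_space) {
    if (o == nullptr) return nullptr;
    if (o->forward != nullptr) return o->forward;  // already copied once
    Obj* copy = new Obj(*o);                       // the copy lands in to-space...
    to_space.push_back(copy);                      // ...packed next to other survivors
    o->forward = copy;
    for (Obj*& f : copy->fields)                   // references now point at the copies
        f = Evacuate(f, to_space);
    return copy;
}

// Evacuate everything reachable from the roots; from-space can then be
// discarded wholesale, which is where the reserved free area gets "wasted".
void CopyingCollect(std::vector<Obj*>& roots, std::vector<Obj*>& to_space) {
    for (Obj*& r : roots) r = Evacuate(r, to_space);
}
```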

Mark Compact is actually similar to Copying, except that no separate free area is set aside; live objects are simply moved within the same space (the so-called Compact, or compaction, is really just moving objects together).
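A minimal mark-compact sketch, again purely illustrative: the heap is modeled as an array of slots and references are slot indices, so "moving" an object is just sliding it toward the front of the same space. This follows the classic three-pass scheme (compute new addresses, rewrite references, move) and assumes a mark phase has already run.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

constexpr size_t kNone = static_cast<size_t>(-1);

struct Obj {
    bool marked = false;         // set by a preceding mark phase
    size_t new_index = kNone;    // forwarding index computed in pass 1
    std::vector<size_t> fields;  // references, expressed as heap indices
};

void Compact(std::vector<Obj>& heap, std::vector<size_t>& roots) {
    // Pass 1: assign each live object its new, packed position.
    size_t next = 0;
    for (Obj& o : heap)
        if (o.marked) o.new_index = next++;

    // Pass 2: rewrite every reference (roots and fields) to the new position.
    // Live objects only reference live objects, so new_index is always valid here.
    for (size_t& r : roots) r = heap[r].new_index;
    for (Obj& o : heap)
        if (o.marked)
            for (size_t& f : o.fields) f = heap[f].new_index;

    // Pass 3: slide survivors down and shrink the space; unlike copying
    // collection, no separate free area is needed.
    for (size_t i = 0; i < heap.size(); ++i)
        if (heap[i].marked && heap[i].new_index != i)
            heap[heap[i].new_index] = std::move(heap[i]);
    heap.resize(next);
    for (Obj& o : heap) { o.marked = false; o.new_index = kNone; }
}
```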

That covers the three basic GC approaches. Next, let's look at how the book optimizes GC.

GC optimization to improve Throughput

The author first discusses how to improve GC throughput. This chapter introduces a pile of formulas; I think the main difficulty lies in how throughput is defined and calculated. Let's first look at what the author cares about.

When discussing GC throughput, the author's first design consideration is the timing of partial-heap versus full-heap collection, that is, how to interleave minor and major GC. Note that this rests on a major premise: the heap is divided into regions. The lower-left corner of the figure above shows the heap division, and the right side shows the classification.

Toward this goal, the author develops several mathematical formulas. What follows is a highly condensed version of the original text; honestly, you won't lose much if you skip it... By the way, I read the reviews of this book on Amazon (US), and one of them complained that although there is a long list of references at the back of the book, the main text never says which reference a given piece of content comes from. To be honest, the throughput calculation below feels vague to me, and I can't tell whether the author came up with it himself or which literature he drew on...

The author first defines the throughput formula as the total amount of memory reclaimed divided by the total collection time over the lifetime of the app. That is too hard to compute directly, so a cycle is defined instead: from the end of one major GC to the end of the next major GC. One cycle contains one major GC and several minor GCs. Assume each minor GC takes roughly the same time (Tminor) and each major GC takes roughly the same time (Tmajor), and so on...
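Just to pin down the structure being described, here is my reconstruction of the cycle-based throughput expression. It is not the book's exact notation, and k, Mminor, and Mmajor are placeholder symbols (the number of minor GCs per cycle, and the memory reclaimed per minor/major GC) for quantities the text only describes in words:

$$
\text{Throughput}_{\text{cycle}} \approx \frac{k \cdot M_{\text{minor}} + M_{\text{major}}}{k \cdot T_{\text{minor}} + T_{\text{major}}}
$$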

Here comes the dizzying part...

In fact, the formula above makes many assumptions; for example, after each minor GC the application is assumed to allocate roughly dS of non-garbage memory... And I suspect there are more assumptions than the ones stated, which is why this chapter is actually quite hard (now I understand the frustration behind that Amazon review).

In the end, the author arrives at a throughput formula. With Fmax, dS, Tmin, and Tmax held fixed, a differential equation is solved; then, assuming Fmin = 16MB, a set of throughput tests is run. The logic really does jump around too much (I honestly don't understand how that figure was produced)...

The lower-left corner of the figure above gives the preconditions of the optimization and the final conclusion (start major GC earlier).

Next, the second design consideration for improving throughput is whether to use generational GC or non-generational GC.

The above extracts the key points. The final conclusion is as follows:

According to the author's test results, generational GC initially has lower throughput than non-generational GC, but later it becomes higher. The optimization the author proposes is therefore to start with the higher-throughput non-generational GC and switch to generational GC at a certain point... The idea is that simple and crude; of course, actually pulling it off is quite a challenge, but the goal really is that direct...
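Purely as a toy illustration of this "run non-generational first, then switch" idea (my own sketch, not anything from the book or ART; the crossover heuristic below is invented), a mode controller could watch the measured throughput of each cycle and flip modes once the trend reverses:

```cpp
#include <cstdint>

enum class GcMode { kNonGenerational, kGenerational };

// Hypothetical controller: records (bytes reclaimed, time spent) per GC cycle
// and switches to generational mode once non-generational throughput starts
// to decline, on the assumption that the crossover point has been passed.
class GcModeController {
 public:
    void OnGcFinished(uint64_t bytes_reclaimed, uint64_t gc_time_us) {
        double throughput =
            static_cast<double>(bytes_reclaimed) / static_cast<double>(gc_time_us);
        if (mode_ == GcMode::kNonGenerational &&
            last_throughput_ > 0.0 && throughput < last_throughput_) {
            mode_ = GcMode::kGenerational;  // "at a certain point, change to generational"
        }
        last_throughput_ = throughput;
    }
    GcMode mode() const { return mode_; }

 private:
    GcMode mode_ = GcMode::kNonGenerational;
    double last_throughput_ = 0.0;
};
```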

The last design consideration for improving throughput is shown below; it mainly approaches the problem from the angle of raising execution speed.

GC optimization to improve Scalability

Another GC optimization is how to take advantage of multiple cores and thereby improve throughput. This chapter is rather dry; I recommend first understanding what it is trying to do.

The figure above explains the difference between concurrent GC and parallel GC: concurrent GC emphasizes the mutator and the collector running at the same time, while parallel GC emphasizes multiple collector threads working in parallel.

For what this chapter discusses, see items 1, 2, and 3 in the figure. Again, a notable quirk of this book is that many items placed at the same level are not actually parallel concepts: for example, items 1 and 2 above, where Object Traversal actually includes Object Marking. Also pay attention to the Mark-Stack data structure, a term that shows up frequently in the GC part of the ART book as well. Finally, the book says that because root-object enumeration requires suspending the mutators, Scalability is meaningless in that particular step...

Traversal exists in order to mark. Still, taking marking on its own, some additional optimizations can be designed specifically for it...

For multi-core, one design consideration is parallel object traversal. Since scanned objects must be pushed onto the MarkStack, this becomes a classic multi-writer/multi-reader producer/consumer model. See the figure below for an explanation.

Of course, if you add load balancing and other processing, things get more complicated. In short, the three optimization techniques the author presents in this chapter are all about handling this multi-writer/multi-reader problem well; I won't go into the details. Some readers may feel this article isn't really saying anything, but honestly, if you go straight to the original text without something like this, there is a good chance you won't understand it (or won't even know what it is discussing).
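To give a feel for what that producer/consumer model looks like, here is a minimal sketch of parallel marking over one shared, mutex-protected mark stack. This is illustrative only, not the book's three techniques and not ART's code; real collectors layer per-thread stacks, work stealing, and a proper termination protocol on top of this.

```cpp
#include <atomic>
#include <mutex>
#include <stack>
#include <thread>
#include <vector>

struct Obj {
    std::atomic<bool> marked{false};
    std::vector<Obj*> fields;
};

std::mutex g_stack_mutex;
std::stack<Obj*> g_mark_stack;  // shared by all collector threads

// Each worker is both consumer (pops an object to scan) and producer
// (pushes that object's unscanned children) on the same mark stack.
void MarkWorker() {
    for (;;) {
        Obj* o = nullptr;
        {
            std::lock_guard<std::mutex> lock(g_stack_mutex);
            if (g_mark_stack.empty()) return;  // naive termination: only costs parallelism
            o = g_mark_stack.top();
            g_mark_stack.pop();
        }
        if (o->marked.exchange(true)) continue;  // atomically claim the object
        std::lock_guard<std::mutex> lock(g_stack_mutex);
        for (Obj* f : o->fields)
            if (f != nullptr && !f->marked.load()) g_mark_stack.push(f);
    }
}

void ParallelMark(const std::vector<Obj*>& roots, int num_threads) {
    for (Obj* r : roots)
        if (r != nullptr) g_mark_stack.push(r);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) workers.emplace_back(MarkWorker);
    for (auto& t : workers) t.join();
}
```

The single global lock here is, presumably, exactly the kind of contention those three techniques aim to avoid.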

Parallel Object Marking is discussed next.

As I said earlier, the contents of the two figures above appear as sibling headings in the original text, but Object Marking obviously covers only a small part of Traversal...

Finally, this chapter also discusses one more optimization design point: Parallel Compaction.

I'll set it aside for now and come back to it when I read that part of the original text...

GC optimization to improve Responsiveness

Improving responsiveness is probably the area where many armchair experts can chime in with a few words; it is also a relatively mature technology. As I see it, the main technique is concurrency...

The first optimization design consideration is Concurrent Tracing. The key content is as follows:

Three basic guarantees matter here. Whenever the author discusses a related GC method, it ultimately comes down to satisfying these three conditions:

  • No live object may be missed.

  • Some garbage objects may be retained for now, but they must eventually be collected.

  • Tracing must be able to terminate; it must not get stuck and never finish.

The overall picture of Concurrent Tracing is shown in the figure above. First of all, be clear that concurrency only begins after root-object enumeration. To address the last problem mentioned in point 3 of the figure, the following three methods are designed.

This article only covers the Snapshot-At-The-Beginning method. With a write barrier, whenever an object's member reference fields are modified, we remember something. Look carefully at the diagram above:

  • A.f1, A.f2, and A.f3 point to three objects: B, C, and D, respectively.

  • Now suppose A.f1 is reassigned so that it points to X. The object B that A.f1 previously pointed to is then not recorded anywhere and can no longer be found through A, because A has already been marked and will not be scanned again.

Therefore, with the help of the write barrier, we actively remember B; in effect, B becomes an object to be remembered. The reason B must be remembered is that it may or may not be garbage, and you cannot treat it as garbage right now (otherwise there will be trouble once it is reclaimed; keep in mind that the mutator and the collector are running at the same time). That is all the SATB (Snapshot-At-The-Beginning) method is, though there are some variants of it.
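A minimal sketch of what such an SATB write barrier can look like (my own illustration, not ART's actual barrier code): before a reference field is overwritten while concurrent marking is active, the old value (the B above) is pushed into a buffer that the collector drains later.

```cpp
#include <mutex>
#include <vector>

struct Obj;  // opaque object type for this sketch

bool g_concurrent_marking_active = false;  // set/cleared by the collector
std::mutex g_satb_mutex;
std::vector<Obj*> g_satb_buffer;           // objects "remembered" by the barrier

// Called by the mutator for every reference store, e.g. A.f1 = X.
void WriteRefWithSatbBarrier(Obj** field, Obj* new_value) {
    if (g_concurrent_marking_active) {
        Obj* old_value = *field;           // this is the B that would otherwise be lost
        if (old_value != nullptr) {
            std::lock_guard<std::mutex> lock(g_satb_mutex);
            g_satb_buffer.push_back(old_value);  // the collector marks these later
        }
    }
    *field = new_value;                    // the actual store
}
```

Whether B then survives or is collected in a later cycle is decided afterwards; the barrier's only job is to make sure it is not silently lost.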

The next optimization design consideration is concurrent root-set enumeration. This one is a bit difficult; to grasp what this optimization really means you have to read the original text carefully. In short, it discusses how root-set enumeration can be carried out concurrently across multiple mutators.

ART does not seem to have anything "this advanced" (or perhaps no mature implementation of it). In ART, visiting the root objects on a thread's stack falls under the non-concurrent roots, which can only be visited after all mutators have been suspended...

Concurrent Moving Collection

The placement of this chapter suffers from the same non-parallel structure I mentioned above: the previous three chapters discuss how to optimize GC from the three angles of Throughput, Scalability, and Responsiveness, and then this chapter is suddenly about one specific GC method...

ART also has places where a concurrent moving GC is used. Pay attention to the key steps below.

The starting point for CMC (short for Concurrent Moving Collection) optimization design is whether the mutator can see two copies of Obj A (the original Obj A in from-space, and the copied Obj A' in to-space). If the mutator only ever sees the copy in to-space, the design is called To-Space Invariant. This article mainly introduces that design.

Note the handling of step 2.a: after root-set enumeration, the objects referenced by the root set must be moved to to-space before the mutators are resumed.

For the handling of step 2.b, a read barrier is used. See below.

The constant kUseBakerReadBarrier shows up frequently in ART code. It corresponds to the read barrier shown in the figure above, although strictly speaking it is a load barrier, since work is done not only on reads but also on some writes. This technique was first proposed by Baker...
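As a rough illustration of the to-space invariant (not ART's actual kUseBakerReadBarrier implementation; the types and the on-demand copy below are simplified stand-ins), every reference load goes through a barrier that returns the to-space copy, evacuating the object on the spot if it has not been moved yet, so the mutator never observes the from-space Obj A.

```cpp
#include <vector>

struct Obj {
    Obj* forward = nullptr;     // non-null once the object has a to-space copy
    bool in_to_space = false;
    std::vector<Obj*> fields;
};

// On-demand evacuation: copy the object into to-space and leave a forwarding pointer.
Obj* EvacuateToToSpace(Obj* from) {
    Obj* copy = new Obj(*from);
    copy->forward = nullptr;
    copy->in_to_space = true;
    from->forward = copy;
    return copy;
}

// Called on every reference load, e.g. x = A.f1.
Obj* ReadRefWithBarrier(Obj** field) {
    Obj* ref = *field;
    if (ref == nullptr || ref->in_to_space) return ref;  // already the to-space copy
    Obj* copy = (ref->forward != nullptr) ? ref->forward : EvacuateToToSpace(ref);
    *field = copy;  // "heal" the field so the next load takes the fast path
    return copy;
}
```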

That concludes the GC-related optimization material. It is fairly difficult, and since the original text does not clearly tie its content back to the references, reading it is even more painful. My suggestion is to first understand the goals and the general approach, and then study the details in combination with the ART source code.

Lock implementation and hardware-based memory transactions

The last two chapters of the book discuss lock optimization and the help that hardware-based transactional memory might bring to a JVM.

Let's first look at the introduction to locks.

To be honest, I think you are better off reading section 12.3 of the ART book. ART's lock implementation combines several of the techniques mentioned in the figure above, so the discussion here reads more like a review (of course, if you don't already understand ART's approach, you won't get that feeling). These techniques are not complicated; I suggest simply reading my book together with the code, which works better than the discussion here (I think the author is reviewing these optimization techniques rather than proposing anything new).
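For reference, one commonly discussed lock-optimization technique in this area is the thin lock: an uncontended lock is taken with a single CAS on a word stored in the object, and only contention falls back to a heavyweight monitor. The sketch below is illustrative only; the field layout, recursion handling, and slow path are simplified stand-ins, not ART's LockWord code.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

struct ObjectHeader {
    // 0 means unlocked; otherwise the word holds the owning thread's id.
    // (Real thin locks also pack a recursion count and a "fat monitor" flag.)
    std::atomic<uint32_t> lock_word{0};
};

// Slow-path stand-in: a real runtime would inflate to a fat monitor and block.
void SlowPathAcquire(ObjectHeader* h, uint32_t self) {
    uint32_t expected = 0;
    while (!h->lock_word.compare_exchange_weak(expected, self)) {
        expected = 0;
        std::this_thread::yield();
    }
}

void MonitorEnter(ObjectHeader* h, uint32_t self_thread_id) {  // assumes ids are non-zero
    uint32_t expected = 0;
    // Fast path: one CAS swings the word from "unlocked" to "owned by me".
    if (h->lock_word.compare_exchange_strong(expected, self_thread_id)) return;
    if (expected == self_thread_id) return;  // recursive acquire (count elided here)
    SlowPathAcquire(h, self_thread_id);      // contention: take the slow path
}

void MonitorExit(ObjectHeader* h, uint32_t self_thread_id) {
    uint32_t expected = self_thread_id;
    h->lock_word.compare_exchange_strong(expected, 0u);  // release only if we own it
}
```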

The last chapter of the book introduces the possible use of hardware-based transactional memory in a JVM implementation. I won't say much about it; just look at the brief introduction in the lower-left corner of the figure below.

JVM reading notes summary

Overall, the book is quite rich in content. A JVM is, above all, an engineering project, so in engineering terms it accumulates a great deal of experience and optimization in particular areas, without much in the way of grand theory (GC in practice is largely about optimization). When you read the ART source code, you are bound to run into the content covered in this book. So this book is best seen as an integration and summary (and its many references are a treasure trove for further in-depth study).

Last of the last

  • The result I hope for is not simply that friends learn something from my books, articles, and blog posts and go on to do this or that, but rather to one day hear: "Shennong, I have climbed onto your shoulders."

  • That's enough about learning. Going forward, this public account will study and share both fundamentals and new technologies, and contributions are welcome. However, as I said in the account's "Contact Information": Zheng Yuanjie, the "King of Fairy Tales", has a line in his novel "Wisdom Tooth" that left a deep impression on me, roughly, "I have the right to remain silent, but every word you say may become a source of my inspiration." So the influence is not one-way; it may well be that I learn more from you.

Collection of essays by Shennong and his friends


Source: blog.csdn.net/Innost/article/details/107602892