JVM Memory Structure and the Java Memory Model (JMM)

Java's memory structure: the heap, the stacks, the method area, the program counter, and so on make up the memory structure of the Java virtual machine. These memory areas are initialized after a Java program starts. The method area is a conceptual definition in the JVM specification. In the HotSpot virtual machine it was implemented as the permanent generation, and since version 1.8 it is implemented as Metaspace. Metaspace uses native memory, that is, its memory is not inside the virtual machine, so in theory it is limited only by the memory of the physical machine.

The memory model is another thing. What is the memory model?

In a computer, the CPU and memory interact most frequently. Compared with memory, disk reads and writes are far too slow, so memory acts as a high-speed buffer in front of the disk. Memory, in turn, is far slower than the CPU, so CPU manufacturers add a high-speed cache to each CPU to narrow the gap. The interaction is roughly CPU <=> cache (usually L1, L2, L3) <=> main memory. When the CPU wants to read a piece of data, it searches L1, L2, L3, and then main memory in turn.

The cache resolves the speed mismatch between processor and memory, but it also introduces the problem of cache coherence. In a multi-core CPU each core has its own cache, while there is only one main memory. When several processors operate on the same memory area, as happens in multi-threaded scenarios, their caches can disagree. So how is data consistency guaranteed at runtime? It is guaranteed by the CPU: every processor must follow a cache coherence protocol (such as MSI or MESI).

Memory Barrier:

The CPU cache improves data access performance by avoiding a request to memory on every access, but it does not exchange information with memory in real time. As a result, threads running on different CPUs may hold different cached values for the same variable. Memory barriers are used to deal with this. At the hardware level there are two types of memory barrier: the Load Barrier (read barrier) and the Store Barrier (write barrier). Because different hardware implements memory barriers in different ways, Java hides these differences and lets the JVM generate the memory barrier instructions. With a read barrier, for example, the barrier is inserted before the read instruction so that the data in the cache is invalidated and the value is loaded from main memory again.

A memory barrier has two functions: 1. It prevents the instructions on either side of the barrier from being reordered across it. 2. It forces the write buffer and dirty data in the cache to be written back to main memory and invalidates the corresponding data in other caches. The volatile keyword relies on memory barriers. When a volatile variable is written, a special assembly instruction is generated, and this instruction triggers the MESI protocol. Through a bus snooping mechanism, each CPU keeps watching the bus for changes to the variable. As soon as the variable changes, the other CPUs invalidate their cached copy of it and read the value from main memory again.
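The visibility effect of volatile described above can be observed from plain Java. The following is a minimal sketch, not taken from the original article: the class name VolatileFlagDemo and its members are made up for illustration. A worker thread spins on a volatile flag and another thread clears it. Without volatile, the JIT compiler would be allowed to keep reading a stale value, so the worker might never stop.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: a worker spins until another thread clears a volatile flag.
class VolatileFlagDemo {
    // volatile guarantees that the write below becomes visible to the worker thread.
    private volatile boolean running = true;
    private long iterations = 0;

    // Starts a worker, asks it to stop, and reports whether it terminated in time.
    static boolean workerStopsWithin(long timeoutMillis) {
        VolatileFlagDemo demo = new VolatileFlagDemo();
        Thread worker = new Thread(() -> {
            while (demo.running) {   // volatile read on every iteration
                demo.iterations++;   // busy work, touched only by this thread
            }
        });
        worker.setDaemon(true);      // never keep the JVM alive because of the demo
        worker.start();
        try {
            TimeUnit.MILLISECONDS.sleep(50); // let the worker spin for a moment
            demo.running = false;            // volatile write
            worker.join(timeoutMillis);      // wait for the worker to observe it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsWithin(2000));
    }
}
```

Which barrier instructions the JVM actually emits for the volatile write depends on the hardware, which is exactly the difference the JVM hides from the program.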

Java memory areas

        Everything here is based on the HotSpot virtual machine. The design of the Java memory model follows the computer memory model described above, and the memory of a Java program is allocated through the JVM's memory allocation mechanism. The Java Memory Model (JMM) is a mechanism and specification that conforms to that memory model. It shields the memory access differences between hardware platforms and operating systems, so that a Java program gets consistent memory access results on every platform. In brief, the JMM is a specification of the JVM that defines the JVM's memory model. Unlike C, Java does not access hardware memory directly, which makes it relatively safer. The main purpose of the JMM is to solve the problems that arise when threads communicate through shared memory: inconsistent data in each thread's local memory, reordering of instructions by the compiler, and out-of-order execution by the processor. It guarantees atomicity, visibility, and ordering in concurrent programming scenarios.
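To make the atomicity guarantee a bit more concrete, here is a small sketch of my own (the class AtomicCounterDemo is hypothetical). Two threads increment a shared java.util.concurrent.atomic.AtomicInteger. Each incrementAndGet() is an atomic read-modify-write, and Thread.join() makes the results visible to the calling thread, so no increment is lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: two threads share one atomic counter.
class AtomicCounterDemo {
    static int countWithTwoThreads(int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Runnable task = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join(); // after join(), everything the thread wrote is visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(100_000)); // prints 200000
    }
}
```

With a plain int and counter++ instead, the two threads could overwrite each other's updates and the total would usually come out lower.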

        The Java runtime data area is divided into five areas: the heap, the native method stack, the virtual machine stack, the program counter, and the method area.

Program counter:

        The program counter is a small memory space that records the address of the bytecode instruction each thread is executing. Branches, loops, jumps, exception handling, and thread resumption all rely on it. Java is a multi-threaded language, and when more threads are running than there are CPU cores, the threads take turns using the CPU in time slices. If a thread's time slice runs out, or the thread loses the CPU for some other reason, it needs its own program counter to record the instruction to run next. A native method is not executed by the JVM itself, so the JVM's program counter has nothing to record: while a native method is running, its value is undefined. The operating system level has its own program counter, which records the address of the native code being executed. The program counter is also the only memory area in the JVM where no OOM (OutOfMemoryError) occurs.

Java virtual machine stack:

Data structure: FILO (first in, last out)

Role: Stores the data, instructions, and return addresses that the current thread's methods need while the JVM is running.

Thread-based: The stack works on a per-thread basis. During the life cycle of a thread, the data involved in its computation is frequently pushed onto and popped off the stack, and the stack has the same life cycle as the thread. The default size of the virtual machine stack is typically 1M and can be adjusted with the -Xss parameter, for example -Xss256k.

        The stack is private to each thread and describes the memory model of Java method execution. Each time a method is executed, a stack frame is created to store the local variable table, operand stack, dynamic link, method exit, and other information. Each method call, from start to finish, corresponds to one stack frame being pushed onto and popped off the virtual machine stack.

  1.     Stack frame: a data structure used to store data and partial results.
  2.     Location of the stack frame: memory -> runtime data area -> the virtual machine stack of a thread -> stack frame
  3.     Size of the stack frame and when it is determined: determined at compile time and not affected by runtime data

When people say "the stack", they usually mean the local variable table. The local variable table is a contiguous memory space that stores method parameters and the local variables defined in the method. Its entries hold the data types known at compile time (the eight primitive types and reference types) as well as return addresses.

Note that the memory space required by the local variable table is determined at compile time. When a method is entered, the amount of local variable space it needs in the stack is already fixed, and the size of the local variable table does not change while the method runs. Two kinds of errors can occur in the Java virtual machine stack:

  1. StackOverflowError: thrown if the stack depth requested by the thread is greater than the stack depth allowed by the virtual machine.
  2. OutOfMemoryError: thrown if the virtual machine stack can be dynamically expanded and the expansion cannot obtain enough memory.
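The first error is easy to provoke. The sketch below is my own illustration (StackDepthDemo is a made-up class name): it recurses without a base case, so every call pushes another stack frame until the thread's stack is exhausted, and then reports how deep it got.

```java
// Hypothetical demo class: measures how many frames fit on the current thread's stack.
class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // no base case: each call pushes one more stack frame
    }

    // Recurses until the stack overflows and returns the depth that was reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // expected: the requested depth exceeded what the stack allows
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measureDepth());
    }
}
```

The number printed varies between runs and platforms. Running the same class with a smaller stack, for example java -Xss256k StackDepthDemo, should generally give a smaller depth.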

 

Native method stack:

        The native method stack plays a role similar to that of the Java virtual machine stack. The Java virtual machine stack manages calls to Java methods, while the native method stack manages calls to native methods. Native methods are not implemented in Java but in a native language such as C (for example, the Object.hashCode method). The two areas are so similar that you can even think of the virtual machine stack and the native method stack as the same area. The virtual machine specification places no mandatory requirements here, so each virtual machine is free to implement it as it sees fit. HotSpot simply combines the native method stack and the virtual machine stack into one.
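Whether a given method is native can be checked with reflection. This is a small sketch of my own (NativeMethodDemo and isNative are hypothetical names). It relies on how OpenJDK/HotSpot declares these methods, so other class libraries may differ.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class: checks the "native" modifier of a public no-argument method.
class NativeMethodDemo {
    static boolean isNative(Class<?> type, String methodName) {
        try {
            return Modifier.isNative(type.getMethod(methodName).getModifiers());
        } catch (NoSuchMethodException e) {
            return false; // unknown method: treat as not native
        }
    }

    public static void main(String[] args) {
        // Assumption: OpenJDK declares Object.hashCode() as a native method.
        System.out.println("Object.hashCode native: " + isNative(Object.class, "hashCode"));
        System.out.println("String.length  native: " + isNative(String.class, "length"));
    }
}
```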

Heap:

        The heap is the largest memory area in the JVM, and almost all the objects we create are stored here. When we talk about garbage collection, the heap is the area being collected. Heap space is generally requested when the program starts, but not all of it is used right away, and the heap is usually configured to be expandable. As objects are created, more and more heap space is occupied, and objects that are no longer used have to be reclaimed from time to time. In Java this is called GC (Garbage Collection). When a value is created, is it allocated on the heap or on the stack? That depends on two things: its type and where it is declared in the Java class. Java values are either primitive types or ordinary objects. For an ordinary object, the JVM first creates the object on the heap and then uses a reference to it elsewhere, for example by saving the reference in the local variable table of the virtual machine stack. For primitive types (byte, short, int, long, float, double, char, boolean) there are two cases. A primitive declared in a method body is allocated directly on the stack. In all other cases it is allocated on the heap.

Heap size parameters:
-Xms: the initial (minimum) size of the heap;
-Xmx: the maximum size of the heap;
-Xmn: the size of the young generation;
-XX:NewSize: the initial (minimum) size of the young generation;
-XX:MaxNewSize: the maximum size of the young generation;
For example: -Xmx256m
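To see what these parameters did to a running JVM, the standard java.lang.management API can report the heap's current figures. This is a minimal sketch under my own naming (HeapSizeDemo is hypothetical). The init and max values roughly correspond to -Xms and -Xmx, and either may be reported as -1 when the JVM does not define it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical demo class: prints the heap figures of the running JVM.
class HeapSizeDemo {
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        long mb = 1024L * 1024L;
        // init and max are -1 when undefined, which prints as 0 MB here.
        System.out.println("init (roughly -Xms): " + heap.getInit() / mb + " MB");
        System.out.println("used               : " + heap.getUsed() / mb + " MB");
        System.out.println("committed          : " + heap.getCommitted() / mb + " MB");
        System.out.println("max  (roughly -Xmx): " + heap.getMax() / mb + " MB");
    }
}
```

Running it once with java -Xms64m -Xmx256m HeapSizeDemo and once without flags is a quick way to check that a setting was actually picked up.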

Method area:

        The method area, like the heap, is a memory area shared by all threads. To distinguish it from the heap, it is also called non-heap. It stores class information, constants, and static variables for classes that the virtual machine has loaded. For example, variables declared static are placed in the method area when their class is loaded.

The runtime constant pool is part of the method area. Besides the description of a class's fields, interfaces, and methods, a class file also contains a constant pool that stores the literals and symbolic references generated at compile time. In older JDK versions the method area was also called the permanent generation, because the HotSpot virtual machine implemented the method area with the permanent generation. This let the JVM's garbage collector manage the area in the same way as the heap, so no separate garbage collection mechanism had to be designed for it. In JDK 7, HotSpot moved the string constant pool out of the permanent generation, and JDK 8 dropped the permanent generation altogether in favor of Metaspace. The Java virtual machine specification is relatively loose about the method area: like the heap, it does not need contiguous memory, it may be fixed in size or expandable, and an implementation may even choose not to garbage-collect it.

Why does Java 8 use Metaspace instead of the permanent generation, and what are the benefits of doing so?
        The official explanation is that removing the permanent generation was part of the effort to merge the HotSpot JVM and the JRockit VM. JRockit has no permanent generation, so there is nothing to configure there. In addition, the permanent generation often ran short of memory or overflowed, throwing java.lang.OutOfMemoryError: PermGen space. This is because in JDK 1.7 the PermGen area had a fixed maximum size that was small by default. Class metadata in PermGen could be collected at each Full GC, but the reclaim rate was low and the results were rarely satisfactory. It was also hard to decide how much space to allocate to PermGen, because the right size depends on many factors, such as the total number of classes loaded by the JVM, the size of the constant pools, and the size of the methods.
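On a running JVM, the memory pools that are not part of the heap can be listed through java.lang.management. The sketch below is my own (NonHeapPoolsDemo is a hypothetical name). Pool names are implementation-specific: on a HotSpot JVM from version 8 onward I would expect one of them to be called Metaspace, but that is an assumption about the JVM in use, not something the API guarantees.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists the names of the JVM's non-heap memory pools.
class NonHeapPoolsDemo {
    static List<String> nonHeapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : nonHeapPoolNames()) {
            System.out.println("non-heap pool: " + name);
        }
    }
}
```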

JDK 1.8 constant pool analysis:

  1. Class constant pool: created at compile time; location: the class file (each class file has its own constant pool); contents: symbolic references and literals
  2. String constant pool: created at compile time; location: heap; contents: string constants and references to string objects in the heap
  3. Runtime constant pool: created when the class is loaded into memory; location: native memory (after each class is loaded, its constant pool data is gathered into the runtime constant pool, which is stored in Metaspace); contents: class file metadata, compiled code data, and reference type data. After the class is resolved, symbolic references are replaced with direct references. Resolution consults the string constant pool to make sure that the strings referenced by the runtime constant pool are the same as those in the global string pool.
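The string constant pool can be observed with reference comparisons. This is a small sketch of my own (StringPoolDemo is a hypothetical class): a string literal refers to the pooled instance, new String(...) creates a separate object on the heap, and intern() returns the pooled instance again.

```java
// Hypothetical demo class: compares a pooled literal with a freshly created string.
class StringPoolDemo {
    // Returns { literal == copy, literal == copy.intern(), literal.equals(copy) }.
    static boolean[] compare() {
        String literal = "jvm";          // refers to the instance in the string pool
        String copy = new String("jvm"); // a distinct object on the heap
        return new boolean[] {
            literal == copy,             // false: two different objects
            literal == copy.intern(),    // true: intern() returns the pooled instance
            literal.equals(copy)         // true: same characters
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "false true true"
    }
}
```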

Direct memory (off-heap memory):

Direct memory has a more precise name: off-heap memory. When the JVM runs, it requests a large block of memory from the operating system for the heap, where data is stored. There are also the virtual machine stacks, native method stacks, and program counters, which together are called the stack area. The rest of the operating system's memory is off-heap memory. It is not part of the virtual machine's runtime data areas, nor is it a memory area defined in the Java virtual machine specification. NIO uses this area heavily: a DirectByteBuffer object in the Java heap can reference and operate on it directly. This memory is not limited by the Java heap size but by the total memory of the machine. Its limit can be set with -XX:MaxDirectMemorySize (the default is the same as the maximum heap size), so an OOM error can occur here as well.
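From the program's point of view, the difference between heap and off-heap buffers comes down to which factory method of java.nio.ByteBuffer is called. The following sketch is mine (DirectBufferDemo and roundTrip are hypothetical names) and only shows that both kinds of buffer are used through the same API.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class: allocates one heap buffer and one direct (off-heap) buffer.
class DirectBufferDemo {
    // Writes an int at position 0 and reads it back.
    static int roundTrip(ByteBuffer buffer, int value) {
        buffer.putInt(0, value);
        return buffer.getInt(0);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024);        // backed by a byte[] in the Java heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // backed by native memory

        System.out.println("onHeap.isDirect():  " + onHeap.isDirect());  // false
        System.out.println("offHeap.isDirect(): " + offHeap.isDirect()); // true
        System.out.println(roundTrip(offHeap, 42));                      // 42
    }
}
```

Allocating direct buffers beyond the configured -XX:MaxDirectMemorySize limit is one way the OOM mentioned above can be triggered.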

 

The memory layout of an object:

In HotSpot, the memory layout of an object is divided into three parts: 1. Object header; 2. Instance data; 3. Alignment padding

//TODO

 

Introduction to GC

GC (Garbage Collection): garbage collection reclaims and releases the space occupied by garbage. Java GC simply means garbage collection in Java.

Which areas need garbage collection? When is garbage collected? How is it collected?

Which memory needs to be reclaimed? We have already looked at the five major areas of Java's runtime memory. The program counter, virtual machine stack, and native method stack are created with a thread and die with it. Stack frames are pushed and popped in order as methods are entered and exited. How much memory a stack frame needs depends on the specific virtual machine implementation and is essentially determined at compile time [ignoring optimizations done by the JIT compiler]. When the method or thread finishes, this memory is reclaimed, so there is no need to worry about it. The Java heap and the method area are different. The method area stores class information, but the implementation classes of one interface may need different amounts of memory, and so may the different branches of one method [only at runtime do we know which objects a method creates and how much memory they need]. Allocation and reclamation of this memory are dynamic, and this is the memory that GC is concerned with.

Which areas of the heap are collected?

  1. Young Generation: -XX:NewSize and -XX:MaxNewSize control the initial size and the maximum size of the young generation respectively
  2. Old Generation
  3. Permanent Generation [replaced by Metaspace from 1.8 onward, which is not in the heap]

Algorithms for determining whether an object is alive

1. Reference counting algorithm
In the early days, this algorithm was used to judge whether an object was alive. It is very simple: a reference counter is added to the object. The counter is increased by 1 each time the object is referenced and decreased by 1 each time a reference becomes invalid. When the counter reaches 0, the object can no longer be referenced.
Advantages: simple to implement and efficient; widely used in languages such as Python and in game scripting languages.
Disadvantages: it is hard to handle circular references. If two objects refer to each other and are no longer referenced by any other object, their counters never drop to 0 and they can never be reclaimed.
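The circular reference problem can be shown with a toy model. The JVM does not use reference counting, so the sketch below simulates the counters by hand. RefCountDemo, Node, and setField are names I made up for illustration.

```java
// Hypothetical toy model of reference counting; the JVM itself does not work this way.
class RefCountDemo {
    static final class Node {
        final String name;
        int refCount;
        Node field; // the single outgoing reference of this node

        Node(String name) {
            this.name = name;
        }

        // Re-points the field and keeps the counters of both targets up to date.
        void setField(Node target) {
            if (field != null) {
                field.refCount--;
            }
            field = target;
            if (target != null) {
                target.refCount++;
            }
        }
    }

    // Builds a <-> b, drops the program's own references, and returns the remaining counts.
    static int[] countsAfterDroppingExternalRefs() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.refCount++;  // reference held by the program
        b.refCount++;  // reference held by the program
        a.setField(b); // a -> b
        b.setField(a); // b -> a
        a.refCount--;  // the program drops its reference to a
        b.refCount--;  // the program drops its reference to b
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingExternalRefs();
        System.out.println(counts[0] + " " + counts[1]); // prints "1 1"
    }
}
```

Both counters stay at 1 even though nothing outside the pair can reach it any more, so a pure reference counting collector would never free these two objects.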

2. Reachability analysis algorithm
Current mainstream commercial languages [such as Java and C#] use reachability analysis to determine whether an object is alive. This algorithm solves the circular reference problem. The basic idea is to start from a set of objects called "GC Roots" and search downward; the path traversed is called a reference chain. When no reference chain connects an object to the GC Roots, the object is proved to be unusable. Four kinds of objects can serve as GC Roots:

  1. Objects referenced in the virtual machine stack (the local variable table in a stack frame).
  2. Objects referenced by static fields of classes in the method area, that is, objects referenced by variables declared static, which are loaded into memory when the class is loaded.
  3. Objects referenced by constants in the method area.
  4. Objects referenced by JNI (native methods) in the native method stack.
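Reachability analysis is essentially a graph traversal starting from the roots. Here is a toy version of my own, not the JVM's implementation: ReachabilityDemo, Obj, and reachableFrom are hypothetical names, and the object graph is modeled with plain lists. It shows that a cycle which is not connected to any root is never marked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of reachability analysis over a hand-built object graph.
class ReachabilityDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references

        Obj(String name) {
            this.name = name;
        }
    }

    // Marks everything that can be reached from the given roots via a reference chain.
    static Set<Obj> reachableFrom(List<Obj> gcRoots) {
        Set<Obj> marked = new HashSet<>(gcRoots);
        Deque<Obj> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            for (Obj next : current.refs) {
                if (marked.add(next)) { // true only the first time an object is seen
                    pending.push(next);
                }
            }
        }
        return marked;
    }

    // Graph: root -> a, plus an isolated cycle b <-> c.
    static String reachableNames() {
        Obj root = new Obj("root");
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        root.refs.add(a);
        b.refs.add(c);
        c.refs.add(b);

        List<String> names = new ArrayList<>();
        for (Obj o : reachableFrom(Collections.singletonList(root))) {
            names.add(o.name);
        }
        Collections.sort(names);
        return String.join(",", names);
    }

    public static void main(String[] args) {
        System.out.println(reachableNames()); // prints "a,root"
    }
}
```

Objects b and c reference each other but are not in the result, so a collector based on reachability would treat them as garbage.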

Garbage collection algorithms

  • Mark-sweep algorithm [the basic one]
  • Copying algorithm
  • Mark-compact algorithm

The JVM uses a `generational collection algorithm`, applying different collection algorithms to different areas.

The young generation uses the copying algorithm: objects in the young generation are short-lived, with a death rate of about 98%, which suits the copying algorithm [the copying algorithm works best for memory areas with a low survival rate]. It addresses both the efficiency and the memory fragmentation problems of the mark-sweep algorithm. The JVM does not split the memory 1:1 [area in use : reserved space], because with such a low survival rate, reserving half of the space for copying would be wasteful. Instead, it divides the memory into one Eden space and two Survivor spaces, From Survivor and To Survivor [the reserved space], with a default ratio of 8:1:1. Eden is used first, and when Eden is full, the surviving objects are copied to a Survivor space. There is no guarantee that 10% or less of the objects survive each collection, so if the Survivor space is not large enough, the old generation is relied on as an allocation guarantee.

The collection process in the young generation is as follows. Before GC, [To Survivor] is kept empty and objects are stored in Eden and [From Survivor]. During GC, the surviving objects in Eden are copied to [To Survivor]. For surviving objects in [From Survivor], the age of the object is considered: if the age has not reached the threshold (the tenuring threshold, 15 by default), the object is copied to [To Survivor]; if it has, the object is copied to the old generation. After the copy phase, Eden and [From Survivor] contain only dead objects and can be regarded as empty. If [To Survivor] fills up during copying, the remaining objects are copied to the old generation. Finally, [From Survivor] and [To Survivor] swap names, so in the next GC the current [To Survivor] becomes [From Survivor].
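The copy-and-promote steps described above can be played through in a toy simulation. This is a simplified sketch of my own, not HotSpot's implementation. MinorGcDemo, Obj, and the tiny capacity and threshold values are all made up, liveness is a fixed flag instead of a reachability analysis, and capacity is counted in objects rather than bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a young generation collection (Eden, From, To, old generation).
class MinorGcDemo {
    static final class Obj {
        final String name;
        final boolean alive; // stands in for "still reachable from GC Roots"
        int age;

        Obj(String name, boolean alive, int age) {
            this.name = name;
            this.alive = alive;
            this.age = age;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>(); // kept empty between collections
    final int survivorCapacity;       // simplification: counted in objects, not bytes
    final int tenuringThreshold;

    MinorGcDemo(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void minorGc() {
        for (Obj o : eden) {
            if (o.alive) copy(o, false); // Eden survivors go to To Survivor
        }
        for (Obj o : from) {
            if (o.alive) copy(o, o.age >= tenuringThreshold); // old enough: promote
        }
        eden.clear(); // only dead objects are left behind
        from.clear();
        List<Obj> emptied = from; // swap the roles of From and To
        from = to;
        to = emptied;
    }

    private void copy(Obj o, boolean reachedThreshold) {
        if (reachedThreshold || to.size() >= survivorCapacity) {
            old.add(o); // promoted: reached the threshold, or To Survivor is full
        } else {
            o.age++;
            to.add(o);
        }
    }

    static String runDemo() {
        MinorGcDemo heap = new MinorGcDemo(3, 2); // made-up capacity and threshold
        heap.eden.add(new Obj("e1", true, 0));
        heap.eden.add(new Obj("e2", false, 0));
        heap.eden.add(new Obj("e3", true, 0));
        heap.eden.add(new Obj("e4", true, 0));
        heap.from.add(new Obj("f1", true, 2)); // has reached the threshold
        heap.from.add(new Obj("f2", true, 1)); // young enough, but To will be full
        heap.minorGc();
        return "from=" + heap.from + " old=" + heap.old;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints "from=[e1, e3, e4] old=[f1, f2]"
    }
}
```

In the demo, f1 is promoted because of its age and f2 because the survivor space is already full, which are the two promotion paths the text describes.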

The old generation uses [mark-sweep] and [mark-compact]. Because objects in the old generation have a high survival rate and there is no extra space to act as an allocation guarantee, one of these two algorithms must be used.

Garbage collectors:

If garbage collection algorithms are the methodology of memory reclamation, then garbage collectors are the concrete implementations. The JVM uses different collectors depending on the scenario and the user's configuration.

  • Young generation collectors: Serial, ParNew, Parallel Scavenge
  • Old generation collectors: Serial Old, Parallel Old, CMS
  • Special collector: G1 (belongs to neither the young nor the old generation category)
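Which collectors a particular JVM is actually running with can be queried through java.lang.management. The sketch below is my own (CollectorNamesDemo is a hypothetical name). The names it prints are implementation-specific and depend on the JVM version and on flags such as -XX:+UseG1GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists the garbage collectors of the running JVM.
class CollectorNamesDemo {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 when undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```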

Young generation collectors:

Serial: the most basic collector with the longest history. Before JDK 1.3 it was the only choice. Serial is a single-threaded collector: it uses only one thread for collection, and all other threads must stop while it works and can only continue once collection is complete.

ParNew: a multi-threaded version of Serial (multiple GC threads). Like Serial, it stops the world while it works. By default it uses as many threads as there are CPUs. It is the preferred young generation collector in Server mode because, apart from Serial, it is the only young generation collector that can work with CMS, HotSpot's first truly concurrent collector.

Parallel Scavenge: uses the copying algorithm, supports multi-threading, and focuses on throughput (throughput = code running time / (code running time + garbage collection time)). For example, if the code runs for 99 minutes and garbage collection takes 1 minute, the throughput is 99%. Because its focus is throughput, it suits background computation with little user interaction rather than workloads that need the shortest pauses.

Old generation collectors:

Serial Old: the old generation version of Serial. It is single-threaded, uses the mark-compact algorithm, and also requires STW (Stop-The-World).

Parallel Old: the old generation version of Parallel Scavenge, introduced in JDK 1.6. It supports multi-threading and uses the mark-compact algorithm, as most old generation collectors do.

CMS (Concurrent Mark Sweep): a collector that aims for the shortest collection pause time (it emphasizes responsiveness, and Sun called it the concurrent low-pause collector). It uses the mark-sweep algorithm and runs concurrently with user threads. The collection process is as follows:

  1. Initial mark: marks the objects directly reachable from GC Roots. Single-threaded, fast, STW.
  2. Concurrent mark: the GC Roots tracing process, that is, reachability analysis.
  3. Remark: corrects the marks of objects that changed because the user program kept running during concurrent marking. It causes a short pause (STW). In terms of time, generally initial mark < remark < concurrent mark.
  4. Concurrent sweep

Major problems with CMS:

  • Severe memory fragmentation. Once the old generation becomes heavily fragmented and objects coming from the young generation cannot find space, CMS falls back to Serial Old to mark and compact.
  • Floating garbage cannot be handled. Floating garbage is new garbage produced while CMS is collecting. CMS cannot process it in the current cycle and has to leave it for the next collection.

G1 collector (Garbage First: collect as much garbage as possible and avoid Full GC): one of the most advanced collectors, available since 1.7. It focuses on low latency, is intended to replace CMS, and solves problems such as the space fragmentation caused by CMS. Oracle describes it as a low-pause, server-style generational garbage collector for the Java HotSpot VM. G1 uses concurrent and parallel phases to meet its target pause time while maintaining good throughput. When G1 decides that a garbage collection is necessary, it first collects the regions with the least live data (garbage first). What makes G1 special is that it strengthens the notion of regions and weakens the notion of generations. It is a regionalized, incremental collector that belongs to neither the young generation nor the old generation collectors. The algorithms it uses are mark-compact and copying.

In JDK 1.7 and 1.8, G1 is not enabled by default; the option to enable it is -XX:+UseG1GC.

G1 is regionalized: it divides the Java heap into a number of regions of equal size. The JVM can set the size of each region (1 to 32M, always a power of 2), and by default it picks a reasonable region size based on the current heap size. G1 finds live objects in the old generation through a concurrent marking phase and compacts them through parallel copying [which frees up contiguous space for large objects]. It copies the live objects from one or more sets of regions to different regions incrementally and in parallel, which compacts them and reduces heap fragmentation. The goal is to reclaim as much heap space as possible [garbage first] while trying not to exceed the pause time target, in order to achieve low latency. G1 provides three garbage collection modes: young GC, mixed GC, and full GC. Unlike other collectors, it collects objects in both the young and the old generation on the basis of regions rather than generations.

 

Minor GC: garbage collection in the young generation (the Eden and Survivor areas). Only the young generation is cleaned up.

Major GC: cleans up the old generation (old GC). It is often treated as equivalent to Full GC, because collecting the old generation is usually accompanied by promotions from the young generation and a collection of the entire Java heap.

Full GC: collects the young generation, the old generation, and the permanent generation [JDK 1.7] or Metaspace [JDK 1.8] together.

Mixed GC: specific to G1. It collects the entire young generation and part of the old generation. Only G1 has this mode.


Origin blog.csdn.net/csdnbeyoung/article/details/113144174