[JAVA Knowledge Daily Question]: What is the difference between JDK and JRE?

Foreword

JVM stands for Java Virtual Machine. It is a specification for an abstract computing machine: a virtual computer implemented in software that simulates the functions of a computer on top of real hardware. The Java virtual machine defines a bytecode instruction set, a set of registers, a stack, a garbage-collected heap, and a method area.
The JVM hides the details of the underlying operating system and hardware, so a Java program only needs to be compiled into bytecode that targets the Java virtual machine, and it can then run on many platforms without modification. When the JVM executes bytecode, it translates the bytecode into machine instructions for the specific platform.

What is the relationship between JRE/JDK/JVM

**JRE (Java Runtime Environment)** is the runtime platform for Java. Every Java program runs on a JRE. Ordinary users who only want to run existing Java programs only need to install the JRE.

**JDK (Java Development Kit)** is the development kit that programmers use to compile and debug Java programs. The JDK's tools are themselves Java programs, so they also need a JRE to run. To keep the JDK self-contained, a JRE is installed as part of the JDK. In JDK 8 and earlier, this is why the JDK installation directory contains a directory named jre that holds the JRE files.

**JVM (Java Virtual Machine)** is part of the JRE. It is a virtual computer implemented by simulating computer functions on a real machine. The JVM has its own complete abstract hardware architecture, such as a processor, stack, and registers, along with a corresponding instruction set. The most important feature of the Java language is that it runs across platforms, and the JVM is what makes this operating-system independence possible.
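One way to see the JDK/JRE difference from inside a running program is to ask whether a compiler is available. The sketch below is only an illustration: the class name `RuntimeInfo` is made up, and it assumes the standard `javax.tools.ToolProvider` API, whose `getSystemJavaCompiler()` returns null when the runtime ships no compiler.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class RuntimeInfo {
    /** True when the running installation provides a Java compiler, as a JDK does. */
    static boolean hasCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    /** Summarizes where the runtime lives, which JVM it is, and whether it can compile. */
    static String describe() {
        return "java.home=" + System.getProperty("java.home")
                + ", vm=" + System.getProperty("java.vm.name")
                + ", compiler available=" + hasCompiler();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

On a plain JRE the last field would be expected to read false, while on a JDK it should read true.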

JVM principle

The JVM is the core and foundation of Java: a virtual processor that sits between the Java compiler and the OS platform. It is an abstract computer implemented in software on top of the underlying operating system and hardware, and it executes Java bytecode programs.

The Java compiler only needs to target the JVM and generate bytecode files that the JVM understands. Java source files are compiled into bytecode, and the JVM on each platform translates those bytecode instructions into machine code for that platform and runs them.
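To make "bytecode" concrete, here is a tiny method together with the instructions that javac normally produces for it. The class name `Adder` is just for illustration; the instruction listing in the comment is what `javap -c Adder` typically shows for a static method of this shape.

```java
final class Adder {
    // The body of this method typically compiles to four bytecode instructions:
    //   iload_0   push the first int argument
    //   iload_1   push the second int argument
    //   iadd      pop both values and push their sum
    //   ireturn   return the int on top of the operand stack
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

The same Adder.class file can be run by any JVM; only the JVM itself is platform specific.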


JVM architecture

Class Loader (ClassLoader) (used to load .class files)

Execution engine (execute bytecode, or execute native methods)

Runtime data area (method area, heap, java stack, PC register, native method stack)
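The class loader part can be observed directly. The following sketch (the class name `LoaderDemo` is made up) prints which loader loaded a core class and which loaded an application class; `getClassLoader()` returns null for classes loaded by the bootstrap loader.

```java
final class LoaderDemo {
    /** Name of the loader that loaded the class, or "bootstrap" when getClassLoader() is null. */
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String     -> " + loaderOf(String.class));
        System.out.println("LoaderDemo -> " + loaderOf(LoaderDemo.class));
    }
}
```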


JVM runtime data area

Block 1: PC Register

The PC register stores the address of the JVM instruction that the thread is currently executing, and each thread has its own PC register. If the current method is native, the value of the PC register is undefined.

Block 2: JVM stack

The JVM stack is private to each thread and is created together with the thread. It holds stack frames, which store local variables of primitive types (the eight primitive types defined in Java: boolean, char, byte, short, int, long, float, double) and partial results. For objects of non-primitive types, the JVM stack stores only a reference that points to the object on the heap.
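Since every method call pushes a new stack frame, unbounded recursion eventually exhausts the thread's JVM stack. The sketch below (class and method names are illustrative) counts how many frames fit before a StackOverflowError is thrown; the number depends on the JVM, its settings, and the frame size, so no particular value should be expected.

```java
final class StackDepthDemo {
    private static int depth;

    // Each call adds one frame to the current thread's JVM stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the stack is exhausted and returns the depth that was reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The frames are popped while the error propagates, so the thread can continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```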

Block 3: Heap

The heap is the area the JVM uses to store object instances and arrays. Memory for objects created with new in Java can be considered to be allocated here, and memory occupied by objects in the heap is reclaimed by the GC.

(1) The heap is shared by all threads in the JVM, so allocating object memory on it requires synchronization, which makes creating new objects relatively expensive.

(2) To make object allocation more efficient, the Sun HotSpot JVM gives each thread its own allocation space, the TLAB (Thread Local Allocation Buffer), whose size the JVM calculates based on runtime conditions. Allocating objects in a TLAB needs no lock, so the JVM tries to allocate in the TLAB first when a thread creates objects. In this case, object allocation in the JVM is basically as efficient as in C, but objects that are too large are still allocated directly in the shared heap space.

(3) TLABs only apply to the Eden Space of the new generation, so in Java programs it is usually more efficient to allocate many small objects than a few large ones.

(4) Newly created objects are normally stored in the Young Generation, and they are created in Eden Space. Objects in the Young Generation that survive one or more GCs are moved to the Old Generation.
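The heap layout of a running JVM can be inspected through the standard `java.lang.management` API. This is a minimal sketch with a made-up class name; the pool names it prints depend on the collector in use, so names such as Eden Space or Old Gen are typical rather than guaranteed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {
    /** Names of the heap memory pools reported by the running JVM. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());
        System.out.println("heap pools: " + heapPoolNames());
    }
}
```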

Block 4: Method Area

(1) In Sun JDK, this area corresponds to the Permanent Generation (PermGen).

(2) The method area stores information about loaded classes (names, modifiers, and so on), static variables, constants declared final, field information, and method information. When developers call methods such as getName and isInterface on a Class object, the data comes from the method area. The method area is shared globally, and under certain conditions it is also garbage collected. When the method area needs more memory than its allowed size, an OutOfMemory error is thrown.
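The class metadata described above is what reflection exposes. A small sketch, with an illustrative class name, that reads a few of these properties through the Class object:

```java
import java.lang.reflect.Modifier;

final class ClassInfoDemo {
    /** Builds a one-line summary from metadata the JVM keeps for a loaded class. */
    static String describe(Class<?> type) {
        return type.getName()
                + " interface=" + type.isInterface()
                + " final=" + Modifier.isFinal(type.getModifiers())
                + " declaredMethods=" + type.getDeclaredMethods().length;
    }

    public static void main(String[] args) {
        System.out.println(describe(Runnable.class));
        System.out.println(describe(String.class));
    }
}
```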

Block 5: Runtime Constant Pool

This area stores the constant information of a class, along with its method and field reference information. Its space is allocated from the method area.

Block 6: Native Method Stacks

The JVM uses the native method stack to support the execution of native methods. This area is used to store the state of each native method invocation.


Algorithms for determining whether an object is "dead"

The program counter, Java virtual machine stack, and native method stack are all private to a thread, so the memory they occupy is created with the thread and reclaimed when the thread ends. The Java heap and method area are different: they are shared between threads, and they are the areas that GC is concerned with.

Almost all objects live in the heap. Before collecting, the GC needs to determine which objects are still alive and cannot be reclaimed, and which objects are dead and can be reclaimed.

There are two algorithms to determine whether an object is alive:

1.) Reference counting algorithm: Add a reference counter to each object. Whenever the object is referenced somewhere, the counter is incremented by 1; when a reference becomes invalid, the counter is decremented by 1; when the counter reaches 0, the object is dead and can be reclaimed. However, this approach has difficulty handling circular references between objects.

2.) Reachability analysis algorithm: Starting from a set of objects called "GC Roots", the search follows references downward, and the path it traverses is called a reference chain. When no reference chain connects an object to any GC Root (that is, the object is unreachable from the GC Roots), the object is dead and can be reclaimed. Objects that can serve as GC Roots in Java include: objects referenced from the virtual machine stack, objects referenced by native methods in the native method stack, objects referenced by static fields in the method area, and objects referenced by constants in the method area.

Mainstream implementations of mainstream commercial programming languages (Java among them) use the reachability analysis algorithm to determine whether an object is alive.
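Reachability analysis is essentially a graph traversal. The sketch below models it on a toy object graph; `Node` and `reachableFrom` are hypothetical names, not JVM internals. It shows why two objects that only reference each other are still considered dead, which is the case reference counting struggles with.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class Reachability {
    /** A toy heap object that only knows which other objects it references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Returns every node reachable from the given roots by following references. */
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {        // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        root.refs.add(a);  // root -> a, so a is alive
        b.refs.add(c);     // b and c reference each other,
        c.refs.add(b);     // but no chain from the root reaches them
        Set<Node> live = reachableFrom(Collections.singletonList(root));
        System.out.println("a alive: " + live.contains(a)); // true
        System.out.println("b alive: " + live.contains(b)); // false
    }
}
```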


JVM garbage collection

The basic principle of GC (Garbage Collection) is to reclaim objects in memory that are no longer used. The component that performs the reclamation is called a collector. Because GC consumes resources and time, Java analyzes the life cycle characteristics of objects and collects them separately in the new generation and the old generation, to keep the pauses that GC imposes on the application as short as possible.

(1) The collection of objects in the new generation is called minor GC;

(2) The collection of objects in the old generation is called Full GC;

(3) A GC triggered by explicitly calling System.gc() in the program is normally a Full GC (the call is a request, and the JVM may ignore it).

Different object reference types are collected by GC in different ways. JVM object references are divided into four types:

(1) Strong reference: the default kind of reference. An object is reclaimed by GC only when no strong reference to it remains.

(2) Soft reference: provided by Java for caching scenarios. A softly referenced object is reclaimed only when memory is running low.

(3) Weak reference: an object that is only weakly referenced is reclaimed at the next GC.

(4) Phantom reference (sometimes translated as "virtual reference"): used only to learn when an object has been reclaimed by GC.
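The four reference types map to classes in `java.lang.ref`. The sketch below (class and method names are illustrative) sticks to behavior that does not depend on when the GC runs: while a strong reference is held, soft and weak references still return the object, and a phantom reference never returns it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceKinds {
    /** Returns { soft still resolves, weak still resolves, phantom.get() is null }. */
    static boolean[] observe() {
        Object strong = new Object();                        // ordinary strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        boolean softResolves = soft.get() == strong;        // object is strongly reachable
        boolean weakResolves = weak.get() == strong;        // so both still return it
        boolean phantomIsNull = phantom.get() == null;      // phantom references never return it
        return new boolean[] { softResolves, weakResolves, phantomIsNull };
    }

    public static void main(String[] args) {
        boolean[] seen = observe();
        System.out.println("soft resolves: " + seen[0]);
        System.out.println("weak resolves: " + seen[1]);
        System.out.println("phantom.get() is null: " + seen[2]);
    }
}
```

Once the strong reference is dropped, when the weak or soft referent is actually cleared depends on the collector, which is why the sketch does not try to demonstrate it.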


Garbage Collection Algorithms

Mark-Sweep Algorithm

This is the most basic algorithm. It has two stages, marking and sweeping: first mark all objects that need to be reclaimed, and once marking is complete, reclaim all marked objects together.

It has two shortcomings. One is efficiency: neither the marking nor the sweeping process is very efficient. The other is space: sweeping leaves behind a large number of non-contiguous memory fragments (similar to disk fragmentation), and with too much fragmentation, a later allocation of a large object may fail to find enough contiguous memory and trigger another garbage collection early.
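The fragmentation problem is easy to reproduce on a toy heap. In this sketch the heap is an int array in which each slot holds an object id and 0 means free; this representation and the names `sweep` and `largestFreeRun` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MarkSweepDemo {
    /** Frees every slot whose object id was not marked as live (0 means already free). */
    static void sweep(int[] heap, Set<Integer> marked) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && !marked.contains(heap[i])) {
                heap[i] = 0;
            }
        }
    }

    /** Length of the longest run of free slots, i.e. the largest allocation that still fits. */
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 7, 8};
        Set<Integer> marked = new HashSet<>(Arrays.asList(1, 3, 5, 7));
        sweep(heap, marked);
        System.out.println(Arrays.toString(heap));  // [1, 0, 3, 0, 5, 0, 7, 0]
        System.out.println(largestFreeRun(heap));   // 1
    }
}
```

Four slots are free after the sweep, yet an object that needs two adjacent slots cannot be placed.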


Copying Algorithm

To solve the efficiency problem, the "copying" algorithm was introduced. It divides the available memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block, and the used block is then cleaned up in one pass. This avoids memory fragmentation, but at the cost of halving the available memory.
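A minimal sketch of the copying step, using the same toy model of a heap as an int array of object ids with 0 meaning free. The names are illustrative, and the to-space is assumed to be empty and at least as large as the from-space.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CopyingDemo {
    /**
     * Copies live objects from fromSpace to the start of toSpace, then clears fromSpace.
     * Returns the number of objects copied.
     */
    static int copyLive(int[] fromSpace, int[] toSpace, Set<Integer> live) {
        int next = 0;
        for (int i = 0; i < fromSpace.length; i++) {
            if (fromSpace[i] != 0 && live.contains(fromSpace[i])) {
                toSpace[next++] = fromSpace[i];  // survivors end up packed together
            }
            fromSpace[i] = 0;                    // the whole from-space is released
        }
        return next;
    }

    public static void main(String[] args) {
        int[] from = {1, 2, 3, 4};
        int[] to = new int[4];
        int copied = copyLive(from, to, new HashSet<>(Arrays.asList(2, 4)));
        System.out.println(copied);               // 2
        System.out.println(Arrays.toString(to));  // [2, 4, 0, 0]
    }
}
```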


Mark-Compact Algorithm

The copying algorithm has to perform many copy operations when the object survival rate is high, which reduces its efficiency. This led to the mark-compact algorithm. Its marking process is the same as in the mark-sweep algorithm, but in the next step, instead of reclaiming dead objects in place, it moves all surviving objects toward one end and then directly frees the memory beyond the end boundary.
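The compaction step can be sketched on the same kind of toy heap (an int array of object ids, 0 meaning free). The names are illustrative; a real collector also has to update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CompactDemo {
    /**
     * Slides marked objects toward index 0 and frees everything past the last survivor.
     * Returns the boundary index where free memory starts.
     */
    static int compact(int[] heap, Set<Integer> marked) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && marked.contains(heap[i])) {
                heap[boundary++] = heap[i];  // boundary never exceeds i, so nothing is overwritten
            }
        }
        Arrays.fill(heap, boundary, heap.length, 0);
        return boundary;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 7, 8};
        int boundary = compact(heap, new HashSet<>(Arrays.asList(1, 3, 5, 7)));
        System.out.println(Arrays.toString(heap));  // [1, 3, 5, 7, 0, 0, 0, 0]
        System.out.println(boundary);               // 4
    }
}
```

All four free slots are now contiguous, so a large object can be placed without another collection.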


Generational Collection Algorithm

Current commercial virtual machines use the generational collection algorithm for GC. This algorithm introduces no new ideas; it simply divides the heap into the new generation and the old generation according to object lifetimes. The method area is referred to as the permanent generation (in newer versions, JDK 8 and later, the permanent generation has been removed and replaced by Metaspace; the permanent generation used JVM-managed memory, while Metaspace uses native memory directly).

In this way, different collection algorithms can be used according to the characteristics of each generation.

Objects in the new generation are short-lived: in every GC, a large number of them die and only a few survive, so the copying algorithm is used. The new generation is divided into an Eden area and two Survivor areas (Survivor from and Survivor to), with a default size ratio of 8:1:1.
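The 8:1:1 split is simple arithmetic. The following sketch (the class, the method, and the parameter name `survivorRatio` are illustrative) divides a new generation of a given size into Eden and two Survivor areas.

```java
final class YoungGenLayout {
    /** Splits the new generation into { eden, survivorFrom, survivorTo } using ratio:1:1. */
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);
        long eden = youngBytes - 2 * survivor;
        return new long[] { eden, survivor, survivor };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] parts = split(100 * mb, 8);
        System.out.println("eden MB: " + parts[0] / mb);      // 80
        System.out.println("survivor MB: " + parts[1] / mb);  // 10
    }
}
```

With this layout only one Survivor area sits idle at any time, so 90% of the new generation is usable, compared with 50% for a plain two-block copying scheme.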

Objects in the old generation have a high survival rate and there is no extra space to act as an allocation guarantee for them, so the old generation uses the mark-sweep or mark-compact algorithm.

Newly created objects are allocated in the Eden area first. When Eden is full, a Minor GC (new generation GC) is performed: the surviving objects in Eden and Survivor from are copied into Survivor to, and Eden and Survivor from are then emptied. After that, the two Survivor areas swap roles: the original Survivor from becomes the new Survivor to, and the original Survivor to becomes the new Survivor from. If Survivor to cannot hold all the surviving objects during copying, the excess objects are copied into the old generation under the old generation's allocation guarantee (similar to a loan guarantee at a bank), and if the old generation cannot hold them either, a Full GC (old generation GC) is performed.


Large objects enter the old generation directly: the JVM has a parameter, -XX:PretenureSizeThreshold, and objects larger than this value are allocated directly in the old generation. The purpose is to avoid large amounts of memory copying between the Eden and Survivor areas.

Long-lived objects enter the old generation: the JVM keeps an age counter for each object. If an object is born in Eden, is still alive after its first Minor GC, and fits in a Survivor area, it is moved into the Survivor area and its age is set to 1. Each further Minor GC it survives increases its age by 1. When its age reaches a threshold (15 by default, configurable with -XX:MaxTenuringThreshold), it is promoted to the old generation. However, the JVM does not always require an object to reach the maximum age before promotion. If the total size of all objects of the same age (say, age x) in the Survivor space is greater than half of the Survivor space, all objects of age x or older enter the old generation directly, without waiting for the maximum age.
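The promotion rule can be written down as a small function. The sketch below follows the rule exactly as worded above; the class name, the `bytesByAge` table, and the method are hypothetical, and the real HotSpot bookkeeping may differ in detail.

```java
final class TenuringDemo {
    static final int MAX_TENURING_THRESHOLD = 15;  // default age limit mentioned in the text

    /**
     * bytesByAge[x] is the total size of surviving objects of age x (index 0 is unused).
     * Returns the age from which objects are promoted to the old generation.
     */
    static int promotionAge(long[] bytesByAge, long survivorBytes) {
        for (int age = 1; age < bytesByAge.length && age < MAX_TENURING_THRESHOLD; age++) {
            if (bytesByAge[age] > survivorBytes / 2) {
                return age;  // this age group alone fills more than half of the Survivor space
            }
        }
        return MAX_TENURING_THRESHOLD;
    }

    public static void main(String[] args) {
        long survivor = 100;
        System.out.println(promotionAge(new long[] {0, 10, 60, 5}, survivor));  // 2
        System.out.println(promotionAge(new long[] {0, 10, 20, 5}, survivor));  // 15
    }
}
```

In the first call the age-2 objects occupy 60 of 100 units, so everything of age 2 or older is promoted early; in the second call no age group crosses the halfway mark and the default limit applies.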


Garbage Collectors

If garbage collection algorithms are the methodology, garbage collectors are the concrete implementation. The JVM specification does not prescribe how a garbage collector should be implemented, so the collectors provided by different vendors and different versions of virtual machines can differ considerably. Here we only look at the HotSpot virtual machine.
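To find out which collectors a particular JVM is actually running with, the standard `java.lang.management` API can be queried. This is a minimal sketch with a made-up class name; the names reported depend on the JVM and its settings (on HotSpot they look like "PS Scavenge" or "G1 Young Generation", for example).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorNames {
    /** One line per active collector, with the number of collections it has run so far. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String line : collectorNames()) {
            System.out.println(line);
        }
    }
}
```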

After JDK7/8, all collectors and combinations (connections) of the HotSpot virtual machine are as follows:


Serial collector

The Serial collector is the most basic and oldest collector, and was once the only choice for new generation collection. It is single-threaded: it uses only one CPU or one collection thread to do garbage collection, and while it is collecting it must suspend all other worker threads until it finishes, that is, "Stop the World". Stopping all user threads is unacceptable for many applications. Imagine being in the middle of something and being forced to stop by someone else; you would not be pleased.

Nonetheless, it is still the default new generation collector for VMs running in Client mode, because it is simple and efficient (compared with a single thread of other collectors, since it has no thread-interaction overhead).

Working diagram:


ParNew collector

The ParNew collector is a multi-threaded version of the Serial collector. Except for the use of multiple threads, other behaviors (collection algorithm, stop the world, object allocation rules, recycling strategies, etc.) are the same as the Serial collector.

It is the preferred new generation collector for many JVMs running in Server mode. One of the important reasons is that, except for Serial, only it can work with the old generation CMS collector.

Working diagram:


Parallel Scavenge Collector

A new generation collector that is parallel and multi-threaded. Its goal is to achieve a controllable throughput, which is the ratio of the time the CPU spends running user code to the total CPU time consumed: throughput = time running user code / (time running user code + garbage collection time). High throughput uses CPU time efficiently and completes the program's computing tasks as soon as possible, which suits background tasks that do not need much interaction.
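As a worked example of the throughput ratio: if a program runs for 100 minutes in total and 1 minute of that is garbage collection, the throughput is 99%. The class and method names below are only illustrative.

```java
final class Throughput {
    /** throughput = user code time / (user code time + garbage collection time) */
    static double throughput(double userCodeTime, double gcTime) {
        return userCodeTime / (userCodeTime + gcTime);
    }

    public static void main(String[] args) {
        // 99 minutes of user code plus 1 minute of GC
        System.out.println(throughput(99, 1));  // 0.99
    }
}
```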

Serial Old collector

The old generation version of the Serial collector: single-threaded, using the "mark-compact" algorithm, and mainly intended for virtual machines in Client mode.

In addition, it has two uses in Server mode:

It is paired with the Parallel Scavenge collector in JDK 1.5 and earlier.

It serves as the fallback for CMS and is used when a Concurrent Mode Failure occurs in CMS.

Working diagram:


Parallel Old collector

The old generation version of Parallel Scavenge: multi-threaded, using the "mark-compact" algorithm, and available only since JDK 1.6. Before that, Parallel Scavenge could only be paired with Serial Old, and the poor performance of Serial Old kept Parallel Scavenge from showing its advantages.

With the arrival of the Parallel Old collector, the "throughput first" collector finally got a fitting partner. The Parallel Scavenge/Parallel Old combination is a good choice when throughput matters and CPU resources are limited. The working diagram of the combination is as follows:


CMS collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause time. Short pauses mean a better user experience.

It is based on the "mark-sweep" algorithm and offers concurrent collection and low pauses. Its operation is relatively complex and is divided into 4 steps:

1) Initial marking: Marks only the objects that GC Roots can reach directly. This is fast, but requires "Stop The World".

2) Concurrent marking: Traces the reference chains from those objects. This can run concurrently with user threads.

3) Re-marking: Corrects the marking records of objects whose marks changed because user threads kept running during concurrent marking. This takes longer than initial marking but much less time than concurrent marking, and requires "Stop The World".

4) Concurrent sweeping: Reclaims the objects marked as garbage. This can run concurrently with user threads.

Because concurrent marking and concurrent sweeping, the two longest phases of the whole process, run alongside user threads, the memory reclamation of the CMS collector can on the whole be regarded as executing concurrently with user threads.

Working diagram:

The CMS collector has 3 disadvantages:

1) Very sensitive to CPU resources

Although concurrent collection does not suspend user threads, it occupies part of the CPU resources, which slows the application down and reduces overall throughput.

The default number of CMS collection threads is (number of CPUs + 3) / 4. With 4 or more CPUs, the collection threads take at least 25% of the CPU resources, which can noticeably affect user programs; with fewer than 4 CPUs, the impact is even larger and may be unacceptable.

2) Floating garbage cannot be processed (garbage newly produced by user threads while concurrent sweeping is in progress is called floating garbage), and a "Concurrent Mode Failure" may occur.

Some memory must be reserved for the program to use during concurrent sweeping, so CMS cannot wait until the old generation is almost full before collecting, as other collectors do. If the memory reserved by CMS cannot meet the program's needs, a "Concurrent Mode Failure" occurs, and the JVM falls back to its backup plan: it temporarily enables the Serial Old collector, which results in another Full GC.

3) A lot of memory fragmentation: CMS is based on the "mark-sweep" algorithm and does not compact after sweeping, so it produces a large number of non-contiguous memory fragments. This can make it impossible to find enough contiguous memory when allocating a large object, so another Full GC has to be triggered early.
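The CPU sensitivity in point 1) follows directly from the thread-count formula. The sketch below applies the (number of CPUs + 3) / 4 formula from the text with integer division; the class and method names are illustrative.

```java
final class CmsThreads {
    /** Default number of CMS collection threads for the given CPU count, per the formula above. */
    static int defaultCollectorThreads(int cpus) {
        return (cpus + 3) / 4;  // integer division
    }

    /** Fraction of all CPUs that those collection threads can keep busy. */
    static double cpuShare(int cpus) {
        return (double) defaultCollectorThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8, 16}) {
            System.out.println(cpus + " CPUs -> " + defaultCollectorThreads(cpus)
                    + " thread(s), share " + cpuShare(cpus));
        }
    }
}
```

For 2 CPUs this gives 1 thread and a 50% share, while for 4, 8, and 16 CPUs it gives 1, 2, and 4 threads, each a 25% share.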


G1 collector

G1 (Garbage-First) is a collector that was officially released for commercial use in JDK 7u4. G1 is a garbage collector for server-side applications, and its mission is to replace the CMS collector in the future.

G1 Collector Features:

**Parallelism and concurrency:** G1 makes full use of multi-CPU, multi-core hardware to shorten pause times, and it can run concurrently with user threads.

**Generational collection:** G1 can manage the entire heap on its own, without cooperating with other collectors, and it handles newly created objects and objects that have survived for some time in different ways.

**Space integration:** G1 is based on the mark-compact algorithm as a whole and on the copying algorithm locally (between two Regions). It produces no memory fragmentation, so a GC is not triggered early because a large object cannot find enough contiguous space. This is an improvement over the CMS collector.

**Predictable pauses:** Besides pursuing low pauses, G1 can build a predictable pause time model that lets users explicitly specify that, within a time slice of M milliseconds, no more than N milliseconds are spent on garbage collection. This is also an improvement over the CMS collector.

Why are pauses predictable?

Because G1 can plan its work to avoid collecting the entire Java heap at once.

The G1 collector divides memory into independent Regions of equal size. The concepts of the new generation and the old generation are retained, but they are no longer physically separated.

G1 tracks the collection value of each Region and maintains a priority list in the background.

Each time, within the allowed collection time, it reclaims the Regions with the highest value first (the origin of the name Garbage-First).

This ensures the highest possible collection efficiency within a limited time.
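The idea of reclaiming the most valuable Regions first within a time budget can be sketched as a greedy selection. Everything here is an assumption made for illustration: the `Region` class, the definition of value as reclaimable bytes per millisecond of estimated cost, and the greedy strategy itself. The real G1 heuristics are more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GarbageFirstDemo {
    /** A toy Region with an estimate of what collecting it would free and cost. */
    static final class Region {
        final String name;
        final long reclaimableBytes;
        final double costMillis;

        Region(String name, long reclaimableBytes, double costMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.costMillis = costMillis;
        }

        /** Hypothetical "collection value": bytes reclaimed per millisecond spent. */
        double value() {
            return reclaimableBytes / costMillis;
        }
    }

    /** Picks the most valuable regions whose total estimated cost fits in the pause budget. */
    static List<Region> chooseCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((x, y) -> Double.compare(y.value(), x.value()));  // highest value first
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region region : sorted) {
            if (spent + region.costMillis <= pauseBudgetMillis) {
                chosen.add(region);
                spent += region.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("A", 80, 4), new Region("B", 30, 3),
                new Region("C", 60, 2), new Region("D", 10, 5));
        for (Region region : chooseCollectionSet(regions, 6.0)) {
            System.out.println(region.name);  // prints C, then A
        }
    }
}
```

With a 6 ms budget, C (value 30) and A (value 20) are chosen and use the whole budget, so B and D are left for a later collection.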

What if an object is referenced by an object in another Region?

When judging whether an object is alive, does the entire Java heap have to be scanned to ensure accuracy? Other generational collectors have the same problem (it is just more prominent in G1): does the old generation have to be scanned when the new generation is collected? In both G1 and other generational collectors, the JVM uses a Remembered Set to avoid scanning the whole heap. Each Region has a corresponding Remembered Set. Every time a reference-type field is written, a Write Barrier briefly intercepts the write and checks whether the object the new reference points to is in a different Region from the field being written (in other collectors: whether an old generation object refers to a new generation object). If it is, the reference information is recorded, through the CardTable, in the Remembered Set of the Region that contains the referenced object.
When garbage collection is performed, adding the Remembered Sets to the enumeration range of the GC roots ensures that no whole-heap scan is needed and that nothing is missed.
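The bookkeeping described above can be modeled in a few lines. This is a simplified sketch under stated assumptions: addresses are plain ints, Regions are fixed blocks of 100 addresses, `onReferenceWrite` stands in for the write barrier, and the card table is left out entirely. None of these names are JVM internals.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RememberedSetDemo {
    static final int REGION_SIZE = 100;  // hypothetical: addresses 0..99 are Region 0, 100..199 Region 1, ...

    // Region index -> addresses of fields in OTHER Regions that point into it
    private final Map<Integer, Set<Integer>> rememberedSets = new HashMap<>();

    static int regionOf(int address) {
        return address / REGION_SIZE;
    }

    /** Stand-in for the write barrier: called on every reference store. */
    void onReferenceWrite(int fieldAddress, int targetAddress) {
        int from = regionOf(fieldAddress);
        int to = regionOf(targetAddress);
        if (from != to) {  // only cross-Region references are recorded
            rememberedSets.computeIfAbsent(to, k -> new HashSet<>()).add(fieldAddress);
        }
    }

    /** Extra roots to scan when collecting the given Region. */
    Set<Integer> rememberedSetOf(int region) {
        return rememberedSets.getOrDefault(region, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetDemo heap = new RememberedSetDemo();
        heap.onReferenceWrite(250, 120);  // field in Region 2 points into Region 1
        heap.onReferenceWrite(130, 140);  // same Region, nothing recorded
        System.out.println(heap.rememberedSetOf(1));  // [250]
    }
}
```

When Region 1 is collected, only the field at address 250 has to be treated as an additional root; the rest of the heap does not need to be scanned.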


Not counting the work of maintaining the Remembered Sets, the collection process can be divided into 4 steps (similar to CMS):

1) Initial mark: Marks only the objects that GC Roots can reach directly, and modifies the value of TAMS (Next Top at Mark Start) so that new objects are created in the correct available Regions while the user program runs concurrently in the next stage. Requires "Stop The World".

2) Concurrent marking: Performs reachability analysis starting from the GC Roots to find surviving objects. This takes a long time but can run concurrently with user threads.

3) Final mark: Corrects the marking records of objects whose marks changed because user threads kept running during concurrent marking. During concurrent marking, the virtual machine records object changes in each thread's Remembered Set Logs, and the final marking stage merges the Remembered Set Logs into the Remembered Sets. This takes longer than initial marking but much less time than concurrent marking, and requires "Stop The World".

4) Screening and reclaiming: First sorts the Regions by collection value and cost, then draws up a collection plan based on the GC pause time the user expects, and finally reclaims the garbage in the high-value Regions according to the plan. Reclaiming uses the copying algorithm: live objects are copied from one or more Regions into an empty Region on the heap, which compacts and frees memory in the process. This step can be carried out in parallel by multiple GC threads, which shortens the pause and improves throughput.

Working diagram:


Basic structure

From the logical structure of the Java platform, we can understand the JVM from the following figure:


From the above figure, you can clearly see the various logical modules contained in the Java platform, and you can also understand the difference between JDK and JRE.


Origin blog.csdn.net/m0_48795607/article/details/120086754