Java technology summary (1) - JVM

1. What is JVM?

The JVM is a virtual computer that runs Java code. It comprises a bytecode instruction set, a set of registers, a stack, a garbage collector, a heap, and a method area. The JVM runs on top of the operating system and does not interact with the hardware directly.


2. Operation process:

As we all know, the compiler turns Java source files into .class files (bytecode files), and the interpreter in the Java virtual machine then translates the bytecode into machine code for the specific machine it runs on.

As follows: Java source file - compiler - bytecode file - JVM - machine code

Each platform has its own interpreter, but every platform implements the same virtual machine; this is what makes Java cross-platform.

  • Each program that starts gets its own virtual machine instance, which is created when the program starts and destroyed when it finishes
  • Data cannot be shared between virtual machine instances

2.1 Threads

The thread here refers to a thread entity in the process of program execution. In the HotSpot JVM, a Java thread maps directly to an operating system thread.

① Once the thread-local storage, allocation buffers, synchronization objects, stack, and program counter are ready, an operating system thread is created. Its life cycle follows that of the Java thread, and the OS is responsible for scheduling all threads and allocating CPU time to them.
② When the OS thread has finished initializing, it calls the run() method of the Java thread.
③ When the Java thread ends, the bound OS thread and all of the thread's resources are released.
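
The three steps can be observed from Java code. The following is a minimal sketch, not HotSpot internals; the class name ThreadLifecycleDemo and the method demo() are made up for illustration. It only shows the visible effects: the thread is NEW before start(), run() is called exactly once on the new thread, and the thread is TERMINATED after it ends.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative class; the names are not part of any JVM API.
class ThreadLifecycleDemo {
    // Starts one worker thread, waits for it to end, and reports its states and run() count.
    static String demo() {
        AtomicInteger runs = new AtomicInteger();
        Thread worker = new Thread(runs::incrementAndGet, "worker-1");
        Thread.State before = worker.getState(); // NEW: start() has not been called yet
        worker.start();                          // the JVM creates the OS thread, which calls run()
        try {
            worker.join();                       // returns once run() has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Thread.State after = worker.getState();  // TERMINATED: the thread has ended
        return before + " -> " + after + ", run() calls: " + runs.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // NEW -> TERMINATED, run() calls: 1
    }
}
```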

The main system threads running in the background of the HotSpot JVM are:

  • VM thread: waits for operations that require the JVM to be at a safepoint. These operations must run in a separate thread because they all need the JVM to reach a safepoint, where the heap cannot change. They include stop-the-world garbage collection, thread stack dumps, thread suspension, and biased lock revocation.
  • Periodic task thread: responsible for timer events (that is, interrupts), used to schedule the execution of periodic operations.
  • GC threads: support the various GC activities of the JVM.
  • Compiler threads: compile bytecode into platform-dependent native machine code at runtime.
  • Signal dispatcher thread: receives signals sent to the JVM and calls the appropriate JVM methods to handle them.

2.2 JVM memory model (runtime data areas)

  • Private memory area: has the same life cycle as its thread and is created and destroyed with the user thread (in the HotSpot JVM, each thread maps directly to an OS thread)
    • Program counter: the only area in which OOM does not occur
      • A small memory space that indicates the position (line number) of the bytecode being executed by the current thread; each thread has its own program counter.
      • While a Java method is executing, the counter holds the address of the current bytecode instruction; while a native method is executing, its value is undefined.
    • Virtual machine stack: describes the memory model of Java method execution. Each time a method is invoked, a stack frame is created to store the local variable table, operand stack, dynamic link, method exit, and other information. A method call, from invocation to completion, corresponds to one stack frame being pushed onto and then popped off the virtual machine stack.
      • Stack frame: a data structure that stores data and partial results and is also used for dynamic linking, method return values, and exception dispatch. However the method completes (normally or abruptly), its frame is destroyed when the method ends.
    • Native method stack: a stack structure that serves native methods.
  • Shared memory area:
    • Method area (implemented as the permanent generation in HotSpot before Java 8): stores loaded class information, constants, static variables, and just-in-time compiled code.
      • Runtime constant pool: part of the method area; it holds the literals and symbolic references generated at compile time, which are placed there after the class is loaded.
    • Class instance area (heap): stores created objects and arrays and is the most important memory area for GC.
  • Direct memory area: not managed by JVM GC
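
The difference between heap memory and direct memory is visible through java.nio.ByteBuffer. A small sketch follows; the class name MemoryAreaDemo and its helper methods are illustrative. ByteBuffer.allocate places the storage in a byte[] on the GC-managed heap, while ByteBuffer.allocateDirect requests storage outside it.

```java
import java.nio.ByteBuffer;

// Illustrative class; only ByteBuffer itself is a real JDK API here.
class MemoryAreaDemo {
    // Heap buffer: backed by a byte[] inside the GC-managed heap.
    static ByteBuffer heapBuffer(int size) {
        return ByteBuffer.allocate(size);
    }

    // Direct buffer: the storage lives outside the GC-managed heap.
    static ByteBuffer directBuffer(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    public static void main(String[] args) {
        System.out.println("heap buffer isDirect:   " + heapBuffer(1024).isDirect());   // false
        System.out.println("direct buffer isDirect: " + directBuffer(1024).isDirect()); // true
    }
}
```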

2.3 JVM runtime memory

From the perspective of GC, the JVM heap can be subdivided into the new generation and the old generation.

2.3.1 The new generation

By default, the new generation occupies 1/3 of the JVM heap memory and the old generation occupies the remaining 2/3.

The new generation can be divided into three parts: the Eden area (8/10), the SurvivorFrom area (1/10), and the SurvivorTo area (1/10).
- Eden area: where new Java objects are allocated (an object that is too large is placed directly in the old generation). When this area runs out of memory, a Minor GC is triggered to collect the new generation.
- SurvivorFrom area: holds the objects that survived the previous Minor GC; they are scanned again in the next one.
- SurvivorTo area: receives the objects that survive the current Minor GC.
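
As a worked example of the proportions quoted above, the sketch below computes the size of each area for a given heap. The class name HeapLayoutDemo and the 900 MB heap are assumptions chosen to keep the arithmetic round; real JVMs align and adjust these sizes.

```java
// Illustrative arithmetic only; it does not query or configure a real JVM.
class HeapLayoutDemo {
    // Returns {newGen, oldGen, eden, survivorFrom, survivorTo} in MB.
    static long[] layout(long heapMb) {
        long newGen = heapMb / 3;          // new generation: 1/3 of the heap
        long oldGen = heapMb - newGen;     // old generation: the remaining 2/3
        long survivor = newGen / 10;       // each Survivor area: 1/10 of the new generation
        long eden = newGen - 2 * survivor; // Eden: the remaining 8/10
        return new long[] {newGen, oldGen, eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long[] mb = layout(900);
        System.out.printf("new=%d old=%d eden=%d from=%d to=%d%n", mb[0], mb[1], mb[2], mb[3], mb[4]);
        // prints: new=300 old=600 eden=240 from=30 to=30
    }
}
```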

Minor GC (minor garbage collection)

  • Algorithm used: copying algorithm
  • The Minor GC process: copy, clear, swap

The process of garbage collection in the new generation:

  • 1. Copy the surviving objects in Eden and SurvivorFrom to SurvivorTo, and increase their age by 1
    • If an object's age reaches the threshold for the old generation, copy it to the old generation
    • If SurvivorTo does not have enough memory, copy the object directly to the old generation
  • 2. Clear the Eden and SurvivorFrom areas
  • 3. Swap the SurvivorFrom and SurvivorTo pointers
    • The previous SurvivorTo becomes the SurvivorFrom that is scanned in the next GC
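
The copy, clear, and swap steps can be traced with a toy model. This is only a sketch: plain lists stand in for memory areas, and the class MinorGcSketch, the live flag, the promotion age of 15, and the Survivor capacity of 2 are all assumptions made for illustration rather than how HotSpot is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: lists stand in for memory areas. Thresholds are assumptions for illustration.
class MinorGcSketch {
    static final int TENURE_AGE = 15;       // assumed promotion age
    static final int SURVIVOR_CAPACITY = 2; // deliberately tiny so overflow is visible

    static final class Obj {
        final String name;
        final boolean live;
        int age;
        Obj(String name, boolean live, int age) { this.name = name; this.live = live; this.age = age; }
        @Override public String toString() { return name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    void minorGc() {
        List<Obj> scanned = new ArrayList<>(eden);
        scanned.addAll(from);
        for (Obj o : scanned) {                 // 1. copy
            if (!o.live) continue;              // garbage is never copied
            o.age++;
            if (o.age >= TENURE_AGE || to.size() >= SURVIVOR_CAPACITY) {
                old.add(o);                     // old enough, or SurvivorTo is full
            } else {
                to.add(o);
            }
        }
        eden.clear();                           // 2. clear
        from.clear();
        List<Obj> tmp = from;                   // 3. swap
        from = to;
        to = tmp;
    }

    static String demo() {
        MinorGcSketch heap = new MinorGcSketch();
        heap.eden.add(new Obj("a", true, 0));
        heap.eden.add(new Obj("b", false, 0));  // unreachable
        heap.eden.add(new Obj("c", true, 0));
        heap.eden.add(new Obj("d", true, 0));   // will not fit into SurvivorTo
        heap.from.add(new Obj("e", true, 14));  // reaches the promotion age
        heap.minorGc();
        return "eden=" + heap.eden + " from=" + heap.from + " to=" + heap.to + " old=" + heap.old;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // eden=[] from=[a, c] to=[] old=[d, e]
    }
}
```

After the swap, the survivors a and c sit in SurvivorFrom, ready to be scanned by the next collection, while SurvivorTo is empty again.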

2.3.2 Old generation

The old generation mainly stores long-lived memory objects in the application. Because the objects are stable, MajorGC does not occur frequently.

  • A Minor GC usually runs before a Major GC; the Major GC is triggered when objects promoted from the new generation leave the old generation short of space.
  • A Major GC is also triggered to make room when no space large enough can be found for a newly created large object.
  • If the old generation still does not have enough memory after a Major GC, an OOM exception is thrown.

Major GC (Major Garbage Collection)

  • Algorithm used: mark-sweep algorithm.
  • The process once a Major GC is triggered:
    • 1. Scan all old generation objects and mark the surviving ones.
    • 2. Reclaim the unmarked objects.
  • Disadvantage: every collection leaves memory fragments, which usually have to be merged or marked as free so that the next allocation can be served directly.

2.3.3 Permanent generation

The permanent generation mainly stores Class and Meta (metadata) information.

  • A class is placed in the permanent generation when it is loaded. GC does not clean the permanent generation while the main program is running, so it fills up as more classes are loaded and can eventually cause an OOM exception.

Metaspace: Java 8 removed the permanent generation and replaced it with the metaspace. The biggest difference is that the metaspace is located in native memory rather than inside the JVM, so by default its size is limited only by the available native memory, whereas the permanent generation lived in JVM-managed memory and was limited by the JVM's settings.


2.4 Garbage collection algorithm

2.4.1 How to judge whether it is garbage?

  • Reference counting: each reference to an object increases its reference counter by 1, and each released reference decreases it by 1. When the count reaches 0, the object can be reclaimed by GC. Objects that reference each other in a cycle never reach 0, which is the main weakness of this method.
  • Reachability analysis: starting from a set of "GC roots" objects, the collector searches along the reference chains. An object with no path from any GC root is unreachable and gets marked; if it is still unreachable when it is marked a second time, it is reclaimed by GC.
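
Reachability analysis is essentially a graph search. The sketch below assumes a toy object graph in which object names map to the names they reference; the class ReachabilitySketch and its methods are illustrative. Note that C and D reference each other, so a reference counter would never drop to 0, yet they are found unreachable because no GC root leads to them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy object graph; names and structure are assumptions for illustration.
class ReachabilitySketch {
    // refs maps each object to the objects it references; roots are the GC roots.
    static Set<String> unreachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) {          // first visit: follow its outgoing references
                pending.addAll(refs.getOrDefault(current, List.of()));
            }
        }
        Set<String> garbage = new TreeSet<>(refs.keySet()); // sorted for stable output
        garbage.removeAll(marked);
        return garbage;
    }

    static Set<String> demo() {
        Map<String, List<String>> refs = Map.of(
            "A", List.of("B"),
            "B", List.of(),
            "C", List.of("D"),   // C and D form a cycle that no root reaches
            "D", List.of("C"));
        return unreachable(refs, List.of("A"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [C, D]
    }
}
```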

2.4.2 What GC algorithms are there?

  • Mark-sweep algorithm (Mark-Sweep): the most basic GC algorithm, divided into two phases, mark and sweep. Its disadvantage is memory fragmentation.
    • Mark: scan once and mark the recyclable objects.
    • Sweep: reclaim the marked objects.
  • Copying algorithm (Copying): divide the memory into two parts of equal size (a main area and a backup area) and use only the main area. When the main area is full or cannot satisfy an allocation, copy the surviving objects to the backup area and clear the whole main area. The disadvantages are that the usable memory is halved and that copying becomes inefficient when many objects survive.
  • Mark-compact algorithm (Mark-Compact): divided into three phases: mark, move, and clear.
    • Mark: mark the recyclable objects.
    • Move: move the surviving objects to one end of memory.
    • Clear: clear everything outside the area that now holds the surviving objects.
  • Generational collection algorithm: chooses a collection algorithm for each of the new generation, the old generation, and the permanent generation.
    • New generation: uses the copying algorithm
      • Because far more objects are reclaimed than copied in the new generation, it is generally divided into a larger Eden area and two smaller Survivor spaces. During GC, the surviving objects in Eden and one Survivor space are copied to the other Survivor space.
    • Old generation: uses the mark-sweep or mark-compact algorithm
      • An object is promoted to the old generation when its age reaches 15 (the default threshold).
    • Permanent generation: collected during a full GC, together with the old generation
      • Mainly collects obsolete constants and useless classes

2.5 The four reference types of Java

  • Strong reference: the most common kind. Assigning an object to a variable creates a strong reference. An object that is strongly reachable is never reclaimed by the JVM, so strong references that are no longer needed are the main cause of OOM.
  • Soft reference: implemented by the SoftReference class. A softly referenced object is reclaimed only when the JVM is running out of memory, so soft references are commonly used in memory-sensitive programs.
  • Weak reference: implemented by the WeakReference class. An object that is only weakly referenced is reclaimed as soon as GC runs, whether or not memory is short.
  • Phantom reference: implemented by the PhantomReference class. It must be used together with a reference queue and is mainly used to track when an object is reclaimed.
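
The four kinds of reference map onto classes in java.lang.ref. The sketch below (class name ReferenceTypesDemo is illustrative) deliberately checks only what is deterministic: while a strong reference is still in use, soft and weak references keep returning the object, and a phantom reference never returns its referent. When a weakly or softly referenced object is actually reclaimed depends on the collector, so that part is not shown.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Illustrative class; only the java.lang.ref types are real JDK APIs here.
class ReferenceTypesDemo {
    static String demo() {
        Object strong = new Object();                              // strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);  // cleared only under memory pressure
        WeakReference<Object> weak = new WeakReference<>(strong);  // cleared once only weakly reachable
        ReferenceQueue<Object> queue = new ReferenceQueue<>();     // a phantom reference requires a queue
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // 'strong' is still in use below, so the referent cannot have been reclaimed yet.
        boolean softAlive = soft.get() == strong;
        boolean weakAlive = weak.get() == strong;
        // A phantom reference never hands out its referent: get() always returns null.
        boolean phantomNull = phantom.get() == null;
        return "soft=" + softAlive + " weak=" + weakAlive + " phantomNull=" + phantomNull;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // soft=true weak=true phantomNull=true
    }
}
```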

2.6 The difference between generational collection algorithm and partition collection algorithm

2.6.1 Generational Collection Algorithm

  • New generation (copying algorithm): GC is frequent and only a few objects survive each time, so the copying algorithm is chosen because copying the few survivors is cheap.
  • Old generation (mark-compact algorithm): the object survival rate is high, so the mark-compact algorithm is the more efficient choice.

2.6.2 Partition Collection Algorithm

  • The partition algorithm divides the whole heap into contiguous small regions, each of which can be used and reclaimed independently. This makes it possible to control how many regions are reclaimed at a time: depending on the target pause time, a reasonable number of regions is reclaimed in each GC, which reduces the pause caused by a single GC.

2.7 GC garbage collector

Java provides different garbage collectors for the new generation and the old generation.

  • new generation
    • Serial (single-threaded, copying algorithm): the most basic garbage collector and the only new-generation collector before JDK 1.3.1. It uses a single thread (one CPU) and suspends all other worker threads until collection is complete. Because it has no thread-interaction overhead, it is highly efficient in an environment limited to a single CPU, so Serial remains the default new-generation collector of a JVM in Client mode.
    • ParNew (multi-threaded Serial): by default it uses as many threads as there are CPUs; the number of collector threads can be limited with the -XX:ParallelGCThreads parameter.
    • Parallel Scavenge (multi-threaded, copying algorithm, adaptive): focuses on throughput, that is, on using CPU time efficiently. It mainly suits background computation that needs little interaction. Its adaptive tuning strategy is its most distinctive feature.
  • old generation
    • CMS (Concurrent Mark Sweep): aims for the shortest garbage collection pause time and uses a multi-threaded mark-sweep algorithm. It works in four stages:
      • Initial mark: marks the objects directly reachable from the GC roots; very fast, but worker threads must be paused.
      • Concurrent mark: traces from the GC roots concurrently with the user threads.
      • Remark: corrects the marks of objects that changed while the user program kept running; worker threads must be paused.
      • Concurrent sweep: reclaims the objects that both marking passes found recyclable, concurrently with the user threads.
    • Serial Old (single-threaded, mark-compact algorithm): the old-generation version of the Serial collector and the default old-generation collector of a JVM in Client mode.
    • Parallel Old (multi-threaded, mark-compact algorithm): the old-generation version of the Parallel Scavenge collector. In a high-throughput environment it is paired with Parallel Scavenge to collect the new and old generations.
  • G1 collector: an improvement on CMS. It is based on the mark-compact algorithm and controls pause times precisely, achieving low-pause GC without sacrificing throughput and obtaining the highest collection efficiency within a limited time.
    • 1. Divide the heap into several fixed-size regions and track the garbage collection progress of each region
    • 2. Maintain a priority list and, within the allowed collection time, collect the regions containing the most garbage first.

2.8 IO and NIO

2.8.1 Blocking IO Model

One of the most traditional IO models, that is, blocking occurs during reading and writing

  • Concept: the user thread issues an IO request, and the kernel checks whether the data is ready. If it is not, the user thread blocks and gives up the CPU. When the data is ready, the kernel copies the data to the user thread and returns the result, and the user thread leaves the blocked state.
  • Example: data = socket.read(); if the data is not ready, the thread stays blocked inside the read method.

2.8.2 Non-blocking IO model

  • Concept: after the user thread issues a read, the kernel checks whether the data is ready. If it is, the kernel copies the data; if not, the call returns an error result at once. The user thread therefore has to keep asking the kernel whether the data is ready, so the non-blocking IO model occupies the CPU the whole time.
  • Example: while(true){ data = socket.read(); if( data != error ) break; }
  • Disadvantage: high CPU usage
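
The polling loop above can be made runnable with a java.nio.channels.Pipe, which avoids needing a network peer. This is a sketch under that assumption; the class name NonBlockingReadDemo is illustrative. In non-blocking mode, read returns 0 immediately when nothing is available instead of suspending the thread, and the caller has to keep polling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

// Illustrative class; a Pipe stands in for a socket so the example needs no network.
class NonBlockingReadDemo {
    static String demo() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);
                ByteBuffer buf = ByteBuffer.allocate(16);
                int first = source.read(buf);  // nothing written yet: returns 0 at once
                sink.write(ByteBuffer.wrap("hi".getBytes(StandardCharsets.US_ASCII)));
                while (buf.position() < 2) {   // busy polling: this loop is what burns CPU
                    source.read(buf);
                }
                buf.flip();
                return first + ":" + StandardCharsets.US_ASCII.decode(buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 0:hi
    }
}
```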

2.8.3 Multiplexing IO Model

The most widely used IO model at present; Java NIO is multiplexed IO.

  • Concept: in the multiplexed IO model, one thread continuously polls the status of multiple sockets, and the actual IO read or write is invoked only when a socket really has a read or write event. In Java NIO, selector.select() polls the registered sockets; if none has an event, the call blocks there.
  • Usage scenario: suitable when there are a large number of connections.
  • Disadvantage: events are handled one after another, so if one response body is too large, it delays the handling of the other IO requests and the polling of the sockets.
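
A minimal sketch of multiplexing with a Selector follows. It again uses a Pipe instead of real sockets, and the class name SelectorDemo and the 5 second timeout are assumptions for illustration. One thread registers a channel, finds nothing ready at first, and is woken by select once data has been written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

// Illustrative class; a Pipe stands in for sockets so the example needs no network.
class SelectorDemo {
    static String demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);                 // required before registering
                source.register(selector, SelectionKey.OP_READ);

                int idle = selector.selectNow();                 // nothing written yet: no channel ready
                sink.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.US_ASCII)));
                int ready = selector.select(5000);               // blocks until readable or timeout

                StringBuilder received = new StringBuilder();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {                      // do real IO only for ready channels
                        ByteBuffer buf = ByteBuffer.allocate(16);
                        ((Pipe.SourceChannel) key.channel()).read(buf);
                        buf.flip();
                        received.append(StandardCharsets.US_ASCII.decode(buf));
                    }
                }
                selector.selectedKeys().clear();                 // handled keys must be removed by hand
                return idle + "/" + ready + "/" + received;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // ready counts before and after the write, then the data read
    }
}
```

With many channels registered on the same selector, the loop over selectedKeys is where a slow handler would hold up every other ready channel.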

2.8.4 Signal-driven IO model

  • Concept: when the user thread initiates an IO request, it registers a signal function for the corresponding socket and then continues executing. When the data is ready, the kernel sends a signal to the user thread, and on receiving the signal the user thread performs the IO read or write inside the signal function.

2.8.5 Asynchronous IO model

An ideal IO model.

  • Concept: after the user thread initiates a read request, it does not wait for the data and goes on with other tasks. From the kernel's point of view, an asynchronous read returns immediately once the kernel has received it, indicating that the read request was initiated successfully, so no blocking occurs. When the data is ready, the kernel copies it to the user thread and then sends the user thread a signal.
  • Use: Java 7 provides asynchronous IO (AIO) classes such as AsynchronousSocketChannel and AsynchronousFileChannel.
  • Advantage: the user thread blocks neither while the request is made nor while the data is copied; once it receives the kernel's signal, it can use the data directly.
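
A short sketch with AsynchronousFileChannel shows the shape of an asynchronous read. The class name AsyncReadDemo and the temporary file are assumptions for illustration. For brevity the result is collected through a Future; a CompletionHandler callback is the closer analogue of the notification described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Illustrative class; the temporary file exists only to have something to read.
class AsyncReadDemo {
    static String demo() {
        try {
            Path file = Files.createTempFile("aio-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, "hello aio".getBytes(StandardCharsets.US_ASCII));
            try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(64);
                Future<Integer> pending = channel.read(buf, 0); // returns at once; the read runs in the background
                // ... the calling thread is free to do other work here ...
                pending.get();                                  // collect the result when it is needed
                buf.flip();
                return StandardCharsets.US_ASCII.decode(buf).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // hello aio
    }
}
```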

2.9 Java IO package

It mainly consists of two commonly used branches: byte streams and character streams.


2.10 Java NIO package

NIO has three core parts: Channel, Buffer, and Selector.

NIO performs IO based on Channels and Buffers: data is always read from a channel into a buffer or written from a buffer to a channel. A Selector listens for events (such as a connection being opened or data arriving) on multiple channels, so a single thread can monitor multiple channels.

The biggest difference between NIO and IO is that IO is stream-oriented, whereas NIO is buffer-oriented.

  • NIO buffer

Traditional IO reads one or more bytes at a time from a stream until all bytes have been read, so the data in the stream cannot be pre-processed.

An NIO buffer exists so that the data read can be pre-processed flexibly. Data is read from a channel into a buffer, where it can be processed before it is consumed. Two things must be checked while doing so: 1. whether the buffer already contains all the data that needs processing; 2. that newly read data does not overwrite data not yet processed.

  • NIO is non-blocking

Non-blocking in NIO means that when a thread calls a read or write operation, it immediately gets the data currently available or writes part of the data to the channel. If no data is available, the call returns nothing instead of waiting, so one thread can manage multiple IO channels.

  • Java NIO classes

2.10.1 Channel

Channel and Stream in IO are at the same level, except that Channel is bidirectional and Stream is unidirectional.

The main implementations of Channel in NIO are:

  • 1. FileChannel (file IO)
  • 2. DatagramChannel (UDP channel)
  • 3. SocketChannel (TCP client channel)
  • 4. ServerSocketChannel (TCP server channel)

2.10.2 Buffer

A Buffer is essentially a contiguous array. Any data read from or written to a Channel must pass through a Buffer; the exception is a direct channel-to-channel transfer (for example FileChannel.transferTo), which needs no intermediate Buffer.

The main implementations of Buffer in NIO are:

  • 1. ByteBuffer
  • 2. ShortBuffer
  • 3. IntBuffer
  • 4. LongBuffer
  • 5. DoubleBuffer
  • 6. FloatBuffer
  • 7. CharBuffer
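
A Buffer tracks a position and a limit, and switching between writing and reading is done with flip() and compact(). The sketch below uses ByteBuffer; the class name BufferDemo is illustrative. compact() keeps the bytes not yet consumed, so newly written data does not overwrite unprocessed data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative class; only ByteBuffer and StandardCharsets are real JDK APIs here.
class BufferDemo {
    static String demo() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put("abcd".getBytes(StandardCharsets.US_ASCII)); // write mode: position=4, limit=8
        buf.flip();                                          // read mode: position=0, limit=4
        char first = (char) buf.get();                       // consume 'a'; position=1
        buf.compact();                                       // keep unread "bcd"; write mode, position=3
        buf.put((byte) 'e');                                 // new data lands after the unread bytes
        buf.flip();                                          // read mode again: position=0, limit=4
        return first + "|" + StandardCharsets.US_ASCII.decode(buf);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // a|bcde
    }
}
```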

2.10.3 Selector

A Selector can detect whether an event has occurred on any of the registered channels. When one has, the Selector obtains the event and the handler corresponding to that event is called.


2.11 JVM class loading mechanism

The JVM class loading mechanism is divided into five parts:

  • Loading: in this stage, a java.lang.Class object representing the class is generated in memory and serves as the entry point to the class's data in the method area. (The class data does not have to come from a .class file: it can be read from a compressed package such as a JAR, generated at runtime as a dynamic proxy class, or generated from other files.)
  • Verification: ensures that the information in the byte stream of the Class file meets the requirements of the current JVM.
  • Preparation: allocates memory for class variables and sets their initial values. (The initial value here is the default value of the type, not the value assigned in the code; for example, an int is initialized to 0.)
  • Resolution: the process of replacing the symbolic references in the constant pool with direct references.
    • Symbolic reference: independent of the memory layout of the virtual machine implementation, but its literal form must conform to the Class file format of the JVM specification.
    • Direct reference: the referenced target must already exist in the virtual machine. A direct reference can be a pointer to the target, a relative offset, or a handle that locates the target indirectly.
  • Initialization: the stage in which the Java code defined in the class actually starts to run; it consists mainly of executing the class initializer method <clinit>().
    • The <clinit>() method is generated by the compiler, which collects the assignments to class variables and the statements in static blocks. The JVM ensures that the parent class's <clinit>() has finished before the subclass's <clinit>() runs. If a class has neither static variable assignments nor static blocks, the compiler does not generate a <clinit>() method for it.
    • Cases where class initialization is not performed
      • 1. Referencing a static field of the parent class through a subclass initializes only the parent class, not the subclass
      • 2. Defining an array of the class's type
      • 3. Referencing a compile-time constant of the class
      • 4. Getting the Class object through the class literal (ClassName.class)
      • 5. Calling Class.forName() to load the class with the parameter initialize=false
      • 6. Calling the default loadClass method of ClassLoader
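
The six cases can be checked with static initializer blocks that record when they run. This is a sketch; InitDemo and its nested classes are made-up names. Each nested class logs its own name when it is initialized, and after all six operations only Parent appears in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative classes; each static block records that its class was initialized.
class InitDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static int value = 1;
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child"); }
    }

    static class Constants {
        static final String GREETING = "hello"; // compile-time constant
        static { LOG.add("Constants"); }
    }

    static class Lazy {
        static { LOG.add("Lazy"); }
    }

    static List<String> demo() {
        try {
            int v = Child.value;            // case 1: initializes Parent only
            Lazy[] array = new Lazy[2];     // case 2: creating an array does not initialize Lazy
            String s = Constants.GREETING;  // case 3: the constant is inlined by the compiler
            Class<?> literal = Lazy.class;  // case 4: class literal
            ClassLoader loader = InitDemo.class.getClassLoader();
            Class.forName(Lazy.class.getName(), false, loader); // case 5: initialize=false
            loader.loadClass(Lazy.class.getName());             // case 6: loadClass only loads
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [Parent]
    }
}
```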

After initialization, the rest of a class's life cycle in the JVM has two stages:

  • Using
  • Unloading

2.11.1 Class loaders

The JVM provides three class loaders: the bootstrap class loader, the extension class loader, and the application class loader. Users can also define custom class loaders.

  • Bootstrap ClassLoader: loads the classes in the JAVA_HOME\lib directory, or in the path specified by -Xbootclasspath, that the JVM recognizes (recognition is by file name, such as rt.jar).
  • Extension ClassLoader: loads the class libraries in the JAVA_HOME\lib\ext directory, or in the path specified by the java.ext.dirs system property.
  • Application ClassLoader: loads the class libraries on the user class path (classpath).
  • Custom class loader: the JVM loads classes through the parent delegation model, and a custom class loader can be implemented by extending java.lang.ClassLoader.

2.11.2 Parent delegation model

  • Concept: when a class loader receives a class loading request, it first delegates the request to its parent class loader, and every loader in the hierarchy does the same, so all loading requests eventually reach the bootstrap class loader. Only when the parent reports that it cannot handle the request (the class is not found on its own loading path) does the child loader try to load the class itself.
  • Advantage: a class such as java.lang.Object is always loaded by the same top-level loader, so every class loader ends up with the same Object class.
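
Delegation can be observed with a custom loader that adds no loading logic of its own. In this sketch, DelegationDemo and ChildLoader are illustrative names. A request for java.lang.String travels up the parent chain, so the custom loader returns the very same Class object that the rest of the JVM uses.

```java
// Illustrative classes; ChildLoader relies entirely on the default delegation in ClassLoader.
class DelegationDemo {
    // A custom loader with no logic of its own: every request is delegated to its parent.
    static class ChildLoader extends ClassLoader {
        ChildLoader(ClassLoader parent) {
            super(parent);
        }
    }

    static boolean sameStringClass() {
        try {
            ClassLoader custom = new ChildLoader(ClassLoader.getSystemClassLoader());
            // The request is passed upward, so the result is the one shared String class.
            return custom.loadClass("java.lang.String") == String.class;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameStringClass());             // true
        System.out.println(String.class.getClassLoader()); // null: the bootstrap loader has no Java object
        // Walk the parent chain of the application class loader.
        for (ClassLoader l = ClassLoader.getSystemClassLoader(); l != null; l = l.getParent()) {
            System.out.println(l.getClass().getName());
        }
    }
}
```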

2.11.3 OSGi (dynamic module system)

OSGi provides the ability to change the composition of software dynamically, without a restart, on a variety of network devices, and enables module-level hot swapping: when a program is updated, only the affected modules need to be stopped, reinstalled, and started. This modularity comes with considerable extra complexity, because OSGi does not follow the parent delegation model.

Origin blog.csdn.net/Zain_horse/article/details/132137942