Ali architects help you understand the JVM in simple terms

This article describes the internal structure of the JVM, covering multithreading in components, the JVM system threads, the local variable array, and related topics.

JVM

JVM = classloader + execution engine + runtime data area

The following diagram shows the key internal components of a typical JVM (compliant with the JVM Specification Java SE 7 Edition).

Multithreading in Components

"Multithreading" (or "free threading") is the ability of a program to run multiple threads of execution at the same time. For example, a multithreaded application might receive user input on one thread, perform several complex calculations on a second thread, and update a database on a third. In a single-threaded application, the user may have to wait for the calculations or database updates to finish. In a multithreaded application, these tasks run in the background, so no user time is wasted.

Multithreading can be a very powerful tool in component programming. By writing multithreaded components, you can create components that perform complex computations in the background while leaving the user interface (UI) free to respond to user input. Although multithreading is powerful, it can be difficult to apply correctly: multithreaded code that is not implemented correctly can degrade application performance or even cause the application to freeze.

The .NET Framework provides several options for multithreading in components. The functionality in the System.Threading namespace is one option. The event-based asynchronous pattern is another. The BackgroundWorker component is an implementation of that asynchronous pattern; it provides advanced functionality encapsulated in a component for ease of use.

JVM system thread

If you look with jconsole or any other debugging tool, you will see many threads running in the background. These background threads run in addition to the main thread, which is created to execute public static void main(String[]), and in addition to any threads that the main thread itself creates. The main background system threads in the HotSpot JVM are the following:

  • VM thread: waits for operations that require the JVM to reach a safepoint. These operations have to run on a separate thread because they all require the JVM to be at a safepoint, where the heap cannot be modified. The operations performed by this thread are "stop-the-world" garbage collections, thread stack dumps, thread suspension, and biased lock revocation.

  • Periodic task thread: responds to timer events (for example, interrupts) that are used to schedule periodic operations.

  • GC threads: support the different types of garbage collection in the JVM.

  • Compiler threads: compile bytecode to native machine code at runtime.

  • Signal dispatcher thread: receives signals sent to the JVM and handles them by calling the appropriate JVM methods.
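As a rough companion to the jconsole view described above, the following sketch lists the live Java threads from inside a running program. The class name ThreadLister is made up for this example. Which names appear depends on the JVM and its version, and some VM-internal threads may not be visible through this API at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ThreadLister {
    /** Returns the names of all live Java threads visible to this API, sorted. */
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        // Thread.getAllStackTraces() returns a snapshot keyed by live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName() + (t.isDaemon() ? " (daemon)" : ""));
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        liveThreadNames().forEach(System.out::println);
    }
}
```

Running it in a plain command-line program typically shows the main thread alongside a handful of daemon threads started by the JVM.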

per thread

Each thread of execution has the following components:

Program Counter (PC)

If the current method is not native, the PC holds the address of the instruction (opcode) currently being executed. If the current method is native, the PC is undefined. All CPUs have a PC; typically it is incremented after each instruction so that it holds the address of the next instruction to be executed. The JVM uses the PC to keep track of where it is executing instructions. The PC in fact points to a memory address in the method area.

native stack

Not all JVMs support native methods, but those that do typically create a native method stack per thread. If the JVM implements JNI (Java Native Interface) using a C linkage model, the native stack will be a C stack. In that case, the order of arguments and the return value on the native stack are identical to those of a typical C program. A native method can usually (depending on the JVM implementation) call back into the JVM and invoke a Java method. Such a native-to-Java invocation takes place on the Java stack: the thread leaves the native stack and creates a new frame on the Java stack.

stack

Each thread has its own stack, which holds a frame for each method executing on that thread. The stack is a last-in, first-out (LIFO) data structure, so the currently executing method is at the top of the stack. For each method invocation a new frame is created and pushed onto the top of the stack. The frame is popped when the method returns normally or when an uncaught exception is thrown during the invocation. The stack is never manipulated directly except to push and pop frame objects. Frame objects may therefore be allocated on the heap, and the memory does not need to be contiguous (note the distinction between the frame pointer and the frame object).

stack limit

A stack can be of dynamic or fixed size. If a thread requires a larger stack than is permitted, a StackOverflowError is thrown; if a thread requires a new frame and there is not enough memory to allocate it, an OutOfMemoryError is thrown.
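The stack limit can be observed with a small experiment. The sketch below recurses without a base case until the JVM throws StackOverflowError, then reports how many frames were pushed. The class and method names are invented for this illustration, and the depth reached varies with the JVM, the platform, and the configured thread stack size (on HotSpot, the -Xss option).

```java
final class StackDepthProbe {
    private static int depth;

    private static void recurse() {
        depth++;     // one more frame is now on this thread's stack
        recurse();   // never returns normally
    }

    /** Recurses until the thread's stack is exhausted and returns the depth reached. */
    static int probe() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames were popped as the error propagated up to this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before StackOverflowError: " + probe());
    }
}
```

Catching StackOverflowError is done here only to measure the limit; production code would normally fix the unbounded recursion instead.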

Frame

For each method execution, a new frame is created and pushed to the top of the stack. When the method returns normally or encounters an uncaught exception during method execution, the frame will be popped off the stack.

local variable array

The local variable array contains all the variables used during the execution of a method, including a reference to this, all method parameters, and other locally defined variables. For class methods (that is, static methods) the method parameters start at index 0; for instance methods, slot 0 is reserved for this.
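The slot numbering described above can be worked out mechanically. The following sketch computes the local variable index of each parameter of a method. It relies on one detail not stated in this article but defined in the JVM specification: long and double values occupy two consecutive slots. The helper name parameterSlots is hypothetical.

```java
import java.util.Arrays;

final class LocalSlots {
    /**
     * Returns the local variable index of each parameter, given whether the
     * method is static and the parameter types in declaration order.
     */
    static int[] parameterSlots(boolean isStatic, Class<?>... paramTypes) {
        int[] slots = new int[paramTypes.length];
        int next = isStatic ? 0 : 1;   // slot 0 holds `this` for instance methods
        for (int i = 0; i < paramTypes.length; i++) {
            slots[i] = next;
            // long and double take two consecutive slots; every other type takes one.
            boolean wide = paramTypes[i] == long.class || paramTypes[i] == double.class;
            next += wide ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // An instance method m(int, long, String) and its static equivalent.
        System.out.println(Arrays.toString(parameterSlots(false, int.class, long.class, String.class)));
        System.out.println(Arrays.toString(parameterSlots(true, int.class, long.class, String.class)));
    }
}
```

For the instance method the parameters land in slots 1, 2 and 4; for the static one they land in slots 0, 1 and 3.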

operand stack

The operand stack is used during the execution of bytecode instructions in much the same way that general-purpose registers are used by a native CPU. Most bytecode spends its time manipulating the operand stack by pushing, popping, duplicating, swapping, or executing operations that produce or consume values. Instructions that move values between the local variable array and the operand stack are therefore very frequent in bytecode.
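To make the interplay between the local variable array and the operand stack concrete, here is a toy interpreter for a handful of integer instructions. It is a simplified model, not how any real JVM is implemented; the enum constants are named after the corresponding JVM opcodes, and everything else (class name, method name) is invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class MiniFrame {
    /** A tiny subset of opcodes, named after the real JVM instructions. */
    enum Op { ILOAD_0, ILOAD_1, IADD, IMUL, IRETURN }

    /** Executes the code against a local variable array and a fresh operand stack. */
    static int run(int[] locals, Op... code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Op op : code) {
            switch (op) {
                case ILOAD_0: stack.push(locals[0]); break;              // local 0 -> stack
                case ILOAD_1: stack.push(locals[1]); break;              // local 1 -> stack
                case IADD:    stack.push(stack.pop() + stack.pop()); break;
                case IMUL:    stack.push(stack.pop() * stack.pop()); break;
                case IRETURN: return stack.pop();                        // result leaves the frame
            }
        }
        throw new IllegalStateException("code ended without IRETURN");
    }

    public static void main(String[] args) {
        // Roughly what `static int add(int a, int b) { return a + b; }` compiles to.
        System.out.println(run(new int[] {2, 3}, Op.ILOAD_0, Op.ILOAD_1, Op.IADD, Op.IRETURN));
    }
}
```

Note how every arithmetic instruction takes its inputs from the stack and leaves its result there, which is why load instructions appear so often.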

dynamic linking

Each frame contains a reference to the runtime constant pool. The reference points to the constant pool of the class that the method being executed belongs to, and it is used to support dynamic linking.

When a Java class is compiled, all references to variables and methods are stored in the class's constant pool as symbolic references. A symbolic reference is a logical reference, not a reference that points to a physical memory location. A JVM implementation can choose when to resolve symbolic references. Resolution can happen after the class file has been loaded and verified, which is called eager or static resolution; alternatively, it can happen when the symbolic reference is used for the first time, which is called lazy or late resolution. Either way, the JVM must guarantee that resolution happens before each reference is used for the first time, and it throws any resolution errors at that point. Binding is the process of replacing the field, method, or class identified by a symbolic reference with a direct reference. This happens only once, because the symbolic reference is completely replaced. If a symbolic reference refers to a class that has not yet been resolved, that class is loaded as well. Each direct reference is stored as an offset into the storage structure associated with the runtime location of the variable or method.

shared between threads

heap

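The JVM heap is the runtime data area from which memory for class instances and arrays is allocated. It should not be confused with the heap data structure, which has the following properties:

  • The value of a node is never greater than (in a max heap) or never less than (in a min heap) the value of its parent node;

  • A heap is always a complete binary tree.

A heap whose root node is the largest element is called a max heap (or big root heap), and a heap whose root node is the smallest element is called a min heap (or small root heap). Common heaps include binary heaps, Fibonacci heaps, etc.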

A heap is defined as follows: a sequence of n elements {k_1, k_2, ..., k_n} is called a heap if and only if one of the following two relations is satisfied:

$$k_i \le k_{2i} \ \text{and} \ k_i \le k_{2i+1} \qquad \text{or} \qquad k_i \ge k_{2i} \ \text{and} \ k_i \ge k_{2i+1}, \qquad i = 1, 2, \dots, \lfloor n/2 \rfloor$$

If the one-dimensional array that stores this sequence is viewed as a complete binary tree, the definition means that the value of every non-terminal node in the tree is not greater than (or not less than) the values of its left and right children. Thus, if the sequence {k_1, k_2, ..., k_n} is a heap, the top element of the heap (the root of the complete binary tree) must be the minimum (or maximum) of the n elements in the sequence.
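The relation above can be checked directly on an array. The sketch below tests the min-heap variant; because Java arrays are 0-based, the children of index i are at 2i+1 and 2i+2 rather than 2i and 2i+1. The class and method names are chosen for this example only.

```java
final class HeapCheck {
    /** Returns true if the array satisfies the min-heap relation (0-based indices). */
    static boolean isMinHeap(int[] a) {
        // Only the first a.length / 2 elements have children.
        for (int i = 0; i < a.length / 2; i++) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (a[i] > a[left]) return false;
            // The last parent may have a left child only.
            if (right < a.length && a[i] > a[right]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMinHeap(new int[] {1, 3, 2, 7, 4}));
        System.out.println(isMinHeap(new int[] {5, 3, 8}));
    }
}
```

The max-heap check is the same with the comparisons reversed.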

off-heap memory

Some objects are not created on the heap; they are logically considered part of the JVM's own machinery.

Off-heap (non-heap) memory includes:

  • The permanent generation, which contains:

      • the method area

      • interned strings

  • The code cache, which stores methods that have been JIT-compiled to native code
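One way to see the heap and non-heap split on a live JVM is through the standard java.lang.management API. The sketch below groups the memory pool names by type. The pool names are implementation-specific, so treat any particular name as an assumption; on HotSpot builds from Java 8 onward the permanent generation no longer exists and a Metaspace pool typically appears instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

final class MemoryPools {
    /** Returns the names of this JVM's memory pools of the given type. */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```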

memory management

Objects and arrays are never explicitly deallocated; they are reclaimed automatically by the garbage collector.

Typically, this works as follows:

  1. New objects and arrays are created in the young generation.

  2. Minor garbage collection operates on the young generation. Objects that are still alive are moved from the eden space to the survivor space.

  3. Major garbage collection, which typically causes the application threads to pause, moves objects between generations. Objects that are still alive are moved from the young generation to the old (tenured) generation.

  4. The permanent generation is collected every time the old generation is collected. Both are collected when either of them becomes full.

JIT compilation

JIT compilation works like this: when a type is loaded, the CLR (the .NET runtime) creates an internal data structure and a corresponding function for the type, and when the function is called for the first time, the JIT compiler compiles it into machine code. When the function is called again, the compiled machine code is executed directly from the cache.

method area

All threads share the same method area, so access to method area data and the process of dynamic linking must be thread-safe. If two threads attempt to access a field or method of a class that has not yet been loaded, the class must be loaded only once, and neither thread can continue until it has been loaded.

class file structure

A compiled class file has the following structure:

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count - 1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

  • magic, minor_version, major_version: specify information about the version of the class and the version of the JDK it was compiled for.

  • constant_pool: similar to a symbol table, although it contains more data.

  • access_flags: provides the list of modifiers for the class.

  • this_class: index into the constant pool providing the fully qualified name of the class, for example org/jamesdbloom/foo/Bar.

  • super_class: index into the constant pool providing a symbolic reference to the superclass, for example java/lang/Object.

  • interfaces: array of indexes into the constant pool providing symbolic references to all implemented interfaces.

  • fields: array of indexes into the constant pool giving a complete description of each field.

  • methods: array of indexes into the constant pool giving a complete description of each method signature; if the method is not abstract or native, the bytecode is also present.

  • attributes: array of different values that provide additional information about the class, including any annotations with RetentionPolicy.CLASS or RetentionPolicy.RUNTIME.
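The first three fields of the structure are easy to inspect by hand. The following sketch reads magic, minor_version and major_version from the class file of a given class, using only standard library calls. The class name ClassHeader is invented, and the example assumes the class file is reachable as a resource, which holds for core classes such as java.lang.String on common JDKs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassHeader {
    /** Returns {magic, minor_version, major_version} read from the class file of `type`. */
    static int[] read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            // Class files are big-endian, which matches DataInputStream.
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // u4 magic
            int minor = data.readUnsignedShort();  // u2 minor_version
            int major = data.readUnsignedShort();  // u2 major_version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = read(String.class);
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The magic value of every valid class file is 0xCAFEBABE; the major version depends on the JDK that compiled the class.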

You can use the javap command to view the bytecode of a compiled Java class.

The opcodes used in this class file are listed below:

  • aload_0: one of a group of opcodes of the form aload_<n>, all of which load an object reference onto the operand stack. The <n> is the position in the local variable array of the object reference to access, and can only be 0, 1, 2 or 3. There are similar opcodes for loading values that are not object references: iload_<n>, lload_<n>, fload_<n> and dload_<n>, where i is for int, l for long, f for float and d for double (the same range of n applies to all of these *load_<n> opcodes). Local variables with an index greater than 3 can be loaded using iload, lload, fload, dload and aload, each of which takes an operand giving the index of the local variable to load.

  • ldc: pushes a constant from the runtime constant pool onto the operand stack.

  • getstatic: pushes a static value from a static field listed in the runtime constant pool onto the operand stack.
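  • invokespecial, invokevirtual: these belong to a group of opcodes that invoke methods (invokedynamic, invokeinterface, invokespecial, invokestatic and invokevirtual). In this example, invokevirtual invokes an instance method of a class, while invokespecial invokes instance initialization methods as well as private methods and methods of a superclass of the current class (invoked without dynamic dispatch).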

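  • return: one of a group of opcodes (ireturn, lreturn, freturn, dreturn, areturn and return). Each is a typed return statement, where i is for int, l for long, f for float, d for double and a for an object reference. The opcode with no leading letter, return, returns only void.

As in typical bytecode, most of these opcodes deal with the local variables, the operand stack and the runtime constant pool.

The constructor has two instructions. The first pushes this onto the operand stack. Next, the constructor of the superclass is invoked, which consumes this, so this is popped off the operand stack.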

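The sayHello() method is more complex, because it has to resolve symbolic references to actual references using the runtime constant pool. The first instruction, getstatic, pushes a reference to the static field out of the System class onto the operand stack. The next instruction, ldc, pushes the string literal "Hello" onto the operand stack. Finally, the invokevirtual instruction invokes the println method of System.out, which pops "Hello" off the operand stack as an argument and creates a new frame for the current thread.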

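class loader

The JVM starts up by loading an initial class using the bootstrap class loader. That class is then linked and initialized before public static void main(String[]) is invoked. Executing the main method in turn drives the loading, linking and initialization of additional classes and interfaces as they are required.

Loading: loading is the process of finding the class file that represents the class or interface type and reading it into a byte array. The bytes are then parsed to confirm that they represent a Class object and have the correct major and minor versions. Any class or interface named as a direct superclass is also loaded. Once this is complete, a class or interface object is created from the binary representation.

Linking: linking is the process of verifying a class or interface and preparing the type together with its direct superclass and superinterfaces. In short, linking consists of three steps: verification, preparation and (optionally) resolution.

Verification: this stage confirms that the representation of the class or interface is structurally correct and obeys the semantic requirements of the Java programming language and the JVM.

Performing these checks during the verification stage means that they do not need to be performed at run time. Verification during linking slows down class loading, but it avoids the need to perform these checks while executing the bytecode.

Preparation: this stage allocates memory for static storage and for any data structures used by the JVM, such as method tables. Static fields are created and initialized to their default values. However, no initializers or code are executed at this stage, because that happens as part of initialization.

Resolution: this is an optional stage. It checks symbolic references by loading the referenced classes or interfaces and verifying that the references are correct. If these checks do not take place at this point, resolution of a symbolic reference is deferred until just before it is used by a bytecode instruction.

Initialization: initializing a class or interface consists of executing its class or interface initialization method, <clinit>.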
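The point at which initialization runs can be observed from ordinary Java code, because static initializers are compiled into the class's <clinit> method. The sketch below records events in a list to show that a nested class is initialized on its first active use rather than when the enclosing program starts. All names here are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class InitOrder {
    static final List<String> EVENTS = new ArrayList<>();

    /** Hypothetical helper whose static initializer records when it runs. */
    static final class Lazy {
        static {
            EVENTS.add("Lazy initialized");   // compiled into Lazy.<clinit>
        }

        static void touch() {
            // Calling a static method is an active use that triggers initialization.
        }
    }

    /** Records a marker, uses Lazy, records another marker, and returns the log. */
    static List<String> demo() {
        EVENTS.add("before first use");
        Lazy.touch();
        EVENTS.add("after first use");
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

On the first call the initializer's entry appears between the two markers, which shows that Lazy was not initialized any earlier than its first use.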

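The JVM contains several class loaders with different roles. Each class loader delegates to the parent class loader that loaded it, except the bootstrap class loader, which is the root loader.

Bootstrap class loader: when a Java program runs, the JVM needs to load Java classes, and that requires a class loader. But a class loader is itself a Java class, which raises a question much like asking where the first human mother came from.

In fact, the JVM has a built-in class loader called Bootstrap. It is implemented in native, operating-system-specific code, is part of the JVM core, and does not need to be loaded by another class loader. The bootstrap class loader is responsible for loading the classes in the core Java packages.

Extension class loader: loads classes from the standard Java extension APIs, for example the security extension functions.

System class loader: the default application class loader. It loads application classes from the classpath.

User-defined class loaders: additional class loaders can be defined to load application classes. They are used in special situations, such as reloading classes at run time or isolating groups of classes from one another (web servers such as Tomcat typically need this).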
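The delegation hierarchy described above can be printed from a running program by following getParent() from the system class loader. This is only a sketch: the concrete loader class names differ between JDK versions, and representing the bootstrap loader as null is HotSpot's convention rather than a guarantee for every JVM.

```java
import java.util.ArrayList;
import java.util.List;

final class LoaderChain {
    /** Walks from the given loader up through its parents and returns their class names. */
    static List<String> chain(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // getParent() returns null once the bootstrap loader is reached.
        names.add("bootstrap (represented as null)");
        return names;
    }

    public static void main(String[] args) {
        chain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes are loaded by the bootstrap loader, so this usually prints null.
        System.out.println(String.class.getClassLoader());
    }
}
```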

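faster class loading

A feature called Class Data Sharing (CDS) was introduced in HotSpot JVM 5.0. During JVM installation, the installer loads a set of core Java classes (such as those in rt.jar) into a memory-mapped shared archive. CDS reduces the time it takes to load these classes, which improves JVM start-up speed, and it allows the classes to be shared between different JVM instances, which reduces the memory footprint.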

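location of the method area

The JVM Specification Java SE 7 Edition clearly states that although the method area is logically part of the heap, simple implementations may choose neither to garbage collect nor to compact it. In contradiction to this, jconsole shows the method area (and the code cache) of the Oracle JVM as non-heap. The OpenJDK code shows that the CodeCache is a separate field of the VM from the ObjectHeap.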

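class loader reference

Classes are usually loaded on demand, that is, when the class is first used. Thanks to class loaders, the Java runtime system does not need to know about files and file systems.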

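runtime constant pool

The JVM maintains a constant pool for each type. It is a runtime data structure similar to a symbol table, although it contains more data. Bytecode requires data, and this data is often too large to store directly in the bytecode. Instead, it is stored in the constant pool, and the bytecode contains a reference to the constant pool. The runtime constant pool is used mainly for dynamic linking.

Several types of data are stored in the constant pool:

  • numeric literals

  • string literals

  • class references

  • field references

  • method references

If you compile the following simple class: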

    package org.jvminternals;

    public class SimpleClass {
        public void sayHello() {
            System.out.println("Hello");
        }
    }

The constant pool of the generated class file looks like this:

Constant pool:
   #1 = Methodref          #6.#17         //  java/lang/Object."<init>":()V
   #2 = Fieldref           #18.#19        //  java/lang/System.out:Ljava/io/PrintStream;
   #3 = String             #20            //  "Hello"
   #4 = Methodref          #21.#22        //  java/io/PrintStream.println:(Ljava/lang/String;)V
   #5 = Class              #23            //  org/jvminternals/SimpleClass
   #6 = Class              #24            //  java/lang/Object
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Lorg/jvminternals/SimpleClass;
  #14 = Utf8               sayHello
  #15 = Utf8               SourceFile
  #16 = Utf8               SimpleClass.java
  #17 = NameAndType        #7:#8          //  "<init>":()V
  #18 = Class              #25            //  java/lang/System
  #19 = NameAndType        #26:#27        //  out:Ljava/io/PrintStream;
  #20 = Utf8               Hello
  #21 = Class              #28            //  java/io/PrintStream
  #22 = NameAndType        #29:#30        //  println:(Ljava/lang/String;)V
  #23 = Utf8               org/jvminternals/SimpleClass
  #24 = Utf8               java/lang/Object
  #25 = Utf8               java/lang/System
  #26 = Utf8               out
  #27 = Utf8               Ljava/io/PrintStream;
  #28 = Utf8               java/io/PrintStream
  #29 = Utf8               println
  #30 = Utf8               (Ljava/lang/String;)V

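The constant pool contains the following types:

  • Integer: a 4-byte int constant.

  • Long: an 8-byte long constant.

  • Float: a 4-byte float constant.

  • Double: an 8-byte double constant.

  • String: a String constant that points at another Utf8 entry in the constant pool, which contains the actual bytes.

  • Utf8: a stream of bytes representing a Utf8-encoded sequence of characters.

  • Class: a Class constant that points at another Utf8 entry in the constant pool, which contains the fully qualified class name in the internal JVM format (this is used for dynamic linking).

  • NameAndType: a colon-separated pair of values, each pointing at another entry in the constant pool. The first value (before the colon) points at a Utf8 string literal that is the method or field name. The second value points at a Utf8 string literal that represents the type. For a field, this is the fully qualified class name; for a method, it is a list with one fully qualified class name per parameter.

  • Fieldref, Methodref, InterfaceMethodref: a dot-separated pair of values, each pointing at another entry in the constant pool. The first value (before the dot) points at a Class entry. The second value points at a NameAndType entry.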

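exception table

The exception table stores the following information for each exception handler:

  • start point

  • end point

  • PC offset of the handler code

  • constant pool index of the exception class being caught

If a method defines a try-catch or try-finally exception handler, an exception table is created. It contains information for each exception handler or finally block, including the type of exception being handled and the location of the handler code.

When an exception is thrown, the JVM looks for a matching handler in the current method. If none is found, the method ends abruptly, popping the current stack frame, and the exception is re-thrown in the calling method (the new current frame). If no exception handler is found before all frames have been popped, the thread is terminated. This can also cause the JVM itself to terminate if the exception is thrown in the last non-daemon thread, for example when that thread is the main thread.

A finally exception handler matches all types of exception and therefore always executes whenever an exception is thrown. When no exception is thrown, the finally block is still executed at the end of the method: execution jumps to the finally handler code immediately before the return statement is executed.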
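The behaviour of finally on both paths, normal return and thrown exception, can be confirmed with a short program. The sketch below logs the order of execution into a list supplied by the caller; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class FinallyOrder {
    /** Returns from inside try; the finally block still runs before the caller resumes. */
    static int compute(List<String> log) {
        try {
            log.add("try");
            return 1;
        } finally {
            log.add("finally");
        }
    }

    /** Throws from inside try; the finally block runs while the exception propagates. */
    static void fail(List<String> log) {
        try {
            log.add("try");
            throw new IllegalStateException("boom");
        } finally {
            log.add("finally");
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(compute(log) + " " + log);

        List<String> failLog = new ArrayList<>();
        try {
            fail(failLog);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " " + failLog);
        }
    }
}
```

In both cases the log contains "try" followed by "finally", matching the description of the finally handler above.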

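character comparison

Character comparison is the operation of comparing individual characters or strings in dictionary order. The ASCII code value is generally used as the basis for the comparison.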

symbol table

While compiling a program, the compiler continually collects, records and uses the types and characteristics of the symbols in the source program. This information is generally stored in tables, such as the constant table, variable name table, array name table, procedure name table and label table, which are collectively referred to as the symbol table. How well the symbol table is organized, constructed and managed directly affects the efficiency of the compiler.

In the JVM, interned strings are stored in the string table. The string table is a hashtable mapping object pointers to symbols (that is, Hashtable<oop, Symbol>), and it is held in the permanent generation.

String literals are automatically interned by the compiler and added to the string table when the class is loaded. In addition, instances of the String class can be explicitly interned by calling String.intern(). When String.intern() is called, a reference to the string is returned if the string table already contains it. If the string is not yet in the string table, it is added and its reference is returned.
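The effect of interning is visible through reference comparison. In the sketch below, a string built at run time is a different object from the literal with the same contents, but String.intern() returns the canonical instance that the literal already refers to. The class and method names are chosen for this example only.

```java
final class InternDemo {
    /** Returns true if a run-time copy differs from the literal but interns to it. */
    static boolean internReturnsCanonicalInstance() {
        String literal = "Hello";                           // interned when the class is loaded
        String copy = new String(literal.toCharArray());    // a distinct object with equal contents
        boolean distinctObjects = copy != literal;
        boolean sameAfterIntern = copy.intern() == literal;
        return distinctObjects && sameAfterIntern;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsCanonicalInstance());
    }
}
```

Reference comparison with == is used here on purpose; ordinary code should compare string contents with equals().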
