How method calls work under the hood: virtual and non-virtual methods, static and dynamic dispatch

Why it matters to understand how methods are executed

After compilation and the various stages of class loading, the code we write ends up in the JVM runtime data area. As programmers, though, what we care about most is how the code executes. Executing code essentially means executing methods, and from the JVM's point of view that ultimately means executing bytecode. The main method is the starting point: the JVM creates the main thread to run main, which triggers the execution of a series of JVM instructions and really sets the JVM running. From there on, our code is a chain of methods invoking other methods, so it is well worth understanding how method invocation works in the JVM.

Five bytecode instructions for calling methods

For method calls, Java bytecode provides five instructions, each for a different kind of method.

1. invokestatic calls static methods.

2. invokespecial calls private instance methods, constructors, and superclass methods (via the super keyword).

3. invokevirtual calls non-private instance methods, such as public and protected ones. Most method calls fall into this category.

4. invokeinterface is similar to invokevirtual, but it is used when the method is called through an interface type.

5. invokedynamic calls methods whose target is resolved dynamically at run time.
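To make the list concrete, here is a minimal sketch that annotates ordinary Java calls with the instruction javac typically emits for them. The class and method names (InvokeKinds, Greeter, Base, Derived, util) are made up for illustration, and the exact bytecode can vary with the compiler version, so it is worth checking with javap -c.

```java
import java.util.function.Supplier;

// Hypothetical demo class; all names are made up for illustration.
class InvokeKinds {
    interface Greeter { String greet(); }

    static class Base implements Greeter {
        public String greet() { return "base"; }
    }

    static class Derived extends Base {
        Derived() { super(); }                    // invokespecial Base.<init>

        @Override
        public String greet() {
            return super.greet() + "+derived";    // invokespecial Base.greet
        }

        private String secret() { return "secret"; }

        // Private call: invokespecial with older javac targets;
        // newer javac versions may emit invokevirtual here instead.
        String callSecret() { return secret(); }
    }

    static String util() { return "static"; }

    static String run() {
        String a = util();                        // invokestatic
        Derived d = new Derived();                // new, then invokespecial <init>
        String b = d.greet();                     // invokevirtual
        Greeter g = d;
        String c = g.greet();                     // invokeinterface
        Supplier<String> s = () -> "lambda";      // invokedynamic creates the Supplier
        String e = s.get();                       // invokeinterface Supplier.get
        return String.join(",", a, b, c, e, d.callSecret());
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Compiling this file and running javap -c on the generated classes should show the instructions named in the comments.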

Statically typed language

Static and dynamic dispatch

Dynamic dispatch and static dispatch are the mechanisms behind Java polymorphism. So what are they? Let's analyze them from the perspective of method invocation.

Method invocation is not the same as method execution. The only task of the invocation phase is to determine which version of the method is being called; what happens inside the method is not involved yet.

Method calls are the most common and most frequent operation in a running program. However, compiling a Class file does not include the linking step of a traditional compiler. Every method call is stored in the Class file only as a symbolic reference, not as the entry address of the method in the actual runtime memory layout (that is, not as the direct reference mentioned earlier).

image.png

This feature gives Java powerful dynamic extension capabilities, but it also makes the method call process relatively complicated: the direct reference of the target method may have to be determined during class loading, or even at run time.

Resolution

Remember the 7 stages of class loading we talked about before?

image

We know that the job of the resolution stage is to turn symbolic references into direct references.

Let's look at the resolution stage in more detail. As far as methods are concerned, not every symbolic reference is converted into a direct reference at this stage; only some of them are.

Anyone who has worked with polymorphism in object-oriented programming knows that it gives programs great extensibility, but it also makes it hard for the code being called to know which version will actually run. Let's look at a simple program.

The inheritance relationship is as follows:

image.png

Let's simulate the compilation process (imagine that we are the compiler). Our sayHello method is the code being called.

image.png

So we are the compiler. When we see this piece of code at compile time, we can't help wondering: who is going to be passed to me when this runs? Depending on how the program executes, it may be a Woman, or it may be a Man! Precisely because of this, the final version of the method to execute can only be settled at run time.

To summarize: a call can be resolved at compile time only if the method has a determinable version before the program actually runs, and that version cannot change at run time. In other words, the call target must be fixed once the code is written and compiled. Obviously, the actual type passed in the example above cannot be determined at compile time.

In the Java language, the methods that are "knowable at compile time and immutable at run time" are mainly static methods and private methods. The former are tied directly to their type, and the latter cannot be accessed from outside. Because of these characteristics, neither can be overridden through inheritance or any other means, so they are suitable for resolution during class loading.

Static methods, private methods, instance constructors, superclass methods (called via super), and final methods are called non-virtual methods, that is, methods whose version cannot change. (A final method is invoked with the invokevirtual instruction, but its unique version can still be determined in the resolution stage.) When any of these five kinds is called, its symbolic reference is resolved to a direct reference during the resolution stage of class loading. All other methods are called virtual methods: their actual version has to be determined at run time.

So far we only know that the resolution stage can determine the version of non-virtual methods. How do these methods show up in bytecode?

Dispatch

What is dispatch? My understanding is that, seen from the Class file, dispatch is the process of "passing" the method data in the Class file to the runtime data area. Static dispatch, as the name implies, puts the method structures that are "knowable at compile time and immutable at run time" into the method area. Dynamic dispatch, by contrast, handles the method structures that are "uncertain at compile time and known only at run time". So dispatch is a verb: it describes the process of getting a method from the Class file into the runtime data area. Virtual method and non-virtual method are nouns: they describe whether a method can be settled in the runtime data area during the resolution stage.

Static dispatch

Overloading with reference-type parameters

Let's analyze the earlier example again.

Picture 1.png

image.png

Eh? Is the result a bit surprising at first sight? Let the bytecode give us the answer.

image.png

We can clearly see two invokespecial instructions, which call non-virtual methods. This shows that the version of the method being called can be determined at compile time.

At compile time, Human man = new Man() is clearly polymorphic, but from this line the compiler can only confirm the static (apparent) type, Human. Let's change the code a bit to make this easier to understand.

image.png

As you can see, once we remove the sayHello overload that takes the static type as its parameter, the compiler complains right away: "Ah, I don't know anything about these two subclasses at compile time!" After this, I hope you have a deeper understanding of the invokespecial instruction.

Human man = new Man();

We call "Human" the static type of the variable and "Man" the actual type. Both the static type and the actual type can change in a program. The difference is that a static type changes only at the point of use (for example, through a cast): the static type of the variable itself does not change, and the final static type is known at compile time. The result of a change in actual type, on the other hand, can only be determined at run time; the compiler does not know the actual type of an object when it compiles the program.

image.png

image.png

When choosing among overloads, the compiler uses the static type of the argument, not its actual type. Since the static type is known at compile time, javac decides which overloaded version to use based on the static type of the argument.

Any dispatch that relies on static types to locate the version of a method to execute is called static dispatch. The typical application of static dispatch is method overloading.

Static dispatch happens at compile time, so the decision is made by the compiler, not by the virtual machine.
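The original listing is not reproduced here, so the following is only a minimal sketch of static dispatch, assuming the Human, Man, and Woman hierarchy and the sayHello overloads described in the text. The demo() helper and the returned strings are made up for illustration.

```java
// Sketch of static dispatch: the overload is chosen from the static type.
class StaticDispatch {
    static abstract class Human {}
    static class Man extends Human {}
    static class Woman extends Human {}

    String sayHello(Human guy) { return "hello, guy!"; }
    String sayHello(Man guy)   { return "hello, gentleman!"; }
    String sayHello(Woman guy) { return "hello, lady!"; }

    static String demo() {
        Human man = new Man();       // static type Human, actual type Man
        Human woman = new Woman();   // static type Human, actual type Woman
        StaticDispatch sd = new StaticDispatch();
        // The first two calls bind to sayHello(Human) at compile time.
        // The cast changes the static type at the point of use only.
        return sd.sayHello(man) + " / " + sd.sayHello(woman)
                + " / " + sd.sayHello((Man) man);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Both uncast calls pick the Human overload even though the objects are a Man and a Woman, because javac only looks at the declared type of the argument.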

Overloading with literal arguments

A literal has no explicit static type; its static type can only be inferred from the rules of the language.

image.png

Commenting out the overloads one at a time gives a different result each time.

See, with primitive types as parameters, isn't the behavior closer to what we would intuitively expect? The differences come from overloading, which is polymorphism resolved at compile time.
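As a rough illustration of how a literal is matched against overloads, here is a small sketch. The set of overloads (int, long, Character, Object) is an assumption chosen for brevity and is not necessarily the one used in the original listing.

```java
// Sketch: which overload javac picks for different argument types.
class LiteralOverload {
    static String sayHello(int arg)       { return "hello int"; }
    static String sayHello(long arg)      { return "hello long"; }
    static String sayHello(Character arg) { return "hello Character"; }
    static String sayHello(Object arg)    { return "hello Object"; }

    static String demo() {
        return sayHello('a')                          // char widens to int
                + ", " + sayHello(1L)                 // exact match on long
                + ", " + sayHello(Character.valueOf('a')) // exact match on Character
                + ", " + sayHello("a");               // String is an Object
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the int overload is commented out, sayHello('a') falls through to the long overload, since widening a primitive is tried before boxing.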

If the compiler cannot decide which type to convert to, it reports that the call is ambiguous and refuses to compile.

image.png

Although there are overloads for each of these types, the compiler still cannot determine at compile time which version applies (the value depends on the random number produced by r.nextInt() % 2), so it reports an error again: "Eh, I can't figure it out again. Who will save me?"

Dynamic dispatch

Dynamic dispatch between parent and child classes

Dynamic dispatch is everywhere! Most of the method calls we normally make fall into this category.

image.png

image.png

image.png

Once static dispatch is understood, dynamic dispatch is easy to follow. What matters now is no longer the variable man in Human man = new Man(); but the call man.sayHello(). The bytecode instruction for this call is invokevirtual. The method being called is now a virtual method, and it is the actual type, Man, that decides what man.sayHello() does. So when the actual type changes (man = new Woman()), the result changes too. Determining method behavior at run time in this way is dynamic dispatch.
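A minimal sketch of this behavior, assuming the same Human, Man, and Woman classes with an overridden sayHello(). The demo() helper and the returned strings are made up for illustration.

```java
// Sketch of dynamic dispatch: the override is chosen from the actual type.
class DynamicDispatch {
    static abstract class Human {
        abstract String sayHello();
    }
    static class Man extends Human {
        @Override String sayHello() { return "man say hello"; }
    }
    static class Woman extends Human {
        @Override String sayHello() { return "woman say hello"; }
    }

    static String demo() {
        Human man = new Man();
        Human woman = new Woman();
        String first = man.sayHello();     // invokevirtual, runs Man.sayHello
        String second = woman.sayHello();  // invokevirtual, runs Woman.sayHello
        man = new Woman();                 // same variable, new actual type
        String third = man.sayHello();     // now runs Woman.sayHello
        return first + "; " + second + "; " + third;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

All three call sites compile to the same invokevirtual instruction on Human.sayHello; only the receiver object differs at run time.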

Obviously, the static type can no longer be the deciding factor here: the two variables man and woman have the same static type, Human, yet behave differently when sayHello() is called, and the variable man executes different methods in its two calls. The reason is plain: the actual types of the two variables differ. So how does the Java virtual machine pick the method version based on the actual type?

Let's start with the polymorphic lookup performed by the invokevirtual instruction. At run time, resolving an invokevirtual instruction roughly takes the following steps:

1. Find the actual type of the object referenced by the first element on top of the operand stack, and call it C.

2. If type C contains a method whose descriptor and simple name match those in the constant pool, check its access permissions. If the check passes, return the direct reference to this method and end the search; if it fails, throw java.lang.IllegalAccessError.

3. Otherwise, repeat the search and check of step 2 on each superclass of C in turn, from bottom to top along the inheritance hierarchy.

4. If no suitable method is found, throw java.lang.AbstractMethodError.

Because the first step of invokevirtual is to determine the actual type of the receiver at run time, the invokevirtual instructions in the two calls do not stop at resolving the symbolic reference in the constant pool to a direct reference; they choose the method version according to the actual type of the receiver. This is the essence of method overriding in Java. Dispatch that determines the method version at run time based on the actual type is called dynamic dispatch.
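The lookup steps above can be imitated in plain Java with reflection. This is only a conceptual sketch, not how a real JVM implements invokevirtual: it skips the access check of step 2, and the class and method names (VirtualLookupSketch, select, declaringClassOf) are hypothetical.

```java
import java.lang.reflect.Method;

// Conceptual imitation of the invokevirtual lookup using reflection.
class VirtualLookupSketch {
    static class Human {
        public String sayHello() { return "human say hello"; }
        public String name() { return "human"; }
    }
    static class Man extends Human {
        @Override public String sayHello() { return "man say hello"; }
    }

    static Method select(Object receiver, String name, Class<?>... paramTypes) {
        // Step 1: start from the actual type C of the receiver.
        for (Class<?> c = receiver.getClass(); c != null; c = c.getSuperclass()) {
            try {
                // Step 2: is a matching method declared in this class?
                return c.getDeclaredMethod(name, paramTypes);
            } catch (NoSuchMethodException e) {
                // Step 3: not declared here, move up to the superclass.
            }
        }
        // Step 4: nothing suitable was found anywhere in the hierarchy.
        throw new AbstractMethodError(name);
    }

    static String declaringClassOf(Object receiver, String name) {
        return select(receiver, name).getDeclaringClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(declaringClassOf(new Man(), "sayHello")); // found in Man
        System.out.println(declaringClassOf(new Man(), "name"));     // found in Human
    }
}
```

The overridden sayHello is found immediately in Man, while name is only found after walking up to Human, which mirrors the bottom-to-top search of step 3.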

Dynamic dispatch between an interface and its implementations

image.png

image.png

image.png

image.png

Here the instruction for calling virtual methods through an interface, invokeinterface, is used. This is also dynamic dispatch.
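A short sketch of the interface case, with a hypothetical Greeter interface and two made-up implementations. The call inside greet() is made through the interface type, so javac emits invokeinterface for it.

```java
// Sketch: dynamic dispatch through an interface type.
class InterfaceDispatch {
    interface Greeter {
        String sayHello();
    }
    static class Man implements Greeter {
        @Override public String sayHello() { return "man say hello"; }
    }
    static class Robot implements Greeter {
        @Override public String sayHello() { return "robot say hello"; }
    }

    static String greet(Greeter g) {
        return g.sayHello();   // invokeinterface; target chosen by actual type
    }

    public static void main(String[] args) {
        System.out.println(greet(new Man()));
        System.out.println(greet(new Robot()));
    }
}
```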

How the virtual machine confirms the method version in dynamic dispatch

 

Picture 3.png

The dispatch process described above is basically enough as a conceptual model of the virtual machine. It answers the question of "what the virtual machine does" during dispatch.

As for "how exactly it does it", implementations may differ from one virtual machine to another.

Dynamic dispatch is a very frequent operation, and selecting the method version requires a run-time search for the right target method in the method metadata of the class. For performance reasons, most real virtual machine implementations do not actually perform such frequent searches. The most common optimization is to build a virtual method table (vtable) for the class in the method area and to use vtable indexes instead of metadata lookups.

The virtual method table stores the actual entry address of each method. If a method is not overridden in a subclass, its entry in the subclass's table is the same as the entry in the parent's table: both point to the parent's implementation. If the subclass overrides the method, the entry in the subclass's table is replaced with the entry address of the subclass's version.

To keep the implementation simple, methods with the same signature get the same index in the virtual method tables of the parent and the child. That way, when the type changes, only the table being consulted needs to change, and the required entry address can be found in the other table at the same index.
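The idea of shared indexes can be modeled in a few lines of Java. This is a toy model only: real virtual method tables live inside the JVM and hold entry addresses, whereas here each slot holds a Supplier, and all names (VTableSketch, HUMAN_TABLE, MAN_TABLE, the slot constants) are made up.

```java
import java.util.List;
import java.util.function.Supplier;

// Toy model of virtual method tables with shared slot indexes.
class VTableSketch {
    // The same method gets the same slot index in parent and child tables.
    static final int SAY_HELLO = 0;
    static final int SAY_BYE = 1;

    static final List<Supplier<String>> HUMAN_TABLE = List.of(
            () -> "Human.sayHello",
            () -> "Human.sayBye");

    // Man overrides sayHello only, so slot 1 reuses the parent's entry.
    static final List<Supplier<String>> MAN_TABLE = List.of(
            () -> "Man.sayHello",
            HUMAN_TABLE.get(SAY_BYE));

    // A "call" is just an indexed lookup in the receiver's table.
    static String call(List<Supplier<String>> table, int slot) {
        return table.get(slot).get();
    }

    public static void main(String[] args) {
        System.out.println(call(MAN_TABLE, SAY_HELLO));
        System.out.println(call(MAN_TABLE, SAY_BYE));
    }
}
```

Switching the receiver's type only swaps which table is consulted; the slot index used by the call site stays the same.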

The method table is generally initialized during the linking phase of class loading. After preparing the initial values of the class variables, the virtual machine also initializes the class's method table.

Looking at it this way, does it remind you of the parent delegation model? Parent delegation tries to load a class starting from the parent class loader and working down to the child class loader, whereas dynamic dispatch locates the method version starting from the child class and working up to the parent class.

Dynamically typed language

The five bytecode instructions for method calls, once more:

1. invokestatic calls static methods.

2. invokespecial calls private instance methods, constructors, and superclass methods (via the super keyword).

3. invokevirtual calls non-private instance methods, such as public and protected ones. Most method calls fall into this category.

4. invokeinterface is similar to invokevirtual, but it is used when the method is called through an interface type.

5. invokedynamic calls methods whose target is resolved dynamically at run time.

The first four instructions serve statically typed code, while invokedynamic exists specifically to support dynamically typed languages.

image.png

What is a dynamically typed language?

In the more than two decades since Sun released its first Java virtual machine, only one instruction has been added to the bytecode instruction set: invokedynamic. The goal of this instruction is to support dynamically typed languages.

Before looking at the dynamic language support of the Java virtual machine, we first need to understand what a dynamically typed language is, and what it has to do with the Java language and the Java virtual machine. Knowing the technical background of this support is essential to understanding the feature.

What is a dynamically typed language? Its key feature is that most of its type checking is done at run time rather than at compile time.

Confused yet? Doesn't dynamic dispatch also deal with types at run time?

The two need to be clearly distinguished. Dynamic dispatch, because of polymorphism, determines which implementation of a method to use while the program runs.

As for a dynamically typed language... hey, never mind the version, I don't even know what type you are.

Again, the defining trait of a dynamically typed language is that most of its type checking is done at run time rather than at compile time.

It is easy to see that, going from non-virtual methods to virtual methods to dynamically typed languages, the types known when the code is written become more and more vague. This also reflects a growing degree of extensibility and dynamism.

What are link time and run time?

Let's use the following small example to see how Java handles problems at compile time and at run time.

image.png

Obviously, the size of an array cannot be negative. Yet the compiler reports no error here. Now run the program.

image.png

The program reported an error.

This makes it easy to explain what a run-time exception is. As the name suggests, a run-time exception causes no problem as long as execution never reaches the offending line. The opposite concept is a link-time exception. The very common NoClassDefFoundError, for example, is a link-time exception: even if the code that causes it sits on a branch that can never be executed, the exception is still thrown when the class is loaded.
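The array example described above can be sketched as follows. The exact array expression in the original listing is not visible, so new int[1][0][-1] is an assumption, and the class and method names are hypothetical.

```java
// Sketch: a negative array size compiles fine but fails at run time.
class NegativeArrayDemo {
    static String allocate() {
        try {
            // javac accepts this line; the size is only checked when it runs.
            int[][][] array = new int[1][0][-1];
            return "allocated " + array.length;
        } catch (NegativeArraySizeException e) {
            return "NegativeArraySizeException at run time";
        }
    }

    public static void main(String[] args) {
        System.out.println(allocate());
    }
}
```

If allocate() is never called, the program runs without any problem, which is exactly what makes this a run-time exception.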

As you can see, there is no inherent rule about whether a given behavior of a language is checked at run time or at compile time; it all depends on what the language specification stipulates.

Type check

Having covered "link time" and "run time", let's explain what "type checking" is. Consider this line of code:

obj.println("hello world");

We may know what it is meant to do, but to a computer it is meaningless on its own. It only makes sense in a specific context (for example, which programming language it is written in and what the type of obj is).

In Java, if the static type of the variable obj is java.io.PrintStream, then the actual type of obj must be PrintStream or a subclass of it to be legal. Otherwise, even if obj belongs to a type that really does have a println(String) method with the same signature, the program cannot run, because type checking fails. In other words, Java is a statically typed language: unless the types are made clear, it will not do the work for you.

So how do dynamically typed languages behave?

The same code behaves differently in JavaScript. Whatever the type of obj is and whatever its inheritance relationships are, as long as that type defines a println(String) method, so that a method with the same signature can be found, the call succeeds.

The fundamental reason for this difference is that the Java compiler has already generated the complete symbolic reference for the println(String) method at compile time and stored it in the Class file as the operand of the method call instruction.

This symbolic reference contains the specific type in which the method is defined, the method name, the order and types of the parameters, and the return type. With it, the Java virtual machine can resolve the direct reference to the method.

This is a core difference between Java and dynamic languages such as JavaScript: the variable obj itself has no type (if you have learned C#, you may have used the dynamic keyword, which declares a variable whose type is checked at run time). The compiler can therefore determine at most the method name, parameters, and return value at compile time, but not the specific type that defines the method. "Variables have no type, but values do" is a core feature of dynamically typed languages.

Comparison of dynamic language and static language

So which is better, a dynamic or a static language? Each has its own strengths. A static language can determine types at compile time, so the compiler can provide comprehensive and rigorous type checking, and potential type-related problems are caught while coding. This helps stability and makes it easier for a project to grow large. A dynamic language provides great flexibility: functionality that takes a lot of bloated code in a static language may be simple and clear in a dynamic one, which means higher development efficiency.

The history of dynamic language support in Java

Java itself still lacks language-level support for dynamic typing. In the bytecode instruction set before JDK 7, the first operand of each of the four method invocation instructions (invokevirtual, invokespecial, invokestatic, and invokeinterface) is a symbolic reference to the method being invoked. As mentioned earlier, symbolic references to methods are generated at compile time, whereas a dynamic language determines the receiver of a method only at run time. To run such a language, the Java virtual machine had to take a roundabout route, for example leaving a placeholder type at compile time and generating bytecode dynamically at run time to adapt the concrete type to the placeholder. This inevitably makes dynamically typed language implementations more complex and brings extra performance and memory overhead, most visibly the large number of dynamically generated classes that method calls produce. And because the static type of the receiver cannot be determined, an important optimization such as method inlining cannot be applied well, which degrades performance.

We have covered a lot of theory, and you may have no experience with dynamically typed development, so let's look at an example to deepen your understanding.

image.png

In the case above, the elements of the array can be of any type. Even if every one of those types has a sayHello() method, there is no way to tell at compile (and optimization) time which class the sayHello() code actually lives in. The compiler can only keep compiling every sayHello() method it encounters and keep them all in memory for later use. So this problem is solved at the virtual machine level, which is the technical background of the invokedynamic instruction and the java.lang.invoke package.

The birth of MethodHandle

What is a method handle? Before JDK 7, the target method of a call could only be determined through symbolic references. JDK 7 introduced method handles, a mechanism for determining the target method dynamically.

In a sense, the invokedynamic instruction and the MethodHandle mechanism serve the same purpose. Both address the problem that the dispatch rules of the original four invoke instructions are hard-wired into the virtual machine, by handing the decision of how to find the target method over from the virtual machine to user code.

What is a MethodHandle? Simply put, it is a handle to a method, through which the corresponding method can be called.

 

image.png

Under the hood, the invokedynamic instruction is built on MethodHandle. A method handle is an executable reference. It can point to static methods and instance methods, as well as to synthetic getters and setters for fields. The following example shows some of the methods MethodHandle provides.

Picture 4.png

Call flow

1. Create a MethodType that describes the signature of the target method (its return type and parameter types).

2. Use a Lookup to find the MethodHandle that matches this MethodType.

3. Pass in the arguments and call the method through the MethodHandle.
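The three steps can be sketched with the java.lang.invoke API as follows. The Man and Bike classes and the callSayHello helper are hypothetical; they only illustrate that two unrelated types with a method of the same name and signature can be called the same way.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch of the MethodType -> Lookup -> invoke flow.
class MethodHandleFlow {
    // Two unrelated classes that happen to share a method signature.
    static class Man {
        public String sayHello(String to) { return "Man greets " + to; }
    }
    static class Bike {
        public String sayHello(String to) { return "Bike rings at " + to; }
    }

    static String callSayHello(Object receiver, String to) {
        try {
            // 1. Describe the signature: returns String, takes one String.
            MethodType type = MethodType.methodType(String.class, String.class);
            // 2. Look up the handle on the receiver's actual class.
            MethodHandle handle = MethodHandles.lookup()
                    .findVirtual(receiver.getClass(), "sayHello", type);
            // 3. Invoke it; the receiver is passed as the first argument.
            return (String) handle.invoke(receiver, to);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callSayHello(new Man(), "you"));
        System.out.println(callSayHello(new Bike(), "you"));
    }
}
```

The target method is chosen by user code at run time from a name and a MethodType, rather than by a symbolic reference fixed at compile time.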

Picture 5.png


image.png

To summarize again: with the first four invoke instructions in Java, whether the dispatch is static or dynamic, the method name, the return type, and the parameter types are all fixed in the bytecode at compile time. The only difference is whether the method is selected by the static (apparent) type of the reference or by its actual type.

But look closer! A method handle does what used to be the virtual machine's job. The work the virtual machine would have done at class loading time is moved to run time and controlled by our own code.

Process detail analysis

 

MethodType

A MethodType object represents the type of a method. Every MethodHandle has a MethodType, which specifies the method's return type and parameter types. MethodType offers several overloaded factory methods.
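A few of the MethodType.methodType factory overloads in use. This is a small sketch; the class name MethodTypeDemo and the describe() helper are made up, and the descriptor strings are printed only to show what each MethodType stands for.

```java
import java.lang.invoke.MethodType;

// Sketch: building MethodType objects with the methodType factories.
class MethodTypeDemo {
    static String describe() {
        // Return type only: a method taking no parameters and returning int.
        MethodType noArgs = MethodType.methodType(int.class);
        // Return type plus one parameter type.
        MethodType oneArg = MethodType.methodType(void.class, String.class);
        // Return type plus several parameter types.
        MethodType twoArgs = MethodType.methodType(int.class, String.class, int.class);
        return noArgs.toMethodDescriptorString()
                + " " + oneArg.toMethodDescriptorString()
                + " " + twoArgs.toMethodDescriptorString()
                + " params=" + twoArgs.parameterCount();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

In every overload the first argument is the return type, and the remaining arguments are the parameter types in order.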

image.png

Lookup

MethodHandles.Lookup returns the corresponding MethodHandle through its findXxx methods, so it acts as a factory for MethodHandle. Put simply, whichever instruction the original bytecode would have used to call the method, we now perform the equivalent call at run time.

For example, findStatic returns a handle to a static method (similar in role to invokestatic), and findVirtual finds an ordinary instance method (similar in role to invokevirtual).
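A small sketch of findStatic and findVirtual, using Math.max and String.length from the standard library as targets. The class name LookupDemo and the demo() helper are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch: findStatic and findVirtual on standard library methods.
class LookupDemo {
    static String demo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Like invokestatic: static int Math.max(int, int).
            MethodHandle max = lookup.findStatic(Math.class, "max",
                    MethodType.methodType(int.class, int.class, int.class));
            // Like invokevirtual: int String.length(); receiver comes first.
            MethodHandle length = lookup.findVirtual(String.class, "length",
                    MethodType.methodType(int.class));
            int m = (int) max.invokeExact(3, 7);
            int len = (int) length.invokeExact("hello");
            return m + "," + len;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For findVirtual, the MethodType describes only the declared parameters; the receiver is added as an extra leading argument when the handle is invoked.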

Invoke

One thing to watch is the difference between invoke and invokeExact. invoke can convert the argument and return types at the call, while invokeExact requires an exact match. (invokeExact performs no conversion and throws an error on a mismatch; invoke converts automatically.)
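The difference can be seen in a short sketch built on Math.abs(int). The class name InvokeVsExact and the demo() helper are made up; the mismatch case relies on invokeExact throwing java.lang.invoke.WrongMethodTypeException when the call-site types differ from the handle's type.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Sketch: invoke converts types, invokeExact insists on an exact match.
class InvokeVsExact {
    static String demo() {
        try {
            MethodHandle abs = MethodHandles.lookup().findStatic(
                    Math.class, "abs", MethodType.methodType(int.class, int.class));

            // Exact match: call site is (int)int, same as the handle.
            int exact = (int) abs.invokeExact(-5);

            // invoke unboxes the Integer argument and boxes the int result.
            Object converted = abs.invoke(Integer.valueOf(-6));

            String mismatch;
            try {
                // Call site is (int)Object, which is not (int)int.
                Object unused = abs.invokeExact(-7);
                mismatch = "no error";
            } catch (WrongMethodTypeException e) {
                mismatch = "WrongMethodTypeException";
            }
            return exact + "," + converted + "," + mismatch;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With invokeExact, even the type the result is assigned to is part of the call-site signature, which is why the cast to int matters.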

 image.png

Reflection and handle

Having got this far, many of you will find this familiar, or simply think it looks a lot like reflection.

Indeed, from the Java point of view, MethodHandle resembles Reflection in usage and effect. But there are differences:

1. Both Reflection and MethodHandle essentially simulate method calls, but Reflection simulates them at the Java code level, whereas MethodHandle simulates them at the bytecode level. The three methods findStatic(), findVirtual(), and findSpecial() on MethodHandles.Lookup correspond to the access checks performed by the invokestatic, invokevirtual, and invokespecial bytecode instructions. The Reflection API does not need to care about these low-level details.

2. A java.lang.reflect.Method object in Reflection carries far more information than a java.lang.invoke.MethodHandle object. The former is a full image of the method on the Java side: it includes the method signature, the descriptor, and Java-side representations of the attributes in the method attribute table, as well as run-time information such as access permissions. The latter contains only what is needed to execute the method. In developer terms, Reflection is heavyweight and MethodHandle is lightweight.

3. Since MethodHandle simulates bytecode method invocation instructions, the optimizations the virtual machine applies to those instructions, such as method inlining, should in theory also be applicable to MethodHandle in a similar way (this is still being improved). Applying such optimizations to calls made through reflection is almost impossible.

The Reflection API was designed to serve the Java language only. MethodHandle was designed to serve every language on the Java virtual machine; that includes the Java language, but Java is not the protagonist here.
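For comparison, here is the same call, String.toUpperCase(), made once through Reflection and once through a MethodHandle. It is a minimal sketch with hypothetical class and helper names; it only contrasts the two APIs and says nothing about their relative performance.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Sketch: the same call through Reflection and through a MethodHandle.
class ReflectionVsHandle {
    static String viaReflection(String s) {
        try {
            // Looked up by name and parameter types only.
            Method m = String.class.getMethod("toUpperCase");
            return (String) m.invoke(s);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String viaHandle(String s) {
        try {
            // Looked up by name plus a full MethodType, findVirtual-style.
            MethodHandle mh = MethodHandles.lookup().findVirtual(
                    String.class, "toUpperCase", MethodType.methodType(String.class));
            return (String) mh.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaReflection("hello"));
        System.out.println(viaHandle("hello"));
    }
}
```

The MethodHandle version has to choose between findStatic, findVirtual, and findSpecial up front, which is the bytecode-level detail the Reflection API hides.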

Capturing and non-capturing of lambda expressions

When a lambda expression accesses a non-static variable or object defined outside the lambda expression (that is, when the lambda contains free variables), it is called a "capturing" lambda.

image.png

A non-capturing lambda expression does not access any non-static variable or object defined outside itself (that is, it contains no free variables).

image.png

Whether a lambda captures is closely related to performance. A non-capturing lambda is usually more efficient than a capturing one: it only needs to be evaluated once, and the same instance can be returned every time it is used. A captured variable is dynamic, so a capturing lambda has to be evaluated again each time. In the current implementation this is very similar to anonymous inner classes. In the worst case a lambda expression performs the same as an inner class; in good cases it is faster.
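The two kinds of lambda can be sketched as follows. The class and method names are hypothetical. Whether a non-capturing lambda really yields the same instance on every evaluation is an implementation detail that the language does not guarantee, so the identity checks in main are printed rather than relied on.

```java
import java.util.function.Supplier;

// Sketch: a non-capturing and a capturing lambda.
class LambdaCapture {
    // Non-capturing: uses nothing from the enclosing scope.
    static Supplier<String> nonCapturing() {
        return () -> "constant";
    }

    // Capturing: closes over the parameter 'name'.
    static Supplier<String> capturing(String name) {
        return () -> "hello " + name;
    }

    public static void main(String[] args) {
        System.out.println(nonCapturing().get());
        System.out.println(capturing("world").get());
        // Implementation-dependent: may print true for the first line
        // and false for the second on common JVMs.
        System.out.println(nonCapturing() == nonCapturing());
        System.out.println(capturing("a") == capturing("a"));
    }
}
```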

 

Summary

Lambda expressions are in fact implemented with method handles. Roughly: when compiling, javac uses invokedynamic to implement lambda expressions, and invokedynamic is in turn built on MethodHandle. So from the lambda code you write, a set of bytecode that calls through MethodHandle is generated.

If you have done C# development, you can think about why delegates exist from the perspective of this chapter, rather than only about how to use them.

A handle's type (MethodType), together with the method name, lets us pin down a family of functions. Calling through a method handle is basically the same as using the original call instructions, except that call errors, including some permission checks, are only discovered at run time.

In the example we achieved the behavior of a dynamic language: different invoke-style calls were performed based on the method name and the receiver object passed in, even though Bike and Man have nothing to do with each other.

When we use a lambda expression, we do not specify the name of the run method, and a capturing lambda does not pass the captured value explicitly. Clearly the underlying layer takes care of these for lambda expressions.

image.png

This also means there are more steps in the call chain, so are lambda expressions slower? For most "non-capturing" lambda expressions, the JIT compiler's escape analysis can optimize the difference away, and performance is no different from a traditional method call. For "capturing" expressions, however, adapters have to be generated through method handles again and again, so performance is naturally much lower (though the loss is acceptable given the convenience).

The invokedynamic instruction is in fact implemented through method handles, and the place where it touches us most is lambda syntax. Now that we understand the principle, we can set aside the arguments about lambda performance. At the same time, we should prefer writing "non-capturing" lambda expressions where we can.

Origin blog.csdn.net/weixin_47184173/article/details/109903542