Analysis of the realization principle of some syntactic sugar in Java

1. What is syntactic sugar

We all know that syntactic sugar refers to adding a certain grammar to a programming language. This grammar has no effect on the function of the language, but it is more convenient for programmers to use. For example, for each loop in Java, we can write this way when we write code to make the code more concise. But after compiling into a class file, for each becomes an ordinary loop statement (implemented by bytecodes such as goto, if, etc.).

What I want to share is to understand how some common syntactic sugar in Java is realized by decompiling and viewing bytecode.

2. How to analyze the realization principle of syntactic sugar

So how do we analyze the implementation principle behind syntactic sugar? As mentioned earlier, after the Java code file is compiled into a class file, the syntactic sugar becomes an ordinary basic grammar (this process is called sugar relief). Then we can understand the basic grammar after the sugar is solved by analyzing the class file and we can know how Java supports these syntactic sugar.

Analyzing class files.
Class files are binary files. To understand it directly, you need to be very familiar with its organizational structure. Therefore, generally use the decompilation tool to decompile the class file into a readable form and then conduct research.

The javap command is provided in the JDK , which can translate the class file into a relatively easy-to-read bytecode form. Analyzing class files in this way can get the most comprehensive information, but you also need to understand a lot of bytecode-related knowledge before you can read it.

In addition to the javap command, there are many third-party tools that can directly decompile class files into Java code. The results obtained in this way are very readable. IDEA integrates such a decompilation tool, and you can directly open the class file with IDEA to see the result of decompilation.

  • javap command
  • IDEA

Below we use the above two methods to analyze some of the syntactic sugar in Java.

3. Syntactic sugar in Java

3.1 for each loop

(1) for each to traverse the array
Java code

int[] arr = {
    
    1,2,3,4};
for(int i : arr) {
    
    
	System.out.println(i);
}

After decompilation

int[] var1 = new int[]{
    
    1, 2, 3, 4};
int var2 = 0;
int[] var3 = var1;
int var4 = var1.length;

for(int var5 = 0; var5 < var4; ++var5) {
    
    
   int var6 = var3[var5];
   var2 += var6;
}

We can see that after decompilation, the for each loop becomes an ordinary for loop. It can be understood that for each traversing the array is achieved through a more basic for loop.

3.2 Boxing and unboxing

Take the mutual conversion of Integer and int as an example

Java code:

int num1 = 1000;
Integer num2 = num1;
int num3 = num2;

After decompilation:

 short var1 = 1000;
 Integer var2 = Integer.valueOf(var1);
 int var3 = var2;

From the result of decompilation, you can see the boxing process, that is, the use of the Integer.valueOf method. But I can't see the realization principle of Integer to int conversion.

In order to verify our conclusion, we need to use the javap command to look at the lower-level bytecode.

javap -v BoxingAndUnboxing
Classfile /C:/Users/xiaozhigang/Desktop/Java语法糖/BoxingAndUnboxing.class
  Last modified 2019-11-14; size 400 bytes
  MD5 checksum 769b615cb1893668acb2c7660c6a233c
  Compiled from "BoxingAndUnboxing.java"
public class BoxingAndUnboxing
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #5.#14         // java/lang/Object."<init>":()V
   #2 = Methodref          #15.#16        // java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   #3 = Methodref          #15.#17        // java/lang/Integer.intValue:()I
   #4 = Class              #18            // BoxingAndUnboxing
   #5 = Class              #19            // java/lang/Object
   #6 = Utf8               <init>
   #7 = Utf8               ()V
   #8 = Utf8               Code
   #9 = Utf8               LineNumberTable
  #10 = Utf8               main
  #11 = Utf8               ([Ljava/lang/String;)V
  #12 = Utf8               SourceFile
  #13 = Utf8               BoxingAndUnboxing.java
  #14 = NameAndType        #6:#7          // "<init>":()V
  #15 = Class              #20            // java/lang/Integer
  #16 = NameAndType        #21:#22        // valueOf:(I)Ljava/lang/Integer;
  #17 = NameAndType        #23:#24        // intValue:()I
  #18 = Utf8               BoxingAndUnboxing
  #19 = Utf8               java/lang/Object
  #20 = Utf8               java/lang/Integer
  #21 = Utf8               valueOf
  #22 = Utf8               (I)Ljava/lang/Integer;
  #23 = Utf8               intValue
  #24 = Utf8               ()I
{
    
    
  public BoxingAndUnboxing();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 1: 0

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=4, args_size=1
         0: sipush        1000
         3: istore_1
         4: iload_1
         5: invokestatic  #2                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
         8: astore_2
         9: aload_2
        10: invokevirtual #3                  // Method java/lang/Integer.intValue:()I
        13: istore_3
        14: return
      LineNumberTable:
        line 3: 0
        line 4: 4
        line 6: 9
        line 7: 14
}
SourceFile: "BoxingAndUnboxing.java"

Let's analyze the execution process of the bytecode above. First recall the stack of Java's runtime data area. As shown in the figure below, a stack frame is allocated when each method is executed. A stack frame includes two parts: the operand stack and the local variable table. Most of the main method part of the above bytecode is the operation of these two structures.
Insert picture description here

Insert picture description here
From the bytecode, we can see that the boxing operation is implemented through the valueOf method of Integer.

3.3 switch statement supports String type

(1) The switch statement supports the String type

Java code:

String str = "hello";

switch(str) {
    
    
	case "world":
		System.out.println("world");
		break;
	case "hello":
		System.out.println("hello");
		break;
	default:
		break;
}

After decompilation:

String var1 = "hello";
byte var3 = -1;
switch(var1.hashCode()) {
    
    
	case 99162322:
	    if (var1.equals("hello")) {
    
    
	        var3 = 1;
	    }
	    break;
	case 113318802:
	    if (var1.equals("world")) {
    
    
	        var3 = 0;
	    }
}

switch(var3) {
    
    
	case 0:
	    System.out.println("world");
	    break;
	case 1:
	    System.out.println("hello");
}

As you can see, Java implements the switch's support for String through the hashCode() and equals() methods of the string. We know that switch supports the int type, and the return value of the hashCode method is an int type. After searching through the hashCode, use the equals method to judge to avoid hash conflicts.

(2) Simple analysis of the implementation principle of switch from bytecode

Example 1:

public int tableSwitchTest(int num) {
    
    
	switch(num) {
    
    
		case 100: return 0;
		case 101: return 1;
		case 104: return 4;
		default: return -1;
	}
}

Use the javap -c command to analyze its bytecode:

public int tableSwitchTest(int);
  Code:
     0: iload_1
     1: tableswitch   {
    
     // 100 to 104
                 100: 36
                 101: 38
                 102: 42
                 103: 42
                 104: 40
             default: 42
        }
    36: iconst_0
    37: ireturn
    38: iconst_1
    39: ireturn
    40: iconst_4
    41: ireturn
    42: iconst_m1
    43: ireturn

Obviously, the tableswitch in the bytecode implements the switch statement in the Java syntax. 100:36, which means that when the operand is 100, it jumps to line 36 to execute. It is iconst_0, ireturn here, that is, 0 is returned. And so on. But when we compare the Java code, we will find a problem: 102 and 103 are extra in the bytecode. This is because the compiler will analyze the value of the case. If the value of the case is relatively compact with a few or no faults in the middle, tableswitch will be used to implement the switch-case. If there are faults, some false cases will be generated to help fill in the continuity. , This can realize the search of O(1) time complexity: because the case has been filled in to be continuous, it can be found at one time through the cursor.

Example 2:

public int lookupSwitchTest(int num) {
    
    
	switch(num) {
    
    
		case 1: return 1;
		case 100: return 100;
		case 10: return 10;
		
		default: return -1;
	}
}

Use the javap -c command to analyze its bytecode:

public int lookupSwitchTest(int);
  Code:
     0: iload_1
     1: lookupswitch  {
    
     // 3
                   1: 36
                  10: 41
                 100: 38
             default: 44
        }
    36: iconst_1
    37: ireturn
    38: bipush        100
    40: ireturn
    41: bipush        10
    43: ireturn
    44: iconst_m1
    45: ireturn

The switch implementation in this example uses lookupswitch. This is because the case value distribution in this example is relatively sparse. It is obviously unreasonable to use the above method of filling cases. The keys of lookupswitch are sorted and can be realized by binary search when searching, and the time complexity is O(logN).

So the conclusion is: when the switch-case statement is relatively sparse, the compiler will use the lookupswitch instruction to implement it, otherwise, the compiler will use tableswitch to implement it.

3.4 Generic

Java code

HashMap<String, String> map = new HashMap<String, String>();   
map.put("name", "xiaoming");
String name = map.get("name");
System.out.println(name);

After decompilation

HashMap var1 = new HashMap();
var1.put("name", "xiaoming");
String var2 = (String)var1.get("name");
System.out.println(var2);

After decompilation, the generic type HashMap<String,String> disappeared, only the ordinary type HashMap. During the get operation, a forced type conversion was performed to convert the Object type to the String type we need.

3.5 Method variable length parameter

Java code

public static void main(String[] args) {
    
    
	print("hello","world");
}

public static void print(String...args) {
    
    
	for(int i = 0; i < args.length; i++) {
    
    
		System.out.println(args[i]);
	}
}

After decompilation:

public static void main(String[] var0) {
    
    
   print("hello", "world");
}

public static void print(String... var0) {
    
    
   for(int var1 = 0; var1 < var0.length; ++var1) {
    
    
       System.out.println(var0[var1]);
   }
}

Guess you like

Origin blog.csdn.net/vxzhg/article/details/103073799