In Java, a bug that has existed for more than ten years ...

Today I want to share a surprising bug in the JDK. What makes it remarkable is that the reproducer is trivially simple, and the correct answer can be worked out by eye, yet the bug has existed in the JDK for more than ten years. In our testing, the problem exists in JDK 8 through JDK 14.

You can try this code on your own development platform:

public class Hello {  
    public void test() {  
        int  i = 8;  
        while  ((i -= 3) > 0);  
        System.out.println("i = " + i);  
    }  

    public static void main(String[] args) {  
        Hello hello = new Hello();  
        for (int  i = 0; i < 50_000; i++) {  
            hello.test();  
        }  
    }  
}  

Then use the following command to execute: java Hello

At the beginning of the run, the program still prints the correct "i = -1". After enough iterations, however, the output changes and the program starts printing "i = /".
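To see roughly when the output changes, a small variant of the reproducer can return the string instead of printing it and report the first call whose result differs from "i = -1". This is only a sketch: the class LoopCheck and its methods are made up for illustration, and on a JDK that already contains the fix it should not find any mismatch.

```java
class LoopCheck {
    // Same loop as Hello.test(), but the concatenated string is returned.
    static String run() {
        int i = 8;
        while ((i -= 3) > 0);   // 8 -> 5 -> 2 -> -1, so the loop exits with i == -1
        return "i = " + i;
    }

    // Returns the index of the first call whose result is not "i = -1",
    // or -1 if every call returned the expected string.
    static int firstMismatch(int iterations) {
        for (int n = 0; n < iterations; n++) {
            if (!"i = -1".equals(run())) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int n = firstMismatch(50_000);
        System.out.println(n < 0 ? "no mismatch observed" : "first mismatch at call " + n);
    }
}
```

Whether and when a mismatch shows up depends on the JDK build and on when the JIT decides to compile the method, so the reported index is not stable between runs.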

This problem was eventually fixed by two colleagues on the Huawei JDK team, and the fix was contributed back to the community. Here I will outline how the analysis went.

First, the results are correct under purely interpreted execution (for example, with -Xint), which means this is most likely a JIT compiler problem. Turning off tiered compilation with -XX:-TieredCompilation, so that C1 is out of the picture, still reproduces the problem, whereas stopping JIT compilation at the C1 stage with -XX:TieredStopAtLevel=3 makes it disappear. From this we can conclude that the problem is in C2.

Next, a colleague immediately guessed that the "/" in the output is actually ('0' - 1), that is, the ASCII code of the character '0' minus 1.

So memorizing the ASCII table does pay off. The next step is to find the int-to-character conversion in C2, with the character '0' as the key clue. With enough familiarity with C2, you can quickly locate the method that performs this conversion (for the full code, please refer to the OpenJDK sources):

void PhaseStringOpts::int_getChars(GraphKit& kit, Node* arg, Node* char_array, Node* start, Node* end) {  
  // ......  
  // char sign = 0;  

  Node* i = arg;  
  Node* sign = __ intcon(0);  

  // if (i < 0) {  
  //     sign = '-';  
  //     i = -i;  
  // }  
  {  
    IfNode* iff = kit.create_and_map_if(kit.control(),  
                                        __ Bool(__ CmpI(arg, __ intcon(0)), BoolTest::lt),  
                                        PROB_FAIR, COUNT_UNKNOWN);  

    RegionNode *merge = new (C) RegionNode(3);  
    kit.gvn().set_type(merge, Type::CONTROL);  
    i = new (C) PhiNode(merge, TypeInt::INT);  
    kit.gvn().set_type(i, TypeInt::INT);  
    sign = new (C) PhiNode(merge, TypeInt::INT);  
    kit.gvn().set_type(sign, TypeInt::INT);  

    merge->init_req(1, __ IfTrue(iff));  
    i->init_req(1, __ SubI(__ intcon(0), arg));  
    sign->init_req(1, __ intcon('-'));  
    merge->init_req(2, __ IfFalse(iff));  
    i->init_req(2, arg);  
    sign->init_req(2, __ intcon(0));  

    kit.set_control(merge);  

    C->record_for_igvn(merge);  
    C->record_for_igvn(i);  
    C->record_for_igvn(sign);  
  }  

  // for (;;) {  
  //     q = i / 10;  
  //     r = i - ((q << 3) + (q << 1));  // r = i-(q*10) ...  
  //     buf [--charPos] = digits [r];  
  //     i = q;  
  //     if (i == 0) break;  
  // }  

  {  
   // The code corresponding to this loop is omitted
  }  

  // A lot of code is omitted here
}  

As you can see, an "i < 0" check is introduced at the intermediate representation (IR) stage, and its core is the CmpI node. It looks as if the logic goes wrong here: i is clearly less than 0, yet execution takes the branch for i >= 0. As a result, the negative value is added directly to the character '0', which produces the wrong character.
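The effect of skipping the sign handling can be imitated in plain Java. The sketch below follows the digit loop from the comments in the listing for a single digit, and assumes that the character is formed as '0' + r; lastDigitChar is a hypothetical helper written for this illustration, not a JDK method.

```java
class DigitSketch {
    // One step of the digit loop, without the "if (i < 0) i = -i" sign handling.
    static char lastDigitChar(int i) {
        int q = i / 10;                       // Java integer division truncates toward zero
        int r = i - ((q << 3) + (q << 1));    // r = i - q * 10
        return (char) ('0' + r);              // assumption: digit character is '0' + r
    }

    public static void main(String[] args) {
        System.out.println((int) '0');          // 48
        System.out.println((int) '/');          // 47, i.e. '0' - 1
        System.out.println(lastDigitChar(1));   // 1
        System.out.println(lastDigitChar(-1));  // /
    }
}
```

For i = -1 the quotient is 0 and the remainder stays -1, so the resulting character is one position before '0' in the ASCII table, which is exactly the "/" seen in the output.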

Why does this CmpI go wrong? With the c2visualizer tool you can see that, in the GVN phase, the CmpI from the while loop in the test case and the CmpI introduced here are merged. GVN stands for Global Value Numbering. The name sounds grand, but what it does is essentially deduplicate expressions.

For example, the while loop in the test case compares i with 0, and the sign check introduced here also compares i with 0. The inputs of the two CmpI nodes are identical (the variable i and the integer 0), so the two nodes are effectively the same. In this case, the compiler merges the two CmpI nodes into one during intermediate optimization.
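The deduplication idea can be shown with a toy value table. This is a simplified sketch and not how C2 implements GVN: the GvnSketch class, its Node type, and the string keys are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class GvnSketch {
    // Toy expression node: an operator plus the names of its two inputs.
    static final class Node {
        final String op, left, right;
        Node(String op, String left, String right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    // Value table: one canonical node per (op, left, right) key.
    private final Map<String, Node> table = new HashMap<>();

    // Returns the existing node for an identical expression, or registers a new one.
    Node make(String op, String left, String right) {
        String key = op + "(" + left + "," + right + ")";
        return table.computeIfAbsent(key, k -> new Node(op, left, right));
    }

    public static void main(String[] args) {
        GvnSketch gvn = new GvnSketch();
        Node loopCmp = gvn.make("CmpI", "i", "0");  // comparison from the loop condition
        Node signCmp = gvn.make("CmpI", "i", "0");  // comparison from the sign check
        Node other = gvn.make("CmpI", "j", "0");    // different input, different node
        System.out.println(loopCmp == signCmp);     // true
        System.out.println(loopCmp == other);       // false
    }
}
```

Once two users share one node, any later transformation that rewrites that node in place affects both of them, which is what matters for the rest of the analysis.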

Up to this point there is actually no problem. But next, the compiler applies a special transformation to empty loops. It can compute directly that i is -1 when the empty loop finishes, and since the loop body does nothing, it simply changes the two inputs of the CmpI to -1 so that the loop is never entered. A subsequent round of constant propagation can then eliminate the CmpI entirely.

This is where the CmpI becomes a problem. It is forced to evaluate to false so that the loop is not executed, and i is replaced directly by its value at the end of the loop. But the CmpI that was merged into it earlier is swallowed by this change as well.

As a result, i = -1 is taken straight into the i >= 0 branch. The fix is therefore simple: when transforming the CmpI, check whether it has any other outputs, and if it does, make a copy and transform the copy instead.
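The interaction between the shared node and the empty-loop transformation can be imitated with a toy model. The sketch below only follows the description given here and is not the actual HotSpot patch: the Cmp class, the user counter, and the killLoop method are all hypothetical.

```java
class SharedCmpSketch {
    // Toy stand-in for a CmpI node: compares two ints and tracks how many users it has.
    static final class Cmp {
        int left, right, users;
        Cmp(int left, int right, int users) {
            this.left = left;
            this.right = right;
            this.users = users;
        }
        int result() { return Integer.compare(left, right); }
    }

    // Imitates the empty-loop transformation: both inputs become the final value,
    // so "greater than" is false and the loop is never entered.
    // With cloneIfShared set, a node with other users is copied and the copy is rewritten.
    static Cmp killLoop(Cmp loopCmp, int finalValue, boolean cloneIfShared) {
        Cmp target = loopCmp;
        if (cloneIfShared && loopCmp.users > 1) {
            loopCmp.users--;
            target = new Cmp(loopCmp.left, loopCmp.right, 1);
        }
        target.left = finalValue;
        target.right = finalValue;
        return target;
    }

    // Returns what the sign check "i < 0" sees after the loop has been removed.
    static boolean signCheckSeesNegative(boolean cloneIfShared) {
        Cmp shared = new Cmp(-1, 0, 2);        // CmpI(i, 0) used by loop test and sign check
        killLoop(shared, -1, cloneIfShared);   // rewrite the loop's comparison
        return shared.result() < 0;            // the sign check still reads the shared node
    }

    public static void main(String[] args) {
        System.out.println(signCheckSeesNegative(false)); // false: -1 is treated as non-negative
        System.out.println(signCheckSeesNegative(true));  // true: the sign check is unaffected
    }
}
```

In the unfixed variant the comparison of -1 with -1 is neither "greater" nor "less", so both the loop test and the sign check come out false, and the sign check then treats i = -1 as non-negative.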

The issue and patch for this bug are here:

https://bugs.openjdk.java.net/projects/JDK/issues/JDK-8231988?filter=allissues

The JBS entry contains only the final patch and no detailed analysis, so I wrote up this summary and posted it here.

As you can see, even a very simple test case goes through many complex transformations and optimizations inside the compiler. An optimization in one phase can affect a later phase, which is why compiler bugs are often obscure. On the other hand, that is also what makes them interesting.


Origin www.cnblogs.com/CQqfjy/p/12724129.html