JVM study notes: StringTable

Table of Contents

Background

String basic characteristics

Immutability

Pass by value

String memory allocation

Basic operation of String

String splicing operation

Use of intern()

Garbage collection of StringTable

String deduplication operation in G1

Conclusion

Background

After studying the JVM for a while, it is time to revisit String

String basic characteristics

String is declared final, so it cannot be subclassed. It implements the Serializable and Comparable interfaces, so strings can be serialized and compared for ordering

JDK 8 and earlier store the characters in a final char[]; JDK 9 changed this to a final byte[]. The reason is that most String objects in a typical heap contain only Latin-1 characters, for which one byte per character is enough. String therefore uses byte[] plus an encoding flag, which switches to a two-byte encoding for strings that contain other characters such as Chinese. The same change was made to the related classes StringBuffer and StringBuilder.

Immutability

String s1 = "szc";
String s2 = "szc";


System.out.println(s1 == s2); // true: == compares references, and s1 and s2
// both refer to the same object in the string constant pool


System.out.println(s1.hashCode()); // 114396
s1 += "is"; // reassignment and replace() likewise yield a different object
System.out.println(s1.hashCode()); // 109937926

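The hash code of s1 differs before and after the splicing. A String's hash code is computed from its content, and that content can never change, so s1 must now refer to a different object; the original "szc" object is untouched.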
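The same point can be checked without hash codes by keeping a second reference to the original object. This is a minimal sketch; the class and method names are made up for illustration.

```java
class ImmutabilityDemo {
    // Appending to s1 builds a new String; the object that "before" refers to is untouched.
    static boolean originalUnchanged() {
        String s1 = "szc";
        String before = s1;   // second reference to the same object
        s1 += "is";           // s1 is re-pointed to a newly built String
        return before.equals("szc") && before != s1;
    }

    // replace() also returns a new object when a character is actually replaced.
    static boolean replaceCreatesNewObject() {
        String s = "szc";
        String r = s.replace('s', 'S');
        return r != s && s.equals("szc") && r.equals("Szc");
    }

    public static void main(String[] args) {
        System.out.println(originalUnchanged());       // true
        System.out.println(replaceCreatesNewObject()); // true
    }
}
```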

Pass by value

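The following code shows that a String argument is passed by value: the method receives a copy of the reference, so reassigning the parameter does not affect the caller's field, while changing an element of the shared char[] does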

public class StringTest1 {
    private String str = "szc";
    private char[] ch = {'1', '2', '3'};

    public void change(String str, char[] ch) {
        str = "sss"; // the parameter str now refers to a different object than this.str (their hash codes differ)
        ch[0] = 'a';
    }

    public static void main(String[] args) {
        StringTest1 test = new StringTest1();
        test.change(test.str, test.ch);

        System.out.println(test.str); // szc
        System.out.println(test.ch); // a23
    }
}

String literals are stored in the string constant pool, and the pool never holds two strings with the same content.

The string pool (StringTable) is a fixed-size hash table, that is, an array of buckets with a linked list per bucket. The default size is 60013 (JDK 7 and later); a larger table keeps the linked lists short, which speeds up lookups. The size can be set with -XX:StringTableSize, and the minimum value accepted in JDK 8 is 1009.
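To see which table size a running JVM actually uses, the flag can be read back at run time. The sketch below assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; on other JVMs, or if the flag is not available, it falls back to "unknown". The class name is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

class StringTableSizeDemo {
    // Returns the current value of the StringTableSize flag, or "unknown"
    // if this JVM does not expose it (assumption: HotSpot-specific API).
    static String stringTableSize() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("StringTableSize").getValue();
        } catch (RuntimeException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        // Try running with and without -XX:StringTableSize=<n> to compare.
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```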

String memory allocation

Since JDK 7, both the string constant pool and String objects live in the heap. A string can get into the pool in two ways:

1) String objects declared directly with double quotation marks are stored in the constant pool, such as String info = "szc";

2) Calling the intern() method on a String

Reasons for putting StringTable in the heap:

1) The string constant pool in jdk6 is in the permanent generation, and this area is relatively small

2) The frequency of permanent generation collection is very low, which is not conducive to releasing memory

Basic operation of String

Use the following example to prove that duplicate strings will not be added to the string constant pool

public class StringTest2 {
    public static void main(String[] args) {
        System.out.println("1");
        System.out.println("2");
        System.out.println("3");


        System.out.println("1");
        System.out.println("2");
        System.out.println("3");
    }
}

Before the first System.out.println("1"); executes, the constant pool holds 2513 strings

After it executes, two more have been added: the line separator and "1"

After the first System.out.println("2"); executes, one more is added: "2"

After the first System.out.println("3"); executes, one more is added: "3"

The second group of println calls prints 1, 2, 3 again, but no new strings are stored
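The counts above come from a debugger, but the sharing itself can be checked in code: equal literals always resolve to one pooled object, even across classes. A minimal sketch with made-up names:

```java
class LiteralPoolDemo {
    static String first()  { return "1"; }
    static String second() { return "1"; }

    // A second class with its own "1" literal.
    static class Other {
        static String one() { return "1"; }
    }

    // All three literals resolve to the same pooled object.
    static boolean sameObject() {
        return first() == second() && first() == Other.one();
    }

    public static void main(String[] args) {
        System.out.println(sameObject()); // true
    }
}
```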

The following example illustrates the relationship between stack objects, heap objects, and constant pool objects

public class Memory {
    public static void main(String[] args) {
        int i = 1;
        Object obj = new Object();
        Memory mem = new Memory();


        mem.foo(obj);
    }


    private void foo(Object param) {
        String str = param.toString();
        System.out.println(str);
    }
}

The structure diagram is as follows

It can be seen that toString() creates a new String object in the heap (the result is built at run time, so it is not placed in the string pool), and the local variable str in foo() then points to this object
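One way to check where the result of toString() lives is to compare two results by reference. The sketch below relies on Object.toString() building its result at run time from the class name and hash code, so each call yields a separate heap object rather than a shared pooled one. The class name is made up for illustration.

```java
class ToStringHeapDemo {
    // Two calls give equal content but two distinct String objects,
    // which would not happen if the result came from the string pool.
    static boolean freshObjectPerCall() {
        Object obj = new Object();
        String a = obj.toString();
        String b = obj.toString();
        return a.equals(b) && a != b;
    }

    public static void main(String[] args) {
        System.out.println(freshObjectPerCall()); // true
    }
}
```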

String splicing operation

Splicing constants with constants yields a result in the constant pool, because the compiler folds the expression at compile time

The constant pool never contains two constants with the same content

As soon as one operand is a variable, the result is a new object in the heap, and the splicing is carried out with a StringBuilder

If intern() is called on the splicing result and the content is not yet in the constant pool, the string is added to the pool and the pooled object's address is returned

 

Case 1:

public class StringTest3 {
    public static void main(String[] args) {
        String s1 = "a" + "b" + "c"; // folded into "abc" at compile time
        String s2 = "abc";


        System.out.println(s1 == s2); // true
    }
}

Case 2:

public class StringTest3 {
    public static void main(String[] args) {
        String s1 = "a" + "b" + "c"; // same declaration as in Case 1
        String s2 = "abc";


        System.out.println(s1 == s2);


        String s3 = "a";
        String s4 = "b";
        String s5 = "c";


        String s6 = s3 + "bc"; // one operand is a variable, so a new String holding the result is created in the heap
        String s7 = "a" + s4 + "c";
        String s8 = "ab" + s5;


        System.out.println(s1 == s6); // false
        System.out.println(s1 == s7); // false
        System.out.println(s1 == s8); // false


        System.out.println(s6 == s8); // false


        String s9 = s8.intern(); // intern() adds the string to the constant pool or reuses the existing entry, then returns its address
        System.out.println(s2 == s9); // true: "abc" is already in the pool, so s9 is the same address as s2
    }
}

Case 3:

public static void f() {
    String s1 = "a";
    String s2 = "b";
    String s3 = "ab";
    String s4 = s1 + s2;
    System.out.println(s3 == s4);
}

Corresponding bytecode

0 ldc #5 <a>
2 astore_0
3 ldc #6 <b>
5 astore_1
6 ldc #13 <ab>
8 astore_2
9 new #8 <java/lang/StringBuilder>
12 dup
13 invokespecial #9 <java/lang/StringBuilder.<init>>
16 aload_0
17 invokevirtual #10 <java/lang/StringBuilder.append>
20 aload_1
21 invokevirtual #10 <java/lang/StringBuilder.append>
24 invokevirtual #12 <java/lang/StringBuilder.toString>
27 astore_3
28 getstatic #3 <java/lang/System.out>
31 aload_2
32 aload_3
33 if_acmpne 40 (+7)
36 iconst_1
37 goto 41 (+4)
40 iconst_0
41 invokevirtual #4 <java/io/PrintStream.println>
44 return

When a variable appears in the splicing, a StringBuilder is created first (offset 9 of the bytecode), and the splicing itself is done by calling the StringBuilder's append() method (offsets 17 and 21), which appends a and then b. Finally StringBuilder's toString() is called (offset 24), which is roughly equivalent to new String("ab"), so the output is false. StringBuilder is used from JDK 5 on; before JDK 5, StringBuffer was used. (javac in JDK 9 and later emits an invokedynamic call instead, but the result is still a new object in the heap.)

 

Splicing with + repeatedly, for example in a loop, is much less efficient than calling append() on a single StringBuilder, because every splicing creates a new StringBuilder and a new String, while append() does not. If an upper bound on the final length is known, passing it to the StringBuilder constructor optimizes further, because it avoids repeated expansion of the internal character array.
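The two approaches can be put side by side as follows. This is a minimal sketch with made-up names; it only shows the shape of the code and that both produce the same result, not a benchmark.

```java
class ConcatLoopDemo {
    // Every iteration builds a new String (and a temporary builder) for s + "a".
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "a";
        }
        return s;
    }

    // One builder, sized up front so its internal array never has to grow.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(5));                        // aaaaa
        System.out.println(withPlus(5).equals(withBuilder(5))); // true
    }
}
```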

 

Note that "constants" here also include final variables that are initialized with a constant expression

Case 4:

public static void g() {
    final String s1 = "a";
    final String s2 = "b";
    String s3 = "ab";
    String s4 = s1 + s2;


    System.out.println(s3 == s4); // true
}

Corresponding bytecode

0 ldc #5 <a>
2 astore_0
3 ldc #6 <b>
5 astore_1
6 ldc #13 <ab>
8 astore_2
9 ldc #13 <ab>
11 astore_3
12 getstatic #3 <java/lang/System.out>
15 aload_2
16 aload_3
17 if_acmpne 24 (+7)
20 iconst_1
21 goto 25 (+4)
24 iconst_0
25 invokevirtual #4 <java/io/PrintStream.println>
28 return

Offsets 6 and 9 of the bytecode both load the same constant #13 <ab>, so the compiler has also applied compile-time optimization to String s4 = s1 + s2
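Being final is not enough on its own: the variable must also be initialized with a constant expression. The sketch below contrasts the two situations; the names are made up for illustration.

```java
class FinalConstantDemo {
    // a is final and initialized with a literal, so a + "b" is folded to "ab" by the compiler.
    static boolean constantVariableFolded() {
        final String a = "a";
        String ab = "ab";
        return (a + "b") == ab;
    }

    // b is final, but its value comes from a method call, so "a" + b is built at run time.
    static boolean runtimeFinalFolded() {
        final String b = String.valueOf('b');
        String ab = "ab";
        return ("a" + b) == ab;
    }

    public static void main(String[] args) {
        System.out.println(constantVariableFolded()); // true
        System.out.println(runtimeFinalFolded());     // false
    }
}
```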

Use of intern()

The core description of this method in jdk8 is as follows

A pool of strings, initially empty, is maintained privately by the class String.

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

In short: when intern() is called, if the pool already contains a string with the same content, that pooled object is returned; otherwise the string is added to the pool and a reference to it is returned. For two strings s and t, s.intern() and t.intern() return the same object if and only if s and t have equal content

There are two ways to ensure that the variable s points to the data in the string constant pool: literal assignment and calling intern()
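Both routes end at the same pooled object, which can be checked by reference. A minimal sketch of the contract quoted above, with made-up names:

```java
class InternContractDemo {
    // A heap String built from characters is a different object from the literal,
    // but interning it returns the literal's pooled object.
    static boolean internMatchesLiteral() {
        String literal = "szc";
        String onHeap = new String(new char[] {'s', 'z', 'c'});
        return onHeap != literal && onHeap.intern() == literal;
    }

    // s.intern() == t.intern() holds exactly when s.equals(t).
    static boolean sameCanonical(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        System.out.println(internMatchesLiteral());                // true
        System.out.println(sameCanonical(new String("x1"), "x1")); // true
        System.out.println(sameCanonical("a", "b"));               // false
    }
}
```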

 

How many objects does new String("szc") create? Two. The Java code is as follows

public class StringNewTest {
    public static void main(String[] args) {
        String s = new String("szc");
    }
}

Corresponding bytecode

0 new #2 <java/lang/String>
3 dup
4 ldc #3 <szc>
6 invokespecial #4 <java/lang/String.<init>>
9 astore_1
10 return

The instructions at offsets 0 and 4 show the two objects: new creates a String object in the heap, and ldc loads the literal szc from the constant pool (creating it there if it is not already present).

 

In the same way, new String("a") + new String("b") creates 6 objects. The corresponding bytecode is

0 new #2 <java/lang/StringBuilder>
3 dup
4 invokespecial #3 <java/lang/StringBuilder.<init>>
7 new #4 <java/lang/String>
10 dup
11 ldc #5 <a>
13 invokespecial #6 <java/lang/String.<init>>
16 invokevirtual #7 <java/lang/StringBuilder.append>
19 new #4 <java/lang/String>
22 dup
23 ldc #8 <b>
25 invokespecial #6 <java/lang/String.<init>>
28 invokevirtual #7 <java/lang/StringBuilder.append>
31 invokevirtual #9 <java/lang/StringBuilder.toString>
34 astore_1
35 return

Offsets 0, 7, 11, 19, and 23 account for five objects: a StringBuilder (0), two new String objects (7 and 19), and the literal objects a and b (11 and 23). Offset 31 then calls StringBuilder's toString() method, whose source shows that one more String object is created

@Override
public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}

The source code corresponding to new String(value, 0, count) is as follows

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

The last line only copies the character array, so no ab object is created in the constant pool (the bytecode leads to the same conclusion: there is no instruction like ldc <ab>). In total, 6 objects are created.

 

Analyze the following code execution results (jdk7 and above)

public class StringNewTest {
    public static void main(String[] args) {
        String s = new String("a");
        s.intern();
        String s1 = "a";
        System.out.println(s == s1); // false

        String s2 = new String("a") + new String("b");
        s2.intern();
        String s3 = "ab";
        System.out.println(s2 == s3); // true
    }
}

When s.intern() executes, the literal a is already in the constant pool (it was put there when new String("a") ran), so the call has no effect here. s1 points to the pooled object and s points to the separate String object created in the heap, so the result is false

When s2.intern() executes, the pool does not yet contain ab, because building s2 never created that literal. From JDK 7 on, intern() does not copy the string in this case: the pool simply records a reference to the String object that already exists in the heap. The literal "ab" assigned to s3 then resolves to that pooled reference, which is the very object s2 points to, so s2 == s3 is true

In JDK 6 and earlier, intern() on a string that is not yet in the pool copies it into the constant pool in the permanent generation, so the pooled object and the heap object are two different objects and s2 == s3 is false
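The two outcomes can be reproduced in isolation. The sketch below assumes a HotSpot JVM, JDK 7 or later; the class name and the string contents are made up, and the nanoTime suffix is only there to make it very unlikely that the content is already in the pool.

```java
class InternFreshStringDemo {
    // Content not in the pool yet: on HotSpot JDK 7+ the pool records this very
    // heap object, so intern() returns the same reference.
    static boolean freshStringIsPooledItself() {
        String fresh = new StringBuilder("intern-demo-")
                .append(System.nanoTime())
                .toString();
        return fresh.intern() == fresh;
    }

    // Content already in the pool (the literal is resolved before the constructor runs):
    // intern() returns the pooled literal, not the heap copy.
    static boolean copyOfLiteralIsNotPooled() {
        String copy = new String("intern-demo-literal");
        return copy.intern() != copy;
    }

    public static void main(String[] args) {
        System.out.println(freshStringIsPooledItself()); // true on HotSpot JDK 7+
        System.out.println(copyOfLiteralIsNotPooled());  // true
    }
}
```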

 

For the following code

String s4 = "cd";
String s5 = new String("c") + new String("d");
s4.intern();
System.out.println(s4 == s5); // false

Here intern() is called on s4, and the "cd" that s4 points to was already placed in the constant pool when s4 was assigned, so the call does nothing. s5 points to a String object in the heap, so s4 == s5 is false

 

For the following code

String s6 = new String("e") + new String("f");
String s7 = s6.intern();


System.out.println(s6 == "ef"); // true
System.out.println(s7 == "ef"); // true

When s6.intern() is called, the pool does not contain ef, so the address of the heap object ef (the object s6 points to) is recorded in the pool and returned to s7. The literal "ef" then resolves to the same object, so s6, s7, and "ef" are all the same reference (JDK 7 and later)

 

For the following code

String s6 = new String("ef");
s6.intern();
String s7 = "ef";

System.out.println(s6 == s7); // false

Because s6 is created with new String("ef"), the literal ef is already in the constant pool before intern() is called, while s6 points to a separate String object in the heap. s7 gets the pooled object, so s6 == s7 is false

 

When a program builds a large number of String objects with repeated content, calling intern() lets equal strings share one pooled object, which can save a great deal of memory
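The effect can be made visible by counting distinct objects by identity. In the sketch below (names made up for illustration), n strings are built from only 10 different contents; without intern() every one is a separate object, with intern() only 10 objects remain referenced.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternMemoryDemo {
    // Counts how many distinct String objects (by identity, not by equals) are retained.
    static int distinctObjects(int n, boolean intern) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < n; i++) {
            String s = new String(String.valueOf(i % 10)); // always a new heap object
            identities.add(intern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(1000, false)); // 1000
        System.out.println(distinctObjects(1000, true));  // 10
    }
}
```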

Garbage collection of StringTable

You can use -XX:+PrintStringTableStatistics to print the statistics of the string constant pool

Test code

public class StringGcTest {
    public static void main(String[] args) {
        for (int i = 0; i < 100000; i++) {
            String.valueOf(i).intern();
        }
    }
}

JVM options: -Xms10m -Xmx10m -XX:+PrintStringTableStatistics -XX:+PrintGCDetails. Output:

[GC (Allocation Failure) [PSYoungGen: 2048K->504K(2560K)] 2048K->889K(9728K), 0.0797190 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
[GC (Allocation Failure) [PSYoungGen: 2552K->504K(2560K)] 2937K->1009K(9728K), 0.0012426 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 2552K->488K(2560K)] 3057K->1057K(9728K), 0.0012180 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Heap
PSYoungGen      total 2560K, used 2086K [0x00000000ffd00000, 0x0000000100000000, 0x0000000100000000)
  eden space 2048K, 78% used [0x00000000ffd00000,0x00000000ffe8f980,0x00000000fff00000)
  from space 512K, 95% used [0x00000000fff00000,0x00000000fff7a020,0x00000000fff80000)
  to   space 512K, 0% used [0x00000000fff80000,0x00000000fff80000,0x0000000100000000)
ParOldGen       total 7168K, used 569K [0x00000000ff600000, 0x00000000ffd00000, 0x00000000ffd00000)
  object space 7168K, 7% used [0x00000000ff600000,0x00000000ff68e4b8,0x00000000ffd00000)
Metaspace       used 3241K, capacity 4496K, committed 4864K, reserved 1056768K
  class space    used 350K, capacity 388K, committed 512K, reserved 1048576K
SymbolTable statistics:
Number of buckets       :     20011 =    160088 bytes, avg   8.000
Number of entries       :     13291 =    318984 bytes, avg  24.000
Number of literals      :     13291 =    568080 bytes, avg  42.742
Total footprint         :           =   1047152 bytes
Average bucket size     :     0.664
Variance of bucket size :     0.664
Std. dev. of bucket size:     0.815
Maximum bucket size     :         6
StringTable statistics:
Number of buckets       :     60013 =    480104 bytes, avg   8.000
Number of entries       :     30235 =    725640 bytes, avg  24.000
Number of literals      :     30235 =   1752696 bytes, avg  57.969
Total footprint         :           =   2958440 bytes
Average bucket size     :     0.504
Variance of bucket size :     0.464
Std. dev. of bucket size:     0.681
Maximum bucket size     :         4

The loop interns 100000 distinct strings, yet the StringTable statistics report only Number of entries: 30235 and Number of literals: 30235. Far fewer entries remain than were added, so the constant pool has been garbage collected

 

To turn a flag off, change the + to -

String deduplication operation in G1

During garbage collection, every live object the collector visits is checked to see whether it is a String object that is a candidate for deduplication; if it is, a reference to the object is inserted into a queue.

A dedicated background thread performs the deduplication. Processing the queue means removing an element from it and then trying to deduplicate the String object that the element references.

Deduplication uses a hash table that records every distinct char array used by String objects. To deduplicate a String, the table is searched for a char array identical to the one the String uses.

If one exists, the String object is changed to reference the array already in the table and drops the reference to its own array, which is then garbage collected; if none exists, the String's char array is inserted into the hash table so that it can be shared later.
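The lookup-or-insert step can be modelled in a few lines. This is only a toy model of the idea described above, not how the JVM implements it: ToyDedupTable and deduplicate() are made-up names, and an ordinary HashMap keyed by content stands in for the collector's internal table.

```java
import java.util.HashMap;
import java.util.Map;

class ToyDedupTable {
    // content -> the single backing array kept for that content
    private final Map<String, char[]> canonical = new HashMap<>();

    // Returns the array a String with this content should reference:
    // the already recorded one if present, otherwise the given array (which is then recorded).
    char[] deduplicate(char[] value) {
        String key = new String(value);
        char[] existing = canonical.get(key);
        if (existing != null) {
            return existing;       // share it; the caller's array becomes garbage
        }
        canonical.put(key, value); // first occurrence: later strings can share it
        return value;
    }

    public static void main(String[] args) {
        ToyDedupTable table = new ToyDedupTable();
        char[] first = {'s', 'z', 'c'};
        char[] second = {'s', 'z', 'c'};
        char[] a = table.deduplicate(first);
        char[] b = table.deduplicate(second);
        System.out.println(a == b); // true
    }
}
```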

Conclusion

That concludes this review of String in the context of the JVM. Next I will organize my study notes on garbage collectors and share them


Origin blog.csdn.net/qq_37475168/article/details/106599844