Part 1 of JVM_13_StringTable_Silicon Valley

1 Basic characteristics of String

  • String: a string, written as a literal enclosed in a pair of double quotes ("").
  • String is declared final, so it cannot be inherited.
  • String implements the Serializable interface, which means strings support serialization. It also implements the Comparable interface, which means strings can be compared with each other.
  • In jdk8 and before, String defines final char[] value to store the string data. jdk9 changed this to byte[].

String storage structure changes in jdk9

Official document: https://openjdk.java.net/jeps/254

Motivation

The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.

Description

We propose to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string. The encoding flag will indicate which encoding is used.

Conclusion: String no longer stores its data in a char[]. It uses a byte[] plus an encoding flag, which saves space for strings that contain only Latin-1 characters.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    @Stable
    private final byte[] value;
}
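To make the space saving concrete, here is a minimal sketch of the decision JEP 254 describes: one byte per character when every character fits in Latin-1, two bytes otherwise. The class and method names (CompactStringSketch, fitsLatin1, backingBytes) are hypothetical and only illustrate the idea; the real logic lives inside the JDK and is not exposed like this.

```java
public class CompactStringSketch {

    /** Returns true when every char fits in one byte (<= 0xFF), i.e. is Latin-1. */
    public static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** Estimated size of the backing byte[] under the JEP 254 scheme (illustration only). */
    public static int backingBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(backingBytes("hello"));        // 5: Latin-1, one byte per char
        System.out.println(backingBytes("\u4f60\u597d")); // 4: two chars stored as UTF-16
    }
}
```

Under the old char[] layout, "hello" would have needed 10 bytes of character data; under this scheme it needs 5.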

Are StringBuffer and StringBuilder left unchanged?

String-related classes such as AbstractStringBuilder, StringBuilder, and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic (that is, built-in) string operations.

So they were changed accordingly.

String constant pool

  • String represents an immutable sequence of characters. In short: immutability.
    • When a string variable is reassigned, a new memory area must be allocated for the new value; the original value cannot be modified in place.
    • When an existing string is concatenated, a new memory area must also be allocated for the result; the original value cannot be modified in place.
    • When String's replace() method is called to change a character or substring, a new memory area must also be allocated for the result; the original value cannot be modified in place.
  • Assigning a string with a literal (as opposed to new) declares the string value in the string constant pool.
  • The string constant pool never stores two strings with the same content.
  • The string pool is a fixed-size Hashtable (length 1009 by default in jdk6). If too many strings are put into the pool, hash collisions become severe and the linked lists in the buckets grow very long. The direct effect is that calls to String.intern slow down significantly.
  • Use -XX:StringTableSize to set the length of the StringTable.
  • In jdk6 the StringTable length is fixed at 1009, so efficiency drops quickly when the constant pool holds too many strings. The StringTableSize setting has no minimum requirement.
  • In jdk7 the default StringTable length is 60013.
  • Starting from jdk8, 1009 is the minimum value that can be set for the StringTable length.
jinfo -flag StringTableSize 12512
-XX:StringTableSize=60013
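The immutability rules listed above can be checked directly. The following is a small sketch, with hypothetical class and method names, showing that reassignment, concatenation and replace() each leave the original string object untouched and produce a different object.

```java
public class ImmutabilityDemo {

    /** Reassignment points the variable at another object; the old value is unchanged. */
    public static boolean reassignKeepsOriginal() {
        String s1 = "abc";
        String s2 = s1;   // second reference to the same object
        s1 = "hello";     // s1 now refers to a different object
        return s2.equals("abc") && s1 != s2;
    }

    /** Concatenation builds a new String instead of modifying the existing one. */
    public static boolean concatCreatesNewObject() {
        String s1 = "abc";
        String s2 = s1;
        s2 += "def";
        return s1.equals("abc") && s2.equals("abcdef") && s1 != s2;
    }

    /** replace() returns a new String; the receiver keeps its content. */
    public static boolean replaceCreatesNewObject() {
        String s1 = "abc";
        String s2 = s1.replace('a', 'm');
        return s1.equals("abc") && s2.equals("mbc") && s1 != s2;
    }

    public static void main(String[] args) {
        System.out.println(reassignKeepsOriginal());   // true
        System.out.println(concatCreatesNewObject());  // true
        System.out.println(replaceCreatesNewObject()); // true
    }
}
```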

2 String memory allocation

  • Java has 8 primitive data types plus one special type, String. To make these types faster and more memory-efficient at run time, they are given a constant pool concept.

  • A constant pool is similar to a cache provided at the Java system level. The constant pools of the 8 primitive types are managed by the system. String's constant pool is special; there are two main ways to use it:

    • String objects declared directly with double quotes are stored directly in the constant pool.

      For example: String info = "hello";

    • For a String object that is not declared with double quotes, you can use the intern() method provided by String. This is introduced later.

  • In Java 6 and before, the string constant pool is stored in the permanent generation.

  • In Java 7, Oracle engineers made a major change to the string pool logic: they moved the string constant pool into the Java heap.

    • All strings are stored in the heap, just like ordinary objects, so you only need to adjust the heap size when tuning the application.
    • The string constant pool concept was already widely used, but this change gives us good reason to reconsider using String.intern() in Java 7.
  • In Java 8, the permanent generation is replaced by Metaspace, and string constants remain on the heap.

Why was the StringTable moved? (1) PermSize is relatively small by default. (2) The permanent generation is garbage-collected infrequently.

Example of heap OOM caused by using String

import java.util.HashSet;
import java.util.Set;

/**
 * jdk6:
 * -XX:PermSize=6m -XX:MaxPermSize=6m -Xms6m -Xmx6m
 *
 * jdk8:
 * -XX:MetaspaceSize=6m -XX:MaxMetaspaceSize=6m -Xms6m -Xmx6m
 */
public class StringTest3 {
  public static void main(String[] args) {
    // Keep references to the pooled strings in a Set so that a full GC cannot reclaim them
    Set<String> set = new HashSet<String>();
    // The value range of short is already enough to make a 6 MB PermSize or heap run out of memory
    short i = 0;
    while (true) {
      set.add(String.valueOf(i++).intern());
    }
  }
}

3 Basic operation of String

The Java Language Specification requires that identical string literals (constants containing the same sequence of Unicode code points) refer to the same instance of class String.
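A minimal sketch of this rule follows. The class names LiteralIdentityDemo and OtherClass are hypothetical; the point is only that two occurrences of the same literal, even in different classes, yield the same object, while new String always yields a distinct one.

```java
public class LiteralIdentityDemo {

    /** Two occurrences of the same literal in one class refer to one object. */
    public static boolean sameLiteralSameInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    /** The same literal used in another class refers to that object as well. */
    public static boolean sharedAcrossClasses() {
        return "hello" == OtherClass.greeting();
    }

    /** new String(...) always creates a separate object with equal content. */
    public static boolean newStringIsDistinct() {
        String a = "hello";
        String c = new String("hello");
        return a != c && a.equals(c);
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralSameInstance()); // true
        System.out.println(sharedAcrossClasses());     // true
        System.out.println(newStringIsDistinct());     // true
    }
}

class OtherClass {
    static String greeting() {
        return "hello"; // literal stored in OtherClass's own constant pool
    }
}
```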

View changes in the number of String literals

public class StringTest4 {
  public static void main(String[] args) {
    System.out.println();    // 2252: the number of strings in memory
    System.out.println("1"); // 2253
    System.out.println("2");
    System.out.println("3");
    System.out.println("4");
    System.out.println("5");
    System.out.println("6");
    System.out.println("7");
    System.out.println("8");
    System.out.println("9");
    System.out.println("10");// 2362

    System.out.println("1"); // 2363
    System.out.println("2"); // 2363
    System.out.println("3");
    System.out.println("4");
    System.out.println("5");
    System.out.println("6");
    System.out.println("7");
    System.out.println("8");
    System.out.println("9");
    System.out.println("10"); // 2363
  }
}

Run in debug mode with a breakpoint on the first line, and watch the count for the String class in the debug console while stepping through.

4 String concatenation operations

  1. Concatenating constants with constants gives a result in the constant pool; the mechanism is compile-time optimization.
  2. The constant pool never contains two constants with the same content.
  3. As soon as one operand is a variable, the result is on the heap. Variable concatenation is implemented with StringBuilder.
  4. If intern() is called on the concatenated result, a string object that is not yet in the constant pool is put into the pool, and the address of the pooled object is returned.
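The four rules above can be sketched as follows (class and method names are hypothetical). One point not stated in the list: from JDK 9 on, javac compiles variable concatenation to an invokedynamic call rather than explicit StringBuilder code, but the result is still a new object outside the pool, so the comparisons below hold either way.

```java
public class ConcatDemo {

    /** Rule 1: constant + constant is folded at compile time into one pooled literal. */
    public static boolean constantFolding() {
        String s1 = "a" + "b" + "c";
        String s2 = "abc";
        return s1 == s2;
    }

    /** Rule 3: with a variable operand the result is a new object on the heap. */
    public static boolean variableConcatIsOnHeap() {
        String a = "a";
        String s = a + "bc";
        return s != "abc" && s.equals("abc");
    }

    /** Rule 4: intern() on the concatenated result returns the pooled object. */
    public static boolean internReturnsPooled() {
        String a = "a";
        return (a + "bc").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(constantFolding());        // true
        System.out.println(variableConcatIsOnHeap()); // true
        System.out.println(internReturnsPooled());    // true
    }
}
```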

5 Use of intern()

If a String object is not declared with double quotes, you can use the intern method provided by String: intern looks up the current string in the string constant pool and, if it is not there, puts the current string into the pool.

  • For example: String myInfo = new String("lys").intern();

That is, if String.intern is called on any string, the instance that the result refers to must be exactly the same instance as the string that appears directly as a constant. Therefore the following expression must evaluate to true:

("a" + "b" + "c").intern() == "abc"

In layman's terms, an interned String guarantees that only one copy of the string exists in memory, which saves memory and speeds up string operations. Note that this value is stored in the String Intern Pool.
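As a short illustration of the "only one copy" guarantee, the sketch below (hypothetical class and method names) builds equal strings at run time and checks that intern() maps all of them to one canonical object.

```java
public class InternDemo {

    /** A string built at run time is a separate object, but intern() yields the literal's object. */
    public static boolean internMatchesLiteral() {
        String built = new StringBuilder("a").append("b").append("c").toString();
        return built != "abc" && built.intern() == "abc";
    }

    /** Two distinct objects with equal content intern to the same canonical object. */
    public static boolean internIsCanonical() {
        String x = new String("lys");
        String y = new String("lys");
        return x != y && x.intern() == y.intern();
    }

    public static void main(String[] args) {
        System.out.println(internMatchesLiteral()); // true
        System.out.println(internIsCanonical());    // true
    }
}
```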

Use of intern(): jdk6 vs jdk7/8

public static void main(String[] args) {
  String s = new String("1");
  s.intern();                   // "1" is already in the pool because of the literal above
  String s2 = "1";
  System.out.println(s == s2);  // jdk6: false, jdk7/8: false

  String s3 = new String("1") + new String("1");
  s3.intern();                  // "11" is not in the pool before this call
  String s4 = "11";
  System.out.println(s3 == s4); // jdk6: false, jdk7/8: true
}

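Question: How many objects does new String("ab") create?

The bytecode shows that the answer is two:
(1) The object created in heap space by the new keyword
(2) The "ab" object in the string constant pool. Bytecode instruction: ldc

Question: How many objects does new String("a") + new String("b") create?

Object 1: new StringBuilder()
Object 2: new String("a")
Object 3: "a" in the string constant pool
Object 4: new String("b")
Object 5: "b" in the string constant pool
Object 6: the new String("ab") created by StringBuilder's toString(); this call does not put "ab" into the string constant pool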

To summarize the use of String's intern():

  • In jdk6, intern() tries to put the string object into the string pool.
    • If the pool already contains an equal string, nothing is added. The address of the existing pooled object is returned.
    • If not, a copy of the object is made and put into the string pool, and the address of the copy in the pool is returned.
  • Starting from jdk7, intern() tries to put the string object into the string pool.
    • If the pool already contains an equal string, nothing is added. The address of the existing pooled object is returned.
    • If not, a reference to the object is stored in the string pool, and that reference is returned.
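The jdk7+ behaviour summarized above can be observed with a sketch like the following. It assumes a HotSpot JVM of version 7 or later; the class name, method names and the "intern-demo-" prefix are hypothetical. System.nanoTime() is used only to build content that is very unlikely to be in the pool already.

```java
public class InternJdk7Demo {

    /** Pool has no equal string yet: intern() records and returns the object itself (jdk7+). */
    public static boolean firstInternReturnsSameObject() {
        String s = new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
        return s.intern() == s;
    }

    /** Pool already has an equal string: intern() returns the existing pooled object. */
    public static boolean laterInternReturnsExisting() {
        String s3 = new String("1") + new String("1"); // "11" on the heap
        String s4 = "11";                              // pooled "11", a different object
        return s3 != s4 && s3.intern() == s4;
    }

    public static void main(String[] args) {
        System.out.println(firstInternReturnsSameObject()); // true on jdk7+
        System.out.println(laterInternReturnsExisting());   // true
    }
}
```

On jdk6 the first check would be expected to print false, because the pool would hold a copy in the permanent generation instead of a reference to the heap object.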

Efficiency test of intern(): memory usage

/**
 * Tests the efficiency of intern() in terms of memory usage.
 * Conclusion: when a program holds a large number of strings, especially many duplicates,
 * using intern() saves memory.
 */
public class StringInterSpaceTest {
  static final int MAX_COUNT = 1000 * 10000;
  static final String[] arr = new String[MAX_COUNT];

  public static void main(String[] args) throws InterruptedException {
    Integer[] data = new Integer[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    long start = System.currentTimeMillis();
    for (int i = 0; i < MAX_COUNT; i++) {
      arr[i] = new String(String.valueOf(data[i % data.length]));
      // arr[i] = new String(String.valueOf(data[i % data.length])).intern();
    }

    long end = System.currentTimeMillis();
    System.out.println("Elapsed time (ms): " + (end - start));

    // Keep the process alive so that heap usage can be inspected, for example with a profiler
    Thread.sleep(10000000);
    System.gc();
  }
}

A large website platform needs to keep a large number of strings in memory. On a social networking site, for example, many users store the same information, such as Beijing or Haidian District. If these strings are all passed through intern(), memory usage drops significantly. (The key is that every variable holds the reference returned by intern(); the temporary objects in heap space can then be garbage collected, and the program uses the data in the string constant pool.)
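Without a profiler, the effect can still be made visible by counting distinct objects. The following is a scaled-down sketch of the test above under that assumption; the class and method names are hypothetical, and an identity-based set is used so that equal strings held in different objects are counted separately.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternSpaceSketch {

    /** Fills an array with strings for the values 1..10 and counts distinct objects by identity. */
    public static int distinctObjects(int count, boolean useIntern) {
        Integer[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        String[] arr = new String[count];
        for (int i = 0; i < count; i++) {
            String s = new String(String.valueOf(data[i % data.length]));
            arr[i] = useIntern ? s.intern() : s;
        }
        // An identity-based set distinguishes equal strings that are different objects
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(identities, arr);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(100_000, false)); // 100000
        System.out.println(distinctObjects(100_000, true));  // 10
    }
}
```

With intern(), the array keeps only 10 objects alive, and the 100,000 temporary copies become eligible for garbage collection.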

6 StringTable Garbage Collection

7 String deduplication operation in G1

Official document: https://openjdk.org/jeps/192

JEP 192: String Deduplication in G1

Summary

Reduce the Java heap live-data set by enhancing the G1 garbage collector so that duplicate instances of String are automatically and continuously deduplicated.

Motivation

Many large-scale Java applications are currently bottlenecked on memory. Measurements have shown that roughly 25% of the Java heap live data set in these types of applications is consumed by String objects. Further, roughly half of those String objects are duplicates, where duplicates means string1.equals(string2) is true. Having duplicate String objects on the heap is, essentially, just a waste of memory. This project will implement automatic and continuous String deduplication in the G1 garbage collector to avoid wasting memory and reduce the memory footprint.

G1's String deduplication operation

  • Background: tests on many Java applications (both large and small) yielded the following results:

    • String objects account for 25% of the heap's live data set
    • Duplicate String objects account for 13.5% of the heap's live data set
    • The average length of a String object is 45
  • Many large-scale Java applications are bottlenecked on memory. Tests show that in these types of applications almost 25% of the live data set in the Java heap consists of String objects. Furthermore, almost half of those String objects are duplicates, which means:

    string1.equals(string2) == true. Duplicate String objects on the heap are simply a waste of memory. This project automatically and continuously deduplicates duplicate String objects in the G1 garbage collector, so as to avoid wasting memory.

  • Implementation

    • While the garbage collector is working, it visits the live objects on the heap. Each visited object is checked to see whether it is a candidate String object for deduplication.
    • If so, a reference to the object is inserted into a queue for later processing. A deduplication thread runs in the background and processes this queue. Processing a queue entry means removing it from the queue and then attempting to deduplicate the String object it refers to.
    • A hashtable records all the unique char arrays used by String objects. During deduplication, the hashtable is checked to see whether an identical char array already exists on the heap.
    • If it exists, the String object is adjusted to refer to that array, releasing its reference to the original array, which is eventually reclaimed by the garbage collector.
    • If the lookup fails, the char array is inserted into the hashtable so that it can be shared later.
  • Command-line options

    • UseStringDeduplication(bool): enables String deduplication. It is disabled by default and must be enabled manually.
    • PrintStringDeduplicationStatistics(bool): prints detailed deduplication statistics
    • StringDeduplicationAgeThreshold(uintx): String objects reaching this age are considered candidates for deduplication.
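The lookup-and-share step described in the implementation notes above can be sketched as a toy model. This is not G1's code: ToyString, ArrayKey and DedupSketch are hypothetical stand-ins, the queue and background thread are left out, and a plain HashMap plays the role of the deduplication hashtable.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class DedupSketch {

    /** Toy stand-in for a String: an object holding a reference to a backing char array. */
    static final class ToyString {
        char[] value;
        ToyString(String s) { value = s.toCharArray(); }
    }

    /** Wraps a char[] so that the map compares arrays by content, not by identity. */
    static final class ArrayKey {
        final char[] chars;
        ArrayKey(char[] chars) { this.chars = chars; }
        @Override public boolean equals(Object o) {
            return o instanceof ArrayKey && Arrays.equals(chars, ((ArrayKey) o).chars);
        }
        @Override public int hashCode() { return Arrays.hashCode(chars); }
    }

    /** Records every unique char array seen so far. */
    private final Map<ArrayKey, char[]> table = new HashMap<>();

    /** Processes one candidate; returns true if its array was replaced by a shared one. */
    public boolean deduplicate(ToyString s) {
        ArrayKey key = new ArrayKey(s.value);
        char[] existing = table.get(key);
        if (existing == null) {
            table.put(key, s.value); // lookup failed: remember this array for later sharing
            return false;
        }
        if (existing == s.value) {
            return false;            // already shares the recorded array
        }
        s.value = existing;          // drop the duplicate array; it becomes garbage
        return true;
    }

    /** Two equal toy strings end up sharing one backing array. */
    public static boolean demo() {
        DedupSketch dedup = new DedupSketch();
        ToyString a = new ToyString("Beijing");
        ToyString b = new ToyString("Beijing");
        boolean first = dedup.deduplicate(a);  // false: first occurrence is recorded
        boolean second = dedup.deduplicate(b); // true: b now points at a's array
        return !first && second && a.value == b.value;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

On a real JVM the feature is switched on from the command line, for example java -XX:+UseG1GC -XX:+UseStringDeduplication MyApp, where MyApp is a placeholder for the application's main class.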

Origin blog.csdn.net/weixin_43811294/article/details/125462300