Strings and the constant pool

Two ways to create a String

  A string can be created in two ways: with a string literal, or with the new keyword:

String str1 = "123"; // literal

String str2 = new String("123"); // new
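
  To make the difference between the two forms concrete, here is a minimal sketch. The class name TwoWays and the helper compare() are made up for illustration. It relies only on behavior the Java language guarantees: identical literals refer to the same String object, while new always creates a distinct one.

```java
public class TwoWays {
    // Hypothetical helper: returns the three comparisons discussed above.
    static boolean[] compare() {
        String str1 = "123";             // literal
        String str2 = new String("123"); // new
        String str3 = "123";             // the same literal again
        return new boolean[] {
            str1 == str3,      // true: both names refer to the one pooled literal
            str1 == str2,      // false: new always creates a separate object
            str1.equals(str2)  // true: the contents are equal
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false true
    }
}
```

  Comparing with == tests object identity, which is why it is the tool used throughout this article to find out where a string really lives.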

  Now consider the following code:

String s = new String("1");
s.intern();
String s2 = "1";
System.out.println(s == s2);

String s3 = new String("1") + new String("1");
s3.intern();
String s4 = "11";
System.out.println(s3 == s4);

  JDK6 output: false false 

  JDK7 output: false true

  So what makes the two versions print different results? The answer lies in the constant pool.

Constant pool

  The JVM's constant pools fall into three kinds:

  1. Static constant pool

  2. Runtime constant pool

  3. String constant pool

Static constant pool

  The static constant pool is part of the class file. The class file structure has a section called constant_pool, and that section is the static constant pool. It mainly stores two kinds of constants: literals and symbolic references.

  Literals comprise text strings (for example, in  String s = "abc";  the "abc" is a text string) and the constant values of variables declared final (static variables, instance variables, and local variables alike).

  Symbolic references comprise the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods.

  constant_pool has a data type called CONSTANT_Utf8_info, which stores a string in UTF-8 encoding. Its structure is as follows:

struct CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

  Here length has type u2, a 16-bit unsigned integer, so in theory the maximum length is 65535 bytes. However, javac only accepts a literal whose length is less than 65535, so a string defined as a literal can be at most 65534 long (a string generated at run time can be at most -1 >>> 1, that is Integer.MAX_VALUE, long).
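
  The numbers above can be checked with a few lines of Java. This is only a sketch: the class and method names are made up, and it measures standard UTF-8, whereas class files use a slightly modified UTF-8 that differs for the NUL character and for characters outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

public class StringLengthLimits {
    // Largest value a u2 (16-bit unsigned) length field can hold.
    static final int MAX_U2 = 0xFFFF;

    // The expression quoted in the text: an unsigned shift of -1 clears the sign bit.
    static int maxRuntimeLength() {
        return -1 >>> 1;
    }

    // Number of bytes the text occupies in standard UTF-8 (an approximation of
    // the bytes[] array in CONSTANT_Utf8_info, see the caveat above).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(MAX_U2);                                  // 65535
        System.out.println(maxRuntimeLength() == Integer.MAX_VALUE); // true
        System.out.println(utf8Length("abc"));                       // 3
        System.out.println(utf8Length("\u4e2d\u6587"));              // 6
    }
}
```

  The last line shows why the limit is best thought of in bytes: two CJK characters already take six bytes, so a literal made of such characters reaches the limit with far fewer characters than an ASCII one.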

  ps: the static constant pool also has a data type called CONSTANT_String. It is usually described as "the type that stores string literals", but the structure actually contains only an index pointing to a CONSTANT_Utf8 entry, so CONSTANT_Utf8 is the structure that really stores the string.
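
  The relationship between the two entry types can be sketched by hand-building two constant pool entries and reading them back. This is a toy fragment, not a complete class file, and the class and method names are invented for the example. The tag values 1 (CONSTANT_Utf8) and 8 (CONSTANT_String) are the ones defined by the class file format, and DataOutputStream.writeUTF happens to write the same layout as CONSTANT_Utf8: a two-byte length followed by modified UTF-8 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ConstantPoolSketch {
    static final int TAG_UTF8 = 1;   // CONSTANT_Utf8
    static final int TAG_STRING = 8; // CONSTANT_String

    // Builds entry #1 (Utf8 "123") and entry #2 (String pointing at #1).
    static byte[] buildEntries() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(TAG_UTF8);
            out.writeUTF("123");   // u2 length, then the bytes of the text
            out.writeByte(TAG_STRING);
            out.writeShort(1);     // string_index: refers to entry #1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads both entries and follows the String entry's index to the Utf8 text.
    static String resolveStringEntry(byte[] entries) {
        String[] utf8 = new String[3]; // slots 1..2; slot 0 is unused, as in a real pool
        int stringIndex = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entries))) {
            for (int i = 1; i <= 2; i++) {
                int tag = in.readUnsignedByte();
                if (tag == TAG_UTF8) {
                    utf8[i] = in.readUTF();
                } else if (tag == TAG_STRING) {
                    stringIndex = in.readUnsignedShort();
                } else {
                    throw new IllegalStateException("unexpected tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return utf8[stringIndex];
    }

    public static void main(String[] args) {
        byte[] entries = buildEntries();
        System.out.println(entries.length);              // 9
        System.out.println(resolveStringEntry(entries)); // 123
    }
}
```

  Of the nine bytes, six belong to the Utf8 entry (tag, length, text) and only three to the String entry (tag, index), which is the point of the note above: the String entry carries no text of its own.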

Runtime constant pool

  The runtime constant pool is part of the method area and is shared globally. Before the JVM can execute a class, the class must be loaded, linked (verification, preparation, resolution), and initialized. In the first stage, loading, the virtual machine has to do three things:

  • Obtain the binary byte stream that defines the class through the class's fully qualified name.
  • Convert the static storage structure represented by the byte stream into the runtime data structures of the method area.
  • Generate in memory a java.lang.Class object that represents the class and serves as the entry point for accessing the class's data in the method area.

  The static storage structure mentioned in the second step includes the class file's static constant pool, so the data in the static constant pool eventually ends up in the runtime constant pool. The runtime constant pool can, however, also hold data that is produced at run time.

  The role of the runtime constant pool is to store the symbolic information from the class file's constant pool. It holds the symbolic references described in the class file, and during the resolution phase of class loading the direct references translated from those symbolic references (pointers straight to the instance objects) are stored in the runtime constant pool as well.

  Compared with the class file constant pool, the most important feature of the runtime constant pool is that it is dynamic. The Java language does not require constants to be produced only at compile time: the contents of the runtime constant pool do not all come from the class file constant pool, and the class file constant pool is not its only source of data. Code can create new constants at run time and put them into the pool, and the most commonly used instance of this feature is String.intern().
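
  A small sketch of that dynamic behavior follows. The class name InternDemo and the method run() are hypothetical. The string is assembled from System.nanoTime(), so its text cannot have come from any class file, yet intern() still yields one canonical object for it.

```java
public class InternDemo {
    static boolean[] run() {
        // Built at run time, so this text is not a literal in any class file.
        String built = new StringBuilder("runtime-").append(System.nanoTime()).toString();
        String canonical = built.intern();  // registers the text, returns the canonical object
        String copy = new String(built);    // a different object with the same content
        return new boolean[] {
            copy == built,              // false: two distinct objects
            copy.equals(built),         // true: same content
            copy.intern() == canonical  // true: intern() maps equal content to one object
        };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false true true
    }
}
```

  The sketch deliberately does not compare built with canonical, because whether those two are the same object depends on the JDK version, which is exactly the difference this article goes on to explain.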

   ps: the method area used to live in the permanent generation. JDK 8 removed the permanent generation and replaced it with Metaspace, and the runtime constant pool moved to Metaspace as well. Metaspace plays a similar role; its biggest difference from the permanent generation is that it does not use memory inside the virtual machine but native memory.

String constant pool

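  The string constant pool is a hash table, maintained by the JVM, that holds references to String instances. Before JDK 7 it lived in the permanent generation; JDK 7 moved it to the heap. The following describes JDK 6 in detail, and the differences between JDK 7 and JDK 6 are covered at the end. The string constant pool does not appear in the runtime data area diagram above, so let's switch to a more detailed diagram.

   This diagram divides JVM memory along the heap / non-heap dimension. Let's zoom in on the non-heap part.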

 

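  As you can see, besides the method area, the permanent generation in the non-heap part also contains Interned Strings, which is the string constant pool. Note that the string constant pool does not store the String objects themselves; it stores references to them.

The following is part of the hash table entry structure used by the string constant pool in the OpenJDK source: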

/hotspot/src/share/vm/utilities/hashtable.hpp

class BasicHashtableEntry : public CHeapObj {
  friend class VMStructs;
private:
  unsigned int         _hash;           // 32-bit hash for item
  BasicHashtableEntry* _next;
};

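  Here _hash is the 32-bit hash of the string, as the comment in the source says; the reference to the String object is kept in the _literal field of the derived HashtableEntry class. The String object itself is stored in the permanent generation.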

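  As mentioned above, when the JVM loads a class it loads the data of that class's static constant pool into the runtime constant pool. What this actually creates is the CONSTANT_UTF8 entry, not the CONSTANT_STRING; the latter is created only when the string is actually referenced.

Class loading includes a step called resolution (resolve). The JVM specification states explicitly that this phase may be lazy, and CONSTANT_STRING is resolved lazily.

  ps: CONSTANT_UTF8 and CONSTANT_STRING are objects used by the JVM, not Java objects. A Java program only knows java.lang.String.

How a String enters the string constant pool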

 1 public class TestClass {
 2 
 3     public static void main(String[] args) {
 4         String s = "123";
 5         String s1 = new String("1");
 6         String s2 = s1 + new String("23");
 7         String s3 = s2.intern();
 8         final String s4 = "1";
 9         String s5 = s4 + "23";
10         System.out.println(s == s3);
11         System.out.println(s == s5);
12     }
13 }

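  Once this class has been compiled, the static constant pool contains a CONSTANT_UTF8_INFO that holds the string "123" and a CONSTANT_STRING_INFO that holds the index of the former.

  When the program starts, the JVM loads the class and loads the CONSTANT_UTF8_INFO from the static constant pool into the runtime constant pool.

  When execution reaches line 4 (String s = "123";), the JVM finds that the string "123" is being referenced. It instantiates the CONSTANT_STRING_INFO, creates the String object in the permanent generation, and saves a reference to that object in the string constant pool.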

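  When lines 5 and 6 execute, three String objects are created on the heap: the two created with new, and the result of the concatenation. This is because of how the + operator works on String: if any operand is a variable, the expression is compiled into calls to StringBuilder.append that build the string at run time; if all operands are constants, the compiler merges them into a single string at compile time.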
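
  The two cases of the + operator can be compared side by side. In this sketch the class name ConcatDemo and the variable names viaPlus, viaBuilder, and folded are invented for illustration. The explicit StringBuilder version mirrors what compilers of the JDK 6 and JDK 7 era generate for the variable case; newer compilers may use a different strategy, but the observable result (a newly created String) is the same.

```java
public class ConcatDemo {
    static boolean[] run() {
        String literal = "123";

        // Variable operands: the concatenation happens at run time and yields a new object.
        String s1 = new String("1");
        String viaPlus = s1 + new String("23");
        String viaBuilder = new StringBuilder().append(s1).append(new String("23")).toString();

        // Constant operands: s4 is a final variable initialized with a literal,
        // so s4 + "23" is a constant expression and is folded into "123" by the compiler.
        final String s4 = "1";
        String folded = s4 + "23";

        return new boolean[] {
            viaPlus == literal,          // false: built at run time
            viaBuilder == literal,       // false: built at run time
            viaPlus.equals(viaBuilder),  // true: same content either way
            folded == literal            // true: folded at compile time into the same literal
        };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // false false true true
    }
}
```

  Removing the final modifier from s4 would turn the last comparison into false, because s4 + "23" would then no longer be a constant expression.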

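  When line 7 executes, intern() first checks whether the string constant pool already contains the string "123". If it does, intern() returns that reference; if not, it creates the string in the permanent generation, saves a reference to it in the string constant pool, and returns that reference.

  When line 9 executes, the expression s4 + "23" has already been merged into "123" at compile time, so the JVM finds the string "123" in the string constant pool and returns its reference.

  The final output is therefore true true. The code at the beginning of the article can be analyzed in the same way.

  To sum up: for a literal, CONSTANT_UTF8 is the data structure that stores the string's content and CONSTANT_STRING holds a reference to it. The CONSTANT_STRING is created only when the literal is referenced, and at that moment the string constant pool saves a reference to the string. For intern(), if the pool does not contain the string, the string is created in the permanent generation and its reference is returned; otherwise the reference already in the pool is returned directly.

Differences between the JDK7 and JDK6 string constant pool

Before JDK 7 the string constant pool lived in the permanent generation; JDK 7 moved it to the heap, so strings that used to be created in the permanent generation are now created on the heap. Its behavior also differs in some respects. Let's analyze the code from the beginning of the article under JDK 7: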

String s = new String("1");
s.intern();
String s2 = "1";
System.out.println(s == s2);

String s3 = new String("1") + new String("1");
s3.intern();
String s4 = "11";
System.out.println(s3 == s4);

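  The first line of the first snippet creates two String objects on the heap. The first is the literal "1" passed to the String constructor, and a reference to it is stored in the string constant pool. The second is the String object created with new, whose content is the same as the literal "1". The call s.intern() then tries to save s in the string constant pool, but the pool already contains the string "1", so intern() does not update the pool. s2 is the literal "1", that is, the object already in the pool, so the result is s != s2.

  The first line of the second snippet creates a string with the content "11" on the heap. The call s3.intern() then tries to save that string in the string constant pool; since the pool contains no "11" yet, the pool stores a reference to this very string. s4 is the literal "11", and because the pool now contains "11", the JVM returns that reference, so s3 == s4.

The final state is shown in the figure below: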

 

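Under JDK 6, intern() instead creates a separate copy of "11" in the permanent generation and the pool refers to that copy, so s3 on the heap and s4 are different objects. The final result is shown in the figure below: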
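
The difference between the two versions can also be reproduced with a toy model. Everything in this sketch is invented for illustration (the Str stand-in class, the Pool class, and the copyOnIntern flag); it does not use the real JVM pool. It only models the two policies described above: under JDK 6 intern() stores a copy made in the permanent generation, under JDK 7 it stores a reference to the heap object itself.

```java
import java.util.HashMap;
import java.util.Map;

public class InternModel {
    // Stand-in for a String object: identity is what the model compares with ==.
    static final class Str {
        final String content;
        Str(String content) { this.content = content; }
    }

    // Toy string constant pool: maps content to a reference to the canonical object.
    static final class Pool {
        private final Map<String, Str> table = new HashMap<>();
        private final boolean copyOnIntern; // true models JDK 6, false models JDK 7

        Pool(boolean copyOnIntern) { this.copyOnIntern = copyOnIntern; }

        Str intern(Str s) {
            Str existing = table.get(s.content);
            if (existing != null) {
                return existing; // already pooled: return the pooled reference
            }
            // JDK 6 model: pool a fresh copy. JDK 7 model: pool the object itself.
            Str canonical = copyOnIntern ? new Str(s.content) : s;
            table.put(s.content, canonical);
            return canonical;
        }

        // A literal always evaluates to the pooled object for its content.
        Str literal(String content) { return intern(new Str(content)); }
    }

    // Replays the article's opening snippet against the model.
    static boolean[] run(boolean copyOnIntern) {
        Pool pool = new Pool(copyOnIntern);

        Str lit = pool.literal("1");   // the "1" inside new String("1")
        Str s = new Str(lit.content);  // String s = new String("1");
        pool.intern(s);                // s.intern();
        Str s2 = pool.literal("1");    // String s2 = "1";

        Str s3 = new Str(lit.content + lit.content); // new String("1") + new String("1")
        pool.intern(s3);               // s3.intern();
        Str s4 = pool.literal("11");   // String s4 = "11";

        return new boolean[] { s == s2, s3 == s4 };
    }

    public static void main(String[] args) {
        boolean[] jdk6 = run(true);
        boolean[] jdk7 = run(false);
        System.out.println("JDK 6 model: " + jdk6[0] + " " + jdk6[1]); // JDK 6 model: false false
        System.out.println("JDK 7 model: " + jdk7[0] + " " + jdk7[1]); // JDK 7 model: false true
    }
}
```

The only line that differs between the two runs is the choice of canonical object inside intern(), which matches the explanation above: the single behavioral change is whether the pool refers to a copy or to the original heap object.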

 


Origin www.cnblogs.com/ouhaitao/p/12132113.html