Let’s talk about String, StringBuilder and StringBuffer in Java

The first thing to know about these three classes is that String is immutable, while StringBuilder and StringBuffer are mutable. Let's start with String: why it is designed to be immutable, and how that immutability is achieved.

Why is String designed to be immutable?

Strings are probably the most commonly used data type in everyday development. If every string were created as an ordinary, independent object, the heap would fill up with many objects holding the same value, which wastes memory and puts extra pressure on the garbage collector.

Because String is immutable, multiple references with the same value can safely point to a single shared string object, which can greatly reduce heap usage. In addition, the hash code is cached inside the String, so it does not need to be recomputed each time the string is used as a key in a hash-based collection such as HashMap, which improves performance.
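As a small illustration of the sharing described above, the sketch below compares two identical literals by reference and uses a string as a HashMap key. The class and method names are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class SharedLiteralDemo {
    // Two identical literals resolve to the same pooled object,
    // so a reference comparison (==) succeeds.
    static boolean literalsShareOneObject() {
        String first = "hello";
        String second = "hello";
        return first == second;
    }

    // Strings with equal content have equal hash codes, so a map entry
    // stored under one instance can be found with another instance.
    static int lookupWithEqualKey() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 1);
        String equalButDistinct = new String("hello");
        return counts.get(equalButDistinct);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject()); // true
        System.out.println(lookupWithEqualKey());     // 1
    }
}
```

If String were mutable, changing a key after it was inserted would leave the entry stored under a stale hash, which is one practical reason immutable keys are preferred.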

Immutability also makes String thread-safe. A thread cannot change the contents of an existing String; any "modification" creates a new object or returns a reference to an existing one, so other threads never see the current value change. It is also good for security: immutable content can be trusted, whereas a value that could be changed at will (for example, after it has been validated) could not be.

How does String achieve immutability?

First, take a look at the source code in JDK 1.8:


public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }

    public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }
}

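From the source code we can see that the class itself is declared final, so it cannot be subclassed, and that the characters are stored in a private char array declared final. Note that final only prevents the value field from being reassigned to another array; the array's elements could in principle still be changed. Immutability holds because the array is private, String never exposes it, and no method modifies it after construction. Methods such as substring and concat return either this (when nothing changes) or a new String object.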
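The behaviour described above can be checked with a short sketch. The class and method names are invented for this example; only substring and concat come from the source shown.

```java
class ImmutabilityDemo {
    // concat builds a new String; the receiver keeps its old value.
    static boolean originalUnchangedAfterConcat() {
        String s = "abc";
        String joined = s.concat("def");
        return s.equals("abc") && joined.equals("abcdef") && joined != s;
    }

    // When nothing changes, the same object is returned instead of a copy.
    static boolean noOpReturnsSameObject() {
        String s = "abc";
        return s.substring(0) == s && s.concat("") == s;
    }

    public static void main(String[] args) {
        System.out.println(originalUnchangedAfterConcat()); // true
        System.out.println(noOpReturnsSameObject());        // true
    }
}
```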

Extension: compact strings in Java 9 and later

In Java 9 and later, the internal storage of String changed from a char array to a byte array. This is a memory optimization. Why do it? Previously Java stored strings internally as UTF-16, which means that even a character that could be represented in one byte still occupied two. That wastes space, because in many cases all the characters of a string can be encoded in Latin-1 (ISO-8859-1, a single-byte encoding of 256 characters whose first 128 are ASCII). This is why "Compact Strings" were introduced. So how does String know whether it is using UTF-16 or Latin-1?

The String class defines a field named coder, which records the encoding used for the string. The bytes are stored in the same byte array either way; coder tells each method how to interpret them. For example, indexOf checks this field to decide whether to search the array as Latin-1 or as UTF-16.
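The coder field is private, so it cannot be read directly from ordinary code. The sketch below only models the decision: a string can use the one-byte form when every char is at most 0xFF. Both helper methods are hypothetical and are not part of the JDK.

```java
class CompactStringSketch {
    // Hypothetical helper: true when every char fits in a single byte,
    // i.e. lies in the Latin-1 range 0x00..0xFF.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: bytes of character data under this model,
    // one byte per char for Latin-1 and two per char for UTF-16.
    static int estimatedBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(estimatedBytes("hello"));        // 5
        System.out.println(estimatedBytes("\u4f60\u597d")); // 4
    }
}
```

Note that a single character outside Latin-1 is enough to push the whole string to the two-byte form in this model.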

What happens when you call new String()? Is only one object created?

Java objects have a defined layout when stored in the JVM, known as the object model. Besides the instance data, each object carries two pieces of information: the object header, which stores runtime information such as the lock state and the owning thread, and the class metadata pointer, which points to the class information. I will write about this part of the JVM separately later.

Using new always creates an object on the heap, but strings have a special case: string constants in the constant pool. A string literal is written into the class file's constant pool at compile time. When the class is loaded by a ClassLoader, its entries move from the class file constant pool into the runtime constant pool. (Since JDK 7, the string constant pool has lived in the heap rather than the permanent generation, which makes pooled strings easier to manage and avoids exhausting the fixed-size permanent generation.) The string constant pool involves both references and objects: the references are kept in the String Table, and they point to String instances on the heap. An object produced by new String("abc") is a separate heap instance, distinct from the pooled one that the literal "abc" refers to. So new String("abc") may create two objects: the pooled string for the literal, if it is not already in the string constant pool, and the new instance on the heap. Whether one or two objects are created depends on whether the string already exists in the constant pool.
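A minimal sketch of the difference between a literal and new String, using reference comparison to tell the objects apart. The class and method names are invented for this example.

```java
class NewStringDemo {
    // new always yields a fresh heap object, distinct from the pooled literal.
    static boolean newCreatesDistinctObject() {
        String literal = "abc";
        String created = new String("abc");
        return literal != created && literal.equals(created);
    }

    // intern() returns the pooled instance, which is the one the literal uses.
    static boolean internReturnsPooledObject() {
        String literal = "abc";
        String created = new String("abc");
        return created.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(newCreatesDistinctObject());  // true
        System.out.println(internReturnsPooledObject()); // true
    }
}
```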

intern

The string constant pool has been mentioned several times above. Put simply, it holds strings whose values are already known at compile time, such as in the following code:


public static void main(String[] args) {
    String a = "abc";
    String b = "def";
    String c = "abc" + "def";
}

After decompilation the last line reads String c = "abcdef"; because when two constants are joined with +, the compiler folds them into a single constant. The other case is concatenating variables:



public static void main(String[] args) {
    String a = "abc";
    String b = "def";
    String c = a + b;
}

Compiled with JDK 8, the decompiled result is:



String c = (new StringBuilder()).append(a).append(b).toString();

The result of this concatenation is computed at run time and does not enter the constant pool. But what if such a string is used frequently? This is where intern comes in. It does two things: if the string constant pool does not yet contain an equal string, it adds this one; and it returns a reference to the pooled string.
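The three cases discussed above (constant folding, run-time concatenation, and intern) can be compared side by side. This is a sketch with invented class and method names; a and b are deliberately ordinary local variables, not final constants.

```java
class InternDemo {
    // Two literals joined with + are folded by the compiler into one constant.
    static boolean constantsAreFolded() {
        String c = "abc" + "def";
        return c == "abcdef";
    }

    // Joining variables happens at run time and yields a new, unpooled object.
    static boolean runtimeConcatIsPooled() {
        String a = "abc";
        String b = "def";
        String c = a + b;
        return c == "abcdef";
    }

    // intern() maps the run-time result back to the pooled instance.
    static boolean internedConcatIsPooled() {
        String a = "abc";
        String b = "def";
        String c = (a + b).intern();
        return c == "abcdef";
    }

    public static void main(String[] args) {
        System.out.println(constantsAreFolded());     // true
        System.out.println(runtimeConcatIsPooled());  // false
        System.out.println(internedConcatIsPooled()); // true
    }
}
```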

Extension: does String have a length limit?

The answer is yes, and the limit differs by context. A string literal at compile time is limited to 65535 bytes, and a string at run time is limited to the maximum int value, 2^31-1, because its length is an int. The compile-time limit comes from the Java Virtual Machine Specification. Roughly speaking, string constants are represented in the class file by a CONSTANT_Utf8_info structure:

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

Here u2 denotes an unsigned 2-byte number. One byte is 8 bits, so 2 bytes are 16 bits, and the maximum value of length is 2^16-1 = 65535.
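The compiler's own check cannot be run from a program, but DataOutputStream.writeUTF uses the same idea of a 2-byte length prefix and rejects strings whose encoded form is longer than 65535 bytes. The sketch below uses it as an analogy only; the helper names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

class Utf8LengthLimitDemo {
    // Largest value an unsigned 16-bit length field can hold.
    static final int MAX_U2 = (1 << 16) - 1;

    // Hypothetical helper: builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // True when writeUTF accepts the string, i.e. its encoded length
    // fits into the 2-byte length prefix.
    static boolean fitsInU2Length(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(MAX_U2);                                // 65535
        System.out.println(fitsInU2Length(repeat('a', MAX_U2)));     // true
        System.out.println(fitsInU2Length(repeat('a', MAX_U2 + 1))); // false
    }
}
```

Because the limit is counted in encoded bytes, a string of non-ASCII characters reaches it with fewer characters than an ASCII string does.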

Both StringBuilder and StringBuffer are mutable, and StringBuffer is thread-safe.

Both StringBuilder and StringBuffer extend AbstractStringBuilder, which has two fields:


char[] value;

/**
 * The count is the number of characters used.
 */
int count;

Neither field is declared final, so the array can be replaced with a larger one and its contents changed in place, which is what makes these classes mutable. Now take a look at the append source code:



public AbstractStringBuilder append(StringBuffer sb) {
    if (sb == null)
        return appendNull();
    int len = sb.length();
    ensureCapacityInternal(count + len);
    sb.getChars(0, len, value, count);
    count += len;
    return this;
}
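To make the append method above easier to follow, here is a stripped-down builder that does the same steps by hand. TinyCharBuilder is a hypothetical class written for this article, and its growth rule (double the capacity plus 2) is an assumption chosen for illustration, not a statement about every JDK version.

```java
import java.util.Arrays;

class TinyCharBuilder {
    private char[] value = new char[4]; // backing array, replaced when it is too small
    private int count;                  // number of characters actually used

    // Grow the backing array when the requested capacity does not fit.
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > value.length) {
            int newCapacity = Math.max(value.length * 2 + 2, minCapacity);
            value = Arrays.copyOf(value, newCapacity);
        }
    }

    // Same shape as the JDK method: make room, copy the chars in, advance count.
    TinyCharBuilder append(String s) {
        int len = s.length();
        ensureCapacity(count + len);
        s.getChars(0, len, value, count);
        count += len;
        return this;
    }

    int length() {
        return count;
    }

    int capacity() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        TinyCharBuilder b = new TinyCharBuilder().append("abc").append("defgh");
        System.out.println(b);            // abcdefgh
        System.out.println(b.capacity()); // 10
    }
}
```

Returning this from append is what allows calls to be chained, as in the decompiled concatenation shown earlier.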

In short, append does two things: it expands the capacity if needed, then copies the characters in. StringBuffer overrides the append method:



@Override
public synchronized StringBuffer append(String str) {
    toStringCache = null;
    super.append(str);
    return this;
}

The synchronized keyword makes the method lock on the StringBuffer instance, so only one thread at a time can run it. This is what makes StringBuffer thread-safe.
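A quick way to see the effect is to let several threads append to one StringBuffer and check that no character is lost. This is a sketch with an invented class name and arbitrary thread and iteration counts.

```java
class SyncAppendDemo {
    // Starts the given number of threads, each appending one char at a time
    // to a shared StringBuffer, and returns the final length.
    static int appendConcurrently(int threads, int appendsPerThread) {
        StringBuffer buffer = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    buffer.append('x'); // synchronized, so appends never interleave
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 1000)); // 4000
    }
}
```

Replacing StringBuffer with StringBuilder in this sketch removes that guarantee: the final length may come out short or an exception may be thrown, because StringBuilder's append is not synchronized. In single-threaded code StringBuilder is usually preferred, since it avoids the locking cost.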


Origin blog.csdn.net/m0_71777195/article/details/132975014