String memory optimization

【Preface】

Strings often occupy a large share of memory and leave a lot of room for optimization, so optimizing string memory is an essential step in any memory optimization effort.

There are three core principles of string memory optimization:

1. Reuse strings to reduce the number of strings

2. Reduce the memory occupied by non-reusable strings

3. Reduce the GC string memory (temporary strings left for the garbage collector) generated at runtime

Each of these three principles leads to some specific methods.

[Reuse strings to reduce the number of strings]

A large number of duplicate strings typically arises in two situations. First, the same string is written in many different places: the same person wrote it at different times and forgot the earlier copies, different people were unaware of each other's code, or the value could not be passed between the places that need it. Second, some fragments are used repeatedly during string concatenation.

The solution is as follows:

1. Resident pool (the CLR intern pool)

The CLR gives string allocation special treatment and maintains an internal resident pool. Conceptually it is a dictionary whose key is the string content and whose value is a reference to a string object allocated on the heap. If the requested string is already in the dictionary, the CLR returns the existing reference instead of allocating a new string and returning its address. The CLR uses this dictionary in the following cases:

  • Creating a string object from a literal (literals have definite values at compile time, so many identical strings can be found and merged)
  • Creating a string object with string.Intern() (this lets us put a string into the resident pool at runtime; the disadvantage is that the string stays in the pool and keeps occupying memory even after we no longer need it)
  • Creating a string object by concatenating literals (literal + literal), which the compiler folds into a single literal
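The three cases above can be observed with reference comparisons. This is a minimal sketch; `InternDemo` and `BuildAtRuntime` are hypothetical names chosen for illustration.

```csharp
using System;

public static class InternDemo
{
    // Builds "hello" at runtime, so the result is a fresh heap object
    // and is not taken from the resident pool.
    public static string BuildAtRuntime()
    {
        return new string(new[] { 'h', 'e', 'l', 'l', 'o' });
    }

    public static void Run()
    {
        string a = "hello";          // literal: comes from the resident pool
        string b = "hel" + "lo";     // literal + literal: folded to "hello" at compile time
        string c = BuildAtRuntime(); // same content, different object

        Console.WriteLine(object.ReferenceEquals(a, b));                // True
        Console.WriteLine(object.ReferenceEquals(a, c));                // False
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True
    }
}
```

Calling `InternDemo.Run()` prints the three results. `string.Intern(c)` returns the pooled instance, so the runtime-built copy `c` can then be dropped and collected.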

2. Create your own dictionary similar to the resident pool

As a project grows more complex, the CLR's resident pool alone is often not enough, and you need to build your own string dictionary, similar to the resident pool, to reuse strings.

3. Avoid reading the same text or string-bearing resources in different places, and try to ensure that only one copy of each resource exists in memory.
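A custom pool of this kind might look like the sketch below. `StringPool` and its members are hypothetical names, not part of the CLR; the class is not thread-safe and is meant only to show the idea.

```csharp
using System.Collections.Generic;

// Hypothetical helper: a user-managed counterpart of the resident pool.
public sealed class StringPool
{
    private readonly Dictionary<string, string> _pool = new Dictionary<string, string>();

    // Returns the pooled instance equal to value, adding value if it is new.
    public string GetOrAdd(string value)
    {
        if (value == null) return null;

        string existing;
        if (_pool.TryGetValue(value, out existing)) return existing;

        _pool.Add(value, value);
        return value;
    }

    public int Count
    {
        get { return _pool.Count; }
    }

    // Unlike the resident pool, this pool can be emptied once its strings are no longer needed.
    public void Clear()
    {
        _pool.Clear();
    }
}
```

Passing every string read at runtime through `GetOrAdd` keeps one instance per distinct value, and `Clear` addresses the drawback of string.Intern() noted above.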

[Reduce the memory occupied by non-reusable strings]

Strings usually cannot be reused because they are bound to resources, or are part of resources, and have to be read into the program at runtime. To reduce the memory these strings take, their length must be reduced.

We use a string to represent a situation or a thing. As long as different situations or things get different strings, the strings and the situations correspond one-to-one, so a string can be shortened freely provided it stays unique.

Before that, you need to understand the memory layout of a string (taking 32-bit as an example):

1. The object header that every reference type requires: 8 bytes (a 4-byte sync block index and a 4-byte type object pointer)

2. An int32 field that records the length of the string (this is why getting the length is the fastest of all string operations: it is a direct field read. It also limits the maximum length of a string)

3. The null character at the end, which occupies 2 bytes

4. The remaining part is the character content stored in the character buffer, which the null character from item 3 terminates

Therefore, a string occupies 14 (8 + 4 + 2) + 2 × (number of characters) bytes. For example, the string "HelloWorld" needs 14 + 2 × 10 = 34 bytes. Note that the C# string type uses UTF-16 encoding, so each character occupies 2 bytes.

This is only the size the string itself requires. Because .NET's GC uses 32-bit memory alignment here (that is, every allocation is an integer multiple of 4 bytes), "HelloWorld" actually occupies 36 bytes.

(By the same formula, the empty string string.Empty needs 14 bytes, which is 16 bytes after alignment)
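The size formula and the 4-byte alignment above can be written as a small calculation. `StringSize32` is a hypothetical helper that only restates the arithmetic of this section for the 32-bit case; it does not measure real objects.

```csharp
public static class StringSize32
{
    // 8-byte object header + 4-byte length field + 2-byte null terminator
    // + 2 bytes per UTF-16 character.
    public static int RawBytes(int charCount)
    {
        return 8 + 4 + 2 + 2 * charCount;
    }

    // Rounds the raw size up to the next multiple of 4 bytes.
    public static int AlignedBytes(int charCount)
    {
        return (RawBytes(charCount) + 3) & ~3;
    }
}
```

For "HelloWorld" (10 characters), `RawBytes(10)` gives 34 and `AlignedBytes(10)` gives 36, matching the figures in the text.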

The solution is as follows:

1. Shorten the string as far as its meaning allows, for example by replacing "Chinese" with "CN"

2. Some strings, such as paths, cannot be shortened. In that case you can hash the strings and store the hash values instead. Hash collisions are possible, so you must verify that each string corresponds to a unique hash value.

3. When the set of possible strings is limited, use byte, uint, or similar types instead of strings and keep a mapping table. For example, a byte can distinguish up to 256 different strings.

4. Use byte[] instead of string (for example, ASCII-only text stored as UTF-8 needs one byte per character instead of two)

5. Analyze the business logic: strings that are never needed at the same time can be loaded on demand

6. Compress long but infrequently used strings and decompress them when needed (note that this costs performance and generates GC garbage)
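Method 2 (hashing paths) could be sketched as below. `PathIdRegistry` is a hypothetical helper, and the FNV-1a style hash over UTF-16 code units is an illustrative choice, not something the original prescribes. Since a hash alone does not guarantee uniqueness, the sketch checks for collisions when a path is registered, for example in a build step.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: replaces long path strings with 32-bit ids.
public sealed class PathIdRegistry
{
    private readonly Dictionary<uint, string> _seen = new Dictionary<uint, string>();

    // FNV-1a style hash computed over the UTF-16 code units of the string.
    public static uint Hash(string s)
    {
        unchecked
        {
            uint h = 2166136261;
            for (int i = 0; i < s.Length; i++)
            {
                h ^= s[i];
                h *= 16777619;
            }
            return h;
        }
    }

    // Registers a path and returns its id; throws if two different paths share a hash.
    public uint Register(string path)
    {
        uint id = Hash(path);

        string existing;
        if (_seen.TryGetValue(id, out existing))
        {
            if (existing != path)
                throw new InvalidOperationException("Hash collision: " + existing + " vs " + path);
            return id;
        }

        _seen.Add(id, path);
        return id;
    }
}
```

At runtime only the 4-byte id needs to be kept; the registry itself can live in tooling rather than in the shipped program.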

With these six methods, the memory occupied by strings can usually be reduced by more than 80%.
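Methods 3 and 4 might look like the following sketch. The names `Lang`, `LangTable`, and `Utf8Text` and the sample values are hypothetical. Note that a byte[] has its own object header and that unpacking allocates a new string, so this pays off mainly for data that is stored much more often than it is read.

```csharp
using System.Text;

// Method 3: a small closed set of values stored as one byte each.
public enum Lang : byte
{
    CN = 0,
    EN = 1,
    JP = 2
}

public static class LangTable
{
    // Mapping table from the compact value back to the full string.
    private static readonly string[] Names = { "Chinese", "English", "Japanese" };

    public static string NameOf(Lang lang)
    {
        return Names[(byte)lang];
    }
}

// Method 4: keep text as UTF-8 bytes and build a string only when it is needed.
public static class Utf8Text
{
    public static byte[] Pack(string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    public static string Unpack(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For example, `Utf8Text.Pack("HelloWorld")` holds 10 bytes of character data, compared with 20 bytes of UTF-16 data inside the string.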

[Reduce the GC string memory generated at runtime]

1. When writing code, watch for places where strings may generate garbage for the GC, especially in high-frequency code paths

2. In some places garbage is unavoidable, such as number-to-string conversion. In that case you can set aside a dedicated block of resident memory, perform all string operations inside it, and reuse it continuously, so that no GC is triggered. This is also the principle behind zero-GC string schemes. Generally speaking, the zstring solution is sufficient. The project address is as follows: GitHub - 871041532/zstring: C# string zero GC supplementary solution
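The reusable-memory idea can be illustrated with a number-to-text conversion that writes into a caller-owned buffer. This is a minimal sketch of the principle only; it is not zstring's API, and `IntFormatter` and `Write` are hypothetical names.

```csharp
public static class IntFormatter
{
    // Longest int is "-2147483648", which has 11 characters.
    public const int MaxChars = 11;

    // Writes the decimal text of value into dest starting at index 0 and
    // returns the number of characters written. Allocates nothing.
    public static int Write(int value, char[] dest)
    {
        bool negative = value < 0;
        // Work with a non-positive number so that int.MinValue needs no special case.
        int n = negative ? value : -value;

        int digits = 1;
        for (int t = n / 10; t != 0; t /= 10) digits++;

        int length = digits + (negative ? 1 : 0);
        int pos = length;
        do
        {
            dest[--pos] = (char)('0' - (n % 10)); // n % 10 is in the range -9..0
            n /= 10;
        } while (n != 0);

        if (negative) dest[0] = '-';
        return length;
    }
}
```

A caller would allocate one `char[IntFormatter.MaxChars]` up front and reuse it for every conversion, passing the buffer and the returned length to an API that accepts a character range, such as `StringBuilder.Append(char[], int, int)`.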

[Operations related to string performance]

1. To create an empty string, use string s = string.Empty instead of string s = ""

2. Use StringBuilder for high-frequency string concatenation

3. Methods such as ToUpper and ToLower create a new string, so check whether they can be avoided

4. When the comparison is expected to be true, "value" == str is the fastest; when it is expected to be false, "value".Equals(str) is the fastest

5. String comparison operation
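Points 2 and 3 of this list can be sketched as follows. `StringOps` and its methods are hypothetical names; using `StringComparison.OrdinalIgnoreCase` in place of ToUpper/ToLower is one common way to avoid the extra strings, assuming a culture-independent comparison is acceptable.

```csharp
using System;
using System.Text;

public static class StringOps
{
    // Appends into one StringBuilder instead of creating a new
    // intermediate string for every "+=".
    public static string Join(string[] parts, char separator)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(separator);
            sb.Append(parts[i]);
        }
        return sb.ToString();
    }

    // Compares without the temporary strings that ToUpper/ToLower would create.
    public static bool SameIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, `StringOps.Join(new[] { "a", "b", "c" }, ',')` returns "a,b,c", and `StringOps.SameIgnoringCase("Player", "PLAYER")` returns true.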

Origin blog.csdn.net/enternalstar/article/details/126842396