Coding tips to reduce GC overhead

 

In this article, let's look at five techniques for making code more efficient by reducing the work our garbage collector (GC) has to do: fewer allocations mean less CPU time spent allocating and freeing memory, and lower GC overhead. Long GC pauses during memory reclamation can freeze our application entirely (the infamous "stop-the-world" pause).

 

Background

 

The GC exists to handle the allocation of large numbers of short-lived objects (think of opening a web page: most of the objects allocated while the page loads are discarded once it is rendered).

 

The GC uses a region of the heap called the "young generation" to do this: newly created objects are allocated there. Each object has an "age" (stored in the object's header) that counts how many garbage collections it has survived. Objects that survive a collection are first copied between the young generation's "survivor spaces", and once they reach a certain age they are promoted to another region of the heap, the "old generation". (For a deeper walkthrough of Java GC, see http://www.importnew.com/1993.html.)

 

While this design is effective, it comes at a significant cost. Reducing the number of ephemeral allocations can genuinely improve throughput, especially in big-data environments or in resource-constrained applications.

 

The following five techniques use memory more efficiently, don't take much effort to apply, and won't reduce code readability.

 

1. Avoid implicit Strings

 

Strings are an integral part of almost every data structure we manage. They cannot be modified after allocation; for example, the "+" operator allocates a new String holding the concatenation of its two operands. To make matters worse, an implicit StringBuilder object is allocated to perform that concatenation.

 

For example:

 

a = a + b; // a and b are Strings

 

The compiler will generate a piece of code like this behind the scenes:

 

StringBuilder temp = new StringBuilder(a);

temp.append(b);

a = temp.toString(); // a new String object is allocated

// The first object "a" is now garbage

 

And it gets worse.

 

Let's look at this example:

 

String result = foo() + arg;

result += boo();

System.out.println("result = " + result);

 

In this example, three StringBuilder objects are allocated behind the scenes (one for each "+" operation), plus two additional String objects: one holding the result of the second concatenation and one holding the String passed to the print method. That's five extra objects in what looks like a very simple statement.

 

Just imagine what happens in a real code scenario, for example when generating a web page from XML or from text in a file. Inside nested loop structures, you can find hundreds of objects being allocated implicitly. Even though the VM has mechanisms for dealing with this garbage, it still carries a real cost, and that cost may be borne by your users.

 

Solution:

 

One way to reduce this garbage is to use StringBuilder explicitly. The following example achieves the same result as the code above, but allocates only one StringBuilder object and one String object to hold the final result.

 

StringBuilder value = new StringBuilder("result = ");

value.append(foo()).append(arg).append(boo());

System.out.println(value);

 

By staying mindful of where Strings and StringBuilders may be allocated implicitly, you can reduce the number of short-lived objects allocated, especially in hot code paths.
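The loop case is where this matters most. A minimal sketch (the class and method names here are illustrative, not from the original article): concatenating with "+" inside a loop allocates a fresh StringBuilder and String on every iteration, while reusing one StringBuilder allocates once.

```java
public class StringJoin {
    // "+" in a loop: each iteration implicitly does
    // new StringBuilder(s).append(p).toString(), producing garbage.
    static String joinPlus(String[] parts) {
        String s = "";
        for (String p : parts) {
            s = s + p;
        }
        return s;
    }

    // One StringBuilder reused across the loop, one final String.
    static String joinBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"result", " = ", "42"};
        System.out.println(joinPlus(parts));    // result = 42
        System.out.println(joinBuilder(parts)); // result = 42
    }
}
```

Both produce the same String; only the allocation behavior differs.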

 

2. Plan the capacity of the List

 

Dynamic collections like ArrayList are the basic structures for storing variable-length data. ArrayList and several other collections (such as HashMap and TreeMap) are implemented on top of an Object[] array, just as String wraps a char[] array, and the size of that backing array is fixed. So if the array size is fixed, how can we keep adding records to the collection? The answer is obvious: by allocating a bigger array.

 

See the example below:

 

List<Item> items = new ArrayList<Item>();

for (int i = 0; i < len; i++)
{
    Item item = readNextItem();
    items.add(item);
}

 

The value of len determines the final size of items once the loop finishes. However, the ArrayList constructor doesn't know this value up front, so it allocates an Object array of default size. Whenever the internal array overflows, it is replaced by a new, sufficiently larger one, turning the previously allocated array into garbage.

 

If the loop executes thousands of times, it forces that many more allocations of new arrays and collections of old ones. For code running at scale, these allocations and deallocations should be eliminated from CPU cycles wherever possible.
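The growth pattern can be sketched with a tiny model. This assumes OpenJDK's defaults, an initial capacity of 10 and a growth factor of roughly 1.5 (newCapacity = oldCapacity + oldCapacity/2); other JVM implementations may differ, and the class name is illustrative.

```java
public class ArrayListGrowth {
    // Count how many times the backing array must be reallocated (and the
    // old contents copied) to hold n elements, under the assumed defaults.
    static int reallocations(int n) {
        int cap = 10;   // assumed default initial capacity
        int count = 0;
        while (cap < n) {
            cap += cap >> 1; // grow by ~50%; the old array becomes garbage
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Filling a default-sized list with 100,000 items under this model
        // costs 23 reallocations; sizing it up front costs zero.
        System.out.println(reallocations(100_000));
    }
}
```

Each of those reallocations also copies every element already stored, so the cost compounds as the list grows.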

 

Solution:

 

Whenever possible, assign an initial capacity to a List or Map, like this:

 

List<MyObject> items = new ArrayList<MyObject>(len);

 

Because the List is initialized with enough capacity, this avoids unnecessary allocation and deallocation of internal arrays at runtime. If you don't know the exact size, it's best to estimate the average value and add some headroom to prevent accidental overflow.

 

3. Use efficient collections with primitive types

 

Current versions of Java support collections keyed by primitive types only through "boxing": autoboxing wraps a primitive value in a corresponding object, which the GC must then allocate and reclaim.

 

This has negative consequences. Java implements most collections on top of internal arrays, and for every key/value record added to a HashMap, an internal entry object holding the key and value is allocated. So every insertion into a map carries an extra allocation and eventual deallocation, and may also force the internal array to grow and be reallocated. When dealing with maps holding hundreds of thousands of records or more, these internal allocations drive up GC cost.

 

A common case is a mapping between a primitive type (such as an id) and an object. Since Java's HashMap is designed to hold only object types (not primitives), every map insertion may allocate an extra object just to hold the primitive key (i.e., boxing).

 

The Integer.valueOf method caches the values between -128 and 127, but for every value outside that range a new object is allocated on top of the internal key/value entry object. This can potentially triple the GC overhead of the map. This is disturbing news for a C++ developer, since STL containers handle primitives with no such overhead.
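The cache boundary is easy to observe directly. This relies on the documented -128..127 Integer cache (the upper bound can be raised with JVM options, but the default behaves as below):

```java
public class BoxingCache {
    public static void main(String[] args) {
        // Inside the cache range: valueOf returns the same shared object.
        Integer a = Integer.valueOf(127);
        Integer b = Integer.valueOf(127);

        // Outside the range: every valueOf call allocates a fresh object.
        Integer c = Integer.valueOf(128);
        Integer d = Integer.valueOf(128);

        System.out.println(a == b); // true  -- cached, no allocation
        System.out.println(c == d); // false -- two distinct allocations
    }
}
```

Every autoboxed key above 127 in a HashMap insertion is one of those fresh allocations, on top of the map's own entry object.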

 

Fortunately, this problem may be addressed in a future version of Java. Until then, it can be handled by libraries that provide primitive-typed trees, maps, and lists. I highly recommend Trove; I've used it for a long time, and it can really reduce GC overhead in large-scale code.

 

4. Use Streams instead of in-memory buffers

 

In a server application, most of the data we manipulate arrives as files or as network streams from another web server or a database. In most cases, the incoming data is in serialized form and must be deserialized into Java objects before we can use it. This process is very prone to large numbers of implicit allocations.

 

The easiest approach is to read the data into memory through a ByteArrayInputStream or ByteBuffer and then deserialize it.

 

This is a bad move: the complete copy of the data needs heap space that is freed again immediately after deserialization. And since you don't know the size of the data in advance, you can only guess, so byte[] arrays have to be allocated and discarded as the buffer grows past its capacity.

 

The solution is very simple. Tools such as Java's own serialization streams and Google's Protocol Buffers can deserialize data directly from a file or network stream, without ever holding the whole payload in memory or allocating new byte arrays to accommodate growing data. If you can, compare this approach against loading the data into memory first; the GC will thank you.
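A minimal sketch with Java's built-in serialization, deserializing straight from a file stream rather than buffering the whole payload in a byte[] first. The temp-file setup here stands in for a real file or network socket:

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingDeserialization {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("payload", ".ser");
        try {
            // Produce a serialized payload (stand-in for data from the wire).
            try (ObjectOutputStream out =
                    new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(new int[] {1, 2, 3});
            }

            // Deserialize straight off the stream: our code never
            // materializes a byte[] holding the whole payload.
            try (ObjectInputStream in =
                    new ObjectInputStream(Files.newInputStream(tmp))) {
                int[] data = (int[]) in.readObject();
                System.out.println(data.length); // 3
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The same pattern applies to a socket's InputStream or to Protocol Buffers' parseFrom(InputStream): hand the stream to the deserializer instead of draining it into a buffer yourself.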

 

5. Aggregating lists

 

Immutability is great, but at scale it has a serious drawback, particularly when passing List objects between methods.

 

When a method returns a collection, it is usually wise to create the collection object (such as an ArrayList) inside the method, populate it, and return it as an immutable collection.

 

In some cases this doesn't scale well. The most obvious is when collections returned from multiple method calls are aggregated into a final collection. Because of immutability, large-scale data means a large number of temporary collections being allocated.

 

The solution in this case is not to return new collections, but to pass the aggregate collection as an argument into those methods.

 

Example 1 (inefficient):

 

List<Item> items = new ArrayList<Item>();

for (FileData fileData : fileDatas)
{
    // Each call creates a temporary list with its own internal array
    items.addAll(readFileItem(fileData));
}

 

Example 2:

 

List<Item> items =
    new ArrayList<Item>((int) (fileDatas.size() * avgFileDataSize * 1.5));

for (FileData fileData : fileDatas)
{
    readFileItem(fileData, items); // records are added internally
}

 

In Example 2, violating the immutability rule (which should normally be observed) saves N list allocations, along with any temporary array allocations. Your GC will be grateful.
