Why does minor GC duration change so much when an element of a large array is updated?

kostya :

I have the following simple program:

public class GCArrays {
    public static void main(String[] args) {
        Object[] bigArr = new Object[1 << 24];
        Object[] smallArr = new Object[1 << 12]; 

        bigArr[0x897] = new Object();
        smallArr[0x897] = new Object();

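        // long counter: an int can never reach 1e10, so the original loop would never terminate
        for (long i = 0; i < 10_000_000_000L; i++) {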
            smallArr[0x897] = new Object();  // (*)
            //bigArr[0x897] = new Object();
        }

        // to prevent bigArr and smallArr from being garbage collected
        bigArr[0x897] = new Object();
        smallArr[0x897] = new Object();
    }
}

When I run it using ParallelGC as the GC algorithm for the young generation:

java -classpath . -XX:InitialHeapSize=4G -XX:MaxHeapSize=4G -XX:NewRatio=3 -XX:+PrintGC -XX:+PrintGCDetails -XX:+UseParallelGC -XX:+UseParallelOldGC GCArrays

the average pause times I get are below 1ms:

[GC (Allocation Failure) [PSYoungGen: 1047584K->32K(1048064K)] 1113476K->65924K(4193792K), 0.0007385 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

However, if I change the line marked with (*) to modify bigArr instead of smallArr, the pause time increases to 10ms:

[GC (Allocation Failure) [PSYoungGen: 1047584K->32K(1048064K)] 1113468K->65916K(4193792K), 0.0101251 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]

Note that the program only modifies a single element of the array. However, it looks like the JVM still scans the whole array to find live objects during a minor collection. Is this guess about the longer GC pauses correct? Why does the whole array need to be scanned when only one element is modified?
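As a cross-check on the pause times read from the -XX:+PrintGC output, the same comparison could be made from inside the program with the standard java.lang.management API. The following is only a sketch: the class name GcPauseProbe, the array size and the iteration count are illustrative assumptions, and getCollectionTime() reports accumulated milliseconds, so it is too coarse for a single sub-millisecond pause and is only useful as an average over many collections.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseProbe {
    /** Sum of collection counts over all collectors; -1 ("undefined") is ignored. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Sum of accumulated collection time in milliseconds; -1 ("undefined") is ignored. */
    public static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical workload: swap in bigArr or smallArr from the program above.
        Object[] arr = new Object[1 << 20];
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        for (int i = 0; i < 50_000_000; i++) {
            arr[0x897] = new Object();
        }

        long collections = totalCollections() - countBefore;
        long millis = totalCollectionMillis() - millisBefore;
        System.out.println("collections: " + collections + ", total GC ms: " + millis);
        if (collections > 0) {
            System.out.println("average ms per collection: " + (double) millis / collections);
        }
    }
}
```

The totals include every collector the JVM reports, so a run that also triggers old-generation collections would mix those into the average.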

Alexey Ragozin :

This article explains the concept of dirty cards and their role in young GC.

In both cases, a single memory address in old space is "dirtied", and thus a single card. For a reference array spanning multiple cards (512-byte blocks), only the card covering the actually modified index subrange is marked.

As only a single card is dirtied, the GC only needs to scan the corresponding 512 bytes of memory.
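The card-marking idea can be illustrated with a toy model. This is not HotSpot code: the class CardTableSketch and its methods are made up for illustration, the array is assumed to start at address 0 with no object header, and references are assumed to be 4 bytes wide (compressed references). Only the 512-byte card size comes from the explanation above.

```java
import java.util.BitSet;

public class CardTableSketch {
    public static final int CARD_SIZE = 512; // bytes per card
    public static final int REF_SIZE = 4;    // assumption: 4-byte compressed references

    private final BitSet dirty;

    public CardTableSketch(long coveredBytes) {
        // One bit per card, rounded up to cover the whole range.
        dirty = new BitSet((int) ((coveredBytes + CARD_SIZE - 1) / CARD_SIZE));
    }

    /** Address of element 'index' in a reference array starting at 'arrayBase'. */
    public static long elementAddress(long arrayBase, int index) {
        return arrayBase + (long) index * REF_SIZE;
    }

    /** Toy write barrier: every reference store marks the card containing the address. */
    public void markDirty(long address) {
        dirty.set((int) (address / CARD_SIZE));
    }

    public int dirtyCards() {
        return dirty.cardinality();
    }

    /** Bytes a collector would visit if it scanned dirty cards only. */
    public long bytesToScan() {
        return (long) dirtyCards() * CARD_SIZE;
    }

    public static void main(String[] args) {
        long arrayBytes = (1L << 24) * REF_SIZE; // payload of bigArr under the 4-byte assumption
        CardTableSketch table = new CardTableSketch(arrayBytes);

        // Repeated stores to the same index keep hitting the same card.
        for (int i = 0; i < 1000; i++) {
            table.markDirty(elementAddress(0, 0x897));
        }

        System.out.println("dirty cards: " + table.dirtyCards());    // prints "dirty cards: 1"
        System.out.println("bytes to scan: " + table.bytesToScan()); // prints "bytes to scan: 512"
    }
}
```

In this model the work is proportional to the number of dirty cards, not to the array length, which is why the size of the array should not matter if the collector scans dirty cards only.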

With -XX:+UseConcMarkSweepGC, both the "smallArr" and "bigArr" versions show similar timings.

-XX:+UseConcMarkSweepGC + smallArr

[GC (Allocation Failure) [ParNew: 419458K->2K(471872K), 0.0015320 secs] 485365K->65909K(996160K), 0.0015635 secs] 
[Times: user=0.00 sys=0.00, real=0.00 secs]

-XX:+UseConcMarkSweepGC + bigArr

[GC (Allocation Failure) [ParNew: 419458K->2K(471872K), 0.0020550 secs] 485365K->65909K(996160K), 0.0020885 secs] 
[Times: user=0.00 sys=0.00, real=0.00 secs]

With -XX:+UseParallelOldGC, however, it seems that the GC has to scan the whole "bigArr".

-XX:+UseParallelOldGC + smallArr

[GC (Allocation Failure) [PSYoungGen: 522768K->16K(523520K)] 588691K->65939K(1047808K), 0.0009430 secs] 
[Times: user=0.00 sys=0.00, real=0.00 secs]

-XX:+UseParallelOldGC + bigArr

[GC (Allocation Failure) [PSYoungGen: 522768K->16K(523008K)] 588687K->65935K(1047296K), 0.0149276 secs] 
[Times: user=0.03 sys=0.00, real=0.02 secs]

-XX:+UseParallelOldGC + bigArr = new Object[1 << 25]

[GC (Allocation Failure) [PSYoungGen: 522768K->16K(523520K)] 654219K->131467K(1047808K), 0.0413473 secs]
[Times: user=0.09 sys=0.00, real=0.04 secs] 

Counterintuitively, ParallelOldGC and ConcMarkSweepGC use different implementations of a very similar young GC algorithm.

It looks like PSYoungGen is missing the optimization of scanning only the dirty portion of an object array.
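To put a suspected full-array scan in perspective, the following back-of-the-envelope sketch computes how much memory each array covers and how many 512-byte cards that is. The class name ArrayFootprint is made up, and the 4-byte reference width is an assumption (compressed references); the object header is ignored.

```java
public class ArrayFootprint {
    public static final int CARD_SIZE = 512; // bytes per card
    public static final int REF_SIZE = 4;    // assumption: 4-byte compressed references

    /** Bytes occupied by the reference slots of an Object[] of the given length. */
    public static long payloadBytes(int length) {
        return (long) length * REF_SIZE;
    }

    /** Number of cards needed to cover the payload, rounded up. */
    public static long cards(int length) {
        return (payloadBytes(length) + CARD_SIZE - 1) / CARD_SIZE;
    }

    public static void main(String[] args) {
        for (int shift : new int[] {12, 24, 25}) {
            int length = 1 << shift;
            System.out.printf("1 << %d: %d KiB, %d cards%n",
                    shift, payloadBytes(length) / 1024, cards(length));
        }
        // prints:
        // 1 << 12: 16 KiB, 32 cards
        // 1 << 24: 65536 KiB, 131072 cards
        // 1 << 25: 131072 KiB, 262144 cards
    }
}
```

The 65536 KiB and 131072 KiB figures are close to the heap occupancy left after collection in the logs above (about 65,9xxK and 131,467K), which is consistent with the 4-byte assumption. A scan of the whole array would therefore touch 64 MiB or 128 MiB instead of a single 512-byte card, in line with the pause growing from about 15 ms to about 41 ms when the array is doubled.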
