Is "GC overhead limit exceeded" a secondary reason for failing?

Andremoniy :

Prompted by this question: Error java.lang.OutOfMemoryError: GC overhead limit exceeded

Recently I had a debate with someone about this error.

In my understanding, this error by itself cannot be treated as a "primary" reason why the JVM fails.

What I mean is that excessive garbage collection is not in itself a reason for failure. Excessive garbage collection is always caused by a very small amount of free memory, which leads to frequent GC invocations (and the root cause can be a memory leak).

If I understood my opponent's position correctly, he believes that producing a lot of small objects eligible for GC causes them to be collected frequently, which leads to this error. So the culprit is not a memory leak or low memory limits, but the GC invocation frequency itself.

This is where our points of view differ.

In my understanding, it does not matter how many small objects eligible for GC your process produces (even if that is not a good design and you should probably reduce their number where possible). If you have enough memory and there is no apparent memory leak, then at some point the GC will collect large portions of such objects, so this should not be a problem. At the very least, it will not cause the system to crash.

To sum up my position briefly: if you get "GC overhead limit exceeded", then either you have some kind of memory leak, or you simply need to increase your memory limits.

To sum up my opponent's position briefly: if you produce a lot of small objects eligible for GC, that is already a problem, because this by itself can cause "GC overhead limit exceeded".

Am I wrong and missing something?

Alexandre Dupriez :

- Partial answer -

Note that I am using the OpenJDK (JDK 9) sources as the basis for my comments on this question. This answer does not rely on any documentation or published specification, and it includes some speculation based on my understanding and interpretation of the source code.

The "GC overhead limit exceeded" error is treated by the VM as a subtype of out-of-memory error, and it is generated after an attempt to allocate memory fails (see (a)).

Essentially, the VM counts the full garbage collections that exceed the overhead criteria and compares this count against a threshold. The criteria themselves can be tuned on HotSpot, for example the GC time limit with -XX:GCTimeLimit= (cf. Garbage Collector Ergonomics).
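To make these criteria concrete, they can be written out as below. This is my reading rather than something spelled out in the sources quoted here: gc_cost() is, roughly, an averaged fraction of time spent in GC; GCTimeLimit (default 98) and GCHeapFreeLimit (default 2) are percentages; free_old, free_eden, max_old and max_eden are shorthand for the free and maximum sizes of the old generation and of eden. This matches the documented behaviour of the parallel collector (the error is thrown when more than 98% of the total time is spent in GC and less than 2% of the heap is recovered). A full GC for which all three conditions hold increments the count, and the error is armed once the count reaches AdaptiveSizePolicyGCTimeLimitThreshold (5 by default, if I read globals.hpp correctly).

```latex
$$
\text{gc\_cost} > \frac{\text{GCTimeLimit}}{100}
\;\wedge\;
\text{free}_{\text{old}} < \frac{\text{GCHeapFreeLimit}}{100}\,\text{max}_{\text{old}}
\;\wedge\;
\text{free}_{\text{eden}} < \frac{\text{GCHeapFreeLimit}}{100}\,\text{max}_{\text{eden}}
$$
```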

The implementation of how the full GC count is tracked, and the logic that decides when the GC overhead limit is reached, are in one place: hotspot/src/share/vm/gc/shared/adaptiveSizePolicy.cpp. As you can see, two additional conditions, on the memory available in the old generation and in eden, must hold for the GC overhead limit criteria to be satisfied:

void AdaptiveSizePolicy::check_gc_overhead_limit(
                                      size_t young_live,
                                      size_t eden_live,
                                      size_t max_old_gen_size,
                                      size_t max_eden_size,
                                      bool   is_full_gc,
                                      GCCause::Cause gc_cause,
                                      CollectorPolicy* collector_policy) {
  ...
  if (is_full_gc) {
    if (gc_cost() > gc_cost_limit &&
      free_in_old_gen < (size_t) mem_free_old_limit &&
      free_in_eden < (size_t) mem_free_eden_limit) {
      // Collections, on average, are taking too much time, and
      //      gc_cost() > gc_cost_limit
      // we have too little space available after a full gc.
      //      total_free_limit < mem_free_limit
      // where
      //   total_free_limit is the free space available in
      //     both generations
      //   total_mem is the total space available for allocation
      //     in both generations (survivor spaces are not included
      //     just as they are not included in eden_limit).
      //   mem_free_limit is a fraction of total_mem judged to be an
      //     acceptable amount that is still unused.
      // The heap can ask for the value of this variable when deciding
      // whether to thrown an OutOfMemory error.
      // Note that the gc time limit test only works for the collections
      // of the young gen + tenured gen and not for collections of the
      // permanent gen.  That is because the calculation of the space
      // freed by the collection is the free space in the young gen +
      // tenured gen.
      // At this point the GC overhead limit is being exceeded.
      inc_gc_overhead_limit_count();
      if (UseGCOverheadLimit) {
        if (gc_overhead_limit_count() >= AdaptiveSizePolicyGCTimeLimitThreshold){
          // All conditions have been met for throwing an out-of-memory
          set_gc_overhead_limit_exceeded(true);
          // Avoid consecutive OOM due to the gc time limit by resetting
          // the counter.
          reset_gc_overhead_limit_count();
        } else {
          ...
        }
      }
      ...
    }
  }
  ...
}

(a) When is a GC overhead limit exceeded error generated?

The error is not actually raised during a collection, but when the VM attempts to allocate memory; you can find the justification for this in hotspot/src/share/vm/gc/shared/collectedHeap.inline.hpp:

HeapWord* CollectedHeap::common_mem_allocate_noinit(KlassHandle klass, size_t size, TRAPS) {
  ...
  bool gc_overhead_limit_was_exceeded = false;
  result = Universe::heap()->mem_allocate(size, &gc_overhead_limit_was_exceeded);
  ...
  // Failure cases
  if (!gc_overhead_limit_was_exceeded) {
    report_java_out_of_memory("Java heap space");
    ...
  } else {
    report_java_out_of_memory("GC overhead limit exceeded");
    ...
  }
  ...
}

(b) Note about the G1 implementation

Looking at the mem_allocate method of the G1 implementation (in g1CollectedHeap.cpp), it appears that the boolean gc_overhead_limit_was_exceeded is no longer used. I would not be too quick to conclude that the GC overhead limit error can no longer occur when G1 is enabled, though; I need to check this.

Conclusion

  • It seems you were right in that this error genuinely comes from memory exhaustion;

  • The argument that this error can be generated based on the number of times small objects are collected does not seem right to me, because

    1. We saw that the VM does need to run out of memory for this error to occur;
    2. Independently of the first reason, the statement would need to be refined anyway, especially the reference to small objects. Are we talking about young generation collections only? If so, these collections are not included in the GC count checked against the limit, and therefore could never contribute to this error, whether or not the VM runs out of memory.
