Can someone make sense of the G1 garbage collector output?

A blog post explains every part of the GC log. Here is a quote:

0.522: [GC pause (young), 0.15877971 secs]

This is the highest level information telling us that it is an Evacuation Pause that started at 0.522 secs from the start of the process, in which all the regions being evacuated are Young i.e. Eden and Survivor regions. This collection took 0.15877971 secs to finish.

Evacuation Pauses can be mixed as well. In which case the set of regions selected include all of the young regions as well as some old regions.

1.730: [GC pause (mixed), 0.32714353 secs]

Let's take a look at all the sub-tasks performed in this Evacuation Pause.

[Parallel Time: 157.1 ms]

Parallel Time is the total elapsed time spent by all the parallel GC worker threads. The following lines correspond to the parallel tasks performed by these worker threads in this total parallel time, which in this case is 157.1 ms.

[GC Worker Start (ms): 522.1 522.2 522.2 522.2 Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1]

The first line tells us the start time of each of the worker thread in milliseconds. The start times are ordered with respect to the worker thread ids – thread 0 started at 522.1ms and thread 1 started at 522.2ms from the start of the process. The second line tells the Avg, Min, Max and Diff of the start times of all of the worker threads.


Disclaimer

I am in no way well versed on the Garbage-First Garbage Collector and this question inspired me to check it out for the first time. There is a chance I may be wrong with some of my information.

Intro

To get more information about the Garbage-First Garbage Collector (G1GC), the Garbage-First Garbage Collection paper (here, here, here and here) is an invaluable resource. You can find an introduction to the G1GC here. The HotSpot Glossary comes in handy to understand JVM terms. The following papers were also helpful in further understanding Garbage Collection:

  • A Hard Look at Hard Real-Time Garbage Collection: here and here
  • Non-Blocking Garbage Collection for Multiprocessors: here and here
  • Non-blocking Root Scanning for Real-Time Garbage Collection: here
  • Non-blocking Object Copy for Real-Time Garbage Collection: here and here

With these resources and an OpenJDK 7 debug build, you can start to understand the G1GC log.

Definitions

Using the papers and web pages referenced above, here are some useful definitions of terms that came up most often:

  • concurrent marking: Both provides collection ”completeness” and identifies regions ripe for reclamation via compacting evacuation. Provides collector completeness without imposing any order on region choice for collection sets. Provides the live data information that allows regions to be collected in “garbage-first” order.
  • heap: The area of memory used by the JVM for dynamic memory allocation.
  • heap region: The heap is partitioned into a set of equal-sized heap regions. The Garbage-First heap is divided into equal-sized heap regions, each a contiguous range of virtual memory.
  • marking bitmap: Contains one bit for each address that can be the start of an object.
  • remembered set: A data structure that indicates heap locations outside the region that may contain pointers to objects in the region. Each region has an associated remembered set, which indicates all locations that might contain pointers to (live) objects within the region. Garbage-First remembered sets record pointers from all regions (with some exceptions). A current buffer or sequence of modified cards. A data structure that records pointers between generations.
  • root set: A set of objects which is known to be directly accessible. The locations from which all live objects are reachable.

Source Code

To understand the G1GC log better, I used the following OpenJDK 7 source files:

  • hotspot/src/share/vm/gc_implementation/g1/concurrentMarkThread.cpp
  • hotspot/src/share/vm/gc_implementation/g1/concurrentMark.cpp
  • hotspot/src/share/vm/gc_implementation/g1/concurrentMark.hpp
  • hotspot/src/share/vm/gc_implementation/g1/g1_specialized_oop_closures.hpp
  • hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp
  • hotspot/src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
  • hotspot/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp
  • hotspot/src/share/vm/gc_implementation/g1/g1CollectorPolicy.hpp
  • hotspot/src/share/vm/gc_implementation/g1/g1OopClosures.hpp
  • hotspot/src/share/vm/gc_implementation/g1/g1OopClosures.inline.hpp
  • hotspot/src/share/vm/gc_implementation/g1/g1RemSet.cpp
  • hotspot/src/share/vm/memory/sharedHeap.cpp
  • hotspot/src/share/vm/memory/sharedHeap.hpp
  • hotspot/src/share/vm/utilities/taskqueue.hpp
  • hotspot/src/share/vm/runtime/timer.cpp
  • hotspot/src/share/vm/runtime/timer.hpp
  • hotspot/src/share/vm/gc_implementation/g1/vm_operations_g1.cpp
  • hotspot/src/share/vm/gc_implementation/g1/vm_operations_g1.hpp

Log Analysis

Below is an annotated copy of the G1GC log in question.

44900.297: [GC pause (young) (initial-mark), 0.08894851 secs]
^^^^^^^^^^^ Elapsed seconds from JVM start
           ^^^^^^^^^ Collection pause at safe-point
                    ^^^^^^^^ In full young GC mode
                            ^^^^^^^^^^^^^^^ Last pause included initial mark
                                           ^^^^^^^^^^^^^^^^^^ Elapsed seconds in method
44900.386: [GC concurrent-mark-start]
^^^^^^^^^^^ Elapsed seconds from JVM start
           ^^^^^^^^^^^^^^^^^^^^^^^^^^ Concurrent mark thread started
   [Parallel Time:  83.7 ms]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Elapsed milliseconds for all GC worker threads to finish
      [GC Worker Start Time (ms):  44900297.6  44900297.6  44900297.6  44900297.6  44900297.6  44900297.7  44900297.7  44900297.7  44900297.7  44900297.7  44900297.7  44900297.7  44900297.7
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed seconds from JVM start that GC worker threads were started
       Avg: 44900297.7, Min: 44900297.6, Max: 44900297.7, Diff:   0.1]
^^^^^^^^^^^^^^^^^^^^^^ Average GC worker thread start time (elapsed seconds from JVM start)
                      ^^^^^^^^^^^^^^^^^ Minimum GC worker thread start time (elapsed seconds from JVM start)
                                       ^^^^^^^^^^^^^^^^^ Maximum GC worker thread start time (elapsed seconds from JVM start)
                                                        ^^^^^^^^^^^^^^ Total seconds to start all GC worker threads
      [Update RS (ms):  23.5  24.3  25.0  25.0  23.9  24.4  25.2  24.1  25.7  24.7  24.8  24.4  24.7
^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to update cards in remembered sets during an evacuation pause
       Avg:  24.6, Min:  23.5, Max:  25.7, Diff:   2.1]
^^^^^^^^^^^^^^^^^ Average GC worker thread milliseconds to update RS
                 ^^^^^^^^^^^^ Minimum GC worker thread milliseconds to update RS
                             ^^^^^^^^^^^^ Maximum GC worker thread milliseconds to update RS
                                         ^^^^^^^^^^^^^^ Minimum/maximum delta of GC worker thread milliseconds to update RS
         [Processed Buffers : 16 19 19 23 20 24 18 18 18 17 20 16 19
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Total remembered set buffers processed by each GC worker thread
          Sum: 247, Avg: 19, Min: 16, Max: 24, Diff: 8]
^^^^^^^^^^... Summary information for total remembered set buffers processed by all GC worker thread
      [Ext Root Scanning (ms):  2.2  2.7  2.2  2.6  3.0  3.1  2.2  1.1  2.3  3.0  2.2  2.4  2.9
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to process heap roots during an evacuation pause
       Avg:   2.4, Min:   1.1, Max:   3.1, Diff:   2.0]
^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to process heap roots during an evacuation pause
      [Mark Stack Scanning (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to scan strong roots in the mark stack during an evacuation pause
       Avg:   0.0, Min:   0.0, Max:   0.0, Diff:   0.0]
^^^^^^^... Summary information for total milliseconds for all GC worker thread to scan strong roots in the mark stack during an evacuation pause
      [Scan RS (ms):  14.1  14.6  14.5  14.3  14.6  14.2  14.4  14.5  14.0  13.9  14.6  14.5  14.0
^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to scan for dirty cards in a heap region to update the remembered set
       Avg:  14.3, Min:  13.9, Max:  14.6, Diff:   0.8]
^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to scan for dirty cards in a heap region to update the remembered set
      [Object Copy (ms):  41.4  39.5  39.4  39.0  39.6  39.5  39.1  41.4  39.0  39.3  39.3  39.8  39.5
^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to scan root sets for pointers to include in this heap region's remembered set during an evacuation pause
       Avg:  39.7, Min:  39.0, Max:  41.4, Diff:   2.4]
^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to scan root sets for pointers to include in this heap region's remembered set during an evacuation pause
      [Termination (ms):  1.3  1.4  1.5  1.6  1.5  1.4  1.6  1.4  1.5  1.7  1.5  1.4  1.3
^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
       Avg:   1.5, Min:   1.3, Max:   1.7, Diff:   0.4]
^^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
         [Termination Attempts : 1185 1205 1219 1436 1171 1231 1471 1237 1461 1526 1353 1259 1170
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Total attempts for each GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
          Sum: 16924, Avg: 1301, Min: 1170, Max: 1526, Diff: 356]
^^^^^^^^^^^^^^^... Summary information for total attempts for all GC worker thread to terminate scanning root sets for pointers to include in this heap region's remembered set during an evacuation pause
      [GC Worker End Time (ms):  44900380.2  44900380.2  44900380.2  44900380.2  44900380.2  44900380.3  44900380.2  44900380.2  44900380.2  44900380.2  44900380.2  44900380.2  44900380.2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed seconds from JVM start that GC worker threads ended
       Avg: 44900380.2, Min: 44900380.2, Max: 44900380.3, Diff:   0.1]
^^^^^^^^^^^^... Summary information for total elapsed seconds from JVM start that GC worker threads ended for all GC worker threads
      [GC Worker Times (ms):  82.6  82.6  82.6  82.6  82.6  82.6  82.5  82.6  82.5  82.5  82.5  82.5  82.5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^... Elapsed milliseconds for each GC worker threads
       Avg:  82.6, Min:  82.5, Max:  82.6, Diff:   0.1]
^^^^^^^^^^^^^... Summary information for total elapsed milliseconds for all GC worker threads
      [Other:   1.2 ms]
^^^^^^^^^^^^^^^^^^^^^^^ Other milliseconds during GC parallel processing; basically, total milliseconds minus: update RS, ext root scanning, mark stack scanning, scan RS, object copy, termination
   [Clear CT:   0.5 ms]
^^^^^^^^^^^^^^^^^^^^^^^ Total milliseconds cleaning up the dirty card tables list
   [Other:   4.8 ms]
^^^^^^^^^^^^^^^^^^^^ Total milliseconds in this GC pause not spent in parallel and clear CT
      [Choose CSet:   0.0 ms]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Total milliseconds to choose young heap regions to add to the collection set
   [ 1331M->830M(1840M)]
^^^^^ Heap size change for this garbage collector
     ^^^^^ Total heap size before this collection pause
            ^^^^ Total heap size after this collection pause
                 ^^^^^ Total heap size capacity for this garbage collector
 [Times: user=1.07 sys=0.01, real=0.09 secs]
^^^^^^^^ Execution seconds (uses TraceCPUTime class; constructor initializes times to zero, destructor prints times)
        ^^^^^^^^^^ User execution seconds for all threads
                  ^^^^^^^^^ System execution seconds
                           ^^^^^^^^^^^ Real (wall clock) execution seconds
44901.205: [GC concurrent-mark-end, 0.8186002 sec]
^^^^^^^^^^^ Elapsed seconds from JVM start
           ^^^^^^^^^^^^^^^^^^^^^^^^ Concurrent mark thread ended
                                    ^^^^^^^^^^^^^ Total seconds executing concurrent mark thread
44901.205: [GC remark, 0.0258621 secs]
^^^^^^^^^^^ Elapsed seconds from JVM start
           ^^^^^^^^^^ Concurrent re-marking of all heap roots, final work
 [Times: user=0.02 sys=0.00, real=0.03 secs]
^^^^^^^^ Execution seconds (uses TraceCPUTime class; constructor initializes times to zero, destructor prints times)
        ^^^^^^^^^^ User execution seconds for all threads
                  ^^^^^^^^^ System execution seconds
                           ^^^^^^^^^^^ Real (wall clock) execution seconds
44901.231: [GC concurrent-count-start]
^^^^^^^^^^^ Elapsed seconds from JVM start
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Start concurrent marking of live objects
44901.479: [GC concurrent-count-end, 0.2478477]
^^^^^^^^^^^ Elapsed seconds from JVM start
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^ End concurrent marking of live objects
                                   ^^^^^^^^^^^^ Total seconds to do concurrent marking of live objects
44901.479: [GC cleanup 940M->931M(1840M), 0.0073079 secs]
^^^^^^^^^^^ Elapsed seconds from JVM start
           ^^^^^^^^^^^ Concurrent marking cleanup (world is stopped at this checkpoint)
                      ^... Heap size change for this garbage collector
                       ^^^^ Total heap size before this operation
                             ^^^^ Total heap size after this operation
                                  ^^^^^ Total heap size capacity for this garbage collector
                                          ^^^^^^^^^^^^^^ Total seconds executing concurrent marking cleanup

OpenJDK 7 G1GC

If you can build a debug copy of OpenJDK 7, you'll have a few more JVM options available to you that will give you even further information about G1GC. Use the following command to get a list of all JVM options:

java -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -XX:+PrintFlagsWithComments -version

The output shows the following G1GC options available; this is not an all inclusive list, just some I thought were intersting.

     intx G1CardCountCacheExpandThreshold           = 16                                  Expand the card count cache if the number of collisions for a particular entry exceeds this value.
    uintx G1ConcMarkForceOverflow                   = 0                                   The number of times we'll force an overflow during concurrent marking
   double G1ConcMarkStepDurationMillis              = 10.000000       {product}           Target duration of individual concurrent marking steps in milliseconds.
     intx G1ConcRSHotCardLimit                      = 4                                   The threshold that defines (>=) a hot card.
     intx G1ConcRSLogCacheSize                      = 10                                  Log base 2 of the length of conc RS hot-card cache.
     bool G1ConcRegionFreeingVerbose                = false                               Enables verboseness during concurrent region freeing
     intx G1ConfidencePercent                       = 50              {product}           Confidence level for MMU/pause predictions
     bool G1DeferredRSUpdate                        = true                                If true, use deferred RS updates
     bool G1FixedEdenSize                           = false                               When set, G1 will not allocate unused survivor space regions
    uintx G1FixedSurvivorSpaceSize                  = 0                                   If non-0 is the size of the G1 survivor space, otherwise SurvivorRatio is used to determine the size
     bool G1FixedTenuringThreshold                  = false                               When set, G1 will not adjust the tenuring threshold
     bool G1Gen                                     = true                                If true, it will enable the generational G1
    uintx G1HeapRegionSize                          = 0               {product}           Size of the G1 regions.
     intx G1InitYoungSurvRatio                      = 50                                  Expected Survival Rate for newly allocated bytes
     intx G1MarkRegionStackSize                     = 1048576         {product}           Size of the region stack for concurrent marking.
     intx G1MarkingOverheadPercent                  = 0                                   Overhead of concurrent marking
     intx G1MarkingVerboseLevel                     = 0                                   Level (0-4) of verboseness of the marking code
     intx G1MaxHotCardCountSizePercent              = 25                                  The maximum size of the hot card count cache as a percentage of the number of cards for the maximum heap.
     intx G1MaxVerifyFailures                       = -1                                  The maximum number of verification failrues to print.  -1 means print all.
     intx G1PausesBtwnConcMark                      = -1                                  If positive, fixed number of pauses between conc markings
     intx G1PolicyVerbose                           = 0                                   The verbosity level on G1 policy decisions
     bool G1PrintCTFilterStats                      = false                               If true, print stats on RS filtering effectiveness
     bool G1PrintHeapRegions                        = false           {diagnostic}        If set G1 will print information on which regions are being allocated and which are reclaimed.
     bool G1PrintOopAppls                           = false                               When true, print applications of closures to external locs.
     bool G1PrintParCleanupStats                    = false                               When true, print extra stats about parallel cleanup.
     bool G1PrintReachableAtInitialMark             = false                               Reachable object dump at the initial mark pause
    ccstr G1PrintReachableBaseFile                  =                                     The base file name for the reachable object dumps
     bool G1PrintRegionLivenessInfo                 = false           {product}           Prints the liveness information for all regions in the heap at the end of a marking cycle.
     bool G1RSCountHisto                            = false                               If true, print a histogram of RS occupancies after each pause
     bool G1RSLogCheckCardTable                     = false                               If true, verify that no dirty cards remain after RS log processing.
     bool G1RSScrubVerbose                          = false                               When true, do RS scrubbing with verbose output.
     intx G1RSetRegionEntries                       = 0               {product}           Max number of regions for which we keep bitmaps.Will be set ergonomically by default
     intx G1RSetRegionEntriesBase                   = 256                                 Max number of regions in a fine-grain table per MB.
    uintx G1RSetScanBlockSize                       = 64              {product}           Size of a work unit of cards claimed by a worker threadduring RSet scanning.
     intx G1RSetSparseRegionEntries                 = 0               {product}           Max number of entries per region in a sparse table.Will be set ergonomically by default.
     intx G1RSetSparseRegionEntriesBase             = 4                                   Max number of entries per region in a sparse table per MB.
     intx G1RSetUpdatingPauseTimePercent            = 10              {product}           A target percentage of time that is allowed to be spend on process RS update buffers during the collection pause.
     bool G1RecordHRRSEvents                        = false                               When true, record recent calls to rem set operations.
     bool G1RecordHRRSOops                          = false                               When true, record recent calls to rem set operations.
     intx G1RefProcDrainInterval                    = 10              {product}           The number of discovered reference objects to process before draining concurrent marking work queues.
     intx G1ReservePercent                          = 10              {product}           It determines the minimum reserve we should have in the heap to minimize the probability of promotion failure.
    uintx G1SATBBufferEnqueueingThresholdPercent    = 60              {product}           Before enqueueing them, each mutator thread tries to do some filtering on the SATB buffers it generates. If post-filtering the percentage of retained entries is over this threshold the buffer will be enqueued for processing. A value of 0 specifies that mutator threads should not do such filtering.
     intx G1SATBBufferSize                          = 1024            {product}           Number of entries in an SATB log buffer.
     bool G1SATBPrintStubs                          = false                               If true, print generated stubs for the SATB barrier
     intx G1SATBProcessCompletedThreshold           = 20                                  Number of completed buffers that triggers log processing.
     bool G1ScrubRemSets                            = true                                When true, do RS scrubbing after cleanup.
    uintx G1SecondaryFreeListAppendLength           = 5                                   The number of regions we will add to the secondary free list at every append operation
     bool G1StressConcRegionFreeing                 = false                               It stresses the concurrent region freeing operation
    uintx G1StressConcRegionFreeingDelayMillis      = 0                                   Artificial delay during concurrent region freeing
     bool G1SummarizeConcMark                       = false           {diagnostic}        Summarize concurrent mark info
     bool G1SummarizeRSetStats                      = false           {diagnostic}        Summarize remembered set processing info
     intx G1SummarizeRSetStatsPeriod                = 0               {diagnostic}        The period (in number of GCs) at which we will generate update buffer processing info (0 means do not periodically generate this info); it also requires -XX:+G1SummarizeRSetStats
     bool G1TraceConcRefinement                     = false           {diagnostic}        Trace G1 concurrent refinement
     bool G1TraceMarkStackOverflow                  = false                               If true, extra debugging code for CM restart for ovflw.
     intx G1UpdateBufferSize                        = 256             {product}           Size of an update buffer
     bool G1VerifyDuringGCPrintReachable            = false                               If conc mark verification fails, dump reachable objects
     intx G1YoungSurvRateNumRegionsSummary          = 0                                   the number of regions for which we'll print a surv rate summary.
     bool G1YoungSurvRateVerbose                    = false                               print out the survival rate of young regions according to age.

To see what kind of G1GC information is provided using the OpenJDK 7 debug build, I used the following test program:

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class G1GCTest implements Runnable {
    private int iterations = 0;
    private final Random rnd = new Random();
    private final List<Object> young = new ArrayList<Object>(1000);
    private final List<WeakReference<Object>> old =
            new ArrayList<WeakReference<Object>>(100000);
    
    private void clearOld() {
        int clearedCnt = 0;
        for (final Iterator<WeakReference<Object>> iter = old.iterator(); iter
                .hasNext();)
        {
            final WeakReference<Object> ref = iter.next();
            if (null == ref.get()) {
                iter.remove();
                clearedCnt++;
            }
        }
        if (0 < clearedCnt) {
            System.out.println("Cleared " + clearedCnt
                    + " weak references to old objects.");
        }
    }
    
    public void run() {
        if (0 == ++iterations % 10000) {
            System.out.println("iterations=" + iterations + ", young.size()="
                    + young.size() + ", old.size()=" + old.size());
            clearOld();
            try {
                Thread.sleep(100);
            } catch (final Throwable e) {
                // Do nothing!
            }
        }
        if (rnd.nextBoolean()) {
            young.add(new byte[1000]);
        }
        if (rnd.nextBoolean() && !young.isEmpty()) {
            final int nextInt = Math.abs(rnd.nextInt());
            final int idx = nextInt % young.size();
            final Object obj = young.remove(idx);
            old.add(new WeakReference<Object>(obj));
        }
    }
    
    public static void main(final String[] args) {
        final G1GCTest t = new G1GCTest();
        while (1000000 > t.iterations) {
            t.run();
        }
    }
}

And executed it using:

java -Xms8m -Xmx8m -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats -XX:+G1YoungSurvRateVerbose -XX:G1PolicyVerbose=2 -verbose:gc G1GCTest

The output of which was very verbose, but provides a slew of information that could prove useful if you're doing some GC tuning.