G1: One Garbage Collector To Rule Them All

Many articles describe how a poorly tuned garbage collector can bring an application's Service Level Agreement (SLA) commitments to their knees. For example, an unpredictably protracted garbage collection pause can easily exceed the response-time requirements of an otherwise performant application. Moreover, the irregularity increases when you have a non-compacting garbage collector (GC) such as Concurrent Mark Sweep (CMS), which falls back to a serial (single-threaded), stop-the-world (STW) full garbage collection to reclaim and compact its fragmented heap.
Let us now expand on the above paragraph: Suppose an allocation failure in the young generation triggers a young collection, leading to promotions to the old generation. Further, suppose that the fragmented old generation has insufficient space for the newly promoted objects. Such conditions would trigger a full garbage collection cycle, which will perform compaction of the heap.
With CMS GC, the full collection is serial and STW, hence your application threads are stopped for the entire duration while the heap space is reclaimed and then compacted. The duration for the STW pause depends on your heap size and the surviving objects.

Alternatively, even if you do have parallel (multi-threaded) compaction to combat fragmentation, you still end up with a full garbage collection (that involves all the generations of the Java heap), when it might have been sufficient to just reclaim some of the free space from the old generation.
This is a common scenario with Parallel Old GC. With Parallel Old, the reclamation of old generation is with a parallel STW full garbage collection pause. This full garbage collection is not incremental; it is one big STW pause and does not interleave with the application execution.

With the above information, we would like to consider one solution in the form of the “Garbage First” (G1) collector, HotSpot's latest GC (introduced in JDK 7 update 4).
G1 GC is an incremental parallel compacting GC that provides more predictable pause times compared to CMS GC and Parallel Old GC. By introducing a parallel, concurrent and multi-phased marking cycle, G1 GC can work with much larger heaps while providing reasonable worst-case pause times. The basic idea with G1 GC is to set your heap ranges (using -Xms for min heap size and -Xmx for the max size) and a realistic (soft real time) pause time goal (using -XX:MaxGCPauseMillis) and then let the GC do its job.
With the introduction of G1 GC, HotSpot moves away from its conventional GC layout, where a contiguous Java heap is split into (contiguous) young and old generations. In G1 GC, HotSpot introduces the concept of “regions”. A single large contiguous Java heap space is divided into multiple fixed-size heap regions, and these regions are maintained on a list of “free” regions. As the need arises, the free regions are assigned to either the young or the old generation. A region can be anywhere from 1MB to 32MB in size, depending on your total Java heap size. The goal is to have around 2048 regions for the total heap. Once a region frees up, it goes back to the “free” regions list. The principle of G1 GC is to reclaim as much of the Java heap as possible (while trying its best to meet the pause time goal) by first collecting the regions with the least amount of live data, i.e. the ones with the most garbage; hence the name Garbage First.
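The sizing rule described above can be sketched in a few lines of Java. This is only an illustration of the arithmetic (heap size divided by a target of about 2048 regions, kept within 1MB to 32MB). The class and method names are hypothetical, and the rounding down to a power of two is an assumption of this sketch rather than something stated in this article; HotSpot's actual heuristic may differ in detail.

```java
// Simplified sketch of deriving a region size from the heap size.
// Names are hypothetical; power-of-two rounding is an assumption of this sketch.
class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_SIZE = 1 * MB;    // lower bound mentioned in the text
    static final long MAX_REGION_SIZE = 32 * MB;   // upper bound mentioned in the text
    static final long TARGET_REGION_COUNT = 2048;  // "around 2048 regions"

    static long regionSizeFor(long heapBytes) {
        long ideal = heapBytes / TARGET_REGION_COUNT;
        long clamped = Math.max(MIN_REGION_SIZE, Math.min(MAX_REGION_SIZE, ideal));
        // Round down to a power of two (assumption made for this sketch).
        return Long.highestOneBit(clamped);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {1024, 4096, 16384, 131072};
        for (long heapMb : heapsInMb) {
            long region = regionSizeFor(heapMb * MB);
            System.out.println("heap=" + heapMb + "MB region=" + (region / MB)
                    + "MB regions=" + (heapMb * MB / region));
        }
    }
}
```

Note that with the bounds applied, small heaps end up with fewer than 2048 regions and very large heaps with more, which is why 2048 is described as a goal rather than a guarantee.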

Fig. 1: Conventional GC Layout

One thing to note is that for G1 GC, neither the young nor the old generation has to be contiguous. This is a handy feature since the sizing of the generation is now more dynamic.
Adaptively sized GC algorithms, like Parallel Old GC, end up reserving the extra space that each generation may require so that the generations can fit within their contiguous space constraint. In the case of CMS, a full garbage collection is required to resize the Java heap and the generations.
In contrast, G1 GC uses logical generations (a collection of non-contiguous regions of the young generation and a remainder in the old generation), so there is not much wastage in space or time.
To be sure, the G1 GC algorithm does utilize some of HotSpot’s basic concepts. For example, the concepts of allocation, copying to survivor space and promotion to old generation are similar to previous HotSpot GC implementations. Eden regions and survivor regions still make up the young generation. Most allocations happen in eden except for “humongous” allocations. (Note: For G1 GC, objects that span more than half a region size are considered “Humongous objects” and are directly allocated into “humongous” regions out of the old generation.) G1 GC selects an adaptive young generation size based on your pause time goal. The young generation can range anywhere from the preset min to the preset max sizes, that are a function of the Java heap size. When eden reaches capacity, a “young garbage collection”, also known as an “evacuation pause”, will occur. This is a STW pause that copies (evacuates) the live objects from the regions that make up the eden, to the 'to-space' survivor regions.
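As a small illustration of the “humongous” rule in the note above, the following sketch checks whether an allocation exceeds half a region. The class and method names are hypothetical, and the helper that counts how many regions such an object would occupy is an illustrative addition of this sketch.

```java
// Sketch of the "more than half a region" rule for humongous objects.
// Class and method names are hypothetical.
class HumongousSketch {
    static final long MB = 1024L * 1024L;

    // An object is humongous when it is larger than half a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Illustrative helper: number of whole regions needed to hold the object.
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long regionBytes = 1 * MB;
        long[] sizes = {256 * 1024, 512 * 1024, 512 * 1024 + 1, 3 * MB + 1};
        for (long size : sizes) {
            System.out.println(size + " bytes: humongous=" + isHumongous(size, regionBytes)
                    + ", regions=" + regionsNeeded(size, regionBytes));
        }
    }
}
```

With 1MB regions, an object of exactly 512KB is still allocated in eden in this sketch, while one byte more makes it humongous.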


Fig. 2: Garbage First GC Layout
In addition, live objects from the 'from-space' survivor regions will be either copied to the 'to-space' survivor regions or, based on the object's age and the 'tenuring threshold', will be promoted to region(s) from the old generation space.
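The age-based decision described here can be pictured with a minimal sketch. The names are hypothetical, and the sketch deliberately ignores other factors a real collector may consider (for example, a lack of room in the survivor regions); it only models the comparison of an object's age against the tenuring threshold.

```java
// Minimal model of the copy-or-promote decision during an evacuation pause.
// Names are hypothetical; only the age check from the text is modeled.
class TenuringSketch {
    enum Destination { SURVIVOR_TO_SPACE, OLD_GENERATION }

    static Destination destinationFor(int objectAge, int tenuringThreshold) {
        // Objects that have survived enough collections are promoted.
        return objectAge >= tenuringThreshold
                ? Destination.OLD_GENERATION
                : Destination.SURVIVOR_TO_SPACE;
    }

    public static void main(String[] args) {
        int tenuringThreshold = 15; // illustrative value, not taken from the article
        for (int age : new int[] {0, 7, 15}) {
            System.out.println("age " + age + " -> " + destinationFor(age, tenuringThreshold));
        }
    }
}
```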
Every young collection involves parallel worker time and sequential/serial time. To explain this further, I will use a log output from the latest Java 7 update release, which at the time of this publication is 7u25. (We also have an Early Access (EA) for 7u40. Please feel free to try out the EA bundles for your platform. With 7u40 EA, you may see a difference in the log format, but the basic premise remains the same.)
The following command line generated the GC log output shown below:
java -Xmx1G -Xms1G -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps GCTestBench
Note: I went with the default pause time goal of 200ms.

0.189: [GC pause (young), 0.00080776 secs]
   [Parallel Time: 0.4 ms]
      [GC Worker Start (ms): 188.7 188.7 188.8 188.8
       Avg: 188.8, Min: 188.7, Max: 188.8, Diff: 0.1]
      [Ext Root Scanning (ms): 0.2 0.2 0.2 0.1
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.1]
      [Update RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Processed Buffers : 0 0 0 1
          Sum: 1, Avg: 0, Min: 0, Max: 1, Diff: 1]
      [Scan RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
      [Object Copy (ms): 0.2 0.2 0.1 0.2
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.0]
      [Termination (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Termination Attempts : 1 2 1 2
          Sum: 6, Avg: 1, Min: 1, Max: 2, Diff: 1]
      [GC Worker End (ms): 189.1 189.1 189.1 189.1
       Avg: 189.1, Min: 189.1, Max: 189.1, Diff: 0.0]
      [GC Worker (ms): 0.4 0.4 0.3 0.3
       Avg: 0.4, Min: 0.3, Max: 0.4, Diff: 0.1]
      [GC Worker Other (ms): 0.0 0.0 0.1 0.1
       Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
   [Clear CT: 0.2 ms]
   [Other: 0.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]

The indentation demarcates the parallel and the sequential work groups. The parallel worker time is further split into -
External Root Scanning:
The time spent by the parallel GC worker threads in scanning the external roots (such as registers, thread stacks, etc.) that point into the Collection Set.
Update Remembered Sets (RSets):
RSets aid G1 GC in tracking references that point into a region. The time shown here is the amount of time the parallel worker threads spent in updating the RSets.
Processed Buffers:
The count shows how many ‘Update Buffers’ were processed by the worker threads.
Scan RSets:
The time spent in scanning the RSets for references into a region. This time will depend on the “coarseness” of the RSet data structures.
Object Copy:
During every young collection, the GC copies all live data from the eden and ‘from-space’ survivor, either to the regions in the ‘to-space’ survivor or to the old generation regions. The amount of time it takes the worker threads to complete this task is listed here.
Termination:
After completing their particular work (e.g. object scan and copy), each worker thread enters its ‘termination protocol’. Prior to terminating, the worker thread looks for work from the other threads to steal and terminates when there is none. The time listed here indicates the time spent by the worker threads offering to terminate.
Parallel worker ‘Other’ time:
Time spent by the worker threads that was not accounted for in any of the parallel activities listed above.

The sequential work (which could be parallelized, individually) is divided into -
Clear CT: Time spent by the GC worker threads in clearing the Card Table of RSet scanning meta-data.
And a few others clubbed under the ‘Other’ time, comprised of:

Choose Collection Set (CSet): A garbage collection cycle collects the set of regions in the CSet. The collection pause collects/evacuates all the live data in a particular CSet. The time listed here is the time spent in finalizing the set of regions added to the CSet.

Reference Processing: The time spent in processing the deferred references (soft, weak, final and phantom) from the prior garbage collection phases.

Reference En-queuing: The time spent in placing the references on to the pending list.

Free CSet: Time spent in freeing the just collected set of regions. This includes the time spent in freeing their RSets as well.
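To make the per-worker lines of the log easier to relate to their summary lines, here is a small sketch that parses one such line and recomputes the average, minimum, maximum and difference. The class and method names are hypothetical, and the code relies on Java 8 library classes. Because the log prints rounded per-thread values while the JVM computes its summary from the unrounded timings, the recomputed numbers may not match the logged summary exactly.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.Locale;

// Sketch: recompute summary statistics from a per-worker GC log line.
// Names are hypothetical; the input format is the 7u25-style line shown earlier.
class WorkerTimesSketch {
    static DoubleSummaryStatistics summarize(String logLine) {
        // Everything after the first ':' is the list of per-worker values.
        String values = logLine.substring(logLine.indexOf(':') + 1).trim();
        return Arrays.stream(values.split("\\s+"))
                .mapToDouble(Double::parseDouble)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = summarize("[Object Copy (ms): 0.2 0.2 0.1 0.2");
        System.out.printf(Locale.ROOT, "Workers: %d, Avg: %.1f, Min: %.1f, Max: %.1f, Diff: %.1f%n",
                stats.getCount(), stats.getAverage(), stats.getMin(), stats.getMax(),
                stats.getMax() - stats.getMin());
    }
}
```

A large difference between the minimum and maximum for a phase suggests that the work was not evenly balanced across the GC worker threads.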

I have just skimmed the surface with respect to many things like the RSets and their coarsening, the update buffers and the CSet, and in the next few paragraphs there will be a few more, like the Snapshot-At-The-Beginning (SATB) algorithm and barriers. However, in order to learn more about them, we would have to “deep dive” into the internals of G1 GC, an interesting topic that is outside the scope of this article.
Now that we understand how the young collections start filling up the old generation, we need to introduce (and understand) the concept of a ‘marking threshold’. When the occupancy of the total heap crosses this threshold, G1 GC will trigger a multi-phased concurrent marking cycle. The command line option that sets the threshold is -XX:InitiatingHeapOccupancyPercent and it defaults to 45 percent of the total Java heap size. G1 GC uses a marking algorithm called Snapshot-At-The-Beginning (SATB) that takes a logical snapshot of the set of live objects in the heap at the ‘beginning’ of the marking cycle. This algorithm uses a pre-write barrier to record and mark the objects that are a part of the logical snapshot. Before discussing the individual phases of the multi-phased concurrent marking, let us first look at the output from the GC log:
0.078: [GC pause (young) (initial-mark), 0.00262460 secs]
   [Parallel Time: 2.3 ms]
      [GC Worker Start (ms): 78.1 78.2 78.2 78.2
       Avg: 78.2, Min: 78.1, Max: 78.2, Diff: 0.1]
      [Ext Root Scanning (ms): 0.2 0.1 0.2 0.1
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.1]
      [Update RS (ms): 0.2 0.2 0.2 0.2
       Avg: 0.2, Min: 0.2, Max: 0.2, Diff: 0.0]
         [Processed Buffers : 2 3 2 2
          Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1]
      [Scan RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
      [Object Copy (ms): 1.8 1.8 1.8 1.8
       Avg: 1.8, Min: 1.8, Max: 1.8, Diff: 0.0]
      [Termination (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Termination Attempts : 1 1 1 1
          Sum: 4, Avg: 1, Min: 1, Max: 1, Diff: 0]
      [GC Worker End (ms): 80.4 80.4 80.4 80.4
       Avg: 80.4, Min: 80.4, Max: 80.4, Diff: 0.0]
      [GC Worker (ms): 2.2 2.2 2.2 2.2
       Avg: 2.2, Min: 2.2, Max: 2.2, Diff: 0.1]
      [GC Worker Other (ms): 0.0 0.1 0.1 0.1
       Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
   [Clear CT: 0.2 ms]
   [Other: 0.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 3072K(5120K)->0B(5120K) Survivors: 1024K->1024K Heap: 16M(32M)->16M(32M)]
 [Times: user=0.06 sys=0.00, real=0.00 secs]
0.081: [GC concurrent-root-region-scan-start]
0.082: [GC concurrent-root-region-scan-end, 0.0009122]
0.082: [GC concurrent-mark-start]
<snip> [Zero or more embedded young garbage collections are possible here, but removed for brevity.]
0.094: [GC concurrent-mark-end, 0.0115579 sec]
0.094: [GC remark 0.094: [GC ref-proc, 0.0000033 secs], 0.0004374 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
0.094: [GC cleanup 22M->10M(32M), 0.0003031 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
0.095: [GC concurrent-cleanup-start]
0.095: [GC concurrent-cleanup-end, 0.0000350]
Here are the details of each phase:
The Initial Mark Phase
– G1 GC marks the roots during the initial-mark phase. This is what the first line of the output above is telling us. The initial-mark phase is piggybacked (done at the same time) on a normal (STW) young garbage collection. Hence, the output is similar to what you see during a young evacuation pause.
The Root Region Scanning Phase
– During this phase, G1 GC scans survivor regions of the initial mark phase for references into the old generation and marks the referenced objects. This phase runs concurrently (not STW) with the application. It is important that this phase complete before the next young garbage collection happens.
The Concurrent Marking Phase
– During this phase, G1 GC looks for reachable (live) objects across the entire Java heap. This phase happens concurrently with the application, and a young garbage collection can interrupt the concurrent marking phase (shown above).
The Remark Phase
– The remark phase helps the completion of marking. During this STW phase, G1 GC drains any remaining SATB buffers and traces any as-yet unvisited live objects. G1 GC also does reference processing during the remark phase.
The Cleanup Phase
– This is the final phase of the multi-phase marking cycle. It is partly STW, when G1 GC does liveness accounting (to identify completely free regions and mixed garbage collection candidate regions) and when G1 GC scrubs the RSets. It is partly concurrent, when G1 GC resets and returns the empty regions to the free list.
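The marking threshold that starts this whole cycle can be sketched as a simple occupancy check. The class and method names are hypothetical; the 45 percent default is the value quoted earlier in this article, and the check follows the article's description of comparing total heap occupancy against the threshold.

```java
// Sketch of the 'marking threshold' check that triggers a concurrent marking cycle.
// Names are hypothetical; 45 is the default quoted in the article.
class MarkingThresholdSketch {
    static final long MB = 1024L * 1024L;
    static final int DEFAULT_IHOP_PERCENT = 45; // -XX:InitiatingHeapOccupancyPercent

    // True when heap occupancy has crossed the threshold percentage of the total heap.
    static boolean shouldStartMarking(long usedBytes, long heapBytes, int ihopPercent) {
        return usedBytes * 100 > heapBytes * ihopPercent;
    }

    public static void main(String[] args) {
        long heap = 32 * MB;
        for (long usedMb : new long[] {10, 14, 16}) {
            System.out.println(usedMb + "M of 32M used -> start marking: "
                    + shouldStartMarking(usedMb * MB, heap, DEFAULT_IHOP_PERCENT));
        }
    }
}
```

In the log excerpt above, the initial-mark pause reports a heap occupancy of 16M out of 32M, which is above 45 percent and therefore consistent with a marking cycle having been requested.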

Once G1 GC successfully completes the concurrent marking cycle, it has the information that it needs to start the old generation collection. Up until now, the collection of the old regions was not possible since G1 GC did not have any marking information associated with those regions. A collection that facilitates the compaction and evacuation of old generation is appropriately called a 'mixed' collection since G1 GC not only collects the eden and the survivor regions, but also (optionally) adds old regions to the mix. Let us now discuss some details that are important to understand a mixed collection.
A mixed collection can (and usually does) happen over multiple mixed garbage collection cycles. When a sufficient number of old regions are collected, G1 GC reverts to performing the young garbage collections until the next marking cycle completes. A number of flags listed and defined here control the exact number of old regions added to the CSets:
-XX:G1MixedGCLiveThresholdPercent: The occupancy threshold of live objects in the old region to be included in the mixed collection.
-XX:G1HeapWastePercent: The threshold of garbage that you can tolerate in the heap.
-XX:G1MixedGCCountTarget: The target number of mixed garbage collections within which the regions with at most G1MixedGCLiveThresholdPercent live data should be collected.
-XX:G1OldCSetRegionThresholdPercent: A limit on the max number of old regions that can be collected during a mixed collection.
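One way to picture how these flags interact is the simplified sketch below. It is not HotSpot's implementation: it ignores pause-time prediction and G1HeapWastePercent, all names are hypothetical, and the flag values used in main are illustrative only, since the actual defaults vary by JDK release. It filters old regions by their live-data percentage, orders them with the most garbage first, and then limits how many are taken per mixed pause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified model of choosing old regions for mixed collections.
// All names are hypothetical; flag values below are illustrative, not defaults.
class MixedCollectionSketch {
    static final class OldRegion {
        final int id;
        final int livePercent;
        OldRegion(int id, int livePercent) { this.id = id; this.livePercent = livePercent; }
    }

    // Keep regions with at most liveThresholdPercent live data, most garbage first.
    static List<OldRegion> candidates(List<OldRegion> oldRegions, int liveThresholdPercent) {
        List<OldRegion> result = new ArrayList<>();
        for (OldRegion region : oldRegions) {
            if (region.livePercent <= liveThresholdPercent) {
                result.add(region);
            }
        }
        result.sort(Comparator.comparingInt((OldRegion r) -> r.livePercent));
        return result;
    }

    // Spread the candidates over countTarget pauses, capped by a share of the heap's regions.
    static int oldRegionsPerPause(int candidateCount, int totalHeapRegions,
                                  int countTarget, int oldCSetRegionThresholdPercent) {
        int spread = (candidateCount + countTarget - 1) / countTarget;          // ceiling
        int cap = (totalHeapRegions * oldCSetRegionThresholdPercent + 99) / 100; // ceiling
        return Math.min(spread, cap);
    }

    public static void main(String[] args) {
        List<OldRegion> old = Arrays.asList(new OldRegion(1, 90), new OldRegion(2, 10),
                new OldRegion(3, 40), new OldRegion(4, 70), new OldRegion(5, 5));
        List<OldRegion> chosen = candidates(old, 65);
        for (OldRegion region : chosen) {
            System.out.println("candidate region " + region.id + " live=" + region.livePercent + "%");
        }
        System.out.println("old regions per mixed pause: "
                + oldRegionsPerPause(chosen.size(), 32, 8, 10));
    }
}
```

The ordering step is where the "garbage first" principle shows up: regions with the least live data are the cheapest to evacuate and free the most space.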
Let us look at a mixed collection cycle output from a G1 GC log:
1.269: [GC pause (mixed), 0.00373874 secs]
   [Parallel Time: 3.0 ms]
      [GC Worker Start (ms): 1268.9 1268.9 1268.9 1268.9
       Avg: 1268.9, Min: 1268.9, Max: 1268.9, Diff: 0.0]
      [Ext Root Scanning (ms): 0.2 0.2 0.2 0.1
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.1]
      [Update RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Processed Buffers : 0 0 0 1
          Sum: 1, Avg: 0, Min: 0, Max: 1, Diff: 1]
      [Scan RS (ms): 0.1 0.0 0.0 0.1
       Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
      [Object Copy (ms): 2.6 2.7 2.7 2.6
       Avg: 2.7, Min: 2.6, Max: 2.7, Diff: 0.1]
      [Termination (ms): 0.1 0.1 0.0 0.1
       Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
         [Termination Attempts : 2 1 2 2
          Sum: 7, Avg: 1, Min: 1, Max: 2, Diff: 1]
      [GC Worker End (ms): 1271.9 1271.9 1271.9 1271.9
       Avg: 1271.9, Min: 1271.9, Max: 1271.9, Diff: 0.0]
      [GC Worker (ms): 3.0 3.0 3.0 2.9
       Avg: 3.0, Min: 2.9, Max: 3.0, Diff: 0.0]
      [GC Worker Other (ms): 0.1 0.1 0.1 0.1
       Avg: 0.1, Min: 0.1, Max: 0.1, Diff: 0.0]
   [Clear CT: 0.1 ms]
   [Other: 0.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.3 ms]
In summary, G1 improves upon its predecessor GCs by introducing the concept of regions that make up a logical generation. The regions help provide finer granularity for an incremental collection of the old generation. G1 does most of its reclamation through copying of the live data, thus achieving compaction. This is definitely a step up from in-place de-allocation without compaction, which leaves the old generation looking like Swiss cheese!
The first level of reclamation happens during the Cleanup phase (of the multi-phased marking cycle) when the completely free (i.e. full of garbage) regions are reclaimed and returned to the free list. The next level happens during the incremental mixed garbage collections. If all else fails, the entire Java heap is collected. This is the well-known fail-safe full garbage collection.
All of the above makes the reclamation of the old generation a lot easier and in a way tiered.
I hope this article helped in painting a basic picture of the differences and the makeup of G1 GC. Thank you for tuning in!
Editor's note: Please stay tuned for part 2, coming in September 2013, where Monica will discuss some advanced topics and offer some advice about how to use these metrics to tune your application performance.
