Jianshu, 涤生 (Disheng).
Please credit the original source when reposting. Thank you!
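Introduction
People often ask why their application triggered a CMS GC even though Old Gen occupancy had not reached the threshold configured with CMSInitiatingOccupancyFraction. It looks inexplicable, and they cannot tell where the problem lies.
In fact, CMS GC has many trigger conditions beyond the CMSInitiatingOccupancyFraction threshold. This article walks through the source code to lay out every condition that triggers a CMS GC, to help you make sense of the odd CMS GC behavior you run into in practice.
A few questions to set the stage:
- Why did a CMS GC run when Old Gen occupancy was only 50%?
- Can Metaspace usage trigger a CMS GC as well?
- Why did a CMS GC run when Old Gen occupancy was very low?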
Trigger conditions
CMS GC is implemented as two collectors: a foreground collector and a background collector. The foreground collector is relatively simple; the background collector is more complex and has many more cases.
The sections below describe the trigger conditions of each in turn.
Note: this article is based on JDK 8.
Note: this article covers only the trigger conditions of CMS GC. The details of the algorithm, and when MSC (mark sweep compact) is performed, are out of scope.
foreground collector
The foreground collector's trigger is simple: typically an object allocation finds that there is not enough space, and a GC is triggered directly to reclaim space immediately. The algorithm used is mark sweep, without compaction.
background collector
Before getting to its trigger conditions, here is how the background collector runs. The CMS background thread polls continuously, checking on each pass whether any background trigger condition holds. As soon as one does, it performs a background collection.
void ConcurrentMarkSweepThread::run() {
  ... // omitted
  while (!_should_terminate) {
    sleepBeforeNextCycle();
    if (_should_terminate) break;
    GCCause::Cause cause = _collector->_full_gc_requested ?
      _collector->_full_gc_cause : GCCause::_cms_concurrent_mark;
    _collector->collect_in_background(false, cause);
  }
  ... // omitted
}
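On each pass the thread first waits for CMSWaitDuration, then calls shouldConcurrentCollect to check whether the trigger conditions of the background collector are met. CMSWaitDuration defaults to 2 s (2000 ms). Applications often run into frequent CMS GCs; look at the interval between consecutive CMS GCs, and if it is about 2 s, you can be fairly sure they come from the CMS background collector.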
void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
  while (!_should_terminate) {
    if (CMSIncrementalMode) {
      icms_wait();
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      }
      return;
    } else {
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      } else {
        // Wait until any cms_lock event or check interval not to call shouldConcurrentCollect permanently
        wait_on_cms_lock(CMSCheckInterval);
      }
    }
    // Check if we should start a CMS collection cycle
    if (_collector->shouldConcurrentCollect()) {
      return;
    }
    // .. collection criterion not yet met, let's go back
    // and wait some more
  }
}
So which conditions does shouldConcurrentCollect() check?
bool CMSCollector::shouldConcurrentCollect() {
  // Trigger 1
  if (_full_gc_requested) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMSCollector: collect because of explicit "
                             " gc request (or gc_locker)");
    }
    return true;
  }
  // For debugging purposes, change the type of collection.
  // If the rotation is not on the concurrent collection
  // type, don't start a concurrent collection.
  NOT_PRODUCT(
    if (RotateCMSCollectionTypes &&
        (_cmsGen->debug_collection_type() !=
          ConcurrentMarkSweepGeneration::Concurrent_collection_type)) {
      assert(_cmsGen->debug_collection_type() !=
               ConcurrentMarkSweepGeneration::Unknown_collection_type,
             "Bad cms collection type");
      return false;
    }
  )
  FreelistLocker x(this);
  // ------------------------------------------------------------------
  // Print out lots of information which affects the initiation of
  // a collection.
  if (PrintCMSInitiationStatistics && stats().valid()) {
    gclog_or_tty->print("CMSCollector shouldConcurrentCollect: ");
    gclog_or_tty->stamp();
    gclog_or_tty->print_cr("");
    stats().print_on(gclog_or_tty);
    gclog_or_tty->print_cr("time_until_cms_gen_full %3.7f",
                           stats().time_until_cms_gen_full());
    gclog_or_tty->print_cr("free="SIZE_FORMAT, _cmsGen->free());
    gclog_or_tty->print_cr("contiguous_available="SIZE_FORMAT,
                           _cmsGen->contiguous_available());
    gclog_or_tty->print_cr("promotion_rate=%g", stats().promotion_rate());
    gclog_or_tty->print_cr("cms_allocation_rate=%g", stats().cms_allocation_rate());
    gclog_or_tty->print_cr("occupancy=%3.7f", _cmsGen->occupancy());
    gclog_or_tty->print_cr("initiatingOccupancy=%3.7f", _cmsGen->initiating_occupancy());
    gclog_or_tty->print_cr("metadata initialized %d",
                           MetaspaceGC::should_concurrent_collect());
  }
  // ------------------------------------------------------------------
  // Trigger 2
  // If the estimated time to complete a cms collection (cms_duration())
  // is less than the estimated time remaining until the cms generation
  // is full, start a collection.
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
      // We want to conservatively collect somewhat early in order
      // to try and "bootstrap" our CMS/promotion statistics;
      // this branch will not fire after the first successful CMS
      // collection because the stats should then be valid.
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr(
            " CMSCollector: collect for bootstrapping statistics:"
            " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
            _bootstrap_occupancy);
        }
        return true;
      }
    }
  }
  // Trigger 3
  // Otherwise, we start a collection cycle if
  // old gen want a collection cycle started. Each may use
  // an appropriate criterion for making this decision.
  // XXX We need to make sure that the gen expansion
  // criterion dovetails well with this. XXX NEED TO FIX THIS
  if (_cmsGen->should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMS old gen initiated");
    }
    return true;
  }
  // Trigger 4
  // We start a collection if we believe an incremental collection may fail;
  // this is not likely to be productive in practice because it's probably too
  // late anyway.
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  assert(gch->collector_policy()->is_two_generation_policy(),
         "You may want to check the correctness of the following");
  if (gch->incremental_collection_will_fail(true /* consult_young */)) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect because incremental collection will fail ");
    }
    return true;
  }
  // Trigger 5
  if (MetaspaceGC::should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect for metadata allocation ");
    }
    return true;
  }
  return false;
}
As the code above shows, the background collector has five broad categories of triggers:
- Whether a concurrent full GC has been requested
This is the case when the GC cause is _gc_locker and GCLockerInvokesConcurrent is set, or when the GC cause is _java_lang_system_gc (that is, a System.gc() call) and ExplicitGCInvokesConcurrent is set. Either combination triggers one background collection.
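The two cause/flag combinations above can be written as a single predicate. The following is a minimal sketch, not HotSpot code: the GcCause enum, the Flags struct, and the function name are hypothetical stand-ins introduced here for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two GC causes named in the text.
enum class GcCause { GcLocker, JavaLangSystemGc, Other };

// Hypothetical stand-in for the two JVM flags; both are off by default in JDK 8.
struct Flags {
    bool gc_locker_invokes_concurrent = false;    // -XX:+GCLockerInvokesConcurrent
    bool explicit_gc_invokes_concurrent = false;  // -XX:+ExplicitGCInvokesConcurrent
};

// True when a full GC request with this cause is handed to the
// background collector (trigger 1) under the given flag settings.
bool requests_background_collection(GcCause cause, const Flags& flags) {
    return (cause == GcCause::GcLocker && flags.gc_locker_invokes_concurrent) ||
           (cause == GcCause::JavaLangSystemGc && flags.explicit_gc_invokes_concurrent);
}
```

With both flags left at their defaults the predicate is false for every cause, so this trigger only comes into play once one of the two flags is switched on.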
- Dynamic calculation from statistics (only when UseCMSInitiatingOccupancyOnly is not set)
When UseCMSInitiatingOccupancyOnly is not set, the JVM decides dynamically from collected statistics whether a CMS GC is needed.
The logic is: if the predicted time to complete a CMS GC is greater than the predicted time until the old generation fills up, start a GC.
These predictions rely on metrics from previous CMS GCs. Before the first CMS GC, however, no statistics exist and the data is invalid, so the decision falls back to the occupancy of Old Gen.
if (!UseCMSInitiatingOccupancyOnly) {
  if (stats().valid()) {
    if (stats().time_until_cms_start() == 0.0) {
      return true;
    }
  } else {
    // We want to conservatively collect somewhat early in order
    // to try and "bootstrap" our CMS/promotion statistics;
    // this branch will not fire after the first successful CMS
    // collection because the stats should then be valid.
    if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
      if (Verbose && PrintGCDetails) {
        gclog_or_tty->print_cr(
          " CMSCollector: collect for bootstrapping statistics:"
          " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
          _bootstrap_occupancy);
      }
      return true;
    }
  }
}
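To make the shape of this decision easier to follow, here is a simplified model of it. It is a sketch under stated assumptions: the CmsSnapshot struct, its field names, and stats_trigger are hypothetical, and the real time_until_cms_start() works from HotSpot's internal statistics rather than from two plain numbers.

```cpp
#include <cassert>

// Hypothetical snapshot of the inputs to the second trigger. In HotSpot
// these values come from the collector's statistics, not from a struct.
struct CmsSnapshot {
    bool stats_valid;                // whether usable CMS statistics exist yet
    double seconds_to_finish_cycle;  // predicted duration of a CMS cycle
    double seconds_until_old_full;   // predicted time until Old Gen fills up
    double old_occupancy;            // current Old Gen occupancy, 0.0 to 1.0
};

// Simplified model of trigger 2. bootstrap_occupancy is the fallback
// threshold used while no valid statistics exist.
bool stats_trigger(const CmsSnapshot& s, bool use_occupancy_only,
                   double bootstrap_occupancy) {
    if (use_occupancy_only) {
        return false;  // the whole branch is skipped
    }
    if (s.stats_valid) {
        // Start now if a cycle would take longer than Old Gen has left.
        return s.seconds_to_finish_cycle > s.seconds_until_old_full;
    }
    // No usable statistics yet: fall back to plain occupancy.
    return s.old_occupancy >= bootstrap_occupancy;
}
```

The model keeps only the two outcomes the text describes: a prediction-based start once statistics are valid, and an occupancy-based start before that.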
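So at what occupancy does collection start? (In other words, what is the value of _bootstrap_occupancy?)
The answer is 50%. You may have run into this already: without UseCMSInitiatingOccupancyOnly, a CMS GC ran as soon as the old generation reached 50% occupancy, and it left you puzzled at the time.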
_bootstrap_occupancy = ((double)CMSBootstrapOccupancy)/(double)100;
// default value of the flag
product(uintx, CMSBootstrapOccupancy, 50,
        "Percentage CMS generation occupancy at which to initiate CMS collection for bootstrapping collection stats")
- Based on the state of Old Gen
bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {
  assert_lock_strong(freelistLock());
  if (occupancy() > initiating_occupancy()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because of occupancy %f / %f ",
                          short_name(), occupancy(), initiating_occupancy());
    }
    return true;
  }
  if (UseCMSInitiatingOccupancyOnly) {
    return false;
  }
  if (expansion_cause() == CMSExpansionCause::_satisfy_allocation) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because expanded for allocation ",
                          short_name());
    }
    return true;
  }
  if (_cmsSpace->should_concurrent_collect()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because cmsSpace says so ",
                          short_name());
    }
    return true;
  }
  return false;
}
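From the source, there are two main cases here:
(1) Old Gen occupancy is compared against a threshold; if it exceeds the threshold, a CMS GC is performed.
In "occupancy() > initiating_occupancy()", occupancy is clearly the current occupancy of Old Gen. But what is initiating_occupancy?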
_cmsGen->init_initiating_occupancy(CMSInitiatingOccupancyFraction, CMSTriggerRatio);
...
void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, uintx tr) {
  assert(io <= 100 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}
As you can see, when CMSInitiatingOccupancyFraction is configured with a value greater than or equal to 0, the threshold is "io / 100.0".
When CMSInitiatingOccupancyFraction is less than 0 (note that the default is -1), the threshold is "((100 - MinHeapFreeRatio) + (double)(tr * MinHeapFreeRatio) / 100.0) / 100.0". What does that come to?
It is 92% with the JDK 8 defaults (MinHeapFreeRatio = 40, CMSTriggerRatio = 80). You may have read in books or blogs that CMSInitiatingOccupancyFraction is 92 when it is not configured. In fact its unconfigured value is -1, so the threshold comes from the second branch and evaluates to 92%. The flag itself is not 92.
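The arithmetic can be checked with a small standalone function that mirrors init_initiating_occupancy(). This is a sketch: the function name and parameter list are my own, and the values 40 for MinHeapFreeRatio and 80 for CMSTriggerRatio are the JDK 8 defaults, assumed unchanged on the command line.

```cpp
#include <cassert>

// Mirrors the arithmetic of init_initiating_occupancy() quoted above.
// io:                  CMSInitiatingOccupancyFraction (-1 when not configured)
// tr:                  CMSTriggerRatio
// min_heap_free_ratio: MinHeapFreeRatio
// All three are percentages.
double initiating_occupancy(long io, unsigned long tr,
                            unsigned long min_heap_free_ratio) {
    assert(io <= 100 && tr <= 100 && min_heap_free_ratio <= 100);
    if (io >= 0) {
        // Explicitly configured: use the flag value directly.
        return static_cast<double>(io) / 100.0;
    }
    // Not configured: derive the threshold from the other two flags.
    return ((100 - min_heap_free_ratio) +
            static_cast<double>(tr * min_heap_free_ratio) / 100.0) / 100.0;
}
```

With the defaults, initiating_occupancy(-1, 80, 40) works out to ((100 - 40) + 80 * 40 / 100) / 100 = (60 + 32) / 100 = 0.92. Changing MinHeapFreeRatio or CMSTriggerRatio therefore moves the threshold even when CMSInitiatingOccupancyFraction is left alone.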
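(2) The remaining checks, which apply only when UseCMSInitiatingOccupancyOnly is not set
These again split into two sub-cases:
a. Old Gen has just been expanded to satisfy an object allocation, and the allocation succeeded. A CMS GC is then considered.
b. A check based on the CMS Gen free lists. This one is somewhat involved and I have not fully worked it out. Fortunately it returns false under the default configuration, so by default this trigger can be ignored.
- Whether an incremental GC may fail (pessimistic policy)
What does this mean? In a two-generation GC system it mainly refers to whether a Young GC will fail. If a Young GC has already failed or is likely to fail, the JVM decides that a CMS GC is needed.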
bool incremental_collection_will_fail(bool consult_young) {
  // Assumes a 2-generation system; the first disjunct remembers if an
  // incremental collection failed, even when we thought (second disjunct)
  // that it would not.
  assert(heap()->collector_policy()->is_two_generation_policy(),
         "the following definition may not be suitable for an n(>2)-generation system");
  return incremental_collection_failed() ||
         (consult_young && !get_gen(0)->collection_attempt_is_safe());
}
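There are two conditions to look at: "incremental_collection_failed()" and "!get_gen(0)->collection_attempt_is_safe()".
incremental_collection_failed() means that a Young GC has already failed, usually because Old Gen did not have enough space to hold the promoted objects.
!get_gen(0)->collection_attempt_is_safe() means that promotion out of the young generation is not considered safe.
The check is whether the space remaining in Old Gen is large enough to hold the objects a Young GC would promote.
How much a Young GC will promote cannot be known in advance, so the available space is compared against two estimates: the average amount promoted per Young GC, and the maximum amount the current Young GC could promote.
// av_promo is the average amount promoted per Young GC; max_promotion_in_bytes is the maximum that could be promoted now (the space currently used in eden + from)
bool res = (available >= av_promo) || (available >= max_promotion_in_bytes);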
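The two conditions combine as follows. This is a minimal sketch rather than the HotSpot implementation: the function names and plain byte-count parameters are assumptions made for illustration, and the consult_young argument is taken to be true, as it is in the call from shouldConcurrentCollect().

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the comparison quoted above. All sizes are in bytes.
// available:              free space in Old Gen
// av_promo:               average amount promoted per Young GC
// max_promotion_in_bytes: the most the next Young GC could promote
bool promotion_attempt_is_safe(std::size_t available, std::size_t av_promo,
                               std::size_t max_promotion_in_bytes) {
    return available >= av_promo || available >= max_promotion_in_bytes;
}

// Trigger 4 fires when a Young GC has already failed, or when the next
// one is not considered safe.
bool young_gc_will_fail(bool already_failed, std::size_t available,
                        std::size_t av_promo,
                        std::size_t max_promotion_in_bytes) {
    return already_failed ||
           !promotion_attempt_is_safe(available, av_promo, max_promotion_in_bytes);
}
```

Because the two comparisons are joined with a logical OR, the attempt counts as safe as soon as the free space covers the smaller of the two estimates. The trigger fires only when Old Gen cannot hold even the average promotion, or when a Young GC has already failed.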
- Based on the state of Metaspace
This looks at the Metaspace _should_concurrent_collect flag. The flag is set before Metaspace is expanded, provided CMSClassUnloadingEnabled is set.
In that case a CMS GC is performed. This is why applications often see a CMS GC shortly after startup, while Old Gen occupancy is still very low. It looks baffling, but this is the cause.
Summary
This article has laid out the trigger conditions of the CMS foreground and background collectors. The foreground collector's trigger is relatively simple. The background collector has many more, grouped into five broad categories, each with smaller branches of its own. Leaving UseCMSInitiatingOccupancyOnly unset in particular adds many more ways a collection can start. Setting UseCMSInitiatingOccupancyOnly is strongly recommended in production: CMS GC then runs more predictably, and the cause of a GC is easier to track down.
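As an illustration of that recommendation, a JDK 8 command line might carry the following options. The threshold of 75 is an arbitrary example value, not a recommendation from this article; choose it from your own Old Gen usage.

```text
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
```

Going by the source shown above, UseCMSInitiatingOccupancyOnly disables the statistics-based trigger and the expansion and free-list checks. The explicit-request, incremental-collection-failure, and Metaspace triggers are not guarded by it, so they can still start a CMS GC below the configured threshold.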