Reading the JVM Source: CMS GC Trigger Conditions

Jianshu (简书), by 涤生
Please credit the original source when reposting. Thank you!

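Preface

People often ask why their application triggered a CMS GC even though Old Gen occupancy had not reached the threshold set by CMSInitiatingOccupancyFraction. It looks inexplicable, and they cannot tell where the problem lies.

In fact, CMS GC has many trigger conditions; the CMSInitiatingOccupancyFraction threshold is only one of them. This article walks through the source code to list the conditions that trigger a CMS GC, so that the odd CMS GC behavior you run into day to day becomes easier to explain.

A few questions first, to get your attention:

Why did a CMS GC run when Old Gen occupancy was only 50%?
Can Metaspace usage also trigger a CMS GC?
Why did a CMS GC run when Old Gen occupancy was still very low?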

Trigger conditions

CMS GC is implemented as two collectors: a foreground collector and a background collector. The foreground collector is relatively simple; the background collector is more complex and covers more cases.

The trigger conditions of each are described in turn below.

Note: this article is based on JDK 8.
Note: this article covers only the trigger conditions of CMS GC. The details of the algorithm, and when MSC (mark sweep compact) is performed, are out of scope.

foreground collector

The foreground collector's trigger condition is simple: generally, when an object allocation cannot be satisfied for lack of space, a GC is triggered directly to reclaim space immediately. The algorithm used is mark sweep, without compaction.

background collector

Before looking at its trigger conditions, here is how the background collector runs: the CMS background thread polls in a loop, and on each pass it checks whether any background collector trigger condition holds. As soon as one does, it performs a background collect.

void ConcurrentMarkSweepThread::run() {
  ... // omitted
  while (!_should_terminate) {
    sleepBeforeNextCycle();
    if (_should_terminate) break;
    GCCause::Cause cause = _collector->_full_gc_requested ?
      _collector->_full_gc_cause : GCCause::_cms_concurrent_mark;
    _collector->collect_in_background(false, cause);
  }
  ... // omitted
}

On each pass, the thread first waits for CMSWaitDuration and then calls shouldConcurrentCollect to check whether a CMS background collector trigger condition is met. CMSWaitDuration defaults to 2s. (Applications often suffer from frequent CMS GCs; look at the interval between consecutive CMS GCs, and if it is 2s you can be fairly sure the background collector is the one running.)

void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
  while (!_should_terminate) {
    if (CMSIncrementalMode) {
      icms_wait();
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      }
      return;
    } else {
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      } else {
        // Wait until any cms_lock event or check interval not to call shouldConcurrentCollect permanently
        wait_on_cms_lock(CMSCheckInterval);
      }
    }
    // Check if we should start a CMS collection cycle
    if (_collector->shouldConcurrentCollect()) {
      return;
    }
    // .. collection criterion not yet met, let's go back
    // and wait some more
  }
}

So which conditions does shouldConcurrentCollect() check?

bool CMSCollector::shouldConcurrentCollect() {

  // Trigger case 1
  if (_full_gc_requested) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMSCollector: collect because of explicit "
                             " gc request (or gc_locker)");
    }
    return true;
  }
  
  // For debugging purposes, change the type of collection.
  // If the rotation is not on the concurrent collection
  // type, don't start a concurrent collection.
  NOT_PRODUCT(
    if (RotateCMSCollectionTypes &&
        (_cmsGen->debug_collection_type() !=
          ConcurrentMarkSweepGeneration::Concurrent_collection_type)) {
      assert(_cmsGen->debug_collection_type() !=
        ConcurrentMarkSweepGeneration::Unknown_collection_type,
        "Bad cms collection type");
      return false;
    }
  )
  FreelistLocker x(this);
  // ------------------------------------------------------------------
  // Print out lots of information which affects the initiation of
  // a collection.
  if (PrintCMSInitiationStatistics && stats().valid()) {
    gclog_or_tty->print("CMSCollector shouldConcurrentCollect: ");
    gclog_or_tty->stamp();
    gclog_or_tty->print_cr("");
    stats().print_on(gclog_or_tty);
    gclog_or_tty->print_cr("time_until_cms_gen_full %3.7f",
      stats().time_until_cms_gen_full());
    gclog_or_tty->print_cr("free="SIZE_FORMAT, _cmsGen->free());
    gclog_or_tty->print_cr("contiguous_available="SIZE_FORMAT,
                           _cmsGen->contiguous_available());
    gclog_or_tty->print_cr("promotion_rate=%g", stats().promotion_rate());
    gclog_or_tty->print_cr("cms_allocation_rate=%g", stats().cms_allocation_rate());
    gclog_or_tty->print_cr("occupancy=%3.7f", _cmsGen->occupancy());
    gclog_or_tty->print_cr("initiatingOccupancy=%3.7f", _cmsGen->initiating_occupancy());
    gclog_or_tty->print_cr("metadata initialized %d",
      MetaspaceGC::should_concurrent_collect());
  }
  // ------------------------------------------------------------------
  
  // Trigger case 2
  // If the estimated time to complete a cms collection (cms_duration())
  // is less than the estimated time remaining until the cms generation
  // is full, start a collection.
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
      // We want to conservatively collect somewhat early in order
      // to try and "bootstrap" our CMS/promotion statistics;
      // this branch will not fire after the first successful CMS
      // collection because the stats should then be valid.
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr(
            " CMSCollector: collect for bootstrapping statistics:"
            " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
            _bootstrap_occupancy);
        }
        return true;
      }
    }
  }
  
  // Trigger case 3
  // Otherwise, we start a collection cycle if
  // old gen want a collection cycle started. Each may use
  // an appropriate criterion for making this decision.
  // XXX We need to make sure that the gen expansion
  // criterion dovetails well with this. XXX NEED TO FIX THIS
  if (_cmsGen->should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMS old gen initiated");
    }
    return true;
  }

  // Trigger case 4
  // We start a collection if we believe an incremental collection may fail;
  // this is not likely to be productive in practice because it's probably too
  // late anyway.
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  assert(gch->collector_policy()->is_two_generation_policy(),
         "You may want to check the correctness of the following");
  if (gch->incremental_collection_will_fail(true /* consult_young */)) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect because incremental collection will fail ");
    }
    return true;
  }
  
  // Trigger case 5
  if (MetaspaceGC::should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect for metadata allocation ");
    }
    return true;
  }

  return false;
}

As the code above shows, at a high level the background collector has 5 trigger cases:

  • Whether a concurrent Full GC has been requested

This is the case when the GC cause is _gc_locker and GCLockerInvokesConcurrent is set, or when the GC cause is _java_lang_system_gc (that is, a System.gc() call) and ExplicitGCInvokesConcurrent is set. A background collection is then triggered.
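The rule for this first case can be written down as a small predicate. The following is a minimal, self-contained sketch rather than the HotSpot source: the `Flags` struct and the `GCCause` enum are simplified stand-ins introduced here for illustration, and only the flag names are taken from the text above.

```cpp
#include <cassert>

// Simplified stand-ins (assumptions for this sketch) for HotSpot's GC causes
// and the relevant -XX flags.
enum class GCCause { gc_locker, java_lang_system_gc, allocation_failure };

struct Flags {
  bool UseConcMarkSweepGC;
  bool GCLockerInvokesConcurrent;
  bool ExplicitGCInvokesConcurrent;
};

// True when a requested full GC should be handed to the CMS background
// collector rather than being run as a stop-the-world full GC.
bool should_do_concurrent_full_gc(const Flags& f, GCCause cause) {
  return f.UseConcMarkSweepGC &&
         ((cause == GCCause::gc_locker && f.GCLockerInvokesConcurrent) ||
          (cause == GCCause::java_lang_system_gc &&
           f.ExplicitGCInvokesConcurrent));
}
```

Under this sketch, a System.gc() call only reaches the background collector when ExplicitGCInvokesConcurrent is set; any other cause, such as an allocation failure, never takes this path.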

  • Dynamic calculation from statistics (only when UseCMSInitiatingOccupancyOnly is not set)

When UseCMSInitiatingOccupancyOnly is not set, the JVM decides dynamically, from collected statistics, whether a CMS GC is needed.

The rule is: if the predicted time to complete a CMS GC is greater than the estimated time until the old generation fills up, start a GC.
This decision depends on metrics from previous CMS GCs. Before the first CMS GC, however, no statistics have been gathered and they are invalid, so the decision falls back to Old Gen occupancy.

if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
      // We want to conservatively collect somewhat early in order
      // to try and "bootstrap" our CMS/promotion statistics;
      // this branch will not fire after the first successful CMS
      // collection because the stats should then be valid.
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr(
            " CMSCollector: collect for bootstrapping statistics:"
            " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
            _bootstrap_occupancy);
        }
        return true;
      }
    }
  }
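To make the two branches easier to reason about, here is a simplified model of the decision. It is a sketch under stated assumptions, not the HotSpot implementation: `CmsStatsSnapshot` and `should_start_by_stats` are hypothetical names, and the real CMSStats estimate includes additional safety margins that are omitted here.

```cpp
#include <cassert>

// Hypothetical snapshot of the statistics the decision relies on (seconds).
struct CmsStatsSnapshot {
  bool   valid;                    // false until a first CMS cycle has completed
  double cms_duration;             // estimated time to finish a CMS cycle
  double time_until_old_gen_full;  // estimated time until Old Gen fills up
};

// Mirrors the branch structure of the excerpt above, in simplified form.
bool should_start_by_stats(const CmsStatsSnapshot& s,
                           double occupancy,
                           double bootstrap_occupancy) {
  if (s.valid) {
    // Start once a cycle would take longer than the time left before
    // Old Gen is full.
    return s.cms_duration > s.time_until_old_gen_full;
  }
  // No history yet: fall back to a plain occupancy check.
  return occupancy >= bootstrap_occupancy;
}
```

With valid statistics the current occupancy plays no role in this branch; only the two time estimates matter. Occupancy is consulted only while the statistics are still invalid.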

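So at what occupancy does collection start? (In other words, what is the value of _bootstrap_occupancy?)
The answer is 50%. You may have seen this in practice: with UseCMSInitiatingOccupancyOnly not set, a CMS GC ran as soon as the old generation reached 50% occupancy, and at the time it probably left you puzzled.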

 _bootstrap_occupancy = ((double)CMSBootstrapOccupancy)/(double)100;
 // Default value of the flag
 product(uintx, CMSBootstrapOccupancy, 50,
          "Percentage CMS generation occupancy at which to initiate CMS collection for bootstrapping collection stats")

  • Based on the state of Old Gen

bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {
  assert_lock_strong(freelistLock());
  if (occupancy() > initiating_occupancy()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because of occupancy %f / %f  ",
        short_name(), occupancy(), initiating_occupancy());
    }
    return true;
  }
  if (UseCMSInitiatingOccupancyOnly) {
    return false;
  }
  if (expansion_cause() == CMSExpansionCause::_satisfy_allocation) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because expanded for allocation ",
        short_name());
    }
    return true;
  }
  if (_cmsSpace->should_concurrent_collect()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because cmsSpace says so ",
        short_name());
    }
    return true;
  }
  return false;
}

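From the source, there are two main categories here:

(1) Old Gen occupancy is compared against a threshold; if it exceeds the threshold, a CMS GC is started

In "occupancy() > initiating_occupancy()", occupancy is clearly the current occupancy of Old Gen. But what is initiating_occupancy?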

_cmsGen ->init_initiating_occupancy(CMSInitiatingOccupancyFraction, CMSTriggerRatio);
...
void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, uintx tr) {
 assert(io <= 100 && tr <= 100, "Check the arguments");
 if (io >= 0) {
   _initiating_occupancy = (double)io / 100.0;
 } else {
   _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                            (double)(tr * MinHeapFreeRatio) / 100.0)
                           / 100.0;
 }
}

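As the code shows, when CMSInitiatingOccupancyFraction is configured with a value greater than or equal to 0, the threshold is "io / 100.0".

When CMSInitiatingOccupancyFraction is less than 0 (note that the default is -1), the threshold is "((100 - MinHeapFreeRatio) + (double)(tr * MinHeapFreeRatio) / 100.0) / 100.0". What does that come to?
It is 92%: with the JDK 8 defaults MinHeapFreeRatio = 40 and CMSTriggerRatio = 80, (60 + 80 * 40 / 100) / 100 = 0.92. Some books and blogs say that CMSInitiatingOccupancyFraction defaults to 92. In fact its default is -1, so the threshold comes from the second branch and works out to 92%; the value of CMSInitiatingOccupancyFraction itself is not 92.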
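The 92% figure can be checked by running the same arithmetic outside the JVM. The sketch below copies the formula from init_initiating_occupancy into a free function; the function name and the idea of passing MinHeapFreeRatio as a parameter are assumptions made here so the snippet is self-contained. The values 40 and 80 used in the usage note are the JDK 8 defaults of MinHeapFreeRatio and CMSTriggerRatio.

```cpp
#include <cassert>

// Same arithmetic as init_initiating_occupancy, with the flag values passed in.
//   io:  CMSInitiatingOccupancyFraction (-1 when not configured)
//   tr:  CMSTriggerRatio
//   min_heap_free_ratio: MinHeapFreeRatio
double initiating_occupancy(long io, unsigned long tr,
                            unsigned long min_heap_free_ratio) {
  assert(io <= 100 && tr <= 100 && min_heap_free_ratio <= 100);
  if (io >= 0) {
    // An explicit fraction wins, including 0.
    return static_cast<double>(io) / 100.0;
  }
  // Not configured: derive the threshold from the two ratios.
  return ((100 - min_heap_free_ratio) +
          static_cast<double>(tr * min_heap_free_ratio) / 100.0) / 100.0;
}
```

Calling initiating_occupancy(-1, 80, 40) evaluates (60 + 32) / 100, that is 0.92, while initiating_occupancy(70, 80, 40) simply yields 0.70.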

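(2) The remaining checks, which apply only when UseCMSInitiatingOccupancyOnly is not set

There are two sub-cases:
a. Old Gen has just been expanded to satisfy an object allocation, and the allocation succeeded; a CMS GC is then considered.
b. A check based on the CMS Gen free lists. This one is somewhat involved and I have not fully worked it out. Fortunately, with the default configuration it returns false, so by default this trigger can be ignored.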

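  • Whether an incremental GC may fail (pessimistic policy)

What does this mean? In the two-generation GC scheme, it mainly asks whether a Young GC will fail. If a Young GC has already failed or is likely to fail, the JVM decides that a CMS GC is needed.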

  bool incremental_collection_will_fail(bool consult_young) {
    // Assumes a 2-generation system; the first disjunct remembers if an
    // incremental collection failed, even when we thought (second disjunct)
    // that it would not.
    assert(heap()->collector_policy()->is_two_generation_policy(),
           "the following definition may not be suitable for an n(>2)-generation system");
    return incremental_collection_failed() ||
           (consult_young && !get_gen(0)->collection_attempt_is_safe());
  }

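Look at the two conditions, "incremental_collection_failed()" and "!get_gen(0)->collection_attempt_is_safe()".
incremental_collection_failed() means a Young GC has already failed, usually because Old Gen did not have enough space to hold the promoted objects.

!get_gen(0)->collection_attempt_is_safe() means that promotion from the young generation is not considered safe.
This is determined by checking whether the space remaining in Old Gen is large enough for the objects a Young GC would promote.
How much a Young GC will actually promote cannot be known in advance, so the remaining space is compared against both the average amount promoted per Young GC and the maximum amount the current Young GC could promote.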

// av_promo is the average size promoted per Young GC; max_promotion_in_bytes is the largest possible promotion right now (the space currently used in eden + from)
bool   res = (available >= av_promo) || (available >= max_promotion_in_bytes);
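
Putting the two pieces together, the pessimistic check can be modelled with two small functions. This is an illustrative sketch: the functions take plain values instead of reading them from the heap, and their parameter lists are assumptions made for this example; only the two boolean expressions are taken from the excerpts above.

```cpp
#include <cassert>
#include <cstddef>

// available:              free space in Old Gen
// av_promo:               average number of bytes promoted per Young GC
// max_promotion_in_bytes: worst case, space currently used in eden + from
bool promotion_attempt_is_safe(std::size_t available,
                               std::size_t av_promo,
                               std::size_t max_promotion_in_bytes) {
  return (available >= av_promo) || (available >= max_promotion_in_bytes);
}

// A Young GC is expected to fail if one already failed, or if the young
// generation is consulted and reports that promotion is not safe.
bool incremental_collection_will_fail(bool incremental_collection_failed,
                                      bool consult_young,
                                      bool collection_attempt_is_safe) {
  return incremental_collection_failed ||
         (consult_young && !collection_attempt_is_safe);
}
```

Because the two comparisons are joined with "or", promotion is treated as unsafe only when the free space in Old Gen is below both the average and the worst-case promotion size.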

  • Based on the state of Metaspace

This check looks at Metaspace's _should_concurrent_collect flag. The flag is set before Metaspace is expanded, provided CMSClassUnloadingEnabled is configured.
A CMS GC is then performed. This is why an application often runs a CMS GC shortly after startup, while Old Gen occupancy is still very low, which can look baffling.
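
The behavior described above can be pictured with a toy model. Everything in it is hypothetical: `MetaspaceModel` and its members are invented for illustration, and the real HotSpot path is considerably more involved. The sketch only shows the idea that a metadata allocation which forces an expansion raises the flag that the background thread later reads, regardless of Old Gen occupancy.

```cpp
#include <cassert>
#include <cstddef>

// Toy model; the type and member names are hypothetical, not HotSpot's.
struct MetaspaceModel {
  std::size_t capacity;              // currently committed metadata space
  std::size_t used;                  // metadata bytes in use
  bool cms_class_unloading_enabled;  // stands in for CMSClassUnloadingEnabled
  bool should_concurrent_collect;    // flag read by the background check

  // Allocate metadata. If the request does not fit, raise the flag (when
  // class unloading under CMS is enabled) and then expand.
  void allocate(std::size_t bytes) {
    if (used + bytes > capacity) {
      if (cms_class_unloading_enabled) {
        should_concurrent_collect = true;
      }
      capacity = used + bytes;  // expand just enough for this request
    }
    used += bytes;
  }
};
```

In this model, class loading during startup is enough to raise the flag, which matches the symptom of an early CMS GC with a nearly empty Old Gen.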

Summary

This article has walked through the trigger conditions of the CMS GC foreground collector and background collector. The foreground collector's trigger is relatively simple. The background collector has many more, grouped into 5 main cases, several of which contain smaller branches. In particular, when UseCMSInitiatingOccupancyOnly is not set, many more triggers become possible. Setting UseCMSInitiatingOccupancyOnly is strongly recommended in production, so that CMS GC runs more predictably and the cause of a GC is easier to track down.

