ZGC Source Code Analysis (3): When ZGC Is Triggered

ZGC relies mainly on passive collection: a background thread decides when a garbage collection starts. The trigger logic lives in

jdk11/src/hotspot/share/gc/z/zDirector.cpp

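ZDirector is the background thread that decides whether a GC should be performed.

The thread wakes up on the ticks of a custom clock (the metronome).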

void ZDirector::run_service() {
  // Main loop
  while (_metronome.wait_for_tick()) {
    sample_allocation_rate();
    const GCCause::Cause cause = make_gc_decision();
    if (cause != GCCause::_no_gc) {
      ZCollectedHeap::heap()->collect(cause);
    }
  }
}

GCCause::Cause ZDirector::make_gc_decision() const {
  // Rule 0: Timer
  if (rule_timer()) {
    return GCCause::_z_timer;
  }

  // Rule 1: Warmup
  if (rule_warmup()) {
    return GCCause::_z_warmup;
  }

  // Rule 2: Allocation rate
  if (rule_allocation_rate()) {
    return GCCause::_z_allocation_rate;
  }

  // Rule 3: Proactive
  if (rule_proactive()) {
    return GCCause::_z_proactive;
  }

  // No GC
  return GCCause::_no_gc;
}

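Currently there are four rules that can trigger ZGC automatically.

1. Triggering at a fixed time interval

The interval is controlled by ZCollectionInterval. The default value is 0, which disables this rule.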

  product(uint, ZCollectionInterval, 0,                                     \
          "Force GC at a fixed time interval (in seconds)")                 \

bool ZDirector::rule_timer() const {
  if (ZCollectionInterval == 0) {
    // Rule disabled
    return false;
  }

  // Perform GC if timer has expired.
  const double time_since_last_gc = ZStatCycle::time_since_last();
  const double time_until_gc = ZCollectionInterval - time_since_last_gc;

  return time_until_gc <= 0;
}
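The timer rule can be tried outside the JVM with a small standalone sketch. The function name `timer_expired` and its two parameters are hypothetical stand-ins for `ZCollectionInterval` and `ZStatCycle::time_since_last()`; this is an illustration, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical standalone version of the timer rule (not HotSpot code).
// interval_s:           stand-in for ZCollectionInterval, in seconds (0 disables the rule)
// time_since_last_gc_s: stand-in for ZStatCycle::time_since_last(), in seconds
bool timer_expired(unsigned interval_s, double time_since_last_gc_s) {
  assert(time_since_last_gc_s >= 0.0);
  if (interval_s == 0) {
    return false;  // rule disabled
  }
  // The rule fires once the configured interval has fully elapsed.
  const double time_until_gc = interval_s - time_since_last_gc_s;
  return time_until_gc <= 0;
}
```

With an interval of 60 seconds, the rule stays quiet at 59.5 seconds and fires from 60 seconds onward.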

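2. Triggering by the warmup rule

This rule applies right after HotSpot starts. When heap usage reaches 10%, 20%, and then 30% of the heap's maximum capacity and no GC triggered by another rule has run yet, a GC is triggered proactively. After any GC cycle completes, the rule checks whether warmup is still needed: the usage threshold rises by 10 percentage points per completed cycle, and the rule is disabled once 3 GC cycles have completed, that is, warmup applies only while fewer than 3 cycles have run.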

bool ZDirector::is_warm() const {
  return ZStatCycle::ncycles() >= 3;
}

bool ZDirector::rule_warmup() const {
  if (is_warm()) {
    // Rule disabled
    return false;
  }

  // Perform GC if heap usage passes 10/20/30% and no other GC has been
  // performed yet. This allows us to get some early samples of the GC
  // duration, which is needed by the other rules.
  const size_t max_capacity = ZHeap::heap()->current_max_capacity();
  const size_t used = ZHeap::heap()->used();
  const double used_threshold_percent = (ZStatCycle::ncycles() + 1) * 0.1;
  const size_t used_threshold = max_capacity * used_threshold_percent;

  log_debug(gc, director)("Rule: Warmup %.0f%%, Used: " SIZE_FORMAT "MB, UsedThreshold: " SIZE_FORMAT "MB",
                          used_threshold_percent * 100, used / M, used_threshold / M);

  return used >= used_threshold;
}
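The threshold arithmetic of the warmup rule can be sketched as a standalone function. The name `warmup_should_gc` and its parameters are hypothetical; they stand in for `ZStatCycle::ncycles()`, `current_max_capacity()`, and `used()`, and the sketch is not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standalone version of the warmup rule (not HotSpot code).
// ncycles:      number of GC cycles completed so far
// max_capacity: maximum heap capacity in bytes
// used:         current heap usage in bytes
bool warmup_should_gc(unsigned ncycles, std::size_t max_capacity, std::size_t used) {
  assert(used <= max_capacity);
  if (ncycles >= 3) {
    return false;  // already warm, rule disabled
  }
  // The threshold is 10%, 20%, 30% of max capacity after 0, 1, 2 completed cycles.
  const double used_threshold_percent = (ncycles + 1) * 0.1;
  const std::size_t used_threshold =
      static_cast<std::size_t>(max_capacity * used_threshold_percent);
  return used >= used_threshold;
}
```

For a 1000 MB heap, 150 MB of usage triggers the first warmup GC (threshold 100 MB) but not the second (threshold 200 MB).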

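3. Triggering by allocation rate

This rule uses the normal distribution, and the code applies it in two places: estimating how long it will take until memory runs out, based on observed allocation, and estimating how long one garbage collection will take, based on past collection times.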

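In the G1 discussion, GC time was estimated with a decaying average. It is a simple way to average a series that gives recent data a higher weight, so that recent samples dominate the result. The decaying average is computed as:

$$\text{davg}_n = \alpha \cdot \text{davg}_{n-1} + (1 - \alpha) \cdot x_n$$

Here α is the weight of the historical data and 1 − α is the weight of the most recent sample x_n, so the smaller α is, the more the latest sample influences the result. The ordinary arithmetic mean is simply the case α = (n − 1)/n. For details, see 《JVM G1源碼分析和調優》 (JVM G1 Source Code Analysis and Tuning).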
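A minimal sketch of the decaying average follows. The class name `DecayingAverage` is made up for illustration and the choice to seed the average with the first sample is an assumption of this sketch; it is not HotSpot's own sequence class.

```cpp
#include <cassert>

// Minimal decaying-average tracker (illustrative only, not HotSpot code).
class DecayingAverage {
 public:
  // alpha is the weight of the history; 1 - alpha is the weight of the newest sample.
  explicit DecayingAverage(double alpha)
      : alpha_(alpha), avg_(0.0), has_sample_(false) {
    assert(alpha >= 0.0 && alpha <= 1.0);
  }

  void add(double sample) {
    if (!has_sample_) {
      avg_ = sample;  // assumption: the first sample seeds the average
      has_sample_ = true;
    } else {
      avg_ = alpha_ * avg_ + (1.0 - alpha_) * sample;
    }
  }

  double value() const { return avg_; }

 private:
  double alpha_;
  double avg_;
  bool has_sample_;
};
```

With α = 0.5, feeding the samples 10 and then 20 gives an average of 15; a smaller α would pull the result closer to 20.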

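ZGC instead bases its estimates mainly on the normal distribution, a concept familiar to anyone who has studied probability theory. To make the code easier to read, here is a brief review. The normal distribution is a perfectly symmetric bell-shaped curve that is high in the middle and falls off toward both ends:

[Figure: bell-shaped curve of the normal distribution]

The intuition is simple: most of the data clusters around the middle, and only a few outliers fall in the tails.
The data in a garbage collection algorithm, such as the time until memory is exhausted and the duration of a collection, is assumed to follow this distribution as well. This does not mean that G1's pause prediction model is wrong or performs poorly; rather, predictions based on the normal distribution have stronger mathematical backing. In practice ZGC still modifies this model somewhat.
The normal distribution is usually written N. If X follows a normal distribution with mean μ and variance σ², the transformation Y = (X − μ)/σ yields a variable that follows N(0, 1):

$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Y = \frac{X - \mu}{\sigma} \sim N(0, 1)$$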

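Suppose the time taken by memory allocation is known to follow a normal distribution. We can collect samples and use them to estimate its mean and variance. These estimates come from sample data, so they may deviate from the true mean and variance, and using them directly could give inaccurate results because of fluctuations in the samples. Probability theory therefore introduces the confidence interval and the confidence level. Put simply, a confidence interval is a range around the estimate within which the true value of the parameter lies with a given probability, and the confidence level is that probability.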

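Given samples X1, X2, …, Xn of the time spent on memory allocation, we want a bound that covers the allocation time in 99.9% of cases. The point estimator satisfies:

[Image: formula for the point estimator]

where μ is the sample mean and σ is the sample standard deviation.
For a confidence level of 99.9%, the standard normal table gives a statistic of 3.290527:

$$P(|Y| \le 3.290527) = 0.999$$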
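The value 3.290527 can be checked numerically rather than looked up in a table. The sketch below assumes the 99.9% level is two-sided, which matches the "1 in 1000 that a sample is outside of the confidence interval" wording in the HotSpot comment; the function name `two_sided_coverage` is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Probability that a standard normal variable falls within [-z, z].
// Uses the identity P(|Y| <= z) = erf(z / sqrt(2)).
double two_sided_coverage(double z) {
  assert(z >= 0.0);
  return std::erf(z / std::sqrt(2.0));
}
```

Calling `two_sided_coverage(3.290527)` gives a value within rounding of 0.999, and the familiar 1.959964 gives about 0.95.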

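This gives the confidence interval

$$[\mu - 3.290527\,\sigma,\ \mu + 3.290527\,\sigma]$$

so in 99.9% of cases the maximum memory consumption will not exceed

$$\mu + 3.290527\,\sigma$$

ZGC modifies this formula slightly, in effect making the bound larger:

$$\mu \times \text{Tolerance} + 3.290527\,\sigma$$

Tolerance defaults to 2, which makes the bound more conservative and raises the effective confidence level above 99.9%. GC duration is handled in a similar way, using the mean plus 3.290527 standard deviations without the tolerance factor. With confidence intervals and confidence levels in mind, the code below is straightforward.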
bool ZDirector::rule_allocation_rate() const {
  if (is_first()) {
    // Rule disabled
    return false;
  }

  // Perform GC if the estimated max allocation rate indicates that we
  // will run out of memory. The estimated max allocation rate is based
  // on the moving average of the sampled allocation rate plus a safety
  // margin based on variations in the allocation rate and unforeseen
  // allocation spikes.

  // Calculate amount of free memory available to Java threads. Note that
  // the heap reserve is not available to Java threads and is therefore not
  // considered part of the free memory.
  const size_t max_capacity = ZHeap::heap()->current_max_capacity();
  const size_t max_reserve = ZHeap::heap()->max_reserve();
  const size_t used = ZHeap::heap()->used();
  const size_t free_with_reserve = max_capacity - used;
  const size_t free = free_with_reserve - MIN2(free_with_reserve, max_reserve);

  // Calculate time until OOM given the max allocation rate and the amount
  // of free memory. The allocation rate is a moving average and we multiply
  // that with an allocation spike tolerance factor to guard against unforeseen
  // phase changes in the allocate rate. We then add ~3.3 sigma to account for
  // the allocation rate variance, which means the probability is 1 in 1000
  // that a sample is outside of the confidence interval.
  const double max_alloc_rate = (ZStatAllocRate::avg() * ZAllocationSpikeTolerance) + (ZStatAllocRate::avg_sd() * one_in_1000);
  const double time_until_oom = free / (max_alloc_rate + 1.0); // Plus 1.0B/s to avoid division by zero

  // Calculate max duration of a GC cycle. The duration of GC is a moving
  // average, we add ~3.3 sigma to account for the GC duration variance.
  const AbsSeq& duration_of_gc = ZStatCycle::normalized_duration();
  const double max_duration_of_gc = duration_of_gc.davg() + (duration_of_gc.dsd() * one_in_1000);

  // Calculate time until GC given the time until OOM and max duration of GC.
  // We also deduct the sample interval, so that we don't overshoot the target
  // time and end up starting the GC too late in the next interval.
  const double sample_interval = 1.0 / ZStatAllocRate::sample_hz;
  const double time_until_gc = time_until_oom - max_duration_of_gc - sample_interval;

  log_debug(gc, director)("Rule: Allocation Rate, MaxAllocRate: %.3lfMB/s, Free: " SIZE_FORMAT "MB, MaxDurationOfGC: %.3lfs, TimeUntilGC: %.3lfs",
                          max_alloc_rate / M, free / M, max_duration_of_gc, time_until_gc);

  return time_until_gc <= 0;
}
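The arithmetic of this rule can be restated as a standalone sketch with plain numbers. The struct `AllocRateInputs`, the constant `kOneIn1000`, and the function `time_until_gc` are hypothetical names for this illustration; the inputs stand in for the values HotSpot reads from `ZHeap`, `ZStatAllocRate`, and `ZStatCycle`.

```cpp
#include <cassert>

// Illustrative restatement of the arithmetic in rule_allocation_rate (not HotSpot code).
struct AllocRateInputs {
  double free_bytes;       // free memory available to Java threads, in bytes
  double avg_alloc_rate;   // moving average of the allocation rate, bytes/s
  double sd_alloc_rate;    // standard deviation of the allocation rate, bytes/s
  double avg_gc_duration;  // moving average of the GC duration, seconds
  double sd_gc_duration;   // standard deviation of the GC duration, seconds
  double spike_tolerance;  // stand-in for ZAllocationSpikeTolerance
  double sample_interval;  // seconds between allocation-rate samples
};

const double kOneIn1000 = 3.290527;  // ~3.3 sigma, 99.9% two-sided confidence

// Returns the time left before a GC must start; a value <= 0 means "start now".
double time_until_gc(const AllocRateInputs& in) {
  assert(in.free_bytes >= 0.0);
  const double max_alloc_rate =
      in.avg_alloc_rate * in.spike_tolerance + in.sd_alloc_rate * kOneIn1000;
  // Plus 1 B/s to avoid division by zero, as in the original function.
  const double time_until_oom = in.free_bytes / (max_alloc_rate + 1.0);
  const double max_duration_of_gc =
      in.avg_gc_duration + in.sd_gc_duration * kOneIn1000;
  return time_until_oom - max_duration_of_gc - in.sample_interval;
}
```

As a toy example with zero variance: 4000 free bytes, an average rate of 999.5 B/s, a tolerance of 2, a 0.5 s GC, and a 0.25 s sample interval leave 1.25 s before a GC must start. With only 1000 free bytes the result is negative, so the rule would fire.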
ZAllocationSpikeTolerance is a correction factor with a default value of 2.0:

  product(double, ZAllocationSpikeTolerance, 2.0,                           \
          "Allocation spike tolerance factor")                              \

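4. Proactive GC

A proactive GC is considered only when heap usage has grown by at least 10% of the maximum capacity since the last GC, or when more than 5 minutes have passed since the last GC. Even then, a GC starts only once the time since the last GC exceeds an acceptable interval derived from the estimated GC duration, so that the throughput cost stays around 1%. This rule covers the scenarios that the third rule misses: as the analysis above shows, the third rule mostly covers workloads with a high allocation rate.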

  diagnostic(bool, ZProactive, true,                                        \
          "Enable proactive GC cycles")                                     \

bool ZDirector::rule_proactive() const {
  if (!ZProactive || !is_warm()) {
    // Rule disabled
    return false;
  }

  // Perform GC if the impact of doing so, in terms of application throughput
  // reduction, is considered acceptable. This rule allows us to keep the heap
  // size down and allow reference processing to happen even when we have a lot
  // of free space on the heap.

  // Only consider doing a proactive GC if the heap usage has grown by at least
  // 10% of the max capacity since the previous GC, or more than 5 minutes has
  // passed since the previous GC. This helps avoid superfluous GCs when running
  // applications with very low allocation rate.
  const size_t used_after_last_gc = ZStatHeap::used_at_relocate_end();
  const size_t used_increase_threshold = ZHeap::heap()->current_max_capacity() * 0.10; // 10%
  const size_t used_threshold = used_after_last_gc + used_increase_threshold;
  const size_t used = ZHeap::heap()->used();
  const double time_since_last_gc = ZStatCycle::time_since_last();
  const double time_since_last_gc_threshold = 5 * 60; // 5 minutes
  if (used < used_threshold && time_since_last_gc < time_since_last_gc_threshold) {
    return false;
  }

  const double assumed_throughput_drop_during_gc = 0.50; // 50%
  const double acceptable_throughput_drop = 0.01;        // 1%
  const AbsSeq& duration_of_gc = ZStatCycle::normalized_duration();
  const double max_duration_of_gc = duration_of_gc.davg() + (duration_of_gc.dsd() * one_in_1000);
  const double acceptable_gc_interval = max_duration_of_gc * ((assumed_throughput_drop_during_gc / acceptable_throughput_drop) - 1.0);
  const double time_until_gc = acceptable_gc_interval - time_since_last_gc;

  return time_until_gc <= 0;
}
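The interval formula at the end of this function can be isolated in a short sketch. The function name `acceptable_gc_interval` is hypothetical. The comment gives one way to read the formula, which is an interpretation and not something stated in the HotSpot source.

```cpp
#include <cassert>

// Illustrative restatement of the interval arithmetic in rule_proactive (not HotSpot code).
// One reading: a GC of length D costs assumed_drop * D seconds of throughput. Spreading
// that cost over a period of length I + D keeps the average drop at acceptable_drop
// when I = D * (assumed_drop / acceptable_drop - 1).
double acceptable_gc_interval(double max_duration_of_gc,
                              double assumed_drop = 0.50,
                              double acceptable_drop = 0.01) {
  assert(acceptable_drop > 0.0);
  return max_duration_of_gc * ((assumed_drop / acceptable_drop) - 1.0);
}
```

With the constants from the source (50% and 1%), the interval is 49 times the estimated GC duration, so a GC estimated at 2 seconds allows a proactive cycle roughly every 98 seconds.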