The Window TinyLFU Algorithm

I. Introduction

The core metrics for judging a cache are its hit rate, its performance, and its resource usage. The eviction policy is a major factor in the hit rate. Simple caches usually use LFU (Least Frequently Used) or LRU (Least Recently Used) directly, whereas Caffeine uses the W-TinyLFU algorithm.

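II. Drawbacks of LRU and LFU

  • LRU is simple to implement and gives a good hit rate in typical workloads, which makes it a cost-effective and widely used algorithm. LRU handles sparse bursts of traffic well, but it is prone to cache pollution: for example, an occasional full scan over the entire data set flushes the history of recently accessed entries out of the cache.

  • If the access distribution stays fixed over a period of time, LFU achieves the highest hit rate. LFU has two drawbacks, however. First, it has to maintain frequency information for every entry and update it on every access, which is a significant overhead. Second, it copes poorly with sparse bursts: entries that were accessed frequently early on already occupy the cache, so occasional new traffic is unlikely to be retained, and entries that were heavily accessed in the past but will not be used again keep occupying their slots.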
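The scan-pollution problem of LRU can be reproduced in a few lines. The following is a minimal sketch, assuming a toy LRU cache built on `LinkedHashMap` in access-order mode; the class `LruScanDemo` and its methods are hypothetical names used only for this illustration and have nothing to do with Caffeine's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A toy LRU cache built on LinkedHashMap's access-order mode (illustrative only). */
final class LruScanDemo {
  static <K, V> Map<K, V> lru(int capacity) {
    // accessOrder = true moves an entry to the tail on every get/put
    return new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
      }
    };
  }

  /** Returns how many hot keys are still cached after a one-off scan over cold keys. */
  static int hotKeysSurvivingScan(int capacity, int hotKeys, int scanLength) {
    Map<Integer, String> cache = lru(capacity);
    // The hot keys are touched over and over again
    for (int round = 0; round < 100; round++) {
      for (int k = 0; k < hotKeys; k++) {
        cache.put(k, "hot");
      }
    }
    // A single pass over cold data that will never be read again
    for (int k = 0; k < scanLength; k++) {
      cache.put(1_000 + k, "cold");
    }
    int survivors = 0;
    for (int k = 0; k < hotKeys; k++) {
      if (cache.containsKey(k)) {
        survivors++;
      }
    }
    return survivors;
  }
}
```

With a capacity of 10 and 5 hot keys, a scan of 3 cold keys leaves all hot keys in place, while a scan of 10 cold keys evicts every one of them, even though each hot key had been accessed 100 times.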

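III. TinyLFU

TinyLFU was designed specifically to address the two LFU drawbacks described above.

  • The first problem (the cost of maintaining frequencies) is addressed with the Count–Min Sketch algorithm.

  • The second problem is addressed by keeping the recorded frequencies relatively fresh (a freshness mechanism) and by having each newly inserted entry compete against an old entry. The loser is evicted, so old entries that are no longer needed get removed.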

1. Frequency counting: the Count–Min Sketch algorithm

Caffeine implements this algorithm in the FrequencySketch class, using a one-dimensional array. A numeric counter would normally be stored in an int or a long, but Caffeine takes the view that a cache's access frequency does not need such a large range: a maximum of 15 is enough, since 15 accesses is generally considered a very high frequency, and Caffeine has a separate mechanism that periodically halves the counters. With a maximum of 15, each counter needs only 4 bits, so one 64-bit long can hold 16 counters, which makes storage 16 times denser than using one long per counter.
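The 4-bit packing can be tried out in isolation. The sketch below is a simplified illustration, not Caffeine's code: `PackedCounters`, `get`, and `increment` are hypothetical names, and the word is passed by value instead of living in a table.

```java
/** Sixteen saturating 4-bit counters packed into one 64-bit word (illustrative only). */
final class PackedCounters {
  /** Reads the 4-bit counter stored at the given slot (0..15). */
  static int get(long word, int slot) {
    return (int) ((word >>> (slot << 2)) & 0xfL);
  }

  /** Returns the word with the counter at the given slot incremented, saturating at 15. */
  static long increment(long word, int slot) {
    int offset = slot << 2;          // bit position of the counter: slot * 4
    long mask = 0xfL << offset;      // the four bits that belong to this counter
    if ((word & mask) == mask) {
      return word;                   // already 15: leave it unchanged
    }
    return word + (1L << offset);    // add 1 to this counter only
  }
}
```

Because a counter is never incremented past 15, the addition can never carry into the neighbouring counter, which is what makes the plain `+` safe.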

Incrementing the frequency

frequencySketch().increment(key);
// Selected fields of FrequencySketch

// Hash seeds
static final long[] SEED = { // A mixture of seeds from FNV-1a, CityHash, and Murmur3
 0xc3a5c85c97cb3127L, 0xb492b66fbe98f273L, 0x9ae16a3b2f90404fL, 0xcbf29ce484222325L};
static final long RESET_MASK = 0x7777777777777777L;
static final long ONE_MASK = 0x1111111111111111L;

int sampleSize;
// Mask used to turn a hash value into a table index quickly.
// The table length is a power of two and tableMask is length - 1, so (hash & tableMask) takes the place of a modulo operation.
int tableMask;
// One-dimensional long array that stores the counters
long[] table;
int size;

/**
 * Increments the popularity of the element if it does not exceed the maximum (15). The popularity
 * of all elements will be periodically down sampled when the observed events exceeds a threshold.
 * This process provides a frequency aging to allow expired long term entries to fade away.
 *
 * @param e the element to add
 */
public void increment(@NonNull E e) {
  if (isNotInitialized()) {
    return;
  }

  // Apply a supplemental hash function to the key's hashCode
  int hash = spread(e.hashCode());
  // Caffeine divides the 64 bits of a long into 16 counters of 4 bits each.
  // start selects a group of four counters: the low two bits of the hash, shifted left by 2.
  // Possible values of start: 0, 4, 8, 12
  int start = (hash & 3) << 2;

  // indexOf combines the hash with a per-depth seed to produce a table index,
  // so the four seeds yield four (usually different) indexes
  int index0 = indexOf(hash, 0);
  int index1 = indexOf(hash, 1);
  int index2 = indexOf(hash, 2);
  int index3 = indexOf(hash, 3);

  // Increment counter start (+1, +2, +3) within the corresponding table[index]
  boolean added = incrementAt(index0, start);
  added |= incrementAt(index1, start + 1);
  added |= incrementAt(index2, start + 2);
  added |= incrementAt(index3, start + 3);

  if (added && (++size == sampleSize)) {
    reset();
  }
}

/**
 * Increments the specified counter by 1 if it is not already at the maximum value (15).
 *
 * @param i the table index (16 counters)
 * @param j the counter to increment
 * @return if incremented
 */
boolean incrementAt(int i, int j) {
  // j is the counter's position (0-15) within the long; offset is its bit position (j * 4)
  int offset = j << 2;
  // The maximum frequency is 15, i.e. 0xfL.
  // mask selects this counter's four bits: 1111 shifted left by offset
  long mask = (0xfL << offset);
  // If the counter is not yet 15, add 1; once it reaches 15 it stays there
  if ((table[i] & mask) != mask) {
    table[i] += (1L << offset);
    return true;
  }
  return false;
}

/**
 * Returns the table index for the counter at the specified depth.
 *
 * @param item the element's hash
 * @param i the counter depth
 * @return the table index
 */
int indexOf(int item, int i) {
  long hash = SEED[i] * item;
  hash += hash >>> 32;
  return ((int) hash) & tableMask;
}

/**
 * Applies a supplemental hash function to a given hashCode, which defends against poor quality
 * hash functions.
 */
int spread(int x) {
  x = ((x >>> 16) ^ x) * 0x45d9f3b;
  x = ((x >>> 16) ^ x) * 0x45d9f3b;
  return (x >>> 16) ^ x;
}
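The `& tableMask` step in indexOf relies on the table length being a power of two. The following sketch checks that property in isolation; `MaskIndexDemo` and its methods are hypothetical helpers written for this illustration, and the rounding helper assumes an input between 1 and 2^30.

```java
/** Shows that masking with (size - 1) equals a non-negative modulo for power-of-two sizes. */
final class MaskIndexDemo {
  /** Rounds n up to the next power of two (assumes 1 <= n <= 2^30). */
  static int ceilingPowerOfTwo(int n) {
    return 1 << -Integer.numberOfLeadingZeros(n - 1);
  }

  /** Index computed with a single AND, valid only when tableSize is a power of two. */
  static int indexByMask(int hash, int tableSize) {
    return hash & (tableSize - 1);
  }

  /** Reference index computed with a non-negative modulo. */
  static int indexByMod(int hash, int tableSize) {
    return Math.floorMod(hash, tableSize);
  }
}
```

The mask version also handles negative hash values correctly, because the AND simply keeps the low bits of the two's-complement representation.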

Reading the frequency

@NonNegative
public int frequency(@NonNull E e) {
  if (isNotInitialized()) {
    return 0;
  }

  int hash = spread(e.hashCode());
  // Select the group of counters, exactly as in increment()
  int start = (hash & 3) << 2;
  int frequency = Integer.MAX_VALUE;
  // Loop four times, once per depth, each with its own table index
  for (int i = 0; i < 4; i++) {
    int index = indexOf(hash, i);
    // Locate counter (start + i) within table[index] and extract its 4-bit value
    int count = (int) ((table[index] >>> ((start + i) << 2)) & 0xfL);
    // Keep the minimum of the four counts
    frequency = Math.min(frequency, count);
  }
  return frequency;
}
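To see why the minimum of the four counters is the right answer, it can help to look at a stripped-down Count–Min Sketch without the 4-bit packing. The class below is a simplified sketch under that assumption: `SimpleCountMinSketch` is a hypothetical name, it uses one plain int array per depth, it skips the `spread` step, and it reuses the seed values and index calculation from the listing above.

```java
/** A plain Count-Min Sketch with one int row per hash seed (illustrative only). */
final class SimpleCountMinSketch {
  private static final long[] SEEDS = {
      0xc3a5c85c97cb3127L, 0xb492b66fbe98f273L, 0x9ae16a3b2f90404fL, 0xcbf29ce484222325L};

  private final int[][] rows;
  private final int mask;

  /** The width must be a power of two so that the mask can replace a modulo. */
  SimpleCountMinSketch(int widthPowerOfTwo) {
    this.rows = new int[SEEDS.length][widthPowerOfTwo];
    this.mask = widthPowerOfTwo - 1;
  }

  private int indexOf(int hash, int row) {
    long h = SEEDS[row] * hash;
    h += h >>> 32;
    return ((int) h) & mask;
  }

  /** Adds one to the key's counter in every row. */
  void increment(Object key) {
    int hash = key.hashCode();
    for (int row = 0; row < rows.length; row++) {
      rows[row][indexOf(hash, row)]++;
    }
  }

  /** Estimates the key's frequency as the minimum over all rows. */
  int frequency(Object key) {
    int hash = key.hashCode();
    int min = Integer.MAX_VALUE;
    for (int row = 0; row < rows.length; row++) {
      min = Math.min(min, rows[row][indexOf(hash, row)]);
    }
    return min;
  }
}
```

A hash collision can only add to a counter, never subtract from it, so each row either holds the true count or an overestimate. Taking the minimum picks the row that suffered the least from collisions; the estimate is never below the true count.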
2. Freshness mechanism

To keep the cache fresh and to evict entries that were once very frequent but are no longer accessed often, Caffeine ages its frequency data: when the total number of recorded increments reaches a threshold, every counter is divided by 2.

// size is the running total of recorded increments: each time an entry's frequency goes up by 1, size goes up by 1
// sampleSize is the threshold; the FrequencySketch initialization sets it to 10 times maximumSize
if (added && (++size == sampleSize)) {
  reset();
}

void reset() {
  int count = 0;
  for (int i = 0; i < table.length; i++) {
    count += Long.bitCount(table[i] & ONE_MASK);
    table[i] = (table[i] >>> 1) & RESET_MASK;
  }
  size = (size >>> 1) - (count >>> 2);
}
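The two bit tricks inside reset() can be checked on a single word. This is a small worked example, with `SketchAgingDemo`, `halve`, and `oddCounters` as hypothetical names; the mask constants are the ones from the FrequencySketch listing above.

```java
/** Halving sixteen packed 4-bit counters with one shift and one mask (illustrative only). */
final class SketchAgingDemo {
  static final long RESET_MASK = 0x7777777777777777L;
  static final long ONE_MASK = 0x1111111111111111L;

  /** Divides every 4-bit counter in the word by 2, rounding down. */
  static long halve(long word) {
    // The shift moves each counter's lowest bit into the top bit of the counter below it;
    // RESET_MASK (0111 per counter) clears exactly those leaked bits.
    return (word >>> 1) & RESET_MASK;
  }

  /** Counts the counters holding an odd value, i.e. those that lose a half when halved. */
  static int oddCounters(long word) {
    return Long.bitCount(word & ONE_MASK);
  }
}
```

For the word 0xF61, whose three lowest counters are 1, 6, and 15, `halve` returns 0x730 (counters 0, 3, and 7) and `oddCounters` returns 2. The `count >>> 2` term in reset() appears to compensate for this rounding loss when size is halved: each key increment touches up to four counters, so the number of odd counters is divided by four.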

IV. The window

Caffeine's tests showed that TinyLFU performs poorly on sparse bursts: new items are evicted before they have had a chance to build up enough frequency, which lowers the hit rate. Caffeine therefore designed a new algorithm, Window TinyLFU (W-TinyLFU), and found through experiments and practice that W-TinyLFU performs better than TinyLFU.

W-TinyLFU consists of two cache regions. The main cache is an SLRU (Segmented LRU), made up of a protected region and a probation region. In front of it sits an additional region, the window cache: a newly inserted entry first spends some time in the window, which avoids the sparse-burst problem.

V. Eviction policy

When the window region is full, its LRU entry (the candidate) is moved into the probation region. If the cache as a whole is then over capacity, the candidate competes against the entry that probation would evict next (the victim): the winner stays in probation and the loser is evicted.

Experiments showed that overall performance and hit rate are best when the window is 1% of the total capacity and the remaining 99% is split 80% to the protected region and 20% to the probation region, so these are Caffeine's defaults.

Caffeine adjusts this ratio dynamically at run time based on statistics. If the application's working set changes quickly over time, a larger window improves the hit rate; if the cached data is fairly stable, a larger main cache (protected + probation) works better.

The eviction policy lives in the BoundedLocalCache class. Its key fields:

// Maximum capacity of the whole cache
long maximum;
// Current size of the whole cache
long weightedSize;
// Maximum size of the window region
long windowMaximum;
// Current size of the window region
long windowWeightedSize;
// Maximum size of the protected region
long mainProtectedMaximum;
// Current size of the protected region
long mainProtectedWeightedSize;

final FrequencySketch<K> sketch;
// LRU queue of the window region (least recently used entry at the head)
final AccessOrderDeque<Node<K, V>> accessOrderWindowDeque;
// LRU queue of the probation region (least recently used entry at the head)
final AccessOrderDeque<Node<K, V>> accessOrderProbationDeque;
// LRU queue of the protected region (least recently used entry at the head)
final AccessOrderDeque<Node<K, V>> accessOrderProtectedDeque;
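To make the default 1% / 99% split and the 80% / 20% split of the main cache concrete, here is a small worked example. It is a sketch only: `RegionSizes` is a hypothetical helper that uses integer arithmetic for clarity and is not how Caffeine computes its limits internally.

```java
/** Derives the region limits from a total capacity using the default ratios (illustrative only). */
final class RegionSizes {
  final long window;
  final long mainProtected;
  final long mainProbation;

  RegionSizes(long maximum) {
    long main = maximum * 99 / 100;          // 99% of the capacity goes to the main (SLRU) cache
    this.window = maximum - main;            // the remainder, about 1%, is the admission window
    this.mainProtected = main * 80 / 100;    // 80% of the main cache is the protected region
    this.mainProbation = main - mainProtected; // the other 20% is the probation region
  }
}
```

For a cache with a maximum of 1000 entries this gives a window of 10, a protected region of 792, and a probation region of 198.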

@GuardedBy("evictionLock")
void evictEntries() {
  if (!evicts()) {
    return;
  }
  // Move overflowing entries out of the window region
  int candidates = evictFromWindow();
  // Evict entries from the main region
  evictFromMain(candidates);
}

// In W-TinyLFU every new entry is added to the admission window unconditionally,
// but the window has a size limit, so it needs periodic maintenance
@GuardedBy("evictionLock")
int evictFromWindow() {
  int candidates = 0;
  // Look at the head node of the window queue
  Node<K, V> node = accessOrderWindowDeque().peek();
  // While the window exceeds its maximum, the surplus entries have to be moved out
  while (windowWeightedSize() > windowMaximum()) {
    // The pending operations will adjust the size to reflect the correct weight
    if (node == null) {
      break;
    }
    // The next node
    Node<K, V> next = node.getNextInAccessOrder();
    if (node.getWeight() != 0) {
      // Mark the node as belonging to the probation region
      node.makeMainProbation();
      // Remove it from the window queue
      accessOrderWindowDeque().remove(node);
      // Append it to the probation queue, which moves the node into the probation region
      accessOrderProbationDeque().add(node);
      candidates++;
      // A node has left the window, so adjust the window's size
      setWindowWeightedSize(windowWeightedSize() - node.getPolicyWeight());
    }
    // Continue with the next node
    node = next;
  }

  return candidates;
}

@GuardedBy("evictionLock")
void evictFromMain(int candidates) {
  int victimQueue = PROBATION;
  // The victim is the head of the probation queue
  Node<K, V> victim = accessOrderProbationDeque().peekFirst();
  // The candidate is the tail of the probation queue, i.e. the entry that has just arrived from the window
  Node<K, V> candidate = accessOrderProbationDeque().peekLast();
  // Only act while the cache is over capacity
  while (weightedSize() > maximum()) {
    // Stop trying to evict candidates and always prefer the victim
    if (candidates == 0) {
      candidate = null;
    }

    // Handle the case where both the candidate and the victim are null
    if ((candidate == null) && (victim == null)) {
      if (victimQueue == PROBATION) {
        victim = accessOrderProtectedDeque().peekFirst();
        victimQueue = PROTECTED;
        continue;
      } else if (victimQueue == PROTECTED) {
        victim = accessOrderWindowDeque().peekFirst();
        victimQueue = WINDOW;
        continue;
      }

      // The pending operations will adjust the size to reflect the correct weight
      break;
    }

    // Skip nodes whose weight is 0
    if ((victim != null) && (victim.getPolicyWeight() == 0)) {
      victim = victim.getNextInAccessOrder();
      continue;
    } else if ((candidate != null) && (candidate.getPolicyWeight() == 0)) {
      candidate = candidate.getPreviousInAccessOrder();
      candidates--;
      continue;
    }

    // Evict immediately if only one of the entries is present
    if (victim == null) {
      @SuppressWarnings("NullAway")
      Node<K, V> previous = candidate.getPreviousInAccessOrder();
      Node<K, V> evict = candidate;
      candidate = previous;
      candidates--;
      evictEntry(evict, RemovalCause.SIZE, 0L);
      continue;
    } else if (candidate == null) {
      Node<K, V> evict = victim;
      victim = victim.getNextInAccessOrder();
      evictEntry(evict, RemovalCause.SIZE, 0L);
      continue;
    }

    // Evict immediately if an entry was collected
    K victimKey = victim.getKey();
    K candidateKey = candidate.getKey();
    if (victimKey == null) {
      @NonNull Node<K, V> evict = victim;
      victim = victim.getNextInAccessOrder();
      evictEntry(evict, RemovalCause.COLLECTED, 0L);
      continue;
    } else if (candidateKey == null) {
      candidates--;
      @NonNull Node<K, V> evict = candidate;
      candidate = candidate.getPreviousInAccessOrder();
      evictEntry(evict, RemovalCause.COLLECTED, 0L);
      continue;
    }

    // Evict immediately if the candidate alone is too large to fit in the cache
    if (candidate.getPolicyWeight() > maximum()) {
      candidates--;
      Node<K, V> evict = candidate;
      candidate = candidate.getPreviousInAccessOrder();
      evictEntry(evict, RemovalCause.SIZE, 0L);
      continue;
    }

    // Compare the two nodes' recorded frequencies to decide whether to evict the victim or the candidate.
    // admit() contains the actual comparison rule; see below
    candidates--;
    // If the candidate wins, evict the victim
    if (admit(candidateKey, victimKey)) {
      Node<K, V> evict = victim;
      victim = victim.getNextInAccessOrder();
      evictEntry(evict, RemovalCause.SIZE, 0L);
      candidate = candidate.getPreviousInAccessOrder();
    } else {
      // If the victim wins, evict the candidate
      Node<K, V> evict = candidate;
      candidate = candidate.getPreviousInAccessOrder();
      evictEntry(evict, RemovalCause.SIZE, 0L);
    }
  }
}

@GuardedBy("evictionLock")
boolean admit(K candidateKey, K victimKey) {
  // Look up the recorded frequencies of the victim and the candidate.
  // How frequency() works is explained above
  int victimFreq = frequencySketch().frequency(victimKey);
  int candidateFreq = frequencySketch().frequency(candidateKey);
  // The candidate wins outright if its frequency is strictly higher
  if (candidateFreq > victimFreq) {
    return true;

    // Otherwise (candidate <= victim), a candidate with a frequency of 5 or less always loses
  } else if (candidateFreq <= 5) {
    // The maximum frequency is 15 and halved to 7 after a reset to age the history. An attack
    // exploits that a hot candidate is rejected in favor of a hot victim. The threshold of a warm
    // candidate reduces the number of random acceptances to minimize the impact on the hit rate.
    return false;
  }
  // The candidate is not more frequent than the victim but its frequency is above 5:
  // admit it at random with a probability of 1 in 128
  int random = ThreadLocalRandom.current().nextInt();
  return ((random & 127) == 0);
}
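The three outcomes of the admission rule are easier to see when the frequencies and the random value are passed in directly. The following is a minimal sketch of the same decision logic; `AdmissionDemo` is a hypothetical class, and injecting the random value as a parameter is a change made here only so the result is reproducible.

```java
/** The candidate-versus-victim decision with all inputs made explicit (illustrative only). */
final class AdmissionDemo {
  /**
   * Returns true if the candidate should replace the victim.
   * The random value is supplied by the caller instead of being drawn inside the method.
   */
  static boolean admit(int candidateFreq, int victimFreq, int random) {
    if (candidateFreq > victimFreq) {
      return true;                   // a strictly more frequent candidate always wins
    } else if (candidateFreq <= 5) {
      return false;                  // a cold candidate never wins a tie or a loss
    }
    return (random & 127) == 0;      // a warm candidate wins 1 time in 128
  }
}
```

For example, a candidate with frequency 6 beats a victim with frequency 3, a candidate with frequency 3 loses against a victim with frequency 3, and a candidate with frequency 7 against a victim with frequency 9 is admitted only when the low seven bits of the random value are all zero.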

@GuardedBy("evictionLock")
void onAccess(Node<K, V> node) {
  if (evicts()) {
    K key = node.getKey();
    if (key == null) {
      return;
    }
    frequencySketch().increment(key);
    if (node.inWindow()) {
      reorder(accessOrderWindowDeque(), node);
    } else if (node.inMainProbation()) {
      reorderProbation(node);
    } else {
      reorder(accessOrderProtectedDeque(), node);
    }
    setHitsInSample(hitsInSample() + 1);
  } else if (expiresAfterAccess()) {
    reorder(accessOrderWindowDeque(), node);
  }
  if (expiresVariable()) {
    timerWheel().reschedule(node);
  }
}

@GuardedBy("evictionLock")
void reorderProbation(Node<K, V> node) {
  if (!accessOrderProbationDeque().contains(node)) {
    // Ignore stale accesses for an entry that is no longer present
    return;
  } else if (node.getPolicyWeight() > mainProtectedMaximum()) {
    return;
  }

  // If the protected space exceeds its maximum, the LRU items are demoted to the probation space.
  // This is deferred to the adaption phase at the end of the maintenance cycle.
  setMainProtectedWeightedSize(mainProtectedWeightedSize() + node.getPolicyWeight());
  accessOrderProbationDeque().remove(node);
  accessOrderProtectedDeque().add(node);
  node.makeMainProtected();
}
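The promotion from probation to protected, and the demotion in the opposite direction, can be modelled with two plain queues. The sketch below is a simplification under stated assumptions: `SegmentedLruDemo` is a hypothetical class, every entry has weight 1, and the demotion happens immediately on access, whereas the comment in the listing above says Caffeine defers it to the end of the maintenance cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** A two-segment LRU with promotion on access and eager demotion (illustrative only). */
final class SegmentedLruDemo {
  final Deque<String> probation = new ArrayDeque<>();
  final Deque<String> protectedQueue = new ArrayDeque<>();
  final int protectedMaximum;

  SegmentedLruDemo(int protectedMaximum) {
    this.protectedMaximum = protectedMaximum;
  }

  /** New arrivals from the window enter at the tail of the probation queue. */
  void addToProbation(String key) {
    probation.addLast(key);
  }

  /** Records an access: refreshes a protected entry or promotes a probation entry. */
  void onAccess(String key) {
    if (protectedQueue.remove(key)) {
      protectedQueue.addLast(key);             // already protected: move to the MRU end
    } else if (probation.remove(key)) {
      protectedQueue.addLast(key);             // accessed while on probation: promote
      while (protectedQueue.size() > protectedMaximum) {
        // Protected is over its limit: demote its LRU entry to the tail of probation
        probation.addLast(protectedQueue.pollFirst());
      }
    }
  }
}
```

With a protected limit of 2 and entries a, b, c on probation, accessing a, b, and then c promotes all three in turn; the third promotion pushes a back to probation, leaving b and c protected.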