The previous section covered the ideas behind jemalloc; this section analyzes the details of Netty's implementation, in which the related classes carry the prefix Pool, e.g. PoolArena, PoolChunk, and so on. Here we examine the source of PoolArena.
First, the class signature:
abstract class PoolArena<T> implements PoolArenaMetric
This is an abstract class because ByteBuf comes in Heap and Direct variants, so PoolArena likewise has Heap and Direct subclasses. The interface it implements, PoolArenaMetric, only exposes metrics and statistics, so we will not analyze it further.
The key member variables are as follows:
private final int maxOrder; // height of the chunk's full binary tree
final int pageSize; // size of a single page
final int pageShifts; // helper for shift-based calculations
final int chunkSize; // size of a chunk
final int subpageOverflowMask; // used to test whether a request is Small/Tiny
final int numSmallSubpagePools; // number of doubly-linked-list heads for Small requests
final int directMemoryCacheAlignment; // alignment base
final int directMemoryCacheAlignmentMask; // used to align memory
private final PoolSubpage<T>[] tinySubpagePools; // Subpage doubly linked lists
private final PoolSubpage<T>[] smallSubpagePools; // Subpage doubly linked lists
final PooledByteBufAllocator parent;
For the chunk states discussed earlier (QINIT, Q0, and so on), Netty uses PoolChunkList as a container holding the chunks that are in the same state. The related fields are:
private final PoolChunkList<T> q050;
private final PoolChunkList<T> q025;
private final PoolChunkList<T> q000;
private final PoolChunkList<T> qInit;
private final PoolChunkList<T> q075;
private final PoolChunkList<T> q100;
The constructor is as follows:
protected PoolArena(PooledByteBufAllocator parent, int pageSize,
        int maxOrder, int pageShifts, int chunkSize, int cacheAlignment) {
    this.parent = parent;
    this.pageSize = pageSize;
    this.maxOrder = maxOrder;
    this.pageShifts = pageShifts;
    this.chunkSize = chunkSize;
    directMemoryCacheAlignment = cacheAlignment;
    directMemoryCacheAlignmentMask = cacheAlignment - 1;
    subpageOverflowMask = ~(pageSize - 1);

    tinySubpagePools = new PoolSubpage[numTinySubpagePools];
    for (int i = 0; i < tinySubpagePools.length; i ++) {
        tinySubpagePools[i] = newSubpagePoolHead(pageSize);
    }

    numSmallSubpagePools = pageShifts - 9;
    smallSubpagePools = new PoolSubpage[numSmallSubpagePools];
    for (int i = 0; i < smallSubpagePools.length; i ++) {
        smallSubpagePools[i] = newSubpagePoolHead(pageSize);
    }

    initPoolChunkList();
}
private PoolSubpage<T> newSubpagePoolHead(int pageSize) {
    PoolSubpage<T> head = new PoolSubpage<T>(pageSize);
    head.prev = head;
    head.next = head;
    return head;
}
The body of initPoolChunkList() is:
q100 = new PoolChunkList<T>(this, null, 100, Integer.MAX_VALUE, chunkSize);
q075 = new PoolChunkList<T>(this, q100, 75, 100, chunkSize);
q050 = new PoolChunkList<T>(this, q075, 50, 100, chunkSize);
q025 = new PoolChunkList<T>(this, q050, 25, 75, chunkSize);
q000 = new PoolChunkList<T>(this, q025, 1, 50, chunkSize);
qInit = new PoolChunkList<T>(this, q000, Integer.MIN_VALUE, 25, chunkSize);
q100.prevList(q075);
q075.prevList(q050);
q050.prevList(q025);
q025.prevList(q000);
q000.prevList(null);
qInit.prevList(qInit);
This code builds the doubly linked list shown in the figure below:
Netty uses an enum to represent the size class of each request:
enum SizeClass {
    Tiny,
    Small,
    Normal
    // anything beyond these is a Huge request
}
The code that determines a request's size class from its (normalized) size is shown below; note the bit manipulation:
// capacity < pageSize
boolean isTinyOrSmall(int normCapacity) {
    // subpageOverflowMask = ~(pageSize - 1)
    return (normCapacity & subpageOverflowMask) == 0;
}

// normCapacity < 512
static boolean isTiny(int normCapacity) {
    return (normCapacity & 0xFFFFFE00) == 0;
}

// capacity <= chunkSize
boolean isNormal(int normCapacity) {
    return normCapacity <= chunkSize;
}
The capacity-normalization code is as follows:
int normalizeCapacity(int reqCapacity) {
    // Huge: return directly (direct memory may need alignment)
    if (reqCapacity >= chunkSize) {
        return directMemoryCacheAlignment == 0 ? reqCapacity :
                alignCapacity(reqCapacity);
    }
    // Small and Normal: round up to the smallest power of two >= reqCapacity
    if (!isTiny(reqCapacity)) { // >= 512
        int normalizedCapacity = reqCapacity;
        normalizedCapacity --;
        normalizedCapacity |= normalizedCapacity >>> 1;
        normalizedCapacity |= normalizedCapacity >>> 2;
        normalizedCapacity |= normalizedCapacity >>> 4;
        normalizedCapacity |= normalizedCapacity >>> 8;
        normalizedCapacity |= normalizedCapacity >>> 16;
        normalizedCapacity ++;
        if (normalizedCapacity < 0) {
            normalizedCapacity >>>= 1;
        }
        return normalizedCapacity;
    }
    // Tiny, and direct memory needs alignment
    if (directMemoryCacheAlignment > 0) {
        return alignCapacity(reqCapacity);
    }
    // Tiny and already a multiple of 16B
    if ((reqCapacity & 15) == 0) {
        return reqCapacity;
    }
    // Tiny and not a multiple of 16B: round up to the next multiple of 16B
    return (reqCapacity & ~15) + 16;
}
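The five OR-shift lines are a classic bit-smearing trick. A standalone sketch of just that step (the helper name is hypothetical):

```java
// Standalone version of the bit-smearing trick normalizeCapacity applies to
// Small/Normal requests: round up to the next power of two.
public class NormalizeDemo {
    static int nextPowerOfTwo(int reqCapacity) {
        int n = reqCapacity - 1;   // subtract 1 so exact powers of two map to themselves
        n |= n >>> 1;              // smear the highest set bit rightward...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;             // ...until every lower bit is 1
        n++;                       // +1 carries into the next power of two
        if (n < 0) {               // overflow guard, as in the original
            n >>>= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(1000)); // 1024
        System.out.println(nextPowerOfTwo(1024)); // 1024
        System.out.println(nextPowerOfTwo(513));  // 1024
    }
}
```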
The normalization results can be checked against the request-classification figure; the implementation relies heavily on bit manipulation and rewards careful study. Note also that for aligned direct memory, the request capacity becomes a multiple of the alignment base: with a base of 64B, every allocation must be a multiple of 64B, i.e. 64-byte aligned. The implementation (again using bit manipulation) is:
int alignCapacity(int reqCapacity) {
    // directMemoryCacheAlignmentMask = cacheAlignment - 1
    int delta = reqCapacity & directMemoryCacheAlignmentMask;
    return delta == 0 ? reqCapacity : reqCapacity + directMemoryCacheAlignment - delta;
}
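The same arithmetic can be exercised in isolation. A sketch assuming a fixed 64-byte alignment base (class and constant names are illustrative):

```java
// alignCapacity's arithmetic with a fixed 64-byte base: round reqCapacity
// up to the next multiple of the alignment.
public class AlignDemo {
    static final int ALIGNMENT = 64;
    static final int ALIGNMENT_MASK = ALIGNMENT - 1; // 63 = 0b111111

    static int alignCapacity(int reqCapacity) {
        int delta = reqCapacity & ALIGNMENT_MASK;    // bytes past the last boundary
        return delta == 0 ? reqCapacity : reqCapacity + ALIGNMENT - delta;
    }

    public static void main(String[] args) {
        System.out.println(alignCapacity(64));  // 64  (already aligned)
        System.out.println(alignCapacity(65));  // 128
        System.out.println(alignCapacity(100)); // 128
    }
}
```

The mask trick requires the alignment base to be a power of two, which is why `cacheAlignment - 1` can serve as a bitmask at all.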
As Small and Tiny requests are served, PoolArena may build up circular doubly linked lists like the one shown below, in which every node is a PoolSubpage. As explained in the jemalloc introduction, a subpage is carved up according to the size of the first request it serves and can only hand out blocks of that size afterwards. PoolArena groups PoolSubpages further: subpages with the same element size are linked into one circular doubly linked list, which simplifies management and allocation. Note that each list's head node is a special PoolSubpage that never serves actual allocations. The code that looks up a list's head node is:
PoolSubpage<T> findSubpagePoolHead(int elemSize) {
    int tableIdx;
    PoolSubpage<T>[] table;
    if (isTiny(elemSize)) { // < 512, Tiny
        tableIdx = elemSize >>> 4;
        table = tinySubpagePools;
    } else { // Small
        tableIdx = 0;
        elemSize >>>= 10; // resulting tableIdx: 512 -> 0, 1KB -> 1, 2KB -> 2, 4KB -> 3
        while (elemSize != 0) {
            elemSize >>>= 1;
            tableIdx ++;
        }
        table = smallSubpagePools;
    }
    return table[tableIdx];
}
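The index arithmetic can be checked with a standalone sketch: Tiny sizes map to `size / 16`, Small sizes to `log2(size / 512)` (the class name is illustrative; the `isTiny` mask is inlined):

```java
// Reproduces findSubpagePoolHead's index math in isolation.
// Tiny sizes (multiples of 16 below 512) index tinySubpagePools by size/16;
// Small sizes (512 .. pageSize/2) index smallSubpagePools by log2(size/512).
public class SubpageIndexDemo {
    static int tableIdx(int elemSize) {
        if ((elemSize & 0xFFFFFE00) == 0) { // Tiny: < 512
            return elemSize >>> 4;          // 16 -> 1, 32 -> 2, ..., 496 -> 31
        }
        int tableIdx = 0;
        elemSize >>>= 10;                   // 512 -> 0, so the loop computes log2(size/512)
        while (elemSize != 0) {
            elemSize >>>= 1;
            tableIdx++;
        }
        return tableIdx;
    }

    public static void main(String[] args) {
        System.out.println(tableIdx(32));   // 2  (Tiny: 32/16)
        System.out.println(tableIdx(512));  // 0  (Small)
        System.out.println(tableIdx(2048)); // 2  (Small: log2(2048/512))
    }
}
```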
With all that in place, we can analyze the key allocation method, allocate():
private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf,
        final int reqCapacity) {
    // normalize the requested capacity
    final int normCapacity = normalizeCapacity(reqCapacity);
    if (isTinyOrSmall(normCapacity)) { // capacity < pageSize: Tiny/Small request
        int tableIdx;
        PoolSubpage<T>[] table;
        boolean tiny = isTiny(normCapacity);
        if (tiny) { // < 512: Tiny request
            if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                return; // served from the thread cache
            }
            tableIdx = tinyIdx(normCapacity);
            table = tinySubpagePools;
        } else { // Small request
            if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                return; // served from the thread cache
            }
            tableIdx = smallIdx(normCapacity);
            table = smallSubpagePools;
        }
        // head of the subpage doubly linked list for this size group
        final PoolSubpage<T> head = table[tableIdx];
        synchronized (head) { // lock to keep other operations from modifying the head
            final PoolSubpage<T> s = head.next;
            if (s != head) {
                assert s.doNotDestroy && s.elemSize == normCapacity;
                long handle = s.allocate(); // perform the allocation
                assert handle >= 0;
                s.chunk.initBufWithSubpage(buf, handle, reqCapacity);
                return;
            }
        }
        synchronized (this) {
            // the circular list is not initialized yet; fall back to a normal allocation
            allocateNormal(buf, reqCapacity, normCapacity);
        }
        return;
    }
    if (normCapacity <= chunkSize) { // Normal request
        if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
            return; // served from the thread cache
        }
        synchronized (this) {
            allocateNormal(buf, reqCapacity, normCapacity);
        }
    } else {
        // Huge request: allocate directly
        allocateHuge(buf, reqCapacity);
    }
}
The details of Normal and Huge allocation are as follows:
private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
    if (q050.allocate(buf, reqCapacity, normCapacity) ||
            q025.allocate(buf, reqCapacity, normCapacity) ||
            q000.allocate(buf, reqCapacity, normCapacity) ||
            qInit.allocate(buf, reqCapacity, normCapacity) ||
            q075.allocate(buf, reqCapacity, normCapacity)) {
        return;
    }
    // no chunk yet, or no existing chunk can satisfy the request: create a new chunk
    PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
    long handle = c.allocate(normCapacity);
    assert handle > 0;
    c.initBuf(buf, handle, reqCapacity);
    qInit.add(c); // a new chunk starts in the QINIT state
}

private void allocateHuge(PooledByteBuf<T> buf, int reqCapacity) {
    PoolChunk<T> chunk = newUnpooledChunk(reqCapacity);
    buf.initUnpooled(chunk, reqCapacity);
}
To summarize the allocation process:
- Tiny/Small and Normal requests are served from the thread cache first.
- A Tiny/Small request that misses the cache is served from the subpage doubly linked list grouped by first-request size; if that list is not initialized yet, a Normal allocation carves a page out of a chunk, the page is split into blocks of the request size, the first block is handed out, and the page is added to the list.
- A Normal request that misses the cache is served by the buddy algorithm, which finds a run of contiguous pages large enough for it.
- Huge requests are allocated directly as unpooled memory.
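The decision flow summarized above can be sketched as a small classifier mirroring allocate()'s branch order, assuming the default pageSize (8192) and chunkSize (16 MiB); the class and method names are hypothetical:

```java
// Illustrative classifier following allocate()'s branch order.
// Assumes default pageSize (8192) and chunkSize (16 MiB).
public class AllocateDispatchDemo {
    static final int PAGE_SIZE = 8192;
    static final int CHUNK_SIZE = 16 * 1024 * 1024;

    static String classify(int normCapacity) {
        if (normCapacity < PAGE_SIZE) {
            return normCapacity < 512 ? "Tiny" : "Small"; // subpage allocation
        }
        if (normCapacity <= CHUNK_SIZE) {
            return "Normal"; // buddy allocation of contiguous pages
        }
        return "Huge";       // unpooled, allocated directly
    }

    public static void main(String[] args) {
        System.out.println(classify(16));               // Tiny
        System.out.println(classify(1024));             // Small
        System.out.println(classify(65536));            // Normal
        System.out.println(classify(32 * 1024 * 1024)); // Huge
    }
}
```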
That completes the allocation path; next, deallocation:
void free(PoolChunk<T> chunk, long handle, int normCapacity, PoolThreadCache cache) {
    if (chunk.unpooled) { // Huge
        int size = chunk.chunkSize();
        destroyChunk(chunk); // template method; subclasses implement the actual release
    } else { // Normal, Small/Tiny
        SizeClass sizeClass = sizeClass(normCapacity);
        if (cache != null && cache.add(this, chunk, handle, normCapacity, sizeClass)) {
            return; // cacheable, so don't release
        }
        // otherwise release it
        freeChunk(chunk, handle, sizeClass);
    }
}
void freeChunk(PoolChunk<T> chunk, long handle, SizeClass sizeClass) {
    final boolean destroyChunk;
    synchronized (this) {
        // chunk.parent is the PoolChunkList the chunk belongs to; destroyChunk == true
        // means the chunk's usage has fallen down through the states (QINIT -> Q0 -> ...)
        // far enough that the chunk should finally be released
        destroyChunk = !chunk.parent.free(chunk, handle);
    }
    if (destroyChunk) {
        destroyChunk(chunk); // template method; subclasses implement the actual release
    }
}
Note finalize(): this method, inherited from Object, is invoked when the object is reclaimed by GC, so it is the place to clean up resources. In this class that mainly means releasing memory:
protected final void finalize() throws Throwable {
    try {
        super.finalize();
    } finally {
        destroyPoolSubPages(smallSubpagePools);
        destroyPoolSubPages(tinySubpagePools);
        destroyPoolChunkLists(qInit, q000, q025, q050, q075, q100);
    }
}

private static void destroyPoolSubPages(PoolSubpage<?>[] pages) {
    for (PoolSubpage<?> page : pages) {
        page.destroy();
    }
}

private void destroyPoolChunkLists(PoolChunkList<T>... chunkLists) {
    for (PoolChunkList<T> chunkList : chunkLists) {
        chunkList.destroy(this);
    }
}
In addition, when a PooledByteBuf grows its capacity, its memory must be reallocated:
void reallocate(PooledByteBuf<T> buf, int newCapacity, boolean freeOldMemory) {
    int oldCapacity = buf.length;
    if (oldCapacity == newCapacity) {
        return;
    }
    PoolChunk<T> oldChunk = buf.chunk;
    long oldHandle = buf.handle;
    T oldMemory = buf.memory;
    int oldOffset = buf.offset;
    int oldMaxLength = buf.maxLength;
    int readerIndex = buf.readerIndex();
    int writerIndex = buf.writerIndex();
    // allocate the new memory
    allocate(parent.threadCache(), buf, newCapacity);
    // copy the old data into the new memory
    if (newCapacity > oldCapacity) {
        memoryCopy(oldMemory, oldOffset,
                buf.memory, buf.offset, oldCapacity);
    } else if (newCapacity < oldCapacity) {
        if (readerIndex < newCapacity) {
            if (writerIndex > newCapacity) {
                writerIndex = newCapacity;
            }
            memoryCopy(oldMemory, oldOffset + readerIndex,
                    buf.memory, buf.offset + readerIndex, writerIndex - readerIndex);
        } else {
            readerIndex = writerIndex = newCapacity;
        }
    }
    // reset the read/write indexes
    buf.setIndex(readerIndex, writerIndex);
    // release the old memory if required
    if (freeOldMemory) {
        free(oldChunk, oldHandle, oldMaxLength, buf.cache);
    }
}
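The index handling when shrinking can be checked in isolation. A sketch with a hypothetical helper that returns the adjusted indexes and the number of bytes that would be copied:

```java
// Isolates reallocate()'s index handling when the buffer shrinks:
// clamp writerIndex to the new capacity; if even readerIndex is past it,
// both indexes collapse to newCapacity and nothing is copied.
public class ShrinkIndexDemo {
    // returns {readerIndex, writerIndex, bytesToCopy}
    static int[] shrinkIndexes(int readerIndex, int writerIndex, int newCapacity) {
        if (readerIndex < newCapacity) {
            if (writerIndex > newCapacity) {
                writerIndex = newCapacity;
            }
            return new int[] { readerIndex, writerIndex, writerIndex - readerIndex };
        }
        return new int[] { newCapacity, newCapacity, 0 };
    }

    public static void main(String[] args) {
        int[] r = shrinkIndexes(10, 100, 64);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 10 64 54
        r = shrinkIndexes(80, 100, 64);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 64 64 0
    }
}
```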
Finally, since the class is abstract, its abstract methods are listed below:
// create a new chunk; called when serving Tiny/Small and Normal requests
protected abstract PoolChunk<T> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize);
// create a new unpooled chunk; called when serving Huge requests
protected abstract PoolChunk<T> newUnpooledChunk(int capacity);
// create a PooledByteBuf instance
protected abstract PooledByteBuf<T> newByteBuf(int maxCapacity);
// copy memory; called when a ByteBuf grows its capacity
protected abstract void memoryCopy(T src, int srcOffset, T dst, int dstOffset, int length);
// destroy a chunk; called when releasing memory
protected abstract void destroyChunk(PoolChunk<T> chunk);
// whether the subclass implements Heap or Direct
protected abstract boolean isDirect();
The two subclasses, HeapArena and DirectArena, implement these abstract methods for their respective backing memory. The implementations are straightforward, so the code is not listed here.