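## Notes
Last year I wrote about 25 posts on Jianshu and 5 on my company's intranet. This year's modest goal is 50 high-quality posts on Jianshu. Let's go!
First of all, this article draws on [10w scheduled tasks: how to trigger timeouts efficiently](http://chuansong.me/n/1650380646616). Thanks to the author!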
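## Introduction
At work we often run into scenarios that need scheduled or timeout tasks. For example, in RPC frameworks or in IM and PUSH frameworks, the server and the client usually keep a long-lived connection open. That connection is normally kept alive by heartbeats: the client (or server) periodically sends heartbeat messages to the server (or client), and the server closes the connection if it receives no heartbeat from the client within a certain period.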
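## Common approaches
For the heartbeat handling described above, the server usually updates the channel's last read/write time whenever it receives a heartbeat. Heartbeat timeouts are then typically handled in one of two ways:
- Use a single Timer (or a ScheduledThreadPoolExecutor) that periodically iterates over all channels and decides, from each channel's last read/write time and the timeout, whether it has timed out.
- Use one Timer per channel, or start one scheduled task per channel, that periodically checks whether that channel has timed out.
Dubbo uses approach two for client-side timeouts and approach one for server-side timeouts (strictly speaking, this distinction is not entirely accurate). The code looks like this: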
```java
private void startHeatbeatTimer() {
    stopHeartbeatTimer();
    if ( heartbeat > 0 ) {
        heatbeatTimer = scheduled.scheduleWithFixedDelay(
                new HeartBeatTask( new HeartBeatTask.ChannelProvider() {
                    public Collection<Channel> getChannels() {
                        return Collections.<Channel>singletonList( HeaderExchangeClient.this );
                    }
                }, heartbeat, heartbeatTimeout),
                heartbeat, heartbeat, TimeUnit.MILLISECONDS );
    }
}
```
Each HeaderExchangeClient creates its own HeartBeatTask, and HeartBeatTask handles timeouts as follows:
```java
public void run() {
    try {
        long now = System.currentTimeMillis();
        for ( Channel channel : channelProvider.getChannels() ) {
            if (channel.isClosed()) {
                continue;
            }
            try {
                Long lastRead = ( Long ) channel.getAttribute(
                        HeaderExchangeHandler.KEY_READ_TIMESTAMP );
                Long lastWrite = ( Long ) channel.getAttribute(
                        HeaderExchangeHandler.KEY_WRITE_TIMESTAMP );
                if ( ( lastRead != null && now - lastRead > heartbeat )
                        || ( lastWrite != null && now - lastWrite > heartbeat ) ) {
                    Request req = new Request();
                    req.setVersion( "2.0.0" );
                    req.setTwoWay( true );
                    req.setEvent( Request.HEARTBEAT_EVENT );
                    channel.send( req );
                    if ( logger.isDebugEnabled() ) {
                        logger.debug( "Send heartbeat to remote channel " + channel.getRemoteAddress()
                                + ", cause: The channel has no data-transmission exceeds a heartbeat period: " + heartbeat + "ms" );
                    }
                }
                if ( lastRead != null && now - lastRead > heartbeatTimeout ) {
                    logger.warn( "Close channel " + channel
                            + ", because heartbeat read idle time out: " + heartbeatTimeout + "ms" );
                    if (channel instanceof Client) {
                        try {
                            ((Client)channel).reconnect();
                        }catch (Exception e) {
                            //do nothing
                        }
                    } else {
                        channel.close();
                    }
                }
            } catch ( Throwable t ) {
                logger.warn( "Exception when heartbeat to remote channel " + channel.getRemoteAddress(), t );
            }
        }
    } catch ( Throwable t ) {
        logger.warn( "Unhandled exception when heartbeat, cause: " + t.getMessage(), t );
    }
}
```
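On the client side, channelProvider.getChannels() returns just one channel, the HeaderExchangeClient itself; on the server side, it returns every channel connected to the server.
Both approaches have trade-offs. Approach one has to iterate over every channel on each run, which is inefficient. Approach two may waste resources (it usually implies multiple threads; with a single thread it effectively degenerates into approach one).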
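To make the cost of approach one concrete, here is a minimal sketch of a single scan pass. The class `IdleScanner`, its string channel ids and its method names are made up for illustration and are not Dubbo's API. The point is only that every pass touches every channel, even when almost none of them have timed out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of approach one: one periodic scan over every channel. */
final class IdleScanner {
    // channel id -> last read time in milliseconds (illustrative stand-in for a real channel)
    private final Map<String, Long> lastReadMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    IdleScanner(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a heartbeat (or any data) arrives on a channel. */
    void touch(String channelId, long nowMillis) {
        lastReadMillis.put(channelId, nowMillis);
    }

    /** One scan pass, O(n) in the number of channels. Removes and returns the timed-out ids. */
    List<String> scan(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastReadMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                expired.add(e.getKey());
            }
        }
        for (String id : expired) {
            lastReadMillis.remove(id);
        }
        return expired;
    }
}
```

In a real server, `scan` would be driven by something like `ScheduledExecutorService.scheduleWithFixedDelay`, and the returned ids would be closed instead of just removed from the map.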
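## A better approach
The industry has in fact already come up with a more efficient and more elegant approach. There is a paper on it, and Netty implements and uses HashedWheelTimer based on that paper. Let's look at how HashedWheelTimer is used and how it is implemented.
In short, HashedWheelTimer maintains a circular queue (the wheel). When a timeout task is added, the timer computes from its delay which slot on the wheel the task belongs to, and also records how many full rounds of the wheel must pass first. On every tick the timer moves to the next slot and iterates over all the timeout tasks in it: if a task has no rounds remaining (the code below checks remainingRounds <= 0), its time has come and it is expired; otherwise its remaining rounds are decremented by 1. Then the timer goes on to the next tick.
Note that HashedWheelTimer is not an exact timer; its precision depends on tickDuration.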
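The slot-plus-rounds idea can be sketched with a toy wheel that is advanced by hand. This is not Netty's implementation: `ToyWheel`, `Entry` and the tick-based `add` are names I made up, and there is no worker thread, no real clock and no concurrency. It only shows how a task lands in a slot and waits out its rounds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy hashed wheel driven by manual tick() calls; illustrative only, not Netty's code. */
final class ToyWheel {
    private static final class Entry {
        final String name;
        long remainingRounds;

        Entry(String name, long remainingRounds) {
            this.name = name;
            this.remainingRounds = remainingRounds;
        }
    }

    private final List<List<Entry>> slots = new ArrayList<>();
    private long tick; // number of the next tick to be processed

    ToyWheel(int size) {
        for (int i = 0; i < size; i++) {
            slots.add(new ArrayList<>());
        }
    }

    /** Schedules a task to fire when tick number (current tick + delayTicks) is processed. */
    void add(String name, long delayTicks) {
        long target = tick + delayTicks;
        int index = (int) (target % slots.size());
        long rounds = delayTicks / slots.size(); // full turns of the wheel before the task is due
        slots.get(index).add(new Entry(name, rounds));
    }

    /** Processes the current slot, advances the wheel, and returns the names that fired. */
    List<String> tick() {
        List<String> fired = new ArrayList<>();
        Iterator<Entry> it = slots.get((int) (tick % slots.size())).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.remainingRounds <= 0) {
                fired.add(e.name);
                it.remove();
            } else {
                e.remainingRounds--;
            }
        }
        tick++;
        return fired;
    }
}
```

With a wheel of size 4, a task added with a delay of 5 ticks lands in slot 1 with one round remaining: the first visit to slot 1 only decrements the counter, and the task fires on the second visit.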
### Constructor
Let's first look at the HashedWheelTimer constructor.
```java
public HashedWheelTimer(
        ThreadFactory threadFactory,
        long tickDuration, TimeUnit unit, int ticksPerWheel) {
    if (threadFactory == null) {
        throw new NullPointerException("threadFactory");
    }
    if (unit == null) {
        throw new NullPointerException("unit");
    }
    if (tickDuration <= 0) {
        throw new IllegalArgumentException("tickDuration must be greater than 0: " + tickDuration);
    }
    if (ticksPerWheel <= 0) {
        throw new IllegalArgumentException("ticksPerWheel must be greater than 0: " + ticksPerWheel);
    }
    // Normalize ticksPerWheel to power of two and initialize the wheel.
    wheel = createWheel(ticksPerWheel);
    mask = wheel.length - 1;
    // Convert tickDuration to nanos.
    this.tickDuration = unit.toNanos(tickDuration);
    // Prevent overflow.
    if (this.tickDuration >= Long.MAX_VALUE / wheel.length) {
        throw new IllegalArgumentException(String.format(
                "tickDuration: %d (expected: 0 < tickDuration in nanos < %d",
                tickDuration, Long.MAX_VALUE / wheel.length));
    }
    workerThread = threadFactory.newThread(worker);
    leak = leakDetector.open(this);
}
```
We pass in a threadFactory, which is used to create the worker thread. The second parameter, tickDuration, is the length of each tick. The third, unit, is the time unit of tickDuration. The fourth, ticksPerWheel, is the size of the wheel.
The method to pay attention to here is createWheel(ticksPerWheel).
```java
private static HashedWheelBucket[] createWheel(int ticksPerWheel) {
    if (ticksPerWheel <= 0) {
        throw new IllegalArgumentException(
                "ticksPerWheel must be greater than 0: " + ticksPerWheel);
    }
    if (ticksPerWheel > 1073741824) {
        throw new IllegalArgumentException(
                "ticksPerWheel may not be greater than 2^30: " + ticksPerWheel);
    }
    ticksPerWheel = normalizeTicksPerWheel(ticksPerWheel);
    HashedWheelBucket[] wheel = new HashedWheelBucket[ticksPerWheel];
    for (int i = 0; i < wheel.length; i ++) {
        wheel[i] = new HashedWheelBucket();
    }
    return wheel;
}
private static int normalizeTicksPerWheel(int ticksPerWheel) {
    int normalizedTicksPerWheel = 1;
    while (normalizedTicksPerWheel < ticksPerWheel) {
        normalizedTicksPerWheel <<= 1;
    }
    return normalizedTicksPerWheel;
}
```
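In the code above, normalizeTicksPerWheel determines the wheel size: the smallest power of two that is greater than or equal to ticksPerWheel. Why a power of two? Because computing an index on a wheel whose size is a power of two is very cheap: for a non-negative a, a & (b - 1) == a % b holds whenever b is a power of two.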
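A quick way to convince yourself of the mask trick is to run the numbers. The sketch below copies the normalization loop from the snippet above into a standalone class; the class name `PowerOfTwoWheel` and the helper `index` are mine, not Netty's.

```java
/** Standalone check of the power-of-two normalization and the mask-based index. */
final class PowerOfTwoWheel {
    /** Same loop as normalizeTicksPerWheel: smallest power of two >= ticksPerWheel. */
    static int normalize(int ticksPerWheel) {
        int normalized = 1;
        while (normalized < ticksPerWheel) {
            normalized <<= 1;
        }
        return normalized;
    }

    /** Index of a tick on a wheel whose length is a power of two. */
    static int index(long tick, int wheelLength) {
        return (int) (tick & (wheelLength - 1));
    }

    public static void main(String[] args) {
        System.out.println(normalize(100));     // 128
        System.out.println(index(130, 128));    // 2, the same as 130 % 128
        // The identity breaks when the length is not a power of two:
        System.out.println((10 & (6 - 1)) + " vs " + (10 % 6)); // 0 vs 4
    }
}
```

So asking for 100 ticks per wheel actually gives a wheel of 128 slots, and every index lookup becomes a single bitwise AND instead of a division.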
### start()
```java
public void start() {
    switch (WORKER_STATE_UPDATER.get(this)) {
        case WORKER_STATE_INIT:
            if (WORKER_STATE_UPDATER.compareAndSet(this, WORKER_STATE_INIT, WORKER_STATE_STARTED)) {
                workerThread.start();
            }
            break;
        case WORKER_STATE_STARTED:
            break;
        case WORKER_STATE_SHUTDOWN:
            throw new IllegalStateException("cannot be started once stopped");
        default:
            throw new Error("Invalid WorkerState");
    }
    // Wait until the startTime is initialized by the worker.
    while (startTime == 0) {
        try {
            startTimeInitialized.await();
        } catch (InterruptedException ignore) {
            // Ignore - it will be ready very soon.
        }
    }
}
```
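The start method is also carefully written. You can think of WORKER_STATE_UPDATER as behaving like an AtomicInteger that holds the current state of the HashedWheelTimer; when the state is WORKER_STATE_INIT, workerThread is started. After starting the worker thread, the caller waits until startTime becomes non-zero. This code really shows the author's skill. The role of startTimeInitialized is explained later, in the analysis of workerThread.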
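The same start-once-then-wait pattern can be reproduced with plain JDK classes. The sketch below is my own reduction, not Netty's code: it uses an AtomicInteger instead of the field updater, the worker does nothing except publish startTime, and `LazyStarter` and `threadsStarted` are made-up names that exist only so the behaviour can be observed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a CAS-guarded start() that blocks until the worker has published startTime. */
final class LazyStarter {
    static final int INIT = 0, STARTED = 1, SHUTDOWN = 2;

    private final AtomicInteger state = new AtomicInteger(INIT);
    private final CountDownLatch startTimeInitialized = new CountDownLatch(1);
    private final AtomicInteger threadsStarted = new AtomicInteger(); // for observation only
    private volatile long startTime;

    private final Thread worker = new Thread(() -> {
        long now = System.nanoTime();
        startTime = (now == 0) ? 1 : now; // 0 is reserved to mean "not initialized yet"
        startTimeInitialized.countDown(); // wake up every thread blocked in start()
    });

    void start() {
        int s = state.get();
        if (s == SHUTDOWN) {
            throw new IllegalStateException("cannot be started once stopped");
        }
        // Only the thread that wins the CAS starts the worker; everyone else just waits.
        if (s == INIT && state.compareAndSet(INIT, STARTED)) {
            threadsStarted.incrementAndGet();
            worker.start();
        }
        while (startTime == 0) {
            try {
                startTimeInitialized.await();
            } catch (InterruptedException ignore) {
                // keep waiting, the worker publishes startTime almost immediately
            }
        }
    }

    long startTime() {
        return startTime;
    }

    int threadsStarted() {
        return threadsStarted.get();
    }
}
```

However many times (or from however many threads) start() is called, the worker thread is started once, and no caller returns before startTime is visible.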
### newTimeout()
```java
public Timeout newTimeout(TimerTask task, long delay, TimeUnit unit) {
    if (task == null) {
        throw new NullPointerException("task");
    }
    if (unit == null) {
        throw new NullPointerException("unit");
    }
    start();
    // Add the timeout to the timeout queue which will be processed on the next tick.
    // During processing all the queued HashedWheelTimeouts will be added to the correct HashedWheelBucket.
    long deadline = System.nanoTime() + unit.toNanos(delay) - startTime;
    HashedWheelTimeout timeout = new HashedWheelTimeout(this, task, deadline);
    timeouts.add(timeout);
    return timeout;
}
```
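This is a very important method: it is what we call to add a timed task. It takes three parameters. The first describes the task, whose run(Timeout timeout) method is executed when the task times out. The second is the delay, that is, how long from now the task should fire. The third is the time unit of the delay. The method itself is simple: it computes deadline, the point in time (in nanoseconds, relative to startTime) at which the task expires, then builds a HashedWheelTimeout and puts it on the timeouts queue. Note that at this point the HashedWheelTimeout has not been placed on the wheel yet. As the comment says ("Add the timeout to the timeout queue which will be processed on the next tick"), tasks in the timeouts queue are moved into the correct bucket on the next tick.
Pay particular attention to the fact that newTimeout calls start(). The best practice is not to call start() directly, but to let newTimeout trigger it when there is actually a timeout task to run, so that the worker thread does not spin for nothing.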
### HashedWheelBucket
HashedWheelBucket is an inner class that represents a slot on the wheel. The constructor builds an array of HashedWheelBucket.
```java
private static final class HashedWheelBucket {
    // Used for the linked-list datastructure
    private HashedWheelTimeout head;
    private HashedWheelTimeout tail;
}
```
HashedWheelBucket maintains a linked list to store its timeout tasks.
### Worker thread
```java
public void run() {
    // Initialize the startTime.
    startTime = System.nanoTime();
    if (startTime == 0) {
        // We use 0 as an indicator for the uninitialized value here, so make sure it's not 0 when initialized.
        startTime = 1;
    }
    // Notify the other threads waiting for the initialization at start().
    startTimeInitialized.countDown();
    do {
        final long deadline = waitForNextTick();
        if (deadline > 0) {
            int idx = (int) (tick & mask);
            processCancelledTasks();
            HashedWheelBucket bucket =
                    wheel[idx];
            transferTimeoutsToBuckets();
            bucket.expireTimeouts(deadline);
            tick++;
        }
    } while (WORKER_STATE_UPDATER.get(HashedWheelTimer.this) == WORKER_STATE_STARTED);
    // Fill the unprocessedTimeouts so we can return them from stop() method.
    for (HashedWheelBucket bucket: wheel) {
        bucket.clearTimeouts(unprocessedTimeouts);
    }
    for (;;) {
        HashedWheelTimeout timeout = timeouts.poll();
        if (timeout == null) {
            break;
        }
        if (!timeout.isCancelled()) {
            unprocessedTimeouts.add(timeout);
        }
    }
    processCancelledTasks();
}
```
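The most important part of the whole wheel timer is the Worker thread. As mentioned earlier, start() starts the worker thread and waits for startTime to become non-zero; the worker thread sets startTime to the current nanosecond time and calls startTimeInitialized.countDown() to wake up the threads blocked in start().
After that, as long as the wheel timer stays in the WORKER_STATE_STARTED state (currently the state only changes in start() and stop()), the worker keeps looping: it waits for the next tick, processes cancelled tasks, transfers queued timeouts into their buckets, expires the current bucket, and increments tick.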
### waitForNextTick()
```java
private long waitForNextTick() {
    long deadline = tickDuration * (tick + 1);
    for (;;) {
        final long currentTime = System.nanoTime() - startTime;
        long sleepTimeMs = (deadline - currentTime + 999999) / 1000000;
        if (sleepTimeMs <= 0) {
            if (currentTime == Long.MIN_VALUE) {
                return -Long.MAX_VALUE;
            } else {
                return currentTime;
            }
        }
        // Check if we run on windows, as if thats the case we will need
        // to round the sleepTime as workaround for a bug that only affect
        // the JVM if it runs on windows.
        //
        // See https://github.com/netty/netty/issues/356
        if (PlatformDependent.isWindows()) {
            sleepTimeMs = sleepTimeMs / 10 * 10;
        }
        try {
            Thread.sleep(sleepTimeMs);
        } catch (InterruptedException ignored) {
            if (WORKER_STATE_UPDATER.get(HashedWheelTimer.this) == WORKER_STATE_SHUTDOWN) {
                return Long.MIN_VALUE;
            }
        }
    }
}
```
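waitForNextTick() is fairly simple: it puts the worker thread to sleep until the next tick is due and then returns the current time, expressed as nanoseconds elapsed since startTime.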
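The one line that deserves a second look is the sleep-time arithmetic. The sketch below pulls the two expressions out of waitForNextTick() into static helpers so they can be tried with concrete numbers; the class and method names (`TickSleep`, `sleepMillis`, `windowsRounded`) are mine.

```java
/** The sleep-time arithmetic of waitForNextTick(), isolated for experimentation. */
final class TickSleep {
    /** Nanoseconds left until the deadline, rounded up to whole milliseconds. */
    static long sleepMillis(long deadlineNanos, long currentNanos) {
        return (deadlineNanos - currentNanos + 999_999) / 1_000_000;
    }

    /** The Windows workaround: round the sleep time down to a multiple of 10 ms. */
    static long windowsRounded(long sleepTimeMs) {
        return sleepTimeMs / 10 * 10;
    }

    public static void main(String[] args) {
        long deadline = 100_000_000L; // the tick is due 100 ms after startTime
        System.out.println(sleepMillis(deadline, 0L));           // 100
        System.out.println(sleepMillis(deadline, 99_500_000L));  // 1 (0.5 ms left, rounded up)
        System.out.println(sleepMillis(deadline, 100_000_000L)); // 0 (the tick is due now)
        System.out.println(windowsRounded(17));                  // 10
    }
}
```

Adding 999999 before dividing rounds up, so the worker never wakes before the tick is due because of truncation; the method only returns once the computed sleep time has dropped to zero or below.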
### processCancelledTasks()
```java
private void processCancelledTasks() {
    for (;;) {
        Runnable task = cancelledTimeouts.poll();
        if (task == null) {
            // all processed
            break;
        }
        try {
            task.run();
        } catch (Throwable t) {
            if (logger.isWarnEnabled()) {
                logger.warn("An exception was thrown while process a cancellation task", t);
            }
        }
    }
}
```
The wheel timer maintains a cancelledTimeouts queue, and every tick processes all the tasks in that queue. When and how tasks get added to cancelledTimeouts is covered later.
### transferTimeoutsToBuckets()
```java
private void transferTimeoutsToBuckets() {
    // transfer only max. 100000 timeouts per tick to prevent a thread to stale the workerThread when it just
    // adds new timeouts in a loop.
    for (int i = 0; i < 100000; i++) {
        HashedWheelTimeout timeout = timeouts.poll();
        if (timeout == null) {
            // all processed
            break;
        }
        if (timeout.state() == HashedWheelTimeout.ST_CANCELLED) {
            // Was cancelled in the meantime.
            continue;
        }
        long calculated = timeout.deadline / tickDuration;
        timeout.remainingRounds = (calculated - tick) / wheel.length;
        final long ticks = Math.max(calculated, tick); // Ensure we don't schedule for past.
        int stopIndex = (int) (ticks & mask);
        HashedWheelBucket bucket = wheel[stopIndex];
        bucket.addTimeout(timeout);
    }
}
```
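As mentioned earlier, newTimeout does not add the timeout task to the wheel right away; it puts it on the timeouts queue first. On each tick, the worker moves the tasks in the timeouts queue onto the wheel (at most 100,000 per tick, as the loop bound shows). The way remainingRounds and stopIndex are computed is rather neat: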
```java
long calculated = timeout.deadline / tickDuration;
timeout.remainingRounds = (calculated - tick) / wheel.length;
final long ticks = Math.max(calculated, tick); // Ensure we don't schedule for past.
int stopIndex = (int) (ticks & mask);
```
The timeout task is then added to the corresponding HashedWheelBucket.
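Plugging numbers into those four lines makes them easier to follow. The helper class below (`SlotMath` is my name, not Netty's) wraps the same arithmetic in two static methods; the tick duration, wheel length and deadlines in main are arbitrary values chosen for the example.

```java
/** The remainingRounds / stopIndex arithmetic from transferTimeoutsToBuckets(), isolated. */
final class SlotMath {
    static long remainingRounds(long deadlineNanos, long tickDurationNanos, long tick, int wheelLength) {
        long calculated = deadlineNanos / tickDurationNanos; // tick number at which the task is due
        return (calculated - tick) / wheelLength;            // full turns of the wheel before then
    }

    static int stopIndex(long deadlineNanos, long tickDurationNanos, long tick, int wheelLength) {
        long calculated = deadlineNanos / tickDurationNanos;
        long ticks = Math.max(calculated, tick);             // never schedule into the past
        return (int) (ticks & (wheelLength - 1));            // wheelLength must be a power of two
    }

    public static void main(String[] args) {
        long tickDuration = 100_000_000L; // 100 ms per tick
        int wheelLength = 8;
        long tick = 3;                    // the worker is currently processing tick 3

        // Deadline 2.55 s after startTime: due at tick 25.
        System.out.println(remainingRounds(2_550_000_000L, tickDuration, tick, wheelLength)); // 2
        System.out.println(stopIndex(2_550_000_000L, tickDuration, tick, wheelLength));       // 1

        // Deadline already in the past (tick 1): goes into the current slot with 0 rounds left.
        System.out.println(remainingRounds(150_000_000L, tickDuration, tick, wheelLength));   // 0
        System.out.println(stopIndex(150_000_000L, tickDuration, tick, wheelLength));         // 3
    }
}
```

In the first case slot 1 is visited at ticks 9, 17 and 25: the first two visits decrement remainingRounds from 2 to 0, and the third one expires the task, exactly at the tick it was due.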
### bucket.expireTimeouts(deadline)
```java
public void expireTimeouts(long deadline) {
    HashedWheelTimeout timeout = head;
    // process all timeouts
    while (timeout != null) {
        boolean remove = false;
        if (timeout.remainingRounds <= 0) {
            if (timeout.deadline <= deadline) {
                timeout.expire();
            } else {
                // The timeout was placed into a wrong slot. This should never happen.
                throw new IllegalStateException(String.format(
                        "timeout.deadline (%d) > deadline (%d)", timeout.deadline, deadline));
            }
            remove = true;
        } else if (timeout.isCancelled()) {
            remove = true;
        } else {
            timeout.remainingRounds --;
        }
        // store reference to next as we may null out timeout.next in the remove block.
        HashedWheelTimeout next = timeout.next;
        if (remove) {
            remove(timeout);
        }
        timeout = next;
    }
}
```
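This processes all the timeout tasks in the current bucket. If remainingRounds is less than or equal to 0, the task's time has come and timeout.expire() is executed; if remainingRounds is greater than 0, it is decremented by 1. A task that has expired or been cancelled is removed from the bucket.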
### HashedWheelTimeout#cancel
```java
public boolean cancel() {
    // only update the state it will be removed from HashedWheelBucket on next tick.
    if (!compareAndSetState(ST_INIT, ST_CANCELLED)) {
        return false;
    }
    // If a task should be canceled we create a new Runnable for this to another queue which will
    // be processed on each tick. So this means that we will have a GC latency of max. 1 tick duration
    // which is good enough. This way we can make again use of our MpscLinkedQueue and so minimize the
    // locking / overhead as much as possible.
    //
    // It is important that we not just add the HashedWheelTimeout itself again as it extends
    // MpscLinkedQueueNode and so may still be used as tombstone.
    timer.cancelledTimeouts.add(new Runnable() {
        @Override
        public void run() {
            HashedWheelBucket bucket = HashedWheelTimeout.this.bucket;
            if (bucket != null) {
                bucket.remove(HashedWheelTimeout.this);
            }
        }
    });
    return true;
}
```
The cancelledTimeouts queue was mentioned earlier: calling HashedWheelTimeout#cancel adds a task to cancelledTimeouts, and that task removes the timeout from its bucket.
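The design choice here is that cancel() never touches the bucket itself: it only flips the state with a CAS and leaves a small removal job for the worker. A stripped-down sketch of that idea follows. It is my own simplification, with a plain ArrayList as the bucket and a ConcurrentLinkedQueue standing in for Netty's MPSC queue, and `DeferredCancel` and `Task` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of "cancel by CAS now, unlink from the bucket on the next tick". */
final class DeferredCancel {
    static final int ST_INIT = 0, ST_CANCELLED = 1, ST_EXPIRED = 2;

    // Only the worker thread is supposed to touch the bucket, so it needs no locking.
    final List<Task> bucket = new ArrayList<>();
    // Any thread may enqueue a removal job; the worker drains the queue once per tick.
    final Queue<Runnable> cancelled = new ConcurrentLinkedQueue<>();

    final class Task {
        private final AtomicInteger state = new AtomicInteger(ST_INIT);

        /** Returns true only for the first successful cancel of a task that has not expired. */
        boolean cancel() {
            if (!state.compareAndSet(ST_INIT, ST_CANCELLED)) {
                return false;
            }
            cancelled.add(() -> bucket.remove(this)); // the actual unlinking happens later
            return true;
        }

        /** Returns false if the task was cancelled (or already expired) in the meantime. */
        boolean expire() {
            return state.compareAndSet(ST_INIT, ST_EXPIRED);
        }
    }

    Task schedule() {
        Task t = new Task();
        bucket.add(t);
        return t;
    }

    /** Called by the worker on every tick. */
    void processCancelledTasks() {
        Runnable job;
        while ((job = cancelled.poll()) != null) {
            job.run();
        }
    }
}
```

Right after cancel() returns, the task is still physically in the bucket; it disappears only when the worker next calls processCancelledTasks(), which matches the "GC latency of max. 1 tick duration" mentioned in the source comment.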
### stop()
```java
public Set<Timeout> stop() {
    if (Thread.currentThread() == workerThread) {
        throw new IllegalStateException(
                HashedWheelTimer.class.getSimpleName() +
                        ".stop() cannot be called from " +
                        TimerTask.class.getSimpleName());
    }
    if (!WORKER_STATE_UPDATER.compareAndSet(this, WORKER_STATE_STARTED, WORKER_STATE_SHUTDOWN)) {
        // workerState can be 0 or 2 at this moment - let it always be 2.
        WORKER_STATE_UPDATER.set(this, WORKER_STATE_SHUTDOWN);
        if (leak != null) {
            leak.close();
        }
        return Collections.emptySet();
    }
    boolean interrupted = false;
    while (workerThread.isAlive()) {
        workerThread.interrupt();
        try {
            workerThread.join(100);
        } catch (InterruptedException ignored) {
            interrupted = true;
        }
    }
    if (interrupted) {
        Thread.currentThread().interrupt();
    }
    if (leak != null) {
        leak.close();
    }
    return worker.unprocessedTimeouts();
}
```
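I have always believed that two things really test a programmer's fundamentals: lifecycle management and the handling of exceptional conditions.
Since the wheel timer has a start() method, it should also have a stop() method, and this stop() contains quite a few techniques worth learning.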
```java
if (!WORKER_STATE_UPDATER.compareAndSet(this, WORKER_STATE_STARTED, WORKER_STATE_SHUTDOWN)) {
    // workerState can be 0 or 2 at this moment - let it always be 2.
    WORKER_STATE_UPDATER.set(this, WORKER_STATE_SHUTDOWN);
    if (leak != null) {
        leak.close();
    }
    return Collections.emptySet();
}
```
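When several threads call stop() at the same time, only one of them can move the state from WORKER_STATE_STARTED to WORKER_STATE_SHUTDOWN. A thread whose CAS fails forces the state to WORKER_STATE_SHUTDOWN anyway. At first glance that looks unnecessary, but as the code comment notes, the state at that point may be 0 (INIT) as well as 2 (SHUTDOWN), so this also covers a timer that was never started. The losing thread then returns an empty set, meaning it has nothing more to do; the thread that won the CAS takes care of the rest.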
```java
while (workerThread.isAlive()) {
    workerThread.interrupt();
    try {
        workerThread.join(100);
    } catch (InterruptedException ignored) {
        interrupted = true;
    }
}
```
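If the worker thread is still alive, stop() calls workerThread.interrupt(). (A common way to stop a thread is to call xxxThread.interrupt() and have xxxThread check xxxThread.isInterrupted(); this workerThread does not check that flag, but it does react to the InterruptedException thrown by Thread.sleep in waitForNextTick().) In the wheel timer, stop() wants workerThread to finish its work gracefully and hand back the tasks it did not get to before exiting, so it uses `workerThread.join(100);` to wait up to 100 ms at a time for workerThread to finish.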
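The two stop() fragments above can be combined into one small runnable sketch. It is a simplification under my own assumptions, not Netty's code: `StoppableWorker` is an invented name, the worker does nothing but sleep, the thread is started in the constructor for brevity, and the losing caller simply gets false instead of an empty set.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a stop() that lets one caller win a CAS and then interrupts and joins the worker. */
final class StoppableWorker {
    static final int STARTED = 1, SHUTDOWN = 2;

    private final AtomicInteger state = new AtomicInteger(STARTED);

    private final Thread worker = new Thread(() -> {
        while (state.get() == STARTED) {
            try {
                Thread.sleep(1000); // stands in for waiting for the next tick
            } catch (InterruptedException e) {
                // fall through: the loop condition re-checks the state
            }
        }
        // a real worker would collect its unprocessed tasks here before exiting
    });

    StoppableWorker() {
        worker.start();
    }

    /** Returns true for the single caller that actually performed the shutdown. */
    boolean stop() {
        if (!state.compareAndSet(STARTED, SHUTDOWN)) {
            return false; // someone else is stopping, or has already stopped, the worker
        }
        boolean interrupted = false;
        while (worker.isAlive()) {
            worker.interrupt(); // wake the worker if it is sleeping
            try {
                worker.join(100);
            } catch (InterruptedException e) {
                interrupted = true; // remember it, but keep waiting for the worker
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // restore the caller's interrupt status
        }
        return true;
    }

    boolean workerAlive() {
        return worker.isAlive();
    }
}
```

The loop re-interrupts every 100 ms because a single interrupt can be missed if it arrives while the worker is not blocked; and swallowing the caller's own InterruptedException only temporarily, then restoring it, keeps stop() from returning while the worker is still running.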
```java
// Fill the unprocessedTimeouts so we can return them from stop() method.
for (HashedWheelBucket bucket: wheel) {
    bucket.clearTimeouts(unprocessedTimeouts);
}
for (;;) {
    HashedWheelTimeout timeout = timeouts.poll();
    if (timeout == null) {
        break;
    }
    if (!timeout.isCancelled()) {
        unprocessedTimeouts.add(timeout);
    }
}
processCancelledTasks();
```
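At the very end, the worker thread moves every task still sitting in the buckets, plus every task in the timeouts queue that has not been cancelled, into unprocessedTimeouts. It then processes the cancelled timeout tasks, and with that its job is done and it waits to be reclaimed.
Part of this involves responding to InterruptedException; handling InterruptedException will probably get an article of its own.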
## Summary
The code is very well written and there is a lot to learn from it. HashedWheelTimer is ready to be put to use.