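This section continues from the previous one, which covered the Master's registration mechanism: how Drivers, Applications, and Workers register with the Master, so that the Master always knows their startup state.
While a job is running, any change in the state of a Driver, an Application (Executor), or a Worker also updates the corresponding registration information on the Master.
The following code runs when a Driver's state changes: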
case DriverStateChanged(driverId, state, exception) => {
  state match {
    // When the Driver reaches any of these states, remove it
    case DriverState.ERROR | DriverState.FINISHED | DriverState.KILLED | DriverState.FAILED =>
      removeDriver(driverId, state, exception)
    case _ =>
      throw new Exception(s"Received unexpected state update for driver $driverId: $state")
  }
}
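The handler above treats four states as terminal by listing them as pattern alternatives in a single case. A minimal, self-contained sketch of that idiom follows. The DriverStateSketch object, its simplified DriverState enumeration, and the shouldRemove helper are assumptions made for illustration; they are not Spark's actual definitions.

```scala
object DriverStateSketch {
  // Simplified, hypothetical stand-in for Spark's DriverState enumeration
  object DriverState extends Enumeration {
    val SUBMITTED, RUNNING, FINISHED, KILLED, FAILED, ERROR = Value
  }

  // Returns true for the states that the handler above treats as terminal
  def shouldRemove(state: DriverState.Value): Boolean = state match {
    case DriverState.ERROR | DriverState.FINISHED |
         DriverState.KILLED | DriverState.FAILED => true
    case _ => false
  }
}
```

Unlike this sketch, the real handler throws an exception for any other state instead of returning false.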
def removeDriver(driverId: String, finalState: DriverState, exception: Option[Exception]): Unit = {
  // Look up the driver object by its driverId
  drivers.find(d => d.id == driverId) match {
    // Some is a case class; this branch matches when the driver was found
    case Some(driver) =>
      logInfo(s"Removing driver: $driverId")
      // Remove the driver from the in-memory cache
      drivers -= driver
      // If the completed list is full, drop the oldest entries first
      if (completedDrivers.size >= RETAINED_DRIVERS) {
        val toRemove = math.max(RETAINED_DRIVERS / 10, 1)
        completedDrivers.trimStart(toRemove)
      }
      // Add the driver to the completedDrivers list
      completedDrivers += driver
      // Remove the driver through the persistence engine
      persistenceEngine.removeDriver(driver)
      // Record the driver's final state and exception
      driver.state = finalState
      driver.exception = exception
      // Find the worker this driver ran on and remove the driver's entry from it
      driver.worker.foreach(w => w.removeDriver(driver))
      // Call schedule() to run scheduling again
      schedule()
    case None =>
      logWarning(s"Asked to remove unknown driver: $driverId")
  }
}
Does the method above look familiar? It performs, in reverse order, the steps we saw earlier when a Driver registers with the Master.
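One detail of removeDriver worth isolating is how the completed list is kept bounded: when it is full, the oldest tenth of the entries (at least one) is dropped before the new entry is appended. The sketch below reproduces only that step. The RetentionSketch object, the small RetainedDrivers value, and the use of plain String ids are assumptions for illustration, not Spark's actual configuration or types.

```scala
import scala.collection.mutable.ArrayBuffer

object RetentionSketch {
  // Hypothetical small limit so the behaviour is easy to follow
  val RetainedDrivers = 20

  // Append a finished driver id, first trimming the oldest entries if the buffer is full
  def addCompleted(completed: ArrayBuffer[String], driverId: String): Unit = {
    if (completed.size >= RetainedDrivers) {
      val toRemove = math.max(RetainedDrivers / 10, 1)
      completed.trimStart(toRemove) // drops the oldest entries at the head
    }
    completed += driverId
  }
}
```

Trimming in batches rather than one entry at a time means the buffer is only compacted on some insertions, at the cost of briefly holding fewer entries than the limit.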
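Below is the source that handles an Executor state change. The code is simple and I have annotated it; it closely mirrors, in reverse, the steps for registering an Application with the Master.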
case ExecutorStateChanged(appId, execId, state, message, exitStatus) => {
  // Look up the executor inside its application by appId and execId
  val execOption = idToApp.get(appId).flatMap(app => app.executors.get(execId))
  execOption match {
    case Some(exec) => {
      val appInfo = idToApp(appId)
      exec.state = state
      if (state == ExecutorState.RUNNING) { appInfo.resetRetryCount() }
      // Notify the Driver of the executor's application so it can update the state
      exec.application.driver ! ExecutorUpdated(execId, state, message, exitStatus)
      // Check whether the executor has finished
      if (ExecutorState.isFinished(state)) {
        // Remove this executor from the worker and app
        logInfo(s"Removing executor ${exec.fullId} because it is $state")
        // Remove the executor's entry from the application
        appInfo.removeExecutor(exec)
        // Remove the executor's entry from its worker
        exec.worker.removeExecutor(exec)
        val normalExit = exitStatus == Some(0)
        // Only retry certain number of times so we don't go into an infinite loop.
        if (!normalExit) {
          // The executor exited abnormally: while the retry count is below the limit (10), schedule again
          if (appInfo.incrementRetryCount() < ApplicationState.MAX_NUM_RETRY) {
            schedule()
          } else {
            // Once the limit is reached, check the application's remaining executors
            val execs = appInfo.executors.values
            if (!execs.exists(_.state == ExecutorState.RUNNING)) {
              logError(s"Application ${appInfo.desc.name} with ID ${appInfo.id} failed " +
                s"${appInfo.retryCount} times; removing it")
              // None of them is still running, so remove the application as well
              removeApplication(appInfo, ApplicationState.FAILED)
            }
          }
        }
      }
    }
    // (the excerpt ends here; any remaining cases of the match are not shown)
  }
}
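A brief note on the last part of the code: when an executor exits abnormally and the application's retry count has reached MAX_NUM_RETRY (10), the Master no longer reschedules. The executor has already been removed, and if the application has no executor left in the RUNNING state, the application itself is removed and marked FAILED.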
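The retry decision described above can be modelled on its own, without the rest of the Master. The following is a minimal sketch under stated assumptions: RetrySketch, AppRetryState, the Action values, and the anyExecutorRunning parameter are hypothetical names introduced here, and MaxNumRetry is set to 10 only because that is the value this article mentions.

```scala
object RetrySketch {
  // Assumed to match the limit of 10 mentioned in the text
  val MaxNumRetry = 10

  // Possible outcomes of handling a finished executor (hypothetical names)
  sealed trait Action
  case object Reschedule extends Action
  case object RemoveApplication extends Action
  case object NoAction extends Action

  // Minimal stand-in for the retry bookkeeping kept per application
  final class AppRetryState {
    private var retryCount = 0
    def incrementRetryCount(): Int = { retryCount += 1; retryCount }
    def resetRetryCount(): Unit = { retryCount = 0 }
  }

  // Decide what to do after an executor has finished
  def onExecutorFinished(app: AppRetryState,
                         exitStatus: Option[Int],
                         anyExecutorRunning: Boolean): Action = {
    val normalExit = exitStatus == Some(0)
    if (normalExit) NoAction                                      // clean exit: nothing to retry
    else if (app.incrementRetryCount() < MaxNumRetry) Reschedule  // still under the limit
    else if (!anyExecutorRunning) RemoveApplication               // limit reached, nothing running
    else NoAction                                                 // limit reached, but the app is still alive
  }
}
```

In this sketch the counter is only reset through resetRetryCount(), which corresponds to the handler resetting it whenever an executor reports RUNNING, so only consecutive failures count toward the limit.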