[Chapter 6] How Driver and Executor State Changes Are Reported to the Master

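This section continues from the previous one, which covered the Master's registration mechanism: how Drivers, Applications, and Workers register with the Master, so that the Master knows the startup state of each of them.
While a job is running, whenever the state of a Driver, Application (Executor), or Worker changes, the registration information held on the Master side is updated as well.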

The following is how the Master handles a Driver state change:

case DriverStateChanged(driverId, state, exception) => {
      state match {
        // If the Driver is in any of these states, remove it
        case DriverState.ERROR | DriverState.FINISHED | DriverState.KILLED | DriverState.FAILED =>
          removeDriver(driverId, state, exception)
        case _ =>
          throw new Exception(s"Received unexpected state update for driver $driverId: $state")
      }
    }
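
The handler above treats four states as terminal and rejects everything else. A minimal, self-contained sketch of that dispatch follows. The `DriverState` objects here are hypothetical stand-ins, not Spark's own enumeration, and `SUBMITTED` and `RUNNING` are included only as example non-terminal states. `handle` returns an `Either` instead of throwing, purely so the result is easy to inspect.

```scala
object DriverStateSketch {
  // Hypothetical stand-in for the DriverState values used in the excerpt.
  sealed trait DriverState
  object DriverState {
    case object SUBMITTED extends DriverState
    case object RUNNING extends DriverState
    case object FINISHED extends DriverState
    case object KILLED extends DriverState
    case object FAILED extends DriverState
    case object ERROR extends DriverState
  }

  // True for the four states that lead to removeDriver in the excerpt.
  def isTerminal(state: DriverState): Boolean = state match {
    case DriverState.ERROR | DriverState.FINISHED |
         DriverState.KILLED | DriverState.FAILED => true
    case _ => false
  }

  // Terminal states yield a removal message; any other state is reported as unexpected.
  def handle(driverId: String, state: DriverState): Either[String, String] =
    if (isTerminal(state)) Right(s"remove $driverId ($state)")
    else Left(s"Received unexpected state update for driver $driverId: $state")
}
```

In this sketch, `handle("driver-1", DriverState.FINISHED)` gives a `Right`, while `handle("driver-1", DriverState.RUNNING)` gives a `Left` carrying the same message the excerpt would throw.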


def removeDriver(driverId: String, finalState: DriverState, exception: Option[Exception]) {
    // Look up the driver object by driverId
    drivers.find(d => d.id == driverId) match {
      // find returns an Option; Some(driver) means the driver was found
      case Some(driver) =>
        logInfo(s"Removing driver: $driverId")
        // Remove the driver from the in-memory set of active drivers
        drivers -= driver
        // If the completed list is full, drop the oldest entries first
        if (completedDrivers.size >= RETAINED_DRIVERS) {
          val toRemove = math.max(RETAINED_DRIVERS / 10, 1)
          completedDrivers.trimStart(toRemove)
        }
        // Add the driver to the completedDrivers list
        completedDrivers += driver
        // Remove the driver through the persistence engine
        persistenceEngine.removeDriver(driver)
        // Record the driver's final state and exception
        driver.state = finalState
        driver.exception = exception
        // Find the worker this driver ran on and remove the driver's information from it
        driver.worker.foreach(w => w.removeDriver(driver))
        // Call schedule() to trigger scheduling again
        schedule()
      case None =>
        logWarning(s"Asked to remove unknown driver: $driverId")
    }
  }

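Does the method above look familiar? It is essentially the reverse of the steps performed during registration with the Master, covered in the previous section.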
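
One detail in removeDriver worth isolating is how the completed list stays bounded: when it is full, the oldest tenth of the limit (at least one entry) is dropped before the new entry is appended. A minimal sketch of that pattern, assuming a plain `ArrayBuffer`; the helper name `addRetained` and the `limit` parameter are hypothetical, and `remove(0, n)` is used here to drop from the front in place of the `trimStart(n)` call in the excerpt.

```scala
import scala.collection.mutable.ArrayBuffer

object RetainedBufferSketch {
  // Appends `item`; if the buffer already holds `limit` entries,
  // first drops the oldest max(limit / 10, 1) of them.
  def addRetained[A](buf: ArrayBuffer[A], item: A, limit: Int): Unit = {
    if (buf.size >= limit) {
      val toRemove = math.max(limit / 10, 1)
      buf.remove(0, toRemove) // drop from the front (oldest entries)
    }
    buf += item
  }
}
```

Trimming in batches rather than one entry at a time means that, for large limits, the buffer is compacted only once every `limit / 10` insertions.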

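The following is the source code for handling an Executor state change. The code is simple and I have annotated all of it; it closely mirrors, in reverse, the steps for registering an Application with the Master.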

case ExecutorStateChanged(appId, execId, state, message, exitStatus) => {
      // Look up the executor in the application by appId and execId
      val execOption = idToApp.get(appId).flatMap(app => app.executors.get(execId))
      execOption match {
        case Some(exec) => {
          val appInfo = idToApp(appId)
          exec.state = state
          if (state == ExecutorState.RUNNING) { appInfo.resetRetryCount() }
          // Notify the Driver of the executor's application so it can update the state
          exec.application.driver ! ExecutorUpdated(execId, state, message, exitStatus)
          // Check whether the executor has finished
          if (ExecutorState.isFinished(state)) {
            // Remove this executor from the worker and app
            logInfo(s"Removing executor ${exec.fullId} because it is $state")
            // Remove the executor's information from the application
            appInfo.removeExecutor(exec)
            // Remove the executor's information from its worker
            exec.worker.removeExecutor(exec)

            val normalExit = exitStatus == Some(0)
            // Only retry certain number of times so we don't go into an infinite loop.
            if (!normalExit) {
              // The executor exited abnormally: while the retry count is below the limit (10), reschedule
              if (appInfo.incrementRetryCount() < ApplicationState.MAX_NUM_RETRY) {
                schedule()
              } else {
                // The retry limit has been reached and the executor has already been removed
                val execs = appInfo.executors.values
                if (!execs.exists(_.state == ExecutorState.RUNNING)) {
                  logError(s"Application ${appInfo.desc.name} with ID ${appInfo.id} failed " +
                    s"${appInfo.retryCount} times; removing it")
                  // No executor is still RUNNING, so remove the application as well and mark it FAILED
                  removeApplication(appInfo, ApplicationState.FAILED)
                }
              }
            }
          }
        }
        // (the original excerpt stops here; the rest of the handler is not shown)
      }
    }

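A brief note on the last part of the code: once the executor retry count reaches the limit (10), the failed executor has already been removed, and if none of the application's executors is still RUNNING, the Master removes the application too and marks it FAILED.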
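
The retry decision can be sketched on its own, detached from the Master. This is a simplified model under stated assumptions: `AppRetryState`, `Action`, and `onExecutorFinished` are hypothetical names, `incrementRetryCount()` is assumed to return the updated count, and the limit of 10 is taken from the description above.

```scala
object ExecutorRetrySketch {
  sealed trait Action
  case object NoAction extends Action
  case object Reschedule extends Action
  case object RemoveApplication extends Action

  // Retry limit as stated in the text (assumed to match MAX_NUM_RETRY).
  val MaxNumRetry = 10

  // Hypothetical minimal model of the per-application retry counter.
  final class AppRetryState {
    private var retryCount = 0
    def incrementRetryCount(): Int = { retryCount += 1; retryCount }
    def resetRetryCount(): Unit = { retryCount = 0 }
  }

  // Decides what to do after an executor finishes:
  // a normal exit (status 0) needs nothing; an abnormal exit is retried
  // until the limit is reached; after that the application is removed
  // only if none of its executors is still running.
  def onExecutorFinished(app: AppRetryState,
                         exitStatus: Option[Int],
                         anyExecutorRunning: Boolean): Action = {
    val normalExit = exitStatus.contains(0)
    if (normalExit) NoAction
    else if (app.incrementRetryCount() < MaxNumRetry) Reschedule
    else if (!anyExecutorRunning) RemoveApplication
    else NoAction
  }
}
```

Under this model the first nine abnormal exits each produce `Reschedule`, and the tenth produces `RemoveApplication` when no executor is running. Calling `resetRetryCount()` when an executor reaches RUNNING, as the handler does, starts the count over.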

