Analysis of the HDFS delegation token expiration problem

What is a delegation token

A delegation token is a lightweight authentication mechanism in Hadoop that complements Kerberos. In theory Kerberos alone would be sufficient, so why did Hadoop build its own delegation-token-based scheme? In a large distributed system, if every node authenticated to every service through Kerberos, the KDC would come under very heavy load and become a bottleneck for the whole system.

Differences from Kerberos

Kerberos authentication involves three parties: the client, the KDC, and the server, which cooperate to complete authentication. It usually consists of three sub-steps:

  • The client requests a TGT (Ticket Granting Ticket) from the KDC. The TGT contains the client's information and a session key shared between the client and the KDC, and is encrypted with the KDC's master key.
  • The client uses the TGT to request a Ticket for a specific service from the KDC. The Ticket contains the client's information and a session key shared between the client and the server, and is encrypted with the server's master key.
  • The client uses the Ticket to access the service.

Delegation token authentication involves only two parties: the client and the server. The server generates the token and sends it to the client; the client then presents the token when accessing the server, and the server verifies it.
A delegation token can be passed on to other services, which is why it is called a delegation token. For example, after the client obtains an HDFS delegation token, it can distribute it to the mapper and reducer sides, so map and reduce tasks can access HDFS with that token instead of going through Kerberos. A delegation token can also designate a renewer, for example YARN or the client itself. When the token is about to expire it must be renewed, and the renewal only involves the renewer and the server; everyone else using the token is unaffected.
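As a quick illustration of how a client obtains an HDFS delegation token and names a renewer, here is a minimal sketch against the public Hadoop API (the renewer string "yarn" is only illustrative; in a real cluster it is derived from the ResourceManager principal):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

// Minimal sketch: ask the NameNode for delegation tokens and designate a renewer.
// Assumes the client has already authenticated via Kerberos (kinit or keytab login).
val conf = new Configuration()
val fs = FileSystem.get(conf)
val creds = new Credentials()
// addDelegationTokens requests tokens from the server and stores them in `creds`
val tokens = fs.addDelegationTokens("yarn", creds)
tokens.foreach(t => println(s"kind=${t.getKind}, service=${t.getService}"))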

Delegation token lifetime

A delegation token has an expiration time and must be renewed periodically to stay valid. The number of renewals is not unlimited: each token also has a maximum lifetime, after which it becomes invalid no matter what. For example, a token may need to be renewed every 24 hours or it expires, while its maximum lifetime is 7 days; after those 7 days the token can no longer be used at all.
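The renewal interval and maximum lifetime are NameNode-side settings, controlled by dfs.namenode.delegation.token.renew-interval (default one day) and dfs.namenode.delegation.token.max-lifetime (default seven days) in hdfs-site.xml. As a hedged sketch of what a renewer does, Token.renew returns the new expiration time and fails once the maximum lifetime has passed:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

// Minimal sketch: the designated renewer calls renew() periodically before the token expires.
// `token` is assumed to be an HDFS delegation token obtained earlier.
def renewOnce(token: Token[_ <: TokenIdentifier], conf: Configuration): Long = {
  val newExpiry = token.renew(conf) // throws once the max lifetime has been reached
  println(s"token renewed, next expiration at $newExpiry ms since epoch")
  newExpiry
}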

What a delegation token contains

Token.java

  private byte[] identifier;
  private byte[] password;
  private Text kind;
  private Text service;
  private TokenRenewer renewer;

Here identifier identifies the token, and password is what the server uses to authenticate it. kind is the token type, for example HDFS_DELEGATION_TOKEN; service is the service the token is for, for example ha-hdfs:<nameservice>; renewer is the party allowed to renew the token.
These are the fields the client-side token carries. The token's current expiration time is tracked on the server side, while information such as owner, realUser and the maximum lifetime is encoded in the identifier bytes.
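As a hedged sketch (assuming an HDFS delegation token obtained as above), the identifier bytes can be decoded on the client side to inspect fields such as owner, renewer and max lifetime:

import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier
import org.apache.hadoop.security.token.Token

// Minimal sketch: decode the identifier of an HDFS delegation token.
// `token` is assumed to be a Token[DelegationTokenIdentifier] returned by addDelegationTokens.
def describe(token: Token[DelegationTokenIdentifier]): Unit = {
  val ident = token.decodeIdentifier()
  println(s"owner=${ident.getOwner}, renewer=${ident.getRenewer}, realUser=${ident.getRealUser}")
  println(s"issueDate=${ident.getIssueDate}, maxDate=${ident.getMaxDate}")
}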

Delegation token lifecycle

The figure above shows the lifecycle of a delegation token in a YARN application.
1) The client first authenticates to the NameNode via Kerberos and obtains a DT (delegation token).
2) The client submits the application to YARN and passes the DT to the RM, designating YARN as the token's renewer.
3) The RM picks a node to start the AM; the AM then requests resources from the RM and launches the worker containers. In this step the DT is distributed to the corresponding containers.
4) All worker nodes use the DT to access HDFS.
5) When the job finishes, the RM cancels the DT.

What to do when a delegation token expires

Delegation tokens do expire. With the default cluster configuration the renewal interval is one day and the maximum token lifetime is seven days. Batch jobs such as MapReduce rarely run into this, but long-running applications such as Spark Streaming or Storm have to face the fact that the token has a maximum lifetime. Once the token reaches it (say, seven days), the token used by every worker (for example every Spark Streaming executor) becomes invalid; any further access to HDFS with that token is rejected by the NameNode and the application exits abnormally.

One approach is to distribute the keytab file to the AM and every container and let them authenticate against the KDC directly. But this brings back the problem described at the beginning: it puts heavy load on the KDC (which may even mistake the traffic for a DDoS attack) and hurts application performance.
Another approach is for the client to upload the keytab to HDFS first. The AM then logs in with the keytab and requests a delegation token, distributing it to the containers it launches. When the token is about to expire, the AM logs in again, obtains a new delegation token, and tells all workers to use the refreshed token when accessing the service.

How Spark solves the delegation token expiration problem

Spark uses the second approach. Let's look in detail at how Spark 1.6 solves the token expiration problem.
To deal with DT expiration, Spark adds two parameters, "--keytab" and "--principal", which specify the keytab file and the principal used for the Kerberos login. The class Spark uses to submit a YARN application is Client.

org.apache.spark.deploy.yarn.Client

def submitApplication(): ApplicationId = {
   var appId: ApplicationId = null
   try {
     launcherBackend.connect()
     // Setup the credentials before doing anything else,
     // so we have don't have issues at any point.
     setupCredentials()
     yarnClient.init(yarnConf)
     yarnClient.start()

     logInfo("Requesting a new application from cluster with %d NodeManagers"
       .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

     // Get a new application from our RM
     val newApp = yarnClient.createApplication()
     val newAppResponse = newApp.getNewApplicationResponse()
     appId = newAppResponse.getApplicationId()
     reportLauncherState(SparkAppHandle.State.SUBMITTED)
     launcherBackend.setAppId(appId.toString())

     // Verify whether the cluster has enough resources for our AM
     verifyClusterResources(newAppResponse)

     // Set up the appropriate contexts to launch our AM
     val containerContext = createContainerLaunchContext(newAppResponse)
     val appContext = createApplicationSubmissionContext(newApp, containerContext)

     // Finally, submit and monitor the application
     logInfo(s"Submitting application ${appId.getId} to ResourceManager")
     yarnClient.submitApplication(appContext)
     appId
   } catch {
     case e: Throwable =>
       if (appId != null) {
         cleanupStagingDir(appId)
       }
       throw e
   }
}

submitApplication is the entry point for submitting a YARN application. The very first thing it does is call setupCredentials to set up the credentials.

def setupCredentials(): Unit = {
    loginFromKeytab = args.principal != null || sparkConf.contains("spark.yarn.principal")
    ...
    // Defensive copy of the credentials
    credentials = new Credentials(UserGroupInformation.getCurrentUser.getCredentials)
  }

It first checks whether the "--principal" argument was given; if so, the member variable loginFromKeytab is set to true, and it is used in several later checks. It also takes a copy of the Credentials object held by the current UGI.
After the credentials are set up, the client calls the YARN API to create an application and obtain an appId, and then creates a ContainerLaunchContext. This is the standard YARN application flow; let's look further into that function.

private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
    : ContainerLaunchContext = {
    logInfo("Setting up container launch context for our AM")
    val appId = newAppResponse.getApplicationId
    val appStagingDir = getAppStagingDir(appId)
    val pySparkArchives =
      if (sparkConf.getBoolean("spark.yarn.isPython", false)) {
        findPySparkArchives()
      } else {
        Nil
      }
    val launchEnv = setupLaunchEnv(appStagingDir, pySparkArchives)
    val localResources = prepareLocalResources(appStagingDir, pySparkArchives)

    // Set the environment variables to be passed on to the executors.
    distCacheMgr.setDistFilesEnv(launchEnv)
    distCacheMgr.setDistArchivesEnv(launchEnv)

    val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
    amContainer.setLocalResources(localResources.asJava)
    amContainer.setEnvironment(launchEnv.asJava)
    ...
    setupSecurityToken(amContainer)

    amContainer
  }

This function is long and some code is omitted here; it mainly sets up the container launch command line, environment variables, classpath and so on. Inside setupLaunchEnv it configures the credentials file that the AM will later use to publish refreshed tokens to the executors.

private def setupLaunchEnv(
      stagingDir: String,
      pySparkArchives: Seq[String]): HashMap[String, String] = {
    ...
    if (loginFromKeytab) {
      val remoteFs = FileSystem.get(hadoopConf)
      val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir)
      val credentialsFile = "credentials-" + UUID.randomUUID().toString
      sparkConf.set(
        "spark.yarn.credentials.file", new Path(stagingDirPath, credentialsFile).toString)
      logInfo(s"Credentials file set to: $credentialsFile")
      val renewalInterval = getTokenRenewalInterval(stagingDirPath)
      sparkConf.set("spark.yarn.token.renewal.interval", renewalInterval.toString)
    }

    ...
}

It first checks whether loginFromKeytab is true; as we saw, it is true whenever "--principal" was passed on the command line. The code then decides where the token file will live: by default a file whose name starts with credentials- under /user/{user}/.sparkStaging/{appid} on HDFS. That path is stored in sparkConf under the key "spark.yarn.credentials.file", which is used later. It also computes the DT renewal interval and stores it in sparkConf under "spark.yarn.token.renewal.interval".
After the launch environment is set up, the flow enters another function, prepareLocalResources, which contains a key step: obtaining the HDFS delegation token.

YarnSparkHadoopUtil.get.obtainTokensForNamenodes(nns, hadoopConf, credentials)

def obtainTokensForNamenodes(
    paths: Set[Path],
    conf: Configuration,
    creds: Credentials,
    renewer: Option[String] = None
  ): Unit = {
    if (UserGroupInformation.isSecurityEnabled()) {
      val delegTokenRenewer = renewer.getOrElse(getTokenRenewer(conf))
      paths.foreach { dst =>
        val dstFs = dst.getFileSystem(conf)
        logInfo("getting token for namenode: " + dst)
        dstFs.addDelegationTokens(delegTokenRenewer, creds)
      }
    }
  }

This function asks the NameNode for HDFS delegation tokens and adds them to the Credentials object. As mentioned earlier, the credentials object is initialized from UserGroupInformation.getCurrentUser.getCredentials, and the UGI does not contain an HDFS delegation token by default, so this call is what puts the HDFS delegation token into credentials.
Back in createContainerLaunchContext, once the preparation is done it creates the amContainer and calls setupSecurityToken to attach the freshly obtained tokens to it. As a result, when the AM starts it does not need to go through Kerberos; it can talk to the NameNode directly with the HDFS delegation token.

private def setupSecurityToken(amContainer: ContainerLaunchContext): Unit = {
    val dob = new DataOutputBuffer
    credentials.writeTokenStorageToStream(dob)
    amContainer.setTokens(ByteBuffer.wrap(dob.getData))
  }
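For context (this is YARN background, not part of Spark's code): when the NodeManager launches the AM container, the credentials serialized here are written to a token file and the HADOOP_TOKEN_FILE_LOCATION environment variable points at it; UserGroupInformation loads it automatically. A hedged sketch of reading such a file by hand:

import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import scala.collection.JavaConverters._

// Minimal sketch: read the token file YARN hands to a container
// (normally the UGI does this transparently when security is enabled).
val tokenFile = System.getenv(UserGroupInformation.HADOOP_TOKEN_FILE_LOCATION)
if (tokenFile != null) {
  val creds = Credentials.readTokenStorageFile(new File(tokenFile), new Configuration())
  creds.getAllTokens.asScala.foreach(t => println(s"container token: kind=${t.getKind}"))
}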

At this point everything token-related is ready, and yarnClient.submitApplication(appContext) submits the application to YARN. YARN then picks a machine to launch the AM container; the launch command is essentially what the client passed to YARN, roughly "bin/java xxx org.apache.spark.deploy.yarn.ApplicationMaster xxx". The AM's entry point is ApplicationMaster's run function.

org.apache.spark.deploy.yarn.ApplicationMaster

final def run(): Int = {
    try {
      ...
      // If the credentials file config is present, we must periodically renew tokens. So create
      // a new AMDelegationTokenRenewer
      if (sparkConf.contains("spark.yarn.credentials.file")) {
        delegationTokenRenewerOption = Some(new AMDelegationTokenRenewer(sparkConf, yarnConf))
        // If a principal and keytab have been set, use that to create new credentials for executors
        // periodically
        delegationTokenRenewerOption.foreach(_.scheduleLoginFromKeytab())
      }

      if (isClusterMode) {
        runDriver(securityMgr)
      } else {
        runExecutorLauncher(securityMgr)
      }
    } catch {
      case e: Exception =>
        // catch everything else if not specifically handled
        logError("Uncaught exception: ", e)
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
          "Uncaught exception: " + e)
    }
    exitCode
  }

Here we meet the familiar configuration "spark.yarn.credentials.file" again. Remember what it was set to? It is the location of the file the AM uses to publish tokens. So, if "--principal" was passed to spark-submit, sparkConf contains "spark.yarn.credentials.file"; and if sparkConf contains that key, the AM (in its run function) creates an AMDelegationTokenRenewer. As the name suggests, this object periodically refreshes the tokens and writes them to an HDFS file, from which the executors pick up the new tokens, thereby preventing token expiration.

AMDelegationTokenRenewer

private[spark] def scheduleLoginFromKeytab(): Unit = {
    val principal = sparkConf.get("spark.yarn.principal")
    val keytab = sparkConf.get("spark.yarn.keytab")

    /**
     * Schedule re-login and creation of new tokens. If tokens have already expired, this method
     * will synchronously create new ones.
     */
    def scheduleRenewal(runnable: Runnable): Unit = {
      val credentials = UserGroupInformation.getCurrentUser.getCredentials
      val renewalInterval = hadoopUtil.getTimeFromNowToRenewal(sparkConf, 0.75, credentials)
      // Run now!
      if (renewalInterval <= 0) {
        logInfo("HDFS tokens have expired, creating new tokens now.")
        runnable.run()
      } else {
        logInfo(s"Scheduling login from keytab in $renewalInterval millis.")
        delegationTokenRenewer.schedule(runnable, renewalInterval, TimeUnit.MILLISECONDS)
      }
    }

    // This thread periodically runs on the driver to update the delegation tokens on HDFS.
    val driverTokenRenewerRunnable =
      new Runnable {
        override def run(): Unit = {
          try {
            writeNewTokensToHDFS(principal, keytab)
            cleanupOldFiles()
          } catch {
            case e: Exception =>
              // Log the error and try to write new tokens back in an hour
              logWarning("Failed to write out new credentials to HDFS, will try again in an " +
                "hour! If this happens too often tasks will fail.", e)
              delegationTokenRenewer.schedule(this, 1, TimeUnit.HOURS)
              return
          }
          scheduleRenewal(this)
        }
      }
    // Schedule update of credentials. This handles the case of updating the tokens right now
    // as well, since the renenwal interval will be 0, and the thread will get scheduled
    // immediately.
    scheduleRenewal(driverTokenRenewerRunnable)
  }

It first checks whether the token is about to expire. If so, it calls writeNewTokensToHDFS to obtain new tokens and write them to HDFS; otherwise it schedules a task to check again after a while.
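getTimeFromNowToRenewal decides how long to wait. As a hedged sketch of the idea (not Spark's exact code), it can be computed from the renewal interval stored earlier in spark.yarn.token.renewal.interval and the issue date decoded from each HDFS token identifier:

import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier
import org.apache.hadoop.security.Credentials
import scala.collection.JavaConverters._

// Minimal sketch: milliseconds from now until the tokens should be refreshed,
// taken as a fraction of the renewal interval measured from each token's issue date.
def timeToRenewal(creds: Credentials, renewalIntervalMs: Long, fraction: Double): Long = {
  val now = System.currentTimeMillis()
  val hdfsTokens = creds.getAllTokens.asScala
    .filter(_.getKind == DelegationTokenIdentifier.HDFS_DELEGATION_KIND)
  val nextRenewals = hdfsTokens.map { t =>
    val issueDate = t.decodeIdentifier().asInstanceOf[DelegationTokenIdentifier].getIssueDate
    issueDate + (fraction * renewalIntervalMs).toLong - now
  }
  if (nextRenewals.isEmpty) Long.MaxValue else nextRenewals.min
}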

private def writeNewTokensToHDFS(principal: String, keytab: String): Unit = {
    // Keytab is copied by YARN to the working directory of the AM, so full path is
    // not needed.

    // HACK:
    // HDFS will not issue new delegation tokens, if the Credentials object
    // passed in already has tokens for that FS even if the tokens are expired (it really only
    // checks if there are tokens for the service, and not if they are valid). So the only real
    // way to get new tokens is to make sure a different Credentials object is used each time to
    // get new tokens and then the new tokens are copied over the the current user's Credentials.
    // So:
    // - we login as a different user and get the UGI
    // - use that UGI to get the tokens (see doAs block below)
    // - copy the tokens over to the current user's credentials (this will overwrite the tokens
    // in the current user's Credentials object for this FS).
    // The login to KDC happens each time new tokens are required, but this is rare enough to not
    // have to worry about (like once every day or so). This makes this code clearer than having
    // to login and then relogin every time (the HDFS API may not relogin since we don't use this
    // UGI directly for HDFS communication.
    logInfo(s"Attempting to login to KDC using principal: $principal")
    // 1) Re-login to the KDC
    val keytabLoggedInUGI = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
    logInfo("Successfully logged into KDC.")
    val tempCreds = keytabLoggedInUGI.getCredentials
    val credentialsPath = new Path(credentialsFile)
    val dst = credentialsPath.getParent
    // 2) Use the freshly logged-in identity to fetch new HDFS delegation tokens from the NameNode and add them to tempCreds
    keytabLoggedInUGI.doAs(new PrivilegedExceptionAction[Void] {
      // Get a copy of the credentials
      override def run(): Void = {
        val nns = YarnSparkHadoopUtil.get.getNameNodesToAccess(sparkConf) + dst
        hadoopUtil.obtainTokensForNamenodes(nns, freshHadoopConf, tempCreds)
        null
      }
    })
    // Add the temp credentials back to the original ones.
    // 3) Add the newly obtained tokens to the currently logged-in user
    UserGroupInformation.getCurrentUser.addCredentials(tempCreds)
    val remoteFs = FileSystem.get(freshHadoopConf)
    // If lastCredentialsFileSuffix is 0, then the AM is either started or restarted. If the AM
    // was restarted, then the lastCredentialsFileSuffix might be > 0, so find the newest file
    // and update the lastCredentialsFileSuffix.
    if (lastCredentialsFileSuffix == 0) {
      hadoopUtil.listFilesSorted(
        remoteFs, credentialsPath.getParent,
        credentialsPath.getName, SparkHadoopUtil.SPARK_YARN_CREDS_TEMP_EXTENSION)
        .lastOption.foreach { status =>
        lastCredentialsFileSuffix = hadoopUtil.getSuffixForCredentialsPath(status.getPath)
      }
    }
    val nextSuffix = lastCredentialsFileSuffix + 1
    val tokenPathStr =
      credentialsFile + SparkHadoopUtil.SPARK_YARN_CREDS_COUNTER_DELIM + nextSuffix
    val tokenPath = new Path(tokenPathStr)
    val tempTokenPath = new Path(tokenPathStr + SparkHadoopUtil.SPARK_YARN_CREDS_TEMP_EXTENSION)
    logInfo("Writing out delegation tokens to " + tempTokenPath.toString)
    val credentials = UserGroupInformation.getCurrentUser.getCredentials
    // 4) Write the credentials out to the target file
    credentials.writeTokenStorageFile(tempTokenPath, freshHadoopConf)
    logInfo(s"Delegation Tokens written out successfully. Renaming file to $tokenPathStr")
    remoteFs.rename(tempTokenPath, tokenPath)
    logInfo("Delegation token file rename complete.")
    lastCredentialsFileSuffix = nextSuffix
  }

All the token-refresh logic lives in this function. It boils down to the following steps:
1) Log in to Kerberos again with the keytab and principal and obtain the resulting UGI, keytabLoggedInUGI. Note that this is purely a Kerberos authentication step and does not involve HDFS delegation tokens yet, i.e. keytabLoggedInUGI contains no token. loginUserFromKeytabAndReturnUGI returns a new user object and does not affect the currently logged-in user.
2) Take the credentials object from keytabLoggedInUGI, then use the keytabLoggedInUGI identity to request new HDFS delegation tokens from the NameNode and add them to a temporary credentials object.
3) Add the tokens from the temporary credentials object to the current UGI. At this point the token used inside the AM has been refreshed, so the AM itself will not hit a token-expired error, but the token still has to be propagated to the executors.
4) Build the path for the token file. The directory is /user/{user}/.sparkStaging/{appid}, and the file name has the form "credentials-UUID-suffix", where suffix is a counter that increases with each file. Old token files are kept for five days by default.
At this point the AM has written the refreshed token to an HDFS file. Next we look at how the executors read the latest token file and merge the token into their own UGI.
We will not go through the executor launch process here; in short, the AM builds the executor launch context and sends it to the NodeManager, which starts the executor container. In Spark, the class that ultimately represents an executor is CoarseGrainedExecutorBackend.

CoarseGrainedExecutorBackend

private def run(
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      workerUrl: Option[String],
      userClassPath: Seq[URL]) {

     ...
      if (driverConf.contains("spark.yarn.credentials.file")) {
        logInfo("Will periodically update credentials from: " +
          driverConf.get("spark.yarn.credentials.file"))
        SparkHadoopUtil.get.startExecutorDelegationTokenRenewer(driverConf)
      }

      ...
    }
  }

Here, too, it first checks whether sparkConf contains "spark.yarn.credentials.file". If so, it creates an ExecutorDelegationTokenUpdater and calls its updateCredentialsIfRequired method to refresh the tokens.

ExecutorDelegationTokenUpdater

try {
      val credentialsFilePath = new Path(credentialsFile)
      val remoteFs = FileSystem.get(freshHadoopConf)
      SparkHadoopUtil.get.listFilesSorted(
        remoteFs, credentialsFilePath.getParent,
        credentialsFilePath.getName, SparkHadoopUtil.SPARK_YARN_CREDS_TEMP_EXTENSION)
        .lastOption.foreach { credentialsStatus =>
        val suffix = SparkHadoopUtil.get.getSuffixForCredentialsPath(credentialsStatus.getPath)
        if (suffix > lastCredentialsFileSuffix) {
          logInfo("Reading new delegation tokens from " + credentialsStatus.getPath)
          val newCredentials = getCredentialsFromHDFSFile(remoteFs, credentialsStatus.getPath)
          lastCredentialsFileSuffix = suffix
          UserGroupInformation.getCurrentUser.addCredentials(newCredentials)
          logInfo("Tokens updated from credentials file.")
        } else {
          // Check every hour to see if new credentials arrived.
          logInfo("Updated delegation tokens were expected, but the driver has not updated the " +
            "tokens yet, will check again in an hour.")
          delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.HOURS)
          return
        }
      }
      val timeFromNowToRenewal =
        SparkHadoopUtil.get.getTimeFromNowToRenewal(
          sparkConf, 0.8, UserGroupInformation.getCurrentUser.getCredentials)
      if (timeFromNowToRenewal <= 0) {
        // We just checked for new credentials but none were there, wait a minute and retry.
        // This handles the shutdown case where the staging directory may have been removed(see
        // SPARK-12316 for more details).
        delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.MINUTES)
      } else {
        logInfo(s"Scheduling token refresh from HDFS in $timeFromNowToRenewal millis.")
        delegationTokenRenewer.schedule(
          executorUpdaterRunnable, timeFromNowToRenewal, TimeUnit.MILLISECONDS)
      }
    } catch {
      // Since the file may get deleted while we are reading it, catch the Exception and come
      // back in an hour to try again
      case NonFatal(e) =>
        logWarning("Error while trying to update credentials, will try again in 1 hour", e)
        delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.HOURS)
    }

This function is the whole token-update flow on the executor side. It consists of these steps:
1) List the files in the credentials directory on HDFS, take the most recently updated one, and extract its suffix. If the suffix is larger than the lastCredentialsFileSuffix stored in the process, the AM has published new tokens, so the file is read and the tokens are applied.
2) If the AM has not updated the tokens yet, retry in an hour.
3) After the tokens are updated, compute the time of the next update and schedule a task to run then.
On the executor side, "updating" simply means reading the credentials file from HDFS and calling UserGroupInformation.getCurrentUser.addCredentials(newCredentials) to add the new tokens to the current UGI.
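As a hedged sketch of that read-and-merge step (readCredentials is a hypothetical helper name and the path is illustrative; this is not Spark's exact getCredentialsFromHDFSFile):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Minimal sketch: read a serialized Credentials file from HDFS and add its tokens to the current UGI.
def readCredentials(fs: FileSystem, path: Path): Credentials = {
  val in = fs.open(path) // FSDataInputStream extends DataInputStream
  try {
    val creds = new Credentials()
    creds.readTokenStorageStream(in) // inverse of writeTokenStorageFile on the AM side
    creds
  } finally {
    in.close()
  }
}

val fs = FileSystem.get(new Configuration())
// hypothetical file name, for illustration only
val newCreds = readCredentials(fs, new Path("/user/someuser/.sparkStaging/appid/credentials-uuid-1"))
UserGroupInformation.getCurrentUser.addCredentials(newCreds)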
That concludes the analysis of how Spark handles HDFS delegation token expiration. The whole process is roughly as shown in the figure below:

To summarize: the AM refreshes the token and writes the refreshed token to HDFS; the executors read the updated token from HDFS and merge it into their own UGI. In theory this should solve the expiration problem, but users of Spark Streaming may still run into a strange issue: even when "--principal" is passed at submit time, the HDFS delegation token still expires. What is going on? The rest of this article looks at an HDFS bug.

The HDFS delegation token bug

As analyzed above, Spark refreshes the token in the AM, so in theory token expiration should no longer occur. Yet in practice we still saw tokens expire. It turns out this is caused by an HDFS bug: HDFS-9276.

https://issues.apache.org/jira/browse/HDFS-9276

To understand this bug, one concept matters: a token's service field is filled in by the client after it receives the token from the server; the client uses it to distinguish tokens for different services, and the server has no notion of a service field at all. The client requests HDFS delegation tokens from the NameNode via FileSystem.addDelegationTokens, and once a token comes back from the server, the client sets its service field:

DFSClient.java

public Token<DelegationTokenIdentifier> getDelegationToken(Text renewer)
      throws IOException {
    assert dtService != null;
    Token<DelegationTokenIdentifier> token =
      namenode.getDelegationToken(renewer);

    if (token != null) {
      token.setService(this.dtService);
      LOG.info("Created " + DelegationTokenIdentifier.stringifyToken(token));
    } else {
      LOG.info("Cannot get delegation token from " + renewer);
    }
    return token;

  }

Here the service field is set to dtService. In an HA setup the client accesses HDFS through the nameservice, so dtService has the value ha-hdfs:<nameservice>.
Let's call this the logical service. However, the client must ultimately talk to the server over IP:PORT. Once the client has determined which NameNode is active, how does it pick the token to authenticate with? As described earlier, the token's service field distinguishes servers, but it does not contain a concrete IP and port. To solve this, every time a DFSClient instance is created the token is copied twice, and the service field of each copy is replaced with a concrete IP and port:

HAUtil.java

public static void cloneDelegationTokenForLogicalUri(
      UserGroupInformation ugi, URI haUri,
      Collection<InetSocketAddress> nnAddrs) {
    // this cloning logic is only used by hdfs
    Text haService = HAUtil.buildTokenServiceForLogicalUri(haUri,
        HdfsConstants.HDFS_URI_SCHEME);
    Token<DelegationTokenIdentifier> haToken =
        tokenSelector.selectToken(haService, ugi.getTokens());
    if (haToken != null) {
      for (InetSocketAddress singleNNAddr : nnAddrs) {
        // this is a minor hack to prevent physical HA tokens from being
        // exposed to the user via UGI.getCredentials(), otherwise these
        // cloned tokens may be inadvertently propagated to jobs
        Token<DelegationTokenIdentifier> specificToken =
            new Token.PrivateToken<DelegationTokenIdentifier>(haToken);
        SecurityUtil.setTokenService(specificToken, singleNNAddr);
        Text alias = new Text(
            buildTokenServicePrefixForLogicalUri(HdfsConstants.HDFS_URI_SCHEME)
                + "//" + specificToken.getService());
        ugi.addToken(alias, specificToken);
        LOG.debug("Mapped HA service delegation token for logical URI " +
            haUri + " to namenode " + singleNNAddr);
      }
    } else {
      LOG.debug("No HA service delegation token found for logical URI " +
          haUri);
    }
  }

As a result, the client effectively holds three copies of each token: one HA token and two NameNode tokens bound to concrete NameNode addresses, so the client can pick the right token for whichever NameNode it wants to talk to.
Now HDFS-9276 becomes obvious: when the user updates tokens with UserGroupInformation.getCurrentUser().addCredentials(credentials), only the HA token gets updated, not the two NameNode tokens. So when the client selects a NameNode token by IP and port, it still gets the old, un-renewed token; using it against the server is rejected with a token-expired error.
The HDFS-9276 patch fixes this (the code is not analyzed here; interested readers can look it up): when addCredentials is called, the two NameNode tokens corresponding to the HA token are updated as well. Attentive readers will have noticed that every time a new DFSClient is instantiated it re-clones the HA token into two fresh NameNode tokens, so creating a new DFSClient each time is a way to work around the problem described in HDFS-9276.
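To make the three-copies picture concrete, here is a hedged sketch (the NameNode addresses are illustrative placeholders) of how the per-NameNode ip:port service strings are built and how a token is looked up by service:

import java.net.InetSocketAddress
import org.apache.hadoop.security.{SecurityUtil, UserGroupInformation}
import scala.collection.JavaConverters._

// Minimal sketch: the logical (HA) service vs. the per-NameNode ip:port services.
val logicalService = "ha-hdfs:<nameservice>" // what the HA token's service field looks like
val nnAddrs = Seq(new InetSocketAddress("10.0.0.1", 8020), new InetSocketAddress("10.0.0.2", 8020))

val ugi = UserGroupInformation.getCurrentUser
println(s"HA token service: $logicalService")
// After cloneDelegationTokenForLogicalUri has run, the UGI holds one token per service string:
nnAddrs.foreach { addr =>
  val service = SecurityUtil.buildTokenService(addr) // e.g. "10.0.0.1:8020"
  val found = ugi.getTokens.asScala.exists(_.getService == service)
  println(s"token for $service present: $found")
}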
Spark actually tries to do exactly this. Going back to the executor-side token update:

ExecutorDelegationTokenUpdater

try {
      val credentialsFilePath = new Path(credentialsFile)
      // A new FileSystem object is obtained here, but at this point the HA token in the UGI has not been updated yet
      val remoteFs = FileSystem.get(freshHadoopConf)
      SparkHadoopUtil.get.listFilesSorted(
        remoteFs, credentialsFilePath.getParent,
        credentialsFilePath.getName, SparkHadoopUtil.SPARK_YARN_CREDS_TEMP_EXTENSION)
        .lastOption.foreach { credentialsStatus =>
        val suffix = SparkHadoopUtil.get.getSuffixForCredentialsPath(credentialsStatus.getPath)
        if (suffix > lastCredentialsFileSuffix) {
          logInfo("Reading new delegation tokens from " + credentialsStatus.getPath)
          // Read the new HA token from HDFS
          val newCredentials = getCredentialsFromHDFSFile(remoteFs, credentialsStatus.getPath)
          lastCredentialsFileSuffix = suffix
          // Add the new HA token to the UGI
          UserGroupInformation.getCurrentUser.addCredentials(newCredentials)
          logInfo("Tokens updated from credentials file.")
        } else {
          // Check every hour to see if new credentials arrived.
          logInfo("Updated delegation tokens were expected, but the driver has not updated the " +
            "tokens yet, will check again in an hour.")
          delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.HOURS)
          return
        }
      }
      val timeFromNowToRenewal =
        SparkHadoopUtil.get.getTimeFromNowToRenewal(
          sparkConf, 0.8, UserGroupInformation.getCurrentUser.getCredentials)
      if (timeFromNowToRenewal <= 0) {
        // We just checked for new credentials but none were there, wait a minute and retry.
        // This handles the shutdown case where the staging directory may have been removed(see
        // SPARK-12316 for more details).
        delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.MINUTES)
      } else {
        // Right after the tokens are updated, the next update task is scheduled, and it will not run until the refreshed token is about to expire
        logInfo(s"Scheduling token refresh from HDFS in $timeFromNowToRenewal millis.")
        delegationTokenRenewer.schedule(
          executorUpdaterRunnable, timeFromNowToRenewal, TimeUnit.MILLISECONDS)
      }
    } catch {
      // Since the file may get deleted while we are reading it, catch the Exception and come
      // back in an hour to try again
      case NonFatal(e) =>
        logWarning("Error while trying to update credentials, will try again in 1 hour", e)
        delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.HOURS)
    }

At the beginning of the function FileSystem.get(freshHadoopConf) creates the remoteFs object; freshHadoopConf has "fs.hdfs.impl.disable.cache" set to true, so a brand-new FileSystem object is created. This is clearly meant to work around HDFS-9276, but unfortunately it happens in the wrong place: when this new object is created, the tokens in the UGI have not been updated yet. Only afterwards are the new tokens read from HDFS and added to the UGI, and then the next task is scheduled. Since no new FileSystem is created after the tokens are updated, the NameNode tokens in the UGI never get refreshed, and the token-expired error still occurs.
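The implication is that the order of operations matters. As a hedged sketch of the fix idea (an illustration of the reasoning above, not the actual Spark or HDFS patch): add the new credentials to the UGI first, and only then create a fresh, uncached FileSystem so that the new DFSClient re-clones the refreshed HA token into per-NameNode tokens:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Minimal sketch of the order-of-operations fix discussed above.
// `newCredentials` is assumed to hold the refreshed HA token just read from the credentials file.
def applyNewTokens(newCredentials: Credentials, hadoopConf: Configuration): FileSystem = {
  // 1) Merge the refreshed HA token into the current UGI first.
  UserGroupInformation.getCurrentUser.addCredentials(newCredentials)

  // 2) Only then create an uncached FileSystem: the new DFSClient clones the (now fresh)
  //    HA token into ip:port-specific NameNode tokens, sidestepping HDFS-9276.
  val freshConf = new Configuration(hadoopConf)
  freshConf.setBoolean("fs.hdfs.impl.disable.cache", true)
  FileSystem.get(freshConf)
}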

This concludes the analysis of the HDFS delegation token expiration problem.
