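This write-up started with a colleague's Spark Streaming job in which some Executors kept failing to fetch a broadcast variable. To track down the cause, I read through the broadcast-related code.

The job works roughly like this: a dedicated thread on the driver scans HDFS, and whenever the configuration file has changed, it reads the file and creates a new broadcast. The executors then use that broadcast to process subsequent streaming batches. Think of an IP blacklist that changes over time, so a new broadcast variable has to be generated every so often.

1 Introduction to Broadcast

Broadcast variables are commonly used for map-side joins and for distributing configuration data globally. Usage is simple: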

val blackIp = Set(ip1, ip2 /* ... */)
// sc.broadcast creates the broadcast variable
val blackIpBC = sc.broadcast(blackIp)
// broadcast.value returns the actual content of the broadcast variable inside a task
rdd.filter(row => !blackIpBC.value.contains(row.ip))

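1.1 Advantages of broadcast variables

Why not use blackIp directly instead of wrapping it in a broadcast variable?

When a broadcast variable is used, it is pulled into the BlockManager on the Executor. Only the first task that uses it has to pull it; later tasks reuse the copy in the BlockManager, so nothing is fetched again and the data structure does not have to be carried inside every task.

In addition, broadcast variables are fetched with a BitTorrent-like protocol, meaning an executor can pull the variable from other executors. Without this, every fetch would have to be served by the driver, which comes under heavy pressure when the data is large.

Speaking of the Torrent protocol, many downloaders and media players are built on it. You can often see Xunlei, Tencent Video or iQiyi clients uploading data in the background; at that moment your computer is acting as an intermediate server passing resources to other users. If your machine is slow or your network is poor, remember to cap the upload speed manually.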

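2 How a broadcast variable is created

2.1 What does the driver do?

What does sc.broadcast(value) do on the driver, and how does it make sure executors can access the variable?

2.1.1 SparkContext's broadcast method

The code of SparkContext.broadcast is as follows: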

  /**
   * Broadcast a read-only variable to the cluster, returning a
   * [[org.apache.spark.broadcast.Broadcast]] object for reading it in distributed functions.
   * The variable will be sent to each cluster only once.
   *
   * @param value value to broadcast to the Spark nodes
   * @return `Broadcast` object, a read-only variable cached on each machine
   */
  def broadcast[T: ClassTag](value: T): Broadcast[T] = {
    assertNotStopped()
    require(!classOf[RDD[_]].isAssignableFrom(classTag[T].runtimeClass),
      "Can not directly broadcast RDDs; instead, call collect() and broadcast the result.")
    val bc = env.broadcastManager.newBroadcast[T](value, isLocal)
    val callSite = getCallSite
    logInfo("Created broadcast " + bc.id + " from " + callSite.shortForm)
    cleaner.foreach(_.registerBroadcastForCleanup(bc))
    bc
  }

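First, an RDD is not allowed as a broadcast value.
BroadcastManager.newBroadcast is called to create the broadcast variable.
getCallSite records call-stack information for tracing; it has nothing to do with the actual logic and can be ignored.
The cleaner.foreach line is important: it registers a handle for later cleanup.

2.1.2 The BroadcastManager class

BroadcastManager is initialized in SparkEnv.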

private[spark] class BroadcastManager(val isDriver: Boolean,conf: SparkConf,securityManager: SecurityManager)
  extends Logging {

  private var initialized = false
  private var broadcastFactory: BroadcastFactory = null

  initialize()

  // Called by SparkContext or Executor before using Broadcast
  private def initialize(): Unit = {
    synchronized {
      if (!initialized) {
        broadcastFactory = new TorrentBroadcastFactory
        broadcastFactory.initialize(isDriver, conf, securityManager)
        initialized = true
      }
    }
  }

  def stop(): Unit = {
    broadcastFactory.stop()
  }

  private val nextBroadcastId = new AtomicLong(0)

  private[broadcast] val cachedValues = Collections.synchronizedMap(
    new ReferenceMap(AbstractReferenceMap.HARD, AbstractReferenceMap.WEAK)
      .asInstanceOf[java.util.Map[Any, Any]])

  def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean): Broadcast[T] = {
    val bid = nextBroadcastId.getAndIncrement()
    value_ match {
      case pb: PythonBroadcast => pb.setBroadcastId(bid)
      case _ => // do nothing
    }
    broadcastFactory.newBroadcast[T](value_, isLocal, bid)
  }

  def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit = {
    broadcastFactory.unbroadcast(id, removeFromDriver, blocking)
  }
}

Constructor

BroadcastManager takes three constructor parameters: isDriver (whether this is the driver node), conf (the SparkConf) and securityManager (the SecurityManager).

Fields

BroadcastManager has four fields:

initialized indicates whether the BroadcastManager has finished initializing.
broadcastFactory holds the broadcast factory instance (an implementation of the BroadcastFactory trait).
nextBroadcastId is the unique ID for the next broadcast variable (an AtomicLong).
cachedValues caches values that have been broadcast. It is a ReferenceMap, a reference-aware map from Apache Commons; here the keys are held by hard references and the values by weak references. Unlike the usual Map implementations, its entries may be reclaimed during GC.

Public methods

It exposes two methods, both of which delegate to the method of the same name on BroadcastFactory.

newBroadcast: creates a broadcast variable.
unbroadcast: unregisters a broadcast variable.

BroadcastFactory is a trait, and TorrentBroadcastFactory is its only implementation.
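To make the roles of nextBroadcastId and cachedValues more concrete, here is a minimal standalone sketch. It is not Spark code: MiniBroadcastManager, newId, put and get are hypothetical names, and a ConcurrentHashMap holding java.lang.ref.WeakReference values stands in for the commons-collections ReferenceMap (hard keys, weak values).

```scala
import java.lang.ref.WeakReference
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for the two pieces of state described above.
final class MiniBroadcastManager {
  // Monotonically increasing ID, safe to call from several threads.
  private val nextId = new AtomicLong(0)

  // Keys are strongly held; values are only weakly held, so an entry's value
  // may disappear after a GC once nothing else references it.
  private val cache = new ConcurrentHashMap[Long, WeakReference[AnyRef]]()

  def newId(): Long = nextId.getAndIncrement()

  def put(id: Long, value: AnyRef): Unit =
    cache.put(id, new WeakReference[AnyRef](value))

  // Returns None if the id is unknown or the value has been collected.
  def get(id: Long): Option[AnyRef] =
    Option(cache.get(id)).flatMap(ref => Option(ref.get()))
}
```

As long as the caller keeps its own strong reference to the value, get keeps returning it; once that reference is gone, a later GC is free to clear the entry.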

2.1.3 The TorrentBroadcastFactory class

private[spark] class TorrentBroadcastFactory extends BroadcastFactory {

  override def initialize(isDriver: Boolean, conf: SparkConf,
      securityMgr: SecurityManager): Unit = { }

  override def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long): Broadcast[T] = {
    new TorrentBroadcast[T](value_, id)
  }

  override def stop(): Unit = { }

  /**
   * Remove all persisted state associated with the torrent broadcast with the given ID.
   * @param removeFromDriver Whether to remove state from the driver.
   * @param blocking Whether to block until unbroadcasted
   */
  override def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit = {
    TorrentBroadcast.unpersist(id, removeFromDriver, blocking)
  }
}

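Creating a broadcast variable just instantiates a TorrentBroadcast object. The isLocal parameter, which indicates whether --master is local, is not used.

Removing a broadcast variable calls TorrentBroadcast.unpersist directly.

stop does nothing.

2.1.4 TorrentBroadcast

TorrentBroadcast extends Broadcast. The class contains a lot of code, so let's go through it piece by piece.

Fields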

  /**
   * Held through a soft reference, so the value can be reclaimed by garbage collection
   */
  @transient private var _value: SoftReference[T] = _
  @transient private var compressionCodec: Option[CompressionCodec] = _
  @transient private var blockSize: Int = _

  private def setConf(conf: SparkConf): Unit = {
    compressionCodec = if (conf.get(config.BROADCAST_COMPRESS)) {
      Some(CompressionCodec.createCodec(conf))
    } else {
      None
    }
    // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
    blockSize = conf.get(config.BROADCAST_BLOCKSIZE).toInt * 1024
    checksumEnabled = conf.get(config.BROADCAST_CHECKSUM)
  }
  setConf(SparkEnv.get.conf)

  private val broadcastId = BroadcastBlockId(id)

  /** Total number of blocks this broadcast variable contains. */
  private val numBlocks: Int = writeBlocks(obj)

  /** Whether to generate checksum for blocks or not. */
  private var checksumEnabled: Boolean = false
  /** The checksum for all the blocks. */
  private var checksums: Array[Int] = _

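_value: the actual data of the broadcast. On executors it is obtained by calling readBroadcastBlock(), which pulls the data; on the driver, if the value is needed, it is read lazily from the BlockManager. _value is held through a soft reference, so it can be reclaimed by GC when necessary.
compressionCodec: the compression codec for broadcast blocks. Compression is enabled when spark.broadcast.compress is true.
blockSize: the size of a broadcast block, controlled by spark.broadcast.blockSize, with a default of 4 MB.
broadcastId: the ID of the broadcast variable. BroadcastBlockId is a very simple case class; the underlying id is incremented for every new broadcast variable.
numBlocks: the number of blocks this broadcast variable consists of. Because it is a val, writeBlocks() is called directly when the TorrentBroadcast is constructed.
checksumEnabled: whether checksums are computed for broadcast blocks, controlled by spark.broadcast.checksum, with a default of true.
checksums: the checksums of the broadcast blocks.

In addition, setConf is called to initialize some of these fields, and writeBlocks(obj) performs the actual write.

The writeBlocks method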

  private def writeBlocks(value: T): Int = {
    import StorageLevel._
    // Store a copy of the broadcast variable in the driver so that tasks run on the driver
    // do not create a duplicate copy of the broadcast variable's value.
    val blockManager = SparkEnv.get.blockManager
    if (!blockManager.putSingle(broadcastId, value, MEMORY_AND_DISK, tellMaster = false)) {
      throw new SparkException(s"Failed to store $broadcastId in BlockManager")
    }
    val blocks =
      TorrentBroadcast.blockifyObject(value, blockSize, SparkEnv.get.serializer, compressionCodec)
    if (checksumEnabled) {
      checksums = new Array[Int](blocks.length)
    }
    blocks.zipWithIndex.foreach { case (block, i) =>
      if (checksumEnabled) checksums(i) = calcChecksum(block)
      val pieceId = BroadcastBlockId(id, "piece" + i)
      val bytes = new ChunkedByteBuffer(block.duplicate())
      if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
        throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
      }
    }
    blocks.length
  }

blockManager.putSingle is called to write the value into the BlockManager as a single object (putSingle itself is not covered here). The storage level is MEMORY_AND_DISK, which uses Storage memory and spills to disk when memory is insufficient.
blockifyObject() is called to turn the broadcast data into blocks, Spark's basic unit of storage. The serializer is the one held by SparkEnv (JavaSerializer by default).
If checksums are enabled, calcChecksum() computes a checksum for each block.
Each block the data is split into (called a piece) gets a broadcast block ID with a "piece" suffix, and BlockManager.putBytes() stores each piece in serialized form at the MEMORY_AND_DISK_SER level.
Finally, the number of blocks is returned.

That is how a broadcast variable is created on the driver. Note that the value is stored twice: once as a single Java object at MEMORY_AND_DISK, and once as serialized binary pieces at MEMORY_AND_DISK_SER.
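The splitting and checksumming steps can be illustrated with a small standalone sketch. It works on a plain byte array instead of going through a serializer and compression codec, the names BlockifySketch, blockify, unblockify and checksum are hypothetical, and using Adler32 for the checksum is an assumption about what calcChecksum does rather than something shown in the code above.

```scala
import java.util.zip.Adler32

// Hypothetical helper, not Spark's TorrentBroadcast.blockifyObject.
object BlockifySketch {
  /** Split an already serialized payload into pieces of at most blockSize bytes. */
  def blockify(data: Array[Byte], blockSize: Int): Array[Array[Byte]] = {
    require(blockSize > 0, "blockSize must be positive")
    data.grouped(blockSize).toArray
  }

  /** Per-piece checksum (Adler32 is an assumption made for this sketch). */
  def checksum(block: Array[Byte]): Int = {
    val adler = new Adler32()
    adler.update(block, 0, block.length)
    adler.getValue.toInt
  }

  /** Concatenate the pieces back into the original payload. */
  def unblockify(blocks: Array[Array[Byte]]): Array[Byte] = blocks.flatten
}
```

With a 4 MB block size, a 10 MB payload would end up as three pieces (4 MB, 4 MB and 2 MB), each of which can be fetched and verified independently.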

2.2 What does the executor do?

sc.broadcast returns a Broadcast object, and calling .value on that object inside an RDD operator yields the value. What exactly happens?

2.2.1 Broadcast's value method

The value method first calls assertValid to make sure the broadcast has not been destroyed.

@volatile private var _isValid = true
def value: T = {
  assertValid()
  getValue()
}
/** Check if this broadcast is valid. If not valid, exception is thrown. */
protected def assertValid(): Unit = {
  if (!_isValid) throw new SparkException("Attempted to use %s after it was destroyed (%s) ".format(toString, _destroySite))
}

getValue is an abstract method, implemented in TorrentBroadcast.

2.2.2 TorrentBroadcast's getValue method

override protected def getValue() = synchronized {
  val memoized: T = if (_value == null) null.asInstanceOf[T] else _value.get
  if (memoized != null) {
    memoized
  } else {
    val newlyRead = readBroadcastBlock()
    _value = new SoftReference[T](newlyRead)
    newlyRead
  }
}

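If _value holds nothing, either the executor has not read this broadcast variable yet or the cached object has been reclaimed. In that case readBroadcastBlock is called to read the data, and a new soft reference pointing at the freshly read value is stored in _value.

2.2.3 TorrentBroadcast's readBroadcastBlock method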

  private def readBroadcastBlock(): T = Utils.tryOrIOException {
    TorrentBroadcast.torrentBroadcastLock.withLock(broadcastId) {
      // As we only lock based on `broadcastId`, whenever using `broadcastCache`, we should only
      // touch `broadcastId`.
      val broadcastCache = SparkEnv.get.broadcastManager.cachedValues

      Option(broadcastCache.get(broadcastId)).map(_.asInstanceOf[T]).getOrElse {
        setConf(SparkEnv.get.conf)
        val blockManager = SparkEnv.get.blockManager
        blockManager.getLocalValues(broadcastId) match {
          case Some(blockResult) =>
            if (blockResult.data.hasNext) {
              val x = blockResult.data.next().asInstanceOf[T]
              releaseBlockManagerLock(broadcastId)
              if (x != null) broadcastCache.put(broadcastId, x)
              x
            } else {
              throw new SparkException(s"Failed to get locally stored broadcast data: $broadcastId")
            }
          case None =>
            val estimatedTotalSize = Utils.bytesToString(numBlocks * blockSize)
            logInfo(s"Started reading broadcast variable $id with $numBlocks pieces (estimated total size $estimatedTotalSize)")
            val startTimeNs = System.nanoTime()
            val blocks = readBlocks()
            logInfo(s"Reading broadcast variable $id took ${Utils.getUsedTimeNs(startTimeNs)}")
            try {
              val obj = TorrentBroadcast.unBlockifyObject[T](
                blocks.map(_.toInputStream()), SparkEnv.get.serializer, compressionCodec)
              // Store the merged copy in BlockManager so other tasks on this executor don't need to re-fetch it.
              val storageLevel = StorageLevel.MEMORY_AND_DISK
              if (!blockManager.putSingle(broadcastId, obj, storageLevel, tellMaster = false)) {
                throw new SparkException(s"Failed to store $broadcastId in BlockManager")
              }
              if (obj != null) broadcastCache.put(broadcastId, obj)
              obj
            } finally {
              blocks.foreach(_.dispose())
            }
        }
      }
    }
  }

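Get the BlockManager instance and call its getLocalValues() method to retrieve the broadcast object written earlier.
If the data can be obtained directly, releaseBlockManagerLock() is called [it corresponds to BlockManager.releaseLock(), which in turn comes down to Object.notifyAll()] to release the lock on the block. This lock guarantees that reads and writes of a block are mutually exclusive.
If the data cannot be obtained directly, it only exists in serialized form and may not be stored locally. In that case readBlocks() fetches the pieces from local and remote storage, and unBlockifyObject() turns the pieces back into the broadcast object.
BlockManager.putSingle() is then called again to store the broadcast data locally as a single object, and the object is added to the broadcast cache map so that the next read does not have to go through all of this again.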
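The lookup order described above (cache map first, then the local store, and only then a remote rebuild) can be sketched without Spark. Everything here is hypothetical: LayeredLookupSketch and Executor are illustration-only names, two mutable maps stand in for cachedValues and the BlockManager, and fetchPieces stands in for readBlocks plus unBlockifyObject.

```scala
import scala.collection.mutable

object LayeredLookupSketch {
  // fetchPieces is a hypothetical stand-in for pulling the serialized pieces.
  final class Executor(fetchPieces: Long => Seq[String]) {
    val cache = mutable.Map.empty[Long, String]      // stands in for cachedValues
    val localStore = mutable.Map.empty[Long, String] // stands in for the BlockManager
    var remoteFetches = 0

    def read(id: Long): String = synchronized {
      cache.getOrElse(id, {
        // Not cached: try the local store before going remote.
        val value = localStore.getOrElse(id, {
          remoteFetches += 1
          val rebuilt = fetchPieces(id).mkString // reassemble the object
          localStore(id) = rebuilt               // keep the merged copy locally
          rebuilt
        })
        cache(id) = value
        value
      })
    }
  }
}
```

Dropping the cache entry does not force another remote fetch, because the merged copy is still in the local store.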

2.2.4 TorrentBroadcast's readBlocks method

  private def readBlocks(): Array[BlockData] = {
    val blocks = new Array[BlockData](numBlocks)
    val bm = SparkEnv.get.blockManager

    for (pid <- Random.shuffle(Seq.range(0, numBlocks))) {
      val pieceId = BroadcastBlockId(id, "piece" + pid)
      logDebug(s"Reading piece $pieceId of $broadcastId")
      bm.getLocalBytes(pieceId) match {
        case Some(block) =>
          blocks(pid) = block
          releaseBlockManagerLock(pieceId)
        case None =>
          bm.getRemoteBytes(pieceId) match {
            case Some(b) =>
              if (checksumEnabled) {
                val sum = calcChecksum(b.chunks(0))
                if (sum != checksums(pid)) {
                  throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
                    s" $sum != ${checksums(pid)}")
                }
              }
              // We found the block from remote executors/driver's BlockManager, so put the block
              // in this executor's BlockManager.
              if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
                throw new SparkException(
                  s"Failed to store $pieceId of $broadcastId in local BlockManager")
              }
              blocks(pid) = new ByteBufferBlockData(b, true)
            case None =>
              throw new SparkException(s"Failed to get $pieceId of $broadcastId")
          }
      }
    }
    blocks
  }

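The method first shuffles the pieces of the broadcast data into random order, then performs the following steps for each piece:

Call BlockManager.getLocalBytes() to get the serialized piece from local storage. If it is found, put it at the matching index and release the lock on that block.
If the piece is not available locally, call BlockManager.getRemoteBytes() to fetch it from a remote node (another Executor or the Driver).
Compute the checksum of the remotely fetched piece and compare it with the checksum computed when the piece was written. If they differ, the transfer was corrupted and an exception is thrown.
If everything is fine, call BlockManager.putBytes() to write the piece to the MemoryStore (memory) or DiskStore (disk), and put it at the matching index. Finally, all the pieces that were read are returned.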
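The per-piece steps above can be sketched as a standalone function. This is only an illustration under stated assumptions: ReadBlocksSketch is a hypothetical name, a mutable map stands in for the local BlockManager, the remote parameter stands in for getRemoteBytes, and Adler32 is assumed for the checksum.

```scala
import java.util.zip.Adler32
import scala.collection.mutable
import scala.util.Random

object ReadBlocksSketch {
  def checksum(bytes: Array[Byte]): Int = {
    val adler = new Adler32()
    adler.update(bytes, 0, bytes.length)
    adler.getValue.toInt
  }

  def readBlocks(
      numBlocks: Int,
      local: mutable.Map[Int, Array[Byte]],   // hypothetical local piece store
      remote: Int => Option[Array[Byte]],     // hypothetical remote fetch
      checksums: Array[Int]): Array[Array[Byte]] = {
    val blocks = new Array[Array[Byte]](numBlocks)
    // Random order, so executors do not all request the same piece first.
    for (pid <- Random.shuffle(Seq.range(0, numBlocks))) {
      blocks(pid) = local.get(pid) match {
        case Some(bytes) => bytes
        case None =>
          val bytes = remote(pid).getOrElse(
            throw new IllegalStateException(s"Failed to get piece $pid"))
          if (checksum(bytes) != checksums(pid)) {
            throw new IllegalStateException(s"corrupt remote piece $pid")
          }
          local(pid) = bytes // store it so this node can serve the piece too
          bytes
      }
    }
    blocks
  }
}
```

Storing each fetched piece locally is what lets other executors later pull that piece from this node instead of from the driver.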

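3 Cleaning up broadcast variables

When can a broadcast variable be cleaned up, and how do the cleanup mechanisms on the driver and on the executors differ?

3.1 Manual cleanup

A broadcast variable can be cleaned up manually by calling unpersist:

broadcastVar.unpersist() // removes the broadcast data on the executors only
broadcastVar.destroy()   // removes the broadcast data on both the driver and the executors

Note that unpersist, whose concrete implementation lives in TorrentBroadcast, only cleans up the executor-side copies.

To clean up the driver-side copy as well, call destroy, which delegates to doDestroy.

3.1.1 Broadcast's unpersist method

The Broadcast class has two overloaded unpersist methods. blocking indicates whether the call blocks until the unpersist has completed, that is, whether it runs synchronously or asynchronously. The default is blocking = false, which is asynchronous.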

def unpersist(): Unit = {
  unpersist(blocking = false)
}

def unpersist(blocking: Boolean): Unit = {
  assertValid()
  doUnpersist(blocking)
}

protected def doUnpersist(blocking: Boolean): Unit

This ends up calling doUnpersist, an abstract method that currently has a concrete implementation only in TorrentBroadcast.

3.1.2 TorrentBroadcast's doUnpersist method

override protected def doUnpersist(blocking: Boolean): Unit = {
  TorrentBroadcast.unpersist(id, removeFromDriver = false, blocking)
}

override protected def doDestroy(blocking: Boolean): Unit = {
  TorrentBroadcast.unpersist(id, removeFromDriver = true, blocking)
}

def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit = {
  logDebug(s"Unpersisting TorrentBroadcast $id")
  SparkEnv.get.blockManager.master.removeBroadcast(id, removeFromDriver, blocking)
}

doUnpersist cleans up the broadcast variable on the executors, and doDestroy cleans it up on both the driver and the executors. Both call the unpersist method, which does the actual cleanup and takes three parameters:

id: the broadcast variable ID
removeFromDriver: whether to also remove the driver-side copy
blocking: whether to block until the cleanup has finished

unpersist in turn calls removeBroadcast on BlockManagerMaster (SparkEnv.get.blockManager.master).

3.1.3 BlockManagerMaster's removeBroadcast method

The driver first sends an RPC request to BlockManagerMaster (which lives on the driver) asking it to remove the given broadcast. BlockManagerMaster then sends a remove request to every BlockManagerSlave (on the executors). Two RPC requests are involved in total.

/** Remove all blocks belonging to the given broadcast. */
def removeBroadcast(broadcastId: Long, removeFromMaster: Boolean, blocking: Boolean): Unit = {
  val future = driverEndpoint.askSync[Future[Seq[Int]]](
    RemoveBroadcast(broadcastId, removeFromMaster))
  future.failed.foreach(e =>
    logWarning(s"Failed to remove broadcast $broadcastId" +
      s" with removeFromMaster = $removeFromMaster - ${e.getMessage}", e)
  )(ThreadUtils.sameThread)
  if (blocking) {
    timeout.awaitResult(future)
  }
}

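If blocking = true, the call only returns once the request has been handled by all executors.

The event-delivery mechanics of the message bus are not covered here. Eventually the RPC request reaches BlockManagerMasterEndpoint and is handled by its receiveAndReply method.

3.1.4 How BlockManagerMasterEndpoint handles removeBroadcast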

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  ...
  case RemoveBroadcast(broadcastId, removeFromDriver) =>
    context.reply(removeBroadcast(broadcastId, removeFromDriver))
  ...
}

This goes on to call the removeBroadcast method of BlockManagerMasterEndpoint:

private def removeBroadcast(broadcastId: Long, removeFromDriver: Boolean): Future[Seq[Int]] = {
  val removeMsg = RemoveBroadcast(broadcastId, removeFromDriver)
  val requiredBlockManagers = blockManagerInfo.values.filter { info =>
    removeFromDriver || !info.blockManagerId.isDriver
  }
  val futures = requiredBlockManagers.map { bm =>
    bm.slaveEndpoint.ask[Int](removeMsg).recover {
      case e: IOException =>
        logWarning(s"Error trying to remove broadcast $broadcastId from block manager " +
          s"${bm.blockManagerId}", e)
        0 // zero blocks were removed
    }
  }.toSeq

  Future.sequence(futures)
}

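The removal request is wrapped again and sent to the BlockManagerSlaveEndpoint of each relevant BlockManager. The steps are:

Build a RemoveBroadcast case-class message.
Select the target BlockManagers: all non-driver ones, plus the driver's BlockManager when removeFromDriver is true.
Send the RemoveBroadcast RPC request to the slave endpoint of each selected BlockManager. An IOException from one of them is logged and counted as zero blocks removed.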
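The selection rule and the fan-out can be tried out in isolation. The sketch below is not Spark code: RemoveBroadcastSketch, Manager and the remove function are hypothetical, and plain Scala Futures on the global execution context stand in for the RPC ask calls.

```scala
import java.io.IOException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RemoveBroadcastSketch {
  // remove returns how many blocks were removed on that (hypothetical) manager.
  final case class Manager(id: String, isDriver: Boolean, remove: Long => Int)

  // Same rule as the filter above: the driver is included only on request.
  def targets(all: Seq[Manager], removeFromDriver: Boolean): Seq[Manager] =
    all.filter(m => removeFromDriver || !m.isDriver)

  // Non-blocking variant: returns immediately with a Future of all results.
  def removeBroadcast(
      all: Seq[Manager], broadcastId: Long, removeFromDriver: Boolean): Future[Seq[Int]] = {
    val futures = targets(all, removeFromDriver).map { m =>
      // An unreachable manager counts as "zero blocks removed".
      Future(m.remove(broadcastId)).recover { case _: IOException => 0 }
    }
    Future.sequence(futures)
  }

  // Blocking variant, comparable to calling with blocking = true.
  def removeBroadcastBlocking(
      all: Seq[Manager], broadcastId: Long, removeFromDriver: Boolean): Seq[Int] =
    Await.result(removeBroadcast(all, broadcastId, removeFromDriver), 10.seconds)
}
```

Recovering per manager rather than on the combined Future means one dead executor does not fail the whole removal.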

3.1.5 How the Executor handles the removal of a Broadcast

The receiveAndReply method of BlockManagerSlaveEndpoint:

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  ...
  case RemoveBroadcast(broadcastId, _) =>
    doAsync[Int]("removing broadcast " + broadcastId, context) {
      blockManager.removeBroadcast(broadcastId, tellMaster = true)
    }
  ...
}

When the executor receives the removeBroadcast request, it calls BlockManager.removeBroadcast.

3.1.6 BlockManager's removeBroadcast method

This is the final step: iterate over all BlockIds in the BlockManager, call removeBlock for each one that belongs to the given Broadcast, and return the number of blocks removed.

def removeBroadcast(broadcastId: Long, tellMaster: Boolean): Int = {
  logDebug(s"Removing broadcast $broadcastId")
  val blocksToRemove = blockInfoManager.entries.map(_._1).collect {
    case bid @ BroadcastBlockId(`broadcastId`, _) => bid
  }
  blocksToRemove.foreach { blockId => removeBlock(blockId, tellMaster) }
  blocksToRemove.size
}
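The backticks around broadcastId in the pattern above are easy to miss: they make the pattern compare against the existing value instead of binding a new variable. A minimal standalone sketch, using simplified hypothetical case classes under a BlockIdSketch object rather than Spark's real BlockId hierarchy:

```scala
object BlockIdSketch {
  sealed trait BlockId
  // Simplified stand-ins for illustration only.
  final case class BroadcastBlockId(broadcastId: Long, field: String = "") extends BlockId
  final case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId

  def blocksOf(broadcastId: Long, all: Iterable[BlockId]): Seq[BroadcastBlockId] =
    all.collect {
      // `broadcastId` in backticks matches only blocks whose id equals the argument;
      // without the backticks it would bind a fresh variable and match every broadcast block.
      case bid @ BroadcastBlockId(`broadcastId`, _) => bid
    }.toSeq
}
```

Both the single-object block and the "piece" blocks of one broadcast share the same id, so this one pattern picks up all of them.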

removeBlock is also a BlockManager method:

def removeBlock(blockId: BlockId, tellMaster: Boolean = true): Unit = {
  logDebug(s"Removing block $blockId")
  blockInfoManager.lockForWriting(blockId) match {
    case None =>
      // The block has already been removed; do nothing.
      logWarning(s"Asked to remove block $blockId, which does not exist")
    case Some(info) =>
      removeBlockInternal(blockId, tellMaster = tellMaster && info.tellMaster)
      addUpdatedBlockStatusToTaskMetrics(blockId, BlockStatus.empty)
  }
}

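The steps are:

Look up the block's metadata, taking a write lock on the block so that no other thread can modify the data at the same time.
If the metadata is found, call removeBlockInternal to delete the block.
Update the block-related metrics.

BlockManager's removeBlockInternal is not covered in detail here. It tries to delete the block's data from memory and disk, and finally removes the broadcast's blockId from BlockInfoManager (which holds the information for every block in this BlockManager).

3.1.7 Summary

That is the manual way of removing a broadcast variable: destroy (through doDestroy) removes it from the driver and the executors, whereas unpersist only removes it from the executors.

Two RPC requests are involved: one from the driver to the BlockManagerMaster, which also lives on the driver, and one from the BlockManagerMaster to each BlockManagerSlave.

The final deletion goes through BlockManager's removeBlock. In Spark, whether it is RDD data or shuffle data, everything is ultimately managed as blocks, and the overall code logic is very clear. Anyone who wants to understand Spark's design in depth should get to grips with BlockManager.

3.2 Automatic cleanup: broadcasts that store MapOutputStatus

The doDestroy discussed above is reached from user code through the destroy() method. Besides that, destroy() is also called by invalidateSerializedMapOutputStatusCache in MapOutputTracker, which removes that broadcast variable from the driver and all executors.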

def invalidateSerializedMapOutputStatusCache(): Unit = withWriteLock {
  if (cachedSerializedBroadcast != null) {
    // Prevent errors during broadcast cleanup from crashing the DAGScheduler (see SPARK-21444)
    Utils.tryLogNonFatalError {
      // Use `blocking = false` so that this operation doesn't hang while trying to send cleanup
      // RPCs to dead executors.
      cachedSerializedBroadcast.destroy()
    }
    cachedSerializedBroadcast = null
  }
  cachedSerializedMapStatus = null
}

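cachedSerializedBroadcast is a broadcast variable that holds a byte array. If it is not null, destroy is triggered to clean it up.

This method is only called after the shuffle has completed. The broadcast variable holds information such as the IDs of the completed map-side tasks and the locations of their shuffle output, and it is passed to the driver and the downstream reduce side for fetching data and scheduling tasks.

Put simply, this shuffle-related broadcast variable is out of reach for user code. It is enough to know that the shuffle's MapOutputStatus information is distributed through a broadcast.

3.3 Automatic cleanup: ContextCleaner

This mechanism is enabled by default and automatically reclaims RDDs, shuffles, broadcast variables, accumulators and checkpoints once they are only weakly reachable. Spark starts a timer thread that calls System.gc() at a fixed interval so that these five kinds of objects get collected. The interval defaults to 30 minutes and is set with the following parameter:

spark.cleaner.periodicGC.interval  # default 30min

If Spark Context Cleaner shows up in the logs, Spark has started the cleaner automatically.

The code in [2.1.1 SparkContext's broadcast method] contains this line:

cleaner.foreach(_.registerBroadcastForCleanup(bc))

It registers the broadcast variable with the cleaner for cleanup. Note that cleaner is not an array or a list being iterated with foreach, but an Option[ContextCleaner]. Option has a foreach method too: if cleaner is None, registerBroadcastForCleanup is simply skipped.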
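A tiny standalone sketch of the Option.foreach idiom, with a hypothetical Cleaner class (under an OptionForeachSketch object) standing in for ContextCleaner:

```scala
import scala.collection.mutable.ArrayBuffer

object OptionForeachSketch {
  // Hypothetical cleaner that just remembers which ids were registered.
  final class Cleaner {
    val registered: ArrayBuffer[Long] = ArrayBuffer.empty[Long]
    def register(id: Long): Unit = { registered += id; () }
  }

  // Runs the body once for Some(cleaner) and not at all for None,
  // which avoids an explicit isDefined check or a pattern match.
  def registerIfEnabled(cleaner: Option[Cleaner], id: Long): Unit =
    cleaner.foreach(_.register(id))
}
```

Treating Option as a collection of zero or one element is what makes foreach, map and filter available on it.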

3.3.1 Registering a broadcast variable with ContextCleaner

/** Register a Broadcast for cleanup when it is garbage collected. */
def registerBroadcastForCleanup[T](broadcast: Broadcast[T]): Unit = {
  registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
}

private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask): Unit = {
  referenceBuffer.add(new CleanupTaskWeakReference(task, objectForCleanup, referenceQueue))
}

private class CleanupTaskWeakReference(
    val task: CleanupTask,
    referent: AnyRef,
    referenceQueue: ReferenceQueue[AnyRef])
  extends WeakReference(referent, referenceQueue)

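In short, the broadcast is wrapped in a CleanupTaskWeakReference, a weak reference that works together with the reference queue referenceQueue, and the wrapper is added to referenceBuffer (backed by a ConcurrentHashMap). The main job of referenceBuffer is to keep the CleanupTaskWeakReference objects themselves alive, so that they are not garbage collected before the reference queue has been processed. Once the broadcast variable is found to be only weakly reachable during reachability analysis, it can be reclaimed.

3.3.2 How ContextCleaner cleans up

The keepCleaning method loops on the cleaning thread and polls the reference queue. The periodic GC (every 30 minutes by default) makes sure unreachable broadcast variables get enqueued; once a broadcast variable's reference shows up in the queue, doCleanupBroadcast is called to clean it up.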

private def keepCleaning(): Unit = Utils.tryOrStopSparkContext(sc) {
  while (!stopped) {
    try {
      val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
        .map(_.asInstanceOf[CleanupTaskWeakReference])
      // Synchronize here to avoid being interrupted on stop()
      synchronized {
        reference.foreach { ref =>
          logDebug("Got cleaning task " + ref.task)
          referenceBuffer.remove(ref)
          ref.task match {
            case CleanRDD(rddId) =>
              doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
            case CleanShuffle(shuffleId) =>
              doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
            case CleanBroadcast(broadcastId) =>
              doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
            case CleanAccum(accId) =>
              doCleanupAccum(accId, blocking = blockOnCleanupTasks)
            case CleanCheckpoint(rddId) =>
              doCleanCheckpoint(rddId)
          }
        }
      }
    } catch {
      case ie: InterruptedException if stopped => // ignore
      case e: Exception => logError("Error in cleaning thread", e)
    }
  }
}

Weak references that need cleaning are taken from referenceQueue and processed one by one.
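The interplay of weak references, the reference queue and the buffer can be sketched with the JDK classes alone. CleanerSketch, TaskRef, Cleaner, register and drain are hypothetical names, and drain is a non-blocking simplification of the polling loop. In real use the garbage collector enqueues the reference; calling enqueue() by hand is a way to simulate that deterministically.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ArrayBuffer

object CleanerSketch {
  // Weak reference that remembers which broadcast it stands for.
  final class TaskRef(val broadcastId: Long, referent: AnyRef, queue: ReferenceQueue[AnyRef])
    extends WeakReference[AnyRef](referent, queue)

  final class Cleaner {
    private val queue = new ReferenceQueue[AnyRef]
    // Keeps the TaskRef objects strongly reachable until they are processed.
    private val buffer = ConcurrentHashMap.newKeySet[TaskRef]()
    val cleaned: ArrayBuffer[Long] = ArrayBuffer.empty[Long]

    def register(obj: AnyRef, broadcastId: Long): TaskRef = {
      val ref = new TaskRef(broadcastId, obj, queue)
      buffer.add(ref)
      ref
    }

    def pending: Int = buffer.size()

    // Process every reference currently in the queue; returns how many were handled.
    def drain(): Int = {
      var handled = 0
      var next = queue.poll()
      while (next != null) {
        val ref = next.asInstanceOf[TaskRef]
        buffer.remove(ref)
        cleaned += ref.broadcastId // the real cleaner would remove the blocks here
        handled += 1
        next = queue.poll()
      }
      handled
    }
  }
}
```

Without the buffer, the TaskRef itself could be collected together with its referent and would never reach the queue.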

doCleanupBroadcast calls BroadcastManager's unbroadcast method:

override def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit = {
  TorrentBroadcast.unpersist(id, removeFromDriver, blocking)
}

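This ends up in unpersist again. Note that removeFromDriver is true here, so it is equivalent to calling destroy: the broadcast variable is removed from the driver and from all executors.

3.4 Automatic cleanup: soft references

This path also relies on GC, which clears the soft reference. It only reclaims the object held inside the Broadcast instance; the broadcast data itself stays in the BlockManager, so if the value is needed again later it can be re-read from the BlockManager.

Calling broadcast.value leads to TorrentBroadcast's getValue method: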

override protected def getValue() = synchronized {
  val memoized: T = if (_value == null) null.asInstanceOf[T] else _value.get
  if (memoized != null) {
    memoized
  } else {
    val newlyRead = readBroadcastBlock()
    _value = new SoftReference[T](newlyRead)
    newlyRead
  }
}

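On first access the else branch runs and creates a soft reference for _value. A soft reference is cleared when a GC runs while memory is short.

In other words, if a broadcast variable has been accessed on an executor and a GC caused by memory pressure then occurs, the cached object may be dropped.

This holds on both the driver and the executors, but it does not affect the broadcast data already stored in the BlockManager.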
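The memoize-through-a-soft-reference pattern can be sketched on its own. SoftMemo is a hypothetical class; load stands in for readBroadcastBlock, and clear() simulates what the GC would do under memory pressure so that the reload path can be exercised deterministically.

```scala
import java.lang.ref.SoftReference

// Hypothetical illustration of the getValue pattern, not Spark code.
final class SoftMemo[T <: AnyRef](load: () => T) {
  private var ref: SoftReference[T] = _
  private var loads = 0

  def loadCount: Int = synchronized(loads)

  def value: T = synchronized {
    val cached: T = if (ref == null) null.asInstanceOf[T] else ref.get
    if (cached != null) {
      cached
    } else {
      // First access, or the GC cleared the reference: rebuild and re-wrap.
      val fresh = load()
      loads += 1
      ref = new SoftReference[T](fresh)
      fresh
    }
  }

  // Simulates the GC clearing the soft reference.
  def clear(): Unit = synchronized { if (ref != null) ref.clear() }
}
```

The cost of losing the cached object is one extra load, which in the broadcast case is a local BlockManager read rather than a network fetch.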

So there is one more optimization to consider: when using a broadcast variable, try to call .value only once per partition:

rdd.mapPartitions { iter =>
  // Resolve the broadcast value once per partition, not once per record
  val blackIps = blackIpsBC.value
  iter.filter(t => !blackIps.contains(t.ip))
}

This skips the per-record check of whether the broadcast value is already available, and is a good coding habit.

That's a wrap!
