Task Submission in Spark: A Source Code Walkthrough

Copyright notice: this is an original article; please do not reproduce it without permission.
Background reading:
Stage submission in Spark: http://www.lxweimin.com/p/75751908329a

Task submission in Spark

1. In the background reading above we looked at how a Stage is submitted in DAGScheduler's submitMissingTasks method; the same method then creates and submits the Tasks, as shown below:
<code>
private def submitMissingTasks(stage: Stage, jobId: Int) {
  // (1) Stage submission -- see the background article "Spark中Stage的提交" linked above
  // (2) Task submission
  // Broadcasted binary of the task, used to dispatch tasks to executors.
  // Note: we broadcast a copy of the RDD and it is deserialized for every task,
  // which means each task gets a different copy of the RDD.
  var taskBinary: Broadcast[Array[Byte]] = null
  try {
    // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep).
    // For ResultTask, serialize and broadcast (rdd, func).
    val taskBinaryBytes: Array[Byte] = stage match {
      case stage: ShuffleMapStage =>
        closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef).array()
      case stage: ResultStage =>
        closureSerializer.serialize((stage.rdd, stage.func): AnyRef).array()
    }
    // Broadcast the serialized task binary
    taskBinary = sc.broadcast(taskBinaryBytes)
  } catch {
    case e: NotSerializableException =>
      abortStage(stage, "Task not serializable: " + e.toString, Some(e))
      runningStages -= stage
      return
    case NonFatal(e) =>
      abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}", Some(e))
      runningStages -= stage
      return
  }
  // Build the tasks for this stage
  val tasks: Seq[Task[_]] = try {
    stage match {
      // A ShuffleMapStage produces ShuffleMapTasks
      case stage: ShuffleMapStage =>
        partitionsToCompute.map { id =>
          val locs = taskIdToLocations(id)
          val part = stage.rdd.partitions(id)
          // One partition, one task, one set of preferred locations
          new ShuffleMapTask(stage.id, stage.latestInfo.attemptId,
            taskBinary, part, locs, stage.internalAccumulators)
        }
      // A ResultStage produces ResultTasks
      case stage: ResultStage =>
        val job = stage.resultOfJob.get
        partitionsToCompute.map { id =>
          val p: Int = stage.partitions(id)
          val part = stage.rdd.partitions(p)
          val locs = taskIdToLocations(id)
          new ResultTask(stage.id, stage.latestInfo.attemptId,
            taskBinary, part, locs, id, stage.internalAccumulators)
        }
    }
  } catch {
    case NonFatal(e) =>
      abortStage(stage, s"Task creation failed: $e\n${e.getStackTraceString}", Some(e))
      runningStages -= stage
      return
  }
  // If there is at least one task to run
  if (tasks.size > 0) {
    logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")")
    stage.pendingPartitions ++= tasks.map(_.partitionId)
    logDebug("New pending partitions: " + stage.pendingPartitions)
    // Hand the TaskSet over to the taskScheduler -- see section 2
    taskScheduler.submitTasks(new TaskSet(
      tasks.toArray, stage.id, stage.latestInfo.attemptId, stage.firstJobId, properties))
    stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
  } else {
    // Because we already posted SparkListenerStageSubmitted earlier, we mark the
    // stage as completed here in case no tasks were submitted.
    markStageAsFinished(stage, None)
    // Log the debugString
    val debugString = stage match {
      case stage: ShuffleMapStage =>
        s"Stage ${stage} is actually done; " +
          s"(available: ${stage.isAvailable}," +
          s"available outputs: ${stage.numAvailableOutputs}," +
          s"partitions: ${stage.numPartitions})"
      case stage: ResultStage =>
        s"Stage ${stage} is actually done; (partitions: ${stage.numPartitions})"
    }
    logDebug(debugString)
  }
}
</code>
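To make the partition-to-task mapping in the code above concrete, here is a minimal sketch you can run in spark-shell. It assumes only a live SparkContext named sc and is not taken from the DAGScheduler source: the number of partitions of the final RDD determines how many ResultTasks the result stage submits.
<code>
// A 4-partition RDD: the result stage of the count() job submits 4 ResultTasks.
val rdd = sc.parallelize(1 to 100, numSlices = 4)
println(rdd.partitions.length)  // 4 -> partitionsToCompute has 4 ids for the final stage
rdd.count()                     // the Spark UI shows this stage running 4 tasks
</code>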
2. Task submission then goes through the submitTasks method of taskScheduler. TaskScheduler is a trait whose only concrete implementation is TaskSchedulerImpl; its submitTasks method looks like this:
<code>
override def submitTasks(taskSet: TaskSet) {
  val tasks = taskSet.tasks
  logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
  this.synchronized {
    // Create a TaskSetManager for this TaskSet
    val manager = createTaskSetManager(taskSet, maxTaskFailures)
    val stage = taskSet.stageId
    val stageTaskSets =
      taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
    stageTaskSets(taskSet.stageAttemptId) = manager
    val conflictingTaskSet = stageTaskSets.exists { case (_, ts) =>
      ts.taskSet != taskSet && !ts.isZombie
    }
    if (conflictingTaskSet) {
      throw new IllegalStateException(s"more than one active taskSet for stage $stage:" +
        s" ${stageTaskSets.toSeq.map{_._2.taskSet.id}.mkString(",")}")
    }
    // Add the TaskSetManager to the schedulable tree, which is either FIFO or FAIR
    schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
    if (!isLocal && !hasReceivedTask) {
      // A timer that keeps warning until the first task has been launched
      starvationTimer.scheduleAtFixedRate(new TimerTask() {
        override def run() {
          if (!hasLaunchedTask) {
            logWarning("Initial job has not accepted any resources; " +
              "check your cluster UI to ensure that workers are registered " +
              "and have sufficient resources")
          } else {
            this.cancel()
          }
        }
      }, STARVATION_TIMEOUT_MS, STARVATION_TIMEOUT_MS)
    }
    hasReceivedTask = true
  }
  // Ask the scheduler backend for resources; how offers are made depends on the (cluster) deploy mode
  backend.reviveOffers()
}
</code>
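The schedulableBuilder that the TaskSetManager is added to is chosen by the spark.scheduler.mode setting: FIFO (the default) or FAIR. Below is a minimal sketch of switching to FAIR scheduling; the application name and pool name are placeholders of my own, not anything from the source above.
<code>
import org.apache.spark.{SparkConf, SparkContext}

// Select the FAIR schedulable tree instead of the default FIFO one.
val conf = new SparkConf()
  .setAppName("scheduler-mode-demo")   // hypothetical app name
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

// Optionally route jobs submitted from this thread into a named fair pool.
sc.setLocalProperty("spark.scheduler.pool", "production")  // hypothetical pool name
</code>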
With that, the Task submission is complete. How the different (cluster) deploy modes actually allocate resources to these Tasks is covered in a later article.

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。

推薦閱讀更多精彩內容