I have been reading the Flink source code in spare moments. The main motivation for choosing Flink was to understand how a stable distributed computing system is implemented; beyond that, Flink has distinctive advantages over the more mature Spark, and I believe it will hold an important place in the next generation of distributed computing. Flink's core concepts can be found on the official website.
Job submission and scheduling in Flink are both carried out through Akka actor communication, so that is the entry point here: first sorting out how the whole system starts up and how a job submission flows through it.
As the figure shows, a complete Flink system consists of three actor systems: the Client, the JobManager (JM), and the TaskManager (TM). The creation of these three actor systems is analyzed below.
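These actor systems talk to each other over Akka remoting. As a rough illustration of the mechanism (this is not Flink code; the actor system name "flink", the default port 6123, and the /user/jobmanager actor path follow Flink's conventions, everything else is a made-up example), resolving a remote JobManager actor from another actor system looks like this:

import akka.actor.{ActorRef, ActorSystem}
import akka.util.Timeout

import scala.concurrent.Await
import scala.concurrent.duration._

object ResolveJobManagerSketch extends App {
  implicit val timeout: Timeout = Timeout(10.seconds)
  // The client side needs Akka remoting enabled in its own configuration.
  val system = ActorSystem("client")

  // Look up the remote JobManager actor by its Akka URL and resolve it to an ActorRef.
  val selection = system.actorSelection("akka.tcp://flink@localhost:6123/user/jobmanager")
  val jobManager: ActorRef = Await.result(selection.resolveOne(), timeout.duration)

  // Once resolved, it can be messaged like any local actor.
  jobManager ! "ping"
}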
JM ActorSystem
The JM is the scheduling center of a Flink system. Besides the creation of the JM ActorSystem, this part also shows how the various modules of the whole system are initialized and run.
First, locate the program entry point. Tracing the startup scripts shows that every one of them eventually runs flink-daemon.sh. Looking at that script:
...
...
case $DAEMON in
    (jobmanager)
        CLASS_TO_RUN=org.apache.flink.runtime.jobmanager.JobManager
    ;;

    (taskmanager)
        CLASS_TO_RUN=org.apache.flink.runtime.taskmanager.TaskManager
    ;;

    (zookeeper)
        CLASS_TO_RUN=org.apache.flink.runtime.zookeeper.FlinkZooKeeperQuorumPeer
    ;;

    (*)
        echo "Unknown daemon '${DAEMON}'. $USAGE."
        exit 1
    ;;
esac
$JAVA_RUN $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN} "${ARGS[@]}" > "$out" 2>&1 < /dev/null &
...
...
This leads to the JM entry point: org.apache.flink.runtime.jobmanager.JobManager.scala. There we find the main function, which calls the runJobManager method:
def runJobManager(
    configuration: Configuration,
    executionMode: JobManagerMode,
    listeningAddress: String,
    listeningPort: Int)
  : Unit = {

  // startActorSystemAndJobManagerActors returns jobManagerSystem
  val (jobManagerSystem, _, _, webMonitorOption, _) = startActorSystemAndJobManagerActors(
    configuration,
    executionMode,
    listeningAddress,
    listeningPort,
    classOf[JobManager],
    classOf[MemoryArchivist],
    Option(classOf[StandaloneResourceManager])
  )

  // block until the system shuts down
  jobManagerSystem.awaitTermination()

  webMonitorOption.foreach{
    webMonitor =>
      try {
        webMonitor.stop()
      } catch {
        case t: Throwable =>
          LOG.warn("Could not properly stop the web monitor.", t)
      }
  }
}
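As an aside, the signature above suggests this entry point could also be invoked programmatically. A minimal hypothetical sketch (the real main() derives these values from the command-line arguments and flink-conf.yaml; the host and port here are examples only):

import org.apache.flink.configuration.Configuration
import org.apache.flink.runtime.jobmanager.{JobManager, JobManagerMode}

object RunJobManagerSketch extends App {
  // An empty Configuration falls back to the defaults; real deployments load flink-conf.yaml.
  val configuration = new Configuration()
  JobManager.runJobManager(configuration, JobManagerMode.LOCAL, "localhost", 6123)
}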
The logic of runJobManager is straightforward: it calls startActorSystemAndJobManagerActors to create the ActorSystem and the JM actor, then blocks until the system exits. Now for the actual JM creation:
def startActorSystemAndJobManagerActors(
    configuration: Configuration,
    executionMode: JobManagerMode,
    listeningAddress: String,
    listeningPort: Int,
    jobManagerClass: Class[_ <: JobManager],
    archiveClass: Class[_ <: MemoryArchivist],
    resourceManagerClass: Option[Class[_ <: FlinkResourceManager[_]]])
  : (ActorSystem, ActorRef, ActorRef, Option[WebMonitor], Option[ActorRef]) = {

  LOG.info("Starting JobManager")

  // Bring up the job manager actor system first, bind it to the given address.
  val hostPortUrl = NetUtils.hostAndPortToUrlString(listeningAddress, listeningPort)
  LOG.info(s"Starting JobManager actor system at $hostPortUrl")

  val jobManagerSystem = try {
    val akkaConfig = AkkaUtils.getAkkaConfig(
      configuration,
      Some((listeningAddress, listeningPort))
    )
    if (LOG.isDebugEnabled) {
      LOG.debug("Using akka configuration\n " + akkaConfig)
    }
    AkkaUtils.createActorSystem(akkaConfig) // create the ActorSystem; only one exists globally
  }
  catch {
    ...
    ...
  }

  ...
  ... // webMonitor creation omitted here

  try {
    // bring up the job manager actor
    LOG.info("Starting JobManager actor")
    val (jobManager, archive) = startJobManagerActors(
      configuration,
      jobManagerSystem,
      jobManagerClass,
      archiveClass)

    // start a process reaper that watches the JobManager. If the JobManager actor dies,
    // the process reaper will kill the JVM process (to ensure easy failure detection)
    LOG.debug("Starting JobManager process reaper")
    jobManagerSystem.actorOf(
      Props(
        classOf[ProcessReaper],
        jobManager,
        LOG.logger,
        RUNTIME_FAILURE_RETURN_CODE),
      "JobManager_Process_Reaper")

    // bring up a local task manager, if needed
    if (executionMode == JobManagerMode.LOCAL) {
      LOG.info("Starting embedded TaskManager for JobManager's LOCAL execution mode")

      val taskManagerActor = TaskManager.startTaskManagerComponentsAndActor(
        configuration,
        ResourceID.generate(),
        jobManagerSystem,
        listeningAddress,
        Some(TaskManager.TASK_MANAGER_NAME),
        None,
        localTaskManagerCommunication = true,
        classOf[TaskManager])

      LOG.debug("Starting TaskManager process reaper")
      jobManagerSystem.actorOf(
        Props(
          classOf[ProcessReaper],
          taskManagerActor,
          LOG.logger,
          RUNTIME_FAILURE_RETURN_CODE),
        "TaskManager_Process_Reaper")
    }

    ...
    ...

    (jobManagerSystem, jobManager, archive, webMonitor, resourceManager)
  }
  ...
  ...
}
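The ActorSystem above is created through AkkaUtils. At its core such a helper boils down to roughly the following (a simplified sketch, not the real implementation; the real getAkkaConfig merges many more Flink settings into the Akka Config):

import akka.actor.ActorSystem
import com.typesafe.config.{Config, ConfigFactory}

object AkkaUtilsSketch {
  // Build a trimmed-down Akka Config; Flink derives this from its own Configuration.
  def getAkkaConfig(hostname: String, port: Int): Config =
    ConfigFactory.parseString(
      s"""
         |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
         |akka.remote.netty.tcp.hostname = "$hostname"
         |akka.remote.netty.tcp.port = $port
       """.stripMargin)

  // All of Flink's actor systems are named "flink".
  def createActorSystem(akkaConfig: Config): ActorSystem =
    ActorSystem("flink", akkaConfig)
}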
Here startActorSystemAndJobManagerActors uses AkkaUtils together with the Flink configuration to create the global ActorSystem; as sketched above, AkkaUtils is just a thin wrapper around actor system creation, so it is not discussed further. The freshly created jobManagerSystem and the JM's class (jobManagerClass) are then used to create the jobManager actor. Besides the jobManager, this method also creates Flink's other important modules, as is clear from the return value. In LOCAL execution mode it additionally starts an embedded taskManagerActor. Going one level deeper, startJobManagerActors takes jobManagerSystem and the remaining parameters, creates jobManager and archive, and returns them:
def startJobManagerActors(
    configuration: Configuration,
    actorSystem: ActorSystem,
    jobManagerActorName: Option[String],
    archiveActorName: Option[String],
    jobManagerClass: Class[_ <: JobManager],
    archiveClass: Class[_ <: MemoryArchivist])
  : (ActorRef, ActorRef) = {

  val (executorService: ExecutorService,
    instanceManager,
    scheduler,
    libraryCacheManager,
    restartStrategy,
    timeout,
    archiveCount,
    leaderElectionService,
    submittedJobGraphs,
    checkpointRecoveryFactory,
    savepointStore,
    jobRecoveryTimeout,
    metricsRegistry) = createJobManagerComponents(
      configuration,
      None)

  val archiveProps = Props(archiveClass, archiveCount)

  // start the archiver with the given name, or without (avoid name conflicts)
  val archive: ActorRef = archiveActorName match {
    case Some(actorName) => actorSystem.actorOf(archiveProps, actorName)
    case None => actorSystem.actorOf(archiveProps)
  }

  val jobManagerProps = Props(
    jobManagerClass,
    configuration,
    executorService,
    instanceManager,
    scheduler,
    libraryCacheManager,
    archive,
    restartStrategy,
    timeout,
    leaderElectionService,
    submittedJobGraphs,
    checkpointRecoveryFactory,
    savepointStore,
    jobRecoveryTimeout,
    metricsRegistry)

  val jobManager: ActorRef = jobManagerActorName match {
    case Some(actorName) => actorSystem.actorOf(jobManagerProps, actorName)
    case None => actorSystem.actorOf(jobManagerProps)
  }

  (jobManager, archive)
}
Here createJobManagerComponents first creates the JobManager's key building blocks, including the components implementing the storage and recovery strategies, as well as the scheduler and submittedJobGraphs we will meet again later, responsible for job scheduling and job submission respectively; they are not examined further at this point.
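A side note on the actorOf/Props pattern used above: Props bundles the actor class together with its constructor arguments, so the ActorSystem can instantiate (and, after a failure, re-instantiate) the actor itself. A minimal standalone illustration (the Archivist actor is hypothetical, merely mirroring the archiveProps construction):

import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical actor with a constructor argument, as MemoryArchivist takes archiveCount.
class Archivist(archiveCount: Int) extends Actor {
  override def receive: Receive = {
    case msg => println(s"archiving $msg (keeping at most $archiveCount entries)")
  }
}

object PropsSketch extends App {
  val system = ActorSystem("demo")
  // Props(clazz, args...) mirrors Props(archiveClass, archiveCount) above.
  val archive = system.actorOf(Props(classOf[Archivist], Int.box(16)), "archive")
  archive ! "job-42"
}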
The jobManager actor has now been created. In Akka's Scala API an actor extends the Actor trait and overrides the receive method to accept and process messages; here, the JobManager class extends FlinkActor. Looking at FlinkActor:
trait FlinkActor extends Actor {
  val log: Logger

  override def receive: Receive = handleMessage

  /** Handle incoming messages
    * @return
    */
  def handleMessage: Receive

  def decorateMessage(message: Any): Any = {
    message
  }
}
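In this trait, receive is fixed to handleMessage, so a subclass only has to supply log and handleMessage. A minimal hypothetical subclass makes the pattern concrete (EchoActor is not Flink code, and it assumes the Logger above is SLF4J's; Flink actually uses a thin Scala logging wrapper):

import akka.actor.{ActorSystem, Props}
import org.slf4j.{Logger, LoggerFactory}

// Hypothetical FlinkActor subclass: receive is inherited and already bound to handleMessage.
class EchoActor extends FlinkActor {
  override val log: Logger = LoggerFactory.getLogger(classOf[EchoActor])

  override def handleMessage: Receive = {
    case msg: String => log.info(s"echo: $msg")
  }
}

object EchoSketch extends App {
  val system = ActorSystem("demo")
  val echo = system.actorOf(Props[EchoActor], "echo")
  echo ! "hello"  // dispatched to handleMessage via the inherited receive
}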
The same holds for JobManager: since receive is overridden and bound to handleMessage, the message-handling logic is found in the handleMessage method of the FlinkActor subclass JobManager:
override def handleMessage: Receive = {
  ...
  ...
  case SubmitJob(jobGraph, listeningBehaviour) =>
    val client = sender()

    val jobInfo = new JobInfo(client, listeningBehaviour, System.currentTimeMillis(),
      jobGraph.getSessionTimeout)

    submitJob(jobGraph, jobInfo)
  ...
  ...
handleMessage deals with a large number of messages, covering job recovery, leader election, TM registration, and the submission, restoration, and cancellation of jobs. For now we only look at SubmitJob(jobGraph, listeningBehaviour). What SubmitJob carries is essentially the jobGraph and the listeningBehaviour sent by the Client; every Flink job is ultimately abstracted into a jobGraph and handed to the JM. How the jobGraph is generated will be analyzed later in the discussion of job generation.
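The message definition itself is simple; slightly simplified from org.apache.flink.runtime.messages.JobManagerMessages (in the sources it additionally mixes in a trait binding it to the current leader session), it is just a case class pairing the two fields:

case class SubmitJob(
    jobGraph: JobGraph,
    listeningBehaviour: ListeningBehaviour)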
The JM handles a job in submitJob(jobGraph, jobInfo). The jobInfo parameter carries the Client's ActorRef, used to send the result of processing the job back to the Client; the function itself implements the details of how the JM accepts and processes a job, and to keep the focus here it is analyzed in the job-processing part. Its doc comment already gives the outline:
/**
* Submits a job to the job manager. The job is registered at the libraryCacheManager which
* creates the job's class loader. The job graph is appended to the corresponding execution
* graph and the execution vertices are queued for scheduling.
*
* @param jobGraph representing the Flink job
* @param jobInfo the job info
* @param isRecovery Flag indicating whether this is a recovery or initial submission
*/
In this method the job is registered at the libraryCacheManager, and the DAG of the job's execution vertices is queued for scheduling.
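Put as a hedged, self-contained outline (all types and helpers below are simplified stand-ins for illustration, not Flink API), the flow the comment describes is:

// Stand-in types, for illustration only.
case class SketchJobGraph(jobId: String, vertices: List[String])
case class SketchExecutionGraph(jobId: String, executionVertices: List[String])

object SubmitFlowSketch {
  // 1. Register the job's user code, yielding its class loader (the libraryCacheManager step).
  def registerLibraries(job: SketchJobGraph): Unit =
    println(s"registered class loader for job ${job.jobId}")

  // 2. Append the job graph to the corresponding execution graph.
  def attachToExecutionGraph(job: SketchJobGraph): SketchExecutionGraph =
    SketchExecutionGraph(job.jobId, job.vertices)

  // 3. Queue the execution vertices for scheduling.
  def scheduleForExecution(eg: SketchExecutionGraph): Unit =
    eg.executionVertices.foreach(v => println(s"queued $v for scheduling"))

  def submitJob(job: SketchJobGraph): Unit = {
    registerLibraries(job)
    scheduleForExecution(attachToExecutionGraph(job))
  }
}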
Summary
This part has analyzed the Flink source only as far as the creation of the JM actor, mainly to understand how the ActorSystem on the JM side is organized: main ultimately creates the JM, which listens for client messages, schedules jobs, handles job fault tolerance, and finally hands the work over to the TaskManager. The concrete scheduling and processing strategies, and the communication between JM and TM, will be analyzed later. Next up is the Client-side logic.