This article compares Flink's Application, Per-Job and Session deployment modes, introduces the Application deployment mode on Native Kubernetes in detail, and walks through the source code of the entire startup flow.
1. Introduction to Native Kubernetes Application
1.1 Overview of Flink Deployment Modes
Flink offers three deployment modes: Application, Per-Job and Session.
The main differences between the Application, Per-Job and Session deployment modes are:
● whether the lifecycle of the cluster is tied to that of the job
● the degree of resource isolation
● whether the job's main() runs on the client or inside the cluster
Characteristics of Application mode: ① the job is packaged together with the Flink cluster, and the job's main() is executed directly when the JobManager starts, so the job does not need to be submitted through a Flink Client; ② the job's lifecycle is tied to that of the Flink cluster, i.e. the Flink cluster shuts down once the job finishes.
Note: the biggest difference between Application mode and Per-Job mode is that the former submits jobs with executeAsync() (non-blocking) while the latter uses execute() (blocking); this is why Application mode can run multiple jobs, as the sketch below illustrates.
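A minimal sketch of what this looks like in user code (the job names and data are illustrative, not taken from Flink's examples): executeAsync() returns immediately, so one main() can submit several jobs, whereas execute() would block until the first job finished.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiJobMainSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // first job: executeAsync() submits it and returns without blocking
        env.fromElements(1, 2, 3).print();
        env.executeAsync("job-1");

        // second job: reachable because the first submission did not block;
        // with execute(), main() would wait for job-1 to finish before getting here
        env.fromElements("a", "b", "c").print();
        env.executeAsync("job-2");
    }
}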
Characteristics of Per-Job mode: the job is not packaged together with the Flink cluster; after the JobManager starts, the job still has to be submitted through a Flink Client, which adds network transfer overhead and consumes CPU on the client side.
Characteristics of Session mode: a long-running JobManager is shared by multiple jobs. If one job misbehaves and causes a TaskManager to shut down, every job running on that TaskManager is rescheduled.
1.2 Flink Native Kubernetes Application Architecture
In terms of resource scheduling, Flink supports Kubernetes, YARN and Mesos as resource schedulers.
"Native" means that elastic scaling is achieved by going through the underlying resource scheduler directly. Native Kubernetes Application therefore means Flink is deployed in Application mode with Kubernetes managing the resources.
The user only needs to start the job through the Flink Client/CLI. Kubernetes first starts the JobManager (as a Deployment) and the job along with it; the K8sResourceManager module inside the JobManager then requests TaskManager resources directly from Kubernetes and starts them; finally, once the TaskManagers register with the JobManager, the job is deployed to them. Throughout this process the user never specifies the number of TaskManagers; the JobManager requests them from Kubernetes on demand.
Practice cases of Flink Application on Native Kubernetes:
"Flink on K8s in Practice at Alibaba"
"Native Flink on K8s in Practice at Xiaohongshu"
"Continuous Optimization of Flink on K8s at JD.com"
2. The Startup Flow in Detail
2.1 Startup Flow Overview
2.2 The Startup Script and Its Configuration
$ ./bin/flink run-application \
--target kubernetes-application \
-Dkubernetes.cluster-id=top-speed-windowing-application \
-Dkubernetes.container.image=172.1.45.167:5000/flink:1.13.6-scala_2.11 \
local:///opt/flink/examples/streaming/TopSpeedWindowing.jar
In Native Kubernetes Application mode, the required arguments of the ./bin/flink startup script are --target kubernetes-application, -Dkubernetes.cluster-id=***, -Dkubernetes.container.image=***, and the job jar path local:///***.
2.3 Starting the JobManager and the Job
2.3.1 The CliFrontend Entry Point
public int parseAndRun(String[] args) {
// omitted...
try {
// do action
switch (action) {
case ACTION_RUN:
run(params);
return 0;
// matches the run-application argument
case ACTION_RUN_APPLICATION:
runApplication(params);
return 0;
case ACTION_LIST:
list(params);
return 0;
// omitted...
}
protected void runApplication(String[] args) throws Exception {
// omitted...
// create the ApplicationDeployer, which will create the Kubernetes ClusterDescriptor
final ApplicationDeployer deployer =
new ApplicationClusterDeployer(clusterClientServiceLoader);
if (ProgramOptionsUtils.isPythonEntryPoint(commandLine)) {
programOptions = ProgramOptionsUtils.createPythonProgramOptions(commandLine);
effectiveConfiguration =
getEffectiveConfiguration(
activeCommandLine,
commandLine,
programOptions,
Collections.emptyList());
} else {
// job options, e.g. the jar path, the main-class entry point, program args, etc.
programOptions = new ProgramOptions(commandLine);
programOptions.validate();
final URI uri = PackagedProgramUtils.resolveURI(programOptions.getJarFilePath());
effectiveConfiguration =
getEffectiveConfiguration(
activeCommandLine,
commandLine,
programOptions,
Collections.singletonList(uri.toString()));
}
final ApplicationConfiguration applicationConfiguration =
new ApplicationConfiguration(
programOptions.getProgramArgs(), programOptions.getEntryPointClassName());
// submit the user's job and run its main() inside the cluster
deployer.run(effectiveConfiguration, applicationConfiguration);
}
2.3.2 The Flink Client Creates the Cluster via the K8s Client
public class ApplicationClusterDeployer implements ApplicationDeployer {
// omitted...
public <ClusterID> void run(
final Configuration configuration,
final ApplicationConfiguration applicationConfiguration)
throws Exception {
// omitted...
// create the KubernetesClusterClientFactory through the ClusterClientServiceLoader
final ClusterClientFactory<ClusterID> clientFactory =
clientServiceLoader.getClusterClientFactory(configuration);
// create the KubernetesClusterDescriptor through the KubernetesClusterClientFactory
try (final ClusterDescriptor<ClusterID> clusterDescriptor =
clientFactory.createClusterDescriptor(configuration)) {
final ClusterSpecification clusterSpecification =
clientFactory.getClusterSpecification(configuration);
// the KubernetesClusterDescriptor deploys the application cluster
clusterDescriptor.deployApplicationCluster(
clusterSpecification, applicationConfiguration);
}
}
}
public class KubernetesClusterDescriptor implements ClusterDescriptor<String> {
// omitted...
@Override
public ClusterClientProvider<String> deployApplicationCluster(
final ClusterSpecification clusterSpecification,
final ApplicationConfiguration applicationConfiguration)
throws ClusterDeploymentException {
// omitted...
// deploy/start the cluster with KubernetesApplicationClusterEntrypoint as the cluster entry point
final ClusterClientProvider<String> clusterClientProvider =
deployClusterInternal(
KubernetesApplicationClusterEntrypoint.class.getName(),
clusterSpecification,
false);
// omitted...
}
private ClusterClientProvider<String> deployClusterInternal(
String entryPoint, ClusterSpecification clusterSpecification, boolean detached)
throws ClusterDeploymentException {
// omitted...
// set cluster configuration, e.g. the entry-point class, the BlobServer port, the TaskManager RPC port, the REST port, etc.
flinkConfig.setString(KubernetesConfigOptionsInternal.ENTRY_POINT_CLASS, entryPoint);
// Rpc, blob, rest, taskManagerRpc ports need to be exposed, so update them to fixed values.
KubernetesUtils.checkAndUpdatePortConfigOption(
flinkConfig, BlobServerOptions.PORT, Constants.BLOB_SERVER_PORT);
KubernetesUtils.checkAndUpdatePortConfigOption(
flinkConfig, TaskManagerOptions.RPC_PORT, Constants.TASK_MANAGER_RPC_PORT);
KubernetesUtils.checkAndUpdatePortConfigOption(
flinkConfig, RestOptions.BIND_PORT, Constants.REST_PORT);
// omitted...
// configure the JobManager pod template
try {
final KubernetesJobManagerParameters kubernetesJobManagerParameters =
new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
final FlinkPod podTemplate =
kubernetesJobManagerParameters
.getPodTemplateFilePath()
.map(
file ->
KubernetesUtils.loadPodFromTemplateFile(
client, file, Constants.MAIN_CONTAINER_NAME))
.orElse(new FlinkPod.Builder().build());
// configure the JobManager Deployment
// while building the Deployment, CmdJobManagerDecorator sets the start command of the JobManager main container, i.e. kubernetes-jobmanager.sh kubernetes-application
final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
podTemplate, kubernetesJobManagerParameters);
client.createJobManagerComponent(kubernetesJobManagerSpec);
return createClusterClientProvider(clusterId);
// omitted...
}
}
}
public class Fabric8FlinkKubeClient implements FlinkKubeClient {
@Override
public void createJobManagerComponent(KubernetesJobManagerSpecification kubernetesJMSpec) {
final Deployment deployment = kubernetesJMSpec.getDeployment();
// omitted...
// create the JobManager Deployment with the Fabric8 Kubernetes client
this.internalClient.resourceList(accompanyingResources).createOrReplace();
}
}
2.3.3 Starting the Cluster inside the Container
public final class KubernetesApplicationClusterEntrypoint extends ApplicationClusterEntryPoint {
// omitted...
public static void main(final String[] args) {
// omitted...
// set up the job configuration
PackagedProgram program = null;
try {
program = getPackagedProgram(configuration);
} catch (Exception e) {
LOG.error("Could not create application program.", e);
System.exit(1);
}
try {
configureExecution(configuration, program);
} catch (Exception e) {
LOG.error("Could not apply application configuration.", e);
System.exit(1);
}
final KubernetesApplicationClusterEntrypoint kubernetesApplicationClusterEntrypoint =
new KubernetesApplicationClusterEntrypoint(configuration, program);
// start the cluster with the helper
ClusterEntrypoint.runClusterEntrypoint(kubernetesApplicationClusterEntrypoint);
}
}
private void runCluster(Configuration configuration, PluginManager pluginManager)
throws Exception {
synchronized (lock) {
// initialize the RPC server, HA services, BlobServer, etc.
initializeServices(configuration, pluginManager);
// omitted...
// DispatcherResourceManagerComponent wraps the Dispatcher, ResourceManager and WebMonitorEndpoint
final DispatcherResourceManagerComponentFactory
dispatcherResourceManagerComponentFactory =
createDispatcherResourceManagerComponentFactory(configuration);
// internally, a DispatcherRunnerFactory creates the DispatcherRunner
// later, when the Dispatcher gains leadership, DefaultDispatcherRunner.grantLeadership() starts a new dispatcher leader via startNewDispatcherLeaderProcess(); DispatcherLeaderProcess.start() uses JobDispatcherLeaderProcess.create() to create the ApplicationDispatcherBootstrap, which finally calls ApplicationDispatcherBootstrap.runApplicationAsync() to run the user job's main() function
clusterComponent =
dispatcherResourceManagerComponentFactory.create(
configuration,
ioExecutor,
commonRpcService,
haServices,
blobServer,
heartbeatServices,
metricRegistry,
executionGraphInfoStore,
new RpcMetricQueryServiceRetriever(
metricRegistry.getMetricQueryServiceRpcService()),
this);
// omitted...
}
}
When the Dispatcher gains leadership, the call chain is: DefaultDispatcherRunner.grantLeadership() -> DefaultDispatcherRunner.startNewDispatcherLeaderProcess() -> DispatcherLeaderProcess.start() -> JobDispatcherLeaderProcess.create() creates the ApplicationDispatcherBootstrap -> ApplicationDispatcherBootstrap.runApplicationAsync() -> ... -> ClientUtils.executeProgram() invokes the job's main() function.
Note: Dispatcher leader election relies on the Kubernetes client's LeaderElector; Flink wraps it in KubernetesLeaderElector and handles the leadership callbacks with a LeaderElectionEventHandler. A standalone example is shown below.
// imports assume the official Kubernetes Java client (io.kubernetes:client-java)
import io.kubernetes.client.extended.leaderelection.LeaderElectionConfig;
import io.kubernetes.client.extended.leaderelection.LeaderElector;
import io.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.openapi.Configuration;
import io.kubernetes.client.util.Config;

import java.net.InetAddress;
import java.time.Duration;

public class LeaderElectionExample {
public static void main(String[] args) throws Exception {
ApiClient client = Config.defaultClient();
Configuration.setDefaultApiClient(client);
String lockHolderIdentityName = InetAddress.getLocalHost().getHostAddress();
// create the ConfigMap lock
ConfigMapLock lock = new ConfigMapLock("default", "leader-election-ip", lockHolderIdentityName);
// leader election configuration
LeaderElectionConfig leaderElectionConfig =
new LeaderElectionConfig(lock,
Duration.ofMillis(10000),
Duration.ofMillis(8000),
Duration.ofMillis(2000));
// initialize the LeaderElector
LeaderElector leaderElector = new LeaderElector(leaderElectionConfig);
// run the leader election
leaderElector.run(
() -> {
System.out.println("Do something when getting leadership.");
},
() -> {
System.out.println("Do something when losing leadership.");
});
}
}
2.3.4 ApplicationDispatcherBootstrap Starts the Job
Through the ApplicationDispatcherBootstrap, the Dispatcher runs the job's main() function on an asynchronous thread via reflection and keeps polling the job status. The steps are as follows (a simplified sketch follows the list):
Step 1: a Context object is managed through a ThreadLocal; the applicationJobIds reference list is created outside and passed all the way down, and the user's main() is then invoked via reflection;
Step 2: inside main(), execute() or executeAsync() builds the stream graph and submits the job, and the job ID is stored in submitJobIds, i.e. applicationJobIds, so the ApplicationDispatcherBootstrap can obtain the submitted jobId;
Step 3: each job ID is polled to check whether it has reached a terminal state. As long as some jobs have not finished, polling continues; once all jobs have finished, the bootstrap exits and shuts down the cluster.
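The following is a simplified, self-contained sketch of that flow; the class, method, and helper names are made up for illustration and are not Flink's internal API. The user main() is invoked via reflection on a separate thread, the submitted job IDs are collected into a shared list, and the bootstrap then polls until every job reaches a terminal state.
import java.lang.reflect.Method;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CopyOnWriteArrayList;

public class DispatcherBootstrapSketch {

    // stand-in for the user's entry class; in Flink, step 2 happens inside execute()/executeAsync()
    public static class UserJob {
        public static void main(String[] args, List<String> submittedJobIds) {
            submittedJobIds.add("job-0"); // pretend a job was submitted and its ID recorded
        }
    }

    // placeholder for the status lookup that Flink performs through the DispatcherGateway
    private static boolean isTerminal(String jobId) {
        return true;
    }

    public static void main(String[] args) throws Exception {
        List<String> applicationJobIds = new CopyOnWriteArrayList<>(); // shared with the "user code"

        // step 1: invoke the user's main() via reflection on another thread
        CompletableFuture<Void> applicationFuture = CompletableFuture.runAsync(() -> {
            try {
                Method userMain = UserJob.class.getMethod("main", String[].class, List.class);
                userMain.invoke(null, new String[0], applicationJobIds);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        });
        applicationFuture.join();

        // step 3: poll every submitted job until all of them reach a terminal state, then shut down
        while (!applicationJobIds.stream().allMatch(DispatcherBootstrapSketch::isTerminal)) {
            Thread.sleep(1_000);
        }
        System.out.println("All jobs finished; the cluster can now be shut down.");
    }
}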
2.3.5 Requesting Resources to Start TaskManagers
Note: KubernetesResourceManagerDriver.requestResource requests resources from Kubernetes to start TaskManagers.
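As a rough illustration of what such a request ultimately amounts to, the hand-written fabric8 snippet below creates a TaskManager-like pod. The pod name, image, and command are placeholders, and this is not the code path Flink itself uses inside KubernetesResourceManagerDriver, which goes through FlinkKubeClient and the pod spec it generates.
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class CreateTaskManagerPodSketch {
    public static void main(String[] args) {
        // connects using the kubeconfig / in-cluster config resolved by the client
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            Pod taskManagerPod = new PodBuilder()
                    .withNewMetadata()
                        .withName("my-cluster-taskmanager-1-1") // placeholder pod name
                    .endMetadata()
                    .withNewSpec()
                        .addNewContainer()
                            .withName("flink-main-container")
                            .withImage("flink:1.13.6-scala_2.11")
                            .withCommand("bash", "-c", "kubernetes-taskmanager.sh") // placeholder command
                        .endContainer()
                    .endSpec()
                    .build();
            // ask the Kubernetes API server to create the pod, roughly what requestResource triggers
            client.pods().inNamespace("default").create(taskManagerPod);
        }
    }
}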