[Flink on K8s] Native Kubernetes Application Deployment Mode in Detail

This article compares Flink's Application, Per-Job, and Session deployment modes, describes the Application deployment mode on Native Kubernetes in detail, and walks through the source code of the whole startup process.

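1. Introduction to Native Kubernetes Application

1.1 Overview of Flink deployment modes

Flink has three deployment modes: Application, Per-Job, and Session.

The main differences between the Application, Per-Job, and Session deployment modes are:
● whether the cluster and the job share the same lifecycle
● the degree of resource isolation
● whether the job's main() runs on the client or on the cluster

Characteristics of Application mode: ① The job is packaged together with the Flink cluster. When the JobManager starts, it executes the job's main function and launches the job directly, so the job does not have to be submitted through the Flink Client. ② The job's lifecycle is the same as the Flink cluster's: once the job is shut down, the Flink cluster shuts down as well.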

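Note: compared with Per-Job mode, Application mode runs main() on the cluster and allows a single application to submit multiple jobs. How those jobs are ordered depends on the call used to submit them: execute() blocks until the job finishes, whereas executeAsync() returns immediately, so the next job can be submitted before the previous one finishes.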
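The effect of the blocking and non-blocking calls can be illustrated with a JDK-only sketch. MiniEnv, its execute()/executeAsync()/finish() methods, and the job names are hypothetical stand-ins and not Flink classes; the sketch only shows how a blocking call imposes an order, while a non-blocking call lets one main() keep several jobs in flight.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for an execution environment; NOT a Flink class.
class MiniEnv {
    private final List<String> events = new ArrayList<>();
    private final Map<String, CompletableFuture<String>> running = new LinkedHashMap<>();

    // Non-blocking submission: returns at once with a handle to the running job.
    CompletableFuture<String> executeAsync(String jobName) {
        events.add("submitted " + jobName);
        CompletableFuture<String> result = new CompletableFuture<>();
        running.put(jobName, result);
        return result;
    }

    // Simulates the cluster reporting that a job has finished.
    void finish(String jobName) {
        events.add("finished " + jobName);
        running.remove(jobName).complete(jobName);
    }

    // Blocking submission: does not return until the job has finished.
    String execute(String jobName) {
        CompletableFuture<String> result = executeAsync(jobName);
        finish(jobName); // assumption: the simulated job completes immediately
        return result.join();
    }

    // A main() that uses the blocking call: job-2 is submitted only after job-1 ends.
    static List<String> blockingMain() {
        MiniEnv env = new MiniEnv();
        env.execute("job-1");
        env.execute("job-2");
        return env.events;
    }

    // A main() that uses the non-blocking call: both jobs are in flight together.
    static List<String> asyncMain() {
        MiniEnv env = new MiniEnv();
        env.executeAsync("job-1");
        env.executeAsync("job-2");
        env.finish("job-1");
        env.finish("job-2");
        return env.events;
    }

    public static void main(String[] args) {
        System.out.println(blockingMain());
        System.out.println(asyncMain());
    }
}
```

In the blocking variant the two jobs are strictly sequential; in the non-blocking variant both submissions are recorded before either job finishes.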

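Characteristics of Per-Job mode: the job is not packaged together with the Flink cluster. After the JobManager starts, the job has to be submitted through the Flink Client, which adds network transfer load and consumes CPU on the client.

Characteristics of Session mode: the JobManager is long-running and multiple jobs share the same cluster. If one job fails and causes a TaskManager to shut down, every job running on that TM is rescheduled.

[Figure: summary of the deployment modes]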

1.2 Flink Native Kubernetes Application architecture

Resource scheduling: Flink supports the Kubernetes, Yarn, and Mesos resource schedulers.

"Native" means that Flink can scale elastically through the underlying resource manager. Native Kubernetes Application means that Flink runs in the Application deployment mode and uses Kubernetes for resource management.

The user only needs to start the job through the Flink Client/CLI. First, K8s starts the JobManager (a Deployment), and the job is started along with it. Next, the K8sResourceManager module inside the JobManager requests TaskManager resources directly from K8s and starts the TaskManagers. Finally, once the TMs have registered with the JM, the job is deployed to them. The user never has to specify the number of TaskManagers; the JobManager requests them from K8s on demand.
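As a rough illustration of "on demand": the number of TaskManager pods follows from the slots the job needs and the slots each TaskManager offers (taskmanager.numberOfTaskSlots). The following is a minimal sketch, assuming every TaskManager offers the same number of slots. The class and method names are hypothetical, and this is not Flink's actual slot manager logic.

```java
// Hypothetical helper; not part of Flink.
class TaskManagerEstimate {

    // Rounds up: a partially used TaskManager still needs a whole pod.
    static int requiredTaskManagers(int requiredSlots, int slotsPerTaskManager) {
        if (requiredSlots < 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("slot counts must be positive");
        }
        return (requiredSlots + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        // A job that needs 5 slots with 2 slots per TaskManager needs 3 pods.
        System.out.println(requiredTaskManagers(5, 2));
    }
}
```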

[Figure: Flink Native Kubernetes Application architecture diagram]

Case studies of Flink Application on Native Kubernetes:
"Flink on K8s in Practice at Alibaba"
"Native Flink on K8s in Practice at Xiaohongshu"
"Continuous Optimization of Flink on K8s at JD.com"

2. The startup process in detail

2.1 Overview of the startup process

[Figure: overview of the startup process]

2.2 The startup script and its configuration

$ ./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=top-speed-windowing-application \
    -Dkubernetes.container.image=172.1.45.167:5000/flink:1.13.6-scala_2.11 \
    local:///opt/flink/examples/streaming/TopSpeedWindowing.jar

In Native Kubernetes Application mode, the required arguments of the startup script ./bin/flink are --target kubernetes-application, -Dkubernetes.cluster-id=***, -Dkubernetes.container.image=***, and the job jar path local:///***.

2.3 Starting the JobManager and the job

2.3.1 CliFrontend entry point
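To make the required arguments concrete, here is a minimal sketch that assembles the command line and rejects incomplete input. RunApplicationCommand is a hypothetical helper, not part of Flink, and the check for the local scheme is an assumption based on the example above, where the jar lives inside the container image.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the arguments described above.
class RunApplicationCommand {

    static List<String> build(String clusterId, String image, String jarUri) {
        if (clusterId == null || clusterId.isEmpty()) {
            throw new IllegalArgumentException("kubernetes.cluster-id is required");
        }
        if (image == null || image.isEmpty()) {
            throw new IllegalArgumentException("kubernetes.container.image is required");
        }
        // Assumption: the jar is baked into the image and referenced with local://
        if (!"local".equals(URI.create(jarUri).getScheme())) {
            throw new IllegalArgumentException("expected a local:// jar path, got: " + jarUri);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("./bin/flink");
        cmd.add("run-application");
        cmd.add("--target");
        cmd.add("kubernetes-application");
        cmd.add("-Dkubernetes.cluster-id=" + clusterId);
        cmd.add("-Dkubernetes.container.image=" + image);
        cmd.add(jarUri);
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build(
                "top-speed-windowing-application",
                "172.1.45.167:5000/flink:1.13.6-scala_2.11",
                "local:///opt/flink/examples/streaming/TopSpeedWindowing.jar")));
    }
}
```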

    public int parseAndRun(String[] args) {
        // omitted...
        try {
            // do action
            switch (action) {
                case ACTION_RUN:
                    run(params);
                    return 0;
                // matches the run-application argument
                case ACTION_RUN_APPLICATION:
                    runApplication(params);
                    return 0;
                case ACTION_LIST:
                    list(params);
                    return 0;
                // omitted...
    }


    protected void runApplication(String[] args) throws Exception {
        // omitted...

        // Create the ApplicationDeployer, which is used to create the Kubernetes ClusterDescriptor
        final ApplicationDeployer deployer =
                new ApplicationClusterDeployer(clusterClientServiceLoader);

        if (ProgramOptionsUtils.isPythonEntryPoint(commandLine)) {
            programOptions = ProgramOptionsUtils.createPythonProgramOptions(commandLine);
            effectiveConfiguration =
                    getEffectiveConfiguration(
                            activeCommandLine,
                            commandLine,
                            programOptions,
                            Collections.emptyList());
        } else {
            // Job parameters, e.g. the jar path, the main entry class, and the program args
            programOptions = new ProgramOptions(commandLine);
            programOptions.validate();
            final URI uri = PackagedProgramUtils.resolveURI(programOptions.getJarFilePath());
            effectiveConfiguration =
                    getEffectiveConfiguration(
                            activeCommandLine,
                            commandLine,
                            programOptions,
                            Collections.singletonList(uri.toString()));
        }

        final ApplicationConfiguration applicationConfiguration =
                new ApplicationConfiguration(
                        programOptions.getProgramArgs(), programOptions.getEntryPointClassName());
        // Submit the user's job and run its main function in the cluster
        deployer.run(effectiveConfiguration, applicationConfiguration);
    }

2.3.2 Flink Client creates the cluster through the K8s client

public class ApplicationClusterDeployer implements ApplicationDeployer {
    // omitted...
    public <ClusterID> void run(
            final Configuration configuration,
            final ApplicationConfiguration applicationConfiguration)
            throws Exception {
        // omitted...
        // Create the KubernetesClusterClientFactory through the ClusterClientServiceLoader
        final ClusterClientFactory<ClusterID> clientFactory =
                clientServiceLoader.getClusterClientFactory(configuration);
        try (final ClusterDescriptor<ClusterID> clusterDescriptor =
                clientFactory.createClusterDescriptor(configuration)) {
            // The KubernetesClusterClientFactory has created the KubernetesClusterDescriptor (above)
            final ClusterSpecification clusterSpecification =
                    clientFactory.getClusterSpecification(configuration);
            // KubernetesClusterDescriptor creates the application cluster
            clusterDescriptor.deployApplicationCluster(
                    clusterSpecification, applicationConfiguration);
        }
    }
}
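How a loader can pick the Kubernetes factory from the configuration can be sketched as follows. This is a simplified, JDK-only illustration: the ClientFactory interface, the factory names, and the target lists are hypothetical, and the sketch assumes that the --target option ends up in the execution.target configuration key and that exactly one factory must match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FactoryLookupSketch {

    // Hypothetical, simplified counterpart of a cluster client factory.
    interface ClientFactory {
        String name();

        boolean isCompatibleWith(Map<String, String> config);
    }

    // Builds a factory that accepts the given execution targets.
    static ClientFactory factory(String name, String... targets) {
        List<String> supported = Arrays.asList(targets);
        return new ClientFactory() {
            public String name() {
                return name;
            }

            public boolean isCompatibleWith(Map<String, String> config) {
                return supported.contains(config.get("execution.target"));
            }
        };
    }

    // Returns the single factory that is compatible with the configuration.
    static ClientFactory select(List<ClientFactory> candidates, Map<String, String> config) {
        List<ClientFactory> compatible = new ArrayList<>();
        for (ClientFactory candidate : candidates) {
            if (candidate.isCompatibleWith(config)) {
                compatible.add(candidate);
            }
        }
        if (compatible.size() != 1) {
            throw new IllegalStateException(
                    "expected exactly one compatible factory, found " + compatible.size());
        }
        return compatible.get(0);
    }

    static String demo() {
        Map<String, String> config = new HashMap<>();
        config.put("execution.target", "kubernetes-application");
        List<ClientFactory> candidates = Arrays.asList(
                factory("yarn", "yarn-application", "yarn-session", "yarn-per-job"),
                factory("kubernetes", "kubernetes-application", "kubernetes-session"));
        return select(candidates, config).name();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```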

public class KubernetesClusterDescriptor implements ClusterDescriptor<String> {
    // omitted...
    @Override
    public ClusterClientProvider<String> deployApplicationCluster(
            final ClusterSpecification clusterSpecification,
            final ApplicationConfiguration applicationConfiguration)
            throws ClusterDeploymentException {
        // omitted...
        // Deploy/start the cluster with KubernetesApplicationClusterEntrypoint as the cluster entry point
        final ClusterClientProvider<String> clusterClientProvider =
                deployClusterInternal(
                        KubernetesApplicationClusterEntrypoint.class.getName(),
                        clusterSpecification,
                        false);
        // omitted...
    }


    private ClusterClientProvider<String> deployClusterInternal(
            String entryPoint, ClusterSpecification clusterSpecification, boolean detached)
            throws ClusterDeploymentException {
        // omitted...
        // Set the cluster configuration, e.g. the entry point class and the blob server, TaskManager RPC, and REST ports
        flinkConfig.setString(KubernetesConfigOptionsInternal.ENTRY_POINT_CLASS, entryPoint);

        // Rpc, blob, rest, taskManagerRpc ports need to be exposed, so update them to fixed values.
        KubernetesUtils.checkAndUpdatePortConfigOption(
                flinkConfig, BlobServerOptions.PORT, Constants.BLOB_SERVER_PORT);
        KubernetesUtils.checkAndUpdatePortConfigOption(
                flinkConfig, TaskManagerOptions.RPC_PORT, Constants.TASK_MANAGER_RPC_PORT);
        KubernetesUtils.checkAndUpdatePortConfigOption(
                flinkConfig, RestOptions.BIND_PORT, Constants.REST_PORT);
        // omitted...

        // Configure the JobManager PodTemplate
        try {
            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);

            final FlinkPod podTemplate =
                    kubernetesJobManagerParameters
                            .getPodTemplateFilePath()
                            .map(
                                    file ->
                                            KubernetesUtils.loadPodFromTemplateFile(
                                                    client, file, Constants.MAIN_CONTAINER_NAME))
                            .orElse(new FlinkPod.Builder().build());
            // Configure the JobManager Deployment
            // While the Deployment is built, CmdJobManagerDecorator sets the start command of the
            // JobManager main container, i.e. kubernetes-jobmanager.sh kubernetes-application
            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
                            podTemplate, kubernetesJobManagerParameters);

            client.createJobManagerComponent(kubernetesJobManagerSpec);

            return createClusterClientProvider(clusterId);
            // omitted...
        } 
    }
}
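The idea behind pinning the ports can be sketched without any Flink dependency. The excerpt does not show the exact rule that KubernetesUtils.checkAndUpdatePortConfigOption applies, so this sketch assumes that a port of 0 (pick any free port) is replaced with a fixed value and that anything other than a single port number is rejected. The fixed port values and the class name are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified version of "update ports to fixed values".
class FixedPortSketch {
    // Assumed fixed ports, used here for illustration only.
    static final int BLOB_SERVER_PORT = 6124;
    static final int TASK_MANAGER_RPC_PORT = 6122;
    static final int REST_PORT = 8081;

    // Replaces a random port (0) with the fixed one so that it can be exposed.
    static void checkAndUpdatePort(Map<String, String> config, String key, int fixedPort) {
        String configured = config.getOrDefault(key, "0");
        int port;
        try {
            port = Integer.parseInt(configured.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    key + " must be a single port, got: " + configured);
        }
        if (port == 0) {
            config.put(key, String.valueOf(fixedPort));
        }
    }

    static Map<String, String> demo() {
        Map<String, String> config = new HashMap<>();
        config.put("blob.server.port", "0");
        config.put("rest.bind-port", "9091");
        checkAndUpdatePort(config, "blob.server.port", BLOB_SERVER_PORT);
        checkAndUpdatePort(config, "taskmanager.rpc.port", TASK_MANAGER_RPC_PORT);
        checkAndUpdatePort(config, "rest.bind-port", REST_PORT);
        return config;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A fixed port matters here because the ports have to be declared on the container and the Service before the process starts, which is not possible if the process picks a random port at runtime.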

public class Fabric8FlinkKubeClient implements FlinkKubeClient {
    @Override
    public void createJobManagerComponent(KubernetesJobManagerSpecification kubernetesJMSpec) {
        final Deployment deployment = kubernetesJMSpec.getDeployment();
        // omitted...
        // Create the JobManager Deployment with the Fabric8 Kubernetes Client
        this.internalClient.resourceList(accompanyingResources).createOrReplace();
    }
}

2.3.3 Starting the cluster inside the container

public final class KubernetesApplicationClusterEntrypoint extends ApplicationClusterEntryPoint {
    // omitted...
    public static void main(final String[] args) {
        // omitted...
        // Set up the job configuration
        PackagedProgram program = null;
        try {
            program = getPackagedProgram(configuration);
        } catch (Exception e) {
            LOG.error("Could not create application program.", e);
            System.exit(1);
        }

        try {
            configureExecution(configuration, program);
        } catch (Exception e) {
            LOG.error("Could not apply application configuration.", e);
            System.exit(1);
        }

        final KubernetesApplicationClusterEntrypoint kubernetesApplicationClusterEntrypoint =
                new KubernetesApplicationClusterEntrypoint(configuration, program);
        // Start the cluster through the helper
        ClusterEntrypoint.runClusterEntrypoint(kubernetesApplicationClusterEntrypoint);
    }
}

    private void runCluster(Configuration configuration, PluginManager pluginManager)
            throws Exception {
        synchronized (lock) {
            // Initialize the RPC service, HA services, blob server, etc.
            initializeServices(configuration, pluginManager);
            // omitted...
            // DispatcherResourceManagerComponent wraps the Dispatcher, ResourceManager, and WebMonitorEndpoint
            final DispatcherResourceManagerComponentFactory
                    dispatcherResourceManagerComponentFactory =
                            createDispatcherResourceManagerComponentFactory(configuration);
            // Internally, a DispatcherRunnerFactory creates the DispatcherRunner.
            // When the Dispatcher is elected leader, DefaultDispatcherRunner.grantLeadership() starts a
            // new dispatcher leader process via startNewDispatcherLeaderProcess();
            // DispatcherLeaderProcess.start() then uses JobDispatcherLeaderProcess.create() to create the
            // ApplicationDispatcherBootstrap, which finally calls
            // ApplicationDispatcherBootstrap.runApplicationAsync() to run the main function of the user's job.
            clusterComponent =
                    dispatcherResourceManagerComponentFactory.create(
                            configuration,
                            ioExecutor,
                            commonRpcService,
                            haServices,
                            blobServer,
                            heartbeatServices,
                            metricRegistry,
                            executionGraphInfoStore,
                            new RpcMetricQueryServiceRetriever(
                                    metricRegistry.getMetricQueryServiceRpcService()),
                            this);
            // omitted...
        }
    }

When the Dispatcher is elected leader, the call chain is: DefaultDispatcherRunner.grantLeadership() -> DefaultDispatcherRunner.startNewDispatcherLeaderProcess() -> DispatcherLeaderProcess.start() -> JobDispatcherLeaderProcess.create() (creates the ApplicationDispatcherBootstrap) -> ApplicationDispatcherBootstrap.runApplicationAsync() -> ... -> ClientUtils.executeProgram(), which invokes the job's main function.

Note: Dispatcher leader election relies on the LeaderElector of the Kubernetes client. KubernetesLeaderElector wraps the LeaderElector, and the election callbacks are ultimately handled by a LeaderElectionEventHandler. A sample of such an election looks like this:

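import io.kubernetes.client.extended.leaderelection.LeaderElectionConfig;
import io.kubernetes.client.extended.leaderelection.LeaderElector;
import io.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.openapi.Configuration;
import io.kubernetes.client.util.Config;
import java.net.InetAddress;
import java.time.Duration;

// Note: this sample uses the official Kubernetes Java client (io.kubernetes.client).
public class LeaderElectionExample {
    public static void main(String[] args) throws Exception {
        ApiClient client = Config.defaultClient();
        Configuration.setDefaultApiClient(client);
        String lockHolderIdentityName = InetAddress.getLocalHost().getHostAddress();
        // Create the ConfigMap lock (namespace, ConfigMap name, identity of this candidate)
        ConfigMapLock lock = new ConfigMapLock("default", "leader-election-ip", lockHolderIdentityName);
        // Leader election settings: lease duration, renew deadline, retry period
        LeaderElectionConfig leaderElectionConfig =
                new LeaderElectionConfig(lock,
                        Duration.ofMillis(10000),
                        Duration.ofMillis(8000),
                        Duration.ofMillis(2000));

        // Initialize the LeaderElector
        LeaderElector leaderElector = new LeaderElector(leaderElectionConfig);
        // Run the election; the two callbacks fire on gaining and on losing leadership
        leaderElector.run(
                () -> {
                    System.out.println("Do something when getting leadership.");
                },
                () -> {
                    System.out.println("Do something when losing leadership.");
                });
    }
}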
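The core idea of lock-based election, stripped of Kubernetes, is that every candidate tries to write its identity into a shared lock record and only one write succeeds. The following JDK-only sketch illustrates that with a compare-and-set. LeaderLockSketch and the identities are hypothetical, and lease expiry and renewal, which the real elector relies on, are deliberately left out.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory lock record; a stand-in for the ConfigMap lock.
class LeaderLockSketch {
    private final AtomicReference<String> holder = new AtomicReference<>();

    // Succeeds if the lock is free or already held by the same identity.
    boolean tryAcquire(String identity) {
        return holder.compareAndSet(null, identity) || identity.equals(holder.get());
    }

    // Only the current holder can release the lock.
    boolean release(String identity) {
        return holder.compareAndSet(identity, null);
    }

    String leader() {
        return holder.get();
    }

    static String demo() {
        LeaderLockSketch lock = new LeaderLockSketch();
        StringBuilder log = new StringBuilder();
        log.append(lock.tryAcquire("jm-1")).append(','); // first candidate wins
        log.append(lock.tryAcquire("jm-2")).append(','); // second candidate is rejected
        lock.release("jm-1");                            // old leader steps down
        log.append(lock.tryAcquire("jm-2")).append(','); // second candidate takes over
        log.append(lock.leader());
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```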

2.3.4 ApplicationDispatcherBootstrap starts the job

Through ApplicationDispatcherBootstrap, the Dispatcher runs the job's main function on an asynchronous thread using reflection, and then keeps polling the status of the job. The steps are:

Step 1: A Context object is managed through a ThreadLocal. The applicationJobIds list is created outside and its reference is passed down layer by layer; the user's main function is then invoked via reflection.

Step 2: Inside main, calling execute or executeAsync generates the stream graph and submits the job. The job ID is then saved to submitJobIds (the applicationJobIds list), which is how ApplicationDispatcherBootstrap obtains the IDs of the submitted jobs.

Step 3: For each job ID, check whether the job has reached a terminal state. If not, keep polling; once all jobs have finished, exit and shut down the cluster.
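The three steps can be sketched with the JDK alone. Everything in this block is a hypothetical stand-in: BootstrapSketch, UserJob, submit(), and the in-memory status map only imitate the roles of ApplicationDispatcherBootstrap, the user program, executeAsync(), and the Dispatcher. The sketch assumes that the simulated cluster finishes one job per polling round.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class BootstrapSketch {
    // Step 1: the context lives in a ThreadLocal and carries the job ID list.
    static final ThreadLocal<List<String>> CONTEXT = new ThreadLocal<>();
    // Hypothetical in-memory job status table standing in for the Dispatcher.
    static final Map<String, String> STATUS = new ConcurrentHashMap<>();

    // Step 2: stand-in for executeAsync(); records the job ID in the context.
    static String submit(String jobId) {
        STATUS.put(jobId, "RUNNING");
        CONTEXT.get().add(jobId);
        return jobId;
    }

    // Hypothetical user program with a regular main() that submits two jobs.
    public static class UserJob {
        public static void main(String[] args) {
            submit("job-a");
            submit("job-b");
        }
    }

    // Invokes main() via reflection and returns the job IDs it submitted.
    static List<String> runMain(Class<?> entryClass, String[] args) {
        List<String> jobIds = new CopyOnWriteArrayList<>();
        CONTEXT.set(jobIds);
        try {
            Method main = entryClass.getMethod("main", String[].class);
            main.invoke(null, (Object) args);
            return jobIds;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not run main()", e);
        } finally {
            CONTEXT.remove();
        }
    }

    static boolean allFinished(List<String> jobIds) {
        for (String id : jobIds) {
            if (!"FINISHED".equals(STATUS.get(id))) {
                return false;
            }
        }
        return true;
    }

    // Assumption: the simulated cluster finishes one running job per round.
    static void finishOneRunningJob(List<String> jobIds) {
        for (String id : jobIds) {
            if ("RUNNING".equals(STATUS.get(id))) {
                STATUS.put(id, "FINISHED");
                return;
            }
        }
    }

    static String runApplication() {
        // main() runs on an asynchronous thread, as described in the steps above.
        List<String> jobIds = CompletableFuture
                .supplyAsync(() -> runMain(UserJob.class, new String[0]))
                .join();
        int polls = 0;
        // Step 3: poll until every submitted job is in a terminal state.
        while (!allFinished(jobIds)) {
            polls++;
            finishOneRunningJob(jobIds);
        }
        return jobIds + " finished after " + polls + " polls";
    }

    public static void main(String[] args) {
        System.out.println(runApplication());
    }
}
```

A real implementation would wait between polls and would shut the cluster down after the loop; both are omitted here to keep the sketch deterministic.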


2.3.5 Requesting resources to start TaskManagers

Note: KubernetesResourceManagerDriver.requestResource requests resources from Kubernetes to start the TaskManagers.
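The request/acknowledge pattern behind such a call can be sketched as follows: each request returns a future that is completed only when Kubernetes reports the pod as added. PodRequestSketch, its methods, and the pod naming scheme are assumptions for illustration; the sketch does not talk to Kubernetes and is not Flink's driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical in-memory stand-in for a resource manager driver.
class PodRequestSketch {
    private final String clusterId;
    private final Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
    private int nextIndex = 1;

    PodRequestSketch(String clusterId) {
        this.clusterId = clusterId;
    }

    // Asks for one TaskManager pod; the future completes once the pod is added.
    CompletableFuture<String> requestResource() {
        // Assumption: pods are named <cluster-id>-taskmanager-<attempt>-<index>.
        String podName = clusterId + "-taskmanager-1-" + nextIndex++;
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(podName, future);
        return future;
    }

    // Simulates the notification that Kubernetes has created the pod.
    void onPodAdded(String podName) {
        CompletableFuture<String> future = pending.remove(podName);
        if (future != null) {
            future.complete(podName);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    static String demo() {
        PodRequestSketch driver = new PodRequestSketch("top-speed-windowing-application");
        CompletableFuture<String> first = driver.requestResource();
        CompletableFuture<String> second = driver.requestResource();
        driver.onPodAdded("top-speed-windowing-application-taskmanager-1-1");
        return first.isDone() + "," + second.isDone() + "," + driver.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```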
