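1. References
Setting up a distributed Flink cluster locally with Docker
Building a Flink big data processing platform on Docker
Flink cluster setup
Hadoop integration
Flink study notes: environment setup
Apache Flink from scratch (1): basic concepts explained
The third deployment mode of Flink on YARN: Application Mode
Flink series (8): Flink Standalone cluster deployment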
2. Getting started
The cluster is planned as follows:
Item | flink01 [172.173.16.23] | flink02 [172.173.16.24] | flink03 [172.173.16.25] |
---|---|---|---|
JOB MANAGER | Master | Slave | Slave |
HDFS | DataNode | DataNode | DataNode |
YARN | NodeManager | NodeManager | NodeManager |
PORT | 8086 | 22 | 22 |
The setup also depends on a Hadoop cluster, so that cluster is listed as well:
Item | hadoop01 [172.173.16.10] | hadoop02 [172.173.16.11] | hadoop03 [172.173.16.12] |
---|---|---|---|
HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
PORT | 22, 9000, 50070 | 22 | 22 |
Preparing the image
The on-YARN mode relies on Hadoop, so the Flink machines need a Hadoop environment.
Option 1: pull from Docker Hub
docker pull caiserkaiser/hadoop:2.7.2
Option 2: build it yourself
Build the caiser/hadoop:2.7.2 image
Create a custom network
docker network create -d bridge --subnet "172.173.16.0/24" --gateway "172.173.16.1" datastore_net
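Start the container (if the image was pulled from Docker Hub under the name caiserkaiser/hadoop:2.7.2, use that name in the command below or retag it as caiser/hadoop:2.7.2 first)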
docker run -it -d --network datastore_net --ip 172.173.16.23 --name flink01 caiser/hadoop:2.7.2
Download and configure Flink
- Copy the archive into the container
docker cp ~/Downloads/flink-1.10.2-bin-scala_2.12.tgz ccd2b9cb65a5:/opt/envs
- Extract the archive
tar -zxvf flink-1.10.2-bin-scala_2.12.tgz
- Configure the Hadoop environment
①. vi /etc/profile
②. Set HADOOP_HOME and HADOOP_CONF_DIR
export HADOOP_HOME=/opt/envs/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
③. source /etc/profile
- Edit the masters file
- a. Back up
cp /opt/envs/flink-1.10.2/conf/masters /opt/envs/flink-1.10.2/conf/masters.bak
- b. Change the contents to
flink01:8081
- Edit the slaves file
- a. Back up
cp /opt/envs/flink-1.10.2/conf/slaves /opt/envs/flink-1.10.2/conf/slaves.bak
- b. Change the contents to (one host per line)
flink01
flink02
flink03
- Add the flink-shaded-hadoop jar
docker cp ~/Downloads/flink-shaded-hadoop-2-uber-2.7.5-10.0.jar ccd2b9cb65a5:/opt/envs/flink-1.10.2/lib
- Edit the flink-conf.yaml configuration file
- a. Back up
cp /opt/envs/flink-1.10.2/conf/flink-conf.yaml /opt/envs/flink-1.10.2/conf/flink-conf.yaml.bak
- b. Set the following:
jobmanager.rpc.address: flink01
taskmanager.memory.process.size: 1024m
rest.bind-port: 8086
web.submit.enable: true
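The file edits above (masters, slaves, flink-conf.yaml) can also be scripted so that they are repeatable. The following is only a sketch: the function names `set_flink_conf` and `apply_flink_settings` are made up for illustration, and the values are the ones listed in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key: value" in a flink-conf.yaml-style file.
# An existing uncommented line for the key is replaced; otherwise the
# setting is appended at the end of the file.
set_flink_conf() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  awk -v k="$key" -v v="$value" '
    index($0, k ":") == 1 { print k ": " v; found = 1; next }
    { print }
    END { if (!found) print k ": " v }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Hypothetical helper: apply the settings from this article to a conf directory.
apply_flink_settings() {
  local conf_dir="$1"
  set_flink_conf "$conf_dir/flink-conf.yaml" jobmanager.rpc.address flink01
  set_flink_conf "$conf_dir/flink-conf.yaml" taskmanager.memory.process.size 1024m
  set_flink_conf "$conf_dir/flink-conf.yaml" rest.bind-port 8086
  set_flink_conf "$conf_dir/flink-conf.yaml" web.submit.enable true
  # masters and slaves are plain text files, one entry per line
  printf '%s\n' flink01:8081 > "$conf_dir/masters"
  printf '%s\n' flink01 flink02 flink03 > "$conf_dir/slaves"
}
```

Inside the container this would be called as `apply_flink_settings /opt/envs/flink-1.10.2/conf`, after the backups have been made. Running it twice leaves a single line per key.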
Flink node configuration
Edit /etc/hosts and add the following hostnames
172.173.16.23 flink01
172.173.16.24 flink02
172.173.16.25 flink03
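If the hosts file is edited by a script rather than by hand, it helps to make the step idempotent so that repeated runs do not duplicate entries. A minimal sketch; the function name `add_flink_hosts` is an assumption, and the file path is a parameter so the function can be tried on a scratch file first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the three Flink host entries to a hosts file,
# skipping any entry that is already present as an exact line.
add_flink_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry
  for entry in "172.173.16.23 flink01" "172.173.16.24 flink02" "172.173.16.25 flink03"; do
    # -x matches the whole line, -F treats the entry as a fixed string
    grep -qxF "$entry" "$hosts_file" 2>/dev/null || echo "$entry" >> "$hosts_file"
  done
}
```

Calling `add_flink_hosts` with no argument targets /etc/hosts; passing a path such as `/tmp/hosts.test` allows a dry run.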
Install which
yum install which
Save the container as an image and remove it
docker commit ccd2b9cb65a5 caiser/flink:1.10.2
docker stop ccd2b9cb65a5
docker rm ccd2b9cb65a5
Start the containers
docker run -it -d --network datastore_net --ip 172.173.16.23 --name flink01 caiser/flink:1.10.2 /bin/bash
docker run -it -d --network datastore_net --ip 172.173.16.24 --name flink02 caiser/flink:1.10.2 /bin/bash
docker run -it -d --network datastore_net --ip 172.173.16.25 --name flink03 caiser/flink:1.10.2 /bin/bash
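The three commands differ only in container name and IP address, so they can be generated from a list. This is a sketch rather than part of the original setup: the helper names `run` and `start_flink_nodes` and the `DRY_RUN` switch are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: start one container per name:ip pair from the cluster plan.
start_flink_nodes() {
  local node name ip
  for node in flink01:172.173.16.23 flink02:172.173.16.24 flink03:172.173.16.25; do
    name="${node%%:*}"   # text before the colon
    ip="${node#*:}"      # text after the colon
    run docker run -it -d --network datastore_net --ip "$ip" --name "$name" caiser/flink:1.10.2 /bin/bash
  done
}
```

Setting `DRY_RUN=1` before calling `start_flink_nodes` prints the commands so they can be reviewed before anything is started.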
Configure passwordless SSH login
- Enter the container
docker exec -it flink01 /bin/bash
- Generate a key pair in the ~/.ssh directory
ssh-keygen -t rsa
- Copy the key to flink01, flink02, and flink03
a. [If it is not already running] start the SSH service [check with ps -ef | grep ssh]
/usr/sbin/sshd -D &
b. Copy the key to flink01, flink02, and flink03
ssh-copy-id flink01
ssh-copy-id flink02
ssh-copy-id flink03
Repeat the three steps above on flink02 and flink03 in turn
Starting (ON YARN)
Step 1: add the nodes to the Hadoop cluster
① On flink01, start the DataNode from the hadoop/sbin directory:
hadoop-daemon.sh start datanode
② On flink01, start the NodeManager from the hadoop/sbin directory:
yarn-daemon.sh start nodemanager
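③ Run jps to check that the DataNode and NodeManager have started
④ Repeat the steps above on flink02 and flink03
⑤ Back on the NameNode node, print the cluster report, or open port 50070 in a browser to check the number of nodes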
hdfs dfsadmin -report
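The report is long, so the node count can be pulled out with a small filter. The sketch below assumes the report contains a line of the form `Live datanodes (N):`, which is how Hadoop 2.x reports are usually laid out; the function name `live_datanodes` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an "hdfs dfsadmin -report" dump on stdin and
# print the number from the first "Live datanodes (N):" line, if any.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1
}

# Usage on the NameNode host:
#   hdfs dfsadmin -report | live_datanodes
```

If every node in both tables of the cluster plan registers as a DataNode, the expected count would be six (three Hadoop nodes plus three Flink nodes).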
Step 2: run yarn-session
On flink01, run the following in flink/bin
./yarn-session.sh -n 3 -s 1 -jm 1024 -tm 1024
P.S. It can also run in the background: ./yarn-session.sh -n 3 -s 1 -jm 1024 -tm 1024 -d
Output like the following means the session started successfully
[root@3b5491eb3eb9 bin]# ./yarn-session.sh -n 3 -s 1 -jm 1024 -tm 1024
2020-11-21 16:43:12,787 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink01
2020-11-21 16:43:12,791 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-11-21 16:43:12,791 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-11-21 16:43:12,792 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.process.size, 1024m
2020-11-21 16:43:12,792 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2020-11-21 16:43:12,792 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2020-11-21 16:43:12,793 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-11-21 16:43:12,793 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.bind-port, 8086
2020-11-21 16:43:12,794 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.submit.enable, true
2020-11-21 16:43:13,592 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-21 16:43:13,783 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to root (auth:SIMPLE)
2020-11-21 16:43:13,872 INFO org.apache.flink.runtime.security.modules.JaasModule - Jaas file will be created as /tmp/jaas-3674228881056681551.conf.
2020-11-21 16:43:13,893 WARN org.apache.flink.yarn.cli.FlinkYarnSessionCli - The configuration directory ('/opt/envs/flink-1.10.2/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2020-11-21 16:43:14,012 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoop02/172.173.16.11:8032
2020-11-21 16:43:14,375 INFO org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils - The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2020-11-21 16:43:14,376 INFO org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils - The derived from fraction network memory (57.600mb (60397978 bytes)) is less than its min value 64.000mb (67108864 bytes), min value will be used instead
2020-11-21 16:43:14,609 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, slotsPerTaskManager=1}
2020-11-21 16:43:17,126 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1605959782595_0008
2020-11-21 16:43:17,179 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1605959782595_0008
2020-11-21 16:43:17,180 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2020-11-21 16:43:17,182 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2020-11-21 16:43:24,733 INFO org.apache.flink.yarn.YarnClusterDescriptor - YARN application has been deployed successfully.
2020-11-21 16:43:24,739 INFO org.apache.flink.yarn.YarnClusterDescriptor - Found Web Interface 8f4fdb3626d6:8086 of application 'application_1605959782595_0008'.
JobManager Web Interface: http://8f4fdb3626d6:8086
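Note: in this mode the host of the web UI changes from run to run, so for development it is better to use single-node or single-machine cluster mode.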
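Because the web UI host differs between runs, one option is to read it from the session output instead of guessing. A minimal sketch, assuming the output has been saved to a file and contains the `JobManager Web Interface:` line shown in the log above; the function name `jobmanager_url` and the file name `session.log` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read yarn-session.sh output on stdin and print the
# URL from the last "JobManager Web Interface:" line.
jobmanager_url() {
  sed -n 's/^JobManager Web Interface: \(http[^ ]*\).*/\1/p' | tail -n 1
}

# Usage, with the session output saved in session.log:
#   jobmanager_url < session.log
```

The printed host is a container hostname, so it still has to be resolvable (or replaced by the container's IP) from wherever the browser runs.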
Starting (STANDALONE CLUSTER)
Step 1: run start-cluster
Run the following in flink/bin on any one of the machines
./start-cluster.sh
Then open port 8081 (the default) or 8086 (the port configured in this article) in a browser to reach the web UI
Note: port mapping is required
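When the container can simply be recreated, the mapping can be declared at creation time with Docker's `-p` option. This is a sketch of that alternative, not the approach used in this article: the helper names `run` and `recreate_flink01_with_port` and the `DRY_RUN` switch are assumptions. Anything changed inside flink01 after it was created from the image (for example the SSH keys) would have to be redone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: remove flink01 and start it again with the web UI
# port published on the Docker host (default 8086, as configured above).
recreate_flink01_with_port() {
  local port="${1:-8086}"
  run docker rm -f flink01
  run docker run -it -d --network datastore_net --ip 172.173.16.23 \
    -p "${port}:${port}" --name flink01 caiser/flink:1.10.2 /bin/bash
}
```

With `DRY_RUN=1` the two commands are only printed, which makes it easy to check them before removing a running container.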
Adding a port dynamically
"ExposedPorts":{"8086/tcp":{}}
"PortBindings":{"8086/tcp":[{"HostIp":"","HostPort":"8086"}]}