1. Reference
2. Getting Started
Our cluster plan is as follows:
| | hadoop01 [172.173.16.10] | hadoop02 [172.173.16.11] | hadoop03 [172.173.16.12] |
| --- | --- | --- | --- |
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
| PORT | 22, 9000, 50070 | 22 | 22 |
Image preparation
- Pull from Docker Hub
docker pull caiserkaiser/centos-ssh
Create a custom network
docker network create -d bridge --subnet "172.173.16.0/24" --gateway "172.173.16.1" datastore_net
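As an optional sanity check (not part of the original steps), you can confirm the subnet and gateway took effect:
docker network inspect datastore_net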
Start the container
docker run -it -d --network datastore_net --ip 172.173.16.10 --name hadoop01 caiser/centos-ssh:7.8
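Optionally, verify the container really got the planned address; this uses a standard docker inspect Go template:
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop01
It should print 172.173.16.10.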
Download and configure Hadoop
Download Hadoop
- Copy the downloaded hadoop-2.7.2.tar.gz into the container
docker cp ~/Downloads/hadoop-2.7.2.tar.gz c446857be6c0:/opt/envs
- Configure the Hadoop environment variables
a. Extract the tarball
tar -zxvf hadoop-2.7.2.tar.gz
b. Edit /etc/profile
vi /etc/profile
c. Add the following
export HADOOP_HOME=/opt/envs/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/bin
d. Apply the changes
source /etc/profile
e. Append it to ~/.bashrc (non-login shells, e.g. commands run over ssh, don't read /etc/profile)
cat /etc/profile >> ~/.bashrc
f. Apply the changes
source ~/.bashrc
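A quick, optional way to confirm the variables took effect (hadoop version assumes the JDK referenced later, /opt/envs/jdk1.8.0_251, is already present in the image):
echo $HADOOP_HOME
hadoop version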
Hadoop HDFS configuration
core-site.xml
- Back up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/core-site.xml /opt/envs/hadoop-2.7.2/etc/hadoop/core-site.xml.bak
- Edit core-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/core-site.xml
- Configure according to the plan above
<!-- Address of the HDFS NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
</property>
<!-- Storage directory for temporary files generated at Hadoop runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/datas/tmp</value>
</property>
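hadoop.tmp.dir points at /opt/datas/tmp; formatting the NameNode will normally create it, but pre-creating it is a harmless safeguard:
mkdir -p /opt/datas/tmp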
hadoop-env.sh
- Back up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/hadoop-env.sh /opt/envs/hadoop-2.7.2/etc/hadoop/hadoop-env.sh.bak
- Edit hadoop-env.sh
vi /opt/envs/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
- Configure JAVA_HOME
a. Find the following two lines
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}
b. Then replace the export line with
export JAVA_HOME=/opt/envs/jdk1.8.0_251
hdfs-site.xml
- Back up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/hdfs-site.xml /opt/envs/hadoop-2.7.2/etc/hadoop/hdfs-site.xml.bak
- Edit hdfs-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
- Configure according to the plan above
<!-- Number of data replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Address of the Hadoop SecondaryNameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop03:50090</value>
</property>
Hadoop YARN configuration
yarn-site.xml
- Back up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-site.xml /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-site.xml.bak
- Edit yarn-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-site.xml
- Configure according to the plan above
<!-- How Reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop02</value>
</property>
yarn-env.sh
- Back up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-env.sh /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-env.sh.bak
- Edit yarn-env.sh
vi /opt/envs/hadoop-2.7.2/etc/hadoop/yarn-env.sh
- Configure JAVA_HOME
a. Find the following two lines
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
b. Then replace them with
export JAVA_HOME=/opt/envs/jdk1.8.0_251
Hadoop MapReduce configuration
mapred-site.xml
- Create mapred-site.xml from the bundled template (no backup needed; the distribution ships mapred-site.xml.template)
cp /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-site.xml.template /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-site.xml
- Edit mapred-site.xml
vi /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-site.xml
- Configure according to the plan above
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
mapred-env.sh
- Back up
cp /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-env.sh /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-env.sh.bak
- Edit mapred-env.sh
vi /opt/envs/hadoop-2.7.2/etc/hadoop/mapred-env.sh
- Configure JAVA_HOME
a. Find the following line
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
b. Then replace it with
export JAVA_HOME=/opt/envs/jdk1.8.0_251
Hadoop node configuration
- Edit /etc/hosts and add the following hostnames (a name-resolution check follows after this list)
172.173.16.10 hadoop01
172.173.16.11 hadoop02
172.173.16.12 hadoop03
- Edit slaves
vi /opt/envs/hadoop-2.7.2/etc/hadoop/slaves
- Configure the nodes (no spaces or blank lines)
hadoop01
hadoop02
hadoop03
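Two optional checks before moving on: name resolution (assuming ping is available in the image), and the slaves file itself; cat -A makes stray tabs and line endings visible, which matters because whitespace here breaks the start scripts:
ping -c 1 hadoop02
ping -c 1 hadoop03
cat -A /opt/envs/hadoop-2.7.2/etc/hadoop/slaves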
Install which
Well, this is a problem I only ran into later when formatting the namenode, so it's best to install it here; otherwise you'll have to run this on every machine afterwards.
yum install which
Save as an image and remove the container
docker commit c446857be6c0 caiser/hadoop:2.7.2
docker rm c446857be6c0
Start the containers
docker run -it -d --network datastore_net --ip 172.173.16.10 --name hadoop01 caiser/hadoop:2.7.2 /bin/bash
docker run -it -d --network datastore_net --ip 172.173.16.11 --name hadoop02 caiser/hadoop:2.7.2 /bin/bash
docker run -it -d --network datastore_net --ip 172.173.16.12 --name hadoop03 caiser/hadoop:2.7.2 /bin/bash
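An optional check that all three containers came up:
docker ps
All of hadoop01, hadoop02, and hadoop03 should be listed as Up.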
Configure passwordless SSH login
- Enter the container
docker exec -it hadoop01 /bin/bash
- Generate a key pair in ~/.ssh
ssh-keygen -t rsa
- Copy the public key to hadoop01, hadoop02, and hadoop03
a. [If not already running] The three containers may not have the ssh service running (check with ps -ef | grep ssh); if so, run the following in each one
/usr/sbin/sshd -D &
b. Copy the public key
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03
Repeat steps 1-3 above (enter the container, generate the key, copy it) on hadoop02 and hadoop03.
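Before starting any daemons, it's worth confirming that passwordless login really works, since the HDFS start script runs from hadoop01 and the YARN start script from hadoop02; a minimal check from hadoop01:
ssh hadoop02 hostname
ssh hadoop03 hostname
Each command should print the remote hostname without prompting for a password.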
Start HDFS
Start it on hadoop01. Since we've already added sbin to PATH, we can run the commands directly:
- Format the NameNode
hdfs namenode -format
- Start
start-dfs.sh
If you see output like this, it worked:
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-namenode-3118b3248ebd.out
hadoop03: starting datanode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-datanode-777368e252cd.out
hadoop02: starting datanode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-datanode-8a20f3cf05a1.out
hadoop01: starting datanode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-datanode-3118b3248ebd.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /opt/envs/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-777368e252cd.out
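As an optional check, jps (shipped with the JDK; this assumes its bin directory is on PATH) should show the planned daemons on each node:
jps
# expected on hadoop01: NameNode, DataNode, Jps
# expected on hadoop02: DataNode, Jps
# expected on hadoop03: SecondaryNameNode, DataNode, Jps
The NameNode web UI should also answer at http://172.173.16.10:50070 (port 50070 from the plan above).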
Start YARN
Start it on hadoop02: start-yarn.sh launches the ResourceManager on the machine it runs on, and the plan places the ResourceManager there. Since sbin is already on PATH, we can run the command directly:
- Start
start-yarn.sh
If you see output like this, it worked:
starting yarn daemons
starting resourcemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-resourcemanager-777368e252cd.out
hadoop01: starting nodemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-nodemanager-3118b3248ebd.out
hadoop02: starting nodemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-nodemanager-8a20f3cf05a1.out
hadoop03: starting nodemanager, logging to /opt/envs/hadoop-2.7.2/logs/yarn-root-nodemanager-777368e252cd.out
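For an end-to-end smoke test, you can run the example job that ships with Hadoop 2.7.2 under share/hadoop/mapreduce (a minimal sketch; the pi arguments are the map count and samples per map):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 5
If YARN is wired up correctly, the job is accepted by the ResourceManager on hadoop02 and finishes with an estimate of pi; you can also watch it in the ResourceManager UI (default port 8088 on hadoop02).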