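The hosts run Ubuntu at 192.168.1.141, 192.168.1.142 and 192.168.1.143, in a one-master, two-slave layout.
The machines are ARM Linux boards that cost a little over 100 yuan each; surprisingly, the cluster actually runs on them.
I. Environment Preparation
1. Standardize host names
Master: 192.168.1.141
Slave: 192.168.1.142 192.168.1.143
Edit /etc/hosts on every host
# master host entry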
192.168.1.141 hadoop01
# entries for the added nodes
192.168.1.142 hadoop02
192.168.1.143 hadoop03
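Since the same three entries have to be added on every host, a small idempotent helper can save some retyping. The following is only a sketch: the function name add_host and the scratch file hosts.preview are assumptions made here so the script can be tried safely before pointing it at /etc/hosts.

```shell
#!/bin/sh
# Sketch: append the three cluster entries to a hosts file only if missing.
# HOSTS_FILE defaults to a scratch file (an assumption, for a safe trial run);
# set HOSTS_FILE=/etc/hosts and run as root to apply it for real.
HOSTS_FILE="${HOSTS_FILE:-./hosts.preview}"

add_host() {
    # $1 = IP address, $2 = host name
    touch "$HOSTS_FILE"
    # -x matches the whole line, -F treats the entry as a literal string
    if ! grep -qxF "$1 $2" "$HOSTS_FILE"; then
        printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
    fi
}

add_host 192.168.1.141 hadoop01
add_host 192.168.1.142 hadoop02
add_host 192.168.1.143 hadoop03
```

Running it a second time leaves the file unchanged, so it can be repeated on each machine without creating duplicate lines.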
2. Set up passwordless ssh login from the master to the slaves
Create ~/.ssh on the slave machines
root@OrangePi:/# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:eTjQhVzHIjWIAmP603tQYIf1/D+tSPDlrRD0D8bBEWY root@OrangePi
The key's randomart image is:
+---[RSA 2048]----+
| +.oooo ==.E. |
| o ooo.+=..*.. |
|. .o +...o |
| . . . . = o . |
| o o S + * |
| . o = * = |
| . . + + + |
| . . o + |
| . o |
+----[SHA256]-----+
root@OrangePi:/#
root@OrangePi:/# cd root
root@OrangePi:~# cd .ssh
root@OrangePi:~/.ssh# cat id_rsa.pub >>authorized_keys
Copy the key file to hadoop02 and hadoop03 with scp
root@OrangePi:~/.ssh# scp authorized_keys root@hadoop02:/root/.ssh/authorized_keys
root@hadoop02's password:
authorized_keys 100% 790 0.8KB/s 00:00
Test passwordless login
root@OrangePi:~/.ssh# ssh hadoop02
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 3.10.65 aarch64)
Remember to run this on the slave machines:
sudo chmod 600 ~/.ssh/authorized_keys
Establish mutual trust between all hosts
scp ~/.ssh/authorized_keys hadoop01:/root/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop02:/root/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop03:/root/.ssh/authorized_keys
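If key login still prompts for a password, file permissions are a common cause: sshd in its default strict mode may ignore an authorized_keys file or .ssh directory that other users can write to. The sketch below sets the usual modes; the function name fix_ssh_perms and the scratch directory ssh.preview are assumptions for a safe trial, and on a real node the directory would be /root/.ssh.

```shell
#!/bin/sh
# Sketch: create an .ssh directory with the permissions sshd expects.
# SSH_DIR defaults to a scratch directory (an assumption, for a safe trial);
# on a real node it would be /root/.ssh as in the steps above.
SSH_DIR="${SSH_DIR:-./ssh.preview}"

fix_ssh_perms() {
    # $1 = path of the .ssh directory
    mkdir -p "$1"
    touch "$1/authorized_keys"
    chmod 700 "$1"                    # directory: owner only
    chmod 600 "$1/authorized_keys"    # key list: owner read/write only
}

fix_ssh_perms "$SSH_DIR"
ls -ld "$SSH_DIR" "$SSH_DIR/authorized_keys"
```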
3. Install and start ntp on every host
# sudo apt-get install ntp
# service ntp start
4. Install the JDK
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
root@OrangePi:/# java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
With this install method, the JDK home path is /usr/lib/jvm/java-8-oracle
Add the following to /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
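A misspelled variable name in the PATH line expands to an empty string without any error, so it is worth confirming after `source /etc/profile` that the JDK's bin directory really is on PATH. This is a minimal sketch; the function name java_on_path is made up for illustration.

```shell
#!/bin/sh
# Sketch: confirm that $JAVA_HOME/bin really ended up on PATH.
# The fallback value for JAVA_HOME is the path used in this article.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-8-oracle}"

java_on_path() {
    # Wrap PATH in colons so the first and last entries match too
    case ":$PATH:" in
        *":$JAVA_HOME/bin:"*) echo "ok" ;;
        *) echo "missing" ;;
    esac
}

java_on_path
```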
II. Hadoop Cluster Installation
http://hadoop.apache.org/
1. Create directories
root@OrangePi:~# mkdir /home/data
root@OrangePi:~# mkdir /home/data/hdfs
root@OrangePi:~# cd /home/data/hdfs
root@OrangePi:/home/data/hdfs# mkdir name
root@OrangePi:/home/data/hdfs# mkdir data
root@OrangePi:/home/data/hdfs# mkdir tmp
root@OrangePi:/home/data/hdfs# sudo chmod -R 777 /home/data
Run the same on the slave machines
mkdir /home/data
mkdir /home/data/hdfs
cd /home/data/hdfs
mkdir name
mkdir data
mkdir tmp
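The same directory layout can be created in one pass with `mkdir -p`, which also makes the step safe to repeat. A sketch, assuming a hypothetical helper name make_hdfs_dirs and a scratch root data.preview for trying it out (the article's real root is /home/data):

```shell
#!/bin/sh
# Sketch: create name/data/tmp in one pass; mkdir -p is safe to re-run.
# DATA_ROOT defaults to a scratch path (an assumption); the article uses /home/data.
DATA_ROOT="${DATA_ROOT:-./data.preview}"

make_hdfs_dirs() {
    # $1 = root directory that will contain hdfs/name, hdfs/data, hdfs/tmp
    for d in name data tmp; do
        mkdir -p "$1/hdfs/$d"
    done
}

make_hdfs_dirs "$DATA_ROOT"
```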
Configure /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export HADOOP_HOME=/home/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
Reload the profile to apply the settings
source /etc/profile
2. Install and configure Hadoop
http://hadoop.apache.org/releases.html
cd /home/
mkdir hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
tar zxvf hadoop-3.1.0.tar.gz -C /home/
3. Configure core-site.xml
/home/hadoop-3.1.0/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/data/hdfs/tmp</value>
</property>
</configuration>
4. Configure hdfs-site.xml
The basic settings include the replication factor, the data directories, and so on.
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/data/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/data/hdfs/data</value>
</property>
</configuration>
5. Configure yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6. Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/home/hadoop-3.1.0/etc/hadoop,
/home/hadoop-3.1.0/share/hadoop/common/*,
/home/hadoop-3.1.0/share/hadoop/common/lib/*,
/home/hadoop-3.1.0/share/hadoop/hdfs/*,
/home/hadoop-3.1.0/share/hadoop/hdfs/lib/*,
/home/hadoop-3.1.0/share/hadoop/mapreduce/*,
/home/hadoop-3.1.0/share/hadoop/mapreduce/lib/*,
/home/hadoop-3.1.0/share/hadoop/yarn/*,
/home/hadoop-3.1.0/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
7. Configure the slaves (workers file)
etc/hadoop/workers
hadoop01
hadoop02
hadoop03
8. Configure JAVA_HOME (set it to match the actual Java home)
etc/hadoop/hadoop-env.sh
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
9. Copy the configuration to the slaves
cd /home
scp -r hadoop-3.1.0 hadoop02:/home/
scp -r hadoop-3.1.0 hadoop03:/home/
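With more than two slaves, the copy targets could be read from the workers file instead of being listed by hand. The sketch below only prints the scp commands unless DRY_RUN=0 is set; the variable names and the sync_hadoop function are assumptions, while the paths are the ones used above.

```shell
#!/bin/sh
# Sketch: derive the copy targets from the workers file instead of
# listing each slave by hand. By default it only prints the commands;
# set DRY_RUN=0 to execute them. WORKERS_FILE and MASTER are assumptions.
WORKERS_FILE="${WORKERS_FILE:-/home/hadoop-3.1.0/etc/hadoop/workers}"
MASTER="${MASTER:-hadoop01}"
DRY_RUN="${DRY_RUN:-1}"

sync_hadoop() {
    # $1 = workers file, one host per line; blanks and the master are skipped
    while read -r host || [ -n "$host" ]; do
        if [ -z "$host" ] || [ "$host" = "$MASTER" ]; then
            continue
        fi
        if [ "$DRY_RUN" = "1" ]; then
            echo "scp -r /home/hadoop-3.1.0 $host:/home/"
        else
            # stdin comes from /dev/null so scp cannot swallow the host list
            scp -r /home/hadoop-3.1.0 "$host:/home/" < /dev/null
        fi
    done < "$1"
}

if [ -f "$WORKERS_FILE" ]; then
    sync_hadoop "$WORKERS_FILE"
fi
```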
10. Configure PATH
/etc/profile
export HADOOP_HOME=/home/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
III. Starting and Running the Hadoop Cluster (run on the master)
1. Start the NameNode
Format the HDFS file system
#hadoop namenode -format
root@Hadoop01:~# ps -ef | grep hadoop
root 3047 2756 0 10:06 pts/0 00:00:00 grep --color=auto hadoop
Now start the NameNode daemon
# hadoop-daemon.sh start namenode
2. Start the DataNode
hdfs --daemon start namenode
hdfs --daemon start datanode
yarn --daemon start resourcemanager
yarn --daemon start nodemanager
root@Hadoop01:/home# jps
5104 ResourceManager
5351 NodeManager
5000 DataNode
5375 Jps
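Reading jps listings by eye makes it easy to miss a daemon that failed to start. As a sketch, the output can be compared against an expected list; the list below is an assumption for a master that also runs a DataNode and NodeManager, and the function name missing_daemons is made up for illustration.

```shell
#!/bin/sh
# Sketch: compare jps output against the daemons expected on a node.
# The expected list is an assumption for a master that also runs a
# DataNode and NodeManager, as in this setup.
EXPECTED="NameNode DataNode ResourceManager NodeManager"

missing_daemons() {
    # Reads jps output ("<pid> <name>" per line) on stdin and prints
    # every expected daemon whose name does not appear in column 2.
    names="$(awk '{print $2}')"
    for d in $EXPECTED; do
        # -x matches whole lines, so SecondaryNameNode does not count as NameNode
        printf '%s\n' "$names" | grep -qx "$d" || echo "$d"
    done
}

# Live check only where jps is installed
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

Fed the four-line listing above, this would report NameNode as missing, since no NameNode process appears in it.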
3. One-step startup (confirmed working)
start-all.sh
stop-all.sh
http://192.168.1.141:8088/cluster/nodes
Related ports
http://192.168.1.141:9870/dfshealth.html#tab-overview
4. Verify with a sample job
Create test.txt under /home
Contents:
hello word china chinese korea
groupby
Create the input directory
hadoop fs -mkdir /input
#hadoop fs -put test.txt /input
List the directory
hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2018-05-11 06:47 /input
Delete the output directory
hadoop fs -rm -r /output
#hadoop jar /home/hadoop-3.1.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar wordcount /input /output
Map-Reduce Framework
Map input records=2
Map output records=6
Map output bytes=63
Map output materialized bytes=81
Input split bytes=100
Combine input records=6
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=81
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1088
CPU time spent (ms)=4840
Physical memory (bytes) snapshot=326569984
Virtual memory (bytes) snapshot=3757453312
Total committed heap usage (bytes)=144109568
Peak Map Physical memory (bytes)=210546688
Peak Map Virtual memory (bytes)=2002776064
Peak Reduce Physical memory (bytes)=116023296
Peak Reduce Virtual memory (bytes)=1754677248
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=38
File Output Format Counters
Bytes Written=51
View the results
root@Hadoop01:/home# hadoop fs -ls /output
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
2018-05-11 13:31:47,807 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 root supergroup 0 2018-05-11 13:30 /output/_SUCCESS
-rw-r--r-- 2 root supergroup 51 2018-05-11 13:30 /output/part-r-00000
Word count results
root@Hadoop01:/home# hadoop fs -cat /output/part-r-00000
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
2018-05-11 13:32:48,377 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
china 1
chinese 1
groupby 1
hello 1
korea 1
word 1
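To cross-check the job output, the same count can be reproduced locally with standard tools. This is only a sketch for small inputs; the function name local_wordcount is an assumption, and it splits on spaces and tabs, which is close to but not necessarily identical to the example job's tokenizer.

```shell
#!/bin/sh
# Sketch: reproduce the wordcount locally with standard tools so the
# HDFS result can be compared against it. LC_ALL=C gives byte-order sorting.
local_wordcount() {
    # One word per line, drop empty lines, count, then print "word<TAB>count"
    tr -s ' \t' '\n\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c \
        | awk '{print $2 "\t" $1}'
}

printf 'hello word china chinese korea\ngroupby\n' | local_wordcount
```

For the two-line test.txt used here, each of the six words is counted once, matching the part-r-00000 listing.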
The default block size for each file is 128 MB.
5. Fixing jobs that exceed node memory
mapred-site.xml
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx512M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx256M</value>
</property>
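A JVM uses some memory beyond its heap, so a heap equal to the container size leaves no headroom before YARN's memory check kicks in. One common rule of thumb, assumed here rather than taken from this cluster, is to set -Xmx to about 80% of the container size. The helper name heap_opts is hypothetical.

```shell
#!/bin/sh
# Sketch: derive a -Xmx value from a container size, leaving headroom
# for non-heap JVM memory. The 80% ratio is a rule of thumb assumed
# here, not a value taken from this cluster's configuration.
heap_opts() {
    # $1 = container memory in MB (e.g. the value of mapreduce.map.memory.mb)
    echo "-Xmx$(( $1 * 80 / 100 ))m"
}

heap_opts 512
```

For a 512 MB container this prints -Xmx409m, which could then be used as the value of the matching java.opts property.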
6. Fixing Hadoop timestamps that do not match the system time
# cat hadoop-env.sh
.........
export HADOOP_OPTS="$HADOOP_OPTS -Duser.timezone=GMT+08"
.........
# cat yarn-env.sh
.........
YARN_OPTS="$YARN_OPTS -Duser.timezone=GMT+08"
.........
If HBase is involved, set its time zone as well
# cat hbase-env.sh
.........
export TZ="Asia/Shanghai"
.........
IV. Installing the ZooKeeper Cluster
1. Download and install ZooKeeper 3.4.10
wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
tar zxvf zookeeper-3.4.10.tar.gz
2. Configuration files
mkdir /home/zookeeper-3.4.10/data
mkdir -p /home/zookeeper-3.4.10/datalog
cd /home/zookeeper-3.4.10/conf
Copy the sample configuration file
cp zoo_sample.cfg zoo.cfg
Configuration file contents
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/zookeeper-3.4.10/data
dataLogDir=/home/zookeeper-3.4.10/datalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.0=hadoop01:2888:3888
server.1=hadoop02:2888:3888
server.2=hadoop03:2888:3888
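3. Create the myid file
Create a myid file in ZooKeeper's data directory. Its content is 0 on the master, and 1 and 2 on the other two machines, matching the server.N lines in zoo.cfg.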
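Because each myid must match the N of that host's server.N line, the value can be looked up from zoo.cfg rather than typed by hand. A sketch, with a hypothetical function name myid_for; the host name passed in must be spelled exactly as in zoo.cfg.

```shell
#!/bin/sh
# Sketch: look up a host's id from the server.N lines in zoo.cfg,
# so myid cannot drift from the configuration. The path is the article's;
# the function name is hypothetical.
ZOO_CFG="${ZOO_CFG:-/home/zookeeper-3.4.10/conf/zoo.cfg}"

myid_for() {
    # $1 = zoo.cfg path, $2 = host name; prints N from "server.N=host:..."
    sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

if [ -f "$ZOO_CFG" ]; then
    myid_for "$ZOO_CFG" hadoop02
fi
```

On hadoop02, for example, the output could be redirected with `myid_for "$ZOO_CFG" hadoop02 > /home/zookeeper-3.4.10/data/myid`.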
4. Copy ZooKeeper to the slaves (remember to change myid after copying)
scp -r zookeeper-3.4.10 hadoop02:/home/
scp -r zookeeper-3.4.10 hadoop03:/home/
5. Configure the profile file on each host
Add to /etc/profile
export ZOOKEEPER_HOME=/home/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
Remember to run source /etc/profile for this to take effect
V. Starting the ZooKeeper Cluster
1. Start ZooKeeper on each host
root@Hadoop01:/home# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@Hadoop01:/home# jps
7105 DataNode
6982 NameNode
7272 SecondaryNameNode
7580 ResourceManager
8860 QuorumPeerMain
8878 Jps
7695 NodeManager
root@Hadoop01:/home#
Machines 1 and 3 come up as followers by default, and machine 2 as the leader
root@Hadoop03:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
root@Hadoop03:~#
Stop command
zkServer.sh stop
VI. Configuring Hadoop to Use ZooKeeper
1. Create the journal directory on each host
mkdir /home/data/journal
2. Modify core-site.xml
<!-- Set the HDFS nameservice to ns -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<!-- Directory for Hadoop temporary data -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/data/hdfs/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<!-- ZooKeeper addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
3. Modify hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Set the HDFS nameservice to ns; this must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<!-- ns contains two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns.nn1</name>
<value>hadoop01:9820</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns.nn1</name>
<value>hadoop01:9870</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns.nn2</name>
<value>hadoop02:9820</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns.nn2</name>
<value>hadoop02:9870</value>
</property>
<!-- Where the NameNode metadata (edits) is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop01;hadoop02;hadoop03/ns</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/data/journal</value>
</property>
<!-- Enable automatic failover when a NameNode fails -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Client-side failover proxy provider -->
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods. If ssh uses the default port 22, the value can simply be sshfence; for another port, use the form sshfence(hadoop:22022) -->
<property>
<name>dfs.ha.fencing.methods</name>
<!-- <value>sshfence</value> -->
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless ssh login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/data/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/data/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Enable WebHDFS (REST API) on the NameNodes and DataNodes; optional -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Sync the configuration files
scp -r /home/hadoop-3.1.0/etc/hadoop hadoop02:/home/hadoop-3.1.0/etc
scp -r /home/hadoop-3.1.0/etc/hadoop hadoop03:/home/hadoop-3.1.0/etc
4. First start
1. First start ZooKeeper on every node by running the following on each of them:
zkServer.sh start
2. On one of the NameNode hosts, run the following to create the namespace in ZooKeeper
hdfs zkfc -formatZK
3. On every JournalNode host, start the JournalNode with the following command
hdfs --daemon start journalnode
4. On the primary NameNode, format the NameNode and JournalNode directories
hdfs namenode -format ns
5. On the primary NameNode, start the NameNode process
hdfs --daemon start namenode
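6. On the standby NameNode, run the first command below. It formats the standby NameNode's directory and copies the metadata over from the primary NameNode, and it does not format the JournalNode directories again. Then start the standby NameNode process with the second command.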
hdfs namenode -bootstrapStandby
hdfs --daemon start namenode
7. On both NameNode hosts, run the following command
hdfs --daemon start zkfc
8. On every DataNode host, run the following command to start the DataNode
hadoop-daemon.sh start datanode
http://192.168.1.142:9870/dfshealth.html#tab-overview
http://192.168.1.141:9870/dfshealth.html#tab-overview
For day-to-day use afterwards, the following is all that is needed:
start-all.sh
stop-all.sh
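The first-start sequence touches different hosts at different steps, so it can help to print it as "host: command" lines and review the order before running anything. The sketch below only prints; the host roles come from the configuration above (hadoop01 as nn1, hadoop02 as nn2), the commands are copied from the steps as written, and the function name first_start_plan is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the first-start sequence as "host: command" lines so the
# host-to-step mapping can be reviewed before anything is run.
# Host roles are taken from the configuration in this article.
ALL="hadoop01 hadoop02 hadoop03"
NN1="hadoop01"   # primary NameNode (nn1)
NN2="hadoop02"   # standby NameNode (nn2)

first_start_plan() {
    for h in $ALL; do echo "$h: zkServer.sh start"; done
    echo "$NN1: hdfs zkfc -formatZK"
    for h in $ALL; do echo "$h: hdfs --daemon start journalnode"; done
    echo "$NN1: hdfs namenode -format ns"
    echo "$NN1: hdfs --daemon start namenode"
    echo "$NN2: hdfs namenode -bootstrapStandby"
    echo "$NN2: hdfs --daemon start namenode"
    for h in $NN1 $NN2; do echo "$h: hdfs --daemon start zkfc"; done
    for h in $ALL; do echo "$h: hadoop-daemon.sh start datanode"; done
}

first_start_plan
```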
5. Failover test
On hadoop02
root@Hadoop02:~# jps
3410 QuorumPeerMain
5636 DFSZKFailoverController
5765 NodeManager
5367 DataNode
5287 NameNode
5498 JournalNode
5979 Jps
kill namenode
root@Hadoop02:~# kill -9 5287
Go back and check whether the standby NameNode has become active. [Screenshot: automatic failover succeeded]
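The same check can be made from the command line with `hdfs haadmin -getServiceState`, which reports active or standby for a NameNode id. The sketch below treats the cluster as healthy when exactly one of nn1 and nn2 is active; a killed NameNode simply yields an empty state. The function name ha_healthy is hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether failover worked from the two NameNode states.
# nn1/nn2 are the ids from dfs.ha.namenodes.ns; the function name is hypothetical.
ha_healthy() {
    # $1, $2 = states of nn1 and nn2; healthy means exactly one is active
    n=0
    for s in "$1" "$2"; do
        if [ "$s" = "active" ]; then
            n=$((n + 1))
        fi
    done
    [ "$n" -eq 1 ]
}

# Live query only where the hdfs command is available
if command -v hdfs >/dev/null 2>&1; then
    s1="$(hdfs haadmin -getServiceState nn1 2>/dev/null)"
    s2="$(hdfs haadmin -getServiceState nn2 2>/dev/null)"
    if ha_healthy "$s1" "$s2"; then
        echo "one active NameNode"
    else
        echo "check HA state: nn1=$s1 nn2=$s2"
    fi
fi
```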
With that, the installation is complete. From installing the operating system to getting everything running took 2.5 days.