Clone three hosts and rename them hadoop01, hadoop02, and hadoop03 (on each clone, edit HOSTNAME in /etc/sysconfig/network and reboot; the transcript below shows one clone being renamed):
[root@hadoop01 ~]# hostname
hadoop01
[root@hadoop01 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
[root@hadoop01 ~]# vi /etc/sysconfig/network
[root@hadoop01 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop02
[root@hadoop01 ~]# reboot
Configure /etc/hosts on all three machines:
[root@hadoop01 ~]# vi /etc/hosts
[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.216.135 hadoop01
192.168.216.136 hadoop02
192.168.216.137 hadoop03
Server role plan

| hadoop01 | hadoop02 | hadoop03 |
|---|---|---|
| NameNode | | |
| DataNode | DataNode | DataNode |
| NodeManager | NodeManager | NodeManager |
| HistoryServer | ResourceManager | SecondaryNameNode |
1. Install a fresh Hadoop on the first machine
To keep it separate from the pseudo-distributed Hadoop already installed on this machine, we stop all of those Hadoop services and install a second Hadoop under a new directory, /opt/modules/app. The cluster is set up by extracting and configuring Hadoop on the first machine, then distributing it to the other two.
2. Extract the Hadoop archive
3. Configure the JDK path for Hadoop: set JAVA_HOME in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
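As a sketch, the three edits in step 3 can be scripted. The JDK path below is an assumption, not something fixed by this guide; substitute your actual JDK install, and adjust HADOOP_CONF if your layout differs:

```shell
# Point JAVA_HOME in hadoop-env.sh, mapred-env.sh and yarn-env.sh at the JDK.
# JDK_HOME is an assumed path -- replace it with your own JDK directory.
JDK_HOME=/usr/java/jdk1.7.0_67
HADOOP_CONF=${HADOOP_CONF:-/opt/modules/app/hadoop/etc/hadoop}
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  if [ -f "$HADOOP_CONF/$f" ]; then
    # Rewrite the existing "export JAVA_HOME=..." line in place.
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$JDK_HOME|" "$HADOOP_CONF/$f"
  fi
done
```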
4. Configure core-site.xml
[root@hadoop01 hadoop]# vi core-site.xml
[root@hadoop01 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/app/hadoop-2.5.0/data/tmp</value>
</property>
</configuration>
[root@hadoop01 hadoop]#
fs.defaultFS is the NameNode address.
hadoop.tmp.dir is Hadoop's temporary directory; by default, the NameNode and DataNode data files are stored in subdirectories under it. Make sure this directory exists, and create it first if it does not. (Note that the value above uses /opt/modules/app/hadoop-2.5.0 while the shell examples in this guide use /opt/modules/app/hadoop; the two must refer to the same installation.)
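Since hadoop.tmp.dir must exist before the NameNode is formatted, create it up front; the path mirrors the value configured above:

```shell
# Create the hadoop.tmp.dir configured in core-site.xml.
mkdir -p /opt/modules/app/hadoop-2.5.0/data/tmp
```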
5. Configure hdfs-site.xml
[root@hadoop01 hadoop]# vi hdfs-site.xml
[root@hadoop01 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop03:50090</value>
</property>
</configuration>
dfs.namenode.secondary.http-address sets the HTTP address and port of the SecondaryNameNode; per our plan, hadoop03 serves as the SecondaryNameNode.
6. Configure slaves
[root@hadoop01 hadoop]# vi /opt/modules/app/hadoop/etc/hadoop/slaves
[root@hadoop01 hadoop]# cat /opt/modules/app/hadoop/etc/hadoop/slaves
hadoop01
hadoop02
hadoop03
The slaves file lists the hosts that run DataNodes in HDFS (the start scripts also use it to launch NodeManagers).
7. Configure yarn-site.xml
[root@hadoop01 hadoop]# vi yarn-site.xml
[root@hadoop01 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop02</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
</configuration>
yarn.resourcemanager.hostname points the ResourceManager at hadoop02, per our plan.
yarn.log-aggregation-enable turns log aggregation on or off.
yarn.log-aggregation.retain-seconds sets how long aggregated logs are kept on HDFS (106800 seconds here, roughly 30 hours).
8. Configure mapred-site.xml
[root@hadoop01 hadoop]# vi mapred-site.xml
[root@hadoop01 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
mapreduce.framework.name makes MapReduce jobs run on YARN.
mapreduce.jobhistory.address places the MapReduce history server on hadoop01.
mapreduce.jobhistory.webapp.address sets the history server's web UI address and port.
9. Set up passwordless SSH
The machines in a Hadoop cluster access one another over SSH, and typing a password for every connection is impractical, so configure passwordless SSH between all of them.
a. Generate a key pair on hadoop01
[root@hadoop01 hadoop]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
6c:c6:80:64:00:ec:ab:b0:94:21:71:2e:a8:8b:c2:40 root@hadoop01
The key's randomart image is:
+--[ RSA 2048]----+
|o...o |
|...o . |
|o+ . . |
|+E. + |
|+.+ S |
|++ o |
|Bo |
|*. |
|. |
+-----------------+
Press Enter at every prompt to accept the defaults; a public key (id_rsa.pub) and a private key (id_rsa) are then generated in the .ssh directory under the current user's home directory.
b. Distribute the public key
[root@hadoop01 hadoop]# yum install -y openssh-clients
[root@hadoop01 hadoop]# ssh-copy-id hadoop01
The authenticity of host 'hadoop01 (192.168.216.135)' can't be established.
RSA key fingerprint is bd:5c:85:99:82:b4:b9:9d:92:fa:35:48:63:e1:5c:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop01,192.168.216.135' (RSA) to the list of known hosts.
root@hadoop01's password:
Now try logging into the machine, with "ssh 'hadoop01'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@hadoop01 hadoop]# ssh-copy-id hadoop02
[root@hadoop01 hadoop]# ssh-copy-id hadoop03
Likewise, generate key pairs on hadoop02 and hadoop03, then distribute their public keys to all three machines.
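The per-node steps above can be condensed into one loop, run once on each of the three nodes. This is a sketch: the empty passphrase and host list match this guide, and ssh-copy-id still prompts for each remote password the first time:

```shell
# Generate a key pair if this node does not have one yet, then push the
# public key to every node in the cluster (including this one).
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
for h in hadoop01 hadoop02 hadoop03; do
  ssh-copy-id "$h" || echo "could not reach $h; check /etc/hosts and sshd"
done
```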
Distribute the Hadoop files
1. First, create the directory that will hold Hadoop on the other two machines
[root@hadoop02 ~]# mkdir -p /opt/modules/app
[root@hadoop03 ~]# mkdir -p /opt/modules/app
2. Distribute with scp
The share/doc directory under the Hadoop root holds the Hadoop documentation and is quite large; deleting it before distribution saves disk space and speeds up the copy.
[root@hadoop01 hadoop]# du -sh /opt/modules/app/hadoop/share/doc
[root@hadoop01 hadoop]# rm -rf /opt/modules/app/hadoop/share/doc/
[root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop02:/opt/modules/app
[root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop03:/opt/modules/app
3. Format the NameNode
Run the format on the NameNode machine:
[root@hadoop01 hadoop]# /opt/modules/app/hadoop/bin/hdfs namenode -format
If you need to re-format the NameNode, first delete every file under the existing NameNode and DataNode directories, or errors will follow. Those directories are set by hadoop.tmp.dir in core-site.xml (and, when explicitly configured, by dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml).
Each format generates a new cluster ID by default and writes it into the VERSION files of the NameNode and DataNode (under dfs/name/current and dfs/data/current). If the old directories are not removed before re-formatting, the NameNode's VERSION file ends up with the new cluster ID while the DataNodes keep the old one, and the mismatch causes errors.
Alternatively, pass the old cluster ID to the format command via its -clusterid option.
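A quick way to check for this mismatch is to compare the two clusterID fields directly. This sketch assumes the default layout under the hadoop.tmp.dir configured earlier; adjust DATA_DIR to your own path:

```shell
# Compare the clusterID recorded by the NameNode with the DataNode's copy.
DATA_DIR=${DATA_DIR:-/opt/modules/app/hadoop-2.5.0/data/tmp}
NN_ID=$(grep clusterID "$DATA_DIR/dfs/name/current/VERSION" 2>/dev/null | cut -d= -f2)
DN_ID=$(grep clusterID "$DATA_DIR/dfs/data/current/VERSION" 2>/dev/null | cut -d= -f2)
if [ "$NN_ID" = "$DN_ID" ]; then
  echo "clusterIDs match: $NN_ID"
else
  echo "clusterID mismatch: NameNode=$NN_ID DataNode=$DN_ID"
fi
```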
Start the cluster
[root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-dfs.sh
18/09/11 07:07:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-namenode-hadoop01.out
hadoop03: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop02.out
hadoop01: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop01.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-secondarynamenode-hadoop03.out
18/09/11 07:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@hadoop01 sbin]#
[root@hadoop01 sbin]# jps
3185 Jps
2849 NameNode
2974 DataNode
[root@hadoop02 ~]# jps
2305 Jps
2227 DataNode
[root@hadoop03 ~]# jps
2390 Jps
2312 SecondaryNameNode
2217 DataNode
Start YARN:
[root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop01.out
hadoop02: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop02.out
hadoop03: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop03.out
hadoop01: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop01.out
[root@hadoop01 sbin]# jps
3473 Jps
3329 NodeManager
2849 NameNode
2974 DataNode
[root@hadoop01 sbin]#
[root@hadoop02 ~]# jps
2337 NodeManager
2227 DataNode
2456 Jps
[root@hadoop02 ~]#
[root@hadoop03 ~]# jps
2547 Jps
2312 SecondaryNameNode
2217 DataNode
2428 NodeManager
[root@hadoop03 ~]#
start-yarn.sh launches the ResourceManager on the machine it is run from, but our configuration places it on hadoop02, so start it there explicitly:
[root@hadoop02 ~]# /opt/modules/app/hadoop/sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop02.out
[root@hadoop02 ~]# jps
2337 NodeManager
2227 DataNode
2708 Jps
2484 ResourceManager
[root@hadoop02 ~]#
Start the history server
The MapReduce history service is started on hadoop03 here. Note that mapred-site.xml above points mapreduce.jobhistory.address at hadoop01; the history server must run on whichever host that property names, so adjust one or the other to keep them consistent.
[root@hadoop03 ~]# /opt/modules/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/modules/app/hadoop/logs/mapred-root-historyserver-hadoop03.out
[root@hadoop03 ~]# jps
2312 SecondaryNameNode
2217 DataNode
2602 JobHistoryServer
2428 NodeManager
2639 Jps
[root@hadoop03 ~]#
Configure the hosts file on Windows
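On Windows, add the same hostname mappings used on the cluster so the web UIs below resolve by name; the file path assumes a default Windows install:

```
# C:\Windows\System32\drivers\etc\hosts
192.168.216.135 hadoop01
192.168.216.136 hadoop02
192.168.216.137 hadoop03
```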
View the HDFS web UI:
hadoop01:50070
View the YARN web UI:
hadoop02:8088
Test a job
Here we use the wordcount example that ships with Hadoop to run a test MapReduce job on the cluster.
1. Prepare the MapReduce input file wc.input
[hadoop@bigdata-senior01 modules]$ cat /opt/data/wc.input
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
2. Create the input directory on HDFS
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir /input
3. Upload wc.input to HDFS
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put /opt/data/wc.input /input/wc.input
4. Run the MapReduce demo that ships with Hadoop
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/wc.input /output
5. View the output files
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2016-07-14 16:36 /output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 60 2016-07-14 16:36 /output/part-r-00000
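To sanity-check the job's result, the same counts can be reproduced locally. This awk sketch feeds in the wc.input contents from above and should agree line for line with what `bin/hdfs dfs -cat /output/part-r-00000` prints:

```shell
# Count words locally the way wordcount does: tally every
# whitespace-separated token, then print "word<TAB>count" sorted by word.
printf 'hadoop mapreduce hive\nhbase spark storm\nsqoop hadoop hive\nspark hadoop\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' | sort
# Expected:
# hadoop    3
# hbase     1
# hive      2
# mapreduce 1
# spark     2
# sqoop     1
# storm     1
```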