Hadoop Cluster Setup (Hive, Spark)

Hadoop Cluster HA Environment Setup


  1. Prepare the tools
  • VMware 12 installer, serial number: 5A02H-AU243-TZJ49-GTC7K-3C61N
  • Ubuntu 14.04 image (if a VM is cloned or copied outright, have VMware regenerate its MAC address)
  • hadoop-2.7.1
  • zookeeper-3.4.8
  • 7 virtual machines
  • The firewall must be disabled on every VM; fully distributed mode requires all machines' firewalls off, otherwise the ZooKeeper cluster will not start
  • Networking: use NAT mode, i.e. all VMs share one IP address through the host's VMnet8 virtual adapter. The 7 VMs must sit on the same subnet so they can be driven conveniently from Xshell or SecureCRT; make sure the host can ping all 7 VMs
  • Deployment: fully distributed mode, with two machines as namenodes, two as yarn, and three as slaves.
  • In practice a single yarn machine also works
  • Note: the system clocks of all machines must agree to within 10 minutes, or jobs will fail; see the sketch below for disabling the firewall and syncing the clock
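  • A minimal sketch, assuming Ubuntu 14.04 with ufw installed and internet access (the NTP pool host is only an example); run it on every VM:
# disable the firewall so the ZooKeeper and Hadoop ports are reachable
sudo ufw disable
# install a one-shot time-sync client and sync the clock
sudo apt-get install -y ntpdate
sudo ntpdate cn.pool.ntp.org
# confirm the result
date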
  2. Set each machine's hostname: vim /etc/hostname
user1
user2
user3
user4
user5
user6
user7
  • user1 and user2 are the namenodes, user3 and user4 are the yarn machines, and user5, user6, user7 are the slaves; ZooKeeper is also installed on user5, user6, and user7
  3. Configure the IP addresses: vim /etc/network/interfaces. user1's network settings are:
# The loopback network interface 
auto lo 
iface lo inet loopback 
 
# The primary network interface 
auto eth0 
#iface eth0 inet dhcp 
iface eth0 inet static 
address 192.168.18.11 
netmask 255.255.255.0 
gateway 192.168.18.10 
dns-nameservers 192.168.18.10
  • user2 through user7 take 192.168.18.12 through .17 respectively
  • The training VMs already have their firewalls disabled; also note the resolver line must be written dns-nameservers, not plain dns. To apply the settings without a full reboot, see the sketch below.
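  • A minimal sketch for applying the static address without rebooting, assuming the stock ifupdown tools of Ubuntu 14.04 (addresses are user1's from above):
# bounce the interface so /etc/network/interfaces is re-read
sudo ifdown eth0 && sudo ifup eth0
# confirm the address and that the gateway answers
ip addr show eth0
ping -c 3 192.168.18.10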
  4. Edit the hosts file so the machines can be pinged by hostname: vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.18.11 user1
192.168.18.12 user2
192.168.18.13 user3
192.168.18.14 user4
192.168.18.15 user5
192.168.18.16 user6
192.168.18.17 user7
  • After editing, reboot every node with the reboot command. You must reboot (or at least restart the network service), otherwise the /etc/hosts changes may well not take effect. A quick connectivity check follows.
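  • A minimal sketch for the connectivity check, run from any node:
# ping every host once by name to confirm /etc/hosts took effect
for h in user1 user2 user3 user4 user5 user6 user7; do
    ping -c 1 "$h" > /dev/null && echo "$h ok" || echo "$h FAILED"
done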
  5. Passwordless SSH between the machines
  • On every VM run ssh-keygen -t rsa and press Enter through all the prompts; this generates a public/private key pair
  • From each of user1 through user7 (including user7 itself, 7 copies in total), copy the public key to user7 with:
ssh-copy-id -i /root/.ssh/id_rsa.pub  user7
  • Then run cat /root/.ssh/authorized_keys on user7 and check that the file contains 7 entries
  • Then copy user7's authorized_keys to user1 through user6, overwriting their existing files:
scp /root/.ssh/authorized_keys user1:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user2:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user3:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user4:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user5:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user6:/root/.ssh/authorized_keys
  • Passwordless SSH is now essentially done. At this point, ssh from every node to every other node once, 7×6 = 42 connections in total, to accept each host's key fingerprint up front and head off communication problems later; the loop below automates this.
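  • A minimal sketch that automates the round of first connections; run it on each node in turn (StrictHostKeyChecking=no accepts the fingerprints so nothing prompts later):
for h in user1 user2 user3 user4 user5 user6 user7; do
    ssh -o StrictHostKeyChecking=no "$h" hostname
done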
  6. Install the JDK
  • On user1 create a directory: mkdir /ittest
  • Unpack the JDK tarball into it: tar -zxvf jdk-8u72-linux-x64.tar.gz -C /ittest/
  • Set the environment variables: vim /etc/profile
  • Add the following to /etc/profile:
export JAVA_HOME=/ittest/jdk1.8.0_72/
export JRE_HOME=/ittest/jdk1.8.0_72/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
  • Then run source /etc/profile so the variables take effect immediately
  • Copy user1's JDK directory and profile file to the other six nodes:
scp -r /ittest/  user2:/
scp -r /ittest/  user3:/
scp -r /ittest/  user4:/
scp -r /ittest/  user5:/
scp -r /ittest/  user6:/
scp -r /ittest/  user7:/
scp /etc/profile user2:/etc/profile
scp /etc/profile user3:/etc/profile
scp /etc/profile user4:/etc/profile
scp /etc/profile user5:/etc/profile
scp /etc/profile user6:/etc/profile
scp /etc/profile user7:/etc/profile
  • Then run source /etc/profile on each of the other six nodes
  • On all seven nodes try java, javac, or jps; if each prints its usage text rather than an error, the JDK is correctly installed everywhere. (Hadoop is written in Java, so all of its processes run on the JVM.) The loop below checks every node at once.
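  • A minimal sketch checking the JDK on every node from user1; /etc/profile is sourced explicitly because non-interactive ssh sessions do not read it:
for h in user1 user2 user3 user4 user5 user6 user7; do
    echo "== $h =="
    ssh "$h" 'source /etc/profile; java -version'
done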
  7. Install ZooKeeper
  • Unpack the ZooKeeper tarball: tar -zxvf zookeeper-3.4.8.tar.gz -C /ittest/
  • Go to /ittest/zookeeper-3.4.8/conf
  • Copy the sample config file: cp zoo_sample.cfg zoo.cfg
  • In zoo.cfg change dataDir=/ittest/zookeeper-3.4.8/tmp and append to the end:
server.1=user5:2888:3888
server.2=user6:2888:3888
server.3=user7:2888:3888
  • Each server.id=host:port:port line identifies one ZooKeeper server; every machine that is part of the cluster learns about the other ensemble members from these lines.
  • Then create the tmp directory under /ittest/zookeeper-3.4.8
  • In the directory that dataDir points to, create a file named myid containing a single line with the server's own id; for example, server "1" writes "1" into its myid file. The id must be unique within the ensemble and between 1 and 255. In each server line, the first port is the one followers use to connect to the leader, and the second is used for leader election. So every machine here uses three ports: clientPort 2181, plus 2888 and 3888.
  • Then copy the ZooKeeper directory to user5, user6, and user7:
scp -r  /ittest/zookeeper-3.4.8 user5:/ittest/
scp -r  /ittest/zookeeper-3.4.8 user6:/ittest/
scp -r  /ittest/zookeeper-3.4.8 user7:/ittest/
  • Edit /ittest/zookeeper-3.4.8/tmp/myid on each node: 1 on user5, 2 on user6, 3 on user7
  • Start ZooKeeper on user5, user6, and user7 with /ittest/zookeeper-3.4.8/bin/zkServer.sh start
  • Once all three are up, check each node with /ittest/zookeeper-3.4.8/bin/zkServer.sh status; normally you get two followers and one leader. The sketch below drives all three nodes from one shell.
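  • A minimal sketch that writes the myid files and then starts and checks all three servers from user1 (/etc/profile is sourced so zkServer.sh can find java in the non-interactive session):
# assign ids 1..3 to user5..user7 and start each server
id=1
for h in user5 user6 user7; do
    ssh "$h" "source /etc/profile; echo $id > /ittest/zookeeper-3.4.8/tmp/myid && /ittest/zookeeper-3.4.8/bin/zkServer.sh start"
    id=$((id + 1))
done
# only after all three are up, ask each server for its role
for h in user5 user6 user7; do
    echo "== $h =="
    ssh "$h" "source /etc/profile; /ittest/zookeeper-3.4.8/bin/zkServer.sh status"
done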

  8. Install Hadoop (for now set up a single yarn machine on user3; the second one is configured on user4 later)

  • Unpack the Hadoop tarball into /ittest/: tar -zxvf hadoop-2.7.1.tar.gz -C /ittest/
  • cd into /ittest/hadoop-2.7.1/etc/hadoop and configure these files:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml (create it from the shipped mapred-site.xml.template)
yarn-site.xml
slaves

  • At the end of hadoop-env.sh set the JDK path:
export JAVA_HOME=/ittest/jdk1.8.0_72
  • core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
       <!-- Set the HDFS nameservice to ns1 -->
       <property>
            <name>fs.defaultFS</name>
            <value>hdfs://ns1</value>
       </property>
       <!-- Hadoop temporary directory -->
       <property>
            <name>hadoop.tmp.dir</name>
            <value>/ittest/hadoop-2.7.1/tmp</value>
       </property>
       <!-- ZooKeeper quorum addresses -->
       <property>
           <name>ha.zookeeper.quorum</name>
           <value>user5:2181,user6:2181,user7:2181</value>
       </property>
</configuration>
  • hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <!-- HDFS nameservice ns1; must match core-site.xml -->
        <property>
             <name>dfs.nameservices</name>
             <value>ns1</value>
        </property> 
        <!-- ns1 has two NameNodes: nn1 and nn2 -->
        <property>
             <name>dfs.ha.namenodes.ns1</name>
             <value>nn1,nn2</value>
        </property>
         <!-- RPC address of nn1 -->
         <property>
              <name>dfs.namenode.rpc-address.ns1.nn1</name>
              <value>user1:9000</value>
        </property>
        <!-- HTTP address of nn1 -->
        <property>
              <name>dfs.namenode.http-address.ns1.nn1</name>
              <value>user1:50070</value>
        </property>
        <!-- RPC address of nn2 -->
         <property>
              <name>dfs.namenode.rpc-address.ns1.nn2</name>
              <value>user2:9000</value>
        </property>
        <!-- HTTP address of nn2 -->
        <property>
              <name>dfs.namenode.http-address.ns1.nn2</name>
              <value>user2:50070</value>
        </property>
        <!-- Where the NameNodes' metadata is stored on the JournalNodes -->
        <property>
              <name>dfs.namenode.shared.edits.dir</name>
              <value>qjournal://user5:8485;user6:8485;user7:8485/ns1</value>
        </property>
        <!-- Where each JournalNode keeps its data on local disk -->
        <property>
              <name>dfs.journalnode.edits.dir</name>
              <value>/ittest/hadoop-2.7.1/journal</value>
        </property>
        <!-- Enable automatic NameNode failover -->
        <property>
              <name>dfs.ha.automatic-failover.enabled</name>
              <value>true</value>
        </property>
        <!-- Failover proxy implementation -->
        <property>
              <name>dfs.client.failover.proxy.provider.ns1</name>
              <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Fencing method -->
        <property>
              <name>dfs.ha.fencing.methods</name>
              <value>sshfence</value>
        </property>
        <!-- sshfence requires passwordless SSH -->
        <property>
              <name>dfs.ha.fencing.ssh.private-key-files</name>
              <value>/root/.ssh/id_rsa</value>
        </property>
</configuration>
  • mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
           <!-- Run MapReduce on YARN -->
           <property>
                 <name>mapreduce.framework.name</name>
                 <value>yarn</value>
           </property>
</configuration>


  • yarn-site.xml:
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
           <!-- ResourceManager address -->
           <property>
               <name>yarn.resourcemanager.hostname</name>
               <value>user3</value>
           </property>
           <!-- Load the shuffle service in the NodeManagers -->
           <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
          </property>
</configuration>
  • slaves file:
user5
user6
user7
  • scp the configured Hadoop directory to the other nodes:
scp -r /ittest/hadoop-2.7.1/ user2:/ittest/
scp -r /ittest/hadoop-2.7.1/ user3:/ittest/
scp -r /ittest/hadoop-2.7.1/ user4:/ittest/
scp -r /ittest/hadoop-2.7.1/ user5:/ittest/
scp -r /ittest/hadoop-2.7.1/ user6:/ittest/
scp -r /ittest/hadoop-2.7.1/ user7:/ittest/
  9. Start the ZooKeeper cluster (on user5, user6, and user7; skip this if it is already running from step 7):
cd /ittest/zookeeper-3.4.8/bin/
            ./zkServer.sh start
  • Check the status (only after all three nodes have been started):
./zkServer.sh status
  • (One leader and two followers is normal; if it errors out, just reconfigure from scratch)
  10. Start the JournalNodes (run on user1; this starts them on all the slave nodes)
  • Without this step the two NameNodes cannot communicate:
cd /ittest/hadoop-2.7.1
sbin/hadoop-daemons.sh start journalnode
  • (Run jps on user5, user6, and user7 to confirm a JournalNode process has appeared)
  11. Format HDFS
  • On user1 run:
hadoop namenode -format
  • Formatting creates files under the directory set by hadoop.tmp.dir in core-site.xml,
  • which here is /ittest/hadoop-2.7.1/tmp; then copy /ittest/hadoop-2.7.1/tmp into /ittest/hadoop-2.7.1/ on user2:
scp -r tmp/ user2:/ittest/hadoop-2.7.1/
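  • Instead of copying tmp/ by hand, Hadoop 2.x HA also supports bootstrapping the standby from the active; a minimal sketch, run on user2 after the freshly formatted NameNode on user1 has been started (sbin/hadoop-daemon.sh start namenode):
# pull the formatted namespace from the running NameNode on user1
/ittest/hadoop-2.7.1/bin/hdfs namenode -bootstrapStandby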
  12. Format ZK (run on user1 only):
hdfs zkfc -formatZK

  • After it finishes, test from one of the ZooKeeper machines (user5, user6, or user7):

cd /ittest/zookeeper-3.4.8/bin
./zkCli.sh
  • Note: steps 11 and 12 must be executed exactly once. Running either a second time will leave the datanodes unable to start later; if that happens, delete the hadoop directory on user2 through user7, delete everything under the tmp directory of user1's hadoop installation, scp -r user1's hadoop directory back out to user2 through user7, and format again.
  • Note: if both NameNodes come up in standby mode, you can force one active with hdfs haadmin -transitionToActive --forcemanual nn1; the sketch below shows how to inspect the HA state first.
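  • A minimal sketch for querying the HA state of both NameNodes before forcing anything:
/ittest/hadoop-2.7.1/bin/hdfs haadmin -getServiceState nn1
/ittest/hadoop-2.7.1/bin/hdfs haadmin -getServiceState nn2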
  13. Start HDFS (on user1):
    sbin/start-dfs.sh

  14. Start YARN (on both user1 and user3):
    sbin/start-yarn.sh

  15. Hadoop 2.7.1 is now configured; you can open it in a browser:

http://192.168.18.11:50070
NameNode 'user1:9000' (active)
http://192.168.18.12:50070
NameNode 'user2:9000' (standby)
  • At this point jps on user1 and user2 shows these processes:
1346 NameNode
1480 DFSZKFailoverController
  • If the numbers under Summary on the web page are all 0, the datanodes most likely did not start. If both pages show standby, it is probably a ZooKeeper communication problem: check /etc/hosts, and check /ittest/zookeeper-3.4.8/bin/zkServer.sh status on user5, user6, and user7; every node reporting follower or leader is the only healthy outcome.

  • At this point jps on user5 through user7 shows:

1216 JournalNode
3633 Jps
1365 DataNode
1126 QuorumPeerMain
2988 NodeManager
  • If you now kill user1's NameNode with kill -9 <pid>, you can watch user2 switch to active!
  • Restart the killed NameNode with sbin/hadoop-daemon.sh start namenode; note that this is hadoop-daemon.sh, not hadoop-daemons.sh!
  16. Verify HDFS HA
  • First upload a file to HDFS:
    /ittest/hadoop-2.7.1/bin/hadoop fs -put /etc/profile /profile
  • List the HDFS root: /ittest/hadoop-2.7.1/bin/hadoop fs -ls /
  • Then kill the active NameNode:
    kill -9 <pid of NN>
  • Open http://192.168.18.12:50070 in a browser; it shows NameNode 'user2:9000' (active)
  • user2's NameNode has now become active
  • Run the command again: /ittest/hadoop-2.7.1/bin/hadoop fs -ls /
    -rw-r--r-- 3 root supergroup 1926 2014-02-06 15:36 /profile
    The file uploaded earlier is still there!!!
    Manually restart the NameNode that was killed:
    /ittest/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode
    Then open in a browser: http://192.168.18.11:50070
    NameNode 'user1:9000' (standby)
  17. Verify YARN by running the WordCount demo that ships with Hadoop:
    /ittest/hadoop-2.7.1/bin/hadoop jar /ittest/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /copyprofile.sh /out4
  • Watch the job at http://192.168.18.13:8088/cluster; a quick way to inspect the output follows.
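  • A minimal sketch for inspecting the WordCount result (part-r-00000 is the standard name of the first reducer's output file):
/ittest/hadoop-2.7.1/bin/hadoop fs -ls /out4
/ittest/hadoop-2.7.1/bin/hadoop fs -cat /out4/part-r-00000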
  18. Set the Hadoop environment variables:
    Edit the /etc/profile file on user1:
export JAVA_HOME=/ittest/jdk1.8.0_72
export JRE_HOME=/ittest/jdk1.8.0_72/jre
export HADOOP_HOME=/ittest/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
  • Then source /etc/profile
  • Then copy user1's /etc/profile file to the other nodes:
scp /etc/profile user2:/etc/profile
scp /etc/profile user3:/etc/profile
scp /etc/profile user4:/etc/profile
scp /etc/profile user5:/etc/profile
scp /etc/profile user6:/etc/profile
scp /etc/profile user7:/etc/profile
  • Then run source /etc/profile on all the other nodes
  • Verify: hadoop fs -ls / should print the directory listing.
  19. Make user4 the second yarn machine. Edit yarn-site.xml under /ittest/hadoop-2.7.1/etc/hadoop on user1, keeping a copy of the original:
    mv yarn-site.xml yarn.site.xml.noHA
    cp yarn.site.xml.noHA yarn-site.xml
  • Set yarn-site.xml to:
<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>


  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarncluster</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>user3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>user4</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>user3:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>user4:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>user5:2181,user6:2181,user7:2181</value>
  </property>

  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>

  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>
  • Note: if your copies of the config files above carry Chinese comments, strip them all out to avoid errors caused by encoding problems.
  • Now scp the new yarn-site.xml to all the other nodes:
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user2:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user3:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user4:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user5:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user6:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user7:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
  • Run stop-yarn.sh on user1 and user3
  • Then run start-yarn.sh on user1, user3, and user4
  • With that, the Hadoop 2.x HA installation is complete; a sketch for verifying the ResourceManager HA state follows.
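  • A minimal sketch: after the restart, query each ResourceManager's HA state; expect one active and one standby:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2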

Hive Deployment (added later; kept brief)

  • Hive must be deployed on top of a working Hadoop cluster
  1. Install MySQL
  • If the cluster can reach the internet directly, install online: yum on CentOS, sudo apt-get install on Ubuntu;

  • If the machines cannot reach the internet, download a package from the MySQL website; it comes as source or as a binary tarball, and the binary tarball is used here;

  • Download the MySQL package and upload it to the namenode (I chose to put MySQL on the namenode so that Hive reads its metadata from MySQL right there); unpack it into the target directory with tar -zxvf;

  • MySQL needs dependency libraries that a stock Linux install may lack; search for whatever package turns out to be missing, e.g. cmake and libaio;

  • After unpacking, edit /etc/profile and add the environment variables;

  • Then start MySQL:

service mysqld start
  • Log in as root:
mysql -uroot -p 
  • (Ways to skip the password prompt are widely documented; not repeated here)
  • Note: in MySQL 5.7 and later, changing the password with:
update mysql.user set password=password('root') where user ='root' 

may fail with: ERROR 1054 (42S22) Unknown column 'password' in 'field list',
because the column names in the mysql.user table changed in MySQL 5.7; use this command instead:

update mysql.user set authentication_string=password('root') where user='root'

  • Most important of all: the root user's privileges must be granted!!! (A sketch for checking remote access follows the grants.)
mysql > grant all privileges on *.* to 'root'@'%' with grant option;
mysql > grant all privileges on *.* to 'root'@'%'  identified by '123';
mysql > flush privileges;
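  • A minimal sketch: confirm root can connect over the network rather than only the local socket, since that is how Hive will connect (the host name and password are the ones used in this walkthrough; adjust to yours):
mysql -h user1 -uroot -p123 -e "SELECT user, host FROM mysql.user;"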
  2. Deploy Hive
  • Unpack the Hive tarball into the chosen install directory, then add the environment variables to /etc/profile;

  • Then edit hive-env.sh and hive-site.xml. My config files follow (copied from somewhere I no longer remember); they look long, but only a handful of parameters actually need changing:

  • hive-env.sh

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE


# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in 
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm stared by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions. 
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be 
# appropriate for hive server (hwi etc).


# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.1

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf

# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib


  • Only a few hive-site.xml settings really matter; the rest are just along for the ride and can stay at their defaults:
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://172.16.244.235:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
  
   <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
    <description>password to use against metastore database</description>
  </property>
  3. Copy the JDBC driver package
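  • Hive reaches MySQL through the Connector/J driver, so the jar must land in Hive's lib directory. A minimal sketch, assuming a downloaded Connector/J jar in the current directory and the Hive path used in hive-env.sh above:
cp mysql-connector-java-*.jar /usr/local/hive/lib/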
  4. Start Hive and run a test.
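  • A minimal smoke-test sketch; schematool ships with modern Hive releases and creates the metastore tables in MySQL (older releases create them implicitly on first use):
schematool -dbType mysql -initSchema
hive -e "show databases; create table t1 (id int); show tables; drop table t1;"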

Postscript

  • The Hive section is deliberately brief; the basic install is not hard, but MySQL has plenty of pitfalls, and Hadoop deployment has far more (enough to drive you to despair). Learn to read the logs; it matters. A programmer's way forward is to fill in, one by one, the holes others fell into and the ones you hit yourself, leveling rough ground into a road.