Introduction
APACHE HIVE TM
Hive is a data warehouse that runs on top of Hadoop. It maps structured data files to database tables and provides a simple SQL-like query language, called HQL, whose statements are translated into MapReduce jobs for execution. This makes it convenient to query and analyze data with SQL, and it is best suited to data that changes infrequently. Under the hood, Hive can read files stored in HDFS or tables stored in HBase.
Recommended reading: What are the differences between HBase and Hive, and which scenarios does each suit?
Author: yuan daisy
Link: https://www.zhihu.com/question/21677041/answer/78289309
Source: Zhihu
Copyright belongs to the author; contact the author for permission before reposting.
- Tables in Hive are purely logical: only the table definition (that is, the metadata) exists. Hive stores no data itself; it relies entirely on HDFS and MapReduce. This lets it map structured data files to database tables, provide full SQL query support, and translate SQL statements into MapReduce jobs. HBase tables, by contrast, are physical tables, well suited to unstructured data.
- Hive processes data with MapReduce, which works in a row-oriented fashion; HBase works column-oriented rather than row-oriented, which suits random access over massive datasets.
- HBase tables are sparsely stored, so users can define different columns for each row; Hive tables are dense: the schema fixes the columns, and every row stores that fixed set of columns.
- Hive analyzes data with Hadoop, and Hadoop is a batch system, so low latency cannot be guaranteed; HBase is a near-real-time system that supports real-time queries.
- Hive offers no row-level updates; it suits batch processing of large append-only datasets (such as logs). Queries against HBase, on the other hand, do support row-level updates.
- Hive provides a fairly complete SQL implementation and is typically used for mining and analysis of historical data. HBase is a poor fit for applications with joins, multi-level indexes, or complex table relationships.
Environment setup
Environment description
Two machines are used. The main host runs the Hive server (172.16.252.128, host: master); it is also hadoop-master and runs MySQL.
The Hive client (172.16.252.128, host: slave02) runs on the other machine, which is also a hadoop-slave.
Environment dependencies
MySQL and Hadoop must be installed beforehand.
See my earlier posts for installation guides:
[Installing MySQL 5.6.29 from RPM on Linux (CentOS 7)](http://www.lxweimin.com/p/e23d22022c53)
[Setting up a Hadoop 2.7.3 cluster on CentOS 7](http://www.lxweimin.com/p/26e857d7aca8)
Installation and testing
Server (master)
- Download the latest Hive binary package, apache-hive-2.1.0-bin.tar.gz, from the official site.
Extracting it completes the installation; I put it under the /data directory
$ tar -zxvf /data/apache-hive-2.1.0-bin.tar.gz
- Configure the environment variables
$ vi /etc/profile
#Hive Env
HIVE_HOME=/data/apache-hive-2.1.0-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH
$ source /etc/profile
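The profile lines above can be sanity-checked with a short shell sketch (the install path /data/apache-hive-2.1.0-bin is the one assumed throughout this guide; substitute your own):

```shell
# Sketch: set the Hive environment variables and verify PATH picked them up.
# /data/apache-hive-2.1.0-bin is the install path assumed in this guide.
HIVE_HOME=/data/apache-hive-2.1.0-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH

# Confirm the bin directory really landed on PATH
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "hive bin on PATH" ;;
  *)                    echo "hive bin missing" ;;
esac
# prints: hive bin on PATH
```

After `source /etc/profile`, `which hive` should resolve to $HIVE_HOME/bin/hive.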
- Log in to MySQL and create a hive user to back Hive's metastore (the hive database itself is created automatically via the JDBC URL below)
$ mysql -uroot -p
mysql> CREATE USER 'hive' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'master' WITH GRANT OPTION;
mysql> flush privileges;
mysql> exit;
$ mysql -h master -uhive
mysql> set password = password('hive');
- Download the MySQL JDBC connector and copy it into Hive's lib directory
mysql-connector-java-5.1.40.tar.gz
tar -zvxf mysql-connector-java-5.1.40.tar.gz
cd mysql-connector-java-5.1.40
cp mysql-connector-java-5.1.40-bin.jar $HIVE_HOME/lib/
- Edit the Hive configuration file
$ cp /data/apache-hive-2.1.0-bin/conf/hive-default.xml.template /data/apache-hive-2.1.0-bin/conf/hive-site.xml
$ vi /data/apache-hive-2.1.0-bin/conf/hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!-- MySQL JDBC driver class -->
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- MySQL connection URL -->
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <!-- MySQL username -->
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <!-- MySQL password -->
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
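As a quick cross-check that the four properties are consistent before restarting anything, they can be written to a scratch file and counted from the shell. This is only a sketch; the host name master and the hive/hive credentials are the values chosen in the steps above:

```shell
# Sketch: write the four metastore connection properties to a scratch file
# and verify every <property> element is closed. The host (master) and the
# hive/hive credentials are the values used earlier in this guide.
snippet=/tmp/hive-metastore-props.xml
cat > "$snippet" <<'EOF'
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>
EOF
grep -c '<property>' "$snippet"    # prints: 4
grep -c '</property>' "$snippet"   # prints: 4
```

An unbalanced count here is exactly the kind of copy-paste error (a duplicated or missing tag) that makes hive-site.xml fail to parse.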
Client (slave02)
Steps one and two are the same as on the server.
Step three: edit the configuration file
$ cp /data/apache-hive-2.1.0-bin/conf/hive-default.xml.template /data/apache-hive-2.1.0-bin/conf/hive-site.xml
$ vi /data/apache-hive-2.1.0-bin/conf/hive-site.xml
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
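Before starting the client, it can help to confirm that the metastore port is reachable from this machine. A hedged sketch using bash's built-in /dev/tcp (the host master and port 9083 come from the property above):

```shell
# Sketch: probe a TCP port using bash's /dev/tcp pseudo-device.
# Host and port (master:9083) come from hive.metastore.uris above.
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
check_port localhost 9083   # on the client machine use: check_port master 9083
```

If the port is unreachable, the metastore service is not running on the server or a firewall is blocking it.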
Startup (if you hit errors, see the troubleshooting section below)
- Start the server
[root@master bin]# cd $HIVE_HOME/bin/
[root@master bin]# ./hive --service metastore &
[1] 32436
[root@master bin]# which: no hbase in (/data/spark-2.0.1-bin-hadoop2.7/bin:/data/spark-2.0.1-bin-hadoop2.7/sbin:/data/soft/scala-2.11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.7.0_79/bin:/usr/java/jdk1.7.0_79/jre/bin:/data/hadoop-2.7.3/bin:/data/hadoop-2.7.3/sbin:/date/apache-hive-2.1.0-bin/bin:/root/bin)
Starting Hive Metastore Server
- Log in to Hive on the server
[root@master bin]# ./hive
which: no hbase in (/data/spark-2.0.1-bin-hadoop2.7/bin:/data/spark-2.0.1-bin-hadoop2.7/sbin:/data/soft/scala-2.11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.7.0_79/bin:/usr/java/jdk1.7.0_79/jre/bin:/data/hadoop-2.7.3/bin:/data/hadoop-2.7.3/sbin:/date/apache-hive-2.1.0-bin/bin:/root/bin)
Logging initialized using configuration in jar:file:/data/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive> show databases;
OK
default
wentao
Time taken: 0.85 seconds, Fetched: 2 row(s)
- Log in to Hive from the client
[root@slave02 bin]# ./hive
which: no hbase in (/data/spark-2.0.1-bin-hadoop2.7/bin:/data/spark-2.0.1-bin-hadoop2.7/sbin:/data/soft/scala-2.11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.7.0_79/bin:/usr/java/jdk1.7.0_79/jre/bin:/data/hadoop-2.7.3/bin:/data/hadoop-2.7.3/sbin:/root/bin)
Logging initialized using configuration in jar:file:/data/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive> show databases;
OK
default
wentao
Time taken: 0.933 seconds, Fetched: 2 row(s)
hive>
The two sides see the same data. Now log in to MySQL on master and inspect the hive database:
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive |
+---------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| COMPACTION_QUEUE |
| COMPLETED_COMPACTIONS |
| COMPLETED_TXN_COMPONENTS |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| DELEGATION_TOKENS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| HIVE_LOCKS |
| IDXS |
| INDEX_PARAMS |
| KEY_CONSTRAINTS |
| MASTER_KEYS |
| NEXT_COMPACTION_QUEUE_ID |
| NEXT_LOCK_ID |
| NEXT_TXN_ID |
| NOTIFICATION_LOG |
| NOTIFICATION_SEQUENCE |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_EVENTS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_COL_STATS |
| PART_PRIVS |
| ROLES |
| ROLE_MAP |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WRITE_SET |
+---------------------------+
57 rows in set (0.00 sec)
mysql>
Hive's metadata (database, table, and partition definitions) is stored in these tables; the table data itself remains in HDFS.
Errors and fixes
- Multiple SLF4J bindings
```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
```
Fix:
The jars above both bind a Logger class; delete the older one.
``rm -rf /data/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar``
- Hive cannot start because Hadoop is in safe mode
```
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1327)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3895)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:984)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:578)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
```
Fix:
Force Hadoop out of safe mode (the error message itself suggests the equivalent `hdfs dfsadmin -safemode leave`)
``$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave``
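If you prefer to check the state before forcing it off, the status string can be inspected first. A sketch of that decision (the real commands are shown in comments; the status is mocked here so the snippet runs stand-alone):

```shell
# Sketch: decide whether to leave safe mode based on the reported status.
# In practice: status=$(hdfs dfsadmin -safemode get)
status="Safe mode is ON"    # mocked value for illustration
case "$status" in
  *"is ON"*) echo "leaving safe mode" ;;   # in practice: hdfs dfsadmin -safemode leave
  *)         echo "safe mode already off" ;;
esac
# prints: leaving safe mode
```

Note the warning in the stack trace above: if the NameNode is in safe mode because resources are low, it will re-enter safe mode unless you free up space first.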
- The ${system:java.io.tmpdir} placeholder cannot be resolved
```
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:631)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 12 more
```
Fix:
Pin `java.io.tmpdir` to a fixed directory
```
$ mkdir $HIVE_HOME/tmpdir
$ vi $HIVE_HOME/conf/hive-site.xml
Replace every value containing ${system:java.io.tmpdir} with $HIVE_HOME/tmpdir (written out as the actual path)
```
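The hand edit above can also be scripted with sed. A sketch on a throwaway copy rather than the real config file; the install path is this guide's assumed location, and note that hive-site.xml does not expand shell variables, so a literal path must be substituted:

```shell
# Sketch: replace the ${system:java.io.tmpdir} placeholder with a fixed,
# literal path. The path below is this guide's assumed install location.
tmpdir=/data/apache-hive-2.1.0-bin/tmpdir
demo=/tmp/hive-site-demo.xml
printf '<value>${system:java.io.tmpdir}/${system:user.name}</value>\n' > "$demo"
sed -i "s|\${system:java.io.tmpdir}|$tmpdir|g" "$demo"
cat "$demo"
# prints: <value>/data/apache-hive-2.1.0-bin/tmpdir/${system:user.name}</value>
```

To apply the same substitution to the real file, point `$demo` at $HIVE_HOME/conf/hive-site.xml (and keep a backup first).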
**Note: throughout this guide, placeholders such as $HIVE_HOME stand for your own installation directory.**
References
Hive installation and configuration on a Hadoop cluster
hive shell not opening when I have hive-site.xml
Hadoop: leaving "Name node is in safe mode"
SLF4J: Class path contains multiple SLF4J bindings
What are the differences between HBase and Hive, and which scenarios does each suit?