Experiment Objectives
- Understand the role of Hive in the Hadoop ecosystem.
- Become familiar with Hive's DDL commands and DML operations.
- Distinguish the concepts of data warehouse and database.
Experiment Platform
- Operating system: Ubuntu 16.04
- Hadoop version: 2.6.0
- JDK version: 1.8
- IDE: Eclipse
- Hive version: 2.3.2 (the archive used in the installation steps below)
Experiment Content and Requirements
Installing Hive (start the Hadoop and MySQL services before installing)
- Place the Hive archive in the Home folder.
- Right-click to open a terminal and extract Hive to /usr/local:
sudo tar -zxvf apache-hive-2.3.2-bin.tar.gz -C /usr/local
- Rename the directory for easier use later:
sudo mv /usr/local/apache-hive-2.3.2-bin/ /usr/local/hive
- Take ownership of the directory (replace tiny with your username):
sudo chown -R tiny /usr/local/hive/
- Copy the MySQL JDBC driver into /usr/local/hive/lib (this assumes the jar is in the current directory):
cp mysql-connector-java-5.1.39-bin.jar /usr/local/hive/lib
- Set the environment variables:
sudo vim /etc/profile
- Append the following at the end of the file:
# set hive path
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
- Apply the changes:
source /etc/profile
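As a quick sanity check (a step added here, not in the original), confirm the hive command is now on the PATH:
hive --version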
- Edit Hive's configuration files:
- Enter /usr/local/hive/conf/:
cd /usr/local/hive/conf
- Copy hive-env.sh.template and rename it hive-env.sh:
cp hive-env.sh.template hive-env.sh
- Edit its contents:
vim hive-env.sh
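The manual does not say what to put here; a minimal sketch, assuming Hadoop is installed at /usr/local/hadoop (the path used elsewhere in this manual):
# hive-env.sh — assumed minimal settings
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf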
- Create a new configuration file:
vim hive-site.xml
- Add the following content, filling in your MySQL username and password (note that & in the JDBC URL must be escaped as &amp; inside XML):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>your MySQL username</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>your MySQL password</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
- Initialize the Hive metastore.
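The manual omits the command itself; for Hive 2.x with a MySQL metastore, the schema is initialized with the bundled schematool:
schematool -dbType mysql -initSchema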
- Enter the Hive shell (type quit; to exit):
hive
Hive Experiments
I. DDL commands
- Database commands
- Create a simple database:
CREATE DATABASE testdb;
- List databases:
SHOW DATABASES;
- Filter databases with a regular expression:
SHOW DATABASES LIKE 't.*';
- Create a database while setting its storage path (an HDFS location):
CREATE DATABASE testdb2 LOCATION '/user/mydb';
- Create a database with a comment:
CREATE DATABASE testdb3 COMMENT 'This is a test database3';
- Show a database's comment and storage path:
DESCRIBE DATABASE testdb3;
- Create a database with key-value properties:
CREATE DATABASE testdb4 WITH DBPROPERTIES('creator'='tiny','date'='2016-12-21');
- Show the database properties:
DESCRIBE DATABASE EXTENDED testdb4;
- Switch to a database:
USE testdb4;
- Drop a database (CASCADE removes it even if it still contains tables; without it, dropping a non-empty database fails):
DROP DATABASE IF EXISTS testdb3 CASCADE;
- Table commands
- Create an ordinary (managed) table:
CREATE TABLE IF NOT EXISTS test_1 (id INT, name STRING, address STRING);
- Create an external table:
CREATE EXTERNAL TABLE external_table (dummy STRING) LOCATION '/user/tom/external_table';
- Create a partitioned table:
CREATE TABLE partition_table (id INT, name STRING, city STRING) PARTITIONED BY (pt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
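Each partition value becomes its own subdirectory under the table's HDFS directory (e.g. .../partition_table/pt=20161221). Once data has been loaded, the partitions can be listed with:
SHOW PARTITIONS partition_table;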
- Create a table with the same schema as an existing table:
CREATE TABLE test_2 LIKE test_1;
- Add columns to a table:
ALTER TABLE test_1 ADD COLUMNS (telephone STRING, qq STRING, birthday DATE);
- Rename a column:
ALTER TABLE test_1 CHANGE address addr STRING;
- Rename a table:
ALTER TABLE test_1 RENAME TO test_table;
II. DML commands
- Loading data
Suppose a table has been created with the following statement:
CREATE TABLE login (uid BIGINT, ip STRING) PARTITIONED BY (pt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Exit the Hive shell and create login.txt:
11151007001,192.168.1.1
11151007002,192.168.1.2
Create login2.txt:
11151007003,192.168.1.3
11151007004,192.168.1.4
- Load local data into the Hive table (reopen the Hive shell first and select the database again with USE testdb4;):
LOAD DATA LOCAL INPATH '/usr/local/hadoop/login.txt' OVERWRITE INTO TABLE login PARTITION (pt='20161221');
SELECT * FROM login;
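Before the next step, upload login2.txt to HDFS (this sketch assumes it was created in /usr/local/hadoop, like login.txt):
hdfs dfs -put /usr/local/hadoop/login2.txt /tmp/login2.txt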
- Load a file from HDFS (note that LOAD DATA INPATH moves the source file into the table's directory rather than copying it):
LOAD DATA INPATH '/tmp/login2.txt' INTO TABLE login PARTITION (pt='20161221');
SELECT * FROM login;
- Inserting query results into tables
- Single-table insert:
CREATE TABLE login2 (uid BIGINT);
INSERT OVERWRITE TABLE login2 SELECT DISTINCT uid FROM login;
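To verify the insert, a check added here in the spirit of the earlier steps:
SELECT * FROM login2;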
- Multi-table insert (the FROM-first form scans the source table once and feeds several INSERT clauses):
CREATE TABLE login_ip (ip STRING);
CREATE TABLE login_uid (uid BIGINT);
FROM login
INSERT OVERWRITE TABLE login_uid SELECT uid
INSERT OVERWRITE TABLE login_ip SELECT ip;
- Writing query results to the filesystem
FROM login
INSERT OVERWRITE LOCAL DIRECTORY '/usr/local/hadoop/login' SELECT *
INSERT OVERWRITE DIRECTORY '/tmp/ip' SELECT ip;
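The results land as plain files under those directories; one way to inspect them (the exact file names, typically 000000_0, depend on the job):
cat /usr/local/hadoop/login/*
hdfs dfs -cat /tmp/ip/*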
III. Hive JDBC
- Create a new MapReduce project.
- Right-click the project, choose Properties, and import the required external jars into the project.
- Create userinfo.txt with the following content (fields separated by a Tab):
1	xiaoping
2	xiaoxue
3	qingqing
4	wangwu
5	zhangsan
6	lisi
- Start the remote service (the old hiveserver service was removed in Hive 2.x, so use HiveServer2):
hive --service hiveserver2
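Optionally, confirm HiveServer2 is accepting connections using beeline, which ships with Hive:
beeline -u jdbc:hive2://localhost:10000/default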
- Run the following code on the Java side. The original targeted the HiveServer1 driver (org.apache.hadoop.hive.jdbc.HiveDriver, jdbc:hive://...), which no longer ships with Hive 2.x; the version below uses the HiveServer2 driver and URL, and calls execute() for statements that return no result set:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.log4j.Logger;

public class HiveJdbcClient {
    // HiveServer2 driver and URL (Hive 2.x)
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://localhost:10000/default";
    private static String user = "";
    private static String password = "";
    private static String sql = "";
    private static ResultSet res;
    private static final Logger log = Logger.getLogger(HiveJdbcClient.class);

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
            Connection conn = DriverManager.getConnection(url, user, password);
            Statement stmt = conn.createStatement();
            String tableName = "testHiveDriverTable";

            // DDL and LOAD statements return no result set, so use execute(), not executeQuery()
            sql = "drop table if exists " + tableName;
            stmt.execute(sql);
            sql = "create table " + tableName
                    + " (key int, value string) row format delimited fields terminated by '\t'";
            stmt.execute(sql);

            sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of \"show tables\":");
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of \"describe table\":");
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // Load the tab-separated file created earlier
            String filepath = "/usr/local/hadoop/userinfo.txt";
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            System.out.println("Running: " + sql);
            stmt.execute(sql);

            sql = "select * from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getInt(1) + "\t" + res.getString(2));
            }

            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of the regular hive query:");
            while (res.next()) {
                System.out.println(res.getString(1));
            }

            conn.close();
            conn = null;
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            log.error(driverName + " not found!", e);
            System.exit(1);
        } catch (SQLException e) {
            e.printStackTrace();
            log.error("Connection error!", e);
            System.exit(1);
        }
    }
}
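If you prefer the command line to Eclipse, a sketch of compiling and running the client (the classpath wildcards assume the install locations used above and that all needed jars are present there):
javac -cp "/usr/local/hive/lib/*" HiveJdbcClient.java
java -cp ".:/usr/local/hive/lib/*:/usr/local/hadoop/share/hadoop/common/*" HiveJdbcClient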