Hadoop Lab: Installing and Experimenting with Hive

Lab Objectives

  1. Understand Hive's role in the Hadoop architecture.
  2. Become familiar with Hive's DDL commands and DML operations.
  3. Distinguish between the concepts of a data warehouse and a database.

Lab Environment

  • Operating system: Ubuntu 16.04
  • Hadoop version: 2.6.0
  • JDK version: 1.8
  • IDE: Eclipse
  • Hive version: 2.3.2

Lab Content and Requirements

Installing Hive (start the Hadoop and MySQL services before installing)

  1. Place the Hive tarball in the Home folder

  2. Right-click to open a terminal, then extract Hive to /usr/local

    sudo tar -zxvf apache-hive-2.3.2-bin.tar.gz -C /usr/local

  3. Rename the directory to simplify later steps

    sudo mv /usr/local/apache-hive-2.3.2-bin/ /usr/local/hive

  4. Take ownership of the directory (replace tiny with your username)

    sudo chown -R tiny /usr/local/hive/

  5. Copy the MySQL JDBC driver into /usr/local/hive/lib

    cp mysql-connector-java-5.1.39-bin.jar /usr/local/hive/lib

  6. Set the environment variables

    sudo vim /etc/profile

    • Append the following at the end of the file:
    #set hive path
    export HIVE_HOME=/usr/local/hive
    export PATH=$HIVE_HOME/bin:$PATH
    
  7. Apply the environment variables

    source /etc/profile
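
    A quick check that the variable is now visible in the current shell:

    echo $HIVE_HOME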

  8. Edit Hive's configuration files

    • Change into /usr/local/hive/conf

    cd /usr/local/hive/conf

    • Copy hive-env.sh.template and rename the copy hive-env.sh

    cp hive-env.sh.template hive-env.sh

    • Edit the file; a typical addition is sketched after the command below

    vim hive-env.sh
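
    A minimal hive-env.sh sketch, assuming Hadoop is installed under /usr/local/hadoop (adjust the paths to your setup):

    # Assumed paths -- edit to match your installation
    export HADOOP_HOME=/usr/local/hadoop
    export HIVE_CONF_DIR=/usr/local/hive/conf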

    • Create a new configuration file

    vim hive-site.xml

    • Add the following content, filling in your MySQL username and password:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
       Licensed to the Apache Software Foundation (ASF) under one or more
       contributor license agreements.  See the NOTICE file distributed with
       this work for additional information regarding copyright ownership.
       The ASF licenses this file to You under the Apache License, Version 2.0
       (the "License"); you may not use this file except in compliance with
       the License.  You may obtain a copy of the License at
           http://www.apache.org/licenses/LICENSE-2.0
       Unless required by applicable law or agreed to in writing, software
       distributed under the License is distributed on an "AS IS" BASIS,
       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
       See the License for the specific language governing permissions and
       limitations under the License.
    -->
    <configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>your MySQL username</value>
      <description>username to use against metastore database</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>your MySQL password</value>
      <description>password to use against metastore database</description>
    </property>
    </configuration>
    
  9. Initialize Hive
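
    Hive 2.x creates the metastore schema with the bundled schematool utility; with the MySQL-backed configuration above:

    schematool -dbType mysql -initSchema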

  10. Enter the Hive shell (type quit; to exit)

    hive

Hive Experiments

I. DDL Commands

  1. Database commands

    • Create a simple database

      CREATE DATABASE testdb;

    • List the databases

      SHOW DATABASES;

    • Filter with a regular expression

      SHOW DATABASES LIKE 't.*';

    • Create a database and set its storage path at the same time

      CREATE DATABASE testdb2 LOCATION '/user/mydb';

    • Add a comment to the database as it is created

      CREATE DATABASE testdb3 COMMENT 'This is a test database3';

    • View a database's comment and storage path

      DESCRIBE DATABASE testdb3;

    • Create a database with key-value pairs attached as properties

      CREATE DATABASE testdb4 WITH DBPROPERTIES('creator'='tiny','date'='2016-12-21');

    • View the database properties

      DESCRIBE DATABASE EXTENDED testdb4;

    • Switch to a database

      USE testdb4;

    • Drop a database (CASCADE removes its tables as well)

      DROP DATABASE IF EXISTS testdb3 CASCADE;

  2. Table commands

    • Create an ordinary (managed) table:

      CREATE TABLE IF NOT EXISTS test_1
          (id INT,
          name STRING,
          address STRING);
      
    • Create an external table (Hive manages only its metadata; dropping it leaves the data at LOCATION in place):

      CREATE EXTERNAL TABLE external_table (dummy STRING)
      LOCATION '/user/tom/external_table';
      
    • Create a partitioned table:

      CREATE TABLE partition_table (id INT, name STRING, city STRING)
      PARTITIONED BY (pt STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
      
    • Create a table with the same schema as an existing table

      CREATE TABLE test_2 LIKE test_1;
      
    • Add columns to a table

      ALTER TABLE test_1 ADD COLUMNS
      (telephone STRING,
      qq STRING,
      birthday DATE);
      
    • Rename a column

      ALTER TABLE test_1 CHANGE address addr STRING;
      
    • Rename the table

      ALTER TABLE test_1 RENAME TO test_table;
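
    • Verify the schema changes above (a quick check; the output lists all columns, including the new ones)

      DESCRIBE test_table;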
      

II. DML Commands

  1. Loading data

    Suppose we have a table created as follows:

    CREATE TABLE login (
    uid BIGINT,
    ip STRING
    )
    PARTITIONED BY (pt string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    

    Exit the Hive shell and create login.txt (the LOAD command below assumes it is saved under /usr/local/hadoop)

    11151007001,192.168.1.1
    11151007002,192.168.1.2
    

    Create login2.txt

    11151007003,192.168.1.3
    11151007004,192.168.1.4
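
    Before the HDFS load below, copy login2.txt into HDFS (the /tmp path is taken from the LOAD statement that follows):

      hdfs dfs -put login2.txt /tmp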
    
    • Load local data into the Hive table (reopen the Hive shell and run USE testdb4; again)

      LOAD DATA LOCAL INPATH '/usr/local/hadoop/login.txt' OVERWRITE INTO TABLE login PARTITION (pt='20161221');

      SELECT * FROM login;

    • Load the file from HDFS (uploaded above)

      LOAD DATA INPATH '/tmp/login2.txt' INTO TABLE login PARTITION (pt='20161221');

      SELECT * FROM login;
      
  2. Inserting query results into tables

    • Single-table insert

      CREATE TABLE login2(uid BIGINT);
      INSERT OVERWRITE TABLE login2 SELECT DISTINCT uid FROM login;
      
    • Multi-table insert

      CREATE TABLE login_ip(ip STRING);
      CREATE TABLE login_uid(uid BIGINT);
      FROM login
      INSERT OVERWRITE TABLE login_uid
      SELECT uid
      INSERT OVERWRITE TABLE login_ip
      SELECT ip;
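
    • Verify both inserts (each SELECT should return the rows just written)

      SELECT * FROM login_uid;
      SELECT * FROM login_ip;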
      
  3. Writing query results to the filesystem (LOCAL DIRECTORY writes to the local filesystem; DIRECTORY alone writes to HDFS)

    FROM login
    INSERT OVERWRITE LOCAL DIRECTORY '/usr/local/hadoop/login' SELECT *
    INSERT OVERWRITE DIRECTORY '/tmp/ip' SELECT ip;
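
    To inspect the exported results (Hive names its output files like 000000_0; the exact file name can vary):

      cat /usr/local/hadoop/login/000000_0
      hdfs dfs -cat /tmp/ip/000000_0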
    

III. Hive JDBC

  1. Create a new MapReduce project

  2. Right-click the project, choose Properties, and import the external jar packages (see the note below on which jars are typically needed)


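    The jars to import depend on your versions; for a HiveServer2 JDBC client like the one below, the standalone JDBC jar shipped with Hive (e.g. $HIVE_HOME/jdbc/hive-jdbc-2.3.2-standalone.jar), a hadoop-common jar, and a log4j jar are typically sufficient.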
  3. Create userinfo.txt with the following content (fields separated by a Tab)

    1  xiaoping
    2  xiaoxue
    3  qingqing
    4  wangwu
    5  zhangsan
    6  lisi
    
  4. Start the remote service (Hive 2.x provides HiveServer2, which listens on port 10000 by default)

    hive --service hiveserver2

  5. Run the following code on the Java side

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import org.apache.log4j.Logger;

    public class HiveJdbcClient {
        // HiveServer2 uses the hive2 JDBC scheme; the old "jdbc:hive://" URL and
        // org.apache.hadoop.hive.jdbc.HiveDriver only work with HiveServer1.
        private static String driverName = "org.apache.hive.jdbc.HiveDriver";
        private static String url = "jdbc:hive2://localhost:10000/default";
        private static String user = "";
        private static String password = "";
        private static String sql = "";
        private static ResultSet res;
        private static final Logger log = Logger.getLogger(HiveJdbcClient.class);

        public static void main(String[] args) {
            try {
                Class.forName(driverName);
                Connection conn = DriverManager.getConnection(url, user, password);
                Statement stmt = conn.createStatement();
                String tableName = "testHiveDriverTable";
                // DDL and LOAD statements return no ResultSet, so use execute()
                // rather than executeQuery() for them.
                sql = "drop table if exists " + tableName;
                stmt.execute(sql);
                sql = "create table "
                        + tableName
                        + " (key int, value string) row format delimited fields terminated by '\t'";
                stmt.execute(sql);
                sql = "show tables '" + tableName + "'";
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                System.out.println("Result of 'show tables':");
                if (res.next()) {
                    System.out.println(res.getString(1));
                }
                sql = "describe " + tableName;
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                System.out.println("Result of 'describe table':");
                while (res.next()) {
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }
                String filepath = "/usr/local/hadoop/userinfo.txt";
                sql = "load data local inpath '" + filepath + "' into table "
                        + tableName;
                System.out.println("Running: " + sql);
                stmt.execute(sql);
                sql = "select * from " + tableName;
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                while (res.next()) {
                    System.out.println(res.getInt(1) + "\t" + res.getString(2));
                }
                sql = "select count(1) from " + tableName;
                System.out.println("Running: " + sql);
                res = stmt.executeQuery(sql);
                System.out.println("Result of the count query:");
                while (res.next()) {
                    System.out.println(res.getString(1));
                }
                conn.close();
                conn = null;
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                log.error(driverName + " not found!", e);
                System.exit(1);
            } catch (SQLException e) {
                e.printStackTrace();
                log.error("Connection error!", e);
                System.exit(1);
            }
        }
    }
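
    A sketch of compiling and running the client from a terminal, assuming the jars from step 2 are collected in a lib/ directory (an illustrative layout):

      javac -cp "lib/*" HiveJdbcClient.java
      java -cp ".:lib/*" HiveJdbcClient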
    