Hive學習筆記(6)DDL

官方參考文檔 LanguageManual DDL

創建/刪除/更改/使用數據庫

在hive sql中database關鍵詞和 schema關鍵詞可以互換,意思是一樣的

創建數據庫

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];

小牛試刀:
hive> create database if not exists test;
OK
Time taken: 0.131 seconds
hive> create schema if not exists test_schema;
OK
Time taken: 0.07 seconds
hive> show databases;
OK
default
test
test_schema
Time taken: 0.255 seconds, Fetched: 3 row(s)
hive> create database if not exists stefan comment 'just for test';
OK
Time taken: 0.047 seconds
hive> desc database stefan;
OK
stefan  just for test   hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db hadoop  USER
Time taken: 0.068 seconds, Fetched: 1 row(s)

刪除數據庫

刪除數據庫時,默認行為是RESTRICT,這種情況下如果數據庫不為空,則刪除動作失敗。如果需要刪除庫以及庫里的表可以使用CASCADE關鍵字。

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

小牛試刀:
hive> drop database test;
OK
Time taken: 0.242 seconds
hive> drop database if exists test;
OK
Time taken: 0.011 seconds
hive> drop database if exists test_schema cascade;
OK
Time taken: 1.518 seconds
hive> show databases;
OK
default
prop_database
stefan

修改數據庫

需要注意下述命令的生效版本號。

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

alter database ... set location 語句不會將數據庫當前目錄的內容移動到新指定的位置。它不會更改當前數據庫下已經存在的任何表/分區關聯的位置,僅改變在該數據庫下新建表的默認父目錄。

小牛試刀:

添加屬性:

hive> show create database stefan;
OK
CREATE DATABASE `stefan`
COMMENT
  'just for test'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
Time taken: 0.222 seconds, Fetched: 5 row(s)
hive> alter database stefan set dbproperties ('owner'='stefan', 'create_time'='20190403');
OK
Time taken: 0.07 seconds
hive> show create database stefan;
OK
CREATE DATABASE `stefan`
COMMENT
  'just for test'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
WITH DBPROPERTIES (
  'create_time'='20190403',
  'owner'='stefan')
Time taken: 0.196 seconds, Fetched: 8 row(s)

修改owner:

hive> alter schema stefan set owner user stefan_test;
OK
Time taken: 0.084 seconds
hive> desc database stefan;
OK
stefan  just for test   hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db stefan_test USER
Time taken: 0.145 seconds, Fetched: 1 row(s)

切換數據庫

USE database_name;
USE DEFAULT;

hive> use stefan;
OK
Time taken: 0.019 seconds

查看庫

show databases;

hive> show databases;
OK
default
Time taken: 5.987 seconds, Fetched: 1 row(s)

查看當前庫

show current_database();

hive> select current_database();
OK
default
Time taken: 0.852 seconds, Fetched: 1 row(s)

查看庫的屬性信息

desc database extended default;

hive> desc database extended default;
OK
default Default Hive database   hdfs://jms-master-01:9000/user/hive/warehouse   public  ROLE
Time taken: 0.361 seconds, Fetched: 1 row(s)

查看建庫語句

show create database default;

hive> show create database default;
OK
CREATE DATABASE `default`
COMMENT
  'Default Hive database'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse'
Time taken: 0.525 seconds, Fetched: 5 row(s)

建庫時攜帶屬性

create database if not exists prop_database comment 'test properties' with dbproperties ('prop1'='aaa', 'prop2'='bbb');

hive> create database if not exists prop_database comment 'test properties database' with dbproperties('prop1'='aaa', 'prop2'='bbb');
OK
Time taken: 0.121 seconds
hive> show create database prop_database;
OK
CREATE DATABASE `prop_database`
COMMENT
  'test properties database'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/prop_database.db'
WITH DBPROPERTIES (
  'prop1'='aaa',
  'prop2'='bbb')
Time taken: 0.181 seconds, Fetched: 8 row(s)

創建/刪除/清空表

建表

根據指定表明創建表。當表明存在時拋出異常,可以使用 IF NOT EXISTS 跳過該異常。
*表和列的注釋comment格式是字符串,使用單引號。
*表名和列名不區分大小寫。
*使用external創建的表是外部表,不使用默認創建的表是托管表(我習慣稱之為內部表)。查看一個表是托管表還是外部表,可以通過describe extended table_name;查看其中的tableType屬性值。

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];

data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)

primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)

array_type
: ARRAY < data_type >

map_type
: MAP < primitive_type, data_type >

struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)

row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| JSONFILE -- (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE

創建托管表:
hive> create table manage_table (id int, name string);
OK
Time taken: 0.674 seconds

創建外部表:

hive> create external table external_table (id int, name string)  location '/user/hadoop/tmp/hive/default.db/external_table';
OK
Time taken: 0.449 seconds

查看表是托管表還是外部表:
可以看出托管表的tableType是MANAGED_TABLE;外部表的tableType是EXTERNAL_TABLE。

hive> describe extended manage_table;
OK
id                      int
name                    string

Detailed Table Information  Table(tableName:manage_table, dbName:default, owner:hadoop, createTime:1554341982, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hive/warehouse/manage_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1554341982}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)
Time taken: 0.329 seconds, Fetched: 4 row(s)
hive> describe extended external_table;
OK
id                      int
name                    string

Detailed Table Information  Table(tableName:external_table, dbName:default, owner:hadoop, createTime:1554342083, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hadoop/tmp/hive/default.db/external_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1554342083}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, rewriteEnabled:false)
Time taken: 0.143 seconds, Fetched: 4 row(s)

創建分區表

hive> create table partition_table (id int, name string) partitioned by (dt string);
OK
Time taken: 0.393 seconds
hive> show create table partition_table;
OK
CREATE TABLE `partition_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/partition_table'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554344052')
Time taken: 0.275 seconds, Fetched: 15 row(s)

create table as select(CTAS)

CTAS語法限制:

  • 目標表不能是外部表
  • 目標表不能是分桶表
    CTAS語法可以實現表的格式轉換。
hive> select * from partition_table;
OK
1   郭靖  20190402    china
2   黃蓉  20190402    china
3   楊康  20190402    china
4   穆念慈 20190402    china
5   東邪  20190402    china
6   西毒  20190402    china
7   黃老邪 20190402    china
8   楊鐵心 20190402    china
Time taken: 1.499 seconds, Fetched: 8 row(s)
hive> create table new_key_value_store row format serde "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" stored as RCFile as select (id * 10) new_key, concat(id, name) key_value_pair from partition_table sort by new_key, key_value_pair;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190404103533_902fb48c-2471-4e2d-bc79-5cba46b8c710
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1552651623473_0007, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0007/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-04-04 10:35:45,795 Stage-1 map = 0%,  reduce = 0%
2019-04-04 10:35:50,537 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.39 sec
2019-04-04 10:35:56,978 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.27 sec
MapReduce Total cumulative CPU time: 5 seconds 270 msec
Ended Job = job_1552651623473_0007
Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.27 sec   HDFS Read: 8487 HDFS Write: 248 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 270 msec
OK
Time taken: 24.897 seconds
hive> select * from  new_key_value_store;
OK
10  1郭靖
20  2黃蓉
30  3楊康
40  4穆念慈
50  5東邪
60  6西毒
70  7黃老邪
80  8楊鐵心
Time taken: 0.21 seconds, Fetched: 8 row(s)
hive> show create table new_key_value_store;
OK
CREATE TABLE `new_key_value_store`(
  `new_key` int,
  `key_value_pair` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554345358')
Time taken: 0.299 seconds, Fetched: 13 row(s)

克隆表(create table ... like)

create table ... like 會創建一個和源表結構完全一致的空表。

hive> create table like_new_key_value_store like new_key_value_store;
OK
Time taken: 0.149 seconds
hive> show create table like_new_key_value_store;
OK
CREATE TABLE `like_new_key_value_store`(
  `new_key` int,
  `key_value_pair` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/like_new_key_value_store'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554345937')
Time taken: 1.801 seconds, Fetched: 13 row(s)
hive> select * from like_new_key_value_store;
OK
Time taken: 0.252 seconds

分桶排序表

創建一個分桶排序表
hive> create table bucketed_table (id int, name string) partitioned by (dt string, country string) clustered by (id) sorted by (name) into 4 buckets row format delimited fields terminated by '\001' collection items terminated by '\002' map keys terminated by '\003' stored as sequencefile;
OK
Time taken: 0.241 seconds
hive> show create table bucketed_table;
OK
CREATE TABLE `bucketed_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string,
  `country` string)
CLUSTERED BY (
  id)
SORTED BY (
  name ASC)
INTO 4 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'colelction.delim'='',
  'field.delim'='',
  'mapkey.delim'='',
  'serialization.format'='')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/bucketed_table'
TBLPROPERTIES (
  'transient_lastDdlTime'='1554346570')
Time taken: 0.268 seconds, Fetched: 26 row(s)

向多個分區多動態插入,需要設置參數
set hive.exec.dynamici.partition=true; #開啟動態分區,默認是false
set hive.exec.dynamic.partition.mode=nonstrict; #開啟允許所有分區都是動態的,否則必須要有靜態分區才能使用。

hive> insert into bucketed_table partition(dt, country) select id, name, dt, country from default.partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190404110332_1e7c3bf3-e5e6-4ff1-af30-2a5bcf0278e6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1552651623473_0008, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0008/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2019-04-04 11:03:38,849 Stage-1 map = 0%,  reduce = 0%
2019-04-04 11:03:43,157 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.47 sec
2019-04-04 11:03:48,331 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 4.05 sec
2019-04-04 11:03:50,399 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 9.22 sec
2019-04-04 11:03:51,435 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 12.12 sec
MapReduce Total cumulative CPU time: 12 seconds 120 msec
Ended Job = job_1552651623473_0008
Loading data to table stefan.bucketed_table partition (dt=null, country=null)

Loaded : 1/1 partitions.
     Time taken to load dynamic partitions: 0.243 seconds
     Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 4   Cumulative CPU: 12.12 sec   HDFS Read: 21718 HDFS Write: 937 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 120 msec
OK
Time taken: 20.806 seconds
hive> select * from bucketed_table;
OK
8   楊鐵心 20190402    china
4   穆念慈 20190402    china
5   東邪  20190402    china
1   郭靖  20190402    china
6   西毒  20190402    china
2   黃蓉  20190402    china
3   楊康  20190402    china
7   黃老邪 20190402    china
Time taken: 0.225 seconds, Fetched: 8 row(s)

數據傾斜表(Skewed Tables)

暫時沒接觸過也沒研究過。

臨時表

臨時表只在當前會話可見。數據存儲在用戶的臨時目錄總,會話結束時刪除。

  • 臨時表不支持分區,不支持創建索引。
  • 如果臨時表表明和永久表名沖突,則在當前會話中會屏蔽永久表;只有drop或rename臨時表表名才能使用永久表。
  • 從hive1.1+版本起,臨時表可以存儲在內存或者SSD中,通過參數hive.exec.temporary.table.storage配置,可選值有memory,ssd,default。

create temporary table ...

小牛試刀
hive> create temporary table temp_table(id int, name string);
OK
Time taken: 6.14 seconds
hive> insert into temp_table select id, name from partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190408171623_2c787f42-e5e5-460d-8a21-90a5a5c9afc3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1552651623473_0010, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0010/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job  -kill job_1552651623473_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-08 17:16:32,625 Stage-1 map = 0%,  reduce = 0%
2019-04-08 17:16:38,994 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.1 sec
MapReduce Total cumulative CPU time: 2 seconds 100 msec
Ended Job = job_1552651623473_0010
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://jms-master-01:9000/tmp/hive-hadoop/hadoop/8dbf32dc-51a3-4caa-b5be-9d682ccdb29c/_tmp_space.db/09e10ccb-5d5f-45ab-b07a-afe5590b2352/.hive-staging_hive_2019-04-08_17-16-23_750_5995661068890136721-1/-ext-10000
Loading data to table default.temp_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.1 sec   HDFS Read: 4605 HDFS Write: 155 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 100 msec
OK
Time taken: 16.73 seconds
hive> select * from temp_table;
OK
1   郭靖
2   黃蓉
3   楊康
4   穆念慈
5   東邪
6   西毒
7   黃老邪
8   楊鐵心
Time taken: 0.161 seconds, Fetched: 8 row(s)

臨時表插入數據不支持insert overwrite語法。

hive> insert overwrite temp_table select id, name from partition_table;
NoViableAltException(24@[])
    at org.apache.hadoop.hive.ql.parse.HiveParser.destination(HiveParser.java:38762)
    at org.apache.hadoop.hive.ql.parse.HiveParser.insertClause(HiveParser.java:38531)
    at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36478)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35710)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2284)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
    at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
FAILED: ParseException line 1:17 cannot recognize input near 'temp_table' 'select' 'id' in destination specification

退出當前會話,臨時表會被刪除。

hive> quit;
[hadoop@jms-master-01 ~]$ hive
which: no hbase in (/usr/share/svensudo/bin:/home/hadoop/tools/scala-2.12.8/bin:/home/hadoop/tools/spark-2.4.0-bin-hadoop2.7/bin:/home/hadoop/tools/java/jdk1.8.0_191/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin/:/alc/tool/bin:/alc/tool/bin:/home/hadoop/tools/hadoop-2.7.7/bin:/home/hadoop/tools/apache-hive-2.3.4-bin/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/home/hadoop/tools/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> select * from temp_table;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'temp_table'

事務表(Transactional Tables)

從Hive1.4.0+版本開始支持事務表。Hive提供了支持ACID語義的事務表,可以完成增刪改操作。
關于事務表,展開又是一個專題了。可以參考官方事務表文檔

主鍵約束

從Hive2.1.0版本開始,支持約束。Hive支持未經驗證的主鍵和外鍵約束,注意是未經驗證的,所以主鍵的正確性需要上游數據來保證。

刪除表

drop table [if exists] table_name [purge];
通常情況下,刪除托管表,hive會刪除元數據信息和數據文件,刪除文件相當于執行了hadoop fs -rm,也就是說數據文件會被移動到hdfs系統下是trash回收站中,在誤操作的情況下,在一定時效期間還可以找回數據。如果指定了purge選項,則不會移動到回收站下,而是直接永久刪除。所以drop動作一定要謹慎。

Truncate表

truncate table table_name [partition partition_spec]
這里partition_spec指(partition_column=partition_col_value,partition_column=partition_col_value, ...)

相信如果對關系型數據庫sql語言比較熟悉的話,以上操作都可以理解。

Alter table/partition/column

Alter table
重命名表

alter table table_name rename to new_table_name;

hive> show tables;
OK
first_table
Time taken: 5.51 seconds, Fetched: 9 row(s)
hive> alter table first_table rename to second_table;
OK
Time taken: 0.406 seconds
hive> show tables;
OK
second_table
Time taken: 0.05 seconds, Fetched: 9 row(s)
修改表的屬性

alter table table_name set tblproperties table_properties;
table_properties:
(property_name = property_value, property_name = property_value, ...)

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。

推薦閱讀更多精彩內容