Copyright notice: please credit the source when reposting: https://blog.csdn.net/qq_31807385/article/details/84677722
Summary:
1. Dropping a managed table deletes both the data on HDFS and the metadata (table schema, etc.) in the Metastore; dropping an external table deletes only the Metastore entry, and the data on HDFS is left untouched.
2. Three ways to create a table: ① a plain CREATE TABLE, followed by loading data from the local filesystem or from HDFS; ② CREATE TABLE ... AS SELECT from another table, which copies both the schema and the data; ③ CREATE TABLE ... LIKE, which copies only the schema; the data has to be loaded separately.
Create a table in Hive:
create table if not exists stu1(id int,name string)
row format delimited
fields terminated by '\t';
0: jdbc:hive2://hadoop108:10000> show tables;
OK
+-----------+--+
| tab_name |
+-----------+--+
| stu1 |
+-----------+--+
After the table is created, a new directory appears on HDFS: /user/hive/warehouse/db_hive.db/stu1
Tables created by Hive are managed tables (also called internal tables) by default. The details of such a table can be inspected with:
0: jdbc:hive2://hadoop108:10000> desc formatted stu1;
The output (screenshot omitted here) includes, among other fields, Table Type: MANAGED_TABLE.
① Load data from the local filesystem into stu1:
load data local inpath '/opt/module/hive/stu.txt' into table db_hive.stu1;
0: jdbc:hive2://hadoop108:10000> select * from stu1;
OK
+----------+------------+--+
| stu1.id | stu1.name |
+----------+------------+--+
| 1001 | zhangfei |
| 1002 | liubei |
| 1003 | guanyu |
| 1004 | zhaoyun |
| 1005 | caocao |
| 1006 | zhouyu |
+----------+------------+--+
6 rows selected (1.095 seconds)
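The ROW FORMAT DELIMITED ... FIELDS TERMINATED BY '\t' clause is what lets Hive split each line of stu.txt into columns at read time. As a rough illustration of that deserialization step (plain Python, not any Hive API; parse_line is a made-up helper):

```python
# Sketch of how Hive's delimited-text SerDe splits rows.
# The sample data mirrors the stu.txt used above; parse_line
# is illustrative only, not part of Hive.

def parse_line(line, delimiter="\t"):
    """Split one text line into (id, name), as FIELDS TERMINATED BY '\t' does."""
    id_str, name = line.rstrip("\n").split(delimiter)
    return int(id_str), name

raw = "1001\tzhangfei\n1002\tliubei\n"
rows = [parse_line(l) for l in raw.splitlines()]
print(rows)  # [(1001, 'zhangfei'), (1002, 'liubei')]
```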
Create a table and populate it in one step with AS SELECT:
create table stu2 as select * from stu1;
0: jdbc:hive2://hadoop108:10000> select * from stu2;
OK
+----------+------------+--+
| stu2.id | stu2.name |
+----------+------------+--+
| 1001 | zhangfei |
| 1002 | liubei |
| 1003 | guanyu |
| 1004 | zhaoyun |
| 1005 | caocao |
| 1006 | zhouyu |
+----------+------------+--+
6 rows selected (0.18 seconds)
Hive runs a MapReduce job to populate a table created this way.
Create a table with LIKE to copy the schema but not the data:
0: jdbc:hive2://hadoop108:10000> create table stu3 like stu1;
OK
No rows affected (0.122 seconds)
0: jdbc:hive2://hadoop108:10000> select * from stu3;
OK
+----------+------------+--+
| stu3.id | stu3.name |
+----------+------------+--+
+----------+------------+--+
No rows selected (0.101 seconds)
0: jdbc:hive2://hadoop108:10000> show create table stu3;
OK
+-----------------------------------------------------------------+--+
| createtab_stmt |
+-----------------------------------------------------------------+--+
| CREATE TABLE `stu3`( |
| `id` int, |
| `name` string) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY '\t' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://hadoop108:9000/user/hive/warehouse/db_hive.db/stu3' |
| TBLPROPERTIES ( |
| 'transient_lastDdlTime'='1543605082') |
+-----------------------------------------------------------------+--+
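The difference between AS SELECT and LIKE can be summed up as: CTAS copies schema plus data, LIKE copies schema only. A toy sketch in plain Python (the Table class and helper functions are hypothetical, just to make the contrast concrete):

```python
# Hypothetical model of CREATE TABLE ... AS SELECT vs. CREATE TABLE ... LIKE.

class Table:
    def __init__(self, schema, rows=None):
        self.schema = schema          # e.g. {"id": "int", "name": "string"}
        self.rows = list(rows or [])

def create_as_select(src):
    """CTAS: copy the schema AND the data (Hive runs MapReduce for this)."""
    return Table(dict(src.schema), src.rows)

def create_like(src):
    """LIKE: copy only the schema; the new table starts empty."""
    return Table(dict(src.schema))

stu1 = Table({"id": "int", "name": "string"}, [(1001, "zhangfei")])
stu2 = create_as_select(stu1)   # has stu1's row
stu3 = create_like(stu1)        # same columns, no rows
print(len(stu2.rows), len(stu3.rows))  # 1 0
```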
External tables:
1. Create an external table:
create external table stu_ex1(id int,name string)
row format delimited
fields terminated by '\t';
0: jdbc:hive2://hadoop108:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| stu1 |
| stu2 |
| stu3 |
| stu_ex1 |
+-----------+--+
4 rows selected (0.141 seconds)
0: jdbc:hive2://hadoop108:10000> desc formatted stu_ex1;
The details of the external table stu_ex1 (screenshot omitted) show, among other fields, Table Type: EXTERNAL_TABLE.
2. Load data into the external table:
> load data local inpath '/opt/module/hive/stu.txt' into table db_hive.stu_ex1;
0: jdbc:hive2://hadoop108:10000> select * from stu_ex1;
OK
+-------------+---------------+--+
| stu_ex1.id | stu_ex1.name |
+-------------+---------------+--+
| 1001 | zhangfei |
| 1002 | liubei |
| 1003 | guanyu |
| 1004 | zhaoyun |
| 1005 | caocao |
| 1006 | zhouyu |
+-------------+---------------+--+
3. Drop the external table:
0: jdbc:hive2://hadoop108:10000> drop table stu_ex1;
0: jdbc:hive2://hadoop108:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| stu1 |
| stu2 |
| stu3 |
+-----------+--+
> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db
Found 4 items
drwxr-xr-x - isea supergroup 0 2018-12-01 02:59 /user/hive/warehouse/db_hive.db/stu1
drwxr-xr-x - isea supergroup 0 2018-12-01 03:06 /user/hive/warehouse/db_hive.db/stu2
drwxr-xr-x - isea supergroup 0 2018-12-01 03:11 /user/hive/warehouse/db_hive.db/stu3
drwxr-xr-x - isea supergroup 0 2018-12-01 03:41 /user/hive/warehouse/db_hive.db/stu_ex1
Note that the table's data on HDFS has not been deleted. This is the defining behavior of an external table: dropping it removes only its Metastore entry, not the underlying data, whereas dropping a managed table removes both.
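The drop semantics above can be modeled in a few lines. This is a toy sketch in plain Python (all names here are made up, not Hive internals): the Metastore entry always goes away, while the HDFS data is removed only for managed tables:

```python
# Toy model of DROP TABLE for managed vs. external tables.
metastore = {"stu1": "MANAGED_TABLE", "stu_ex1": "EXTERNAL_TABLE"}
hdfs = {"/user/hive/warehouse/db_hive.db/stu1", "/ex"}
locations = {"stu1": "/user/hive/warehouse/db_hive.db/stu1", "stu_ex1": "/ex"}

def drop_table(name):
    table_type = metastore.pop(name)     # the metadata is always removed
    if table_type == "MANAGED_TABLE":
        hdfs.discard(locations[name])    # the data goes only for managed tables

drop_table("stu_ex1")
print("/ex" in hdfs)  # True: the external table's data survives
drop_table("stu1")
print("/user/hive/warehouse/db_hive.db/stu1" in hdfs)  # False
```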
A typical use case for external tables: combining them with LOCATION.
First create a directory on HDFS and upload a data file into it; then create a table in Hive and use LOCATION to bind the table to that HDFS directory. This allows the data to be shared.
Demo:
0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -mkdir /ex
0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -put /opt/module/hive/stu.txt /ex
0: jdbc:hive2://hadoop108:10000> show tables;
OK
+-----------+--+
| tab_name |
+-----------+--+
| stu1 |
| stu2 |
| stu3 |
+-----------+--+
3 rows selected (0.06 seconds)
0: jdbc:hive2://hadoop108:10000> create external table stu_ex1(id int,name string)
0: jdbc:hive2://hadoop108:10000> row format delimited
0: jdbc:hive2://hadoop108:10000> fields terminated by '\t'
0: jdbc:hive2://hadoop108:10000> location '/ex'; -- note: LOCATION points at the directory, not at the data file itself
OK
No rows affected (0.202 seconds)
0: jdbc:hive2://hadoop108:10000> show tables;
OK
+-----------+--+
| tab_name |
+-----------+--+
| stu1 |
| stu2 |
| stu3 |
| stu_ex1 |
+-----------+--+
4 rows selected (0.051 seconds)
0: jdbc:hive2://hadoop108:10000> select * from stu_ex1;
OK
+-------------+---------------+--+
| stu_ex1.id | stu_ex1.name |
+-------------+---------------+--+
| 1001 | zhangfei |
| 1002 | liubei |
| 1003 | guanyu |
| 1004 | zhaoyun |
| 1005 | caocao |
| 1006 | zhouyu |
+-------------+---------------+--+
Converting between external and managed tables:
0: jdbc:hive2://hadoop108:10000> alter table stu_ex1 set tblproperties('EXTERNAL'='FALSE');
0: jdbc:hive2://hadoop108:10000> alter table stu_ex1 set tblproperties('EXTERNAL'='TRUE');
OK
No rows affected (0.144 seconds)
Use the following statement to inspect the table status of stu_ex1 (the Table Type field reflects the change):
0: jdbc:hive2://hadoop108:10000> desc formatted stu_ex1;
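Under the hood, this conversion just flips a single table property. A minimal sketch in plain Python of what the EXTERNAL property controls (the dict and helper are illustrative; note that Hive expects the literal strings 'TRUE'/'FALSE', and in many versions the value is case-sensitive):

```python
# Illustrative model of the EXTERNAL table property toggled by
# ALTER TABLE ... SET TBLPROPERTIES('EXTERNAL'='FALSE'/'TRUE').
tblproperties = {"EXTERNAL": "TRUE"}

def table_type(props):
    """Derive the table type from the EXTERNAL property."""
    return "EXTERNAL_TABLE" if props.get("EXTERNAL") == "TRUE" else "MANAGED_TABLE"

tblproperties["EXTERNAL"] = "FALSE"   # convert to a managed table
print(table_type(tblproperties))      # MANAGED_TABLE
tblproperties["EXTERNAL"] = "TRUE"    # convert back to an external table
print(table_type(tblproperties))      # EXTERNAL_TABLE
```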