Creating internal and external tables in Hive

1. Internal tables

show databases;
create database lxq;
use lxq;
Create an internal (managed) table. The correct CREATE TABLE statement is:
create table t_order(id string,create_time string,amount float,uid string)
row format delimited
fields terminated by ',';
This specifies that the fields in the table's data files are separated by ",".
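
To confirm what the statement above actually created, the table definition can be inspected from Hive. This is a minimal sketch that assumes the t_order table exists in the lxq database; the exact output layout varies between Hive versions.

```sql
-- Switch to the database that holds the table.
use lxq;

-- Print the full table definition. For a table created without the
-- EXTERNAL keyword, the "Table Type" row reads MANAGED_TABLE, the
-- "Location" row shows the table directory under the Hive warehouse,
-- and the storage parameters include the ',' field delimiter.
describe formatted t_order;
```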

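Add data, either by uploading a file directly into the table directory in HDFS:
hadoop fs -put a.txt /user/hive/warehouse/lxq.db/t_order
(the destination is the table directory; this path assumes the default warehouse location, where the database lxq is stored as lxq.db)
or by loading a local file from within Hive:
load data local inpath '/usr/local/course' into table t_order;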
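
After loading, a quick query shows whether the file was parsed with the expected delimiter. The sketch below assumes the data ended up in t_order and that each line of the file has four comma-separated fields in the column order of the table; the sample line in the comment is hypothetical, not taken from a real file.

```sql
-- Hypothetical line in the data file (id,create_time,amount,uid):
--   1001,2017-08-04 10:00:00,35.5,u01

-- Look at a few rows. If every column except the first is NULL,
-- the delimiter in the file does not match the table definition.
select * from t_order limit 5;

-- A simple aggregate over the loaded data: total order amount per user.
select uid, sum(amount) as total_amount
from t_order
group by uid;
```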

drop table t_order;
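Dropping an internal table has two effects:
Hive removes the information about the table from the metastore;
Hive also deletes the table's directory from HDFS.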

2. External tables

External table (EXTERNAL_TABLE): the table directory is specified by the user who creates the table.
create external table t_lxq (ip string,url string,access_time string)
row format delimited
fields terminated by ','
location '/lxq/log';

The table's data directory is /lxq/log.

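Differences between external and internal tables:
1) An internal table's directory lives inside the Hive warehouse directory, whereas an external table's directory is specified by the user.
2) Dropping an internal table: Hive removes the related metadata and deletes the table's data directory.
3) Dropping an external table: Hive removes only the related metadata; the data directory is left in place.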
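
The drop behavior of an external table can be checked directly. The following is a sketch that assumes the external table t_lxq points at /lxq/log and that the commands are run in the Hive CLI, where dfs commands are available.

```sql
-- Drop the external table: only the metastore entry is removed.
drop table t_lxq;

-- List the data directory from inside the Hive CLI.
-- The files under /lxq/log should still be there.
dfs -ls /lxq/log;

-- Because the data survived, the table can be re-created over the
-- same directory and queried again without reloading anything.
create external table t_lxq (ip string, url string, access_time string)
row format delimited
fields terminated by ','
location '/lxq/log';
```

Running the same drop against an internal table would also remove its directory under the warehouse, so there would be nothing left to re-attach.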

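In a Hive data warehouse, the lowest-layer tables usually hold data produced by external systems. To avoid interfering with how those systems work, create external tables in Hive that map onto the data directories those systems produce.
The tables generated by subsequent ETL steps are then best created as managed tables (MANAGED_TABLE).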
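
As an illustration of this layering, an ETL step might read from the external table and write its result into a managed table. This is only a sketch: the target table name t_url_pv is hypothetical, and it assumes the external table t_lxq defined earlier.

```sql
-- CREATE TABLE ... AS SELECT produces a managed table by default,
-- stored under the Hive warehouse directory. The source data under
-- the external table's directory is only read, never modified.
create table t_url_pv as
select url, count(*) as pv
from t_lxq
group by url;
```

Dropping t_url_pv later removes the derived data along with its metadata, while the raw log directory behind t_lxq is unaffected.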

3. Partitioned tables

Example:
1) Create a partitioned table:
create table t_access(ip string,url string,access_time string)
partitioned by(dt string)
row format delimited
fields terminated by ',';

2) Load data into partitions, here from local files:
load data local inpath '/root/access.log.2017-08-04.log' into table t_access partition(dt='20170804');
load data local inpath '/root/access.log.2017-08-05.log' into table t_access partition(dt='20170805');
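
Each load into a new partition value creates a subdirectory named dt=<value> under the table directory. A short sketch for checking which partitions exist, assuming the two loads above succeeded against t_access:

```sql
-- List the partitions Hive knows about for this table.
-- With the two loads above, the output is:
--   dt=20170804
--   dt=20170805
show partitions t_access;
```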

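3) Query the partitioned data
a. Count the total page views (PV) for August 4:
select count(*) from t_access where dt='20170804';
In essence, the partition column is used like an ordinary table column, so a WHERE clause can select the partition.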

b. Count the total PV across all data in the table:
select count(*) from t_access;
In essence, simply omit the partition condition.
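
Since the partition column behaves like a regular column, it can also be used in GROUP BY or in range conditions. The queries below are a sketch building on the t_access table and the dt values used above.

```sql
-- PV per day: one output row per partition value.
select dt, count(*) as pv
from t_access
group by dt;

-- PV over a range of days. The values are strings in yyyyMMdd form,
-- so string comparison matches date order, and only the matching
-- partition directories need to be scanned.
select count(*) as pv
from t_access
where dt >= '20170804' and dt <= '20170805';
```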