Learning Hive: Basic Syntax

1. Creating a database

create database <database_name>;

2. Using a database

use <database_name>;
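
For example, with a database named mydb (the name is arbitrary):

create database if not exists mydb;
use mydb;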

3. Creating tables

Managed (internal) table: the table directory is laid out according to Hive's conventions, under the Hive warehouse directory /user/hive/warehouse.

create table t_pv_log(ip string, url string, access_time string)
row format delimited
fields terminated by ',';
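
To check where Hive placed the table directory, inspect the table metadata (the exact Location value depends on your installation):

desc formatted t_pv_log;
-- the Location line should point under /user/hive/warehouse, e.g.
-- hdfs://<namenode>/user/hive/warehouse/t_pv_log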

External table: the table directory is specified by the user.

Create the directory on HDFS:

hadoop fs -mkdir -p /pvlog/2017-09-16

Prepare the test data (pv.log):

192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
192.168.33.1,http://sina.com/b,2017-09-16 12:48:01
192.168.33.4,http://sina.com/a,2017-09-16 12:47:01
192.168.33.3,http://sina.com/a,2017-09-16 12:46:01
192.168.33.2,http://sina.com/b,2017-09-16 12:45:01
192.168.33.2,http://sina.com/a,2017-09-16 12:44:01
192.168.33.1,http://sina.com/a,2017-09-16 13:43:01

Upload the data to /pvlog/2017-09-16 on HDFS:

hadoop fs -put ./pv.log /pvlog/2017-09-16

Create the external table (drop the managed t_pv_log from above first if it still exists, since the names collide):

create external table t_pv_log(ip string, url string, access_time string)
row format delimited
fields terminated by ','
location '/pvlog/2017-09-16';

Difference between managed and external tables (a quick check follows the list):

    Dropping a managed table deletes both the table definition and its data.

    Dropping an external table deletes only the table definition; the data files remain on HDFS.
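
To see this with the external table above, drop it and then list its data directory on HDFS:

drop table t_pv_log;

hadoop fs -ls /pvlog/2017-09-16   # the data files are still there; a managed table's warehouse directory would be gone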

4. Partitioned tables

    The essence of a partitioned table is that partition subdirectories are created for the data files inside the table directory, so that at query time the MR job can process just the data under the matching partition subdirectories, shrinking the amount of data read.

    For example, a website produces page-view records every day. All of the records belong in one table, but we often only need to analyze a single day's records.

    In that case, build the table as a partitioned table and load each day's data into its own partition (see the HDFS layout sketch below).
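
On HDFS a partition is just one subdirectory per partition value under the table directory. With the table created below and the default warehouse path, the layout would look roughly like this (the exact path depends on your configuration and database):

/user/hive/warehouse/t_pv_log/day=20170915/
/user/hive/warehouse/t_pv_log/day=20170916/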

Prepare the data (note that it spans two days, 2017-09-15 and 2017-09-16):

    192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
    192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
    192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
    192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
    192.168.33.1,http://sina.com/b,2017-09-15 12:48:01
    192.168.33.4,http://sina.com/a,2017-09-15 12:47:01
    192.168.33.3,http://sina.com/a,2017-09-15 12:46:01
    192.168.33.2,http://sina.com/b,2017-09-15 12:45:01
    192.168.33.2,http://sina.com/a,2017-09-15 12:44:01
    192.168.33.1,http://sina.com/a,2017-09-15 13:43:01

    Create the partitioned table (again named t_pv_log; drop any earlier table of the same name first):

create table t_pv_log(ip string, url string, access_time string)
partitioned by(day string)
row format delimited
fields terminated by ',';

    Split the data by day and load each day's file into its matching partition (the second file name, pv.log.16, is assumed here to hold the 2017-09-16 rows):

load data local inpath '/usr/local/hivetest/pv.log.15' into table t_pv_log partition(day='20170915');
load data local inpath '/usr/local/hivetest/pv.log.16' into table t_pv_log partition(day='20170916');
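
To verify that both partitions were created:

show partitions t_pv_log;

This should list day=20170915 and day=20170916.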

    Query by the partition column:

0: jdbc:hive2://hadoop00:10000> select * from t_pv_log where day ='20170916';
+---------------+--------------------+-----------------------+---------------+--+
|  t_pv_log.ip  |    t_pv_log.url    | t_pv_log.access_time  | t_pv_log.day  |
+---------------+--------------------+-----------------------+---------------+--+
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:52:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:51:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:50:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:49:01   | 20170916      |
| 192.168.33.1  | http://sina.com/b  | 2017-09-16 12:48:01   | 20170916      |
| 192.168.33.4  | http://sina.com/a  | 2017-09-16 12:47:01   | 20170916      |
| 192.168.33.3  | http://sina.com/a  | 2017-09-16 12:46:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:45:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:44:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 13:43:01   | 20170916      |
+---------------+--------------------+-----------------------+---------------+--+
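
Because day is a partition column, the filter above prunes the scan to the day=20170916 subdirectory rather than the whole table. One way to confirm which partitions a query will read (EXPLAIN DEPENDENCY is available in Hive 0.10 and later):

explain dependency select * from t_pv_log where day='20170916';

The input_partitions list in its output should mention only the day=20170916 partition.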

5. Loading files into tables

Method 1:

    Manually place the file into the table directory with an HDFS command, as sketched below.
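
A sketch, assuming the default warehouse layout, a database named mydb, and a non-partitioned table t_pv_log:

hadoop fs -put ./pv.log /user/hive/warehouse/mydb.db/t_pv_log/

Hive reads the table directory at query time, so the new file is picked up with no further commands.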

Method 2: in Hive's interactive shell, load a local file into the table directory:

    load data local inpath '/usr/local/data/' into table `order`;

    (order is a reserved word in Hive, so the table name must be escaped with backticks.)

Method 3: load a data file that is already on HDFS into the table directory:

    load data inpath 'access.log' into table t_access partition(day='20170916');

    (A relative path like 'access.log' is resolved against the current user's HDFS home directory.)

Note the difference between loading a local file and loading an HDFS file (a verification sketch follows):

    Loading a local file into a table: the file is copied into the table directory.

    Loading an HDFS file into a table: the file is moved into the table directory.
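
A sketch of the move semantics for an HDFS load (paths hypothetical):

hadoop fs -ls /pvlog/access.log     # the source file exists before the load
# in hive: load data inpath '/pvlog/access.log' into table t_access partition(day='20170916');
hadoop fs -ls /pvlog/access.log     # afterwards the source path is gone: the file was moved into the table directory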
