Hive: Data Operations

When reposting, please credit the source: https://blog.csdn.net/l1028386804/article/details/80550762

I. Basic Hive Usage: Queries

Basic syntax

select [all | distinct] select_expr, select_expr, ... from tablename [where where_condition]
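To make the ALL/DISTINCT and WHERE parts of this syntax concrete, here is a small in-memory Python model. It only illustrates the semantics and does not talk to Hive. Rows are plain dicts, and the function name select is hypothetical. Real Hive does not guarantee the order of DISTINCT results; this sketch keeps first-seen order only so the output is predictable.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Row = Dict[str, object]

def select(rows: Iterable[Row], cols: List[str], distinct: bool = False,
           where: Optional[Callable[[Row], bool]] = None) -> List[Tuple]:
    """Toy model of: select [all | distinct] cols from rows [where cond]."""
    result: List[Tuple] = []
    seen: Set[Tuple] = set()
    for row in rows:
        if where is not None and not where(row):
            continue  # WHERE filters rows before projection
        projected = tuple(row[c] for c in cols)
        if distinct:
            if projected in seen:
                continue  # DISTINCT drops repeated result rows
            seen.add(projected)
        result.append(projected)  # ALL (the default) keeps duplicates
    return result

rows = [
    {"name": "lyz", "city": "bj"},
    {"name": "lyz", "city": "bj"},
    {"name": "abc", "city": "sh"},
]
all_names = select(rows, ["name"])
distinct_names = select(rows, ["name"], distinct=True)
```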

II. Examples

1. Running in the Hive CLI

select * from lyz;

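2. Running from the Linux shell

hive -e "select * from lyz"       # -e runs the quoted HQL
hive -S -e "select * from lyz"    # -S: silent mode, suppresses log output
hive -v -e "select * from lyz"    # -v: verbose mode, echoes the executed HQL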

3. Executing HQL from a file

hive -f "/home/lyz.sql"

4. Executing HQL from a script

#!/bin/bash
hive -e "select * from lyz"
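The same script can be written in Python by calling the Hive CLI as a subprocess. This is a minimal sketch. It assumes the hive binary is on PATH, uses only the -e, -f and -S options shown above, and the helper names build_hive_command and run_hive are hypothetical.

```python
import subprocess
from typing import List, Optional

def build_hive_command(hql: Optional[str] = None, file: Optional[str] = None,
                       silent: bool = False) -> List[str]:
    """Build an argv list for the Hive CLI: exactly one of -e <hql> or -f <file>."""
    if (hql is None) == (file is None):
        raise ValueError("pass exactly one of hql or file")
    cmd = ["hive"]
    if silent:
        cmd.append("-S")      # silent mode: suppress log output
    if hql is not None:
        cmd += ["-e", hql]    # run the quoted HQL string
    else:
        cmd += ["-f", file]   # run the HQL stored in a file
    return cmd

def run_hive(hql: Optional[str] = None, file: Optional[str] = None,
             silent: bool = True) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on a non-zero exit."""
    cmd = build_hive_command(hql, file, silent)
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout
```

With a working Hive installation, run_hive("select * from lyz") returns the query output as text. Passing an argument list instead of one shell string avoids quoting problems inside the HQL.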

III. Hive Operations: Variables

1. Configuration variables (define one with set, reference it as ${hiveconf:name})

set val='';
${hiveconf:val}

2. Environment variables

${env:HOME}. Note: the Linux env command lists all environment variables.
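Hive replaces these references with their values before the statement is compiled. The Python sketch below mimics that textual substitution for the two namespaces shown above. The function name substitute is hypothetical. Leaving an unknown reference as written is a choice made by this sketch.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${hiveconf:name} and ${env:name}
_VAR = re.compile(r"\$\{(hiveconf|env):([A-Za-z0-9_.]+)\}")

def substitute(hql: str, hiveconf: Mapping[str, str],
               env: Optional[Mapping[str, str]] = None) -> str:
    """Replace variable references with their values (toy model, not Hive's code)."""
    env_vars = os.environ if env is None else env

    def repl(match: re.Match) -> str:
        namespace, name = match.group(1), match.group(2)
        source = hiveconf if namespace == "hiveconf" else env_vars
        # Unknown names are left exactly as written
        return source.get(name, match.group(0))

    return _VAR.sub(repl, hql)
```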

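IV. Loading Data

1. Loading data into managed (internal) tables

1) Load while creating the table (CTAS)
> create table newtable as select col1, col2 from oldtable
2) Specify the data location when creating the table
> create table tablename() location ''
3) Load local data
> load data local inpath 'localpath' [overwrite] into table tablename
4) Load data from HDFS
> load data inpath 'hdfspath' [overwrite] into table tablename
Note: this operation moves the data. Afterwards the files are no longer at their original HDFS path.
5) Copy data into the table's location with Hadoop commands (from either the Hive shell or the Linux shell)
6) Load data from a query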
insert [overwrite | into] table tablename
select col1, col2
from table
where ...

Example:
insert overwrite table test_m
select name, address
from testtext
where name = 'test';

from table
insert [overwrite | into] table tablename
select col1, col2
where ...

Example:
from testtext
insert overwrite table test_m
select name, address
where name = 'test';
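The difference between LOAD DATA LOCAL (copy) and LOAD DATA from HDFS (move), together with the effect of OVERWRITE, can be modeled on a local disk. In this Python sketch, ordinary directories stand in for both the local filesystem and HDFS. The function name load_data is hypothetical, and file-name collisions are not modeled.

```python
import shutil
from pathlib import Path

def load_data(src: str, table_dir: str, local: bool, overwrite: bool = False) -> Path:
    """Toy model of LOAD DATA [LOCAL] INPATH src [OVERWRITE] INTO TABLE."""
    src_path, dest_dir = Path(src), Path(table_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        # OVERWRITE: existing files in the table directory are removed first
        for old in dest_dir.iterdir():
            if old.is_dir():
                shutil.rmtree(old)
            else:
                old.unlink()
    dest = dest_dir / src_path.name
    if local:
        shutil.copy2(src_path, dest)            # LOCAL: copied, source stays
    else:
        shutil.move(str(src_path), str(dest))   # HDFS path: moved, source is gone
    return dest
```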

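Notes
1) Columns are matched by position, not by name, which differs from some relational databases.
2) Run Linux shell commands from the Hive shell with !
> ! ls /home;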

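2. Loading data into external tables

1) Specify the data location when creating the table

create external table tablename() location ''

2) Insert from a query, the same as for managed tables
3) Copy data into the table's location with Hadoop commands (from either the Hive shell or the Linux shell)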

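3. Loading data into partitioned tables

1) Managed and external partitioned tables
   Managed partitioned tables are loaded the same way as managed tables.
   External partitioned tables are loaded the same way as external tables.
Note: the directory hierarchy of the stored data must match the table's partitions. If a partition has not been added to the table, queries return no data for it even though the files are already in the target path.
2) Differences
   When loading, the target must also name the partition.
3) Load local data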

load data local inpath 'localpath' [overwrite] into table tablename partition(pn = '')

4) Load data from HDFS

load data inpath 'hdfspath' [overwrite] into table tablename partition(pn='')

5) Load data from a query

insert [overwrite | into] table tablename partition(pn='')
select col1, col2
from table
where ...

Example:

-- Create a managed (internal) partitioned table
create table test_p(
name string,
val string
)
partitioned by (st string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile;

-- Load local data
load data local inpath '/usr/local/src/data' into table test_p partition (st='20180602');
-- Load data from HDFS
load data inpath '/data/data' into table test_p partition (st='20180602');
-- Load data from a query (columns map by position: name -> name, address -> val)
insert into table test_p partition (st='20180602')
select name, address
from lyz
where name = 'lyz';

-- Create an external partitioned table
create external table test_ep(
name string,
val string
)
partitioned by (st string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile
location '/external/data';

hadoop fs -mkdir /external/data/st=20180602
hadoop fs -copyFromLocal /usr/local/src/data /external/data/st=20180602
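-- Note: after copying files with Hadoop commands into a partition directory of an external partitioned table, you must add the partition with this command before its data can be queried
alter table test_ep add partition (st='20180602');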
show partitions test_ep;
select * from test_ep;
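Both rules from this example can be modeled in Python. First, a partition's data lives in a directory named key=value under the table location. Second, a query reads only the partitions registered in the metastore. Everything here is hypothetical: the names partition_path and ToyPartitionedTable are illustrative, an in-memory dict stands in for HDFS, and Hive's escaping of special characters in partition values is not modeled.

```python
from pathlib import PurePosixPath
from typing import Dict, List

def partition_path(table_location: str, spec: Dict[str, str]) -> str:
    """Directory for a partition spec: <location>/k1=v1/k2=v2 (in spec order)."""
    path = PurePosixPath(table_location)
    for key, value in spec.items():
        path = path / f"{key}={value}"
    return str(path)

class ToyPartitionedTable:
    """Files may exist under the table location, but a query only reads
    directories of partitions registered in the (modeled) metastore."""

    def __init__(self, location: str) -> None:
        self.location = location
        self.files: Dict[str, List[str]] = {}  # directory -> rows stored there
        self.partitions: List[str] = []        # registered partition directories

    def copy_into(self, spec: Dict[str, str], rows: List[str]) -> None:
        # Like hadoop fs -mkdir + -copyFromLocal: data lands, metastore unchanged
        self.files.setdefault(partition_path(self.location, spec), []).extend(rows)

    def add_partition(self, spec: Dict[str, str]) -> None:
        # Like ALTER TABLE ... ADD PARTITION: register the directory
        directory = partition_path(self.location, spec)
        if directory not in self.partitions:
            self.partitions.append(directory)

    def select_all(self) -> List[str]:
        # Only registered partitions are scanned
        return [row for d in self.partitions for row in self.files.get(d, [])]
```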

4. Pitfalls when loading data in Hive

1) Delimiters: by default a field delimiter is a single character
For example, given this table definition:

create table test_p(
name string,
val string
)
partitioned by (st string)
row format delimited fields terminated by '#\t' lines terminated by '\n'
stored as textfile;

Here Hive splits columns on # alone, because only the first character of '#\t' is used.
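The effect on a row can be sketched in Python. This is a simplified model based on the behavior described above, not Hive's SerDe code. Only the first character of the declared delimiter is used, missing trailing columns become NULL (None), and this sketch drops any extra pieces. The function names are hypothetical.

```python
from typing import List, Optional

def effective_delimiter(declared: str) -> str:
    # Only the first character of FIELDS TERMINATED BY takes effect
    return declared[0]

def split_fields(line: str, declared: str, ncols: int) -> List[Optional[str]]:
    """Toy model of splitting one text line into ncols columns."""
    parts: List[Optional[str]] = list(line.split(effective_delimiter(declared)))
    parts = parts[:ncols]                         # extra pieces dropped
    return parts + [None] * (ncols - len(parts))  # missing columns become NULL
```

With the table above, the line "lyz#\tbj" yields name = "lyz" and val = "\tbj". The tab stays in the second column.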
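2) Data type mismatches
   LOAD: if a value cannot be converted to the column type, queries return NULL for it.
   INSERT ... SELECT: if a value cannot be converted to the column type, NULL is inserted.
3) With INSERT ... SELECT, the selected values must follow the table's column order; the names may differ.
   Hive does not check data when loading it; checks happen at query time.
4) External partitioned tables need their partitions added before the data is visible (important).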
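Points 2 and 3 can be illustrated with a toy Python model of INSERT ... SELECT. Values are assigned by position, and a failed conversion yields NULL (None). The names insert_select and to_int and the two-entry type table are hypothetical. Real Hive supports many more types, and its conversion rules are not identical to Python's int(). Rejecting a different column count mirrors Hive, which refuses such an insert at compile time.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

def to_int(value: str) -> Optional[int]:
    try:
        return int(value)
    except ValueError:
        return None  # a value that cannot be converted becomes NULL

CASTS: Dict[str, Callable[[str], object]] = {"string": str, "int": to_int}

def insert_select(schema: Sequence[Tuple[str, str]],
                  selected: Sequence[Tuple[str, ...]]) -> List[Dict[str, object]]:
    """Map SELECT results onto (column, type) pairs by position, not by name."""
    rows = []
    for values in selected:
        if len(values) != len(schema):
            raise ValueError("column count differs from target table")
        rows.append({col: CASTS[typ](val) for (col, typ), val in zip(schema, values)})
    return rows
```

If the SELECT lists its columns in the wrong order, the values still land by position. A name can end up in a numeric column and silently become NULL.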