Hive DDL and DML, Garbled Characters, hiveserver2/beeline

DDL

The syntax is very similar to standard SQL.

Create a database:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
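
A small usage example (the database name and HDFS path below are made up for illustration):

CREATE DATABASE IF NOT EXISTS ruozedata_hive
COMMENT 'demo database'
LOCATION '/user/hive/warehouse/ruozedata_hive.db';
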
Create a table:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

DML

Loading data:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOCAL: the file is read from the local file system
Without LOCAL: the file is read from HDFS

OVERWRITE: existing data in the table (or partition) is replaced
Without OVERWRITE: the data is appended
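
A minimal usage example (assuming a table emp already exists and /home/hadoop/data/emp.txt is a local file):

LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;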

Writing query results out to a directory:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] 
  SELECT ... FROM ...  

Example:

INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select * from emp;

FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...


from emp
INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select empno, ename  
INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select ename;   

Partitioned tables:

Static partitioning:

CREATE TABLE ruoze_order_partition (
order_number string,
event_time string
)
PARTITIONED BY (event_month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";

LOAD DATA LOCAL INPATH "/home/hadoop/data/order_created.txt" 
OVERWRITE INTO TABLE ruoze_order_partition
PARTITION (event_month='2014-05');

If you add a partition directory by hand on HDFS, e.g. event_month='2014-06' (with the correct directory structure), the metastore knows nothing about it and queries will not return its data. There are two ways to fix this:
1. Add the partition with a DDL statement (ALTER TABLE table_name ADD PARTITION ...).
2. Repair the table metadata with MSCK (MSCK REPAIR TABLE table_name;).
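
For example, a sketch of both approaches against the table above, assuming the directory event_month=2014-06 was created manually under its HDFS location:

ALTER TABLE ruoze_order_partition ADD PARTITION (event_month='2014-06');
-- or
MSCK REPAIR TABLE ruoze_order_partition;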

Dynamic partitioning:

Suppose we want to split the emp table by department and load each department's rows into the corresponding partition of ruoze_emp_partition. With static partitions this can be done like this:

insert into table ruoze_emp_partition partition(deptno=10)
select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=10;

insert into table ruoze_emp_partition partition(deptno=20)
select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=20;

insert into table ruoze_emp_partition partition(deptno=30)
select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=30;

But this does not scale once there are many partitions or a lot of data, which is what dynamic partitioning is for:

CREATE TABLE ruoze_emp_dynamic_partition (
empno int,
ename string,
job string,
mgr int,
hiredate string,
salary double,
comm double
)
PARTITIONED BY (deptno int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"; 

insert into table ruoze_emp_dynamic_partition partition(deptno)
select empno,ename ,job ,mgr ,hiredate ,salary ,comm, deptno from emp;

The partition column deptno must come last in the SELECT list.
If the insert fails with an error saying that dynamic partitioning requires at least one static partition column, set:
set hive.exec.dynamic.partition.mode=nonstrict;
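
Depending on the Hive version, dynamic partitioning may also need to be enabled explicitly before running the insert; a typical pair of settings looks like this:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;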

Notes

When inserting data, make sure the columns in the SELECT list line up with the target table's columns; Hive maps them by position, not by name.

MANAGED_TABLE vs EXTERNAL
MANAGED_TABLE: dropping the table deletes both the HDFS data and the metadata.
EXTERNAL: dropping the table deletes only the metadata; the HDFS data is left in place.
For this reason external tables are generally preferred.
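
A minimal sketch of an external table (the table name and LOCATION path are made up for illustration):

CREATE EXTERNAL TABLE ruoze_emp_external (
empno int,
ename string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION '/user/hadoop/external/emp';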
Other ways to load data
INSERT OVERWRITE TABLE ruoze_emp2 select * from emp;
or
CREATE TABLE emp1 as select * from emp [where 1=0];
(Adding where 1=0 copies only the table structure, without any rows.)
Running queries without starting the Hive CLI
hive -e "select * from emp limit 5"
This can be combined with shell scripts.
Run hive --help to see the other options.
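
For example, the same command can redirect the query result into a file for later use in a script (the output path is just an example):

hive -e "select * from emp limit 5" > /home/hadoop/tmp/emp_top5.txt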
Fixing garbled characters

Changing the MySQL server settings does not affect tables that already exist, so you also have to convert the character set of the existing metastore tables. Run the following in MySQL against the metastore database:

alter database ruozedata_basic02 character set latin1;
use ruozedata_basic02;
alter table PARTITIONS convert to character set latin1;
alter table PARTITION_KEYS convert to character set latin1;

EXPORT/IMPORT

Export:

EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
  TO 'export_target_path' [ FOR replication('eventid') ]
Here export_target_path is a directory on HDFS; it can be written as a path such as /user/hive/warehouse/.. or as a full URI such as hdfs://192.168.137.201:9000/user/hive/warehouse/..

Import:

IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
  FROM 'source_path'
  [LOCATION 'import_target_path']
Here source_path is the export_target_path used in the EXPORT statement.
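
A minimal round-trip sketch (the export directory /tmp/emp_export is made up for illustration):

EXPORT TABLE emp TO '/tmp/emp_export';
IMPORT TABLE emp_imported FROM '/tmp/emp_export';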

hiveserver2

In the bin directory under the Hive installation directory:
Start hiveserver2: ./hiveserver2
Start beeline: ./beeline
Connect: !connect jdbc:hive2://localhost:10000 hadoop
What follows the port are the username and password; the password appears to be arbitrary (or can be omitted entirely).

The beeline -u method

Be careful not to connect to Spark's beeline by mistake (the environment variables may shadow Hive's copy).
./beeline -u jdbc:hive2://localhost:10000/default -n hadoop
Here hadoop should be the OS username of the user running the queries; no password is needed.

The JDBC method

Follow the example on the official wiki (https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-JDBC), except that driverName should be "org.apache.hive.jdbc.HiveDriver" and the DriverManager.getConnection call should be ("jdbc:hive2://localhost:10000/default", "", ""). The wiki example is wrong in these two places.
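
A minimal Java sketch with those two corrections applied (the class name and the query are illustrative, not taken from the wiki page):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    private static final String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws Exception {
        Class.forName(driverName);
        // empty username/password, as noted above
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("select * from emp limit 5");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}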

Reposted from blog.csdn.net/weixin_37677769/article/details/82493220