Hive Study Notes (5): DML and SerDe

1. DML

1.1 Importing data

1.1.1 Importing data with LOAD

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
 
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

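Example:

-- Import data from the local file system
load data local inpath 'path' [overwrite] into table table_name [partition (partcol=val, ...)]
-- Import data from HDFS
load data inpath 'Hdfs_path' [overwrite] into table table_name [partition (partcol=val, ...)]

Note: INTO appends to the data already in the table, while OVERWRITE INTO replaces it. A file loaded from the local file system is copied into the table's location. A file loaded from HDFS is moved there instead: it is already in HDFS, so a move is only a rename and the data does not have to be copied again.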
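As a concrete illustration of the copy/move and append/overwrite behaviour described above, here is a minimal sketch. The paths /tmp/psn.txt and /data/psn.txt are hypothetical, and psn is assumed to be an existing non-partitioned table whose row format matches the files.

```sql
-- Hypothetical paths; psn is assumed to exist and not be partitioned.

-- Append from the local file system: the file is copied,
-- so /tmp/psn.txt is still present afterwards.
LOAD DATA LOCAL INPATH '/tmp/psn.txt' INTO TABLE psn;

-- Replace from HDFS: the existing table data is replaced and the
-- source file is moved, so /data/psn.txt no longer exists afterwards.
LOAD DATA INPATH '/data/psn.txt' OVERWRITE INTO TABLE psn;
```

Because the HDFS variant moves the source file, keep a separate copy if the file is needed elsewhere.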

1.1.2 Importing data with INSERT

Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
 
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

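Example:

-- Assume table psn has the columns id, name, and likes, and that tables psn1 and psn2 already exist

-- Method 1
FROM psn
INSERT OVERWRITE TABLE psn1
SELECT id, name
INSERT INTO TABLE psn2
SELECT id, likes;

Use: select the values of some columns from one table and store them in one or more existing tables (the columns of each target table must match the inserted values in number and type).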

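-- Method 2

FROM psn
INSERT OVERWRITE TABLE psn1
SELECT id, name;

FROM psn
INSERT INTO TABLE psn2
SELECT id, likes;

Note: methods 1 and 2 produce the same result but execute differently. Both run MapReduce jobs. Method 1 reads the source table once and runs one MapReduce job; method 2 reads it twice and runs two jobs. Method 1 is therefore more efficient and more commonly used. When running jobs, try to keep I/O (disk or network) to a minimum.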
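For the dynamic partition insert form in the syntax listing, a minimal sketch might look like the following. The table psn_part is hypothetical, and the column types (id INT, name STRING) are assumptions about psn. Partitioning by name is only for illustration; a column with few distinct values is normally a better partition key.

```sql
-- Hypothetical target table, not part of the original example.
CREATE TABLE IF NOT EXISTS psn_part (
  id INT                      -- type assumed
)
PARTITIONED BY (name STRING); -- type assumed

-- Allow every partition column to be determined dynamically.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column (name) must come last in the SELECT list;
-- Hive creates one partition per distinct value.
INSERT OVERWRITE TABLE psn_part PARTITION (name)
SELECT id, name
FROM psn;
```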

1.2 UPDATE and DELETE

FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

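Hive cannot run UPDATE and DELETE out of the box; with the default configuration they fail with the error above.
Transaction support in Hive must first be turned on through configuration parameters.
See the transaction introduction and configuration page on the official Hive site.

Hive supports transactions, but they are rarely used in practice.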
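A minimal sketch of what enabling transactions can look like, based on the properties described in the Hive Transactions documentation. The table psn_txn is hypothetical. Setting the properties per session is an assumption that suits a quick test; the compactor properties (hive.compactor.initiator.on, hive.compactor.worker.threads) belong to the metastore service configuration and are not shown. Exact requirements vary by Hive version.

```sql
-- Client-side settings (often placed in hive-site.xml instead).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- SET hive.enforce.bucketing=true;  -- only needed before Hive 2.0

-- Hypothetical table: a transactional table is stored as ORC and marked
-- transactional; older versions also require it to be bucketed.
CREATE TABLE psn_txn (
  id INT,
  name STRING
)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE psn_txn VALUES (1, 'a'), (2, 'b');

-- The bucketing column (id) cannot be updated, so only name is changed.
UPDATE psn_txn SET name = 'c' WHERE id = 1;
DELETE FROM psn_txn WHERE id = 2;
```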

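2. SerDe

Hive SerDe - Serializer and Deserializer
- A SerDe performs serialization and deserialization.
- It sits between the data storage and the execution engine and decouples the two.
- Hive reads and writes table content through either ROW FORMAT DELIMITED or ROW FORMAT SERDE.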

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

Example:
Data format:
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217

SQL statement:

 CREATE TABLE logtbl (
    host STRING,
    identity STRING,
    t_user STRING,
    time STRING,
    request STRING,
    referer STRING,
    agent STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-9]*)"
  )
  STORED AS TEXTFILE;
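
To see the SerDe at work, the table can be loaded and queried as sketched below. The path /tmp/access.log is hypothetical, and the file is assumed to contain log lines like the sample data, using plain ASCII double quotes around the request. Each capture group of input.regex fills one column, in order.

```sql
-- Hypothetical local path holding the sample log lines.
LOAD DATA LOCAL INPATH '/tmp/access.log' INTO TABLE logtbl;

-- For the first sample line the seven capture groups map to:
--   host     = 192.168.57.4
--   identity = -
--   t_user   = -
--   time     = 29/Feb/2016:18:14:35 +0800
--   request  = GET /bg-upper.png HTTP/1.1
--   referer  = 304
--   agent    = -
SELECT host, request, referer, agent
FROM logtbl
LIMIT 3;
```

With this data, the columns named referer and agent actually hold the HTTP status code and the response size. A line that does not match the regex (for example one with typographic quotes instead of ") is expected to come back with all columns NULL rather than raising an error.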

Reposted from blog.csdn.net/qq_43719634/article/details/102555698