hive中的复杂类型struct、array、map

创建一个员工表：

create table if not exists employees( 
  	name	string, 
  	salary	float, 
  	subordinates	array<string>, 
  	deductions	map<string,float>, 
  	address	struct<street:string, city:string, state:string, zip:int> 
  ) row format delimited 
  fields terminated by '\001' 
  collection items terminated by '\002' 
  map keys terminated by '\003' 
  lines terminated by '\n' 
  stored as textfile;

其中subordinates下属员工字段为array类型，deductions扣除字段为map类型，address地址字段为struct类型。

每个列之间通过'\001'分割，array、struct、map中每组元素通过'\002'分割，map的key、value之间通过'\003'分割。

下面来导入数据：

load data local inpath './data/employees.txt' overwrite into table employees;

employeets.txt文件的内容如下：

John Doe100000.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600
Mary Smith80000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601
Todd Jones70000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700
Bill King60000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100
Boss Man200000.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500
Fred Finance150000.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500
Stacy Accountant60000.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563

因为分隔符是不可见的，所以在这上面看不出来，可以直接复制。

array类型的数据可以通过'数组名[index]'的方式访问，index从0开始:

select subordinates[0],subordinates[1] from employees limit 1;

查询结果为:

c0	c1
Mary Smith	Todd Jones

第一行是字段的默认列名。

struct类型的数据可以通过'列名.字段名'的方式访问，如下:

select address.street, address.city, address.state, address.zip from employees limit 1;

查询结果为:

street	city	state	zip
1 Michigan Ave.	Chicago	IL	60600

map类型的数据可以通过'列名['key']'的方式访问，如下：

select deductions['Federal Taxes'],deductions['State Taxes'],deductions['Insurance'] from employees limit 1;

查询结果为：

c0	c1	c2
0.2	0.05	0.1

如果没有查到结果可以使用if判断：

select if(deductions['Federal Taxes'] is null, 0, deductions['Federal Taxes']) from employees limit 1;

我们也可以用UDTF把结果变成多行，比如：

select explode(deductions) from employees;

查询结果为：

key	value
Federal Taxes	0.2
State Taxes	0.05
Insurance	0.1
Federal Taxes	0.2
State Taxes	0.05
Insurance	0.1
Federal Taxes	0.15
State Taxes	0.03
Insurance	0.1
Federal Taxes	0.15
State Taxes	0.03
Insurance	0.1
Federal Taxes	0.3
State Taxes	0.07
Insurance	0.05
Federal Taxes	0.3
State Taxes	0.07
Insurance	0.05
Federal Taxes	0.15
State Taxes	0.03
Insurance	0.1

有时候我们需要把name也查询出来：

select name, explode(deductions) from employees;

发现执行语句会报错：

FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

hive不支持这种语法。

可以用下面语句查询：

select name,dekey,devalue from employees LATERAL VIEW explode(deductions) dedView as dekey,devalue;

查询结果为：

name	dekey	devalue
John Doe	Federal Taxes	0.2
John Doe	State Taxes	0.05
John Doe	Insurance	0.1
Mary Smith	Federal Taxes	0.2
Mary Smith	State Taxes	0.05
Mary Smith	Insurance	0.1
Todd Jones	Federal Taxes	0.15
Todd Jones	State Taxes	0.03
Todd Jones	Insurance	0.1
Bill King	Federal Taxes	0.15
Bill King	State Taxes	0.03
Bill King	Insurance	0.1
Boss Man	Federal Taxes	0.3
Boss Man	State Taxes	0.07
Boss Man	Insurance	0.05
Fred Finance	Federal Taxes	0.3
Fred Finance	State Taxes	0.07
Fred Finance	Insurance	0.05

LATERAL VIEW关键字把explode产生的多行或多列变成一个视图，然后再与其他的数据聚合在一起。

hive中的复杂类型struct、array、map

猜你喜欢