A brief look at data transmission between front end and back end
data structure mapping
(1) Suppose a table contains the following row. Using JSON to represent its data structure, the row as seen from Hive looks like this:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],     // list (ARRAY)
    "children": {                        // key-value pairs (MAP)
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {                         // nested record (STRUCT)
        "street": "hui long guan",
        "city": "beijing"
    }
}
(2) Based on the above data structure, we create a corresponding table in Hive and import the data.
First, create a local test file personInfo.txt in the directory /opt/module/hive/datas:
[atguigu@hadoop102 datas]$ vim personInfo.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
Note: the elements inside MAP, STRUCT and ARRAY values can all share the same delimiter; here "_" is used.
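To build intuition for how one collection delimiter serves ARRAY, MAP and STRUCT alike, here is a small Python sketch that splits a line of personInfo.txt by hand, using the same ',', '_' and ':' delimiters the table definition below declares. This is only an illustration; Hive's own SerDe performs this parsing internally.

```python
# Parse one line of personInfo.txt manually (illustrative sketch,
# not Hive's actual SerDe code).
line = "songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing"

# fields terminated by ','
name, friends_s, children_s, address_s = line.split(",")

# collection items terminated by '_'  -> ARRAY<string>
friends = friends_s.split("_")

# map keys terminated by ':'          -> MAP<string, int>
children = {k: int(v) for k, v in
            (kv.split(":") for kv in children_s.split("_"))}

# collection items terminated by '_'  -> STRUCT<street, city>
street, city = address_s.split("_")

print(friends)   # ['bingbing', 'lili']
print(children)  # {'xiao song': 18, 'xiaoxiao song': 19}
print(city)      # beijing
```

Note that the MAP and the STRUCT both use '_' between their entries; only the MAP additionally uses ':' inside each entry, which is why a single collection delimiter is enough.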
Test Case
(1) Create test table personInfo on Hive
hive (default)> create table personInfo (
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
The row format clauses specify the delimiters used in the data file:
fields terminated by ',' — fields within a row are separated by ','
collection items terminated by '_' — elements of a collection type (ARRAY items, MAP entries, STRUCT members) are separated by '_'
map keys terminated by ':' — the key and value within a MAP entry are separated by ':'
lines terminated by '\n' — rows are separated by the newline character '\n'
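The reverse direction follows the same rules: a nested record serializes into one delimited line. The sketch below builds the second row of personInfo.txt from a Python dict (an illustration of the row format, not Hive's actual LazySimpleSerDe):

```python
# Serialize a nested record into the delimited row format declared
# by the CREATE TABLE above (illustrative sketch only).
record = {
    "name": "yangyang",
    "friends": ["caicai", "susu"],
    "children": {"xiao yang": 18, "xiaoxiao yang": 19},
    "address": {"street": "chao yang", "city": "beijing"},
}

row = ",".join([                                                   # fields: ','
    record["name"],
    "_".join(record["friends"]),                                   # collection items: '_'
    "_".join(f"{k}:{v}" for k, v in record["children"].items()),   # map keys: ':'
    "_".join([record["address"]["street"],
              record["address"]["city"]]),                         # struct members: '_'
])
print(row)  # yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
```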
(2) Upload the data file to the table's directory in HDFS (Hive stores table names in lowercase, so the warehouse directory is personinfo):
[atguigu@hadoop102 ~]$ hadoop fs -put /opt/module/hive/datas/personInfo.txt /user/hive/warehouse/personinfo
(3) Query the three collection columns. The following shows the access syntax for ARRAY (by index), MAP (by key), and STRUCT (by field name).
hive (default)>
select
friends[1],
children['xiao song'],
address.city
from personInfo
where name="songsong";
Result:
_c0 _c1 city
lili 18 beijing
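The three access patterns map directly onto ordinary indexing, key lookup, and field access. The sketch below mimics the query on an in-memory copy of the songsong row, purely for intuition; Hive, of course, evaluates these expressions over the table, not a Python dict:

```python
# Mimic friends[1], children['xiao song'], and address.city from the
# query above on an in-memory version of the songsong row (sketch only).
row = {
    "name": "songsong",
    "friends": ["bingbing", "lili"],
    "children": {"xiao song": 18, "xiaoxiao song": 19},
    "address": {"street": "hui long guan", "city": "beijing"},
}

print(row["friends"][1])             # lili    -> ARRAY index (0-based, as in Hive)
print(row["children"]["xiao song"])  # 18      -> MAP key lookup
print(row["address"]["city"])        # beijing -> STRUCT field access
```

Note that Hive's ARRAY indexing is 0-based, which is why friends[1] returns "lili", the second element.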