HiveSQL notes

1. JSON string processing with json_tuple

-- create an external table that reads each whole JSON record as a single string column
use hive_01;
create external table weibo_json (json string) location '/usr/test/weibo_info';
-- load data
load data local inpath '/usr/test/testdate/weibo' into table weibo_json;
select * from weibo_json;
-- create an internal table for the Weibo statistics
create table weibo_info (
beCommentWeiboId string,
beForwardWeiboId string,
catchTime string,
commentCount int,
content string,
createTime string,
info1 string,
info2 string,
info3 string,
mlevel string,
musicurl string,
pic_list string,
praiseCount int,
reportCount int,
source string,
userId string,
videourl string,
weiboId string,
weiboUrl string) row format delimited fields terminated by '\t';

-- json_tuple(json string, field name 1, field name 2, ...) extracts the value of each named field from the JSON string

-- strip the outer wrapper characters from the JSON string, parse it, and load the fields into weibo_info

insert overwrite table weibo_info select json_tuple(substring(a.json,2,length(a.json)-2),"beCommentWeiboId",
"beForwardWeiboId","catchTime","commentCount","content","createTime","info1","info2","info3","mlevel",
"musicurl","pic_list","praiseCount","reportCount","source","userId","videourl","weiboId","weiboUrl") from weibo_json a;

2. Other operations

-- change a partition's location (data loaded before the change is not moved to the new location)
alter table test_partition partition (year=2016) set location '/user/hive/warehouse/new_part/hive_01.db/test_partition/year=2016';
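
test_partition is not defined in these notes; as an assumption for context, it could be a table partitioned by year, roughly:

-- hypothetical definition of the partitioned table referenced above
create table test_partition (id int, name string)
partitioned by (year int)
row format delimited fields terminated by '\t';
-- register the partition before repointing its location
alter table test_partition add partition (year=2016);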

-- sync partition directories that exist on HDFS into the Hive metastore
msck repair table test_partition;
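
A typical use, sketched here as an assumption about the workflow: partition directories are created directly on HDFS by an external process, msck repair registers them in the metastore, and show partitions confirms the result:

-- after new year=... directories appear under the table's HDFS path
msck repair table test_partition;
show partitions test_partition;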

-- convert a unix timestamp into a formatted date string

from_unixtime(timestamp, 'yyyy-MM');

from_unixtime(timestamp, 'yyyy-MM-dd');
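
A usage sketch, assuming createTime in weibo_info holds a unix timestamp in seconds (an assumption about the data, not stated above):

-- count weibo records per month; createTime is assumed to be a unix timestamp in seconds
select from_unixtime(cast(createTime as bigint), 'yyyy-MM') as ym, count(*) as cnt
from weibo_info
group by from_unixtime(cast(createTime as bigint), 'yyyy-MM');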

-- table with complex data types

create table test02 (
id int,
name string,
hobby array<string>,                 -- array type
decs struct<age:int, addr:string>,   -- struct type: values from the data file are mapped to the struct fields in order
others map<string,string>)           -- map type: stores key/value pairs
row format delimited
fields terminated by ','             -- column delimiter
collection items terminated by ':'   -- delimiter between elements of an array / struct / map
map keys terminated by '-';          -- delimiter between a map key and its value

-- querying complex data types

select id, name, hobby[1], decs.age, others['key'] from test02;   -- array by index, struct by field name, map by key
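
For reference, a data-file line that matches these delimiters could look like the following (the values are made up for illustration):

-- illustrative line: 1,tom,reading:coding,18:shanghai,blood-A:city-sh
-- hobby  -> ["reading","coding"]
-- decs   -> {"age":18,"addr":"shanghai"}
-- others -> {"blood":"A","city":"sh"}
select id, name, hobby[0], decs.addr, others['blood'] from test02;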

Inner join: inner join ... on (returns only the rows from both tables that satisfy the join condition)

Left outer join: left outer join ... on (the left table is the reference: all of its rows are returned, and right-table columns are null where there is no match)

Right outer join: right outer join ... on (the right table is the reference: all of its rows are returned, and left-table columns are null where there is no match)

Full outer join: full outer join ... on (rows from both tables are returned, equivalent to querying all the data of both tables)

Left semi join: left semi join ... on (returns only columns of the left table for rows that have a match in the right table; the right table acts as a filter, equivalent to an IN/EXISTS subquery)

select count(u.uid) from user_login_info u left semi join weibo_info w on u.uid=w.userid;
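
For illustration (not from the original notes), the other join types written against the same two tables, assuming weibo_info has the columns defined above:

-- inner join: only users that also appear in weibo_info
select u.uid, w.weiboId from user_login_info u inner join weibo_info w on u.uid = w.userId;
-- left outer join: every user, with null weibo columns where there is no match
select u.uid, w.weiboId from user_login_info u left outer join weibo_info w on u.uid = w.userId;
-- full outer join: rows from both sides, padded with nulls where unmatched
select u.uid, w.weiboId from user_login_info u full outer join weibo_info w on u.uid = w.userId;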

3. Sort

sort by gives only a partial ordering: each reducer sorts its own output. If a globally sorted result is needed, the per-reducer sorted outputs only need one merge pass; the number of reducers can be set according to the actual situation.

set mapreduce.job.reduces = 5; -- set the number of reducers (default is 3)
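
For example, with the reducer count set as above (a sketch against the weibo_info table defined earlier):

-- each reducer emits its own block of rows sorted by commentCount; the result is not globally ordered
select weiboId, commentCount from weibo_info sort by commentCount desc;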

distribute by is used together with sort by: the column given to distribute by becomes the distribution key, its hash decides which reducer each row is sent to, and sort by then sorts the rows locally inside each reducer.
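
For example (a sketch against weibo_info):

-- rows with the same userId hash to the same reducer; each reducer sorts its rows by commentCount
select userId, weiboId, commentCount
from weibo_info
distribute by userId
sort by commentCount desc;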

If distribute by and sort by use the same column, they can be abbreviated to cluster by on that column; cluster by only sorts in ascending order and does not allow specifying ASC/DESC.
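
Equivalent sketch with cluster by:

-- same as: distribute by userId sort by userId (ascending only)
select userId, weiboId from weibo_info cluster by userId;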

Origin www.cnblogs.com/TFE-HardView/p/11486868.html