Using a multi-byte delimiter in Hive with a regular expression

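By default, Hive does not support multi-byte field delimiters: FIELDS TERMINATED BY accepts only a single character.
I worked around this with a regular expression.
The data looks like this, with fields separated by the three-character sequence <;>: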

1<;>1<;>1<;>PC COOKIE<;>99<;>1024<;><;>2013/07/28
39<;>1<;>1<;>PC手机<;>97<;>272<;>8<;>2013/07/28



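My CREATE TABLE statement is below. The eight capture groups in input.regex map, in order, to the eight columns, all of which are declared as string. output.format.string is only used when rows are written back out.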
CREATE TABLE business1(
  downloads string,
  uniqdownloads string,
  uniqimsis string,
  weightname string,
  porttype string,
  subporttype string,
  action string,
  addtime string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(.*)\\<\\;\\>(.*)\\<\\;\\>(.*)\\<\\;\\>(.*)\\<\\;\\>(.*)\\<\\;\\>(.*)\\<\\;\\>(.*)\\<\\;\\>(.*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s"
)
STORED AS TEXTFILE
LOCATION 'hdfs://nameservice1/user/jk/business1';
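
Once the table exists, a quick way to confirm the regex is splitting rows as intended is to query a few columns and count the rows that failed to match. The following is only a sketch under some assumptions: the jar path is a placeholder (not from the original post), and the ADD JAR step is needed only if the hive-contrib jar, which contains this RegexSerDe class, is not already on your classpath.

```sql
-- Assumption: hive-contrib is not on the classpath yet.
-- The path is a placeholder; point it at the jar shipped with your Hive install.
ADD JAR /path/to/hive-contrib.jar;

-- Spot-check that the fields landed in the expected columns.
SELECT downloads, weightname, action, addtime
FROM business1
LIMIT 10;

-- With this RegexSerDe, a line that does not match input.regex
-- comes back with every column NULL, so this counts unparsed lines.
SELECT COUNT(*)
FROM business1
WHERE downloads IS NULL AND addtime IS NULL;
```

If the second query returns a non-zero count, the usual suspects are lines with a different number of <;> separators than the seven the regex expects.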

Reposted from huangyunbin.iteye.com/blog/1918030