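Business requirement: the input is a JSON string with a variable number of fields. The full set of fields that may appear is known, and their order is fixed. The goal is to extract the values and join them with tab characters, so the result can be stored in HDFS and easily loaded into a Hive table:
Sample input: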
{ "cjdid": "DZQ10012","rfidId": 21412341234123410,"passTime": 1530135600,"plateColor": "1","rectifyCode": "112412","mergeCode": "123125","eid": "12124315","plateCode":4}
{ "cjdid": "DZQ10012","rectifyCode": "112412","mergeCode": "123125","eid": "12124315","plateCode":4}
Output:
DZQ10012 21412341234123410 1530135600 1 112412 123125 12124315 4
DZQ10012 112412 123125 12124315 4
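The required transformation can be sketched outside Logstash to make the target format concrete. This is only an illustrative sketch, not part of the pipeline: the function name `to_tsv` and the `FIELDS` list are hypothetical, the field order is copied from the first sample input, and filling a missing field with the string `null` is an assumption borrowed from the Logstash config in this post (it keeps the columns aligned for Hive).

```python
import json

# Assumed field order, taken from the first sample input; adjust to the real schema.
FIELDS = ["cjdid", "rfidId", "passTime", "plateColor",
          "rectifyCode", "mergeCode", "eid", "plateCode"]


def to_tsv(line, fields=FIELDS, placeholder="null"):
    """Return the values of `fields` from a JSON object string, tab-separated.

    A field that is absent from the input is written as `placeholder`.
    """
    record = json.loads(line)
    return "\t".join(
        str(record[name]) if name in record else placeholder
        for name in fields
    )
```

Calling `to_tsv` on the second sample line yields eight tab-separated columns, with `null` in the positions of `rfidId`, `passTime` and `plateColor`.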
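In Logstash, grok does the regex matching that extracts each field, and mutate concatenates the results. (The code below does not correspond to the sample input and output above, but the principle is the same.)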
filter {
  grok {
    match => {
      # Capture each field value with a regex; every group is optional
      "message" => '^{("COLLECT_TIME":"?(?<time>\d+)"?,?)?("CONTENT1":"?(?<CONTENT1>\w+)"?,?)?("CONTENT2":"?(?<CONTENT2>\d+)"?,?)?'
    }
  }
  # Rebuild message one field at a time, writing "null" for a missing field.
  # The separator between values is shown as a space here; use a literal tab
  # character in its place if tab-delimited output is required.
  if [time] {
    mutate {
      replace => { "message" => "%{time}" }
      remove_field => ["time"]
    }
  } else {
    mutate {
      replace => { "message" => "null" }
    }
  }
  if [CONTENT1] {
    mutate {
      replace => { "message" => "%{message} %{CONTENT1}" }
      remove_field => ["CONTENT1"]
    }
  } else {
    mutate {
      replace => { "message" => "%{message} null" }
    }
  }
  if [CONTENT2] {
    mutate {
      replace => { "message" => "%{message} %{CONTENT2}" }
      remove_field => ["CONTENT2"]
    }
  } else {
    mutate {
      replace => { "message" => "%{message} null" }
    }
  }
}
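To check what this filter chain produces without starting Logstash, the pattern can be ported to Python. This is a rough sketch under stated assumptions, not the Logstash implementation: `rebuild_message` is a hypothetical helper, the named groups are rewritten from `(?<name>...)` to Python's `(?P<name>...)`, the leading `{` is escaped, and the default separator is a space as in the config shown (pass `"\t"` for tab-delimited output).

```python
import re

# Python port of the grok pattern; each field group is optional.
PATTERN = re.compile(
    r'^\{("COLLECT_TIME":"?(?P<time>\d+)"?,?)?'
    r'("CONTENT1":"?(?P<CONTENT1>\w+)"?,?)?'
    r'("CONTENT2":"?(?P<CONTENT2>\d+)"?,?)?'
)

FIELD_ORDER = ("time", "CONTENT1", "CONTENT2")


def rebuild_message(message, sep=" "):
    """Mimic the grok + mutate chain: captured values in order, "null" for gaps."""
    match = PATTERN.match(message)
    # If the pattern does not match at all, no field exists and every
    # conditional takes its else branch, so each position becomes "null".
    captured = match.groupdict() if match else {}
    return sep.join(captured.get(name) or "null" for name in FIELD_ORDER)


if __name__ == "__main__":
    print(rebuild_message('{"COLLECT_TIME":1530135600,"CONTENT2":42}'))
```

Note that the pattern allows no whitespace between tokens, so a message such as `{ "COLLECT_TIME": 1 }` captures nothing and every position is filled with `null`.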