Hive's get_json_object function

-Use cases

In real data collection, fields in files that are actively collected or passively pushed may contain JSON strings, most often crawler data or log data. Hive provides a family of JSON parsing functions for preprocessing and cleaning such data. This article introduces the get_json_object function.

-Data preparation

Suppose the data warehouse (DW) has a DWD-layer table with a field that holds JSON objects. (The table is really a fact table for one business process, but because the field contains JSON objects it was stored in the DWD layer and was not processed in the ODS layer.) Only the JSON field is shown here:
[Figure: sample of the JSON field]
The field is a JSON description of house information.

-Usage

get_json_object(column,"$.param")

The column parameter is the field to be parsed; in this example it is result_contxt. The second parameter is the path of the value to extract, and its form depends on the field content:
If the field content is a JSON array, use $[n].key, which returns the value of key in the n-th JSON object of the array (n counts from 0).
If the field content is a single JSON object, use $.key directly.
The data above is a single JSON object, so the second form is used:
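As a minimal sketch of the two path forms, the query below runs against string literals, not the real table. The JSON values and the houses key are made up for illustration; only the key names house_type2 and title come from the data in this article. Whether a path may start directly at a root-level array ($[n].key) can depend on the Hive version, so the array here is wrapped in an object to stay on the safe side.

```sql
-- Hypothetical literals for illustration; these are not rows from the real table.
select
  -- single JSON object: address the key directly
  get_json_object('{"house_type2": "3 rooms", "title": "demo"}',
                  '$.house_type2') as from_object,
  -- array nested under a made-up key: zero-based index first, then the key
  get_json_object('{"houses": [{"title": "first"}, {"title": "second"}]}',
                  '$.houses[1].title') as from_array
```

The first column should come back as 3 rooms and the second as second. A path that matches nothing returns NULL, and every extracted value is returned as a string.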

select
  get_json_object(result_contxt, "$.house_type2") as house_type2
from
  db_name.table_name
where
  p_day = '20191127'

The query result is:
[Figure: query result]

To parse all elements:

select
  get_json_object(result_contxt, '$.house_type2') as house_type2,
  get_json_object(result_contxt, '$.house_type1') as house_type1,
  get_json_object(result_contxt, '$.build_type') as build_type,
  get_json_object(result_contxt, '$.house_belong') as house_belong,
  get_json_object(result_contxt, '$.house_direction') as house_direction,
  get_json_object(result_contxt, '$.years_right') as years_right,
  get_json_object(result_contxt, '$.title') as title,
  get_json_object(result_contxt, '$.last_tran') as last_tran,
  get_json_object(result_contxt, '$.unit_price') as unit_price,
  get_json_object(result_contxt, '$.belong_area2') as belong_area2,
  get_json_object(result_contxt, '$.build_year') as build_year,
  get_json_object(result_contxt, '$.tran_time') as tran_time,
  get_json_object(result_contxt, '$.lift') as lift,
  get_json_object(result_contxt, '$.square_measure2') as square_measure2,
  get_json_object(result_contxt, '$.square_measure1') as square_measure1,
  get_json_object(result_contxt, '$.poi_id') as poi_id,
  get_json_object(result_contxt, '$.total_price') as total_price,
  get_json_object(result_contxt, '$.url') as url,
  -- floor and time may be reserved words in newer Hive versions; backticks keep the aliases valid
  get_json_object(result_contxt, '$.floor') as `floor`,
  get_json_object(result_contxt, '$.decorate') as decorate,
  get_json_object(result_contxt, '$.village_name') as village_name,
  get_json_object(result_contxt, '$.time') as `time`,
  get_json_object(result_contxt, '$.area_county') as area_county
from
  db_name.table_name
where
  p_day = '20191127'

Query results:
[Figure: query result with all elements parsed]
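Calling get_json_object once per key parses the same JSON string again for every column. A possible alternative, sketched here on the assumption that all the keys are top-level, is Hive's json_tuple function, which extracts several keys in one pass through a lateral view. Only three of the keys are shown, db_name.table_name is a placeholder for the real database and table, and the aliases t and j are arbitrary.

```sql
-- Sketch: json_tuple returns one string column per requested top-level key.
select
  j.house_type2,
  j.title,
  j.unit_price
from
  db_name.table_name t
  lateral view json_tuple(t.result_contxt, 'house_type2', 'title', 'unit_price') j
    as house_type2, title, unit_price
where
  t.p_day = '20191127'
```

json_tuple takes plain key names, not $.key paths, so nested values still need get_json_object.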
This raises a follow-up question: if the field is a JSON array, how do you parse every JSON object in it?
If that is unclear, see the separate article on using the explode function.


Origin blog.csdn.net/qq_33891419/article/details/103297105