-Use cases
In real collection scenarios, fields in files that are actively collected or passively pushed may contain JSON strings. Most of this is crawler data or log data. Hive provides a family of JSON parsing functions for preprocessing and cleaning such data. This article introduces the get_json_object function.
-Data preparation
Suppose the DW has a DWD table with a field that holds a JSON object. (The table is really a fact table for a business process, but because the field contains a JSON object, it is kept in the DWD layer rather than being processed in ODS.) Only the JSON field, result_contxt, is considered here. It holds a JSON description of house information.
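To try the queries in this article without the real table, a stand-in table can be sketched as below. Everything in it is an assumption for illustration: the table name demo_dwd_house is hypothetical, p_day is assumed to be a string partition column (it appears in the article's where clauses), and the sample row contains made-up values, not the article's data.

```sql
-- Hypothetical stand-in for the DWD table; only the JSON field is modeled.
create table if not exists demo_dwd_house (
  result_contxt string comment 'JSON description of one house listing'
)
partitioned by (p_day string);

-- One made-up row for experimenting with the queries in this article.
insert into table demo_dwd_house partition (p_day = '20191127')
values ('{"title":"demo listing","house_type2":"flat","total_price":"300"}');
```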
-Usage
get_json_object(column,"$.param")
The column parameter is the field to parse, which is result_contxt in this example. The second parameter is the path to the value, and it takes one of two forms. If the field content is a JSON array, use $[n].key, which returns the value of key in the n-th JSON object (n counts from 0). If the field content is a single JSON object, use $.key directly. The data above is a single JSON object, so the queries in this article use the second form.
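The two path forms can be tried on literal strings before touching a real table. The snippet below is a minimal sketch: the JSON literals and the houses key are made up for illustration, and the array is nested under a key so that the subscript form is shown on an input where it is known to work.

```sql
-- Single JSON object: address the key directly with $.key
select get_json_object('{"house_type1":"3 rooms","floor":"12"}', '$.floor');
-- returns 12 (as a string)

-- Array inside an object: pick the n-th element (zero-based), then the key
select get_json_object('{"houses":[{"floor":"12"},{"floor":"3"}]}', '$.houses[1].floor');
-- returns 3

-- A key that does not exist yields NULL rather than an error
select get_json_object('{"floor":"12"}', '$.lift');
-- returns NULL
```

Because a missing key returns NULL, a misspelled path fails silently, so it is worth checking the path against a sample value first.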
select
get_json_object(result_contxt, "$.house_type2") as house_type2
from
db_name.table_name
where
p_day = '20191127'
The query result is:
To parse all the elements:
select
get_json_object(result_contxt, '$.house_type2') as house_type2,
get_json_object(result_contxt, '$.house_type1') as house_type1,
get_json_object(result_contxt, '$.build_type') as build_type,
get_json_object(result_contxt, '$.house_belong') as house_belong,
get_json_object(result_contxt, '$.house_direction') as house_direction,
get_json_object(result_contxt, '$.years_right') as years_right,
get_json_object(result_contxt, '$.title') as title,
get_json_object(result_contxt, '$.last_tran') as last_tran,
get_json_object(result_contxt, '$.unit_price') as unit_price,
get_json_object(result_contxt, '$.belong_area2') as belong_area2,
get_json_object(result_contxt, '$.build_year') as build_year,
get_json_object(result_contxt, '$.tran_time') as tran_time,
get_json_object(result_contxt, '$.lift') as lift,
get_json_object(result_contxt, '$.square_measure2') as square_measure2,
get_json_object(result_contxt, '$.square_measure1') as square_measure1,
get_json_object(result_contxt, '$.poi_id') as poi_id,
get_json_object(result_contxt, '$.total_price') as total_price,
get_json_object(result_contxt, '$.url') as url,
get_json_object(result_contxt, '$.floor') as floor,
get_json_object(result_contxt, '$.decorate') as decorate,
get_json_object(result_contxt, '$.village_name') as village_name,
get_json_object(result_contxt, '$.time') as time,
get_json_object(result_contxt, '$.area_county') as area_county
from
db_name.table_name
where p_day = '20191127'
Query results:
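When many keys are read from the same field, Hive's json_tuple function is a commonly used alternative that extracts several top-level keys in a single call. This is a different technique from the get_json_object query above, shown only as a sketch: the inline JSON row and its values are made up so the query runs without the real table, and only three of the keys are extracted.

```sql
-- Inline sample row so the query runs without the real table; values are made up.
select j.title, j.total_price, j.unit_price
from (
  select '{"title":"demo listing","total_price":"300","unit_price":"25000"}' as result_contxt
) s
lateral view json_tuple(s.result_contxt, 'title', 'total_price', 'unit_price') j
  as title, total_price, unit_price;
```

json_tuple only reads top-level keys, so nested values or array elements still need get_json_object with a path.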
This raises another question: if the field is a JSON array, how do you parse every JSON object in it?
If that is unclear, see the separate introduction to the explode function.