A detailed look at Hive's lateral view and explode

About the explode function

explode() is a UDTF (user-defined table-generating function). The official definition of a table-generating function is one that accepts zero or more inputs and produces multiple columns or rows of output. As the name suggests, explode "blows apart" a collection, spreading its elements across rows.
explode() usually takes an array as its argument, iterates over the elements of that array, and returns one row per element. For example:
select explode(array(1,2,3)) from t
For each row of t, this returns three rows: 1, 2 and 3.
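As a small sketch of the same idea with literal inputs (no table needed, assuming Hive 0.13 or later, where SELECT without FROM is allowed): explode also accepts a map, and posexplode additionally returns each element's position. The aliases n, pos, val, k and v are illustrative names only.

```sql
-- One row per array element: 1, 2, 3
select explode(array(1, 2, 3)) as n;

-- posexplode also returns the zero-based position of each element
select posexplode(array('a', 'b', 'c')) as (pos, val);

-- For a map, explode returns one row per entry, with key and value columns
select explode(map('k1', 'v1', 'k2', 'v2')) as (k, v);
```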
To make this more practical and easier to follow, I will use a field from production: the crawler data of a website.

This column is a JSON object with many elements. Among them, the feature element holds the feature descriptions, with a space (' ') as the separator.
What we want to do is split each feature in this feature field out into its own row. Since feature sits inside the JSON object, we first call get_json_object to parse the JSON, as follows:

select
  get_json_object(result_contxt, '$.feature') as feature
from
 testtable
where
  p_day = '20191208'

The query returns the feature string for each row, with the individual features separated by spaces.
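To see get_json_object in isolation, here is a minimal sketch on a literal string. The JSON content is invented for illustration and is not the real crawler data.

```sql
-- Hypothetical JSON literal; '$.feature' is the path to the feature element
select get_json_object(
  '{"title": "demo", "feature": "fast cheap light"}',
  '$.feature'
) as feature;
-- returns: fast cheap light
```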

However, explode, like any UDTF, has two usage restrictions. First, a SELECT that contains a UDTF cannot select any other columns alongside it, as follows

-- Fails: the UDTF must be the only expression in the SELECT list
select
  explode(split(t.feature,' '))
  ,url
from
(select
  get_json_object(result_contxt, '$.feature') as feature
  ,url
from
  testtable
where
  p_day = '20191208'
 ) t

Executing this SQL fails with: Error while compiling statement: FAILED: SemanticException 1:55 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'url'

Second, a UDTF cannot be nested inside another function or expression, as follows

-- Fails: the UDTF is nested inside distinct(...)
select
  distinct(explode(split(t.feature,' ')))
from
(select
  get_json_object(result_contxt, '$.feature') as feature
  ,url
from
  testtable
where
  p_day = '20191208'
 ) t

Executing this SQL fails with: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
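One possible way around this second restriction is to keep the UDTF alone in an inner SELECT and apply the other operation in an outer query. The sketch below uses an invented literal string instead of the real table, and the names feature_item and x are illustrative only.

```sql
-- The inner query contains only the UDTF; distinct is applied outside it
select distinct x.feature_item
from (
  select explode(split('fast cheap fast', ' ')) as feature_item
) x;
-- returns two rows: fast and cheap (row order is not guaranteed)
```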

So how do you select other columns alongside explode or another UDTF? This is what lateral view is for, as follows

select
	features_1,
    url
from
(select
  get_json_object(result_contxt, '$.feature') as feature
  ,url
from
	testtable
where
  p_day = '20191208'
 ) t
lateral view explode(split(feature,' ')) tempview as features_1

Here tempview is the alias of the virtual table that explode generates. For each row of the subquery, lateral view joins that row with the rows exploded from it, which works like a Cartesian product applied row by row. The result has one row per feature, each carrying its url.
That is the basic usage of lateral view. Combined with other queries, it can also handle operations such as merging two columns. Of course, you can also use union all. Which one to use depends on the specific requirement, and to some extent on personal habit.
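For experimenting without the production table, here is a self-contained sketch that assumes a reasonably recent Hive version (common table expressions and SELECT without FROM). The sample rows, URLs and the names sample, s, f and feature_item are all invented. It also uses the optional outer keyword, which keeps a source row even when explode produces no rows for it.

```sql
-- Hypothetical sample rows standing in for the parsed crawler data
with sample as (
  select 'http://example.com/a' as url, 'fast cheap light' as feature
  union all
  select 'http://example.com/b' as url, cast(null as string) as feature
)
select
  s.url,
  f.feature_item
from sample s
-- outer keeps the second url with a NULL feature_item;
-- a plain lateral view would drop that row entirely
lateral view outer explode(split(s.feature, ' ')) f as feature_item;
```

Because the exploded column is an ordinary column of the joined result, it can be used with distinct, group by or where in the same query, which is not possible when explode is called directly in the SELECT list.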

Origin blog.csdn.net/qq_33891419/article/details/103297121