Solving FAILED: UDFArgumentException explode() takes an array or a map as a parameter, and understanding explode and LATERAL VIEW




I. Solving FAILED: UDFArgumentException explode() takes an array or a map as a parameter

1. Background

After data processing in the project, the query below returns game_name and genre. The genre field is actually a string (the result of business-logic processing), even though it looks like an array when displayed.

SELECT
    get_json_object(map_col,'$.game_name') game_name,
    get_json_object(map_col,'$.genre') genre
FROM
	ods_crawler_table                         
WHERE
	dt = '2023-02-26'       
	AND get_json_object(map_col,'$.code') = 'xxx'

2. Next, we want to use the explode function to turn the genre value of one row into multiple rows:

SELECT
	explode(get_json_object(map_col,'$.genre')) genre 
FROM
	ods_crawler_table                          
WHERE
	dt = '2023-02-26'       
	AND get_json_object(map_col,'$.code') = 'xxx'	

3. The error

Error message: FAILED: UDFArgumentException explode() takes an array or a map as a parameter

4. Cause analysis

① genre only looks like an array. After business processing it is really a STRING (get_json_object always returns a STRING in any case), and explode() does not accept strings.
② Moreover, the string carries redundant [] brackets.
③ It therefore has to be turned into an array: first remove the [] with regexp_replace, then split the remaining string on commas, which returns an array.

split(regexp_replace(get_json_object(map_col,'$.genre'), '\\[|\\]', ''), ",")
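The cleaning step can be modeled in Python to see why it produces an array. This is only a sketch of the semantics, not Hive itself: re.sub stands in for regexp_replace, str.split for Hive's split, and the sample value '[RPG,Action,Strategy]' is hypothetical.

```python
import re


def genre_to_array(genre: str) -> list[str]:
    """Mimic split(regexp_replace(genre, '\\[|\\]', ''), ',') for a string like '[a,b,c]'."""
    # Remove every '[' and ']' character, like regexp_replace with the regex \[|\]
    without_brackets = re.sub(r"\[|\]", "", genre)
    # Split on commas; the list is the analogue of Hive's array<string>
    return without_brackets.split(",")


# Hypothetical genre value: a plain string that merely looks like an array
raw = "[RPG,Action,Strategy]"
print(type(raw).__name__, genre_to_array(raw))  # prints: str ['RPG', 'Action', 'Strategy']
```

If the stored string also contained JSON quotes (for example ["RPG","Action"]), this pattern would leave the quotes in the elements, and the regex would need to strip them as well.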

SELECT
	split(regexp_replace(get_json_object(map_col,'$.genre'), '\\[|\\]', ''), ",") genre 
FROM
	ods_crawler_table                        
WHERE
	dt = '2023-02-26'       
	AND get_json_object(map_col,'$.code') = 'xxx'

With this change, genre is returned as an array.

Now try explode again:

SELECT
    explode(split(regexp_replace(get_json_object(map_col,'$.genre'), '\\[|\\]', ''), ","))  genre 
FROM
	ods_crawler_table                          
WHERE
	dt = '2023-02-26'       
AND get_json_object(map_col,'$.code') = 'xxx'

This time explode succeeds, and each genre value becomes its own row.

5. Query explode(genre) together with other fields

  • The actual business requirement is to query game_name and genre together:
SELECT
    get_json_object(map_col,'$.game_name') game_name,
    explode(split(regexp_replace(get_json_object(map_col,'$.genre'), '\\[|\\]', ''), ","))  genre 
FROM
	ods_crawler_table                           
WHERE
	dt = '2023-02-26'       
AND get_json_object(map_col,'$.code') = 'xxx'	

This fails with:

  • Error message: UDTF's are not supported outside the SELECT clause, nor nested in expressions

Analysis: explode turns one genre value into several rows (3 rows here), while game_name still yields a single row, so the row counts do not match. For this reason Hive does not allow a UDTF such as explode to be combined with other columns in the SELECT list.

Solution: use LATERAL VIEW to join the exploded rows back to the source table.

ods_crawler_table -- the original table
LATERAL VIEW -- join (essentially a Cartesian product, taken per input row)
explode(split(regexp_replace(get_json_object(map_col,'$.genre'), '\\[|\\]', ''), ",")) v -- v: alias of the virtual table formed by joining the source table with explode's output
as genre -- genre: alias of the column produced by the explode(...) call


SELECT
    get_json_object(map_col,'$.game_name') game_name,
    genre 
FROM
	ods_crawler_table      
LATERAL VIEW explode(split(regexp_replace(get_json_object(map_col,'$.genre'), '\\[|\\]', ''), ",")) v as genre		
WHERE
	dt = '2023-02-26'       
    AND get_json_object(map_col,'$.code') = 'xxx'

Result: each game_name now appears once for every value in its genre list.
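The per-row join that LATERAL VIEW performs can be sketched in Python. This models the semantics only; the game names and genre strings below are hypothetical sample data, and parse_genre reuses the same bracket-stripping and comma-splitting as the query.

```python
import re


def parse_genre(genre: str) -> list[str]:
    # Same cleaning as split(regexp_replace(genre, '\\[|\\]', ''), ',')
    return re.sub(r"\[|\]", "", genre).split(",")


def lateral_view_explode(rows):
    """Yield (game_name, genre) pairs: each source row joined with the rows explode produces from it."""
    for game_name, genre_str in rows:
        # explode: one output row per array element of this source row
        for genre in parse_genre(genre_str):
            yield game_name, genre


# Hypothetical source rows: (game_name, genre string)
source = [("GameA", "[RPG,Action]"), ("GameB", "[Puzzle]")]
for row in lateral_view_explode(source):
    print(row)
# prints:
# ('GameA', 'RPG')
# ('GameA', 'Action')
# ('GameB', 'Puzzle')
```

game_name is repeated for every element generated from its own row, which is why the row counts line up again.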

A detail worth noting: when LATERAL VIEW and explode are used together, two aliases are required.

  • The first alias (v) names the virtual table formed by joining the source table with explode's output; the second alias (after AS) names the column that explode produces.

  • Syntax

    tableA
    LATERAL VIEW -- join
    explode(fieldB) v -- v: alias of the virtual table formed by joining tableA with the rows explode(fieldB) generates
    as b -- b: alias of the column produced by explode
    



II. Understanding explode and LATERAL VIEW

1. explode

(1) Purpose: turns one row into multiple rows. It accepts an array or a map: an array yields one row per element, and a map yields one row per key-value pair, with two columns, key and value.

(2) Syntax and examples:

explode (array)

-- explode the array array('A','B','C')
select explode(array('A','B','C')) as col;
col
A
B
C
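As a rough Python model (not Hive code; explode_array is a hypothetical helper), explode on an array returns one single-column row per element:

```python
def explode_array(arr):
    """Model of explode(array): one output row per element, in a single column."""
    return [(element,) for element in arr]


for (col,) in explode_array(["A", "B", "C"]):
    print(col)  # prints A, then B, then C
```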

explode (map)

-- explode the map map('A',10,'B',20,'C',30)
select explode(map('A',10,'B',20,'C',30)) as (key, value);
key value
A 10
B 20
C 30
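The map case can be modeled the same way; explode_map is a hypothetical helper and the sketch only illustrates the shape of the output (two columns per entry). Python dicts keep insertion order, but Hive output order should not be relied on without ORDER BY.

```python
def explode_map(m: dict):
    """Model of explode(map): one row per entry, with two columns (key, value)."""
    return [(key, value) for key, value in m.items()]


for key, value in explode_map({"A": 10, "B": 20, "C": 30}):
    print(key, value)  # prints "A 10", "B 20", "C 30"
```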

posexplode (array)

select posexplode(array('A','B','C')) as (pos,val);
pos val
0 A
1 B
2 C
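posexplode behaves like explode but also emits each element's 0-based position. In Python that corresponds to enumerate; posexplode_array below is a hypothetical helper that models the output only.

```python
def posexplode_array(arr):
    """Model of posexplode(array): (pos, val) rows with a 0-based position."""
    return list(enumerate(arr))


for pos, val in posexplode_array(["A", "B", "C"]):
    print(pos, val)  # prints "0 A", "1 B", "2 C"
```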

(3) Limitation of explode:

Once a field is exploded, its row count no longer matches the other fields of the table, so the exploded field cannot be queried together with other fields of the table.

Solution: lateral view


2. LATERAL VIEW

(1) Purpose: used together with a UDTF (user-defined table-generating function, such as explode) to get around the fact that a UDTF on its own cannot be selected alongside additional columns.

LATERAL VIEW applies the UDTF to each input row, puts the generated rows into a virtual table, and joins that virtual table with the input row, so that columns outside the UDTF can be selected together with its output.
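A generic sketch of this mechanism in Python, assuming rows are dicts and the UDTF is any function returning an iterable for one row. lateral_view is a hypothetical helper, and the sample rows reuse the festival data from the example below, treating good_name as a comma-separated string.

```python
from typing import Callable, Iterable


def lateral_view(rows: Iterable[dict], udtf: Callable[[dict], Iterable], alias: str) -> list[dict]:
    """Model of LATERAL VIEW: apply the UDTF to each input row and join every generated value back to that row."""
    result = []
    for row in rows:
        # The UDTF output for this single row plays the role of the virtual table
        for value in udtf(row):
            # Join: the input row's columns plus the generated column under the given alias
            result.append({**row, alias: value})
    return result


table_a = [
    {"festival": "Dragon Boat Festival", "good_name": "Duanyang,dragon boat,Chongwu"},
    {"festival": "Mid-Autumn Festival", "good_name": "reunion,festival month"},
]
# The UDTF here models explode(split(good_name, ','))
rows = lateral_view(table_a, lambda r: r["good_name"].split(","), "another_name")
for r in rows:
    print(r["festival"], "|", r["another_name"])
```

Because the join happens per input row, festival is simply repeated for each value generated from its own row.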

(2) Syntax and examples:

□ Syntax:
tableA
LATERAL VIEW -- join
explode(fieldB) v -- v: alias of the virtual table formed by joining tableA with the rows explode(fieldB) generates
as b -- b: alias of the column produced by explode
□ Example:

Querying the original table tableA returns the following (good_name holds the alternative names as a comma-separated string):

select festival,good_name from tableA;
festival | good_name
Dragon Boat Festival | Duanyang,dragon boat,Chongwu
Mid-Autumn Festival | reunion,festival month
Spring Festival | new year,new year,New Year's Day
-- use explode (split first turns the string into an array, as in the first part of this article)
select explode(split(good_name, ',')) as another_name from tableA;
another_name
Duanyang
dragon boat
Chongwu
reunion
festival month
new year
new year
New Year's Day
  • Goal: pair each festival with each of its alternative names. Selecting the festival field directly alongside explode reports an error:

    select festival, explode(split(good_name, ',')) from tableA;
    
  • Error: UDTF's are not supported outside the SELECT clause, nor nested in expressions. In other words, a UDTF such as explode cannot be selected together with other fields.

  • Solution: use LATERAL VIEW

    select festival, another_name
    from tableA
    LATERAL VIEW explode(split(good_name, ',')) v
    	as another_name;
    
  • Result:

festival | another_name
Dragon Boat Festival | Duanyang
Dragon Boat Festival | dragon boat
Dragon Boat Festival | Chongwu
Mid-Autumn Festival | reunion
Mid-Autumn Festival | festival month
Spring Festival | new year
Spring Festival | new year
Spring Festival | New Year's Day





Origin blog.csdn.net/weixin_45630258/article/details/129241233