Commonly used Hive string concatenation functions and pitfalls to avoid

String concatenation functions come up very often in practice. For example, an employee may hold multiple roles in the company. In the underlying storage layer this is usually kept as multiple rows, one per role, but downstream applications usually need only one row per employee, with the role values concatenated into a single field. That way, joining against other tables does not duplicate rows and cause the same data to be counted more than once.

1. Concatenating multiple strings: concat_null(…)

As the description in the figure above shows, concat_null(…) is usually used in practice to prevent the concatenation problems caused by NULL values.
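For comparison, here is a minimal sketch of the NULL problem itself. It assumes Hive's built-in concat(...) and coalesce(...), neither of which is named in this section, and it uses literal values instead of a real table.

```sql
-- Assumption: Hive's built-in concat(), which returns NULL as soon as any argument is NULL.
SELECT concat('dev', cast(NULL AS string), 'ops') AS raw_concat;       -- NULL

-- Replacing NULL with an empty string first keeps the non-NULL parts.
SELECT concat('dev', coalesce(cast(NULL AS string), ''), 'ops') AS guarded_concat;  -- 'devops'
```

A single NULL input is enough to wipe out the whole concatenated value, which is why some form of NULL handling is needed before concatenating.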

2. Concatenating multiple strings with a separator: concat_ws(…)

Note that concat_ws(...) handles NULL directly, and if the separator sep is '' (an empty string, not NULL), it behaves the same as concat_null(...).
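The NULL handling described here can be sketched with literal values. The strings 'dev' and 'ops' are made up for illustration.

```sql
-- NULL arguments are skipped instead of turning the whole result into NULL.
SELECT concat_ws(',', 'dev', cast(NULL AS string), 'ops') AS with_comma;    -- 'dev,ops'

-- With an empty-string separator the remaining parts are simply joined together.
SELECT concat_ws('', 'dev', cast(NULL AS string), 'ops') AS no_separator;   -- 'devops'
```

Note that no stray separator is left behind for the skipped NULL, so the result does not contain ',,'.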

3. Converting rows to columns with collect_set and collect_list

Because collect_set also removes duplicates while converting rows to columns, whereas collect_list keeps them, collect_set naturally sees more use in practice.
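The difference between the two can be sketched on a small inline data set. The subquery t and the column role_name are hypothetical and stand in for a real table. The example compares array sizes, not array contents, because the order of elements in the collected array is not something to rely on.

```sql
-- Hypothetical inline data: employee 1 has the role 'dev' recorded twice.
SELECT
    id,
    size(collect_list(role_name)) AS list_size,  -- 3: duplicates are kept
    size(collect_set(role_name))  AS set_size    -- 2: duplicates are removed
FROM (
    SELECT 1 AS id, 'dev' AS role_name
    UNION ALL SELECT 1 AS id, 'dev' AS role_name
    UNION ALL SELECT 1 AS id, 'ops' AS role_name
) t
GROUP BY id;
```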

4. "Avoiding" NULL and non-string fields

In real production environments, different engines are in use, and their compatibility differs. NULL values and non-string fields can therefore sometimes cause problems in queries.

1. Usually we can replace NULL: if(field_name is null, '', field_name)

2. Convert non-string values to strings: cast(field_name as string)

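```sql
SELECT
    id,
    -- Deduplicate the multiple roles for each id, combine them into one row,
    -- and separate the roles with an ASCII comma ','
    concat_ws(',', collect_set(
        cast(if(`角色` is null, '', `角色`) as string)
    )) AS `角色`
FROM emp
WHERE dt = '20230618'
-- Group by the column name so the query does not depend on positional GROUP BY being enabled
GROUP BY id
```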

Origin blog.csdn.net/qq_34160248/article/details/132259304