[Big Data] Hive Series: Rows to Columns with CONCAT and Columns to Rows with EXPLODE, the Most Common Business Usage Explained

Rows to columns

Description of related functions

CONCAT(string A/col, string B/col...): Returns the concatenation of the input strings and accepts any number of arguments. If any argument is NULL, the result is NULL.

CONCAT_WS(separator, str1, str2,…): A special form of CONCAT(). The first parameter is the separator, which is placed between the remaining parameters in the result. The separator can be any string, like the other arguments. If the separator is NULL, the result is NULL. NULL values after the separator are skipped; empty strings are not skipped.

Note: the arguments of CONCAT_WS must be string or array<string>.
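The difference in NULL handling between the two functions is easiest to see side by side. The statements below are a minimal sketch with made-up literal values; they assume a Hive version that allows SELECT without a FROM clause (0.13 or later).

```sql
-- CONCAT joins its arguments directly; one NULL argument makes the result NULL.
SELECT CONCAT('Aries', ',', 'A');                         -- Aries,A
SELECT CONCAT('Aries', CAST(NULL AS STRING), 'A');        -- NULL

-- CONCAT_WS inserts the separator between arguments and skips NULL arguments.
SELECT CONCAT_WS(',', 'Aries', 'A');                      -- Aries,A
SELECT CONCAT_WS(',', 'Aries', CAST(NULL AS STRING), 'A'); -- Aries,A

-- CONCAT_WS also accepts an array<string> after the separator.
SELECT CONCAT_WS('|', ARRAY('Zhang San', 'Xu Ba'));       -- Zhang San|Xu Ba
```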

COLLECT_SET(col): An aggregate function that only accepts primitive data types. It collects the values of a column within each group, removes duplicates, and returns them as an Array.
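To make the deduplication visible, COLLECT_SET can be compared with COLLECT_LIST, which keeps duplicates. This is a sketch over three inline rows invented for illustration; it assumes a Hive version that allows SELECT without a FROM clause inside a UNION ALL subquery.

```sql
SELECT
    COLLECT_SET(blood_type)  AS distinct_types,  -- duplicates removed: 'A' and 'B'
    COLLECT_LIST(blood_type) AS all_types        -- duplicates kept: 'A' appears twice
FROM (
    SELECT 'A' AS blood_type
    UNION ALL
    SELECT 'B' AS blood_type
    UNION ALL
    SELECT 'A' AS blood_type
) t;
```

Neither function guarantees the order of the elements in the returned array.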

Data preparation

name        constellation   blood_type
Zhang San   Aries           A
Li Si       Virgo           B
Wang Wu     Taurus          A
Zhu Liu     Aries           B
Chen Qi     Taurus          A
Xu Ba       Aries           A

Requirement

Group people with the same zodiac sign and blood type. The result is as follows:

Aries,A	Zhang San|Xu Ba
Virgo,B	Li Si
Aries,B	Zhu Liu
Taurus,A	Wang Wu|Chen Qi

Create the Hive table and import data

hive (default)> create table person_info( name string, constellation string, blood_type string)
row format delimited fields terminated by "\t";

hive (default)> load data local inpath "/data/person_info.txt" into table person_info;

Query data on demand

hive (default)> SELECT
    t1.c_b,
    CONCAT_WS("|", COLLECT_SET(t1.name))
FROM (
    SELECT
        name,
        CONCAT_WS(',', constellation, blood_type) c_b
    FROM person_info
) t1
GROUP BY t1.c_b;
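Because COLLECT_SET does not guarantee element order, the names inside each group may come out in a different order from run to run. One possible variation, shown here as a sketch rather than as part of the original query, wraps the array in SORT_ARRAY so the output is stable; the column alias names is an assumption added for readability.

```sql
SELECT
    t1.c_b,
    -- SORT_ARRAY orders the collected names before they are joined with "|"
    CONCAT_WS('|', SORT_ARRAY(COLLECT_SET(t1.name))) AS names
FROM (
    SELECT
        name,
        CONCAT_WS(',', constellation, blood_type) AS c_b
    FROM person_info
) t1
GROUP BY t1.c_b;
```

If duplicate names within a group should be kept, COLLECT_LIST can be used in place of COLLECT_SET.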

Columns to rows

Description of related functions

EXPLODE(col): Splits a complex Array or Map value stored in a single Hive column into multiple rows.
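A quick way to see what EXPLODE produces is to call it on literal values. The sketch below uses made-up literals and assumes a Hive version that allows SELECT without a FROM clause; the column aliases are chosen for illustration only.

```sql
-- An array yields one row per element.
SELECT EXPLODE(ARRAY('Suspense', 'Action', 'Drama')) AS category_name;

-- A map yields one row per entry, with a key column and a value column.
SELECT EXPLODE(MAP('A', 3, 'B', 2)) AS (blood_type, people);
```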

LATERAL VIEW
Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
Explanation: Used together with a UDTF such as explode (typically fed by split, which turns a string into an array). It joins each input row with the rows the UDTF generates from it, so one column value becomes multiple rows, and the expanded rows can then be aggregated.
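When EXPLODE is used directly in the SELECT list, Hive does not allow other columns alongside it, which is why LATERAL VIEW is needed to keep the source row's columns next to each generated row. The following sketch uses a single inline row invented for illustration and assumes a Hive version that allows SELECT without a FROM clause in a subquery.

```sql
SELECT
    t.movie,
    c.category_name
FROM (
    -- hypothetical one-row input, standing in for a real table
    SELECT 'Wolf Warrior 2' AS movie, 'War,Action,Disaster' AS category
) t
-- SPLIT turns the string into an array; EXPLODE emits one row per element
LATERAL VIEW EXPLODE(SPLIT(t.category, ',')) c AS category_name;
```

Each of the three categories is returned on its own row, paired with the same movie value.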

Data preparation

movie               category
Person of Interest  Suspense,Action,Sci-fi,Drama
Lie to Me           Suspense,Crime,Action,Psychology,Drama
Wolf Warrior 2      War,Action,Disaster

Requirement

Expand the comma-separated categories of each movie into one row per category. The result is as follows:

Person of Interest	Suspense
Person of Interest	Action
Person of Interest	Sci-fi
Person of Interest	Drama
Lie to Me	Suspense
Lie to Me	Crime
Lie to Me	Action
Lie to Me	Psychology
Lie to Me	Drama
Wolf Warrior 2	War
Wolf Warrior 2	Action
Wolf Warrior 2	Disaster

Create the Hive table and import data

hive (default)> create table movie_info( movie string, category string)
row format delimited fields terminated by "\t";

hive (default)> load data local inpath "/data/movie.txt" into table movie_info;

Query data on demand

hive (default)> SELECT
    movie,
    category_name
FROM movie_info
LATERAL VIEW EXPLODE(SPLIT(category, ",")) movie_info_tmp AS category_name;
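Once the categories are expanded into rows, they can be aggregated like any other column. As a sketch of that idea, the query below counts the movies in each category and lists their titles by combining the exploded rows with COLLECT_SET and CONCAT_WS from the first half of the article; the aliases movie_count and movies are assumptions added for readability.

```sql
SELECT
    category_name,
    COUNT(*)                           AS movie_count,
    CONCAT_WS('|', COLLECT_SET(movie)) AS movies
FROM movie_info
LATERAL VIEW EXPLODE(SPLIT(category, ',')) movie_info_tmp AS category_name
GROUP BY category_name;
```

This assumes the category values in the data file contain no spaces after the commas; otherwise the pieces returned by SPLIT would carry leading spaces and would need to be trimmed before grouping.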


Origin blog.csdn.net/u013412066/article/details/129541979