1. concat_ws() and concat() concatenate strings very differently
1.1 Differences
concat(): If any one of the arguments is NULL, the result is NULL.
Code:
select concat('a','b',null);
Result:
NULL
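If a NULL argument should count as an empty string instead of turning the whole result into NULL, one common workaround is to wrap nullable arguments in coalesce(). A minimal sketch, with literal values standing in for real columns (an assumption made only for illustration):

```sql
-- concat() returns NULL as soon as any argument is NULL
select concat('a', 'b', cast(null as string));                 -- NULL

-- coalesce() swaps the NULL argument for '' before concatenation
select concat('a', 'b', coalesce(cast(null as string), ''));   -- ab
```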
concat_ws(): The first argument is the separator placed between the remaining strings. Unlike concat(), NULL arguments are skipped instead of turning the whole result into NULL.
Code 1:
hive> select concat_ws('-','a','b');
Result:
a-b
Code 2:
hive> select concat_ws('-','a','b',null);
Result:
a-b
Code 3:
hive> select concat_ws('','a','b',null);
Result:
ab
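concat_ws() also accepts an array of strings after the separator, which is the form used together with collect_set() and collect_list() in section 2. A small sketch using the built-in array() constructor; the literal values are chosen only for illustration:

```sql
-- the separator is followed by an array<string> instead of individual strings
select concat_ws('-', array('a', 'b', 'c'));   -- a-b-c
```

As far as I know, Hive expects the array elements to be strings here, so numeric values would need a cast to string before they are collected.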
2. The difference between collect_set() and collect_list()
Reference: SQL small knowledge point series-3-collect_list/collect_set (column transfer), Zhihu
2.1 Differences:
Both functions collect the values of a column within each group into an array and return it.
The difference is that collect_list() keeps duplicates, whereas collect_set() removes them.
2.2 collect_list(): no deduplication, keeps the order in which rows are read
2.3 collect_set(): deduplicates, order not guaranteed
After grouping by a field, collect_list() merges the values of each group into one array. When the array is displayed, its elements are separated by ',', for example:
a | b | c |
---|---|---|
1 | 1 | "1","2" |
1 | 2 | "1","2" |
1 | 2 | "1","2","2" |
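The difference between the two functions can be seen by running both over the same group. The sketch below builds a tiny inline data set with a CTE; the name demo and its columns id and val are hypothetical and do not come from the original data:

```sql
-- hypothetical inline data: one group 'x' containing the value '2' twice
with demo as (
    select 'x' as id, '1' as val
    union all select 'x', '2'
    union all select 'x', '2'
)
select
    id,
    collect_list(val) as all_vals,       -- keeps both copies of '2' (three elements)
    collect_set(val)  as distinct_vals   -- keeps '2' only once (two elements)
from
    demo
group by
    id;
```

No element order is shown in the comments on purpose, since neither function sorts its result.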
2.4 Examples
Raw data in table temp
id | class |
---|---|
loongshaw | 1 |
loongshaw | 2 |
loongshaw | 3 |
loongshaw | 4 |
Expected result
id | class |
---|---|
loongshaw | 1,2,3,4 |
Query:
select
    t.id,
    concat_ws(',', collect_set(t.class)) as class
from
    temp t
group by
    t.id;
Result: the class values are not in order after merging
id | class |
---|---|
loongshaw | 1,3,2,4 |
Solution:
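Either sort the collect_set() result with sort_array(), or replace collect_set() with collect_list().
concat_ws(',', sort_array(collect_set(t.class)))
In Hive, sort_array(array) sorts the elements in ascending natural order. Spark SQL's sort_array additionally accepts an optional second boolean argument (ascending, default true); passing false there sorts in descending order, which would give 4,3,2,1.
or:
concat_ws(',', collect_list(t.class))
collect_list() keeps the values in the order the rows are read, so it matches the table order here, but that order is not guaranteed on a distributed query; wrap it in sort_array() when the order matters.
Result: the class values are merged in order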
id | class |
---|---|
loongshaw | 1,2,3,4 |
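Putting the pieces together, the query below is a self-contained sketch of the sorted version. It rebuilds the temp table inline with a CTE, deliberately listing the rows out of order, and assumes class is stored as a string (the original does not state the column type):

```sql
-- inline stand-in for the temp table; class is assumed to be a string column
with temp as (
    select 'loongshaw' as id, '3' as class
    union all select 'loongshaw', '1'
    union all select 'loongshaw', '4'
    union all select 'loongshaw', '2'
)
select
    t.id,
    -- sort the collected array first, then join its elements with ','
    concat_ws(',', sort_array(collect_list(t.class))) as class
from
    temp t
group by
    t.id;
-- loongshaw    1,2,3,4
```

Because the values are strings, the sort is lexicographic: a value such as '10' would sort before '2'. For multi-digit numbers, a different approach (for example zero-padding the values) would be needed.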