Row-column conversion with Spark SQL

Row-column conversion is a very common data analysis operation. It is used to splice data together and to split it apart, and it achieves results that ordinary scalar functions cannot.

Column to row

First, the column-to-row conversion. Two built-in functions are involved: collect_list, which keeps duplicates, and collect_set, which removes them. Column to row is an aggregation: within each group, the values of one column are gathered into a single array. The column collected in this example is a string, although in Spark SQL the collected column does not have to be a string. An example follows.

The original data is as follows (项目 means "project")

2018-01,项目1,100
2018-01,项目2,200
2018-01,项目3,300
2018-01,项目3,400
2018-02,项目1,1000
2018-02,项目2,2000
2018-03,项目x,999

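The SQL is as follows. It assumes the data is registered as a table or view named sr with the columns yue (month), project, and shouru (income); zsr is the alias for the summed income

spark.sql("select yue, collect_set(project) projects, sum(shouru) zsr from sr group by yue").show()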

The result is as follows

+-------+---------------+----+
|    yue|       projects| zsr|
+-------+---------------+----+
|2018-03|          [项目x]| 999|
|2018-02|     [项目1, 项目2]|3000|
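|2018-01|[项目1, 项目2, 项目3]|1000|
+-------+---------------+----+

I used collect_set here, so the collected values are de-duplicated: 项目3 appears twice in the 2018-01 data but only once in the array. With collect_list it would appear twice. Pick whichever fits your needs. Note that Spark does not guarantee the order of the elements in the collected array.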
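To make the difference between the two functions concrete, here is a minimal sketch in plain Python that emulates the query on the sample data. It is not Spark code: the helper group_projects and the keys as_list and as_set are hypothetical names made up for illustration, and the sketch keeps values in first-seen order, which Spark does not promise.

```python
from collections import defaultdict

# Sample rows copied from the data above: (yue, project, shouru)
rows = [
    ("2018-01", "项目1", 100),
    ("2018-01", "项目2", 200),
    ("2018-01", "项目3", 300),
    ("2018-01", "项目3", 400),
    ("2018-02", "项目1", 1000),
    ("2018-02", "项目2", 2000),
    ("2018-03", "项目x", 999),
]


def group_projects(rows):
    """Hypothetical helper: emulate `group by yue` with collect_list, collect_set and sum."""
    groups = defaultdict(lambda: {"as_list": [], "as_set": [], "zsr": 0})
    for yue, project, shouru in rows:
        g = groups[yue]
        g["as_list"].append(project)    # collect_list keeps every value
        if project not in g["as_set"]:  # collect_set keeps distinct values only
            g["as_set"].append(project)
        g["zsr"] += shouru              # sum() sees every row, duplicates included
    return dict(groups)


result = group_projects(rows)
for yue, g in sorted(result.items()):
    print(yue, g["as_list"], g["as_set"], g["zsr"])
```

Only the collected array is affected by the choice between collect_list and collect_set. The sum is computed over all rows of the group either way.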

Row to column

Next is the row-to-column conversion, which uses the built-in function explode.

A reminder: explode is not the same thing as a user-defined one-in, many-out function (a UDTF). Both are typically used together with lateral view, but explode is less flexible. It simply splits one field of a record into multiple rows, whereas a UDTF lets you control input and output freely. explode usually appears together with split because it must be passed a collection (an array or a map), and split is what turns a delimited string into an array.

The data is as follows (each line holds a name, an age, and a comma-separated list of hobbies)

A 20 篮球,排球,乒乓球
B 30 跳舞,唱歌
C 23 唱歌,爬山

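The statement is as follows, assuming the data is registered as a table or view named sr2 with the columns name, age, and hobby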

spark.sql("select name,age,t.hobby from sr2 lateral view explode(split(hobby,',')) t as hobby").show()

The results are as follows. Each exploded hobby is automatically paired with the name and age of the row it came from

+----+---+-----+
|name|age|hobby|
+----+---+-----+
|   A| 20|   篮球|
|   A| 20|   排球|
|   A| 20|  乒乓球|
|   B| 30|   跳舞|
|   B| 30|   唱歌|
|   C| 23|   唱歌|
|   C| 23|   爬山|
+----+---+-----+
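The same split-then-explode logic can be sketched in plain Python. Again this is not Spark code: explode_hobbies is a hypothetical helper written only to show what lateral view explode(split(...)) does to each record. One difference to keep in mind is that Spark's split treats its second argument as a regular expression, while Python's str.split takes a literal separator.

```python
# Sample records copied from the data above: (name, age, hobby)
records = [
    ("A", 20, "篮球,排球,乒乓球"),
    ("B", 30, "跳舞,唱歌"),
    ("C", 23, "唱歌,爬山"),
]


def explode_hobbies(records, sep=","):
    """Hypothetical helper: emulate `lateral view explode(split(hobby, ','))`."""
    out = []
    for name, age, hobby in records:
        for item in hobby.split(sep):      # split: delimited string -> list
            out.append((name, age, item))  # explode: one output row per element
    return out


exploded = explode_hobbies(records)
for row in exploded:
    print(*row)
```

Each input record produces as many output rows as it has hobbies, and the other columns are repeated on every one of them.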

Origin blog.csdn.net/dudadudadd/article/details/114373379