Big Data Hive Knowledge Points Explained in Plain Language, Lao Liu Is Really Attentive (3)


Foreword: Lao Liu won't claim his writing is great, but he does promise to explain the content of his review in plain language as much as possible, refusing to copy material mechanically and insisting on his own understanding!

1. Hive knowledge points (3)

Starting from this article, I've decided to make some changes. On this blog, Lao Liu will mainly share the key knowledge points of each big data module and explain them in detail. The complete knowledge points for each module are shared on the public account: Hardworking Lao Liu. When possible, he will also record videos analyzing and summarizing each batch of shared points, followed by an article with a detailed explanation.

Now let's start the main text. It's the same old refrain: these are commonly used Hive functions that many people ignore, yet daily development brings plenty of business requirements that need them, so we must at least be familiar with the common ones.

The explode, row-to-column, and column-to-row topics in this article are the key points, and you need to master their examples. Since Hive is best learned hands-on, Lao Liu only explains these key points in detail.

2. Lateral view and explode in hive

Why is explode used?

In real development you will encounter many complex array or map structures, and business requirements often call for splitting such a structure from one column into multiple rows. That is when explode is needed. If that still sounds abstract, let's illustrate explode with an example. Be sure to practice along with Lao Liu; if you only read and never practice, it's like learning nothing!

Requirement: we have data in the following format
zhangsan    child1,child2,child3,child4 k1:v1,k2:v2
lisi    child5,child6,child7,child8  k3:v3,k4:v4

Fields are separated by \t. The requirement is to split all the child values out into a single column:

+----------+--+
| mychild  |
+----------+--+
| child1   |
| child2   |
| child3   |
| child4   |
| child5   |
| child6   |
| child7   |
| child8   |
+----------+--+

Also split the map's keys and values apart, producing the following result:

+-----------+-------------+--+
| mymapkey  | mymapvalue  |
+-----------+-------------+--+
| k1        | v1          |
| k2        | v2          |
| k3        | v3          |
| k4        | v4          |
+-----------+-------------+--+
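Before building anything in Hive, here is a minimal Python sketch (hypothetical, not Hive's implementation) of what explode does to these two compound columns: each array element becomes its own row, and each map entry becomes its own (key, value) row.

```python
# Rows shaped like the sample data: (name, children array, address map).
rows = [
    ("zhangsan", ["child1", "child2", "child3", "child4"], {"k1": "v1", "k2": "v2"}),
    ("lisi", ["child5", "child6", "child7", "child8"], {"k3": "v3", "k4": "v4"}),
]

# explode(children): one output row per array element
mychild = [child for _, children, _ in rows for child in children]

# explode(address): one (key, value) output row per map entry
mymap = [(k, v) for _, _, address in rows for k, v in address.items()]

print(mychild)  # child1 through child8, one per row
print(mymap)    # [('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3'), ('k4', 'v4')]
```

This mirrors the two expected result tables above: the array explode yields the mychild column, and the map explode yields the mymapkey/mymapvalue pairs.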

Step 1: first create a database, and switch to the database we just created

create database hive_explode;
use hive_explode;

Step 2: with the database created, we create the Hive table

create table hive_explode.t3(name string, children array<string>, address map<string,string>)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':' stored as textfile;

Pay attention here, everyone. Per the requirement, name is a string, children is an array, and address is a map. Lao Liu hasn't stressed this before, but it really matters: the delimiters for these compound types determine how the split clauses in the table definition are written:

The field delimiter (a tab here, per the data) is declared as

row format delimited fields terminated by '\t'

The delimiter between array elements is

collection items terminated by ','

The delimiter between a map's keys and values is

map keys terminated by ':'

Please remember the differences between these three!
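To see how the three delimiters cooperate, here is a hypothetical Python sketch of how one raw text line would be parsed with them: '\t' between fields, ',' between collection items, ':' between a map's keys and values. (This is only an illustration of the format; Hive's SerDe does the real parsing.)

```python
# One raw line of the sample data file.
line = "zhangsan\tchild1,child2,child3,child4\tk1:v1,k2:v2"

# fields terminated by '\t'
name, children_raw, address_raw = line.split("\t")

# collection items terminated by ','  -> array<string>
children = children_raw.split(",")

# map keys terminated by ':'          -> map<string,string>
address = dict(kv.split(":") for kv in address_raw.split(","))

print(name)      # zhangsan
print(children)  # ['child1', 'child2', 'child3', 'child4']
print(address)   # {'k1': 'v1', 'k2': 'v2'}
```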

Step 3: Load data

cd  /kkb/install/hivedatas/

vim maparray
The data content format is as follows
zhangsan    child1,child2,child3,child4 k1:v1,k2:v2
lisi    child5,child6,child7,child8 k3:v3,k4:v4

Then use hive to load the data

load data local inpath '/kkb/install/hivedatas/maparray' into table hive_explode.t3;

After importing the data, we can take a look at the table contents.

Step 4: now that the data has been imported into the table, the next step is to explode it

Split all children into one column

SELECT explode(children) AS myChild FROM hive_explode.t3;

Then split the key and value of the map

SELECT explode(address) AS (myMapKey, myMapValue) FROM hive_explode.t3;

Since lateral view is mostly used in row-to-column and column-to-row conversions, we won't discuss lateral view on its own here.

3. Row to column

The first thing to say about row-to-column and column-to-row is that they are very, very important; many requirements involve these conversions.

However, "row to column" does not literally mean turning one row into one column, nor "column to row" one column into one row. Different materials define the two terms differently, often in exactly opposite ways.

Lao Liu learned from Shang Silicon Valley's materials. Let's set the naming debate aside and just get their usage straight.

Row to column: merging the data from multiple rows into a single row, with the merged values collected into one column.

Use an example to demonstrate the row to column.

The task is to group people with the same constellation and blood type together; the expected result is as follows:

射手座,A            老王|冰冰
白羊座,A            孙悟空|猪八戒
白羊座,B            宋宋

This also involves the concat family, so let's cover the concatenation functions first:

concat(): returns the concatenation of its input strings; it accepts any number of inputs;

concat_ws(): like concat, but its first argument is a separator inserted between the concatenated strings;

collect_set(): deduplicates the values of a field and aggregates them into a field of array type.
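To pin down what each function returns, here are hypothetical Python equivalents of their semantics (these are illustrations, not Hive's implementations; note that Hive's collect_set does not guarantee element order, while this sketch keeps first-seen order):

```python
def concat(*args):
    # concat(): joins any number of strings directly, no separator
    return "".join(args)

def concat_ws(sep, items):
    # concat_ws(): joins strings with the given separator between them
    return sep.join(items)

def collect_set(values):
    # collect_set(): deduplicates a column's values into an array
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen

print(concat("白羊座", ",", "A"))           # 白羊座,A
print(concat_ws("|", ["孙悟空", "猪八戒"]))  # 孙悟空|猪八戒
print(collect_set(["A", "B", "A"]))         # ['A', 'B']
```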

Next, all we have to do is to create a table to import data.

1. Create a data file; note that the fields are separated by \t

cd /kkb/install/hivedatas
vim constellation.txt

孙悟空    白羊座 A
老王    射手座 A
宋宋    白羊座 B       
猪八戒    白羊座 A
凤姐    射手座 A

2. Create hive table and load data

create table person_info(name string,constellation string,blood_type string)  
row format delimited fields terminated by "\t";

3. Load data

load data local inpath '/kkb/install/hivedatas/constellation.txt' into table person_info;

After importing, you can check the table contents with: select * from person_info;

4. Query data

Note that, per the requirement, the query result needs a concat_ws step.

select t1.base, concat_ws('|', collect_set(t1.name)) name
from (select name, concat(constellation, ",", blood_type) base from person_info) t1
group by t1.base;

Lao Liu explains: since the constellation and blood type are joined by a comma, we write this part of the code as

concat(constellation, "," , blood_type)

The next step is to query each person's name together with this combined key; people with the same constellation and blood type will share the same base value.

select name, concat(constellation, "," , blood_type) base from person_info

We name this subquery t1. Then, to merge the matching people's multiple rows into one row, and since the names are joined by |, we write the code like this

concat_ws('|', collect_set(t1.name))

Treating the subquery above as a temporary table t1 (t1 is only an alias, not a real table, so this fragment runs only as part of the full query), the final aggregation looks like this

select t1.base, concat_ws('|', collect_set(t1.name)) name from t1 group by t1.base;
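As an end-to-end check, here is a hypothetical Python sketch of the whole row-to-column query over the constellation data: build the "constellation,blood_type" key per person, group names by that key, then join each group with '|'. (Real Hive uses collect_set, which also deduplicates; the names here are already unique, so a plain list suffices.)

```python
# The five rows loaded into person_info.
people = [
    ("孙悟空", "白羊座", "A"),
    ("老王", "射手座", "A"),
    ("宋宋", "白羊座", "B"),
    ("猪八戒", "白羊座", "A"),
    ("凤姐", "射手座", "A"),
]

groups = {}
for name, constellation, blood_type in people:
    base = constellation + "," + blood_type   # concat(constellation, ",", blood_type)
    groups.setdefault(base, []).append(name)  # group by t1.base

# concat_ws('|', collect_set(t1.name)) per group
result = {base: "|".join(names) for base, names in groups.items()}

print(result["白羊座,A"])  # 孙悟空|猪八戒
print(result["白羊座,B"])  # 宋宋
```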


4. Column to Row

In column to row, two very important functions are involved: explode and lateral view.

explode: Split the complex array or map structure in a hive column into multiple rows.

lateral view: generally used together with a table-generating function such as explode to split one row of data into multiple rows; on that basis, the split data can then be aggregated.

for example:

The data content is as follows, and the fields are divided by \t

cd /kkb/install/hivedatas

vim movie.txt
《疑犯追踪》    悬疑,动作,科幻,剧情
《Lie to me》    悬疑,警匪,动作,心理,剧情
《战狼2》    战争,动作,灾难

Explode the array data in the movie category column; the expected result is as follows:

《疑犯追踪》    悬疑
《疑犯追踪》    动作
《疑犯追踪》    科幻
《疑犯追踪》    剧情
《Lie to me》    悬疑
《Lie to me》    警匪
《Lie to me》    动作
《Lie to me》    心理
《Lie to me》    剧情
《战狼2》    战争
《战狼2》    动作
《战狼2》    灾难

This is a typical one-row-to-many-rows conversion, implemented with lateral view combined with explode.

The first step is to create a table that matches the shape of the data: the category field must be created as an array type.

create table movie_info(movie string, category array<string>) 
row format delimited fields terminated by "\t" 
collection items terminated by ",";

Next is to load the data

load data local inpath "/kkb/install/hivedatas/movie.txt" into table movie_info;

Finally, query the table per the requirement. Since category is an array type, we explode it with a lateral view, and then we can query the split data.

select movie, category_name  from  movie_info 
lateral view explode(category) table_tmp as category_name;

Here, table_tmp is the alias of the virtual table generated by the lateral view, and category_name is the alias of the exploded column.
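The logic of "lateral view explode" can be sketched in hypothetical Python (not Hive itself): each movie row is paired with every element of its exploded category array, turning one row into many.

```python
# The three rows loaded into movie_info: (movie, category array).
movies = [
    ("《疑犯追踪》", ["悬疑", "动作", "科幻", "剧情"]),
    ("《Lie to me》", ["悬疑", "警匪", "动作", "心理", "剧情"]),
    ("《战狼2》", ["战争", "动作", "灾难"]),
]

# lateral view explode(category) table_tmp as category_name:
# join each original row with each element of its own array.
exploded = [(movie, category_name)
            for movie, category in movies
            for category_name in category]

for movie, category_name in exploded:
    print(movie, category_name)  # one (movie, category) pair per row
```

The key difference from a bare explode is that the lateral view keeps the movie column alongside each exploded value, which is exactly what the expected result above requires.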


5. Summary

Lao Liu mainly covered row-to-column and column-to-row, along with the two functions explode and lateral view, each demonstrated with a case. Be sure to work through the cases yourself; learning without practicing is learning in vain!

Finally, the complete Hive knowledge points (3) are on the public account: Hardworking Lao Liu. If anything seems off or wrong, feel free to contact Lao Liu to discuss. He hopes this helps students interested in big data development, and welcomes their feedback.

If you think the writing is good, give Lao Liu a thumbs up!


Origin blog.csdn.net/qq_36780184/article/details/111410134