Hive (27): join query

1 A review of the join concept

Following the three normal forms of database design and everyday working practice, we usually do not design one large table that holds every kind of data. Instead, different kinds of data are stored in different tables. For example, when designing an order table, the customer number can serve as a foreign key that links each order to the customer table, so there is no need to add other customer fields (such as name or company) to the order table.

In this case, a query sometimes has to draw on multiple tables to produce a complete result. SQL's join syntax exists for exactly this purpose: it combines columns from two or more tables based on the relationship between columns in those tables.
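The order/customer design described above can be sketched as follows. This is only a minimal illustration: an in-memory SQLite database (through Python's sqlite3 module) stands in for Hive so that it runs anywhere, and the table names, column names, and rows (customer, orders, cust_id, and so on) are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Hive here; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, company TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Alice', 'Acme'), (2, 'Bob', 'Globex');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.5);
""")

# The orders table stores only the customer number; the join brings in
# the name and company from the customer table when they are needed.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.company, o.amount
    FROM orders o
    JOIN customer c ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

The SELECT statement itself is plain SQL of the kind Hive also accepts, but the table setup here is SQLite-specific.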

As analysis-oriented data warehouse software, Hive also implements join syntax in order to support rich data analysis. Overall it resembles the join syntax of an RDBMS, but it has its own characteristics in some respects, and these deserve special attention.

2 Hive join syntax

In Hive, the current version 3.1.2 supports six join syntaxes:

inner join, left join, right join, full outer join, left semi join, and cross join (also called the Cartesian product).
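To make the difference between these join types concrete, here is a small sketch of which rows each one returns. It again uses SQLite through Python's sqlite3 module rather than Hive, with hypothetical tables and data. Two simplifications are assumptions of this sketch: the right join is written as a left join with the tables swapped, and the full outer join is emulated as the union of both results (which works for this data because it contains no duplicate rows). SQLite has no LEFT SEMI JOIN keyword, so the equivalent EXISTS form is used instead.

```python
import sqlite3

# SQLite stands in for Hive; table names, column names, and rows are hypothetical.
# Bob (cust_id 2) has no orders, and order 102 points at a missing customer 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, cust_id INTEGER);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
""")

def run(sql):
    """Execute a query and return its rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Inner join: only rows that have a match on both sides.
inner = run("""SELECT c.name, o.order_id FROM customer c
               JOIN orders o ON c.cust_id = o.cust_id
               ORDER BY o.order_id""")

# Left join: every customer, with NULL (None) where no order matches.
left = run("""SELECT c.name, o.order_id FROM customer c
              LEFT JOIN orders o ON c.cust_id = o.cust_id""")

# Right join, written here as a left join with the tables swapped:
# every order, with NULL (None) where no customer matches.
right = run("""SELECT c.name, o.order_id FROM orders o
               LEFT JOIN customer c ON c.cust_id = o.cust_id""")

# Full outer join, emulated as the union of the left and right results.
full = set(left) | set(right)

# Left semi join: customers that have at least one order, each listed once.
# In Hive this would be: customer c LEFT SEMI JOIN orders o ON c.cust_id = o.cust_id
semi = run("""SELECT c.name FROM customer c
              WHERE EXISTS (SELECT 1 FROM orders o WHERE o.cust_id = c.cust_id)""")

# Cross join: every customer paired with every order (Cartesian product).
cross = run("SELECT c.name, o.order_id FROM customer c CROSS JOIN orders o")

for label, result in [("inner", inner), ("left", left), ("right", right),
                      ("full", sorted(full, key=str)), ("semi", semi),
                      ("cross", cross)]:
    print(label, result)
```

Note that the semi join returns Alice once even though she has two orders, whereas the inner join returns one row per matching order.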

2.1 Syntax tree

join_table:
    t

Origin blog.csdn.net/u013938578/article/details/131688758