Basic Hive query statements

Displaying column names in Hive query results

  • Run the following setting so that query results include a header row with the column names
set hive.cli.print.header=true;

  • You can also remove the table-name prefix from the column names in the header
set hive.resultset.use.unique.column.names=false;
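
As a rough illustration of the two settings used together, the session below turns them on one after the other and runs the same query each time. It assumes the emp table created later in this article; the header shapes in the comments are what the Hive CLI would be expected to print, not captured output.

```sql
-- Print a header row above each result set.
set hive.cli.print.header=true;
-- With only this setting, "select *" headers are expected to carry
-- the table name, e.g. emp.empno  emp.ename ...
select * from emp limit 3;

-- Drop the table-name prefix from the header.
set hive.resultset.use.unique.column.names=false;
-- Headers are now expected to look like: empno  ename ...
select * from emp limit 3;
```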

Full table and specific column queries

  • Query statement syntax:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
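
To make the clause order concrete, here is a sketch of a single query that uses several of the optional clauses at once. It assumes the emp table defined in the data preparation step below; the thresholds are arbitrary.

```sql
select deptno,
       avg(sal) as avg_sal      -- aggregate computed per group
from emp
where sal > 1000                -- filter rows before grouping
group by deptno                 -- one output row per department
order by avg_sal desc           -- global sort on the aggregated value
limit 2;                        -- keep only the first two rows
```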

Basic queries

Full table and specific column queries

  • Data preparation

Create two files, dept.txt and emp.txt, under /root/hivedata

  • dept:
10      ACCOUNTING      1700
20      RESEARCH        1800
30      SALES   1900
40      OPERATIONS      1700
  • emp:
7369    SMITH   CLERK   7902    1980-12-17      800.00          20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.00 300.00  30
7521    WARD    SALESMAN        7698    1981-2-22       1250.00 500.00  30
7566    JONES   MANAGER 7839    1981-4-2        2975.00         20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.00 1400.00 30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.00         30
7782    CLARK   MANAGER 7839    1981-6-9        2450.00         10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.00         20
7839    KING    PRESIDENT               1981-11-17      5000.00         10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.00 0.00    30
7876    ADAMS   CLERK   7788    1987-5-23       1100.00         20
7900    JAMES   CLERK   7698    1981-12-3       950.00          30
7902    FORD    ANALYST 7566    1981-12-3       3000.00         20
7934    MILLER  CLERK   7782    1982-1-23       1300.00         10	
  1. Create the department table
create table if not exists dept(depton int,dname string,loc int)
row format delimited fields terminated by '\t';

  2. Create the employee table
create table if not exists emp(empno int,ename string,job string,mgr int,hiredate string,sal double,comm double,deptno int)
row format delimited fields terminated by '\t';

  3. Import the data
load data local inpath '/root/hivedata/dept.txt' into table dept;
load data local inpath '/root/hivedata/emp.txt' into table emp;

Full table query

select * from emp;

Select specific column query

select empno,ename from emp;

Notes:

  • (1) SQL is case insensitive.
  • (2) A statement can be written on one line or across several lines.
  • (3) Keywords cannot be abbreviated or split across lines.
  • (4) Each clause should generally be written on its own line.
  • (5) Use indentation to improve the readability of the statement.
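
A small sketch of these conventions, assuming the emp table loaded above:

```sql
-- Keywords and identifiers are case insensitive, so these two are equivalent.
SELECT EMPNO, ENAME FROM EMP;
select empno, ename from emp;

-- A longer statement with one clause per line and indentation.
select empno,
       ename
from emp
where deptno = 10;
```

Case insensitivity applies to keywords and identifiers, not to string data: a condition such as ename = 'smith' would not match the stored value SMITH.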

Column aliases

  • 1) An alias renames a column in the result.
  • 2) Aliases make calculated columns easier to refer to.
  • 3) The alias immediately follows the column name; the keyword AS may optionally be placed between the column name and the alias.

Case Practice

  1. Query employee names and department numbers
select ename as name,deptno dn from emp;

Arithmetic operators (commonly used)

Operator  Description
A+B       Adds A and B
A-B       Subtracts B from A
A/B       Divides A by B
A*B       Multiplies A and B
  • Case practice: query the salaries of all employees and display each salary plus 1.
select sal + 1 from emp;
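
One detail worth a quick sketch: arithmetic on a NULL value yields NULL, so adding sal and comm directly gives NULL for employees without a commission. The query below, assuming the emp table above, shows one way around this using coalesce; the column aliases are illustrative.

```sql
select ename,
       sal + comm              as raw_total,   -- NULL when comm is NULL
       sal + coalesce(comm, 0) as total_pay,   -- treats a missing commission as 0
       sal / 2                 as half_sal     -- division returns a double
from emp;
```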

Common functions

Find the total number of rows (count)

select count(*) cnt from emp;

Find the maximum salary (max)

select max(sal) max_sal from emp;

Find the minimum salary (min)

select min(sal) min_sal from emp;

Find the sum of salaries (sum)

select sum(sal) sum_sal from emp;

Find the average salary (avg)

select avg(sal) avg_sal from emp;
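
The five functions above can also be combined in one statement. The sketch below, assuming the same emp table, additionally contrasts count(*) with count(comm) to show that aggregates skip NULL values; the alias comm_cnt is illustrative.

```sql
select count(*)    as cnt,        -- counts every row
       count(comm) as comm_cnt,   -- counts only rows where comm is not NULL
       max(sal)    as max_sal,
       min(sal)    as min_sal,
       sum(sal)    as sum_sal,
       avg(sal)    as avg_sal
from emp;
```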

Limit statement

  • Typical queries return multiple rows of data. The LIMIT clause is used to limit the number of rows returned.
select * from emp limit 5;
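
LIMIT is often paired with ORDER BY to get a top-N result. A minimal sketch, assuming the emp table above, that returns the three highest-paid employees:

```sql
select ename, sal
from emp
order by sal desc   -- highest salary first
limit 3;            -- keep only the first three rows
```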

Where statement

  1. Use the WHERE clause to filter out rows that do not meet the conditions
  2. The WHERE clause follows the FROM clause

Case Practice

  • Query all employees whose salary is greater than 1000
select ename,sal from emp where sal > 1000;

  • Note: Column aliases defined in the SELECT list cannot be used in the WHERE clause.
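
A sketch of what this restriction means in practice, assuming the emp table above. The alias twosal and the subquery alias t are illustrative.

```sql
-- Not accepted: the alias twosal is not visible to WHERE.
-- select ename, sal * 2 as twosal from emp where twosal > 3000;

-- Option 1: repeat the expression in the WHERE clause.
select ename, sal * 2 as twosal from emp where sal * 2 > 3000;

-- Option 2: define the alias in a subquery and filter outside it.
select ename, twosal
from (select ename, sal * 2 as twosal from emp) t
where twosal > 3000;
```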

Comparison Operators (Between/In/Is Null)

Case Practice

  • Query all employees whose salary is equal to 5000
select ename,sal from emp where sal=5000;

  • Query employees whose salary is between 500 and 1000 (BETWEEN includes both endpoints)
select ename,sal from emp where sal between 500 and 1000;

  • Query all employees whose comm is null
select ename,comm from emp where comm is null;

  • Query employees whose salary is 1500 or 5000
select ename,sal from emp where sal in (1500,5000);
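
Each of these operators also has a negated form. The queries below are a sketch against the same emp table:

```sql
-- Salary outside the range 500 to 1000.
select ename, sal from emp where sal not between 500 and 1000;

-- Employees that do have a commission value.
select ename, comm from emp where comm is not null;

-- Salary is neither 1500 nor 5000.
select ename, sal from emp where sal not in (1500, 5000);

-- Note: "comm = null" never matches any row; use "is null" to test for NULL.
```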

Logical operators (And/Or/Not)

  • Query employees whose salary is greater than 1000 and whose department is 30
select * from emp where sal>1000 and deptno=30;

  • Query employees whose salary is greater than 1000 or whose department is 30
select * from emp where sal>1000 or deptno=30;

  • Query employees who are in neither department 20 nor department 30
select * from emp where deptno not in(20,30);
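
When AND and OR appear in the same condition, AND is evaluated first. The sketch below, assuming the emp table above, shows how parentheses change the result:

```sql
-- AND binds more tightly than OR, so this is read as:
--   sal > 1000 or (deptno = 30 and job = 'CLERK')
select * from emp where sal > 1000 or deptno = 30 and job = 'CLERK';

-- Parentheses force the OR to be evaluated first.
select * from emp where (sal > 1000 or deptno = 30) and job = 'CLERK';
```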

Grouping

Group By statement

The GROUP BY statement is usually used with aggregate functions: it groups the result rows by one or more columns and then performs an aggregate operation on each group.

Case Practice

  • Calculate the average salary of each department in the emp table
select deptno,avg(sal) avg_sal from emp group by deptno;

  • Calculate the maximum salary for each position in each department of emp
select deptno,job,max(sal) max_sal from emp group by deptno,job;
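
A rule implied by the examples above is that every column in the SELECT list must either appear in GROUP BY or be wrapped in an aggregate function. A sketch, assuming the same emp table (the alias emp_cnt is illustrative):

```sql
-- Valid: deptno appears in GROUP BY; the other columns are aggregated.
select deptno, count(*) as emp_cnt, avg(sal) as avg_sal
from emp
group by deptno;

-- Invalid in Hive: ename is neither aggregated nor listed in GROUP BY.
-- select deptno, ename, avg(sal) from emp group by deptno;
```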

Having statement

(1) Aggregate functions cannot be used in a WHERE clause, but they can be used in a HAVING clause.
(2) HAVING is only used in statements that aggregate with GROUP BY.

  • Find the departments whose average salary is greater than 2000
select deptno,avg(sal) avg_sal from emp group by deptno having avg_sal>2000;
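
WHERE and HAVING can appear in the same query, since they filter at different stages. The following sketch, assuming the emp table above, excludes one job before grouping and then filters the groups; the choice of job is arbitrary.

```sql
select deptno, avg(sal) as avg_sal
from emp
where job <> 'PRESIDENT'        -- row filter, applied before grouping
group by deptno
having avg(sal) > 2000;         -- group filter, applied after aggregation
```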

Join statement

  • Join the employee table and the department table on matching department numbers, and query the employee number, employee name and department name (the department-number column of dept is named depton, as created above)
select e.empno,e.ename,d.dname from emp e join dept d on e.deptno = d.depton;
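
The join above is an inner join, so a department with no employees does not appear in the result. As a sketch under that assumption, a left outer join from dept keeps every department and returns NULL for the employee columns where nothing matches:

```sql
select d.depton, d.dname, e.ename
from dept d
left join emp e
  on d.depton = e.deptno;   -- depton is the column name used when dept was created
```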

Table aliases

(1) Using aliases can simplify queries.
(2) Qualifying column names with the table name or alias can improve execution efficiency.

Sorting

Global sorting (Order By)

  • Order By: sorts the whole result set globally, using only one reducer
  • Sorting is done with the ORDER BY clause
  • asc (ascend): ascending order (default)
  • desc (descend): descending order
  • The ORDER BY clause comes at the end of the SELECT statement

Case Practice

  1. Query employee information in ascending order of salary
select * from emp order by sal;

  2. Query employee information in descending order of salary
select * from emp order by sal desc;

Sort by alias

  • Sort employees by 2 times their salary
select ename,sal*2 twosal from emp order by twosal;

Sort by multiple columns

  • Sort by department and salary in ascending order
select ename,deptno,sal from emp order by deptno,sal;
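
Each sort column can have its own direction. A brief sketch, assuming the emp table above, that lists departments in ascending order with the highest salary first inside each department:

```sql
select ename, deptno, sal
from emp
order by deptno asc,   -- departments in ascending order
         sal desc;     -- within a department, highest salary first
```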

Sorting within each reducer (Sort By)

  • Sort By: ORDER BY is very inefficient on large data sets because all rows pass through a single reducer. When a global ordering is not required, SORT BY can be used instead.
  • SORT BY produces one sorted file per reducer. Each reducer sorts its own rows; the overall result set is not globally sorted.
  • Set the number of reducers
set mapreduce.job.reduces=3;

  • View the configured number of reducers
set mapreduce.job.reduces;

  • Query employee information sorted in descending order by department number within each reducer
select * from emp sort by deptno desc;

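  • Write the query results to a local directory (sorted in descending order by department number within each reducer). INSERT OVERWRITE replaces whatever is already in the target directory, so write to a dedicated output directory rather than the directory that holds dept.txt and emp.txt; the subdirectory name sortby-result below is only an example.
insert overwrite local directory '/root/hivedata/sortby-result'
select * from emp sort by deptno desc;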
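
SORT BY on its own does not control which rows reach which reducer. The syntax summary at the top of this article also lists DISTRIBUTE BY and CLUSTER BY, and the following sketch, assuming the same emp table and three reducers, shows how they fit together:

```sql
set mapreduce.job.reduces=3;

-- Rows with the same deptno go to the same reducer,
-- and each reducer sorts its own rows by sal in descending order.
select * from emp distribute by deptno sort by sal desc;

-- When the distribute and sort columns are the same and ascending
-- order is acceptable, CLUSTER BY is shorthand for the pair.
select * from emp cluster by deptno;
```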

Origin blog.csdn.net/weixin_51309151/article/details/126897742