Displaying field names in Hive query results
- Run the following setting so that query results include a header row with the field names
set hive.cli.print.header=true;
- You can also drop the table-name prefix from the field names in the header
set hive.resultset.use.unique.column.names=false;
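As a quick illustration, the two settings can be tried one after the other in the same session. This is a minimal sketch that assumes the emp table created later in this section; the header text mentioned in the comments is what the Hive CLI typically prints and may differ between versions.

```sql
-- Print a header row with the field names above each result
set hive.cli.print.header=true;
-- With the default setting, the header of a select * is usually table-qualified, for example emp.empno
select * from emp limit 2;
-- Drop the table-name prefix so the header reads empno, ename and so on
set hive.resultset.use.unique.column.names=false;
select * from emp limit 2;
```

Both settings last only for the current session unless they are placed in a configuration file.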
Full table and specific column queries
- Query statement syntax:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
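The skeleton above can be made concrete with a query that uses most of the optional clauses in the order the syntax lists them. This is only a sketch against the emp table created later in this section; the threshold 1000 and the limit 2 are arbitrary values chosen for illustration.

```sql
-- Clauses appear in the same order as in the syntax skeleton
select deptno, avg(sal) as avg_sal
from emp
where sal > 1000
group by deptno
order by avg_sal desc
limit 2;
```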
basic query
Full table and specific column queries
- data preparation
Create two files, dept.txt and emp.txt, under /root/hivedata, with the fields on each line separated by tabs
- dept:
10 ACCOUNTING 1700
20 RESEARCH 1800
30 SALES 1900
40 OPERATIONS 1700
- emp:
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
- Create department table
create table if not exists dept(depton int,dname string,loc int)
row format delimited fields terminated by '\t';
- Create employee table
create table if not exists emp(empno int,ename string,job string,mgr int,hiredate string,sal double,comm double,deptno int)
row format delimited fields terminated by '\t';
- Import Data
load data local inpath '/root/hivedata/dept.txt' into table dept;
load data local inpath '/root/hivedata/emp.txt' into table emp;
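After loading, a few quick checks help confirm that the files were parsed as intended. The sketch below assumes the sample files contain exactly the rows listed above (4 departments and 14 employees) and that the fields really are tab-separated.

```sql
-- Confirm the column names and types of the new table
describe emp;
-- Each line of a source file becomes one row, so the sample data should give 4 and 14
select count(*) from dept;
select count(*) from emp;
-- A column that is NULL in every row usually points to a delimiter mismatch
select * from emp limit 3;
```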
Full table query
select * from emp;
Select specific column query
select empno,ename from emp;
Notice:
- (1) The SQL language is case insensitive.
- (2) SQL can be written in one or more lines
- (3) Keywords cannot be abbreviated and cannot be split across lines
- (4) Each clause should generally be written on a separate line.
- (5) Use indentation to improve the readability of the statement.
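The points in this notice can be seen side by side in a short sketch using the emp table from above. The department number 10 is just an example value.

```sql
-- Keywords and identifiers are matched case-insensitively, so these two return the same rows
SELECT EMPNO, ENAME FROM EMP;
select empno, ename from emp;

-- One clause per line, with indentation for the continuation of the select list
select empno,
       ename
from emp
where deptno = 10;
```

Case insensitivity applies to keywords and names only; string values being compared, such as a job title, remain case-sensitive.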
column alias
- 1) An alias renames a column in the result set
- 2) It gives a calculated expression a convenient name
- 3) The alias immediately follows the column name; the keyword AS can optionally be placed between the column name and the alias
Case Practice
- Query the employee name and department number, giving each column an alias
select ename as name,deptno dn from emp;
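Aliases are most useful on calculated columns, which otherwise get a generated name in the result. A minimal sketch, where annual_sal is a name chosen here purely for illustration:

```sql
-- AS is optional: both aliases below are valid
select ename as name,
       sal * 12 annual_sal
from emp;
```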
Arithmetic operators (commonly used)
Operator | Description |
---|---|
A+B | A plus B |
A-B | A minus B |
A/B | A divided by B |
A*B | A multiplied by B |
- Case practice: query the salaries of all employees and display each one increased by 1.
select sal + 1 from emp;
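One behaviour worth knowing when applying these operators to the sample data: arithmetic involving NULL returns NULL, and most employees have no commission. The sketch below assumes the missing comm values were loaded as NULL; total_pay is an illustrative alias.

```sql
-- sal + comm is NULL for every employee whose comm is NULL
select ename, sal + comm as total_pay from emp;
-- coalesce substitutes 0 for a missing commission before adding
select ename, sal + coalesce(comm, 0) as total_pay from emp;
```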
Common functions
Find the total number of rows (count)
select count(*) cnt from emp;
Find the maximum salary (max)
select max(sal) max_sal from emp;
Find the minimum salary (min)
select min(sal) min_sal from emp;
Find the sum of salaries (sum)
select sum(sal) sum_sal from emp;
Find the average salary (avg)
select avg(sal) avg_sal from emp;
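The five functions can also be computed together, so the table is scanned once instead of five times. The with_comm column is an addition for illustration: count applied to a column counts only the rows where that column is not NULL.

```sql
-- All aggregates in a single pass over emp
select count(*)    as cnt,
       count(comm) as with_comm,
       max(sal)    as max_sal,
       min(sal)    as min_sal,
       sum(sal)    as sum_sal,
       avg(sal)    as avg_sal
from emp;
```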
Limit statement
- Typical queries return multiple rows of data. The LIMIT clause is used to limit the number of rows returned.
select * from emp limit 5;
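On its own, limit makes no promise about which rows come back. A common pattern, sketched here with an arbitrary cutoff of three rows, is to pair it with order by to get a top-N result.

```sql
-- Three highest-paid employees: sort first, then keep the first three rows
select ename, sal
from emp
order by sal desc
limit 3;
```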
where statement
- Use the WHERE clause to filter out rows that do not meet the conditions
- The WHERE clause follows the FROM clause
Case Practice
- Query all employees whose salary is greater than 1000
select ename,sal from emp where sal > 1000;
- Note: Column aliases defined in the select list cannot be used in the where clause.
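The restriction on aliases exists because the where clause is evaluated before the select list. Two possible workarounds are sketched below; the alias twosal, the threshold 3000 and the subquery alias t are illustrative choices.

```sql
-- Rejected by Hive, because twosal is not known inside the where clause
-- select ename, sal * 2 as twosal from emp where twosal > 3000

-- Option 1: repeat the expression
select ename, sal * 2 as twosal from emp where sal * 2 > 3000;

-- Option 2: define the alias in a subquery, then filter on it
select ename, twosal
from (select ename, sal * 2 as twosal from emp) t
where twosal > 3000;
```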
Comparison Operators (Between/In/Is Null)
Case Practice
- Query all employees whose salary is equal to 5000
select ename,sal from emp where sal=5000;
- Query the information of employees whose salary is between 500 and 1000
select ename,sal from emp where sal between 500 and 1000;
- Query all employees whose comm is null
select ename,comm from emp where comm is null;
- Query employee information whose salary is 1500 or 5000
select ename,sal from emp where sal in (1500,5000);
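A few details of these operators are easy to miss, so here is a short sketch with arbitrary example bounds. Between includes both endpoints, and each operator has a negated form.

```sql
-- between includes both endpoints, so these two queries are equivalent
select ename, sal from emp where sal between 800 and 1300;
select ename, sal from emp where sal >= 800 and sal <= 1300;

-- Negated forms
select ename, sal  from emp where sal not between 800 and 1300;
select ename, comm from emp where comm is not null;
select ename, sal  from emp where sal not in (1500, 5000);
```

A comparison such as comm = null never evaluates to true, which is why null checks need is null or is not null.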
Logical operators (And/Or/Not)
- Query employees whose salary is greater than 1000 and whose department is 30
select * from emp where sal>1000 and deptno=30;
- Query employees whose salary is greater than 1000 or whose department is 30
select * from emp where sal>1000 or deptno=30;
- Query employees who are in neither department 20 nor department 30
select * from emp where deptno not in(20,30);
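When and and or appear in the same condition, and is evaluated first, so parentheses are the safest way to state the intent. The following sketch uses the job column with an example value to show the two readings.

```sql
-- and binds tighter than or, so the job test applies only to department 30
select * from emp where sal > 1000 or deptno = 30 and job = 'CLERK';
-- Parentheses apply the job test to both alternatives
select * from emp where (sal > 1000 or deptno = 30) and job = 'CLERK';
```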
Grouping
Group By statement
The GROUP BY statement is usually used with aggregate functions: it groups the result set by one or more columns, and the aggregate operation is then performed on each group.
Case Practice
- Calculate the average salary of each department in the emp table
select deptno,avg(sal) avg_sal from emp group by deptno;
- Calculate the maximum salary for each position in each department of emp
select deptno,job,max(sal) max_sal from emp group by deptno,job;
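A rule that follows from how grouping works: every column in the select list must either appear in the group by clause or be wrapped in an aggregate function. A minimal sketch, with headcount as an illustrative alias:

```sql
-- Valid: deptno is in group by and the other columns are aggregates
select deptno, count(*) as headcount, avg(sal) as avg_sal
from emp
group by deptno;

-- Invalid: ename is neither grouped nor aggregated, so Hive rejects the query
-- select deptno, ename, avg(sal) from emp group by deptno
```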
Having statement
(1) Aggregate functions cannot be used in a where clause, but they can be used in a having clause.
(2) having is only used in group by aggregate queries.
- Find the departments whose average salary is greater than 2000
select deptno,avg(sal) avg_sal from emp group by deptno having avg_sal>2000;
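The two filters can be combined in a single query: where removes rows before grouping, and having removes groups after aggregation. The sketch below excludes the president as an arbitrary example, and repeats the aggregate in the having clause rather than using the alias, which is the form most SQL dialects accept.

```sql
-- where drops rows before grouping, having drops groups after aggregation
select deptno, avg(sal) as avg_sal
from emp
where job <> 'PRESIDENT'
group by deptno
having avg(sal) > 2000;
```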
Join statement
- Join the employee table and the department table on equal department numbers, and query the employee number, employee name and department name
select e.empno,e.ename,d.dname from emp e join dept d on e.deptno = d.depton;
table alias
(1) Using aliases can simplify queries.
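(2) Prefixing each column with its table name or alias makes it unambiguous which table the column comes from, which matters when both tables contain a column of the same name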
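To show what aliases save, here is the same join written both ways. The department key is spelled depton, exactly as it appears in the create table statement above.

```sql
-- Without aliases: every column carries the full table name
select emp.empno, emp.ename, dept.dname
from emp
join dept on emp.deptno = dept.depton;

-- With aliases e and d: same result, shorter to read
select e.empno, e.ename, d.dname
from emp e
join dept d on e.deptno = d.depton;
```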
Sorting
Global sorting (Order By)
- Order By performs a global sort, which is carried out by a single reducer
- Sort with the order by clause
- asc (ascend): ascending order (default)
- desc (descend): descending order
- The order by clause goes at the end of the select statement
Case Practice
- Query employee information in ascending order of salary
select * from emp order by sal;
- Query employee information in descending order of salary
select * from emp order by sal desc;
sort by alias
- Sort employees by 2 times their salary
select ename,sal*2 twosal from emp order by twosal;
sort by multiple columns
- Sort by department and salary in ascending order
select ename,deptno,sal from emp order by deptno,sal;
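Each column in the order by list can have its own direction. A short sketch that lists departments in ascending order and, within each department, the highest salary first:

```sql
-- Departments ascending, highest salary first within each department
select ename, deptno, sal
from emp
order by deptno asc, sal desc;
```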
Sorting within each reducer (Sort By)
- Sort By: order by is inefficient for large data sets because every row passes through one reducer. In many cases a global sort is not required, and sort by can be used instead.
- Sort by produces one sorted file per reducer. Each reducer sorts its own rows; the overall result set is not globally sorted.
- Set the number of reducers
set mapreduce.job.reduces=3;
- View the configured number of reducers
set mapreduce.job.reduces;
- View employee information sorted in descending order by department number within each reducer
select * from emp sort by deptno desc;
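- Write the query results to files (sorted in descending order by department number within each reducer). Note that insert overwrite replaces whatever is already in the target directory.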
insert overwrite local directory '/root/hivedata'
select * from emp sort by deptno desc;
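With sort by alone, Hive decides which rows go to which reducer. The distribute by clause from the syntax at the top of this section controls that assignment, so the two are often used together. The sketch below is one possible variant, not the tutorial's own method: the output path /tmp/hive_sortby_demo is a hypothetical directory chosen so that no existing files are overwritten, and the row format clause assumes a Hive version that accepts it in directory inserts.

```sql
-- Use three reducers so the output is split into several files
set mapreduce.job.reduces=3;
-- Rows with the same deptno go to the same reducer and are sorted by salary there
-- The target directory below is a hypothetical path used only for this sketch
insert overwrite local directory '/tmp/hive_sortby_demo'
row format delimited fields terminated by '\t'
select * from emp distribute by deptno sort by sal desc;
```

Each reducer writes one file into the directory, and each file is sorted by salary on its own.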