Classic Hive HQL Exercises (Part 1)

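I collected 50 classic SQL statements to strengthen my understanding of Hive. They cover basic operations, UDFs, and many commonly used statistical functions. The syntax differs somewhat from MySQL, which makes them good HQL practice.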

1. Data preparation

Create a data folder under the home directory to hold the prepared data:
mkdir data

Create the student data file:

vi student.txt

Add the following data (columns are separated by a tab):

01	赵雷	1990-01-01	男
02	钱电	1990-12-21	男
03	孙风	1990-05-20	男
04	李云	1990-08-06	男
05	周梅	1991-12-01	女
06	吴兰	1992-03-01	女
07	郑竹	1989-07-01	女
08	王菊	1990-01-20	女

Create the course data file:

vi course.txt

Add the following data (columns are separated by a tab):

01	语文	02
02	数学	01
03	英语	03

Create the teacher data file:

vi teacher.txt

Add the following data (columns are separated by a tab):

01	张三
02	李四
03	王五

Create the score data file:

vi score.txt

Add the following data (columns are separated by a tab):

01	01	80
01	02	90
01	03	99
02	01	70
02	02	60
02	03	80
03	01	80
03	02	80
03	03	80
04	01	50
04	02	30
04	03	20
05	01	76
05	02	87
06	01	31
06	03	34
07	02	89
07	03	98

Start hive and create the test tables.
Table creation statements:

create table if not exists student(
s_id string,
s_name string,
s_birth string,
s_sex string
)
row format delimited
fields terminated by '\t'
stored as textfile;

create table if not exists course(
c_id string,
c_name string,
t_id string
)
row format delimited
fields terminated by '\t'
stored as textfile;

create table if not exists teacher(
t_id string,
t_name string
)
row format delimited
fields terminated by '\t'
stored as textfile;

create table if not exists score(
s_id string,
c_id string,
s_score int
)
row format delimited
fields terminated by '\t'
stored as textfile;

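Load the local data into the hive tables

Note: there is a pitfall here. In my cluster's role assignment every node has a Hive gateway, so the Hive CLI can be started from any machine. I created the txt files on worknode1 and then ran the load command from a different node, and it failed with a path-not-found error. There are two ways to solve this:
Option 1: put the files in a directory on the Hive server node so that hive can read them. Run

scp -r data/ root@toolnode1:/home/

to copy the files to the toolnode1 node; the load commands then succeed.
Option 2: upload the files to an HDFS directory, then replace the path in the load command with the HDFS path (and drop the local keyword).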

Run the load commands one by one:

load data local inpath '/home/data/student.txt' into table student;
load data local inpath '/home/data/course.txt' into table course;
load data local inpath '/home/data/score.txt' into table score;
load data local inpath '/home/data/teacher.txt' into table teacher;
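
The HDFS route mentioned above can be sketched as follows, run from inside the Hive CLI. The staging directory /tmp/hive_data is a hypothetical location chosen for illustration, not a path from the original setup. Keep in mind that load data inpath without local moves the file from its HDFS location into the table's directory rather than copying it.

```sql
-- Assumption: /tmp/hive_data is a hypothetical HDFS staging directory.
dfs -mkdir -p /tmp/hive_data;
-- Upload the local file from the machine where the CLI is running.
dfs -put /home/data/student.txt /tmp/hive_data/;
-- No LOCAL keyword: the path is resolved on HDFS and the file is moved into the table.
load data inpath '/tmp/hive_data/student.txt' into table student;
```

The same three steps would be repeated for course.txt, score.txt, and teacher.txt.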

Exercises

1. Query the information and course scores of students whose score in course "01" is higher than in course "02":

select a.*,b.s_score as 01_score,c.s_score as 02_score 
from student a 
join score b on a.s_id = b.s_id and b.c_id = '01'
left join score c on a.s_id = c.s_id and c.c_id = '02'
where b.s_score > c.s_score

2. Query the information and course scores of students whose score in course "01" is lower than in course "02":

select a.*,b.s_score as 01_score,c.s_score as 02_score 
from student a 
join score b on a.s_id = b.s_id and b.c_id = '01'
left join score c on a.s_id = c.s_id and c.c_id = '02'
where b.s_score < c.s_score

3. Query the student ID, name, and average score of students whose average score is at least 60:

select * from (
select a.s_id,a.s_name,round(avg(b.s_score),2) as avg_score
from student a join score b on a.s_id = b.s_id 
group by a.s_id,a.s_name
) t where t.avg_score>=60

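Here round did not keep two decimal places as expected. Changing the second argument to 0 does round to an integer. I have not found the cause.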
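
If the extra digits come from how the double result of avg is displayed, one possible workaround is to cast the average to a fixed-scale decimal instead of relying on round. This is only a sketch under that assumption; the actual cause of the behaviour described above has not been confirmed.

```sql
-- Assumption: the issue is in how the DOUBLE value is displayed;
-- casting to decimal(10,2) fixes the scale at two digits.
select a.s_id,
       a.s_name,
       cast(avg(b.s_score) as decimal(10,2)) as avg_score
from student a
join score b on a.s_id = b.s_id
group by a.s_id, a.s_name
having avg(b.s_score) >= 60;
```

The having clause filters on the unrounded average, which replaces the outer select used in the query above.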

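4. Query the student ID, name, and average score of students whose average score is below 60 (including students with and without scores):

select * from (
select a.s_id,a.s_name,round(coalesce(avg(b.s_score),0),2) as avg_score
from student a left join score b on a.s_id = b.s_id 
group by a.s_id,a.s_name
) t where t.avg_score<60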

5. Query the student ID, name, number of courses taken, and total score across all courses for every student:

select a.s_id,a.s_name,count(b.c_id) num_subject,coalesce(sum(b.s_score),0) total_score
from student a 
left join score b on a.s_id = b.s_id
group by a.s_id,a.s_name

6. Query the number of teachers with the surname "李":

select count(*)
from teacher where t_name like '李%'

7. Query the information of students who have taken a course taught by teacher "张三":

select distinct a.*
from student a 
join score b on a.s_id = b.s_id
left join course c on b.c_id = c.c_id
left join teacher d on c.t_id = d.t_id 
where  d.t_name = '张三'

8. Query the information of students who have not taken a course taught by teacher "张三":

select * from student a 
where a.s_id not in(select distinct a.s_id
from student a 
join score b on a.s_id = b.s_id
left join course c on b.c_id = c.c_id
left join teacher d on c.t_id = d.t_id 
where  d.t_name = '张三') 

9. Query the information of students who have taken both course "01" and course "02":

SELECT *
FROM student a
JOIN score b ON a.s_id=b.s_id AND b.c_id='01'
JOIN score c ON a.s_id=c.s_id AND c.c_id='02'

10. Query the information of students who have taken course "01" but not course "02":

SELECT a.*
FROM student a
JOIN score b ON a.s_id=b.s_id AND b.c_id='01'
WHERE a.s_id NOT IN(SELECT s_id FROM score WHERE c_id='02');

11. Query the information of students who have not taken all of the courses:

select distinct
a.*
from student a
join score b
left join score c on a.s_id=c.s_id and b.c_id=c.c_id
where c.s_score is null
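
The query above uses an unconditioned join against score as its list of courses, which produces a Cartesian product. A possible alternative, sketched here and not taken from the original post, compares each student's distinct course count with the number of rows in course. It still relies on a one-row cross join, so a cluster running in strict mode may reject it just as it would the query above.

```sql
-- Sketch: students whose distinct course count is below the total number of courses.
select a.s_id, a.s_name, a.s_birth, a.s_sex
from student a
left join score b on a.s_id = b.s_id                 -- keeps students with no scores
cross join (select count(*) as total from course) t  -- single-row subquery
group by a.s_id, a.s_name, a.s_birth, a.s_sex, t.total
having count(distinct b.c_id) < t.total;
```

With the sample data, students 05, 06 and 07 each have two courses and student 08 has none, so those four should be returned.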

12. Query the information of students who share at least one course with student "01":

select distinct
a.*
from student a
join score b on b.s_id='01'
join score c on b.c_id=c.c_id and a.s_id=c.s_id
where a.s_id<>'01'

13. Query the information of other students whose courses are exactly the same as those of student "01":

select p.s_id,p.s_name,p.s_birth,p.s_sex from(
SELECT a.s_id as s_id,a.s_name as s_name,a.s_birth as s_birth,a.s_sex as s_sex,count(b.c_id) as num_course
FROM student a
JOIN score b ON a.s_id=b.s_id
WHERE b.s_id!='01' AND b.c_id IN (SELECT c_id FROM score WHERE s_id='01')
GROUP BY a.s_id,a.s_name,a.s_birth,a.s_sex
) p
join (SELECT COUNT(c_id) as 01_num_course FROM score WHERE s_id='01') q on p.num_course=q.01_num_course

An alternative using collect_set and array_contains:

SELECT s.s_name
FROM student s
JOIN score sc ON s.s_id=sc.s_id join
  (SELECT collect_set(c_id) AS sub,count(c_id) AS num
   FROM score
   WHERE s_id='01' ) sc2
WHERE array_contains(sc2.sub,sc.c_id) and s.s_id!='01'
GROUP BY s.s_id,
         s.s_name,
         s.s_birth,
         s.s_sex,
         sc2.num
HAVING count(sc.c_id)=sc2.num

14. Query the names of students who have not taken any course taught by teacher "张三":

select * from student a where a.s_id not in(select w.s_id from(
SELECT distinct
a.*
FROM student a
JOIN score b ON a.s_id=b.s_id
join (SELECT c_id FROM course a join (SELECT t_id FROM teacher WHERE t_name='张三') b on a.t_id=b.t_id) t on b.c_id=t.c_id
) as w
)

15. Query the student ID, name, and average score of students who failed two or more courses:

select a.s_id,a.s_name,round(avg(b.s_score),2)
from student a 
join score b on a.s_id =b.s_id 
where a.s_id in (select w.s_id from (select c.s_id,count(d.s_score) num_subject
from student c 
join score d on c.s_id = d.s_id and d.s_score<60
group by c.s_id
having num_subject>=2) as w)
group by a.s_id,a.s_name

The same problem of round not taking effect appears here as well.

16. Retrieve the information of students whose score in course "01" is below 60, sorted by score in descending order:

select * 
from student a 
join score b on a.s_id = b.s_id and b.c_id = '01' and b.s_score<60
order by b.s_score desc

17. Show every student's scores in all courses together with their average score, ordered by average score from high to low:

select a.s_id,a.s_name,b.c_id,c.c_name,b.s_score,w.avg_score
from student a 
join score b on a.s_id = b.s_id 
left join course c on b.c_id = c.c_id
left join (select a.s_id,round(avg(b.s_score),2) avg_score
from student a
join score b on a.s_id = b.s_id 
group by a.s_id) as w
on a.s_id = w.s_id
order by w.avg_score desc

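18. Query the highest, lowest, and average score for each subject, displayed as: course ID, course name, highest score, lowest score, average score, pass rate, medium rate, good rate, excellent rate: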

SELECT a.c_id,
       b.c_name,
       max(a.s_score) max_score,
       min(a.s_score) min_score,
       avg(a.s_score) avg_score,
       count(if(a.s_score>=60,a.s_score,null))/count(a.s_score)*100 as jg,
       count(if(a.s_score>=70 and a.s_score<80,a.s_score,null))/count(a.s_score)*100 as zd,
       count(if(a.s_score>=80 and a.s_score<90,a.s_score,null))/count(a.s_score)*100 as yl,
       count(if(a.s_score>=90,a.s_score,null))/count(a.s_score)*100 as yx
FROM score a
JOIN course b ON a.c_id = b.c_id
GROUP BY a.c_id,
         b.c_name

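19. Sort by score within each subject and show the rank, using row_number() over() for per-group ranking:

Note: make sure to master this one; it is a very handy function for statistics.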

select
*,
row_number() over(distribute by c_id sort by s_score desc) as rm
from score
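
For comparison, the sketch below places row_number() next to rank() and dense_rank(), two other standard Hive windowing functions, using partition by / order by in place of distribute by / sort by. It is an illustration added here, not part of the original exercise set.

```sql
-- The three ranking functions differ only in how they treat ties.
select s_id,
       c_id,
       s_score,
       row_number() over (partition by c_id order by s_score desc) as rn,   -- always unique
       rank()       over (partition by c_id order by s_score desc) as rk,   -- ties share a rank, gaps follow
       dense_rank() over (partition by c_id order by s_score desc) as drk   -- ties share a rank, no gaps
from score;
```

In course 01 the sample scores are 80, 80, 76, 70, 50 and 31, so rank() gives 1, 1, 3, 4, 5, 6 and dense_rank() gives 1, 1, 2, 3, 4, 5, while row_number() numbers the rows 1 through 6 and breaks the tie between the two 80s arbitrarily.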

20. Query each student's total score and rank the students by it:

select s_id,
sum(s_score) as total_score,
row_number() over(sort by sum(s_score) desc) as rm
from score
group by s_id

21. Query the average score of each course taught by each teacher, ordered from high to low:

select c.t_id,c.t_name,b.c_id,b.c_name,round(avg(a.s_score),2) avg_score
from score a
join course b on a.c_id = b.c_id
left join teacher c on b.t_id = c.t_id
group by c.t_id,c.t_name,b.c_id,b.c_name
order by avg_score desc

22. Query the student information and course score of the 2nd- to 3rd-ranked students in every course:

SELECT *
from (select *,
row_number() over(distribute by c_id sort by s_score desc) rm
from score) as a
join student b on a.s_id = b.s_id
where a.rm =2 or a.rm =3

23. Count the number of students in each score band for each subject: course ID, course name, [100-85], [85-70], [70-60], [0-60], and the percentage of each band

select
a.c_id,
b.c_name,
count(if(a.s_score>85 and a.s_score<=100,a.s_score,null)) as 85_100,
round(count(if(a.s_score>85 and a.s_score<=100,a.s_score,null))/count(a.s_score)*100,2) as pct_85_100,
count(if(a.s_score>70 and a.s_score<=85,a.s_score,null)) as 70_85,
round(count(if(a.s_score>70 and a.s_score<=85,a.s_score,null))/count(a.s_score)*100,2) as pct_70_85,
count(if(a.s_score>60 and a.s_score<=70,a.s_score,null)) as 60_70,
round(count(if(a.s_score>60 and a.s_score<=70,a.s_score,null))/count(a.s_score)*100,2) as pct_60_70,
count(if(a.s_score>0 and a.s_score<=60,a.s_score,null)) as 0_60,
round(count(if(a.s_score>0 and a.s_score<=60,a.s_score,null))/count(a.s_score)*100,2) as pct_0_60
from score a
join course b on a.c_id=b.c_id
group by a.c_id,
b.c_name

24. Query each student's average score and its rank:

select 
a.*,
row_number() over(order by a.avg_score desc)
from (select a.s_id,a.s_name,round(avg(b.s_score),2) avg_score
from student a 
join score b on a.s_id = b.s_id
group by a.s_id,a.s_name) a

25. Query the top three records for each subject:

select a.* from (
select
s_id,
c_id,
s_score,
row_number() over(distribute by c_id sort by s_score desc) as rm
from score
) a
where a.rm<=3


Reposted from blog.csdn.net/qq_42694052/article/details/89958000