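1. Basic syntax
Function (arg1,..., argn) OVER ([PARTITION BY <...>] [ORDER BY <....>]
[<window_expression>])
PARTITION BY splits the rows into groups, ORDER BY sorts the rows inside each group, and <window_expression> (the window frame) limits which rows of the group the function sees.
Function (arg1,..., argn) can be one of the following:
Aggregate functions: for example sum(...), max(...), min(...), avg(...).
Sort functions: functions that number or rank rows, for example rank(...), row_number(...).
Analytics functions: statistical and comparison functions, for example lead(...), lag(...), first_value(...).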
2. Preparation:
2.1. Table creation statement:
CREATE TABLE IF NOT EXISTS temp.test (
`name` string COMMENT 'name',
`dept_num` int COMMENT 'department number',
`employee_id` int COMMENT 'id',
`salary` int COMMENT 'salary',
`type` string COMMENT 'job type',
`start_date` date COMMENT 'start date'
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED as TEXTFILE;
2.2. Create a test file locally (fields separated by tabs, to match the table definition):
name dept_num employee_id salary type start_date
Michael 1000 100 5000 full 2014-01-29
Will 1000 101 4000 full 2013-10-02
Wendy 1000 101 4000 part 2014-10-02
Steven 1000 102 6400 part 2012-11-03
Lucy 1000 103 5500 full 2010-01-03
Lily 1001 104 5000 part 2014-11-29
Jess 1001 105 6000 part 2014-12-02
Mike 1001 106 6400 part 2013-11-03
Wei 1002 107 7000 part 2010-04-03
Yun 1002 108 5500 full 2014-01-29
Richard 1002 109 8000 full 2013-09-01
2.3. Load the local file into Hive:
load data local inpath '/root/test' into table temp.test;
Note: the statements above create and load temp.test, while the queries below read from a table named employee; use whichever name matches your environment.
Note: the first line of the test file is a header. Nothing in the table definition skips it, so it appears to be loaded as a twelfth row whose numeric columns are NULL. This would explain why the outputs below report 12 fetched rows while showing 11 employees, and why percent_rank and cume_dist behave as if the table had 12 rows.
3. Window aggregate functions
3.1. Query the name, department number, salary, and head count of the department
select `name`,`dept_num`,`salary`,
count(*) over (partition by dept_num) as cnt
from employee;
3.1.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 780 msec
OK
name dept_num salary cnt
Lucy 1000 5500 5
Steven 1000 6400 5
Wendy 1000 4000 5
Will 1000 4000 5
Michael 1000 5000 5
Mike 1001 6400 3
Jess 1001 6000 3
Lily 1001 5000 3
Richard 1002 8000 3
Yun 1002 5500 3
Wei 1002 7000 3
Time taken: 22.624 seconds, Fetched: 12 row(s)
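To make the behavior of count(*) over (partition by dept_num) concrete, here is a minimal sketch in plain Python (not Hive) that emulates it on the sample rows. The names ROWS and count_over_partition are illustrative only. Unlike GROUP BY, every input row is kept and is simply annotated with the size of its department.

```python
from collections import Counter

# (name, dept_num, salary) taken from the test file above.
ROWS = [
    ("Michael", 1000, 5000), ("Will", 1000, 4000), ("Wendy", 1000, 4000),
    ("Steven", 1000, 6400), ("Lucy", 1000, 5500), ("Lily", 1001, 5000),
    ("Jess", 1001, 6000), ("Mike", 1001, 6400), ("Wei", 1002, 7000),
    ("Yun", 1002, 5500), ("Richard", 1002, 8000),
]


def count_over_partition(rows):
    """Emulate count(*) OVER (PARTITION BY dept_num)."""
    sizes = Counter(dept for _, dept, _ in rows)  # rows per department
    # Every input row survives; it only gains the size of its partition.
    return [(name, dept, salary, sizes[dept]) for name, dept, salary in rows]


counted = count_over_partition(ROWS)
for row in counted:
    print(*row)
```

The row order of the sketch follows the input file; Hive does not guarantee any particular order inside a partition unless the window or the query specifies one.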
3.2. Query the name, department number, salary, and total salary of each department, sorted by total department salary in descending order
select name,dept_num,salary,
sum(salary) over (partition by dept_num order by dept_num) as sum_dept_salary
from employee order by sum_dept_salary desc;
3.2.1. The output is:
Total MapReduce CPU Time Spent: 3 seconds 470 msec
OK
name dept_num salary sum_dept_salary
Michael 1000 5000 24900
Will 1000 4000 24900
Wendy 1000 4000 24900
Steven 1000 6400 24900
Lucy 1000 5500 24900
Wei 1002 7000 20500
Yun 1002 5500 20500
Richard 1002 8000 20500
Lily 1001 5000 17400
Jess 1001 6000 17400
Mike 1001 6400 17400
Time taken: 47.313 seconds, Fetched: 12 row(s)
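The window in 3.2 orders by the same column it partitions by. Hive documents that a window with ORDER BY and no explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so every row that ties on the ordering key is included in the sum. The sketch below (plain Python; range_running_sum is a hypothetical helper, not a Hive API) emulates that frame for department 1000. Ordering by dept_num makes all rows peers and yields the department total, whereas ordering by salary would yield a running total.

```python
def range_running_sum(values, order_keys):
    """Emulate sum(value) OVER (ORDER BY key) with the default frame
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: each row sums
    every row whose key is <= its own key, peers included."""
    return [
        sum(v for v, k in zip(values, order_keys) if k <= key)
        for key in order_keys
    ]


salaries = [5000, 4000, 4000, 6400, 5500]  # department 1000, file order

# ORDER BY dept_num: the key is constant, so all rows are peers.
by_dept = range_running_sum(salaries, [1000] * len(salaries))
# ORDER BY salary: a running total, with the two 4000 rows sharing a value.
by_salary = range_running_sum(salaries, salaries)

print(by_dept)    # [24900, 24900, 24900, 24900, 24900]
print(by_salary)  # [13000, 8000, 8000, 24900, 18500]
```

If only the department total is wanted, dropping ORDER BY from the window gives the same result and states the intent more directly.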
4. Window sort functions
4.1. Query the name, department number, salary, and row number (ordered by salary)
select `name`,`dept_num`,`salary`,
row_number() over (order by salary desc ) rnum
from employee;
4.1.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 890 msec
OK
name dept_num salary rnum
Richard 1002 8000 1
Wei 1002 7000 2
Mike 1001 6400 3
Steven 1000 6400 4
Jess 1001 6000 5
Yun 1002 5500 6
Lucy 1000 5500 7
Lily 1001 5000 8
Michael 1000 5000 9
Wendy 1000 4000 10
Will 1000 4000 11
Time taken: 22.453 seconds, Fetched: 12 row(s)
4.2. Query the two highest-paid people in each department (name, department, salary)
select name,dept_num,salary
from (
select `name`,`dept_num`,`salary`,
row_number() over (partition by dept_num order by salary desc ) rnum
from employee) t1 where rnum <= 2;
4.2.1. The output is:
Total MapReduce CPU Time Spent: 2 seconds 680 msec
OK
name dept_num salary
Steven 1000 6400
Lucy 1000 5500
Mike 1001 6400
Jess 1001 6000
Richard 1002 8000
Wei 1002 7000
Time taken: 24.083 seconds, Fetched: 7 row(s)
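The top-N-per-group pattern used in 4.2 can be sketched in plain Python as follows. The helper top_n_per_dept and the constant ROWS are illustrative names, and the sketch uses only the 11 employee rows from the test file.

```python
from itertools import groupby

# (name, dept_num, salary) taken from the test file above.
ROWS = [
    ("Michael", 1000, 5000), ("Will", 1000, 4000), ("Wendy", 1000, 4000),
    ("Steven", 1000, 6400), ("Lucy", 1000, 5500), ("Lily", 1001, 5000),
    ("Jess", 1001, 6000), ("Mike", 1001, 6400), ("Wei", 1002, 7000),
    ("Yun", 1002, 5500), ("Richard", 1002, 8000),
]


def top_n_per_dept(rows, n):
    """Emulate filtering on
    row_number() OVER (PARTITION BY dept_num ORDER BY salary DESC) <= n."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))  # dept asc, salary desc
    kept = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        # Number the rows of each department starting at 1.
        for rnum, row in enumerate(group, start=1):
            if rnum <= n:
                kept.append(row)
    return kept


top_two = top_n_per_dept(ROWS, 2)
for row in top_two:
    print(*row)
```

Because row_number assigns distinct numbers to tied salaries, a tie at the cutoff is broken arbitrarily; rank or dense_rank would keep all tied rows instead.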
4.3. Query the salary ranking of employees within each department
select `name`,`dept_num`,`salary`,
row_number() over (partition by dept_num order by salary desc ) rnum
from employee;
4.3.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 860 msec
OK
name dept_num salary rnum
Steven 1000 6400 1
Lucy 1000 5500 2
Michael 1000 5000 3
Wendy 1000 4000 4
Will 1000 4000 5
Mike 1001 6400 1
Jess 1001 6000 2
Lily 1001 5000 3
Richard 1002 8000 1
Wei 1002 7000 2
Yun 1002 5500 3
Time taken: 23.202 seconds, Fetched: 12 row(s)
4.4. Use the rank function to rank (ties share a rank and the following ranks are skipped)
select `name`,`dept_num`,`salary`,
rank() over (order by salary desc) rank
from employee;
4.4.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 830 msec
OK
name dept_num salary rank
Richard 1002 8000 1
Wei 1002 7000 2
Mike 1001 6400 3
Steven 1000 6400 3
Jess 1001 6000 5
Yun 1002 5500 6
Lucy 1000 5500 6
Lily 1001 5000 8
Michael 1000 5000 8
Wendy 1000 4000 10
Will 1000 4000 10
Time taken: 21.547 seconds, Fetched: 12 row(s)
4.5. Use dense_rank to rank (ties share a rank and no ranks are skipped)
select `name`,`dept_num`,`salary`,
dense_rank() over (order by salary desc) rank
from employee;
4.5.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 710 msec
OK
name dept_num salary rank
Richard 1002 8000 1
Wei 1002 7000 2
Mike 1001 6400 3
Steven 1000 6400 3
Jess 1001 6000 4
Yun 1002 5500 5
Lucy 1000 5500 5
Lily 1001 5000 6
Michael 1000 5000 6
Wendy 1000 4000 7
Will 1000 4000 7
Time taken: 21.879 seconds, Fetched: 12 row(s)
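The difference between row_number, rank and dense_rank in 4.1, 4.4 and 4.5 comes down to how ties are numbered. A minimal Python sketch (the function name rankings is illustrative) computes all three in one pass over salaries that are already sorted in descending order:

```python
def rankings(sorted_values):
    """Return (value, row_number, rank, dense_rank) for pre-sorted values."""
    out = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(sorted_values, start=1):
        if value != prev:
            rank = row_number  # rank jumps to the current position
            dense += 1         # dense_rank advances by exactly one
            prev = value
        out.append((value, row_number, rank, dense))
    return out


salaries_desc = [8000, 7000, 6400, 6400, 6000, 5500, 5500, 5000, 5000, 4000, 4000]
table = rankings(salaries_desc)
for line in table:
    print(*line)
```

For the two 6400 rows this yields row numbers 3 and 4, rank 3 for both (so the next rank is 5), and dense rank 3 for both (so the next dense rank is 4), which matches the outputs above.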
4.6. Use percent_rank() to rank (relative rank between 0 and 1)
select name,dept_num,salary,
percent_rank() over (order by salary desc) rank
from employee;
4.6.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 940 msec
OK
name dept_num salary rank
Richard 1002 8000 0.0
Wei 1002 7000 0.09090909090909091
Mike 1001 6400 0.18181818181818182
Steven 1000 6400 0.18181818181818182
Jess 1001 6000 0.36363636363636365
Yun 1002 5500 0.45454545454545453
Lucy 1000 5500 0.45454545454545453
Lily 1001 5000 0.6363636363636364
Michael 1000 5000 0.6363636363636364
Wendy 1000 4000 0.8181818181818182
Will 1000 4000 0.8181818181818182
Time taken: 22.401 seconds, Fetched: 12 row(s)
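percent_rank is defined as (rank - 1) / (rows - 1). The values above have 11 in the denominator, which only works out if the window holds 12 rows. The sketch below therefore assumes, as a labeled guess based on the "12 row(s)" in the output, that the header line of the test file was loaded as a twelfth row with a NULL salary that sorts last in descending order. The function name percent_rank is reused here for a plain Python helper, not a Hive API.

```python
def percent_rank(sorted_values):
    """Emulate percent_rank() for values already in window order:
    (rank - 1) / (number of rows - 1)."""
    n = len(sorted_values)
    out = []
    rank = 0
    prev = object()  # sentinel that never equals a real value
    for position, value in enumerate(sorted_values, start=1):
        if value != prev:
            rank, prev = position, value  # ties keep the earlier rank
        out.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return out


# ASSUMPTION: the trailing None stands for the header line loaded as a NULL row.
salaries_desc = [8000, 7000, 6400, 6400, 6000, 5500, 5500,
                 5000, 5000, 4000, 4000, None]
pr = percent_rank(salaries_desc)
for salary, value in zip(salaries_desc, pr):
    print(salary, value)
```

With only the 11 employee rows the second value would be 1/10 = 0.1 rather than the 0.0909... shown in the output.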
4.7. Use ntile to split the ordered rows into buckets
SELECT name,dept_num as deptno,salary,
ntile(4) OVER(ORDER BY salary desc) as ntile
FROM employee;
4.7.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 940 msec
OK
name deptno salary ntile
Richard 1002 8000 1
Wei 1002 7000 1
Mike 1001 6400 1
Steven 1000 6400 2
Jess 1001 6000 2
Yun 1002 5500 2
Lucy 1000 5500 3
Lily 1001 5000 3
Michael 1000 5000 3
Wendy 1000 4000 4
Will 1000 4000 4
Time taken: 28.829 seconds, Fetched: 12 row(s)
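ntile(b) deals the ordered rows into b buckets whose sizes differ by at most one, with the larger buckets first. A small Python sketch of that assignment (the helper name ntile mirrors the SQL function but is plain Python):

```python
def ntile(row_count, buckets):
    """Return the bucket number of each row, in window order."""
    base, extra = divmod(row_count, buckets)
    out = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        out.extend([bucket] * size)
    return out


print(ntile(11, 4))  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
print(ntile(12, 4))  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```

Note that rows with equal salaries can land in different buckets: Mike and Steven both earn 6400 but fall into buckets 1 and 2 in the output above.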
5. Window analytics functions
5.1. Compute the proportion of people whose salary is less than or equal to the current salary, relative to the total number of people
SELECT name,dept_num,salary,
cume_dist() OVER (ORDER BY salary) as cume
FROM employee;
5.1.1. The output is:
name dept_num salary cume
Wendy 1000 4000 0.25
Will 1000 4000 0.25
Lily 1001 5000 0.4166666666666667
Michael 1000 5000 0.4166666666666667
Yun 1002 5500 0.5833333333333334
Lucy 1000 5500 0.5833333333333334
Jess 1001 6000 0.6666666666666666
Mike 1001 6400 0.8333333333333334
Steven 1000 6400 0.8333333333333334
Wei 1002 7000 0.9166666666666666
Richard 1002 8000 1.0
Time taken: 20.869 seconds, Fetched: 12 row(s)
5.2. Compute the proportion of people whose salary is greater than or equal to the current salary, relative to the total number of people
SELECT name,dept_num,salary,
cume_dist() OVER (ORDER BY salary desc) as cume
FROM employee;
5.2.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 790 msec
OK
name dept_num salary cume
Richard 1002 8000 0.08333333333333333
Wei 1002 7000 0.16666666666666666
Mike 1001 6400 0.3333333333333333
Steven 1000 6400 0.3333333333333333
Jess 1001 6000 0.4166666666666667
Yun 1002 5500 0.5833333333333334
Lucy 1000 5500 0.5833333333333334
Lily 1001 5000 0.75
Michael 1000 5000 0.75
Wendy 1000 4000 0.9166666666666666
Will 1000 4000 0.9166666666666666
Time taken: 21.672 seconds, Fetched: 12 row(s)
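cume_dist is the number of rows up to and including the current row's peers, divided by the total number of rows. The sketch below reproduces both orderings in plain Python. It assumes, as a labeled guess based on the "12 row(s)" in the output, that the header line of the test file was loaded as a twelfth row with a NULL salary, sorted first in ascending order and last in descending order. The helper name cume_dist is plain Python, not a Hive API.

```python
def cume_dist(sorted_values):
    """Emulate cume_dist() for values already in window order."""
    n = len(sorted_values)
    out = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the last row that ties with row i (its last peer).
        while j + 1 < n and sorted_values[j + 1] == sorted_values[i]:
            j += 1
        for k in range(i, j + 1):
            out[k] = (j + 1) / n  # rows up to the last peer / all rows
        i = j + 1
    return out


# ASSUMPTION: None stands for the header line loaded as a NULL row.
asc = [None, 4000, 4000, 5000, 5000, 5500, 5500, 6000, 6400, 6400, 7000, 8000]
desc = [8000, 7000, 6400, 6400, 6000, 5500, 5500, 5000, 5000, 4000, 4000, None]
cume_asc = cume_dist(asc)
cume_desc = cume_dist(desc)
print(cume_asc[1])   # 0.25, the value shown for the 4000 rows in 5.1
print(cume_desc[0])  # 1/12, the value shown for Richard in 5.2
```

With only the 11 employee rows, the two 4000 rows would get 2/11 in ascending order instead of the 0.25 shown in the output.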
5.3. Compute, within each department, the proportion of people whose salary is less than or equal to the current salary
SELECT name,dept_num,salary,
cume_dist() OVER (PARTITION BY dept_num ORDER BY salary) as cume
FROM employee;
5.3.1. The output is:
Total MapReduce CPU Time Spent: 2 seconds 130 msec
OK
name dept_num salary cume
Wendy 1000 4000 0.4
Will 1000 4000 0.4
Michael 1000 5000 0.6
Lucy 1000 5500 0.8
Steven 1000 6400 1.0
Lily 1001 5000 0.3333333333333333
Jess 1001 6000 0.6666666666666666
Mike 1001 6400 1.0
Yun 1002 5500 0.3333333333333333
Wei 1002 7000 0.6666666666666666
Richard 1002 8000 1.0
Time taken: 22.055 seconds, Fetched: 12 row(s)
5.4. Get the salary of the next row within the department, ordered by salary ascending (NULL for the last row)
SELECT name,dept_num,salary,
lead(salary,1) OVER (PARTITION BY dept_num ORDER BY salary) as lead
FROM employee;
5.4.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 880 msec
OK
name dept_num salary lead
Wendy 1000 4000 4000
Will 1000 4000 5000
Michael 1000 5000 5500
Lucy 1000 5500 6400
Steven 1000 6400 NULL
Lily 1001 5000 6000
Jess 1001 6000 6400
Mike 1001 6400 NULL
Yun 1002 5500 7000
Wei 1002 7000 8000
Richard 1002 8000 NULL
Time taken: 21.57 seconds, Fetched: 12 row(s)
5.5. Get the salary of the previous row within the department, ordered by salary ascending (NULL for the first row)
SELECT name,dept_num,salary,
lag(salary,1) OVER (PARTITION BY dept_num ORDER BY salary) as lag
FROM employee;
5.5.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 700 msec
OK
name dept_num salary lag
Wendy 1000 4000 NULL
Will 1000 4000 4000
Michael 1000 5000 4000
Lucy 1000 5500 5000
Steven 1000 6400 5500
Lily 1001 5000 NULL
Jess 1001 6000 5000
Mike 1001 6400 6000
Yun 1002 5500 NULL
Wei 1002 7000 5500
Richard 1002 8000 7000
Time taken: 21.423 seconds, Fetched: 12 row(s)
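lead and lag simply look a fixed number of rows ahead or behind inside the ordered partition and return a default (NULL unless specified) when they run off the end. A minimal Python sketch for department 1000, with None standing in for NULL and helper names that mirror the SQL functions:

```python
def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or `default` past the end."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or `default` before the start."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


dept_1000 = [4000, 4000, 5000, 5500, 6400]  # salaries ordered ascending
print(lead(dept_1000))  # [4000, 5000, 5500, 6400, None]
print(lag(dept_1000))   # [None, 4000, 4000, 5000, 5500]
```

In Hive the optional third argument plays the role of default here, for example lead(salary, 1, 0) to return 0 instead of NULL for the last row of each department.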
5.6. Get the lowest salary in each department (first_value with ascending order)
SELECT name,dept_num,salary,
first_value(salary) OVER (PARTITION BY dept_num ORDER BY salary) as fval
FROM employee;
5.6.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
name dept_num salary fval
Wendy 1000 4000 4000
Will 1000 4000 4000
Michael 1000 5000 4000
Lucy 1000 5500 4000
Steven 1000 6400 4000
Lily 1001 5000 5000
Jess 1001 6000 5000
Mike 1001 6400 5000
Yun 1002 5500 5500
Wei 1002 7000 5500
Richard 1002 8000 5500
Time taken: 20.379 seconds, Fetched: 12 row(s)
5.7. Get the highest salary in each department (last_value over the whole partition)
SELECT name,dept_num,salary,
last_value(salary) OVER (PARTITION BY dept_num ORDER BY salary RANGE
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lval
FROM employee;
5.7.1. The output is:
Total MapReduce CPU Time Spent: 1 seconds 770 msec
OK
name dept_num salary lval
Wendy 1000 4000 6400
Will 1000 4000 6400
Michael 1000 5000 6400
Lucy 1000 5500 6400
Steven 1000 6400 6400
Lily 1001 5000 6400
Jess 1001 6000 6400
Mike 1001 6400 6400
Yun 1002 5500 8000
Wei 1002 7000 8000
Richard 1002 8000 8000
Time taken: 21.649 seconds, Fetched: 12 row(s)
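The explicit frame in 5.7 matters. With ORDER BY and no frame, Hive's documented default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the frame ends at the current row's peers and last_value would return the current salary rather than the department maximum. first_value is unaffected because the frame always starts at the first row. The following plain Python sketch (helper names are illustrative) emulates both frames for department 1000:

```python
def frame(sorted_values, current, whole_partition):
    """Rows visible to the current row under the two frames discussed."""
    if whole_partition:
        # RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        return sorted_values
    # Default: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (peers included)
    return [v for v in sorted_values if v <= current]


def first_value(sorted_values, whole_partition=False):
    return [frame(sorted_values, v, whole_partition)[0] for v in sorted_values]


def last_value(sorted_values, whole_partition=False):
    return [frame(sorted_values, v, whole_partition)[-1] for v in sorted_values]


dept_1000 = [4000, 4000, 5000, 5500, 6400]  # salaries ordered ascending
print(first_value(dept_1000))                       # [4000, 4000, 4000, 4000, 4000]
print(last_value(dept_1000))                        # [4000, 4000, 5000, 5500, 6400]
print(last_value(dept_1000, whole_partition=True))  # [6400, 6400, 6400, 6400, 6400]
```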