ハイブウィンドウ機能について

1.基本的な文法

    Function (arg1,..., argn) OVER ([PARTITION BY <...>] [ORDER BY <....>]
    [<window_expression>])
Function (arg1,..., argn) 可以是下面的函数:
    Aggregate Functions: 聚合函数,比如:sum(...)、 max(...)、min(...)、avg(...)等.
    Sort Functions: 数据排序函数, 比如 :rank(...)、row_number(...)等.
    Analytics Functions: 统计和比较函数, 比如:lead(...)、lag(...)、 first_value(...)等.

2.準備:

2.1。テーブル作成ステートメント:

CREATE TABLE IF NOT EXISTS temp.test (
                `name` string COMMENT '姓名',
                `dept_num` int COMMENT '编号',
                `employee_id` int COMMENT 'id',
                `salary` int COMMENT '工资',
                `type` string COMMENT '岗位类型',
                `start_date` date COMMENT '入职时间'
) ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  STORED as TEXTFILE;

2.2。ローカルでテストファイルを作成します。

name    dept_num    employee_id salary  type    start_date
Michael 1000        100         5000    full    2014-01-29
Will    1000        101         4000    full    2013-10-02
Wendy   1000        101         4000    part    2014-10-02
Steven  1000        102         6400    part    2012-11-03
Lucy    1000        103         5500    full    2010-01-03
Lily    1001        104         5000    part    2014-11-29
Jess    1001        105         6000    part    2014-12-02
Mike    1001        106         6400    part    2013-11-03
Wei     1002        107         7000    part    2010-04-03
Yun     1002        108         5500    full    2014-01-29
Richard 1002        109         8000    full    2013-09-01

2.3。作成したローカルファイルをハイブライブラリにアップロードします。

load data local inpath '/root/test' into table temp.test;

3.ウィンドウ集約機能

3.1。名前、部門番号、給与、部門の人数を照会します

select `name`,`dept_num`,`salary`,
count(*) over (partition by dept_num) as cnt 
from employee;

3.1.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 780 msec
OK
name    dept_num    salary    cnt
Lucy    1000        5500      5
Steven  1000        6400      5
Wendy   1000        4000      5
Will    1000        4000      5
Michael 1000        5000      5
Mike    1001        6400      3
Jess    1001        6000      3
Lily    1001        5000      3
Richard 1002        8000      3
Yun     1002        5500      3
Wei     1002        7000      3
Time taken: 22.624 seconds, Fetched: 12 row(s)

3.2。各部門の名前、部門番号、給与、合計給与を照会します。部門の合計給与は降順で出力されます。

select name,dept_num,salary,
sum(salary) over (partition by dept_num order by dept_num) as sum_dept_salary 
from employee order by sum_dept_salary desc;

3.2.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 3 seconds 470 msec
OK
name    dept_num    salary    sum_dept_salary
Michael 1000        5000      24900
Will    1000        4000      24900
Wendy   1000        4000      24900
Steven  1000        6400      24900
Lucy    1000        5500      24900
Wei     1002        7000      20500
Yun     1002        5500      20500
Richard 1002        8000      20500
Lily    1001        5000      17400
Jess    1001        6000      17400
Mike    1001        6400      17400
Time taken: 47.313 seconds, Fetched: 12 row(s)

4.ウィンドウソート機能

4.1。クエリ名、部門番号、給与、ランキング番号(給与でランク付け)

select `name`,`dept_num`,`salary`,
row_number() over (order by salary desc ) rnum 
from employee;

4.1.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 890 msec
OK
name    dept_num    salary    rnum
Richard 1002        8000      1
Wei     1002        7000      2
Mike    1001        6400      3
Steven  1000        6400      4
Jess    1001        6000      5
Yun     1002        5500      6
Lucy    1000        5500      7
Lily    1001        5000      8
Michael 1000        5000      9
Wendy   1000        4000      10
Will    1000        4000      11
Time taken: 22.453 seconds, Fetched: 12 row(s)

4.2。各部門(名前、部門、給与)で最も給与が高い2人の情報を照会する

select name,dept_num,salary 
from (
select `name`,`dept_num`,`salary`,
row_number() over (partition by dept_num order by salary desc ) rnum 
from employee) t1 where rnum <= 2;

4.2.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 2 seconds 680 msec
OK
name    dept_num    salary
Steven  1000        6400
Lucy    1000        5500
Mike    1001        6400
Jess    1001        6000
Richard 1002        8000
Wei     1002        7000
Time taken: 24.083 seconds, Fetched: 7 row(s)

4.3。各部門の従業員の給与ランキング情報を照会する

select `name`,`dept_num`,`salary`,
row_number() over (partition by dept_num order by salary desc ) rnum 
from employee;

4.3.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 860 msec
OK
name    dept_num    salary    rnum
Steven  1000        6400      1
Lucy    1000        5500      2
Michael 1000        5000      3
Wendy   1000        4000      4
Will    1000        4000      5
Mike    1001        6400      1
Jess    1001        6000      2
Lily    1001        5000      3
Richard 1002        8000      1
Wei     1002        7000      2
Yun     1002        5500      3
Time taken: 23.202 seconds, Fetched: 12 row(s)

4.4、ランク付けにランク付け機能を使用する

select `name`,`dept_num`,`salary`,
rank() over (order by salary desc) rank
from employee;

4.4.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 830 msec
OK
name    dept_num    salary    rank
Richard 1002        8000      1
Wei     1002        7000      2
Mike    1001        6400      3
Steven  1000        6400      3
Jess    1001        6000      5
Yun     1002        5500      6
Lucy    1000        5500      6
Lily    1001        5000      8
Michael 1000        5000      8
Wendy   1000        4000      10
Will    1000        4000      10
Time taken: 21.547 seconds, Fetched: 12 row(s)

4.5、ランキングにdense_rankを使用

select `name`,`dept_num`,`salary`,
dense_rank() over (order by salary desc) rank
from employee;

4.5.1出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 710 msec
OK
name    dept_num    salary    rank
Richard 1002        8000      1
Wei     1002        7000      2
Mike    1001        6400      3
Steven  1000        6400      3
Jess    1001        6000      4
Yun     1002        5500      5
Lucy    1000        5500      5
Lily    1001        5000      6
Michael 1000        5000      6
Wendy   1000        4000      7
Will    1000        4000      7
Time taken: 21.879 seconds, Fetched: 12 row(s)

4.6、ランク付けにpercent_rank()を使用

select name,dept_num,salary,
percent_rank() over (order by salary desc) rank
from employee;

4.6.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 940 msec
OK
name    dept_num    salary  rank
Richard 1002        8000    0.0
Wei     1002        7000    0.09090909090909091
Mike    1001        6400    0.18181818181818182
Steven  1000        6400    0.18181818181818182
Jess    1001        6000    0.36363636363636365
Yun     1002        5500    0.45454545454545453
Lucy    1000        5500    0.45454545454545453
Lily    1001        5000    0.6363636363636364
Michael 1000        5000    0.6363636363636364
Wendy   1000        4000    0.8181818181818182
Will    1000        4000    0.8181818181818182
Time taken: 22.401 seconds, Fetched: 12 row(s)

4.7.ntileを使用してデータスライスをランク付けする

SELECT name,dept_num as deptno,salary,
ntile(4) OVER(ORDER BY salary desc) as ntile
FROM employee;

4.7.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 940 msec
OK
name    deptno  salary  ntile
Richard 1002    8000    1
Wei     1002    7000    1
Mike    1001    6400    1
Steven  1000    6400    2
Jess    1001    6000    2
Yun     1002    5500    2
Lucy    1000    5500    3
Lily    1001    5000    3
Michael 1000    5000    3
Wendy   1000    4000    4
Will    1000    4000    4
Time taken: 28.829 seconds, Fetched: 12 row(s)

5.ウィンドウ分析機能

5.1。総人数に対する現在の給与以下の人数の割合を数える

SELECT name,dept_num,salary,
cume_dist() OVER (ORDER BY salary) as cume
FROM employee;

5.1.1。出力結果は次のとおりです。

name    deptno  salary  cume
Wendy   1000    4000    0.25
Will    1000    4000    0.25
Lily    1001    5000    0.4166666666666667
Michael 1000    5000    0.4166666666666667
Yun     1002    5500    0.5833333333333334
Lucy    1000    5500    0.5833333333333334
Jess    1001    6000    0.6666666666666666
Mike    1001    6400    0.8333333333333334
Steven  1000    6400    0.8333333333333334
Wei     1002    7000    0.9166666666666666
Richard 1002    8000    1.0
Time taken: 20.869 seconds, Fetched: 12 row(s)

5.2。総人数に対する現在の給与以下の人数の割合を数える

SELECT name,dept_num,salary,
cume_dist() OVER (ORDER BY salary desc) as cume
FROM employee;

5.2.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 790 msec
OK
name    dept_num    salary  cume
Richard 1002        8000    0.08333333333333333
Wei     1002        7000    0.16666666666666666
Mike    1001        6400    0.3333333333333333
Steven  1000        6400    0.3333333333333333
Jess    1001        6000    0.4166666666666667
Yun     1002        5500    0.5833333333333334
Lucy    1000        5500    0.5833333333333334
Lily    1001        5000    0.75
Michael 1000        5000    0.75
Wendy   1000        4000    0.9166666666666666
Will    1000        4000    0.9166666666666666
Time taken: 21.672 seconds, Fetched: 12 row(s)

5.3。総人数に対する現在の給与以下の人数の割合を数える

SELECT name,dept_num,salary,
cume_dist() OVER (PARTITION BY dept_num ORDER BY salary) as cume
FROM employee;

5.3.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 2 seconds 130 msec
OK
name    dept_num    salary  cume
Wendy   1000        4000    0.4
Will    1000        4000    0.4
Michael 1000        5000    0.6
Lucy    1000        5500    0.8
Steven  1000        6400    1.0
Lily    1001        5000    0.3333333333333333
Jess    1001        6000    0.6666666666666666
Mike    1001        6400    1.0
Yun     1002        5500    0.3333333333333333
Wei     1002        7000    0.6666666666666666
Richard 1002        8000    1.0
Time taken: 22.055 seconds, Fetched: 12 row(s)

5.4。総人数に対する現在の給与以下の人数の割合を数える

SELECT name,dept_num,salary,
lead(salary,1) OVER (PARTITION BY dept_num ORDER BY salary) as lead
FROM employee;

5.4.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 880 msec
OK
name    dept_num    salary  lead
Wendy   1000        4000    4000
Will    1000        4000    5000
Michael 1000        5000    5500
Lucy    1000        5500    6400
Steven  1000        6400    NULL
Lily    1001        5000    6000
Jess    1001        6000    6400
Mike    1001        6400    NULL
Yun     1002        5500    7000
Wei     1002        7000    8000
Richard 1002        8000    NULL
Time taken: 21.57 seconds, Fetched: 12 row(s)

5.5。総数に対する現在の給与以下の人数の割合を計算します

SELECT name,dept_num,salary,
lag(salary,1) OVER (PARTITION BY dept_num ORDER BY salary) as lead
FROM employee;

5.5.1出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 700 msec
OK
name    dept_num    salary  lead
Wendy   1000        4000    NULL
Will    1000        4000    4000
Michael 1000        5000    4000
Lucy    1000        5500    5000
Steven  1000        6400    5500
Lily    1001        5000    NULL
Jess    1001        6000    5000
Mike    1001        6400    6000
Yun     1002        5500    NULL
Wei     1002        7000    5500
Richard 1002        8000    7000
Time taken: 21.423 seconds, Fetched: 12 row(s)

5.6。総人数に対する現在の給与以下の人数の割合を数える

SELECT name,dept_num,salary,
first_value(salary) OVER (PARTITION BY dept_num ORDER BY salary) as fval
FROM employee;

5.6.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
name    dept_num    salary  fval
Wendy   1000        4000    4000
Will    1000        4000    4000
Michael 1000        5000    4000
Lucy    1000        5500    4000
Steven  1000        6400    4000
Lily    1001        5000    5000
Jess    1001        6000    5000
Mike    1001        6400    5000
Yun     1002        5500    5500
Wei     1002        7000    5500
Richard 1002        8000    5500
Time taken: 20.379 seconds, Fetched: 12 row(s)

5.7。総人数に対する現在の給与以下の人数の割合を数える

SELECT name,dept_num,salary,
last_value(salary) OVER (PARTITION BY dept_num ORDER BY salary RANGE
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lval
FROM employee;

5.7.1。出力結果は次のとおりです。

Total MapReduce CPU Time Spent: 1 seconds 770 msec
OK
name    dept_num    salary  lval
Wendy   1000        4000    6400
Will    1000        4000    6400
Michael 1000        5000    6400
Lucy    1000        5500    6400
Steven  1000        6400    6400
Lily    1001        5000    6400
Jess    1001        6000    6400
Mike    1001        6400    6400
Yun     1002        5500    8000
Wei     1002        7000    8000
Richard 1002        8000    8000
Time taken: 21.649 seconds, Fetched: 12 row(s)

おすすめ

転載: blog.51cto.com/15080921/2595867