Common window functions and analytic functions

Hive window functions and analytic functions
1. Analytic functions: used for ranking, percentiles, n-way bucketing, and similar tasks

NTILE is a very powerful analytic function in Hive.

You can think of it this way: it distributes an ordered set of rows as evenly as possible across a specified number (num) of buckets and assigns each row its bucket number. If the rows cannot be divided evenly, the lower-numbered buckets are filled first, and the row counts of any two buckets differ by at most 1.
The syntax is:
     NTILE(num) OVER ([partition_clause] order_by_clause) AS your_bucket_num

   You can then filter on the bucket number to select the first or last n-th share of the data.
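The bucket-sizing rule can be sketched in plain Python. This is not Hive itself, only a minimal illustration of the rule described above; the function name `ntile` and its parameters are made up for the example, and the rows are assumed to be sorted already.

```python
def ntile(num_rows, num_buckets):
    """Return the 1-based bucket number for each of num_rows ordered rows."""
    # Every bucket gets `base` rows; the first `extra` buckets get one more.
    base, extra = divmod(num_rows, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(7, 3))  # [1, 1, 1, 2, 2, 3, 3]
```

With 7 rows and 3 buckets, the leftover row goes to bucket 1, so the bucket sizes are 3, 2, 2.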
Example:

    Given a table of users and their payment amounts, compute the average payment of the top 50% of users by spend.

 


-- Sort the user/payment table by payment in descending order and split it evenly into two halves
DROP TABLE IF EXISTS test_by_payment_ntile;
CREATE TABLE test_by_payment_ntile AS
SELECT
      nick,
      payment,
      NTILE(2) OVER (ORDER BY payment DESC) AS rn
FROM test_nick_payment;

-- Average each half separately to get the average payment of the top 50% and the bottom 50% of users
SELECT
   'avg_payment' AS inf,
   t1.avg_payment_up_50 AS avg_payment_up_50,
   t2.avg_payment_down_50 AS avg_payment_down_50
FROM
 (SELECT
         AVG(payment) AS avg_payment_up_50
  FROM test_by_payment_ntile
  WHERE rn = 1
) t1
   -- each subquery returns exactly one row and neither exposes dp_id, so pair them with a cross join
   CROSS JOIN
(SELECT
          AVG(payment) AS avg_payment_down_50
 FROM test_by_payment_ntile
 WHERE rn = 2
) t2;

 

RANK, DENSE_RANK, ROW_NUMBER

These are the three within-group ranking functions that will be familiar from SQL. The syntax looks like this, where R is RANK, DENSE_RANK, or ROW_NUMBER:

R() OVER (PARTITION BY col1... ORDER BY col2... DESC/ASC)


select 
   class1,
   score,
   rank() over(partition by class1 order by score desc) rk1,
   dense_rank() over(partition by class1 order by score desc) rk2,
   row_number() over(partition by class1 order by score desc) rk3
from zyy_test1;


Differences:
        RANK: rows with equal values get the same number, and the following number is skipped, so the sequence has gaps (1, 2, 2, 4);

       DENSE_RANK: rows with equal values get the same number, and the following number continues without a gap (1, 2, 2, 3);

       ROW_NUMBER: every row gets its own number even when values are equal, so the numbers are unique and consecutive (1, 2, 3, 4).
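To make the difference concrete, here is a minimal plain-Python sketch (not Hive) that computes all three numbers for one partition. The function name `rank_all` and the sample scores are made up for illustration, and the scores are assumed to be sorted in descending order already.

```python
def rank_all(scores):
    """Return (rank, dense_rank, row_number) for each score in sorted order."""
    result = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real score
    for row_number, score in enumerate(scores, start=1):
        if score != prev:
            rank = row_number  # RANK jumps to the current row position (gaps)
            dense += 1         # DENSE_RANK only ever advances by one (no gaps)
            prev = score
        result.append((rank, dense, row_number))
    return result


print(rank_all([95, 90, 90, 80]))
# [(1, 1, 1), (2, 2, 2), (2, 2, 3), (4, 3, 4)]
```

The two rows tied at 90 share rank 2 and dense rank 2; the next row gets rank 4 but dense rank 3, while the row numbers simply count 1 through 4.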

2. Window functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE

Lag, Lead

LAG(col, n, DEFAULT) returns the value of col from the n-th row before the current row within the window.

LEAD(col, n, DEFAULT) returns the value of col from the n-th row after the current row within the window; it is the opposite of LAG.
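The shifting behaviour can be sketched in plain Python (not Hive). The helper names `lag` and `lead` and the sample months are made up for illustration; the list stands for one partition that is already sorted, and n is assumed to be non-negative.

```python
def lag(values, n=1, default=None):
    """Value from n rows before the current row, or default if there is none."""
    return [values[i - n] if i - n >= 0 else default
            for i in range(len(values))]


def lead(values, n=1, default=None):
    """Value from n rows after the current row, or default if there is none."""
    return [values[i + n] if i + n < len(values) else default
            for i in range(len(values))]


months = ["2015-01", "2015-02", "2015-03", "2015-04"]
print(lag(months, 2))              # [None, None, '2015-01', '2015-02']
print(lead(months, 2, "1111-11"))  # ['2015-03', '2015-04', '1111-11', '1111-11']
```

The first two rows have no row two positions back, so `lag` returns the default (None here, standing in for NULL); the last two rows have no row two positions ahead, so `lead` returns the supplied fill value.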


-- After partitioning and sorting, LAG takes the value from n rows earlier in the partition
-- If the third argument is omitted, missing values default to NULL; otherwise they are filled with the given value

select
    dp_id,
    mt,
    payment,
    LAG(mt,2) over(partition by dp_id order by mt) mt_new
from test2;

 


-- After partitioning and sorting, LEAD takes the value from n rows later in the partition
-- Here the third argument is given, so missing values are filled with '1111-11' instead of NULL

select
   dp_id,
   mt,
   payment,
   LEAD(mt,2,'1111-11') over(partition by dp_id order by mt) mt_new
from test2;

 

FIRST_VALUE, LAST_VALUE

FIRST_VALUE: after partitioning and sorting, returns the first value in the partition up to the current row.

LAST_VALUE: after partitioning and sorting, returns the last value in the partition up to the current row.
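The "up to the current row" behaviour is easy to misread, so here is a minimal plain-Python sketch (not Hive). It assumes the window frame runs from the start of the partition through the current row, which is what Hive uses by default when ORDER BY is given without an explicit frame, and it assumes the sort keys are distinct. The function name `running_first_last` and the sample payments are made up for illustration.

```python
def running_first_last(values):
    """Return (first_value, last_value) per row over the frame start..current."""
    out = []
    for i in range(len(values)):
        frame = values[: i + 1]  # partition start through the current row
        out.append((frame[0], frame[-1]))
    return out


payments = [10, 30, 20]  # one partition, already sorted by month
print(running_first_last(payments))
# [(10, 10), (10, 30), (10, 20)]

# Sorting in descending order makes the first value the globally last one.
print([first for first, _ in running_first_last(payments[::-1])])
# [20, 20, 20]
```

Under this frame the last value is simply the current row's own value, which is why reversing the sort order and taking the first value is the usual way to get the last value of the whole partition.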


-- FIRST_VALUE returns the first value in the partition up to the current row
-- LAST_VALUE returns the last value in the partition up to the current row
-- FIRST_VALUE with a descending ORDER BY returns the last value of the whole partition
SELECT
   dp_id,
   mt,
   payment,
   FIRST_VALUE(payment) OVER (PARTITION BY dp_id ORDER BY mt) payment_g_first,
   LAST_VALUE(payment) OVER (PARTITION BY dp_id ORDER BY mt) payment_g_last,
   FIRST_VALUE(payment) OVER (PARTITION BY dp_id ORDER BY mt DESC) payment_g_last_global
FROM test2
ORDER BY dp_id, mt;

Origin blog.csdn.net/kimi_Christmas/article/details/89177190