Syntax:
analytic_function(args) over (partition by column_name order by column_name rows between start_position and end_position)
Common analytic (window) functions:
- Aggregate functions
avg(), sum(), max(), min()
- Ranking functions
ROW_NUMBER(): numbers the rows sequentially in sort order; rows with equal values still get different numbers, so there are no duplicates
RANK(): numbers the rows in sort order; rows with equal values share a number and a gap follows (e.g. 1, 1, 3)
DENSE_RANK(): numbers the rows in sort order; rows with equal values share a number and no gap follows (e.g. 1, 1, 2)
- Other functions
LAG(column, n, default): the column's value from the row n rows before the current row; if that row does not exist, returns default (null if no default is given)
LEAD(column, n, default): the column's value from the row n rows after the current row; if that row does not exist, returns default (null if no default is given)
NTILE(n): distributes the ordered rows of a partition into n numbered groups, starting at 1, and returns for each row the number of the group it belongs to
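The three ranking functions differ only when there are ties. The following is a minimal pure-Python sketch (not Hive code) that numbers keys already sorted by the ORDER BY column; the helper name rank_columns and the sample scores are hypothetical.

```python
def rank_columns(sorted_keys):
    """Return (row_number, rank, dense_rank) for each key, in sort order.

    Sketch of ROW_NUMBER(), RANK() and DENSE_RANK() over one partition whose
    rows are already sorted by the ORDER BY key.
    """
    result = []
    rank = dense_rank = 0
    prev = object()  # sentinel that never equals a real key
    for row_number, key in enumerate(sorted_keys, start=1):
        if key != prev:
            rank = row_number  # RANK jumps to the row position, leaving a gap after ties
            dense_rank += 1    # DENSE_RANK only moves up by one, so no gap
            prev = key
        result.append((row_number, rank, dense_rank))
    return result


scores = [100, 90, 90, 80]  # hypothetical ORDER BY values (descending)
for key, numbers in zip(scores, rank_columns(scores)):
    print(key, numbers)
# 100 (1, 1, 1)
# 90 (2, 2, 2)
# 90 (3, 2, 2)
# 80 (4, 4, 3)
```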
Key points:
- Inside over(), partitioning, ordering, and window bounds are all optional; use whichever combination the business requirement calls for
- If over() specifies no partition, the window covers all rows returned by the query; if a partition is specified, each partition gets its own window. When order by is given without an explicit frame, the frame defaults to range between unbounded preceding and current row
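To see the difference between over() and over(partition by ...), here is a small pure-Python sketch of count(*) over a window. It only illustrates the semantics, not how Hive executes the query; count_over is a hypothetical helper and the rows are the first five (name, cost) records of the sample data used later. Unlike group by, every input row is kept.

```python
from collections import Counter

rows = [("jack", 10), ("tony", 15), ("jack", 23), ("tony", 29), ("jack", 46)]


def count_over(rows, partition_key=None):
    """Mimic count(*) over() and count(*) over(partition by ...).

    Every input row is returned together with the size of its window.
    """
    if partition_key is None:
        # No partition: the window is the whole result set.
        return [(row, len(rows)) for row in rows]
    # With a partition: the window is the set of rows sharing the same key.
    sizes = Counter(partition_key(row) for row in rows)
    return [(row, sizes[partition_key(row)]) for row in rows]


print(count_over(rows))                                 # every row gets 5
print(count_over(rows, partition_key=lambda r: r[0]))   # jack rows get 3, tony rows get 2
```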
Window frame bounds in over():
current row: the current row
unbounded: no limit; unbounded preceding starts at the first row of the partition, unbounded following ends at the last row of the partition
n preceding: the n rows before the current row
n following: the n rows after the current row
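The frame bounds can be read as index arithmetic within an ordered partition. The sketch below (pure Python, with a hypothetical frame_sum helper) computes sum(x) over(... rows between ...), using None for unbounded and 0 for current row; the sample values are jack's costs from the data below, sorted by orderdate.

```python
def frame_sum(values, preceding=None, following=0):
    """Sum of each row's frame: ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    `values` must already be in ORDER BY order within one partition.
    preceding/following: None = UNBOUNDED, 0 = CURRENT ROW, n = n rows.
    """
    last = len(values) - 1
    sums = []
    for i in range(len(values)):
        start = 0 if preceding is None else max(0, i - preceding)
        end = last if following is None else min(last, i + following)
        sums.append(sum(values[start:end + 1]))
    return sums


costs = [10, 46, 55, 23, 42]  # jack's costs ordered by orderdate
print(frame_sum(costs))              # unbounded preceding .. current row -> [10, 56, 111, 134, 176]
print(frame_sum(costs, 1, 1))        # 1 preceding .. 1 following -> [56, 111, 124, 120, 65]
print(frame_sum(costs, None, None))  # whole partition -> [176, 176, 176, 176, 176]
```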
Worked example:
Raw data (customer purchase details)
name,orderdate,cost
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94
Create the table and load the data
vi business.txt
create table business
(
name string,
orderdate string,
cost int
)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data local inpath "/opt/module/data/business.txt" into table business;
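Requirements
(1) Query the customers who made purchases in April 2017 and the total number of customers
Analysis: filter by date, then count to get the total (why is the grouping done with over() rather than group by? Think about it). Note that count(*) over() counts the filtered rows, i.e. the April orders, so a customer with several April orders is counted several times.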
select
name,
orderdate,
cost,
count(*) over() total_people
FROM
business
where date_format(orderdate,'yyyy-MM')='2017-04';
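A pure-Python sketch of what query (1) returns on the sample data. It mirrors the semantics only and is not Hive; the slice r[1][:7] stands in for date_format(orderdate,'yyyy-MM'), which works here because orderdate is a yyyy-MM-dd string. It also shows the difference between counting April rows and counting distinct customers.

```python
DATA = """\
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

# Parse business.txt-style lines into (name, orderdate, cost) tuples.
rows = [(n, d, int(c)) for n, d, c in (line.split(",") for line in DATA.splitlines())]

# where date_format(orderdate,'yyyy-MM')='2017-04'
april = [r for r in rows if r[1][:7] == "2017-04"]

# count(*) over(): a single window holding every filtered row, repeated on each row
result = [(name, date, cost, len(april)) for name, date, cost in april]
for line in result:
    print(line)  # each of the 5 April rows carries total_people = 5

# Counting each customer once instead (jack and mart)
print(len({name for name, _, _ in april}))  # 2
```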
(2) Query each customer's purchase details together with that customer's total purchase amount
Analysis: partition by customer and sum the purchase amount
select
name,
orderdate,
cost,
sum(cost) over(partition by name) total_amount
FROM
business;
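A pure-Python sketch of the result of query (2), assuming the same sample data: each partition (customer) is totalled once and the total is attached to every detail row, so no rows are collapsed as they would be with group by.

```python
from collections import defaultdict

DATA = """\
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

rows = [(n, d, int(c)) for n, d, c in (line.split(",") for line in DATA.splitlines())]

# sum(cost) over(partition by name): first total each partition...
totals = defaultdict(int)
for name, _, cost in rows:
    totals[name] += cost

# ...then attach the partition total to every detail row.
result = [(name, date, cost, totals[name]) for name, date, cost in rows]
print(dict(totals))  # {'jack': 176, 'tony': 94, 'mart': 299, 'neil': 92}
```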
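(3) Building on the scenario above, accumulate each customer's cost by date
Analysis: partition by customer, sort by date ascending, and for each row add up the amounts of all earlier rows in the group plus the current row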
select
name,
orderdate,
cost,
sum(cost) over(partition by name order by orderdate rows between unbounded preceding and current row) cumulative_amount
FROM
business;
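A pure-Python sketch of query (3) on the sample data: within each customer's partition, sorted by orderdate, rows between unbounded preceding and current row is simply a running total. The explicit rows frame matters only when a customer has two orders on the same date; with the default range frame those peer rows would share one cumulative value. No such ties exist in this data.

```python
from collections import defaultdict
from itertools import accumulate

DATA = """\
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

rows = [(n, d, int(c)) for n, d, c in (line.split(",") for line in DATA.splitlines())]

# partition by name
partitions = defaultdict(list)
for row in rows:
    partitions[row[0]].append(row)

result = []
for part in partitions.values():
    part.sort(key=lambda r: r[1])  # order by orderdate (yyyy-MM-dd strings sort chronologically)
    # rows between unbounded preceding and current row == running total
    for (name, date, cost), running in zip(part, accumulate(c for _, _, c in part)):
        result.append((name, date, cost, running))

for line in result:
    if line[0] == "jack":
        print(line)  # cumulative_amount: 10, 56, 111, 134, 176
```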
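(4) Query each customer's previous purchase date
Analysis: return the detail rows and, for each row, fetch the purchase date of the previous row (this requires partitioning by customer and sorting by date ascending)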
select
name,
orderdate,
cost,
lag(orderdate,1) over(partition by name order by orderdate) last_date
FROM
business;
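A pure-Python sketch of query (4), also showing lead() for comparison (lead is not part of the original query). The helpers lag and lead are hypothetical stand-ins for the Hive functions and follow the LAG/LEAD(column, n, default) description above: outside the partition they return the default, which is None (null) unless given.

```python
from collections import defaultdict

DATA = """\
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

rows = [(n, d, int(c)) for n, d, c in (line.split(",") for line in DATA.splitlines())]


def lag(values, i, n=1, default=None):
    """Like lag(col, n, default): the value n rows before row i, else default."""
    return values[i - n] if i - n >= 0 else default


def lead(values, i, n=1, default=None):
    """Like lead(col, n, default): the value n rows after row i, else default."""
    return values[i + n] if i + n < len(values) else default


partitions = defaultdict(list)
for row in rows:
    partitions[row[0]].append(row)

result = []
for part in partitions.values():
    part.sort(key=lambda r: r[1])  # partition by name order by orderdate
    dates = [d for _, d, _ in part]
    for i, (name, date, cost) in enumerate(part):
        result.append((name, date, cost, lag(dates, i), lead(dates, i)))

for line in result:
    if line[0] == "jack":
        print(line)  # first row has last_date None, last row has next date None
```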
(5) Query the orders that fall in the earliest 20% by date
Analysis: sort by date ascending, split the rows into 5 groups with ntile(5), and keep group 1
select
*
from
(
select
name,
orderdate,
cost,
ntile(5) over(order by orderdate) sortgroup_num
FROM
business
) t
where t.sortgroup_num=1;
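A pure-Python sketch of query (5), assuming the usual NTILE rule that group sizes differ by at most one and the first (row_count % n) groups receive the extra row; ntile here is a hypothetical helper, not Hive's implementation. With 14 rows and 5 groups the sizes are 3, 3, 3, 3, 2, so group 1 holds 3 rows (about 21%), not exactly 20%.

```python
DATA = """\
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

rows = [(n, d, int(c)) for n, d, c in (line.split(",") for line in DATA.splitlines())]


def ntile(n, row_count):
    """Group number (1-based) for each of row_count ordered rows split into n groups."""
    base, extra = divmod(row_count, n)
    groups = []
    for group in range(1, n + 1):
        # The first `extra` groups get one additional row.
        groups.extend([group] * (base + (1 if group <= extra else 0)))
    return groups


ordered = sorted(rows, key=lambda r: r[1])                     # over(order by orderdate)
numbered = zip(ordered, ntile(5, len(ordered)))                # ntile(5) ... sortgroup_num
first_fifth = [row for row, group in numbered if group == 1]   # where t.sortgroup_num=1
print(first_fifth)
# [('jack', '2017-01-01', 10), ('tony', '2017-01-02', 15), ('tony', '2017-01-04', 29)]
```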