PostgreSQL aggregate window functions
Aggregate functions
Common aggregate functions such as AVG, SUM, and COUNT can also be used as window functions by adding an OVER clause.
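-- Moving average over the previous, the current, and the next row
-- ('桔子' means orange and '淘宝' means Taobao; the literals are kept as they appear in the data)
select saledate, amount,
       avg(amount) over (order by saledate rows between 1 preceding and 1 following) as moving_avg
from sales_data
where product = '桔子' and channel = '淘宝';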
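To make the frame easier to see, here is a minimal sketch that runs without the sales_data table. The demo CTE and its four rows are hypothetical inline data, not taken from the original data set. It places the three-row moving average next to a running total, which uses a frame that grows from the first row to the current row.

```sql
-- Hypothetical inline data; the real sales_data table is not shown in this document.
WITH demo(saledate, amount) AS (
    VALUES (DATE '2019-01-01', 10),
           (DATE '2019-01-02', 20),
           (DATE '2019-01-03', 60),
           (DATE '2019-01-04', 30)
)
SELECT saledate,
       amount,
       -- Average of the previous row, the current row, and the next row
       avg(amount) OVER (ORDER BY saledate
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg,
       -- Running total: the frame runs from the first row to the current row
       sum(amount) OVER (ORDER BY saledate
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM demo
ORDER BY saledate;
```

At the edges the frame simply contains fewer rows, so the first moving average is (10 + 20) / 2 = 15 and the last is (60 + 30) / 2 = 45. The running totals are 10, 30, 90, and 120.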
Ranking window functions
Case 1
-- Rank employees by salary within each department
select
    e.first_name,
    e.last_name,
    e.department_id,
    e.salary,
    row_number() over (partition by e.department_id order by e.salary),
    rank() over (partition by e.department_id order by e.salary),
    dense_rank() over (partition by e.department_id order by e.salary),
    percent_rank() over (partition by e.department_id order by e.salary)
from employees e;
All four window functions use the same OVER clause, so the window can be defined once in a WINDOW clause and referenced by name:
-- Rank employees by salary within each department
select
    e.first_name,
    e.last_name,
    e.department_id,
    e.salary,
    row_number() over w,
    rank() over w,
    dense_rank() over w,
    percent_rank() over w
from employees e
-- Define the named window once; the WINDOW clause comes after FROM/WHERE/GROUP BY/HAVING and before any ORDER BY
window w as (partition by e.department_id order by e.salary);
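The four ranking functions differ only in how they treat ties. The sketch below uses hypothetical inline rows (the demo CTE is an assumption, not part of the employees table) with two equal salaries to show the difference.

```sql
-- Hypothetical inline data with a tie on salary 4000.
WITH demo(name, salary) AS (
    VALUES ('a', 3000), ('b', 4000), ('c', 4000), ('d', 5000)
)
SELECT name,
       salary,
       row_number()   OVER w AS rn,        -- always unique: 1, 2, 3, 4
       rank()         OVER w AS rnk,       -- ties share a rank, then a gap: 1, 2, 2, 4
       dense_rank()   OVER w AS dense_rnk, -- ties share a rank, no gap: 1, 2, 2, 3
       percent_rank() OVER w AS pct_rnk    -- (rank - 1) / (rows in partition - 1)
FROM demo
WINDOW w AS (ORDER BY salary)
ORDER BY salary, name;
```

Rows b and c are peers, so which of them receives row_number 2 and which receives 3 is not defined unless the ORDER BY of the window includes a tie-breaker.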
Case 2
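-- Cumulative distribution and quintile bucket of each salary within its department
select
    e.first_name,
    e.last_name,
    e.department_id,
    e.salary,
    cume_dist() over w as "cumulative_share",
    ntile(5) over w as "relative_position"
from employees e
-- Define the named window once; the WINDOW clause comes after FROM/WHERE/GROUP BY/HAVING and before any ORDER BY
window w as (partition by e.department_id order by e.salary);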
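As a rough illustration of what the two functions return, the following sketch uses five hypothetical salaries (the demo CTE is an assumption) and two buckets instead of five, so the result is easy to check by hand.

```sql
-- Hypothetical inline data; two rows share the salary 4000.
WITH demo(salary) AS (
    VALUES (3000), (4000), (4000), (5000), (6000)
)
SELECT salary,
       -- Share of rows whose salary is less than or equal to the current salary
       cume_dist() OVER w AS cumulative_share,
       -- Split the ordered rows into 2 buckets that are as equal in size as possible
       ntile(2)    OVER w AS bucket
FROM demo
WINDOW w AS (ORDER BY salary)
ORDER BY salary;
```

cume_dist counts peers together, so both 4000 rows get 3 / 5 = 0.6. With five rows and two buckets, the first bucket receives the extra row: three rows land in bucket 1 and two in bucket 2.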
Value window functions
Value window functions return the value from a row at a specified position. Common value window functions include:
- FIRST_VALUE, returns the value from the first row of the window frame
- LAST_VALUE, returns the value from the last row of the window frame
- NTH_VALUE, returns the value from the Nth row of the window frame
- LAG, returns the value from the Nth row before the current row in the partition
- LEAD, returns the value from the Nth row after the current row in the partition
LAG and LEAD do not honor a frame clause (frame_clause); they always operate on the whole current partition.
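A small sketch of LAG and LEAD, assuming three hypothetical monthly rows (the demo CTE is not part of the original data). It shows the optional offset and default arguments.

```sql
-- Hypothetical inline data: one amount per month.
WITH demo(ym, amount) AS (
    VALUES ('201901', 100), ('201902', 120), ('201903', 90)
)
SELECT ym,
       amount,
       lag(amount)       OVER (ORDER BY ym) AS prev_amount,  -- offset defaults to 1; NULL on the first row
       lag(amount, 1, 0) OVER (ORDER BY ym) AS prev_or_zero, -- third argument replaces the missing value
       lead(amount, 2)   OVER (ORDER BY ym) AS two_ahead     -- value two rows after the current row
FROM demo
ORDER BY ym;
```

For the first row prev_amount is NULL and prev_or_zero is 0; two_ahead is 90 for the first row and NULL for the other two, because no row exists two positions later.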
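/*
 * first_value, last_value and nth_value return the highest, the lowest and the
 * third-highest salary (by row position) within each department. This requires
 * descending order and a frame that spans the whole partition: with the default
 * frame, which ends at the current row and its peers, last_value would just
 * return the current row's salary.
 */
select
    e.first_name,
    e.last_name,
    e.department_id,
    e.salary,
    first_value(e.salary) over w as highest_salary,
    last_value(e.salary) over w as lowest_salary,
    nth_value(e.salary, 3) over w as third_highest_salary
from employees e
window w as (
    partition by e.department_id
    order by e.salary desc
    rows between unbounded preceding and unbounded following
);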
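The effect of the frame on last_value is easy to miss, so here is a minimal sketch with three hypothetical salaries (the demo CTE is an assumption). When a window has ORDER BY but no frame clause, PostgreSQL uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as the default frame.

```sql
-- Hypothetical inline data.
WITH demo(salary) AS (
    VALUES (3000), (4000), (5000)
)
SELECT salary,
       -- Default frame ends at the current row, so this echoes the current salary
       last_value(salary) OVER (ORDER BY salary) AS last_default_frame,
       -- Explicit full-partition frame: always the last row of the partition
       last_value(salary) OVER (ORDER BY salary
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM demo
ORDER BY salary;
```

last_default_frame comes out as 3000, 4000, 5000, while last_full_frame is 5000 on every row. nth_value is affected in the same way: under the default frame it is NULL until the frame contains at least N rows.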
Month-on-month growth rate
The LAG and LEAD functions are also commonly used to calculate month-on-month and year-on-year growth in sales data.
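-- First, the common table expression sales_monthly totals the sales of each product per month.
-- LAG(sum_amount, 1) returns the previous month's total; the month-on-month growth rate is
-- (current month - previous month) / previous month.
-- 100.0 forces numeric division in case amount is an integer type, and NULLIF avoids
-- division by zero when the previous month's total is 0.
WITH sales_monthly AS (
    SELECT product, to_char(saledate, 'YYYYMM') AS ym, sum(amount) AS sum_amount
    FROM sales_data
    GROUP BY product, to_char(saledate, 'YYYYMM')
)
SELECT product AS "product", ym AS "year_month", sum_amount AS "sales",
       100.0 * (sum_amount - LAG(sum_amount, 1) OVER w)
             / NULLIF(LAG(sum_amount, 1) OVER w, 0) AS "mom_growth_pct"
FROM sales_monthly
WINDOW w AS (PARTITION BY product ORDER BY ym)
ORDER BY product, ym;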
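To check the arithmetic by hand, the sketch below replaces sales_monthly with three hypothetical rows (the product name 'demo' and the amounts are made up for illustration). It computes LAG once in an intermediate CTE, here called with_prev, so the expression is not repeated.

```sql
-- Hypothetical monthly totals standing in for the sales_monthly CTE.
WITH sales_monthly(product, ym, sum_amount) AS (
    VALUES ('demo', '201901', 100),
           ('demo', '201902', 120),
           ('demo', '201903', 90)
),
with_prev AS (
    SELECT product, ym, sum_amount,
           lag(sum_amount) OVER (PARTITION BY product ORDER BY ym) AS prev_amount
    FROM sales_monthly
)
SELECT product, ym, sum_amount, prev_amount,
       -- 100.0 forces numeric division; NULLIF guards against a zero denominator
       round(100.0 * (sum_amount - prev_amount) / nullif(prev_amount, 0), 2) AS mom_growth_pct
FROM with_prev
ORDER BY product, ym;
```

The first month has no predecessor, so its growth rate is NULL. February is 100 * (120 - 100) / 100 = 20.00 and March is 100 * (90 - 120) / 120 = -25.00.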
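Year-on-year growth rate
-- Year-on-year growth compares the same month in consecutive years, for example
-- (January 2019 - January 2018) / January 2018.
-- lag(sum_amount, 12) is the same month of the previous year only if every product
-- has a row for every month; sales_monthly is the CTE from the previous query.
with sales_monthly as (
    select product, to_char(saledate, 'YYYYMM') as ym, sum(amount) as sum_amount
    from sales_data
    group by product, to_char(saledate, 'YYYYMM')
)
select
    s.*,
    100.0 * (s.sum_amount - lag(s.sum_amount, 12) over w)
          / nullif(lag(s.sum_amount, 12) over w, 0) as "yoy_growth_pct"
from sales_monthly s
window w as (partition by s.product order by s.ym);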
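An offset of 12 rows only lines up with the same month of the previous year when no month is missing for a product. If gaps are possible, one alternative, which is a different technique from LAG, is a self-join on the month key. The sketch below assumes ym is a 'YYYYMM' string as produced above and uses hypothetical inline rows in which February 2018 is missing.

```sql
-- Hypothetical monthly totals; note that 201802 is absent.
WITH sales_monthly(product, ym, sum_amount) AS (
    VALUES ('demo', '201801', 200),
           ('demo', '201901', 250),
           ('demo', '201902', 300)
)
SELECT cur.product,
       cur.ym,
       cur.sum_amount,
       prev.sum_amount AS same_month_last_year,
       round(100.0 * (cur.sum_amount - prev.sum_amount)
             / nullif(prev.sum_amount, 0), 2) AS yoy_growth_pct
FROM sales_monthly cur
-- Match each month with the same product exactly one calendar year earlier
LEFT JOIN sales_monthly prev
       ON prev.product = cur.product
      AND prev.ym = to_char(to_date(cur.ym, 'YYYYMM') - INTERVAL '1 year', 'YYYYMM')
ORDER BY cur.product, cur.ym;
```

Here January 2019 is compared with January 2018 and yields 25.00, while February 2019 gets NULL because there is no February 2018 row, rather than being paired with an unrelated month.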