PostgreSQL advanced aggregate function usage examples (super detailed)

PostgreSQL is a powerful open-source relational database management system that provides a rich set of aggregate functions for data analysis and calculation. Beyond basic aggregates such as sum() and avg(), it supports aggregates used as window functions (with an OVER clause) and ordered-set aggregates (with a WITHIN GROUP clause). These make it possible to compute running totals, medians, modes, and other summary values over a chosen ordering or partition of the rows.

Let's use an example to demonstrate how to use the advanced aggregate functions in PostgreSQL.

Suppose there is a sales table that stores each order's amount and order date. The table structure and data are as follows:

CREATE TABLE sales(
  id SERIAL PRIMARY KEY,
  amount NUMERIC(12, 2),
  order_date DATE
);

INSERT INTO sales(amount, order_date) VALUES
  (100.50, '2022-01-01'),
  (200.45, '2022-01-01'),
  (300.20, '2022-01-02'),
  (400.10, '2022-01-02'),
  (500.75, '2022-01-03'),
  (600.20, '2022-01-04'),
  (700.50, '2022-01-04'),
  (800.00, '2022-01-05'),
  (900.30, '2022-01-05'),
  (1000.00, '2022-01-05');

Now, suppose you need to calculate the cumulative sales through each date, as well as the average and median sale amount for each date.

First, we can calculate cumulative sales by using the ordinary aggregate sum() as a window function. Adding an OVER clause with ORDER BY makes sum() accumulate over the rows in the specified order (here, by date). The query is as follows:

SELECT
  order_date,
  -- total for the current row's date
  sum(amount) OVER (PARTITION BY order_date) AS total_sales,
  -- running total through the current row's date
  sum(amount) OVER (ORDER BY order_date) AS cumulative_sales
FROM sales
ORDER BY order_date;

This query calls sum() twice as a window function. With PARTITION BY order_date, it returns the total sales of the current row's date. With ORDER BY order_date, it uses the default window frame, which runs from the first row through the current row and every other row that shares its date, so it returns the running total through that date. Because window functions do not collapse rows, the result contains one row per sale.

The query results are as follows:

 order_date | total_sales | cumulative_sales
------------+-------------+-----------------
 2022-01-01 |      300.95 |           300.95
 2022-01-01 |      300.95 |           300.95
 2022-01-02 |      700.30 |          1001.25
 2022-01-02 |      700.30 |          1001.25
 2022-01-03 |      500.75 |          1502.00
 2022-01-04 |     1300.70 |          2802.70
 2022-01-04 |     1300.70 |          2802.70
 2022-01-05 |     2700.30 |          5503.00
 2022-01-05 |     2700.30 |          5503.00
 2022-01-05 |     2700.30 |          5503.00
(10 rows)

The cumulative figures check out: the cumulative value for each date is the running total through the previous date plus the sales of the current date. Rows that share a date repeat the same values because every sale is kept as its own row.
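
When one row per date is preferable, the same running total can be computed on top of GROUP BY. The following is a minimal sketch against the sales table defined earlier; the column aliases are chosen here for illustration. The inner sum(amount) is the per-date aggregate, and the outer sum(...) OVER is a window function applied to those per-date totals.

```sql
SELECT
  order_date,
  -- ordinary aggregate: one total per date
  sum(amount) AS total_sales,
  -- window function over the per-date totals: running total by date
  sum(sum(amount)) OVER (ORDER BY order_date) AS cumulative_sales
FROM sales
GROUP BY order_date
ORDER BY order_date;
```

With the sample data this should return five rows, one per date, with the cumulative column ending at 5503.00 on 2022-01-05.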

Next, we can calculate the average and the median sales for each date. The average comes from the built-in aggregate avg(). The median comes from the ordered-set aggregate percentile_cont(0.5), which sorts the values named in WITHIN GROUP and interpolates the value at the 50th percentile. A related ordered-set aggregate, mode(), returns the most frequent value, not the median or the mean. Here is an example query:

SELECT
  order_date,
  -- mean sale amount per date, rounded to 2 decimal places
  round(avg(amount), 2) AS avg_sales,
  -- median per date; percentile_cont returns double precision, so cast before rounding
  round((percentile_cont(0.5) WITHIN GROUP (ORDER BY amount ASC))::numeric, 2) AS median_sales
FROM sales
GROUP BY order_date
ORDER BY order_date;

This query uses avg() and percentile_cont() to calculate the mean and median sales for each date. percentile_cont() takes the requested fraction as its argument (0.5 for the median), and the WITHIN GROUP clause specifies which values to sort and interpolate over, here amount in ascending order.

The GROUP BY clause produces one row per date, so both aggregates are evaluated separately for each date. Ordered-set aggregates work with GROUP BY but cannot be combined with an OVER clause. The ORDER BY clause at the end sorts the results.

The query results are as follows:

 order_date | avg_sales | median_sales
------------+-----------+--------------
 2022-01-01 |    150.48 |       150.48
 2022-01-02 |    350.15 |       350.15
 2022-01-03 |    500.75 |       500.75
 2022-01-04 |    650.35 |       650.35
 2022-01-05 |    900.10 |       900.30
(5 rows)

On dates with one or two sales the median equals the average (for 2022-01-01 both are 150.475, which rounds to 150.48). On 2022-01-05, which has three sales, the median is the middle value 900.30, while the average is 900.10.
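
To see what mode() itself returns, here is a small self-contained sketch. The inline VALUES list and the alias t(amount) are made up for illustration and are not part of the sales table; the value 10 appears twice so that there is a single most frequent value.

```sql
SELECT
  -- most frequent value: 10
  mode() WITHIN GROUP (ORDER BY amount) AS most_frequent,
  -- middle of the five sorted values: 20
  percentile_cont(0.5) WITHIN GROUP (ORDER BY amount) AS median,
  -- (10 + 10 + 20 + 30 + 50) / 5 = 24
  avg(amount) AS mean
FROM (VALUES (10), (10), (20), (30), (50)) AS t(amount);
```

In the sales table every amount is unique, so all values are tied for most frequent and mode() simply returns one of them. It is most useful on columns with repeated values.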

In this way, we have used PostgreSQL's window and ordered-set aggregate features to calculate running totals, averages, and medians. By choosing the ordering and partitioning, these functions can calculate a wide range of summary metrics, allowing us to better understand and analyze data.

Origin blog.csdn.net/qq_60870118/article/details/131242534