MySQL grouping by time unit

Note: The tests below were run on MySQL 8.0.

Test data preparation:

drop table if exists trx_log;

create table trx_log(trx_id int,trx_date timestamp,trx_cnt int);

insert into trx_log values (1,'2020-10-28 19:03:07',44);
insert into trx_log values (2,'2020-10-28 19:03:08',18);
insert into trx_log values (3,'2020-10-28 19:03:09',23);
insert into trx_log values (4,'2020-10-28 19:03:10',29);
insert into trx_log values (5,'2020-10-28 19:03:11',27);
insert into trx_log values (6,'2020-10-28 19:03:12',45);
insert into trx_log values (7,'2020-10-28 19:03:13',45);
insert into trx_log values (8,'2020-10-28 19:03:14',32);
insert into trx_log values (9,'2020-10-28 19:03:15',41);
insert into trx_log values (10,'2020-10-28 19:03:16',15);
insert into trx_log values (11,'2020-10-28 19:03:17',24);
insert into trx_log values (12,'2020-10-28 19:03:18',47);
insert into trx_log values (13,'2020-10-28 19:03:19',37);
insert into trx_log values (14,'2020-10-28 19:03:20',48);
insert into trx_log values (15,'2020-10-28 19:03:21',46);
insert into trx_log values (16,'2020-10-28 19:03:22',44);
insert into trx_log values (17,'2020-10-28 19:03:23',36);
insert into trx_log values (18,'2020-10-28 19:03:24',41);
insert into trx_log values (19,'2020-10-28 19:03:25',33);
insert into trx_log values (20,'2020-10-28 19:03:26',19);


1. Requirement

Sum the data over fixed time intervals.

For example, given a transaction log, find the total number of transactions in each 5-second interval.
The table trx_log contains the following rows:

mysql> select trx_id,
    ->        trx_date,
    ->        trx_cnt
    ->   from trx_log;
+--------+---------------------+---------+
| trx_id | trx_date            | trx_cnt |
+--------+---------------------+---------+
|      1 | 2020-10-28 19:03:07 |      44 |
|      2 | 2020-10-28 19:03:08 |      18 |
|      3 | 2020-10-28 19:03:09 |      23 |
|      4 | 2020-10-28 19:03:10 |      29 |
|      5 | 2020-10-28 19:03:11 |      27 |
|      6 | 2020-10-28 19:03:12 |      45 |
|      7 | 2020-10-28 19:03:13 |      45 |
|      8 | 2020-10-28 19:03:14 |      32 |
|      9 | 2020-10-28 19:03:15 |      41 |
|     10 | 2020-10-28 19:03:16 |      15 |
|     11 | 2020-10-28 19:03:17 |      24 |
|     12 | 2020-10-28 19:03:18 |      47 |
|     13 | 2020-10-28 19:03:19 |      37 |
|     14 | 2020-10-28 19:03:20 |      48 |
|     15 | 2020-10-28 19:03:21 |      46 |
|     16 | 2020-10-28 19:03:22 |      44 |
|     17 | 2020-10-28 19:03:23 |      36 |
|     18 | 2020-10-28 19:03:24 |      41 |
|     19 | 2020-10-28 19:03:25 |      33 |
|     20 | 2020-10-28 19:03:26 |      19 |
+--------+---------------------+---------+

The query should return the following result set:
+------+---------------------+---------------------+-------+
| grp  | trx_start           | trx_end             | total |
+------+---------------------+---------------------+-------+
|   62 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 |   141 |
|   63 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 |   178 |
|   64 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 |   202 |
|   65 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 |   173 |
+------+---------------------+---------------------+-------+

2. Solution

Put every 5 rows into one bucket.
There are many ways to achieve this logical grouping. This section divides trx_id by 5 and rounds up with ceil.

Once the groups are defined, the aggregate functions min, max, and sum return the start time, end time, and total number of transactions for each group.

select ceil(trx_id/5.0) as grp,
       min(trx_date)    as trx_start,
       max(trx_date)    as trx_end,
       sum(trx_cnt)     as total
  from trx_log
 group by ceil(trx_id/5.0);

Test Record:

mysql> select ceil(trx_id/5.0) as grp,
    ->        min(trx_date)    as trx_start,
    ->        max(trx_date)    as trx_end,
    ->        sum(trx_cnt)     as total
    ->   from trx_log
    ->  group by ceil(trx_id/5.0);
+------+---------------------+---------------------+-------+
| grp  | trx_start           | trx_end             | total |
+------+---------------------+---------------------+-------+
|    1 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 |   141 |
|    2 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 |   178 |
|    3 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 |   202 |
|    4 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 |   173 |
+------+---------------------+---------------------+-------+
4 rows in set (0.00 sec)
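If trx_id has gaps (for example, after deletes), ceil(trx_id/5.0) no longer puts exactly 5 rows into each group. The sketch below assumes that "every 5 rows in time order" is still the intended grouping. It numbers the rows with the ROW_NUMBER() window function, which is available in MySQL 8.0, and divides that number instead of trx_id. The names rn and t are illustrative and are not part of the original schema.

```sql
-- Bucket every 5 rows by their position in trx_date order,
-- ignoring any gaps in trx_id. rn and t are illustrative names.
select ceil(rn / 5)  as grp,
       min(trx_date) as trx_start,
       max(trx_date) as trx_end,
       sum(trx_cnt)  as total
  from (select trx_date,
               trx_cnt,
               row_number() over (order by trx_date, trx_id) as rn
          from trx_log) t
 group by ceil(rn / 5);
```

This still groups by row count rather than by elapsed time. It only matches 5-second intervals when there is exactly one row per second. On the sample data, rn equals trx_id, so the result should be the same four groups as above.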

This raises a question: what if the id values are not evenly spaced?
In that case, convert the time itself to a number, divide it by 5, and round up.
This example uses DATE_FORMAT(..., '%i%s') to extract the minutes and seconds (for example '0306'). MySQL treats that value as a number when dividing by 5. Before formatting, TIMESTAMPADD moves each timestamp back one second so that :07 through :11 fall into the same group. If the data spans a longer period, the year, month, day, and hour should be included as well.
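The query below evaluates each step of the grouping expression for three of the sample timestamps. It uses a derived table of literals; the aliases ts, shifted, mmss, and s are illustrative. Worked by hand, the steps are:

- 19:03:07 becomes 19:03:06, which formats as '0306'. Then 306/5.0 = 61.2, which rounds up to 62.
- 19:03:11 gives 310/5.0 = 62 exactly, so it stays in 62.
- 19:03:12 gives 311/5.0 = 62.2, which rounds up to 63.

```sql
-- Show each intermediate step of the grouping expression
-- for three sample timestamps.
select ts,
       TIMESTAMPADD(second, -1, ts)                                  as shifted,
       DATE_FORMAT(TIMESTAMPADD(second, -1, ts), '%i%s')             as mmss,
       ceil(DATE_FORMAT(TIMESTAMPADD(second, -1, ts), '%i%s') / 5.0) as grp
  from (select timestamp '2020-10-28 19:03:07' as ts
        union all select timestamp '2020-10-28 19:03:11'
        union all select timestamp '2020-10-28 19:03:12') s;
```

Without the one-second shift, 19:03:11 would give 311/5.0 = 62.2 and move to group 63. The groups would then be :06 to :10, :11 to :15, and so on.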

-- Group by time interval
SELECT trx_id,
       trx_date,
       trx_cnt,
       ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp
from trx_log;

mysql> SELECT trx_id,
    ->        trx_date,
    ->        trx_cnt,
    ->        ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp
    -> from trx_log;
+--------+---------------------+---------+------+
| trx_id | trx_date            | trx_cnt | grp  |
+--------+---------------------+---------+------+
|      1 | 2020-10-28 19:03:07 |      44 |   62 |
|      2 | 2020-10-28 19:03:08 |      18 |   62 |
|      3 | 2020-10-28 19:03:09 |      23 |   62 |
|      4 | 2020-10-28 19:03:10 |      29 |   62 |
|      5 | 2020-10-28 19:03:11 |      27 |   62 |
|      6 | 2020-10-28 19:03:12 |      45 |   63 |
|      7 | 2020-10-28 19:03:13 |      45 |   63 |
|      8 | 2020-10-28 19:03:14 |      32 |   63 |
|      9 | 2020-10-28 19:03:15 |      41 |   63 |
|     10 | 2020-10-28 19:03:16 |      15 |   63 |
|     11 | 2020-10-28 19:03:17 |      24 |   64 |
|     12 | 2020-10-28 19:03:18 |      47 |   64 |
|     13 | 2020-10-28 19:03:19 |      37 |   64 |
|     14 | 2020-10-28 19:03:20 |      48 |   64 |
|     15 | 2020-10-28 19:03:21 |      46 |   64 |
|     16 | 2020-10-28 19:03:22 |      44 |   65 |
|     17 | 2020-10-28 19:03:23 |      36 |   65 |
|     18 | 2020-10-28 19:03:24 |      41 |   65 |
|     19 | 2020-10-28 19:03:25 |      33 |   65 |
|     20 | 2020-10-28 19:03:26 |      19 |   65 |
+--------+---------------------+---------+------+
20 rows in set (0.00 sec)

Once every row has a group number, you can aggregate on it directly:

SELECT ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp,
       min(trx_date)    as trx_start,
       max(trx_date)    as trx_end,
       sum(trx_cnt)     as total
from trx_log
group by ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0)
;

Test Record:

mysql> SELECT ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp,
    ->        min(trx_date)    as trx_start,
    ->        max(trx_date)    as trx_end,
    ->        sum(trx_cnt)     as total
    -> from trx_log
    -> group by ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0)
    -> ;
+------+---------------------+---------------------+-------+
| grp  | trx_start           | trx_end             | total |
+------+---------------------+---------------------+-------+
|   62 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 |   141 |
|   63 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 |   178 |
|   64 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 |   202 |
|   65 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 |   173 |
+------+---------------------+---------------------+-------+
4 rows in set (0.00 sec)
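A caveat applies to this method: because '%i%s' encodes only minutes and seconds, not every group covers 5 seconds. For example, 19:03:57 through 19:04:00 all map to group 72, while 19:04:01 alone maps to group 80. Rows exactly one hour apart also share a group number.

The sketch below is a more uniform alternative, swapped in here rather than taken from the original article. It counts whole seconds from a fixed origin with TIMESTAMPDIFF and takes the floor of that count divided by 5. The origin '2020-10-28 19:03:07' is an assumption, chosen so that the buckets line up with the ones above. Any fixed instant works, and it decides where the 5-second boundaries fall.

```sql
-- Assumed origin: 2020-10-28 19:03:07. Each row's group is the number of
-- whole 5-second steps between the origin and trx_date, so each group
-- covers a 5-second window regardless of minute or hour boundaries.
select floor(TIMESTAMPDIFF(second, '2020-10-28 19:03:07', trx_date) / 5) as grp,
       min(trx_date) as trx_start,
       max(trx_date) as trx_end,
       sum(trx_cnt)  as total
  from trx_log
 group by grp   -- MySQL accepts a select-list alias in GROUP BY
 order by grp;
```

On the sample data, this should return the same intervals and totals (141, 178, 202, 173), numbered 0 to 3 instead of 62 to 65. Grouping by the alias grp also avoids repeating the long expression in the GROUP BY clause.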


Origin blog.csdn.net/u010520724/article/details/113935397