MySQL: calculating the difference between rows of the same column

One, the problem

There is a table like this:

date amount
2015-12-31 3000
2016-01-22 3100
2016-01-23 3100
2016-01-24 3100
2016-01-25 3100
2016-01-26 3100
2016-01-27 3100
2016-01-28 3100
2016-01-29 3100
2016-01-30 3100
2016-01-31 3300
2016-02-01 3400
2016-02-02 3500

We want to get a result like the following:

year month diff
2016 1 300
2016 2 200

Write the SQL statement that produces it.

From the expected result we can infer that the task is to find the difference between each month's cumulative value and the previous month's. The amount column already holds a cumulative value, so the monthly figure has to be derived by subtraction.

It looks very simple at first glance: isn't it just grouping by year and month?

Think about it carefully and it is not as easy as it seems. The key step is calculating the difference between rows. Calculating the difference between columns is very easy in MySQL; the difficulty lies in calculating the difference between rows, which requires a little trick: use MySQL user variables and subqueries to bring the previous row's value onto the current row as an extra column, so that the row difference becomes a column difference.

Note: in high-concurrency services we generally do not run this kind of calculation in MySQL. We try to do it in the application layer or in a separate statistics system, because protecting the database is an important responsibility in such services.

For offline reports or statistics jobs, however, this is not a problem, so the following trick is still worth knowing.

Two, import data

First create the table:

CREATE TABLE `stat_pay`  (
  `stat_date` date NULL DEFAULT NULL,
  `amount` int UNSIGNED NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

Load the data, ignoring the first row header:

LOAD DATA LOCAL INFILE 'data.txt' INTO TABLE stat_pay FIELDS TERMINATED BY '\t' IGNORE 1 LINES;

(Figure: the data after import)
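
If data.txt is not at hand, a minimal alternative is to insert the sample rows directly. This sketch assumes the table is named stat_pay and has the columns stat_date and amount, matching the queries used in the rest of this article.

```sql
-- Sample rows from the problem statement, inserted directly.
-- Assumes the table stat_pay(stat_date DATE, amount INT UNSIGNED) already exists.
INSERT INTO stat_pay (stat_date, amount) VALUES
    ('2015-12-31', 3000),
    ('2016-01-22', 3100),
    ('2016-01-23', 3100),
    ('2016-01-24', 3100),
    ('2016-01-25', 3100),
    ('2016-01-26', 3100),
    ('2016-01-27', 3100),
    ('2016-01-28', 3100),
    ('2016-01-29', 3100),
    ('2016-01-30', 3100),
    ('2016-01-31', 3300),
    ('2016-02-01', 3400),
    ('2016-02-02', 3500);
```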

Three, use variables to save the previous value

Let's first understand how MySQL uses variables in SQL statements.

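SELECT
    tmp.stat_date,
    tmp.current_amount,
    tmp.pre,
    ( tmp.current_amount - tmp.pre ) AS diff      -- current row minus previous row
FROM
    (
    SELECT
        stat_date,
        amount AS current_amount,
        @pre_amount AS pre,                       -- value saved from the previous row
        @pre_amount := sp.amount                  -- then remember the current row's value
    FROM
        stat_pay sp,
    ( SELECT @pre_amount := 0 ) AS pre_temp       -- initialize the variable to 0
    ) AS tmp;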

(Figure: using a variable to save the previous value)

First of all, user variables in MySQL start with @, and system variables start with @@. Assignment uses :=.

So, in our query, the fragment
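
As a quick illustration of these rules, here is a minimal sketch that can be run in any MySQL session. The variable name @demo is made up for this example. Recent MySQL 8.0 releases flag assignments inside SELECT as deprecated, so treat the third statement as a demonstration of the syntax this article relies on rather than a recommendation.

```sql
-- @demo is a hypothetical user variable; it lives until the session ends.
SET @demo := 10;              -- in SET, both = and := assign
SELECT @demo;                 -- read the value back
SELECT @demo := @demo + 1;    -- inside SELECT, assignment must use := (= would compare)
SELECT @@version;             -- a system variable, prefixed with @@
```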

( SELECT @pre_amount := 0 ) AS pre_temp 

It is equivalent to defining a user variable @pre_amount and initializing its value to 0.

The subquery in the first FROM clause is evaluated row by row. For each row it first reads @pre_amount as the previous value and gives it the alias pre, and then assigns the current row's amount to @pre_amount.

Now the outermost query is easy to understand: it selects the current row's value, the previous row's value, and the difference between the two.
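
For comparison, on MySQL 8.0 or later the same per-row difference can be sketched with the LAG() window function instead of a user variable. This is a different technique from the one this article explains; it is shown only to make the intent of "previous row" explicit, and it assumes amount never decreases from one date to the next.

```sql
-- Sketch for MySQL 8.0+: LAG() reads the previous row's amount in stat_date order.
-- The third argument (0) is the default used when there is no previous row.
SELECT
    stat_date,
    amount AS current_amount,
    LAG( amount, 1, 0 ) OVER ( ORDER BY stat_date ) AS pre,
    amount - LAG( amount, 1, 0 ) OVER ( ORDER BY stat_date ) AS diff
FROM
    stat_pay;
```

Unlike the variable version, the row order here is stated explicitly in the OVER clause rather than depending on the order in which MySQL happens to read the table.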

Let's take a look at the explain result of the above SQL statement:

(Figure: EXPLAIN output)

How to read the EXPLAIN output:

  1. id identifies each SELECT. The larger the id, the earlier that SELECT is executed; rows with the same id are executed from top to bottom.
  2. select_type: PRIMARY is the outermost SELECT, which is executed last; DERIVED is a subquery in a FROM clause.
  3. table is the table being read; <derived2> means the derived table produced by the SELECT with id 2.

Now let's look at the EXPLAIN output again; it is much clearer.

First find the largest id, which is 3; it is executed first. Its select_type is DERIVED, which means it produces a derived table. In effect it defines the variable @pre_amount inside a one-row table whose alias is pre_temp; this is the subquery in the second FROM clause.

There are two rows with id 2, and both have select_type DERIVED, because both belong to the subquery in the first FROM clause. Reading from top to bottom, the table in the second row is <derived3>, which means it uses the derived table produced by the query with id 3, namely pre_temp. Its type is system, which means the table has only one row, as the rows column also shows.

In the third row, table is sp, which means the real table is read directly; sp is the alias of stat_pay.

Finally, the row with id 1 has select_type PRIMARY, which means it is the outermost query and is executed last; its table is <derived2>, the derived table produced by the query with id 2.
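
The plan discussed above can be reproduced by prefixing the statement with EXPLAIN. The sketch below simply repeats the query from this section; the exact rows and columns you see may differ between MySQL versions, so treat the walkthrough above as one possible plan rather than a guaranteed one.

```sql
-- Prefix the statement with EXPLAIN to see how MySQL plans to execute it.
EXPLAIN
SELECT
    tmp.stat_date,
    tmp.current_amount,
    tmp.pre,
    ( tmp.current_amount - tmp.pre ) AS diff
FROM
    (
    SELECT
        stat_date,
        amount AS current_amount,
        @pre_amount AS pre,
        @pre_amount := sp.amount
    FROM
        stat_pay sp,
    ( SELECT @pre_amount := 0 ) AS pre_temp
    ) AS tmp;
```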

Four, the final solution

Because we want to group by year and month but only have a date, we can extract the year and month with substring or date_format:

SELECT substring(stat_date,1,4) AS stat_year,substring(stat_date,6,2) AS mon FROM stat_pay;
SELECT date_format(stat_date,'%Y') AS stat_year,DATE_FORMAT(stat_date,'%m') AS mon FROM stat_pay;

Let's take a look at our final SQL:

SELECT
    tmp.stat_year,
    tmp.mon,
    tmp.current_amount,
    tmp.pre,
    ( tmp.current_amount - tmp.pre ) AS diff          -- this month minus previous month
FROM
    (
    SELECT
        total_tmp.stat_year,
        total_tmp.mon,
        total_tmp.total_amount AS current_amount,
        @pre_amount AS pre,                           -- total saved from the previous month
        @pre_amount := total_tmp.total_amount         -- remember this month's total
    FROM
        (
        -- amount is cumulative, so the monthly maximum is the month-end total
        SELECT
            substring( stat_date, 1, 4 ) AS stat_year,
            substring( stat_date, 6, 2 ) AS mon,
            max( amount ) AS total_amount 
        FROM
            stat_pay 
        GROUP BY
            stat_year,
            mon 
        ) AS total_tmp,
    ( SELECT @pre_amount := 0 ) AS pre_temp           -- initialize the variable to 0
    ) AS tmp;

(Figure: result of the final SQL)

If you are a perfectionist and want exactly the expected result, without the leading zero in the month, you can convert the string to an integer in any of the following three ways:

substring( stat_date, 1, 4 ) + 0 AS stat_year
convert( substring( stat_date, 6, 2 ), unsigned integer ) AS mon
cast( substring( stat_date, 6, 2 ) AS unsigned integer ) AS mon
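
Another option, not used in this article's solution, is to skip the string handling altogether: MySQL's built-in YEAR() and MONTH() functions return integers, so no conversion is needed. A minimal sketch:

```sql
-- YEAR() and MONTH() return numbers, so the month has no leading zero.
SELECT
    YEAR( stat_date )  AS stat_year,
    MONTH( stat_date ) AS mon
FROM
    stat_pay;
```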

Finally, skip the first row with a LIMIT clause to get the final result:

(Figure: the final result)
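
Putting the integer conversion and the LIMIT step together, the complete statement might look like the sketch below. The row count 100 in the LIMIT clause is an arbitrary upper bound chosen for this example, and the query assumes, as the rest of the article does, that the grouped months are processed in chronological order.

```sql
SELECT
    tmp.stat_year,
    tmp.mon,
    ( tmp.current_amount - tmp.pre ) AS diff
FROM
    (
    SELECT
        total_tmp.stat_year,
        total_tmp.mon,
        total_tmp.total_amount AS current_amount,
        @pre_amount AS pre,
        @pre_amount := total_tmp.total_amount
    FROM
        (
        SELECT
            CAST( substring( stat_date, 1, 4 ) AS UNSIGNED ) AS stat_year,
            CAST( substring( stat_date, 6, 2 ) AS UNSIGNED ) AS mon,
            max( amount ) AS total_amount
        FROM
            stat_pay
        GROUP BY
            stat_year,
            mon
        ) AS total_tmp,
    ( SELECT @pre_amount := 0 ) AS pre_temp
    ) AS tmp
LIMIT 1, 100;   -- skip the first row (offset 1); 100 is an arbitrary maximum row count
```

With the sample data, the skipped row is December 2015, whose "previous" value is only the initial 0, which leaves the January and February rows from the expected result.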

Five, summary

By putting a SELECT in the FROM clause we can create a derived table that holds a temporary variable, and then read and update that variable in the select list.

By the same approach we can store several variables in the derived table, and use them not only for calculations within one column but also for calculations across different columns.
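
As a rough sketch of the multi-variable idea, the query below keeps both the previous amount and the previous date. The names @pre_date, vars, amount_diff and days_since_prev are invented for this illustration, and the query again assumes rows are read in date order.

```sql
SELECT
    tmp.stat_date,
    ( tmp.current_amount - tmp.pre_amount ) AS amount_diff,
    DATEDIFF( tmp.stat_date, tmp.pre_date ) AS days_since_prev   -- NULL for the first row
FROM
    (
    SELECT
        sp.stat_date,
        sp.amount AS current_amount,
        @pre_amount AS pre_amount,          -- previous row's amount
        @pre_date AS pre_date,              -- previous row's date
        @pre_amount := sp.amount,           -- remember the current amount
        @pre_date := sp.stat_date           -- remember the current date
    FROM
        stat_pay sp,
    ( SELECT @pre_amount := 0, @pre_date := NULL ) AS vars   -- two variables in one derived table
    ) AS tmp;
```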

Origin blog.csdn.net/trayvontang/article/details/103427864