The database aggregate function COUNT

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reproducing it, please include the original source link and this statement.
Article link: https://blog.csdn.net/weixin_45505313/article/details/102716183

1. The role of the COUNT function

In a database system, COUNT(expr) is used to count rows. It returns the number of rows retrieved by the SELECT statement for which the expression expr is not NULL. The return value is a BIGINT; if the query matches no rows, 0 is returned. The two most common forms are:

  1. COUNT(column) counts the rows in which the given column has a value (is not NULL)
  2. COUNT(*) counts the total number of rows in the result set
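
The two forms can be compared on a tiny table. The following is a minimal sketch, not MySQL itself: it uses SQLite from Python's standard library as a stand-in, on the assumption that the NULL handling of COUNT is the same standard SQL behavior. The table t_demo and its columns are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; t_demo and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_demo (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO t_demo (id, nickname) VALUES (?, ?)",
    [(1, "ann"), (2, None), (3, "bob"), (4, None)],
)

# COUNT(*) counts every row; COUNT(nickname) skips rows where nickname is NULL.
total_rows, non_null_nicknames = conn.execute(
    "SELECT COUNT(*), COUNT(nickname) FROM t_demo"
).fetchone()

# A query that matches no rows still returns a single row holding 0.
no_match = conn.execute(
    "SELECT COUNT(*) FROM t_demo WHERE id > 100"
).fetchone()[0]

print(total_rows, non_null_nicknames, no_match)  # 4 2 0
```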

2. COUNT(*) vs. COUNT(1) vs. COUNT(column)

COUNT(expr) counts the rows for which expr is not NULL. In the three forms COUNT(column), COUNT(1) and COUNT(*), expr is a column name, a constant and * respectively. A constant is a fixed value and is never NULL. * can be understood as the whole row, so it is never NULL either. Only a column value can be NULL. Therefore COUNT(1) and COUNT(*) both count the rows that satisfy the query conditions, while COUNT(column) counts the rows that satisfy the conditions and whose value in that column is not NULL.

  • For COUNT(1) and COUNT(*), the official MySQL documentation states that InnoDB handles both in exactly the same way, with no performance difference
  • COUNT(column) is a simple, blunt query: it performs a full table scan, checks for each row whether the specified column is NULL, and increments the count when it is not

Compared with COUNT(*), COUNT(column) has the extra step of checking whether the column is NULL, so it is less efficient than COUNT(*).
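
The extra per-row check can be made concrete with a small model. This is only an illustrative sketch: count_star and count_column are hypothetical helper functions that imitate the counting logic described above, not the engine's real code, and the SQL comparison again runs on SQLite rather than MySQL.

```python
import sqlite3


def count_star(rows):
    """Model of COUNT(*) / COUNT(1): every row counts, no NULL check."""
    n = 0
    for _ in rows:
        n += 1
    return n


def count_column(rows, index):
    """Model of COUNT(column): one extra NULL (None) check per row."""
    n = 0
    for row in rows:
        if row[index] is not None:
            n += 1
    return n


# Hypothetical sample data: two of the four rows have a NULL email.
rows = [(1, "a@x.test"), (2, None), (3, None), (4, "d@x.test")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_demo (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO t_demo (id, email) VALUES (?, ?)", rows)

sql_counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(email) FROM t_demo"
).fetchone()
model_counts = (count_star(rows), count_star(rows), count_column(rows, 1))

print(sql_counts, model_counts)  # (4, 4, 2) (4, 4, 2)
```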

3. MySQL optimizations for COUNT

These optimizations apply only when the query has no WHERE or GROUP BY clause.

3.1 MyISAM engine optimization

MyISAM uses table-level locks, so operations on the same table are executed serially. This allows MyISAM a simple optimization: it records the table's total row count separately. When COUNT(*) is used to get the number of rows in the table, the recorded value is returned directly, provided there is no WHERE condition.

  • MyISAM can record the row count for COUNT(*) queries because it uses table-level locking: there are no concurrent modifications of the table's row count, so the recorded number is accurate
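
The idea of keeping the total row count on the side can be sketched with a toy in-memory table. CountedTable is a hypothetical class written for illustration only; it is not how MyISAM is implemented, it just shows why an unconditioned count needs no scan when the total is maintained on every write, and why a condition forces a scan again.

```python
class CountedTable:
    """Toy table that keeps its row count up to date on every write."""

    def __init__(self):
        self._rows = {}
        self._row_count = 0  # stored total, updated on insert and delete

    def insert(self, key, row):
        if key not in self._rows:
            self._row_count += 1
        self._rows[key] = row

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._row_count -= 1

    def count_all(self):
        """COUNT(*) without WHERE: return the stored value, no scan."""
        return self._row_count

    def count_where(self, predicate):
        """COUNT(*) with a condition: the stored total cannot answer this."""
        return sum(1 for row in self._rows.values() if predicate(row))


table = CountedTable()
for i in range(1, 6):
    table.insert(i, {"id": i})
table.delete(2)

print(table.count_all())                          # 4
print(table.count_where(lambda r: r["id"] > 3))   # 2
```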

3.2 InnoDB engine optimization

InnoDB supports transactions and most of its operations use row-level locks, so the number of rows in a table can be modified concurrently, and a single recorded total would not be accurate. When InnoDB runs a COUNT(*) query it therefore has to scan, and the only thing that can be optimized is the efficiency of that scan.
InnoDB indexes are divided into the clustered index (the primary key index) and non-clustered indexes (non-primary-key indexes). The leaf nodes of the clustered index store entire rows, whereas the leaf nodes of a non-clustered index store only the indexed column values and the primary key value of each row. COUNT(*) only needs the total number of rows and does not care about the actual values read, so choosing a lower-cost index for the scan can save a lot of time. A non-clustered index is usually much smaller than the clustered index, so InnoDB prefers to scan the smallest available non-clustered index. This suggests that, besides the primary key index defined when the table is created, it is also worth creating a non-primary-key index.

4. COUNT query optimization

Suppose there is a table t_user with about 5000 records, and the requirement is to count the users whose Id is greater than 20. The following two ways of writing the query can differ greatly in efficiency.

  1. The straightforward condition scans almost the entire table. This is fast enough on a small table but time-consuming on a large one

    SELECT
        count( * ) 
    FROM
        t_user
    WHERE
        Id > 20;
    
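  2. Another idea is to count the records whose Id is less than or equal to 20 and subtract that from the total number of rows. This works because the query optimization phase treats this kind of subquery as a constant, so only the rows with Id <= 20 have to be scanned and the overhead drops substantially. It relies on the unconditioned COUNT(*) being cheap, as with the row count that MyISAM records (section 3.1)

    SELECT
        ( SELECT COUNT( * ) FROM t_user ) - count( * )
    FROM
        t_user
    WHERE
        Id <= 20;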
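
Whether the subtraction form returns the same number as the direct form can be checked quickly. The sketch below is a correctness check only, not a benchmark: it runs on SQLite instead of MySQL, and it assumes a hypothetical t_user with a name column and contiguous Id values from 1 to 5000. The complement of Id > 20 is Id <= 20; a strict Id < 20 leaves the row with Id = 20 in the result.

```python
import sqlite3

# Hypothetical data: 5000 users with Id 1..5000 (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_user (Id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t_user (Id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 5001)],
)

# Direct form: count the rows with Id > 20.
direct = conn.execute(
    "SELECT COUNT(*) FROM t_user WHERE Id > 20"
).fetchone()[0]

# Subtraction form: total rows minus the rows with Id <= 20.
by_subtraction = conn.execute(
    "SELECT (SELECT COUNT(*) FROM t_user) - COUNT(*) "
    "FROM t_user WHERE Id <= 20"
).fetchone()[0]

# With a strict Id < 20 the row with Id = 20 is not subtracted.
with_strict_bound = conn.execute(
    "SELECT (SELECT COUNT(*) FROM t_user) - COUNT(*) "
    "FROM t_user WHERE Id < 20"
).fetchone()[0]

print(direct, by_subtraction, with_strict_bound)  # 4980 4980 4981
```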
    
