MySQL Performance Optimization: Why Is count(*) So Slow?

Introduction

  • In day-to-day development we often need the number of rows in a table. In a trading system, for example, the boss asks for a report every day, and those statistics inevitably rely on the SQL count function.
  • But as the table accumulates more and more records, the query becomes slower and slower. Why is that, and how does MySQL actually handle it internally?
  • This article looks at how MySQL handles the count function internally.
  • This article first appeared in the Code Ape Technology Column WeChat public account as "MySQL Performance Optimization: Why Is count(*) So Slow?". Original writing takes effort, so your support is appreciated. Thank you!

How count is implemented

  • Different storage engines in MySQL implement the count function differently.
  • The MyISAM engine stores the total number of rows of a table on disk, so count(*) simply returns that number and is very fast (provided the query has no where condition).
  • The InnoDB engine does not store the total on disk. When it executes count(*), it has to read the rows one by one and accumulate the total.

Why doesn't InnoDB store the total?

  • Mention InnoDB and readers will immediately think of its transaction support. Transactions are isolated from one another, so if the total were stored, how could it be kept consistent across the individual transactions? If that is unclear, consider two concurrent transactions, A and B (shown in a figure in the original).

  • Transaction A and transaction B get different results from count(*). InnoDB therefore cannot know in advance how many rows each transaction should see; it can only read the rows one by one to determine the total.
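The effect described above can be reproduced in miniature. The following is a minimal sketch, not InnoDB itself: it assumes SQLite in WAL mode (via Python's sqlite3 module) as a stand-in for a snapshot-isolated engine, and the `user` table, the file name, and the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    """Run SELECT COUNT(*) on the hypothetical `user` table."""
    cur = conn.execute("SELECT COUNT(*) FROM user")
    total = cur.fetchone()[0]
    cur.close()
    return total


def snapshot_demo():
    # A file-backed database is needed so two connections can share it.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None: BEGIN/COMMIT are issued explicitly below.
    conn_a = sqlite3.connect(path, isolation_level=None)
    conn_b = sqlite3.connect(path, isolation_level=None)

    # Assumption: WAL mode stands in for InnoDB's snapshot reads. It lets
    # a reader keep its snapshot while another connection commits.
    conn_a.execute("PRAGMA journal_mode=WAL").fetchall()
    conn_a.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
    conn_a.execute("INSERT INTO user (name) VALUES ('alice'), ('bob')")

    # Transaction A starts and takes its first count.
    conn_a.execute("BEGIN")
    a_before = count_rows(conn_a)

    # Transaction B inserts a row and commits immediately (autocommit).
    conn_b.execute("INSERT INTO user (name) VALUES ('carol')")
    b_count = count_rows(conn_b)

    # A is still inside its transaction and keeps seeing its snapshot.
    a_during = count_rows(conn_a)
    conn_a.execute("COMMIT")

    # Once A's transaction has ended, it sees B's row as well.
    a_after = count_rows(conn_a)

    conn_a.close()
    conn_b.close()
    return a_before, b_count, a_during, a_after
```

At the same moment, A counts 2 rows while B counts 3, so no single stored total could be correct for both transactions.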

How to make count more efficient

  • How can count(*) be made more efficient on InnoDB? Many solutions circulate online; here are three of them, with an analysis of whether each is feasible.

show table status

  • The show table status command quickly reports the number of rows of every table in the database, but can it really replace count(*)?
  • The answer is no. The reason is simple: the row count it reports is an estimate and therefore not accurate. According to the official documentation, the error can be as much as 40% to 50%.
  • So this method is ruled out: a count that is not accurate is of no use.

Keeping the total in a cache system

  • This is the approach that comes to mind most easily: add 1 when a row is inserted, subtract 1 when a row is deleted, and reads from the cache are fast. Simple and convenient, so why not?

  • The cache and MySQL are two separate systems; Redis and MySQL are a typical pairing. The hardest problem with two systems is that data consistency cannot be guaranteed under high concurrency.

  • As the two figures in the original illustrate, whether the Redis count +1 or the insert into user executes first, the data ends up logically inconsistent. In the first figure the Redis count comes up short; in the second the count is correct, but a query does not yet return the newly inserted row.

  • In a concurrent system we cannot precisely control the order in which different threads execute. Because interleavings like those in the figures can occur, the count is logically imprecise even when Redis itself is working normally.
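The two problematic orderings can be replayed step by step. The sketch below is a deterministic simulation only; it does not use a real Redis or MySQL client. `FakeCache`, `FakeDatabase`, and the function names are hypothetical stand-ins, and the reader session is placed by hand between the writer's two steps.

```python
class FakeCache:
    """Stand-in for Redis: holds only the row counter."""

    def __init__(self):
        self.count = 0


class FakeDatabase:
    """Stand-in for the MySQL `user` table: holds only the rows."""

    def __init__(self):
        self.rows = []


def observe(cache, db):
    """What a reader sees at one instant: (cached count, visible rows)."""
    return cache.count, len(db.rows)


def insert_then_count():
    """Writer inserts first; the reader runs before the counter is bumped."""
    cache, db = FakeCache(), FakeDatabase()
    db.rows.append("new user")   # writer step 1: insert into user
    seen = observe(cache, db)    # reader runs between the two steps
    cache.count += 1             # writer step 2: Redis count +1
    return seen


def count_then_insert():
    """Writer bumps the counter first; the reader runs before the insert."""
    cache, db = FakeCache(), FakeDatabase()
    cache.count += 1             # writer step 1: Redis count +1
    seen = observe(cache, db)    # reader runs between the two steps
    db.rows.append("new user")   # writer step 2: insert into user
    return seen
```

In the first ordering the reader sees a count of 0 next to 1 visible row (the count is short); in the second it sees a count of 1 next to 0 visible rows (the row is missing). Neither order closes the window, because the two writes cannot be made atomic across two systems.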

Saving the count in the database

  • The analysis above shows that a cache system cannot guarantee the logical consistency of the data. The next idea is therefore to store the count directly in the database, where transaction support guarantees consistency.

  • How is it used? Very simply: store the count in a dedicated table with the columns (table_name, total).

  • The logic is the same as with the cache system; only the Redis count +1 is replaced by adding 1 to the total field.

  • Because both operations run in the same transaction, the data stays logically consistent.
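A counter table of this kind might look like the following minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL; the table name `row_counts`, the `user` table, and the helper functions are hypothetical, and the order of the two statements inside the transaction is chosen arbitrarily here.

```python
import sqlite3


def make_db():
    """Create the hypothetical `user` table and its counter table."""
    # isolation_level=None: BEGIN/COMMIT/ROLLBACK are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE row_counts ("
        "table_name TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO row_counts (table_name, total) VALUES ('user', 0)")
    return conn


def add_user(conn, name):
    """Insert a row and add 1 to the stored total in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO user (name) VALUES (?)", (name,))
        conn.execute(
            "UPDATE row_counts SET total = total + 1 WHERE table_name = 'user'"
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Either both changes are kept or neither is.
        conn.execute("ROLLBACK")
        raise


def totals(conn):
    """Return (stored total, actual COUNT(*)) for comparison."""
    stored = conn.execute(
        "SELECT total FROM row_counts WHERE table_name = 'user'"
    ).fetchone()[0]
    actual = conn.execute("SELECT COUNT(*) FROM user").fetchone()[0]
    return stored, actual
```

After any number of successful or failed `add_user` calls, `totals(conn)` returns two equal numbers, since a failed insert rolls the whole transaction back.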

Different ways to use count

  • count() is an aggregate function. It evaluates the returned result set row by row: if the argument of count is not NULL, the running total is increased by 1; otherwise it is left unchanged. The final total is returned.
  • count can be used in several forms: count(*), count(field), count(1), and count(primary key id). What is the difference between them? The comparison below assumes there is no where clause.
  • count(id): InnoDB traverses the entire table, takes the id value of each row, and returns it to the server layer. The server layer receives the id, determines that it cannot be NULL, and adds the row to the total.
  • count(1): InnoDB traverses the entire table but does not fetch any values. For each returned row the server layer puts in the number 1, determines that it cannot be NULL, and adds the row to the total.
  • count(field):
    • If the field is defined as not null, the field is read from each record, row by row; it is known not to be NULL, so every row is added to the total;
    • If the field is defined as nullable, the value may be NULL at execution time, so the value must be fetched and checked, and the row is added to the total only if the value is not NULL.
  • count(*): does not fetch all the fields; it is specially optimized and fetches no values. count(*) can never be NULL, so every row is added to the total.
  • So the conclusion is simple: "Ordered by efficiency, count(field) < count(primary key id) < count(1) ≈ count(*), so readers are advised to use count(*) wherever possible."
  • "Note": Some readers will surely ask: doesn't count(id) use the index, so why is its efficiency about the same as the others? The author's explanation: even though it uses the index, it still has to scan row by row to work out the total.
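The NULL rule behind these forms is easy to check. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL, with a hypothetical `user` table whose `nickname` column is nullable. It shows only which rows each form counts; it says nothing about the relative speed of the forms on InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: `id` is the primary key, `nickname` allows NULL.
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO user (id, nickname) VALUES (?, ?)",
    [(1, "ann"), (2, None), (3, "cy")],
)

# All four forms in one query, with no where clause.
row = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(nickname) FROM user"
).fetchone()

count_star, count_one, count_id, count_nickname = row
print(count_star, count_one, count_id, count_nickname)  # prints: 3 3 3 2
```

Only count(nickname) skips the row whose value is NULL; the other three forms count every row.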

Summary

  • MyISAM tables answer count(*) quickly, but MyISAM does not support transactions;
  • The show table status command returns quickly, but its row count is not accurate;
  • A direct count(*) on InnoDB traverses the whole table (when there is no where condition); the result is accurate, but it can cause performance problems.
  • Counting in a cache system is simple and efficient, but cannot guarantee data consistency.
  • Keeping the count in the database is simple and also guarantees data consistency, so it is the recommended approach.
  • "A question for readers to discuss in the comments": In a highly concurrent system that stores the count in the database, should the count be updated first or the row inserted first? That is, should the update that adds 1 to total run first, or the insert into?

 

Origin www.cnblogs.com/lonelyxmas/p/12630080.html