How to optimize MySQL for tens of millions of large tables?

https://www.zhihu.com/question/19719997

Author: zhuqz
Link: https://www.zhihu.com/question/19719997/answer/81930332
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Many people's first reaction is some form of splitting or sharding; the order I recommend is:
First, optimize your SQL and indexes;
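For example, before reaching for anything heavier, check whether your queries actually hit an index; a minimal sketch (the orders table and its columns are hypothetical):

-- EXPLAIN shows whether the query uses an index or scans the whole table:
EXPLAIN
SELECT order_id, amount
FROM   orders
WHERE  user_id = 42
  AND  created_at >= '2024-01-01';

-- If the plan shows type=ALL (a full table scan), a composite index that
-- matches the predicates usually fixes it:
ALTER TABLE orders ADD INDEX idx_user_created (user_id, created_at);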

Second, add a cache layer, such as memcached or Redis;

Third, if it is still slow after all of the above, set up master-slave or master-master replication with read-write separation. This can be done in the application layer, which is very efficient, or with a third-party middleware; of those I recommend 360's Atlas, since the others are either inefficient or unmaintained;
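For illustration, a minimal replica-side sketch of setting up replication (host, user, password, and binlog coordinates are placeholders, and the master must already have binary logging enabled):

CHANGE MASTER TO
  MASTER_HOST = '10.0.0.1',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
-- Then send writes to the master and reads to the replicas, either in the
-- application layer or through a proxy such as 360 Atlas.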

Fourth, if everything above is still slow, don't rush into sharding yet. MySQL has built-in partitioned tables; try those first. Partitioning is transparent to your application and requires no code changes, but your SQL does need to be written with the partitions in mind: include the partitioning column in the query conditions so the query can be pruned to a few partitions, otherwise every partition will be scanned. Partitioned tables also have some pits of their own, which I won't go into here;
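A sketch of what that looks like, with hypothetical names; note the MySQL rule that every primary/unique key must include the partitioning column, which is one of those pits:

CREATE TABLE orders (
  order_id   BIGINT NOT NULL,
  user_id    BIGINT NOT NULL,
  created_at DATE   NOT NULL,
  amount     DECIMAL(10,2),
  PRIMARY KEY (order_id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
  PARTITION p2024 VALUES LESS THAN (TO_DAYS('2025-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Prunes to p2024 only, because the partition column is in the WHERE clause;
-- without it, all partitions would be scanned:
SELECT COUNT(*) FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-06-30';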

Fifth, only after all of the above do you start splitting, and vertical splitting comes first: based on how loosely coupled your modules are, break the one large system into several smaller systems, that is, go distributed;

Sixth, horizontal sharding. For tables with huge volumes of data this is the most troublesome step and the one that really tests engineering skill. You have to choose a reasonable shard key, and for good query performance you may also need to change the table structure, add some redundancy, and modify the application so that SQL carries the shard key whenever possible, locating the data in a specific shard instead of scanning all the shards;
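A rough sketch of hash-based sharding, with a hypothetical shard count of 4 and hypothetical names:

-- shard = user_id % 4, giving tables orders_0 .. orders_3 with identical structure:
CREATE TABLE orders_0 LIKE orders;  -- repeat for orders_1 .. orders_3

-- The application computes the shard from the shard key and queries only that
-- table; user_id = 42 routes to orders_(42 % 4) = orders_2:
SELECT order_id, amount FROM orders_2 WHERE user_id = 42;

-- A query without the shard key would have to fan out to all four tables and
-- merge the results, which is exactly the scan-everything case to avoid.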

A MySQL deployment generally evolves along these steps, with the cost rising from low to high;

Some people will say that step one, optimizing SQL and indexes, is obvious; yes, we all know it, but in many cases it simply isn't done well, and some people only create indexes to fit the SQL without optimizing the SQL itself (hit a nerve?). Beyond the simplest CRUD, any given query can be written in many different ways, and depending on the storage engine you chose, the distribution of data in the table, the indexes available, the database's optimization strategy, the locking involved in the query, and other factors, those statements can differ enormously in efficiency. Optimization has to be considered as a whole: sometimes after you tune one statement, other queries get slower, so you have to find a balance. And even if you know MySQL inside and out, beyond purely technical optimization you must also optimize SQL according to the business to get the best result. So, do you dare claim your SQL and indexes are already optimal?
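As a small illustration, here are two statements that return the same rows but can execute very differently (hypothetical tables); on older MySQL versions the IN-subquery form was often run as a dependent subquery, re-executed per outer row:

SELECT o.order_id, o.amount
FROM   orders o
WHERE  o.user_id IN (SELECT u.user_id FROM users u WHERE u.city = 'Beijing');

-- The join form gives the optimizer more freedom to choose a join order:
SELECT o.order_id, o.amount
FROM   orders o
JOIN   users  u ON u.user_id = o.user_id
WHERE  u.city = 'Beijing';

-- Compare the plans with EXPLAIN before trusting either one.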

Now a word about optimizing for the different engines. MyISAM reads fast but writes poorly, which comes down to its data storage format, its index pointers, and its lock strategy. Its data is stored in insertion order (whereas InnoDB's data is clustered by the index), and the leaf nodes of its index B-tree are pointers to the physical location of the rows, so lookups are very fast (InnoDB's secondary index nodes store the row's primary key, so a second lookup by primary key is needed). MyISAM locks are table locks: only reads are concurrent with reads; writes with writes, and reads with writes, are serialized (reads and inserts can be made concurrent by setting the concurrent_insert parameter and optimizing the table regularly; there is no such option for updates), so writing is slow. By default writes also have higher priority than reads: an incoming write can jump ahead of pending reads, so a batch of writes can starve readers to death. You therefore need to set the read/write priority, or adopt a strategy of running reads in batches after writes. Also avoid long-running queries on MyISAM; used carelessly they hold the table lock and can starve the writers instead, so split up inefficient SQL where you can;
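For reference, the MyISAM knobs mentioned above look like this (values are illustrative, and the logs table is hypothetical):

SET GLOBAL concurrent_insert = 2;     -- allow INSERTs to run concurrently with
                                      -- SELECTs even when the table has holes
SET GLOBAL low_priority_updates = 1;  -- stop writes from starving readers
OPTIMIZE TABLE logs;                  -- defragment periodically so concurrent
                                      -- inserts keep working

-- Per-statement alternative to the global priority setting:
INSERT LOW_PRIORITY INTO logs (msg) VALUES ('...');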

InnoDB generally uses row locks. "Generally" here means: when the SQL uses an index, the row lock is taken on the index entries, not on the data records; if the SQL uses no index, the whole table is still locked. Reads and writes can run concurrently: an ordinary SELECT takes no locks, and when a query runs into a locked record it performs a consistent non-locking snapshot read, that is, depending on the isolation level it reads a snapshot of the locked row, while UPDATE statements and locking reads perform a current read against the original row. Because ordinary reads and writes don't conflict, InnoDB suffers no read/write starvation, and because row locks taken through an index are fine-grained, contention on any one lock is low and concurrency is high, so concurrent read-write performance is quite good. The remaining problem is the second lookup by primary key after a secondary-index query, which costs efficiency;
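A small sketch of the index-versus-no-index locking difference (hypothetical accounts table, REPEATABLE READ, autocommit off):

-- Session 1:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- id is indexed:
                                                          -- locks one row
-- Session 2 runs concurrently on a different row with no conflict:
BEGIN;
UPDATE accounts SET balance = balance + 10 WHERE id = 2;

-- But if the WHERE column has no index, InnoDB must scan and lock every row,
-- so the two updates above would block each other:
-- UPDATE accounts SET balance = 0 WHERE nickname = 'bob';  -- unindexed

-- Meanwhile an ordinary SELECT reads a consistent snapshot without blocking:
SELECT balance FROM accounts WHERE id = 1;             -- non-locking snapshot read
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- locking current read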

PS: You may wonder why InnoDB's index leaf nodes store the primary key rather than a pointer to the physical location of the data, as MyISAM's do; with a physical pointer there would be no second lookup. That was my question too at first. Think about the difference between how MyISAM and InnoDB store their data and you'll see why (hint: InnoDB data is clustered on the primary key, so a row's physical location changes as pages split and merge, and physical pointers in every secondary index would have to be updated each time). I'll leave it there!

So with InnoDB you can use covering indexes to avoid the second lookup, and when a covering index alone can't satisfy the query, you can extend the idea into a deferred join built on a covering index. If you don't know what a covering index is, I strongly suggest you go find out what it's all about!
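A sketch of both, with hypothetical names: assume order_id is the primary key and there is a secondary index on (user_id, created_at); since InnoDB secondary indexes implicitly carry the primary key, that index covers the first query:

-- Answered entirely from the index (EXPLAIN shows "Using index"),
-- no second lookup into the clustered data:
SELECT order_id FROM orders
WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

-- Deferred join for deep pagination: locate the primary keys through the
-- covering index first, then fetch full rows only for those ten keys:
SELECT o.*
FROM   orders o
JOIN  (SELECT order_id FROM orders
       WHERE user_id = 42
       ORDER BY created_at DESC
       LIMIT 100000, 10) AS t
  ON  o.order_id = t.order_id;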

So squeeze everything you can out of your SQL first! It's called the low-cost option, but it's painstaking work that demands careful tuning and familiarity with both the technology and the business; the payoff, though, is immediate!
