3 ways to split tables in MySQL

First, let's talk about why we need to split tables

Once a table grows to millions of rows, every query against it takes longer, and a join query over such tables can stall altogether. The purpose of splitting tables is to reduce the load on the database and shorten query time.

In my experience, MySQL executes an SQL statement roughly as follows:

1. Receive the SQL statement;

2. Put it in the queue;

3. Execute it;

4. Return the result.

Where does most of the time go? First, waiting in the queue; second, executing the SQL. In fact, these are the same thing: while one statement waits, some other statement must be executing. So what we really want is to shorten the execution time of each SQL statement.
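
The claim that queue time and execution time are "the same thing" can be checked with a toy single-server queue: each statement waits for the combined execution time of the statements ahead of it, so halving every execution time also halves every wait. This is a minimal Python sketch of the author's simplified queue model, assuming all statements arrive together and run one at a time; the function name fifo_waits is hypothetical, and real MySQL runs many statements concurrently.

```python
def fifo_waits(exec_times):
    """Toy model of one server that runs queued statements one at a time.

    Assumes every statement arrives at the same moment and is served in
    arrival order; returns how long each one waits before it starts.
    """
    waits = []
    elapsed = 0.0
    for t in exec_times:
        waits.append(elapsed)  # time spent waiting for the statements ahead
        elapsed += t           # this statement then occupies the server
    return waits


print(fifo_waits([2.0, 2.0, 2.0, 2.0]))  # [0.0, 2.0, 4.0, 6.0]
print(fifo_waits([1.0, 1.0, 1.0, 1.0]))  # [0.0, 1.0, 2.0, 3.0]
```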

MySQL has two locking mechanisms, table locking and row locking, and they exist to keep data consistent. For example, if two SQL statements need to modify the same row of the same table, what should happen? Can both statements modify that row at the same time?

Obviously not. MySQL handles this with table locking (the MyISAM storage engine) or with row locking (the InnoDB storage engine). Table locking means that while one statement is writing to a table, every other statement must wait until it has finished with the whole table. Row locking works the same way at row level: other statements must wait until the first one has finished with that particular row. When a table holds too much data, each statement runs longer and everyone else waits longer, which is why we split tables.
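
To make the difference in lock granularity concrete, here is a toy exclusive-lock manager in Python. The class LockManager and its methods are hypothetical; the sketch ignores shared (read) locks, deadlocks and wait queues, and only shows which requests would be blocked under table locking versus row locking. It is not how MyISAM or InnoDB are implemented.

```python
class LockManager:
    """Toy exclusive-lock manager with a configurable granularity.

    granularity="table": one lock covers every row (MyISAM-style).
    granularity="row":   each row id has its own lock (InnoDB-style).
    """

    def __init__(self, granularity):
        if granularity not in ("table", "row"):
            raise ValueError("granularity must be 'table' or 'row'")
        self.granularity = granularity
        self.holders = {}  # lock key -> name of the transaction holding it

    def _key(self, row_id):
        # Table locking maps every row to the same key.
        return "whole_table" if self.granularity == "table" else row_id

    def try_lock(self, txn, row_id):
        """Return True if txn now holds the lock, False if it must wait."""
        key = self._key(row_id)
        holder = self.holders.get(key)
        if holder is None or holder == txn:
            self.holders[key] = txn
            return True
        return False

    def release_all(self, txn):
        """Drop every lock held by txn (e.g. when its statement finishes)."""
        self.holders = {k: v for k, v in self.holders.items() if v != txn}


table = LockManager("table")
row = LockManager("row")
for lm in (table, row):
    lm.try_lock("A", row_id=1)  # statement A is updating row 1

print(table.try_lock("B", row_id=2))  # False: B waits for the whole table
print(row.try_lock("B", row_id=2))    # True: a different row is free
print(row.try_lock("C", row_id=1))    # False: same row, C must wait
```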

Second, how to split tables

1. Build a MySQL cluster, for example with MySQL Cluster, MySQL Proxy, MySQL replication, DRBD, etc.

Some people will ask what a MySQL cluster has to do with splitting tables. A cluster does not split tables in the literal sense, but it has a similar effect. What is the point of a cluster? To reduce the load on each database; put bluntly, to reduce the number of SQL statements waiting in each server's queue.

For example, if 10 SQL requests all go into the queue of a single database server, they will wait a long time. If the same 10 requests are spread across the queues of 5 database servers, each queue holds only 2 of them, so the waiting time drops sharply. This much is obvious.
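
The 10-requests-over-5-servers arithmetic can be sketched in a few lines of Python. This assumes round-robin assignment and that every request takes the same time (1 second here); both the functions distribute and longest_wait and those numbers are illustrative assumptions, not how MySQL Proxy or any other tool actually balances load.

```python
def distribute(requests, n_servers):
    """Spread requests over n_servers queues in round-robin order."""
    queues = [[] for _ in range(n_servers)]
    for i, req in enumerate(requests):
        queues[i % n_servers].append(req)
    return queues


def longest_wait(queues, exec_time):
    """Worst queueing delay if every request takes exec_time seconds
    and each server runs one request at a time."""
    return max((len(q) - 1) * exec_time for q in queues if q)


requests = [f"sql_{i}" for i in range(10)]
print([len(q) for q in distribute(requests, 5)])  # [2, 2, 2, 2, 2]
print(longest_wait(distribute(requests, 1), 1.0))  # 9.0
print(longest_wait(distribute(requests, 5), 1.0))  # 1.0
```

The 1 second each statement takes is the same in both cases: a cluster shortens the queue, not the statement itself.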

Advantages: good scalability, and no complicated multi-table logic in the application (PHP) code.

Disadvantages: the amount of data in each table does not change, so a single operation takes as long as before, and the hardware cost is high.

2. Estimate in advance which tables will hold large amounts of data and be accessed frequently, and split each of them into several tables

Such estimates are not hard to make. A forum's posts table will grow very large over time, to hundreds of thousands or millions of rows. In a chat room's messages table, a few dozen people chatting for one night already produce many rows, and over a long period that table is bound to become huge. There are many cases like this. So for tables whose growth can be predicted, we split them into N tables in advance, where N depends on the actual situation. Take the chat message table as an example:

I created 100 such tables in advance: message_00, message_01, message_02, ..., message_98, message_99. The user's ID then determines which table that user's chat messages go into. The table can be chosen with a hash, with the remainder of a division (modulo), or in many other ways; pick what suits you. The following uses a hash to get the table name:

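<?php
// Map a user ID to one of the pre-created tables message_00 ... message_99
function get_hash_table($table, $userid) {
    $str = crc32($userid);  // CRC32 checksum of the user ID
    if ($str < 0) {
        // crc32() can return a negative number on 32-bit PHP:
        // use "0" followed by the first digit of its absolute value
        $hash = "0" . substr(abs($str), 0, 1);
    } else {
        // otherwise use the first two digits of the checksum
        $hash = substr($str, 0, 2);
    }

    return $table . "_" . $hash;
}

echo get_hash_table('message', 'user18991'), PHP_EOL; // The result is message_10
echo get_hash_table('message', 'user34523'), PHP_EOL; // The result is message_13
?>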

In other words, the function above puts user18991's messages in the message_10 table and user34523's messages in message_13. When reading, simply read each user's messages from that user's own table.
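
The remainder (modulo) method mentioned above can be sketched in Python, assuming the same 100 tables: take the CRC32 of the user ID modulo 100 and zero-pad it to two digits. Unlike the first-two-digits rule, this can reach every table from message_00 to message_99 and should spread users fairly evenly. On 64-bit PHP, where crc32() never returns a negative number, the first-two-digits rule never picks message_00 to message_09. The function name get_mod_table is hypothetical. Its table numbers will generally differ from those of get_hash_table, so do not mix the two rules on the same data.

```python
import zlib

N_TABLES = 100  # assumption: tables message_00 .. message_99


def get_mod_table(table: str, userid: str, n_tables: int = N_TABLES) -> str:
    """Remainder method: CRC32 of the user ID modulo the number of tables.

    zlib.crc32 returns an unsigned value in Python 3, so the remainder is
    always in range(n_tables); it is zero-padded to two digits.
    """
    bucket = zlib.crc32(userid.encode("utf-8")) % n_tables
    return f"{table}_{bucket:02d}"


print(get_mod_table("message", "user18991"))
print(get_mod_table("message", "user34523"))
```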

Advantages: no single table accumulates millions of rows, so each SQL statement runs faster.

Disadvantages: once a routing rule is in place, changing it is very troublesome. In the example above I used the crc32 hash. If I later switched to md5, the same user's messages would be stored in different tables before and after the switch, and the data would be scrambled. Scalability is poor.
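
To see how much data a rule change would strand, this Python sketch counts how many users would be routed to a different table after switching from crc32 to md5. It assumes the remainder method over 100 tables for both rules and uses the first 8 hex digits of the MD5 digest; the function names route_crc32 and route_md5 are hypothetical. Two unrelated hashes agree on the table only rarely, so expect the vast majority of users to move, and their old messages would be read from the wrong table unless every row were migrated.

```python
import hashlib
import zlib


def route_crc32(userid: str, n: int = 100) -> int:
    """Current rule: CRC32 of the user ID, modulo n."""
    return zlib.crc32(userid.encode("utf-8")) % n


def route_md5(userid: str, n: int = 100) -> int:
    """New rule: first 8 hex digits of the MD5 digest, modulo n."""
    digest = hashlib.md5(userid.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % n


users = [f"user{i}" for i in range(10_000)]
moved = sum(route_crc32(u) != route_md5(u) for u in users)
print(f"{moved} of {len(users)} users would now be looked up in a different table")
```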

3. Use the MERGE storage engine to split tables

I think this method suits situations nobody planned for in advance, where queries have already become slow. Splitting an existing large table at that point is painful, and the most painful part is changing the code, because the SQL statements in the program are already written. If one table now has to become dozens or even hundreds of tables, does every SQL statement have to be rewritten? Here is an example.

If you run SHOW ENGINES; in the mysql client, you will see an engine named MRG_MYISAM; that is the MERGE engine. Note that the session below uses the old TYPE=MERGE table option. TYPE was removed in MySQL 5.5, so on current versions write ENGINE=MERGE instead.

mysql> CREATE TABLE IF NOT EXISTS `user1` (  
->   `id` int(11) NOT NULL AUTO_INCREMENT,  
->   `name` varchar(50) DEFAULT NULL,  
->   `sex` int(1) NOT NULL DEFAULT '0',  
->   PRIMARY KEY (`id`)  
-> ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;  
Query OK, 0 rows affected (0.05 sec)  
 
mysql> CREATE TABLE IF NOT EXISTS `user2` (  
->   `id` int(11) NOT NULL AUTO_INCREMENT,  
->   `name` varchar(50) DEFAULT NULL,  
->   `sex` int(1) NOT NULL DEFAULT '0',  
->   PRIMARY KEY (`id`)  
-> ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;  
Query OK, 0 rows affected (0.01 sec)  
 
mysql> INSERT INTO `user1` (`name`, `sex`) VALUES('张映', 0);  
Query OK, 1 row affected (0.00 sec)  
 
mysql> INSERT INTO `user2` (`name`, `sex`) VALUES('tank', 1);  
Query OK, 1 row affected (0.00 sec)  
 
mysql> CREATE TABLE IF NOT EXISTS `alluser` (  
->   `id` int(11) NOT NULL AUTO_INCREMENT,  
->   `name` varchar(50) DEFAULT NULL,  
->   `sex` int(1) NOT NULL DEFAULT '0',  
->   INDEX(id)  
-> ) TYPE=MERGE UNION=(user1,user2) INSERT_METHOD=LAST AUTO_INCREMENT=1 ;  
Query OK, 0 rows affected, 1 warning (0.00 sec)  
 
mysql> select id,name,sex from alluser;  
+----+--------+-----+  
| id | name   | sex |  
+----+--------+-----+  
|  1 | 张映   |   0 |
|  1 | tank   |   1 |  
+----+--------+-----+  
2 rows in set (0.00 sec)  
 
mysql> INSERT INTO `alluser` (`name`, `sex`) VALUES('tank2', 0);  
Query OK, 1 row affected (0.00 sec)  
 
mysql> select id,name,sex from user2  
-> ;  
+----+-------+-----+  
| id | name  | sex |  
+----+-------+-----+  
|  1 | tank  |   1 |  
|  2 | tank2 |   0 |  
+----+-------+-----+  
2 rows in set (0.00 sec)

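Did you notice something in the session above? Selecting from alluser returns the rows of both user1 and user2, and inserting into alluser wrote the new row into user2, the last table in the UNION list, because of INSERT_METHOD=LAST. Now suppose I have a table named user with 500,000 rows that I want to split into two tables, user1 and user2, with 250,000 rows each: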

INSERT INTO user1 (id, name, sex)
SELECT id, name, sex FROM user WHERE id <= 250000;
INSERT INTO user2 (id, name, sex)
SELECT id, name, sex FROM user WHERE id > 250000;
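
With dozens of sub-tables, writing these statements by hand gets tedious. A minimal Python sketch that only generates the SQL text for range-based INSERT ... SELECT statements, assuming ids run roughly from 1 to max_id and the sub-tables are named user1, user2, and so on; the function split_statements and that naming are illustrative assumptions.

```python
def split_statements(source, columns, max_id, n_tables):
    """Generate INSERT ... SELECT statements that copy rows of `source`
    into tables named source1 .. sourceN by contiguous id ranges.

    Assumes ids run from 1 to max_id; the last table also takes any id
    above max_id so no row is left behind.
    """
    cols = ", ".join(columns)
    size = -(-max_id // n_tables)  # ceiling division: ids per table
    stmts = []
    for i in range(n_tables):
        low, high = i * size, (i + 1) * size
        if i == n_tables - 1:
            cond = f"id > {low}"
        elif i == 0:
            cond = f"id <= {high}"
        else:
            cond = f"id > {low} AND id <= {high}"
        stmts.append(
            f"INSERT INTO {source}{i + 1} ({cols})\n"
            f"SELECT {cols} FROM {source} WHERE {cond};"
        )
    return stmts


for s in split_statements("user", ["id", "name", "sex"], 500_000, 2):
    print(s)
```

The sketch only prints SQL. The target tables must already exist, and for a MERGE table they must be MyISAM tables with identical structure.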

This splits the user table into two tables. But now there is a problem: what happens to the SQL statements in the code? There used to be one table and now there are two, so the code would need extensive changes, which means a lot of work for programmers. Is there a better way?

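The solution is to back up the original user table and then drop it. Earlier I created a MERGE table called alluser; just rename alluser to user. Because the MERGE table accepts the same SELECT and INSERT statements as the original table, the SQL in the code does not need to change. However, not every MySQL operation works on a MERGE table: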
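
With many sub-tables, typing the UNION list and the rename by hand is error-prone. A minimal Python sketch that only builds the SQL text, assuming MyISAM sub-tables named user1, user2, and so on; the helpers merge_ddl and swap_in and the backup name user_backup are hypothetical. Renaming the old table instead of dropping it keeps it as the backup, and one RENAME TABLE statement swaps both names. You can drop user_backup later once you are satisfied.

```python
def merge_ddl(name, column_defs, parts, insert_method="LAST"):
    """Build a CREATE TABLE ... ENGINE=MERGE statement over the MyISAM tables in `parts`."""
    cols = ",\n  ".join(column_defs)
    union = ",".join(parts)
    return (
        f"CREATE TABLE {name} (\n  {cols}\n) "
        f"ENGINE=MERGE UNION=({union}) INSERT_METHOD={insert_method};"
    )


def swap_in(merge_name, old_name, backup_name):
    """Keep the old table under a backup name and give the MERGE table its name."""
    return f"RENAME TABLE {old_name} TO {backup_name}, {merge_name} TO {old_name};"


columns = [
    "id int(11) NOT NULL AUTO_INCREMENT",
    "name varchar(50) DEFAULT NULL",
    "sex int(1) NOT NULL DEFAULT '0'",
    "INDEX(id)",  # a plain index, as in the session above
]
print(merge_ddl("alluser", columns, ["user1", "user2"]))
print(swap_in("alluser", "user", "user_backup"))
```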

a. If you use ALTER TABLE to change a MERGE table to another storage engine, the mapping to the underlying tables is lost. Instead, the rows from the underlying MyISAM tables are copied into the altered table, which then uses the new storage engine.

b. I read online that REPLACE does not work on MERGE tables. I tried it, and it worked:

mysql> UPDATE alluser SET sex=REPLACE(sex, 0, 1) where id=2;  
Query OK, 1 row affected (0.00 sec)  
Rows matched: 1  Changed: 1  Warnings: 0  
 
mysql> select * from alluser;  
+----+--------+-----+  
| id | name   | sex |  
+----+--------+-----+  
|  1 | 张映   |   0 |
|  1 | tank   |   1 |  
|  2 | tank2  |   1 |  
+----+--------+-----+  
3 rows in set (0.00 sec)

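Note, however, that this test uses the REPLACE() string function inside an UPDATE statement. The limitation described in the MySQL manual concerns the REPLACE statement: it can detect a duplicate key only in the underlying table it is about to write to (chosen by INSERT_METHOD), so it may add a duplicate row instead of replacing the existing one.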

c. A MERGE table cannot maintain unique constraints over the whole table. When you perform an INSERT, the data goes into the first or last MyISAM table (depending on the INSERT_METHOD option). MySQL ensures that unique key values remain unique within that MyISAM table, but not across all the tables in the collection.
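
This is also why the earlier SELECT from alluser showed two rows with id 1. A toy Python model of the rule follows. The class MergeTable is hypothetical, heavily simplified, and not MySQL code. It rejects a duplicate id only within the table being written to, so the same id can appear once in each underlying table.

```python
class MergeTable:
    """Toy model of a MERGE table.

    Each underlying table enforces a unique id only among its own rows;
    inserts through the MERGE table go to the first or last underlying
    table, mimicking INSERT_METHOD=FIRST / LAST.
    """

    def __init__(self, parts, insert_method="LAST"):
        self.order = list(parts)
        self.parts = {name: {} for name in self.order}  # table -> {id: name}
        self.insert_method = insert_method

    def insert(self, row_id, name):
        target = self.order[-1] if self.insert_method == "LAST" else self.order[0]
        rows = self.parts[target]
        if row_id in rows:  # uniqueness is checked in the target table only
            raise KeyError(f"duplicate id {row_id} in {target}")
        rows[row_id] = name

    def select_all(self):
        """Rows of every underlying table, in UNION order."""
        return [(i, n) for t in self.order for i, n in self.parts[t].items()]


alluser = MergeTable(["user1", "user2"])
alluser.parts["user1"][1] = "Zhang Ying"  # row inserted directly into user1
alluser.insert(1, "tank")                 # goes to user2: no error
print(alluser.select_all())               # [(1, 'Zhang Ying'), (1, 'tank')]
```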

d. When you create a MERGE table, MySQL does not check that the underlying tables exist or have the same structure. When the MERGE table is used, MySQL checks that the row length of each mapped table is equal, but this check is not very reliable. If you create a MERGE table from dissimilar MyISAM tables, you are very likely to run into strange problems.
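
Since MySQL does not perform this check at CREATE time, you may want to compare the definitions yourself first. A minimal sketch, assuming you have already fetched each table's column list, for example from information_schema.COLUMNS. The function check_merge_parts and its input format are hypothetical. A real check should also compare indexes and the storage engine.

```python
def check_merge_parts(schemas):
    """Sanity-check tables before putting them in a MERGE UNION.

    `schemas` maps each table name to its list of (column, type) pairs,
    or to None if the table does not exist. Returns a list of problems;
    an empty list means the column definitions match.
    """
    problems = [f"{t} does not exist" for t, cols in schemas.items() if cols is None]
    present = {t: cols for t, cols in schemas.items() if cols is not None}
    if not present:
        return problems
    ref_name, ref_cols = next(iter(present.items()))
    for t, cols in present.items():
        if cols != ref_cols:
            problems.append(f"{t} differs from {ref_name}")
    return problems


cols = [("id", "int"), ("name", "varchar(50)"), ("sex", "int")]
print(check_merge_parts({"user1": cols, "user2": list(cols)}))
# []
wider = [("id", "int"), ("name", "varchar(100)"), ("sex", "int")]
print(check_merge_parts({"user1": cols, "user2": wider, "user3": None}))
# ['user3 does not exist', 'user2 differs from user1']
```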

I read about c and d online but have not tested them; try them yourself.

Advantages: good scalability, and the application code barely changes.

Disadvantages: this method is less effective than method 2.

Third, summary

Of the three methods above, I have actually used two in practice: the first and the second. I have not used the third, which is why I described it in more detail. Ha ha. Everything has its limits, and overdoing anything backfires. Don't just keep adding database servers to a cluster, because hardware costs money. And don't split a table into 1,000 tables either. In the end, MySQL stores data on disk as files. Each MyISAM table corresponds to three files, so 1,000 sub-tables mean 3,000 files, and lookups become very slow as well. My suggestions are:

Combine method 1 and method 2 to split tables

Combine method 1 and method 3 to split tables

These two suggestions suit different situations, so choose according to your own circumstances. I expect many people will choose the combination of method 1 and method 3.

Author: Zhang Ying (张映)
Original article: http://blog.51yip.com/mysql/949.html
