[MySQL] Performance Tuning (5): Architecture: Clusters and Database/Table Sharding

1. Database cluster

When a single database server can no longer handle the access load, we can move to a database cluster. Any cluster faces the problem of keeping data consistent between nodes: if several database nodes are read and written at the same time, how do we keep the data on all of them consistent?

1.1 Master-slave architecture

This is where replication comes in. The node whose data is copied is called the master, and the node that copies it is called the slave. A slave can itself serve as the data source for other nodes; this is called cascading replication.

How is master-slave replication implemented? Every update statement is recorded in the binlog, which is a logical log. The slave fetches the master's binlog, parses the events in it (SQL statements or row changes, depending on binlog_format), and replays them on itself, which keeps the master and slave data consistent.

Three threads are involved:

  1. Log dump thread: runs on the master and sends the binlog to the slave.
  2. I/O thread: runs on the slave, connects to the master to receive the binlog, and writes it to the relay log.
  3. SQL thread: runs on the slave, reads the relay log, and applies the changes to the slave's data.
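The division of labor between these three threads can be sketched as a toy pipeline. This is a single-process model built on simplifying assumptions: binlog events are plain (op, key, value) tuples, the slave's data is one dictionary, and run_replication is a hypothetical name, not a MySQL API.

```python
import queue
import threading

def run_replication(binlog_events):
    """Toy model of the dump thread -> I/O thread -> SQL thread pipeline."""
    network = queue.Queue()    # what the master's dump thread sends over the wire
    relay_log = queue.Queue()  # the slave's relay log
    slave_data = {}            # the slave's tables, reduced to one dict
    STOP = object()            # sentinel marking the end of the event stream

    def dump_thread():
        # Master side: ship every binlog event to the connected slave.
        for event in binlog_events:
            network.put(event)
        network.put(STOP)

    def io_thread():
        # Slave side: receive events and append them to the relay log.
        while True:
            event = network.get()
            relay_log.put(event)
            if event is STOP:
                break

    def sql_thread():
        # Slave side: read the relay log in order and apply each event.
        while True:
            event = relay_log.get()
            if event is STOP:
                break
            op, key, value = event
            if op == "set":
                slave_data[key] = value
            elif op == "delete":
                slave_data.pop(key, None)

    threads = [threading.Thread(target=fn) for fn in (dump_thread, io_thread, sql_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slave_data

print(run_replication([("set", 1, "a"), ("set", 2, "b"), ("delete", 1, None)]))  # {2: 'b'}
```

Because both queues are FIFO and a single SQL thread drains the relay log, the slave applies events in exactly the master's order. That ordering guarantee is also why the single SQL thread becomes a bottleneck under heavy write load.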

1.2 Read and write separation

Once master-slave replication is in place, we write data only to the master and spread read requests across the slaves. This scheme is called read-write separation.

PS: How are master-slave replication and read-write separation related? In MySQL the two are closely connected:

  1. Master-slave replication must be deployed first; read-write separation can only be built on top of it.
  2. Typically, once data is synchronized through master-slave replication, read-write separation is used to raise the database's capacity for concurrent load.

Read-write separation means writing only on the MySQL master and reading only from the MySQL slaves. The basic idea is to let the master handle transactional statements (INSERT, UPDATE, DELETE) while the slaves handle SELECT queries; replication then propagates the changes made on the master to the other databases in the cluster.
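At the application or middleware level, the routing rule can be sketched as follows. This is only an illustration: ReadWriteRouter and the node names are hypothetical placeholders for real connection pools, and the classification is deliberately naive (it looks only at the leading keyword and a trailing FOR UPDATE).

```python
import itertools

class ReadWriteRouter:
    """Sends plain SELECTs to the slaves in round-robin order and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over the slaves

    def route(self, sql):
        # Normalize case, whitespace, and a trailing semicolon before classifying.
        normalized = " ".join(sql.lower().strip().rstrip(";").split())
        # SELECT ... FOR UPDATE takes row locks, so it must also run on the master.
        is_plain_read = normalized.startswith("select ") and not normalized.endswith("for update")
        return next(self._slaves) if is_plain_read else self.master

router = ReadWriteRouter("master", ["slave1", "slave2"])
for sql in ["INSERT INTO t VALUES (1)", "SELECT * FROM t", "SELECT * FROM t",
            "SELECT * FROM t WHERE id = 1 FOR UPDATE;"]:
    print(router.route(sql))  # master, slave1, slave2, master
```

A real router would also send every statement inside an explicit transaction to the master. This sketch does not track transactions.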

Read-write separation reduces the load on each database server to some extent, but master-slave data consistency needs special attention. Suppose we write to the master and immediately query a slave before the change has been replicated. The slave then returns stale data. What should we do in that case?

Problem

Where does master-slave replication slow down? In early MySQL versions, the slave's SQL thread was single-threaded. The master can execute SQL statements in parallel, up to the configured maximum number of connections, while the slave has to apply them one at a time from a single queue. When the master is under heavy concurrent load, replication will inevitably lag.

Why can't the slave's SQL thread simply apply statements in parallel? Suppose the master executes several statements in sequence: a user posts a comment, then edits it, and finally deletes it. These three statements must not be reordered on the slave:

insert into user_comments(id, content) values (10000009, 'nice');
update user_comments set content = 'verygood' where id = 10000009;
delete from user_comments where id = 10000009;
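To see why the order matters, the three statements can be modeled as operations on a dictionary keyed by id. This is a simplification of the user_comments table, and apply_statements is a hypothetical helper. Replaying the statements in the master's order leaves no row. Replaying them in reverse leaves a row that should not exist.

```python
def apply_statements(ops):
    """Apply (op, id, content) tuples to an empty 'table' and return the final state."""
    table = {}
    for op, row_id, content in ops:
        if op == "insert":
            table[row_id] = content
        elif op == "update":
            if row_id in table:  # UPDATE ... WHERE id = ? matches nothing if the row is absent
                table[row_id] = content
        elif op == "delete":
            table.pop(row_id, None)
    return table

master_order = [
    ("insert", 10000009, "nice"),
    ("update", 10000009, "verygood"),
    ("delete", 10000009, None),
]
print(apply_statements(master_order))                  # {}
print(apply_statements(list(reversed(master_order))))  # {10000009: 'nice'}
```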

How do we solve this? In other words, how do we reduce master-slave replication lag?

1.3 Solutions to replication lag

First, note that MySQL replicates asynchronously by default. For the master, the transaction is finished once the binlog is written, and the result is returned to the client. The master does not wait for the slaves, and it does not check whether a slave has received or applied the data.

To make sure the slaves are up to date, could the master wait until the transaction has completed on every slave before returning to the client? This is called fully synchronous replication: the master returns only after every slave has written the data. It guarantees that the data has been replicated before anyone reads it. The side effect is obvious, though: each transaction takes longer, which degrades the master's performance.

Is there a better way, one that reduces slave lag without significantly increasing the time the master takes to return to the client?

1. Master-slave connection mode: semi-synchronous replication

Between asynchronous and fully synchronous replication there is a third option: semi-synchronous replication. How does it work?

After executing a transaction submitted by the client, the master does not return immediately. It waits until at least one slave has received the binlog and written it to its relay log, and only then returns to the client. The master does not wait long. By the time it returns, the data has already reached at least one slave, and only the last step remains there: reading the relay log and applying it to the slave's database.
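The three modes differ in what the master waits for before acknowledging a COMMIT. The toy latency model below uses made-up millisecond figures, not measurements. commit_latency is a hypothetical helper, and its wait_for_count parameter mirrors the idea behind MySQL's rpl_semi_sync_master_wait_for_slave_count setting (default 1).

```python
def commit_latency(mode, local_ms, relay_ms, apply_ms, wait_for_count=1):
    """Time the client waits for COMMIT under a simplified model.

    relay_ms[i]: time for slave i to receive the binlog and write its relay log.
    apply_ms[i]: additional time for slave i to apply the transaction.
    """
    if mode == "async":
        # The master returns as soon as its own commit is done.
        return local_ms
    if mode == "semi-sync":
        # Wait until `wait_for_count` slaves have written the event to their relay log.
        return local_ms + sorted(relay_ms)[wait_for_count - 1]
    if mode == "full-sync":
        # Wait until every slave has received AND applied the transaction.
        return local_ms + max(r + a for r, a in zip(relay_ms, apply_ms))
    raise ValueError(f"unknown mode: {mode}")

relay = [2, 5, 9]      # made-up per-slave relay-log write times
apply_ = [4, 4, 30]    # made-up per-slave apply times
for mode in ("async", "semi-sync", "full-sync"):
    print(mode, commit_latency(mode, 3, relay, apply_))
# async 3
# semi-sync 5
# full-sync 42
```

Semi-sync pays only for the fastest slave's relay-log write. Full sync pays for the slowest slave's complete apply, which is why it hurts the master most.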

To use semi-synchronous replication, a plug-in must be installed. It originated from a contribution by a Google engineer and is already present in MySQL's plug-in directory:

cd /usr/lib64/mysql/plugin/

The master and the slave use different plug-ins. Each must be installed and then enabled:

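-- Run on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
set global rpl_semi_sync_master_enabled=1;
show variables like '%semi_sync%';

-- Run on the slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
set global rpl_semi_sync_slave_enabled=1;
-- If replication is already running, restart the I/O thread so the slave registers as semi-synchronous
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
show global variables like '%semi%';

-- Note: MySQL 8.0.26 and later also ship renamed plug-ins (semisync_source.so / semisync_replica.so)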

Compared with asynchronous replication, semi-synchronous replication makes the data safer, but it also adds some latency: the master must wait for a slave to write its relay log, which costs an extra network round trip. Semi-synchronous replication is therefore best suited to low-latency networks.

This approach secures the slave's copy of the data through the connection between master and slave. Now consider the problem from another angle. To reduce replication lag caused by waiting on SQL execution, can the slave execute multiple SQL statements in parallel instead of queuing them?

2. Parallel replication on the slave: multi-database parallel replication and GTID-based replication

How can parallel replication be implemented? Suppose three statements each operate on a different database. Running them concurrently cannot cause conflicts, and their order does not matter. So when statements touch three different databases, the slave can apply them with concurrent SQL threads. This is the multi-database parallel replication supported since MySQL 5.6.
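Here is a minimal sketch of per-database parallel apply. It assumes events arrive as (database, statement) pairs in the master's binlog order. One worker thread per database keeps that database's statements in order, while different databases proceed concurrently. apply_per_database is a hypothetical name. In MySQL 5.6 this behavior is enabled by setting slave_parallel_workers to a value greater than 0.

```python
import threading
from collections import defaultdict

def apply_per_database(events):
    """Apply (database, statement) events with one worker thread per database."""
    by_db = defaultdict(list)
    for db, stmt in events:      # split the ordered stream into per-database queues
        by_db[db].append(stmt)

    applied = {}                 # database -> statements in the order they were applied
    lock = threading.Lock()

    def worker(db, stmts):
        for stmt in stmts:       # a single worker per database preserves its order
            with lock:
                applied.setdefault(db, []).append(stmt)

    workers = [threading.Thread(target=worker, args=(db, stmts)) for db, stmts in by_db.items()]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return applied

events = [("db1", "s1"), ("db2", "s2"), ("db1", "s3"), ("db3", "s4"), ("db2", "s5")]
print(apply_per_database(events))  # per database: db1 [s1, s3], db2 [s2, s5], db3 [s4]
```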

In most cases, however, we have a single database with many tables. How can we replicate in parallel within one database? Put differently: the database already runs many transactions at once, so why can these transactions run in parallel on the master without causing problems?

Because they do not interfere with each other. They operate on different tables or different rows, so there is no resource contention and no data conflict. Transactions that ran in parallel on the master can therefore also run in parallel on the slave. For example, if three transactions on the master operate on three different tables at the same time, the slave can apply those three in parallel as well.

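So we can put transactions that ran in parallel on the master into one group and give the group a number. The slave can then apply the transactions of one group in parallel. In MySQL 5.7 this grouping comes from the master's group commit. Each transaction in the binlog carries last_committed and sequence_number values. Setting slave_parallel_type=LOGICAL_CLOCK (with slave_parallel_workers > 0) on the slave lets it apply transactions from the same group concurrently. This is usually combined with GTID (Global Transaction Identifiers), which assigns every transaction a globally unique ID. Master-slave replication that tracks progress by these IDs is called GTID-based replication.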
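The group-by-group idea can be sketched as follows. The sketch assumes each transaction arrives tagged with a group number in commit order; the group number is a hypothetical stand-in for the commit-group information the master records in the binlog. Transactions within a group run concurrently on a thread pool, and the next group starts only after the current one has finished. apply_in_groups is a hypothetical name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

def apply_in_groups(transactions, apply_fn, max_workers=4):
    """transactions: (group_no, txn) pairs in the master's commit order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # groupby merges only *consecutive* equal keys, which matches commit order.
        for _, members in groupby(transactions, key=lambda pair: pair[0]):
            batch = [txn for _, txn in members]
            # list() blocks until every transaction in the batch is done (and re-raises errors),
            # so the next group never overlaps with this one.
            list(pool.map(apply_fn, batch))

# Usage: record the order in which transactions were applied.
applied = []
lock = threading.Lock()

def record(txn):
    with lock:
        applied.append(txn)

apply_in_groups([(1, "t1"), (1, "t2"), (1, "t3"), (2, "t4"), (2, "t5"), (3, "t6")], record)
print(applied)  # t1..t3 in some order, then t4..t5 in some order, then t6
```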

GTID is disabled by default. It can be enabled through configuration parameters (gtid_mode together with enforce_gtid_consistency). Check the current setting with:

show global variables like 'gtid_mode';
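Once GTID is enabled, each transaction is identified as source_uuid:transaction_id. A server reports the transactions it has executed as a GTID set, for example in the gtid_executed variable, in a form like 3E11FA47-71CA-11E1-9E33-C80AA9429562:1-3:11:47-49. The helpers below are a hypothetical sketch. They parse this untagged textual format and check whether a given GTID is already included, which is the kind of check that lets a slave skip transactions it has already applied.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-3:11,uuid2:1-5' into {uuid: [(start, end), ...]} (untagged format only)."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))  # a single number is a one-element range
    return result

def contains(gtid_set, gtid):
    """True if the single GTID 'uuid:n' falls inside one of the parsed ranges."""
    uuid, _, txn = gtid.rpartition(":")
    return any(lo <= int(txn) <= hi for lo, hi in gtid_set.get(uuid.lower(), []))

executed = parse_gtid_set("3E11FA47-71CA-11E1-9E33-C80AA9429562:1-3:11:47-49")
print(contains(executed, "3E11FA47-71CA-11E1-9E33-C80AA9429562:2"))  # True
print(contains(executed, "3E11FA47-71CA-11E1-9E33-C80AA9429562:5"))  # False
```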

Summary

Optimizing the connection between master and slave and letting the slave execute SQL in parallel both tackle master-slave replication lag at the database level.

Beyond the database itself, there are also application-level methods for reducing master-slave lag. Moreover, even with master-slave replication in place, single-table query performance still degrades if a single master node or a single table holds too much data, for example a table with hundreds of millions of rows. The remedy is to classify the data and split it across database nodes. This is what splitting databases and tables (sharding) means.

2. Splitting databases and tables (sharding)

Vertical database splitting reduces concurrency pressure; horizontal table splitting solves the storage bottleneck.

1) Vertical database splitting

Split one database into several databases by business domain.
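In application code, a vertical split often comes down to a table-to-database lookup. The mapping below is purely illustrative: the table names, the database names, and database_for are all hypothetical.

```python
# Hypothetical mapping: after a vertical split, each business domain's tables
# live in their own database.
TABLE_TO_DATABASE = {
    "user": "user_db",
    "user_address": "user_db",
    "orders": "order_db",
    "order_item": "order_db",
    "product": "product_db",
}

def database_for(table):
    """Return the database that owns `table`, or raise if the table is unmapped."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"table {table!r} is not mapped to any database") from None

print(database_for("orders"))  # order_db
```

One consequence is that tables placed on different database servers can no longer be joined in a single SQL query. Such joins have to move into application code.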

2) Horizontal splitting of databases and tables

Distribute the data of a single table across multiple databases (and tables) according to a routing rule.
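A common routing rule is modulo over a numeric shard key. The sketch below assumes a hypothetical layout of 2 databases with 4 tables each (8 physical tables). The names order_db_N, t_order_N, and locate are illustrative only.

```python
DB_COUNT = 2        # hypothetical: two databases
TABLES_PER_DB = 4   # hypothetical: four tables in each database

def locate(user_id):
    """Map a numeric shard key to a (database, table) pair by modulo over all physical tables."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    return f"order_db_{slot // TABLES_PER_DB}", f"t_order_{slot % TABLES_PER_DB}"

for user_id in (3, 8, 13):
    print(user_id, locate(user_id))
# 3 ('order_db_0', 't_order_3')
# 8 ('order_db_0', 't_order_0')
# 13 ('order_db_1', 't_order_1')
```

Because each row's location depends on the total table count, changing DB_COUNT or TABLES_PER_DB later remaps most keys and forces a data migration. This is what makes resharding expensive with a plain modulo rule.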

3. Achieving high availability

Master-slave replication and sharding both reduce the access and storage pressure on any single database node, and so improve database performance. But what happens if the master goes down? High availability (HA) is therefore also a prerequisite for high performance.

1) Master-slave replication: The traditional HAProxy + keepalived scheme is based on master-slave replication.

2) NDB Cluster: MySQL Cluster based on the NDB cluster storage engine. Reference link ...
3) Galera: a multi-master synchronous replication cluster solution.
4) MHA/MMM: MMM (Master-Master replication Manager for MySQL) is a multi-master high-availability architecture; companies such as Meituan used MMM extensively in their early days. MHA (MySQL Master High Availability) was developed by a Japanese engineer.

Both MMM and MHA provide a virtual IP (VIP) and monitor the master and slave nodes. When the master fails, a slave is promoted to master, and the data it is missing relative to the old master is filled in. The VIP is then pointed at the new master. Reference link...
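The promotion step can be sketched as follows. This is a deliberately simplified model, not how MMM or MHA are actually implemented. Replication progress is reduced to a single integer (real tools compare binlog/relay-log positions or GTIDs), and Node, fail_over, and the vip dictionary are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    applied_up_to: int  # last old-master event this node has applied (simplified position)

def fail_over(slaves, old_master_last_event, vip):
    """Promote the most up-to-date slave, list the events it must recover, and move the VIP."""
    # 1. Promote the slave that has applied the most data.
    new_master = max(slaves, key=lambda node: node.applied_up_to)
    # 2. Events it is missing and must recover, e.g. from the old master's binlog if reachable.
    missing = list(range(new_master.applied_up_to + 1, old_master_last_event + 1))
    # 3. Point the virtual IP at the new master so clients reconnect there.
    vip["target"] = new_master.name
    return new_master.name, missing

vip = {"target": "master"}
print(fail_over([Node("slave1", 95), Node("slave2", 98)], 100, vip))  # ('slave2', [99, 100])
print(vip)  # {'target': 'slave2'}
```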

5) MGR: MySQL Group Replication (MGR), introduced in MySQL 5.7.17. InnoDB Cluster builds on it and bundles Group Replication with MySQL Shell and MySQL Router. Reference link 1, reference link 2

In summary, the core problem an HA solution must solve is how to promote the slave with the most recent data to master when the master goes down. If multiple masters run at the same time, the solution must also handle data replication between the masters and connection routing for clients. Different solutions vary in implementation difficulty and in operation and maintenance cost.

That covers architecture-level optimization: start with clustering and with splitting databases and tables, and make sure to achieve high availability.

Source: blog.csdn.net/weixin_43935927/article/details/114002683