[Advanced] Detailed explanation of MySQL's read-write separation



0. Preface

Suppose your team develops and maintains an e-commerce platform. As the business grows and online shopping habits mature, millions of users now browse and purchase products on the platform every day. After analyzing the traffic, your team found that reads vastly outnumber writes, roughly 1000 to 1: for every order placed, there are about a thousand requests querying product details and courier tracking information.

Especially during peak hours, the server must handle a huge volume of read and write requests. If a single database server processes all of them, it may respond slowly, fail to accept order requests, or even crash outright.

At this point, as an experienced engineer, you should already be considering a read-write separation strategy: set up one master library to handle all write operations (such as order creation and user registration), and several slave libraries to handle read operations (such as product browsing and order queries). The pressure on the master database drops sharply, and because read requests are spread across multiple slaves, query response times improve as well. That is indeed how most people think about it. Like sharding (splitting databases and tables), read-write separation exists to cope with large data volumes, but once you have implemented it, you discover a series of new problems that need further handling. In this article, we focus on read-write separation.


For more on splitting databases and tables, please refer to my earlier blog post "Detailed Explanation of MySQL Sub-databases and Tables". In this article, we focus on the common implementations of MySQL read-write separation and how to solve the problems they introduce.

Whether to implement read-write separation needs to be weighed according to specific business requirements (such as system concurrency, user experience requirements, etc.) and system conditions (such as hardware resources, maintenance costs, etc.).

The rumored 20-million-row performance bottleneck in MySQL

If you have worked in development for three to five years, you have probably heard, from senior colleagues or online posts, of MySQL's single-table performance bottleneck: once a single table exceeds 20 million rows, performance supposedly drops significantly. This rumor has circulated for a long time and keeps being passed on; I do not think that is entirely a bad thing, at least where performance optimization is concerned. According to the unofficial history I have read, the claim is said to have originated at Baidu, was carried to other companies by former Baidu engineers, and gradually spread through the industry.


A long time ago, out of curiosity, I ran a test to verify this. On a machine with 8 cores, 16 GB of RAM, a mechanical hard disk, and a single table with 32 columns, once the table grew past 10 million rows, a single query took roughly 5 to 8 seconds without index optimization. After index optimization, it dropped to under 3 seconds. That can reasonably be called a performance bottleneck, and it appeared well before 20 million rows.

Best practices given by Alibaba

Alibaba's "Java Development Manual" recommends splitting once a single table exceeds 5 million rows or 2 GB in size. However, this value is not fixed; it depends on the MySQL configuration and the machine's hardware. When a table grows past a certain point, its indexes no longer fit in memory, so subsequent SQL queries generate disk IO and performance degrades. Upgrading the hardware may raise that ceiling.

Alibaba's "Java Development Manual" adds that if the data volume is not expected to reach this level within three years, you should not split databases and tables when creating the schema. Taking typical machine conditions into account, 5 million rows is a reasonable rule of thumb. In actual tests, a single InnoDB table with 8 million rows can already show poor query performance. MyISAM may query faster, but it falls short of InnoDB on data integrity and transaction support, so an appropriate optimization scheme should be chosen according to actual needs.

1. MySQL read and write separation

1. Why read-write separation is needed

In a business system, when a large volume of read operations puts growing pressure on the database and drags down the efficiency of write operations, read-write separation is called for. It effectively shares the load on the database, optimizes system performance, and improves data processing efficiency.

For example, on an e-commerce website, browsing products and viewing orders are read operations, while submitting and modifying orders are write operations. During a big promotion, user traffic surges and requests for product information spike. If all reads and writes hit one database, the pressure on it grows, write efficiency suffers, and order processing may be delayed.

At this time, read-write separation is required. Configure one master database for write operations and multiple slave databases for read operations. In this way, the read operation can be shared among multiple slave databases without affecting the write operation of the master database, thereby improving the processing efficiency of the system.

At the same time, read-write separation can also improve the availability of the system. When the primary database fails, it can immediately switch to the secondary database for read and write operations, reducing system downtime.

2. MySQL performance bottlenecks and the read-write separation scheme

As the preface already hinted, MySQL's performance bottlenecks fall mainly into two categories:

  1. IO bottleneck: disk read/write speed is far lower than memory speed, so a large number of disk IO operations becomes a performance bottleneck.
  2. CPU bottleneck: a large number of complex queries heavily occupies CPU resources and degrades database performance.

For these bottlenecks, read-write separation can be used to improve MySQL's performance:

Read-write separation is to separate the read and write operations of the database, which are processed by different database servers. In general, we will set up a master database (Master) for write operations, and multiple slave databases (Slave) for read operations. When new data is written, the master database will copy the data to each slave database.

  1. Reduce the pressure on a single database server: By distributing read and write operations to different servers, the load on a single database server can be effectively reduced and the response speed can be improved.
  2. Improve the concurrent processing capability of the database: In application scenarios with more reads and fewer writes, the concurrent processing capability of the database can be improved by increasing the number of slave databases.
  3. Improve data availability and security: When the primary database fails, it can quickly switch to the secondary database for read and write operations to ensure data availability. At the same time, since the data is distributed in multiple servers, the security of the data can be improved.

Frequently asked questions

Separation of reading and writing will also bring some problems, such as data synchronization delay, which may cause the read data to be out of date. Therefore, these factors need to be considered when implementing read-write separation.
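To make the stale-read risk concrete, here is a toy, self-contained Java simulation (not the project's code; all names are illustrative): writes land on the master immediately, but the slave copy only catches up when replication runs, so a read in between returns stale data.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of replication lag: the slave map only catches up when
// replicate() is called, so reads in between see stale (missing) data.
public class ReplicationLagDemo {
    private static final Map<String, String> master = new HashMap<>();
    private static final Map<String, String> slave = new HashMap<>();

    public static void write(String key, String value) {
        master.put(key, value);          // writes always go to the master
    }

    public static String read(String key) {
        return slave.get(key);           // reads always go to the slave
    }

    public static void replicate() {
        slave.putAll(master);            // stand-in for asynchronous catch-up
    }

    public static void main(String[] args) {
        write("order:1", "PAID");
        System.out.println(read("order:1")); // null -> slave has not caught up yet
        replicate();
        System.out.println(read("order:1")); // PAID
    }
}
```

Common mitigations are to route "read-your-own-write" queries to the master, or to wait for the slave to catch up before serving the read.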

Other common read-write separation scenarios

Then under what other circumstances do we need read-write separation?
Read-write separation is a common strategy in database architecture, mainly used to solve the following situations:

  1. High concurrency: when concurrent read and write requests are very high, a single database server may not withstand the pressure. Read-write separation disperses it: read requests are distributed across multiple slave libraries, while write requests go to the master.

  2. Resource optimization: read-write separation lets the master focus on write and transactional operations while the slaves handle reads, so hardware and system resources can be configured to match the different characteristics of master and slaves.

  3. Data backup and failover (an indirect benefit): read-write separation provides a real-time copy of the data, and when the master fails the system can quickly switch to a slave to keep the service available.

  4. Better query performance: complex query operations can be distributed across multiple slave libraries for execution.

2. Project practice

In our project, we implemented read-write separation with Java and MySQL. The goal was to keep the system stable and performant while accommodating rapid business growth and ever-increasing data volume. Here is how we did it:

1. Create a database copy

We created a master library (for write operations) and multiple slave libraries (for read operations) in MySQL. Both the master database and the slave database are independent database servers. Through the replication function of MySQL, the slave database can synchronize the data of the master database in real time.

2. Configure the data source

In the Java project, we configured two data sources, one connected to the main library and one connected to the slave library. We use Spring Boot's data source configuration to easily manage multiple data sources.

3. Implement database routing

We use MyBatis as our ORM framework. By extending Spring's AbstractRoutingDataSource, we implement a custom data source routing strategy: when a database operation runs, the type of operation (read or write) determines which data source handles it.

Define a DbContextHolder class to hold the database type for the current thread, then create a RoutingDataSource class that extends AbstractRoutingDataSource:
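The original post references DbContextHolder but never shows it, so here is a minimal sketch consistent with how it is used below (the MASTER fallback in getDbType is my assumption; the class body is not in the original):

```java
// Holds the chosen database type in a ThreadLocal so routing is per-thread.
public class DbContextHolder {
    public enum DbType { MASTER, SLAVE }

    private static final ThreadLocal<DbType> contextHolder = new ThreadLocal<>();

    public static void setDbType(DbType dbType) {
        contextHolder.set(dbType);
    }

    public static DbType getDbType() {
        // Assumption: default to the master when nothing was set for this thread
        DbType type = contextHolder.get();
        return type == null ? DbType.MASTER : type;
    }

    public static void clearDbType() {
        contextHolder.remove();
    }

    public static void main(String[] args) {
        setDbType(DbType.SLAVE);
        System.out.println(getDbType()); // SLAVE
        clearDbType();
        System.out.println(getDbType()); // MASTER (fallback)
    }
}
```

Because the value is thread-local, one request's routing choice never leaks into another thread.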

public class RoutingDataSource extends AbstractRoutingDataSource {

    @Override
    protected Object determineCurrentLookupKey() {
        // Return the key (MASTER or SLAVE) bound to the current thread
        return DbContextHolder.getDbType();
    }
}

Next, we wire the data sources together in a Spring configuration class:

@Configuration
public class DataSourceConfig {

    @Bean
    public DataSource masterDataSource() {
        // Configure the master (write) data source here
    }

    @Bean
    public DataSource slaveDataSource() {
        // Configure the slave (read) data source here
    }

    @Bean
    public DataSource routingDataSource(@Qualifier("masterDataSource") DataSource masterDataSource,
                                        @Qualifier("slaveDataSource") DataSource slaveDataSource) {
        Map<Object, Object> targetDataSources = new HashMap<>();
        targetDataSources.put(DbContextHolder.DbType.MASTER, masterDataSource);
        targetDataSources.put(DbContextHolder.DbType.SLAVE, slaveDataSource);

        RoutingDataSource routingDataSource = new RoutingDataSource();
        // Fall back to the master when no lookup key is set for the current thread
        routingDataSource.setDefaultTargetDataSource(masterDataSource);
        routingDataSource.setTargetDataSources(targetDataSources);
        return routingDataSource;
    }

    @Bean
    public SqlSessionFactory sqlSessionFactory(@Qualifier("routingDataSource") DataSource routingDataSource) throws Exception {
        SqlSessionFactoryBean sessionFactory = new SqlSessionFactoryBean();
        sessionFactory.setDataSource(routingDataSource);
        return sessionFactory.getObject();
    }
}

We can call DbContextHolder.setDbType(DbType) wherever we need to switch data sources.
With the basic logic in place, let's implement a dedicated annotation for read-write separation so it is convenient to use in code.
The custom @ReadOnly annotation marks a method or class that only reads data and never modifies it. It exists for the read-write-separated setup: when the data-source switching logic detects this annotation, it automatically routes the operation to a slave database. In a real system this switching code also needs additional branching and fault tolerance; what follows is my simplified version.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target({ElementType.METHOD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface ReadOnly {
}

To act on this annotation, we create an aspect (Aspect) that switches to the slave data source when a read operation is processed:

@Aspect
@Component
public class DataSourceAspect {

    @Around("@annotation(ReadOnly)")
    public Object setReadDataSourceType(ProceedingJoinPoint joinPoint) throws Throwable {
        try {
            // Route this method's database access to a slave
            DbContextHolder.setDbType(DbContextHolder.DbType.SLAVE);
            return joinPoint.proceed();
        } finally {
            // Always clear the setting so it does not leak to later work on this thread
            DbContextHolder.clearDbType();
        }
    }
}

4. Switch data source

We created the custom annotation ReadOnly, which can be applied to Service-layer methods to specify which database a method should use. We then implemented the AOP aspect shown above: when a method annotated with @ReadOnly is called, the aspect switches to the corresponding data source.
Apply @ReadOnly to any method that only reads, and the system will automatically switch to the slave data source while the method executes. For example, when getUserById is called, the slave data source is used.

@Service
public class UserService {

    @Autowired
    private UserDao userDao;

    @ReadOnly
    public User getUserById(int id) {
        return userDao.getUserById(id);
    }
}

Write operations need no special marking; the system uses the primary data source by default. When addUser is called, the primary data source is used. In this way, read requests are automatically routed to the secondary data source, improving read performance, while write requests go to the primary data source, preserving data consistency.

@Service
public class UserService {

    @Autowired
    private UserDao userDao;

    public void addUser(User user) {
        userDao.addUser(user);
    }
}

5. Load balancing

Since we have multiple slaves, we implemented a simple load balancing strategy to evenly distribute read requests to different slaves.
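The original post does not show the load-balancing code, so here is a hedged sketch of one common approach: a thread-safe round-robin over slave lookup keys (the key names are illustrative). determineCurrentLookupKey could return selector.next() whenever DbContextHolder reports SLAVE.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin selection over slave data-source keys (names are hypothetical).
public class RoundRobinSlaveSelector {
    private final List<String> slaveKeys;
    private final AtomicInteger counter = new AtomicInteger(0);

    public RoundRobinSlaveSelector(List<String> slaveKeys) {
        this.slaveKeys = slaveKeys;
    }

    // floorMod keeps the index valid even after the int counter overflows
    public String next() {
        int idx = Math.floorMod(counter.getAndIncrement(), slaveKeys.size());
        return slaveKeys.get(idx);
    }

    public static void main(String[] args) {
        RoundRobinSlaveSelector selector =
                new RoundRobinSlaveSelector(List.of("slave1", "slave2"));
        System.out.println(selector.next()); // slave1
        System.out.println(selector.next()); // slave2
        System.out.println(selector.next()); // slave1
    }
}
```

Weighted or least-connections strategies are also common when the slaves have uneven hardware.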

3. Reference documents

I recommend reading this very detailed article from Meituan:
"The Application of SQL Parser in Meituan" by Guangyou: https://tech.meituan.com/2018/05/20/sql-parser-used-in-mtdp.html

Origin: blog.csdn.net/wangshuai6707/article/details/132657321