Replication lag after read/write splitting: MySQL database solutions

Disclaimer: if you repost this article, please let me know on WeChat (457556886). Knowledge is meant to be shared. https://blog.csdn.net/wolf_love666/article/details/90444154

Background:
The monitoring chart shows a QPS of 10.73k. In practice, when a large volume of concurrent real-time data arrives, the highest QPS I have seen is close to 15k, while a single database instance in the current sharding setup (4 CPUs, 8 GB memory) peaks at about 7k QPS.
Following my earlier article on sharding (splitting libraries and tables, and extended middleware practice), performance improved greatly. The example I describe here is a master/slave deployment of 8 + 4 libraries (12 instances); each instance is referred to below as a shard. Each shard runs MySQL 5.7.19 (different versions use different read/write splitting strategies, as described later) with 12 CPUs, 16 GB memory, a 128 GB disk, and RAID 10.

Read/write splitting in practice

Read/write splitting can be implemented with middleware, as described in the article on sharding practice. Mycat is the mainstream choice, but each middleware has its own strengths, so choose according to its merits and your business characteristics. Next, let's talk about the aftereffects of read/write splitting.

Real-time versus delayed insert/update and query operations under read/write splitting

For example, here is one of my scenarios. The data volume is large: per person, the number of products is 200,000 to 500,000. A job pages through the records that have not yet been synchronized downstream, synchronizes each page, and then updates the sync status of that page. I drew up the four options below and finally chose to let split and non-split access coexist: results with strict real-time requirements are still read from and written to the master, while reads of data that changes little all go to the slaves.

Here are the four options:

1. Full separation: all reads -> slave, all writes -> master
Premise: the paging logic is unchanged and always queries the first page.
Characteristics: replication is semi-synchronous, currently with one master and two slaves. Semi-sync only guarantees that at least one of the two slaves has received a change, so there is up to a 1/2 chance that a read returns rows that were already processed and the query is repeated. The real probability has to be measured by testing and by calculating the delay; 50% repeated queries is the worst case. The master-slave synchronization delay currently reported is 1s.
Plan:
(1) Redundancy: a deduplication check is needed for the data that may be queried repeatedly (up to 50%).
(2) Performance: repeated data and the extra check reduce performance, but the two slaves share the QPS pressure, so the gain and the loss partly cancel out.

2. Partial separation: the product module still reads from the master; reads elsewhere -> slave, writes -> master.
Premise: the paging logic is unchanged and always queries the first page.
Characteristics: the joint marketing system has essentially a single scenario built around SKUs, so this relieves only part of the pressure.
Plan:
(1) Redundancy: more redundant code, and the style is inconsistent in places.
(2) Performance: there is some improvement, but overall, with large data volumes the master still carries the main read and write pressure.

3. Full separation: all reads -> slave, all writes -> master.
Premise: paged query (without the sync status filter).
Characteristics: query time is positively correlated with data volume and page number, so deeper pages take longer and longer.
Plan:
(1) Redundancy: queries are repeated, and because paging time is positively correlated with data volume, the more data there is, the longer it takes.
(2) Performance: ways to reduce the response-time cost of paged queries:
(2.1) A deferred join strategy can be used (not supported by the elastic database).
(2.2) Combine an ordered id (filtering on the database's id index) with LIMIT (the effect is not significant).

4. Full separation: all reads -> slave, all writes -> master
Premise: paged query (with the sync status filter); when the last result set is reached, exit, fall back to querying the total count, and re-execute the logic.
Characteristics: query time is positively correlated with data volume and page number, so deeper pages take longer and longer.
Plan: the same points as option 3, but it avoids querying the same data repeatedly.
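Items (2.1) and (2.2) of option 3 can be sketched as follows. This is a minimal illustration, not the production code: it uses Python's built-in sqlite3 so that it runs anywhere, and the `goods` table, its columns, and the function names are all hypothetical. On MySQL the same SQL shapes apply, but the actual gain depends on the indexes and the engine.

```python
import sqlite3


def make_db(rows=1000):
    """Build an in-memory table standing in for the product table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE goods (id INTEGER PRIMARY KEY, sku TEXT, synced INTEGER)")
    conn.executemany(
        "INSERT INTO goods (id, sku, synced) VALUES (?, ?, 0)",
        [(i, f"SKU-{i}") for i in range(1, rows + 1)],
    )
    return conn


def page_by_offset(conn, page, size):
    """Plain LIMIT/OFFSET paging: the skipped rows still have to be scanned."""
    return conn.execute(
        "SELECT id, sku FROM goods ORDER BY id LIMIT ? OFFSET ?",
        (size, page * size),
    ).fetchall()


def page_by_deferred_join(conn, page, size):
    """(2.1) Deferred join: page over the ids only, then join back for the full rows."""
    return conn.execute(
        "SELECT g.id, g.sku FROM goods g "
        "JOIN (SELECT id FROM goods ORDER BY id LIMIT ? OFFSET ?) p ON g.id = p.id "
        "ORDER BY g.id",
        (size, page * size),
    ).fetchall()


def page_by_last_id(conn, last_id, size):
    """(2.2) Ordered id plus LIMIT: seek past the last id seen instead of using OFFSET."""
    return conn.execute(
        "SELECT id, sku FROM goods WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()
```

All three functions return the same rows for the same page; they differ only in how much work the database does to reach that page. With `page_by_last_id` the caller passes the largest id of the previous page instead of a page number.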

Results after switching to coexisting split and non-split access (my data volume here is 200,000,000):
Before read/write splitting, CPU utilization on the master was 95% to 99%.
After read/write splitting, CPU utilization on the master is below 10%.

In our practice, read/write splitting has worked quite well, but it raises the following questions:

  • 0. What is the main problem a MySQL master-slave cluster solves?
  • 1. What strategies does MySQL have for master-slave synchronization, and how do they differ?
  • 2. How large is MySQL master-slave delay in practice?
  • 3. How much delay can we accept?
  • 4. What is the root cause of master-slave delay?
  • 5. With large data volumes, read/write splitting still shows delay-induced data inconsistency wherever there are writes. How can this be solved?
  • 0. What is the main problem a MySQL master-slave cluster solves?
  • Why multiple masters:
    Under high concurrency, a single MySQL database has many connections, so its QPS/OPS becomes very large. As in the load-test results mentioned above, a single MySQL instance here peaks at about 7k QPS, and as concurrency keeps growing its throughput drops. How do we solve this bottleneck? By splitting into multiple libraries so that the QPS/OPS load is shared. If the requests to a single master would amount to 20,000 QPS/OPS and I shard into four masters, each master takes a share of 5,000 requests. (If this is hard to picture, think of a server cluster: the service architecture evolves from a single server to multiple servers. If it is still unclear, see the article on the evolution of large websites.)
    So sharding reduces the number of connections and the request volume on each server.
  • Why master-slave:
    For a single master handling 5,000 requests (a figure based only on the assumed model), what is the ratio between its kinds of requests? How do we keep a surge of concurrent traffic from paralyzing the system and making it unavailable? And what about data loss?
    First, we can consider data backup and traffic analysis, which is why slave libraries are commonly introduced:
    One master, one slave: one Master, one Slave
    One master, multiple slaves: one Master, multiple Slaves
    For the request ratio, refer to the figure above (from the actual production environment).
    From the figure, the current ratio is read : write = 10.73k : 26, roughly 400 : 1, and the average ratio is 298.91 : 150, roughly 2 : 1. Reads clearly outnumber writes: on average, every 150 writes come with about 300 reads, and when concurrent traffic surges the ratio becomes far more extreme, up to 1w : 1 (10,000 : 1). So can reads of relatively static data be served from the backup copy on the slave? The answer is clearly yes.
  • 1. What strategies does MySQL have for master-slave synchronization, and how do they differ?
  • Master-slave synchronization mechanisms:
    What also needs to be considered is the replication mechanism that keeps the data in sync, both in the one master, one slave case and in the one master, multiple slaves case.
    Let's look at how synchronization is actually achieved. As we all know, MySQL records the changes it executes in the binlog, so the original data changes can of course be replayed from the binlog.
  • 2. How large is MySQL master-slave delay in practice?
  • 3. How much delay can we accept?
  • 4. What is the root cause of master-slave delay?
    How it works:
    Master-slave delay is the time difference between a statement executing successfully on the Master and executing successfully on the Slave. The master writes its changes to the binlog. After a slave connects to the master, an I/O thread on the slave copies the master's binlog to the slave and writes it into a relay log (relayLog). An SQL thread on the slave then reads the events from the relay log and executes them, that is, it runs the SQL again locally, which keeps the slave's data the same as the master's.
    Because the slave copies the log from the master and replays the SQL serially, under high concurrency the slave's data is bound to be behind the master's; this is the delay. So data that has just been written to the master often cannot be read from the slave until tens or even hundreds of milliseconds later.
    There is another problem here: if the master suddenly goes down before the data has been synchronized to the slave, some data may not exist on the slave and may be lost.
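The binlog, relay log, and serial replay described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Master` and `Slave` classes and their methods are hypothetical, events are plain (key, value) pairs, and the two slave threads are modelled as explicit steps rather than real threads.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered list of (key, value) change events

    def write(self, key, value):
        """Apply a change and record it in the binlog."""
        self.data[key] = value
        self.binlog.append((key, value))


class Slave:
    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.copied = 0   # binlog position already copied by the I/O thread
        self.applied = 0  # relay-log position already replayed by the SQL thread

    def io_thread_step(self, master):
        """Copy any new binlog events from the master into the local relay log."""
        new_events = master.binlog[self.copied:]
        self.relay_log.extend(new_events)
        self.copied += len(new_events)

    def sql_thread_step(self, max_events=1):
        """Replay relay-log events serially, at most max_events per step."""
        for key, value in self.relay_log[self.applied:self.applied + max_events]:
            self.data[key] = value
            self.applied += 1

    def lag_events(self, master):
        """Number of master changes that are not yet visible on the slave."""
        return len(master.binlog) - self.applied
```

If the master performs three writes while the slave's SQL step replays only one event, `lag_events` reports two changes that a read from the slave cannot see yet, which is the inconsistency window discussed in this section.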

MySQL has two mechanisms:

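  • One is semi-synchronous replication, which addresses data loss on the master.
    Semi-sync replication means that after the master writes to the binlog, it immediately forces the data to be synchronized to the slaves. After a slave writes the log to its local relay log, it returns an ack to the master, and the master considers the write complete only after it has received an ack from at least one slave.
  • The other is parallel replication, which addresses master-slave synchronization delay.
    The slave starts multiple threads that read the logs of different databases from the relay log in parallel and then replay them in parallel. This is parallelism at the database level.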
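The two mechanisms above can be sketched in a few lines. This is a conceptual model only, not MySQL's implementation: the function names, the `reachable` set, and the (db, key, value) event shape are assumptions made for illustration, and timeouts and fallback to asynchronous replication are left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def semi_sync_write(event, slave_relay_logs, reachable):
    """Send an event to every reachable slave; the write counts as complete
    only if at least one slave stored it in its relay log and acked."""
    acks = 0
    for name, relay_log in slave_relay_logs.items():
        if name in reachable:
            relay_log.append(event)  # slave persists the event, then acks
            acks += 1
    return acks >= 1


def replay_parallel_by_db(relay_log, workers=4):
    """Replay (db, key, value) events: different databases run in parallel,
    events that belong to the same database keep their original order."""
    per_db = defaultdict(list)
    for db, key, value in relay_log:
        per_db[db].append((key, value))

    def replay_one(db):
        state = {}
        for key, value in per_db[db]:
            state[key] = value
        return db, state

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(replay_one, per_db))
```

Grouping by database is what makes the parallel replay safe in this sketch: two events can only run concurrently if they touch different databases, so a workload concentrated in one database gains nothing from the extra threads.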

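Monitoring master-slave delay:
The Slave compares its current local time with the timestamp of the binlog event from the Master.
pt-heartbeat, mt-heartbeat
In essence: for the same SQL statement, the time it finished executing on the Master vs. the time it finished executing on the Slave.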
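The heartbeat idea behind this kind of monitoring can be sketched as below. It is a simplified model and not the code of any of the tools named above: the heartbeat row is a plain dict, the function names are hypothetical, and it assumes the clocks of the master and the slave are synchronized.

```python
import time


def write_heartbeat(master_row, now=None):
    """On the master: stamp the heartbeat row with the current time."""
    master_row["ts"] = time.time() if now is None else now


def measure_lag(slave_row, now=None):
    """On the slave: lag is the local time minus the replicated heartbeat timestamp."""
    now = time.time() if now is None else now
    return max(0.0, now - slave_row["ts"])
```

In a real deployment the heartbeat row lives in a table that is replicated like any other, so the timestamp read on the slave is only as fresh as the slave's replay position.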

  • 5. With large data volumes, read/write splitting still shows delay-induced data inconsistency wherever there are writes. How can this be solved?
  • 1. Analyze the MySQL logs to see whether there are too many slow queries.
  • 2. Count the write statements during peak periods, and their average.
  • 3. Check the volume of network data transferred between the master and the slave during synchronization.
  • 4. Collect server runtime status information.
  • 5. Approach the problem from a probing angle: add an auto-increment table on the Master that contains only one field. Whenever the Master receives a data update request, a trigger increments the record in this table.
    Because Count_table also takes part in MySQL master-slave synchronization, the updates made to it on the Master are replicated to the Slave. When a Client reads data through the Proxy, the Proxy can first query Count_table on both the Master and the Slave. If the two values are the same, the Proxy can conclude that the Master and Slave data are consistent and send the request to the Slave; otherwise it sends the request to the Master.
    From the bottleneck angle, consider: SQL statements with many slow queries, high concurrency, network transmission problems, and server configuration.
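The Count_table check in item 5 can be sketched as follows. This is a toy model under stated assumptions: `Node`, `Proxy`, and `replicate` are hypothetical names, each node's `count` attribute stands in for the single field of Count_table, and the increment in `write` stands in for the trigger on the Master.

```python
class Node:
    def __init__(self):
        self.rows = {}
        self.count = 0  # stands in for the single field of Count_table


def replicate(master, slave):
    """Bring the slave fully up to date, counter included."""
    slave.rows = dict(master.rows)
    slave.count = master.count


class Proxy:
    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def write(self, key, value):
        """All writes go to the master; the counter bump is what the trigger would do."""
        self.master.rows[key] = value
        self.master.count += 1

    def read(self, key):
        """Route to the slave only when both counters match, otherwise to the master."""
        if self.master.count == self.slave.count:
            return "slave", self.slave.rows.get(key)
        return "master", self.master.rows.get(key)
```

Note the trade-off in this design: every read costs an extra counter lookup on the master, so the master is relieved of the heavy query but not of the request itself.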

Note:

Do not force read/write splitting onto scenarios where it does not apply.
Otherwise, the master-slave delay that read/write splitting brings has the following main effects:

  • In abnormal situations, HA cannot switch over: the HA software needs to check data consistency, and while there is a delay the standby is inconsistent.
  • A hang on the standby causes the backup to fail: FLUSH TABLES WITH READ LOCK times out after 900s.
  • Backups taken from the Slave do not contain the latest data, only delayed data.

The result is that read/write splitting becomes meaningless and standby disaster recovery fails.
That brings us back to the scenario we started with. If you want to use read/write splitting, distinguish your own business scenarios and refine the business: speed up SQL execution, optimize indexes, cut unnecessary DML operations, and use the 80/20 principle to pin down which tables' data has the biggest impact on master-slave delay. Most importantly, the business logic is often the root cause of the problem, so optimizing the business logic is the most fundamental fix. Data that changes frequently must be read from and written to the master in real time. Otherwise, under highly concurrent traffic, read/write splitting will cause even greater losses.

Some of the pictures are taken from reference sources (source, material 2).

Origin blog.csdn.net/wolf_love666/article/details/90444154