Hands-on: fixing the everyday errors of MySQL master-slave replication

Anyone who has used MySQL has heard of read-write separation, probably more times than they care to count. So how is it achieved? The most common approach is to set up MySQL master-slave replication: the master serves writes and the slaves serve reads, which gives the application read-write separation.

Newcomers to development or operations roles need to understand what read-write separation is and which business problems it solves. Only with that understanding can they use a read-write separation architecture well.

Enough preamble.
Let's look at the two most common errors in master-slave replication. The first: primary key conflict (Error_code: 1062).
The second: missing record. An update or delete is replicated, but the corresponding record cannot be found on the slave (Error_code: 1032).

Let's simulate the missing-record case and walk through the whole fix.

Check that master-slave replication is healthy


[root@localhost] 11:34:29 [testdb]>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000029
          Read_Master_Log_Pos: 3683
               Relay_Log_File: mysql-relay-bin.000003
                Relay_Log_Pos: 2207
        Relay_Master_Log_File: binlog.000029
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Slave_IO_Running and Slave_SQL_Running are both Yes, so the IO thread and the SQL thread are running normally.
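The same check can be scripted. The following is a minimal sketch, assuming the vertical output of show slave status has already been captured as text; the function names parse_slave_status and replication_healthy are made up for illustration and are not part of any MySQL tool.

```python
import re

# A field line looks like "   Slave_IO_Running: Yes". Row separators and
# continuation lines (for example the second line of Executed_Gtid_Set)
# do not start with a plain word followed by a colon, so they are skipped.
FIELD_RE = re.compile(r"^\s*(\w+):\s?(.*)$")


def parse_slave_status(text):
    """Parse the vertical output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status


def replication_healthy(status):
    """Return True only when both replication threads report Yes."""
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes")


sample = """
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1032
"""
status = parse_slave_status(sample)
print(replication_healthy(status), status["Last_Errno"])
```

All values are kept as strings, exactly as the client prints them, so numeric fields such as Last_Errno need converting before any arithmetic.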

Create the test table and records

[root@localhost] 11:25:48 [testdb]>show create table test1\G;
*************************** 1. row ***************************
       Table: test1
Create Table: CREATE TABLE `test1` (
  `id` int(11) NOT NULL,
  `name1` char(10) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `name2` char(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.07 sec)

insert into test1 values(1,'test1','test1');
insert into test1 values(2,'test2','test2');
insert into test1 values(3,'test3','test3');

Simulate a replication failure caused by a missing record on the slave

Step 1: Delete the record with id=2 on the slave


[root@localhost] 11:26:41 [testdb]>delete from test1 where id=2;
Query OK, 1 row affected (0.44 sec)

[root@localhost] 11:26:52 [testdb]>select * from test1;
+----+-------+-------+
| id | name1 | name2 |
+----+-------+-------+
|  1 | test1 | test1 |
|  3 | test3 | test3 |
+----+-------+-------+
2 rows in set (0.00 sec)

Step 2: Delete the record with id=2 on the master

[root@localhost] 11:27:11 [testdb]>delete from test1 where id=2;
Query OK, 1 row affected (0.17 sec)

[root@localhost] 11:27:51 [testdb]>select * from test1;
+----+-------+-------+
| id | name1 | name2 |
+----+-------+-------+
|  1 | test1 | test1 |
|  3 | test3 | test3 |
+----+-------+-------+
2 rows in set (0.00 sec)

Check the replication status on the slave


[root@localhost] 11:34:05 [testdb]>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000029
          Read_Master_Log_Pos: 3683
               Relay_Log_File: mysql-relay-bin.000003
                Relay_Log_Pos: 1929
        Relay_Master_Log_File: binlog.000029
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table testdb.test1; Can't find record in 'test1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000029, end_log_pos 3652
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3405
              Relay_Log_Space: 2414
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Delete_rows event on table testdb.test1; Can't find record in 'test1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000029, end_log_pos 3652
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 111213106
                  Master_UUID: 3ada166e-c4db-11ea-b21d-000c29cc2388
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 200904 11:33:10
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 3ada166e-c4db-11ea-b21d-000c29cc2388:84830-84835
            Executed_Gtid_Set: 3ada166e-c4db-11ea-b21d-000c29cc2388:1-84834,
3ada166e-c4db-11ea-b21d-000c29cc2389:1-4
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

At this point the SQL thread on the slave has stopped (Slave_SQL_Running: No) and the slave is no longer in sync with the master. Replication reported error 1032 when it tried to apply the delete from the master.
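Why the SQL thread stops can be illustrated with a toy model. This is only a sketch of the idea, not how MySQL is implemented: the table is modelled as a Python dict keyed by primary key, and apply_row_event and ReplicationError are hypothetical names. It assumes row events, which matches the Delete_rows event named in the error message above.

```python
class ReplicationError(Exception):
    """Raised when a row event cannot be applied on the slave."""

    def __init__(self, code, message):
        super().__init__(f"{message}, Error_code: {code}")
        self.code = code


def apply_row_event(table, event):
    """Apply one (kind, key, row) event to a table modelled as {key: row}."""
    kind, key, row = event
    if kind == "insert":
        if key in table:
            # The row already exists on the slave: primary key conflict.
            raise ReplicationError(1062, f"Duplicate entry '{key}' for key 'PRIMARY'")
        table[key] = row
    elif kind == "delete":
        if key not in table:
            # The row the master deleted is not present on the slave.
            raise ReplicationError(1032, "Can't find record")
        del table[key]
    else:
        raise ValueError(f"unsupported event kind: {kind}")


# Slave state after Step 1: id=2 has already been deleted locally.
slave = {1: ("test1", "test1"), 3: ("test3", "test3")}
try:
    apply_row_event(slave, ("delete", 2, None))  # the delete from Step 2
except ReplicationError as exc:
    print(exc.code, exc)
```

The insert branch shows the mirror-image case: a row that already exists on the slave produces the 1062 primary key conflict mentioned at the start.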

There are three ways to resolve the 1032 error.
Solution 1: Manually export the missing business records from the master, import them into the slave, and then start the slave's SQL thread. But wait, there is a catch: which record on the master should be exported? The error message does not name it, but it does give a hint: the event's master log binlog.000029, end_log_pos 3652. So you have to analyze the binlog to find the record the event operated on, which is a bit of a hassle. Don't panic, there are also solutions 2 and 3.
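The binlog lookup in Solution 1 can be sketched as follows. The helper names failing_event and mysqlbinlog_command are hypothetical. Using Exec_Master_Log_Pos (3405) as the start and end_log_pos (3652) as the stop is an assumption about a sensible window, and the command has to be run where binlog.000029 is readable, normally on the master. The mysqlbinlog options used (--base64-output=decode-rows, -vv, --start-position, --stop-position) are standard ones.

```python
import re
import shlex

# Matches the hint at the end of Last_SQL_Error.
HINT_RE = re.compile(
    r"the event's master log (?P<file>\S+), end_log_pos (?P<pos>\d+)"
)


def failing_event(last_sql_error):
    """Return (binlog_file, end_log_pos) from Last_SQL_Error, or None."""
    match = HINT_RE.search(last_sql_error)
    if match is None:
        return None
    return match.group("file"), int(match.group("pos"))


def mysqlbinlog_command(binlog_file, start_pos, stop_pos):
    """Build a mysqlbinlog command that decodes row events in a window."""
    return [
        "mysqlbinlog",
        "--base64-output=decode-rows",  # print row events as readable text
        "-vv",                          # show reconstructed row images
        f"--start-position={start_pos}",
        f"--stop-position={stop_pos}",
        binlog_file,
    ]


error = ("Could not execute Delete_rows event on table testdb.test1; "
         "Can't find record in 'test1', Error_code: 1032; "
         "handler error HA_ERR_KEY_NOT_FOUND; "
         "the event's master log binlog.000029, end_log_pos 3652")
binlog_file, end_pos = failing_event(error)
# 3405 is Exec_Master_Log_Pos from the status output (assumed window start).
print(shlex.join(mysqlbinlog_command(binlog_file, 3405, end_pos)))
```

The decoded output shows the row the event tried to change, which tells you what needs to be restored on the slave.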

Solution 2: MySQL provides the slave_skip_errors parameter, which makes the slave skip statements that fail with the specified error codes, for example slave_skip_errors=1032. Unfortunately, this parameter cannot be changed online; a change only takes effect after the instance is restarted. Not very friendly, is it?


[root@localhost] 11:28:57 [testdb]>set global slave_skip_errors=1032;
ERROR 1238 (HY000): Variable 'slave_skip_errors' is a read only variable

Solution 3: Use the pt-slave-restart tool from Percona Toolkit to automatically skip statements that fail replication with the specified error codes. This approach is less invasive and does not require restarting the MySQL instance.


[mysql@mysql ~]$ pt-slave-restart --user=root --password=root --socket=/data/mysql/run/3306/mysql.sock --error-numbers=1032

# A software update is available:
2020-09-04T11:32:07 S=/data/mysql/run/3306/mysql.sock,p=...,u=root mysql-relay-bin.000003        1651 1032

Once the failing statement has been skipped and replication has resumed, pt-slave-restart keeps running and checks the slave again at intervals of up to 64 seconds, skipping any new 1032 error it finds.

Other similar errors can be handled with the same three methods. The third method is the recommended one.

Origin blog.51cto.com/15061930/2642093