Fault analysis: error 1677 in row-based dual-master (master-master) replication

First, the error message

During a recent system upgrade, project colleagues changed a column of the table test.test_tab_t1. The SQL statement was as follows:

ALTER TABLE TEST.TEST_TAB_T1 MODIFY BXXX VARCHAR(200);

 

After the system upgrade, MySQL replication in the project broke with the following error:

mysql>show slave status\G
      Master_Log_File: binlog.000233
  Read_Master_Log_Pos: 274415020
       Relay_Log_File: relay-bin.000253
        Relay_Log_Pos: 175535154
Relay_Master_Log_File: binlog.000233
     Slave_IO_Running: Yes
    Slave_SQL_Running: No
    .................: 
           Last_Errno: 1677
           Last_Error: Column 28 of table 'test.test_tab_t1' cannot be converted from type 'varchar(30(bytes))' to type 'varchar(400(bytes) gbk)'
         Skip_Counter: 0
  Exec_Master_Log_Pos: 175536357
      Relay_Log_Space: 274410464


The MySQL error log shows the same error:

2020-03-24T16:53:16.051244Z 11686 [ERROR] Slave SQL for channel '': Column 28 of table 'test.test_tab_t1' cannot be converted from type 'varchar(30(bytes))' to type 'varchar(400(bytes) gbk)', Error_code: 1677
2020-03-24T16:53:16.051269Z 11686 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000233' position 175536357.
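
As an aside, the fields of this message can be extracted programmatically, which helps when several such errors need to be triaged. The following is a minimal Python sketch and not part of the original troubleshooting; the function name parse_err_1677 and the regular expression are illustrative assumptions based only on the message text shown above.

```python
import re

# Pattern modelled on the error 1677 text as it appears in this post's log.
ERR_1677 = re.compile(
    r"Column (?P<col>\d+) of table '(?P<table>[^']+)' cannot be converted "
    r"from type '(?P<src>[^']+)' to type '(?P<dst>[^']+)'"
)


def parse_err_1677(message):
    """Return the column index, table and both types, or None if no match."""
    m = ERR_1677.search(message)
    if m is None:
        return None
    return {
        "column_index": int(m["col"]),  # index as printed in the message
        "table": m["table"],
        "source_type": m["src"],        # type carried by the row event
        "target_type": m["dst"],        # type of the column on the applier
    }


log_line = (
    "Column 28 of table 'test.test_tab_t1' cannot be converted from type "
    "'varchar(30(bytes))' to type 'varchar(400(bytes) gbk)', Error_code: 1677"
)
print(parse_err_1677(log_line))
```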

Second, the environment information

The project runs MySQL Community 5.7.24 with the gbk character set and binlog_format = ROW, in a keepalived + dual-master (master-master) architecture.

mysql>select @@version,@@character_set_server,@@binlog_format;
+-----------+------------------------+-----------------+
| @@version | @@character_set_server | @@binlog_format |
+-----------+------------------------+-----------------+
| 5.7.24    | gbk                    | ROW             |
+-----------+------------------------+-----------------+

Third, the diagnosis process

1. According to the document "MySQL Replication Breaks With Error 1677: Column .. of Table '...' Cannot Be Converted" (Doc ID 2037712.1), error 1677 on a column of test.test_tab_t1 (Column 28) points to a conversion problem on that column, possibly related to character sets.



First, the character set variables on master1 and master2 were compared:

[Screenshot: character set variables on master1 and master2]

Then the database-, table- and column-level character set information on master1 and master2 was compared:

MASTER1 database, table, column-level character set information:

mysql>select * from information_schema.schemata where schema_name='test';
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| CATALOG_NAME | SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME | SQL_PATH | DEFAULT_ENCRYPTION |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| def          | test        | gbk                        | gbk_general_ci         | NULL     | NO                 |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+

mysql>select table_schema,table_name,table_type,table_collation from information_schema.tables  where table_name='test_tab_t1';
+--------------+-------------+------------+-----------------+
| TABLE_SCHEMA | TABLE_NAME  | TABLE_TYPE | TABLE_COLLATION |
+--------------+-------------+------------+-----------------+
| test         | test_tab_t1 | BASE TABLE | gbk_chinese_ci  |
+--------------+-------------+------------+-----------------+

mysql>select table_name,column_name,character_maximum_length,character_octet_length,character_set_name,collation_name 
from information_schema.columns where table_name='test_tab_t1' and table_schema='test' and ordinal_position=29;
+-------------+-------------+--------------------------+------------------------+--------------------+----------------+
| TABLE_NAME  | COLUMN_NAME | CHARACTER_MAXIMUM_LENGTH | CHARACTER_OCTET_LENGTH | CHARACTER_SET_NAME | COLLATION_NAME |
+-------------+-------------+--------------------------+------------------------+--------------------+----------------+
| test_tab_t1 | BXXX        |                      200 |                    400 | gbk                | gbk_chinese_ci |
+-------------+-------------+--------------------------+------------------------+--------------------+----------------+

MASTER2 database, table, column-level character set information:

mysql>select * from information_schema.schemata where schema_name='test';
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| CATALOG_NAME | SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME | SQL_PATH | DEFAULT_ENCRYPTION |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| def          | test        | gbk                        | gbk_general_ci         | NULL     | NO                 |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+

mysql>select table_schema,table_name,table_type,table_collation from information_schema.tables  where table_name='test_tab_t1';
+--------------+-------------+------------+-----------------+
| TABLE_SCHEMA | TABLE_NAME  | TABLE_TYPE | TABLE_COLLATION |
+--------------+-------------+------------+-----------------+
| test         | test_tab_t1 | BASE TABLE | gbk_chinese_ci  |
+--------------+-------------+------------+-----------------+

mysql>select table_name,column_name,character_maximum_length,character_octet_length,character_set_name,collation_name 
from information_schema.columns where table_name='test_tab_t1' and table_schema='test' and ordinal_position=29;
+-------------+-------------+--------------------------+------------------------+--------------------+----------------+
| TABLE_NAME  | COLUMN_NAME | CHARACTER_MAXIMUM_LENGTH | CHARACTER_OCTET_LENGTH | CHARACTER_SET_NAME | COLLATION_NAME |
+-------------+-------------+--------------------------+------------------------+--------------------+----------------+
| test_tab_t1 | BXXX        |                      200 |                    400 | gbk                | gbk_chinese_ci |
+-------------+-------------+--------------------------+------------------------+--------------------+----------------+

According to the document (Doc ID 2037712.1), the column changed during the system upgrade (ALTER TABLE TEST.TEST_TAB_T1 MODIFY BXXX VARCHAR(200);) is indeed the one that has the problem.

Error: Last_Error: Column 28 of table 'test.test_tab_t1' cannot be converted from type 'varchar(30(bytes))' to type 'varchar(400(bytes) gbk)'

The changed column BXXX of test.test_tab_t1 (ordinal_position = 29; the column number in the error message appears to be counted from 0, hence Column 28) ran into a type conversion problem during replication. gbk uses at most 2 bytes per character, so the 400 bytes match the new VARCHAR(200), and the 30 bytes in the message correspond to the column's definition before the change (15 characters at 2 bytes each).

mysql> select * from information_schema.character_sets where character_set_name='gbk';
+--------------------+----------------------+------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION            | MAXLEN |
+--------------------+----------------------+------------------------+--------+
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese |      2 |
+--------------------+----------------------+------------------------+--------+
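
The byte figures in the error message can be checked against this MAXLEN value with a little arithmetic. The snippet below is only an illustrative Python sketch; the helper names varchar_octet_length and chars_for_bytes are made up for this example and are not MySQL functions.

```python
GBK_MAXLEN = 2  # MAXLEN reported for gbk in information_schema.character_sets


def varchar_octet_length(char_length, maxlen=GBK_MAXLEN):
    """Maximum byte length of a VARCHAR(char_length) column."""
    return char_length * maxlen


def chars_for_bytes(byte_length, maxlen=GBK_MAXLEN):
    """Character length implied by a maximum byte length."""
    return byte_length // maxlen


print(varchar_octet_length(200))   # 400, the target type in the error
print(chars_for_bytes(30))         # 15, the source type in the error
print(len("中".encode("gbk")))     # 2, one Chinese character in gbk
```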

However, the database-, table- and column-level character set information is identical on master1 and master2, so there is no character set mismatch. The problem was evidently not that simple, so the analysis continued.


I began to wonder whether the implementation staff had followed the procedures: since the environment is dual-master, had this statement perhaps been executed on both instances at the same time? During troubleshooting I also found that the client setting character_set_client in /etc/my.cnf is utf8 on master1 but gbk on master2. So let's look at the binlog around the point of failure: the error is in binlog.000233 and the stop position is 175536357.

mysqlbinlog --no-defaults --start-position=175536357 --database=test  /opt/mysql/log/binlog/binlog.000233 --verbose
# at 175536357
#200325  0:53:16 server id 1  end_log_pos 175536422 CRC32 0xddd5d37 Anonymous_GTID  last_committed=271185   sequence_number=27186  rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 175536422
#200325  0:53:16 server id 1  end_log_pos 175536505 CRC32 0x3799f3b Query   thread_id=14154792  exec_time=0   error_code=0
SET TIMESTAMP=1585068796/*!*/;
SET @@session.pseudo_thread_id=14154792/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1075838976/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=2/*!*/;
/*!\C gbk *//*!*/;
SET @@session.character_set_client=28,@@session.collation_connection=28,@@session.collation_server=28/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 175536505
#200325  0:53:16 server id 1  end_log_pos 175536685 CRC32 0xe3db6b6b    Table_map: `test`.`test_tab_t1` mapped to number 23434
# at 175536685
#200325  0:53:16 server id 1  end_log_pos 175537481 CRC32 0xf1343123    Update_rows: table id 2334 flags: STMT_END_F
### UPDATE `test`.`test_tab_t1`
### WHERE
###   @1='........'
###   ..........
###   @29='........'
###   ..........
###   @40='...'
### SET
###   @1='........'
###   ..........
###   @29='........'
###   ..........
###   @40='...'


The binlog shows SET @@session.character_set_client=28, @@session.collation_connection=28, @@session.collation_server=28, so the character set in effect is indeed gbk:

mysql> select * from information_schema.collations where id=28;
+----------------+--------------------+----+------------+-------------+---------+---------------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
+----------------+--------------------+----+------------+-------------+---------+---------------+
| gbk_chinese_ci | gbk                | 28 | Yes        | Yes         |       1 | PAD SPACE     |
+----------------+--------------------+----+------------+-------------+---------+---------------+
1 row in set (0.01 sec)


So a character set mismatch is clearly not the cause. I then tried the officially recommended method of setting slave_type_conversions = ALL_LOSSY / ALL_NON_LOSSY, but it did not solve the problem, and this method also carries the risk of losing data during conversion (ALL_LOSSY in particular).


In addition, https://bugs.mysql.com/bug.php?id=83461 also describes error 1677 caused by inconsistent column character sets between master and slave, and mentions another case: with binlog_format = ROW, a problem when parsing the relay log. It suggests considering binlog_format = MIXED. I tried that, but the slave still could not be started.


At this point I was out of ideas, so I kept searching for information. Sometimes it goes this way: just when there seems to be no road ahead, a new path appears. Finally I found a very representative reference, https://bugs.mysql.com/bug.php?id=88595. This is Bug #88595, reported against 5.6.37: "Row-based master-master replication broken by add column or drop column". The project environment, however, is 5.7.24, with a binlog_format = ROW dual-master architecture.



Comparing this with the binlog excerpt I posted above, there is indeed an UPDATE on test.test_tab_t1 at the point where replication was interrupted. So the explanation holds up: the versions differ, but the symptoms match closely.
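
The check just described, confirming that a row event for test.test_tab_t1 sits at the stop position and seeing which server wrote it, can also be done by scanning the text output of mysqlbinlog --verbose. This is a minimal Python sketch that assumes the event header format shown in the excerpt in this post; table_map_events is a hypothetical helper, not a MySQL tool.

```python
import re

# Matches a Table_map event header line as printed by mysqlbinlog above.
TABLE_MAP = re.compile(
    r"^#\d+\s+[\d:]+\s+server id (?P<server_id>\d+)\s+"
    r"end_log_pos (?P<end_pos>\d+).*"
    r"Table_map: `(?P<db>[^`]+)`\.`(?P<table>[^`]+)` "
    r"mapped to number (?P<table_id>\d+)"
)


def table_map_events(lines, db, table):
    """Return (server_id, end_log_pos, table_id) for each matching event."""
    events = []
    for line in lines:
        m = TABLE_MAP.match(line)
        if m and m["db"] == db and m["table"] == table:
            events.append(
                (int(m["server_id"]), int(m["end_pos"]), int(m["table_id"]))
            )
    return events


sample = [
    "# at 175536505",
    "#200325  0:53:16 server id 1  end_log_pos 175536685 CRC32 0xe3db6b6b"
    "    Table_map: `test`.`test_tab_t1` mapped to number 23434",
]
print(table_map_events(sample, "test", "test_tab_t1"))
```

In practice the lines would come from a saved mysqlbinlog output file rather than from an inline list.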

With ALGORITHM = INPLACE (the default), the sequence is as follows:

master1 executes ALTER TABLE TEST.TEST_TAB_T1 MODIFY BXXX VARCHAR(200); and writes it to its binlog once it completes.

While master2 is replicating and applying this column change, an UPDATE on test.test_tab_t1 runs on master2, and master1 pulls that binlog event at the same time (the UPDATE completes before the column change statement does).

When master1 applies the UPDATE on test.test_tab_t1, its own table structure has already been changed, while the row event of the UPDATE still carries the original structure, so error 1677 occurs.
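
The three steps above can be sketched as a toy model. This is not how MySQL implements the check; it only assumes the default strict behaviour (empty slave_type_conversions), where the column type carried by the row event must equal the column type of the table it is applied to. ColumnMeta, ConversionError and apply_row_event are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    """Column definition as carried by a row event or held by a table."""
    type_name: str
    max_bytes: int


class ConversionError(Exception):
    """Stands in for MySQL error 1677 in this toy model."""


def apply_row_event(column_index, event_meta, table_meta):
    """Apply a row event only if its column metadata matches the table's."""
    if event_meta != table_meta:
        raise ConversionError(
            f"Column {column_index} cannot be converted from type "
            f"'{event_meta.type_name}({event_meta.max_bytes}(bytes))' to type "
            f"'{table_meta.type_name}({table_meta.max_bytes}(bytes))'"
        )
    return "applied"


old_def = ColumnMeta("varchar", 30)   # BXXX before the ALTER
new_def = ColumnMeta("varchar", 400)  # BXXX after the ALTER (200 gbk chars)

# Step 1: the DDL has already run on master1.
master1_table = new_def
# Step 2: master2 logged an UPDATE before it had applied the DDL.
update_from_master2 = old_def
# Step 3: master1 applies that UPDATE against its new structure and fails.
try:
    apply_row_event(28, update_from_master2, master1_table)
except ConversionError as exc:
    print(exc)
```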


This basically explains it, but two questions remain. First, the business is not configured for dual writes; it writes to only one instance. The only plausible explanation is that the implementation colleague ran the DDL on the non-writing instance (master1). Second, there is the version question: I later learned from the project implementation colleagues that the database had been upgraded from 5.6. Could it be...? With my currently limited ability, I am recording this first and will follow up with a more in-depth analysis.


Fourth, the solution


None of the reference solutions fits this situation. Based on the analysis above and on how the business uses this table, I decided to skip the error and check data consistency afterwards.

stop slave; 

# Skip the failing event; the value assigned to the variable is the number of events to skip, so 1 skips one event

set global sql_slave_skip_counter = 1; 

start slave;

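pt-table-checksum can be used to check the consistency of the test.test_tab_t1 table between master1 and master2. If an inconsistency is found, it can be repaired with the companion tool pt-table-sync. Note that --replicate names the table in which the checksum results are stored, so it must not point at the table being checked; test.checksums below is an example name, and --tables restricts the check to test_tab_t1.


pt-table-checksum --nocheck-replication-filters --databases=test --tables=test_tab_t1 --replicate=test.checksums --create-replicate-table --host=xxx --port=3306 -uroot -pxxx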


Fifth, the failure summary


The analysis and troubleshooting above led us to this bug, and I think it is very similar to OGG-01161 and OGG-01163 in Oracle GoldenGate, which occur when only DML is replicated and the DDL of the source table has been changed.


Many online articles about OGG-01161 and OGG-01163 (a source table structure change causing the replicat process to abend) mention that even after the table structures on the source and target sides have been made identical, restarting the replicat process still reports the error.


Although the column length has been modified on both the source and the target, and the def file has been updated as well, the metadata already written to the trail file is not updated. The replicat process uses the metadata in the trail file by default, so it still fails. The OVERRIDE option has to be added so that the new def content overrides the metadata in the trail.


This case is similar: the UPDATE on master2 was logged before the DDL change had been applied there, so its row event carries the table metadata from before the change, while master1 already has the table metadata from after the change, hence the error.


Things really are interlinked, and databases are no exception.


Origin blog.51cto.com/wyzwl/2482920