Analysis of rare-character insert failures and hangs under MySQL utf8 encoding / failed ETL data synchronization

1. Problem description

The ETL job failed while synchronizing data from one MySQL database to a MySQL database on another server. The synchronization broke when it hit rare (infrequently used) characters.

As it turned out, the character marked by the arrow in the screenshot is what caused the ETL job to fail. How can this be fixed?

2. Cause analysis

The largest Unicode code point that three-byte UTF-8 can encode is U+FFFF, which is the end of the Basic Multilingual Plane (BMP). In other words, any Unicode character outside the BMP cannot be stored in MySQL's utf8 character set. This includes emoji (a special range of Unicode characters, common on iOS and Android phones), many rarely used Chinese characters, and most newly added Unicode characters.

In MySQL, utf8 (an alias for utf8mb3) is a character set that supports only UTF-8 sequences of up to three bytes, that is, only the BMP.

To store four-byte UTF-8 characters in MySQL, use the utf8mb4 character set, which is available from version 5.5.3 onward (check the version with: select version();). For better compatibility, I recommend always using utf8mb4 instead of utf8. For CHAR columns, utf8mb4 consumes more space, so the official MySQL documentation recommends using VARCHAR instead of CHAR.
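The three-byte limit can be checked outside the database. The following is a minimal Python sketch, not part of the original ETL job: it prints the UTF-8 byte length of a few sample characters and lists the characters in a string that lie above U+FFFF. The helper names (utf8_len, find_non_bmp) and the sample field value are hypothetical.

```python
from typing import List


def utf8_len(ch: str) -> int:
    """Number of bytes the single character ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def find_non_bmp(text: str) -> List[str]:
    """Return the characters in text whose code point lies above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


samples = {
    "common Chinese character": "\u4e2d",    # U+4E2D, inside the BMP
    "rare Chinese character": "\U00020bb7",  # U+20BB7, outside the BMP
    "emoji": "\U0001f600",                   # U+1F600, outside the BMP
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} -> {utf8_len(ch)} bytes")

# Hypothetical field value from an ETL row; only the first character
# after "name=" needs four bytes.
row = "name=\U00020bb7\u91ce\u5bb6"
print(find_non_bmp(row))
```

A check like find_non_bmp could be run over source rows before loading, to find out which records would be rejected by a utf8 target column.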
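The extra space for CHAR columns comes from simple arithmetic: a fixed-length column has to allow for the widest possible character in every position. The sketch below only computes that upper bound; the actual on-disk size depends on the storage engine and row format, and the function name is hypothetical.

```python
# Maximum bytes per character for each MySQL character set.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def char_reserved_bytes(length: int, charset: str) -> int:
    """Upper bound on the bytes a CHAR(length) column needs in charset."""
    return length * MAX_BYTES_PER_CHAR[charset]


for charset in ("utf8", "utf8mb4"):
    print(f"CHAR(10) in {charset}: up to {char_reserved_bytes(10, charset)} bytes")
```

A VARCHAR column stores only the bytes each value actually uses, which is why it is the better fit for utf8mb4.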

3. Solution

3.1. Directly modify the character set of the database and table


-- Change the default character set of the database
alter database test character set = utf8mb4;

-- Convert the table and its existing character columns to utf8mb4
alter table test convert to character set utf8mb4;
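When several tables need the same conversion, the statements can be generated rather than typed by hand. This is a rough Python sketch that only builds the SQL text and does not connect to MySQL. The table names, the utf8mb4_unicode_ci collation default, and the helper names are assumptions for illustration.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_convert_statements(tables, charset="utf8mb4",
                             collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    return [
        f"ALTER TABLE {quote_identifier(table)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for table in tables
    ]


# Hypothetical table names; in practice the list might come from
# information_schema.tables for the schema being converted.
for statement in build_convert_statements(["test", "etl_target"]):
    print(statement)
```

Review the generated statements before running them. Converting a large table rewrites it, so it is worth trying on a copy first.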

3.2. Modify the database default configuration

[client]
default-character-set = utf8mb4
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
[mysql]
default-character-set = utf8mb4
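After editing the configuration file, it is easy to miss one of the three sections. The Python sketch below checks a configuration fragment with the standard configparser module. It assumes a simple file without !include or !includedir directives, which configparser cannot parse. The fragment is inlined so the example is self-contained, and the function name is hypothetical.

```python
import configparser

# The fragment from this section, inlined so the sketch is self-contained.
MY_CNF = """
[client]
default-character-set = utf8mb4
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
[mysql]
default-character-set = utf8mb4
"""

# Section/option pairs that should all be set to utf8mb4.
EXPECTED = [
    ("client", "default-character-set"),
    ("mysqld", "character-set-server"),
    ("mysql", "default-character-set"),
]


def missing_utf8mb4(cnf_text: str):
    """Return the (section, option) pairs that are absent or not utf8mb4."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare options without a value
        strict=False,         # tolerate repeated sections or options
        interpolation=None,   # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    return [
        (section, option)
        for section, option in EXPECTED
        if parser.get(section, option, fallback=None) != "utf8mb4"
    ]


print(missing_utf8mb4(MY_CNF))
```

An empty list means all three settings are present. Settings in the [mysqld] section take effect only after the server is restarted.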

 

 

Origin blog.csdn.net/qq_35995514/article/details/109560474