Basic introduction to utf8mb4
Basic Features
- utf8mb4 is a character set in MySQL that can store and process Unicode characters.
- The Unicode character set covers almost every character in use: the letters and symbols of most languages, plus emoji.
Differences from utf8mb3
version
- The utf8mb4 character set is supported in MySQL version 5.5.3 and later.
- Before this version, MySQL supported only the utf8 character set, i.e. utf8mb3 (in MySQL, utf8 is an alias for utf8mb3).
encoding
- In MySQL, the utf8 character set uses at most 3 bytes per character, so it covers only the Basic Multilingual Plane (U+0000 to U+FFFF). It cannot store characters outside that range, such as most emoji and other supplementary characters.
- To remove this limitation, MySQL introduced the utf8mb4 character set. It uses up to 4 bytes per character and can therefore represent every Unicode code point, including emoji and other supplementary characters.
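You can see the 3-byte versus 4-byte difference by comparing CHAR_LENGTH (characters) with LENGTH (bytes). The sketch below writes each character as a UTF-8 hex literal with the _utf8mb4 introducer, so the result does not depend on the client's character set. The table name charset_demo is hypothetical. The failing insert assumes strict SQL mode is enabled. In non-strict mode MySQL stores a substitute character and issues a warning instead.

```sql
-- Characters vs bytes, written as UTF-8 hex literals:
-- X'C3A9' = U+00E9 (e with acute), X'E4B8AD' = U+4E2D (CJK), X'F09F9880' = U+1F600 (emoji).
SELECT CHAR_LENGTH(_utf8mb4 X'C3A9')     AS e_acute_chars,  -- 1
       LENGTH(_utf8mb4 X'C3A9')          AS e_acute_bytes,  -- 2
       LENGTH(_utf8mb4 X'E4B8AD')        AS cjk_bytes,      -- 3
       CHAR_LENGTH(_utf8mb4 X'F09F9880') AS emoji_chars,    -- 1
       LENGTH(_utf8mb4 X'F09F9880')      AS emoji_bytes;    -- 4

-- Hypothetical scratch table with one 3-byte and one 4-byte column.
CREATE TABLE charset_demo (
  c3 VARCHAR(10) CHARACTER SET utf8mb3,
  c4 VARCHAR(10) CHARACTER SET utf8mb4
);

-- Works: the 4-byte emoji fits in the utf8mb4 column.
INSERT INTO charset_demo (c4) VALUES (_utf8mb4 X'F09F9880');

-- Rejected in strict SQL mode with error 1366 (Incorrect string value),
-- because utf8mb3 cannot hold a character outside the Basic Multilingual Plane.
INSERT INTO charset_demo (c3) VALUES (_utf8mb4 X'F09F9880');
```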
utf8mb4 collation
common collations
- utf8mb4_general_ci:
- The default utf8mb4 collation before MySQL 8.0. It is case-insensitive and uses simplified rules when comparing characters across languages.
- Under this rule, 'a' and 'A' are considered equal.
- utf8mb4_unicode_ci:
- Based on the Unicode Collation Algorithm (UCA) 4.0.0; case-insensitive.
- Compared with utf8mb4_general_ci, utf8mb4_unicode_ci is more precise and sorts the characters of many languages correctly.
- utf8mb4_bin:
- A binary collation: it is case-sensitive and sorts by the binary (code point) value of each character.
- Under this rule, 'A' will come before 'a'.
- utf8mb4_0900_ai_ci:
- Introduced in MySQL 8.0 as a new collation for the utf8mb4 character set.
- Before MySQL 8.0, the utf8mb4 character set used utf8mb4_general_ci by default. That collation is not accurate enough for some character comparisons, which can make some sorting and comparison results unexpected.
- Based on Unicode Collation Algorithm (UCA) 9.0.0. It is accent-insensitive (ai) and case-insensitive (ci), and it sorts and compares characters more accurately.
In addition to the common collations above, MySQL provides other utf8mb4 collations, such as utf8mb4_unicode_520_ci and utf8mb4_0900_as_cs. Choose among them according to your specific needs.
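You can check the behavior of each collation directly with comparisons, where 1 means equal and 0 means not equal. In this sketch, the _utf8mb4 introducer makes every literal a utf8mb4 string, so the COLLATE clauses are valid whatever the connection character set is. STRCMP returns -1 when its first argument sorts first. The last statement uses collations that exist only in MySQL 8.0.

```sql
-- Case sensitivity: the two _ci collations treat 'a' and 'A' as equal, the binary collation does not.
SELECT _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_general_ci AS general_ci,  -- 1
       _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_unicode_ci AS unicode_ci,  -- 1
       _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_bin        AS bin;         -- 0

-- utf8mb4_bin orders by code point, so 'A' (U+0041) sorts before 'a' (U+0061).
SELECT STRCMP(_utf8mb4'A', _utf8mb4'a' COLLATE utf8mb4_bin) AS bin_order;  -- -1

-- MySQL 8.0 only: X'C3A9' is 'e' with an acute accent.
-- The accent-insensitive _ai_ci collation treats it as 'e'; the accent- and case-sensitive _as_cs one does not.
SELECT _utf8mb4 X'C3A9' = _utf8mb4'e' COLLATE utf8mb4_0900_ai_ci AS ai_ci,  -- 1
       _utf8mb4 X'C3A9' = _utf8mb4'e' COLLATE utf8mb4_0900_as_cs AS as_cs;  -- 0
```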
default collation
When a table's default character set is set to utf8mb4 but no collation is specified explicitly:
- In MySQL 5.7, the default collation is utf8mb4_general_ci.
- In MySQL 8.0, the default collation is utf8mb4_0900_ai_ci.
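To confirm which collation a server actually picks, you can create a table that names only the character set and then read its collation from information_schema. This is a sketch: default_demo is a hypothetical table name, and the statements assume a default database has been selected. On MySQL 8.0 the result also depends on default_collation_for_utf8mb4, which is utf8mb4_0900_ai_ci unless it has been changed.

```sql
-- Character set given, collation omitted.
CREATE TABLE default_demo (name VARCHAR(20)) DEFAULT CHARSET = utf8mb4;

-- Expected: utf8mb4_general_ci on MySQL 5.7, utf8mb4_0900_ai_ci on MySQL 8.0 with default settings.
SELECT TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'default_demo';
```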
Compatibility issues
Because the utf8mb4_0900_ai_ci collation was introduced in MySQL 8.0, importing a table from MySQL 8.0 into MySQL 5.7 or MySQL 5.6 fails: the older server does not recognize the collation.
[Err] 1273 - Unknown collation: 'utf8mb4_0900_ai_ci'
Solution: create the target database with a collation that the older server supports, or replace every occurrence of utf8mb4_0900_ai_ci in the SQL file with such a collation (for example utf8mb4_general_ci or utf8mb4_unicode_ci).
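The steps below sketch both sides of the fix. The database name shop is hypothetical. The first query runs on the MySQL 8.0 source and lists the tables that still use the 8.0-only collation. The other statements run on the older target server. Creating the database with a supported collation only helps for tables that do not name a collation themselves. A CREATE TABLE statement in the dump that explicitly says COLLATE=utf8mb4_0900_ai_ci still has to be edited.

```sql
-- On the MySQL 8.0 source: which tables in the hypothetical `shop` database use the 8.0-only collation?
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'shop'
  AND TABLE_COLLATION = 'utf8mb4_0900_ai_ci';

-- On the MySQL 5.7 / 5.6 target: list the utf8mb4 collations this server supports
-- (utf8mb4_0900_ai_ci will not be among them).
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- On the target: create the database with a collation the older server recognizes.
CREATE DATABASE shop CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```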
Comparison of utf8mb4_unicode_ci and utf8mb4_general_ci
- Accuracy:
- The utf8mb4_unicode_ci collation sorts and compares according to the Unicode Collation Algorithm. It handles special characters and sorts accurately across many languages.
- The utf8mb4_general_ci collation uses simplified rules rather than the full Unicode Collation Algorithm, so it compares some special characters differently from the standard.
- Performance:
- The utf8mb4_general_ci collation is generally somewhat faster at sorting and comparison.
- The utf8mb4_unicode_ci collation implements more complex rules for special characters, so it is slightly slower.
- In most scenarios, there is no significant performance difference between the two.
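The German sharp s is a concrete case where the two collations disagree. The MySQL manual describes it this way: utf8mb4_general_ci treats 'ß' as equal to 's', while utf8mb4_unicode_ci follows dictionary order and treats 'ß' as equal to 'ss'. The sketch below checks this. X'C39F' is 'ß' (U+00DF) in UTF-8, and the expected values in the comments follow the manual's description.

```sql
-- X'C39F' = U+00DF (sharp s). 1 = equal, 0 = not equal.
SELECT _utf8mb4 X'C39F' = _utf8mb4's'  COLLATE utf8mb4_general_ci AS general_s,   -- 1
       _utf8mb4 X'C39F' = _utf8mb4'ss' COLLATE utf8mb4_general_ci AS general_ss,  -- 0
       _utf8mb4 X'C39F' = _utf8mb4'ss' COLLATE utf8mb4_unicode_ci AS unicode_ss;  -- 1
```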
Server-level collation parameters
collation_server
- Available in MySQL 5.6 and later (the variable also exists in earlier versions).
- collation_server is a system variable that specifies the server's default collation.
- It is used as the default collation when a new database is created without an explicit character set or collation.
To view the value of collation_server on the current MySQL server:
SHOW VARIABLES LIKE 'collation_server';
The command returns a result set containing the variable name collation_server and its value.
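Two related ways to inspect the same setting are shown below. The LIKE pattern returns all the collation variables at once (collation_connection, collation_database, collation_server). The @@ syntax reads the global and session values separately.

```sql
-- All collation-related system variables in one result set.
SHOW VARIABLES LIKE 'collation%';

-- Global value (used for new connections) and the current session's value.
SELECT @@GLOBAL.collation_server, @@SESSION.collation_server;
```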
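Note:
collation_server is a server-level variable whose initial value is set when the MySQL server starts. It is usually configured in a configuration file (such as my.cnf or my.ini), and changes made there take effect after the MySQL server is restarted. The variable is also dynamic, so SET GLOBAL can change it at runtime; new connections then use the new value.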
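Besides editing the configuration file, you can change the value from a client session. This is a sketch that assumes an account with the SYSTEM_VARIABLES_ADMIN (or SUPER) privilege. A SET GLOBAL change is lost when the server restarts. SET PERSIST, available only in MySQL 8.0, also records the value in mysqld-auto.cnf so that it survives a restart.

```sql
-- Runtime change: affects connections opened after this point; lost at restart.
SET GLOBAL character_set_server = 'utf8mb4';
SET GLOBAL collation_server = 'utf8mb4_general_ci';

-- MySQL 8.0 only: apply the value now and keep it across restarts.
SET PERSIST collation_server = 'utf8mb4_general_ci';
```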
Default parameter rules
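- The server sets collation_database to the collation of the connection's default database. If there is no default database, collation_database has the same value as collation_server.
- If neither a character set nor a collation is specified when creating a database, the database uses character_set_server and collation_server. Tables created in it without a collation inherit the database's default collation, which is the value collation_database reports while that database is the default.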
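The inheritance chain (server → database → table → column) can be traced with a few statements. In this sketch, inherit_demo and t are hypothetical names. The database is created without a character set or collation, so it should pick up the server defaults. Once the database is selected with USE, collation_database should report the same collation.

```sql
-- No CHARACTER SET / COLLATE: the database takes character_set_server and collation_server.
CREATE DATABASE inherit_demo;

-- Make it the default database; collation_database now reflects its collation.
USE inherit_demo;
SELECT @@collation_server, @@collation_database;

-- A table and column created without a collation inherit the database default.
CREATE TABLE t (name VARCHAR(20));
SELECT COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'inherit_demo'
  AND TABLE_NAME = 't';
```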
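Note:
- In MySQL 5.7, assigning a value to the session variables character_set_database and collation_database is deprecated (it produces a warning), and these variables are slated to become read-only in a later release.
- MySQL 8.0 adds the system variable default_collation_for_utf8mb4, which controls the default collation used with the utf8mb4 character set. Its permitted values are utf8mb4_0900_ai_ci and utf8mb4_general_ci, and the manual describes it as intended for internal use by replication.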
- The parameter default_collation_for_utf8mb4 takes effect in the following situations:
- In the output of the SHOW COLLATION and SHOW CHARACTER SET commands.
- When a database is created or altered with utf8mb4 but no collation specified.
- When a table is created or altered with utf8mb4 but no collation specified.
- When a column is added or modified with utf8mb4 but no collation specified.
- In any other case where utf8mb4 is used without an explicit collation.
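A minimal sketch of the table case from the list above, assuming MySQL 8.0 and a default database. dcf_demo is a hypothetical table name. Setting the session value of this variable is a restricted operation, so the account may need elevated privileges. After the change, a table that names only utf8mb4 should get utf8mb4_general_ci instead of utf8mb4_0900_ai_ci.

```sql
-- Current default collation for utf8mb4 in this session.
SELECT @@SESSION.default_collation_for_utf8mb4;

-- Switch the session default (restricted operation; may require extra privileges).
SET SESSION default_collation_for_utf8mb4 = 'utf8mb4_general_ci';

-- utf8mb4 without COLLATE now resolves to the session default chosen above.
CREATE TABLE dcf_demo (name VARCHAR(20)) DEFAULT CHARSET = utf8mb4;
SHOW CREATE TABLE dcf_demo;
```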