The difference between utf8 and utf8mb4 in MySQL, and between utf8mb4_bin, utf8mb4_general_ci and utf8mb4_unicode_ci

UTF-8 is a variable-length character encoding that uses 1 to 4 bytes per character. The "mb4" in utf8mb4 stands for "most bytes 4": up to 4 bytes per character, which is enough to represent all of UTF-8.
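To make the 1-to-4-byte range concrete, here is a minimal Python sketch that encodes one sample character from each length class and counts the bytes. The sample characters and the helper name utf8_length are chosen only for illustration.

```python
# Sample characters, one per UTF-8 length class (written as escapes to avoid
# any ambiguity about the code points used).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]  # Latin A, e-acute, CJK "middle", emoji

def utf8_length(ch: str) -> int:
    """Return how many bytes the character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch in samples:
    print(f"U+{ord(ch):04X} needs {utf8_length(ch)} byte(s)")
```

The last sample, an emoji, is the only one that needs 4 bytes.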

MySQL's utf8 character set stores at most 3 bytes per character, so inserting a character that needs 4 bytes raises an error. The largest code point a three-byte UTF-8 sequence can encode is 0xFFFF, which is the end of Unicode's Basic Multilingual Plane (BMP). Any character outside the BMP therefore cannot be stored in MySQL with the utf8 character set. This includes emoji (ordinary Unicode characters that mostly live outside the BMP, common on iOS and Android phones), many rarely used Chinese characters, and any newly added Unicode characters outside the BMP.
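The BMP rule above can be checked on the application side before data reaches the database. The following is a rough sketch, not part of MySQL or any driver; the function names fits_in_utf8mb3 and non_bmp_chars are hypothetical.

```python
BMP_MAX = 0xFFFF  # highest code point a 3-byte UTF-8 sequence can encode

def fits_in_utf8mb3(text: str) -> bool:
    """True if every character lies inside the Basic Multilingual Plane."""
    return all(ord(ch) <= BMP_MAX for ch in text)

def non_bmp_chars(text: str) -> list:
    """Return the characters that would need 4 bytes in UTF-8."""
    return [ch for ch in text if ord(ch) > BMP_MAX]

print(fits_in_utf8mb3("hello \u4e2d\u6587"))   # BMP-only text
print(fits_in_utf8mb3("hi \U0001F600"))        # contains an emoji
print(non_bmp_chars("hi \U0001F600"))
```

A check like this only tells you which strings would be a problem for a 3-byte utf8 column; switching the column to utf8mb4 removes the need for it.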

Summary: MySQL's utf8 is really utf8mb3. It uses at most three bytes per character, which saves space but cannot represent all of UTF-8. Using utf8mb4 is recommended.

utf8mb4_bin: strings are compared and sorted by the binary value of each character, so comparisons are case-sensitive. Note that the stored data is still utf8mb4 text; only the comparison is binary.
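As a rough illustration of what comparing by binary value means, the Python sketch below sorts strings by their UTF-8 bytes. It only approximates the idea behind a _bin collation and does not model MySQL details such as trailing-space handling; binary_key is a hypothetical helper name.

```python
def binary_key(text: str) -> bytes:
    """Sort key that compares strings by their encoded UTF-8 bytes."""
    return text.encode("utf-8")

words = ["b", "A", "a", "B"]

# Uppercase ASCII letters have smaller byte values than lowercase ones,
# so they sort first and "a" never compares equal to "A".
print(sorted(words, key=binary_key))          # ['A', 'B', 'a', 'b']
print(binary_key("a") == binary_key("A"))     # False
```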

utf8mb4_general_ci: "ci" means case-insensitive. This collation does not implement the Unicode collation rules, so for some languages or characters the sort order may not match what is expected. In most cases, though, the order of these special characters does not need to be that precise.

utf8mb4_unicode_ci: sorts and compares according to the Unicode standard, so it sorts accurately across different languages. To handle special characters correctly, the Unicode collation uses a somewhat more complex sorting algorithm.

utf8mb4_general_ci is a legacy collation that does not support expansions; it can only compare strings one character at a time. Comparisons with utf8mb4_general_ci are therefore fast, but less correct than comparisons with utf8mb4_unicode_ci.
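The effect of expansions can be sketched with a toy model. MySQL's documentation gives the German sharp s as an example: it compares equal to "s" under general_ci and to "ss" under unicode_ci. The two mapping tables and function names below are hand-written assumptions for this sketch and are not MySQL's actual collation data or algorithm.

```python
# Toy weight tables written by hand for this sketch; NOT MySQL's real data.
ONE_TO_ONE = {"\u00df": "s"}   # general_ci style: one character, one weight
EXPANSION = {"\u00df": "ss"}   # unicode_ci style: one character may expand

def general_style_key(text: str) -> str:
    """Case-insensitive key using character-by-character mapping only."""
    return "".join(ONE_TO_ONE.get(ch, ch) for ch in text.lower())

def unicode_style_key(text: str) -> str:
    """Case-insensitive key where a character may expand to several."""
    return "".join(EXPANSION.get(ch, ch) for ch in text.lower())

a, b = "Stra\u00dfe", "Strasse"
print(general_style_key(a) == general_style_key(b))   # False: "strase" vs "strasse"
print(unicode_style_key(a) == unicode_style_key(b))   # True: both "strasse"
```

In this model the one-to-one mapping treats the two spellings as different words, while the expansion-aware key treats them as equal, which is the kind of correctness difference described above.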

Summary: general_ci is faster, unicode_ci is more accurate. On today's CPUs, however, the difference is far too small to count as a performance factor; indexing and SQL design matter much more. What users should care about is keeping the character set and collation consistent across the database. (A field that may contain garbled text should not be used as a primary key or unique index. For example, a URL is used as a unique index, but the value it records may be garbled.)
----------------
Copyright: this is an original article by CSDN blogger "yzh_1346983557", published under the CC 4.0 BY-SA license. When reproducing it, please include the original source link and this statement.
Original link: https://blog.csdn.net/yzh_1346983557/article/details/89643071

Origin www.cnblogs.com/tc310/p/11824328.html