The difference between utf8 and utf8mb4 in MySQL

I. Introduction
MySQL added the utf8mb4 encoding in version 5.5.3. The "mb4" means "most bytes 4", that is, at most four bytes per character, and the encoding is designed to handle four-byte Unicode characters. Fortunately, utf8mb4 is a superset of utf8, so apart from changing the encoding to utf8mb4, no other conversion is needed. Of course, to save space, utf8 is generally enough.
II. Description
As said above, utf8 can store most Chinese characters, so why use utf8mb4? MySQL's utf8 character set supports at most 3 bytes per character, and inserting a 4-byte character raises an error. The largest code point that a three-byte UTF-8 sequence can encode is 0xFFFF, which is the end of the Basic Multilingual Plane (BMP) in Unicode. That is, any Unicode character outside the Basic Multilingual Plane cannot be stored in MySQL using the utf8 character set. This includes emoji (special Unicode characters that are common on iOS and Android phones), many rarely used Chinese characters, and any newly added Unicode characters.
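The byte lengths described above can be checked with a small Python sketch. It does not talk to MySQL; the helper name fits_in_mysql_utf8 is hypothetical and only tests the condition stated in the text (every code point at or below 0xFFFF).

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes the string `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character lies in the BMP
    (code point <= 0xFFFF), i.e. needs at most three UTF-8 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(utf8_length("A"))           # ASCII letter
print(utf8_length("\u4e2d"))      # common Chinese character U+4E2D
print(utf8_length("\U0001F600"))  # emoji U+1F600, outside the BMP

print(fits_in_mysql_utf8("\u4e2d\u6587"))   # BMP-only text
print(fits_in_mysql_utf8("hi \U0001F600"))  # contains a four-byte character
```

A check like this could be run on application input before writing to a utf8 column, although switching the column to utf8mb4 is the fix the article has in mind.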

III. The root of the problem
Initially, the UTF-8 format used one to six bytes and could encode up to 31 bits. The latest UTF-8 specification uses only one to four bytes and encodes up to 21 bits, which is just enough to represent all 17 Unicode planes. MySQL's utf8 character set supports only UTF-8 characters of up to three bytes, that is, the Basic Multilingual Plane of Unicode.
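The bit counts quoted here follow from the UTF-8 byte layout: a single byte carries 7 payload bits, and in an n-byte sequence the lead byte carries 7 - n bits while each continuation byte carries 6. A rough Python sketch of that arithmetic (the function name payload_bits is made up for illustration):

```python
def payload_bits(n_bytes: int) -> int:
    """Payload bits available in an n-byte UTF-8 sequence.

    1 byte : 0xxxxxxx                     -> 7 bits
    n bytes: lead byte has (7 - n) bits, each of the
             (n - 1) continuation bytes has 6 bits
    """
    if n_bytes == 1:
        return 7
    return (7 - n_bytes) + 6 * (n_bytes - 1)


PLANES = 17                             # Unicode planes 0 through 16
MAX_CODE_POINT = PLANES * 0x10000 - 1   # last code point of plane 16

print(payload_bits(3))                  # three-byte limit of MySQL utf8
print(payload_bits(4))                  # current UTF-8 maximum
print(payload_bits(6))                  # original six-byte design
print(hex(MAX_CODE_POINT), MAX_CODE_POINT.bit_length())
```

Three bytes give 16 bits, which covers exactly 0x0000 to 0xFFFF, while the highest code point of the 17 planes needs the 21 bits that only a four-byte sequence provides.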
Why does MySQL's utf8 support only UTF-8 characters of up to three bytes? I thought about it for a moment: probably because when MySQL development began, Unicode did not yet have supplementary planes. At that time, the Unicode committee was still dreaming that "65,535 characters are enough for the whole world". MySQL counts string length in characters rather than bytes, and for the CHAR data type it must reserve enough space for the longest possible string. With the utf8 character set, the reserved length is the maximum byte length of a character multiplied by the string length, so utf8 was naturally limited to a maximum of 3 bytes per character. For example, for CHAR(100) MySQL reserves 300 bytes.
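The CHAR reservation rule described above is a simple multiplication. The following Python sketch only restates that worst-case arithmetic; the table and function names are illustrative, and the actual on-disk size can differ by storage engine and row format.

```python
# Maximum bytes per character for the two character sets discussed here.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def reserved_bytes(char_length: int, charset: str) -> int:
    """Worst-case bytes needed for CHAR(char_length) in the given charset."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


print(reserved_bytes(100, "utf8"))     # the CHAR(100) example from the text
print(reserved_bytes(100, "utf8mb4"))  # same column under utf8mb4
```

The same column costs one third more space in the worst case under utf8mb4, which is the space trade-off mentioned in the introduction.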

Reprinted from:
http://ourmysql.com/archives/1402

Origin blog.csdn.net/MRcheng12138/article/details/104581059