Character sets and comparison rules
The process of mapping characters into binary is called encoding, and the process of mapping binary into characters is called decoding.
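As a small illustration outside MySQL, Python's built-in `str.encode` and `bytes.decode` perform these two mappings. The sketch below encodes the character 我 with UTF-8 and decodes it back. UTF-8 is chosen only for the example.

```python
# Encoding maps characters to bytes; decoding maps bytes back to characters.
text = "我"
encoded = text.encode("utf-8")     # encoding
decoded = encoded.decode("utf-8")  # decoding

print(encoded.hex())    # e68891
print(decoded == text)  # True
```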
Let's take a look at the commonly used character sets:
ASCII character set:
Contains 128 characters, including the space, punctuation marks, digits, upper- and lowercase English letters, and some invisible control characters. Each character takes one byte.
ISO 8859-1 character set:
Contains 256 characters: it extends the ASCII character set with 128 additional characters commonly used in Western Europe. It is also known as latin1.
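Python ships codecs named `ascii` and `latin-1` (its name for ISO 8859-1), so the difference in coverage is easy to check. This sketch assumes those codecs match the character sets described above, each using one byte per character.

```python
# ASCII covers code points 0-127; ISO 8859-1 (Python codec "latin-1") covers 0-255.
ascii_chars = bytes(range(128)).decode("ascii")
latin1_chars = bytes(range(256)).decode("latin-1")
print(len(ascii_chars), len(latin1_chars))  # 128 256

# Both use a single byte per character.
print("A".encode("ascii"))    # b'A'
print("é".encode("latin-1"))  # b'\xe9'

# A Western European letter outside ASCII cannot be encoded as ASCII.
try:
    "é".encode("ascii")
except UnicodeEncodeError:
    print("é is not in ASCII")
```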
GB2312 character set:
Contains 6763 Chinese characters and 682 other symbols, such as Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.
GBK character set:
The GBK character set extends the range of characters covered by GB2312 and remains encoding-compatible with GB2312.
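The GB2312/GBK relationship can be checked with Python's `gb2312` and `gbk` codecs. In this sketch, `encodable` is a hypothetical helper (not part of MySQL or Python) that reports whether a codec can represent a string. The character 镕 (U+9555) is used as a commonly cited example of a character present in GBK but missing from GB2312.

```python
def encodable(text: str, codec: str) -> bool:
    """Hypothetical helper: True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# ASCII characters take 1 byte; Chinese characters take 2 bytes.
print("a".encode("gb2312"))         # b'a'
print("我".encode("gb2312").hex())  # ced2

# GBK is compatible with GB2312: shared characters encode identically.
print("我".encode("gbk") == "我".encode("gb2312"))  # True

# GBK covers characters that GB2312 lacks.
print(encodable("镕", "gbk"), encodable("镕", "gb2312"))  # True False
```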
UTF-8 character set:
Covers almost all characters used by countries and regions around the world today, and it is still being expanded.
utf8mb3: a "stripped-down" UTF-8 character set that uses 1 to 3 bytes per character.
utf8mb4: the complete UTF-8 character set, using 1 to 4 bytes per character.
In MySQL, utf8 is an alias for utf8mb3.
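The practical difference between utf8mb3 and utf8mb4 is whether 4-byte characters (such as emoji) can be stored. The sketch below measures UTF-8 byte lengths with Python; `utf8_lengths` and `fits_utf8mb3` are hypothetical helpers, not MySQL functions, and only approximate the "at most 3 bytes per character" rule.

```python
def utf8_lengths(text: str) -> list[int]:
    """Number of UTF-8 bytes each character needs."""
    return [len(ch.encode("utf-8")) for ch in text]

def fits_utf8mb3(text: str) -> bool:
    """Hypothetical check: every character needs at most 3 bytes."""
    return all(n <= 3 for n in utf8_lengths(text))

print(utf8_lengths("aé我😀"))  # [1, 2, 3, 4]
print(fits_utf8mb3("a我"))     # True
print(fits_utf8mb3("😀"))      # False: needs utf8mb4
```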
A comparison rule (collation) defines how characters in a given character set are compared and ordered. One character set can have several comparison rules, one of which is its default, and every comparison rule belongs to exactly one character set.
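To make "several comparison rules for one character set" concrete, the sketch below defines two hypothetical comparison functions over the same strings: a binary rule that compares code points (similar in spirit to MySQL collations ending in `_bin`) and a case-insensitive rule (similar in spirit to those ending in `_ci`). Real MySQL collations use their own weight tables; `str.casefold` is only an approximation.

```python
import functools

def cmp_bin(a: str, b: str) -> int:
    """Binary-style rule: compare raw code points."""
    return (a > b) - (a < b)

def cmp_ci(a: str, b: str) -> int:
    """Case-insensitive rule: fold case before comparing."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)

words = ["b", "A", "a", "B"]
print(sorted(words, key=functools.cmp_to_key(cmp_bin)))  # ['A', 'B', 'a', 'b']
print(sorted(words, key=functools.cmp_to_key(cmp_ci)))   # ['A', 'a', 'b', 'B']
print(cmp_bin("a", "A") == 0, cmp_ci("a", "A") == 0)     # False True
```

The same four strings sort differently, and "a" equals "A" under one rule but not the other, even though the character set is unchanged.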
MySQL has character sets and comparison rules at 4 levels: server level, database level, table level, and column level.
Server level
(1) View the server-level character set
mysql> show variables like 'character_set_server';
(2) View the server-level comparison rule
mysql> show variables like 'collation_server';
Database level
(1) View the database-level character set
mysql> show variables like 'character_set_database';
(2) View the database-level comparison rule
mysql> show variables like 'collation_database';
Table level
We can specify a table's character set and comparison rule when creating or modifying the table. If they are not specified for the table, the database's character set and comparison rule are used as the table's.
Column level
We can specify a column's character set and comparison rule when creating or modifying the column. If they are not explicitly specified when the column is created, the column uses the table's character set and comparison rule.
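The fallback chain can be modeled as "use the most specific level that specifies a value". The sketch below is a hypothetical model, not MySQL code: `Level` and `resolve` are illustrative names, and it assumes each level specifies both the character set and the comparison rule or neither (real MySQL also handles a character set given without a comparison rule). It also includes the server level, since a database created without an explicit character set takes the server-level one. The character set and collation names are example values.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Level:
    """One level of settings; None means 'not specified'."""
    charset: Optional[str] = None
    collation: Optional[str] = None

def resolve(*levels: Level) -> tuple:
    """Return (charset, collation) from the most specific level that sets one.

    Levels are passed from most specific (column) to least specific (server).
    Simplification: a level sets both values or neither.
    """
    for level in levels:
        if level.charset is not None:
            return level.charset, level.collation
    raise ValueError("the server level must define a character set")

server = Level("utf8mb4", "utf8mb4_0900_ai_ci")
database = Level()                    # not specified: falls back to server
table = Level("gbk", "gbk_chinese_ci")
column_a = Level()                    # not specified: falls back to table
column_b = Level("utf8mb4", "utf8mb4_bin")

print(resolve(column_a, table, database, server))  # ('gbk', 'gbk_chinese_ci')
print(resolve(column_b, table, database, server))  # ('utf8mb4', 'utf8mb4_bin')
print(resolve(database, server))                   # ('utf8mb4', 'utf8mb4_0900_ai_ci')
```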
Character sets used in communication between the client and the server
Essentially, a request received by the server is just a sequence of bytes. The server assumes that this byte sequence is encoded in the character set named by the system variable character_set_client. (After each client connects, the server maintains a separate, SESSION-level character_set_client variable for that client.) When it actually processes the request, the server converts the byte sequence into the character set named by the SESSION-level system variable character_set_connection.
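The conversion step can be sketched with Python codecs standing in for MySQL's. Here `receive_request` is a hypothetical function, the two parameters merely reuse the system variable names as labels, and Python codec names ("utf-8", "gbk") are used instead of MySQL names such as utf8mb4. The example assumes a client sending UTF-8 and a character_set_connection of gbk.

```python
def receive_request(raw: bytes, character_set_client: str,
                    character_set_connection: str) -> bytes:
    """Sketch: read raw request bytes as character_set_client,
    then re-encode them in character_set_connection."""
    text = raw.decode(character_set_client)
    return text.encode(character_set_connection)

request = "select * from t where c='我'".encode("utf-8")  # client sends UTF-8
converted = receive_request(request, "utf-8", "gbk")
print(converted.endswith(b"'\xce\xd2'"))  # True: 我 is now the GBK bytes CE D2
```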
When a value received from the client is compared with the value of a column in a table, the column's character set and comparison rule take priority. For example:
select * from t where c='我';
Suppose the '我' in the request is gbk-encoded, while column c uses the utf8 character set. Then the '我' in the request must first be converted from gbk to utf8 before the comparison.
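At the byte level, the same character has different representations in the two character sets, which is why a conversion is needed first. In this sketch, plain byte equality stands in for the column's comparison rule, which is a simplification; the variable names are illustrative.

```python
# Literal from the request, in the gbk encoding.
literal_gbk = "我".encode("gbk")       # b'\xce\xd2'
# Value stored in column c, in the utf8 encoding.
stored_utf8 = "我".encode("utf-8")     # b'\xe6\x88\x91'

# Comparing raw bytes across character sets does not work...
print(literal_gbk == stored_utf8)      # False

# ...so the literal is first converted to the column's character set.
literal_utf8 = literal_gbk.decode("gbk").encode("utf-8")
print(literal_utf8 == stored_utf8)     # True
```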
Continuing the example, when the server generates the response, it converts the string '我' from the utf8 encoding into the character set named by the system variable character_set_results, and then sends the resulting byte sequence to the client.
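The response step is the reverse direction. `build_response` is a hypothetical function whose parameter reuses the name character_set_results as a label; Python codec names replace MySQL's.

```python
def build_response(value_utf8: bytes, character_set_results: str) -> bytes:
    """Sketch: convert a utf8 column value into character_set_results."""
    return value_utf8.decode("utf-8").encode(character_set_results)

column_value = "我".encode("utf-8")
print(build_response(column_value, "gbk").hex())    # ced2
print(build_response(column_value, "utf-8").hex())  # e68891
```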
System variable | Description |
---|---|
character_set_client | The server assumes the request is encoded in this character set |
character_set_connection | When processing the request, the server converts the request bytes from character_set_client to this character set |
character_set_results | The server encodes strings returned to the client in this character set |
When connecting to the server, the client sends its default character set to the server along with the user name and password. After receiving it, the server initializes character_set_client, character_set_connection, and character_set_results to the client's default character set.
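This initialization matters because the server interprets incoming bytes according to character_set_client without checking them. The sketch below shows what goes wrong if the client really sends UTF-8 bytes while the server reads them with a one-byte character set. `server_decode` is a hypothetical function, and Python's `latin-1` (ISO 8859-1) is used; MySQL's own latin1 is closer to cp1252, so the exact garbled characters may differ.

```python
def server_decode(raw: bytes, character_set_client: str) -> str:
    """Sketch: the server reads raw bytes as character_set_client, unchecked."""
    return raw.decode(character_set_client)

sent = "我".encode("utf-8")  # the client really encodes with UTF-8

ok = server_decode(sent, "utf-8")         # setting matches the client
garbled = server_decode(sent, "latin-1")  # setting does not match

print(ok)            # 我
print(len(garbled))  # 3: three bytes read as three one-byte characters
```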