Character Sets and Comparison Rules

Introduction to Character Sets


Purpose: to make data "understandable" to both humans and computers.

We know that a computer can store only binary data, so how are strings stored? The answer is to establish a mapping between characters and binary data. To establish this mapping, at least two things need to be clarified:

  1. Which characters are mapped to binary data? In other words, a clear range of characters must be defined.
  2. How are they mapped? Mapping a character to binary data is called encoding, and mapping binary data back to a character is called decoding.

     The concept of a character set was abstracted to describe the encoding rules for a certain range of characters. For example, let's define a character set named xiaohaizi with the following character range and encoding rules. It contains the characters 'a', 'b', 'A', and 'B', and it encodes each character in one byte, using this mapping:

       'a' -> 00000001 (hexadecimal: 0x01)
       'b' -> 00000010 (hexadecimal: 0x02)
       'A' -> 00000011 (hexadecimal: 0x03)
       'B' -> 00000100 (hexadecimal: 0x04)

     With the xiaohaizi character set, we can represent some strings in binary form. Here is the binary representation of some strings encoded in xiaohaizi:

       'bA'  -> 0000001000000011 (hexadecimal: 0x0203)
       'baB' -> 000000100000000100000100 (hexadecimal: 0x020104)
       'cd'  -> cannot be represented, because xiaohaizi does not contain the characters 'c' and 'd'

(Quoted from "How MySQL Works")
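The xiaohaizi example above can be modeled as a small lookup table. The Python sketch below is illustrative only: xiaohaizi is the hypothetical character set from the quoted text, and `encode`/`decode` are made-up helper names. Encoding maps each character to its byte, decoding reverses the mapping, and a character outside the set's range cannot be encoded at all.

```python
# Toy model of the hypothetical "xiaohaizi" character set from the example:
# it covers only 'a', 'b', 'A', 'B' and encodes each character as one byte.
XIAOHAIZI = {"a": 0x01, "b": 0x02, "A": 0x03, "B": 0x04}

# Reverse table used for decoding: byte value -> character.
XIAOHAIZI_REVERSE = {code: ch for ch, code in XIAOHAIZI.items()}


def encode(text: str) -> bytes:
    """Encode: map each character to its byte; fail if it is outside the set."""
    try:
        return bytes(XIAOHAIZI[ch] for ch in text)
    except KeyError as err:
        raise ValueError(f"unrepresentable: {err.args[0]!r} is not in xiaohaizi") from None


def decode(data: bytes) -> str:
    """Decode: map each byte back to its character."""
    return "".join(XIAOHAIZI_REVERSE[b] for b in data)


if __name__ == "__main__":
    print(encode("bA").hex())                 # 0203
    print(encode("baB").hex())                # 020104
    print(decode(bytes([0x02, 0x01, 0x04])))  # baB
    try:
        encode("cd")
    except ValueError as e:
        print(e)  # 'c' is outside the character set's range
```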

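Introduction to Comparison Rules

A comparison rule (called a collation in MySQL) defines how characters are compared and sorted. For example, comparing two characters while ignoring case can be done in two steps:

  1. Convert both characters to the same case (both uppercase or both lowercase).
  2. Compare the binary data of the two converted characters.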

Note: one character set can have multiple comparison rules.
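The two steps above can be sketched in Python as two comparison rules over the same encoding (UTF-8 here), which also illustrates that one character set can have several comparison rules. This is a simplified model: the function names are hypothetical, and real MySQL collations compare characters using per-collation weight tables rather than a plain lowercase-then-compare step.

```python
from functools import cmp_to_key


def compare_binary(a: str, b: str) -> int:
    """Binary rule: compare the UTF-8 bytes of the two strings directly.

    Returns -1, 0, or 1, like a C-style comparator.
    """
    x, y = a.encode("utf-8"), b.encode("utf-8")
    return (x > y) - (x < y)


def compare_case_insensitive(a: str, b: str) -> int:
    """Case-insensitive rule: step 1 folds both strings to lowercase,
    step 2 compares the resulting bytes with the binary rule."""
    return compare_binary(a.lower(), b.lower())


if __name__ == "__main__":
    print(compare_binary("a", "A"))            # 1: byte 0x61 > byte 0x41
    print(compare_case_insensitive("a", "A"))  # 0: equal after case folding
    words = ["b", "A", "a", "B"]
    print(sorted(words, key=cmp_to_key(compare_binary)))            # ['A', 'B', 'a', 'b']
    print(sorted(words, key=cmp_to_key(compare_case_insensitive)))  # ['A', 'a', 'b', 'B']
```

The same four characters sort differently under the two rules, even though their encoding is identical.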

utf8 and utf8mb4 in MySQL


utf8mb3: A stripped-down utf8 character set that uses only 1 to 3 bytes per character. In MySQL, utf8 is an alias for utf8mb3.

utf8mb4: The full utf8 character set, using 1 to 4 bytes per character.
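The difference between utf8mb3 and utf8mb4 comes down to how many bytes UTF-8 needs for a character. The Python sketch below counts UTF-8 bytes per character and checks whether a string stays within utf8mb3's 3-byte limit; `fits_utf8mb3` is a hypothetical helper, not a MySQL function. Characters outside the Basic Multilingual Plane (code points above U+FFFF), such as most emoji, take 4 bytes and therefore need utf8mb4.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes in at most 3 bytes, the utf8mb3 limit."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    for ch in ["a", "é", "中", "😀"]:
        print(f"{ch!r}: U+{ord(ch):04X}, "
              f"{utf8_byte_length(ch)} byte(s), fits utf8mb3: {fits_utf8mb3(ch)}")
```

Here 'a', 'é', and '中' take 1, 2, and 3 bytes and fit in utf8mb3, while '😀' (U+1F600) takes 4 bytes and can only be stored as utf8mb4.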

View MySQL character sets

SHOW (CHARACTER SET|CHARSET) [LIKE 'pattern'];

View MySQL comparison rules (collations)

SHOW COLLATION [LIKE 'pattern'];
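Both statements accept an optional LIKE pattern that filters rows by name. The Python sketch below mimics the two LIKE wildcards (`%` matches any sequence of characters, `_` matches exactly one) by translating the pattern into a regular expression. It is a simplified illustration with hypothetical helper names (`like_to_regex`, `show_like`): it does not handle escaped wildcards and it matches case-sensitively, whereas MySQL's matching follows the collation in effect. The name list is a small sample, not real server output.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regular expression.

    '%' matches any sequence of characters (including none) and '_' matches
    exactly one character; every other character is matched literally.
    Escaped wildcards are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def show_like(names: list[str], pattern: str) -> list[str]:
    """Return the names that match the whole LIKE pattern (case-sensitive here)."""
    regex = like_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


if __name__ == "__main__":
    # A small sample of character set names, not actual SHOW output.
    charsets = ["ascii", "gbk", "latin1", "utf8mb3", "utf8mb4"]
    print(show_like(charsets, "utf8%"))    # ['utf8mb3', 'utf8mb4']
    print(show_like(charsets, "utf8mb_"))  # ['utf8mb3', 'utf8mb4']
    print(show_like(charsets, "latin_"))   # ['latin1']
```

Note that `_` is itself a wildcard, so a pattern such as 'utf8mb_' matches both utf8mb3 and utf8mb4.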

Note: each character set has several comparison rules, and one of them is that character set's default comparison rule.
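To see how the several collations of one character set relate, it helps to know MySQL's collation naming convention: a collation name starts with its character set name and ends with suffixes such as `_ci` (case-insensitive), `_cs` (case-sensitive), `_ai`/`_as` (accent-insensitive/accent-sensitive), `_ks` (kana-sensitive), and `_bin` (binary). The Python sketch below decodes those suffixes; `describe_collation` is a hypothetical helper, and it assumes the character set name is the part before the first underscore. The default collation is not encoded in the name: `SHOW CHARACTER SET` lists it in the `Default collation` column, and `SHOW COLLATION` marks it with `Yes` in the `Default` column.

```python
# Meanings of common collation-name suffixes in MySQL's naming convention.
SUFFIX_MEANINGS = {
    "ai": "accent-insensitive",
    "as": "accent-sensitive",
    "ci": "case-insensitive",
    "cs": "case-sensitive",
    "ks": "kana-sensitive",
    "bin": "binary",
}


def describe_collation(name: str) -> tuple[str, list[str]]:
    """Split a collation name into its character set and the traits its suffixes encode.

    Assumes the character set name is everything before the first underscore.
    Tokens that are not known suffixes (e.g. '0900', 'general', 'swedish') are skipped.
    """
    charset, *tokens = name.split("_")
    traits = [SUFFIX_MEANINGS[t] for t in tokens if t in SUFFIX_MEANINGS]
    return charset, traits


if __name__ == "__main__":
    for collation in ["utf8mb4_0900_ai_ci", "utf8mb4_bin", "latin1_swedish_ci"]:
        print(collation, "->", describe_collation(collation))
```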

Welcome to the ggball blog!


Origin blog.csdn.net/ZHUXIUQINGIT/article/details/122954391