Summary of issues related to MySQL character set, collation and case sensitivity

Related concepts

Several important terms and concepts:
encoding: the internal representation of a character set member
character set: a collection of letters and symbols
collation: instructions specifying how characters are compared and sorted (also called collation rules or sorting rules)
When not specified (the default case): the database inherits from the instance, the table inherits from the database, and the column inherits the character set and collation of the table

View character sets and collations

View the character set supported by the current instance

This also shows the default collation of each character set.

show charset;
or
show character set;
or
## In MySQL, all character set and collation information is stored in the information_schema database, so it can also be queried as follows
select * from INFORMATION_SCHEMA.CHARACTER_SETS;

The results can be filtered:

show charset like '%utf8%';

show charset where charset like '%utf8%';

show character set like '%utf8%';

utf8mb4

MySQL's utf8 supports only characters encoded in at most 3 bytes. Characters that need 4 bytes, such as emoji, are not supported by MySQL's utf8 and require utf8mb4. Using utf8mb4 is recommended.
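As a rough illustration outside MySQL, the following Python sketch checks whether a string contains any character whose UTF-8 encoding takes 4 bytes, which is the case where utf8mb4 is needed. The function name `needs_utf8mb4` is hypothetical, and the check only models the byte-length rule described above, not MySQL's own validation.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in text takes 4 bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


print(needs_utf8mb4("你好"))             # each of these characters takes 3 bytes
print(needs_utf8mb4("ok \U0001F600"))    # U+1F600 (an emoji) takes 4 bytes
```

A check like this could be run on sample data before deciding whether a column can stay on utf8.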

utf8mb4 has collations such as utf8mb4_bin, utf8mb4_unicode_ci, and utf8mb4_general_ci. The bin collation compares characters as binary strings and is therefore case sensitive; the other two are case insensitive. utf8mb4_unicode_ci follows the Unicode Collation Algorithm and sorts more accurately than utf8mb4_general_ci, which uses simplified rules.

String length calculation note

Because of multi-byte encoding, the number of characters is not necessarily equal to the number of bytes, so keep the distinction in mind when querying: length returns the number of bytes, while char_length returns the number of characters.

select length('你');
/* Result is 3: 3 bytes, because the database uses utf8 encoding */

select char_length('你');
/* Result is 1: 1 character */
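The same distinction can be reproduced outside the database. This is a minimal Python sketch, assuming UTF-8 as the encoding; the helper names `byte_length` and `char_length` are hypothetical stand-ins for MySQL's length and char_length.

```python
def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def char_length(text: str) -> int:
    """Number of characters (Unicode code points) in the text."""
    return len(text)


print(byte_length("你"))  # 3
print(char_length("你"))  # 1
```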

View the collations supported by the current instance

show collation;
or
select * from INFORMATION_SCHEMA.COLLATIONS;

Collation names generally end with "_ci", "_cs", or "_bin": ci means case insensitive, cs means case sensitive, and bin means binary (characters are compared as binary strings, so the comparison is case sensitive).

For example, if a project uses the character set utf8 with the collation utf8_unicode_ci, string comparison is case insensitive.
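The suffix convention can be sketched as a small lookup. This is only an illustration in Python; the function name `collation_case_sensitivity` is hypothetical, and it classifies by name alone rather than asking the server.

```python
def collation_case_sensitivity(collation: str) -> str:
    """Classify a collation by its name suffix (_ci, _cs, _bin)."""
    suffixes = {
        "_ci": "case insensitive",
        "_cs": "case sensitive",
        "_bin": "binary (case sensitive)",
    }
    for suffix, meaning in suffixes.items():
        if collation.endswith(suffix):
            return meaning
    return "unknown"  # the name does not follow the suffix convention


print(collation_case_sensitivity("utf8_unicode_ci"))
print(collation_case_sensitivity("utf8mb4_bin"))
```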

The results can be filtered:

SHOW COLLATION WHERE Charset = 'utf8';

View all collations supported by a character set under the current instance

show collation where charset = 'utf8';

View the character set and collation used by the current instance

Note: these defaults are used when a database is created without specifying its own.

SHOW VARIABLES LIKE 'character%';

SHOW VARIABLES LIKE 'collation%';

Note that the collation_server parameter is important: it defines the default collation used by the server (character_set_server defines the default character set).
To change it, find or add the following lines in the MySQL configuration file, for example:

[mysqld]
collation_server=utf8_general_ci

View database encoding (including character set and collation)

USE database_name;

SHOW CREATE DATABASE database_name;

or

SHOW VARIABLES LIKE 'character_set_database';
SHOW VARIABLES LIKE 'collation_database';

View table encoding (including character set and collation)

SHOW CREATE TABLE tbl_name;

View the encoding (including collation) of all fields in the table

In the specified database, use the SHOW FULL COLUMNS FROM table_name statement to query information about all columns in the table.

USE database_name;
SHOW FULL COLUMNS FROM table_name;

Specified column
To query only a specific column, add a condition on the Field column to the statement above.

SHOW FULL COLUMNS FROM table_name WHERE Field = 'column_name';

Set character set and collation

MySQL allows the character set and collation to be set at four levels: server, database, table, and column. A setting at a more specific level takes priority, so the character set attribute at the column level overrides those set at the other levels.

server < database < table < column
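The priority order above can be modeled as a simple fallback chain. The following Python sketch is an illustration only: `effective_charset` is a hypothetical helper, and `None` stands for "not specified at this level".

```python
from typing import Optional


def effective_charset(
    server: str,
    database: Optional[str] = None,
    table: Optional[str] = None,
    column: Optional[str] = None,
) -> str:
    """Pick the most specific setting: column, then table, then database, then server."""
    for value in (column, table, database):
        if value is not None:
            return value
    return server


# Nothing set below the server level: the server default applies.
print(effective_charset("utf8mb4"))
# A column-level setting overrides the table and database settings.
print(effective_charset("utf8mb4", database="utf8", table="utf8", column="latin1"))
```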

server

In MySQL 8.0 the server character set defaults to utf8mb4. If a character set is set without a collation, the default collation of that character set is used.

The following command sets the character set to utf8 and the collation to utf8_unicode_ci at the server level. Databases on the server that do not specify their own then use the character set settings of the server.

mysqld --character-set-server=utf8 --collation-server=utf8_unicode_ci

database

The following SQL sets the character set and collation at the database level. Tables created in the database without their own settings then use the character set attributes of the database.

CREATE DATABASE database_name CHARACTER SET utf8 COLLATE utf8_bin;

ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_bin;

Note: changing the database character set does not affect existing tables; the new default applies only to tables created afterwards without an explicit character set.

table

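CREATE TABLE table_name (
   ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

-- Convert the existing character columns as well as the table default
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

-- Change only the table default; existing columns keep their character set
ALTER TABLE table_name CHARACTER SET charset_name COLLATE collation_name;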

column

Character set attributes can be set for columns of data types such as CHAR, VARCHAR, and TEXT.
The format is as follows:

column_name [CHAR | VARCHAR | TEXT] (length) CHARACTER SET character_set_name COLLATE collation_name

-- Example:
ALTER TABLE table_name MODIFY column_name column_type CHARACTER SET charset_name COLLATE collation_name;

Explicitly specify collation when querying

The sections above describe setting the character set and collation at different levels. When querying data, we can also specify a collation explicitly to override the one defined on the table column, or use binary to force a byte-by-byte, case-sensitive comparison.

The format is as follows:

SELECT DISTINCT field1 COLLATE utf8mb4_general_ci FROM table1;

SELECT field1, field2 FROM table1 ORDER BY field1 COLLATE utf8mb4_unicode_ci;

-- Add the binary keyword before each condition
select * from user where binary username = 'admin' and binary password = 'admin';

-- Wrap the argument in binary('')
select * from user where username like binary('admin') and password like binary('admin');
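The difference between a case-insensitive comparison and a binary one can be approximated in Python. This is a simplified model, not MySQL's collation logic: real _ci collations apply further rules (accents, trailing spaces) that are ignored here, and the helper names `equal_ci` and `equal_binary` are hypothetical.

```python
def equal_ci(a: str, b: str) -> bool:
    """Rough stand-in for a _ci collation: compare after case folding."""
    return a.casefold() == b.casefold()


def equal_binary(a: str, b: str) -> bool:
    """Rough stand-in for a binary comparison: compare the encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")


print(equal_ci("admin", "ADMIN"))      # True
print(equal_binary("admin", "ADMIN"))  # False
```

This is why a login check on a _ci column would accept 'ADMIN' for 'admin' unless binary is used.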

Case sensitivity issues

View the current case-sensitivity configuration of MySQL

show global variables like '%case%';

lower_case_file_system

This parameter describes whether the file system holding the data directory is case sensitive. It is a boolean, read-only parameter: it reflects a property of the file system, so it cannot be modified, and setting it would have no effect on the file system.
0 – case sensitive, OFF
1 – case insensitive, ON

For example, it is OFF (0) on Linux with a case-sensitive file system.

lower_case_table_names

Indicates whether table names are case sensitive. It can be changed, but it is a static parameter, so a restart is required for a new value to take effect. Possible values are 0, 1, and 2.

0 – Case sensitive. (Unix, Linux default)
Database and table names are saved on disk exactly as created. For example, create database TeSt; creates a directory named TeSt, and create table AbCCC ... generates AbCCC.frm unchanged.
Names in SQL statements are also used as written.

1 – Case insensitive. (Windows default)
When a database or table is created, MySQL converts its name to lowercase before storing it on disk.
Database and table names in SQL statements are also converted to lowercase.
If a table was previously created as Test_table (producing the file Test_table.frm), then select * from Test_table is converted to select * from test_table, which fails with a "table does not exist" error.

2 – Case insensitive. (OS X default)
Database and table names are saved on disk exactly as created.
But names in SQL statements are converted to lowercase.

A common pitfall when changing lower_case_table_names:
If a database or table whose name contains uppercase letters is created while lower_case_table_names=0 and the setting is then changed to 1, it can no longer be found.
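The pitfall can be reproduced with a small model of the three modes. This Python sketch is a simplification under stated assumptions: the function names are hypothetical, only the name-handling rules described above are modeled, and file system behavior is reduced to a single `fs_case_sensitive` flag.

```python
def stored_name(name: str, mode: int) -> str:
    """Name as saved on disk at creation time (mode 1 lowercases it)."""
    return name.lower() if mode == 1 else name


def lookup_name(name: str, mode: int) -> str:
    """Name used when resolving an identifier in a statement (modes 1 and 2 lowercase it)."""
    return name if mode == 0 else name.lower()


def can_find(created_as: str, created_mode: int,
             queried_as: str, query_mode: int,
             fs_case_sensitive: bool = True) -> bool:
    """Would a table created under one mode be found under another?"""
    on_disk = stored_name(created_as, created_mode)
    wanted = lookup_name(queried_as, query_mode)
    if fs_case_sensitive:
        return on_disk == wanted
    return on_disk.lower() == wanted.lower()


# Created with mode 0, queried with mode 0: found.
print(can_find("Test_table", 0, "Test_table", 0))  # True
# Created with mode 0, then the setting is changed to 1: not found.
print(can_find("Test_table", 0, "Test_table", 1))  # False
```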

Note:
To change lower_case_table_names from 0 to 1, first convert the existing database and table names to lowercase:

1) When only table names contain uppercase letters:
① With lower_case_table_names=0, execute rename table to change the names to lowercase.
② Set lower_case_table_names=1 and restart for it to take effect.

2) When database names contain uppercase letters:
① With lower_case_table_names=0, use mysqldump to export the data, then delete the old database.
② Set lower_case_table_names=1 and restart for it to take effect.
③ Import the data into the instance; at this point, database names containing uppercase letters are converted to lowercase.

Test the conversion yourself first, since behavior may differ across operating systems and MySQL versions.
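For the table-name case, the rename statements could be generated rather than typed by hand. The following Python sketch is a hypothetical helper, not part of MySQL: it assumes the table names have already been collected (for example from a table listing), does not escape backticks inside names, and does not detect collisions such as Users and users both existing.

```python
from typing import Iterable, List


def rename_statements(table_names: Iterable[str]) -> List[str]:
    """Build RENAME TABLE statements for names that contain uppercase letters."""
    statements = []
    for name in table_names:
        lowered = name.lower()
        if lowered != name:
            statements.append(f"RENAME TABLE `{name}` TO `{lowered}`;")
    return statements


for statement in rename_statements(["AbCCC", "orders"]):
    print(statement)
```

The generated statements should be reviewed and tried on a test instance before being run against real data.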

Origin blog.csdn.net/u014163312/article/details/131295504