How to view and modify the database character set in MySQL


Basic concepts

  • A character is the smallest symbol in a human language, for example 'A' or 'B';
  • Given a series of characters, assign a value to each one and use that value to represent the character. This value is the character's encoding. For example, if we assign 0 to 'A' and 1 to 'B', then 0 is the encoding of 'A';
  • Given a series of characters and their assigned encodings, the set of all these character/encoding pairs is a character set. For example, for the character list {'A','B'}, {'A'=>0,'B'=>1} is a character set;
  • A collation is the set of rules for comparing characters within one character set;
  • Only once the collation is fixed can you say which characters of a character set are equivalent and how the characters are ordered;
  • Each collation belongs to exactly one character set, but a character set can have several collations, one of which is its default collation;
  • Collation names in MySQL follow a naming convention: they start with the name of the character set the collation belongs to and end with _ci (case-insensitive), _cs (case-sensitive) or _bin (comparison by encoding value). For example, under the collation "utf8_general_ci" the characters "a" and "A" are equivalent;
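
The terms above can be made concrete with a small Python sketch. The toy character set and the helper names (`bin_key`, `ci_key`, `equal`) are invented for illustration and are not MySQL APIs; the two key functions only imitate the idea behind _bin and _ci collations.

```python
# Toy character set: each character is paired with an encoding value.
TOY_CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def bin_key(ch: str) -> int:
    """_bin-style rule: compare characters by their encoding value."""
    return TOY_CHARSET[ch]


def ci_key(ch: str) -> str:
    """_ci-style rule: compare characters ignoring case."""
    return ch.lower()


def equal(x: str, y: str, key) -> bool:
    """Two characters are equivalent when the collation maps them to the same key."""
    return key(x) == key(y)


ci_equal = equal("a", "A", ci_key)    # equivalent under the case-insensitive rule
bin_equal = equal("a", "A", bin_key)  # different under the binary rule

# The same characters sort differently depending on the collation in use.
sorted_bin = sorted("bBaA", key=bin_key)
sorted_ci = sorted("bBaA", key=ci_key)
```

One character set, two collations: the data is the same, only the comparison rule changes.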

MySQL character set settings

System variables:

View the character sets of the MySQL server and database

show variables where variable_name rlike 'character|collation';


View the character sets supported by MySQL

show character set;


or

show charset;


  • character_set_server: the server's default character set for internal operations
  • character_set_client: the character set of the data sent by the client
  • character_set_connection: the connection-layer character set
  • character_set_results: the character set of query results returned to the client
  • character_set_database: the default character set of the currently selected database
  • character_set_system: the character set of system metadata (column names and so on)
  • There are also corresponding variables prefixed with collation_ that describe the collations.

Character set conversion process in MySQL

When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection.
Before performing internal operations, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:

  • Use the CHARACTER SET value of each column;
  • If that value is not set, use the DEFAULT CHARACTER SET value of the table (a MySQL extension, not part of the SQL standard);
  • If that value is not set, use the DEFAULT CHARACTER SET value of the database;
  • If that value is not set, use the character_set_server value.

Finally, the operation result is converted from the internal operation character set to character_set_results.

We can now go back and analyze the garbled text we produced:

  a. Our column has no character set, so the table's character set is used;
  b. Our table does not specify a character set, so the database's character set is used by default;
  c. Our database was created without a character set, so the character_set_server value is used;
  d. We never changed character_set_server, so it keeps the MySQL default of latin1 (the default in MySQL 5.7 and earlier).

The data is therefore stored as latin1 while our character_set_connection is UTF-8, so inserted Chinese text inevitably comes out garbled.
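
The lookup order above can be sketched as a small Python function. `resolve_storage_charset` is a hypothetical helper that mirrors only the order described in this article; it is not how MySQL implements the lookup internally.

```python
from typing import Optional


def resolve_storage_charset(
    column: Optional[str],
    table: Optional[str],
    database: Optional[str],
    server: str,
) -> str:
    """Return the first explicitly set character set, most specific level first."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    # Nothing was set anywhere, so the server default applies.
    return server


# The scenario analyzed above: nothing was set explicitly and the server default is latin1.
storage = resolve_storage_charset(None, None, None, server="latin1")
```

Setting a character set at any one level (for example on the database) would have been enough to stop the fallback before it reached the latin1 server default.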

Common problem analysis

  • FAQ-1: utf8-encoded data is inserted into a table whose default character set is utf8 without setting the connection character set first, and the connection character set is set to utf8 when querying
    - on insert, under the MySQL server defaults, character_set_client, character_set_connection and character_set_results are all latin1;
    - the inserted data goes through the conversion latin1 => latin1 => utf8, during which each inserted character grows from its original 3 bytes to 6 stored bytes;
    - on query, the result goes through the conversion utf8 => utf8, so the stored 6 bytes are returned unchanged and show up as garbled characters.
  • FAQ-2: the connection character set is set to utf8 before inserting utf8-encoded data into a table whose default character set is latin1 (the error we encountered is of this kind)
    - on insert, following the connection character set setting, character_set_client, character_set_connection and character_set_results are all utf8;
    - the inserted data goes through the conversion utf8 => utf8 => latin1. Any Unicode character outside the range \u0000~\u00ff cannot be represented in latin1 and is converted to "?" (0x3F); the content cannot be recovered afterwards, whatever the connection character set is when querying.
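
Both cases can be reproduced outside the database with plain Python string encoding. This is only a sketch of the byte-level effect: it assumes Python's "latin-1" codec (ISO-8859-1) as a stand-in for MySQL's latin1, which is close to but not identical with it, and the variable names are illustrative.

```python
original = "中"                          # one Chinese character
utf8_bytes = original.encode("utf-8")    # 3 bytes as sent by the client

# First case: the server reads the UTF-8 bytes as latin1 text,
# then converts that text to utf8 for storage.
misread = utf8_bytes.decode("latin-1")   # 3 bogus latin1 characters
stored = misread.encode("utf-8")         # 6 bytes end up in the table

# Querying over a utf8 connection returns the 6 bytes unchanged.
returned = stored.decode("utf-8")        # garbled, no longer equal to the original

# The double encoding is reversible as long as no byte was lost.
recovered = returned.encode("latin-1").decode("utf-8")

# Second case: converting utf8 text to latin1 replaces anything
# outside \u0000~\u00ff with "?" and the original is gone for good.
lossy = original.encode("latin-1", errors="replace")
```

The first case doubles the stored size but keeps the information; the second keeps the size small but destroys the content, which is why it cannot be repaired at query time.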

Some means to detect character set problems

  • SHOW CHARACTER SET;
  • SHOW COLLATION;
  • SHOW VARIABLES LIKE 'character%';
  • SHOW VARIABLES LIKE 'collation%';
  • SQL functions HEX, LENGTH, CHAR_LENGTH
  • SQL functions CHARSET, COLLATION
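
As a rough illustration of what HEX, LENGTH and CHAR_LENGTH reveal, the following Python sketch computes the equivalent values for a healthy utf8 value and for a double-encoded one. `inspect_utf8_value` is a hypothetical helper that imitates those SQL functions on raw bytes; it does not talk to MySQL.

```python
def inspect_utf8_value(raw: bytes) -> dict:
    """Imitate HEX(col), LENGTH(col) and CHAR_LENGTH(col) for a utf8 column value."""
    text = raw.decode("utf-8")
    return {
        "hex": raw.hex().upper(),    # byte content, like HEX()
        "length": len(raw),          # size in bytes, like LENGTH()
        "char_length": len(text),    # size in characters, like CHAR_LENGTH()
    }


healthy = inspect_utf8_value("中".encode("utf-8"))

# The same character after a latin1 => utf8 double encoding.
double_encoded = inspect_utf8_value(
    "中".encode("utf-8").decode("latin-1").encode("utf-8")
)
```

A single Chinese character that reports a CHAR_LENGTH of 3 and a HEX value full of C2/C3 bytes is a strong hint that the data was double-encoded on insert.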

Recommendations when using MySQL character set

  • When creating a database or table and when performing database operations, state the character set explicitly instead of relying on MySQL's defaults; otherwise a MySQL upgrade may cause serious trouble;
  • Using latin1 for both the database and the connection character set avoids garbled text in most cases, but SQL operations can then no longer work in units of characters. Setting both the database and the connection character set to utf8 is generally the better choice;
  • When using the MySQL C API, call mysql_options right after initializing the database handle to set the MYSQL_SET_CHARSET_NAME option to utf8. There is then no need for an explicit SET NAMES statement to specify the connection character set, and when mysql_ping reconnects a dropped long-lived connection, the connection character set is reset to utf8 as well;
  • With the MySQL PHP API, ordinary page-level PHP programs run only briefly, so it is enough to set the connection character set once with a SET NAMES statement after connecting. With long-lived connections, however, take care to keep the connection alive and, after every disconnect and reconnect, to reset the connection character set explicitly with SET NAMES.
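
The advice to reissue SET NAMES after every reconnect can be sketched independently of any particular driver. In the Python sketch below, `CharsetAwareConnection` and the `connect` factory are hypothetical: the factory is assumed to return an object with an `execute(sql)` method that raises `ConnectionError` when the link is gone, which real drivers may signal differently.

```python
class CharsetAwareConnection:
    """Wrap a connection factory and issue SET NAMES after every (re)connect."""

    def __init__(self, connect, charset: str = "utf8"):
        self._connect = connect      # hypothetical factory returning a connection
        self._charset = charset      # assumed to be a trusted, fixed name
        self._conn = None

    def _ensure(self):
        # Open the connection lazily and set its character set exactly once.
        if self._conn is None:
            self._conn = self._connect()
            self._conn.execute(f"SET NAMES {self._charset}")
        return self._conn

    def execute(self, sql: str):
        try:
            return self._ensure().execute(sql)
        except ConnectionError:
            # The link was dropped: reconnect, which also reissues SET NAMES,
            # then retry the statement once.
            self._conn = None
            return self._ensure().execute(sql)
```

Keeping the SET NAMES call inside the reconnect path means callers cannot forget it, which is the failure mode the recommendation warns about.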

Other matters needing attention

  • The default_character_set setting in my.cnf only affects the connection character set used when the mysql command connects to the server; it has no effect on applications that use the libmysqlclient library!
  • SQL function operations on columns are usually performed in the internal operation character set and are not affected by the connection character set setting.
  • Bare string literals in an SQL statement are affected by the connection character set or by an introducer. Operations such as comparisons may then produce completely different results, so be careful!

Modify character set

Modify the global character set

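/* Without the GLOBAL keyword these statements only affect the current session; use SET GLOBAL for the server-wide values */
/* Character set used by the connection */
set character_set_connection=utf8;
/* Character set of the database */
set character_set_database=utf8;
/* Character set of the result set */
set character_set_results=utf8;
/* Character set of the database server */
set character_set_server=utf8;

/* character_set_system is read-only and cannot be changed with SET */

/* Collation variables take a collation name, not a character set name */
set collation_connection=utf8_general_ci;

set collation_database=utf8_general_ci;

set collation_server=utf8_general_ci;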

Modify the character set of the library

Syntax: alter database db_name default character set charset_name;

Modify the character set of the table

Syntax: alter table table_name convert to character set charset_name;

Modify the character set of the field

Syntax: alter table table_name modify column_name column_definition character set gbk;
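
When these statements are generated from a script, a small helper keeps the three forms consistent. The Python sketch below only builds the SQL text; the function names are hypothetical, identifiers are quoted with backticks, and the character set name is assumed to come from a trusted, fixed list.

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def alter_database(db: str, charset: str) -> str:
    """Change the default character set of a database."""
    return f"ALTER DATABASE {quote_ident(db)} DEFAULT CHARACTER SET {charset};"


def convert_table(table: str, charset: str) -> str:
    """Convert a table and its existing character columns to a character set."""
    return f"ALTER TABLE {quote_ident(table)} CONVERT TO CHARACTER SET {charset};"


def modify_column(table: str, column: str, definition: str, charset: str) -> str:
    """Change the character set of a single column, restating its definition."""
    return (
        f"ALTER TABLE {quote_ident(table)} MODIFY {quote_ident(column)} "
        f"{definition} CHARACTER SET {charset};"
    )


statements = [
    alter_database("shop", "utf8"),
    convert_table("orders", "utf8"),
    modify_column("orders", "note", "VARCHAR(255)", "gbk"),
]
```

The database, table and column names used here (`shop`, `orders`, `note`) are placeholders for illustration.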


Origin blog.csdn.net/lz6363/article/details/114904325