Goodbye garbled characters: 5 minutes to understand MySQL character set settings

1. Overview

When using MySQL, it is important to understand what character sets and collations are, and how different settings affect the way data is stored and compared. Many of the "garbled characters" problems people run into in day-to-day work come from a shaky understanding of character sets and collations, and from incorrect settings.

This article covers the following topics, from basic to advanced:

the basic concepts of character sets and collations, and how they relate
the levels at which MySQL lets you set the character set and collation, and how those levels relate
how to view and set the character set and collation at the server, database, table, and column levels
when the character set and collation should be set

2. Character sets and collations: concepts and relationship

For storing data, MySQL supports a range of character sets; for comparing data, it supports a range of collations.

MySQL lets you configure both at several levels (server, database, table, and column), which allows very fine-grained control.

What are a character set and a collation? In short:

Character set: defines a set of characters and how each character is encoded.
Collation: defines the rules for comparing characters.

For example:

Suppose there are four characters, A, B, a, and b, encoded as A = 0, B = 1, a = 2, and b = 3. The characters together with their encodings make up a character set.

What if we want to compare two characters, say A and B, or a and b? The most intuitive way is to compare their encodings: because 0 < 1, A < B.

In addition, although A and a are encoded differently, we may decide that uppercase and lowercase characters should be treated as equal, that is, A == a.

That gives us two comparison rules, and a set of comparison rules like this is a collation:

Two characters of the same case (both uppercase or both lowercase) are compared by their encoding values.
If two characters are the uppercase and lowercase forms of the same letter, they are equal.
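
The toy example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MySQL implements collations. The names CHARSET, compare_binary, and compare_case_insensitive are made up for this sketch, and the ordering of mixed pairs such as A and b is an assumption, since the two rules above do not define it.

```python
# Toy character set from the text: each character maps to an integer code.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def compare_binary(x, y):
    """Compare purely by code value. Returns -1, 0, or 1."""
    cx, cy = CHARSET[x], CHARSET[y]
    return (cx > cy) - (cx < cy)


def compare_case_insensitive(x, y):
    """Fold lowercase onto uppercase first, then compare code values.

    Assumption: mixed pairs such as 'A' vs 'b' are ordered by their
    uppercase codes; the text does not define this case.
    """
    cx, cy = CHARSET[x.upper()], CHARSET[y.upper()]
    return (cx > cy) - (cx < cy)


print(compare_binary("A", "B"))            # A < B because 0 < 1
print(compare_binary("A", "a"))            # codes differ, so not equal
print(compare_case_insensitive("A", "a"))  # treated as equal
print(compare_case_insensitive("a", "b"))  # a < b
```

The same character set is used with two different sets of comparison rules, which is why one character set can have several collations.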
3. Character sets and collations supported by MySQL

MySQL supports a variety of character sets and collations.

A character set has at least one collation (usually several).
Two different character sets cannot share a collation.
Each character set has a default collation.

This is rather abstract; the next few sections make it concrete.

1. View the supported character sets

You can view the character sets MySQL supports in the following ways.

Method 1:

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
... (omitted)

Method 2:

mysql> use information_schema;
mysql> select * from CHARACTER_SETS;
+--------------------+----------------------+-----------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                 | MAXLEN |
+--------------------+----------------------+-----------------------------+--------+
| big5               | big5_chinese_ci      | Big5 Traditional Chinese    |      2 |
| dec8               | dec8_swedish_ci      | DEC West European           |      1 |
... (omitted)

Example 1: Use a WHERE clause.

mysql> SHOW CHARACTER SET WHERE Charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)

Example 2: Use a LIKE clause.

mysql> SHOW CHARACTER SET LIKE "utf8%";
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci    |      3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |      4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.00 sec)
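
The Maxlen column is the maximum number of bytes one character can occupy, which is the practical difference between utf8 (3) and utf8mb4 (4) in the output above. The Python sketch below is an illustration outside MySQL: it uses Python's own UTF-8 codec to show how many bytes a few sample characters need. The helper name utf8_len and the sample characters are chosen for this example only.

```python
def utf8_len(ch):
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


samples = [
    "A",           # ASCII letter
    "\u00e9",      # e with acute accent
    "\u4e2d",      # a common Chinese character
    "\U0001F600",  # an emoji outside the Basic Multilingual Plane
]

for ch in samples:
    n = utf8_len(ch)
    # A character set with Maxlen 3 cannot hold a 4-byte character.
    fits = "utf8 or utf8mb4" if n <= 3 else "utf8mb4 only"
    print("U+%04X needs %d byte(s): %s" % (ord(ch), n, fits))
```

Characters that need four bytes, such as emoji, do not fit in MySQL's utf8 character set, so utf8mb4 is the safer choice when such input is possible.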
2. View the supported collations

Similarly, you can view the collations MySQL supports in the following ways.

Method 1: Use SHOW COLLATION.

As the output shows, the utf8 character set has more than ten collations. The Default column tells you which one is the default: its value is Yes for the default collation.

mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+
| Collation                | Charset | Id  | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci          | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin                 | utf8    |  83 |         | Yes      |       1 |
... (omitted)

Method 2: Query information_schema.COLLATIONS.

mysql> USE information_schema;
mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8";
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME           | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci          | utf8               |  33 | Yes        | Yes         |       1 |
| utf8_bin                 | utf8               |  83 |            | Yes         |       1 |
| utf8_unicode_ci          | utf8               | 192 |            | Yes         |       8 |
... (omitted)

A collation name is prefixed with the name of its character set, as shown below. For example, the name utf8_general_ci indicates that it is a collation of the character set utf8. For more rules, please refer to the official documentation.

MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2;
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
| utf8               | utf8_bin        |
+--------------------+-----------------+
2 rows in set (0.00 sec)
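
Because a collation name starts with its character set name, the character set can be read straight off the name. The Python sketch below illustrates this. The function describe_collation and the table SUFFIX_MEANING are hypothetical helpers written for this article, and the suffix table covers only the common _ci, _cs, and _bin endings; the official documentation remains the reference for the full naming rules.

```python
# Meaning of a few common collation name suffixes.
SUFFIX_MEANING = {
    "ci": "case-insensitive",
    "cs": "case-sensitive",
    "bin": "binary (compares by code value)",
}


def describe_collation(name):
    """Split a collation name into (character set, suffix meaning)."""
    parts = name.split("_")
    charset = parts[0]  # the prefix is the character set name
    suffix = parts[-1] if len(parts) > 1 else None
    return charset, SUFFIX_MEANING.get(suffix, "unknown")


for name in ["utf8_general_ci", "utf8_bin", "latin1_swedish_ci"]:
    print(name, "->", describe_collation(name))
```

On a real server, information_schema.COLLATIONS is the authoritative source for this mapping; parsing the name is only a convenience.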
4. Server character set and collation

Purpose: When you create a database without specifying a character set and collation, the server character set and server collation are used as the database's defaults.

How to specify: With command-line options when the MySQL server is started, or with variables in the configuration file.

Server defaults: Set by build options when MySQL is compiled.

The system variables character_set_server and collation_server hold the server character set and server collation, respectively.

1. View the server character set and collation

Query the two system variables character_set_server and collation_server:

mysql> SHOW VARIABLES LIKE "character_set_server";
mysql> SHOW VARIABLES LIKE "collation_server";
2. Specify at server startup

You can specify the server character set and collation when the MySQL server is started. If they are not specified, the defaults are latin1 and latin1_swedish_ci (as of MySQL 5.7; MySQL 8.0 changed the defaults to utf8mb4 and utf8mb4_0900_ai_ci).

mysqld --character-set-server=latin1 \
       --collation-server=latin1_swedish_ci
You can also specify only the server character set; the server collation is then the default collation of latin1, which is latin1_swedish_ci.

mysqld --character-set-server=latin1
3. Specify in the configuration file

Besides command-line options, the settings can also be specified in the configuration file, as shown below.

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
4. Modify at runtime

Example: change the setting at runtime. The change is lost after a restart; to keep it, write it to the configuration file. Note that without the GLOBAL keyword, the statement changes the value for the current session only.

mysql> SET character_set_server = utf8;
5. Specify the defaults at compile time

The default values of character_set_server and collation_server can be specified through build options when compiling MySQL:

cmake . -DDEFAULT_CHARSET=latin1 \
           -DDEFAULT_COLLATION=latin1_german1_ci
5. Database character set and collation

Purpose: Specify the character set and collation at the database level. Databases on the same MySQL server can each have a different character set and collation.

1. Set the character set/collation of the database

You can specify the character set and collation of a database with CHARACTER SET and COLLATE when creating or altering the database.

Create database:

CREATE DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]
Modify database:

ALTER DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]
Example: Create the database test_schema with the character set utf8; the collation defaults to utf8_general_ci.

CREATE DATABASE `test_schema` DEFAULT CHARACTER SET utf8;
2. View the character set/collation of the database

There are three ways to view the character set and collation of a database.

Example 1: View the character set and collation of test_schema (requires switching the default database).

mysql> use test_schema;
Database changed
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.00 sec)
Example 2: You can also view the character set and collation of test_schema with the following query (no need to switch the default database).

mysql> SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE schema_name="test_schema";
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| test_schema | utf8                       | utf8_general_ci        |
+-------------+----------------------------+------------------------+
1 row in set (0.00 sec)
Example 3: You can also view the character set by looking at the statement that created the database.

mysql> SHOW CREATE DATABASE test_schema;
+-------------+----------------------------------------------------------------------+
| Database    | Create Database                                                      |
+-------------+----------------------------------------------------------------------+
| test_schema | CREATE DATABASE `test_schema` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+-------------+----------------------------------------------------------------------+
1 row in set (0.00 sec)
3. How the database character set and collation are determined

When creating a database, if CHARACTER SET or COLLATE is specified, the specified character set and collation are used.
When creating a database, if neither a character set nor a collation is specified, character_set_server and collation_server are used.

6. Table character set and collation

The syntax for creating and altering a table is as follows. You can set the character set and collation with CHARACTER SET and COLLATE.

CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]
1. Create a table and specify the character set/collation

In the example below, the character set is set to utf8 and the collation is left at its default.

CREATE TABLE `test_schema`.`test_table` (
  `id` INT NOT NULL COMMENT '',
  PRIMARY KEY (`id`) COMMENT '')
DEFAULT CHARACTER SET = utf8;
2. View the character set/collation of the table

Similarly, there are three ways to view the character set and collation of a table.

Method 1: View the table status with SHOW TABLE STATUS. Note that the Collation is utf8_general_ci, whose character set is utf8.

MariaDB [blog]> SHOW TABLE STATUS FROM test_schema \G
*************************** 1. row ***************************
           Name: test_table
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 11534336
 Auto_increment: NULL
    Create_time: 2018-01-09 16:10:42
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)
Method 2: Query information_schema.TABLES.

mysql> USE test_schema;
mysql> SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = "test_schema" AND TABLE_NAME = "test_table";
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_general_ci |
+-----------------+
Method 3: Confirm with SHOW CREATE TABLE.

mysql> SHOW CREATE TABLE test_table;
+------------+----------------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                   |
+------------+----------------------------------------------------------------------------------------------------------------+
| test_table | CREATE TABLE `test_table` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
3. How the table character set and collation are determined

Suppose the values of CHARACTER SET and COLLATE are charset_name and collation_name, respectively. When the table is created:

If both charset_name and collation_name are specified, charset_name and collation_name are used.
If only charset_name is specified, the character set is charset_name and the collation is the default collation of charset_name.
If only collation_name is specified, the collation is collation_name and the character set is the one associated with collation_name.
If neither charset_name nor collation_name is specified, the character set and collation of the database are used.
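
The four rules above can be written down as a small decision function. The Python sketch below is a model of the rules, not MySQL code: resolve, DEFAULT_COLLATION, and CHARSET_OF are hypothetical names, and the two lookup tables contain only a few entries taken from the SHOW CHARACTER SET and SHOW COLLATION output earlier in this article.

```python
# Hypothetical lookup tables for illustration; a real server exposes this
# information through SHOW CHARACTER SET and SHOW COLLATION.
DEFAULT_COLLATION = {
    "utf8": "utf8_general_ci",
    "latin1": "latin1_swedish_ci",
}
CHARSET_OF = {
    "utf8_general_ci": "utf8",
    "utf8_bin": "utf8",
    "latin1_swedish_ci": "latin1",
}


def resolve(charset_name, collation_name, parent):
    """Return the (character set, collation) pair chosen by the four rules.

    parent is the (character set, collation) pair of the enclosing level,
    for example the database when a table is being created.
    """
    if charset_name and collation_name:
        return charset_name, collation_name
    if charset_name:
        return charset_name, DEFAULT_COLLATION[charset_name]
    if collation_name:
        return CHARSET_OF[collation_name], collation_name
    return parent


database = ("latin1", "latin1_swedish_ci")
print(resolve("utf8", "utf8_bin", database))  # both given
print(resolve("utf8", None, database))        # default collation of utf8
print(resolve(None, "utf8_bin", database))    # character set of utf8_bin
print(resolve(None, None, database))          # inherited from the database
```

The same function shape applies at the database level (with the server settings as parent) and at the column level (with the table settings as parent).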
7. Column character set and collation

For columns of type CHAR, VARCHAR, and TEXT, you can specify the character set and collation. The syntax is as follows:

col_name {CHAR | VARCHAR | TEXT} (col_length)
    [CHARACTER SET charset_name]
    [COLLATE collation_name]
1. Add a new column and specify the character set/collation.

An example follows (the syntax is similar when creating a table):

mysql> ALTER TABLE test_table ADD COLUMN char_column VARCHAR(25) CHARACTER SET utf8;
2. View the column character set/collation

Example:

mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA="test_schema" AND TABLE_NAME="test_table" AND COLUMN_NAME="char_column";
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
3. How the column character set and collation are determined

Assume that the values of CHARACTER SET and COLLATE are charset_name and collation_name, respectively:

If both charset_name and collation_name are specified, charset_name and collation_name are used.
If only charset_name is specified, the character set is charset_name and the collation is the default collation of charset_name.
If only collation_name is specified, the collation is collation_name and the character set is the one associated with collation_name.
If neither charset_name nor collation_name is specified, the character set and collation of the table are used.
8. Choosing when to set the character set and collation

Generally speaking, there are three places where you can configure them:

When you create a database.
When the MySQL server starts.
When compiling MySQL from source, through build options.

1. Method 1: Configure when creating a database

This method is flexible and safe because it does not depend on the default character set and collation. You specify the character set and collation when you create the database; tables and columns created later inherit the database's character set and collation unless they specify their own.

CREATE DATABASE mydb
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;
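
The inheritance described in this method can be pictured as a chain from server to database to table to column, where the most specific level that declares something wins. The Python sketch below models only that idea. effective_setting is a hypothetical helper, and it simplifies by assuming each level either declares a complete (character set, collation) pair or nothing at all.

```python
def effective_setting(server, database=None, table=None, column=None):
    """Return the (character set, collation) a new column ends up with.

    Each argument is a (character set, collation) tuple, or None when that
    level declares nothing. The most specific declared level wins.
    """
    result = server
    for level in (database, table, column):
        if level is not None:
            result = level
    return result


server = ("latin1", "latin1_swedish_ci")
mydb = ("utf8", "utf8_general_ci")

# Nothing declared below the server: the server defaults apply.
print(effective_setting(server))
# The database declares utf8: tables and columns inherit it.
print(effective_setting(server, database=mydb))
# A column can still override what it would otherwise inherit.
print(effective_setting(server, database=mydb, column=("utf8", "utf8_bin")))
```

Declaring the character set on the database therefore shields everything created inside it from whatever the server default happens to be.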
2. Method 2: Configure when the MySQL server starts

You can add the configuration shown below so that character-set-server and collation-server are set when the MySQL server starts.

When you create a database, table, or column through a MySQL client without explicitly declaring a character set and collation, character-set-server and collation-server are used as the defaults.

In addition, the character set and collation of the connection between client and server still need to be set with SET NAMES.

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
3. Method 3: Set through build options when compiling MySQL from source

If -DDEFAULT_CHARSET and -DDEFAULT_COLLATION are specified at compile time, then:

When a database or table is created, they are used as the default character set and collation.
When a client connects to the server, they are used as the default character set and collation (no separate SET NAMES is needed).

shell> cmake . -DDEFAULT_CHARSET=utf8 \
           -DDEFAULT_COLLATION=utf8_general_ci
9. Closing remarks

This article has covered character sets and collations in MySQL in some detail, focusing on how data is stored and compared. One important topic has not been covered: the character set and collation settings of the connection.

Many garbled-character problems are caused by incorrect connection character set and collation settings. There is a lot to say on that topic, so it is left for the next article.
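
To make "garbled characters" concrete, the Python sketch below reproduces the effect outside MySQL: text is encoded with one character set and decoded with another. It is only an illustration of the mismatch. Python's latin-1 codec stands in here for a wrongly configured side of the connection, which is an assumption of this example and not a description of how MySQL handles connections.

```python
original = "\u4e2d\u6587"  # a two-character Chinese string

# The sender encodes the text as UTF-8 ...
wire_bytes = original.encode("utf-8")

# ... but the receiver wrongly interprets the same bytes as latin-1.
garbled = wire_bytes.decode("latin-1")

# Two characters have turned into six unrelated ones.
print(len(original), len(garbled))

# Undoing the wrong step recovers the text, as long as no bytes were lost.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)
```

The bytes themselves are intact in this case; only their interpretation is wrong, which is why both sides need to agree on the character set.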

Due to space limitations, some topics are not covered in detail. Interested readers are welcome to get in touch or consult the official documentation. If you spot any errors or omissions, please point them out.

10. Related links

10.1 Character Set Support
https://dev.mysql.com/doc/refman/5.7/en/charset.html