MySQL character set setting and character conversion (latin1 to utf8)

In production environments you will often run into MySQL character set problems. Many CU experts have covered this many times; here is my own summary.

If the character set is not set correctly when an application is first deployed, and after running for some time it turns out not to meet requirements, the character set has to be changed. This cannot be done directly with alter database character set ***; or alter table tablename character set ***;. Neither command converts the existing records; they only take effect for newly created tables or records.

The character sets most commonly used in the author's databases are latin1 and utf8. As projects were integrated, the character set requirement was standardized on utf8, so the existing latin1 data has to be converted to utf8 to avoid garbled characters. To change the character set of existing records, you need to export the data first, adjust it as needed, and then import it again.

The following walk-through converts a database from the latin1 character set to the utf8 character set. The operation is fairly simple; most of the time goes into exporting and importing the data. Below is a brief introduction to MySQL's character set variables and the latin1-to-utf8 conversion process, which you can adapt to your own situation.

1. MySQL character set settings

• System variables:
– character_set_server: default internal operation character set
– character_set_client: character set used by client source data
– character_set_connection: connection layer character set
– character_set_results: query result character set
– character_set_database: default character set of the currently selected database
– character_set_system: character set of system metadata (field names, etc.)
– There are also corresponding variables starting with collation_, which describe the collation (character sort order).
• Use the introducer to specify the character set for text strings:
– The format is: [_charset] 'string' [COLLATE collation]
– For example:
• SELECT _latin1 'string';
• SELECT _utf8 'hello' COLLATE utf8_general_ci;
– A text string qualified by an introducer is converted directly to the internal character set for processing, without redundant transcoding during the request.

2. The character set conversion process in MySQL

1. When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection;
2. Before performing internal operations, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
• Use the CHARACTER SET value of the individual column;
• If that is not set, use the DEFAULT CHARACTER SET value of the table (a MySQL extension, not part of the SQL standard);
• If that is not set, use the DEFAULT CHARACTER SET value of the database;
• If that is not set, use character_set_server.
3. It converts the operation results from the internal operation character set to character_set_results.
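
The lookup order and the three conversion stages listed above can be modeled in a few lines of Python. This is only a sketch of the rules as described here, not MySQL's implementation: the function names are hypothetical, Python codec names (latin-1, utf-8) stand in for MySQL character sets, and replacing unrepresentable characters with '?' is a simplifying assumption (what the server actually does depends on its SQL mode).

```python
from typing import Optional


def resolve_internal_charset(column: Optional[str],
                             table: Optional[str],
                             database: Optional[str],
                             server: str) -> str:
    """Return the first charset that is set: column, table, database, server."""
    for candidate in (column, table, database):
        if candidate:
            return candidate
    return server


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from src to dst; unrepresentable characters become '?'."""
    return data.decode(src).encode(dst, errors="replace")


def handle_request(payload: bytes, client: str, connection: str,
                   internal: str, results: str) -> bytes:
    """Model the round trip: client -> connection -> internal -> results."""
    step1 = transcode(payload, client, connection)
    step2 = transcode(step1, connection, internal)
    return transcode(step2, internal, results)


# No column charset, table default latin1, database and server utf8.
internal = resolve_internal_charset(None, "latin-1", "utf-8", "utf-8")
print(internal)

# ASCII survives the round trip; Chinese text does not fit into latin1.
print(handle_request("abc".encode("utf-8"), "utf-8", "utf-8", internal, "utf-8"))
print(handle_request("中文".encode("utf-8"), "utf-8", "utf-8", internal, "utf-8"))
```

In this model the table-level setting wins over the database and server settings, and the Chinese text is lost at the stage where it has to be converted into the internal latin1 character set.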

3. Convert latin1 to utf8
Take upgrading from the latin1 character set to utf8 as an example. The original database: databasename (default charset=latin1); the new database: new_databasename (default charset=utf8).

mysql> show create database databasename;
+--------------+-------------------------------------------------------------------------+
| Database | Create Database |
+--------------+-------------------------------------------------------------------------+
| databasename | CREATE DATABASE `databasename` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+--------------+-------------------------------------------------------------------------+
1 row in set (0.00 sec)


1> Export table structure:

mysqldump -uroot -p --default-character-set=utf8 -d databasename > createtab.sql

Here --default-character-set=utf8 sets the character set used for the connection, and -d exports only the table structure, not the data.
2> Change the character set in the table definitions in createtab.sql to the new character set.
sed -i 's/CHARSET=latin1/CHARSET=utf8/g' createtab.sql
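
The sed pattern only matches the table-level CHARSET=latin1 default; a column declared with its own CHARACTER SET latin1 clause would be left unchanged. The following Python sketch handles both forms. The function name rewrite_schema_charset and the sample schema are made up for illustration, and COLLATE clauses that name a latin1 collation are not handled.

```python
import re


def rewrite_schema_charset(sql: str, old: str = "latin1", new: str = "utf8") -> str:
    """Replace table-level CHARSET=old and column-level CHARACTER SET old."""
    # Table default, e.g. ") ENGINE=MyISAM DEFAULT CHARSET=latin1"
    sql = re.sub(rf"\bCHARSET={re.escape(old)}\b", f"CHARSET={new}", sql)
    # Column clause, e.g. "varchar(100) CHARACTER SET latin1 NOT NULL"
    sql = re.sub(rf"\bCHARACTER SET {re.escape(old)}\b", f"CHARACTER SET {new}", sql)
    return sql


# Hypothetical schema fragment in the style of mysqldump -d output.
schema = """CREATE TABLE `type` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `Name` varchar(100) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;"""

print(rewrite_schema_charset(schema))
```

The word boundary after the character set name keeps collation names such as latin1_swedish_ci from being partially rewritten.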

3> Make sure the records are no longer being updated, then export all records.
mysqldump -uroot -p --no-create-info --default-character-set=latin1 databasename > data.sql

Optional parameters:
--quick: Useful for dumping large tables. It forces mysqldump to retrieve rows from the server one at a time instead of retrieving the whole row set and buffering it in memory before writing it out.
--extended-insert: Use the multiple-row INSERT syntax that includes several VALUES lists, which makes the dump file smaller and speeds up inserts when the file is reloaded.
--no-create-info: Do not write the CREATE TABLE statements that recreate each dumped table.
--default-character-set=latin1: Export all data in the original character set, so that all Chinese characters in the exported file are readable and not saved as garbled text. Without this parameter, the data is exported in mysqldump's default character set.
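
Why a latin1 export can contain readable Chinese text is easiest to see at the byte level. The sketch below assumes the application originally sent UTF-8 bytes over a connection declared as latin1, so the server stored them byte for byte; that is an assumption about how the data got into the tables, and if the columns hold genuine latin1 text, relabeling the dump as utf8 would corrupt non-ASCII characters. Python's latin-1 codec is used as a stand-in for MySQL's latin1, which is not byte-for-byte the same encoding.

```python
original = "中文"

# What the application sent: UTF-8 bytes over a connection declared as latin1.
stored_bytes = original.encode("utf-8")

# Read as latin1, every byte becomes one character, so the text looks garbled.
as_latin1 = stored_bytes.decode("latin-1")

# A latin1 export writes each of those characters back as its single byte,
# so the dump file holds the original UTF-8 bytes.
dumped_bytes = as_latin1.encode("latin-1")

# Declaring the same bytes as utf8 on import recovers the text.
recovered = dumped_bytes.decode("utf-8")

print(len(original), len(as_latin1))
print(dumped_bytes == stored_bytes, recovered == original)
```

The two characters occupy six bytes, which latin1 treats as six separate characters; nothing is lost as long as no step transcodes them.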
4> Open data.sql and change SET NAMES latin1 to SET NAMES utf8.
sed -i 's/SET NAMES latin1/SET NAMES utf8/g' data.sql
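
For environments without sed, the same relabeling can be sketched in Python. The file is processed in binary mode so the data bytes are copied untouched and only the declaration changes. The function name relabel_dump and the demonstration dump are hypothetical, and the sketch assumes the declaration appears literally as SET NAMES latin1; a data row containing that exact text would also be rewritten.

```python
import os
import tempfile


def relabel_dump(src_path: str, dst_path: str,
                 old: bytes = b"SET NAMES latin1",
                 new: bytes = b"SET NAMES utf8") -> int:
    """Copy a dump line by line, replacing the SET NAMES declaration.

    Returns the number of lines that were changed.
    """
    changed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if old in line:
                line = line.replace(old, new)
                changed += 1
            dst.write(line)
    return changed


# Small self-contained demonstration with a made-up dump fragment.
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "data.sql")
dst_file = os.path.join(workdir, "data_utf8.sql")
with open(src_file, "wb") as f:
    f.write(b"/*!40101 SET NAMES latin1 */;\n")
    f.write(b"INSERT INTO `type` VALUES (1,'" + "中文".encode("utf-8") + b"');\n")

print(relabel_dump(src_file, dst_file))
with open(dst_file, "rb") as f:
    print(f.readline())
```

Writing to a second file instead of editing in place keeps the original dump available if the import has to be repeated.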

(PS: The work efficiency of sed is very high. In the test, the conversion of 60G data is completed in 4 minutes.)

5> Use the new character set to create a new database.
create database new_databasename default charset utf8;

6> Create a table, execute createtab.sql
mysql -uroot -p new_databasename < createtab.sql

7> Import data, execute data.sql
mysql -uroot -p new_databasename < data.sql

8> View the character information of the new database
mysql> show create database new_databasename;
+------------------+-----------------------------------------------------------------------------+
| Database | Create Database |
+------------------+-----------------------------------------------------------------------------+
| new_databasename | CREATE DATABASE `new_databasename` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+------------------+-----------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table type;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| type | CREATE TABLE `type` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `Name` varchar(100) CHARACTER SET gb2312 NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=17 DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Note: When choosing the target character set, make sure it is a superset of the source character set, that is, its character repertoire is at least as large as the source's.
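
One way to check the superset requirement against sample data before converting is to list the characters the target cannot represent. This is a rough sketch: the function name unrepresentable is hypothetical, Python codec names stand in for MySQL character sets, and the max_bytes parameter is an added assumption to model MySQL's utf8, which stores at most three bytes per character.

```python
def unrepresentable(text, target, max_bytes=None):
    """Return the distinct characters of text that do not fit the target charset."""
    missing = []
    for ch in text:
        try:
            encoded = ch.encode(target)
        except UnicodeEncodeError:
            encoded = None
        too_long = (max_bytes is not None and encoded is not None
                    and len(encoded) > max_bytes)
        if (encoded is None or too_long) and ch not in missing:
            missing.append(ch)
    return missing


print(unrepresentable("café 中文", "latin-1"))
print(unrepresentable("café 中文", "utf-8", max_bytes=3))
print(unrepresentable("ok 😀", "utf-8", max_bytes=3))
```

An empty list for the sample means every character in it fits the target; it says nothing about data that was not sampled.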
