Fixing garbled characters in MySQL

Author: spermwhale0
Jianshu: https://www.jianshu.com/p/94d6b75bdff9

MySQL encoding process

MySQL can produce garbled text for many reasons, but most of them come down to the character_set parameters. Let's look at what those parameters are:

SHOW VARIABLES LIKE "character%";
Variable_name   Value
character_set_client    utf8
character_set_connection    utf8
character_set_database  utf8
character_set_filesystem    binary
character_set_results   utf8
character_set_server    utf8
character_set_system    utf8
character_sets_dir  /usr/local/Cellar/[email protected]/5.7.24/share/mysql/charsets/

Among these, the most important are character_set_client and character_set_results. What are these two parameters for?

When a client sends a command to MySQL, MySQL sees only a stream of bytes and does not know which encoding produced it. The character_set_client parameter tells MySQL how the command is encoded: if it is set to UTF-8, MySQL decodes the byte stream as UTF-8. Once the command has been decoded successfully, MySQL converts its content to the encoding of the target table.

The encoding of a table's columns can be viewed with the following command:

SHOW FULL COLUMNS FROM student;

Suppose character_set_client is UTF-8 and the table is encoded as GBK. If you type INSERT INTO student VALUES ('小明', 12) in a UTF-8 terminal, MySQL first decodes the command as UTF-8, then converts the string '小明' to its GBK encoding, and finally stores it in the table.
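
The write path described above can be sketched outside MySQL. The snippet below is only a rough model in Python, not MySQL's actual implementation: the function name store_value is hypothetical, and Python's codecs are assumed to approximate MySQL's character sets.

```python
def store_value(terminal_bytes: bytes, client_charset: str, table_charset: str) -> bytes:
    """Model of a write: decode with character_set_client, re-encode for the table."""
    text = terminal_bytes.decode(client_charset)   # step 1: decode the incoming bytes
    return text.encode(table_charset)              # step 2: convert to the table's encoding

# A UTF-8 terminal sends '小明'; the table column is GBK.
typed = "小明".encode("utf-8")
stored = store_value(typed, "utf-8", "gbk")

# UTF-8 uses 3 bytes per character here, GBK uses 2.
print(len(typed), len(stored))  # prints "6 4"
```

The bytes that end up in the table differ from the bytes that were typed, yet they represent the same two characters.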

The other parameter, character_set_results, is the encoding used for query output. If the table is encoded as GBK and character_set_results is set to UTF-8, the queried content is first converted to UTF-8 and then sent to the terminal.

The MySQL read and write process can be represented by the following figure:

As the figure shows, garbled text appears when the decoding and encoding applied while storing data do not correspond to the decoding and encoding applied while reading it.
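
A minimal Python sketch of such a mismatch on the read side, assuming that Python codecs approximate MySQL's character sets (fetch_value is a hypothetical helper, not a MySQL function):

```python
def fetch_value(stored_bytes: bytes, table_charset: str, results_charset: str) -> bytes:
    """Model of a read: decode from the table's encoding, re-encode as character_set_results."""
    return stored_bytes.decode(table_charset).encode(results_charset)

stored = "小明".encode("gbk")   # what a GBK table holds

# Matching settings: a UTF-8 terminal with character_set_results = utf8.
good = fetch_value(stored, "gbk", "utf-8").decode("utf-8")

# Mismatch: character_set_results = gbk, but the terminal still decodes as UTF-8.
bad = fetch_value(stored, "gbk", "gbk").decode("utf-8", errors="replace")

print(good == "小明", bad == "小明")  # prints "True False"
```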

To change character_set_client and character_set_results, a single command is enough:

SET NAMES gbk;
SHOW VARIABLES LIKE "character%";
Variable_name   Value
character_set_client    gbk
character_set_connection    gbk
character_set_database  utf8
character_set_filesystem    binary
character_set_results   gbk
character_set_server    utf8
character_set_system    utf8
character_sets_dir  /usr/local/Cellar/[email protected]/5.7.24/share/mysql/charsets/

As a result, character_set_client and character_set_results (along with character_set_connection) are now GBK.

UTF-8, GBK, and Latin-1

UTF-8, GBK, and Latin-1 are the three encodings most commonly seen in MySQL.

  • All three are backward compatible with ASCII. A string made up only of ASCII characters yields the same bytes whether it is encoded as UTF-8, GBK, or Latin-1. So when the client sends a command such as SET NAMES latin1, it is decoded and executed correctly no matter whether character_set_client is UTF-8, GBK, or Latin-1.
  • Latin-1 is a single-byte encoding that covers the range 0x00-0xFF. In other words, any 8-bit byte corresponds to some Latin-1 character.
  • UTF-8 can represent far more characters than GBK. Every Latin-1 character can be converted to UTF-8, but not necessarily to GBK.
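
The three properties above can be checked with a few lines of Python. This is only an illustration: Python's latin-1 codec (ISO 8859-1) is used as a stand-in for MySQL's latin1, which differs from it for a few byte values, and the helper name encodable is hypothetical.

```python
# 1. ASCII text has the same bytes in all three encodings.
command = "SET NAMES latin1"
ascii_forms = {command.encode(enc) for enc in ("utf-8", "gbk", "latin-1")}

# 2. Every possible byte value decodes to some Latin-1 character.
latin1_chars = bytes(range(256)).decode("latin-1")

# 3. All Latin-1 characters fit in UTF-8, but not all of them exist in GBK.
utf8_ok = latin1_chars.encode("utf-8").decode("utf-8") == latin1_chars

def encodable(ch: str, charset: str) -> bool:
    """Return True if the single character ch can be encoded in charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

missing_in_gbk = [ch for ch in latin1_chars if not encodable(ch, "gbk")]

print(len(ascii_forms), len(latin1_chars), utf8_ok, len(missing_in_gbk) > 0)
```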

The points above are what make "garbage in, garbage out" possible in MySQL. The term refers to the following phenomenon: the client's character encoding differs from the character set of the destination table, yet as long as the same character set settings are used for both writing and reading, the output is consistent and not garbled.

Garbage in garbage out

We first consider a command like:

INSERT INTO table VALUE("啊");

Assume the terminal's encoding is GBK, so the binary representation of "啊" is 10110000 10100001.
After MySQL receives the command, it decodes it using the encoding specified by character_set_client.

  • If character_set_client is GBK, MySQL interprets the bytes as the single character "啊".
  • If character_set_client is Latin-1, MySQL treats them as two separate Latin-1 characters, (10110000) and (10100001), which decode to ° and ¡.
  • If character_set_client is UTF-8, 10110000 10100001 is not a valid UTF-8 sequence, so MySQL either reports an error or substitutes a placeholder character. If the result is written to the table at this point, "garbage in, garbage out" can no longer be achieved.

Thus, one necessary condition for garbage in, garbage out is that character_set_client is set to Latin-1. With GBK or UTF-8, there is no guarantee that arbitrary bytes can be decoded.
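
The three cases can be reproduced in Python as a quick check. This is a sketch under the assumption that Python's codecs behave like MySQL's character sets for these two bytes; MySQL's exact error handling may differ.

```python
raw = bytes([0b10110000, 0b10100001])   # the GBK bytes of "啊" (0xB0 0xA1)

as_gbk = raw.decode("gbk")          # character_set_client = gbk
as_latin1 = raw.decode("latin-1")   # character_set_client = latin1

try:
    raw.decode("utf-8")             # character_set_client = utf8
    utf8_valid = True
except UnicodeDecodeError:
    utf8_valid = False

print(as_gbk, as_latin1, utf8_valid)

# Latin-1 decoding is lossless: re-encoding gives the original bytes back.
print(as_latin1.encode("latin-1") == raw)
```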

That covers decoding. Once the Latin-1 decoding is done, the data still has to be stored in the target table.

  • If the target table is encoded as Latin-1, the decoded data can be stored in the table directly.
  • If the target table is encoded as UTF-8, the decoded data is first converted to UTF-8 and then stored in the table.
  • If the target table is encoded as GBK, the conversion may fail, because not every Latin-1 character has a corresponding GBK encoding.

Therefore, the other condition for garbage in, garbage out is that the target table must be encoded as Latin-1 or UTF-8.

When reading, MySQL converts the table's data to the encoding specified by character_set_results. Because Latin-1 was used when writing, character_set_results must also be set to Latin-1 when reading. This is how "garbage in, garbage out" is ultimately achieved.
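
The whole write-then-read cycle can be modeled in a few lines of Python. This is a simplified sketch, not MySQL's implementation: round_trip is a hypothetical helper, Python codecs stand in for MySQL's character sets, and where Python raises an exception MySQL may behave differently (for example by substituting a placeholder character).

```python
def round_trip(terminal_bytes: bytes, session_charset: str, table_charset: str) -> bytes:
    """Write with character_set_client, then read back with character_set_results."""
    decoded = terminal_bytes.decode(session_charset)    # decode the incoming command
    stored = decoded.encode(table_charset)              # convert to the table's encoding
    return stored.decode(table_charset).encode(session_charset)  # read path, reversed

gbk_bytes = "啊".encode("gbk")   # what a GBK terminal actually sends

# Session declared as Latin-1, table in Latin-1 or UTF-8: the bytes survive unchanged.
survives = {t: round_trip(gbk_bytes, "latin-1", t) == gbk_bytes for t in ("latin-1", "utf-8")}

# Session declared as Latin-1, table in GBK: "¡" has no GBK encoding, so the write fails.
try:
    round_trip(gbk_bytes, "latin-1", "gbk")
    gbk_table_ok = True
except UnicodeEncodeError:
    gbk_table_ok = False

print(survives, gbk_table_ok)
```

The terminal gets back exactly the GBK bytes it sent, even though the table never stored the character "啊" itself.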

An example

Suppose there is a student table like this:

|name| age|
|----|----|
|小明|12|
|小红|10|

Here the name column is declared as Latin-1, but the data stored in it is actually GBK-encoded.

Whoever entered the data probably executed the following statements from a GBK terminal:

SET NAMES latin1;
INSERT INTO student VALUES ('小明', 12);

So if our terminal uses UTF-8, how do we query 小明's record from the table?

  1. Log in to MySQL directly and enter the following statement:
SELECT * FROM student WHERE name = "小明";

But this produces an error:

ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

MySQL assumes the user's terminal uses UTF-8, which does not match the Latin-1 encoding of the column, so it first tries to convert the query string to Latin-1. Latin-1 has no code points for "小明", hence the error.

  2. What happens if we add a statement that changes character_set_client?
SET NAMES latin1;
SELECT * FROM student WHERE name = "小明";

This time MySQL believes the user's terminal is Latin-1, so it performs no conversion. But the query returns an empty result.

This is because the user's terminal is UTF-8, so the "小明" sent to the server is UTF-8-encoded, while the data in the table is GBK-encoded; the two are different byte sequences. Even though MySQL treats both as Latin-1, it does not consider them equal.

  3. Instead of logging in to MySQL directly, first convert the query to GBK in the shell and then pipe it to MySQL:
echo "
SET names latin1;
SELECT * FROM student WHERE name = '小明';"\
| iconv -f utf8 -t gbk\
| mysql -uroot -p123 -Dtest

Here iconv converts its standard input to the specified encoding (GBK in this case) and writes the result to standard output, which is piped to MySQL. We get:

name    age
�� 12

The query now returns a row, but the name is garbled. This is because the data in the table is GBK-encoded while the terminal uses UTF-8. One final step is needed: converting the query result to UTF-8.

echo "
SET names latin1;
SELECT * FROM student WHERE name = '小明';"\
| iconv -f utf8 -t gbk\
| mysql -uroot -p123 -Dtest\
| iconv -f gbk -t utf8

The output is:

name    age
小明  12

In this way, we finally got the correct information.
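
The three attempts can be replayed in Python to see why only the last one works. This is a simplified model that follows the explanations given above rather than MySQL's real comparison logic; all variable names are illustrative, and Python codecs are assumed to approximate MySQL's character sets.

```python
NAME = "小明"

# What the table holds: GBK bytes that were declared (and stored) as Latin-1 text.
stored = NAME.encode("gbk").decode("latin-1")

# Attempt 1: UTF-8 session; the literal would have to be converted to Latin-1.
try:
    NAME.encode("latin-1")
    attempt1_ok = True
except UnicodeEncodeError:
    attempt1_ok = False

# Attempt 2: SET NAMES latin1 from a UTF-8 terminal; UTF-8 bytes are read as Latin-1.
literal2 = NAME.encode("utf-8").decode("latin-1")
attempt2_match = literal2 == stored

# Attempt 3: iconv -f utf8 -t gbk first, then SET NAMES latin1.
literal3 = NAME.encode("utf-8").decode("utf-8").encode("gbk").decode("latin-1")
attempt3_match = literal3 == stored

# The result comes back as GBK bytes; a UTF-8 terminal needs the second iconv.
result_bytes = stored.encode("latin-1")
shown_raw = result_bytes.decode("utf-8", errors="replace")   # garbled
shown_fixed = result_bytes.decode("gbk")                     # iconv -f gbk -t utf8

print(attempt1_ok, attempt2_match, attempt3_match, shown_fixed)
```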

If the table itself were encoded as GBK rather than Latin-1, would such cumbersome steps still be necessary?

No. As long as character_set_client and character_set_results are set correctly, MySQL converts automatically when reading and writing, even though the table is encoded as GBK.
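
As a rough Python model of this well-configured case (write_then_read is a hypothetical helper, and Python codecs are assumed to approximate MySQL's character sets), a UTF-8 terminal talking to a GBK table gets its own bytes back while the table holds genuine GBK data:

```python
def write_then_read(terminal_bytes: bytes, terminal_charset: str, table_charset: str):
    """Session charset matches the terminal; MySQL converts to and from the table charset."""
    text = terminal_bytes.decode(terminal_charset)                    # character_set_client
    stored = text.encode(table_charset)                               # stored in the table
    returned = stored.decode(table_charset).encode(terminal_charset)  # character_set_results
    return stored, returned

sent = "小明".encode("utf-8")
stored, returned = write_then_read(sent, "utf-8", "gbk")

print(stored == "小明".encode("gbk"), returned == sent)  # prints "True True"
```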


Origin www.cnblogs.com/xwgblog/p/12453846.html