MySQL character sets and collations: rules and workflow

How MySQL character sets and collations work

Character encoding parameters

Transcoding process of the data stream

Collation rules

Tip: a character set and its collation always go together.

1. Start from a simple statement that creates a database

CREATE DATABASE [IF NOT EXISTS] <db_name>
[[DEFAULT] CHARACTER SET <db_charset>] 
[[DEFAULT] COLLATE <db_collation>];

db_name: the database name. Required.

db_charset: the default character set for the database. Defaults to the server character set.

db_collation: the default collation for the database. Defaults to the server's default collation.
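As a rough illustration of how the optional clauses fit together, the sketch below assembles the statement as a string. The helper name build_create_database is hypothetical and only for illustration. It does not validate the character set or collation names, so it should not be fed untrusted input.

```python
def build_create_database(db_name, charset=None, collation=None, if_not_exists=True):
    """Assemble a CREATE DATABASE statement; optional clauses are skipped when None."""
    parts = ["CREATE DATABASE"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    # Backticks quote the identifier; an embedded backtick is doubled.
    parts.append("`" + db_name.replace("`", "``") + "`")
    if charset is not None:
        parts.append("DEFAULT CHARACTER SET " + charset)
    if collation is not None:
        parts.append("DEFAULT COLLATE " + collation)
    return " ".join(parts) + ";"


# Both optional clauses given explicitly.
full_statement = build_create_database("shop", "utf8mb4", "utf8mb4_general_ci")
# No clauses: the server defaults described above apply.
bare_statement = build_create_database("legacy", if_not_exists=False)

print(full_statement)
print(bare_statement)
```

With only db_name supplied, the statement carries no character set information at all, which is exactly the case where the server defaults take over.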

2. What are character sets and character encodings?

Character set: a collection of characters. Character encoding: the mapping from each character in a character set to a specific binary code. Every encoding is built on top of a character set.

Common examples: ASCII supports only English letters and a few special characters, GBK supports Chinese and English, and Unicode aims to cover every character in the world. Note that UTF-8 is one encoding of the Unicode character set. Unicode and UTF-8 are not two competing encodings.
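The difference between a character set and an encoding can be seen outside MySQL as well. The following sketch uses Python's built-in codecs, not MySQL itself, to show one Unicode character mapped to different byte sequences by UTF-8 and GBK, and rejected by ASCII.

```python
ch = "中"

# The Unicode code point is a number in the character set, not yet any bytes.
code_point = ord(ch)

# Two different encodings map the same character to different bytes.
utf8_bytes = ch.encode("utf-8")
gbk_bytes = ch.encode("gbk")

# ASCII has no code for this character at all.
try:
    ch.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(hex(code_point))                   # 0x4e2d
print(utf8_bytes.hex(), len(utf8_bytes)) # e4b8ad 3
print(gbk_bytes.hex(), len(gbk_bytes))   # d6d0 2
print(ascii_ok)                          # False
```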

Taking the ASCII character set as an example

ASCII is an encoding based on the Latin alphabet. It cannot represent Chinese. It contains only the upper- and lower-case English letters, the digits, and a few special characters. Each character occupies one byte: the low 7 bits carry the code and the highest bit is reserved. Some extensions use the highest bit to add table-drawing symbols, operators, and so on. In summary, one 8-bit byte represents one character, which gives 2^7 = 128 characters in standard ASCII and 2^8 = 256 characters in an extended character set.
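The 7-bit claim can be checked directly. This is a small sketch using Python's built-in "ascii" and "latin-1" codecs, with latin-1 standing in as one example of an 8-bit extension.

```python
text = "Hello, MySQL!"
encoded = text.encode("ascii")

# One byte per character, and the highest bit of every byte is 0.
one_byte_each = len(encoded) == len(text)
high_bit_clear = all((b & 0x80) == 0 for b in encoded)

standard_size = 2 ** 7   # 128 codes in standard ASCII
extended_size = 2 ** 8   # 256 codes once the highest bit is used

# 0xE9 has the highest bit set: it is not valid ASCII,
# but an 8-bit extension such as latin-1 assigns it a character.
high_byte = bytes([0xE9])
try:
    high_byte.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False
latin1_char = high_byte.decode("latin-1")

print(one_byte_each, high_bit_clear)   # True True
print(decodes_as_ascii, latin1_char)   # False é
```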

3. Viewing the MySQL encoding settings

show variables like 'character%';

Variable_name              Value      Meaning
character_set_client       utf8mb4    Character set of the data sent by the client
character_set_connection   utf8mb4    Character set of the connection layer
character_set_database     utf8mb4    Default character set of the currently selected database
character_set_filesystem   binary     Character set used for file names on the file system
character_set_results      utf8mb4    Character set of the results returned by the server
character_set_server       utf8mb4    Default character set of the server
character_set_system       utf8       Character set the server uses for metadata
character_sets_dir         /usr/local/mysql-8.0.15-macos10.14-x86_64/share/charsets/    Directory where the character set files are stored

  1. Once MySQL is running, you do not need to worry about the three variables character_set_filesystem, character_set_system, and character_sets_dir, because they do not cause garbled text. They concern file names on the file system, the encoding used to store metadata, and the location of the character set files, none of which affects business data. This becomes easy to see once you understand how MySQL converts between encodings.

  2. When you create a database without explicitly specifying a character set, the character set given by character_set_server is used.

    When you create a table without explicitly specifying a character set, the character set of the current database is used.

    When you add or modify a table column without explicitly specifying a character set, the character set of the current table is used.
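The fallback order above (column, then table, then database, then server) can be modelled as a simple lookup. This is only a sketch of the rule at creation time; the function name effective_charset is hypothetical, and a real server records the chosen value when the object is created rather than looking it up later.

```python
def effective_charset(column=None, table=None, database=None, server="utf8mb4"):
    """Return the first explicitly specified character set, most specific level first.

    `server` stands in for character_set_server and is the final fallback.
    """
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


# Nothing specified anywhere: the server default applies.
print(effective_charset())                                   # utf8mb4
# The database was created with gbk, the table and column say nothing.
print(effective_charset(database="gbk"))                     # gbk
# An explicit column character set wins over everything else.
print(effective_charset(column="latin1", table="utf8mb4", database="gbk"))  # latin1
```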

4. How encodings and collations are applied in use

1. The concept of a connection

Connection: what a client program makes when it connects to the server. (MySQL manual)

For example, the client sends SQL statements, such as queries, to the server over the connection. The server sends responses, such as result sets, back to the client over the same connection.

2. Breaking down a single request

  1. The client issues a query.
  2. The server takes the character_set_client variable to be the character set in which the client sent the query.
  3. After receiving the query, the server converts it from character_set_client to character_set_connection, whose corresponding collation is collation_connection. (Exception: a string literal that has an introducer such as _utf8 takes its character set from the introducer. For comparisons with column values, the collation does not depend on collation_connection, because columns have their own collation.)
  4. The server executes the query and returns the results to the client encoded in character_set_results. This covers result data such as column values as well as result metadata such as column names.
  5. A note on the string literals mentioned in step 3. Their full syntax is: [_charset_name]'string' [COLLATE collation_name]
    1. [_charset_name] is the character set introducer. It states the encoding of the string that follows.
    2. [COLLATE collation_name] gives the collation used to compare the string.

3. Transcoding process for updates and queries

Transcoding on update: character_set_client -> character_set_connection -> table character set.

Transcoding on query: table character set -> character_set_results.
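The two paths can be simulated with plain byte conversions to show where garbled text comes from. This is a sketch under assumptions, not MySQL's implementation: the functions store and fetch are hypothetical, and Python's "utf-8" and "latin-1" codecs stand in for MySQL's utf8mb4 and latin1 (which differ in some details).

```python
def store(client_bytes, character_set_client, character_set_connection, table_charset):
    """Update path: client -> connection -> table character set."""
    # The server trusts character_set_client when interpreting the incoming bytes.
    text = client_bytes.decode(character_set_client)
    connection_bytes = text.encode(character_set_connection)
    return connection_bytes.decode(character_set_connection).encode(table_charset)


def fetch(stored_bytes, table_charset, character_set_results):
    """Query path: table character set -> character_set_results."""
    return stored_bytes.decode(table_charset).encode(character_set_results)


sent = "é".encode("utf-8")  # the client really sends UTF-8 bytes: c3 a9

# character_set_client matches what the client sends: the data survives.
good = store(sent, "utf-8", "utf-8", "utf-8")
# character_set_client is wrong: each UTF-8 byte is taken as a separate character.
bad = store(sent, "latin-1", "utf-8", "utf-8")

print(good.hex())                                    # c3a9
print(bad.hex())                                     # c383c2a9
print(fetch(bad, "utf-8", "utf-8").decode("utf-8"))  # Ã©
```

The damage happens on the way in: once character_set_client misdescribes the client's bytes, the table stores the wrong characters, and no setting on the query path brings the original back by itself.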

4. Introducers and determining the collation of a string

Introducer: the _charset_name expression is formally called an introducer. It tells the parser, "the string that follows uses character set X." Because this has confused people in the past, we emphasize that an introducer does not cause any conversion. It is only a marker and does not change the value of the string. An introducer is also legal before standard hex literal and numeric hex literal notation (X'literal' and 0xnnnn), and before ? (parameter substitution when using prepared statements within a programming language interface).

  • If both the introducer _X and COLLATE Y are specified, character set X and collation Y are used.

  • If the introducer _X is specified but COLLATE is not, character set X and its default collation are used.

  • Otherwise, the character set and collation given by the character_set_connection and collation_connection system variables are used.

Tip: with the COLLATE clause, you can override whatever the default collation is for a comparison. (MySQL manual)
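The three rules above translate almost line for line into a small decision function. The sketch below is illustrative only: resolve_literal is a hypothetical name, the default-collation table lists just two character sets (with their MySQL 8.0 defaults), and the COLLATE-without-introducer branch is an assumption based on the tip that COLLATE overrides the default collation.

```python
# Default collations for two character sets, as in MySQL 8.0.
DEFAULT_COLLATION = {
    "utf8mb4": "utf8mb4_0900_ai_ci",
    "latin1": "latin1_swedish_ci",
}


def resolve_literal(introducer=None, collate=None,
                    character_set_connection="utf8mb4",
                    collation_connection="utf8mb4_0900_ai_ci"):
    """Return (character set, collation) for a string literal."""
    if introducer is not None and collate is not None:
        # Rule 1: both given, both used as written.
        return introducer, collate
    if introducer is not None:
        # Rule 2: introducer only, so its default collation applies.
        return introducer, DEFAULT_COLLATION[introducer]
    if collate is not None:
        # Assumption: COLLATE alone overrides only the collation.
        return character_set_connection, collate
    # Rule 3: fall back to the connection variables.
    return character_set_connection, collation_connection


print(resolve_literal("latin1", "latin1_bin"))  # ('latin1', 'latin1_bin')
print(resolve_literal("latin1"))                # ('latin1', 'latin1_swedish_ci')
print(resolve_literal())                        # ('utf8mb4', 'utf8mb4_0900_ai_ci')
```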

5. Rules about collations

  1. Every character set has a default collation.
  2. Two different character sets cannot have the same collation.
  3. Collation names follow a convention: they start with the name of the character set they are associated with, generally include a language name in the middle, and end with _ci (case-insensitive), _cs (case-sensitive), or _bin (binary).
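The naming convention in rule 3 is regular enough to take apart mechanically. The following sketch splits a collation name on underscores. The function describe_collation is hypothetical, it knows only the three suffixes listed above, and it does not handle special cases such as the collation named binary.

```python
SUFFIX_MEANING = {
    "ci": "case-insensitive",
    "cs": "case-sensitive",
    "bin": "binary",
}


def describe_collation(name):
    """Split a collation name into (character set, middle parts, suffix meaning)."""
    parts = name.split("_")
    charset = parts[0]        # the name starts with the character set
    middle = parts[1:-1]      # usually a language or variant name
    meaning = SUFFIX_MEANING.get(parts[-1], "unknown")
    return charset, middle, meaning


print(describe_collation("utf8mb4_general_ci"))  # ('utf8mb4', ['general'], 'case-insensitive')
print(describe_collation("latin1_general_cs"))   # ('latin1', ['general'], 'case-sensitive')
print(describe_collation("latin1_bin"))          # ('latin1', [], 'binary')
```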

Common operations

View all supported character sets

show character set;

View all supported collations

show collation;

View the current character set settings

show variables like 'character%';

View the current collation settings

show variables like 'collation_%';

Set the client, connection, and result character sets

set names 'utf8';

Modify the database character set

alter database database_name character set xxx;

Modify the table character set

  1. Change only the table's default character set. This affects the default for columns added later. Existing columns keep their character set.
alter table table_name character set xxx;
  2. Change the character set of the table and of its existing columns, and convert the existing data to the new character set.
alter table table_name convert to character set xxx;

Modify a column's character set

alter table table_name modify col_name varchar(col_length) character set xxx;

Origin www.cnblogs.com/monkey-code/p/12152488.html