Database character set and encoding conversion

Support for a given encoding in a database management system involves three aspects:

       Database server support.

       Data access interface (data provider) support.

       Client tool support.

1. Database server character encoding

A database server that supports a given encoding can receive characters in that encoding from the client (both identifiers and character field values), store them, and return them. It can also convert characters in that encoding to other encodings (for example, UTF-8 to GBK).

1.1 Specify the database server encoding

PostgreSQL:

Specify the encoding when creating the database:

CREATE DATABASE … ENCODING …

Valid values include SQL_ASCII, UTF8, EUC_CN, ...

1.2 View the database server encoding

PostgreSQL:

SHOW server_encoding;

2. Database access interface encoding

       A data access interface that supports a given encoding must read and write characters in that encoding correctly, with no data loss or corruption.

Take the JDBC interface as an example:

The JDBC driver generally sets client_encoding from the JVM's file.encoding system property.

It converts each String into a byte stream in client_encoding before passing it to the server; in prototype form, String.getBytes(client_encoding).

After receiving a byte stream from the server, it uses client_encoding to construct the String object that getString returns to the application; in prototype form, String(byte[], …, client_encoding).
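The round trip described above can be sketched in Python, where str.encode and bytes.decode play the role of Java's String.getBytes and the String(byte[], …, charset) constructor. This is only an illustration under assumptions: the function names to_wire and from_wire are hypothetical, the sample text 贝钢 and the choice of GBK as client_encoding are made up for the example, and no real JDBC driver is involved.

```python
# Illustrative sketch only: mimics what a driver does with client_encoding.
# to_wire / from_wire are hypothetical names, not part of any driver API.

def to_wire(text: str, client_encoding: str) -> bytes:
    """Encode application text into the byte stream sent to the server."""
    return text.encode(client_encoding)


def from_wire(data: bytes, client_encoding: str) -> str:
    """Decode bytes received from the server back into application text."""
    return data.decode(client_encoding)


client_encoding = "gbk"  # assumed value for this example
sent = to_wire("贝钢", client_encoding)
received = from_wire(sent, client_encoding)

print(sent.hex().upper())   # B1B4B8D6
print(received == "贝钢")    # True
```

As long as both directions use the same client_encoding, the text survives the round trip unchanged.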

 

3. Client encoding

       A client tool that supports a given encoding must be able to display characters in that encoding read from the database, and must allow characters in that encoding to be submitted to the server.

       3.1 Specify the client encoding of the session (PostgreSQL)

                SET client_encoding TO 'value';

       3.2 View the client encoding

               SHOW client_encoding;

4. View the byte sequences of characters in different encodings

      The following are the byte sequences (in hexadecimal) stored in the database for several characters under different encodings. In PostgreSQL, SELECT decode(name, 'escape') FROM test shows the bytes stored on the database server.

4.1 Take "贝钢" (Beigang) as an example

            GBK encoding is: B1B4 B8D6

            UTF-8 encoding is: E8B49D E992A2

            GB18030 encoding is: B1B4 B8D6
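The byte values above can be reproduced with a few lines of Python. This is a small sketch, assuming that "Beigang" stands for the two characters 贝钢 (which is what the UTF-8 bytes E8B49D E992A2 decode to); the helper name hex_per_char is hypothetical.

```python
# Assumption: "Beigang" stands for the two characters 贝钢, which is what
# the UTF-8 bytes E8B49D E992A2 decode to.
text = "贝钢"


def hex_per_char(s: str, encoding: str) -> str:
    """Return the encoded bytes of each character as hex, space-separated."""
    return " ".join(ch.encode(encoding).hex().upper() for ch in s)


for enc in ("gbk", "utf-8", "gb18030"):
    print(f"{enc}: {hex_per_char(text, enc)}")
```

GBK and GB18030 agree here because GB18030 keeps the GBK two-byte codes for these characters.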

4.2 Take the two characters U+E81C and U+E819 (Unicode private use area) as an example

            GBK encoding is: FE57 FE54

            UTF-8 encoding is: EEA09C EEA099

            GB18030 encoding is: 8336C938 8336C935
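Because these two characters often fail to render, it can help to recover their code points from the UTF-8 bytes listed above. The following is a minimal sketch; the helper name code_point is hypothetical.

```python
# The UTF-8 byte values are taken from the listing above.
utf8_hex = ["EEA09C", "EEA099"]


def code_point(hex_bytes: str) -> str:
    """Decode one UTF-8 encoded character and format its code point."""
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    return f"U+{ord(ch):04X}"


code_points = [code_point(h) for h in utf8_hex]
print(code_points)  # ['U+E81C', 'U+E819']
```

Both code points lie in the Unicode private use area (U+E000 to U+F8FF), so whether a glyph is displayed depends on the font in use.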

5. Encoding conversion example

     Let's look at a specific example. Here the client uses GBK/GB18030 encoding, both ends of the interface use GB18030 encoding, and the database server uses UTF-8 encoding:

 



 

Conversion involves:

       Conversion between the encoding used inside the application and the client encoding of the connection.

       Conversion between the server-side encoding of the connection and the database server encoding.

These conversions are represented by the orange-red arrows in the image above.

Taking the characters U+E81C and U+E819 as an example, the byte sequences in the database server under different encodings are:

GBK encoding is: FE57 FE54

UTF-8 encoding is: EEA09C EEA099

GB18030 encoding is: 8336C938 8336C935

 

Interface:

The programming interface ensures that the encoding of the characters sent to the server is consistent with the client_encoding of the current session. There are two ways to do this:

Set client_encoding to the encoding of the characters obtained from the application.

Alternatively, read the client_encoding of the current session and convert the characters obtained from the application into that encoding.
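The second option, converting application bytes into the session's client_encoding, might look like the sketch below. The function name to_client_encoding and the sample encodings are assumptions for illustration; a real interface would obtain client_encoding from the session rather than from a literal.

```python
import codecs


def to_client_encoding(data: bytes, app_encoding: str, client_encoding: str) -> bytes:
    """Re-encode application bytes so they match the session's client_encoding."""
    # Normalize names so that "GBK" and "gbk" count as the same encoding.
    if codecs.lookup(app_encoding).name == codecs.lookup(client_encoding).name:
        return data  # already consistent, nothing to convert
    return data.decode(app_encoding).encode(client_encoding)


app_bytes = "贝钢".encode("gbk")          # bytes as the application holds them
wire_bytes = to_client_encoding(app_bytes, "gbk", "utf-8")
print(wire_bytes.hex().upper())  # E8B49DE992A2
```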

Server:

The server converts between client_encoding and server_encoding.

Conversion follows the database's encoding conversion algorithm; characters that do not exist in the target encoding are converted to a question mark "?".
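The question-mark substitution can be imitated in Python with errors="replace". This is a sketch of the behaviour described here, not the database's actual conversion routine; the function name convert and the sample string are assumptions, and a given server may instead reject such a character with an error.

```python
def convert(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Transcode bytes, replacing characters missing from the target with '?'."""
    text = data.decode(source_encoding)
    return text.encode(target_encoding, errors="replace")


# Stored as UTF-8 on the server; the emoji has no GBK equivalent.
stored = "贝钢😀".encode("utf-8")
to_client = convert(stored, "utf-8", "gbk")
print(to_client.hex().upper())  # B1B4B8D63F  (3F is "?")
```

The substitution is lossy: once the "?" has been written, the original character cannot be recovered from the converted bytes.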

 

6. Frequently encountered problems

Characters are encoded or parsed with the wrong encoding, resulting in garbled text.

A character does not exist in both character sets, so it is converted to "?".
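The first problem is easy to reproduce: take bytes written in one encoding and parse them as another. The sketch below uses the sample text 贝钢 as an assumption and decodes its UTF-8 bytes as GBK.

```python
original = "贝钢"
utf8_bytes = original.encode("utf-8")

# Wrong: the bytes are UTF-8 but are parsed as GBK, which yields garbled text.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Right: parse the bytes with the encoding they were written in.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(restored)
```

No bytes were changed in the wrong case; only the interpretation differs, which is why fixing the declared encoding is usually enough to repair this kind of garbling.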
