Shanghai Tengke Education Dameng Database Training Tips: How Much Do You Know About Chinese Character Storage in the Dameng Database?

1. Introduction

 

When we handle Chinese characters in DM7, we usually use the varchar data type. However, the number of Chinese characters a varchar can hold depends on the parameters chosen when the database was initialized. How does character storage differ in each case? This article works through the combinations.

 

2. Introduction to parameters

 

When using DMINIT to initialize the database, two parameters relate to the character set: UNICODE_FLAG and LENGTH_IN_CHAR.

 

UNICODE_FLAG: This parameter sets the character set of all data in the database, including the data dictionary. Note that once the database is initialized, the character set cannot be changed. We can use select unicode to query the character set of the current database: 0 means gb18030 and 1 means UTF-8.

 

LENGTH_IN_CHAR: This parameter determines whether the length of VARCHAR objects in the database is counted in characters. A value of 1 means the length is in characters, and the stored byte length is enlarged according to the theoretical maximum character width. A value of 0 means the length of all VARCHAR objects is in bytes.

 

Similarly, if we use the DBCA assistant to create the database, we can also modify the values of these two parameters in the initialization parameter step.

 

 

3. Test

 

To cover every combination of UNICODE_FLAG and LENGTH_IN_CHAR (each 0 or 1), we initialized four different databases and tested each one.

 

Test environment for this article: DM Database Server x64 V7.1.6.48-Build(2018.03.01-89507)ENT

 

3.1 UNICODE_FLAG=0, LENGTH_IN_CHAR=0

 

This is the default configuration when initializing the database: the character set is gb18030 and the varchar length is in bytes. The test results are as follows:

 

We know that a Chinese character or full-width character in gb18030 generally takes up two bytes. So a varchar(3) column can hold one Chinese character plus one half-width character, but it cannot hold two Chinese characters.
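The byte arithmetic above can be checked outside the database. The following is a minimal sketch using Python's built-in gb18030 codec; the helper names `encoded_length` and `fits_byte_limit` are made up for illustration, and the code only models a byte limit, it does not call DM7.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def fits_byte_limit(text: str, limit: int, encoding: str = "gb18030") -> bool:
    """Hypothetical stand-in for a byte-based varchar(limit) length check."""
    return encoded_length(text, encoding) <= limit


# One common Chinese character is two bytes in gb18030.
print(encoded_length("中", "gb18030"))

# One Chinese character plus one half-width character: 2 + 1 = 3 bytes.
print(fits_byte_limit("中a", 3))

# Two Chinese characters: 2 + 2 = 4 bytes, over the 3-byte limit.
print(fits_byte_limit("中国", 3))
```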

 

3.2 UNICODE_FLAG=1, LENGTH_IN_CHAR=0

 

The character set is utf-8, and the varchar length is in bytes. The relevant tests are as follows:

 

 

In UTF-8, a Chinese character generally occupies three bytes, so varchar(3) can hold only one Chinese character.
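As a rough illustration of the UTF-8 case, the sketch below measures encoded sizes with Python's utf-8 codec. The function name `utf8_length` and the constant `BYTE_LIMIT` are assumptions for this example; it shows only the byte counts, not DM7's own check.

```python
BYTE_LIMIT = 3  # models varchar(3) with LENGTH_IN_CHAR=0


def utf8_length(text: str) -> int:
    """Number of bytes the text occupies in UTF-8."""
    return len(text.encode("utf-8"))


# A single common Chinese character already uses all three bytes.
print(utf8_length("中"), utf8_length("中") <= BYTE_LIMIT)

# The value that fits under gb18030 (one Chinese character plus one
# half-width character) needs 3 + 1 = 4 bytes in UTF-8 and no longer fits.
print(utf8_length("中a"), utf8_length("中a") <= BYTE_LIMIT)
```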

 

3.3 UNICODE_FLAG=0, LENGTH_IN_CHAR=1

 

The character set is gb18030, and the length of varchar is in characters. The test is as follows:

 

 

We know that when length_in_char=1, the number of bytes a varchar can actually store is enlarged by a fixed ratio. So with gb18030, varchar(3) can actually store 3 Chinese characters, that is, 3*2=6 bytes of data.

 

3.4 UNICODE_FLAG=1, LENGTH_IN_CHAR=1

 

The character set is utf-8, and the length of varchar is in characters. The test is as follows:

 

 

Here we find something strange: the column is defined as varchar(3), so why can we insert 4 Chinese characters? The reason is that DM7 actually stores data in bytes. When length_in_char=1 and the character set is utf-8, the maximum stored length of a VARCHAR object is the defined length * 4 bytes.

 

In other words, a varchar(3) column can store 3*4=12 bytes of data. But a Chinese character in UTF-8 generally takes up only 3 bytes, so we can insert 12/3=4 Chinese characters.
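The 12-byte arithmetic can be sketched as follows. The factor of 4 is taken from the description above; the names `DEFINED_LENGTH`, `EXPANSION_FACTOR` and `fits` are hypothetical, and the code is a model of the rule rather than a reproduction of DM7 internals.

```python
DEFINED_LENGTH = 3     # varchar(3)
EXPANSION_FACTOR = 4   # per the article: UTF-8 with LENGTH_IN_CHAR=1
BYTE_CAPACITY = DEFINED_LENGTH * EXPANSION_FACTOR  # 3 * 4 = 12 bytes


def fits(text: str) -> bool:
    """True if the UTF-8 encoded text is within the modeled byte capacity."""
    return len(text.encode("utf-8")) <= BYTE_CAPACITY


# Four Chinese characters: 4 * 3 = 12 bytes, exactly at capacity.
print(fits("中文字符"))

# Five Chinese characters: 5 * 3 = 15 bytes, over capacity.
print(fits("中文字符集"))
```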

 

4. Summary

 

When LENGTH_IN_CHAR=0, the length of varchar() is in bytes. In this case we only need to consider the number of bytes occupied by Chinese characters and full-width characters: a Chinese character is generally two bytes in gb18030 and three bytes in utf-8. If the total number of bytes of the inserted data exceeds the length defined for the varchar, the insertion fails.

 

When LENGTH_IN_CHAR=1, the number of bytes that varchar() can store is expanded by a fixed factor. When the character set is gb18030, the byte capacity of the varchar is the defined length*2, and when the character set is utf-8, it is the defined length*4.
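The summary can be condensed into a small model covering all four combinations. This is a sketch under the rules stated in this article only: the function names are invented for illustration, and the per-character sizes (2 bytes in gb18030, 3 bytes in utf-8) are the typical values quoted above, not a guarantee for every character.

```python
def varchar_byte_capacity(defined_length: int, unicode_flag: int,
                          length_in_char: int) -> int:
    """Byte capacity of varchar(defined_length) as described in the article."""
    if length_in_char == 0:
        return defined_length          # length counted in bytes
    factor = 4 if unicode_flag == 1 else 2  # utf-8: *4, gb18030: *2
    return defined_length * factor


def max_chinese_chars(defined_length: int, unicode_flag: int,
                      length_in_char: int) -> int:
    """How many typical Chinese characters fit, using the article's sizes."""
    bytes_per_char = 3 if unicode_flag == 1 else 2
    capacity = varchar_byte_capacity(defined_length, unicode_flag,
                                     length_in_char)
    return capacity // bytes_per_char


# Print the result for varchar(3) under each of the four configurations.
for unicode_flag in (0, 1):
    for length_in_char in (0, 1):
        print(unicode_flag, length_in_char,
              varchar_byte_capacity(3, unicode_flag, length_in_char),
              max_chinese_chars(3, unicode_flag, length_in_char))
```

For varchar(3), the model gives 1, 1, 3 and 4 Chinese characters for sections 3.1 through 3.4 respectively, which matches the test results described above.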

 

 


Origin blog.csdn.net/qq_42726883/article/details/108399168