Understanding the meaning of n in char(n) and varchar(n) through the database's underlying ibd file

Preface

Are you still unsure whether the n in char(n) and varchar(n) means up to n characters or up to n bytes?

This article answers that question with MySQL data insertion examples and a look at the database's underlying storage file, xx.ibd. Okay, let’s get started!

Data insertion scenario practice

Guys, I am going to create two new tables, one storing only char data and one storing only varchar data, to verify whether n means [characters or bytes].

The statements below create the test tables:

CREATE DATABASE dbtest;
USE dbtest;

CREATE TABLE a1(
-- fixed length, n=5
one CHAR(5)
);

CREATE TABLE a2(
-- variable length, n=5
two VARCHAR(5)
);

Scenario 1: char insertion test

Insert a string of 5 English characters, a string of 5 Chinese characters, and a string of more than 5 characters into table a1, which contains only a char column, and see what happens.

-- five English characters
INSERT INTO a1(one)
VALUES('abcde');

-- five Chinese characters
INSERT INTO a1(one)
VALUES('我爱代码丰');

-- more than five characters
INSERT INTO a1(one)
VALUES('我不爱代码丰');

char insertion result:

With char(5), five Chinese characters can be inserted.

Likewise, five English characters can be inserted.

Of course, once the string exceeds five characters, MySQL reports that the data is too long and the insertion fails.

Scenario 2: varchar insertion test

Friends, now insert the same three strings (5 English characters, 5 Chinese characters, and more than 5 characters) into table a2, which has only a varchar column, and see what happens. (You can guess that the result must be the same as for the char type.)

INSERT INTO a2(two)
VALUES('abcde');

INSERT INTO a2(two)
VALUES('我爱代码丰');

INSERT INTO a2(two)
VALUES('我不爱代码丰');

varchar insertion result:

As with char, inserting a string longer than the declared length fails.

Summarize the meaning of n

Before MySQL 4.1, varchar(5) means 5 bytes.
From MySQL 4.1 onward (which includes 5.0 and later), varchar(5) means 5 characters.

The MySQL versions in common use today are 5.7 and 8.0, so n represents characters, not bytes.
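To make the character-versus-byte distinction concrete, here is a small Python sketch (it does not talk to MySQL) that counts the three test strings both ways. The helper name fits_char_limit is hypothetical and only models the rule "n counts characters"; it is not how MySQL itself performs the check.

```python
# Hypothetical helper: models "n counts characters", not MySQL's real validation.
def fits_char_limit(value: str, n: int) -> bool:
    # len() on a Python str counts characters (code points), not bytes.
    return len(value) <= n

samples = ["abcde", "我爱代码丰", "我不爱代码丰"]
for s in samples:
    char_count = len(s)
    utf8_byte_count = len(s.encode("utf-8"))
    print(s, char_count, utf8_byte_count, fits_char_limit(s, 5))
```

The five Chinese characters take 15 bytes in UTF-8 yet still fit in a column declared with n=5, which only makes sense if n counts characters.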

This raises another question:

What is the longest string that varchar(n) can hold?

In other words, what is the maximum value that n in varchar(n) can be set to?

65535? 65532? [Where does 65535 come from? See Scenario 3: the maximum storage space for a row record is 65535 bytes]

Or something else?

Scenario 3: What is the maximum inserted string of varchar?

First, you need to understand the maximum storage space of a record in the table:

The maximum row size for the used table type, not counting BLOBs, is 65535

This sentence means:

MySQL limits the maximum storage space that one record can occupy. Apart from BLOB and TEXT columns, the total byte length of all other columns (excluding hidden columns and record header information) cannot exceed 65535 bytes.

Note: Put simply, when a row is inserted, that record can occupy at most 65535 bytes at the storage level.

Second, we need a clear picture of how many bytes one character takes in different encodings.

For example, the character A in ASCII has code point 65 (binary: 1000001), so one character fits in one byte (8 bits).

In UTF-8, a common Chinese character takes three bytes (24 bits).
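These byte counts are easy to check outside the database. The following Python sketch encodes one ASCII character and one Chinese character; the variable names are only for illustration.

```python
# 'A' has code point 65, which fits in a single byte.
code_point = ord("A")
binary_form = bin(code_point)            # '0b1000001'
ascii_len = len("A".encode("ascii"))     # 1 byte

# A common Chinese character needs three bytes in UTF-8.
utf8_len = len("码".encode("utf-8"))     # 3 bytes

print(code_point, binary_form, ascii_len, utf8_len)
```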

OK, now that we understand where 65535 comes from and how character encoding works, the next examples test varchar(n):

How high can this mysterious n be set?

Example 1: Nullable example of ascii encoding

We know that ASCII represents one character with one byte, so if the table's character set is ascii, the number of characters and the number of bytes in a record correspond one-to-one, up to the 65535 limit. That is, declaring varchar(65535) means the inserted string could be 65535 characters long and would take 65535 bytes (whether such a table can actually be created is a separate matter).

This can be a little confusing the first time you see it. The plain-language version is:

The upper limit of a row of data is 65535 bytes.

Declaring varchar(100) means the string I want to insert can be up to 100 characters long, and because the encoding is ascii, it takes up to 100 bytes underneath.

Okay, here's an example:

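-- Attempt 1: fails, because 65535 bytes of data leave no room for the extra bytes
CREATE TABLE b1(
three VARCHAR(65535)
)CHARSET=ascii;

-- Attempt 2: succeeds with three bytes fewer
CREATE TABLE b1(
three VARCHAR(65532)
)CHARSET=ascii;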

Results:

Ascii result analysis:

With the character set set to ascii, the column cannot be declared with the maximum of 65535; the most it can be is 65532 (three bytes fewer). Why is this? After checking the documentation, I found:

In addition to the data of the column itself, the 65535 bytes also include some other data. Taking the Compact row format as an example, to store a VARCHAR(M) column we need to record additional information besides the space occupied by the real data.

That is:

Two bytes for the variable-length field length + one byte for the NULL value flag

This explains why three bytes are missing: those three bytes are needed to store the additional information!

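Example 2: Non-empty example of ascii encoding

If the field is declared NOT NULL, is the maximum still 65532?

With Example 1 as groundwork, I believe you can quickly understand Example 2 as well.

Answer: No.
Once the field is explicitly declared NOT NULL, the NULL value flag in the additional information is no longer needed, so n can be set to 65533.
[Two bytes fewer than 65535: those two bytes store the length of the variable-length field]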
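The arithmetic behind Examples 1 and 2 can be written as a short Python sketch. The function name max_varchar_n and the constant names are hypothetical; the sketch only covers a table with a single varchar column and simply restates the byte accounting described above.

```python
ROW_LIMIT = 65535    # maximum row size in bytes
LENGTH_BYTES = 2     # bytes that record the variable-length field's length
NULL_FLAG_BYTES = 1  # byte for the NULL value flag (only if the column is nullable)

def max_varchar_n(bytes_per_char: int, nullable: bool) -> int:
    """Largest n for a table whose only column is varchar(n)."""
    overhead = LENGTH_BYTES + (NULL_FLAG_BYTES if nullable else 0)
    return (ROW_LIMIT - overhead) // bytes_per_char

print(max_varchar_n(1, nullable=True))   # 65532 (Example 1: ascii, nullable)
print(max_varchar_n(1, nullable=False))  # 65533 (Example 2: ascii, NOT NULL)
```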

Example 3: Simple example of UTF8 encoding

Examples 1 and 2 use ascii encoding, and Example 3 uses UTF-8. The difference is that MySQL's utf8 character set uses up to three bytes to store one character (ascii uses one byte per character).

In plain terms:

If the encoding is UTF-8, then

varchar(100) can occupy up to 300 bytes (100 characters x 3); n always counts characters.

Let’s test this with an example:

-- Fails: 65533 characters x 3 bytes is far more than the 65535-byte row limit
CREATE TABLE c1(
four VARCHAR(65533)
)CHARSET=utf8;

Result analysis:

Indeed, the theoretical maximum string length is 21845 (65535 / 3).

But because the additional information also takes space, the maximum that can actually be set is 21844, not 21845:

(65535 - 1 [one byte for the NULL value flag] - 2 [two bytes for the variable-length field length]) / 3 = 21844
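A quick way to see why 21844 fits and 21845 does not is to compute the row size for both values. This is a Python sketch of the byte accounting above, assuming three bytes per character and the three bytes of additional information; row_bytes is a hypothetical helper name.

```python
ROW_LIMIT = 65535
OVERHEAD = 1 + 2      # NULL value flag + variable-length field length
BYTES_PER_CHAR = 3    # utf8, as used in this example

def row_bytes(n: int) -> int:
    """Worst-case bytes needed by a single nullable varchar(n) column."""
    return n * BYTES_PER_CHAR + OVERHEAD

print(row_bytes(21845), row_bytes(21845) <= ROW_LIMIT)  # 65538 False
print(row_bytes(21844), row_bytes(21844) <= ROW_LIMIT)  # 65535 True
```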

Example 4: Complex example of UTF8 encoding (including multiple fields)

I believe that readers who have made it this far now understand how the maximum n is determined when a UTF-8 table has a single varchar field. Let’s extend the question: what if the table has other fields besides the varchar field (int, char, and so on)?

In fact, it is very simple. As long as we remember that a row can hold at most 65535 bytes, we only need to subtract the bytes occupied by the other fixed-length fields; what remains determines the upper limit on the number of characters for the variable-length field.

Hahaha, very simple.

An example:

-- a fixed-length int and char, plus a variable-length varchar
CREATE TABLE d4(
a INT, b CHAR(30), c VARCHAR(21812)
) CHARSET=utf8;

 

Result analysis:

When the table also contains fixed-length fields such as int and char, you only need to subtract the bytes they occupy to get the upper limit for the varchar string length!

(65535 - 1 - 2 - 4 - 30 * 3) / 3 = 21812 (rounded down)

1 -> one byte for the NULL value list

2 -> two bytes for the variable-length field length of varchar(n)

4 -> the fixed-length int takes 4 bytes

30 * 3 -> char(30) takes 90 bytes under utf-8 encoding

/ 3 -> utf-8 uses three bytes per character
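The same subtraction can be spelled out in Python. This sketch follows the breakdown listed above for table d4; the dictionary keys are just labels, and the byte sizes are the ones this article uses.

```python
ROW_LIMIT = 65535
BYTES_PER_CHAR = 3  # utf8

# Bytes taken by the fixed-length columns of table d4.
fixed_columns = {
    "a int": 4,
    "b char(30)": 30 * BYTES_PER_CHAR,  # 90 bytes
}
overhead = 1 + 2  # NULL value list + variable-length field length

remaining = ROW_LIMIT - overhead - sum(fixed_columns.values())
max_n = remaining // BYTES_PER_CHAR  # integer division rounds down

print(remaining, max_n)  # 65438 21812
```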

Understand the real storage through the underlying ibd file

Let’s set up one more example scenario:

Use Hex Fiend (a macOS hex editor) to open the XX.ibd file of your table under /var/lib/mysql/[your database]/.

For example:

04 08 -> Variable-length field list (stored in reverse column order: col2 currently holds 4 bytes, col1 currently holds 8 bytes)

00 -> NULL value list; no NULL data has been inserted, so it is 00

61 62 63 64 65 66 67 68 -> the inserted 8-character variable-length data abcdefgh

65 72 69 63 -> the inserted 4-character variable-length data eric

63 6F 64 65 20 20 20 20 -> the inserted 8-character fixed-length data: code followed by four spaces

Note: Why are there four repeated 20s?

Hex 20 is decimal 32, which is the ASCII code for the space character.

Since the field is declared as char(8), a string shorter than 8 characters is padded with spaces to fill the remaining positions.
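The bytes quoted above can be decoded with a few lines of Python to confirm the reading. This sketch works only on the hex strings copied from this article, not on a real .ibd file, and it assumes a single-byte character set; the variable names are illustrative.

```python
# Hex bytes copied from the dump described above.
varlen_list = bytes.fromhex("04 08")
null_list = bytes.fromhex("00")
col1_raw = bytes.fromhex("61 62 63 64 65 66 67 68")
col2_raw = bytes.fromhex("65 72 69 63")
col3_raw = bytes.fromhex("63 6F 64 65 20 20 20 20")

# The length list is stored in reverse, so reversing it gives column order.
lengths_in_column_order = list(reversed(varlen_list))  # [8, 4]

col1 = col1_raw.decode("ascii")  # 'abcdefgh'
col2 = col2_raw.decode("ascii")  # 'eric'
col3 = col3_raw.decode("ascii")  # 'code' plus four padding spaces

print(lengths_in_column_order, col1, col2, repr(col3))
print(len(col1_raw), len(col2_raw), len(col3_raw))  # 8 4 8
```

The decoded lengths (8 and 4) match the reversed length list, and the char(8) value keeps its four trailing 0x20 bytes.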

As for the bytes that sit between the variable-length field list, the NULL value list, and the real data, they are not covered here. If you are interested, please refer to other bloggers' blogs!

A question for you, friends:

Why can the NULL value list and the variable-length field list fit in just 1 byte and 2 bytes respectively?

If a table has many varchar fields, how can 2 bytes store all of their lengths?

Answer:

Please refer to this blogger’s article (I don’t understand it either, hahahaha):

mysql creates table field length range_MySQL principle-InnoDB engine-row record storage-Compact row format_weixin_39960920's blog-CSDN blog
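As a starting point for that question, the MySQL reference manual's description of the COMPACT row format says, roughly, that the NULL value list uses one bit per nullable column (rounded up to whole bytes) and that each variable-length column gets its own length entry of one or two bytes. The Python sketch below only restates that rule as I read it; the function names are hypothetical, externally stored (overflow) columns are ignored, and nothing here was checked against the dump above beyond the one-byte lengths 04 and 08.

```python
import math

def null_list_bytes(nullable_columns: int) -> int:
    # One bit per nullable column, rounded up to a whole number of bytes.
    return math.ceil(nullable_columns / 8)

def length_entry_bytes(max_bytes: int, actual_bytes: int) -> int:
    # Assumed rule (no overflow pages): two bytes are needed only when the
    # column can exceed 255 bytes AND the stored value exceeds 127 bytes.
    return 2 if max_bytes > 255 and actual_bytes > 127 else 1

print(null_list_bytes(1), null_list_bytes(8), null_list_bytes(9))  # 1 1 2
print(length_entry_bytes(5, 4))          # 1
print(length_entry_bytes(65532, 65532))  # 2
```

Under this reading, a table with many varchar fields simply has a longer length list, one entry per field, rather than squeezing every length into the same two bytes.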


Origin blog.csdn.net/qq_44716086/article/details/124793818