MySQL Guide (IX): Sorting Out the Logic of the InnoDB Tablespace

The previous article on the InnoDB architecture flagged the tablespace as a key topic, so this article takes a closer look at the tablespace.

A tablespace is purely a logical concept, so the focus here is on getting the logic straight.


A previous article mentioned the virtual concept of a file. When we write programs we operate on files, but we do not care at all how a file is stored on disk. In other words, the details of the underlying storage are hidden from us, and the credit for that goes to the operating system.

As also mentioned before, each table's data and indexes are stored in an .ibd file. Since it is a file, it is managed by the operating system. From InnoDB's point of view, however, this .ibd file can be called a tablespace, because inside InnoDB there is no concept of a file: on its own turf, InnoDB calls the shots. So when the file is loaded into the InnoDB process, InnoDB sees it as opening (entering) a tablespace.

So how does InnoDB find the entrance to this tablespace? Through the database data directory address configured in the MySQL configuration file. Never mind exactly when the tablespace is opened; its Pandora's box is now open, so let us explore further.


For well-known reasons, the operating system stores a file on disk as a series of pages. Files and pages are both virtual concepts, of course; only the disk physically exists. In other words, the .ibd file is cut into pages by the operating system and stored on disk. (InnoDB's heart bleeds: its tablespace has been chopped to bits. Seen this way, the operating system is a grand master of space magic, ha ha.)

Now there is a conflict that must be resolved. The two parties are MySQL and the operating system, the object of the conflict is the file (the tablespace), and the conflict goes as follows:

  • The operating system: I clearly manage files and disks very well, and my file system is powerful. Yet InnoDB will not listen and insists on going over my head to manage things directly, rebuilding a file system of its own. Is that even possible?
  • InnoDB: what I am after is reliability and availability while pushing performance to the limit. How am I supposed to improve performance when you have cut my tablespace to pieces ... (10,000 words omitted here).

The conflict is no small one, but the expert developers are clever. Since I cannot change how the operating system manages things, I will build another layer on top of it. Build what? A space management system for the pages the operating system has chopped the file into.

This space management system has two parts: the tablespace (the single object) and a set of operations that work on the tablespace. In this way, a file system is built on top of the operating system's file system.


By now the purpose of the tablespace is clear, namely:

To manage the countless pages that belong to the tablespace.

So the first step is naturally to decouple from the operating system: operating system pages are replaced with tablespace pages. That is, what the operating system sees as page 1 and page 2 of the file become page a and page b on the tablespace side. Both are called pages, but they are not the same thing. From here on, "page" means a tablespace page.

As for how these two kinds of page map onto each other, we will not pry.

This way the cornerstone of the tablespace is developed and controlled entirely by InnoDB itself. That is why we can easily change the InnoDB page size, whereas the operating system's page size is out of our hands.


However, with tens of thousands of pages in a single tablespace, how can they be managed? (Tablespace pages are the data pages and other page types discussed earlier.)

Everything is connected, so think about it: if it were up to you, what would you propose?


A few examples. In the army, does a single commander manage millions of soldiers directly?

In a school, why split students into classes instead of teaching everyone together?

In a supermarket, why not just pile the goods up anywhere? Why have a fresh-food area and a daily necessities area?

There are many more examples. Have you spotted the solution?

Areas: that is, partitioning, at different levels.

Naturally, the InnoDB tablespace also introduces the concept of an extent. Each extent contains 64 pages, and every 256 extents form a group. So with a 16 KB page size, each extent is exactly 1 MB. The first few pages of the first extent in each group record key information about the extents in that group.
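The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in the text (64 pages per extent, 256 extents per group, 16 KB pages); the function name `locate` and the assumption that pages are numbered from 0 are mine, not taken from InnoDB's source.

```python
PAGE_SIZE = 16 * 1024      # bytes; the 16 KB page size used in the text
PAGES_PER_EXTENT = 64      # from the text
EXTENTS_PER_GROUP = 256    # from the text

EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT      # bytes in one extent
GROUP_SIZE = EXTENT_SIZE * EXTENTS_PER_GROUP    # bytes in one group


def locate(page_no):
    """Map a page number (assumed to start at 0) to
    (group number, extent within the group, page within the extent)."""
    extent_no, page_in_extent = divmod(page_no, PAGES_PER_EXTENT)
    group_no, extent_in_group = divmod(extent_no, EXTENTS_PER_GROUP)
    return group_no, extent_in_group, page_in_extent


print(EXTENT_SIZE // 2**20, "MiB per extent")   # 1 MiB per extent
print(GROUP_SIZE // 2**20, "MiB per group")     # 256 MiB per group
print(locate(20000))                            # (1, 56, 32)
```

With these numbers one group covers 256 MB of the file, so even a large tablespace has only a modest number of groups to keep track of.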

Thus pages are managed in a hierarchy rather than in chaos. But is that enough? Certainly not. As noted above, InnoDB wants to push performance to the limit, so merely defining levels is of little use on its own.


So InnoDB does the following:

  1. The pages of each extent are contiguous on disk. Contiguity reduces random disk I/O in certain scenarios.

But how is contiguity guaranteed? That relies on the operating system's management. InnoDB can request a whole extent's worth of space at a time, and space obtained that way is naturally contiguous.

  2. In every B+ tree only the leaf nodes store data; the non-leaf nodes exist only to locate the corresponding leaf node. If these two kinds of pages are mixed together, for example laid out as AAAABBAABBAAAA, lookups suffer as well.

To address this, InnoDB introduces the concept of a segment. Each index that is built (i.e., each B+ tree) creates two segments in the tablespace: an index segment for the non-leaf nodes and a data segment for the leaf nodes. Space is allocated to a segment in units of extents, so one segment can contain several extents. Segments exist mainly to solve this performance problem.

  3. An algorithm's performance is measured in two dimensions: time and space. The same applies to InnoDB, which can be viewed as one big algorithm. So to optimize space, InnoDB also introduces the fragment extent. What are the benefits?

As described in point 2, the pages in a segment's extents all belong to that segment, holding either the leaf nodes or the non-leaf nodes of one index, and a segment's extents are never mixed with pages from other indexes. A fragment extent lifts that restriction: pages from any segment can be stored in a fragment extent.

The general process is therefore: a tablespace is created, two segments appear (for the primary key index), and a row is inserted. The row is stored in a data page, and that page is taken from a fragment extent. As inserts continue, once the segment occupies 32 pages in fragment extents, it starts requesting full extents for further storage.

Without fragment extents, the very first insert would request one whole extent per segment and take up 2 MB straight away, yet some tables never need that much space.
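The allocation process just described can be modelled with a toy class. This is a minimal sketch of the rule as stated in the text (the first 32 pages of a segment come from fragment extents, after that whole extents are requested); the class `Segment` and its methods are hypothetical names for illustration, not InnoDB's actual data structures.

```python
FRAGMENT_PAGE_LIMIT = 32   # from the text: pages taken from fragment extents first
PAGES_PER_EXTENT = 64      # from the text


class Segment:
    """Toy model of how one segment acquires pages (illustrative only)."""

    def __init__(self):
        self.fragment_pages = 0           # pages borrowed from fragment extents
        self.full_extents = 0             # whole extents owned by this segment
        self.pages_in_current_extent = 0  # pages used in the newest owned extent

    def allocate_page(self):
        # Phase 1: take single pages from fragment extents.
        if self.fragment_pages < FRAGMENT_PAGE_LIMIT:
            self.fragment_pages += 1
            return "fragment"
        # Phase 2: request a whole extent whenever the current one is exhausted.
        if self.full_extents == 0 or self.pages_in_current_extent == PAGES_PER_EXTENT:
            self.full_extents += 1
            self.pages_in_current_extent = 0
        self.pages_in_current_extent += 1
        return "extent"

    def reserved_pages(self):
        """Pages of disk space this segment is holding on to."""
        return self.fragment_pages + self.full_extents * PAGES_PER_EXTENT


seg = Segment()
kinds = [seg.allocate_page() for _ in range(33)]
print(kinds.count("fragment"), kinds[-1], seg.reserved_pages())  # 32 extent 96
```

A segment holding a single page reserves just that one page here, instead of a full 64-page extent, which is the space saving the text describes.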


To summarize: a tablespace has many extents, and some of those extents make up a segment. At the same time, every 256 extents form a group, and information about the group is kept in its first extent. Each extent consists of 64 contiguous pages.

In fact, an extent always belongs to a group and may at the same time belong to a segment.

This layered division improves performance considerably, but it raises another performance question: how do we find an extent quickly?

For example, a 10 GB tablespace has about 10,000 extents. Some are full and some still have free space. When new data is inserted, how do we find a suitable extent? Surely not by scanning through all of them?

Of course not. When we looked at the page structure we saw all kinds of linked lists. A tablespace is in fact maintained through several linked lists of extents, such as the list of fragment extents with free space, the list of full fragment extents, and so on.

Each extent is in one of four states:

  • Free
  • Fragment extent with free space
  • Full fragment extent
  • Belonging to a segment

The extent lists can be viewed from two angles. From the perspective of the whole tablespace, there are three linked lists of extents:

  • List of free extents
  • List of fragment extents with free space
  • List of full fragment extents

Clearly, none of these three lists holds extents in the fourth state. Those are tracked from the other perspective, that of an individual segment, which also has three lists:

  • Extents that belong to this segment and are completely free
  • Extents that belong to this segment and still have free space
  • Extents that belong to this segment and are full

So an extent is placed on a different list, at a different level, depending on its state.
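The six lists can be summarized as a small classification function. This is a sketch under the assumption that an extent's list is determined only by whether a segment owns it and by how many of its 64 pages are in use; the names `Extent` and `list_for` are invented for illustration and are not InnoDB identifiers.

```python
from dataclasses import dataclass
from typing import Optional

PAGES_PER_EXTENT = 64  # from the text


@dataclass
class Extent:
    used_pages: int = 0
    segment_id: Optional[int] = None  # None: not owned by any segment


def list_for(extent):
    """Return (level, list name) for the linked list that would hold this extent."""
    if extent.used_pages == 0:
        fill = "free"
    elif extent.used_pages < PAGES_PER_EXTENT:
        fill = "has free space"
    else:
        fill = "full"

    if extent.segment_id is None:
        # Tablespace level: a partly used, unowned extent is a fragment extent.
        names = {
            "free": "free extents",
            "has free space": "fragment extents with free space",
            "full": "full fragment extents",
        }
        return "tablespace", names[fill]
    # Segment level: the same three fill states, tracked per segment.
    return f"segment {extent.segment_id}", fill


print(list_for(Extent()))                              # ('tablespace', 'free extents')
print(list_for(Extent(used_pages=10)))                 # ('tablespace', 'fragment extents with free space')
print(list_for(Extent(used_pages=64, segment_id=7)))   # ('segment 7', 'full')
```

Finding a place for a new page then only requires looking at the head of the right list rather than scanning every extent.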


The structure that describes the attributes of each extent is called an XDES Entry, and the structure that describes each segment is called an Inode Entry. The nodes of the six linked lists above are all XDES Entry structures.

For each of the lists mentioned above we must be able to find its head node, so the base nodes of a segment's three lists are stored in that segment's Inode Entry.

The tablespace as a whole certainly has some information unique to it, and this is stored in the first extent, extent0. Because extent0 is also the first extent of the tablespace's first group, it records that group's information as well. The first extents of the other groups do not need to record the tablespace-wide information.

The base nodes of the three tablespace-level lists are stored in extent0, specifically in one of its pages (the first page).

XDES Entry structures are stored in dedicated pages whose page type is XDES, just like the data page type we saw earlier. The different page types share a similar structure, so these pages are also linked to one another in a list.

Inode Entry structures are stored in pages of type Inode, and Inode pages are likewise linked to one another in a list.


What about the system tablespace? Is it the same as a regular tablespace?

Structurally, yes. But some extents in the system tablespace serve more specific purposes, and it also stores some global information (shared by all tablespaces).

We have already met the keywords for this information in the basic structure of InnoDB: the data dictionary, the doublewrite buffer, the Change Buffer, Undo Logs, and so on.


That is all for the tablespace. Unlike the article on page structure, we did not go through which attributes an XDES Entry and the like contain. The tablespace is an overall concept with too many attributes; there is no need to look at them all, let alone memorize them. As long as the logic is clear and we know the important elements of a tablespace, that is enough. Later, when studying locks and transactions, a high-level picture will suffice whenever some tablespace attribute comes up.


The doublewrite buffer and the change buffer we will set aside for now and look at in passing alongside other topics. Next up is an important topic: transactions.

Finally, there are no diagrams in this article, for two reasons: it is mostly about logic, where diagrams are of little use; and there are too many attributes for a diagram or table to be meaningful. Diagrams are easy to find online, and searching for them will also deepen your impression.

For more, see the back-end development tutorial series (Java).

Origin blog.csdn.net/zhou307/article/details/104729347