MySQL in Practice 13 | Why does the table file stay the same size after half of the table's data is deleted?

Readers often ask me: my database takes up too much space, so I deleted half the data from my largest table. Why hasn't the size of the table file changed?

So today I'll talk about reclaiming database table space and see how to solve this problem.

Here we'll discuss InnoDB, the most widely used MySQL engine. An InnoDB table consists of two parts: the table structure definition and the data. Before MySQL 8.0, the table structure was stored in a file with the .frm suffix. MySQL 8.0 allows the table structure definition to be stored in system data tables. Because the table structure definition takes up very little space, today we mainly discuss the table data.

Next, I'll first explain why simply deleting table data does not reclaim table space, and then introduce the correct way to reclaim the space.

The innodb_file_per_table parameter

Table data can be stored either in the shared tablespace or in a separate file. This behavior is controlled by the parameter innodb_file_per_table:

  1. OFF means the table's data is stored in the system shared tablespace, that is, together with the data dictionary;
  2. ON means each InnoDB table's data is stored in its own file with the .ibd extension.

Starting with MySQL 5.6.6, its default value is ON.

I suggest that whichever version of MySQL you use, you set this value to ON. Storing each table as a separate file is easier to manage, and when you no longer need a table, the drop table command makes the system delete the file directly. If the table lives in the shared tablespace, the space is not reclaimed even after the table is dropped.

So setting innodb_file_per_table to ON is the recommended practice, and the rest of our discussion is based on this setting.
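As a minimal sketch, the setting can be inspected and changed at runtime with the statements below. Changing the value is assumed to affect only tables created or rebuilt afterwards; existing tables stay where they are until they are rebuilt.

```sql
-- Check the current setting (ON means one .ibd file per table).
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Turn it on at runtime; this needs sufficient privileges.
-- Only tables created or rebuilt afterwards get their own .ibd file.
SET GLOBAL innodb_file_per_table = ON;
```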

When we delete an entire table, we can use the drop table command to reclaim the table space. However, the more common scenario is deleting some of the rows, and that is where we hit the problem from the start of this article: data in the table has been deleted, but the table space has not been reclaimed.

To understand this problem thoroughly, we have to start with the data deletion process.

Data deletion process

Let's take another look at a diagram of an InnoDB index. In articles 4 and 5, when I introduced indexes, I mentioned that InnoDB data is organized as a B+ tree.

Figure 1: B+ tree index schematic

Suppose we want to delete record R4. The InnoDB engine only marks R4 as deleted. If a record with an ID between 300 and 600 is inserted later, this position may be reused. However, the disk file does not shrink.

Now you know that InnoDB stores data by page. So what happens if we delete all the records on a data page?

The answer is that the entire data page can be reused.

However, reusing a data page is different from reusing a record's position.

Reuse of a record's position is limited to data that fits the range. In the example above, after R4 is deleted, a newly inserted row with ID 400 can reuse this space directly. But if the inserted row has ID 800, it cannot reuse this position.

When an entire page is removed from the B+ tree, it can be reused at any position. Taking Figure 1 as an example, if all records on data page page A are deleted, page A is marked as reusable. If inserting a record with ID=50 then requires a new page, page A can be reused.

If two adjacent data pages have very low utilization, the system merges the data of both pages onto one of them and marks the other page as reusable.

Going further, what if we delete all the data in the table with the delete command? The result is that all data pages are marked as reusable. But on disk, the file does not get smaller.

So now you know: the delete command only marks the position of a record, or a data page, as "reusable"; the size of the disk file does not change. In other words, the delete command cannot reclaim table space. Space that can be reused but is not being used looks like a "hole".
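One rough way to see this from inside MySQL is to compare the size figures in information_schema before and after a large delete. This is only a sketch: the schema name test and table name t are assumptions, the values are estimates that may be cached, and data_free counts allocated but unused space in the tablespace, so holes inside partially filled pages do not show up in it.

```sql
-- Assumption: the table is test.t; adjust the schema and table names.
SELECT table_name,
       data_length,   -- approximate bytes used by the clustered index
       index_length,  -- approximate bytes used by secondary indexes
       data_free      -- allocated but unused bytes in the tablespace
FROM information_schema.tables
WHERE table_schema = 'test'
  AND table_name = 't';
```

With innodb_file_per_table set to ON, the size of the table's .ibd file on disk is the other number worth comparing before and after the delete.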

In fact, it is not only deleting data that creates holes; inserting data does too.

If data is inserted in ascending index order, the index is compact. But if data is inserted randomly, it may cause index data pages to split.

Suppose page A in Figure 1 is already full, and I now want to insert another row. What happens?


Figure 2: Inserting data causes a page split

You can see that because page A is full, inserting a row with ID 550 forces InnoDB to allocate a new page, page B, to hold the data. After the page split completes, a hole is left at the end of page A (note: in reality, more than one record position may be empty).

In other words, a table that has gone through a large number of inserts, deletes, and updates is likely to contain holes. So if we can remove these holes, we achieve the goal of shrinking the table space.

Rebuilding the table achieves exactly this.

Rebuilding the table

Imagine you have a table A whose space needs to be shrunk. To remove the holes in the table, what could you do?

You can create a table B with the same structure as table A, then read the rows from table A one by one in ascending primary key ID order and insert them into table B.

Since table B is a new table, the holes in table A's primary key index do not exist in table B. Obviously, table B's primary key index is more compact and its data pages are better utilized. If we treat table B as a temporary table, then once the data has been imported from table A into table B, we replace A with B. The effect is that table A's space has been shrunk.
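Done by hand, the idea above might look like the following sketch. It assumes that table A has a primary key column named id and that nothing writes to A while the copy runs; A_old is a hypothetical name for the swapped-out copy. Note that create table ... like does not carry over foreign key definitions.

```sql
-- Assumptions: A has a primary key column named id,
-- and no other session writes to A during the copy.
CREATE TABLE B LIKE A;

-- Insert in ascending primary key order so pages fill compactly.
INSERT INTO B SELECT * FROM A ORDER BY id;

-- Swap the names in one atomic statement, then drop the old copy.
RENAME TABLE A TO A_old, B TO A;
DROP TABLE A_old;
```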

Here you can use the alter table A engine=InnoDB command to rebuild the table. Before MySQL 5.5, this command executes much like the process we just described. The only difference is that you do not need to create the temporary table B yourself: MySQL automatically copies the data, swaps the table names, and drops the old table.

Figure 3: DDL that locks the table

Clearly, the most time-consuming step is inserting data into the temporary table. If new data were written to table A during this process, it would be lost. So table A cannot be updated for the entire duration of the DDL. In other words, this DDL is not Online.

Online DDL, introduced in MySQL 5.6, optimized this process.

Let me briefly describe how a table is rebuilt once Online DDL is available:

  1. Create a temporary file and scan all the data pages of table A's primary key;
  2. Build a B+ tree from the records in table A's data pages and store it in the temporary file;
  3. While the temporary file is being generated, record all operations on A in a log file (row log); this corresponds to state 2 in the figure;
  4. After the temporary file has been generated, apply the operations in the log file to the temporary file, producing a data file that is logically identical to table A; this corresponds to state 3 in the figure;
  5. Replace table A's data file with the temporary file.


Figure 4: Online DDL

You can see that, unlike the process in Figure 3, this scheme has a log file that records and replays operations, so inserts, updates, and deletes on table A are allowed while the table is being rebuilt. This is where the name Online DDL comes from.
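The log file in this process is bounded in size. As a hedged sketch, the InnoDB variable innodb_online_alter_log_max_size sets the upper limit for the temporary log used during online DDL, and an alter that outgrows it fails. The 256MB value below is only an illustrative assumption, not a recommendation.

```sql
-- Upper limit, in bytes, of the temporary log file used during online DDL.
SHOW VARIABLES LIKE 'innodb_online_alter_log_max_size';

-- Hypothetical value of 256MB, raised before rebuilding a busy table.
SET GLOBAL innodb_online_alter_log_max_size = 268435456;
```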

I remember that in the comments on article 6, "Global locks and table locks: why does adding a column to a table face so many obstacles?", a reader asked: DDL has to take the MDL write lock first, so how can this be called Online DDL?

Indeed, in the process in Figure 4, the alter statement needs to acquire the MDL write lock when it starts, but this write lock is downgraded to a read lock before the data is actually copied.

Why downgrade? To make the operation Online: the MDL read lock does not block inserts, updates, or deletes.

Then why not simply release the lock? To protect itself: the lock prevents other threads from running DDL on this table at the same time.

For a large table, the most time-consuming part of Online DDL is copying the data into the temporary file, and inserts, updates, and deletes are accepted during this step. So relative to the whole DDL process, the time the lock is held is very short. From the business's point of view, the operation can be considered Online.

It should be added that all of the rebuild methods above scan the original table's data and build a temporary file. For a large table, this is very expensive in CPU and IO. So for an online service, you need to control the timing of the operation carefully. If you want a safer operation, I recommend gh-ost, the open source tool from GitHub.

Online and inplace

Speaking of Online, I should also clarify the difference between it and another DDL-related concept that is easily confused with it: inplace.

You may have noticed that in Figure 3, the place where we store the data exported from table A is called tmp_table. This is a temporary table created at the server layer.

In Figure 4, the data rebuilt from table A is placed in "tmp_file", a temporary file created internally by InnoDB. The entire DDL process is completed inside InnoDB. From the server layer's point of view, no data is moved to a temporary table, so it is an "in place" operation, which is where the name "inplace" comes from.

So let me ask you: if you have a 1TB table and the disk is 1.2TB, can you run an inplace DDL?

The answer is no, because tmp_file also needs temporary space.

The statement we use to rebuild the table, alter table t engine=InnoDB, actually implies:

alter table t engine=innodb,ALGORITHM=inplace;

The counterpart to inplace is copying the table, which is used like this:

alter table t engine=innodb,ALGORITHM=copy;

When you use ALGORITHM=copy, you force a table copy, and the corresponding process is the one in Figure 3.

Put this way, you may feel that inplace and Online mean the same thing, don't they?

Not quite. They just happen to coincide for the logic of rebuilding a table.

For example, if I want to add a full-text index on a column of an InnoDB table, the statement is:

alter table t add FULLTEXT(field_name);

This process is inplace, but it blocks inserts, updates, and deletes, so it is not Online.

The logical relationship between the two can be summarized as follows:

  1. If a DDL process is Online, it must be inplace;
  2. The reverse is not necessarily true: a DDL that is inplace may not be Online. As of MySQL 8.0, adding a full-text index (FULLTEXT index) and adding a spatial index (SPATIAL index) fall into this category.
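If you want MySQL to tell you up front whether a DDL can run Online, one option is to state the requirement in the statement itself with the LOCK clause. The sketch below assumes a table t; field_name is the placeholder column from the full-text example. With LOCK=NONE the server is expected to reject an operation that cannot allow concurrent writes rather than quietly blocking them.

```sql
-- Rebuild the table and insist that concurrent writes stay allowed.
ALTER TABLE t ENGINE=InnoDB, ALGORITHM=INPLACE, LOCK=NONE;

-- Adding a FULLTEXT index cannot run with LOCK=NONE, so this statement
-- is expected to fail with an error instead of blocking writes.
ALTER TABLE t ADD FULLTEXT(field_name), LOCK=NONE;
```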

Finally, let's extend this a little.

In the comments on article 10, "Why does MySQL sometimes choose the wrong index?", a reader asked about the differences among optimize table, analyze table, and alter table as ways to rebuild a table. Let me explain briefly here.

  • Starting with MySQL 5.6, alter table t engine=InnoDB (that is, recreate) follows the process in Figure 4 by default;
  • analyze table t does not actually rebuild the table; it only recomputes the table's index statistics without modifying any data, and it takes the MDL read lock while it runs;
  • optimize table t is equivalent to recreate + analyze.
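Side by side, the three statements look like this, with t as the example table:

```sql
-- recreate: rebuilds the table (the Figure 4 process since MySQL 5.6).
ALTER TABLE t ENGINE=InnoDB;

-- Recomputes index statistics only; no data is modified.
ANALYZE TABLE t;

-- For InnoDB the result typically includes the note
-- "Table does not support optimize, doing recreate + analyze instead".
OPTIMIZE TABLE t;
```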

Summary

In today's article, I discussed how to shrink a database table's space.

Now you know that if you want to shrink a table, simply deleting the unused data leaves the table file size unchanged; you have to rebuild the table with the alter table command to make the table file smaller. I introduced two implementations of rebuilding a table. With Online DDL, you can consider running it during off-peak business hours. With MySQL 5.5 and earlier, this command blocks DML, so you need to be especially careful.

Finally, it's time for the after-class question.

Suppose someone ran into a case where trying to shrink the table space backfired. It looks like this:

  1. Table t has a file size of 1TB;
  2. Run alter table t engine=InnoDB on this table;
  3. After the statement finishes, the space has not shrunk; it has even grown slightly, for example to 1.01TB.

What do you think could be the cause?

You can write the possible causes you come up with in the comments. At the end of the next article I'll list the plausible ones, so that other readers who see them won't fall into the same pit. Thank you for listening, and feel free to share this article with more friends.

Last issue's question

In the last article, the question I left you was: what happens if a high-spec machine has its redo log set too small?

Every transaction commit writes to the redo log. If the redo log is too small, it fills up quickly: the "ring" in the state diagram soon fills up, and write pos keeps catching up with the checkpoint (CP).

At that point, the system has to stop all updates and advance the checkpoint.

The symptom you see is that disk pressure is very low, but database performance drops intermittently.
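To check how this is configured on a given server, something like the following can serve as a starting point. These are the classic variable names; newer MySQL releases (8.0.30 and later) size the redo log through innodb_redo_log_capacity instead.

```sql
-- Size of each redo log file and the number of files in the group.
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_log_files_in_group';

-- The LOG section of the output shows "Log sequence number" and
-- "Last checkpoint at"; a persistently large gap between them relative
-- to the total redo size suggests write pos is close to the checkpoint.
SHOW ENGINE INNODB STATUS;
```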


Reproduced from: https://juejin.im/post/5d034be8f265da1bca51d796

Origin blog.csdn.net/weixin_33722405/article/details/93183475