[Reprint] File Hole Punching and Its Applications


http://blog.jcix.top/2018-09-28/hole_punching/

The length of a file and the disk space it actually occupies can differ. This mainly involves two concepts: sparse files and file hole punching. Both features require support from the operating system and the file system; on Linux, ext4, XFS, and other file systems currently support them.

Sparse Files

The most intuitive way to understand sparse files is to create a file, use lseek to move to a large offset, and write some data at that offset. The disk space actually occupied is then small, while the length of the file is large. For example, compare an ordinary file containing 3 bytes with a file, file_sparse, in which 3 bytes are written at offset 100,000.
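A minimal sketch of the lseek approach in C could look like the following. The helper name write_at_offset and its parameters are illustrative assumptions, not the article's original listing.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the article): create `path` and write
 * `len` bytes of `data` at byte offset `offset`, leaving everything
 * before that offset unwritten. Returns 0 on success, -1 on failure. */
int write_at_offset(const char *path, off_t offset, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Seeking past the end of an empty file is allowed; the skipped
     * range becomes a hole once data is written after it. */
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    ssize_t written = write(fd, data, len);
    if (close(fd) != 0 || written != (ssize_t)len)
        return -1;
    return 0;
}
```

Calling write_at_offset("file_sparse", 100000, "abc", 3) would produce a file whose length is 100,003 bytes; how many blocks it occupies depends on the file system.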

The -s option of ls prints the disk space occupied by each file in the first column.

The two files are 3 bytes and 98 KB long respectively, yet both occupy the same disk space: 4 KB, the smallest storage unit of the file system. This is because file_sparse uses no disk blocks before offset 100,000.
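The same comparison can be made from a program with stat. The sketch below is an illustration only; the helper names file_usage and print_usage are assumptions introduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: report the logical length of `path` and the
 * number of bytes actually allocated for it. Returns 0 on success,
 * -1 if stat fails. */
int file_usage(const char *path, long long *length, long long *allocated)
{
    struct stat st;

    assert(length != NULL && allocated != NULL);
    if (stat(path, &st) != 0)
        return -1;

    *length = (long long)st.st_size;
    /* st_blocks counts 512-byte units, whatever the file system block size. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}

/* Print both numbers for one file, similar in spirit to `ls -s`. */
void print_usage(const char *path)
{
    long long length, allocated;

    if (file_usage(path, &length, &allocated) == 0)
        printf("%s: length %lld bytes, allocated %lld bytes\n",
               path, length, allocated);
    else
        perror(path);
}
```

For a sparse file, the allocated value would be expected to be much smaller than the length.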

Hole Punching

The sparse file in the example above was obtained by writing 3 bytes at an offset in an empty file. In other cases a file is not sparse to begin with and already occupies a certain amount of disk space. If some data in the middle of such a file is no longer needed and we want to reduce the disk space the file occupies, the only way is hole punching, which turns a non-sparse file into a sparse one.

This is done with the fallocate system call. According to man 2 fallocate, its prototype is int fallocate(int fd, int mode, off_t offset, off_t len), declared in fcntl.h when _GNU_SOURCE is defined. [1]

The usual use of this call can be described as "allocation": with mode set to 0, disk space is allocated for the range [offset, offset+len) of the file, and any part of that range that did not previously contain data reads back as zeros.
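As a rough illustration of the "allocation" mode, the sketch below allocates a fresh file with mode 0 and checks that it reads back as zeros. The helper name allocate_and_check_zeros is hypothetical, and the call may fail with EOPNOTSUPP on file systems that do not support fallocate.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, allocate `len` bytes with mode 0,
 * and check that the new region reads back as zeros.
 * Returns 1 if every byte is zero, 0 if not, -1 on error (errno is set). */
int allocate_and_check_zeros(const char *path, off_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* mode 0: allocate [0, len) and extend the file length to len. */
    if (fallocate(fd, 0, 0, len) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* fallocate takes an explicit offset and does not move the file
     * position, so reading starts at byte 0. */
    unsigned char buf[4096];
    int all_zero = 1;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0)
                all_zero = 0;
    }

    int saved = errno;
    close(fd);
    if (n < 0) {
        errno = saved;
        return -1;
    }
    return all_zero;
}
```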

Here we use it to punch holes in a file, which can correspondingly be called "deallocation": with mode set to FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE, the blocks in the range [offset, offset+len) are punched out, which reduces the disk space used by the file.

Because the "allocation" mode of fallocate gives us a file full of zeros, this single system call is enough to test hole punching: first "allocate", then "deallocate".

Note: although the man page states that including fcntl.h is sufficient, on my CentOS 7 system linux/falloc.h also had to be included; otherwise compilation fails with an error.

Our test program first allocates a 1000 KB file named file_withhole and then punches a 600 KB hole in it.
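A minimal sketch of such a test, written for illustration and not the author's original listing, might look like this. The function names, the parameters, and the position of the hole are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_* flags on older systems */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Allocated size of an open file in KB (st_blocks is in 512-byte units). */
static long long allocated_kb(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512 / 1024;
}

/* Hypothetical test routine: allocate `total` bytes for `path`, then punch
 * a hole of `hole_len` bytes at `hole_off`. The allocated size in KB before
 * and after punching is stored in *before_kb and *after_kb.
 * Returns 0 on success, -1 on failure (errno is set). */
int punch_hole_test(const char *path, off_t total, off_t hole_off,
                    off_t hole_len, long long *before_kb, long long *after_kb)
{
    assert(before_kb != NULL && after_kb != NULL);
    assert(hole_off >= 0 && hole_len > 0 && hole_off + hole_len <= total);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = -1;
    /* "Allocation": mode 0 fills the file with allocated, zeroed space. */
    if (fallocate(fd, 0, 0, total) == 0) {
        *before_kb = allocated_kb(fd);
        /* "Deallocation": free the blocks but keep the file length. */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      hole_off, hole_len) == 0) {
            *after_kb = allocated_kb(fd);
            rc = 0;
        }
    }

    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```

A call such as punch_hole_test("file_withhole", 1000 * 1024, 200 * 1024, 600 * 1024, &before, &after) uses the sizes from the article; the 200 KB starting offset is an arbitrary choice. The exact block counts reported depend on the file system.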

The result shows that both the st_blocks field of struct stat and the -s option of ls report that a 600 KB hole was punched in file_withhole (1000 KB -> 400 KB).

Hole Punching in MySQL Page Compression

I analyzed MySQL InnoDB transparent page compression in an earlier article [2]; this compression mechanism is based on hole punching. See that post for details; a brief explanation follows.

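InnoDB stores data in units of InnoDB pages. In the typical case, the InnoDB page size defaults to 16 KB and the file system block size defaults to 4 KB. When InnoDB stores a page, it compresses the 16 KB of data. If the compressed size is 12 KB, the range from 12 KB to 16 KB is first filled with zeros and then punched out with a fallocate "deallocation", so one file system block is saved thanks to compression. Likewise, if the compressed page is smaller than 8 KB or smaller than 4 KB, 8 KB or 12 KB can be saved, respectively.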
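The arithmetic above can be sketched as a small C function. This illustrates the block rounding only and is not InnoDB's actual code; the name punchable_bytes and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for a page of `page_size` bytes whose compressed
 * data takes `compressed_len` bytes, compute how many bytes at the end of
 * the page can be punched out on a file system with `block_size` blocks.
 * If `hole_off` is not NULL, it receives the offset within the page at
 * which the hole would start. */
size_t punchable_bytes(size_t page_size, size_t block_size,
                       size_t compressed_len, size_t *hole_off)
{
    assert(block_size > 0 && page_size % block_size == 0);
    assert(compressed_len <= page_size);

    /* Round the compressed data up to a whole number of blocks; only
     * completely unused blocks can be freed. */
    size_t used = (compressed_len + block_size - 1) / block_size * block_size;

    if (hole_off != NULL)
        *hole_off = used;
    return page_size - used;
}
```

With a 16 KB page and 4 KB blocks, a compressed length of 12 KB gives 4 KB to punch, starting 12 KB into the page. The hole itself would then be punched with fallocate at the page's file offset plus that hole offset.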


[1] fallocate – manipulate file space, http://man7.org/linux/man-pages/man2/fallocate.2.html

[2] A brief analysis of MySQL InnoDB transparent page compression, http://blog.jcix.top/2017-04-16/transparent_page_compression/


Origin www.cnblogs.com/jinanxiaolaohu/p/12490773.html