File hole punching and its applications
http://blog.jcix.top/2018-09-28/hole_punching/
The length of a file and the disk space it actually occupies may differ. This mainly involves two concepts: sparse files and hole punching. Both need support from the operating system and the file system; on Linux, ext4, XFS, and several other file systems currently support them.
Sparse files
The most intuitive way to understand sparse files is to create a file, use lseek to move to a large offset, and write something there. The file's length becomes large, but the disk space it actually occupies stays small. For example:
```c
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>

int main()
{
    // Open two files, file_normal and file_sparse
    int fd = open("file_normal", O_RDWR|O_CREAT, 0755);
    int fd_sparse = open("file_sparse", O_RDWR|O_CREAT, 0755);
    assert(fd != -1 && fd_sparse != -1);
    // Write 3 bytes at offset 0 in one file, and at offset 100000 in the other
    lseek(fd, 0, SEEK_SET);
    lseek(fd_sparse, 100000, SEEK_SET);
    write(fd, "ABCDEFG", 3);
    write(fd_sparse, "ABCDEFG", 3);
    close(fd);
    close(fd_sparse);
    return 0;
}
```
The -s option of ls prints the disk space occupied by each file in the first column:
```
zjc@~$ ./sparse_file
zjc@~$ ls -lsh file*
4.0K -rwxr-xr-x. 1 zjc zjc 3 Sep 28 11:45 file_normal
4.0K -rwxr-xr-x. 1 zjc zjc 98K Sep 28 11:45 file_sparse
```
As shown, the two files are 3 bytes and 98K (100,003 bytes) long, yet they occupy the same disk space: 4 KB, the smallest storage unit of the file system. This is because file_sparse has no disk blocks allocated for the range before offset 100,000.
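The comparison that ls -s makes can also be done in code. The sketch below is not from the original post, and the helper names create_sparse_file, logical_size and allocated_bytes are hypothetical. It creates the sparse file with ftruncate rather than lseek plus write, because extending a file with ftruncate typically leaves the new range unallocated on file systems that support sparse files. It then reads st_size and st_blocks, assuming the Linux convention that st_blocks counts 512-byte units.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) path and extend it to size
 * bytes without writing any data, so the whole range is a hole. */
static void create_sparse_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    assert(fd != -1);
    int ret = ftruncate(fd, size);
    assert(ret == 0);
    (void)ret;
    close(fd);
}

/* Hypothetical helper: logical length of the file (what ls -l shows),
 * or -1 on error. */
static long long logical_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical helper: bytes actually allocated on disk (what ls -s shows),
 * assuming st_blocks is in 512-byte units as on Linux; -1 on error. */
static long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

For a freshly extended 1 MB file, logical_size reports 1,048,576 while allocated_bytes is expected to be far smaller, often 0. The exact figure depends on the file system.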
Punching holes in files (hole punching)
The sparse file above was obtained by writing 3 bytes at an offset in an empty file. In other cases, a file is not sparse to begin with and already occupies some disk space. If some data in the middle of the file is no longer needed and we want to reduce the file's disk usage, we can punch holes in it (hole punching), turning the non-sparse file into a sparse one.
This is done with the fallocate system call. According to man 2 fallocate, its prototype is as follows [1]:
```c
#define _GNU_SOURCE             /* See feature_test_macros(7) */
#include <fcntl.h>

int fallocate(int fd, int mode, off_t offset, off_t len);
```
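The ordinary use of this call can be called "allocation". With mode set to 0, disk space is allocated for the range [offset, offset+len), and any part of that range that did not already contain data reads as zeros. If offset+len is beyond the end of the file, the file size is extended to it.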
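To punch holes we use the call in the opposite direction, which can correspondingly be called "deallocation". With mode set to FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE, the blocks in the range [offset, offset+len) are deallocated, i.e. a hole is punched, and the file's disk usage shrinks. The man page requires FALLOC_FL_PUNCH_HOLE to be ORed with FALLOC_FL_KEEP_SIZE, so the file size does not change, and reads from the hole return zeros.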
Because "allocation" mode produces a fully allocated file filled with zeros, we can test hole punching with this single system call: first "allocate", then "deallocate".
Note: although the man page says that including fcntl.h is enough, on my CentOS 7 system linux/falloc.h must also be included. Otherwise compilation fails with the following errors:
```
hole_punching.c: In function 'main':
hole_punching.c:33:25: error: 'FALLOC_FL_PUNCH_HOLE' undeclared (first use in this function)
ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE, 409600, 1024000);
^
hole_punching.c:33:25: note: each undeclared identifier is reported only once for each function it appears in
hole_punching.c:33:46: error: 'FALLOC_FL_KEEP_SIZE' undeclared (first use in this function)
ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE, 409600, 1024000);
```
Our test program is as follows:
```c
#define _GNU_SOURCE          // needed for the fallocate() prototype
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
// Note: on CentOS 7, linux/falloc.h must also be included:
#include <linux/falloc.h>
#include <sys/stat.h>
#include <assert.h>

int main()
{
    off_t offset;
    int ret;
    struct stat st;

    // do allocation
    printf("===== Allocation =====\n");
    int fd = open("./file_nohole", O_RDWR|O_CREAT, 0755);
    assert(fd != -1);
    ret = fallocate(fd, 0, 0, 1024000);
    assert(ret == 0);
    offset = lseek(fd, 0, SEEK_END);
    printf("SEEK_END offset:\t %lld\n", (long long)offset);
    fstat(fd, &st);
    printf("fstat:\t\t\t file size %lld, %lld allocated (%lld Bytes).\n",
           (long long)st.st_size, (long long)st.st_blocks,
           (long long)st.st_blocks * 512);
    close(fd);

    // do deallocation
    printf("==== Deallocation ====\n");
    fd = open("./file_withhole", O_RDWR|O_CREAT, 0755);
    assert(fd != -1);
    ret = fallocate(fd, 0, 0, 1024000);
    assert(ret == 0);
    ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE, 409600, 1024000);
    assert(ret == 0);
    offset = lseek(fd, 0, SEEK_END);
    printf("SEEK_END offset:\t %lld\n", (long long)offset);
    fstat(fd, &st);
    printf("fstat:\t\t\t file size %lld, %lld allocated (%lld Bytes).\n",
           (long long)st.st_size, (long long)st.st_blocks,
           (long long)st.st_blocks * 512);
    close(fd);
    return 0;
}
```
Output:
```
zjc@~$ ./hole_punching
===== Allocation =====
SEEK_END offset: 1024000
fstat: file size 1024000, 2000 allocated (1024000 Bytes).
==== Deallocation ====
SEEK_END offset: 1024000
fstat: file size 1024000, 800 allocated (409600 Bytes).
zjc@~$ ls -lsh file*
1000K -rwxr-xr-x. 1 zjc zjc 1000K Sep 28 11:59 file_nohole
400K -rwxr-xr-x. 1 zjc zjc 1000K Sep 28 11:59 file_withhole
```
Both the st_blocks field of struct stat (counted in 512-byte units) and the -s option of ls tell us that a 600 KB hole was punched in file_withhole (1000K -> 400K).
Application of hole punching in MySQL page compression
In an earlier article I analyzed MySQL InnoDB transparent page compression [2]. This compression mechanism is built on file hole punching. See that post for details; a brief explanation follows:
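InnoDB stores data in units of InnoDB pages. In the common case, an InnoDB page is 16 KB by default and a file system block is 4 KB by default. When InnoDB stores a page, it compresses the 16 KB page. Suppose the compressed size is 12 KB. The content between 12 KB and 16 KB is first filled with zeros, and then a hole is punched there with fallocate ("deallocation"). Compression thus saves one file system block. Likewise, if the compressed page is smaller than 8 KB or smaller than 4 KB, 8 KB or 12 KB is saved, respectively.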
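The scheme above can be sketched in C under simplifying assumptions. The page is 16 KB, the file system block is 4 KB, and the compressed bytes have already been produced by some compressor, which is omitted here. The names write_compressed_page, bytes_saved and round_up_to_block are hypothetical, and this is an illustration of the idea, not InnoDB's actual code. The page is written in full with its tail zero-filled. The part of the tail starting at the first 4 KB boundary after the compressed data is then punched out.

```c
#define _GNU_SOURCE          /* needed for the fallocate() prototype */
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SIZE_BYTES 16384  /* assumed InnoDB page size */
#define FS_BLOCK_SIZE    4096  /* assumed file system block size */

/* Round n up to the next multiple of the file system block size. */
static off_t round_up_to_block(off_t n)
{
    return (n + FS_BLOCK_SIZE - 1) / FS_BLOCK_SIZE * FS_BLOCK_SIZE;
}

/* Disk space saved for a page compressed to compressed_len bytes. */
static off_t bytes_saved(off_t compressed_len)
{
    assert(compressed_len > 0 && compressed_len <= PAGE_SIZE_BYTES);
    return PAGE_SIZE_BYTES - round_up_to_block(compressed_len);
}

/* Write page page_no: compressed data, zero-filled up to 16 KB, then punch
 * the whole blocks of the tail. Returns 0 on success, -1 on error. */
static int write_compressed_page(int fd, off_t page_no,
                                 const char *data, size_t compressed_len)
{
    char page[PAGE_SIZE_BYTES];
    off_t page_off = page_no * PAGE_SIZE_BYTES;
    off_t used;

    assert(compressed_len > 0 && compressed_len <= PAGE_SIZE_BYTES);
    used = round_up_to_block((off_t)compressed_len);

    memcpy(page, data, compressed_len);
    memset(page + compressed_len, 0, PAGE_SIZE_BYTES - compressed_len);
    if (pwrite(fd, page, PAGE_SIZE_BYTES, page_off) != PAGE_SIZE_BYTES)
        return -1;

    /* "Deallocation" of the unused tail, if it covers at least one block. */
    if (used < PAGE_SIZE_BYTES &&
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  page_off + used, PAGE_SIZE_BYTES - used) != 0)
        return -1;
    return 0;
}
```

For example, bytes_saved(12288) is 4096 and bytes_saved(7000) is 8192, which match the 12 KB and "smaller than 8 KB" cases above. A page that compresses to more than 12 KB saves nothing.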
[1] fallocate(2): manipulate file space. http://man7.org/linux/man-pages/man2/fallocate.2.html
[2] A simple analysis of MySQL InnoDB transparent page compression (in Chinese). http://blog.jcix.top/2017-04-16/transparent_page_compression/