An in-depth analysis of the ext2 file system

I have long wanted to write an article about the ext family of file systems. Back when I had just started working, I once ran rm -rf by accident and deleted a lot of files, and I badly wanted a data recovery tool to bring my data back. Of course, to learn data recovery you first have to learn the file system. Because of work, I have not properly studied anything related to the Linux kernel for a long time, which is a frightening feeling. Enough digression; let's begin our ext2 adventure.

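I will skip the boilerplate about ext2's features; any decent Linux tutorial has it. Let's cut straight to the chase and start exploring.

First, create an ext2 file system. On my Ubuntu machine, which is short on disk space, I set aside 500M to study the ext2 file system from scratch.

The dd command creates a file; nothing much to add. Running the dd command below produces an all-zero file of 512000 * 1KB, that is, a 500MB file.

losetup sets up a loop device, which makes a file look like a block device. We then build our ext2 file system on that block device for study, so the mke2fs command below formats the loop device as ext2. Oh yeah, we finally have an ext2 file system.

One thing to stress here: we ran mke2fs with its default options.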

root@libin:~# dd if=/dev/zero of=bean bs=1K count=512000
512000+0 records in
512000+0 records out
524288000 bytes (524 MB) copied, 9.40989 s, 55.7 MB/s
root@libin:~# ll bean
-rw-r--r-- 1 root root 524288000 2012-07-06 22:24 bean
root@libin:~# ll -h bean
-rw-r--r-- 1 root root 500M 2012-07-06 22:24 bean
root@libin:~#
root@libin:~#
root@libin:~# losetup /dev/loop0 bean

root@libin:~# cat /proc/partitions
major minor #blocks name

7 0 512000 loop0
8 0 312571224 sda
8 1 49182966 sda1
.......

root@libin:~# mke2fs /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
128016 inodes, 512000 blocks
25600 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
63 block groups
8192 blocks per group, 8192 fragments per group
2032 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

But we are not finished: we still cannot access our new ext2 file system, because it is not mounted. I decided to mount the loop device at /mnt/bean.

mkdir /mnt/bean
mount -t ext2 /dev/loop0 /mnt/bean

root@libin:/mnt/bean# mount
.........
/dev/loop0 on /mnt/bean type ext2 (rw)

root@libin:/mnt/bean# ll
total 17
drwxr-xr-x 3 root root  1024 2012-07-06 22:31 ./
drwxr-xr-x 4 root root  4096 2012-07-06 22:32 ../
drwx------ 2 root root 12288 2012-07-06 22:31 lost+found/

With that effort, we have finally created our ext2 file system. Next we need to talk about how an ext2 file system is laid out.

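The figure below is the classic ext2 layout diagram. Similar pictures are all over the web, but I insisted on drawing this one to clear up two points:

1 Not every block group has a superblock and group descriptors.
2 The group descriptor table (GDT) does not describe only its own block group; on the contrary, it describes every block group.

(The inode table and the data blocks are not necessarily the same size, so I have some doubts about this picture.)

We know the superblock is important, because it tells Linux how this block device is organized: what kind of file system it is, how large each block is (1024, 2048 or 4096 bytes), how many blocks each block group has, how many bytes an inode occupies, and so on. Precisely because the superblock is so important, we cannot keep just one copy of it. Imagine the superblock were corrupted and only one block group held it: we would be finished, with no way to reach the nearly 500M of space behind it or the data inside. That much is easy to understand. But does every block group need a superblock? That is unnecessary and wastes space. So which block groups get a copy of the superblock?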

Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

This is what the terminal printed when we formatted the loop device. Because each block group has 8192 blocks (the reason is explained later), a superblock is stored in block groups 0, 1, 3, 5, 7, 9, 25, 27 and 49.

How is this worked out, and why these block groups in particular? The rule is that block groups 0 and 1, plus those whose number is a power of 3, 5 or 7, keep a copy of the superblock.
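
The rule above can be checked with a few lines of Python. This is only a sketch: the function name has_superblock_backup is made up for illustration, and the three constants are copied from the mke2fs output earlier (63 block groups, 8192 blocks per group, first data block 1). The rule itself corresponds to the sparse_super feature that dumpe2fs lists for this file system.

```python
def has_superblock_backup(group: int) -> bool:
    """True if this block group keeps a superblock copy under the sparse rule."""
    if group in (0, 1):
        return True
    for base in (3, 5, 7):
        power = base
        while power < group:
            power *= base
        if power == group:
            return True
    return False


# Values taken from the mke2fs output shown earlier in the article.
GROUP_COUNT = 63
BLOCKS_PER_GROUP = 8192
FIRST_DATA_BLOCK = 1

# Group 0 holds the primary superblock, so backups start at group 1.
backup_groups = [g for g in range(1, GROUP_COUNT) if has_superblock_backup(g)]
# The superblock copy sits in the first block of its group.
backup_blocks = [FIRST_DATA_BLOCK + g * BLOCKS_PER_GROUP for g in backup_groups]

print(backup_groups)
print(backup_blocks)
```

The second list it prints is the same list of block numbers that mke2fs reported.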

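Before explaining the group descriptors, let's look at the relevant superblock fields:

struct ext2_super_block {
    __u32 s_inodes_count;        /* total number of inodes */
    __u32 s_blocks_count;        /* total number of blocks */
    __u32 s_r_blocks_count;      /* blocks reserved for the super user */
    __u32 s_free_blocks_count;   /* free blocks */
    __u32 s_free_inodes_count;   /* free inodes */
    __u32 s_first_data_block;    /* first data block */
    __u32 s_log_block_size;      /* block size = 1024 << this value */
    __u32 s_dummy3[7];           /* seven 32-bit fields skipped here */
    unsigned char s_magic[2];    /* magic signature, bytes 53 ef on disk */
    __u16 s_state;               /* file system state */
    ...

}

Let's get the relevant information about our ext2 file system with dumpe2fs.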

root@libin:/mnt/bean# dumpe2fs /dev/loop0
dumpe2fs 1.41.11 (14-Mar-2010)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 3bff7535-6f39-4720-9b64-1dc8cf9fe61d
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr resize_inode dir_index filetype sparse_super
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: not clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 128016
Block count: 512000
Reserved block count: 25600
Free blocks: 493526
Free inodes: 128005
First block: 1
Block size: 1024
Fragment size: 1024
Reserved GDT blocks: 256
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2032
Inode blocks per group: 254
Filesystem created: Fri Jul 6 22:31:09 2012
Last mount time: Fri Jul 6 22:33:28 2012
Last write time: Fri Jul 6 22:33:28 2012
Mount count: 1
Maximum mount count: 24
Last checked: Fri Jul 6 22:31:09 2012
Check interval: 15552000 (6 months)
Next check after: Wed Jan 2 22:31:09 2013
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Default directory hash: half_md4
Directory Hash Seed: 0140915d-91ae-43df-9d84-9536cedc0d2b

Group 0: (Blocks 1-8192)
  Primary superblock at 1, Group descriptors at 2-3
  Reserved GDT blocks at 4-259
  Block bitmap at 260 (+259), Inode bitmap at 261 (+260)
  Inode table at 262-515 (+261)
  7663 free blocks, 2021 free inodes, 2 directories
  Free blocks: 530-8192
  Free inodes: 12-2032
...
Group 62: (Blocks 507905-511999)
  Block bitmap at 507905 (+0), Inode bitmap at 507906 (+1)
  Inode table at 507907-508160 (+2)
  3839 free blocks, 2032 free inodes, 0 directories
  Free blocks: 508161-511999
  Free inodes: 125985-128016

OK, we have this information, but how can I prove that what dumpe2fs printed is right? There is only one way: drill into the superblock itself and, following the superblock data structure, read out the value of each field. Sounds exciting, right? OK, just do it.

root@libin:/mnt/bean# dd if=/dev/loop0 bs=1k count=261 | od -tx1 -Ax > /tmp/dump_hex
261+0 records in
261+0 records out
267264 bytes (267 kB) copied, 0.0393023 s, 6.8 MB/s
root@libin:/mnt/bean# vi /tmp/dump_hex

I read the first 261K bytes of the loop device into /tmp/dump_hex. Block 0 is the boot block, which we will leave aside. Block 1 is the superblock. How exciting: we finally get to meet the legendary superblock in the raw.

000400 10 f4 01 00 00 d0 07 00 00 64 00 00 d6 87 07 00
000410 05 f4 01 00 01 00 00 00 00 00 00 00 00 00 00 00
000420 00 20 00 00 00 20 00 00 f0 07 00 00 5f cb f7 4f
000430 5f cb f7 4f 01 00 1a 00 53 ef 00 00 01 00 00 00
000440 25 cb f7 4f 00 4e ed 00 00 00 00 00 01 00 00 00
000450 00 00 00 00 0b 00 00 00 80 00 00 00 38 00 00 00
000460 02 00 00 00 01 00 00 00 5a 65 4b 92 fe 63 43 eb
000470 b6 86 3e f3 6e 44 19 af 00 00 00 00 00 00 00 00
000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
0004d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0004e0 00 00 00 00 00 00 00 00 00 00 00 00 f9 6f 16 79
0004f0 b7 dc 4f 8a a1 a1 18 82 72 a7 d8 25 01 00 00 00
000500 00 00 00 00 00 00 00 00 25 cb f7 4f 00 00 00 00
000510 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

000560 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000570 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000800 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07

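The leftmost column is the address, in hexadecimal. 000400 = 1K, in other words byte offset 1K into the device, and 000800 = 2K. What lies between them is the superblock we have been longing for. I got excited and pasted the whole superblock; luckily I am not paid by the word, or I would be despised for it.

Now let's paste the ext2 superblock data structure again and compare it field by field, to see whether dumpe2fs told the truth.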

struct ext2_super_block {
    __u32 s_inodes_count;
    __u32 s_blocks_count;
    __u32 s_r_blocks_count;
    __u32 s_free_blocks_count;
    __u32 s_free_inodes_count;
    __u32 s_first_data_block;
    __u32 s_log_block_size;

    ...

}

The first field is s_inodes_count, which occupies 4 bytes. OK, the four bytes starting at offset 1K are 10 f4 01 00. We know there are little-endian and big-endian byte orders. To keep ext2 file systems portable between machines, the designers specified that all on-disk data is little-endian; when data is read into memory, the kernel converts it to the CPU's native format.

OK, since it is little-endian, the value is simply 0x0001f410. 0x0001f410 = 128016; compare that with what dumpe2fs gave us, Inode count: 128016. Identical.

Another example: suppose we care about s_free_blocks_count. From the data structure, this field starts at byte 12 of the superblock, i.e. at address 00040c. The bytes there are d6 87 07 00, which works out to 0x000787d6 = 493526, the same as the Free blocks value from dumpe2fs. OK. Whatever field you care about, you can check it yourself. Having met the superblock in the raw, we now know the structure of the ext2 superblock.
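
The same field-by-field comparison can be done mechanically. The following is a minimal sketch using Python's struct module; the 64 bytes are copied by hand from the hex dump above (offsets 0x400 to 0x43f), and the variable names are only for illustration.

```python
import struct

# First 64 bytes of the superblock, copied from the hex dump (0x400-0x43f).
SUPERBLOCK_HEAD = bytes.fromhex(
    "10f40100 00d00700 00640000 d6870700"
    "05f40100 01000000 00000000 00000000"
    "00200000 00200000 f0070000 5fcbf74f"
    "5fcbf74f 01001a00 53ef0000 01000000"
)

FIELDS = [
    "s_inodes_count", "s_blocks_count", "s_r_blocks_count",
    "s_free_blocks_count", "s_free_inodes_count",
    "s_first_data_block", "s_log_block_size",
]

# "<" means little-endian without padding; "I" is an unsigned 32-bit integer.
values = struct.unpack_from("<7I", SUPERBLOCK_HEAD, 0)
superblock = dict(zip(FIELDS, values))

# s_magic is a 16-bit value at byte offset 56 of the superblock.
(magic,) = struct.unpack_from("<H", SUPERBLOCK_HEAD, 56)
block_size = 1024 << superblock["s_log_block_size"]

for name in FIELDS:
    print(f"{name:22} {superblock[name]}")
print(f"magic 0x{magic:04X}, block size {block_size}")
```

The printed inode count, block count, reserved block count, free blocks and free inodes can then be compared directly with the corresponding dumpe2fs lines.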

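To sum up: not every block group has a superblock. The superblock occupies only one block, and yes, when the block size is 4K most of that block is wasted. Still, the number of superblock copies is limited, so not much is lost.

Next, the group descriptors:

A group descriptor is 32 bytes in total. Most textbooks leave us with a misconception, namely that every block group must have group descriptors. That is not actually the case. We know a single group descriptor takes only 32 bytes, yet most textbooks tell us that the group descriptors in a block group occupy k blocks; one group descriptor cannot use that much space.

There is only one truth: all the group descriptors are stored as an array in k blocks. That is, a given block group may have no group descriptors at all, while in a block group that does have them, the k blocks hold the descriptors of every block group. Let me prove it: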

struct ext2_group_desc
{
    __u32 bg_block_bitmap;          /* Blocks bitmap block */
    __u32 bg_inode_bitmap;          /* Inodes bitmap block */
    __u32 bg_inode_table;           /* Inodes table block */
    __u16 bg_free_blocks_count;     /* Free blocks count */
    __u16 bg_free_inodes_count;     /* Free inodes count */
    __u16 bg_used_dirs_count;       /* Directories count */
    __u16 bg_flags;
    __u32 bg_exclude_bitmap_lo;     /* Exclude bitmap for snapshots */
    __u16 bg_block_bitmap_csum_lo;  /* crc32c(s_uuid+grp_num+bitmap) LSB */
    __u16 bg_inode_bitmap_csum_lo;  /* crc32c(s_uuid+grp_num+bitmap) LSB */
    __u16 bg_itable_unused;         /* Unused inodes count */
    __u16 bg_checksum;              /* crc16(s_uuid+group_num+group_desc) */
};

Group 0: (Blocks 1-8192)
  Primary superblock at 1, Group descriptors at 2-3
  Reserved GDT blocks at 4-259
  Block bitmap at 260 (+259), Inode bitmap at 261 (+260)
  Inode table at 262-515 (+261)
  7663 free blocks, 2021 free inodes, 2 directories
  Free blocks: 530-8192
  Free inodes: 12-2032
Group 1: (Blocks 8193-16384)
  Backup superblock at 8193, Group descriptors at 8194-8195
  Reserved GDT blocks at 8196-8451
  Block bitmap at 8452 (+259), Inode bitmap at 8453 (+260)
  Inode table at 8454-8707 (+261)
  7677 free blocks, 2032 free inodes, 0 directories
  Free blocks: 8708-16384
  Free inodes: 2033-4064
Group 2: (Blocks 16385-24576)
  Block bitmap at 16385 (+0), Inode bitmap at 16386 (+1)
  Inode table at 16387-16640 (+2)
  7936 free blocks, 2032 free inodes, 0 directories
  Free blocks: 16641-24576
  Free inodes: 4065-6096

Look at the dumpe2fs output above: Group 2 has no group descriptors at all, while Group 1 stores them in two blocks, 8194 and 8195. OK, let's see what is stored inside.

In Group 0, blocks 2 and 3 hold the group descriptors; that is, addresses 0x000800 to 0x001000 are the contents of the group descriptor blocks.

000800 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07
000810 02 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    group descriptor of block group 0

000820 04 21 00 00 05 21 00 00 06 21 00 00 fd 1d f0 07
000830 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    group descriptor of block group 1

000840 01 40 00 00 02 40 00 00 03 40 00 00 00 1f f0 07
000850 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    group descriptor of block group 2

000860 04 61 00 00 05 61 00 00 06 61 00 00 fd 1d f0 07
000870 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000880 01 80 00 00 02 80 00 00 03 80 00 00 00 1f f0 07
000890 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0008a0 04 a1 00 00 05 a1 00 00 06 a1 00 00 fd 1d f0 07
0008b0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0008c0 01 c0 00 00 02 c0 00 00 03 c0 00 00 00 1f f0 07
0008d0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0008e0 04 e1 00 00 05 e1 00 00 06 e1 00 00 fd 1d f0 07
0008f0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000900 01 00 01 00 02 00 01 00 03 00 01 00 00 1f f0 07
000910 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
....
000fb0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000fc0 01 c0 07 00 02 c0 07 00 03 c0 07 00 ff 0e f0 07
000fd0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00    group descriptor of block group 62

000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    (there is no block group 63)

001000 04 20 00 00 04 60 00 00 04 a0 00 00 04 e0 00 00
Reading 04 01 00 00 as a little-endian number gives 0x104 = 260, so the block bitmap is at block 260. In the same way the inode bitmap is at block 261, which matches the dumpe2fs output (the +259 and +260 printed there are offsets from block 1, the first block of the group, because the boot block is not counted). 0x1def = 7663 free blocks ....
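
Decoding every descriptor by hand gets tedious, so here is a small sketch that does it for the first three block groups. The bytes are copied from the dump above (0x800 to 0x85f); GroupDesc and parse_group_desc are illustrative names, and only the first six fields of the 32-byte descriptor are decoded.

```python
import struct
from collections import namedtuple

GroupDesc = namedtuple(
    "GroupDesc",
    "block_bitmap inode_bitmap inode_table free_blocks free_inodes used_dirs",
)

DESC_SIZE = 32            # one descriptor is 32 bytes on disk
PAD = "00" * 12           # the last 12 bytes of each descriptor are zero here

# Descriptors of block groups 0, 1 and 2, copied from the dump at 0x800.
GDT_HEAD = bytes.fromhex(
    "04010000 05010000 06010000 ef1d e507 0200 0400" + PAD
    + "04210000 05210000 06210000 fd1d f007 0000 0400" + PAD
    + "01400000 02400000 03400000 001f f007 0000 0400" + PAD
)


def parse_group_desc(table: bytes, group: int) -> GroupDesc:
    """Decode the descriptor of one block group from the descriptor array."""
    # Three little-endian 32-bit block numbers, then three 16-bit counters.
    return GroupDesc(*struct.unpack_from("<IIIHHH", table, group * DESC_SIZE))


descs = [parse_group_desc(GDT_HEAD, g) for g in range(3)]
for g, d in enumerate(descs):
    print(g, d)
```

Each printed tuple lines up with the Block bitmap, Inode bitmap, Inode table, free blocks, free inodes and directories lines that dumpe2fs shows for Groups 0, 1 and 2.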

You can parse the information of any block group yourself in this way and confirm that it agrees with what dumpe2fs prints. We have now established that the group descriptors are stored as an array in k blocks. We have only 63 block groups and each needs 32 bytes, so two 1KB blocks are enough. In fact, the group descriptors are stored redundantly just like the superblock: every other block group that holds group descriptors carries the same two blocks of descriptor information as block group 0. Let me prove it.
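
The arithmetic behind "two 1KB blocks are enough" and behind the block ranges dumpe2fs prints can be written down as a short sketch. All constants come from the dumpe2fs output in this article; layout_with_backup is a hypothetical helper that simply assumes the contiguous ordering visible in that output (superblock, group descriptors, reserved GDT blocks, block bitmap, inode bitmap, inode table).

```python
# Values reported by dumpe2fs for the 1 KB-block file system in this article.
BLOCK_COUNT = 512000
FIRST_DATA_BLOCK = 1
BLOCK_SIZE = 1024
BLOCKS_PER_GROUP = 8192
INODES_PER_GROUP = 2032
INODE_SIZE = 128
DESC_SIZE = 32                # bytes per group descriptor
RESERVED_GDT_BLOCKS = 256


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


group_count = ceil_div(BLOCK_COUNT - FIRST_DATA_BLOCK, BLOCKS_PER_GROUP)
gdt_blocks = ceil_div(group_count * DESC_SIZE, BLOCK_SIZE)
inode_table_blocks = ceil_div(INODES_PER_GROUP * INODE_SIZE, BLOCK_SIZE)


def layout_with_backup(group: int) -> dict:
    """Block ranges for a group that holds a superblock and a GDT copy.

    Assumes the ordering seen in the dumpe2fs output; it is not derived
    from the mke2fs source.
    """
    start = FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP
    gdt_first = start + 1
    reserved_first = gdt_first + gdt_blocks
    block_bitmap = reserved_first + RESERVED_GDT_BLOCKS
    table_first = block_bitmap + 2
    return {
        "superblock": start,
        "gdt": (gdt_first, reserved_first - 1),
        "reserved_gdt": (reserved_first, block_bitmap - 1),
        "block_bitmap": block_bitmap,
        "inode_bitmap": block_bitmap + 1,
        "inode_table": (table_first, table_first + inode_table_blocks - 1),
    }


print(group_count, gdt_blocks, inode_table_blocks)
print(layout_with_backup(0))
print(layout_with_backup(25))
```

The numbers for groups 0 and 25 can be compared with the corresponding Group listings from dumpe2fs.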

Block group 25 also holds group descriptor blocks, namely blocks 204802 and 204803, and they record the information of all 63 block groups. Their content should be identical to the two blocks in block group 0 shown earlier. I dumped the contents of these two blocks; compare them yourself and you will find that the content is the same.

Group 25: (Blocks 204801-212992)
  Backup superblock at 204801, Group descriptors at 204802-204803
  Reserved GDT blocks at 204804-205059
  Block bitmap at 205060 (+259), Inode bitmap at 205061 (+260)
  Inode table at 205062-205315 (+261)
  7677 free blocks, 2032 free inodes, 0 directories
  Free blocks: 205316-212992
  Free inodes: 50801-52832

root@libin:/mnt/bean# dd if=/dev/loop0 bs=1K skip=204802 count=2 | od -tx1 -Ax > /tmp/dump_hex
2+0 records in
2+0 records out
2048 bytes (2.0 kB) copied, 0.000160205 s, 12.8 MB/s
root@libin:/mnt/bean# vi /tmp/dump_hex
000000 04 01 00 00 05 01 00 00 06 01 00 00 ef 1d e5 07
000010 02 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000020 04 21 00 00 05 21 00 00 06 21 00 00 fd 1d f0 07
000030 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000040 01 40 00 00 02 40 00 00 03 40 00 00 00 1f f0 07
000050 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000060 04 61 00 00 05 61 00 00 06 61 00 00 fd 1d f0 07
000070 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
000080 01 80 00 00 02 80 00 00 03 80 00 00 00 1f f0 07
000090 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0000a0 04 a1 00 00 05 a1 00 00 06 a1 00 00 fd 1d f0 07
0000b0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
....
0007c0 01 c0 07 00 02 c0 07 00 03 c0 07 00 ff 0e f0 07
0007d0 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00
0007e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
000800
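
Rather than comparing two hex dumps by eye, the two copies can be compared byte for byte. The sketch below uses made-up helper names (read_blocks, gdt_copies_match) and runs its demo on a small synthetic image file, because reading /dev/loop0 needs root and a real device. On the file system from this article the call would presumably be gdt_copies_match("/dev/loop0", 2, 204802, 2).

```python
import os
import tempfile


def read_blocks(path: str, first_block: int, count: int, block_size: int = 1024) -> bytes:
    """Read `count` consecutive blocks starting at block number `first_block`."""
    with open(path, "rb") as f:
        f.seek(first_block * block_size)
        return f.read(count * block_size)


def gdt_copies_match(path: str, primary: int, backup: int, count: int,
                     block_size: int = 1024) -> bool:
    """Compare the group descriptor blocks at two locations byte for byte."""
    return (read_blocks(path, primary, count, block_size)
            == read_blocks(path, backup, count, block_size))


# Demo on a synthetic image: the same 2048 bytes are placed at blocks 2-3
# and at blocks 8-9, standing in for a primary and a backup descriptor table.
fake_gdt = bytes(range(256)) * 8
with tempfile.NamedTemporaryFile(delete=False) as img:
    img.write(b"\0" * 2 * 1024)   # blocks 0-1
    img.write(fake_gdt)           # blocks 2-3
    img.write(b"\0" * 4 * 1024)   # blocks 4-7
    img.write(fake_gdt)           # blocks 8-9
    image_path = img.name

same = gdt_copies_match(image_path, primary=2, backup=8, count=2)
different = gdt_copies_match(image_path, primary=2, backup=4, count=2)
os.unlink(image_path)

print(same, different)
```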

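Finally, last of all, let me explain why the number of blocks per group is 8192. We use one block as the bitmap that records which blocks of this block group are in use (a bit of 1 means the corresponding block is used, a bit of 0 means it is free). One block is 1024 bytes, which gives 1024*8 = 8192 bits, so each block group can have at most 8192 blocks.

By the same reasoning, if the user chooses a block size of 4096, then 4096*8 = 32768 bits, so each block group will have 32K blocks. The evidence is below.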

root@libin:/mnt/bean# cd /home
root@libin:/home# umount /dev/loop0
root@libin:/home# cd /mnt/bean
root@libin:/mnt/bean# ll
total 8
drwxr-xr-x 2 root root 4096 2012-07-06 22:32 ./
drwxr-xr-x 4 root root 4096 2012-07-06 22:32 ../
root@libin:/mnt/bean# mke2fs -b 4096 /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
128000 inodes, 128000 blocks
6400 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=134217728
4 block groups
32768 blocks per group, 32768 fragments per group
32000 inodes per group
Superblock backups stored on blocks:
32768, 98304

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override
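
The relation between block size, blocks per group and the number of block groups seen in the two mke2fs runs can be reproduced with a few lines. This is a sketch with illustrative function names; the first-data-block rule in it is taken from the two mke2fs outputs in this article (1 for 1 KB blocks, 0 for 4 KB blocks), not from the mke2fs source.

```python
def blocks_per_group(block_size: int) -> int:
    """One bitmap block has block_size * 8 bits, one bit per block in the group."""
    return block_size * 8


def group_count(device_bytes: int, block_size: int) -> int:
    """Number of block groups needed to cover the whole device."""
    total_blocks = device_bytes // block_size
    # mke2fs printed "First data block=1" for 1 KB blocks and 0 for 4 KB blocks.
    first_data_block = 1 if block_size == 1024 else 0
    usable = total_blocks - first_data_block
    return -(-usable // blocks_per_group(block_size))  # round up


DEVICE_BYTES = 524288000  # size of the "bean" file created with dd

for size in (1024, 4096):
    print(size, blocks_per_group(size), group_count(DEVICE_BYTES, size))
```

For the 500MB file this gives 8192 blocks per group and 63 groups with 1 KB blocks, and 32768 blocks per group and 4 groups with 4 KB blocks, in line with both mke2fs runs.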


Origin blog.51cto.com/14601104/2450298