Linux operating system study notes (11) file system

1. Introduction

  This section begins our analysis of the Linux file system. The idea that everything is a file in Linux is well known, and the file system underlies character devices, block devices, pipes, inter-process communication, networking, and more, so its importance is clear. This article first introduces the basics of a file system, then the most important structure, inode, and the layered file system built on it.

2. Basic knowledge of file system

  Every design exists to meet requirements, so we start from the basic functions a file system must provide. A file system has the following basic requirements:

  • Files must be easy to read and write, and name conflicts must be avoided
  • Files must be easy to find, organize, and categorize
  • The operating system must keep records about files so that it can manage them

  Therefore, the file system is designed with the following characteristics:

  • Adopt a tree structure with folders (directories)
  • Cache hot files for fast reading and writing
  • Use an index structure to make lookup and classification easy
  • Maintain a set of data structures that record which files are in use by which tasks

  With this basic design in mind, we can start to work through the extensive and profound Linux file system.

3. Inode structure and file system

3.1 Representation of block storage

  A hard disk uses blocks as its storage unit. The file system needs a basic structure that records which blocks belong to a file, and this cornerstone of the file system is the inode, short for index node. Its on-disk definition in ext4 is shown below. From this data structure we can see that an inode records the file's type and read/write permissions (i_mode), the owning user (i_uid), the owning group (i_gid), the size (i_size_lo), and how many blocks it occupies (i_blocks_lo). There are also several timestamps: i_atime (access time) is the time the file was last accessed; i_ctime (change time) is the time the inode was last changed; i_mtime (modify time) is the time the file content was last modified.

/*
 * Structure of an inode on the disk
 */
struct ext4_inode {
    __le16	i_mode;		/* File mode */
    __le16	i_uid;		/* Low 16 bits of Owner Uid */
    __le32	i_size_lo;	/* Size in bytes */
    __le32	i_atime;	/* Access time */
    __le32	i_ctime;	/* Inode Change time */
    __le32	i_mtime;	/* Modification time */
    __le32	i_dtime;	/* Deletion Time */
    __le16	i_gid;		/* Low 16 bits of Group Id */
    __le16	i_links_count;	/* Links count */
    __le32	i_blocks_lo;	/* Blocks count */
    __le32	i_flags;	/* File flags */
......
    __le32	i_block[EXT4_N_BLOCKS];/* Pointers to blocks */
......
};

#define EXT4_NDIR_BLOCKS 12
#define EXT4_IND_BLOCK EXT4_NDIR_BLOCKS
#define EXT4_DIND_BLOCK (EXT4_IND_BLOCK + 1)
#define EXT4_TIND_BLOCK (EXT4_DIND_BLOCK + 1)
#define EXT4_N_BLOCKS (EXT4_TIND_BLOCK + 1)

  Here we focus on i_block, the member that records where each block of the file content is stored. In ext2 and ext3, the first 12 entries point directly to blocks holding file data, each block being 4KB. If the file is too large to fit, the remaining entries point to indirect blocks: a single indirect block holds pointers to data blocks, a double indirect block holds pointers to single indirect blocks, and a triple indirect block adds one more level. The figure below illustrates this layout.

[Figure: direct and indirect blocks referenced by i_block]
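
  The pointer scheme described above can be sketched in a few lines of C. This is only an illustration, not kernel code: the function name indirection_level is hypothetical, and the figure of 1024 pointers per block assumes 4KB blocks with 4-byte block pointers.

```c
#include <stdint.h>

#define NDIR_BLOCKS    12u    /* direct pointers, as in EXT4_NDIR_BLOCKS */
#define PTRS_PER_BLOCK 1024u  /* assumption: 4KB block / 4-byte pointer */

/*
 * Returns how many levels of indirect blocks must be read to reach
 * logical block 'lblk' of a file:
 * 0 = direct, 1 = single, 2 = double, 3 = triple indirect,
 * -1 = beyond what the pointer scheme can address.
 */
int indirection_level(uint64_t lblk)
{
    const uint64_t p = PTRS_PER_BLOCK;

    if (lblk < NDIR_BLOCKS)
        return 0;
    lblk -= NDIR_BLOCKS;
    if (lblk < p)
        return 1;
    lblk -= p;
    if (lblk < p * p)
        return 2;
    lblk -= p * p;
    if (lblk < p * p * p)
        return 3;
    return -1;
}
```

  Under these assumptions, logical blocks 0 to 11 need no extra read, while a block deep inside a large file needs up to three extra reads before its data can be fetched.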

  The problem with this scheme is that for large files, reaching a data block requires reading several levels of indirect blocks first, so access is slow. To address this, ext4 introduced Extents. Simply put, an extent describes a run of contiguous blocks, and the extents of a file are organized in a tree to speed up access. The general structure is shown in the figure below.

[Figure: ext4 extent tree. Original image: https://ars.els-cdn.com/content/image/1-s2.0-S1742287617300270-gr2.jpg]

  Every node starts with an ext4_extent_header, whose eh_entries field indicates how many entries the node holds. There are two types of entries:

  • In a leaf node, each entry points directly to the start of a run of contiguous blocks on disk. This is the data entry, ext4_extent.
  • In a branch node, each entry points to a branch or leaf node in the next layer. This is the index entry, ext4_extent_idx.

  Both types of entries are 12 bytes in size.

  If the file is small, the i_block area inside the inode can hold one ext4_extent_header and four ext4_extent entries. In this case eh_depth is 0: the inode itself is the leaf node and the tree height is 0. If the file is large and four extents are not enough, the structure is split into a tree. Nodes with eh_depth > 0 are index nodes; the root, which has the greatest depth, lives in the inode, and the nodes at the bottom with eh_depth = 0 are leaves. Apart from the root, every node is stored in a 4KB block. After subtracting the 12-byte ext4_extent_header, a block has room for 340 entries. Each extent can describe at most 128MB of data, so 340 extents can describe a file of up to 42.5GB. That is already very large, and for even bigger files the depth of the tree can be increased.
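
  The numbers in the paragraph above can be checked with a little arithmetic. The sketch below is only a worked calculation: the macro and function names are made up for illustration, and the limit of 32768 blocks per extent is derived from the 128MB figure at a 4KB block size.

```c
#include <stdint.h>

#define BLK_SIZE          4096u   /* block size used in the text */
#define HEADER_SIZE       12u     /* size of ext4_extent_header */
#define ENTRY_SIZE        12u     /* size of ext4_extent / ext4_extent_idx */
#define I_BLOCK_SIZE      60u     /* i_block: 15 entries of 4 bytes each */
#define MAX_EXTENT_BLOCKS 32768u  /* assumption: 128MB / 4KB per extent */

/* Extents that fit in i_block next to the header. */
unsigned extents_in_inode(void)
{
    return (I_BLOCK_SIZE - HEADER_SIZE) / ENTRY_SIZE;
}

/* Entries that fit in one 4KB tree node after its header. */
unsigned entries_per_block(void)
{
    return (BLK_SIZE - HEADER_SIZE) / ENTRY_SIZE;
}

/* Largest amount of data one extent can describe, in bytes. */
uint64_t bytes_per_extent(void)
{
    return (uint64_t)MAX_EXTENT_BLOCKS * BLK_SIZE;
}

/* Data described by one full leaf block of extents, in bytes. */
uint64_t bytes_per_leaf_block(void)
{
    return entries_per_block() * bytes_per_extent();
}
```

  With these values, extents_in_inode() gives 4, entries_per_block() gives 340, and bytes_per_leaf_block() gives 340 × 128MB, which is the 42.5GB quoted above.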

/*
 * Each block (leaves and indexes), even inode-stored has header.
 */
struct ext4_extent_header {
    __le16	eh_magic;	/* probably will support different formats */
    __le16	eh_entries;	/* number of valid entries */
    __le16	eh_max;		/* capacity of store in entries */
    __le16	eh_depth;	/* has tree real underlying blocks? */
    __le32	eh_generation;	/* generation of the tree */
};

/*
 * This is the extent on-disk structure.
 * It's used at the bottom of the tree.
 */
struct ext4_extent {
    __le32  ee_block;  /* first logical block extent covers */
    __le16  ee_len;    /* number of blocks covered by extent */
    __le16  ee_start_hi;  /* high 16 bits of physical block */
    __le32  ee_start_lo;  /* low 32 bits of physical block */
};

/*
 * This is index on-disk structure.
 * It's used at all the levels except the bottom.
 */
struct ext4_extent_idx {
    __le32  ei_block;  /* index covers logical blocks from 'block' */
    __le32  ei_leaf_lo;  /* pointer to the physical block of the next
                          * level. leaf or next index could be there */
    __le16  ei_leaf_hi;  /* high 16 bits of physical block */
    __u16  ei_unused;
};
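
  To make the role of the fields concrete, here is a minimal sketch of how the extents in a leaf node could translate a logical block number into a physical one. It is not the kernel's lookup code: struct extent is a simplified, host-endian stand-in for ext4_extent, extent_lookup is a hypothetical name, the scan is linear, and special cases such as uninitialized extents are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, host-endian stand-in for struct ext4_extent (hypothetical). */
struct extent {
    uint32_t block;     /* first logical block the extent covers */
    uint16_t len;       /* number of blocks covered */
    uint16_t start_hi;  /* high 16 bits of the physical start block */
    uint32_t start_lo;  /* low 32 bits of the physical start block */
};

/*
 * Returns the physical block that holds logical block 'lblk',
 * or 0 if none of the 'n' extents covers it (a hole in the file).
 */
uint64_t extent_lookup(const struct extent *ex, size_t n, uint32_t lblk)
{
    for (size_t i = 0; i < n; i++) {
        if (lblk >= ex[i].block && lblk - ex[i].block < ex[i].len) {
            uint64_t start = ((uint64_t)ex[i].start_hi << 32) | ex[i].start_lo;
            return start + (lblk - ex[i].block);
        }
    }
    return 0;
}
```

  Because one entry covers a whole run of contiguous blocks, a single comparison resolves any block in that run, which is where the speedup over per-block pointers comes from.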

  In this way an inode describes the series of blocks that form a file, and with many inodes on disk we can store a large number of files. We still need a way to track which inodes are in use, and this is the job of the inode bitmap. Similarly, a block bitmap tracks which blocks are in use. The code below shows the bitmap access made while creating an inode: we need to find the next 0 bit, which marks a free inode slot.

struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
             umode_t mode, const struct qstr *qstr,
             __u32 goal, uid_t *owner, __u32 i_flags,
             int handle_type, unsigned int line_no,
             int nblocks)
{
......
    inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
......
    ino = ext4_find_next_zero_bit((unsigned long *)
                inode_bitmap_bh->b_data,
                EXT4_INODES_PER_GROUP(sb), ino);
......
}
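
  The idea behind the bitmap search can be shown with a small standalone function. This is a simplified sketch rather than the kernel's ext4_find_next_zero_bit: the name next_zero_bit is hypothetical, and it tests one bit at a time instead of a word at a time.

```c
#include <stddef.h>

/*
 * Returns the index of the first 0 bit at or after 'start',
 * or 'nbits' if every bit from 'start' onward is set.
 * Bit i is stored in byte i / 8 at bit position i % 8.
 */
size_t next_zero_bit(const unsigned char *map, size_t nbits, size_t start)
{
    for (size_t i = start; i < nbits; i++) {
        if (!(map[i / 8] & (1u << (i % 8))))
            return i;
    }
    return nbits;
}
```

  A return value equal to nbits plays the role of "no free inode in this group", in which case the allocator has to try another block group.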

3.2 File system format

  The inode and the block are the smallest units of the file system. Several levels of structure are built on top of them, roughly as follows:

  • Block group: a unit that stores one portion of the data, described by the structure ext4_group_desc. It records the location of the group's inode bitmap (bg_inode_bitmap_lo), block bitmap (bg_block_bitmap_lo), and inode table (bg_inode_table_lo). The block groups, laid out one after another, make up the structure of the whole file system.
  • Block group descriptor table: a table composed of the descriptors of all block groups.
  • Super block: describes the file system as a whole. The structure ext4_super_block stores global information such as the total number of inodes (s_inodes_count), the total number of blocks (s_blocks_count_lo), the number of inodes per block group (s_inodes_per_group), and the number of blocks per block group (s_blocks_per_group).
  • Boot block: an area must be reserved for boot code used at operating system startup, so 1KB is reserved in front of the first block group as the boot area.

[Figure: file system format]

  The super block and the block group descriptor table are global information, and this data is critical. If it is lost, the entire file system cannot be opened, which is far more serious than damage to a single block of one file. Therefore both are backed up, using one of the following strategies.

  • Default strategy: keep a copy of the super block and the block group descriptor table in every block group.
  • sparse_super strategy: store the copies sparsely, only in block group 0 and in block groups whose index is an integer power of 3, 5, or 7.
  • Meta Block Groups strategy: block groups are divided into meta block groups. The block group descriptor table in each meta block group covers only its own block groups. A meta block group contains 64 block groups, so its descriptor table has at most 64 entries. The approach resembles a Merkle tree and saves a great deal of space.

[Figure: Meta Block Groups layout]
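
  The sparse_super rule is easy to express as a predicate. The sketch below is an illustration with hypothetical function names, not the kernel's implementation. It treats 1 as a power (3 to the power 0), so block group 1 also counts as holding a backup.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n equals base raised to some non-negative integer power. */
static bool is_power_of(uint32_t n, uint32_t base)
{
    if (n == 0)
        return false;
    while (n % base == 0)
        n /= base;
    return n == 1;
}

/*
 * True if, under the sparse_super strategy, block group 'group'
 * keeps a copy of the super block and group descriptor table.
 */
bool group_has_backup(uint32_t group)
{
    if (group == 0)
        return true;
    return is_power_of(group, 3) || is_power_of(group, 5) ||
           is_power_of(group, 7);
}
```

  Under this rule the groups holding copies are 0, 1, 3, 5, 7, 9, 25, 27, 49, and so on, so the number of copies grows only logarithmically with the size of the file system.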

3.3 The storage format of the directory

  To make files easy to find we need an index, namely the directory. A directory is itself a file and also has an inode, which likewise points to some blocks. The difference is that the blocks of an ordinary file store file data, while the blocks of a directory store one entry for each file in the directory. Such an entry is an ext4_dir_entry. There are two versions: the second, ext4_dir_entry_2, splits the 16-bit name_len into an 8-bit name_len and an 8-bit file_type.

struct ext4_dir_entry {
    __le32  inode;      /* Inode number */
    __le16  rec_len;    /* Directory entry length */
    __le16  name_len;    /* Name length */
    char  name[EXT4_NAME_LEN];  /* File name */
};
struct ext4_dir_entry_2 {
    __le32  inode;      /* Inode number */
    __le16  rec_len;    /* Directory entry length */
    __u8  name_len;    /* Name length */
    __u8  file_type;
    char  name[EXT4_NAME_LEN];  /* File name */
};
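
  Records of this kind sit back to back inside a directory block, with rec_len giving the distance from one record to the next. A minimal sketch of a name lookup over one block follows. It is not kernel code: the function names are hypothetical, the fields are read in host byte order (so it matches the on-disk little-endian layout only on a little-endian machine), and dir_block_put exists only to build sample blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Record layout (as in ext4_dir_entry_2): inode (4 bytes), rec_len (2),
 * name_len (1), file_type (1), then the name itself.
 * Returns the inode number stored for 'name', or 0 if it is not found.
 */
uint32_t dir_block_lookup(const unsigned char *blk, size_t blk_size,
                          const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off + 8 <= blk_size) {
        uint32_t ino;
        uint16_t rec_len;
        uint8_t name_len = blk[off + 6];

        memcpy(&ino, blk + off, 4);
        memcpy(&rec_len, blk + off + 4, 2);
        if (rec_len < 8 || off + rec_len > blk_size)
            break;                       /* corrupt record: stop scanning */
        if (ino != 0 && name_len == want &&
            (size_t)8 + name_len <= rec_len &&
            memcmp(blk + off + 8, name, want) == 0)
            return ino;
        off += rec_len;                  /* jump to the next record */
    }
    return 0;
}

/* Hypothetical helper: writes one record at 'off', returns the next offset. */
size_t dir_block_put(unsigned char *blk, size_t off, uint32_t ino,
                     uint16_t rec_len, const char *name)
{
    uint8_t name_len = (uint8_t)strlen(name);

    memcpy(blk + off, &ino, 4);
    memcpy(blk + off + 4, &rec_len, 2);
    blk[off + 6] = name_len;
    blk[off + 7] = 0;                    /* file_type: unknown */
    memcpy(blk + off + 8, name, name_len);
    return off + rec_len;
}
```

  The cost of this scan grows with the number of entries in the directory, which is the motivation for the hashed index.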

  Within a directory's blocks, the simplest format is a list: ext4_dir_entry_2 records placed one after another. Each record stores the name of a file directly under the directory and the corresponding inode number, through which the real file can be found. The first entry is ".", the current directory; the second is "..", the parent directory; after that come the file names and their inodes. If a directory contains too many files, searching this list one entry at a time is too slow, so an indexed mode was added. If the EXT4_INDEX_FL flag is set in the inode, the organization of the directory's blocks changes to the layout defined below:

struct dx_root
{
    struct fake_dirent dot;
    char dot_name[4];
    struct fake_dirent dotdot;
    char dotdot_name[4];
    struct dx_root_info
    {
      __le32 reserved_zero;
      u8 hash_version;
      u8 info_length; /* 8 */
      u8 indirect_levels;
      u8 unused_flags;
    }
    info;
    struct dx_entry  entries[0];
};

  The entries for the current directory and the parent directory remain unchanged, and the file list is replaced by a dx_root_info structure followed by index entries. The most important member is indirect_levels, which gives the number of levels of indirect indexes. Each index entry is a dx_entry, which is essentially a mapping from a hash of the file name to a data block.

struct dx_entry
{
    __le32 hash;
    __le32 block;
};

  To look up a file name in a directory, we first compute the hash of the name. If the hash matches an index entry, the information for this file is in the corresponding block. We then open that block. If it is not another index node but a leaf of the index tree, it holds an ordinary list of ext4_dir_entry_2 records, and we scan them one by one for the file name. Through the index tree, the many files in a directory are spread across many blocks, so a lookup is fast.

[Figure: directory index tree]
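
  One way to picture the hash lookup is as a binary search over index entries sorted by hash, where each entry covers the range from its own hash up to the next entry's hash. The sketch below rests on that assumption and is not the kernel's code: struct dx_ent is a host-endian stand-in for dx_entry, dx_pick_block is a hypothetical name, and entry 0 is treated as the catch-all for the smallest hashes.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for struct dx_entry (hypothetical). */
struct dx_ent {
    uint32_t hash;   /* lowest name hash covered by this entry */
    uint32_t block;  /* directory block holding those names */
};

/*
 * 'e' holds 'n' >= 1 entries sorted by ascending hash.
 * Returns the block of the last entry whose hash is <= 'hash';
 * entry 0 is used when 'hash' is below every other entry.
 */
uint32_t dx_pick_block(const struct dx_ent *e, size_t n, uint32_t hash)
{
    size_t lo = 1, hi = n;               /* search within [1, n) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (e[mid].hash <= hash)
            lo = mid + 1;
        else
            hi = mid;
    }
    return e[lo - 1].block;
}
```

  The block returned here is then either another index node, searched the same way, or a leaf block that is scanned linearly for the name.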

3.4 Storage format of soft link and hard link

  Soft links and hard links are also a kind of file, created with the command below. ln -s creates a soft link; without -s, ln creates a hard link.

ln [options] [source file or directory] [target file or directory]

  A hard link shares an inode with the original file. Inode numbers are not valid across file systems, since each file system has its own inode table, so a hard link cannot cross file systems. A soft link is different: creating one is equivalent to creating a new file with its own inode, whose content is the path of another file. This is very flexible. A soft link can cross file systems, and if the target file is deleted the link file still exists, although the file it points to can no longer be found.

[Figure: soft link and hard link storage]
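
  The difference can be observed from user space with the POSIX calls link(), symlink(), and lstat(). The following is a small sketch, not part of the original notes: the function names and the /tmp path are assumptions, and it needs a POSIX system whose temporary directory supports both kinds of link.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if both paths name the same inode on the same file system,
 * 0 if they do not, -1 on error. lstat() does not follow a soft link.
 */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    if (lstat(a, &sa) != 0 || lstat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Creates a file, a hard link, and a soft link in a fresh temporary
 * directory, records whether each link shares the file's inode, then
 * removes what it created. Returns 0 on success, -1 on failure.
 */
int link_demo(int *hard_same, int *soft_same)
{
    char dir[] = "/tmp/linkdemoXXXXXX";
    char orig[64], hard[64], soft[64];
    int fd, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(orig, sizeof orig, "%s/orig", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);
    snprintf(soft, sizeof soft, "%s/soft", dir);

    fd = open(orig, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        if (link(orig, hard) == 0 && symlink(orig, soft) == 0) {
            *hard_same = same_inode(orig, hard);
            *soft_same = same_inode(orig, soft);
            rc = 0;
        }
    }
    unlink(soft);
    unlink(hard);
    unlink(orig);
    rmdir(dir);
    return rc;
}
```

  On success, hard_same is 1 because the hard link is just another name for the same inode, while soft_same is 0 because the soft link is a separate file with an inode of its own.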

4. Summary

  This article starts from the design requirements of a file system and then analyzes the inode and the main structures of the file system built on it. The following figure from Geek Time summarizes inode and ext4.

[Figure: summary of inode and ext4, from Geek Time]

Origin blog.csdn.net/u013354486/article/details/107704439