File Systems and Files: Concepts, Storage Structure, and Access

Reference documents:

https://blog.csdn.net/github_37882837/article/details/90672881

http://c.biancheng.net/view/880.html

https://blog.csdn.net/qq_22613757/article/details/80853391

https://www.iteye.com/topic/816268

Preface

    Material on files and file systems online is fairly fragmented, so this post simply gathers it together ("moving bricks"). After reading articles by various experts, I have rewritten the content in a way I understand, for future reference.

 

I. File Systems

 

1. File system concept

    A file system is a mechanism for organizing data and metadata on a storage device.

    Different operating systems use different file systems. For example, Microsoft operating systems before Windows 98 used the FAT (FAT16) file system, and Windows 2000 and later use NTFS. The traditional native file system of Linux is ext2; modern distributions generally default to its successor, ext4.

    It is often assumed that formatting a hard disk merely erases its data. In fact, formatting writes a file system onto the disk. Because different operating systems manage files differently (the attributes and permissions they attach to files are not exactly the same), a disk must be formatted with a file system format that the operating system uses or supports before it can store that system's files effectively.

 

2. File system layers

 

                                       

Figure 1: Linux file system hierarchy

     From top to bottom, the stack consists of the user (application) layer, the system call interface, the VFS layer, the file system layer, the generic block layer, the device driver layer, and the physical layer.

    1) Application: the topmost user layer, i.e. the programs we use every day. The interfaces they need are mainly creating, deleting, opening, closing, reading, and writing files.

    2) System Call Interface (SCI): wraps the interface provided by the virtual file system and exposes it to user space as system calls.

    3) VFS layer: the virtual file system abstracts over the different concrete file systems and provides a unified API to the layers above, so user space does not need to care about the differing APIs of individual file systems. Once this unified API is wrapped by the system call interface, programs can operate on any file system through the same system calls.

    4) File system layer: each concrete file system implements the operations defined by VFS and registers them with VFS through function pointers, so user operations are dispatched through VFS to the appropriate file system.

    5) Generic block layer: hides the details of different hardware devices and provides the kernel with a unified I/O interface; any change to this layer affects every file system, whether ext3, ext4, or another.

    6) Device driver: the disk driver layer. The driver translates read and write requests into the protocol or commands its own hardware understands and sends them to the disk controller.

    7) Physical disk: the physical layer, which reads and writes data on the disk medium.
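    To make the layering concrete, here is a minimal Python sketch of how a VFS-style layer can route one uniform call to whichever file system owns a path. It is a toy model, not kernel code: the names FileSystemOps, ToyVFS and sys_read are hypothetical, only a read operation is modelled, and in the real kernel the operation tables are C structures of function pointers.

```python
from typing import Callable, Dict, Optional


class FileSystemOps:
    """Hypothetical stand-in for the table of operations a concrete file
    system hands to the VFS (only a read operation is modelled)."""

    def __init__(self, fs_type: str, read: Callable[[str], bytes]) -> None:
        self.fs_type = fs_type
        self.read = read


class ToyVFS:
    """Keeps registered file system types and mount points, and forwards each
    request to whichever file system owns the path."""

    def __init__(self) -> None:
        self.registered: Dict[str, FileSystemOps] = {}
        self.mounts: Dict[str, str] = {}  # mount point -> file system type

    def register(self, ops: FileSystemOps) -> None:
        self.registered[ops.fs_type] = ops

    def mount(self, mount_point: str, fs_type: str) -> None:
        if fs_type not in self.registered:
            raise ValueError(f"unknown file system type: {fs_type}")
        self.mounts[mount_point] = fs_type

    def _owner(self, path: str) -> Optional[str]:
        # Longest mount point that is a whole-component prefix of the path.
        candidates = [m for m in self.mounts
                      if path == m or path.startswith(m.rstrip("/") + "/")]
        return max(candidates, key=len, default=None)

    def read(self, path: str) -> bytes:
        mount_point = self._owner(path)
        if mount_point is None:
            raise FileNotFoundError(path)
        ops = self.registered[self.mounts[mount_point]]
        # Hand the file system a path relative to its own mount point.
        return ops.read("/" + path[len(mount_point):].lstrip("/"))


def sys_read(vfs: ToyVFS, path: str) -> bytes:
    """Toy 'system call': one entry point, whatever file system is underneath."""
    return vfs.read(path)


vfs = ToyVFS()
vfs.register(FileSystemOps("ext4", lambda p: b"ext4 read " + p.encode()))
vfs.register(FileSystemOps("vfat", lambda p: b"vfat read " + p.encode()))
vfs.mount("/", "ext4")
vfs.mount("/mnt/usb", "vfat")
print(sys_read(vfs, "/etc/hostname"))       # handled by the ext4 operations
print(sys_read(vfs, "/mnt/usb/photo.jpg"))  # handled by the vfat operations
```

    The point of the design is that sys_read never changes when a new file system type is registered; only the operation table passed to register differs.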

 

3. File system storage structure

    Take the ext4 file system as an example. ext4 primarily uses the superblock and block group descriptor table in block group 0, and certain other block groups hold redundant backups of the superblock and the group descriptor table. A block group without such a backup begins directly with its data block bitmap. When a disk is formatted as ext4, mkfs allocates reserved GDT blocks ("Reserved GDT blocks") right after the block group descriptor table so that the file system can grow later. The reserved GDT blocks are followed by the data block bitmap and the inode bitmap, which record which data blocks and which inode table entries in the block group are in use; next comes the inode table, and after the inode table come the data blocks that hold file contents. Among these blocks, the superblock, GDT, block bitmap, and inode bitmap are metadata of the whole file system. The inode table is also file system metadata, but since inodes correspond one-to-one with files, I prefer to treat an inode as metadata of its file: right after formatting, the inode tables hold no data except for the ten or so inodes already in use, and an inode is only allocated, with the file's information written into the inode table, when the corresponding file is created.

 

                                 

Figure 2: Standard disk layout of the ext4 file system

    1) Superblock: the superblock sits at the start of the file system and records information about the file system as a whole, i.e. its own structural information. For example, it records the number of data blocks, the number of inodes, the number of free blocks, the supported features, and management information.

    2) Block group descriptors: each block group has a group descriptor that describes it, and all group descriptors together form the group descriptor table, which may occupy several blocks. A group descriptor acts as the superblock of its block group: if it is corrupted, the whole block group becomes unusable, so the group descriptor table, like the superblock, is backed up in certain other block groups to guard against damage. The blocks occupied by the group descriptor table are handled like ordinary data blocks and are read into the block cache when used.

    3) Block bitmap and inode bitmap: the block bitmap tracks which data blocks in the block group are in use, and the inode bitmap tracks which inodes in the block group are in use. Each bitmap occupies one block, in which each bit (0 or 1) records whether one data block of the group, or one inode in its inode table, is in use. With a 4 KB block, one bitmap block can describe 4 × 1024 × 8 = 32768 blocks, which is therefore the maximum number of blocks in a single block group, so a block group is at most 32768 × 4 KB = 128 MB. A bitmap block could likewise describe 32768 inodes, but even a completely full block group never uses that many, because a real system essentially never has every file no larger than one data block. The number of inodes per block group is actually fixed at format time (it is recorded in the superblock and printed while formatting); if I remember correctly, it works out to roughly one inode for every 4 or 8 data blocks.

    4) Inode table: the inode table holds the inodes of the block group, each identified by its inode number. When a user looks up or accesses a file, Linux uses the file's inode number to locate the correct entry in the inode table; once the inode has been found, the relevant commands can read it and make the appropriate changes.
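    The bitmap arithmetic in item 3 can be checked with a short Python sketch. It assumes, as the text does, that one bitmap block covers a whole block group; the Bitmap class and its first-fit allocate method are hypothetical illustrations, and the real ext4 allocators are considerably more elaborate.

```python
from typing import Tuple


def block_group_geometry(block_size: int) -> Tuple[int, int]:
    """Return (blocks per group, group size in bytes), assuming one bitmap
    block must describe the whole group, one bit per block."""
    blocks_per_group = block_size * 8  # number of bits in one bitmap block
    return blocks_per_group, blocks_per_group * block_size


class Bitmap:
    """Block or inode bitmap: bit i set to 1 means item i is in use."""

    def __init__(self, nbits: int) -> None:
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)  # all zero: everything free

    def is_used(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def set_used(self, i: int, used: bool = True) -> None:
        if used:
            self.bits[i // 8] |= 1 << (i % 8)
        else:
            self.bits[i // 8] &= 0xFF ^ (1 << (i % 8))  # clear bit i

    def allocate(self) -> int:
        """Simple first-fit allocation: mark the first free item as used."""
        for i in range(self.nbits):
            if not self.is_used(i):
                self.set_used(i)
                return i
        raise RuntimeError("no free items left in this group")


blocks, size = block_group_geometry(4096)
print(blocks, size // (1024 * 1024), "MiB")  # 32768 128 MiB

bitmap = Bitmap(blocks)
first, second = bitmap.allocate(), bitmap.allocate()  # items 0 and 1
bitmap.set_used(first, False)                          # free item 0 again
reused = bitmap.allocate()                             # first fit picks 0 again
```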

 

4. Hard links and soft links: similarities and differences

    Linux has two kinds of links: hard links and soft links (soft links are also called symbolic links). Links let files be shared on a Linux system, and they also bring benefits such as hiding file paths, improving permission security, and saving storage space.

    If one inode number corresponds to several file names, those names are called hard links. In other words, a hard link is just another name for the same file (see Figure 3: hard-linked names are aliases of one file and share a common inode). A hard link can be created with the link or ln command. The following creates a hard link newfile to the file oldfile:

    link oldfile newfile

    or

    ln oldfile newfile

Since hard links are files with the same inode number but different file names, hard links have the following characteristics:

    1) All the names share the same inode and data blocks;

    2) A hard link can only be created to an existing file;

    3) Hard links cannot be created across file systems;

    4) Hard links cannot be created to directories, only to files;

    5) Deleting one hard-linked name does not affect the other names with the same inode number.
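    The hard link properties above can be observed from Python on a Linux system, using os.link (the system call behind ln) and the st_ino and st_nlink fields returned by os.stat. This sketch assumes a POSIX file system that supports hard links, such as the one backing the temporary directory; the function name hard_link_demo is hypothetical.

```python
import os
import tempfile


def hard_link_demo(directory: str) -> dict:
    """Create a file, hard-link it, and report what happens to its inode."""
    old = os.path.join(directory, "oldfile")
    new = os.path.join(directory, "newfile")
    with open(old, "w") as f:
        f.write("hello\n")
    nlink_before = os.stat(old).st_nlink          # one name: link count 1

    os.link(old, new)                             # like `ln oldfile newfile`
    st_old, st_new = os.stat(old), os.stat(new)
    same_inode = (st_old.st_dev, st_old.st_ino) == (st_new.st_dev, st_new.st_ino)
    nlink_after = st_new.st_nlink                 # two names, one inode

    os.remove(old)                                # delete one of the names
    with open(new) as f:
        content = f.read()                        # data still reachable
    nlink_after_rm = os.stat(new).st_nlink

    subdir = os.path.join(directory, "subdir")
    os.mkdir(subdir)
    try:
        os.link(subdir, os.path.join(directory, "subdir_link"))
        dir_link_refused = False
    except OSError:                               # Linux refuses with EPERM
        dir_link_refused = True

    return {"nlink_before": nlink_before, "same_inode": same_inode,
            "nlink_after": nlink_after, "nlink_after_rm": nlink_after_rm,
            "content": content, "dir_link_refused": dir_link_refused}


with tempfile.TemporaryDirectory() as tmp:
    result = hard_link_demo(tmp)
print(result)
```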

    A soft link is different from a hard link. A file whose data block stores the path name of another file is a soft link. A soft link is a file in its own right whose data content is special: it has its own inode number and its own user data block (see Figure 3). As a result, creating and using soft links is free of most of the restrictions that apply to hard links:

    1) A soft link has its own inode and file attributes (on Linux its own permission bits are not used; access is checked against the target);

    2) Soft links can be created to files or directories that do not exist;

    3) Soft links can cross file systems;

    4) Soft links can point to files or to directories;

    5) Creating a soft link does not increase the target's link count i_nlink;

    6) Deleting a soft link does not affect the file it points to, but if the target file is deleted, the soft link becomes a dead link (a dangling link); if a file is later recreated at the target path, the dead link works again as a normal soft link.

                                                                           

Figure 3: Soft link access

    Create a soft link: ln -s oldfile newfile.soft
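    Likewise, a short Python sketch (assuming Linux; soft_link_demo is a hypothetical name) shows that a soft link has its own inode, stores a path name, leaves the target's link count unchanged, and dangles and recovers as described in items 5) and 6). os.lstat inspects the link itself, while os.stat follows it to the target.

```python
import os
import tempfile


def soft_link_demo(directory: str) -> dict:
    """Create a symbolic link, break it, and repair it by recreating the target."""
    target = os.path.join(directory, "oldfile")
    link = os.path.join(directory, "newfile.soft")
    with open(target, "w") as f:
        f.write("v1\n")

    os.symlink(target, link)                      # like `ln -s oldfile newfile.soft`
    own_inode = os.lstat(link).st_ino != os.stat(target).st_ino
    stored_path = os.readlink(link)               # the link's content is a path name
    target_nlink = os.stat(target).st_nlink       # not incremented by the symlink

    os.remove(target)
    # lexists() sees the link itself; exists() follows it to the missing target.
    dangling = os.path.lexists(link) and not os.path.exists(link)

    with open(target, "w") as f:                  # recreate a file at the same path
        f.write("v2\n")
    with open(link) as f:                         # the link works again
        restored = f.read()

    return {"own_inode": own_inode, "stores_target_path": stored_path == target,
            "target_nlink": target_nlink, "dangling": dangling,
            "restored": restored}


with tempfile.TemporaryDirectory() as tmp:
    result = soft_link_demo(tmp)
print(result)
```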

    How copying, moving, and deleting files affect inodes:

    1) Copying a file creates a new file, so it consumes a new inode and new blocks.

    The creation process is: find a free inode, write it into the inode table, add an entry to the directory that maps the file name to that inode, and write the file content into data blocks;

    2) Moving a file has two cases:

    Within the same file system, the move creates a new name-to-inode mapping (writes a new entry into the directory), then deletes the old directory entry and updates the ctime (inode change time); the inode itself, the data blocks, and the other information are not affected;

    Across file systems, the move allocates a free inode on the target file system, writes it into the inode table, creates the corresponding directory entry, writes the file content into data blocks, and then removes the original file; the new inode therefore has a new ctime.

    3) Deleting a file essentially decrements its link count. When the link count reaches 0 (and no process still has the file open), the inode is marked free and its blocks are marked writable, but the data in those blocks is not erased until new data is written to them.
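    A Python sketch of copying, moving within one file system, and deleting, observed through inode numbers. It assumes all files live in one temporary directory on one file system, so the cross-file-system case of moving is not covered; copy_move_delete_demo is a hypothetical name, and the delete step first adds a second link so that the decremented link count stays visible.

```python
import os
import shutil
import tempfile


def copy_move_delete_demo(directory: str) -> dict:
    """Observe inode numbers across copy, same-file-system move, and delete."""
    src = os.path.join(directory, "a.txt")
    with open(src, "w") as f:
        f.write("data\n")
    original_ino = os.stat(src).st_ino

    copy = os.path.join(directory, "b.txt")
    shutil.copyfile(src, copy)                    # copy: a brand-new file
    copy_has_new_inode = os.stat(copy).st_ino != original_ino

    moved = os.path.join(directory, "c.txt")
    os.rename(src, moved)                         # move within one file system
    move_keeps_inode = os.stat(moved).st_ino == original_ino

    alias = os.path.join(directory, "d.txt")
    os.link(moved, alias)                         # link count becomes 2
    os.remove(moved)                              # delete = remove one link
    st = os.stat(alias)
    return {"copy_has_new_inode": copy_has_new_inode,
            "move_keeps_inode": move_keeps_inode,
            "nlink_after_delete": st.st_nlink,
            "alias_same_inode": st.st_ino == original_ino}


with tempfile.TemporaryDirectory() as tmp:
    result = copy_move_delete_demo(tmp)
print(result)
```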

 

5. Common file systems supported by Linux

    

    To list the file system types supported by the running kernel (built in or currently loaded as modules): cat /proc/filesystems
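    Each line of /proc/filesystems holds an optional nodev flag, a tab, and a file system type; nodev marks types that are not backed by a block device (for example proc or sysfs). The helper below, whose name is hypothetical, parses that format in Python and, on Linux, lists the types that need a block device.

```python
import os
from typing import List, Tuple


def parse_proc_filesystems(text: str) -> List[Tuple[str, bool]]:
    """Turn /proc/filesystems text into (type, needs_block_device) pairs.

    Each line is an optional "nodev" flag, a tab, and the file system type.
    """
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        flag, sep, name = line.partition("\t")
        if not sep:                     # defensive: no tab, whole line is the name
            flag, name = "", line
        result.append((name.strip(), flag.strip() != "nodev"))
    return result


sample = "nodev\tsysfs\nnodev\tproc\n\text4\n\tvfat\n"
print(parse_proc_filesystems(sample))

if os.path.exists("/proc/filesystems"):   # only present on Linux
    with open("/proc/filesystems") as f:
        pairs = parse_proc_filesystems(f.read())
    print([name for name, on_disk in pairs if on_disk])
```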

 

II. Files

1. File storage structure

    In the traditional Linux file systems (such as ext2 and ext3), a file consists of a directory entry, an inode, and data blocks.

    Directory entry: contains the file name and the inode number.

    Inode: also called the file index node; it stores the file's basic information and the pointers to its data blocks.

    Data block: where the actual content of the file is stored.

    The traditional Linux file systems (ext2, ext3, ext4, and so on) divide the disk into inode table blocks and data blocks; a directory's entries are themselves stored in data blocks. A file consists of a directory entry, an inode, and data blocks. The inode holds the file's attributes (such as read/write permissions and owner, plus the pointers to the data blocks), and the data blocks hold the file content. To read a file, the system first looks up the file's attributes and data locations in the inode table, and then reads the data from the data blocks.

     The file storage structure is roughly as follows:

                                                      

Figure 4: File storage structure

     The structure of a directory entry is shown below (each file's directory entry is stored in the content of the directory that contains the file):

                                                                         
Figure 5: Directory entry structure

     The structure of a file's inode is shown below (the information held in a file's inode can be viewed with stat filename):

                                               

Figure 6: Inode structure

    The above shows only the general structure. Linux file systems keep evolving, but these concepts have remained essentially the same. There are also significant differences among ext2, ext3, and ext4; to learn about them, consult the documentation for the individual file systems.
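    Both structures can be inspected from Python on Linux: os.scandir returns each directory entry's name and inode number, and os.stat returns the inode's attributes, roughly what stat filename prints. The helper names below are hypothetical; st_blocks counts 512-byte units and is only available on Unix-like systems.

```python
import os
import stat
import tempfile
from typing import Dict, List, Tuple


def directory_entries(path: str) -> List[Tuple[str, int]]:
    """(name, inode number) pairs: what a directory's data essentially records."""
    with os.scandir(path) as it:
        return sorted((entry.name, entry.inode()) for entry in it)


def inode_fields(path: str) -> Dict[str, object]:
    """Some of the inode attributes that `stat filename` also prints."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "mode": stat.filemode(st.st_mode),   # type + permission bits, ls -l style
        "links": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "blocks_512": st.st_blocks,          # allocated space in 512-byte units
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w") as f:
        f.write("abc")
    entries = directory_entries(tmp)
    fields = inode_fields(path)
print(entries)
print(fields)
```

    Note that the name "notes.txt" comes from the directory entry, while size, mode, and link count come from the inode.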

 

2. How the file system accesses a file

    1) Using the file name, find the file's inode number through the name-to-inode mapping in the Directory;

    2) Search the inode table for the inode with that number (the inode stores the addresses of the file's data blocks);

    3) Read the data blocks using the block pointers in the inode.

Note:

    1) An important element here is the Directory. It is not a directory in the everyday sense, but a list that records the inode number for each file or subdirectory name.

    2) An inode does not store the file's name. The name is data belonging to the directory that contains the file, so it is stored in the data blocks of that parent directory.
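    The three access steps can be simulated with a toy model. Everything here is hypothetical: the inode numbers, block numbers, and dictionaries stand in for the on-disk inode table and data blocks, and only the convention that the root directory is inode 2 is taken from ext2/3/4. The file names live only in the directory data, not in the inodes.

```python
from typing import Dict, Union

ROOT_INO = 2  # the root directory's inode number on ext2/3/4

# Hypothetical inode table: inode number -> type and data block pointers.
inode_table: Dict[int, dict] = {
    2:  {"type": "dir",  "blocks": [100]},
    12: {"type": "dir",  "blocks": [101]},
    13: {"type": "file", "blocks": [200, 201]},
}
# Hypothetical data blocks: directory blocks map names to inode numbers,
# file blocks hold raw bytes.
data_blocks: Dict[int, Union[Dict[str, int], bytes]] = {
    100: {".": 2, "..": 2, "home": 12},
    101: {".": 12, "..": 2, "notes.txt": 13},
    200: b"hello, ",
    201: b"world\n",
}


def lookup(dir_ino: int, name: str) -> int:
    """Search one directory's entries for name and return its inode number."""
    inode = inode_table[dir_ino]
    if inode["type"] != "dir":
        raise NotADirectoryError(name)
    for blk in inode["blocks"]:
        entries = data_blocks[blk]
        if name in entries:
            return entries[name]
    raise FileNotFoundError(name)


def resolve(path: str) -> int:
    """Step 1: walk the path from the root, one directory lookup per component."""
    ino = ROOT_INO
    for component in path.strip("/").split("/"):
        if component:
            ino = lookup(ino, component)
    return ino


def read_file(path: str) -> bytes:
    ino = resolve(path)                   # step 1: file name -> inode number
    inode = inode_table[ino]              # step 2: inode number -> inode
    if inode["type"] != "file":
        raise IsADirectoryError(path)
    # Step 3: follow the inode's block pointers and concatenate the data.
    return b"".join(data_blocks[b] for b in inode["blocks"])


print(read_file("/home/notes.txt"))
```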

 

3. File types

    The main file types under Linux are:

    1) Regular files: C source code, shell scripts, binary executables, and so on; divided into plain text files and binary files.

    2) Directory files: directories, the only place where files are kept (every file is reached through an entry in some directory).

    3) Link files: files that point to another file or directory.

    4) Special files: related to system peripherals and usually found under /dev; divided into block devices and character devices.

    The ls -l, file, and stat commands show a file's type and other related information. The type codes and related terms are:

     -: regular file (also written f, as in find -type f)

    d: directory, directory file

    b: block device, block device file, supports random access in units of blocks

    c: character device, character device file, supports access in character units

    major number: the major device number, which identifies the device type and thus the driver that handles it

    minor number: minor device number, used to identify different devices of the same type

    l: symbolic link, symbolic link file

    p: pipe, named pipe

    s: socket, socket file
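    The mapping from an inode's mode bits to the ls -l type letters can be sketched with Python's stat module (S_ISREG, S_ISDIR, and so on). type_letter and describe are hypothetical helper names; describe uses os.lstat so that a symbolic link is reported as l rather than followed, and it adds the major and minor numbers for device files.

```python
import os
import stat


def type_letter(mode: int) -> str:
    """The first character `ls -l` prints for a file with this mode."""
    checks = [
        (stat.S_ISREG, "-"),
        (stat.S_ISDIR, "d"),
        (stat.S_ISLNK, "l"),
        (stat.S_ISBLK, "b"),
        (stat.S_ISCHR, "c"),
        (stat.S_ISFIFO, "p"),
        (stat.S_ISSOCK, "s"),
    ]
    for is_type, letter in checks:
        if is_type(mode):
            return letter
    return "?"


def describe(path: str) -> str:
    st = os.lstat(path)  # lstat: report a symlink itself, do not follow it
    text = f"{type_letter(st.st_mode)} {path}"
    if stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode):
        text += f" major={os.major(st.st_rdev)} minor={os.minor(st.st_rdev)}"
    return text


if os.path.exists("/dev/null"):
    print(describe("/dev/null"))  # a character device on Linux
```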


Origin blog.csdn.net/the_wan/article/details/108554957