Data structure of the virtual file system

Although the physical structures of different file system types are different, the virtual file system defines a unified data structure.

(1) Super block. The first block of a file system is the super block, which describes the overall information of the file system. When the file system is mounted, a copy of the super block is created in memory: the structure super_block.
(2) Mount descriptor. The virtual file system organizes directories as a tree in memory. A process can access a file system only after it has been mounted on a directory in that tree. Each time a file system is mounted, the virtual file system creates a mount descriptor (the structure mount), reads the super block of the file system, and creates a copy of the super block in memory.
(3) File system type. The super block format differs between file systems, so each file system must register its type (file_system_type) with the virtual file system and implement the mount method, which reads and parses the super block.
(4) Index node. Each file corresponds to an index node (inode), and each index node has a unique number. When the kernel accesses a file on a storage device, it creates a copy of the index node in memory: the structure inode.
(5) Directory entry. The file system treats a directory as a type of file whose data consists of directory entries. Each directory entry stores the name of a subdirectory or file and the corresponding index node number. When the kernel accesses a directory entry on the storage device, it creates a copy of the directory entry in memory: the structure dentry.
(6) Open instance of a file. When a process opens a file, the virtual file system creates an open instance of the file (the structure file) and allocates an index in the process's open file table. This index is called a file descriptor. Finally, the mapping between the file descriptor and the file structure is added to the open file table.

Super block

The first block of the file system is the super block, which describes the overall information of the file system. When a file system is mounted on a directory of the in-memory directory tree, its super block is read and a copy is created in memory: the structure super_block. Its main members are as follows:

include/linux/fs.h 
struct super_block {
	struct list_head s_list; 
	dev_t s_dev; 
	unsigned char s_blocksize_bits; 
	unsigned long s_blocksize; 
	loff_t s_maxbytes;
	struct file_system_type *s_type; 
	const struct super_operations *s_op; 
	… 
	unsigned long s_flags; 
	unsigned long s_iflags; /* internal SB_I_* flags */
	unsigned long s_magic; 
	struct dentry *s_root; 
	… 
	struct hlist_bl_head s_anon; 
	struct list_head s_mounts; 
	struct block_device *s_bdev; 
	struct backing_dev_info *s_bdi; 
	struct mtd_info *s_mtd; 
	struct hlist_node s_instances;
	void *s_fs_info;
};

(1) The member s_list is used to link all super block instances to the global linked list super_blocks.
(2) Members s_dev and s_bdev save the block device where the file system is located, the former saves the device number, and the latter points to a block_device instance in memory.
(3) The member s_blocksize is the block length, and the member s_blocksize_bits is the base 2 logarithm of the block length.
(4) The member s_maxbytes is the maximum file length supported by the file system.
(5) The member s_flags holds the flag bits of the super block.
(6) The member s_type points to the file system type.
(7) The member s_op points to the super block operation set.
(8) The member s_magic is the magic number of the file system type, and each file system type is assigned a unique magic number.
(9) The member s_root points to the structure dentry of the root directory.
(10) The member s_fs_info points to the private information of the specific file system.
(11) The member s_instances is used to link all super block instances of the same file system type together, and the head node of the linked list is the member fs_supers of the structure file_system_type.
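
The relationship between s_blocksize and s_blocksize_bits in item (3) can be sketched in user space as follows. This is only an illustration: the names toy_blocksize_bits and toy_block_of are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* Base-2 logarithm of a power-of-two block size (hypothetical helper). */
static unsigned char toy_blocksize_bits(unsigned long blocksize)
{
	unsigned char bits = 0;

	/* A block size must be a non-zero power of two. */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	while ((1UL << bits) < blocksize)
		bits++;
	return bits;
}

/* Index of the block that contains a given byte offset. */
static unsigned long long toy_block_of(unsigned long long offset,
				       unsigned char bits)
{
	return offset >> bits;
}
```

Storing the logarithm next to the length lets code turn a byte offset into a block index with a shift instead of a division.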

The data structure of the super block operation set is the structure super_operations. Its main members are as follows:

include/linux/fs.h 
struct super_operations {
	struct inode *(*alloc_inode)(struct super_block *sb); 
	void (*destroy_inode)(struct inode *); 
	void (*dirty_inode) (struct inode *, int flags); 
	int (*write_inode) (struct inode *, struct writeback_control *wbc); 
	int (*drop_inode) (struct inode *); 
	void (*evict_inode) (struct inode *); 
	void (*put_super) (struct super_block *); 
	int (*sync_fs)(struct super_block *sb, int wait);
	int (*statfs) (struct dentry *, struct kstatfs *);
	int (*remount_fs) (struct super_block *, int *, char *);
	void (*umount_begin) (struct super_block *);
};

(1) The member alloc_inode allocates memory for an index node and initializes it.
(2) The member destroy_inode releases an index node in memory.
(3) The member dirty_inode marks an index node as dirty.
(4) The member write_inode writes an index node to the storage device.
(5) The member drop_inode is called when the reference count of an index node drops to 0.
(6) The member evict_inode is called when an index node is evicted from memory; if its hard link count is 0, the file system also deletes the index node from the storage device.
(7) The member put_super releases the super block.
(8) The member sync_fs synchronizes the modified data of the file system to the storage device.
(9) The member statfs reads the statistical information of the file system.
(10) The member remount_fs is called when the file system is remounted.
(11) The member umount_begin is called when unmounting the file system begins.
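
From user space, the statfs method in item (9) is reached through the Linux-specific system call statfs(2). The sketch below is a minimal illustration under that assumption; the structure toy_fs_summary and the function toy_read_fs_summary are hypothetical names introduced here.

```c
#include <sys/vfs.h>   /* statfs(2), Linux-specific */

/* A few of the statistics that statfs(2) reports (hypothetical wrapper type). */
struct toy_fs_summary {
	unsigned long magic;        /* f_type: magic number of the file system type */
	unsigned long block_size;   /* f_bsize: optimal transfer block size */
	unsigned long long blocks;  /* f_blocks: total data blocks */
};

/* Fill *out for the file system that contains path; 0 on success, -1 on error. */
static int toy_read_fs_summary(const char *path, struct toy_fs_summary *out)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;
	out->magic = (unsigned long)st.f_type;
	out->block_size = (unsigned long)st.f_bsize;
	out->blocks = (unsigned long long)st.f_blocks;
	return 0;
}
```

The magic number returned in f_type is the same kind of value that the member s_magic stores for each file system type.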

Mount descriptor

A process can access a file system only after it has been mounted on a directory in the in-memory directory tree. Each time a file system is mounted, the virtual file system creates a mount descriptor, the structure mount, which describes one mount instance of the file system. The file system on one storage device can be mounted several times, each time on a different directory. The main members of the mount descriptor are as follows:

fs/mount.h 
struct mount {
	struct hlist_node mnt_hash; 
	struct mount *mnt_parent; 
	struct dentry *mnt_mountpoint; 
	struct vfsmount mnt; 
	union {
		struct rcu_head mnt_rcu; 
		struct llist_node mnt_llist; 
	}; 
#ifdef CONFIG_SMP 
	struct mnt_pcp __percpu *mnt_pcp; 
#else 
	int mnt_count; 
	int mnt_writers; 
#endif 
	struct list_head mnt_mounts; 
	struct list_head mnt_child; 
	struct list_head mnt_instance; 
	const char *mnt_devname; 
	struct list_head mnt_list; 
	… 
	struct mnt_namespace *mnt_ns; 
	struct mountpoint *mnt_mp; 
	struct hlist_node mnt_mp_list;
};

Suppose we mount file system 2 on the directory "/a", and directory a belongs to file system 1. Directory a is called the mount point. The mount instance of file system 2 is the child of the mount instance of file system 1, and the mount instance of file system 1 is the parent of the mount instance of file system 2.

(1) The member mnt_parent points to the parent, that is, the mount instance of file system 1.
(2) The member mnt_mountpoint points to the directory as the mount point, that is, directory a of file system 1, and the member d_flags of the dentry instance of directory a sets the flag bit DCACHE_MOUNTED.
(3) The type of member mnt is as follows:

struct vfsmount {
	struct dentry *mnt_root; 
	struct super_block *mnt_sb; 
	int mnt_flags; 
};

mnt_root points to the root directory of file system 2, and mnt_sb points to the super block of file system 2.
(4) The member mnt_hash is used to add the mount descriptor to the global hash table mount_hashtable. The key is {parent mount descriptor, mount point}.
(5) The member mnt_mounts is the head node of the child list.
(6) The member mnt_child is used to join the parent's child list.
(7) The member mnt_instance is used to add the mount descriptor to the mount instance linked list of the super block. The file system on the same storage device can be mounted multiple times, and each time it is mounted to a different directory.
(8) The member mnt_devname points to the name of the storage device.
(9) The member mnt_ns points to the mount namespace.
(10) The member mnt_mp points to the mount point, whose type is as follows:

struct mountpoint {
	struct hlist_node m_hash; 
	struct dentry *m_dentry; 
	struct hlist_head m_list; 
	int m_count; 
}; 

m_dentry points to the directory that serves as the mount point, and m_list links all mount descriptors under the same mount point. Why can there be several mount descriptors under the same mount point? This has to do with mount namespaces.
(11) The member mnt_mp_list is used to add the mount descriptor to the mount descriptor linked list of the same mount point, and the head node of the linked list is the member m_list of the member mnt_mp.
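
The parent and child relationship and the {parent mount descriptor, mount point} key can be illustrated with a small user-space model. Everything here is a simplification: struct toy_mount and toy_lookup_mnt are hypothetical names, the mount point is reduced to a path string, and a linear scan stands in for the hash table mount_hashtable.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct mount (hypothetical, heavily simplified). */
struct toy_mount {
	struct toy_mount *parent;  /* like mnt_parent */
	const char *mountpoint;    /* like mnt_mountpoint, reduced to a path string */
	const char *devname;       /* like mnt_devname */
};

/* Find the mount whose key is {parent, mountpoint}; NULL if there is none. */
static struct toy_mount *toy_lookup_mnt(struct toy_mount *table, size_t n,
					const struct toy_mount *parent,
					const char *mountpoint)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].parent == parent &&
		    strcmp(table[i].mountpoint, mountpoint) == 0)
			return &table[i];
	}
	return NULL;
}
```

The parent is part of the key because the same directory name can be a mount point in more than one mounted file system.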

File system type

Because the format of the super block of each file system is different, each file system needs to register the file system type file_system_type with the virtual file system, and implement the mount method to read and parse the super block. The structure file_system_type is as follows:

include/linux/fs.h 
struct file_system_type {
	const char *name; 
	int fs_flags; 
#define FS_REQUIRES_DEV 1
#define FS_BINARY_MOUNTDATA 2 
#define FS_HAS_SUBTYPE 4 
#define FS_USERNS_MOUNT 8 
#define FS_RENAME_DOES_D_MOVE 32768 
	struct dentry *(*mount) (struct file_system_type *, int, const char *, void *); 
	void (*kill_sb) (struct super_block *); 
	struct module *owner; 
	struct file_system_type * next; 
	struct hlist_head fs_supers;
};

(1) The member name is the name of the file system type.
(2) The method mount is used to read and parse the super block when mounting the file system.
(3) The method kill_sb is used to release the super block when unmounting the file system.
(4) The file systems on several storage devices may have the same type, and the member fs_supers is used to link the super blocks of the same file system type.
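
Registration can be pictured as adding the type to a singly linked list through the member next while rejecting duplicate names. The following user-space sketch is loosely modelled on that idea and is not the kernel's code; struct toy_fs_type, toy_find_filesystem and toy_register_filesystem are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct file_system_type (hypothetical). */
struct toy_fs_type {
	const char *name;
	struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems; /* head of the registered list */

/* Return the slot that holds the named type, or the empty slot at the tail. */
static struct toy_fs_type **toy_find_filesystem(const char *name)
{
	struct toy_fs_type **p;

	for (p = &toy_file_systems; *p; p = &(*p)->next)
		if (strcmp((*p)->name, name) == 0)
			break;
	return p;
}

/* Append fs to the list; fail with -EBUSY if it or its name is already there. */
static int toy_register_filesystem(struct toy_fs_type *fs)
{
	struct toy_fs_type **p;

	if (fs->next)
		return -EBUSY;          /* already linked into a list */
	p = toy_find_filesystem(fs->name);
	if (*p)
		return -EBUSY;          /* a type with this name is registered */
	*p = fs;
	return 0;
}
```

When a file system is mounted by type name, a lookup like toy_find_filesystem is what lets the virtual file system locate the mount method to call.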

Index node

In the file system, each file corresponds to an index node, and the index node describes two types of information.
(1) The attributes of the file, also known as metadata, such as the length of the file, the identifier of the user who created the file, the time of the last access, and the time of the last modification.
(2) The storage location of the file data.
Each index node has a unique number.
When the kernel accesses a file on the storage device, it creates a copy of the index node in memory: the structure inode. Its main members are as follows:

include/linux/fs.h 
struct inode {
	umode_t i_mode; 
	unsigned short i_opflags; 
	kuid_t i_uid; 
	kgid_t i_gid; 
	unsigned int i_flags; 
	
#ifdef CONFIG_FS_POSIX_ACL 
	struct posix_acl *i_acl; 
	struct posix_acl *i_default_acl; 
#endif 

	const struct inode_operations *i_op; 
	struct super_block *i_sb; 
	struct address_space *i_mapping; 
	… 
	unsigned long i_ino; 
	union {
		const unsigned int i_nlink; 
		unsigned int __i_nlink; 
	}; 
	dev_t i_rdev; 
	loff_t i_size; 
	struct timespec i_atime; 
	struct timespec i_mtime; 
	struct timespec i_ctime; 
	spinlock_t i_lock; 
	unsigned short i_bytes; 
	unsigned int i_blkbits; 
	blkcnt_t i_blocks; 
	… 
	struct hlist_node i_hash; 
	struct list_head i_io_list; 
	… 
	struct list_head i_lru; 
	struct list_head i_sb_list; 
	struct list_head i_wb_list; 
	union {
		struct hlist_head i_dentry; 
		struct rcu_head i_rcu; 
	}; 
	u64 i_version; 
	atomic_t i_count; 
	atomic_t i_dio_count; 
	atomic_t i_writecount; 
#ifdef CONFIG_IMA 
	atomic_t i_readcount; 
#endif 
	const struct file_operations *i_fop; 
	struct file_lock_context *i_flctx; 
	struct address_space i_data; 
	struct list_head i_devices; 
	union {
		struct pipe_inode_info *i_pipe;
		struct block_device *i_bdev;
		struct cdev *i_cdev;
		char *i_link;
		unsigned i_dir_seq;
	};
	void *i_private;
};

i_mode is the file type and access permissions, i_uid is the identifier of the user who owns the file (initially the user who created it), and i_gid is the identifier of the group that owns the file.

i_ino is the number of the inode.

i_size is the file length. i_blocks is the number of blocks in the file, that is, the quotient of the file length divided by the block length, and i_bytes is the remainder of that division. i_blkbits is the base-2 logarithm of the block length, so the block length is 2 to the power of i_blkbits.

i_atime (access time) is the last time the file was accessed, i_mtime (modified time) is the last time the file data was modified, and i_ctime (change time) is the last time the file index node was modified.

i_sb points to the superblock of the file system to which the file belongs.

i_mapping points to the address space of the file.

i_count is the reference count of the inode and i_nlink is the hard link count.

If the file type is a character device file or a block device file, then i_rdev is the device number, i_bdev points to the block device, and i_cdev points to the character device.

Files are divided into the following types.
(1) Regular file (ordinary file): what we usually call a file, that is, a file in the narrow sense.
(2) Directory: a special file whose data consists of directory entries. Each directory entry stores the name of a subdirectory or file and the corresponding index node number.
(3) Symbolic link (also called soft link): the data of this kind of file is the path of another file.
(4) Character device file.
(5) Block device file.
(6) Named pipe (FIFO).
(7) Socket.
Character device files, block device files, named pipes, and sockets are special files that have only inodes and no data. Character device files and block device files are used to store device numbers, and the device numbers are directly stored in inodes.
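
The file type stored in i_mode is visible to user space as st_mode. A minimal sketch that classifies a path into the seven types above, using lstat(2) and the standard S_IS* macros, might look like this; the function name toy_file_type is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Name the type of the file at path; lstat() does not follow symbolic links. */
static const char *toy_file_type(const char *path)
{
	struct stat st;

	if (lstat(path, &st) != 0)
		return "unknown";
	if (S_ISREG(st.st_mode))
		return "regular file";
	if (S_ISDIR(st.st_mode))
		return "directory";
	if (S_ISLNK(st.st_mode))
		return "symbolic link";
	if (S_ISCHR(st.st_mode))
		return "character device";
	if (S_ISBLK(st.st_mode))
		return "block device";
	if (S_ISFIFO(st.st_mode))
		return "named pipe";
	if (S_ISSOCK(st.st_mode))
		return "socket";
	return "unknown";
}
```

lstat is used rather than stat so that a symbolic link is reported as a link instead of as the file it points to.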

The kernel supports two kinds of linking.
(1) Soft link, also known as symbolic link: the data of this file is the path of another file.
(2) Hard link: a hard link gives a file an additional name. Several file names correspond to the same index node, and the member i_nlink of the index node is the hard link count.
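
The effect of a hard link on the index node can be observed from user space. The sketch below assumes that /tmp is writable and supports hard links; the function name toy_hard_link_demo and the temporary file names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if a new hard link shares the inode and raises the link count to 2. */
static int toy_hard_link_demo(void)
{
	char orig[] = "/tmp/vfs_link_XXXXXX";
	char alias[sizeof(orig) + 8];
	struct stat a, b;
	int fd, ret = -1;

	fd = mkstemp(orig);            /* creates a regular file, link count 1 */
	if (fd < 0)
		return -1;
	close(fd);
	snprintf(alias, sizeof(alias), "%s.alias", orig);

	if (link(orig, alias) == 0) {  /* second name for the same index node */
		if (stat(orig, &a) == 0 && stat(alias, &b) == 0 &&
		    a.st_ino == b.st_ino && a.st_dev == b.st_dev &&
		    a.st_nlink == 2)
			ret = 0;
		unlink(alias);
	}
	unlink(orig);
	return ret;
}
```

Both names report the same st_ino, and st_nlink reflects the hard link count held in i_nlink.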

The member i_op of the index node points to the index node operation set inode_operations, and the member i_fop points to the file operation set file_operations. The difference between the two is that inode_operations is used to manipulate directories (create or delete files in a directory) and file attributes, whereas file_operations is used to access file data. The data structure of the index node operation set is the structure inode_operations. Its main members are as follows:

include/linux/fs.h 
struct inode_operations {
	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); 
	const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *); 
	int (*permission) (struct inode *, int); 
	struct posix_acl * (*get_acl)(struct inode *, int); 
	
	int (*readlink) (struct dentry *, char __user *,int); 
	
	int (*create) (struct inode *,struct dentry *, umode_t, bool); 
	int (*link) (struct dentry *,struct inode *,struct dentry *); 
	int (*unlink) (struct inode *,struct dentry *); 
	int (*symlink) (struct inode *,struct dentry *,const char *); 
	int (*mkdir) (struct inode *,struct dentry *,umode_t); 
	int (*rmdir) (struct inode *,struct dentry *); 
	int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); 
	int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *, unsigned int); 
	int (*setattr) (struct dentry *, struct iattr *); 
	int (*getattr) (const struct path *, struct kstat *, u32, unsigned int); 
	ssize_t (*listxattr) (struct dentry *, char *, size_t); 
	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,u64 len); 
	int (*update_time)(struct inode *, struct timespec *, int); 
	int (*atomic_open)(struct inode *, struct dentry *, struct file *, unsigned open_flag, umode_t create_mode, int *opened); 
	int (*tmpfile) (struct inode *, struct dentry *, umode_t); 
	int (*set_acl)(struct inode *, struct posix_acl *, int); 
} ____cacheline_aligned;

The lookup method is used to find files in a directory.

The system calls open and creat call the create method to create a regular file. The system call link calls the link method to create a hard link, the system call symlink calls the symlink method to create a symbolic link, and the system call mkdir calls the mkdir method to create a directory. The system call mknod calls the mknod method to create character device files, block device files, named pipes, and sockets.

The system call unlink calls the unlink method to delete a hard link, and the system call rmdir calls the rmdir method to delete a directory.
The system call rename calls the rename method to rename a file.
The system call chmod calls the setattr method to set the attributes of the file, and the system call stat calls the getattr method to read the file attributes.
The system call listxattr calls the listxattr method to list all extended attributes of a file.

Directory entry

The file system regards a directory as a file, and the data of this file is composed of directory entries, and each directory entry stores the name of a subdirectory or file and the corresponding index node number.

When the kernel accesses a directory entry on the storage device, it creates a copy of the directory entry in memory: the structure dentry. Its main members are as follows:

include/linux/dcache.h 
struct dentry {
	/* Fields accessed by RCU lookup */
	unsigned int d_flags; 
	seqcount_t d_seq; 
	struct hlist_bl_node d_hash; 
	struct dentry *d_parent; 
	struct qstr d_name; 
	struct inode *d_inode; 
	unsigned char d_iname[DNAME_INLINE_LEN]; 
	
	/* Ref lookup also accesses the following fields */
	struct lockref d_lockref; 
	const struct dentry_operations *d_op; 
	struct super_block *d_sb; 
	unsigned long d_time; 
	void *d_fsdata; 
	
	union {
		struct list_head d_lru; 
		wait_queue_head_t *d_wait; 
	}; 
	struct list_head d_child; 
	struct list_head d_subdirs; 
	/*
	 * d_alias and d_rcu can share memory
	 */
	union {
		struct hlist_node d_alias; 
		struct hlist_bl_node d_in_lookup_hash; 
		struct rcu_head d_rcu; 
	} d_u; 
}; 

d_name stores the file name; qstr is a string wrapper that stores the address, length, and hash value of the string. If the file name is short, it is stored in d_iname. d_inode points to the index node of the file.

d_parent points to the parent directory, and d_child is used to add this directory entry to the child list of the parent directory.
d_lockref holds the reference count together with a spinlock.
d_op points to the directory entry operation set.
d_subdirs is the linked list of children.
d_hash is used to add the directory entry to the hash table dentry_hashtable.
d_lru is used to add the directory entry to the least recently used (LRU) list s_dentry_lru of the super block. When the reference count of the directory entry drops to 0, the directory entry is added to the LRU list of the super block.
d_alias is used to link the directory entries that correspond to all hard links of the same file.
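
How the parent and child links allow a path to be resolved one component at a time can be sketched with a toy tree. This is a user-space model only: struct toy_dentry, toy_d_add and toy_path_lookup are hypothetical, the child list is a plain singly linked list, and no hash table or reference counting is involved.

```c
#include <stddef.h>
#include <string.h>

/* Toy directory entry; field roles echo struct dentry, the layout does not. */
struct toy_dentry {
	const char *name;            /* like d_name */
	unsigned long ino;           /* stands in for d_inode */
	struct toy_dentry *parent;   /* like d_parent */
	struct toy_dentry *child;    /* first child, like the head d_subdirs */
	struct toy_dentry *sibling;  /* next child of the same parent, like d_child */
};

/* Link d into the child list of parent. */
static void toy_d_add(struct toy_dentry *parent, struct toy_dentry *d)
{
	d->parent = parent;
	d->sibling = parent->child;
	parent->child = d;
}

/* Walk an absolute path component by component, starting from root. */
static struct toy_dentry *toy_path_lookup(struct toy_dentry *root,
					  const char *path)
{
	struct toy_dentry *cur = root;

	while (*path) {
		struct toy_dentry *c;
		size_t len;

		while (*path == '/')
			path++;              /* skip separators */
		len = strcspn(path, "/");
		if (len == 0)
			break;               /* end of path or trailing slash */
		for (c = cur->child; c; c = c->sibling)
			if (strlen(c->name) == len &&
			    strncmp(c->name, path, len) == 0)
				break;
		if (!c)
			return NULL;         /* component not found */
		cur = c;
		path += len;
	}
	return cur;
}
```

For "/a/b.txt" the walk visits the root, then the entry named a, then the entry named b.txt, whose index node is the file's.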

Taking the file "/a/b.txt" as an example, the relationship between directory entries and index nodes is shown in the figure.

[Figure: relationship between directory entries and index nodes for "/a/b.txt"]

The data structure of the directory entry operation set is the structure dentry_operations, and its code is as follows:

include/linux/dcache.h 
struct dentry_operations {
	int (*d_revalidate)(struct dentry *, unsigned int); 
	int (*d_weak_revalidate)(struct dentry *, unsigned int); 
	int (*d_hash)(const struct dentry *, struct qstr *); 
	int (*d_compare)(const struct dentry *, unsigned int, const char *, const struct qstr *); 
	int (*d_delete)(const struct dentry *); 
	int (*d_init)(struct dentry *); 
	void (*d_release)(struct dentry *); 
	void (*d_prune)(struct dentry *); 
	void (*d_iput)(struct dentry *, struct inode *); 
	char *(*d_dname)(struct dentry *, char *, int); 
	struct vfsmount *(*d_automount)(struct path *); 
	int (*d_manage)(const struct path *, bool); 
	struct dentry *(*d_real)(struct dentry *, const struct inode *, unsigned int); 
} ____cacheline_aligned; 

d_revalidate is important for network file systems to confirm that directory entries are valid.
d_hash is used to calculate the hash value.
d_compare is used to compare the file names of two directory entries.
d_delete is used to judge whether the memory of the directory entry can be released when the reference count of the directory entry is reduced to 0.
d_release is called just before the memory of a directory entry is released.
d_iput is used to release the inode associated with the directory entry.
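
An operation set such as dentry_operations is a table of function pointers that a file system fills in to override default behavior. As a rough illustration, a file system with case-insensitive names could supply its own comparison hook. The sketch below uses a simplified, hypothetical signature (struct toy_dentry_operations and toy_ci_compare are not kernel names) and follows the convention that the hook returns 0 when the names match.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy operation set with a single hook (hypothetical, simplified signature). */
struct toy_dentry_operations {
	/* Return 0 when the two names are considered equal. */
	int (*d_compare)(const char *a, size_t alen, const char *b, size_t blen);
};

/* Case-insensitive name comparison for ASCII names. */
static int toy_ci_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	if (alen != blen)
		return 1;
	for (size_t i = 0; i < alen; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 1;
	return 0;
}

static const struct toy_dentry_operations toy_ci_dentry_ops = {
	.d_compare = toy_ci_compare,
};
```

A caller only sees the table, so swapping in a different comparison does not change the lookup code that invokes it.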

Open instance of file and open file table

When a process opens a file, the virtual file system creates an open instance of the file: the structure file. Its main members are as follows.

include/linux/fs.h 
struct file {
	union {
		struct llist_node fu_llist; 
		struct rcu_head fu_rcuhead; 
	} f_u; 
	struct path f_path; 
	struct inode *f_inode; 
	const struct file_operations *f_op; 
	
	spinlock_t f_lock; 
	atomic_long_t f_count; 
	unsigned int f_flags; 
	fmode_t f_mode; 
	struct mutex f_pos_lock; 
	loff_t f_pos; 
	struct fown_struct f_owner; 
	const struct cred *f_cred;
	void *private_data;
	… 
	struct address_space *f_mapping; 
} __attribute__((aligned(4)));

(1) f_path stores the location of the file in the directory tree. Its type is as follows:

struct path {
	struct vfsmount *mnt; 
	struct dentry *dentry; 
}; 

mnt points to the member mnt of the mount descriptor of the file system to which the file belongs, and dentry is the directory entry corresponding to the file.
(2) f_inode points to the index node of the file.
(3) f_op points to the file operation collection.
(4) f_count is the reference count of the file structure.
(5) f_mode is the access mode.
(6) f_pos is the file offset, that is, the position that the process is currently accessing.
(7) f_mapping points to the address space of the file.
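
Because f_pos belongs to the open instance rather than to the index node, two separate open() calls on the same file get independent offsets, while a descriptor copied with dup() refers to the same open instance and shares the offset. The sketch below checks this from user space; it assumes /tmp is writable, and the function name toy_open_instance_demo is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns 0 if two open() calls get independent offsets while a dup()ed
 * descriptor shares the offset of the descriptor it was copied from.
 */
static int toy_open_instance_demo(void)
{
	char path[] = "/tmp/vfs_file_XXXXXX";
	char buf[4];
	int fd1, fd2, fd3, ret = -1;

	fd1 = mkstemp(path);                 /* first open instance */
	if (fd1 < 0)
		return -1;
	if (write(fd1, "abcdefgh", 8) != 8)
		goto out_fd1;
	if (lseek(fd1, 0, SEEK_SET) != 0)
		goto out_fd1;

	fd2 = open(path, O_RDONLY);          /* second, separate open instance */
	if (fd2 < 0)
		goto out_fd1;
	fd3 = dup(fd1);                      /* same open instance as fd1 */
	if (fd3 < 0)
		goto out_fd2;

	if (read(fd1, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&
	    lseek(fd2, 0, SEEK_CUR) == 0 &&  /* fd2 did not move */
	    lseek(fd3, 0, SEEK_CUR) == 4)    /* fd3 moved together with fd1 */
		ret = 0;

	close(fd3);
out_fd2:
	close(fd2);
out_fd1:
	close(fd1);
	unlink(path);
	return ret;
}
```

Reading four bytes through fd1 advances the shared offset seen through fd3 but leaves the offset of fd2 at 0.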

The relationship between the open instance of the file and the index node is shown in the figure.

[Figure: relationship between the open instance of a file and the index node]

Each process also has a file system information structure, fs_struct. Its main members are as follows:

include/linux/fs_struct.h 
struct fs_struct {
	… 
	struct path root, pwd; 
};

Member root stores the root directory of the process, and member pwd stores the current working directory of the process.

Suppose a process first calls the system call chroot to set the directory "/a" as its root directory and then creates a child process. The child process inherits the file system information of the parent process, so the directories it can see are limited to the subtree rooted at "/a". When the child process opens the file "/b.txt" (an absolute path, starting with "/"), the real file path is "/a/b.txt".

Suppose the process calls the system call chdir to set the directory "/c" as its current working directory. When the process opens the file "d.txt" (a relative path, not starting with "/"), the real file path is "/c/d.txt".
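
The choice of starting point can be sketched with plain strings: an absolute path starts at root, and a relative path starts at pwd. This toy function is an assumption-laden simplification (toy_resolve is a hypothetical name, both directories are given as full paths seen from outside any chroot, and "..", "." and symbolic links are ignored).

```c
#include <stdio.h>
#include <string.h>

/*
 * Toy model of how root and pwd pick the starting point of a lookup.
 * "/" means the real root. Returns 0 on success, -1 if out is too small.
 */
static int toy_resolve(const char *root, const char *pwd, const char *path,
		       char *out, size_t size)
{
	const char *base = (path[0] == '/') ? root : pwd;
	int n;

	if (strcmp(base, "/") == 0)
		base = "";                 /* avoid a doubled leading slash */
	n = snprintf(out, size, "%s%s%s", base,
		     path[0] == '/' ? "" : "/", path);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With root "/a", the path "/b.txt" resolves to "/a/b.txt"; with pwd "/c", the path "d.txt" resolves to "/c/d.txt".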

The open file table is also called the file descriptor table. The data structure is shown in the figure. The structure files_struct is a wrapper for the open file table. The main members are as follows:

[Figure: data structure of the open file table]

include/linux/fdtable.h 
struct files_struct {
	atomic_t count; 
	… 
	
	struct fdtable __rcu *fdt; 
	struct fdtable fdtab; 
	
	spinlock_t file_lock ____cacheline_aligned_in_smp; 
	unsigned int next_fd; 
	unsigned long close_on_exec_init[1]; 
	unsigned long open_fds_init[1]; 
	unsigned long full_fds_bits_init[1]; 
	struct file __rcu * fd_array[NR_OPEN_DEFAULT]; 
};

The member count is the reference count of the structure files_struct.
The member fdt points to the open file table.
When the process has just been created, the member fdt points to the member fdtab. Later, if the number of files opened by the process exceeds NR_OPEN_DEFAULT, the open file table is expanded: a new fdtable structure is allocated, and the member fdt is made to point to it.
The data structure of the open file table is as follows:

include/linux/fdtable.h 
struct fdtable {
	unsigned int max_fds; 
	struct file __rcu **fd; 
	unsigned long *close_on_exec; 
	unsigned long *open_fds; 
	unsigned long *full_fds_bits; 
	struct rcu_head rcu; 
}; 

(1) The member max_fds is the current size of the open file table, that is, the size of the file pointer array pointed to by the member fd. As the number of open files of the process increases, the open file table gradually expands.
(2) The member fd points to the file pointer array. When a process calls open to open a file, the returned file descriptor is an index into the array of file pointers.
(3) The member close_on_exec points to a bitmap indicating which file descriptors need to be closed when executing execve() to load a new program.
(4) The member open_fds points to the file descriptor bitmap, indicating which file descriptors are allocated.
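
The interplay between the file pointer array and the allocation bitmap can be modelled in a few lines. The sketch below is a fixed-size user-space toy, not the kernel's allocator: struct toy_fdtable, toy_fd_install and toy_fd_close are hypothetical names, and the table never grows.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_MAX_FDS 64
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS ((TOY_MAX_FDS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Toy fixed-size open file table; the real one grows on demand. */
struct toy_fdtable {
	void *fd[TOY_MAX_FDS];              /* like fd: file pointer array */
	unsigned long open_fds[TOY_WORDS];  /* like open_fds: allocation bitmap */
};

/* Install file in the lowest free slot and return its index, or -1 if full. */
static int toy_fd_install(struct toy_fdtable *t, void *file)
{
	for (int i = 0; i < TOY_MAX_FDS; i++) {
		unsigned long bit = 1UL << (i % TOY_BITS_PER_LONG);
		unsigned long *word = &t->open_fds[i / TOY_BITS_PER_LONG];

		if (!(*word & bit)) {
			*word |= bit;       /* mark the descriptor as allocated */
			t->fd[i] = file;
			return i;
		}
	}
	return -1;
}

/* Release a descriptor so that its slot can be reused. */
static void toy_fd_close(struct toy_fdtable *t, int fd)
{
	if (fd < 0 || fd >= TOY_MAX_FDS)
		return;
	t->open_fds[fd / TOY_BITS_PER_LONG] &= ~(1UL << (fd % TOY_BITS_PER_LONG));
	t->fd[fd] = NULL;
}
```

The returned index plays the role of the file descriptor, and scanning the bitmap from the bottom hands out the lowest free number first.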

Origin blog.csdn.net/weixin_43912621/article/details/131363802