Linux file systems: understanding that everything is a file

1. System file I/O

1.1 open

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
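pathname: the target file to open or create
flags: options for opening the file; combine one or more of the constants below with bitwise OR to form flags.

Flags:
 O_RDONLY: open read-only
 O_WRONLY: open write-only
 O_RDWR  : open for reading and writing
 Exactly one of these three constants must be specified.
 O_CREAT : create the file if it does not exist; the mode argument is then required and gives the access permissions of the new file
 O_APPEND: append on write

Return value:
 Success: the newly opened file descriptor
 Failure: -1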

Which form of open to use depends on the situation. If the target file may not exist and open has to create it (O_CREAT), use the three-argument form: the third argument gives the permission bits of the new file, which the process umask further restricts. Otherwise, use the two-argument form.

1.2 System calls and library functions

fopen, fclose, fread, and fwrite are functions in the C standard library (libc); we call them library functions.

open, close, read, write, and lseek are interfaces provided by the operating system; they are called system calls.

The f* family of functions can therefore be regarded as wrappers around the system calls, provided to make application development more convenient.

1.3 File descriptor (fd)

0 & 1 & 2

  • By default, a Linux process has three open file descriptors: standard input (0), standard output (1), and standard error (2).
  • The physical devices behind 0, 1, and 2 are usually the keyboard, the display, and the display.

File descriptors are small non-negative integers starting from 0. When a file is opened, the operating system creates a data structure in memory to describe it: the file structure, which represents an open file object. Because it is a process that executes the open system call, the process and the file must be associated. Each process has a pointer, *files, to a files_struct table, and the most important part of that table is an array of pointers, each of which points to an open file. A file descriptor is essentially an index into this array, so holding a file descriptor is enough to find the corresponding file.

Input and output can therefore also be done directly on descriptors 0, 1, and 2:

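#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main()
{
 char buf[1024];
 /* read from standard input; leave room for the terminating '\0' */
 ssize_t s = read(0, buf, sizeof(buf) - 1);
 if(s > 0){
  buf[s] = 0;
  write(1, buf, strlen(buf)); /* standard output */
  write(2, buf, strlen(buf)); /* standard error */
 }
 return 0;
}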

Allocation rules for file descriptors (fd)

Allocation rule: the smallest index in the files_struct array that is not currently in use becomes the new file descriptor.

To close the file that a file descriptor refers to:

close(fd)

Redirection

Common redirection operators are >, >>, and <.

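Using the dup2 system call

#include <unistd.h>
int dup2(int oldfd, int newfd);

dup2 makes newfd refer to the same open file as oldfd, closing newfd first if it was open. Calling dup2(fd, 1), for example, redirects standard output to the file behind fd.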

Adding redirection to myshell

Principle:

printf is an I/O function in the C library and normally writes to stdout. At the bottom layer, stdout still accesses its file through fd 1. After redirection, however, the entry at index 1 holds the address of myfile instead of the address of the display file, so all output is written to the file, which completes the output redirection.

FILE

Because the I/O library functions wrap the system call interfaces, files are ultimately always accessed through an fd. The FILE structure in the C library must therefore contain an fd.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main()
{
 const char *msg0="hello printf\n";
 const char *msg1="hello fwrite\n";
 const char *msg2="hello write\n";
 printf("%s", msg0);                    /* library function, buffered */
 fwrite(msg1, strlen(msg1), 1, stdout); /* library function, buffered */
 write(1, msg2, strlen(msg2));          /* system call, no user-level buffer */
 fork();
 return 0;
}

Run directly on the terminal, the program prints each message once. When its output is redirected to a file, however, printf and fwrite (library functions) each produce their output twice, while write (a system call) produces it only once. Why? It must be related to fork!

  • C library functions are generally fully buffered when writing to regular files and line buffered when writing to the display.
  • printf and fwrite have their own buffer (the progress bar example illustrates this). When output is redirected to a regular file, buffering changes from line buffering to full buffering.
  • Data placed in the buffer is therefore not flushed immediately, not even by the time fork is called; it is flushed only when the process exits.
  • fork gives the child a copy-on-write copy of the parent's data, including the unflushed buffer. When the parent flushes on exit, the child holds the same data and flushes it as well, so two copies are produced.
  • The output of write does not change, which shows that write has no such buffer.

Therefore printf and fwrite have their own buffer, while the write system call does not. The buffers discussed here are all user-level buffers; to improve the performance of the whole machine, the OS also maintains kernel-level buffers. Who provides the user-level buffer? printf and fwrite are library functions and write is a system call; the library functions sit in the layer above the system calls and wrap them. Since write has no buffer and printf and fwrite do, the buffer must be added by that upper layer: it is provided by the C standard library.

2. File system

ls -l reads the file information stored on disk and displays it.

File metadata:

  • Mode (file type and permissions)
  • Number of hard links
  • File owner
  • Group
  • Size
  • Last modified time
You can see more information with the stat command.

The three timestamps (a, m, c):

  • Access: time of last access
  • Modify: time the file content was last modified
  • Change: time the file attributes were last changed

The picture above is a diagram of the on-disk layout of the Linux ext2 file system (the in-memory image in the kernel is certainly different). A disk is a typical block device, and a hard disk partition is divided into blocks. The block size is fixed when the partition is formatted and cannot be changed afterwards; for example, the -b option of mke2fs sets the block size to 1024, 2048, or 4096 bytes. The size of the Boot Block in the picture is fixed.

  • Block Group: ext2 divides the partition into several Block Groups according to its size, and every Block Group has the same structure, much as a government administers its territory district by district.
  • Super Block: stores structural information about the file system itself, mainly the total number of blocks and inodes, the number of unused blocks and inodes, the size of a block and of an inode, the most recent mount time, the most recent write time, the most recent disk check time, and other file system information. If the Super Block is destroyed, the entire file system structure can be considered destroyed.
  • GDT (Group Descriptor Table): the block group descriptors, which describe the attributes of each block group. Interested readers can study it further.
  • Block Bitmap: records which data blocks in the data area are in use and which are free.
  • inode Bitmap: each bit indicates whether an inode is free.
  • inode Table: stores file attributes such as file size, owner, and last modification time.
  • Data area: stores file contents.

Creating a new file involves four main operations:

1. Store the attributes
The kernel first finds a free inode (here 263466) and records the file information in it.
2. Store the data
The file needs three disk blocks, and the kernel finds three free blocks: 300, 500, and 800. It copies the first block of data from the kernel buffer to block 300, the next to block 500, and so on.
3. Record the block allocation
The file contents are stored in blocks 300, 500, and 800, in that order. The kernel records this block list in the disk allocation area of the inode.
4. Add the file name to the directory
The new file is named hello. How does Linux record this file in the current directory? The kernel adds the entry (263466, hello) to the directory file. This correspondence between file name and inode number connects the file name to the file's contents and attributes.

Soft and hard links:

Hard link

A hard link can be understood as a "pointer to the inode of the original file"; the system does not allocate a separate inode or separate data for it. The hard link and the original file are therefore the same file under different names. Each hard link added increases the link count in the file's inode by 1, and the file is actually deleted only when that link count drops to 0. In other words, because a hard link points to the inode of the original file, the file can still be accessed through the hard link even after the original name is deleted.

Summary:

  • 1. A hard link looks like a copy of the file but takes up no additional data space.
  • 2. Hard links to directories are not allowed.
  • 3. Hard links can only be created within the same file system.

Soft link (also called symbolic link)

A soft link contains only the pathname of the file it links to, so it can link to a directory or cross file systems. However, when the original file is deleted, the link becomes invalid. In this respect it resembles a "shortcut" in Windows.

Summary:

  • 1. A soft link exists in the form of a path, similar to a shortcut in Windows.
  • 2. Soft links can cross file systems; hard links cannot.
  • 3. A soft link can point to a file name that does not exist.
  • 4. Soft links can point to directories.
ln command

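The ln command creates links; its format is "ln [options] target link_name". The available options are:

-b  back up an existing destination file before replacing it
-d  allow the superuser to attempt to hard link directories
-f  force: remove existing destination files
-i  interactive mode: prompt before overwriting an existing file
-n  treat the link name as a normal file if it is a symbolic link to a directory
-s  create a soft (symbolic) link
-v  verbose: show each step as it is performed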

Number of soft links

If the source file of a soft link is moved elsewhere, the soft link becomes invalid.

Soft links can be nested: a soft link can point to another soft link.

 

Number of hard links

After the original file is deleted, a hard link to it can still be accessed, because the hard link does not depend on the original file's name or other information. In the example, the link count of the original file rises to 4 after the hard link is created. The file is deleted completely only when its link count reaches 0.

You can also create a hard link to a soft link; doing so increases the link count of the soft link by 1.

 

Understanding soft and hard links

What actually locates a file on disk is not its name but its inode. In Linux, multiple file names can correspond to the same inode, and hard links do exactly that: they share one inode.

A hard link refers to a file through its inode; a soft link refers to a file through its name.


Origin blog.csdn.net/m0_74234485/article/details/132629287