File operations are a headache for many programming beginners, whichever language they learn (C, C++, Java, Go, Python). The file functions we are taught are a black box: we know which function to call, but nothing about how it is implemented, and that makes many beginners wary of file operations.

This is not your fault. File operations rest on a lot of operating-system knowledge, and they can only be understood in depth from the operating system's point of view. In this article we learn the basic I/O operations of Linux to deepen our understanding of language-level file operations and of the system itself.

Linux file operations

1. Preliminary knowledge about files
2. System calls for reading and writing files under Linux
3. File descriptor
- 1. Understanding file descriptors
- 2. Allocation rules for file descriptors
4. Understanding "everything is a file" under Linux
5. Redirection
- The principle of redirection
- The dup2 system call
6. Understanding buffers

1. Preliminary knowledge about files

File = content + attributes, so operating on a file means operating on its content or its attributes.
An unopened file lives on disk. When a file is opened, the operating system has to load it into memory; this is a constraint the von Neumann architecture places on file operations.
Files are opened by processes.
A running process usually has several files open. The operating system has to organize them so that they can be managed and maintained, so whenever a file is opened the operating system creates a kernel data structure for it.
To manage and organize open files, Linux defines a struct file structure, which records the attributes of the file and the location of its content. Each struct file also holds a struct file * pointer through which the file objects are linked into a list, so managing open files becomes a matter of inserting into, deleting from, searching, and modifying that list.


2. System calls for reading and writing files under Linux

Linux provides the following main system calls for reading and writing files; we introduce them one by one below:
open, close, write, read

1. open function

Section 2 of the manual (man 2 open) gives the prototypes and description of open:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

The manual lists three header files for open, and the examples below include all of them.
There are two forms of open; we start with the first, two-argument form.

The first parameter is the path name, which can be a relative path or an absolute path.
The second parameter is a set of bit flags; we combine flags with | to pass several options at once.
The main flags are:
- O_RDONLY: open read-only
- O_WRONLY: open write-only
- O_CREAT: create the file if it does not exist
- O_TRUNC: truncate the file to length 0 every time it is opened
- O_APPEND: append every write to the end of the file
Return value: on success, an integer greater than or equal to 0. This integer is called a file descriptor and identifies the opened file. On failure, -1 is returned and errno is set.

First we open a file using the two-argument form:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
	// The flags are bit flags, so we combine them with |
	// Open for writing; create the file if it does not exist.
	// No mode argument is passed here; the consequences are examined below.
	int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC);
	if(fd == -1)
	{
		perror("open fail");
		exit(-1);
	}
	return 0;
}

The first run creates the file, but its permissions look odd. We delete the file and run the program again: the permissions are odd again, and different from those of the first run. A third attempt gives yet another result. Why?

Because the two-argument form of open does not say what permissions the new file should have. When O_CREAT is given without a mode argument, the permissions are taken from whatever value happens to sit where the third argument would have been, so they are effectively random. To fix this we need the second, three-argument form of open.

The third parameter, mode, has type mode_t, an unsigned integer type. It specifies the permissions of the newly created file and is usually written in octal.

Modifying our code above:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
	// The flags are bit flags, so we combine them with |
	// Request permissions -rw-rw-rw- for the new file
	int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if(fd == -1)
	{
		perror("open fail");
		exit(-1);
	}
	return 0;
}

The resulting permissions are still not -rw-rw-rw-. Why?

Because the permission mask (umask) of the process also affects the permissions of a newly created file: the file ends up with the requested mode minus the bits set in the mask, that is, mode & ~umask. We can call umask() to set the permission mask of our own process.

The prototype of umask is mode_t umask(mode_t mask).

Running the following code, we get the result we want:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
	umask(0);	// set the permission mask of this process to 0
	// The flags are bit flags, so we combine them with |
	// With the mask cleared, the file is created as -rw-rw-rw-
	int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if(fd == -1)
	{
		perror("open fail");
		exit(-1);
	}
	return 0;
}


2. close function

close closes a file. Every file we open should eventually be closed, otherwise it keeps occupying system resources, so strictly speaking the code above is incomplete.

Its prototype is int close(int fd).

Parameter: the file descriptor fd to close.
Return value: 0 on success; -1 on failure, with errno set.

The following is a standard open-and-close sequence using Linux system calls.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
	umask(0);	// set the permission mask of this process to 0
	// Open the file; with the mask cleared it is created as -rw-rw-rw-
	int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if(fd == -1)
	{
		perror("open fail");
		exit(-1);
	}

	// Close the file
	close(fd);
	return 0;
}

3. write function

The write function writes data to a file. Its prototype is ssize_t write(int fd, const void *buf, size_t count).

The first parameter is the file descriptor of the file to write to.
The second parameter is a pointer to the data to be written.
The third parameter is the maximum number of bytes to write.
Return value: the number of bytes actually written, or -1 on error.

Note: when writing a C string with the system call, do not write its terminating \0 to the file. The terminator is a C-language convention, not part of the file content.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
	umask(0);	// set the permission mask of this process to 0
	// Open the file; with the mask cleared it is created as -rw-rw-rw-
	int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if(fd == -1)
	{
		perror("open fail");
		exit(-1);
	}

	const char* str1 = "hello world!\n";
	for(int i = 0; i < 5; ++i)
	{
		// Do not add 1 to strlen(): the '\0' must not be written
		ssize_t n = write(fd, str1, strlen(str1));
		if(n < 0)
		{
			perror("write fail");
			exit(-1);
		}
	}

	// Close the file
	close(fd);
	return 0;
}

Result: mylog.txt contains the line hello world! five times.

4. read function

The read function reads data from a file into a buffer. Its prototype is ssize_t read(int fd, void *buf, size_t count).

The first parameter is the file descriptor of the file to read.
The second parameter is a pointer to a buffer in which the data can be stored.
The third parameter is the maximum number of bytes to read.
Return value: the number of bytes actually read; 0 at end of file; -1 on error.

Note: read treats the newline character (\n) as an ordinary character. It stops only when the requested number of bytes has been read or the end of the file is reached.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
  int fd = open(LOG, O_RDONLY);
  if(fd == -1)
  {
    perror("open fail");
    exit(-1);
  }

  // Buffer that receives the data
  char str2[64];

  for(int i = 0; i < 5; ++i)
  {
    ssize_t n = read(fd, str2, sizeof(str2) - 1);
    if(n < 0)
    {
      perror("read fail");
      exit(-1);
    }
    // read does not add a terminator; append '\0' to get a C-style string
    str2[n] = '\0';
    printf("%s", str2);
  }
  close(fd);
  return 0;
}


3. File descriptor

The file functions of the C language are C library functions, and the relationship between library functions and system calls is that library functions wrap system calls.

We also know that a process opens three streams by default: the standard input stream, the standard output stream, and the standard error stream.

In C and C++ these three streams are:

C: stdin, stdout, stderr
C++: cin, cout, cerr

1. Understanding file descriptors

When C opens a file it returns a FILE * pointer. The three standard streams of C are also FILE * pointers, and they refer to the keyboard file, the display file, and the display file respectively. Writing to the standard output stream or the standard error stream actually writes data to the display file, and reading from the standard input stream actually reads data from the keyboard file.

Since these are files, Linux has to operate on them through file descriptors. The three streams opened by default in a C program (or in a program written in any other language) therefore have file descriptors too; only with a file descriptor can the system find the corresponding file.

On Linux, the file descriptors of the three default streams are:

stream	file descriptor
standard input stream	0
standard output stream	1
standard error stream	2

Given this pattern, consider a process that opens one more file after the three default streams. What file descriptor does that file get?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
    int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if(fd == -1)
    {
      perror("open fail");
      exit(-1);
    }
    // Print the file descriptor
    printf("%d\n", fd);
    close(fd);
    return 0;
}

Correct, it is 3! This looks like an array index, and in fact a file descriptor is an array index.

As mentioned earlier, the operating system creates a struct file object for every opened file in order to organize and manage open files. Files are opened by processes, so each process must be able to find the struct file objects that the operating system created for it. For this purpose the kernel keeps a per-process table that records the location of the struct file of every open file.

The operating system therefore creates a struct files_struct for each process. It contains an array of pointers, struct file *fd_array[], whose elements hold the addresses of struct file objects. The file descriptor we receive is simply an index into this array.

The PCB of a process (task_struct on Linux) stores a struct files_struct * pointer. Through this pointer we find the files_struct, through it the file descriptor array, through the array the struct file object, and through that object the content of the file.

2. Allocation rules for file descriptors

Now that we understand file descriptors, the next question is how they are allocated.

From what we have seen so far, we may guess the rule: in the array of files_struct, the smallest index that is not currently in use becomes the new file descriptor.

Is that really the case? We can test it. Suppose we close the standard input stream before opening a new file. Slot 0 of the table is then free, so if our guess is right, the newly opened file should get file descriptor 0.

Experiment code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
    // Close standard input so that slot 0 becomes free
    close(0);
    int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if(fd == -1)
    {
      perror("open fail");
      exit(-1);
    }
    // Print the file descriptor
    printf("%d\n", fd);
    close(fd);
    return 0;
}

The program prints 0, which confirms our conjecture.

4. Understanding "everything is a file" under Linux

Most people who have studied Linux have heard the saying "everything is a file under Linux". It matters a great deal for file operations, so let's look at what it means.

Keyboards, displays, network cards and so on are all hardware, not files. Why can we treat them as files?

First, hardware is operated through drivers, and different hardware needs different drivers. For example, a keyboard can be read but not written, a display can be written but not read, and a network card can be both read and written.

Although the drivers differ, we can design one type that encapsulates them. After encapsulation the caller above no longer sees what lies underneath; it only operates on this type and gets the same effect. From the caller's point of view, every operation is an operation on this type, and the drivers and hardware are out of reach. To the caller, what it sees is all there is.

Linux does exactly this: the function pointers of each driver are wrapped behind a struct file, so a process uses the keyboard and the display as if it were using files. Whether it is a regular file or a piece of hardware, to the process it is a file.

Users work with the operating system through processes, so the user's view and the process's view coincide. When we want the disk to store some data, we just save a file; we do not take the disk out and engrave the data onto it by physical means. A process is in the same position: it only needs to write a file to get the job done. So for users and processes alike, everything is a file and files are all we can access. That is the sense in which everything under Linux is a file.

5. Redirection

Before discussing redirection, let's talk about FILE in the C language.

When we open a file in C, we get a FILE * pointer. What is FILE, and who provides it?

It is provided by the C language. FILE is a structure defined by the C library, and it must contain a file descriptor fd: the I/O functions of the library wrap the system-call interface, so underneath, every file access goes through an fd. The FILE structure in the C library therefore has to wrap an fd.

In the internal source code of the C library there is some source code like this:

// FILE is a typedef for struct _IO_FILE (see /usr/include/stdio.h)
typedef struct _IO_FILE FILE;

struct _IO_FILE {
	int _flags;
	// ......

	struct _IO_FILE *_chain;
	int _fileno; // the wrapped file descriptor

	// ......
};

From this source code we see that FILE holds its file descriptor in a member called _fileno. We can print _fileno for stdin, stdout and stderr to check whether it agrees with our conclusion above.

Print the file descriptors of the three standard streams:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
  // Print the file descriptors (_fileno is a glibc implementation detail)
  printf("%d\n", stdin->_fileno);
  printf("%d\n", stdout->_fileno);
  printf("%d\n", stderr->_fileno);
  return 0;
}

The output is 0, 1 and 2, the same as our previous conclusion.

The principle of redirection

Output redirection
Consider the following code. We close file descriptor 1, open a file, and then print some data to stdout. What happens? Is the data still printed to the display?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
  // Close the standard output stream
  close(1);
  // O_CREAT needs a mode argument; the new file takes descriptor 1
  int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if(fd < 0)
  {
    perror("open fail");
    exit(-1);
  }
  printf("you can see me?\n");
  printf("you can see me?\n");
  printf("you can see me?\n");
  printf("you can see me?\n");

  return 0;
}

The data is not printed to the display; it goes into the file. With the groundwork above this is easy to understand. After we close stdout's descriptor, the newly opened file takes file descriptor 1. printf only knows file descriptor 1, so whatever it writes to descriptor 1 ends up in the file.

Append redirection
The principle of append redirection is just as simple: add O_APPEND and remove O_TRUNC.

For example, appending to the same file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
  close(1);
  // O_APPEND instead of O_TRUNC: existing content is kept
  int fd = open(LOG, O_WRONLY | O_CREAT | O_APPEND, 0666);
  if(fd < 0)
  {
    perror("open fail");
    exit(-1);
  }
  printf("this is append\n");
  printf("this is append\n");
  printf("this is append\n");

  return 0;
}

Input redirection
Similarly, we close file descriptor 0 and then open a file. scanf should now read its data from the newly opened file. Let's look at the result:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define LOG "mylog.txt"

int main()
{
  // Close standard input; the file opened next takes descriptor 0
  close(0);
  int fd = open(LOG, O_RDONLY);
  if (fd < 0)
  {
    perror("open fail");
    exit(-1);
  }
  int a;
  char c;
  scanf("%d %c", &a, &c);
  printf("%d %c\n", a, c);
  return 0;
}

The result matches our expectation: scanf reads its input from the file.

The principle of redirection: inside the operating system, change what a particular index in the file descriptor table of the process points to, without the upper layers noticing.

Using this principle, let us implement a requirement: separate the output of the standard output stream from that of the standard error stream.

Analysis: the standard output stream and the standard error stream both refer to the display file. If we use the two streams as they are, error messages and normal output are mixed together, which makes errors hard to find.

We can use redirection to split them. First close file descriptor 1 and open a new file normal.txt; then close file descriptor 2 and open a new file error.txt. From then on, anything written to the standard output stream or the standard error stream lands in two different files: we open error.txt when we care about errors and normal.txt when we care about normal output.

Before splitting:

#include <stdio.h>

int main()
{
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");

  fprintf(stderr, "stderr->error\n");
  fprintf(stderr, "stderr->error\n");
  fprintf(stderr, "stderr->error\n");

  return 0;
}

The error messages are mixed with the normal output!

After splitting:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
  umask(0);
  close(1);
  open("normal.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);  // takes descriptor 1
  close(2);
  open("error.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);   // takes descriptor 2
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");

  fprintf(stderr, "stderr->error\n");
  fprintf(stderr, "stderr->error\n");
  fprintf(stderr, "stderr->error\n");

  // stdout now refers to a regular file and is fully buffered:
  // flush it before closing descriptor 1, or the data is lost
  fflush(stdout);
  close(1);
  close(2);
  return 0;
}

Splitting complete!

The dup2 system call

Closing descriptors by hand as above is inconvenient. For this purpose Linux provides the system call dup2, whose prototype is int dup2(int oldfd, int newfd). It overwrites the table entry of the second descriptor with the entry of the first, so that both descriptors refer to the same open file. That achieves redirection.

First parameter (oldfd): the descriptor to keep; its table entry is the one that is copied.
Second parameter (newfd): the descriptor to be overwritten.
Return value: on success, the second file descriptor; on failure, -1 with errno set.

The splitting code above can be rewritten as:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
  umask(0);
  int fd1 = open("normal.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
  int fd2 = open("error.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);

  int n1 = dup2(fd1, 1);   // descriptor 1 now refers to normal.txt
  int n2 = dup2(fd2, 2);   // descriptor 2 now refers to error.txt

  // Print the return values of dup2 (they end up in normal.txt)
  printf("%d %d\n", n1, n2);

  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");
  fprintf(stdout, "stdout->normal\n");

  fprintf(stderr, "stderr->error\n");
  fprintf(stderr, "stderr->error\n");
  fprintf(stderr, "stderr->error\n");

  // Descriptors 1 and 2 stay open, so stdout is still flushed at exit
  close(fd1);
  close(fd2);
  return 0;
}


6. Understanding buffers

When we learned file operations in C, we were told that FILE contains a buffer. Now that we are studying the operating system, we know that the kernel has buffers too. Are these two buffers the same thing?

We cannot fully answer that yet, so here is the conclusion first: they are different. FILE is a structure provided by the C library, and its buffer is a user-space buffer, whereas the buffer in the Linux kernel is a kernel-space buffer.

Let's look at the following code, analyze the behavior we observe, and work our way to an understanding of the buffer.

#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main()
{
  printf("printf : hello world!\n");

  const char* str = "write: hello world!\n";
  write(1, str, strlen(str));

  // Create a child process
  fork();
  return 0;
}

Result:

Running the program directly and running it with its output redirected to a file give different results: after redirection, the printf line appears twice, while the write line still appears once. Why?

This has to do with the buffer of the C library. Where is that buffer? When you open a file with fopen you get a FILE structure, and the buffer lives inside that FILE structure.

The C library writes the data in this buffer to the operating system, through write(FILE->fd, ...), according to one of the following flushing strategies:

- Unbuffered
- Line buffered (the strategy used for the display)
- Fully buffered (the strategy used for regular files)

With these strategies in mind, we can explain the behavior above.

When the program runs directly, printf writes to the display file, which is line buffered, so the C library flushes the line to the kernel as soon as the \n is printed. write puts its data into the kernel directly. By the time fork runs, the buffer in FILE is already empty, so when the processes exit there is nothing left to flush and nothing is printed twice.

After redirection, printf writes to a regular file, so the stream becomes fully buffered. The printf data stays in the FILE buffer after the call returns, while the data from write goes directly into the kernel. When the program ends, the FILE buffer has to be flushed, and this time it still contains data. But fork has turned one process into two, each with its own copy of the buffer, so the buffer is flushed twice and the file ends up with two copies of the printf output.

Why does the C library put a buffer in FILE?

To save the caller's time. Writing data directly to the kernel requires a system call, and a system call costs far more than an ordinary function call. To make as few system calls as possible and move as much data as possible in each I/O operation, FILE contains a buffer.
