File operations are a headache for many programming beginners, no matter which language they learn (C, C++, Java, Go, Python), because the file-related operations we learn are a black box: we only know how to call the corresponding file functions, but know nothing about their internal implementation, which makes many beginners afraid of file operations.
In fact, this is not your fault. File operations involve a lot of operating-system knowledge, and we can only understand them more deeply from the perspective of the operating system. So in this article, let's learn the basic IO operations in Linux to deepen our understanding of both language-level file operations and the system.
Linux file operations
1. Preliminary knowledge about files
- File = content + attributes. An operation on a file is an operation on its content or its attributes.
- Before a file is opened, it sits on the disk. When it is opened, the operating system must load it into memory. This is a constraint that the von Neumann architecture places on file operations!
- Files are opened by processes!
- A real process will usually open multiple files while it runs, and the operating system needs to organize them for easy management and maintenance. So when a file is opened, the operating system must create a corresponding kernel data structure for it.
- To manage and organize open files, the Linux operating system defines a struct file structure, which records the attributes of the file and the location of its content. Each struct file also contains struct file* pointers that link all the file objects into a linked list, so operations on open files become insertions, deletions, lookups, and modifications on that list.
2. System calls for reading and writing files under Linux
Linux mainly provides the following system calls for reading and writing files, and we will introduce them one by one below:
open
close
write
read
1. open function
We open section 2 of the man manual (man 2 open) and look at the prototype and description of the open function:
We can see that open needs three header files, and none of them can be left out when we use it!
We also see that open has two forms; here we introduce the first one first.
- The first parameter is the path name. It can be a relative path or an absolute path!
- The second parameter is a bitmap. We can combine several options with | and so pass multiple options in a single argument. The main options are:
  - O_RDONLY: open read-only
  - O_WRONLY: open write-only
  - O_CREAT: create the file if it does not exist
  - O_TRUNC: clear (truncate) the file every time it is opened
  - O_APPEND: every write is appended to the end of the file
- Return value: on success, an integer greater than or equal to 0 is returned. This integer is called a file descriptor and is used to identify the opened file. On failure, -1 is returned.
We first open a file with the first form:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
// the flags parameter is a bitmap, so we combine the options with |
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC); // open for writing; create the file if it does not exist (no mode is passed here)
if(fd == -1)
{
perror( "open fail");
exit(-1);
}
return 0;
}
First run:
The permissions of the newly created file look a bit strange. We delete it and run the program again:
The permissions are still strange, and they differ from those of the previous file. Let's delete it and run the program once more:
This time it is even stranger. Why?
Because the first form of open does not say what permissions the file should have after creation, the new file gets whatever arbitrary value happens to be read in place of the missing argument. So whenever we pass O_CREAT, we have to use the second form of open!
- The third parameter, of type mode_t, is an unsigned integer. Through it we can pass in, in octal, the permissions of the file we create.
Modifying our code above:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
// the flags parameter is a bitmap, so we combine the options with |
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666); // requested permissions: -rw-rw-rw-
if(fd == -1)
{
perror( "open fail");
exit(-1);
}
return 0;
}
Running result:
The end result is not -rw-rw-rw-. Why is this?
Because on Linux the permission mask (umask) of the process affects the permissions of the files it creates: the final permissions are mode & ~umask. We can use the system call umask() to set the permission mask of our process.
Running the following code, we get the result we want:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
umask(0); // set the permission mask to 0
// the flags parameter is a bitmap, so we combine the options with |
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666); // the created file has permissions -rw-rw-rw-
if(fd == -1)
{
perror( "open fail");
exit(-1);
}
return 0;
}
2. close function
close closes a file. A file we open must be closed when we are done with it, otherwise it keeps occupying system resources, so the code above is actually not well-formed.
- Parameter: the file descriptor fd to close.
- Return value: 0 on success; -1 on failure, with the error code (errno) set.
The following is a standard open and close sequence using Linux system calls.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
umask(0); // set the permission mask to 0
// open the file
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666); // the created file has permissions -rw-rw-rw-
if(fd == -1)
{
perror( "open fail");
exit(-1);
}
// close the file
close(fd);
return 0;
}
3. write function
The write function writes data to a file.
- The first parameter is the file descriptor of the file to write to.
- The second parameter is a pointer to the data to be written.
- The third parameter is the maximum number of bytes to write.
- Return value: the number of bytes actually written, or -1 on failure.
Note: when we use the system call, do not write the C-language terminator \0 into the file. It is a convention of C strings, not part of the file content.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
umask(0); // set the permission mask to 0
// open the file
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666); // the created file has permissions -rw-rw-rw-
if(fd == -1)
{
perror( "open fail");
exit(-1);
}
const char* str1 = "hello world!\n" ;
for(int i = 0; i < 5; ++i)
{
ssize_t n = write(fd, str1, strlen(str1)); // do not add +1 to strlen(): the '\0' must not be written
if(n < 0)
{
perror( "write fail: ");
exit( -1);
}
}
// close the file
close(fd);
return 0;
}
Running result: mylog.txt contains five lines of hello world!
4. read function
The read function reads data from a file into a buffer.
- The first parameter is the file descriptor of the file to read.
- The second parameter is a pointer to a space where the data can be stored.
- The third parameter is the maximum number of bytes to read.
- Return value: the number of bytes actually read; 0 means the end of the file, and -1 means failure.
Note: read treats the newline character (\n) as an ordinary character. It only stops when the requested number of bytes has been read or the end of the file is reached.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
int fd = open(LOG, O_RDONLY);
if(fd == -1)
{
perror("open fail");
exit(-1);
}
// define the buffer
char str2[64];
for(int i = 0; i < 5; ++i)
{
ssize_t n = read(fd, str2, sizeof(str2) - 1);
if(n < 0)
{
perror("read fail: ");
exit(-1);
}
// append '\0' after the bytes read so that the buffer is a C-style string
str2[n] = '\0';
printf("%s",str2);
}
close(fd);
return 0;
}
3. File descriptor
We know that the file operation functions of the C language are C library functions, and the relationship between library functions and system calls is: library functions wrap system calls.
We also know that a process opens three streams by default: the standard input stream, the standard output stream, and the standard error stream.
In C/C++ these three streams are:
- In C: stdin, stdout, stderr
- In C++: cin, cout, cerr
1. Understanding file descriptors
We know that after the C library opens a file it returns a FILE* pointer. The three standard streams of C are also FILE* pointers, and they point to the keyboard file, the display file, and the display file respectively. When we write data to the standard output stream or the standard error stream, we are actually writing to the display file, and when we read data from the standard input stream, we are actually reading from the keyboard file.
Since they are files, Linux must use file descriptors to operate on them. So the three streams opened by default in a C program (and in programs written in other programming languages) also have file descriptors. Only with a file descriptor can the system find the corresponding file.
In fact, on Linux the file descriptors of the three streams opened by default in a process are:
stream | file descriptor |
---|---|
standard input stream | 0 |
standard output stream | 1 |
standard error stream | 2 |
Observing the pattern in these file descriptors, we can ask: if our process opens a file after the three default streams have been opened, what is the file descriptor of this file?
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
if(fd == -1)
{
perror("open fail");
exit(-1);
}
// print the file descriptor
printf("%d\n",fd);
close(fd);
return 0;
}
Correct! It's 3! You may think that this looks like an array subscript. In fact, a file descriptor is exactly that: a subscript into an array!
We said before that the operating system creates a struct file object for every file we open, in order to organize and manage opened files. We also know that files are opened by processes, so a process must be able to find the struct file objects of its files. For this, the kernel keeps one more table per process, which records the location of the struct file of each file the process has opened.
So the operating system creates a struct files_struct structure for the process. Inside it there is an array of pointers, fd_array, whose elements are of type struct file*. Each element stores the address of a struct file object, and the file descriptor we get is actually a subscript into this array of pointers.
The PCB of our process (task_struct under Linux) stores a struct files_struct* pointer. Through this pointer we can find the struct files_struct structure; through that structure, the pointer array indexed by file descriptor; through the pointer array, the struct file object; and through that object, the content of the file.
2. Allocation rules for file descriptors
Now that we understand file descriptors, the next question is: how are they allocated?
From the experience above, we may guess that the allocation rule is: find the smallest subscript in the files_struct array that is not currently in use, and use it as the new file descriptor.
Is this actually the case? We can run an experiment to check. Suppose we close the standard input stream before opening a new file. Slot 0 is then free, so if we open a file and look at its descriptor, it should be 0.
Experiment code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
close(0);
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
if(fd == -1)
{
perror("open fail");
exit(-1);
}
// print the file descriptor
printf("%d\n",fd);
close(fd);
return 0;
}
The results show that our conjecture is correct!
4. Understanding "everything is a file" under Linux
Most people who have studied Linux have heard the saying "Everything in Linux is a file". It is very important for file operations, so let's talk about how to understand it.
We know that keyboards, monitors, network cards, and so on are all hardware, not files at all. Why can we treat them as files?
First of all, we operate hardware through drivers, and the drivers for different hardware must be different. For example, the keyboard can only be read and not written, the display can only be written and not read, and the network card can be both read and written...
Although their drivers differ, we can design a class to encapsulate them. After the encapsulation, the caller above cannot see the details underneath; it only operates on this class and gets the same effect. For the caller, every operation is an operation on this class, and it never touches the drivers and the hardware directly. As far as the caller is concerned, what it sees is all there is.
So Linux wraps the function pointers of these drivers in the struct file structure. A process then uses the keyboard and the display just as it uses a file, so whether something is an ordinary file or a piece of hardware, it is a file to the process.
And we users use the operating system through processes, so our perspective is the same as the perspective of a process. From the user's point of view, if we want the disk to save some data, we only need to save the file; we don't need to take out the disk and engrave the data into it by some physical means. A process is in the same position: it also thinks that saving the file completes the task. So for users and processes alike, everything is a file and files are all we can access, which is why everything under Linux is a file.
5. Redirection
Before talking about redirection, let's talk about the C language FILE.
When we open a file in C, we get back a FILE pointer. So what is this FILE? Who provides it?
The answer: it is provided by the C language. FILE is actually a structure defined by the C library, and there must be a file descriptor fd inside it. The IO-related library functions correspond to the system call interfaces, and library functions wrap system calls, so in essence every access to a file goes through fd. Therefore the FILE structure in the C library must wrap an fd.
In the source code of the C library there is something like this:
// FILE is a typedef for struct _IO_FILE
typedef struct _IO_FILE FILE; // in /usr/include/stdio.h
struct _IO_FILE {
int _flags;
// ......
struct _IO_FILE *_chain;
int _fileno; // the wrapped file descriptor
//......
};
From this source code we know that FILE contains a file descriptor called _fileno. So we can print the file descriptors of stdin, stdout, and stderr to see whether they match our conclusion above.
Print the file descriptors of the three standard streams:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
int main()
{
// print the file descriptors (the _fileno member is specific to this C library)
printf("%d\n", stdin->_fileno);
printf("%d\n", stdout->_fileno);
printf("%d\n", stderr->_fileno);
return 0;
}
The results are the same as our previous conclusions.
The principle of redirection
Output redirection
Look at the following piece of code. If we close file descriptor 1, then open a file, and then write some data to stdout, what happens? Is it still printed to the monitor?
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
//#define N 64
int main()
{
// close the standard output stream
close(1);
int fd = open(LOG, O_WRONLY | O_CREAT | O_TRUNC, 0666);
if(fd < 0)
{
perror("open fail:");
exit(-1);
}
printf("you can see me?\n");
printf("you can see me?\n");
printf("you can see me?\n");
printf("you can see me?\n");
return 0;
}
The answer is that nothing is printed to the monitor; the output goes into the file. With the foundation above this should be easy to understand. After we close the descriptor of stdout, the newly opened file takes file descriptor 1, and the printf function only knows file descriptor 1. It writes to whatever file descriptor 1 points to, so the data ends up in the file!
Append redirection
The principle of append redirection is very simple: we only need to add O_APPEND and remove O_TRUNC.
For example, to redirect to the file used just now:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
//#define N 64
int main()
{
close(1);
int fd = open(LOG, O_WRONLY | O_CREAT | O_APPEND, 0666);
if(fd < 0)
{
perror("open fail:");
exit(-1);
}
printf("this is append\n");
printf("this is append\n");
printf("this is append\n");
return 0;
}
Input redirection
Similarly, if we close file descriptor 0 and then open our file, scanf should read its data from the newly opened file. Let's take a look at the result:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#define LOG "mylog.txt"
//#define N 64
int main()
{
close(0);
int fd = open(LOG, O_RDONLY);
if (fd < 0)
{
perror("open fail:");
exit(-1);
}
int a;
char c;
scanf("%d %c",&a, &c);
printf("%d %c\n", a, c);
return 0;
}
The result is in line with our expectations!
The principle of redirection: without the upper layer noticing, the operating system changes what a specific subscript in the process's file descriptor table points to!
Based on this principle, let's implement a requirement: split the output of the standard output stream and the standard error stream.
Analysis: we know that the standard output stream and the standard error stream both open the display file. If we use them together directly, error messages and normal messages are mixed together, making errors hard to find.
We can use redirection to split them. We first close file descriptor 1 and open a new file normal.txt, then close file descriptor 2 and open a new file error.txt. When we then use the standard output or standard error stream, the information is written into two different files: we open error.txt if we care about errors, and normal.txt if we care about normal output.
Before splitting:
#include<stdio.h>
int main()
{
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stderr, "stderr->error\n");
fprintf(stderr, "stderr->error\n");
fprintf(stderr, "stderr->error\n");
return 0;
}
The error messages are mixed with the normal messages!
After splitting:
#include<stdio.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
umask(0);
close(1);
open("normal.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
close(2);
open("error.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stderr, "stderr->error\n");
fprintf(stderr, "stderr->error\n");
fprintf(stderr, "stderr->error\n");
close(1);
close(2);
return 0;
}
Splitting complete!
The dup2 system call
Closing descriptors by hand as above is inconvenient. For this kind of operation Linux provides the system call dup2(oldfd, newfd). It overwrites the address stored in the slot of the second descriptor with the address stored in the slot of the first descriptor, so that both descriptors refer to the same file, which achieves redirection.
- First parameter (oldfd): the descriptor to keep.
- Second parameter (newfd): the descriptor to be overwritten.
- Return value: on success, the second file descriptor; on failure, -1, with the error code set.
With dup2, the splitting code above becomes:
#include<stdio.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
umask(0);
int fd1 = open("normal.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
int fd2 = open("error.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
int n1 = dup2(fd1, 1);
int n2 = dup2(fd2, 2);
// print the return values of dup2 (this output goes into normal.txt)
printf("%d %d\n", n1, n2);
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stdout, "stdout->normal\n");
fprintf(stderr, "stderr->error\n");
fprintf(stderr, "stderr->error\n");
fprintf(stderr, "stderr->error\n");
close(fd1);
close(fd2);
return 0;
}
6. Understanding the buffer
When we learned file operations in C, we learned that there is a buffer inside FILE. Now that we are learning about the operating system, we also know that there is a buffer inside the kernel. Are these two buffers the same?
We cannot fully explain it yet, so here is the conclusion first: they are different. FILE is a structure provided by the C library, and the buffer inside it is a user-mode buffer, while the buffer in the Linux kernel is a kernel-mode buffer.
Let's look at the following code first, analyze the problem according to the phenomenon, and finally understand the buffer.
#include<stdio.h>
#include<unistd.h>
#include<string.h>
int main()
{
printf("printf : hello world!\n");
const char* str = "write: hello world!\n";
write(1, str, strlen(str));
// create a child process
fork();
return 0;
}
Result:
We find that the results differ between running directly and running with redirection: after redirection, the output of printf() appears one more time than the output of write. Why?
In fact, this has to do with the buffer of the C library! Where is that buffer? When you open a file with fopen you get a FILE structure, and the buffer is inside this FILE structure!
The C library writes the data in this buffer to the operating system (via write(FILE->fd, ...)) according to a flush strategy:
- No buffering
- Line buffering (the strategy used for the monitor)
- Full buffering (the strategy used for ordinary files)
With this picture, we can see why the situation above occurs.
When the program is run directly, printf writes to the display file, which is line buffered, so the data (which ends with \n) is flushed by the C library into the kernel buffer right away. The write function itself writes its data straight into the operating system kernel. By the time fork runs, the FILE buffer is already empty, so although each process flushes the buffer when it exits, there is nothing left to output and the processes simply end.
However, after redirection the file used by printf becomes an ordinary file, so the flush strategy becomes full buffering. The data of printf stays in the buffer of FILE after the call, while the data of write goes directly into the kernel. When the program ends, the buffer inside FILE has to be flushed (it contains data this time), but fork has turned one process into two, each with its own copy of the buffer. The buffer is therefore flushed twice, and log.txt contains the content printed by printf twice.
Why is there a buffer in the C library FILE?
The answer is: to save the caller's time! If every piece of data were written straight into the operating system kernel, each write would need a system call, and a system call costs much more than an ordinary function call. So, in order to make as few system calls as possible and move as much data as possible in a single IO, there is a buffer inside FILE.