Basic I/O
How files are understood on a computer:
1. File = content + attributes, so even an empty file takes up space on the disk
2. The file path plus the file name uniquely identifies a file
3. Every operation on a file is an operation on its content or on its attributes
4. If no path is given, the file is accessed relative to the current working directory of the process
The file functions you call in source code do nothing by themselves. The code is compiled into a binary, and the file operations only happen once that binary is running, so an operation on a file in code is essentially an operation performed by a process on a file.
**Files are either opened or unopened, and a file must be opened before it can be accessed. Operating on a file therefore means a process operating on an opened file!**
Reviewing the file operation functions of the C language
In C, the first step in operating on a file is to open it: fopen opens the file in the requested mode and returns a file pointer (FILE *) that all later calls use.
Mode | Behavior |
---|---|
r | Open for reading; fails if the file does not exist |
w | Open for writing; creates the file if it does not exist and clears existing content on open, so even opening and immediately closing the file empties it |
r+ | Open for reading and writing; fails if the file does not exist |
w+ | Open for reading and writing; creates the file if it does not exist and clears existing content on open |
a | Open for appending; creates the file if it does not exist, and every write goes to the end of the file |
#include <stdio.h>
#include <string.h>

#define FILE_NAME "gout.c"

int main(void)
{
    // FILE *fp = fopen(FILE_NAME, "w"); // open for writing
    FILE *fp = fopen(FILE_NAME, "r");    // open for reading
    if (NULL == fp)
    {
        perror("fopen");
        return 1;
    }

    char buff[64];
    // fgets reads at most sizeof(buff) - 1 characters and always terminates the string
    while (fgets(buff, sizeof(buff), fp) != NULL)
    {
        size_t len = strlen(buff);
        if (len > 0 && buff[len - 1] == '\n')
            buff[len - 1] = '\0';        // drop the trailing '\n', puts adds its own
        puts(buff);                      // print line by line
    }

    // Write variant, to be used with the "w" mode above:
    // int num = 5;
    // while (num)
    // {
    //     fprintf(fp, "%s:%d\n", "hello bug", num--); // formatted write to the file
    // }

    fclose(fp);
    return 0;
}
Other languages have their own file functions too. All of these library functions are wrappers built on top of the operating system's file system calls, so next we look at the system call interface itself.
Operating system file manipulation functions
open - open a file
open has two forms; the second one takes three parameters. The first parameter is the path of the file. The second is int flags, a set of flag bits: an int has 32 bits, the access mode and the option flags occupy different bits, so they never collide and several can be passed at once by combining them with bitwise OR. One access mode is required, O_RDONLY (read-only), O_WRONLY (write-only) or O_RDWR (read-write), and options can be added to it: O_CREAT (create the file if it does not exist), O_TRUNC (clear the original content) and O_APPEND (append). The third parameter is the permission mode used when the file is created.
The return value is a file descriptor, or -1 on failure.
How can a single int carry several options? The simple version: define each option as a macro with its own bit, combine the ones you want with |, and let the function test each bit with & to find out what was requested.
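A minimal sketch of that macro technique. The names OPT_ONE, OPT_TWO, OPT_THREE and show_options are made up for illustration and are not the real O_* values or any system interface.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical option flags: each one owns a distinct bit. */
#define OPT_ONE   (1 << 0)
#define OPT_TWO   (1 << 1)
#define OPT_THREE (1 << 2)

/* Prints the options found in flags and returns how many were set. */
int show_options(int flags)
{
    int count = 0;
    assert(flags >= 0);
    if (flags & OPT_ONE)   { printf("one ");   count++; }
    if (flags & OPT_TWO)   { printf("two ");   count++; }
    if (flags & OPT_THREE) { printf("three "); count++; }
    printf("\n");
    return count;
}
```

Calling show_options(OPT_ONE | OPT_THREE) reports two options from a single int argument, which is the same way open receives O_WRONLY | O_CREAT | O_TRUNC.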
If the file does not exist and open is called with only O_RDONLY or O_WRONLY, the file is not created and the call fails. To create it, pass O_CREAT as well, together with the permission mode for the new file, for example 0666 for a regular file.
The permission a new file actually gets is mode & ~umask. With a umask of 0002, a requested 0666 ends up as 0664. If that is not what you want, call umask with the desired mask in the process before it creates the file.
The mask set this way belongs to the calling process (a child of the shell) and does not affect the parent process.
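As a rough sketch of creating a file with an explicit mode: the helper name create_with_mode is hypothetical, and the sketch assumes the file does not exist yet, because the mode argument is ignored for a file that already exists.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path with the requested mode and returns the permission bits
 * the new file actually got, or -1 on error. */
int create_with_mode(const char *path, mode_t mode)
{
    assert(path != NULL);
    umask(0);   /* changes the mask of this process only, not of its parent */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
    if (fd < 0)
        return -1;
    struct stat st;
    int ok = fstat(fd, &st);
    close(fd);
    return ok == 0 ? (int)(st.st_mode & 0777) : -1;
}
```

With the mask cleared, create_with_mode("log.txt", 0666) should report 0666 rather than 0664; after the program exits, the shell that started it still has its own umask.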
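write - write to a file
The first parameter is the file descriptor, the second is a pointer to the data to write, and the third is the number of bytes to write.
write starts at the current file offset and does not remove what is already there. If the file is opened without O_TRUNC and the new content is shorter than the old content, the tail of the old content remains after the new data!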
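The effect of leaving out O_TRUNC can be reproduced with a small helper. This is only a sketch; write_text and its truncate parameter are names chosen for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes text to path; a non-zero truncate adds O_TRUNC.
 * Returns 0 on success, -1 on error. */
int write_text(const char *path, const char *text, int truncate)
{
    assert(path != NULL && text != NULL);
    int flags = O_WRONLY | O_CREAT | (truncate ? O_TRUNC : 0);
    int fd = open(path, flags, 0666);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);   /* starts at offset 0 of the file */
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

Calling write_text(path, "hello world\n", 1) and then write_text(path, "HI", 0) leaves "HIllo world\n" in the file: the two new bytes replace the first two old bytes and the rest of the old content stays.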
read - read the contents of a file
The first parameter is the file descriptor to read from, the second is the buffer that receives the data, and the third is the maximum number of bytes to read. The return value is the number of bytes actually read, 0 at end of file, or -1 on failure.
read treats everything as raw bytes: whether the file holds binary data, text or a custom type, the buffer is passed as void *, so interpreting the data is up to the caller. For example, to use what was read as a string, you must add '\0' after the last valid byte yourself.
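A short sketch of reading text with read and adding the terminator by hand. The helper name read_text is an assumption made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads at most cap - 1 bytes from path into buf and terminates the
 * result so it can be used as a C string.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_text(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap - 1);   /* leave room for the terminator */
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';                        /* read() never adds one itself */
    return n;
}
```

Passing cap - 1 to read is what guarantees that buf[n] is still inside the buffer.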
Understanding files: intuition and observation
File operations are a relationship between a process and its opened files. A process can open many files, and all opened files have to be managed by the operating system, which follows its usual approach: describe first, then organize. To describe an open file, the kernel creates a data structure for it, struct file { }, which contains most of the attributes of the file.
How are these structures organized?
I opened 5 files and printed their file descriptors in order.
The descriptors start at 3 and increase one by one. What kind of structure is indexed by consecutive integers?
Array subscripts are consecutive integers starting from 0, and the first three descriptors are missing because 0, 1 and 2 are already taken by the three standard streams:
stdin (0) -> keyboard, stdout (1) -> monitor, stderr (2) -> monitor
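The experiment can be approximated as follows. As a simplification, this sketch opens the same path several times instead of five different files, and open_many is a name invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Opens path n times, stores each descriptor in fds and prints it.
 * Returns the number of successful opens. */
int open_many(const char *path, int *fds, int n)
{
    assert(path != NULL && fds != NULL);
    int opened = 0;
    for (int i = 0; i < n; i++) {
        int fd = open(path, O_WRONLY | O_CREAT, 0666);
        if (fd < 0)
            break;
        fds[opened++] = fd;
        printf("fd: %d\n", fd);
    }
    return opened;
}
```

In a program that has only the three standard streams open, the printed values typically start at 3 and go up by one each time. The caller is expected to close the descriptors afterwards.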
file descriptor fd
A file descriptor is essentially an array subscript.
Each process holds a struct files_struct *files pointer to its file descriptor table, which contains an array of struct file * pointers. When the upper layer passes a file descriptor, the kernel uses it as an index into that array, follows the pointer stored there to the struct file of the opened file, performs the operation, and returns the result to the upper layer!
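The lookup can be pictured with a toy model in user space. Everything here (toy_file, toy_files_struct, toy_lookup, TOY_MAX_FD) is a simplified stand-in invented for illustration; the real kernel structures hold far more and are not accessed this way from a program.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FD 8

/* Toy stand-ins for struct file and struct files_struct. */
struct toy_file { const char *name; };
struct toy_files_struct { struct toy_file *fd_array[TOY_MAX_FD]; };

/* An fd is just an index into fd_array; NULL means "not open". */
struct toy_file *toy_lookup(struct toy_files_struct *files, int fd)
{
    assert(files != NULL);
    if (fd < 0 || fd >= TOY_MAX_FD)
        return NULL;
    return files->fd_array[fd];
}
```

In this model, opening a file amounts to storing a pointer in a free slot and handing the slot number back to the caller.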
Allocation Rules for File Descriptors
Here I create a file with the open system call and print its file descriptor. It is 3, because 0, 1 and 2 are taken by the three standard streams (stdin, stdout, stderr).
Now I close 0 (stdin) or 2 (stderr) before opening the file and run again. The file descriptor returned by open becomes 0 or 2 respectively. If 1 (stdout, the standard output stream) is closed instead, nothing is printed on the screen at all.
The rule: when a file is opened, the kernel scans the file descriptor table from small to large and picks the smallest unused fd. With stdin closed, 0 is free, so the pointer at index 0 is made to point to the new file.
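The smallest-unused rule can also be observed without touching the standard streams. The sketch below assumes a single-threaded program (so nothing else opens files in between); reuses_lowest_fd is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens path twice, closes the first descriptor and opens path again.
 * Returns 1 if the new descriptor reuses the number just freed,
 * 0 if it does not, -1 on error. */
int reuses_lowest_fd(const char *path)
{
    assert(path != NULL);
    int first = open(path, O_WRONLY | O_CREAT, 0666);
    if (first < 0)
        return -1;
    int second = open(path, O_WRONLY | O_CREAT, 0666);
    if (second < 0) {
        close(first);
        return -1;
    }
    close(first);                       /* frees the smaller slot */
    int third = open(path, O_WRONLY | O_CREAT, 0666);
    int reused = (third == first);      /* smallest unused fd is handed out */
    if (third >= 0)
        close(third);
    close(second);
    return third < 0 ? -1 : reused;
}
```

The third open gets the number of the first one even though a larger descriptor is still open, which is the same mechanism that hands out 0 after stdin is closed.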
When stdout is closed, the new file gets fd 1, so printf output is not displayed on the screen but goes to the new file. However, a stream that refers to a regular file is fully buffered, whereas a stream that refers to the screen is line buffered, so the text is still sitting in the buffer and the file looks empty. The buffer has to be flushed explicitly (fflush(stdout)) for the content to reach the file.
The upper layer keeps using fd 1 unchanged, but inside the kernel the struct file * stored at index 1 now points to a different file. Data that was meant for the screen is delivered to another file: this is output redirection.
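A sketch of this close-then-open redirection. Unlike the experiment described above, it saves the original stdout with dup and restores it at the end so the program can keep printing to the screen; print_via_closed_stdout is a name made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sends one printf to path by closing fd 1 and letting open() reuse it,
 * then restores the original stdout. Returns the fd that open() returned. */
int print_via_closed_stdout(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    fflush(stdout);                    /* empty the buffer before switching */
    int saved = dup(STDOUT_FILENO);    /* remember where stdout pointed */
    if (saved < 0)
        return -1;
    close(STDOUT_FILENO);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    printf("%s", msg);                 /* stdout still writes to fd 1 */
    fflush(stdout);                    /* file target: force the data out */
    if (fd >= 0 && fd != STDOUT_FILENO)
        close(fd);
    dup2(saved, STDOUT_FILENO);        /* point fd 1 back at the old target */
    close(saved);
    return fd;
}
```

Removing the second fflush(stdout) reproduces the empty-file effect: the message would then still be in the user-level buffer when fd 1 is switched back.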
redirect
The shell offers three kinds of redirection: output redirection, append redirection and input redirection.
Operator | Meaning |
---|---|
> | output redirection |
>> | append redirection |
< | input redirection |
redirection function dup2
int dup2(int oldfd,int newfd)
It takes two file descriptors and returns newfd on success or -1 on failure.
It copies the content of slot oldfd in the file descriptor table (a pointer) into slot newfd, so afterwards newfd refers to the same open file as oldfd! If newfd was already open, it is closed first.
output redirection
Content that would normally be printed to stdout is written to the file soo.txt instead.
append redirection
Running the program several times shows new content being added to the end of the file each time.
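Output and append redirection differ only in the open flags, which a sketch with dup2 makes visible. The helper name redirect_stdout and its append parameter are assumptions for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Makes fd 1 refer to path. A non-zero append behaves like >>,
 * zero behaves like >. Returns 0 on success, -1 on error. */
int redirect_stdout(const char *path, int append)
{
    assert(path != NULL);
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    int fd = open(path, flags, 0666);
    if (fd < 0)
        return -1;
    fflush(stdout);                     /* do not carry old data across */
    int rc = dup2(fd, STDOUT_FILENO);   /* slot 1 now holds the same pointer */
    if (fd != STDOUT_FILENO)
        close(fd);                      /* fd 1 keeps the file open */
    return rc < 0 ? -1 : 0;
}
```

After redirect_stdout("soo.txt", 0), every printf ends up in soo.txt once the buffer is flushed. With append set to 1, repeated runs keep adding to the file instead of replacing its content.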
input redirection
Input that would normally come from the keyboard is taken from a file instead, so the program prints the content of the file to the screen.
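Input redirection is the same dup2 call aimed at fd 0. A minimal sketch, with redirect_stdin as a hypothetical helper name:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Makes fd 0 refer to path, like `< path` in a shell.
 * Returns 0 on success, -1 on error. */
int redirect_stdin(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = dup2(fd, STDIN_FILENO);   /* slot 0 now points at the file */
    if (fd != STDIN_FILENO)
        close(fd);
    return rc < 0 ? -1 : 0;
}
```

Once it has returned 0, functions that read from stdin, such as fgets(line, sizeof(line), stdin), receive the content of the file rather than keyboard input.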
In work such as redirection, the parent process hands information to the child process. When the child is created, it receives a copy of the parent's file descriptor table, so its entries point to the same opened files, and it can then rearrange its own descriptors. All of this lives in kernel data structures, while program replacement (exec) only replaces code and data in user space, so the child's file operations and the program replacement do not interfere with each other. When the parent and the child both point to the same opened file, the file's reference count is 2. When the parent closes the file, its pointer is removed and the count drops to 1. The file is only really closed when the child closes it as well and the count reaches 0, meaning no pointer refers to it any more.
Understanding files again
Every peripheral has its own access method, but the operating system describes all of them with the same struct file type. A struct file holds function pointers that lead to the driver of the device, and the driver provides the concrete read and write methods for that peripheral. Seen from the struct file level, every file and every peripheral is a struct file. That is the meaning of "everything is a file" on Linux!
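The function-pointer idea can be imitated in a few lines of user-space C. This is a toy illustration only: toy_device_file, screen_write, null_write and toy_write are invented names and do not mirror the real kernel definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One struct type for every "device"; the behavior lives behind a pointer. */
struct toy_device_file {
    const char *name;
    int (*write)(const char *data, size_t len);
};

/* "Driver" for the screen: really prints the bytes. */
static int screen_write(const char *data, size_t len)
{
    return (int)fwrite(data, 1, len, stdout);
}

/* "Driver" for a device that discards everything it receives. */
static int null_write(const char *data, size_t len)
{
    (void)data;
    return (int)len;
}

/* The caller never needs to know which device is behind f. */
int toy_write(struct toy_device_file *f, const char *text)
{
    assert(f != NULL && f->write != NULL);
    return f->write(text, strlen(text));
}
```

The caller uses the same toy_write call for both devices, and only the function stored in the struct decides what actually happens.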
Understanding the buffer
Take one .c file that prints with both C library functions and the write system call. Whether the output goes to the screen or is redirected to a file, each message appears once.
Now add a fork at the end of the program to create a child process. The output on the screen is the same as before, but the redirected file is different: the system call output comes first and appears once, followed by the library function output, which appears twice. Why?
When you see a phenomenon, trace it back to its source.
An intuitive view of the buffer
A buffer is a region of memory.
Some overachievers do not go home for the New Year. Just kidding, hahaha~
People who work far from home often have no time to go back for the New Year. Suppose a mother wants to send her child some food. If she delivers it in person, the round trip takes up a lot of her time. If she drops the parcel at the courier station instead, it costs her only a few minutes, the courier delivers it, and she is free to do other things.
The mother is the process, the courier station is the buffer, and you are the disk. The buffer takes over the job of moving the process's data to the disk, which saves the process the time it would otherwise spend on data I/O!
Buffer flush strategies
1. Flush immediately -> no buffering
2. Flush on newline -> line buffering, for example the display
3. Flush when the buffer is full -> full buffering, for example a disk file
The display is read by people, so output is flushed line by line. For a disk file, waiting until the buffer is full and writing everything at once is the most efficient.
There are also two special cases: a flush forced by the user, such as fflush, and the flush that happens when the process exits.
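The three strategies can be compared with the standard setvbuf function, which selects a stream's buffering mode. The helper bytes_before_close is a name chosen for this sketch; it checks how much data has reached the file before the stream is closed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Writes text to path through a stream that uses the given buffering
 * mode (_IONBF, _IOLBF or _IOFBF) and returns how many bytes are in the
 * file before fclose() runs, or -1 on error. */
long bytes_before_close(const char *path, int mode, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, mode, BUFSIZ);   /* must come before any other I/O on fp */
    fputs(text, fp);
    struct stat st;
    long size = stat(path, &st) == 0 ? (long)st.st_size : -1;
    fclose(fp);                        /* closing flushes whatever is left */
    return size;
}
```

With _IONBF the text is in the file immediately, with _IOLBF it arrives once a newline has been written, and with _IOFBF a short text stays in the buffer until the stream is flushed or closed.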
Where is the buffer? In the experiment the library function output appeared twice and the system call output once, which shows that this buffer is not in the kernel: if it were, the write output would have been duplicated too. After fork, parent and child share data copy-on-write, so the child holds the same buffered data as the parent, and when each of them flushes, two copies are produced.
To sum up, printf, fprintf and fputs have their own buffer at the user level. The library functions sit above the system calls and wrap them, while write has no such buffer, so the buffer must be added by the wrapping layer. Since the language is C, it is provided by the C standard library.
The user-level buffer lives in the FILE structure
A library function uses the FILE * to find the corresponding FILE structure in memory. The structure stores information about the stream, including the buffer and the file descriptor. The function copies the data into the buffer, and at the appropriate moment the library flushes the buffer to the peripheral with a system call, using the fd stored in the structure!
Finally, the explanation of the phenomenon. When output goes to the screen, stdout is line buffered, so each library call has already flushed its line before the fork. The parent's buffer is empty when the child is created and there is nothing to duplicate. When output is redirected to a file, stdout is fully buffered, so the library output is still in the buffer at the moment of the fork and the child gets a copy of it. Both processes flush their buffers when they exit, so the library output appears in the file twice. The write system call uses only the fd, not a FILE, so its output appears once.
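The duplication can be reproduced in a controlled way. Instead of redirecting stdout, this sketch opens its own fully buffered stream on a file; fork_with_pending_buffer is a hypothetical helper name, and the parent waits for the child so the order in the file is predictable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes one line through stdio and one through write(), then forks.
 * Returns 0 on success, -1 on error. */
int fork_with_pending_buffer(const char *path)
{
    static const char sys_msg[] = "system call\n";
    assert(path != NULL);
    FILE *fp = fopen(path, "w");           /* regular file: fully buffered */
    if (fp == NULL)
        return -1;
    fputs("library call\n", fp);           /* stays in the FILE buffer */
    if (write(fileno(fp), sys_msg, sizeof(sys_msg) - 1) < 0) {
        fclose(fp);                        /* write() bypasses that buffer */
        return -1;
    }
    fflush(stdout);                        /* keep unrelated output out of it */
    pid_t pid = fork();                    /* child gets a copy of the buffer */
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0)
        exit(0);                           /* exit() flushes the child's copy */
    waitpid(pid, NULL, 0);
    fclose(fp);                            /* parent flushes its own copy */
    return 0;
}
```

The file ends up with "system call" once and "library call" twice. Calling fflush(fp) before fork would empty the buffer first and leave only one copy.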
Writing a simple buffering mechanism
mystdio.h
#pragma once

#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

#define SIZE 1024   // buffer capacity in bytes
#define SYNC_NOW 1  // no buffering
#define SYNC_LINE 2 // line buffering
#define SYNC_FULL 4 // full buffering

typedef struct _FILE
{
    int flags;         // flush strategy
    int fileno;        // file descriptor
    char buffer[SIZE]; // user-level buffer
    int cap;           // capacity of the buffer
    int size;          // bytes currently in the buffer
} FILE_;

FILE_ *fopen_(const char *path_name, const char *mode); // path, open mode
void fwrite_(const void *ptr, int num, FILE_ *fp);      // source data, byte count, destination stream
void fflush_(FILE_ *fp);
void fclose_(FILE_ *fp);
mystdio.c
#include "mystdio.h"

FILE_ *fopen_(const char *path_name, const char *mode) // path, open mode
{
    int flags = 0;
    int Moded = 0666; // permission requested for newly created files
    if (strcmp(mode, "r") == 0)
    {
        flags |= O_RDONLY;
    }
    else if (strcmp(mode, "w") == 0)
    {
        flags |= (O_WRONLY | O_CREAT | O_TRUNC);
    }
    else if (strcmp(mode, "a") == 0)
    {
        flags |= (O_WRONLY | O_CREAT | O_APPEND);
    }
    else
    {
        return NULL; // other modes are not supported yet
    }

    int fd = 0;
    // O_RDONLY is 0, so it cannot be tested with &; check O_CREAT instead
    if (flags & O_CREAT) fd = open(path_name, flags, Moded);
    else fd = open(path_name, flags);
    if (fd < 0)
    {
        const char *err = strerror(errno);
        write(2, err, strlen(err));
        return NULL;
    }

    FILE_ *fp = (FILE_ *)malloc(sizeof(FILE_));
    assert(fp); // abort if the allocation failed
    // initialize on success
    fp->flags = SYNC_LINE;       // line buffering by default
    fp->fileno = fd;             // file descriptor
    fp->cap = SIZE;              // capacity
    fp->size = 0;                // bytes in use
    memset(fp->buffer, 0, SIZE); // zero the buffer
    return fp;                   // hand the FILE_ pointer to the caller
}

void fwrite_(const void *ptr, int num, FILE_ *fp) // source data, byte count, destination stream
{
    if (num <= 0) return;
    // 0. Make room first so memcpy can never run past the end of the buffer
    if (fp->size + num > fp->cap) fflush_(fp);
    if (num > fp->cap)
    {
        write(fp->fileno, ptr, num); // too large to buffer: write it directly
        return;
    }
    // 1. Copy the data into the buffer
    memcpy(fp->buffer + fp->size, ptr, num);
    fp->size += num;
    // 2. Decide whether to flush
    if (fp->flags & SYNC_NOW)
    {
        fflush_(fp);
    }
    else if (fp->flags & SYNC_FULL)
    {
        if (fp->size == fp->cap) fflush_(fp);
    }
    else if (fp->flags & SYNC_LINE)
    {
        // only the last byte is checked; a '\n' in the middle (abcd\nabab) is not handled
        if (fp->buffer[fp->size - 1] == '\n') fflush_(fp);
    }
}

void fflush_(FILE_ *fp)
{
    if (fp->size > 0) write(fp->fileno, fp->buffer, fp->size);
    fp->size = 0; // the buffer is empty again
}

void fclose_(FILE_ *fp)
{
    fflush_(fp);
    close(fp->fileno);
    free(fp);
}
Trying line buffering
To empty a file from the command line: > filename
Full buffering
No buffering
kernel buffer
From the description so far, writing data to disk seems to work like this: the data is placed in a buffer through a library function such as printf or through the write system call, the buffer is flushed according to the chosen strategy (no buffering, line buffering, full buffering), and the data lands in the file.
But that is not the whole story. **A language-level function (fwrite and so on) first copies the data into the language-level buffer. When that buffer is flushed according to its strategy (none, line, full), the write system call copies the data into the kernel-level buffer that belongs to the struct file. The operating system then copies the data to the disk on its own schedule (for example when memory runs short, or at a regular interval).** Three copies in total.
For this reason, fwrite at the language level and the write system call are better thought of as copy functions than as write functions.
fsync
fsync forces a write-back. When data is sitting in the kernel buffer, calling fsync(fd) does not wait for the operating system's own schedule but forces the data buffered for that file to be synchronized to the storage device.
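A small sketch of a write followed by fsync. The helper name write_and_sync is an assumption for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes msg to path and asks the kernel to push it to the storage
 * device before returning. Returns 0 on success, -1 on error. */
int write_and_sync(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    int ok = write(fd, msg, len) == (ssize_t)len   /* user space -> kernel buffer */
             && fsync(fd) == 0;                    /* kernel buffer -> device */
    close(fd);
    return ok ? 0 : -1;
}
```

fsync only knows about data that has already been passed to the kernel with write; anything still held in a user-level buffer has to be flushed there first.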
The small library we wrote can be changed to use it as well.
void fflush_(FILE_ *fp)
{
    if (fp->size > 0) write(fp->fileno, fp->buffer, fp->size); // user-level buffer -> kernel buffer
    fsync(fp->fileno); // force the kernel buffer out to the device
    fp->size = 0;
}