Basic I/O: file operation functions, file descriptors, and understanding buffers

Basic I/O

Understanding of files on the computer:

1. File = content + attributes. Even an empty file takes up disk space, because its attributes still have to be stored

2. File path + file name uniquely identifies a file

3. Operating on a file means operating on its content or its attributes

4. If no path is given, the file is looked up in the process's current working directory by default

Code that calls file interfaces only becomes a binary after compilation, and nothing happens to any file until that binary runs. So when code operates on a file, it is really a process operating on the file

Files are either opened or unopened, and a file must be opened before it can be accessed. **So operating on a file always means a process operating on an opened file!**

Review of the C file operation functions

Looking back at file operations in C, the first step is always to open the file: fopen takes a mode string and returns a file pointer (FILE*) that all later operations use

r   Open for reading; fails if the file does not exist
w   Open for writing; creates the file if it does not exist and empties it otherwise, so merely opening and closing a file in this mode clears its content
r+  Open for reading and writing; fails if the file does not exist
w+  Open for reading and writing; creates the file if it does not exist and empties it otherwise
a   Open for appending; creates the file if it does not exist, and every write goes to the end of the file

#include <stdio.h>
#include <string.h>

#define FILE_NAME "gout.c"

int main(void)
{
    // FILE *fp = fopen(FILE_NAME, "w"); // open for writing
    FILE *fp = fopen(FILE_NAME, "r");    // open for reading
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }

    char buff[64];
    // fgets stores at most sizeof(buff) - 1 characters and always adds '\0'
    while (fgets(buff, sizeof(buff), fp) != NULL)
    {
        buff[strcspn(buff, "\n")] = '\0'; // drop the trailing '\n', if there is one
        puts(buff);                       // print line by line
    }

    // int num = 5;
    // while (num)
    // {
    //     fprintf(fp, "%s:%d\n", "hello bug", num--); // formatted write into the file
    // }

    fclose(fp);
    return 0;
}
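
The difference between "w" and "a" in the list above can be checked directly. This is only a minimal sketch: the helper names write_with_mode and file_size are made up for illustration, and any writable path will do.

```c
#include <assert.h>
#include <stdio.h>

/* Writes `text` to `path` using the fopen mode `mode`; returns 0 on success. */
int write_with_mode(const char *path, const char *mode, const char *text)
{
    assert(path != NULL && mode != NULL && text != NULL);
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int rc = (fputs(text, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF)
        rc = -1;
    return rc;
}

/* Returns the size of `path` in bytes, or -1 if it cannot be opened. */
long file_size(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    fseek(fp, 0, SEEK_END);   /* jump to the end ... */
    long size = ftell(fp);    /* ... and ask where we are */
    fclose(fp);
    return size;
}
```

Writing "hello\n" with "w" and then "world\n" with "a" should leave 12 bytes in the file, while another "w" write starts again from an empty file.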

Other languages have file operation functions too. All of these library functions are wrappers around the operating system's file system calls, so let's look at the system call interface itself.

Operating system file manipulation functions

open - open a file

open has two forms: open(pathname, flags) and open(pathname, flags, mode). The first parameter is the path of the file. The second, int flags, is a set of option bits packed into one int: each option has its own bit pattern, so several options can be combined with | without colliding. Common options are O_RDONLY (read-only), O_WRONLY (write-only), O_RDWR (read-write), O_CREAT (create the file if it does not exist), O_TRUNC (discard the existing content) and O_APPEND (append to the end). The third parameter, mode, gives the permissions for a newly created file

The return value is a file descriptor, or -1 on failure

How can one int carry several options? Here is a simple way to show it: define each option as a macro that owns a distinct bit, OR the wanted options together into one int, and let the callee test each bit to find out which options were requested
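
A minimal sketch of that idea follows. The macros OPT_ONE, OPT_TWO and OPT_THREE and the function show_options are hypothetical names chosen for the illustration; they are not the real O_* values.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical option macros: each one owns a distinct bit. */
#define OPT_ONE   (1 << 0) /* 0001 */
#define OPT_TWO   (1 << 1) /* 0010 */
#define OPT_THREE (1 << 2) /* 0100 */
#define OPT_ALL   (OPT_ONE | OPT_TWO | OPT_THREE)

/* Prints every option found in `flags` and returns how many were set. */
int show_options(int flags)
{
    assert((flags & ~OPT_ALL) == 0); /* reject bits that are not options */
    int count = 0;
    if (flags & OPT_ONE)   { puts("option one");   count++; }
    if (flags & OPT_TWO)   { puts("option two");   count++; }
    if (flags & OPT_THREE) { puts("option three"); count++; }
    return count;
}
```

A call such as show_options(OPT_ONE | OPT_THREE) passes two options in a single int, which is the same pattern as open(path, O_WRONLY | O_CREAT, ...).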

If the file does not exist and open is called with only O_RDONLY or O_WRONLY, the file is not created and open fails. To create it you also have to pass O_CREAT, together with a mode argument so that the new file gets normal permissions
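
As a rough sketch of this behavior (the helper name open_for_write is an assumption made for the example), the first open below has no O_CREAT and fails on a missing file, and the second one creates it:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Opens `path` for writing, creating it with mode 0666 (before the umask
   is applied) if it does not exist. Returns the fd, or -1 on failure. */
int open_for_write(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY);                 /* no O_CREAT: fails if missing */
    if (fd < 0 && errno == ENOENT)
    {
        fd = open(path, O_WRONLY | O_CREAT, 0666); /* create it this time */
    }
    if (fd < 0)
        perror("open");
    return fd;
}
```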

The permissions a new file actually gets are mode & ~umask, which is why a requested 0666 shows up as 664 here (the mask is 0002). If you do not want the system's default, call umask in your own process before open to set the mask you want

The mask set this way belongs only to this process, which runs as a child of the shell, so the parent process is not affected
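
The effect of the mask can be observed with fstat. The sketch below assumes an ordinary file system without default ACLs; the function name perms_with_umask is made up for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates `path` with requested mode 0666 under the given umask and returns
   the permission bits the file actually received, or -1 on failure. */
int perms_with_umask(const char *path, mode_t mask)
{
    assert(path != NULL);
    unlink(path);                  /* make sure open really creates the file */
    mode_t old = umask(mask);      /* changes the mask of this process only */
    int fd = open(path, O_WRONLY | O_CREAT, 0666);
    umask(old);                    /* put the previous mask back */
    if (fd < 0)
        return -1;

    struct stat st;
    int perms = (fstat(fd, &st) == 0) ? (int)(st.st_mode & 0777) : -1;
    close(fd);
    return perms;
}
```

With a mask of 0002 the result should be 0664, and with a mask of 0 the file keeps the full 0666 that was requested.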

write—write to a file

The first parameter is the file descriptor, the second is a pointer to the data to write, and the third is the number of bytes to write. The return value is the number of bytes actually written, or -1 on failure

Without O_TRUNC, writing to an existing file starts at offset 0 and overwrites only as many bytes as are written, so the tail of the old content is left behind in the file!
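
Here is a small sketch of that problem. The helpers put_text and get_text are hypothetical names for this example; `extra` is where O_TRUNC is passed or left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes `text` at the start of `path`; pass O_TRUNC in `extra` to empty
   the file first. Returns 0 on success, -1 on failure. */
int put_text(const char *path, const char *text, int extra)
{
    assert(path != NULL && text != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | extra, 0666);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Reads a small file into `buf` as a C string; returns its length or -1. */
ssize_t get_text(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap - 1);
    close(fd);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

If the file holds "aaaaaaaa" and "bb" is written without O_TRUNC, the file reads back as "bbaaaaaa"; with O_TRUNC it reads back as just "bb".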

read—read the contents of a file

The first parameter is the file descriptor to read from, the second is the buffer that receives the data, and the third is the maximum number of bytes to read. The return value is the number of bytes actually read (0 at end of file), or -1 on failure

read does not care what the data is: binary data, a string or a custom type all arrive through a void * buffer, so any type-specific detail is up to the caller. For example, when reading a string you have to add "\0" after the valid content yourself
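
A minimal sketch of reading a string this way (read_string is a hypothetical helper name): it reads at most cap - 1 bytes so that there is always room for the terminator.

```c
#include <assert.h>
#include <unistd.h>

/* Reads at most cap - 1 bytes from `fd` into `buf` and NUL-terminates them.
   Returns the number of bytes read, or -1 on failure. */
ssize_t read_string(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    ssize_t n = read(fd, buf, cap - 1); /* leave room for '\0' */
    if (n < 0)
        return -1;
    buf[n] = '\0';                      /* read() never adds a terminator */
    return n;
}
```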

Understanding files: intuition + observed behavior

File operations are a relationship between a process and its opened files. A process can open many files, and every opened file has to be managed by the operating system. Following the usual "describe first, then organize" approach, the operating system creates a kernel data structure for each opened file, struct file { }, which holds most of the file's attributes

How is that managed?

I first opened 5 files and printed their file descriptors sequentially

The descriptors start at 3 and go up by one each time. So what kind of structure is indexed by consecutive integers?

Array subscripts are also consecutive integers, starting from 0. That suggests the first three descriptors, 0, 1 and 2, are already taken by the three standard input and output streams

stdin -> keyboard, stdout -> monitor, stderr -> monitor
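
The observation can be reproduced with a few lines. This is a sketch, with open_many and show_std_fds as made-up helper names; when only the three standard streams are open, the printed descriptors would be expected to start at 3.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() under strict compiler modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Opens `path` n times, stores the descriptors in fds[] and prints them.
   Returns the number of successful opens. */
int open_many(const char *path, int fds[], int n)
{
    assert(path != NULL && fds != NULL);
    int opened = 0;
    for (int i = 0; i < n; i++)
    {
        int fd = open(path, O_WRONLY | O_CREAT, 0666);
        if (fd < 0)
            break;
        fds[opened++] = fd;
        printf("fd: %d\n", fd);
    }
    return opened;
}

/* Prints the descriptors wrapped by the three standard streams. */
void show_std_fds(void)
{
    printf("stdin: %d, stdout: %d, stderr: %d\n",
           fileno(stdin), fileno(stdout), fileno(stderr));
}
```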

file descriptor fd

The essence of a file descriptor is an array subscript

Every process has a struct files_struct *files pointer that leads to its file descriptor table, which is an array of struct file * pointers. When the upper layer passes in a file descriptor, the kernel uses it as the subscript into that array, follows the pointer stored there to the corresponding struct file, accesses the file, and returns the result to the upper layer!
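
To make the "fd is an array subscript" idea concrete, here is a toy user-space model. It is not kernel code: toy_file, toy_files_struct, toy_install and toy_lookup are names invented for this sketch, and the real structures hold far more.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FD 8

/* Toy stand-ins for struct file and struct files_struct. */
struct toy_file { const char *name; };
struct toy_files_struct { struct toy_file *fd_array[TOY_MAX_FD]; };

/* Stores `f` in the first free slot and returns its index (the "fd"),
   or -1 if the table is full. */
int toy_install(struct toy_files_struct *files, struct toy_file *f)
{
    assert(files != NULL && f != NULL);
    for (int fd = 0; fd < TOY_MAX_FD; fd++)
    {
        if (files->fd_array[fd] == NULL)
        {
            files->fd_array[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* An fd is just a subscript: return the structure stored at that index. */
struct toy_file *toy_lookup(const struct toy_files_struct *files, int fd)
{
    assert(files != NULL);
    if (fd < 0 || fd >= TOY_MAX_FD)
        return NULL;
    return files->fd_array[fd];
}
```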

Allocation Rules for File Descriptors

Here I create a file with the system call interface and print its file descriptor. It is 3, because 0, 1 and 2 are already taken by the three standard input and output streams (stdin, stdout, stderr).

Now I close 0 (stdin) or 2 (stderr) before calling open and run the program again: the new file's descriptor is 0 or 2 respectively. So once one of these standard streams is closed, open can hand out its number again. But if 1 (stdout, the standard output stream) is closed, you will find that nothing is printed to the screen

When stdin is closed and a new file is opened, the file descriptor table is searched from the smallest index upward for the smallest unused fd. Index 0 is free at this point, so the pointer at index 0 is made to point to the new file.
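
The same rule can be seen without touching the standard streams. In this sketch (the function name reuses_lowest_fd is an assumption), a descriptor is closed while a larger one stays open, and the next open gets the closed number back:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens `path` twice, closes the first descriptor, then opens again.
   Returns 1 if the third open reused the descriptor that was closed,
   0 if it did not, -1 on failure. */
int reuses_lowest_fd(const char *path)
{
    assert(path != NULL);
    int first = open(path, O_WRONLY | O_CREAT, 0666);
    if (first < 0)
        return -1;
    int second = open(path, O_WRONLY);
    if (second < 0)
    {
        close(first);
        return -1;
    }

    close(first);                     /* frees the smaller slot */
    int third = open(path, O_WRONLY); /* smallest unused fd is `first` again */
    int reused = (third == first);

    if (third >= 0)
        close(third);
    close(second);
    return reused;
}
```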

But when stdout is closed, the text that should be printed does not appear on the screen. It goes to the new file instead, because the new file now owns fd 1. And because output to a regular file is buffered differently from output to the screen, the printed text is not in the file yet at this point: the buffer has to be flushed explicitly before the content shows up in the file.

The upper layer still uses fd 1 as before, but in the kernel the struct file * stored at index 1 has been changed. Output that should have gone to the original file now goes to another file. This is output redirection.

redirect

In the operating system, there are three kinds of redirection: output redirection, append redirection, and input redirection.

> output
>> append
< input

redirection function dup2

int dup2(int oldfd,int newfd)

Pass in two file descriptors, return newfd if successful, return -1 if failed

dup2 copies the content (the pointer) stored at oldfd into the newfd slot, so afterwards newfd points to the same file that oldfd points to!
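
A sketch that shows this without involving the standard streams: two files are opened, dup2 makes the second descriptor point to the first file, and a write through the second descriptor lands in the first file. The helper names size_of and write_via_dup2 are made up for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the size of `path` in bytes, or -1 on failure. */
long size_of(const char *path)
{
    struct stat st;
    return (stat(path, &st) == 0) ? (long)st.st_size : -1;
}

/* Opens files A and B, redirects B's descriptor to A with dup2, then
   writes `text` through B's descriptor. Returns 0 on success, -1 on failure. */
int write_via_dup2(const char *path_a, const char *path_b, const char *text)
{
    assert(path_a != NULL && path_b != NULL && text != NULL);
    int fd_a = open(path_a, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    int fd_b = open(path_b, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    int rc = -1;
    if (fd_a >= 0 && fd_b >= 0 && dup2(fd_a, fd_b) == fd_b)
    {
        /* fd_b now points to file A, so this lands in A, not in B */
        size_t len = strlen(text);
        rc = (write(fd_b, text, len) == (ssize_t)len) ? 0 : -1;
    }
    if (fd_a >= 0)
        close(fd_a);
    if (fd_b >= 0)
        close(fd_b);
    return rc;
}
```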

output redirection

Content that would normally be printed to stdout is written to the soo.txt file instead
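
One possible way to write this, as a sketch only: the function print_to_file is a hypothetical helper that redirects stdout with dup2, prints, and then restores the original stdout so the rest of the program is unaffected.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Runs printf on `text` with stdout redirected to `path`, then restores
   stdout. Returns 0 on success, -1 on failure. */
int print_to_file(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    fflush(stdout);                 /* push out anything already buffered */
    int saved = dup(STDOUT_FILENO); /* remember where stdout pointed */
    if (saved < 0 || dup2(fd, STDOUT_FILENO) < 0)
    {
        if (saved >= 0)
            close(saved);
        close(fd);
        return -1;
    }
    close(fd);                      /* fd 1 keeps the file open */

    printf("%s", text);
    fflush(stdout);                 /* file output is not flushed per line */

    dup2(saved, STDOUT_FILENO);     /* undo the redirection */
    close(saved);
    return 0;
}
```

Calling print_to_file("soo.txt", "hello soo\n") should leave that line in soo.txt instead of on the screen.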

append redirection

After running the program several times, you can see that each run's output has been appended to the file
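
A sketch of append redirection: the only change from output redirection is opening the file with O_APPEND instead of O_TRUNC. The function name append_via_stdout is an assumption made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes `text` to fd 1 while fd 1 is redirected to `path` in append mode.
   Returns 0 on success, -1 on failure. */
int append_via_stdout(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fd < 0)
        return -1;

    fflush(stdout);                 /* push out anything already buffered */
    int saved = dup(STDOUT_FILENO);
    if (saved < 0 || dup2(fd, STDOUT_FILENO) < 0)
    {
        if (saved >= 0)
            close(saved);
        close(fd);
        return -1;
    }
    close(fd);

    size_t len = strlen(text);
    int rc = (write(STDOUT_FILENO, text, len) == (ssize_t)len) ? 0 : -1;

    dup2(saved, STDOUT_FILENO);     /* undo the redirection */
    close(saved);
    return rc;
}
```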

input redirection

Instead of printing what is typed on the keyboard to the screen, the program now prints the contents of a file to the screen
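
A sketch of input redirection, assuming fd 0 is open when the function is called; read_stdin_from_file is a made-up name. The code reads from fd 0 exactly as it would from the keyboard, but fd 0 has been pointed at a file.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads up to cap - 1 bytes from fd 0 while fd 0 is redirected to `path`,
   NUL-terminates them, then restores fd 0. Returns bytes read or -1. */
ssize_t read_stdin_from_file(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int saved = dup(STDIN_FILENO);  /* remember where stdin pointed */
    if (saved < 0 || dup2(fd, STDIN_FILENO) < 0)
    {
        if (saved >= 0)
            close(saved);
        close(fd);
        return -1;
    }
    close(fd);

    ssize_t n = read(STDIN_FILENO, buf, cap - 1); /* "keyboard" input */
    if (n >= 0)
        buf[n] = '\0';

    dup2(saved, STDIN_FILENO);      /* undo the redirection */
    close(saved);
    return n;
}
```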

In work such as redirection, the parent process prepares things for the child process: the child gets a copy of the parent's file descriptor table, with all its pointers to files, and operates on that. These are all kernel data structures, whereas process replacement happens in user space, so the child's file setup and process replacement do not affect each other. When the parent and the child both point to the same file, the file has a reference count of 2. When the parent closes the file, the parent's pointer no longer points to it and the count drops to 1. The file is not really closed until the child closes it too: the count drops to 0 and no pointer refers to the file any more.
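
The shared file can be observed with fork. In the sketch below (child_keeps_file_open is a hypothetical name), the parent closes its descriptor right after the fork, yet the child can still write through its own copy, whichever of the two runs first.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Opens `path`, forks, and closes the file in the parent. The child writes
   6 bytes through its copy of the descriptor. Returns the number of bytes
   found in the file afterwards, or -1 on failure. */
long child_keeps_file_open(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    pid_t pid = fork();              /* child gets a copy of the fd table */
    if (pid < 0)
    {
        close(fd);
        return -1;
    }
    if (pid == 0)
    {
        ssize_t n = write(fd, "child\n", 6);
        close(fd);                   /* the last reference goes away here or in the parent */
        _exit(n == 6 ? 0 : 1);
    }

    close(fd);                       /* only the parent's pointer is cleared */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    char buf[16];
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, sizeof buf);
    close(fd);
    return (long)n;
}
```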

understand the file again

Every peripheral has its own access method, but the operating system presents them all through the same struct file type. The struct file holds function pointers; through them the upper layer reaches the peripheral's driver, and the driver supplies the concrete methods that operate the peripheral. Seen from the struct file level, every file and every peripheral is a struct file. That is what "everything is a file" means on Linux!
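
The function pointer idea can be modeled in a few lines of user-space C. Everything here is a toy: toy_dev_file, the "keyboard" and "monitor" drivers, and toy_read / toy_write are names invented for this sketch, not kernel interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model: one type for every device, with methods supplied by its driver. */
struct toy_dev_file
{
    const char *name;
    int (*read)(char *buf, int cap);
    int (*write)(const char *buf, int len);
};

/* Hypothetical "keyboard driver": readable, not writable. */
static int kbd_read(char *buf, int cap)
{
    const char keys[] = "abc";   /* pretend these keys were pressed */
    if (cap <= 0)
        return 0;
    int n = cap < 3 ? cap : 3;
    memcpy(buf, keys, (size_t)n);
    return n;
}
static int kbd_write(const char *buf, int len) { (void)buf; (void)len; return -1; }

/* Hypothetical "monitor driver": writable, not readable. */
static int mon_read(char *buf, int cap) { (void)buf; (void)cap; return -1; }
static int mon_write(const char *buf, int len)
{
    return (int)fwrite(buf, 1, (size_t)len, stdout);
}

struct toy_dev_file toy_keyboard = { "keyboard", kbd_read, kbd_write };
struct toy_dev_file toy_monitor  = { "monitor",  mon_read, mon_write };

/* The upper layer calls through the pointers and never asks which device it is. */
int toy_read(struct toy_dev_file *f, char *buf, int cap)
{
    assert(f != NULL && f->read != NULL);
    return f->read(buf, cap);
}
int toy_write(struct toy_dev_file *f, const char *buf, int len)
{
    assert(f != NULL && f->write != NULL);
    return f->write(buf, len);
}
```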

understand the buffer

The program prints with library functions and with a system call. Whether the output goes to the screen or is redirected to a file, it matches the order of the calls in the .c file

But once fork is called at the end of the program to create a child process, the output on the screen is the same as before, while the output redirected to a file is different: the system call's output comes first and appears once, and the library functions' output appears twice. Why?

When you see a phenomenon, trace the source

an intuitive understanding of buffers

A buffer is a region of memory

Some overachievers don't even go home for the New Year, hahaha I'm just kidding~~~

People who work far from home often have no time to go back for the New Year, and they are a long way from their relatives. Suppose the mother wants to send you some food and drink. If she carries everything to her child herself, the trip there and back takes up a lot of her time. If she drops the parcel at the courier station instead, that takes only a moment: her time is freed up, and the courier gets the parcel into the child's hands.

So the mother is the process, the courier station is the buffer, and you are the disk. The buffer takes over the job of getting the process's data to the disk, which saves the process the time it would spend on data I/O!

buffer flush strategy

1. Flush immediately -> unbuffered

2. Flush on each line -> line buffered, such as the display

3. Flush when the buffer is full -> fully buffered, such as a disk file

The monitor is for people to read, so output is sent to it line by line. For a file, waiting until the buffer is full and writing everything at once is the most efficient!

On top of these there are two special cases: a flush forced by the user, such as fflush, and the flush that happens when the process exits
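
The three strategies correspond to the standard library's _IONBF, _IOLBF and _IOFBF modes, which can be selected with setvbuf. The sketch below uses hypothetical helper names (size_on_disk, size_before_flush) to check how much of one short line has reached the file before the stream is flushed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns the current size of `path` in bytes, or -1 on failure. */
long size_on_disk(const char *path)
{
    struct stat st;
    return (stat(path, &st) == 0) ? (long)st.st_size : -1;
}

/* Writes "hello\n" to `path` through a stream using buffering mode `mode`
   (_IONBF, _IOLBF or _IOFBF) and returns the file size seen before the
   stream is flushed or closed. Returns -1 on failure. */
long size_before_flush(const char *path, int mode)
{
    assert(path != NULL);
    static char buf[4096];            /* buffer handed to the stream */
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (setvbuf(fp, buf, mode, sizeof buf) != 0)
    {
        fclose(fp);
        return -1;
    }
    fputs("hello\n", fp);
    long size = size_on_disk(path);   /* what has reached the file so far */
    fclose(fp);                       /* flushes whatever is still buffered */
    return size;
}
```

With full buffering the six bytes are still in the stream's buffer, so the size before the flush should be 0; with line buffering or no buffering they should already be in the file.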

Where is the buffer? In the phenomenon above, the library functions' output appears twice while the system call's output appears once, so the buffer cannot be in the kernel. After fork, the data of parent and child is copied on write, so the child ends up with its own copy of the same buffered data: two copies in total.

To sum up: printf, fprintf and fputs have a buffer of their own, at the user-level language layer. Library functions sit above the system calls and wrap them, but write itself has no such buffer, so the buffer must have been added by the wrapper. Since this is C, it is provided by the C standard library.

The user-level buffer exists in the FILE structure

A library function uses the FILE* to find the corresponding FILE structure (which stores information about the file) in memory. The structure contains the buffer, the file descriptor and so on. The function writes the data into that buffer, and at the appropriate time the library uses the fd stored in the structure to flush the buffered data to the peripheral!

Finally, the reason for the phenomenon. When the output goes to stdout on the screen, it is line buffered, so before the fork the three library functions have already flushed their data to the screen. When fork creates the child, the parent's buffer no longer holds that data, so the child has nothing to copy. When the output is redirected to a file, it is fully buffered, so at the time of the fork the library functions' data is still in the buffer and the child gets its own copy of it. Both the parent and the child flush their buffers when they exit, so the library functions' output appears in the file twice. The system call write uses only the fd, not a FILE, so its output appears only once.
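
The phenomenon can be reproduced in a controlled way. This sketch does not redirect stdout; as an assumption made to keep it self-contained, it opens the file with fopen, which gives the same fully buffered stream. The function name count_library_lines is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() under strict compiler modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes one line with fprintf (kept in the FILE's buffer) and one with
   write (no user-level buffer), then forks. Both processes flush on fclose.
   Returns how many times the fprintf line ends up in `path`, or -1. */
int count_library_lines(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "w");      /* regular file: fully buffered */
    if (fp == NULL)
        return -1;

    fprintf(fp, "hello fprintf\n");   /* stays in the user-level buffer */
    const char *msg = "hello write\n";
    if (write(fileno(fp), msg, strlen(msg)) < 0) /* goes straight to the kernel */
    {
        fclose(fp);
        return -1;
    }

    pid_t pid = fork();               /* the buffered line now exists twice */
    if (pid < 0)
    {
        fclose(fp);
        return -1;
    }
    if (pid == 0)
    {
        fclose(fp);                   /* child flushes its copy */
        _exit(0);
    }
    fclose(fp);                       /* parent flushes its copy */
    waitpid(pid, NULL, 0);

    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    char line[64];
    int count = 0;
    while (fgets(line, sizeof line, in) != NULL)
    {
        if (strcmp(line, "hello fprintf\n") == 0)
            count++;
    }
    fclose(in);
    return count;
}
```

The write line is in the file once and comes first, while the fprintf line is flushed by both processes and so is counted twice.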

Write a buffer mechanism

mystdio.h

#pragma once

#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>

#define SIZE      1024 // buffer capacity in bytes
#define SYNC_NOW  1    // unbuffered
#define SYNC_LINE 2    // line buffered
#define SYNC_FULL 4    // fully buffered

typedef struct _FILE
{
    int flags;         // flush mode
    int fileno;        // file descriptor
    char buffer[SIZE]; // user-level buffer
    int cap;           // capacity
    int size;          // bytes currently in use
} FILE_;

FILE_ *fopen_(const char *path_name, const char *mode); // path, open mode
void fwrite_(const void *ptr, int num, FILE_ *fp);      // data, byte count, destination stream
void fflush_(FILE_ *fp);
void fclose_(FILE_ *fp);                                // stream to close

mystdio.c

#include "mystdio.h"

FILE_ *fopen_(const char *path_name, const char *mode) // path, open mode
{
    int flags = 0;
    int Moded = 0666; // permissions requested for a newly created file
    if (strcmp(mode, "r") == 0)
    {
        flags |= O_RDONLY;
    }
    else if (strcmp(mode, "w") == 0)
    {
        flags |= (O_WRONLY | O_CREAT | O_TRUNC);
    }
    else if (strcmp(mode, "a") == 0)
    {
        flags |= (O_WRONLY | O_CREAT | O_APPEND);
    }
    else
    {
        // TODO: other modes are not supported yet
        return NULL;
    }

    // O_RDONLY is 0, so it cannot be tested with '&'; test O_CREAT instead
    int fd = 0;
    if (flags & O_CREAT) fd = open(path_name, flags, Moded);
    else fd = open(path_name, flags);
    if (fd < 0)
    {
        const char *err = strerror(errno);
        write(2, err, strlen(err));
        return NULL;
    }

    FILE_ *fp = (FILE_ *)malloc(sizeof(FILE_));
    assert(fp); // assert on allocation failure
    // on success, initialize the structure
    fp->flags = SYNC_LINE;       // line buffered by default
    fp->fileno = fd;             // file descriptor
    fp->cap = SIZE;              // capacity
    fp->size = 0;                // bytes in use
    memset(fp->buffer, 0, SIZE); // zero the buffer
    return fp;                   // return the FILE_ pointer on success
}

void fwrite_(const void *ptr, int num, FILE_ *fp) // data, byte count, destination stream
{
    // not enough room left: flush what is already buffered
    if (num > fp->cap - fp->size) fflush_(fp);
    // data larger than the whole buffer bypasses it
    if (num > fp->cap)
    {
        write(fp->fileno, ptr, num);
        return;
    }

    // 1. copy into the buffer
    memcpy(fp->buffer + fp->size, ptr, num); // dest, src, size
    fp->size += num;

    // 2. decide whether to flush
    if (fp->flags & SYNC_NOW)
    {
        fflush_(fp); // write to the file right away
    }
    else if (fp->flags & SYNC_FULL)
    {
        if (fp->size == fp->cap) fflush_(fp);
    }
    else if (fp->flags & SYNC_LINE)
    {
        // only the last byte is checked; "abcd\nabab" is not handled
        if (fp->size > 0 && fp->buffer[fp->size - 1] == '\n') fflush_(fp);
    }
    else
    {
        // FAIL: unknown flush mode
    }
}

void fflush_(FILE_ *fp)
{
    if (fp->size > 0) write(fp->fileno, fp->buffer, fp->size);
    fp->size = 0; // the buffer is empty again
}

void fclose_(FILE_ *fp) // stream to close
{
    fflush_(fp);
    close(fp->fileno);
    free(fp); // release the structure allocated in fopen_
}

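try line buffering

With the default flags of SYNC_LINE, each fwrite_ call whose data ends in '\n' is written to the file right away.

Command line to clear the file: > file name

full buffer

With flags set to SYNC_FULL, nothing reaches the file until the buffer is full or fflush_ / fclose_ is called.

no buffer

With flags set to SYNC_NOW, every fwrite_ call is written to the file immediately.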

kernel buffer

According to the strategy above, writing data to disk would work like this: the library function printf or the system call write puts the data into the buffer, the buffer is flushed in the chosen mode (unbuffered, line buffered, fully buffered), and the data is copied to the file.

But that is not the whole story. **A language-level file function (fwrite etc.) first copies the data into the language-layer buffer. Then, according to the buffering mode (none, line, full), the system call write copies the data into the operating system's kernel-level buffer, which is reached through the struct file in the kernel. Finally the operating system copies the data to the disk in its own way (perhaps when memory runs short, perhaps at a regular interval).** Three copies in total.

So write functions such as the language-layer fwrite and the system call write would be better described as copy functions.

fsync write function

fsync forces a write to disk. Once the data is in the kernel buffer, calling this function stops waiting for the operating system to flush in its own way and forcibly synchronizes the file's data in the kernel buffer to the disk.
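
A minimal sketch of the two steps, with write_durably as a made-up helper name: write copies the data into the kernel buffer, and fsync then waits until the kernel has pushed it out to the storage device.

```c
#define _POSIX_C_SOURCE 200809L /* for fsync() under strict compiler modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes `text` to `path` and returns only after fsync reports that the
   data has been synchronized. Returns 0 on success, -1 on failure. */
int write_durably(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    int rc = 0;
    if (write(fd, text, len) != (ssize_t)len) /* user memory -> kernel buffer */
        rc = -1;
    else if (fsync(fd) != 0)                  /* kernel buffer -> disk */
        rc = -1;

    close(fd);
    return rc;
}
```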

The small program we wrote can be changed to use it as well.

void fflush_(FILE_ *fp)
{
    // copy the user-level buffer into the kernel buffer first
    if (fp->size > 0) write(fp->fileno, fp->buffer, fp->size);
    fsync(fp->fileno); // then force the kernel buffer out to the disk
    fp->size = 0;
}

Origin blog.csdn.net/m0_71841506/article/details/128783496