【Linux】14. File buffer

1. Introducing the file buffer

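Consider a program that writes three lines with C library print functions and one line with the write system call, then calls fork just before it ends. Run directly, it prints 4 lines; with its output redirected to a file, the three C-library lines appear twice while the write line appears only once. This phenomenon can be explained once we understand the file buffer.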

2. Understanding the buffer

A buffer is essentially a region of memory. So who allocates that memory? Who owns it? And why does a buffer exist at all?
From earlier lessons we already knew that the operating system has the concept of a buffer. Yet the code we wrote never explicitly copied data into a buffer. So how does the data get there, and where is that buffer?
Rather than thinking of fwrite as a function that writes to a file, it is more accurate to think of it as a copy function: it copies data from the process into the "buffer" (or, eventually, the peripheral).

3. Buffer flush strategies

Suppose some data needs to be written to a peripheral (a disk file). Is it more efficient to write it all at once, or in many small pieces?
Answer: writing it all at once. Every write to a peripheral requires sending the peripheral a request. Issuing the request is cheap for the CPU, but the peripheral responds to CPU requests very slowly. A single write needs only one request, while many small writes need many requests, so they are necessarily less efficient.

The buffer chooses its flush strategy according to the specific device. There are usually three strategies:

  1. Flush immediately: no buffering
  2. Flush on each line: line buffering
  3. Flush when the buffer is full: full buffering

Disk files usually use full buffering, which is the most efficient.
The display uses line buffering, because its output is read by a person.
With full buffering, everything would appear in one burst when the buffer fills, which is hard for users to follow, and no buffering is too inefficient, so line buffering is the user-friendly choice.

Regardless of the strategy, there are special cases in which the buffer is flushed:

  1. The user forces a flush (calls the fflush interface)
  2. The process exits (buffers are generally flushed on exit)

4. Explaining the phenomenon

At the beginning, we ran a program that calls fork, redirected its output to a file, and used the result to introduce the file buffer. The data written through C interfaces was printed twice, while the data written through the system interface was printed only once. How does this phenomenon relate to the buffer?
First, this phenomenon must be caused by a buffer. Second, that buffer cannot live in the kernel; otherwise the data written by the system interface would also appear twice.
So all the buffers we have talked about so far are user-level buffers provided by the language layer (the C library).
This buffer belongs to stdout, stderr, and stdin, which are all file pointers (FILE*). FILE is a structure defined by the C library, and it contains both an fd (file descriptor) and a buffer.
That is why getting data out immediately requires a forced flush, fflush(file pointer), and why closing a file requires passing the file pointer to fclose(file pointer): the buffer lives inside the FILE.

4.1 The FILE structure in the C library source

The source code shows that the FILE structure encapsulates not only the file descriptor but also the buffer, along with information about how the file was opened.

With this understanding we can explain the phenomenon. Note that fork creates a child process just before the program ends.

  1. Without redirection, only 4 lines are printed. stdout uses line buffering by default when it refers to the display,
    so each line has already been flushed to the peripheral (the display) before fork runs, and the FILE (that is, the process) holds no pending data.
  2. With redirection, the target is no longer the display but an ordinary file, so the flush strategy becomes full buffering.
    Although the 3 C print calls end with \n,
    their output is not enough to fill the stdout buffer, so nothing is flushed yet.
    Then fork runs. The stdout buffer is part of the parent process's data; the child shares the parent's code and data, and
    both processes exit after fork. Whichever exits first flushes its buffer, which modifies it,
    and that modification triggers copy-on-write, so each process flushes its own copy and the data appears twice.
  3. Why doesn't write produce two copies?
    Because write plays no part in the process above: it has no FILE structure and writes through the file descriptor fd directly, so there is no C-provided buffer to duplicate!

5. Understanding buffers in depth

How should we understand the buffer? To understand it in depth, we will try encapsulating a file descriptor and a buffer into our own FILE.

5.1 Functional requirements

First encapsulate the file descriptor and the buffer, then implement the basic file operations:

  1. Write data: fwrite_
  2. Flush data: fflush_
  3. Close a file: fclose_
  4. Open a file: fopen_

For now we implement only these simple modules, since the goal is to understand the buffer.

5.2 Setting up the basic framework

[hx@hx my_stdio]$ ll
total 4
-rw-rw-r-- 1 hx hx 78 Jun  7 18:33 Makefile
-rw-rw-r-- 1 hx hx  0 Jun  7 18:33 myStdio.c
-rw-rw-r-- 1 hx hx  0 Jun  7 18:33 myStdio.h
[hx@hx my_stdio]$ cat Makefile 
main:main.c myStdio.c
	gcc -o $@ $^ -std=c99

.PHONY:clean
clean:
	rm -f main

5.3 Wrapping everything in _FILE

// Defined in myStdio.h
#pragma once

#include <stdio.h>

#define SIZE 1024

typedef struct _FILE
{
  int flags;         // flush policy: no / line / full buffering
  int fileno;        // file descriptor
  int capacity;      // total capacity of buffer
  int size;          // bytes currently used in buffer
  char buffer[SIZE]; // SIZE-byte buffer
} _FILE;

// Opening a file needs the file path and the open mode
_FILE *fopen_(const char *path_name, const char *mode);

// ptr: data to write to the file; num: number of bytes; fp: file pointer
void fwrite_(const void *ptr, int num, _FILE *fp);

// Close the file through its file pointer
void fclose_(_FILE *fp);

// Force a flush of the buffer through its file pointer
void fflush_(_FILE *fp);

5.4 Implementation of fopen_

[hx@hx my_stdio]$ cat myStdio.c
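#include "myStdio.h"

_FILE *fopen_(const char *path_name, const char *mode)
{
  int flags = 0;
  int defaultMode = 0666; // permissions for a newly created file (before umask)

  // Open the file read-only
  if (strcmp(mode, "r") == 0)
  {
    flags |= O_RDONLY;
  }
  // Open the file write-only, creating or truncating it
  else if (strcmp(mode, "w") == 0)
  {
    flags |= (O_WRONLY | O_CREAT | O_TRUNC);
  }
  // Open the file for appending, creating it if needed
  else if (strcmp(mode, "a") == 0)
  {
    flags |= (O_WRONLY | O_CREAT | O_APPEND);
  }
  else
  {
    // Only these 3 modes are implemented for now
    return NULL;
  }

  // File descriptor
  int fd = 0;
  // O_RDONLY is 0, so "flags & O_RDONLY" is always false: test O_CREAT instead.
  // A file that may be created needs the defaultMode permissions argument.
  if (flags & O_CREAT) fd = open(path_name, flags, defaultMode);
  else fd = open(path_name, flags);

  // Failed to open the file
  if (fd < 0)
  {
    // Record the error message
    const char *err = strerror(errno);
    // Write the error message to standard error (fd 2 / stderr)
    write(2, err, strlen(err));
    write(2, "\n", 1);
    // This is why fopen returns NULL on failure (the C library works the same way)
    return NULL;
  }

  // The file is open: allocate the _FILE on the heap
  _FILE *fp = (_FILE *)malloc(sizeof(_FILE));
  // Crude check: abort with an assertion failure if malloc failed
  assert(fp);

  // Default to line buffering
  fp->flags = SYNC_LINE;
  // Store the file descriptor
  fp->fileno = fd;
  fp->capacity = SIZE;
  fp->size = 0;
  // Zero the buffer so later writes start from a clean state
  memset(fp->buffer, 0, SIZE);

  // This is why opening a file returns a FILE* (the C library works the same way)
  return fp;
}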

5.5 Header includes and flush-mode definitions

[hx@hx my_stdio]$ cat myStdio.h
#pragma once 

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define SIZE 1024

// No buffering
#define SYNC_NOW  1
// Line buffering
#define SYNC_LINE 2
// Full buffering
#define SYNC_FULL 4

5.6 Implementation of fwrite_

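void fwrite_(const void *ptr, int num, _FILE *fp)
{
  // If the new data does not fit, write out what is already buffered first
  if (fp->size + num > fp->capacity)
  {
    write(fp->fileno, fp->buffer, fp->size);
    fp->size = 0;
  }
  // Data larger than the whole buffer bypasses it and goes straight to the fd
  if (num > fp->capacity)
  {
    write(fp->fileno, ptr, num);
    return;
  }

  // Copy the data into the buffer; fp->buffer + fp->size appends after
  // any data already in the buffer
  memcpy(fp->buffer + fp->size, ptr, num);
  // The buffer now holds num more bytes
  fp->size += num;

  // Decide whether to flush according to the flush policy
  // No buffering: flush immediately
  if (fp->flags & SYNC_NOW)
  {
    // Write the buffered data to the file
    write(fp->fileno, fp->buffer, fp->size);
    // Reset size to 0: the buffer is cleared lazily, not zeroed
    fp->size = 0;
  }
  // Full buffering
  else if (fp->flags & SYNC_FULL)
  {
    // Flush only when the buffer is full
    if (fp->size == fp->capacity)
    {
      write(fp->fileno, fp->buffer, fp->size);
      fp->size = 0;
    }
  }
  // Line buffering
  else if (fp->flags & SYNC_LINE)
  {
    // Flush when the last character is '\n'
    // (a case like "abcd\nefg" is not handled here)
    if (fp->size > 0 && fp->buffer[fp->size - 1] == '\n')
    {
      write(fp->fileno, fp->buffer, fp->size);
      fp->size = 0;
    }
  }
  else
  {
    // Do nothing
  }
}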

5.7 Implementation of fclose_ and fflush_

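void fflush_(_FILE *fp)
{
  // If the buffer holds data, write it to the file descriptor
  // (which may be a disk file or the display)
  if (fp->size > 0)
    write(fp->fileno, fp->buffer, fp->size);
  // The buffer is now empty
  fp->size = 0;
}

void fclose_(_FILE *fp)
{
  // Force a flush before closing the file
  fflush_(fp);
  // Close the file descriptor, which closes the file it refers to
  close(fp->fileno);
  // Release the _FILE allocated by fopen_
  free(fp);
}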

5.8 Test cases

1. Case 1


2. Case 2


3. Case 3


6. The complete path of data when a file is flushed

When the user writes "hello linux\n" to a file, the C interface fwrite first copies the data into the buffer inside the C library's FILE. Depending on the flush strategy, the C library then calls the underlying write interface, which copies the data into the kernel buffer for that file descriptor. Finally, the OS periodically flushes the kernel buffer to the peripheral (disk).
The data is therefore copied 3 times on its way to the peripheral: first into the C library buffer, second into the kernel buffer, and third onto the peripheral.
(So fwrite and write are, in essence, copy functions.)

How can we verify this process? We cannot observe it directly, but the interfaces make it visible:
the user calls fwrite to move data into the C buffer, and the C library calls the operating system's write interface to move it into the kernel buffer. What happens if the OS crashes during this process?
An OS crash means data still sitting in the kernel buffer never reaches the peripheral, so it is lost. What if the user has zero tolerance for data loss (say, a bank, where losing data has serious consequences)?
The operating system provides an interface for this: fsync, a forced flush.

7. Forced flushing in depth

[hx@hx my_stdio]$ man 2 fsync
FSYNC(2)                               Linux Programmer's Manual                               FSYNC(2)

NAME
       fsync, fdatasync - synchronize a file's in-core state with storage device

SYNOPSIS
       #include <unistd.h>

       int fsync(int fd);

Calling this interface tells the operating system not to follow its own flush strategy for this file: the data it has already received is flushed to the peripheral immediately.

7.1 Example

Calling fsync inside fflush_ is what makes it a real forced flush down to the device; fwrite (like write) on its own only copies data.

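void fflush_(_FILE *fp)
{
  // If the buffer holds data, write it to the file descriptor
  // (which may be a disk file or the display)
  if (fp->size > 0)
    write(fp->fileno, fp->buffer, fp->size);
  // Force the operating system to flush the data to the device
  fsync(fp->fileno);

  // After the flush, set size to 0: the buffer is now empty
  fp->size = 0;
}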



Origin blog.csdn.net/weixin_60915103/article/details/131082458