The difference and connection between Linux standard IO and file IO

Foreword:

      Most of this article is drawn from material found on the Internet. The scattered pieces are digested and summarized here for future reference.

      Refer to the following articles, thank you for sharing:

      Linux Quest I/O Efficiency

      https://cloud.tencent.com/developer/article/1018033

      Basic Essentials of Linux System Programming (4): The difference between C standard library IO buffer and kernel buffer     

      https://cloud.tencent.com/developer/article/1012299

Concepts:


       Standard I/O: Refers to the standard I/O library, also known as buffered I/O. It is specified by the ANSI C standard. The standard I/O library handles many details on the user's behalf, such as allocating the buffer and performing I/O in optimally sized chunks. The purpose of the buffer is to minimize the number of read and write system calls.
Common functions: fopen, fread, fwrite, fclose, printf, fprintf, scanf, sscanf, etc.
File I/O: Also known as unbuffered I/O, meaning that each read or write invokes a system call in the kernel. These unbuffered file I/O functions are not part of ISO C. They are part of POSIX.1 and the Single UNIX Specification. Common functions: open, read, write, lseek, close, etc.

Connection:


      A user program calls C standard I/O library functions to read and write ordinary files or devices. These library functions pass the read and write requests to the kernel through system calls, and the kernel drives the disk or device to complete the I/O operation. The C standard library allocates an I/O buffer for each open file to speed up reads and writes, and this buffer can be reached through the file's FILE structure. Most of the user's read and write calls operate on the I/O buffer; only a few need to pass a request to the kernel.
      Take fgetc / fputc as an example. When the user program calls fgetc to read a byte for the first time, fgetc may enter the kernel through a system call and read 1K bytes into the I/O buffer, then return the first byte in the buffer to the user and advance the read/write position to the second byte. Subsequent fgetc calls read directly from the I/O buffer without entering the kernel. Once the user has consumed all 1K bytes, the next fgetc call enters the kernel again to read another 1K bytes into the I/O buffer. In this scenario, the relationship between the user program, the C standard library, and the kernel resembles the relationship between the CPU, cache, and memory in the "Memory Hierarchy". The C standard library reads ahead from the kernel into the I/O buffer in the expectation that the user program will use the data later. The I/O buffer of the C standard library lives in user space, and reading data from user space is much faster than entering the kernel to read it.
      On the other hand, when the user program calls fputc, the byte is usually just written to the I/O buffer, so fputc can return quickly. If the I/O buffer is full, fputc passes its contents to the kernel through a system call, and the kernel eventually writes the data back to the disk or device. Sometimes the user program wants the data in the I/O buffer handed to the kernel immediately, so that the kernel can write it back to the device or disk. This is called a flush operation, and the corresponding library function is fflush. The fclose function also flushes before closing the file.


Differences:


1. The difference in buffer mechanism:
      As is well known, data exchange between the CPU and memory is much faster than disk operations. A caching mechanism reduces the number of disk reads and writes and improves the efficiency of programs that process data concurrently, so caching is an effective way to improve storage and processing efficiency.
From a macro point of view, the Linux operating system is divided into user mode and kernel mode, and both provide caches for I/O operations. The user-mode cache is called the standard I/O cache, or user-space cache; the kernel-mode cache is called the buffer cache, or page cache. If both provide caches, why do books still distinguish between unbuffered I/O and buffered I/O? The reason is that "unbuffered I/O" means only that these I/O operations are not buffered in user space; the kernel still buffers them.

2. The difference in I/O operation flow:


  As shown in the figure above, read and write operations from both user space and kernel space must go through the buffer cache. As mentioned earlier, the purpose of the cache is to reduce the number of disk reads and writes and improve I/O efficiency. First, consider the flow of system I/O when reading and writing a file.
2.1 File I/O:

It consists of kernel system calls and involves no user-space buffering. Taking the labels in the figure as an example:
      (3) Call the write function to write data to the file. The data to be written is stored in buf, as in write(fd, "abc", 3). BUFFSIZE must be chosen before the call, and different BUFFSIZE values affect I/O efficiency. This issue deserves a separate discussion.
      (5) Delayed write: When the cache is full or the kernel needs to reuse the buffer, the data is placed on the output queue. The actual disk write is triggered only when the data reaches the head of the queue.
      (6) Read-ahead: When the kernel detects a sequential read, it tries to read more data than the application asked for, on the assumption that the application will read that data soon. In this way, when the buffer runs out of data, it can be refilled quickly for the next read.
      (4) Call read to read the required data from the buffer cache into the logic unit for processing.
These are the four steps involved in system I/O.

2.2 Standard I/O:

      It consists of standard library functions defined by ISO C, which call the underlying system calls.
      (1) Write the data in the logic unit to the file. Depending on need, three kinds of functions can be called, for example fputc, fputs, and fwrite. With these functions the user does not need to control the buffer size manually; the buffer is allocated automatically. Once the user calls one of these I/O functions, the library calls malloc or a similar function to obtain a buffer according to the buffering type (fully buffered, line buffered, or unbuffered). This is the standard I/O buffer.
      (3)(5) When the user buffer is full, write is called, as in system I/O, to copy the data from the standard I/O buffer to the kernel buffer, from which it is then written to disk.
      (4)(6) As in system I/O, read is called to read from the kernel buffer into the user buffer.
      (2) Three kinds of functions can likewise be called, for example fgetc, fgets, and fread, to read the data into the logic unit for subsequent processing.
It can be seen that standard I/O is implemented on top of system I/O. One might therefore expect standard I/O to be less efficient than system I/O, but in practice it is not much slower, and it has many other advantages.

2.3 Use data flow to describe the difference between the two:
      Data flow for unbuffered I/O operations: data -> kernel buffer -> disk
      Data flow for standard I/O operations: data -> stream buffer -> kernel buffer -> disk

2.4 Advantages and disadvantages of standard I/O:
      1) One advantage of the standard I/O routines is that there is no need to think about buffering or the optimal I/O length, and they are not much slower than calling read and write directly.
      2) One source of inefficiency in the standard I/O library is the amount of data that has to be copied. When using fgets and fputs one line at a time, the data is usually copied twice: once between the kernel and the standard I/O buffer (when read and write are called), and once between the standard I/O buffer (usually allocated and managed by the library) and the line buffer in the user program (fgets takes a pointer to a user line buffer as a parameter).

3. Classic answer (the former is open, the latter is fopen):
      The former is low-level I/O, and the latter is high-level I/O.
      The former returns a file descriptor (a small integer), and the latter returns a file pointer.
      The former has no buffer; the latter has a buffer.
      The former is used together with read and write, and the latter is used together with fread and fwrite.
      The latter is built on the former, and in most cases the latter is used.
     The above summarizes the difference between open and fopen, which is mainly one of buffering: fopen is buffered and open is not. They also sit at different levels, and fopen is portable (it is part of ISO C), whereas open is not (it is part of POSIX rather than ISO C).
 


Origin blog.csdn.net/the_wan/article/details/108309703