File I/O summary

Preface

Without accumulating small steps, you cannot travel a thousand miles; without gathering small streams, there can be no rivers or seas. Keep going...

 

1. Concept:

      File I/O here means unbuffered I/O: each read or write call invokes a system call in the kernel. These unbuffered file I/O functions are not part of ISO C; they are part of POSIX.1 and the Single UNIX Specification.

      Commonly used file I/O functions without buffering are: open, read, write, lseek, close.

      1. About system calls

      Every operating system kernel exposes a set of built-in functions called system calls. A system call passes a request from the application to the kernel, the kernel runs the corresponding kernel function to do the work, and the result is returned to the application. Taken together, these functions form the system's application programming interface (API): to write applications for a system, we call functions in its kernel through this interface. Without system calls an application has no kernel support, and users could not write large applications at all.

     2. About buffering

      Linux I/O on files falls into two kinds: unbuffered I/O and standard I/O (buffered). The following points should be clear:
      1) "Unbuffered" does not mean that read() and write() operate directly on the disk. Both are system calls, and the name only says that there is no buffer in user space. The kernel still caches the data; user code just cannot see that cache.
      2) Buffered and unbuffered are relative terms. When you write data to a file (that is, to disk), the kernel first copies the data into a buffer it maintains. As a simplified model, suppose this kernel buffer is 100 bytes long and you call the system function:
      ssize_t write(int fd, const void *buf, size_t count);
      with count = 10 bytes each time. You must call the function 10 times to fill the buffer. Until then the data stays in the kernel buffer and has not reached the disk; once the buffer is full, the actual I/O is performed and the data is written to disk.

      So, if unbuffered I/O still has a buffer in the kernel, what is buffered I/O?
      Buffered I/O is also called standard I/O. It follows the ANSI C standard I/O interface and does not depend on a particular kernel's system call interface, so it is highly portable. Standard I/O is typically used to reduce the number of read() and write() system calls. It adds a further buffer in user space; the standard I/O library allocates that buffer and chooses its size, so you need not manage it yourself. Continuing the example above:
      The kernel buffer (not a user-space buffer) is 100 bytes, and writing 100 bytes with the unbuffered write() in 10-byte pieces costs 10 system calls, which is inefficient. Now add a user-space buffer (the stream buffer), assumed to be 50 bytes long, and write the data with the standard C library function fwrite(). fwrite() copies the data into the stream buffer; when the stream buffer holds 50 bytes, the library calls write() to move them into the kernel buffer, from which they eventually reach the file (ultimately the disk). So the standard I/O function fwrite() still relies on the unbuffered write() underneath, but writing 100 bytes through a 50-byte stream buffer costs only two write() system calls, however many fwrite() calls were made.
      To summarize the two data paths:
      Unbuffered I/O: data -> kernel buffer -> disk
      Standard I/O: data -> stream buffer -> kernel buffer -> disk

(Reference above: https://blog.csdn.net/scottly1/article/details/24186719 )
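      The two data paths can be compared with a minimal sketch. The function names write_unbuffered and write_buffered, the 64-byte data array, and the STREAM_BUF_SIZE of 50 are assumptions made for illustration; setvbuf is the standard C call for supplying a stream buffer. How many write() calls the library issues for a given buffer size is implementation-dependent, so the sketch only counts the calls on the unbuffered path.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define STREAM_BUF_SIZE 50   /* assumed stream buffer size, as in the text */

/* Unbuffered path: every chunk goes to the kernel through its own write().
 * Returns the number of write() calls made, or -1 on error. */
int write_unbuffered(const char *path, size_t total, size_t chunk)
{
    char data[64] = {0};
    size_t done = 0;
    int calls = 0;
    int fd;

    if (chunk == 0 || chunk > sizeof data)
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    while (done < total) {
        size_t n = (total - done < chunk) ? total - done : chunk;
        if (write(fd, data, n) != (ssize_t)n) {
            close(fd);
            return -1;
        }
        done += n;
        calls++;
    }
    return close(fd) == 0 ? calls : -1;
}

/* Buffered path: fwrite() copies each chunk into a user-space stream
 * buffer; the library calls write() only when it decides to flush.
 * Returns the number of bytes accepted, or -1 on error. */
long write_buffered(const char *path, size_t total, size_t chunk)
{
    char stream_buf[STREAM_BUF_SIZE];   /* must stay alive until fclose() */
    char data[64] = {0};
    size_t done = 0;
    FILE *fp;

    if (chunk == 0 || chunk > sizeof data)
        return -1;
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (setvbuf(fp, stream_buf, _IOFBF, sizeof stream_buf) != 0) {
        fclose(fp);
        return -1;
    }
    while (done < total) {
        size_t n = (total - done < chunk) ? total - done : chunk;
        if (fwrite(data, 1, n, fp) != n) {
            fclose(fp);
            return -1;
        }
        done += n;
    }
    return fclose(fp) == 0 ? (long)done : -1;   /* fclose flushes the rest */
}
```

      Calling write_unbuffered(path, 100, 10) issues ten write() calls. Running write_buffered(path, 100, 10) under a tracer such as strace should show noticeably fewer.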

      3. File descriptor:
      To the kernel, every open file is referenced by a file descriptor. A file descriptor is a non-negative integer. When a process opens an existing file or creates a new one, the kernel returns a file descriptor to the process. To read or write the file, pass the descriptor returned by open or creat as an argument to read or write. By convention, UNIX shells associate file descriptor 0 with the standard input of a process, file descriptor 1 with standard output, and file descriptor 2 with standard error. In POSIX-compliant applications, the magic numbers 0, 1, and 2 should be replaced with the symbolic constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, which are defined in the header file <unistd.h>.
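      As a small illustration of passing a descriptor to write, here is a sketch with a hypothetical helper put_line (the name is an assumption, not part of any library). It works the same for the standard descriptors and for any descriptor returned by open.

```c
#include <string.h>
#include <unistd.h>

/* Writes the string msg to the file behind descriptor fd.
 * Returns what write() returns: bytes written, or -1 on error. */
ssize_t put_line(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}
```

      For example, put_line(STDOUT_FILENO, "to stdout\n") and put_line(STDERR_FILENO, "to stderr\n") send text to the two standard output streams without using the magic numbers 1 and 2.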

 

2. Common file I/O function analysis

      1. open function: opens or creates a file.
      1) Header file: #include <fcntl.h>
      2) Function prototype: int open(const char *pathname, int oflag, ... /* mode_t mode */);
      3) Return value: the file descriptor on success, -1 on error;
      4) Parameters:
                    const char *pathname: the name of the file to open or create;
                    int oflag: the options for this call, formed by OR-ing the constants below;
                    O_RDONLY open for reading only
                    O_WRONLY open for writing only
                    O_RDWR open for reading and writing
                    Exactly one of the three constants above must be specified. The following constants are optional:
                    O_APPEND Append to the end of the file on every write.
                    O_CREAT Create the file if it does not exist. This flag requires the third parameter, mode, which specifies the access permissions of the new file.
                    O_EXCL If O_CREAT is also specified and the file already exists, open fails. This tests whether the file exists and creates it if it does not, as a single atomic operation.
                    O_TRUNC If the file exists and is successfully opened for writing only or for reading and writing, its length is truncated to 0;
                    O_NOCTTY If pathname refers to a terminal device, the device is not made the controlling terminal of this process.
                    O_NONBLOCK If pathname refers to a FIFO, a block special file, or a character special file, this option sets non-blocking mode for the open operation and for subsequent I/O.

                    The third parameter, mode, is used only when a new file is created;
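      The atomic test-and-create behavior of O_CREAT together with O_EXCL can be sketched as follows. The helper name create_exclusive and the permission value 0644 are assumptions for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Creates path only if it does not exist yet.
 * Returns a write-only descriptor on success. If the file is already
 * there, returns -1 and open() sets errno to EEXIST. */
int create_exclusive(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

      Because the existence check and the creation happen inside one system call, two processes that race to create the same path cannot both succeed, which is why this pattern is often used for lock files.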

      5) mode:

        The mode argument gives the permission bits of the new file, written as a three-digit octal number of type mode_t. Each digit covers one class of user (owner, group, others) and is the sum of the weights of the permissions granted, as shown in the table below. For example, 764 means that the owner has read, write, and execute permission (4+2+1), the group has read and write permission (4+2), and other users have read permission (4). The umask of the process is a value in the same format; the permissions it names are removed from mode when the file is created.

        Weight   1st digit (owner)   2nd digit (group)   3rd digit (others)
        4        read                read                read
        2        write               write               write
        1        execute             execute             execute

      mode setting:

      Actual file permissions = requested mode & ~umask (the mask is inverted, then bitwise ANDed)
      For example, with a requested mode of 0777 and a umask of 0002:

      0777            binary 111 111 111
      0002            binary 000 000 010
      ~0002           binary 111 111 101
      0777 & ~0002    binary 111 111 101

      That is, the actual permissions are 0775.
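      The same calculation can be checked in code. This is a sketch under stated assumptions: effective_mode and created_mode are hypothetical helper names, and created_mode temporarily changes the process umask, which is not safe in a multithreaded program.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Pure arithmetic: the permission bits left after applying the mask. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & ~mask & 0777;
}

/* Creates path with the requested mode under the given umask and returns
 * the permission bits the new file actually received, or (mode_t)-1 on
 * error. The previous umask is restored before returning. */
mode_t created_mode(const char *path, mode_t requested, mode_t mask)
{
    struct stat st;
    mode_t old_mask = umask(mask);            /* umask returns the old mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);

    umask(old_mask);
    if (fd == -1)
        return (mode_t)-1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return (mode_t)-1;
    }
    close(fd);
    return st.st_mode & 0777;                 /* keep only rwx bits */
}
```

      With a requested mode of 0777 and a mask of 0002, both helpers are expected to yield 0775 on an ordinary file system, matching the worked example.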

     Example of creating a file using the open function: open(pathname, O_RDWR | O_CREAT | O_TRUNC, 0777);

      2. creat function: creates a new file.
      1) Header file: #include <fcntl.h>
      2) Function prototype: int creat(const char *pathname, mode_t mode);
      3) Return value: a file descriptor opened for writing only on success, -1 on error;
      Note: creat opens the new file write-only; it is equivalent to open(pathname, O_WRONLY | O_CREAT | O_TRUNC, mode). To create a temporary file, write it, and then read it back, you would have to call creat, close, and then open. Calling open as follows does the same in one step:
      open(pathname, O_RDWR | O_CREAT | O_TRUNC, mode);

      3. lseek function: explicitly sets the offset of an open file.
      Every open file has a "current file offset" associated with it, normally a non-negative integer that counts bytes from the beginning of the file. Read and write operations normally start at the current file offset and advance it by the number of bytes read or written. By default, the offset is set to 0 when a file is opened, unless the O_APPEND option is specified;
      1) Header file: #include <unistd.h>
      2) Function prototype: off_t lseek(int filedes, off_t offset, int whence);
      3) Return value: the new file offset on success, -1 on error;
      4) Parameters:
      int filedes: file descriptor;
      off_t offset: the offset in bytes, interpreted according to whence;
      int whence: If whence is SEEK_SET, the file offset is set to offset bytes from the beginning of the file.
                           If whence is SEEK_CUR, the file offset is set to its current value plus offset, which can be positive or negative.
                           If whence is SEEK_END, the file offset is set to the file length plus offset, which can be positive or negative.
      If lseek is successfully executed, it returns the new file offset. For this, the current offset of the open file can be determined in the following way:
      off_t currpos;
      currpos = lseek(fd, 0, SEEK_CUR);
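
      Building on the same idea, the three whence values can be combined to find a file's size without disturbing the current offset. This is a minimal sketch; the helper names file_size and size_of_path are assumptions for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the size of the file behind fd and restores the original
 * offset. Returns -1 on error, for example when fd is a pipe and
 * therefore cannot seek. */
off_t file_size(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);   /* remember current offset */
    off_t end;

    if (here == (off_t)-1)
        return (off_t)-1;
    end = lseek(fd, 0, SEEK_END);          /* offset of end == file length */
    if (end == (off_t)-1)
        return (off_t)-1;
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)
        return (off_t)-1;
    return end;
}

/* Convenience wrapper that opens the file by name. */
off_t size_of_path(const char *path)
{
    off_t size;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return (off_t)-1;
    size = file_size(fd);
    close(fd);
    return size;
}
```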

      4. close function: closes an open file.
      1) Header file: #include <unistd.h>
      2) Function prototype: int close(int filedes);
      3) Return value: 0 on success, -1 on error;
      Closing a file also releases all record locks that the process holds on it.
      When a process terminates, the kernel automatically closes all of its open files. Many programs rely on this and never call close explicitly;
      4) Parameters:
      int filedes: file descriptor;

      5. read function: reads data from an open file.
      1) Header file: #include <unistd.h>
      2) Function prototype: ssize_t read(int filedes, void *buf, size_t nbytes);
      3) Return value: the number of bytes read on success, 0 if the end of the file has been reached, -1 on error;
      Several situations can make the number of bytes actually read smaller than the number requested:
      When reading a regular file, the end of the file is reached before the requested number of bytes has been read. For example, if 30 bytes remain before the end of the file and 100 bytes are requested, read returns 30; the next call returns 0 (end of file);
      When reading from a terminal device, normally at most one line is read at a time.
      When reading from a network, buffering in the network may cause fewer bytes to be returned than requested.
      When reading from a pipe or FIFO that contains fewer bytes than requested, read returns only the bytes that are available.
      When reading from some record-oriented devices, at most one record is returned at a time.
     4) Parameters:
     int filedes: file descriptor;
     void *buf: buffer that receives the data read from the file;
     size_t nbytes: the number of bytes requested;
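
      Because read may return fewer bytes than requested, callers that need a full buffer usually loop. The following is a minimal sketch of such a loop; the name read_full is an assumption, and retrying on EINTR is one common policy rather than the only possible one.

```c
#include <errno.h>
#include <unistd.h>

/* Reads until nbytes bytes have arrived or end of file is reached.
 * Returns the number of bytes stored in buf (possibly fewer than
 * nbytes at end of file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t nbytes)
{
    char *p = buf;
    size_t got = 0;

    while (got < nbytes) {
        ssize_t n = read(fd, p + got, nbytes - got);
        if (n == 0)
            break;                 /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;          /* short read: keep going */
    }
    return (ssize_t)got;
}
```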

      6. write function: writes data to an open file.
      1) Header file: #include <unistd.h>
      2) Function prototype: ssize_t write(int filedes, const void *buf, size_t nbytes);
      3) Return value: the number of bytes written on success, -1 on error;
      The return value is usually equal to nbytes; otherwise an error has occurred. Common causes of write errors are a full disk or exceeding the file size limit of the process.
      For regular files, the write starts at the current file offset. If the O_APPEND option was specified when the file was opened, the file offset is set to the current end of the file before each write. After a successful write, the file offset is advanced by the number of bytes actually written.
      4) Parameters:
      int filedes: file descriptor;
      const void *buf: buffer that holds the data to be written;
      size_t nbytes: the number of bytes to write;
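
      For descriptors such as pipes and sockets, write can also accept only part of the data, so a retry loop is a common defensive pattern. This is a sketch under that assumption; the name write_full is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Keeps calling write() until all nbytes bytes have been accepted.
 * Returns nbytes on success or -1 on error. */
ssize_t write_full(int fd, const void *buf, size_t nbytes)
{
    const char *p = buf;
    size_t sent = 0;

    while (sent < nbytes) {
        ssize_t n = write(fd, p + sent, nbytes - sent);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n;         /* partial write: send the rest */
    }
    return (ssize_t)sent;
}
```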

      7. size_t and ssize_t
     1) size_t is defined by the C and C++ standards in <stddef.h> (among other headers). It is an unsigned integer type large enough to represent the size of any object. Its underlying type depends on the platform. On 32-bit architectures it is commonly defined as:
      typedef unsigned int size_t;
      while on 64-bit architectures (for example 64-bit Linux) it is commonly defined as:
      typedef unsigned long size_t;
      so size_t is typically 4 bytes on a 32-bit architecture and 8 bytes on a 64-bit architecture. Keep this in mind when compiling for different architectures.
      2) ssize_t is the signed counterpart of size_t. It is typically equivalent to int on a 32-bit machine and to long int on a 64-bit machine.

      3) Built-in C data types on 32-bit and 64-bit platforms: [table omitted in the source]
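
      Since the width of these types varies by platform, printing them with %d or %u is not portable. C99 provides the z length modifier for this purpose. A minimal sketch, with format_counts as a hypothetical helper name:

```c
#include <stdio.h>
#include <sys/types.h>

/* Formats a requested byte count (size_t) and an I/O result (ssize_t)
 * into out. %zu matches size_t on every platform; %zd is the matching
 * signed conversion and fits ssize_t on common platforms.
 * Returns what snprintf returns. */
int format_counts(char *out, size_t cap, size_t requested, ssize_t result)
{
    return snprintf(out, cap, "requested=%zu result=%zd", requested, result);
}
```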

      

 

3. Application examples of open, read, write, and close functions:

          Read up to a specified number of bytes from the source file and write the data that was read to the destination file.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define READ_MAX_SIZE 2048      /* maximum number of bytes to read from the source file */
#define WRITE_SIZE 100          /* number of bytes written per call to write */

int copy_info(char *str, char *dstr)
{
    int fp = -1;                       /* source file descriptor */
    int fd = -1;                       /* destination file descriptor */
    int ret = -1;
    char *buf = NULL;
    char *bufp = NULL;
    size_t max_bytes = READ_MAX_SIZE;  /* bytes still to read; decremented, so kept unsigned */
    size_t total_read_bytes = 0;       /* bytes actually read */
    size_t total_write_bytes = 0;      /* bytes actually written */
    size_t remainder = 0;              /* bytes left after the full-size writes */
    ssize_t read_bytes = 0;
    ssize_t write_bytes = 0;
    int count = 0;
    int i = 0;

    /* validate the arguments before acquiring any resource */
    if ((str == NULL) || (dstr == NULL))
    {
        printf("input parameter is NULL!\n");
        return -1;
    }

    buf = (char *) malloc(sizeof(char) * (READ_MAX_SIZE + 1));
    if (buf == NULL)
    {
        printf("[%s]%s %d malloc failed!\n", __FILE__, __func__, __LINE__);
        return -1;
    }
    memset(buf, '\0', sizeof(char) * (READ_MAX_SIZE + 1));
    bufp = buf;

    fp = open(str, O_RDONLY);
    if (0 > fp)
    {
        printf("[%s]%s %d open %s file failed!\n", __FILE__, __func__, __LINE__, str);
        goto out;
    }

    /* read may return fewer bytes than requested, so loop until the
       quota is used up or end of file (return value 0) is reached */
    while ((0 != max_bytes) && (read_bytes = read(fp, bufp, max_bytes)) != 0)
    {
        if (-1 == read_bytes)
        {
            if (EINTR == errno)
            {
                continue;              /* interrupted by a signal: retry */
            }
            printf("[%s]%s %d Read %s file failed!\n", __FILE__, __func__, __LINE__, str);
            goto out;
        }

        max_bytes -= read_bytes;
        bufp += read_bytes;
        total_read_bytes += read_bytes;
    }

    fd = open(dstr, O_WRONLY | O_CREAT | O_TRUNC, 0777);
    if (fd == -1)
    {
        printf("open file %s failed!\n", dstr);
        goto out;
    }

    bufp = buf;
    count = total_read_bytes / WRITE_SIZE;
    printf("[%s]%s %d, total_read_bytes = %zu ,count = %d\n", __FILE__, __func__, __LINE__, total_read_bytes, count);

    /* write the data in WRITE_SIZE-byte pieces */
    for (i = 0; i < count; i++)
    {
        write_bytes = write(fd, bufp, WRITE_SIZE);
        if (WRITE_SIZE != write_bytes)
        {
            printf("[%s]%s %d write %s file failed!\n", __FILE__, __func__, __LINE__, dstr);
            goto out;
        }

        bufp += write_bytes;
        total_write_bytes += write_bytes;
    }

    /* write whatever is left after the full-size pieces */
    remainder = total_read_bytes % WRITE_SIZE;
    if (remainder != 0)
    {
        write_bytes = write(fd, bufp, remainder);
        if (write_bytes != (ssize_t) remainder)
        {
            printf("[%s]%s %d write %s file failed!\n", __FILE__, __func__, __LINE__, dstr);
            goto out;
        }

        total_write_bytes += write_bytes;
        printf("[%s]%s %d remainder = %zd, total_write_bytes = %zu\n", __FILE__, __func__, __LINE__, write_bytes, total_write_bytes);
    }

    ret = 0;

out:
    /* single exit path: release everything that was acquired */
    if (fd != -1)
    {
        close(fd);
    }
    if (fp != -1)
    {
        close(fp);
    }
    free(buf);

    return ret;
}


/* Read up to READ_MAX_SIZE bytes from file1 and write them to file2 */
int main(int argc, char *argv[])
{
    int ret = 0;

    printf("%s %d argc:%d\r\n", __func__, __LINE__, argc);

    /* check argc first so that argv is only examined when it is valid */
    if ((argc < 3) || (argv[1] == NULL) || (argv[2] == NULL))
    {
        printf("input parameter is NULL!\n");
        return -1;
    }

    printf("argv0 = %s\r\n", argv[0]);
    printf("argv1 = %s\r\n", argv[1]);
    printf("argv1 = %s\r\n", argv[2]);

    ret = copy_info(argv[1], argv[2]);
    if (ret != 0)
    {
        printf("copy_info error!\n");
    }

    return ret;
}

      1) The source file is a binary file longer than 2048 bytes. The output is as follows:

      $ ./read_info test.bin result.bin
      main 120 argc:3
      argv0 = ./read_info
      argv1 = test.bin
      argv1 = result.bin
      [read_info.c]copy_info 73, total_read_bytes = 2048 ,count = 20
      [read_info.c]copy_info 103 remainder = 48, total_write_bytes = 2048
      $

      Make a binary comparison between the source file and the target file to see if it meets the requirements.

      

      2048 bytes of data were copied as required. The first 2048 bytes are exactly the same.

      2) The source file is a text file shorter than 2048 bytes. The output is as follows:

      $ ./read_info ifconfig ifconfig.bin
      main 120 argc:3
      argv0 = ./read_info
      argv1 = ifconfig
      argv1 = ifconfig.bin
      [read_info.c]copy_info 73, total_read_bytes = 952 ,count = 9
      [read_info.c]copy_info 103 remainder = 52, total_write_bytes = 952
      $

      Compare the source file and the destination file as text files to see whether they meet the requirements.

      2048 bytes were requested, but only 952 bytes were actually read, and 952 bytes were written to the destination file. The data read and the data written are exactly the same.

4. Notes

      Review the definitions of the read and write functions again:

       1. read function

       1) Header file: #include <unistd.h>
       2) Function prototype: ssize_t read(int filedes, void *buf, size_t nbytes);
       3) Return value: the number of bytes read on success, 0 if the end of the file has been reached, -1 on error;
       The number of bytes actually read can be smaller than the number requested. For example, when reading a regular file with 30 bytes left before the end of the file and 100 bytes requested, read returns 30; the next call returns 0 (end of file);
       4) Parameters:
       int filedes: file descriptor;
       void *buf: buffer that receives the data read from the file;
       size_t nbytes: the number of bytes requested;

       The return value of read tells you when the end of the file has been reached. Even if nbytes is larger than the amount of data left in the file, read returns only the bytes that exist, so it never reads past the end of the file. The caller must still make sure that buf has room for nbytes bytes.

       2. write function

       1) Header file: #include <unistd.h>
       2) Function prototype: ssize_t write(int filedes, const void *buf, size_t nbytes);
       3) Return value: the number of bytes written on success, -1 on error;
       The return value is usually equal to nbytes; otherwise an error has occurred. Common causes of write errors are a full disk or exceeding the file size limit of the process.
       For regular files, the write starts at the current file offset. If the O_APPEND option was specified when the file was opened, the file offset is set to the current end of the file before each write. After a successful write, the file offset is advanced by the number of bytes actually written.
      4) Parameters:
      int filedes: file descriptor;
      const void *buf: buffer that holds the data to be written;
      size_t nbytes: the number of bytes to write;

      Note the description of the return value of the write function. Unlike read, write has no way to tell where the valid data ends: it knows nothing about the size of buf and simply copies nbytes bytes starting at buf. If nbytes is larger than the buffer, write reads past the end of buf (undefined behavior in C) and may put whatever lies in the adjacent memory into the file. When calling write, keep nbytes within the amount of valid data in buf.
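
      One way to guard against this mistake is to pass the buffer size along with the requested length and clamp the request. This is only a sketch; write_bounded is a hypothetical helper, not a standard function.

```c
#include <unistd.h>

/* Writes at most buf_size bytes, even if the caller asks for more.
 * Returns what write() returns: bytes written, or -1 on error. */
ssize_t write_bounded(int fd, const void *buf, size_t buf_size, size_t nbytes)
{
    if (nbytes > buf_size)
        nbytes = buf_size;         /* never read past the end of buf */
    return write(fd, buf, nbytes);
}
```

      With a 10-byte buffer, write_bounded(fd, bufp, sizeof bufp, 100) writes 10 bytes instead of reading 90 bytes beyond the array.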

      In the following example, the number of bytes passed to write and fwrite is deliberately larger than the size of buf.

      

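#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* Deliberately incorrect: asks write/fwrite for 100 bytes from a 10-byte
 * buffer to show that neither function checks the buffer size. Reading
 * past the end of bufp is undefined behavior; never do this in real code. */
int main(void)
{
    int fd = -1;
    FILE *sp = NULL;
    ssize_t write_bytes = 0;
    char bufp[10] = {0};

    #if 1
    fd = open("111", O_WRONLY | O_CREAT | O_TRUNC, 0777);
    if (fd == -1)
    {
        printf("open file 111 failed!\n");
        return -1;
    }

    write_bytes = write(fd, bufp, 100);      /* reads 90 bytes beyond bufp */
    printf("[%s]%s %d write_bytes = %zd\n", __FILE__, __func__, __LINE__, write_bytes);
    close(fd);

    #else
    sp = fopen("112", "w");                  /* "w": the stream must be writable */
    if (sp == NULL)
    {
        printf("open file 112 failed!\n");
        return -1;
    }
    fwrite(bufp, sizeof(char), 100, sp);     /* same out-of-bounds read */
    fclose(sp);
    #endif
    return 0;
}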

      No errors are reported at compile time or at run time, yet a binary view of the output shows the problem: the first 10 bytes that write put into the target file 111 are the contents of bufp, and the remaining 90 bytes are whatever was stored in memory after bufp. This is clearly wrong.

      

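In the above figure, the 112 file is written using the fwrite function. Judging from the binary view, fwrite appears safe and does not seem to have written out-of-bounds bytes. However, fwrite does no bounds checking either: it copies size × count bytes starting at buf, so passing a count larger than the buffer is just as incorrect as with write. And even when every byte comes from inside the buffer, writing more than the valid data is still a problem in practice. The code and results are as follows: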

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

#define READ_MAX_SIZE 2048
#define WRITE_SIZE 100

int copy_info(char *str, char *dstr)
{
    int fp = -1;                       /* source file descriptor */
    int ret = -1;
    char *buf = NULL;
    char *bufp = NULL;
    size_t max_bytes = READ_MAX_SIZE;  /* bytes still to read; decremented, so kept unsigned */
    ssize_t read_bytes = 0;
    FILE *sp = NULL;

    if ((str == NULL) || (dstr == NULL))
    {
        printf("input parameter is NULL!\n");
        return -1;
    }

    buf = (char *) malloc(sizeof(char) * (READ_MAX_SIZE + 1));
    if (buf == NULL)
    {
        printf("[%s]%s %d malloc failed!\n", __FILE__, __func__, __LINE__);
        return -1;
    }
    memset(buf, '\0', sizeof(char) * (READ_MAX_SIZE + 1));
    bufp = buf;

    fp = open(str, O_RDONLY);
    if (0 > fp)
    {
        printf("[%s]%s %d open %s file failed!\n", __FILE__, __func__, __LINE__, str);
        goto out;
    }

    while ((0 != max_bytes) && (read_bytes = read(fp, bufp, max_bytes)) != 0)
    {
        if (-1 == read_bytes)
        {
            if (EINTR == errno)
            {
                continue;              /* interrupted by a signal: retry */
            }
            printf("[%s]%s %d Read %s file failed!\n", __FILE__, __func__, __LINE__, str);
            goto out;
        }

        max_bytes -= read_bytes;
        bufp += read_bytes;
    }

    /* Test whether fwrite is safe: the count is fixed at 1024 on purpose,
       regardless of how many bytes were actually read. The output always
       goes to the file "113"; dstr is not used in this test. */
    sp = fopen("113", "a");
    if (sp == NULL)
    {
        printf("[%s]%s %d open 113 file failed!\n", __FILE__, __func__, __LINE__);
        goto out;
    }
    fwrite(buf, sizeof(char), 1024, sp);
    fclose(sp);
    ret = 0;

out:
    if (fp != -1)
    {
        close(fp);
    }
    free(buf);

    return ret;
}


/* Read up to READ_MAX_SIZE bytes from file1, then write 1024 bytes of the buffer with fwrite */
int main(int argc, char *argv[])
{
    int ret = 0;

    printf("%s %d argc:%d\r\n", __func__, __LINE__, argc);

    /* check argc first so that argv is only examined when it is valid */
    if ((argc < 3) || (argv[1] == NULL) || (argv[2] == NULL))
    {
        printf("input parameter is NULL!\n");
        return -1;
    }

    printf("argv0 = %s\r\n", argv[0]);
    printf("argv1 = %s\r\n", argv[1]);
    printf("argv1 = %s\r\n", argv[2]);

    ret = copy_info(argv[1], argv[2]);
    if (ret != 0)
    {
        printf("file_copy error!\n");
    }

    return ret;
}

      2048 bytes were requested, but only 952 bytes were actually read; fwrite was asked to write 1024 bytes, and 1024 bytes were actually written to the target file. Of these, 952 bytes are useful data and the last 72 bytes are the zero bytes that memset placed in buf. These 72 bytes were not read out of bounds, but they are not data we wanted. So when using fwrite, pay attention to the number of bytes passed to it, just as with nbytes for write.
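
      One way to avoid the stray bytes is to pass fwrite the number of bytes that read actually delivered. The following is a minimal sketch of that idea; the name copy_head and the CAPACITY constant are assumptions for illustration, and EINTR handling is omitted for brevity.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CAPACITY 2048   /* assumed upper limit, as READ_MAX_SIZE in the text */

/* Copies up to CAPACITY bytes from src to dst.
 * Returns the number of bytes written, or -1 on error. */
long copy_head(const char *src, const char *dst)
{
    char buf[CAPACITY];
    size_t total = 0;
    ssize_t n = 0;
    FILE *out;
    int in = open(src, O_RDONLY);

    if (in == -1)
        return -1;
    /* accumulate short reads until the buffer is full or EOF is hit */
    while (total < sizeof buf &&
           (n = read(in, buf + total, sizeof buf - total)) > 0)
        total += (size_t)n;
    close(in);
    if (n == -1)
        return -1;

    out = fopen(dst, "w");
    if (out == NULL)
        return -1;
    /* write exactly the bytes that were read, not a fixed count */
    if (fwrite(buf, 1, total, out) != total) {
        fclose(out);
        return -1;
    }
    return fclose(out) == 0 ? (long)total : -1;
}
```

      For a 952-byte source file this writes 952 bytes, so the destination contains no padding from the buffer.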

 

References (thanks to the authors for sharing):

https://blog.csdn.net/scottly1/article/details/24186719

Advanced Programming in the UNIX Environment

 

 

 

 


Origin blog.csdn.net/the_wan/article/details/108307674