File I/O: File reading and writing (with Linux-5.15.10 kernel source code analysis)

Table of contents

1. What is a file offset?

1.1 Introduction to file offsets

1.2 Key points of file offset

1.3 How file offsets work

2. File offset setting

2.1 lseek function

2.2 lseek kernel source code analysis

3. Write files

3.1 write function

3.2 write kernel source code analysis

4. Read files

4.1 read function

4.2 read kernel source code analysis

5. File reading and writing, file offset setting sample code


1. What is a file offset?

1.1 Introduction to file offsets

Before introducing file offsets, keep one rule in mind: only by truly understanding file offsets can you understand file reading and writing.

A file offset is an indicator of the current position in a file that is being read or written.

In Linux, every open file has a file offset that records where in the file the next read or write operation will occur. The file offset is an integer value in bytes, calculated from the beginning of the file.

When a read or write operation is performed, the file offset changes.

  • A read operation starts reading data at the position indicated by the file offset and moves the file offset forward to the position just after the last byte read.
  • A write operation starts writing data at the position indicated by the file offset and moves the file offset forward to the position just after the last byte written.
  • By changing the file offset explicitly, you can jump to a specific location in the file before reading or writing.
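
The three points above can be observed directly from user space. The following is a minimal sketch, not taken from the original article: the helper name offset_demo and the file path passed to it are hypothetical. It records the offset after a write, after an explicit lseek, and after a read.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes 8 bytes to a fresh file, seeks to offset 2,
 * then reads 3 bytes. The offset observed after each step is stored in
 * out[0..2]. Returns 0 on success, -1 on any failure. */
int offset_demo(const char *path, off_t out[3]) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    char buf[3];
    int rc = -1;
    if (write(fd, "abcdefgh", 8) != 8) goto done;
    out[0] = lseek(fd, 0, SEEK_CUR);   /* write moved the offset forward */
    out[1] = lseek(fd, 2, SEEK_SET);   /* explicit reposition */
    if (read(fd, buf, sizeof buf) != 3) goto done;
    out[2] = lseek(fd, 0, SEEK_CUR);   /* read moved the offset forward */
    rc = 0;
done:
    close(fd);
    return rc;
}
```

On a regular file the recorded offsets are 8, 2, and 5: the write advances the offset by 8 bytes, lseek sets it to 2, and the 3-byte read advances it to 5.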

1.2 Key points of file offset

Regarding file offsets, we need to pay attention to the following points. Only by fully mastering the following points can we read and write files correctly:

  • 1. The file offset corresponds to the f_pos member of the struct file object. The write, read, and lseek functions all share this member, so each of them changes the value of f_pos.
  • 2. If the O_APPEND flag is passed to the open function, it changes how the write function uses f_pos. For details, see the write kernel source code analysis in section 3.2.

1.3 How file offsets work

(1) How file offset works under normal circumstances

 Figure 1-1 How file offset works under normal circumstances

(2) How file offset works when O_APPEND is set

 Figure 1-2  Working principle of file offset when O_APPEND is set

2. File offset setting

2.1 lseek function

#include <sys/types.h>
#include <unistd.h>

off_t lseek(int fd, off_t offset, int whence);

Function introduction: lseek is a Linux system call that changes the file offset (the read/write pointer) of an open file. Because the offset can be moved to any position in the file, lseek makes random access to the file possible.

Function parameters:

fd: File descriptor, specifying the file to be operated on.

offset: Offset, specifying the number of bytes to move relative to the position selected by whence. A positive value moves towards the end of the file; a negative value moves towards the beginning of the file.

whence: starting position, specifying the reference position of the offset. It can take the following three values:

  • SEEK_SET: Calculate the offset from the beginning of the file.
  • SEEK_CUR: Calculate the offset from the current read and write pointer position.
  • SEEK_END: Calculate the offset from the end of the file.

lseek parameter analysis:

Figure 2-1 lseek parameter analysis

Function return value:

Success: Returns the new file offset, measured in bytes from the beginning of the file.

Failure: Returns -1 and sets errno.
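
To make the whence values concrete, here is a small sketch that uses SEEK_END in two ways. The helper names file_size and read_tail are hypothetical and do not come from the original article; only lseek and read are real calls.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of the file in bytes, or -1 on
 * failure. Seeking 0 bytes from SEEK_END yields the end-of-file position.
 * Side effect: the file offset is left at the end of the file. */
off_t file_size(int fd) {
    return lseek(fd, 0, SEEK_END);
}

/* Hypothetical helper: reads the last n bytes of the file into buf, which
 * must hold at least n bytes. A negative offset combined with SEEK_END
 * moves back from the end of the file.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t read_tail(int fd, void *buf, size_t n) {
    if (lseek(fd, -(off_t)n, SEEK_END) == -1) return -1;
    return read(fd, buf, n);
}
```

If n is larger than the file, the seek would land before the start of the file, so lseek fails and read_tail returns -1 without reading anything.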

2.2 lseek kernel source code analysis

 Figure 2-2 lseek kernel source code analysis

The main process of the lseek kernel source code is shown in Figure 2-2. The main job of the lseek function is to update the f_pos member of the struct file object. Because everything in Linux is a file, the concrete lseek implementation differs from one file type to another.

Once the main processing of lseek, write, or read has finished successfully, the new position must be stored back into the file object (f.file->f_pos = pos) so that f_pos reflects the result of the operation.
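
The store-back step can be illustrated with a toy user-space model. This is not kernel code: struct model_file, model_vfs_write, and model_sys_write are hypothetical names, and the model only assumes the pattern described above, in which the operation works on a position value pos and the wrapper copies it into f_pos afterwards.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_CAPACITY 64

/* Hypothetical stand-in for struct file: only the offset and an in-memory
 * "file body" are modeled. */
struct model_file {
    long long f_pos;              /* plays the role of file->f_pos */
    size_t size;                  /* current file size in bytes */
    char data[MODEL_CAPACITY];
};

/* Plays the role of the per-file-type write: copies bytes at *ppos and
 * advances *ppos, never touching f->f_pos directly. Returns the number of
 * bytes written, or -1 if the position is invalid. */
static long model_vfs_write(struct model_file *f, const char *buf,
                            size_t count, long long *ppos) {
    if (*ppos < 0 || (size_t)*ppos > MODEL_CAPACITY) return -1;
    size_t room = MODEL_CAPACITY - (size_t)*ppos;
    if (count > room) count = room;          /* short write when full */
    memcpy(f->data + *ppos, buf, count);
    *ppos += (long long)count;
    if ((size_t)*ppos > f->size) f->size = (size_t)*ppos;
    return (long)count;
}

/* Plays the role of the system-call wrapper: work on a local copy of the
 * offset, then store it back only if the operation succeeded. */
long model_sys_write(struct model_file *f, const char *buf, size_t count) {
    long long pos = f->f_pos;
    long ret = model_vfs_write(f, buf, count, &pos);
    if (ret >= 0)
        f->f_pos = pos;                      /* the "f_pos = pos" step */
    return ret;
}
```

The design point the model tries to show is that a failed operation leaves f_pos untouched, because only the local copy was modified.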

3. Write files

3.1 write function

#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count);

Function introduction: Writes data to the file or device referred to by a file descriptor.

Function parameters:

fd: File descriptor, the identifier of the file or device to be written. Usually the file descriptor returned after the open function opens the file or device is used as a parameter.

buf: Pointer to the buffer that holds the data to be written. The data is copied from this buffer into the file referred to by fd, starting at the current file offset.

count: The number of bytes to be written, that is, the length of data written from the buffer.

Function return value:

Success: Returns the number of bytes actually written, which may be less than count.

Failure: Returns -1 and sets errno.
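
Because write returns the number of bytes actually written, callers that need the whole buffer on disk usually loop until everything has been written. The following is one possible sketch; write_all is a hypothetical helper name, not part of the original article or of any library.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes exactly count bytes from buf to fd, retrying
 * after short writes and after interruption by a signal (EINTR).
 * Returns 0 on success, -1 on failure with errno set by write(). */
int write_all(int fd, const void *buf, size_t count) {
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR) continue;   /* interrupted: try again */
            return -1;
        }
        p += n;                             /* skip what was written */
        count -= (size_t)n;
    }
    return 0;
}
```

Each successful write call advances the file offset by the returned amount, so the next iteration continues exactly where the previous one stopped.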

3.2 write kernel source code analysis

 Figure 3-1 Analysis of write function kernel source code

The main process of the write function kernel source code is shown in Figure 3-1. The main work of the write function is to write data to a file and update the f_pos member of the struct file object. The write implementation differs from one file type to another and has to be analyzed case by case.

An important property of the write function is that when the file was opened with the O_APPEND flag, every write starts at the end of the file. To implement this, write does not start at the position stored in the f_pos member of the struct file object. Instead, it recalculates pos, setting it to the current size of the file, and writes from that position. Reading the kernel source code makes the working principle of the O_APPEND flag clear.
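
The O_APPEND behavior described above can be checked from user space. In this sketch the helper name append_demo and the path passed to it are hypothetical; the calls themselves (open, write, lseek, close) are the standard ones.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes 10 bytes, seeks back to offset 0, writes 2
 * more bytes, and reports the final file size through *size_out.
 * Without O_APPEND the second write overwrites bytes 0-1, so the size
 * stays 10. With O_APPEND it lands at the end of the file, so the size
 * becomes 12. Returns 0 on success, -1 on failure. */
int append_demo(const char *path, int use_append, off_t *size_out) {
    int flags = O_RDWR | O_CREAT | O_TRUNC | (use_append ? O_APPEND : 0);
    int fd = open(path, flags, 0644);
    if (fd == -1) return -1;

    int rc = -1;
    if (write(fd, "aaaaaaaaaa", 10) != 10) goto done;
    if (lseek(fd, 0, SEEK_SET) == -1) goto done;   /* try to rewind */
    if (write(fd, "bb", 2) != 2) goto done;
    *size_out = lseek(fd, 0, SEEK_END);
    if (*size_out != -1) rc = 0;
done:
    close(fd);
    return rc;
}
```

The lseek call succeeds in both modes; with O_APPEND the offset it sets is simply not the position used by the following write.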

4. Read files

4.1 read function

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);

Function introduction: Reads data from the file or device referred to by a file descriptor.

Function parameters:

fd: File descriptor, the identifier of the file or device to be read. Usually the file descriptor returned after the open function opens the file or device is used as a parameter.

buf: Pointer to the buffer that receives the data. Data is read into this buffer from the file referred to by fd, starting at the current file offset.

count: The maximum number of bytes to read. The buffer must be at least this large.

Function return value:

Success: Returns the number of bytes actually read, which may be less than count. A return value of 0 means end of file.

Failure: Returns -1 and sets errno.
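
Since read may return fewer bytes than requested and returns 0 at end of file, a typical caller loops until it sees 0. The following sketch shows that loop; count_bytes is a hypothetical helper name chosen for illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: reads from fd until end of file and returns the
 * total number of bytes consumed, or -1 on error. A return value of 0
 * from read() signals end of file; a positive value smaller than the
 * request is a normal short read. */
long count_bytes(int fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0) break;                   /* end of file */
        if (n == -1) {
            if (errno == EINTR) continue;    /* interrupted: try again */
            return -1;
        }
        total += n;
    }
    return total;
}
```

On a regular file each read moves the file offset forward by the number of bytes returned, so after the loop the offset sits at the end of the file.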

4.2 read kernel source code analysis

 Figure 4-1 Read function kernel source code analysis

The main process of the read function kernel source code is shown in Figure 4-1. The main work of the read function is to read data from the file and update the f_pos member of the struct file object. The read implementation differs from one file type to another and has to be analyzed case by case.

5. File reading and writing, file offset setting sample code

This example reproduces the sequences shown in Figure 1-1 (normal case) and Figure 1-2 (O_APPEND set).

#include <sys/types.h>
#include <sys/stat.h>
#include <stdbool.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TEST_FILE "/tmp/test.txt"
#define BUF_SIZE (256)
#define READ_BUF_SIZE (2048)

/* Print the current file offset without moving it. */
void print_pos(int fd) {
    off_t pos = lseek(fd, 0, SEEK_CUR);
    if (pos == (off_t)-1) {
        perror("lseek error");
        return;
    }
    printf("cur pos:%lld\n", (long long)pos);
}

/* Write len copies of ch at the current offset (len <= BUF_SIZE). */
int write_len_data(int fd, size_t len, char ch) {
    char sbuf[BUF_SIZE];
    if (len > BUF_SIZE) return -1;
    memset(sbuf, ch, len);

    ssize_t ret = write(fd, sbuf, len);
    if (ret == -1) {
        perror("write error");
        return -1;
    }
    return 0;
}

/* Read up to len bytes at the current offset; the data is discarded. */
int read_len_data(int fd, size_t len) {
    char rbuf[READ_BUF_SIZE];
    if (len > READ_BUF_SIZE) return -1;

    ssize_t ret = read(fd, rbuf, len);
    if (ret == -1) {
        perror("read error");
        return -1;
    }
    return (int)ret;
}

int fpos_test(bool append) {
    int flags = O_RDWR | O_CREAT | O_TRUNC;
    if (append) {
        flags |= O_APPEND;
    }
    int fd = open(TEST_FILE, flags, 0644);
    if (fd == -1) {
        perror("open error");
        return -1;
    }

    write_len_data(fd, 100, 'a');
    print_pos(fd);                 /* 100 */
    lseek(fd, 10, SEEK_SET);
    read_len_data(fd, 40);
    print_pos(fd);                 /* 50 */
    write_len_data(fd, 20, 'b');
    print_pos(fd);                 /* 70 normally, 120 with O_APPEND */

    close(fd);
    return 0;
}

int main(void) {
    fpos_test(false);   /* Figure 1-1: normal case */
    fpos_test(true);    /* Figure 1-2: O_APPEND set */
    return 0;
}

Origin blog.csdn.net/weixin_28673511/article/details/131727965