[C language] How much do you know about file operations?

When a program runs, its data lives in memory. Memory is volatile, so that data is lost as soon as the program exits. Many programs, such as an address book or an information management system, need to keep the data that was entered and read it back later. How should we do this?

The usual ways to make data persistent are to store it in files on disk, in a database, and so on.

In this issue, we look at how to store data in files. With files, a program can write data to the computer's disk so that it persists after the program exits.

2. What is a file

2.1 What is a file

Everyone is familiar with files: whatever is stored on the C drive, D drive, or any other disk is a file. In programming, however, we usually distinguish two kinds of files by their function: program files and data files.

2.2 Program files

Program files include the source files we write (suffix .c, .cpp, etc.), object files (suffix .obj on Windows), and executable files (suffix .exe on Windows).

2.3 Data files

A data file does not contain the program itself. It holds the data that the program reads or writes while it runs, such as a file the program reads its input from or a file it writes its output to.

In this installment we'll be discussing data files.

So far, we have used the terminal for input and output: we read data from the keyboard and wrote data to the display. In fact, the keyboard and the display can also be treated as files.

In other cases, we write information to the disk and read it back into memory when needed. Here we are dealing with files on the disk.

2.4 File name

A file must have a unique file identifier for user identification and reference.

The file identification consists of three parts: file path + file name trunk + file suffix

For example, c:\code\test.txt is a file identifier. Here c:\code\ is the file path, test is the main part of the file name, and txt is the file suffix, which indicates a text file.

Aside: For convenience, the file identifier is often simply called the file name.

3. File opening and closing

3.1 File Pointer

Every file in use gets a corresponding file information area in memory that stores information about the file (such as its name, its status, and the current position in the file). This information is kept in a structure variable whose type is declared by the system and named FILE. A file pointer is a pointer of type FILE*.

For example, in the stdio.h header file provided by the VS2013 compilation environment, there are the following FILE type declarations:

struct _iobuf 
{
    char *_ptr;
    int   _cnt;
    char *_base;
    int   _flag;
    int   _file;
    int   _charbuf;
    int   _bufsiz;
    char *_tmpfname;
};
// type alias
typedef struct _iobuf FILE;

Note: The members of the FILE type differ from one C implementation to another, but they serve broadly the same purpose.

Whenever a file is opened, the system automatically creates a FILE structure variable for it and fills in the information, so the user does not need to care about the details. We generally work with this structure variable (the file information area) through a FILE* file pointer. A file pointer variable can be created as follows:

FILE* pf; // file pointer variable

Here pf is a pointer variable that points to data of type FILE. Once pf points to the file information area of a file (a structure variable of type FILE), the file can be accessed through the contents of that area. In other words, the associated file can be found through the file pointer pf.

3.2 Opening and closing of files

A file must be opened before it is read or written, and it must be closed after use.

When a program opens a file, a pointer of type FILE* is returned that points to the file's information area, which establishes the relationship between the pointer and the file.

ANSI C specifies that the fopen function opens a file and the fclose function closes it.

The opening method is shown in the table below:

Mode	Meaning	If the specified file does not exist
"r" (read-only)	Open an existing text file for input	error
"w" (write-only)	Open a text file for output (existing content is discarded)	create a new file
"a" (append)	Append data to the end of a text file	create a new file
"rb" (read-only)	Open an existing binary file for input	error
"wb" (write-only)	Open a binary file for output (existing content is discarded)	create a new file
"ab" (append)	Append data to the end of a binary file	create a new file
"r+" (read and write)	Open an existing text file for reading and writing	error
"w+" (read and write)	Create a new text file for reading and writing (existing content is discarded)	create a new file
"a+" (read and write)	Open a text file for reading and for writing at the end of the file	create a new file
"rb+" (read and write)	Open an existing binary file for reading and writing	error
"wb+" (read and write)	Create a new binary file for reading and writing (existing content is discarded)	create a new file
"ab+" (read and write)	Open a binary file for reading and for writing at the end of the file	create a new file

For example, we can open file.txt for writing:

#include <stdio.h>
#include <stdlib.h>
int main()
{
	FILE* fp;
	// open the file
	fp = fopen("file.txt", "w");
	// opening failed
	if (fp == NULL)
	{
		printf("failed to open file\n");
		exit(-1);
	}
	// opening succeeded
	// write to the file
	fputs("fopen example", fp);

	// close the file
	fclose(fp);
	fp = NULL;
	return 0;
}

After we compile and run the code, file.txt appears in the current project directory, and it contains the text we wrote: fopen example.

4. Sequential reading and writing of files

4.1 Function Summary

The C language provides us with many functions to implement sequential read and write operations on files, as shown in the following table:

Function name	Function	Applies to
fgetc	character input function	all input streams
fputc	character output function	all output streams
fgets	text line input function	all input streams
fputs	text line output function	all output streams
fscanf	formatted input function	all input streams
fprintf	formatted output function	all output streams
fread	binary input function	files
fwrite	binary output function	files

At this point some readers may think: so many functions, how can I remember them all? Don't forget that, as a programmer, you need to learn how to use your tools. One recommended website is cplusplus.com - The C++ Resources Network, where we can look up whatever function we need.

It gives us most of the information we need. Practice makes perfect: the more we use these functions, the more naturally we remember them. And if you forget one, just look it up.

4.2 printf/fprintf/sprintf

Let's compare this set of very similar functions:

1. We are already familiar with printf(), which formats data and writes it to the standard output device, that is, to our display.

2. The prototype of fprintf() is as follows: int fprintf(FILE *stream, const char *format, ...);

Compared with printf, it has one additional parameter of file pointer type. The fprintf() function formats data and writes it to a file, as follows:

#include <stdio.h>
#include <stdlib.h>
struct Student
{
	char name[20];
	char sex[5];
	int age;
};
int main()
{
	FILE* fp;
	struct Student s = { "Zhangsan", "male", 16 };
	// open the file
	fp = fopen("file.txt", "w");
	// opening failed
	if (fp == NULL)
	{
		printf("failed to open file\n");
		exit(-1);
	}
	// opening succeeded
	// write formatted output to the file
	fprintf(fp, "%s %s %d", s.name, s.sex, s.age);

	// close the file
	fclose(fp);
	fp = NULL;
	return 0;
}

The fprintf() function can write formatted data not only to a file but also to the display, just like printf. Simply pass stdout (the standard output stream) as the first argument, as follows:

#include <stdio.h>
struct Student
{
	char name[20];
	char sex[5];
	int age;
};
int main()
{
	struct Student s = { "Zhangsan", "male", 16 };
	// write formatted output to the display (standard output device)
	fprintf(stdout, "%s %s %d", s.name, s.sex, s.age);
	return 0;
}

3. The prototype of the sprintf() function is as follows: int sprintf(char *str, const char *format, ...);

Similarly, sprintf has one more parameter than printf, of type char*. We can guess: fprintf() formats data and writes it to a file, so does sprintf() format data and write it to a string? Congratulations, you guessed right!

The first parameter points to the target string. Otherwise sprintf is used in basically the same way as printf: it formats the data and writes the result into the target string, as follows:

#include <stdio.h>
int main()
{
	int a = 100;
	char c = 'a';
	char dest[20] = { 0 };
	// write the formatted output into the string dest
	sprintf(dest, "%d%c", a, c);
	// print the string dest
	printf("%s", dest);
	return 0;
}

The sprintf() function is useful in some scenarios. For example, when we need to convert data such as integers into strings, sprintf() makes this easy.

4.3 scanf/fscanf/sscanf

Like printf, scanf belongs to a group of three very similar functions:

1. The scanf() function reads formatted data from the standard input device, that is, from the keyboard, into memory.

2. The prototype of the fscanf() function is as follows: int fscanf(FILE *stream, const char *format, ...);

Compared with scanf, it has one additional parameter of file pointer type. The fscanf() function reads formatted data from a file into memory, as follows:

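#include <stdio.h>
#include <stdlib.h>
struct Student
{
	char name[20];
	char sex[5];
	int age;
};
int main()
{

	FILE* fp;
	struct Student s;
	// open the file for reading
	fp = fopen("file.txt", "r");
	// opening failed
	if (fp == NULL)
	{
		printf("failed to open file\n");
		exit(-1);
	}
	// opening succeeded
	// read formatted data from the file into the structure s
	// (the widths 19 and 4 keep the strings within the arrays)
	fscanf(fp, "%19s %4s %d", s.name, s.sex, &s.age);
	// display the contents of s
	printf("%s %s %d", s.name, s.sex, s.age);
	// close the file
	fclose(fp);
	fp = NULL;
	return 0;
}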

Like fprintf, fscanf can read formatted data not only from a file but also from the keyboard, just like scanf. Simply pass stdin (the standard input stream) as the first argument, as follows:

#include <stdio.h>
struct Student
{
	char name[20];
	char sex[5];
	int age;
};
int main()
{
	struct Student s;
	// read formatted data from the keyboard (standard input device)
	fscanf(stdin, "%19s %4s %d", s.name, s.sex, &s.age);
	// display the contents of s
	printf("%s %s %d", s.name, s.sex, s.age);
	return 0;
}

3. The prototype of the sscanf() function is as follows: int sscanf(const char *s, const char *format, ...);

It reads formatted data from a string into the corresponding locations in memory. The first parameter s points to the first character of a string, and this string is the data source, as follows:

#include <stdio.h>
struct Student
{
	char name[20];
	char sex[5];
	int age;
};
int main()
{
	char src[20] = "wangwu male 19";
	struct Student s;
	// read formatted data from the string src into the structure s
	sscanf(src, "%19s %4s %d", s.name, s.sex, &s.age);

	// print the contents of s
	printf("%s %s %d", s.name, s.sex, s.age);
	return 0;
}

Using the sprintf() function, we can convert data such as integers into strings, and using the sscanf() function, we can convert strings into data such as integers.

5. Random reading and writing of files

5.1 Introduction

The functions above read and write files sequentially: after each read or write, the file pointer (the current read/write position in the file) moves on to the next position. Next, we introduce some functions for random access. With these functions we can operate on files more flexibly and move the file pointer back and forth as we like.

5.2 fseek()

The prototype of fseek() is: int fseek(FILE *stream, long int offset, int origin);

It repositions the file pointer to the location given by a starting position (origin) plus an offset.

For the third parameter, the starting position, there are three choices, defined by three macros, as follows:

macro name	meaning
SEEK_SET	start of file
SEEK_CUR	The current position of the file pointer
SEEK_END	end of file

Here is an example of usage:

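#include <stdio.h>
int main()
{
	FILE* pFile;
	pFile = fopen("example.txt", "wb");
	if (pFile == NULL)
	{
		perror("fopen");
		return 1;
	}
	// write to the file
	fputs("This is an apple.", pFile);
	// move the file pointer to 9 bytes after the start of the file, i.e. to the character 'n'
	fseek(pFile, 9, SEEK_SET);
	// write from the current position, overwriting the existing content
	fputs(" sam", pFile);
	// the file now contains "This is a sample."
	fclose(pFile);
	return 0;
}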

Notes on using the fseek() function:

For a binary file opened in binary mode: the new position of the file pointer can be any of the three starting positions above plus an offset.

For a text file opened in text mode: the offset should be 0 or a value returned by an earlier call to ftell(), and the starting position should be SEEK_SET. Other combinations are library- and system-specific, that is, not portable. Let's look at the ftell() function next.

5.3 ftell()

The prototype of ftell() is: long int ftell(FILE *stream);

It returns the offset of the file pointer relative to the start of the file.

The function has a single parameter, the FILE* of the file in question. On success, it returns the current offset relative to the beginning of the file; on failure, it returns -1L.

Here is an example of usage:

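#include <stdio.h>
int main()
{
	FILE* pFile;
	pFile = fopen("example.txt", "wb");
	if (pFile == NULL)
	{
		perror("fopen");
		return 1;
	}
	// write to the file
	fputs("This is an apple.", pFile);
	printf("current file pointer offset: %ld\n", ftell(pFile));
	// move the file pointer to 9 bytes after the start of the file, i.e. to the character 'n'
	fseek(pFile, 9, SEEK_SET);
	printf("file pointer offset after fseek: %ld\n", ftell(pFile));
	// write from the current position, overwriting the existing content
	fputs(" sam", pFile);
	fclose(pFile);
	return 0;
}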

5.4 rewind()

The prototype of rewind() is: void rewind(FILE *stream); It moves the file pointer back to the start of the file.

Here is an example of usage:

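#include <stdio.h>
int main()
{
    int n;
    FILE* pFile;
    char buffer[27];
    // open a text file for reading and writing
    pFile = fopen("myfile.txt", "w+");
    if (pFile == NULL)
    {
        perror("fopen");
        return 1;
    }
    // write the letters 'A'-'Z' to the file
    for (n = 'A'; n <= 'Z'; n++)
    {
        fputc(n, pFile);
    }
    // move the file pointer back to the start
    rewind(pFile);
    // read 26 bytes from the start of the file into buffer, i.e. the letters 'A'-'Z'
    fread(buffer, 1, 26, pFile);
    // close the file
    fclose(pFile);
    buffer[26] = '\0';
    // print buffer to the display
    puts(buffer);
    return 0;
}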

6. Text files and binary files

Depending on how the data is organized, data files are classified as text files or binary files. Both kinds were mentioned above, so what exactly are they?

Data is stored in memory in binary form. If we write the in-memory data to a file without any conversion, the file is a binary file. Opened directly in a text editor, its content is unreadable.

If the data is to be stored in the file as ASCII codes, it must be converted before it is written. A file stored as ASCII characters is a text file, and we can read it directly.

So, how is data stored in a file?

Characters are always stored as ASCII codes, while numerical data can be stored in either ASCII or binary form.

For example, if the integer 10000 is written to a file as ASCII text, it occupies 5 bytes (one byte per character). Written in binary form, it occupies only 4 bytes on disk (the size of an int on typical platforms).

We can test it out:

// test code
#include <stdio.h>
int main()
{
	int a = 10000;
	FILE* pf1 = fopen("test1.txt", "wb");
	FILE* pf2 = fopen("test2.txt", "w");
	if (pf1 == NULL || pf2 == NULL)
	{
		perror("fopen");
		return 1;
	}
	fwrite(&a, sizeof(a), 1, pf1); // write to file 1 in binary form
	fprintf(pf2, "%d", a); // write to file 2 as ASCII text
	fclose(pf1);
	fclose(pf2);
	pf1 = NULL;
	pf2 = NULL;
	return 0;
}

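Checking the properties of the two files confirms our expectation: test1.txt is 4 bytes long and test2.txt is 5 bytes long.

If you are still not sure, you can use the binary editor in VS to view the raw bytes of the two files. On a little-endian machine with a 4-byte int, test1.txt contains 10 27 00 00 (10000 is 0x2710 in hexadecimal), while test2.txt contains 31 30 30 30 30, the ASCII codes of the characters '1', '0', '0', '0', '0'.

In summary, the contents of the two files match our analysis, which confirms the conclusion above.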

7. Judgment of the end of file reading

7.1 Misunderstood feof()

Many people use the feof() function to decide whether a file has been read to the end, which is actually a mistake.

The feof() function does not tell us in advance whether the end of the file has been reached. Its purpose is to tell us, after reading has already stopped, why it stopped: because of a read error, or because the end of the file was reached. It returns a non-zero value if the end of the file was reached, and 0 otherwise.

Its counterpart is the ferror() function, which is also used to find out why reading stopped. It returns a non-zero value if a read error occurred, and 0 otherwise.

So, how do these two functions determine why reading stopped?

When we read a file and reach its end, an end-of-file indicator is set on the stream; if a read error occurs instead, an error indicator is set. The feof() and ferror() functions simply report whether the corresponding indicator is set.

7.2 How to determine the end of file reading

This depends on which function we use to read the file and on the type of file. Different functions signal the end of reading in different ways, for example:

For a text file, check whether the return value is EOF (fgetc function) or NULL (fgets function).

For a binary file, check whether the return value of fread() is less than the number of items requested.

7.3 Example of use

For text files, we can check like this:

#include <stdio.h>
#include <stdlib.h>
int main()
{
    int c; // note: int rather than char, so that EOF (-1) can be distinguished
    FILE* fp = fopen("test3.txt", "r");
    // opening failed
    if (!fp) 
    {
        perror("File opening failed");
        return EXIT_FAILURE;
    }
    // opening succeeded
    // fgetc returns EOF both on a read error and at the end of the file
    while ((c = fgetc(fp)) != EOF) // standard C I/O file reading loop
    {
        putchar(c);
    }
    // reading has stopped

    // find out why it stopped
    if (ferror(fp))
        puts("\nI/O error when reading");
    else if (feof(fp))
        puts("\nEnd of file reached successfully");
    fclose(fp);
    fp=NULL;
}

For binary files, we can check like this:

#include <stdio.h>
#define SIZE 5
int main(void)
{
    double a[SIZE] = { 1.0,2.0,3.0,4.0,5.0 };
    FILE* fp = fopen("test.bin", "wb"); // write in binary mode
    if (!fp)
    {
        perror("File opening failed");
        return 1;
    }
    fwrite(a, sizeof(a[0]), SIZE, fp); // write the array of doubles
    fclose(fp);
    double b[SIZE];
    fp = fopen("test.bin", "rb"); // read in binary mode
    if (!fp)
    {
        perror("File opening failed");
        return 1;
    }
    size_t ret_code = fread(b, sizeof * b, SIZE, fp); // read the array of doubles
    if (ret_code == SIZE) // everything was read successfully
    {
        puts("Array read successfully, contents: ");
        for (int n = 0; n < SIZE; ++n)
        {
            printf("%.1f ", b[n]);
        }
        putchar('\n');
    }
    else // reading stopped early, find out why
    { 
        if (feof(fp))
            printf("Error reading test.bin: unexpected end of file\n");
        else if (ferror(fp)) 
            perror("Error reading test.bin");
    }
    fclose(fp);
}

8. File buffer

8.1 Why there is a file buffer

The components of a computer run at very different speeds, and the CPU is far faster than any IO device. Suppose every read or write in our program went straight to the device. A series of small operations would then require just as many separate IO operations, which wastes a lot of CPU time. To ease this speed mismatch between the fast CPU and slow IO devices and to improve CPU efficiency, the concept of a file buffer was introduced.

It is like your mother cooking dinner. She finds there is not enough salt, so she sends you down to the convenience store to buy a packet. When you get back, she finds there is not enough oil and sends you down again for a bottle. You carry it up, out of breath, and then she says guests are coming today and there may not be enough food, so she asks you to go down once more and buy some fish and meat. By now you are surely wondering: couldn't you have told me everything you needed in one go?!

8.2 What is a file buffer

The CPU has the same complaint: stop this torture. Collect the data in one place first, and let me handle the IO in one go.

To grant the CPU's wish, an extra area of memory is set aside for each file in use. Whether data is being written from memory to disk or read from disk into memory, it goes through this area first. When data is written, it is sent to the disk only once the area is full; when data is read, a full block is read from the disk into the area, and the program's variables are then filled from it piece by piece. This area is the so-called file buffer. Its size is determined by the C implementation.

With the file buffer in place, the CPU laughed to the sky: Haha, I am finally free! And off it went, confidently, to shine elsewhere.

8.3 Feel the existence of the file buffer

We can verify the existence of the file buffer through the following program:

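#include <stdio.h>
#include <windows.h>
// test environment: Windows, VS2022
int main()
{
	FILE* pf = fopen("test4.txt", "w");
	if (pf == NULL)
	{
		perror("fopen");
		return 1;
	}
	fputs("abcdef", pf); // the data goes into the output buffer first
	printf("Sleeping 10 seconds: the data has been written, but test4.txt is still empty\n");
	Sleep(10000);
	printf("Flushing the buffer\n");
	fflush(pf); // flushing writes the data in the output buffer to the file (disk)
	// Note: fflush is defined for output streams; do not rely on it for input streams
	printf("Sleeping another 10 seconds: test4.txt now has content\n");
	Sleep(10000);
	fclose(pf);
	// Note: fclose also flushes the buffer when it closes the file
	pf = NULL;
	return 0;
}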

Run the program. While it sleeps for the first 10 seconds, the data we wrote is still in the file buffer, so if we open the file at this point it is empty.

After the 10 seconds, the program calls fflush() to flush the buffer, which sends the buffered data to the file on disk. If we open the file now, the data we wrote is there.

From this we can draw a conclusion:

Because of the file buffer, a C program needs to flush the buffer or close the file when it finishes working with a file, so that the data is sent from the buffer to its destination. Failing to do so can cause problems when reading and writing files.