Manipulating files in C

Anything stored on disk is a file.
In programming, however, we generally distinguish two kinds of files by what they are used for: program files and data files.

Program files: source files (suffix .c), object files (suffix .obj on Windows), and executable programs (suffix .exe on Windows).
Data files: the content is not a program but the data a program reads or writes while running, such as a file the program reads its input from or a file it writes its output to.
Sometimes we write information to disk and later read it back into memory when it is needed. In that case we are working with files on disk.

file name

A file must have a unique file identifier so that users can identify and refer to it.
The file identifier consists of 3 parts: file path + file name trunk + file suffix.
For example, in c:\code\test.txt the path is c:\code\, the trunk is test, and the suffix is .txt.
For convenience, the file identifier is often simply called the file name.

file opening and closing

Next, we will use C's file functions to create and open files.

file pointer:

In the buffered file system, the key concept is the "file type pointer", or "file pointer" for short.

Every file in use has a corresponding file information area in memory, which stores information about the file (such as its name, its status, and the current position in the file). This information is kept in a structure variable whose type is declared by the system and named FILE. The members of FILE differ between compilers; the declaration below is the one found in the stdio.h of older Visual Studio versions such as VS2013:

struct _iobuf {
    char *_ptr;
    int _cnt;
    char *_base;
    int _flag;
    int _file;
    int _charbuf;
    int _bufsiz;
    char *_tmpfname;
};
typedef struct _iobuf FILE;
FILE* pf; // file pointer variable

Whenever we open a file, the system creates a file information area for it in memory. This area is a FILE structure that records the file's information, much like an ID card records a person's basic details.

To access the file, we go through a pointer to this structure, the file pointer:

FILE* pf;

Opening and closing of files:

Opening a file returns a pointer of type FILE* that points to that file's information area.

Two functions control the opening and closing of files.

A file should be opened before it is read or written, and closed after use.

ANSI C specifies that fopen opens a file and fclose closes it:

// open a file
FILE * fopen ( const char * filename, const char * mode );

// close a file
int fclose ( FILE * stream );

The first parameter of fopen is the file name and the second is the mode, which determines how the file is opened. The common modes are:
"r": read only; the file must already exist, otherwise fopen fails
"w": write only; creates the file, or discards the contents of an existing one
"a": append; writes go to the end of the file, which is created if it does not exist
"rb", "wb", "ab": the same three modes for binary files
"r+", "w+", "a+": update modes that allow both reading and writing (add b, e.g. "rb+", for binary files)

The mode is a string, so don't forget the double quotes. fopen returns a pointer of type FILE*; because opening a file can fail, always check whether the return value is a null pointer.

Closing a file is similar to freeing memory obtained with malloc: after fclose, set the pointer to NULL, otherwise it is left as a wild (dangling) pointer.

Let's try the "w" mode and see what it does.

When a file is opened in "w" (write-only) mode and no file with that name exists, a new file with that name is created; if the file already exists, its contents are discarded.

#include <stdio.h>
int main()
{
    FILE* pFile;
    // open the file
    pFile = fopen("myfile.txt", "w");
    // file operations, only if fopen succeeded
    if (pFile != NULL)
    {
        fputs("fopen example", pFile);
        // close the file
        fclose(pFile);
        pFile = NULL;
    }
    return 0;
}

After running the code above, a file named myfile.txt appears in the project folder (the program's working directory), containing the text fopen example.

Sequential reading and writing of files:

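The sequential read/write functions work much like entries in an API manual: you simply call the one that fits the job.
fgetc / fputc: character input / output, for all input / output streams
fgets / fputs: text line input / output, for all input / output streams
fscanf / fprintf: formatted input / output, for all input / output streams
fread / fwrite: binary input / output, for files

For fgets (char * fgets ( char * str, int num, FILE * stream );), the characters read are stored in str, num is the maximum number of characters to store including the terminating '\0' (so at most num - 1 characters are read), and stream is the file to read from.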

fprintf writes formatted data to a file.

Make sure the format specifiers match the types of the arguments.

For example, if s is a structure, its members can be written to the file one after another with a single fprintf call.

fscanf reads formatted data from a file.

fscanf and fprintf are used in the same way as scanf and printf; the only difference is the extra first argument, the stream to read from or write to.

streams:

Why don't we need to call fopen first when we use printf and scanf, the way we do before using a file?

The answer lies in streams. When a C program starts, three streams are opened by default: stdin, stdout, and stderr.

They act like pre-opened channels: the input and output streams are already open, so printf and scanf can be used directly.

So what is a stream?

A computer has many peripherals, such as audio devices, the screen, and the keyboard. If a program had to control each piece of hardware directly, it would have to deal with every manufacturer's own communication protocol, and no programmer can master the control methods of all these devices. Streams were introduced to solve this, and C wraps them in a high-level interface that controls the hardware for us. In short, a stream is like a pipe whose inner workings we cannot see: we pour our requests in at one end, and the stream takes care of communicating with the hardware.

The information a program receives from the outside world comes through the standard input stream, stdin (usually the keyboard); the information it sends out goes through the standard output stream, stdout (usually the screen), and the standard error stream, stderr. As long as a C program is running, these three streams, each of type FILE*, are open by default. When we want to read or write a file, we open a file stream.

A stream is like a courier: all the delivery is handed over to it, and we only need to take care of sending and receiving.

Calling fopen before using a file is like contacting the Jingdong (JD.com) courier before sending a package: only the courier knows how to deliver our file-handling commands to the target file, so we have to fopen the file first.

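fread and fwrite read and write data in binary form:

size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
size_t fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );

Each transfers count items of size bytes between the memory at ptr and the stream, and returns the number of items actually transferred.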

If we write data with fwrite and then open the file in Notepad, we see garbled characters. Notepad interprets the bytes as text, but the data itself is not wrong; it is simply being read the wrong way. Reading the file back into memory with fread gives the correct values.

We have now seen printf and scanf alongside their file counterparts, fprintf and fscanf. Next come two more variants that work on strings.

sscanf and sprintf

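int sscanf( const char *buffer, const char *format [, argument ] ... );
// buffer is the string to extract data from; format describes the data, and the arguments say where to store it

int sprintf( char *buffer, const char *format [, argument] ... );
// buffer is where the formatted output is stored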

You can think of these two functions as packing and unpacking: sscanf takes a string and packs its pieces into a structure, while sprintf takes the structure apart again and turns it back into a string.

A typical use of sscanf: data collected by a front-end web page arrives at the back end as a string, and a back end written in C needs to turn that string into a structure, which is exactly what sscanf does. Sending data from the back end to the front end works the same way in reverse, with sprintf.

Random reading and writing of files

With the functions seen so far, each call to fgetc reads the character at the current position, returns it, and moves the position forward by one. To reach a particular character you must therefore read past everything in front of it, storing each return value as you go, which is tedious.

To move the file pointer directly, we use the following three functions:

fseek

Sets the position of the file pointer from a reference position (origin) and an offset

int fseek ( FILE * stream, long int offset, int origin );

stream is the file (already opened with fopen), offset is the number of bytes to move, and origin is the reference position from which the offset is measured. origin must be one of:

SEEK_SET: the beginning of the file

SEEK_CUR: the current position of the file pointer

SEEK_END: the end of the file

Because the offset is measured from whichever origin you choose, the same character can be reached from different origins with different offsets.

Think of the offset as a position on a number line whose zero point is the chosen origin. For example, suppose the file contains the string 12345678 and we want to read the 4 directly. With origin SEEK_SET, zero is the first character, so the offset is 3 ('1' is at 0, '2' at 1, '3' at 2); with origin SEEK_END, zero is the position just past the last character, so the offset is -5.

Note that text in a file has no terminating \0 (unless the program explicitly writes one), so there is no terminator to account for when computing offsets.

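#include <stdio.h>
int main()
{
	FILE* pFile;
	pFile = fopen("example.txt", "wb");
	if (pFile == NULL)
	{
		perror("fopen");
		return 1;
	}
	fputs("This is an apple.", pFile);
	fseek(pFile, 9, SEEK_SET);   // move to offset 9, just after "This is a"
	fputs(" sam", pFile);        // overwrites "n ap": the file now reads "This is a sample."
	fclose(pFile);
	pFile = NULL;
	return 0;
}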

ftell

Returns the offset of the file pointer from the beginning of the file

long int ftell ( FILE * stream );

Counting offsets by hand is tedious and error-prone, so ftell reports the offset between the beginning of the file and the current position of the file pointer for us.

It counts the same way as a SEEK_SET offset, so the result is never negative (ftell returns -1L only on failure).

#include <stdio.h>
int main()
{
	FILE* pFile;
	long size;
	pFile = fopen("myfile.txt", "rb");
	if (pFile == NULL) perror("Error opening file");
	else
	{
		fseek(pFile, 0, SEEK_END); // non-portable: SEEK_END on a binary stream is not guaranteed by the standard
		size = ftell(pFile);       // offset of the end = file size in bytes
		fclose(pFile);
		printf("Size of myfile.txt: %ld bytes.\n", size);
	}
	return 0;
}

rewind

Return the position of the file pointer to the beginning of the file

void rewind ( FILE * stream );

rewind simply moves the file pointer back to the beginning of the file. It is equivalent to fseek(stream, 0L, SEEK_SET), except that it also clears the stream's error indicator.

With these three functions we have full control over the file pointer.

text files and binary files

Depending on how the data is organized, data files are called text files or binary files.
Data is stored in binary form in memory; if it is written to external storage without conversion, the result is a binary file.
If it is to be stored on external storage as ASCII codes, it must be converted before it is written. A file stored as ASCII characters is a text file.
How is a piece of data stored, then?
Characters are always stored in ASCII form, while numeric data can be stored in either ASCII or binary form.
For example, the integer 10000 written to disk as ASCII occupies 5 bytes (one byte per character), while written in binary it occupies only 4 bytes (tested with VS2013, where an int is 4 bytes).

A test:

#include <stdio.h>
int main()
{
	int a = 10000;
	FILE* pf = fopen("test.txt", "wb");
	if (pf == NULL)
	{
		perror("fopen");
		return 1;
	}
	fwrite(&a, sizeof(a), 1, pf); // write a to the file in binary form
	fclose(pf);
	pf = NULL;
	return 0;
}

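We then add test.txt to the Visual Studio project and open it with the binary editor (Open With... > Binary Editor). It holds the four bytes 10 27 00 00: 10000 is 0x2710, stored lowest byte first on a little-endian machine such as x86.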

Detecting the end of file reading

A commonly misused function: feof

The return value of feof should not be used while reading to decide whether the file has ended. It is meant for after reading has stopped, to tell why it stopped: because the end of the file was reached, or because a read failed (ferror reports the latter).

To decide whether reading is finished, check the return value of the reading function itself, in one of two ways:

1. For a text file, check whether the return value is EOF (fgetc) or NULL (fgets).
For example:
fgetc returns EOF when reading stops.
fgets returns NULL when reading stops.

2. For a binary file, check whether the return value is smaller than the number of items requested.
For example:
fread returns fewer items than it was asked to read when reading stops.

file buffer

The ANSI C standard processes data files with a "buffered file system". This means the system automatically creates a "file buffer" in memory for each file the program is using. Data written from memory to disk goes into the buffer first and is sent to the disk all at once when the buffer is full. When data is read from a disk file, it is first read into the buffer (filling it), and the program then takes the data from the buffer into its data area (program variables and so on) piece by piece. The size of the buffer is determined by the C implementation.

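#include <stdio.h>
#include <windows.h> // Sleep (Windows only)
int main()
{
	FILE* pf = fopen("test.txt", "w");
	if (pf == NULL)
	{
		perror("fopen");
		return 1;
	}
	fputs("abcdef", pf); // the data goes into the output buffer first

	printf("Sleeping 10 seconds - the data has been written; open test.txt now and it is still empty\n");
	Sleep(10000); // sleep 10 seconds

	printf("Flushing the buffer\n");
	fflush(pf); // flushing writes the data in the output buffer to the file (disk)
	printf("Sleeping another 10 seconds - open test.txt again now and it has content\n");
	Sleep(10000); // shows that fflush, not fclose, flushed the buffer

	fclose(pf);
	// note: fclose also flushes the buffer when it closes the file
	pf = NULL;
	return 0;
}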

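fflush on an output stream such as pf is standard C, so it works in current versions of Visual Studio too; what is not portable is fflush(stdin), whose behavior the C standard leaves undefined. Running the program shows the following.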

We first put the data into the buffer with fputs and then go to sleep. If we open the file during the sleep, it is empty. This is not because fputs failed; the data is simply still in the buffer.

Next we flush the buffer, which writes its contents to the file, and sleep again. Opening the file now shows that the data has been stored.

So data travels from the program into the buffer, and from the buffer to the disk when the buffer is full or is flushed.

Because of the existence of this mechanism, it must be closed every time the file is opened and used , otherwise the contents in the buffer cannot be loaded, which may cause some problems.

That's the end. Thanks for reading, and I hope it helps a little!