A detailed look at fwrite and write under Linux multithreading

For file operations under Linux, some people prefer the C library's stream functions, and others prefer Linux's native system calls. For many small writes, the C library is usually more efficient, because it buffers data in user space and issues fewer system calls. This article examines fwrite and write under multithreading: each thread writes to the same FILE* or fd, and we check whether the result matches the expected behavior.

The first case uses the C library's fwrite. The thread function (the USE_CLIB branch of write_data in the full listing at the end of this article) calls fwrite in a loop on the shared FILE*.

The second case uses the system call write. The thread function (the #else branch of write_data in the full listing) calls write in a loop on the shared fd.

The main thread opens the file, starts three threads that share the same FILE* or fd, waits for them to finish, and then closes the file (see main in the full listing).

LOOPS is defined as 1000000, so threads 1 to 3 write "aaaaaa\n", "bbbbbb\n", and "cccccc\n" one million times each. If writing to the file is "thread-safe", the final file should contain exactly 3 million lines, and every line should be one of "aaaaaa", "bbbbbb", or "cccccc".
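The expectation above can be checked mechanically instead of by eye. The following is only a minimal sketch: count_valid_lines is a hypothetical helper that is not part of the original test program, and it assumes the three records used in this article.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the original test program).
 * Counts the lines in `path` and verifies that every line is exactly one
 * of the three expected records.
 * Returns the line count, or -1 if the file cannot be opened or any line
 * is malformed (for example, a mix of two records). */
long count_valid_lines(const char *path)
{
	static const char *expected[] = { "aaaaaa\n", "bbbbbb\n", "cccccc\n" };
	char line[64];
	long count = 0;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -1;

	while (fgets(line, sizeof(line), fp) != NULL) {
		int ok = 0;
		size_t i;

		for (i = 0; i < sizeof(expected) / sizeof(expected[0]); ++i) {
			if (strcmp(line, expected[i]) == 0) {
				ok = 1;
				break;
			}
		}
		if (!ok) {
			fclose(fp);
			return -1;
		}
		++count;
	}

	fclose(fp);
	return count;
}
```

After a test run, calling count_valid_lines("./tmp.txt") should return 3000000 if no write was lost and no line was mixed.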

Next, the test results. There are two configurations:

1. With the macro USE_CLIB defined, the threads write through the C library's fwrite.

2. With USE_CLIB commented out, the threads call the system call write directly.

In both configurations the output is not interleaved: no line mixes characters from different threads. All three records have the same length, so the file offsets are always multiples of the record length, and an overwrite replaces a whole record. With the system call write and no O_APPEND, however, the final line count is not as expected: it is far below 3 million. In this test, concurrent write calls on a shared fd overwrite each other's output, so write used this way is not "thread safe", whereas the C library's fwrite is.

Why is the result like this? First, look at the implementation of fwrite.

Inside fwrite, the C library (glibc on most Linux systems) acquires the stream's lock before copying the data into the stream buffer and releases it afterwards. This serializes calls on the same FILE* and makes each fwrite call thread-safe.
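The lock protects one call at a time, so a record built from several stdio calls can still be interleaved with another thread's output. POSIX exposes the stream lock through flockfile and funlockfile for exactly this situation. The sketch below is only an illustration; write_record and its "tag: msg" format are hypothetical and do not appear in the test program.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<tag>: <msg>\n" as one uninterrupted record.
 * Each stdio call locks the stream on its own, but the four calls together
 * are only atomic because the stream lock is held across all of them.
 * Returns 0 on success, -1 on a write error. */
int write_record(FILE *fp, const char *tag, const char *msg)
{
	size_t tag_len = strlen(tag);
	size_t msg_len = strlen(msg);
	int rc = 0;

	flockfile(fp);		/* the lock is recursive, so nested stdio calls are fine */
	if (fwrite(tag, 1, tag_len, fp) != tag_len ||
	    fputs(": ", fp) == EOF ||
	    fwrite(msg, 1, msg_len, fp) != msg_len ||
	    fputc('\n', fp) == EOF)
		rc = -1;
	funlockfile(fp);	/* let other threads write again */

	return rc;
}
```

A single fwrite of a complete record, as in the test program, does not need this; the explicit lock only matters once one logical record spans several calls.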

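Now the implementation of write in the kernel.

Before writing, the write system call uses file_pos_read to fetch the file's current offset; it then writes the data at that offset and stores the updated offset back. On a multi-core machine, two threads can enter the kernel at the same time and read the file's current offset before either has updated it, so both may obtain the same value. The two threads then write their data at the same offset, one overwriting the other, and the final file is smaller than expected. This describes the kernel examined here; according to the write(2) manual page, the offset update was made atomic in Linux 3.14, so newer kernels may not reproduce the lost writes.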
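The lost update can be modeled deterministically in user space. The sketch below is not kernel code: simulate_offset_race is a hypothetical function in which a local variable stands in for the file's offset and pwrite plays the role of "write at the offset that was sampled".

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical model of the race: two writers both sample the shared offset
 * before either one stores its updated value.
 * Returns the final file size, or -1 on error. */
off_t simulate_offset_race(const char *path)
{
	const char *a = "aaaaaa\n";
	const char *b = "bbbbbb\n";
	size_t len = strlen(a);
	off_t shared_pos = 0;		/* stands in for the file's offset */
	off_t pos_a, pos_b;
	struct stat st;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	pos_a = shared_pos;		/* writer A reads the offset: 0 */
	pos_b = shared_pos;		/* writer B reads it too: still 0 */

	/* Both writers write at the offset they sampled. */
	if (pwrite(fd, a, len, pos_a) != (ssize_t)len ||
	    pwrite(fd, b, len, pos_b) != (ssize_t)len) {
		close(fd);
		return -1;
	}

	/* Each writer stores "its" new offset; the second store repeats the first. */
	shared_pos = pos_a + (off_t)len;
	shared_pos = pos_b + (off_t)len;
	(void)shared_pos;

	if (fstat(fd, &st) != 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return st.st_size;		/* 7, not 14: one record overwrote the other */
}
```

Two records were written, but the file holds only one of them, which is the same effect that shrinks the 3-million-line file in the test.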

Final summary:
The C library's fwrite is a thread-safe function. With the system call write, open the file with the additional O_APPEND flag so that every write is appended at the end of the file; the offsets then never overlap and concurrent writing behaves as expected. You can modify the full test code at the end of this article and test it in your own environment.
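Why O_APPEND helps can be seen even without threads: with this flag the kernel positions each write at the current end of the file, whatever the descriptor's offset says. The following single-threaded sketch illustrates that; append_ignores_offset is a hypothetical name, and the file it writes is chosen by the caller.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: even after the offset is rewound to 0, a write on an
 * O_APPEND descriptor lands at the end of the file.
 * Returns the final file size, or -1 on error. */
off_t append_ignores_offset(const char *path)
{
	const char *a = "aaaaaa\n";
	const char *b = "bbbbbb\n";
	size_t len = strlen(a);
	struct stat st;
	off_t size = -1;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);

	if (fd < 0)
		return -1;

	if (write(fd, a, len) == (ssize_t)len &&
	    lseek(fd, 0, SEEK_SET) == 0 &&		/* rewind the offset */
	    write(fd, b, len) == (ssize_t)len &&	/* still appended at EOF */
	    fstat(fd, &st) == 0)
		size = st.st_size;

	close(fd);
	return size;		/* 14: both records survive */
}
```

Because the position is chosen at write time rather than read from a shared offset beforehand, two threads can no longer be handed the same offset.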

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/* Uncomment to test fwrite; leave commented out to test write. */
//#define USE_CLIB

#define TEST_FILE	"./tmp.txt"

#define LOOPS		(1000000)


#ifdef USE_CLIB
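/* Per-thread arguments: the shared stream and the record to write. */
struct thr_data {
	FILE *fp;
	const char *data;
};

/* Thread body: write the record LOOPS times with fwrite. */
static void *write_data(void *data)
{
	struct thr_data *d;
	size_t len;
	int i;

	d = data;
	len = strlen(d->data);
	for (i = 0; i < LOOPS; ++i) {
		fwrite(d->data, len, 1, d->fp);
	}

	return NULL;
}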

#else
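/* Per-thread arguments: the shared descriptor and the record to write. */
struct thr_data {
	int fd;
	const char *data;
};

/* Thread body: write the record LOOPS times with the write system call. */
static void *write_data(void *data)
{
	struct thr_data *d;
	int i;
	size_t len;

	d = data;
	len = strlen(d->data);
	for (i = 0; i < LOOPS; ++i) {
		write(d->fd, d->data, len);
	}

	return NULL;
}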
#endif



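int main(void)
{
	pthread_t t1, t2, t3;
	struct thr_data d1, d2, d3;

#ifdef USE_CLIB
	FILE *fp = fopen(TEST_FILE, "w");
	if (fp == NULL) {
		perror("fopen");
		return EXIT_FAILURE;
	}
	d1.fp = d2.fp = d3.fp = fp;
#else
	/* O_CREAT lets open() succeed when the file does not exist yet. */
	/* Without O_APPEND, concurrent writes can overwrite each other: */
	//int fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0644);
	int fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC|O_APPEND, 0644);
	if (fd < 0) {
		perror("open");
		return EXIT_FAILURE;
	}
	d1.fd = d2.fd = d3.fd = fd;
#endif

	d1.data = "aaaaaa\n";
	d2.data = "bbbbbb\n";
	d3.data = "cccccc\n";

	/* All three threads share the same FILE* or fd. */
	pthread_create(&t1, NULL, write_data, &d1);
	pthread_create(&t2, NULL, write_data, &d2);
	pthread_create(&t3, NULL, write_data, &d3);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_join(t3, NULL);

#ifdef USE_CLIB
	fclose(fp);
#else
	close(fd);
#endif

	return 0;
}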

Origin blog.csdn.net/qq_40989769/article/details/110927256