FastDFS Source Code Analysis: Storage Threads

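The storage server mainly runs six kinds of threads: the accept thread, nio threads, the report thread, the scheduling thread, dio threads, and sync threads.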

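Storage thread stacks

Attaching gdb to the storage process (gdb attach <storage process ID>) shows the stack of every thread. The thread numbers used below are the ones gdb printed.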

accept thread

The main thread is the accept thread.

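nio thread

nio stands for network IO; these threads handle network IO.
storage_service_init creates the nio threads, whose entry function is work_thread_entrance.
There are 4 nio threads by default. Threads 2 to 5 are the nio threads, and their stacks are parked in epoll_wait.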

Report thread

Thread 6 runs tracker_report_thread_entrance. The storage uses it to report its own information to the tracker and to fetch information about the other storages.

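Scheduling thread

Thread 7 is the scheduling thread, sched_thread_entrance. It runs tasks on a timer, for example periodically flushing the binlog buffer to the binlog file.
This thread also handles log syncing and the time-wheel timer.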

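dio thread

dio stands for data IO; these threads read and write files.
Threads 8 and 9 have identical stacks; both are dio threads running dio_thread_entrance.
With the default storage configuration file, one of them is a disk read thread and the other a disk write thread, so reads and writes are separated by default.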

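Main flow and communication of the five thread types

This article leaves the report thread for later and covers the other five kinds of threads, using a file upload as the running example.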

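accept thread

The number of accept threads is configurable. The default is one, in which case the main thread is the accept thread. Its job is to accept connections and then take a fast_task_info object from the free task queue (an object pool).
storage_service_init preallocates the pool, and also allocates one storage_nio_thread_data per nio thread according to the configured thread count: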

	bytes = sizeof(struct storage_nio_thread_data) * g_work_threads;
	g_nio_thread_data = (struct storage_nio_thread_data *)malloc(bytes);

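fast_task_info holds the data for one client (fd), including its read and write buffers. A pair of pipes lets other threads communicate with an nio thread, and the nio thread watches the read end of the pipe.
When a client connects to the storage, the accept thread writes the address of the fast_task_info object into the pipe. The nio thread then gets the storage_recv_notify_read callback, which for a new connection calls the initialization function storage_nio_init to register the epoll read event for the client fd.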
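The handoff through the pipe can be sketched as follows. This is a minimal illustration, not the FastDFS code: struct task stands in for fast_task_info, and notify_nio_thread and recv_notify are hypothetical names. It relies on the fact that a pipe write no larger than PIPE_BUF is atomic, so pointer-sized messages from several writers do not interleave.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for fast_task_info: only what the sketch needs. */
struct task {
	int client_fd;
};

/* Accept-thread side: hand a task to an nio thread by writing its address. */
static int notify_nio_thread(int pipe_write_fd, struct task *t)
{
	assert(t != NULL);
	intptr_t addr = (intptr_t)t;
	ssize_t n = write(pipe_write_fd, &addr, sizeof(addr));
	return n == (ssize_t)sizeof(addr) ? 0 : -1;
}

/* nio-thread side: called when the read end of the pipe becomes readable. */
static struct task *recv_notify(int pipe_read_fd)
{
	intptr_t addr = 0;
	ssize_t n = read(pipe_read_fd, &addr, sizeof(addr));
	if (n != (ssize_t)sizeof(addr)) {
		return NULL;
	}
	return (struct task *)addr;
}
```

Only the address travels through the pipe; the task object itself stays in shared memory, which is why both threads must live in the same process.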

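nio thread

The nio threads, commonly called worker threads, handle network IO. Their core call is epoll_wait; depending on the event type, a different callback is invoked: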

client_sock_read
client_sock_write
storage_recv_notify_read

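So an nio thread has three main jobs:

read data from the client and process it
send data to the client
receive notifications from other threads through the pipe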
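The callback dispatch around epoll_wait might look roughly like the sketch below (Linux only). The names io_entry, watch_read, poll_once and on_notify are made up for illustration; FastDFS wraps epoll in its own event layer, which is not reproduced here.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-fd entry: the callback to run when the fd is ready. */
struct io_entry {
	int fd;
	void (*callback)(struct io_entry *entry, unsigned int events);
	int calls;   /* how many times the callback consumed a notification */
};

/* Register an fd for read events, storing the entry in the epoll user data. */
static int watch_read(int epfd, struct io_entry *entry)
{
	struct epoll_event ev;
	ev.events = EPOLLIN;
	ev.data.ptr = entry;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, entry->fd, &ev);
}

/* One loop iteration: wait, then dispatch every ready fd to its callback. */
static int poll_once(int epfd, int timeout_ms)
{
	struct epoll_event ready[16];
	int n = epoll_wait(epfd, ready, 16, timeout_ms);
	for (int i = 0; i < n; i++) {
		struct io_entry *entry = ready[i].data.ptr;
		assert(entry->callback != NULL);
		entry->callback(entry, ready[i].events);
	}
	return n;
}

/* Example callback: consume one byte, as a notify-pipe handler might. */
static void on_notify(struct io_entry *entry, unsigned int events)
{
	char byte;
	if ((events & EPOLLIN) && read(entry->fd, &byte, 1) == 1) {
		entry->calls++;
	}
}
```

A worker thread would call poll_once in a loop; client sockets and the notify pipe are all registered the same way, each with its own callback.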

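When a client uploads a file, client_sock_read is called back n times.
The first time a complete packet has been received, the task is dispatched by command (storage_deal_task). The upload command is STORAGE_PROTO_CMD_UPLOAD_FILE. At the end of storage_write_to_file, storage_dio_queue_push is called, which in turn calls blocked_queue_push to insert the fast_task_info into a blocking queue; a dio thread later takes the task from that queue and processes it.
On every later complete packet, storage_dio_queue_push is called directly.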
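A blocking queue of this kind is usually a linked list guarded by a mutex and a condition variable. The sketch below shows the general technique under that assumption; the struct and function names are hypothetical and do not match the libfastcommon implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task node; FastDFS queues fast_task_info objects instead. */
struct task {
	int id;
	struct task *next;
};

struct blocked_queue {
	struct task *head;
	struct task *tail;
	pthread_mutex_t lock;
	pthread_cond_t not_empty;
};

static void queue_init(struct blocked_queue *q)
{
	q->head = q->tail = NULL;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_empty, NULL);
}

/* Producer (nio thread): append a task and wake one waiting consumer. */
static void queue_push(struct blocked_queue *q, struct task *t)
{
	assert(t != NULL);
	t->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail == NULL) {
		q->head = t;
	} else {
		q->tail->next = t;
	}
	q->tail = t;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Consumer (dio thread): sleep until a task is available, then take it. */
static struct task *queue_pop(struct blocked_queue *q)
{
	pthread_mutex_lock(&q->lock);
	while (q->head == NULL) {
		pthread_cond_wait(&q->not_empty, &q->lock);
	}
	struct task *t = q->head;
	q->head = t->next;
	if (q->head == NULL) {
		q->tail = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return t;
}
```

The while loop around pthread_cond_wait matters: a condition variable can wake spuriously, so the consumer rechecks the queue before taking a task.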

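When the upload has finished, client_sock_write is called back to send the response to the client.
When another thread wants to notify an nio thread, it writes to the pipe, which triggers the storage_recv_notify_read callback.
The accept thread, for example, notifies the nio thread to register a read event for the fd.
A dio thread, after writing part of the file, notifies the nio thread to continue reading data, and this happens several times per upload. Why several writes? A large file cannot be handled in a single IO. By default FastDFS receives at most 256 KB per recv, so if a client uploads a 1 MB file, the data actually sent is slightly more than 1 MB (it includes the protocol header). The nio thread therefore has to recv 5 times and call blocked_queue_push 5 times, and the dio thread takes 5 tasks and writes data 5 times.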
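The count of 5 is a ceiling division of the total bytes by the buffer size. A small sketch of that arithmetic, where recv_rounds is a hypothetical helper and the 25-byte header length used in the usage note is only a placeholder for "some protocol overhead", not the real FastDFS header size:

```c
#include <assert.h>
#include <stdint.h>

/* Number of buffer-sized receives needed for header plus file body. */
static int64_t recv_rounds(int64_t file_size, int64_t header_len,
		int64_t buff_size)
{
	assert(buff_size > 0);
	int64_t total = file_size + header_len;
	return (total + buff_size - 1) / buff_size;   /* ceiling division */
}
```

With a 256 KB buffer, recv_rounds(1024 * 1024, 25, 256 * 1024) gives 5, while a bare 1 MB payload without any header would need exactly 4 rounds. Any nonzero header pushes the count to 5.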

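dio thread

The dio threads read and write files; by default there is one read thread and one write thread.
dio_thread_entrance is very simple: it takes tasks from the blocking queue and executes them.
For an upload, the task function taken from the queue is dio_write_file. It first calls dio_open_file to open the file, then fc_safe_write to write it.
While pFileContext->offset < pFileContext->end, the file is not complete, so the dio thread notifies the nio thread to continue receiving data.
Once the nio thread has received more data, it pushes the task onto the blocking queue again, and the dio thread takes it and writes to the file again.
When pFileContext->offset >= pFileContext->end, the file is complete and the dio thread has to update the binlog.
Every time a file is added, deleted, or updated, a record is appended to the binlog, which is used to keep the files of all storages in the same group in sync.
Storage synchronization will be covered in a later article; this one focuses on threads.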
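The decision the dio thread makes after each chunk can be reduced to a comparison of offset and end. The sketch below models only that decision; struct file_ctx, enum next_step and after_chunk_written are hypothetical names, and the actual file write is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the file context used by the write path. */
struct file_ctx {
	int64_t offset;   /* bytes written so far */
	int64_t end;      /* total bytes expected */
};

enum next_step {
	NOTIFY_NIO_RECV_MORE,   /* file incomplete: ask nio thread to recv again */
	WRITE_BINLOG_AND_REPLY  /* file complete: record binlog, answer client */
};

/* Account for one written chunk and decide what the dio thread does next. */
static enum next_step after_chunk_written(struct file_ctx *ctx,
		int64_t written)
{
	assert(written >= 0);
	ctx->offset += written;
	return ctx->offset < ctx->end ? NOTIFY_NIO_RECV_MORE
			: WRITE_BINLOG_AND_REPLY;
}
```

For a 1000-byte file written in chunks of 400, 400 and 200 bytes, the first two calls return NOTIFY_NIO_RECV_MORE and the third returns WRITE_BINLOG_AND_REPLY.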

storage_binlog_write_ex updates binlog_write_cache_buff and binlog_write_cache_len:

int storage_binlog_write_ex(const int timestamp, const char op_type, \
		const char *filename, const char *extra)
{
	int result;
	int write_ret;

	if ((result=pthread_mutex_lock(&sync_thread_lock)) != 0)
	{
		logError("file: "__FILE__", line: %d, " \
			"call pthread_mutex_lock fail, " \
			"errno: %d, error info: %s", \
			__LINE__, result, STRERROR(result));
	}
	// update binlog_write_cache_buff and binlog_write_cache_len
	if (extra != NULL)
	{
		binlog_write_cache_len += sprintf(binlog_write_cache_buff + \
					binlog_write_cache_len, "%d %c %s %s\n",\
					timestamp, op_type, filename, extra);
	}
	else
	{
		binlog_write_cache_len += sprintf(binlog_write_cache_buff + \
					binlog_write_cache_len, "%d %c %s\n", \
					timestamp, op_type, filename);
	}

	//check if buff full
	if (SYNC_BINLOG_WRITE_BUFF_SIZE - binlog_write_cache_len < 256)
	{
		write_ret = storage_binlog_fsync(false);  //sync to disk
	}
	else
	{
		write_ret = 0;
	}

	if ((result=pthread_mutex_unlock(&sync_thread_lock)) != 0)
	{
		logError("file: "__FILE__", line: %d, " \
			"call pthread_mutex_unlock fail, " \
			"errno: %d, error info: %s", \
			__LINE__, result, STRERROR(result));
	}

	return write_ret;
}

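Note the sync thread lock taken here: both the dio threads and the scheduling thread access the binlog buffer.

Each record in the binlog.000 file has the format:
timestamp, operation type, and file ID, separated by spaces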
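Reading such a record back is a matter of splitting on spaces. The following sketch parses the three-field form written by the sprintf calls above; struct binlog_record and parse_binlog_line are hypothetical, the optional extra field is ignored, and the sample line in the usage note is invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct binlog_record {
	long timestamp;
	char op_type;
	char filename[128];
};

/* Parse "<timestamp> <op_type> <filename>"; returns 0 on success. */
static int parse_binlog_line(const char *line, struct binlog_record *rec)
{
	assert(line != NULL && rec != NULL);
	int n = sscanf(line, "%ld %c %127s", &rec->timestamp, &rec->op_type,
			rec->filename);
	return n == 3 ? 0 : -1;
}
```

For a line such as "1609459200 C M00/00/00/example.jpg", the parser fills in the timestamp 1609459200, the operation type C and the file name M00/00/00/example.jpg.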

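After the binlog buffer has been updated, the upload is complete, and storage_nio_notify is called to write to the pipe so that the nio thread can send the response to the client.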

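Scheduling thread

The scheduling thread has two main jobs:

handle timed-out tasks with the time-wheel timer
loop over the registered schedule tasks and run them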
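The second job, running registered tasks at fixed intervals, can be sketched like this. It is a simplified model with hypothetical names (sched_entry, run_due_entries, count_call), not the scheduler from libfastcommon; the current time is passed in so that the logic is easy to exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef int (*task_func)(void *args);

/* Hypothetical schedule entry: run func every `interval` seconds. */
struct sched_entry {
	int interval;
	time_t next_call;
	task_func func;
	void *args;
};

/* Run every entry that is due at `now` and reschedule it; returns the count. */
static int run_due_entries(struct sched_entry *entries, size_t count,
		time_t now)
{
	int ran = 0;
	for (size_t i = 0; i < count; i++) {
		struct sched_entry *e = &entries[i];
		assert(e->interval > 0 && e->func != NULL);
		if (now >= e->next_call) {
			e->func(e->args);
			e->next_call = now + e->interval;
			ran++;
		}
	}
	return ran;
}

/* Example task: count invocations through the args pointer. */
static int count_call(void *args)
{
	++*(int *)args;
	return 0;
}
```

A scheduling thread built this way would sleep until the earliest next_call, then call run_due_entries with the current time.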

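setupSchedules registers schedule tasks such as log syncing, binlog syncing (fdfs_binlog_sync_func), state file syncing, and trunk binlog syncing.
As described above, a dio thread writes to the binlog buffer after handling an upload, which increases binlog_write_cache_len. The next time the scheduling thread calls fdfs_binlog_sync_func, binlog_write_cache_len > 0 holds, so storage_binlog_fsync is called.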

int fdfs_binlog_sync_func(void *args)
{
	if (binlog_write_cache_len > 0)
	{
		return storage_binlog_fsync(true);
	}
	else
	{
		return 0;
	}
}

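fdfs_binlog_sync_func writes the binlog buffer to the binlog file (through storage_binlog_fsync), which makes the records persistent. After the write, binlog_write_version is incremented so that the sync threads start syncing.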
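The interplay between the flush and the version counter can be modeled as below. This is a deliberately simplified sketch: struct binlog_state, flush_if_dirty and reader_has_news are hypothetical, locking is omitted (the real code holds the sync thread lock), and the reader check compares versions only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified state; field names only echo the article. */
struct binlog_state {
	char cache[1024];
	int cache_len;
	int write_version;   /* bumped after every successful flush */
};

/* Scheduler side: flush the cache if it holds data, then bump the version. */
static int flush_if_dirty(struct binlog_state *s, FILE *binlog)
{
	assert(s != NULL && binlog != NULL);
	if (s->cache_len == 0) {
		return 0;   /* nothing buffered */
	}
	if (fwrite(s->cache, 1, (size_t)s->cache_len, binlog)
			!= (size_t)s->cache_len || fflush(binlog) != 0) {
		return -1;
	}
	s->cache_len = 0;
	++s->write_version;
	return 0;
}

/* Sync-thread side: new data exists if the versions differ. */
static int reader_has_news(const struct binlog_state *s, int reader_version)
{
	return reader_version != s->write_version;
}
```

A reader that remembers the version it last saw can poll cheaply: it touches the binlog file only when reader_has_news reports a change.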


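Sync thread

Sync threads are created in the report thread, tracker_report_thread_entrance. With n storages, each storage creates n-1 sync threads, one for each of the other storages it pushes files to.
A sync thread polls the binlog state in a loop. When storage_binlog_preread, called from storage_binlog_read, finds that pReader->binlog_buff.version differs from binlog_write_version, it reads the binlog file.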

	if (pReader->binlog_buff.version == binlog_write_version &&
		pReader->binlog_buff.length == 0)
	{
		// the binlog file has not been updated, so return immediately
		return ENOENT;
	}

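Once a binlog line has been parsed into a StorageBinLogRecord, storage_binlog_read returns 0 (read_result), and storage_sync_data can be called to sync the file.
To sync a file, tcpsenddata_nb first sends the leading part of the protocol packet (the protocol header plus fields such as the file name and file size), then tcpsendfile_ex sends the file content. The core of tcpsendfile_ex is sendfile.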
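The header-then-body pattern can be sketched with plain write and sendfile on Linux. The function names below are hypothetical and the sketch uses blocking IO, unlike the non-blocking FastDFS helpers; demo_roundtrip pushes a 3-byte header and a 5-byte temporary file through a socketpair to show the calls working together.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send a protocol header from memory, then the file body with sendfile(2). */
static int send_header_then_file(int sock_fd, const void *header,
		size_t header_len, int file_fd, off_t file_size)
{
	assert(header != NULL);
	const char *p = header;
	size_t left = header_len;
	while (left > 0) {                 /* write() may be partial */
		ssize_t n = write(sock_fd, p, left);
		if (n <= 0) {
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}

	off_t offset = 0;                  /* advanced by sendfile() */
	while (offset < file_size) {
		ssize_t n = sendfile(sock_fd, file_fd, &offset,
				(size_t)(file_size - offset));
		if (n <= 0) {
			return -1;
		}
	}
	return 0;
}

/* Demo: send "HDR" plus a 5-byte temp file and read everything back. */
static int demo_roundtrip(void)
{
	char path[] = "/tmp/sendfile_demo_XXXXXX";
	int file_fd = mkstemp(path);
	int socks[2];
	char got[16];
	size_t total = 0;

	if (file_fd < 0) {
		return -1;
	}
	unlink(path);   /* the file lives until file_fd is closed */
	if (write(file_fd, "hello", 5) != 5 ||
			socketpair(AF_UNIX, SOCK_STREAM, 0, socks) != 0) {
		close(file_fd);
		return -1;
	}
	int rc = send_header_then_file(socks[0], "HDR", 3, file_fd, 5);
	close(socks[0]);    /* gives the reader an EOF */
	while (rc == 0 && total < sizeof(got)) {
		ssize_t n = read(socks[1], got + total, sizeof(got) - total);
		if (n <= 0) {
			break;
		}
		total += (size_t)n;
	}
	close(socks[1]);
	close(file_fd);
	return (rc == 0 && total == 8 && memcmp(got, "HDRhello", 8) == 0)
			? 0 : -1;
}
```

The appeal of sendfile here is that the file content moves from the page cache to the socket inside the kernel, without being copied through a user-space buffer.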
How the storage being synced to (the replica) updates its files and binlog is left for the later article on synchronization.