6.824 Lab 1, Part III: Distributing MapReduce tasks

1. Task

Your current implementation runs the map and reduce tasks one at a time. One of Map/Reduce’s biggest selling points is that it can automatically parallelize ordinary sequential code without any extra work by the developer. In this part of the lab, you will complete a version of MapReduce that splits the work over a set of worker threads that run in parallel on multiple cores. While not distributed across multiple machines as in real Map/Reduce deployments, your implementation will use RPC to simulate distributed computation.
Your job is to implement schedule() in mapreduce/schedule.go. The master calls schedule() twice during a MapReduce job, once for the Map phase, and once for the Reduce phase. schedule()'s job is to hand out tasks to the available workers. There will usually be more tasks than worker threads, so schedule() must give each worker a sequence of tasks, one at a time. schedule() should wait until all tasks have completed, and then return.

To sum up: the first two parts executed the map/reduce tasks serially, whereas MapReduce's biggest advantage is automatic work partitioning, so that in a cluster even plain sequential code gets parallelized without the application developer writing any extra concurrency-control code. The work to be done here is to distribute tasks to the workers in the cluster, that is, to implement the master's schedule() function.

2. Approach

As mentioned in my earlier analysis of the master, the master hands worker RPC addresses to schedule() through a channel; for why it has to be done that way, see the explanation in that post (though I may well have misunderstood it). In any case, to solve this scheduling problem we first have to work through a few key points:

  1. What do we do when there are more tasks than workers?
    Since tasks far outnumber workers, a worker must keep rejoining registerChan after finishing each task; only then can every task eventually be served.
  2. Map tasks are typically slow, so after handing a task to a worker the scheduler must keep running rather than block until that worker returns its result. How do we handle that?
    Use goroutines: wrap the worker's call() invocation in a go func() so each RPC runs concurrently with the dispatch loop.
  3. A question that follows from the two above: since the calls run in goroutines, how does the scheduler learn that a worker has finished, and where does the finished worker get put back into registerChan?
    The scheduler does not actually need to track completion per worker: it only ever pulls addresses from registerChan, and any address it reads there is ready to use. The real issue is that workers are scarce, so each one must rejoin registerChan once it finishes. That happens inside the go func() defined above: check the boolean returned by call(); true means the task succeeded, and the worker can be pushed back into registerChan.
  4. The map phase and the reduce phase are strictly ordered: reduce cannot start until every map task has finished. How do we enforce that?
    Since the call() invocations run in goroutines, we need to wait for a group of goroutines to finish before moving on, which is exactly what sync.WaitGroup is for. A minimal sketch of this whole pattern follows the list.
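Before looking at the real implementation, here is a minimal, self-contained sketch of the pattern the four answers above add up to: a channel as the pool of idle workers, one goroutine per task, and a WaitGroup as the phase barrier. The names here (worker-0, doTask, and so on) are made up for illustration and are not part of the lab code.

package main

import (
	"fmt"
	"sync"
)

// doTask stands in for the RPC to a worker; it is a made-up
// placeholder, not the lab's Worker.DoTask.
func doTask(worker string, task int) bool {
	fmt.Printf("%s finished task %d\n", worker, task)
	return true
}

func main() {
	registerChan := make(chan string) // pool of idle workers
	go func() {                       // two workers register themselves
		registerChan <- "worker-0"
		registerChan <- "worker-1"
	}()

	var wg sync.WaitGroup
	for task := 0; task < 5; task++ { // more tasks than workers
		wg.Add(1)
		go func(task int) {
			defer wg.Done()
			worker := <-registerChan // blocks until a worker is idle
			if doTask(worker, task) {
				// re-register in a goroutine so the final send cannot
				// keep wg.Wait() from returning
				go func() { registerChan <- worker }()
			}
		}(task)
	}
	wg.Wait() // the phase barrier: all five tasks must finish first
	fmt.Println("all tasks done")
}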

3. Code

The scheduling function

func schedule(jobName string, mapFiles []string, nReduce int, phase jobPhase, registerChan chan string) {
	var ntasks int
	var n_other int // number of inputs (for reduce) or outputs (for map)
	switch phase {
	case mapPhase:
		ntasks = len(mapFiles)
		n_other = nReduce
	case reducePhase:
		ntasks = nReduce
		n_other = len(mapFiles)
	}

	fmt.Printf("Schedule: %v %v tasks (%d I/Os)\n", ntasks, phase, n_other)

	// All ntasks tasks have to be scheduled on workers. Once all tasks
	// have completed successfully, schedule() should return.
	//
	// Your code here (Part III, Part IV).
	//

	// schedule() must not return until every task in this phase has
	// completed; a failed RPC means the task has to be handed to
	// another worker, otherwise the phase would never finish
	var wg sync.WaitGroup
	for i := 0; i < ntasks; i++ {
		wg.Add(1)
		// arguments for task i
		taskArgs := DoTaskArgs{jobName, mapFiles[i], phase, i, n_other}
		// issue the RPCs from goroutines so the workers can work on
		// tasks concurrently
		go func() {
			for {
				// pull an idle worker's RPC address; this blocks until
				// a worker registers or finishes its previous task
				workerAddr := <-registerChan
				ok := call(workerAddr, "Worker.DoTask", taskArgs, nil)
				if ok {
					// mark the task done *before* re-registering the
					// worker: after the last task of a phase nothing
					// reads registerChan anymore, so sending first
					// would block here and wg.Wait() would never return
					wg.Done()
					registerChan <- workerAddr
					return
				}
				// the RPC failed (Part IV): drop this worker and hand
				// the same task to the next available one
			}
		}()
		fmt.Printf("%d/%d\n", i, ntasks)
	}
	// wait until all tasks in this phase have completed
	wg.Wait()
	fmt.Printf("Schedule: %v done\n", phase)
}
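Note that the worker address is pulled from registerChan inside the goroutine rather than in the dispatch loop. That is what makes retries possible: on a failed call() the goroutine can simply go back to the channel for a different worker, whereas pulling the address in the outer loop would leave a failed task with no way to acquire a replacement, and wg.Wait() would hang forever.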

The run function, which drives the two phases

func (mr *Master) run(jobName string, files []string, nreduce int,
	schedule func(phase jobPhase),
	finish func(),
) {
	mr.jobName = jobName
	mr.files = files
	mr.nReduce = nreduce

	fmt.Printf("%s: Starting Map/Reduce task %s\n", mr.address, mr.jobName)

	schedule(mapPhase)
	schedule(reducePhase)
	finish()
	mr.merge()

	fmt.Printf("%s: Map/Reduce task completed\n", mr.address)

	mr.doneChannel <- true
}
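For context, this is roughly how the lab skeleton's master.go wires schedule() into run(): the Distributed() constructor passes run() a closure that builds the registration channel and invokes schedule() for each phase. This is paraphrased from memory of the 6.824 skeleton, so the exact details may differ in your copy of the code.

func Distributed(jobName string, files []string, nreduce int, master string) (mr *Master) {
	mr = newMaster(master)
	mr.startRPCServer()
	go mr.run(jobName, files, nreduce,
		// the schedule closure: make a fresh channel per phase and
		// forward the addresses of all registered workers into it
		func(phase jobPhase) {
			ch := make(chan string)
			go mr.forwardRegistrations(ch)
			schedule(mr.jobName, mr.files, mr.nReduce, phase, ch)
		},
		// the finish closure: shut down the workers and the RPC server
		func() {
			mr.stats = mr.killWorkers()
			mr.stopRPCServer()
		})
	return
}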


Reposted from blog.csdn.net/JustKian/article/details/101783713