Giraph Source Code Analysis (5): Data Loading and Synchronization Summary

Author | White Pine

This is the fifth of nine articles in the Giraph source code analysis series.

Environment: two workers launched on a single machine (hostname: giraphx).

Input: an SSSP folder containing two files, 1.txt and 2.txt.

1. After the Workers report their health status to the Master, they begin waiting for the Master to create the InputSplits.

Method: each Worker checks whether a particular znode exists and sets a Watcher on that znode. If the znode does not exist, the Worker calls the waitForever() method of a BspEvent, which releases the lock held by the current thread and puts it into a wait state until the Master creates the znode. This step lives in the startSuperstep() method of the BSPServiceWorker class.

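A minimal sketch of this wait pattern, written against the plain ZooKeeper client API with illustrative class and variable names (not the actual Giraph code), looks like this:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    /** Block until a znode created by the Master appears. */
    final class ZnodeWaiter {
        private final ZooKeeper zk;

        ZnodeWaiter(ZooKeeper zk) {
            this.zk = zk;
        }

        void waitForZnode(String path) throws KeeperException, InterruptedException {
            while (true) {
                CountDownLatch created = new CountDownLatch(1);
                // exists() registers the Watcher; it fires when the znode is created
                // (or on connection events, in which case the loop simply re-checks).
                Watcher watcher = event -> created.countDown();
                if (zk.exists(path, watcher) != null) {
                    return;                 // the znode is already there
                }
                created.await();            // analogous to BspEvent.waitForever()
            }
        }
    }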
2. The Master calls the createInputSplits() method to create the InputSplits.


Inside generateInputSplits(), the Master obtains the InputSplits from the user-configured VertexInputFormat.

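A minimal sketch of that step, assuming the getSplits(JobContext, int minSplitCountHint) signature exposed by Giraph's input formats (the wrapper class and its parameters are illustrative):

    import java.io.IOException;
    import java.util.List;
    import org.apache.giraph.io.VertexInputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;

    /** Sketch of generateInputSplits(): ask the user's VertexInputFormat for splits. */
    final class InputSplitGenerator {
        static List<InputSplit> generateInputSplits(VertexInputFormat<?, ?, ?> inputFormat,
                                                    JobContext context,
                                                    int workerCount,
                                                    int numInputThreads)
                throws IOException, InterruptedException {
            // minSplitCountHint = number of workers * NUM_INPUT_THREADS
            int minSplitCountHint = workerCount * numInputThreads;
            // Only a hint: TextVertexValueInputFormat-based formats ignore it.
            return inputFormat.getSplits(context, minSplitCountHint);
        }
    }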

Here minSplitCountHint is the minimum number of splits to create; its value is:

minSplitCountHint = number of workers * NUM_INPUT_THREADS

NUM_INPUT_THREADS is the number of threads each worker uses for input split loading; its default value is 1. On inspection, the minSplitCountHint parameter is ignored by the getSplits() method of the TextVertexValueInputFormat abstract class, which the user's VertexInputFormat extends.

If the resulting splits.size() is smaller than minSplitCountHint, some workers will have no split to load and will sit idle during input.

After obtaining the split information, the Master writes it to ZooKeeper so that the other workers can access it. The split information obtained above is:

[hdfs://giraphx:9000/user/root/SSSP/1.txt:0+66, hdfs://giraphx:9000/user/root/SSSP/2.txt:0+46]

The Master traverses the splits list and creates a znode for each split, with the split information as the znode's value. The znode created for split-0 is shown below; its value is hdfs://giraphx:9000/user/root/SSSP/1.txt:0+66.

/_hadoopBsp/job_201404102333_0013/_vertexInputSplitDir/0

The znode created for split-1 is shown below; its value is hdfs://giraphx:9000/user/root/SSSP/2.txt:0+46.

/_hadoopBsp/job_201404102333_0013/_vertexInputSplitDir/1

Finally, the Master creates the znode /_hadoopBsp/job_201404102333_0013/_vertexInputSplitsAllReady, which signals that all splits have been created.
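The publishing of the split znodes can be sketched with the plain ZooKeeper API as follows (the paths mirror the ones above; the helper class itself is illustrative):

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    /** Sketch: publish one znode per split, then an "all ready" marker znode. */
    final class SplitPublisher {
        static void publishSplits(ZooKeeper zk,
                                  String inputSplitDir,   // .../_vertexInputSplitDir
                                  String allReadyPath,    // .../_vertexInputSplitsAllReady
                                  List<String> splitInfos)
                throws KeeperException, InterruptedException {
            int index = 0;
            for (String splitInfo : splitInfos) {
                // e.g. /_hadoopBsp/<jobId>/_vertexInputSplitDir/0 with the split
                // description (hdfs://.../1.txt:0+66) as the znode's data.
                zk.create(inputSplitDir + "/" + index,
                          splitInfo.getBytes(StandardCharsets.UTF_8),
                          ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.PERSISTENT);
                index++;
            }
            // Marker telling the workers that every split znode has been created.
            zk.create(allReadyPath, new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }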

3. The Master creates Partitions based on the splits. First it determines the number of partitions.


In BSPServiceMaster, the MasterGraphPartitioner<I, V, E, M> object defaults to HashMasterPartitioner, whose createInitialPartitionOwners() method builds the partition-to-worker assignment.

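A simplified sketch of the idea behind createInitialPartitionOwners(), with plain strings standing in for the WorkerInfo and PartitionOwner objects:

    import java.util.ArrayList;
    import java.util.List;

    /** Sketch: square the worker count, then hand out partitions round-robin. */
    final class SimpleMasterPartitioner {
        static final int PARTITION_COUNT_MULTIPLIER = 1;   // Giraph default

        static List<String> createInitialPartitionOwners(List<String> workers) {
            int workerCount = workers.size();
            // partitionCount = multiplier * workers^2  (4 for the 2 workers here)
            int partitionCount = PARTITION_COUNT_MULTIPLIER * workerCount * workerCount;
            List<String> owners = new ArrayList<>(partitionCount);
            for (int partitionId = 0; partitionId < partitionCount; partitionId++) {
                // Round-robin assignment: with 2 workers, partitions 0 and 2 go to
                // the first worker, partitions 1 and 3 to the second.
                owners.add("partition " + partitionId + " -> "
                        + workers.get(partitionId % workerCount));
            }
            return owners;
        }
    }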

The partition count is computed by the PartitionUtils utility class, as follows:

partitionCount = PARTITION_COUNT_MULTIPLIER * availableWorkerInfos.size() * availableWorkerInfos.size(), where PARTITION_COUNT_MULTIPLIER is the multiplier applied to the square of the current number of workers; its default value is 1.

So here partitionCount is 4 (1 * 2 * 2). The partitionOwnerList that gets created is as follows:

[(id=0,cur=Worker(hostname=giraphx, MRtaskID=1, port=30001),prev=null,ckpt_file=null),

(id=1,cur=Worker(hostname=giraphx, MRtaskID=2, port=30002),prev=null,ckpt_file=null),

(id=2,cur=Worker(hostname=giraphx, MRtaskID=1, port=30001),prev=null,ckpt_file=null),

(id=3,cur=Worker(hostname=giraphx, MRtaskID=2, port=30002),prev=null,ckpt_file=null)]

4. The Master creates the znode /_hadoopBsp/job_201404102333_0013/_applicationAttemptsDir/0/_superstepDir/-1/_partitionExchangeDir, which is used later for the partition exchange.

5. Finally, in the assignPartitionOwners() method, the Master writes masterInfo, chosenWorkerInfoList, partitionOwners, and related information into a znode (as that znode's data). The path of the znode is /_hadoopBsp/job_201404102333_0013/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions.

The Master then calls the barrierOnWorkerList() method and waits for every Worker to finish loading its data.


barrierOnWorkerList() creates a znode with path /_hadoopBsp/job_201404102333_0013/_vertexInputSplitDoneDir, then checks whether the number of that znode's children equals the number of workers. If not, the thread goes into a wait state. Later, whenever a worker finishes loading its data, it creates a child node (such as /_hadoopBsp/job_201404102333_0013/_vertexInputSplitDoneDir/giraphx_1), which wakes the thread up to check again.
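A minimal sketch of this barrier, again using the plain ZooKeeper client API with illustrative names:

    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    /** Sketch of the barrierOnWorkerList() idea: block until the "done" dir
     *  has one child per worker. */
    final class WorkerBarrier {
        static void barrierOnWorkerList(ZooKeeper zk, String doneDir, int workerCount)
                throws KeeperException, InterruptedException {
            while (true) {
                CountDownLatch childrenChanged = new CountDownLatch(1);
                Watcher watcher = event -> childrenChanged.countDown();
                // getChildren() registers the Watcher, which fires when a worker
                // adds a child such as .../_vertexInputSplitDoneDir/giraphx_1.
                List<String> children = zk.getChildren(doneDir, watcher);
                if (children.size() >= workerCount) {
                    return;                   // every worker has reported "done"
                }
                childrenChanged.await();      // sleep until the child list changes
            }
        }
    }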

6. After the Master creates the znode from step 5, the workers are woken up.

Each worker reads the data from that znode; the data contains masterInfo, the WorkerInfoList, and the partitionOwnerList. Then each worker starts loading data.

The partitionOwnerList is copied into the partitionOwnerList field of the workerGraphPartitioner object (of type HashWorkerPartitioner by default) in the BSPServiceWorker class. Later, each vertex obtains its corresponding partitionOwner from the workerGraphPartitioner object based on its vertexID.

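A simplified sketch of how a HashWorkerPartitioner-style lookup maps a vertex id to its owner (types simplified; strings stand in for PartitionOwner objects):

    import java.util.List;

    /** Sketch: pick the PartitionOwner for a vertex id by hashing it. */
    final class HashVertexPartitioner<I> {
        private final List<String> partitionOwnerList;   // stands in for List<PartitionOwner>

        HashVertexPartitioner(List<String> partitionOwnerList) {
            this.partitionOwnerList = partitionOwnerList;
        }

        String getPartitionOwner(I vertexId) {
            // Hash of the vertex id modulo the number of partitions picks the owner.
            return partitionOwnerList.get(
                    Math.abs(vertexId.hashCode() % partitionOwnerList.size()));
        }
    }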

Each Worker fetches the children of the znode /_hadoopBsp/job_201404102333_0013/_vertexInputSplitDir, obtaining inputSplitPathList with the following contents:

[/_hadoopBsp/job_201404102333_0013/_vertexInputSplitDir/1,

/_hadoopBsp/job_201404102333_0013/_vertexInputSplitDir/0]

Then each Worker creates N InputsCallable threads to read the data, where N = min(NUM_INPUT_THREADS, maxInputSplitThread). NUM_INPUT_THREADS defaults to 1, and maxInputSplitThread = (InputSplitSize - 1) / maxWorkers + 1.

So by default each worker creates a single thread to load data.

In the reserveInputSplit() method of the InputSplitsHandler class, each worker iterates over inputSplitPathList and reserves a split (marking it as the one it will process) by creating a znode.

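The reservation idea can be sketched as follows; the child-node name is illustrative, not necessarily the exact one Giraph uses:

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    /** Sketch: claim a split by creating a child znode; if another worker got
     *  there first, move on to the next split. */
    final class SplitReserver {
        static String reserveInputSplit(ZooKeeper zk, List<String> inputSplitPathList)
                throws KeeperException, InterruptedException {
            for (String splitPath : inputSplitPathList) {
                try {
                    // Ephemeral child marks the split as reserved by this worker;
                    // it disappears automatically if the worker dies.
                    zk.create(splitPath + "/_vertexInputSplitReserved",
                              new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.EPHEMERAL);
                    return splitPath;                   // reservation succeeded
                } catch (KeeperException.NodeExistsException e) {
                    // Another worker already reserved this split; try the next one.
                }
            }
            return null;                                // nothing left to reserve
        }
    }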

Once reserveInputSplit() has claimed a znode, the loadInputSplit() method of the loadSplitsCallable class uses that znode to obtain the split's HDFS path information, then reads the data and redistributes it.


The readInputSplit() method of the VertexInputSplitsCallable class reads the vertices of one split and distributes them to the workers that own them.

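A simplified sketch of the readInputSplit() idea, with stand-in types rather than the real Giraph reader and request classes:

    import java.io.IOException;
    import java.util.List;

    /** Sketch: iterate the vertices of one split and route each to its owner. */
    final class SplitLoader<I, V> {

        interface SimpleVertexReader<I, V> {
            boolean nextVertex() throws IOException;
            I getCurrentVertexId() throws IOException;
            V getCurrentVertexValue() throws IOException;
        }

        private final List<String> partitionOwnerList;  // worker that owns each partition

        SplitLoader(List<String> partitionOwnerList) {
            this.partitionOwnerList = partitionOwnerList;
        }

        long readInputSplit(SimpleVertexReader<I, V> reader) throws IOException {
            long verticesRead = 0;
            while (reader.nextVertex()) {
                I vertexId = reader.getCurrentVertexId();
                V vertexValue = reader.getCurrentVertexValue();
                // Same hash rule as the worker partitioner: hash(id) mod #partitions.
                String owner = partitionOwnerList.get(
                        Math.abs(vertexId.hashCode() % partitionOwnerList.size()));
                sendVertex(owner, vertexId, vertexValue);
                verticesRead++;
            }
            return verticesRead;
        }

        private void sendVertex(String owner, I vertexId, V vertexValue) {
            // In Giraph this becomes a network request to the owning worker;
            // here it is only a placeholder.
        }
    }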

7. After each worker finishes loading its data, it calls the waitForOtherWorkers() method and waits until all the workers have processed their splits.


The strategy is as follows: each worker creates a child node under the /_hadoopBsp/job_201404102333_0013/_vertexInputSplitDoneDir directory, with its own worker information appended to the name. For example, worker1 and worker2 create the following child nodes, respectively:

/_hadoopBsp/job_201404102333_0013/_vertexInputSplitDoneDir/giraphx_1

/_hadoopBsp/job_201404102333_0013/_vertexInputSplitDoneDir/giraphx_2

After creating its child node, each worker waits for the Master to create /_hadoopBsp/job_201404102333_0013/_vertexInputSplitsAllDone.
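The worker side of this barrier, announce completion and then wait for the Master's marker, can be sketched like this (names are again illustrative):

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    /** Sketch: report that this worker is done, then wait for the "all done" znode. */
    final class WorkerDoneReporter {
        static void reportDoneAndWait(ZooKeeper zk, String doneDir, String workerName,
                                      String allDonePath)
                throws KeeperException, InterruptedException {
            // e.g. .../_vertexInputSplitDoneDir/giraphx_1
            zk.create(doneDir + "/" + workerName, new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            // Now block until the Master creates .../_vertexInputSplitsAllDone.
            while (true) {
                CountDownLatch created = new CountDownLatch(1);
                Watcher watcher = event -> created.countDown();
                if (zk.exists(allDonePath, watcher) != null) {
                    return;
                }
                created.await();
            }
        }
    }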

8. As noted in step 5, when the Master finds that the number of child nodes under /_hadoopBsp/job_201404102333_0013/_vertexInputSplitDoneDir equals the total number of workers, it creates /_hadoopBsp/job_201404102333_0013/_vertexInputSplitsAllDone in the coordinateInputSplits() method, telling every worker that all workers have finished processing their splits.

9. Finally comes the global synchronization.

The Master creates a znode with path /_hadoopBsp/job_201404102333_0013/_applicationAttemptsDir/0/_superstepDir/-1/_workerFinishedDir, then calls the barrierOnWorkerList() method again to check whether the number of that znode's children equals the number of workers. If not, the thread goes into a wait state until the workers create child nodes that wake it up to check again.

Each worker collects its own Partition Stats and enters the finishSuperstep() method, where it waits for all outstanding Requests to be processed, sends its Aggregator information to the Master, and creates a child node such as /_hadoopBsp/job_201404102333_0013/_applicationAttemptsDir/0/_superstepDir/-1/_workerFinishedDir/giraphx_1, whose data is the worker's partitionStatsList and its workerSentMessages counter.

Finally, each worker calls the waitForOtherWorkers() method and waits for the Master to create the /_hadoopBsp/job_201404102333_0013/_applicationAttemptsDir/0/_superstepDir/-1/_superstepFinished node.

When the Master finds that the number of child nodes under /_hadoopBsp/job_201404102333_0013/_applicationAttemptsDir/0/_superstepDir/-1/_workerFinishedDir equals the number of workers, it collects the aggregator information sent by each worker from the data on those child nodes and merges it into globalStats.

If the Master finds from the global information that (1) all vertices have voted to halt and no messages are in flight, or (2) the maximum number of iterations has been reached, it sets globalStats.setHaltComputation(true) to tell the workers to stop iterating.
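The halt decision boils down to a simple check over the aggregated statistics; a sketch (field and parameter names are illustrative):

    /** Sketch of the halt decision the Master makes from the merged global stats. */
    final class HaltDecision {
        static boolean shouldHalt(long vertexCount, long finishedVertexCount,
                                  long messageCount, long superstep, long maxSupersteps) {
            // (1) every vertex voted to halt and no messages are in flight, or
            // (2) the configured maximum number of supersteps has been reached.
            return (finishedVertexCount == vertexCount && messageCount == 0)
                    || (maxSupersteps > 0 && superstep >= maxSupersteps);
        }
    }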

The Master then creates the /_hadoopBsp/job_201404102333_0013/_applicationAttemptsDir/0/_superstepDir/-1/_superstepFinished node, with globalStats as its data, telling all workers that the current superstep has ended.

When each Worker detects that the Master has created the _superstepFinished node, it reads that znode's data, i.e. the global statistics, and then decides whether to continue with the next iteration.

10. After that, the synchronization for the next superstep begins.

11. To summarize the synchronization process between the Master and the workers:

(1) The Master creates znode A, then checks whether the number of A's children equals the number of workers; if not, it goes into a wait state. Whenever a worker creates a child node, the Master is woken up to check again.

(2) Each worker, after finishing its own work, creates a child node A1 under A, then waits for the Master to create znode B.

(3) When the Master detects that the number of A's children equals the number of workers, it creates znode B.

(4) Once the Master has created znode B, the individual workers are woken up. The synchronization ends, and each worker can begin the next superstep.

In essence, the global synchronization is carried out through znode B.


Origin: blog.51cto.com/14463231/2427624