Giraph source analysis (III) - Messaging

As described in the previous parts, each BSPServiceWorker holds a WorkerServer object, and the WorkerServer holds a ServerData object that stores the actual data. ServerData contains partitionStore, edgeStore, incomingMessageStore, currentMessageStore, the Worker's aggregated values, and so on. The incomingMessageStore object is of type MessageStoreByPartition (an interface), which means that messages are stored by partition. The interface diagram of MessageStoreByPartition is as follows:

[Figure: MessageStoreByPartition interface diagram]

The abstract class SimpleMessageStore has a variable map of type ConcurrentMap<Integer, ConcurrentMap<I, T>> that stores the messages. The first level maps a partition ID to the messages sent to that partition; the second level maps a vertex ID to the message queue of that vertex.
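The two-level layout can be illustrated with a small sketch. This is not Giraph's code: the class TwoLevelMessageStore and its methods are hypothetical names, and a plain List stands in for the per-vertex message queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical store: partition ID -> vertex ID -> messages for that vertex. */
class TwoLevelMessageStore<I, M> {
    private final ConcurrentMap<Integer, ConcurrentMap<I, List<M>>> map =
        new ConcurrentHashMap<>();

    /** Appends a message to the queue of vertexId inside partitionId. */
    void addMessage(int partitionId, I vertexId, M message) {
        ConcurrentMap<I, List<M>> partitionMap =
            map.computeIfAbsent(partitionId, id -> new ConcurrentHashMap<>());
        List<M> messages =
            partitionMap.computeIfAbsent(vertexId, id -> new ArrayList<>());
        // The inner list is not thread-safe, so guard the append.
        synchronized (messages) {
            messages.add(message);
        }
    }

    /** Returns a copy of the messages queued for a vertex (empty if none). */
    List<M> getMessages(int partitionId, I vertexId) {
        ConcurrentMap<I, List<M>> partitionMap = map.get(partitionId);
        if (partitionMap == null) {
            return new ArrayList<>();
        }
        List<M> messages = partitionMap.get(vertexId);
        if (messages == null) {
            return new ArrayList<>();
        }
        synchronized (messages) {
            return new ArrayList<>(messages);
        }
    }
}
```

Looking up a vertex's messages therefore takes two steps: first the partition, then the vertex inside that partition.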


"Giraph communication module analysis": http://my.oschina.net/skyaugust/blog/95182

The message list of each vertex is concretely of type ExtendedDataOutput, which extends the DataOutput interface and only adds a few methods. Each message is written into the ExtendedDataOutput object in byte form.

Messages are sent using asynchronous communication.

With asynchronous communication, message passing runs concurrently with vertex computation: messages can be sent while the computation is still in progress, so a large volume of messages is spread over time and bursts of network traffic are avoided. The cost is extra space at the receiving end to buffer the received messages temporarily, that is, trading space for time. With centralized communication, vertex computation and message passing run serially: all messages are sent together after the computation finishes. This is simple to control and implement, and it allows messages to be optimized as much as possible at the sending end, but it tends to cause bursts of network traffic and increases the storage overhead at the sending end.

Different Workers communicate through RPC-style messaging, implemented with Netty. Within the same Worker, messages between two consecutive iterations are passed through memory: the message to be sent is copied directly into that Worker's incomingMessageStore. The message storage format and the sending mechanism are described in detail below.

Giraph uses a cache to buffer messages; when the buffered messages reach a certain threshold, they are sent in one go.

In both cases (within a Worker and between Workers) messages are sent in bulk, not one at a time. Messages sent to a vertex are stored as <destVertexId, Message> pairs in ByteArrayVertexIdData<I, T> (actually of type ByteArrayVertexIdMessages<I, M>). The class org.apache.giraph.utils.ByteArrayVertexIdData<I, T> is described as follows:

Function: stores <vertex ID, data> pairs in a byte array. It holds an ExtendedDataOutput object that stores the data.

It also has an inner class, VertexIdDataIterator, which extends the VertexIdIterator class.
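The idea of packing <vertex ID, data> pairs into a byte array and reading them back with an iterator-like pass can be sketched with the JDK's own streams. This is only an illustration: BytePairBuffer is a hypothetical class, the JDK's DataOutputStream stands in for ExtendedDataOutput, and vertex IDs and messages are fixed to long and double.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer that stores (vertexId, message) pairs as raw bytes. */
class BytePairBuffer {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    /** Serializes one pair: 8 bytes for the ID, 8 bytes for the message. */
    void add(long vertexId, double message) {
        try {
            out.writeLong(vertexId);
            out.writeDouble(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of bytes written so far. */
    int sizeInBytes() {
        return bytes.size();
    }

    /** Walks the byte array and decodes every pair as "id=message". */
    List<String> readAll() {
        List<String> pairs = new ArrayList<>();
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            while (in.available() > 0) {
                long vertexId = in.readLong();
                double message = in.readDouble();
                pairs.add(vertexId + "=" + message);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }
}
```

Because every pair lives in one contiguous byte array, the buffer's size in bytes is known at any time, which is what a size-based send threshold needs.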


org.apache.giraph.comm.SendCache buffers the messages to be sent and then sends them in bulk. In Giraph, each Worker may hold multiple partitions. The buffer threshold is counted per Worker, not per Partition.


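SendCache has an array ByteArrayVertexIdData<I, T>[] dataCache that stores the messages to be sent to each Partition. It has an array int[] dataSizes that records the size of the messages to be sent to each Worker; once that size exceeds MAX_MSG_REQUEST_SIZE (512 KB by default), the messages buffered for all Partitions on that Worker are sent to the Worker. Messages within the same Worker are buffered the same way. It also has an array int[] initBufferSizes that records the initial size of the ExtendedDataOutput object in the ByteArrayVertexIdData of each Partition on each Worker; all Partitions on the same Worker get the same initial value, which is an average. Let M be the value of MAX_MSG_REQUEST_SIZE (the message request size), let P be the number of partitions on the Worker, and let A be ADDITIONAL_MSG_REQUEST_SIZE (the factor by which the buffer exceeds the average), 0.2f by default. The initial size of each Partition is then M * (1 + A) / P.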
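As a worked example of the formula M * (1 + A) / P, the sketch below plugs in the default values. The class InitialBufferSize and the method initialBufferSize are hypothetical, and the truncation to int is an assumption about rounding, not a statement about Giraph's own code.

```java
/** Hypothetical helper that evaluates M * (1 + A) / P. */
class InitialBufferSize {
    /** M: maximum message request size, 512 KB by default. */
    static final int MAX_MSG_REQUEST_SIZE = 512 * 1024;
    /** A: extra headroom over the average, 0.2f by default. */
    static final float ADDITIONAL_MSG_REQUEST_SIZE = 0.2f;

    /** Initial per-partition buffer size in bytes, truncated to an int. */
    static int initialBufferSize(int maxRequestSize, float additional,
                                 int partitions) {
        return (int) (maxRequestSize * (1.0 + additional) / partitions);
    }
}
```

With the defaults and P = 4, the result is 524288 * 1.2 / 4, which truncates to 157286 bytes per Partition.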

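As shown earlier, each Worker has a NettyWorkerClientRequestProcessor<I, V, E, M> for sending messages. This class holds a SendMessageCache object that buffers outgoing messages. The sendMessageRequest(I, M) method of the NettyWorkerClientRequestProcessor class, shown below, sends the message message to the vertex destVertexId.

[Figure: source code of sendMessageRequest(I, M)]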

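Explanation of the method: it first looks up the partitionId and WorkerInfo that correspond to destVertexId, then adds the message to SendMessageCache, which returns workerMessageSize, the size of the messages buffered for the Worker that owns the vertex. If that value exceeds the default of 512 KB, the messages of all Partitions on this Worker are removed from SendMessageCache and assigned to workerMessages, whose type is PairList<Integer, ByteArrayVertexIdMessages<I, M>>: the key is the partitionId and the value is the list of messages sent to that partition. Finally the doRequest() method is called to send the messages. The doRequest() method is as follows:

[Figure: source code of doRequest()]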
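The add-then-flush logic described above can be sketched as follows. This is a simplified illustration under stated assumptions: WorkerSendCache and its methods are hypothetical names, messages are plain strings, their size is approximated by the string length, and the flushed batch is returned to the caller instead of being handed to doRequest().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-Worker send cache with a size threshold. */
class WorkerSendCache {
    private final int maxRequestSize;
    // worker ID -> (partition ID -> buffered messages)
    private final Map<Integer, Map<Integer, List<String>>> cache = new HashMap<>();
    // worker ID -> total buffered size for that worker
    private final Map<Integer, Integer> workerSizes = new HashMap<>();

    WorkerSendCache(int maxRequestSize) {
        this.maxRequestSize = maxRequestSize;
    }

    /** Buffers a message and returns the size now buffered for the worker. */
    int add(int workerId, int partitionId, String message) {
        cache.computeIfAbsent(workerId, w -> new HashMap<>())
             .computeIfAbsent(partitionId, p -> new ArrayList<>())
             .add(message);
        return workerSizes.merge(workerId, message.length(), Integer::sum);
    }

    /** Removes and returns everything buffered for one worker. */
    Map<Integer, List<String>> removeWorkerMessages(int workerId) {
        workerSizes.remove(workerId);
        Map<Integer, List<String>> removed = cache.remove(workerId);
        return removed == null ? new HashMap<>() : removed;
    }

    /**
     * Buffers a message; once the worker's buffered size exceeds the
     * threshold, returns the whole batch to send, otherwise null.
     */
    Map<Integer, List<String>> send(int workerId, int partitionId, String message) {
        int workerMessageSize = add(workerId, partitionId, message);
        if (workerMessageSize > maxRequestSize) {
            return removeWorkerMessages(workerId);
        }
        return null;
    }
}
```

Note that the threshold is checked against the Worker's total, so a flush carries the messages of every Partition on that Worker, not only the Partition that received the last message.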

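As the code shows, sending a message first checks whether the destination is on the same Worker. If it is, the doRequest method of SendWorkerMessagesRequest<T, M> is called to deliver the messages; otherwise WorkerClient (backed by Netty) is used to send them. The following focuses on the mechanism within a single Worker.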
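The local-versus-remote decision can be sketched in a few lines. Everything here is a stand-in: RequestDispatcher is a hypothetical class, localStore plays the role of incomingMessageStore, and networkQueue plays the role of the Netty-backed client.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical dispatcher: local delivery in memory, remote via a queue. */
class RequestDispatcher {
    private final int localWorkerId;
    /** Stands in for the local Worker's incomingMessageStore. */
    final List<String> localStore = new ArrayList<>();
    /** Stands in for requests handed to the network client. */
    final List<String> networkQueue = new ArrayList<>();

    RequestDispatcher(int localWorkerId) {
        this.localWorkerId = localWorkerId;
    }

    /** Delivers a batch locally if possible, otherwise queues it for the network. */
    void doRequest(int destWorkerId, List<String> messages) {
        if (destWorkerId == localWorkerId) {
            // Same Worker: copy straight into the local store, no network hop.
            localStore.addAll(messages);
        } else {
            // Different Worker: tag each message with its destination.
            for (String message : messages) {
                networkQueue.add(destWorkerId + ":" + message);
            }
        }
    }
}
```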

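The doRequest method of the org.apache.giraph.comm.requests.SendWorkerMessagesRequest class is as follows:

[Figure: source code of SendWorkerMessagesRequest.doRequest()]

The parameter is this Worker's ServerData. The partitionVertexData in the code is actually the PairList<Integer, ByteArrayVertexIdMessages<I, M>> workerMessages. The method iterates over the <partitionID, message list> pairs and adds them to the incomingMessageStore in ServerData.

The addPartitionMessages() method of the ByteArrayMessagesPerVertexStore class is as follows:

[Figure: source code of ByteArrayMessagesPerVertexStore.addPartitionMessages()]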

When the user supplies a Combiner, incomingMessageStore is of type OneMessagePerVertexStore, which stores only a single message per vertex rather than a message queue. Its structure is shown below:

[Figure: structure of OneMessagePerVertexStore]

When a message is added, it is merged with the message the vertex already holds by calling the combine() method, and the result is stored in that structure. The addPartitionMessages() method is as follows:

[Figure: source code of OneMessagePerVertexStore.addPartitionMessages()]
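The effect of a Combiner on storage can be sketched with a map that keeps exactly one value per vertex. This is an illustration only: OneMessageStore is a hypothetical class, and a java.util.function.BinaryOperator stands in for the Combiner's combine() method.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BinaryOperator;

/** Hypothetical store that keeps one combined message per vertex. */
class OneMessageStore<I, M> {
    private final ConcurrentMap<I, M> messages = new ConcurrentHashMap<>();
    private final BinaryOperator<M> combiner;

    OneMessageStore(BinaryOperator<M> combiner) {
        this.combiner = combiner;
    }

    /** Stores the message, or combines it with the one already present. */
    void addMessage(I vertexId, M message) {
        messages.merge(vertexId, message, combiner);
    }

    /** Returns the single combined message, or null if none arrived. */
    M getMessage(I vertexId) {
        return messages.get(vertexId);
    }
}
```

With a minimum combiner such as Double::min, sending 5.0, 3.0 and 4.0 to the same vertex leaves only 3.0 in the store, so memory use per vertex stays constant no matter how many messages arrive.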

After the call() method of ComputeCallable has invoked computePartition(Partition) and finished computing all vertices of the Partition, it calls WorkerClientRequestProcessor.flush() to send all remaining messages.


Origin blog.51cto.com/14463231/2423480