Spark Kernel Analysis (3): Principles of the Spark Communication Architecture

1. Overview of Spark communication architecture

Spark 2.x uses the Netty communication framework as its internal communication component.
Spark's new RPC framework, built on Netty, borrows its design from Akka and is based on the Actor model, as shown in the following figure. Each component (Client/Master/Worker) in the Spark communication framework can be regarded as an independent entity, and the entities communicate with each other by passing messages. The relationships between the components are as follows:
[Figure: relationships between the Endpoints in the Spark communication framework]
Each Endpoint (Client/Master/Worker) has 1 InBox and N OutBoxes (N >= 1; N depends on how many other Endpoints the current Endpoint communicates with, with one OutBox per peer Endpoint). Messages received by an Endpoint are written to its InBox; outgoing messages are written to an OutBox and then delivered to the InBox of the other Endpoint.
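
The mailbox relationship described above can be illustrated with a small toy model. This is only a sketch, not Spark code: the Endpoint class, its send and flush methods, and the message strings are all hypothetical names chosen for illustration, and the network is replaced by a direct hand-off between in-memory queues.

```python
from collections import deque


class Endpoint:
    """Toy endpoint: one inbox, plus one outbox per peer (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # exactly one InBox per endpoint
        self.outboxes = {}     # peer name -> OutBox, created on first send

    def send(self, peer, message):
        # Outgoing messages are first written to the OutBox for that peer.
        self.outboxes.setdefault(peer.name, deque()).append(message)

    def flush(self, peer):
        # Drain the OutBox for `peer` into the peer's InBox
        # (stands in for the real network transfer).
        outbox = self.outboxes.get(peer.name, deque())
        while outbox:
            peer.inbox.append((self.name, outbox.popleft()))


master = Endpoint("Master")
worker1 = Endpoint("Worker-1")
worker2 = Endpoint("Worker-2")

worker1.send(master, "RegisterWorker")
worker2.send(master, "RegisterWorker")
master.send(worker1, "RegisteredWorker")

for src, dst in [(worker1, master), (worker2, master), (master, worker1)]:
    src.flush(dst)

print(list(master.inbox))    # messages from both workers, in arrival order
print(list(worker1.inbox))   # the reply from the master
print(len(master.outboxes))  # the master has only talked to one peer so far
```

In this model the number of OutBoxes grows with the number of peers an endpoint has sent to, while the InBox count stays at one, which mirrors the 1 InBox / N OutBox relationship.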

2. Spark communication architecture analysis

The Spark communication architecture is shown in the figure below:
[Figure: Spark communication architecture]
(1) RpcEndpoint: RPC endpoint. Spark calls each node (Client/Master/Worker) an RPC endpoint, and all of them implement the RpcEndpoint interface. Internally, each endpoint defines its own messages and business logic according to its needs; when it needs to send a message (or make a query), it calls the Dispatcher;

(2) RpcEnv: RPC context environment. The runtime environment that each RPC endpoint depends on is called RpcEnv;

(3) Dispatcher: Message dispatcher. It routes messages that an RPC endpoint needs to send, as well as messages received from remote RPC endpoints, to the corresponding inbox or outbox. If the recipient of the message is the local endpoint itself, the message is put in the inbox; otherwise it is put in the outbox;

(4) Inbox: Message inbox. Each local RpcEndpoint has one Inbox. Every time the Dispatcher stores a message in an Inbox, it also adds the corresponding EndpointData to its internal ReceiverQueue. When the Dispatcher is created, a separate thread is started that polls the ReceiverQueue and consumes the inbox messages;

(5) RpcEndpointRef: A reference to a remote RpcEndpoint. To send a message to a specific RpcEndpoint, we generally first obtain a reference to that RpcEndpoint and then send the message through that reference;

(6) OutBox: Message outbox. For the current RpcEndpoint, each target RpcEndpoint has its own OutBox, so sending to multiple target RpcEndpoints means there are multiple OutBoxes. Once a message is put in the OutBox, it is sent out via the TransportClient. Putting the message into the OutBox and sending it happen in the same thread;

(7) RpcAddress: The address of the remote RpcEndpointRef, consisting of host and port;

(8) TransportClient: Netty communication client. Each OutBox has one TransportClient. The TransportClient continuously polls the OutBox and, based on the recipient information in each OutBox message, sends a request to the corresponding remote TransportServer;

(9) TransportServer: Netty communication server. Each RpcEndpoint has one TransportServer. After receiving a remote message, it calls the Dispatcher to distribute the message to the corresponding inbox or outbox.
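
The routing rule of the Dispatcher and the Inbox consumer thread described in the list above can be sketched as follows. This is a simplified model under stated assumptions, not Spark's actual implementation: RpcAddress, EndpointData and Dispatcher here are hypothetical Python stand-ins that only borrow the names of the concepts, the post, register and stop methods are invented for the sketch, and remote messages simply stay in the outbox because the TransportClient/TransportServer layer is left out.

```python
import queue
import threading
from collections import deque, namedtuple

# Hypothetical stand-ins for the concepts above; not Spark's real classes.
RpcAddress = namedtuple("RpcAddress", ["host", "port"])
EndpointData = namedtuple("EndpointData", ["name", "inbox", "handler"])


class Dispatcher:
    def __init__(self, local_address):
        self.local_address = local_address
        self.endpoints = {}             # endpoint name -> EndpointData
        self.outboxes = {}              # remote RpcAddress -> pending messages
        self.receivers = queue.Queue()  # endpoints whose inbox has new messages
        # Consumer thread started when the Dispatcher is created.
        self._loop = threading.Thread(target=self._message_loop, daemon=True)
        self._loop.start()

    def register(self, name, handler):
        self.endpoints[name] = EndpointData(name, deque(), handler)

    def post(self, address, name, message):
        if address == self.local_address:
            data = self.endpoints[name]
            data.inbox.append(message)  # recipient is local: use the inbox
            self.receivers.put(data)    # notify the consumer thread
        else:
            # Recipient is remote: one outbox per target address.
            self.outboxes.setdefault(address, deque()).append((name, message))

    def _message_loop(self):
        while True:
            data = self.receivers.get()
            if data is None:            # poison pill: stop the loop
                break
            while data.inbox:
                data.handler(data.inbox.popleft())

    def stop(self):
        self.receivers.put(None)
        self._loop.join()


local = RpcAddress("127.0.0.1", 7077)
remote = RpcAddress("10.0.0.2", 7078)
received = []

dispatcher = Dispatcher(local)
dispatcher.register("Master", received.append)
dispatcher.post(local, "Master", "Heartbeat")         # goes to the inbox
dispatcher.post(remote, "Worker", "LaunchExecutor")   # goes to an outbox
dispatcher.stop()

print(received)
print(dict(dispatcher.outboxes))
```

The stop marker is queued behind any pending notifications, so the consumer thread handles the local message before it exits. In a fuller model, each outbox would be drained by a client that connects to the server at the target address, which would then hand the message to the Dispatcher on the remote side.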

Based on the above analysis, the high-level view of the Spark communication architecture is shown in the following figure:
[Figure: high-level view of the Spark communication architecture]

Origin blog.csdn.net/weixin_43520450/article/details/108607367