[Spark Internals] Source Code Analysis of the Communication Architecture

Same routine as before: first get a rough idea of how Spark's communication architecture works, then trace the source code.

Spark 2.x uses the Netty communication framework as its internal communication component.

In Spark's communication framework, each component (Client / Master / Worker) can be regarded as an independent entity, and the entities communicate with each other by passing messages, as shown in the figure:
[Figure: communication entities exchanging messages]

Each Endpoint (Client / Master / Worker) has one InBox and N OutBoxes (N >= 1). N depends on how many other Endpoints the current Endpoint communicates with: there is one OutBox for each peer Endpoint. Messages the Endpoint receives are written into its InBox; messages it sends are written into an OutBox and then delivered to the InBox of the target Endpoint.
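To make the InBox/OutBox idea concrete, here is a minimal Scala sketch. It is a toy model under stated assumptions, not Spark code. The names `MailboxModel`, `ToyEndpoint` and `flush` are hypothetical, as are the message strings. Delivery also happens synchronously in one thread rather than over Netty. The sketch shows only the shape of the design: one inbox per endpoint, plus one outbox for every peer the endpoint talks to.

```scala
import scala.collection.mutable

// Hypothetical toy model, not Spark's classes: each endpoint owns one inbox
// and lazily creates one outbox per peer it sends to.
object MailboxModel {
  final class ToyEndpoint(val name: String) {
    // Messages delivered to this endpoint
    val inbox: mutable.Queue[String] = mutable.Queue.empty
    // Peer name -> messages waiting to be delivered to that peer
    val outboxes: mutable.Map[String, mutable.Queue[String]] = mutable.Map.empty

    // Sending only writes into the outbox dedicated to the target peer
    def send(to: String, msg: String): Unit =
      outboxes.getOrElseUpdate(to, mutable.Queue.empty).enqueue(msg)
  }

  // Drain each outbox of `from` into the inbox of the matching peer
  def flush(from: ToyEndpoint, peers: Map[String, ToyEndpoint]): Unit =
    for ((peer, box) <- from.outboxes; target <- peers.get(peer)) {
      while (box.nonEmpty) target.inbox.enqueue(s"${from.name}: ${box.dequeue()}")
    }
}

// Usage: the Master talks to two Workers, so it ends up with two outboxes
object MailboxDemo {
  val master = new MailboxModel.ToyEndpoint("Master")
  val worker1 = new MailboxModel.ToyEndpoint("Worker1")
  val worker2 = new MailboxModel.ToyEndpoint("Worker2")
  master.send("Worker1", "LaunchExecutor")
  master.send("Worker2", "LaunchExecutor")
  master.send("Worker1", "KillExecutor")
  MailboxModel.flush(master, Map("Worker1" -> worker1, "Worker2" -> worker2))
}
```

The outbox count grows with the number of distinct peers, not the number of messages, which matches the "N depends on how many other Endpoints" rule above.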

Below is a more detailed diagram of Spark's communication architecture. It gives a clearer picture of how Spark components communicate, and keeping this overall framework in mind will help us trace the source code:
[Figure: detailed Spark communication architecture]

  1. RpcEndpoint: the RPC endpoint. Spark calls each node (Client / Master / Worker) an RPC endpoint. Every endpoint implements the RpcEndpoint interface and, according to its own needs, designs its own messages and its own message handling. When it needs to send (ask) something, it goes through the Dispatcher;

  2. RpcEnv: the RPC context. The context that every RPC endpoint depends on at runtime is called RpcEnv;

  3. Dispatcher: the message dispatcher. It dispatches instructions, whether messages an RPC endpoint needs to send or messages received from a remote RPC, to the corresponding Inbox/Outbox. If the recipient of an instruction is the endpoint itself, the instruction goes into the inbox. If the recipient is another endpoint, it goes into the outbox;

  4. Inbox: the instruction message inbox. Each local RpcEndpoint has one Inbox. Every time the Dispatcher stores a message in an Inbox, the corresponding EndpointData is added to its internal receivers queue (ReceiverQueue). Dedicated message-loop threads, started when the Dispatcher is created, poll this queue and consume the messages in the inbox;

  5. RpcEndpointRef: a reference to a remote RpcEndpoint. When we need to send a message to a specific RpcEndpoint, we usually first obtain a reference to that RpcEndpoint and then send the message through the reference;

  6. OutBox: the instruction message outbox. For the current RpcEndpoint, each target RpcEndpoint corresponds to one Outbox, so if it sends messages to several target RpcEndpoints, there are several OutBoxes. Once a message is put into the Outbox, it is sent out through a TransportClient. Putting a message into the outbox and sending it both happen in the same thread;

  7. RpcAddress: the address of a remote RpcEndpointRef, i.e. Host + Port;

  8. TransportClient: the Netty communication client. Each OutBox corresponds to one TransportClient. The TransportClient keeps polling the OutBox and, based on the receiver information of each message in it, sends the request to the corresponding remote TransportServer;

  9. TransportServer: the Netty communication server. Each RpcEndpoint corresponds to one TransportServer. After receiving a remote message, it calls the Dispatcher to distribute the message to the corresponding inbox/outbox.
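The routing rule from item 3 can be sketched as follows. If the receiver lives in this RpcEnv, the message goes to a local inbox. Otherwise it goes to the outbox kept for the receiver's Host + Port (item 7). All names here are hypothetical: `RoutingSketch`, `ToyRpcAddress`, `ToyEndpointAddress`, `ToyDispatcher.post`, and the hosts and ports. The endpoint name `CoarseGrainedScheduler` is used only as an illustrative label.

```scala
import scala.collection.mutable

// Hypothetical sketch, not Spark's API: route to an inbox when the receiver
// is local, otherwise to the outbox for the receiver's Host + Port.
object RoutingSketch {
  final case class ToyRpcAddress(host: String, port: Int)
  final case class ToyEndpointAddress(rpcAddress: ToyRpcAddress, name: String)

  final class ToyDispatcher(localAddress: ToyRpcAddress) {
    // Local endpoint name -> inbox
    val inboxes = mutable.Map.empty[String, mutable.Queue[String]]
    // Remote Host + Port -> outbox of (endpoint name, message) pairs
    val outboxes = mutable.Map.empty[ToyRpcAddress, mutable.Queue[(String, String)]]

    def post(receiver: ToyEndpointAddress, msg: String): Unit =
      if (receiver.rpcAddress == localAddress)
        inboxes.getOrElseUpdate(receiver.name, mutable.Queue.empty).enqueue(msg)
      else
        outboxes.getOrElseUpdate(receiver.rpcAddress, mutable.Queue.empty).enqueue((receiver.name, msg))
  }
}

object RoutingDemo {
  val here = RoutingSketch.ToyRpcAddress("executor-host", 40001)
  val driver = RoutingSketch.ToyRpcAddress("driver-host", 7078)
  val dispatcher = new RoutingSketch.ToyDispatcher(here)
  dispatcher.post(RoutingSketch.ToyEndpointAddress(here, "Executor"), "RegisteredExecutor")
  dispatcher.post(RoutingSketch.ToyEndpointAddress(driver, "CoarseGrainedScheduler"), "RegisterExecutor")
}
```

In the Spark 2.x source, this local-versus-remote check is made in `NettyRpcEnv.send`. That method hands local messages to the Dispatcher and puts remote ones into an Outbox. The sketch folds both roles into one class for brevity.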

Once we understand Spark's communication architecture, we can start reading the source code. But where should we start? We can pick up from the previous source code analysis, at the point where the Executor's communication endpoint is set up in the RPC environment (RpcEnv).

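private[spark] object CoarseGrainedExecutorBackend extends Logging {
	// main() parses the arguments and calls run(); run() creates the SparkEnv and then registers the endpoint
	private def run(...) {
		env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
			env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
	}
}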

In the previous article we focused on how the CoarseGrainedExecutorBackend class is created, but paid no attention to the setupEndpoint method. Clicking into it, we find that it is an abstract method of the abstract class RpcEnv with no implementation, so we need to look at a subclass.

private[spark] abstract class RpcEnv(conf: SparkConf) {
	def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef
}

In IDEA, press F4 and NettyRpcEnv shows up. This is the subclass we are looking for, and in it we find the implementation of setupEndpoint.

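private[netty] class NettyRpcEnv(val conf: SparkConf, ...) extends RpcEnv(conf) with Logging {
	// dispatcher is a field created together with the NettyRpcEnv
	override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
		dispatcher.registerRpcEndpoint(name, endpoint)
	}
}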
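The lookup we just did follows a plain template. `RpcEnv` declares `setupEndpoint` as an abstract method, and `NettyRpcEnv` overrides it by delegating to its dispatcher. Below is a hypothetical, self-contained Scala sketch of that shape, with toy classes (`SetupSketch`, `ToyRpcEnv`, `ToyNettyRpcEnv`, `ToyDispatcher`) standing in for Spark's. The `spark://name@host:port` string is assumed here to mirror how Spark prints an endpoint address.

```scala
import scala.collection.mutable

// Hypothetical mirror of the pattern: the base class only declares
// setupEndpoint; the Netty-flavoured subclass delegates to its dispatcher.
object SetupSketch {
  trait ToyRpcEndpoint
  final case class ToyEndpointRef(address: String)

  abstract class ToyRpcEnv {
    def setupEndpoint(name: String, endpoint: ToyRpcEndpoint): ToyEndpointRef
  }

  final class ToyDispatcher(host: String, port: Int) {
    private val registered = mutable.Map.empty[String, ToyRpcEndpoint]

    def registerRpcEndpoint(name: String, endpoint: ToyRpcEndpoint): ToyEndpointRef = {
      registered(name) = endpoint
      ToyEndpointRef(s"spark://$name@$host:$port")
    }
  }

  final class ToyNettyRpcEnv(host: String, port: Int) extends ToyRpcEnv {
    private val dispatcher = new ToyDispatcher(host, port)

    override def setupEndpoint(name: String, endpoint: ToyRpcEndpoint): ToyEndpointRef =
      dispatcher.registerRpcEndpoint(name, endpoint)
  }
}

object SetupDemo {
  // Callers only see the abstract type; the subclass does the work
  val env: SetupSketch.ToyRpcEnv = new SetupSketch.ToyNettyRpcEnv("executor-host", 40001)
  val ref = env.setupEndpoint("Executor", new SetupSketch.ToyRpcEndpoint {})
}
```

Because callers such as CoarseGrainedExecutorBackend hold only an `RpcEnv`, the Netty details stay behind the abstract method. That is why "go to definition" lands on an abstract method first.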

Here we spot a familiar name, Dispatcher. Yes, this is our message dispatcher, so let's click in and see what it does.

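private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) extends Logging {
	// Bundles an endpoint's name, the endpoint itself and its reference; each endpoint also gets its own Inbox
	private class EndpointData(
		val name: String,
		val endpoint: RpcEndpoint,
		val ref: NettyRpcEndpointRef) {
		val inbox = new Inbox(ref, endpoint)
	}

	// Endpoint name -> EndpointData
	private val endpoints: ConcurrentMap[String, EndpointData] =
		new ConcurrentHashMap[String, EndpointData]
	// Each RPC endpoint maps to exactly one RPC endpoint reference
	private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
		new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
	// Blocking queue of endpoints with pending messages; once data is offered, a message-loop thread takes it and processes it
	private val receivers = new LinkedBlockingQueue[EndpointData]
	@GuardedBy("this")
	private var stopped = false

	// Register an RPC endpoint (here: the Executor's)
	def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
		// Address of the endpoint: Host + Port of this RpcEnv, plus the endpoint name
		val addr = RpcEndpointAddress(nettyEnv.address, name)

		// Create a reference to the RpcEndpoint
		val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
		synchronized {
			if (stopped) {
				throw new IllegalStateException("RpcEnv has been stopped")
			}
			if (endpoints.putIfAbsent(name, new EndpointData(name, endpoint, endpointRef)) != null) {
				throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
			}
			val data = endpoints.get(name)
			endpointRefs.put(data.endpoint, data.ref)
			// A new Inbox already holds an OnStart message; offering data lets a message-loop thread process it
			receivers.offer(data)  // for the OnStart message
		}
		endpointRef
	}
}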
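Below is a rough, runnable model of what `registerRpcEndpoint` sets in motion, under two assumptions. It uses a single message-loop thread, whereas the real Dispatcher runs a small pool of them. It also uses the plain string `"OnStart"` in place of Spark's message type. `RegisterSketch`, `ToyInbox`, `ToyDispatcher` and `demo` are hypothetical names.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, heavily simplified model: a new inbox starts with OnStart,
// and offering the endpoint to `receivers` wakes a loop that drains the inbox.
object RegisterSketch {
  trait ToyEndpoint { def onStart(): Unit }

  // As in the registration code above, the first message is always OnStart
  final class ToyInbox(endpoint: ToyEndpoint) {
    private val messages = new LinkedBlockingQueue[String]()
    messages.add("OnStart")

    def process(): Unit = {
      var msg = messages.poll()
      while (msg != null) {
        if (msg == "OnStart") endpoint.onStart()
        msg = messages.poll()
      }
    }
  }

  final class EndpointData(val name: String, val endpoint: ToyEndpoint) {
    val inbox = new ToyInbox(endpoint)
  }

  final class ToyDispatcher {
    private val endpoints = new ConcurrentHashMap[String, EndpointData]()
    private val receivers = new LinkedBlockingQueue[EndpointData]()

    // One daemon message-loop thread that blocks on receivers.take()
    private val loop = new Thread(new Runnable {
      def run(): Unit = while (true) receivers.take().inbox.process()
    }, "toy-dispatcher-loop")
    loop.setDaemon(true)
    loop.start()

    def register(name: String, endpoint: ToyEndpoint): Unit = {
      if (endpoints.putIfAbsent(name, new EndpointData(name, endpoint)) != null)
        throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
      receivers.offer(endpoints.get(name)) // wakes the loop, which runs OnStart
    }
  }

  // Registers an endpoint and reports whether, and on which thread, onStart ran
  def demo(): (Boolean, String) = {
    val dispatcher = new ToyDispatcher
    val started = new CountDownLatch(1)
    val ranOn = new AtomicReference[String]("")
    dispatcher.register("Executor", new ToyEndpoint {
      def onStart(): Unit = { ranOn.set(Thread.currentThread().getName); started.countDown() }
    })
    (started.await(5, TimeUnit.SECONDS), ranOn.get)
  }
}
```

Registration itself only enqueues. `onStart` runs afterwards on the message-loop thread, so `demo()` reports that thread's name rather than the caller's. This is the "a thread comes to take the data and run it" behaviour described in the comment on `receivers`.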

Reading the source to the end, we see that the OnStart message is put into the queue and eventually processed. So what does processing it actually do? It is in fact the onStart method of CoarseGrainedExecutorBackend covered in the previous article.

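private[spark] class CoarseGrainedExecutorBackend(
		override val rpcEnv: RpcEnv,
		driverUrl: String,
		executorId: String,
		hostname: String,
		cores: Int,
		userClassPath: Seq[URL],
		env: SparkEnv)
	extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {
	// Because this class is an RPC endpoint, its lifecycle is constructor (create) -> onStart (start) -> receive* (receive messages) -> onStop (stop)

	// What we call the Executor is just a field of CoarseGrainedExecutorBackend
	var executor: Executor = null
	@volatile var driver: Option[RpcEndpointRef] = None
	private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()

	override def onStart() {
		// Look up the Driver's endpoint, then register back with it (reverse registration)
		rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
			driver = Some(ref)
			ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
		}(ThreadUtils.sameThread)  // success/failure handling (onComplete) omitted
	}

	override def receive: PartialFunction[Any, Unit] = {
		// The Driver reports that registration succeeded
		case RegisteredExecutor =>
			// Create the Executor, the object that actually runs tasks
			executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

		// A task sent over from the Driver
		case LaunchTask(data) =>
			// Deserialize the task description, then let the executor object run it
			val taskDesc = ser.deserialize[TaskDescription](data.value)
			executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
				taskDesc.name, taskDesc.serializedTask)
	}
}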
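The lifecycle in the comment above (constructor -> onStart -> receive* -> onStop) and the two `receive` cases can be modeled without Spark. This is a hypothetical sketch. `ExecutorSideSketch`, `ToyExecutorBackend` and `sendToDriver` are illustrative names, and the driver reference is a plain callback. In this sketch, a `LaunchTask` that arrives before registration simply throws.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: onStart asks the driver to register, RegisteredExecutor
// creates the executor, and LaunchTask is only accepted once it exists.
object ExecutorSideSketch {
  sealed trait Message
  final case class RegisterExecutor(executorId: String, cores: Int) extends Message
  case object RegisteredExecutor extends Message
  final case class LaunchTask(taskId: Long) extends Message

  final class ToyExecutor { val launched = ArrayBuffer.empty[Long] }

  final class ToyExecutorBackend(executorId: String, cores: Int, sendToDriver: Message => Unit) {
    // Like the `executor` field above: empty until the driver confirms registration
    var executor: Option[ToyExecutor] = None

    def onStart(): Unit = sendToDriver(RegisterExecutor(executorId, cores))

    val receive: PartialFunction[Message, Unit] = {
      case RegisteredExecutor =>
        executor = Some(new ToyExecutor)
      case LaunchTask(taskId) =>
        executor match {
          case Some(e) => e.launched += taskId
          case None    => throw new IllegalStateException("LaunchTask received before registration")
        }
    }
  }
}
```

Using `Option` instead of `null` makes the "no executor until registered" state explicit. The message order is the point of the sketch: register first, create the executor on `RegisteredExecutor`, and only then run tasks.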

At this point a question arises: onStart calls ref.ask() to send a reverse-registration message to the Driver. Who receives this message, and how is it handled?
Since the registration is with the Driver, we should look on the Driver side. The Driver is the part of the user program that creates the SparkContext, so let's look inside SparkContext.

class SparkContext(config: SparkConf) extends Logging {
	// That's right: the registration message is sent to this scheduler backend
	private var _schedulerBackend: SchedulerBackend = _

}

We can't wait to look at SchedulerBackend, but it is an interface (trait), so we look for its implementation, CoarseGrainedSchedulerBackend. Doesn't this class feel familiar? Its counterpart is CoarseGrainedExecutorBackend. In the end, it is these two objects that interact with each other.

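class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)
	extends ExecutorAllocationClient with SchedulerBackend with Logging {

	// The Driver-side RPC endpoint is an inner class of the backend
	class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
		extends ThreadSafeRpcEndpoint with Logging {

		override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
			// Match the reverse-registration message
			case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
				// Add the registering Executor's cores to the total core count
				totalCoreCount.addAndGet(cores)
				// Increment the number of registered Executors
				totalRegisteredExecutors.addAndGet(1)
				// Tell the Executor that registration succeeded
				executorRef.send(RegisteredExecutor)
				// Answer the ask[Boolean] issued in the Executor's onStart
				context.reply(true)
		}
	}
}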
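The Driver-side handling can be sketched the same way. This is a hypothetical model, not Spark's `DriverEndpoint`. The executor reference is a callback, and the Boolean return value stands in for `context.reply(true)`. The duplicate-ID branch mirrors a check Spark's driver performs, but its exact form here is illustrative. It shows why the reply to the ask is separate from the `RegisteredExecutor` / failure message sent back to the executor.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical driver-side counterpart: bump the counters, notify the
// executor, and reply true to the original ask.
object DriverSideSketch {
  sealed trait Message
  final case class RegisterExecutor(executorId: String, cores: Int, executorRef: Message => Unit) extends Message
  case object RegisteredExecutor extends Message
  final case class RegisterExecutorFailed(reason: String) extends Message

  final class ToyDriverEndpoint {
    val totalCoreCount = new AtomicInteger(0)
    val totalRegisteredExecutors = new AtomicInteger(0)
    private val executorIds = mutable.Set.empty[String]

    // The Boolean return value stands in for context.reply(...)
    def receiveAndReply(msg: Message): Boolean = msg match {
      case RegisterExecutor(id, cores, executorRef) =>
        if (!executorIds.add(id)) {
          executorRef(RegisterExecutorFailed(s"Duplicate executor ID: $id"))
        } else {
          totalCoreCount.addAndGet(cores)
          totalRegisteredExecutors.addAndGet(1)
          executorRef(RegisteredExecutor)
        }
        true
      case _ => false
    }
  }
}
```

`AtomicInteger` matches the `addAndGet` calls in the excerpt above: several executors may register concurrently, so the counters must be updated atomically.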

That covers roughly how Spark's communication architecture works; we won't go deeper into the more concrete details here.

On the Driver side, the object used for interaction is SchedulerBackend; on the Executor side, it is ExecutorBackend.

Origin blog.csdn.net/qq_43115606/article/details/105010015