In-depth analysis of Spark RPC

[This article is reprinted. The author holds a master's degree from Tsinghua University and a bachelor's degree from Beijing University of Posts and Telecommunications. He currently works on advertising systems R&D at Alibaba; previously he worked at Hulu focusing on big data technologies, and at Baidu on commercial search advertising for six years. He focuses on service backends, computing engines, big data, and ads technology, is committed to writing high-quality code, loves open-source technology, has open-sourced several libraries and components of his own (see his work), and has contributed source code to Apache Spark, Flink, Spring Cloud Eureka, and Protostuff.]

[Originally published on Zhihu; reprinted here with the author's permission.]

Foreword

Spark is a fast, general-purpose distributed computing system. Being distributed means communication must take place between nodes, i.e., between the different Spark components. This article describes how Spark performs point-to-point communication via RPC (Remote Procedure Call). It is divided into three chapters:

  1. Spark RPC: a simple example and practical applications
  2. Design principles of the Spark RPC module
  3. Summary of Spark RPC core techniques

 

1. Spark RPC: a simple example and practical applications

Spark's RPC implementation consists of two main modules:

1) spark-core: it mainly wraps the lower layer into easier-to-use server and client abstractions and integrates with the Scala language; it depends on the org.apache.spark.spark-network-common module.

2) org.apache.spark.spark-network-common: this module is written in Java, and its latest versions are built on Netty 4. It provides full-duplex, multiplexed socket I/O capability; the Spark transport protocol structure (wire protocol) is custom-built.

In order to better understand the internal implementation details of Spark RPC, I pulled the RPC communication parts out of Spark 2.1 into a standalone project, published it on GitHub and released it to the Maven central repository for learning purposes, with decent getting-started documentation, parameter descriptions, and a performance evaluation. Let's get an intuitive feel for Spark RPC through this module.

The following code can be found in kraps-rpc.

1.1 A simple example

Suppose we want to develop a Hello service: the client sends a string, and the server responds with "hi" or "bye", echoing the input string back.

Step one: define a HelloEndpoint. Inheriting from RpcEndpoint indicates that the service can be called concurrently; inheriting from ThreadSafeRpcEndpoint instead would indicate that the endpoint does not allow concurrent calls.

class HelloEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
  override def onStart(): Unit = {
    println("start hello endpoint")
  }

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case SayHi(msg) =>
      println(s"receive $msg")
      context.reply(s"hi, $msg")
    case SayBye(msg) =>
      println(s"receive $msg")
      context.reply(s"bye, $msg")
  }

  override def onStop(): Unit = {
    println("stop hello endpoint")
  }
}

case class SayHi(msg: String)
case class SayBye(msg: String)

Compared with traditional Java RPC solutions, notice that there is no need to define a service interface or method identifiers (such as a shared name or id); Scala's pattern matching does the routing instead. The contract exchanged between the peers is bound to the language — here the two case classes SayHi and SayBye — but since Spark RPC is positioned as an internal communication component, that is harmless.

 

Step two: hand the endpoint we just wrote over to Spark RPC, which manages its life cycle and responds to external requests. RpcEnvServerConfig defines the parameters: the server name (just an identifier), the bind address, and the port. The factory method NettyRpcEnvFactory generates the RpcEnv; RpcEnv is the heart of the whole Spark RPC framework and will be expanded on in detail later. setupEndpoint binds the name "hello-service" to the endpoint defined in step one; subsequent client calls are routed to this endpoint by the "hello-service" name. The awaitTermination call blocks so that the server keeps listening for and processing requests.

val config = RpcEnvServerConfig(new RpcConf(), "hello-server", "localhost", 52345)
val rpcEnv: RpcEnv = NettyRpcEnvFactory.create(config)
val helloEndpoint: RpcEndpoint = new HelloEndpoint(rpcEnv)
rpcEnv.setupEndpoint("hello-service", helloEndpoint)
rpcEnv.awaitTermination()

 

Step three: write a client that calls the server we just started. First an RpcEnvClientConfig and an RpcEnv are needed; then a reference (Ref) to the remote endpoint is created via the "hello-service" name just mentioned. It can be seen as a stub used to make the call. Here we first demonstrate an asynchronous request via ask.

val rpcConf = new RpcConf()
val config = RpcEnvClientConfig(rpcConf, "hello-client")
val rpcEnv: RpcEnv = NettyRpcEnvFactory.create(config)
val endPointRef: RpcEndpointRef = rpcEnv.setupEndpointRef(RpcAddress("localhost", 52345), "hello-service")
val future: Future[String] = endPointRef.ask[String](SayHi("neo"))
future.onComplete {
  case scala.util.Success(value) => println(s"Got the result = $value")
  case scala.util.Failure(e) => println(s"Got error: $e")
}
Await.result(future, Duration.apply("30s"))

You can also call synchronously; note that in the latest Spark, askWithRetry has actually been renamed to askSync.

val result = endPointRef.askWithRetry[String](SayBye("neo"))

 

That is the whole communication process of Spark RPC. Its ease of use speaks for itself: the RPC framework shields you from the socket I/O model, the threading model, serialization/deserialization, packet framing (handled by Netty), long-lived connections, and reconnection/retry mechanisms.

 

1.2 Practical applications

Inside Spark, many Endpoint/EndpointRef pairs communicate in exactly this form. For example, the interaction between the driver and the executors uses a heartbeat mechanism implemented with HeartbeatReceiver, which is itself an Endpoint; it is registered during SparkContext initialization with the following code:

_heartbeatReceiver = env.rpcEnv.setupEndpoint(HeartbeatReceiver.ENDPOINT_NAME, new HeartbeatReceiver(this))

And it is called inside Executor as follows:

val message = Heartbeat(executorId, accumUpdates.toArray, env.blockManager.blockManagerId)
val response = heartbeatReceiverRef.askWithRetry[HeartbeatResponse](
  message, RpcTimeout(conf, "spark.executor.heartbeatInterval", "10s"))

 

2. Design principles of the Spark RPC module

First of all, since Spark 2.0 the Akka-based RPC framework has been stripped out (see SPARK-5293 for details). The reason is simple: many users use Akka for messaging themselves, and their version can conflict with the one embedded in Spark, while Spark only used Akka for RPC. So after 2.0, on top of the underlying org.apache.spark.spark-network-common module, Spark implemented a Scala messaging module similar to Akka's Actor model, encapsulated inside core. kraps-rpc is exactly this part, peeled out of core into a standalone project.

Although Akka was stripped out, the concepts of the Actor model are still followed; the current Spark RPC uses the following mapping:

RpcEndpoint => Actor
RpcEndpointRef => ActorRef
RpcEnv => ActorSystem

All underlying communication was replaced with Netty, using the internal org.apache.spark.spark-network-common lib.

 

2.1 Class diagram analysis

The first figure shows the UML class relationships within the Spark RPC module: the white classes are Scala classes in spark-core, and the yellow classes are Java classes in org.apache.spark.spark-network-common.

Don't be intimidated by this picture; through the following analysis I believe the reader will be able to understand it. There is no need to scrutinize whether its design is ideal: Spark is a rapidly developing, evolving project, and the code is not static; it keeps changing.

 

RpcEndpoint and RpcCallContext

Look at the leftmost class, RpcEndpoint. An RpcEndpoint can respond to service requests, similar to an Actor in Akka. From its method signatures (see below), receive is a one-way method — it can be compared to UDP — while receiveAndReply follows a request-response pattern, comparable to TCP. Subclasses can selectively override these two functions; the HelloEndpoint we implemented in chapter 1 and HeartbeatReceiver in Spark are both subclasses of it.

def receive: PartialFunction[Any, Unit] = {
  case _ => throw new RpcException(self + " does not implement 'receive'")
}

def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case _ => context.sendFailure(new RpcException(self + " won't reply anything"))
}

RpcCallContext is the bridge that separates the core business logic from the underlying transport. Here we can see that Spark RPC uses composition, callbacks, and aggregation — classic OO design patterns — extensively for abstraction, so that the code can be layered into business logic -> RPC encapsulation (the spark-core module) -> low-level communication (spark-network-common). RpcCallContext can be used to return a normal response as well as an exception or error, for example:

reply(response: Any)      // reply with a message, which can be a case class
sendFailure(e: Throwable) // reply with an exception, which can be a subclass of Exception; since Spark RPC uses
                          // Java serialization by default, the exception can be fully reconstructed on the client
                          // and re-thrown with the original as the cause

RpcCallContext is further divided into two subclasses, LocalNettyRpcCallContext and RemoteNettyRpcCallContext. This distinction is mainly framework-internal: if the call is local, LocalNettyRpcCallContext invokes the Endpoint directly; otherwise RemoteNettyRpcCallContext interacts with the remote side via RPC. This also reflects the core idea of RPC: how to execute a function or method in another address space as if it were a local call.

In addition, RpcEndpoint provides a series of callbacks that can be overridden:

- onError
- onConnected
- onDisconnected
- onNetworkError
- onStart
- onStop
- stop

Also note one of its subclasses, ThreadSafeRpcEndpoint. Many Endpoints inside Spark inherit from this class; the Spark RPC framework does not process such Endpoints concurrently, i.e., only one thread is allowed to call them at any given time.
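As a hypothetical illustration of that guarantee (modeled on the chapter 1 example, not taken from Spark), such an endpoint can mutate state without any locking:

// Hypothetical counter endpoint. Because ThreadSafeRpcEndpoint guarantees that
// messages are handled by one thread at a time, the mutable counter needs no synchronization.
class CounterEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
  private var count = 0L

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case "increment" =>
      count += 1
      context.reply(count) // safe: no other thread touches count concurrently
  }
}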

There is also a default RpcEndpoint named RpcEndpointVerifier. Every RpcEnv registers this Endpoint at initialization time, because a client first needs to ask the server whether the target Endpoint exists.

 

RpcEndpointRef

RpcEndpointRef is similar to Akka's ActorRef; as the name implies, it is a reference to an RpcEndpoint. The send method it provides is equivalent to !, and the ask method is equivalent to ?. send issues a one-way request (which the RpcEndpoint handles in receive), providing fire-and-forget semantics, while ask provides request-response semantics (which the RpcEndpoint handles in receiveAndReply). By default a response is expected, with a timeout mechanism; you can block synchronously waiting for it, or get back a Future handle so the thread initiating the request is not blocked.

RpcEndpointRef is the entry point for a client to initiate a request. It can be obtained from the RpcEnv and is smart enough to decide whether to make a local call or an RPC.
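As a quick sketch against the kraps-rpc-style API from chapter 1 (it assumes the endPointRef and the SayHi case class set up there), the two calling semantics look like this:

import scala.concurrent.Future

// fire-and-forget: no response expected, handled by the endpoint's receive
endPointRef.send(SayHi("neo"))

// request-response: returns a Future, handled by the endpoint's receiveAndReply
val reply: Future[String] = endPointRef.ask[String](SayHi("neo"))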

 

RpcEnv and NettyRpcEnv

The core class of the library is RpcEnv — this is the ActorSystem just mentioned; both server and client use it for communication.

For the server side, RpcEnv is the running environment of the RpcEndpoints and is responsible for their full life cycle management: it registers and destroys Endpoints, receives data packets from the TCP layer, deserializes them and packages them into RpcMessages, routes each request to the specified Endpoint, and invokes the business logic. If the Endpoint needs to respond, the returned object is serialized and sent back to the remote peer through the TCP layer; if the Endpoint throws an exception, RpcCallContext.sendFailure is called to send the exception back.

On the client side, you obtain a reference to an RpcEndpoint through the RpcEnv — that is the RpcEndpointRef.

RpcEnv is in charge of interacting with the concrete underlying communication module; its companion object contains methods to create an RpcEnv, with the following signature:

def create(
    name: String,
    bindAddress: String,
    advertiseAddress: String,
    port: Int,
    conf: SparkConf,
    securityManager: SecurityManager,
    numUsableCores: Int,
    clientMode: Boolean): RpcEnv = {
  val config = RpcEnvConfig(conf, name, bindAddress, advertiseAddress, port,
    securityManager, numUsableCores, clientMode)
  new NettyRpcEnvFactory().create(config)
}

Creating the RpcEnv is the responsibility of RpcEnvFactory. RpcEnvFactory now has only one subclass, NettyRpcEnvFactory (originally there was also an AkkaRpcEnvFactory). As soon as NettyRpcEnvFactory.create is called, the server is started on the bind address and port.

It depends on RpcEnvConfig, which wraps the SparkConf together with a number of parameters (renamed RpcConf in kraps-rpc). All parameters RpcEnv needs are taken from RpcEnvConfig: most basically the hostname and port, plus advanced ones such as connection timeouts, retry counts, Reactor thread pool sizes, and so on.

Let's look at the two most commonly used methods of RpcEnv:

// register an endpoint; a name must be specified — client routing relies on this name to find the endpoint
def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef

// obtain a reference to an endpoint
def setupEndpointRef(address: RpcAddress, endpointName: String): RpcEndpointRef

NettyRpcEnv is created by NettyRpcEnvFactory.create. It is the bridge between the whole of spark-core and org.apache.spark.spark-network-common: internally it leverages the communication capability provided by the lower layer while wrapping it in Actor-like semantics. Regarding the two core methods above: setupEndpoint registers the Endpoint with the Dispatcher; setupEndpointRef first goes to a local or remote RpcEndpointVerifier to check whether the endpoint exists, and only then creates the RpcEndpointRef. More details of the server-side and client-side calls will be laid out in the sequence diagrams, so I won't expand on them here.

 

Dispatcher and Inbox

NettyRpcEnv contains the Dispatcher, which mainly serves the server side: it routes each message to the correct RpcEndpoint and invokes its business logic.

Here the Reactor model deserves some elaboration. Spark RPC's socket I/O model is a typical Reactor, but combined with the Actor Mailbox pattern — a hybrid implementation, one could say.

In this use of the Reactor model, the EventLoops created by the underlying Netty act as the I/O multiplexers, in the "Multiple Reactors" form shown below. From Netty's point of view, the Main Reactor and Sub Reactor map onto the BossGroup and WorkerGroup: the former is responsible for accepting TCP connection establishment and disconnection, the latter for the actual I/O reads and writes. The ThreadPool in the figure is the Dispatcher's thread pool; its job is to decouple business logic from I/O operations so the system scales better — a small number of threads can handle thousands of connections. The idea is a standard divide-and-conquer strategy: offload the non-I/O work to another thread pool.

The real business logic of an RpcEndpoint runs in that ThreadPool: an intermediate Reactor thread handler decodes the bytes into an RpcMessage and delivers it to the Inbox, and the further processing happens in the Dispatcher thread pool, described below.
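To make the BossGroup/WorkerGroup mapping concrete, here is a minimal, generic Netty 4 bootstrap sketch — not Spark's actual code, and the business handler is left empty:

import io.netty.bootstrap.ServerBootstrap
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioServerSocketChannel
import io.netty.channel.{ChannelInboundHandlerAdapter, ChannelInitializer}

object ReactorSketch extends App {
  val bossGroup = new NioEventLoopGroup(1)  // Main Reactor: accepts connections
  val workerGroup = new NioEventLoopGroup() // Sub Reactors: do the actual I/O reads and writes
  try {
    val bootstrap = new ServerBootstrap()
      .group(bossGroup, workerGroup)
      .channel(classOf[NioServerSocketChannel])
      .childHandler(new ChannelInitializer[SocketChannel] {
        override def initChannel(ch: SocketChannel): Unit = {
          // decoded messages would be handed off to a separate business
          // thread pool here (the Dispatcher's role in Spark RPC)
          ch.pipeline().addLast(new ChannelInboundHandlerAdapter)
        }
      })
    val f = bootstrap.bind(52345).sync()
    f.channel().closeFuture().sync()
  } finally {
    bossGroup.shutdownGracefully()
    workerGroup.shutdownGracefully()
  }
}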


As for the Mailbox pattern just mentioned in connection with the Actor model: Spark RPC originated from Akka, and as it evolved to its present form it still uses this pattern. Let me introduce the Inbox. Each Endpoint has an Inbox, and the Inbox holds a list of InboxMessages. InboxMessage has many subclasses: a remotely invoked RpcMessage, a remotely invoked fire-and-forget one-way OneWayMessage, or various messages for service start, connection establishment, and disconnection. Inside the Inbox these messages are pattern-matched and the corresponding RpcEndpoint function is invoked (all one-to-one).

The Dispatcher contains a MessageLoop that reads the RpcMessages delivered into a LinkedBlockingQueue; based on the endpoint identifier specified by the client, it finds the Endpoint's Inbox and delivers the message into it. Since this is a blocking queue, it naturally blocks when there are no messages and starts working as soon as one arrives. The Dispatcher's ThreadPool is responsible for consuming these messages.
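A stripped-down sketch of this mechanism, using hypothetical MiniDispatcher/MiniInbox names rather than Spark's actual classes, might look like this:

import java.util.concurrent.{Executors, LinkedBlockingQueue}

// Hypothetical mailbox-style dispatch, for illustration only.
class MiniInbox {
  def process(): Unit = () // would pattern-match InboxMessages and invoke the endpoint
}

class MiniDispatcher(numThreads: Int) {
  case class EndpointData(name: String, inbox: MiniInbox)

  private val receivers = new LinkedBlockingQueue[EndpointData]()
  private val pool = Executors.newFixedThreadPool(numThreads)

  // every worker thread runs the same MessageLoop
  for (_ <- 0 until numThreads) pool.execute(new Runnable {
    override def run(): Unit = while (true) {
      val data = receivers.take() // blocks while there is nothing to do
      data.inbox.process()        // consume the inbox's pending messages
    }
  })

  // called from the I/O side once a message has been delivered to an inbox
  def post(data: EndpointData): Unit = receivers.put(data)
}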

The size of the Dispatcher's ThreadPool is controlled by the parameter spark.rpc.netty.dispatcher.numThreads. If you kill -3 <PID> any Spark driver or executor process, you will see N dispatcher threads:

"dispatcher-event-loop-0" #26 daemon prio=5 os_prio=31 tid=0x00007f8877153800 nid=0x7103 waiting on condition [0x000000011f78b000]

The next question is: who calls the Dispatcher to distribute the messages? The answer is NettyRpcHandler, a subclass of RpcHandler — this is what the threads in the Reactor do. RpcHandler is the handler provided by the underlying org.apache.spark.spark-network-common; once a remote packet has been parsed successfully, this handler is invoked to process it.

This completes a fully asynchronous flow: network I/O is handled by the lower layer, and the Dispatcher then distributes the messages. As long as the InboxMessage lists in the Dispatcher can grow large enough, the Dispatcher's ThreadPool can digest the messages at its own pace, decoupled from the underlying I/O and running entirely in separate threads. Once the Endpoint's internal business logic completes, the RpcCallContext callback is used to send the reply.

 

Outbox

NettyRpcEnv contains a ConcurrentHashMap[RpcAddress, Outbox]: each remote Endpoint address corresponds to one Outbox. This mirrors the Inbox above — another mailbox-style implementation.

Similar to the Inbox, the Outbox internally holds a list of OutboxMessages. OutboxMessage has two subclasses, OneWayOutboxMessage and RpcOutboxMessage, corresponding to calls to the RpcEndpoint's receive and receiveAndReply methods respectively.

The send and ask methods in NettyRpcEnv call the send method of the Outbox for the specified address; if the remote connection has not been established yet, the connection is set up first and the OutboxMessages are then consumed.

Again, a question arises: how does the Outbox's send method push a message out over network I/O, and in the ask case, how is the remote response read back? The answer is that send transmits the message via a TransportClient created by org.apache.spark.spark-network-common; a Reactor thread is responsible for serializing it and sending it out. Each message gets a UUID, and the lower layer maintains a HashMap from each sent message to its callback. When Netty receives a complete remote RpcResponse, the corresponding callback is invoked, the response is deserialized, and the business logic in spark-core is called back: the Promise/Future is completed, and the upper layer stops blocking.
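A hedged sketch of that bookkeeping, with hypothetical names rather than the real TransportResponseHandler internals:

import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.Promise

// Hypothetical requestId -> callback table kept by the transport layer.
class MiniResponseTable {
  private val outstanding = new ConcurrentHashMap[Long, Promise[Array[Byte]]]()

  // called on send: remember a promise under the message's id
  def register(requestId: Long): Promise[Array[Byte]] = {
    val p = Promise[Array[Byte]]()
    outstanding.put(requestId, p)
    p
  }

  // called from a Reactor thread when a complete RpcResponse arrives
  def complete(requestId: Long, payload: Array[Byte]): Unit = {
    val p = outstanding.remove(requestId)
    if (p != null) p.success(payload) // upstream deserializes and completes the user's Future
  }
}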

This, too, is an asynchronous process: after the message is delivered to the Outbox, the call returns immediately; network I/O is handled by the lower layer, and once the RPC call succeeds or fails, the upper-layer function is called back to handle the result.

 

Classes in spark-network-common

I won't expand on these here; they are all wrappers around Netty. Interested readers can read the source code themselves, or refer to the code of my previously open-sourced Navi-pbrpc framework — the principles are basically the same.

 

2.2 Sequence diagram analysis

Server startup

Without further ado, here is the diagram.

 

Server-side response

Phase one: I/O reception. TransportRequestHandler is Netty's callback handler; it parses a complete packet according to the wire format (introduced below) and hands it to NettyRpcEnv for deserialization. If it is an RPC call, an RpcMessage is constructed and the RpcHandler's method is called back to process it; internally the Dispatcher is invoked to deliver the RpcMessage into the Inbox, and this phase ends there.

Phase two: I/O response. The MessageLoop fetches the pending RpcMessage and hands it to the Dispatcher's ThreadPool for processing — in effect invoking the RpcEndpoint's business logic. The result is serialized through RpcCallContext, and a callback notifies TransportRequestHandler that a message has finished processing and should be sent back.

Please take a moment here to appreciate the convenience brought by asynchronous processing: the combination of the Reactor and Actor mailbox patterns decouples receiving messages from processing them.

 

Client request

A client generally needs to create an RpcEnv first and then obtain an RpcEndpointRef.

Phase one: I/O send. Use the RpcEndpointRef to perform a send or ask — taking send as an example: send first serializes the message, then delivers it into the Outbox for the specified address. If the Outbox finds that the connection has not been established, it first tries to establish it, then calls the underlying TransportClient to send the data, directly through Netty's API, and returns once that is done. A UUID is returned as the message identifier, to be used for the callback in the next phase. From the caller's perspective a Future can be returned, and the client may block on it or carry on with other work.

Phase two: I/O reception. After TransportResponseHandler receives the remote response, it first deserializes it and then calls back the Future from phase one, completing the call. This entire process takes place in Reactor threads, with the Future used for cross-thread notification.

 

3. Summary of Spark RPC core techniques

As its transport layer, Spark RPC chooses TCP, giving it a reliable, full-duplex binary stream channel.

To build a high-performance, scalable RPC framework, the server must, first, handle as many concurrent requests as possible and, second, finish each of them as quickly as possible. There is an inherent gap between CPU and I/O: network latency is uncontrollable, while CPU and process/thread resources are precious. To keep socket I/O from blocking server and client calls as much as possible, several patterns can be applied. Since Spark RPC's I/O model is built on Netty, it uses the underlying I/O multiplexing mechanism, configurable through the spark.rpc.io.mode parameter; different platforms use different techniques, e.g., epoll on Linux.
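For example, assuming a Linux host where native epoll is available, the mode could be selected like this (a sketch using the standard SparkConf API):

import org.apache.spark.SparkConf

// "NIO" is the default I/O mode; "EPOLL" switches to the native Linux transport.
val conf = new SparkConf()
  .set("spark.rpc.io.mode", "EPOLL")
  .set("spark.rpc.netty.dispatcher.numThreads", "8") // Dispatcher pool size, see section 2.1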

The threading model uses the Multi-Reactors + mailbox asynchronous approach introduced above.

As for schema declaration and serialization, Spark RPC uses Java native serialization by default, chosen mainly for compatibility, for communication between components inside the JVM platform, and for integration with the Scala language. It therefore has no cross-language capability, and performance is not pushed to the extreme: schemes with better serialization speed and smaller data size, such as Kryo, are not used yet.
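To get a feel for the overhead of Java native serialization, a plain JDK round-trip — independent of Spark — is demonstration enough:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

case class SayHi(msg: String) // Scala case classes are Serializable by default

object SerializationSize extends App {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(SayHi("neo"))
  oos.close()
  // expect far more bytes than the raw string: class metadata dominates
  println(s"Java native serialization size: ${bos.size()} bytes")
}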

As for the protocol structure, Spark RPC uses a private wire format, organized as header + payload; the header contains the length of the whole frame, the message type, and the request UUID. The logic that solves TCP packet coalescing and splitting and assembles complete messages lives in org.apache.spark.network.protocol.MessageEncoder.
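Based on the byte-level walk-through below, here is a hedged reconstruction of the framing in plain Scala. The field widths are inferred from the Wireshark dump; the authoritative logic lives in MessageEncoder:

import java.nio.ByteBuffer

// Illustrative framing only: 8-byte frame length, 1-byte message type,
// 8-byte request UUID, 4-byte payload length, then the payload itself.
def encodeRpcRequest(requestId: Long, payload: Array[Byte]): ByteBuffer = {
  val headerLen = 8 + 1 + 8 + 4
  val buf = ByteBuffer.allocate(headerLen + payload.length)
  buf.putLong((headerLen + payload.length).toLong) // whole frame length (1490 in the dump below)
  buf.put(3.toByte)                                // message type: RpcRequest(3)
  buf.putLong(requestId)                           // 8-byte request UUID
  buf.putInt(payload.length)                       // payload length (1469 in the dump below)
  buf.put(payload)
  buf.flip()
  buf
}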

 

Let's analyze it concretely with Wireshark.

First look at an RPC request — the call to the HelloEndpoint described in chapter 1. The client call is transmitted in two TCP segments, because when Spark uses Netty, the header and body are written out with separate writeAndFlush calls.

The figure below shows the first TCP segment:

The blue part in the example is the header; the header bytes parse as follows:

00 00 00 00 00 00 05 d2 // decimal 1490: the length of the whole frame

The next single byte, 03, indicates an RpcRequest; the message types are enumerated as follows:

RpcRequest(3)
RpcResponse(4)
RpcFailure(5)
StreamRequest(6)
StreamResponse(7)
StreamFailure(8)
OneWayMessage(9)
User(-1)

The remaining bytes mean the following:

4b ac a6 9f 83 5d 17 a9 // 8 bytes: the request UUID
05 bd                   // decimal 1469: the payload length

The payload itself looks like the dump below. As you can see, with Java native serialization a simple echo request takes 1469 bytes — quite large; the serialization is not very efficient. But Spark RPC is positioned for internal communication rather than as a general-purpose RPC framework, and the traffic volume is very small, so this overhead is negligible. Spark Structured Streaming also uses this serialization method, and its performance still meets the requirements.

In addition, the author ran a performance test of Spark RPC via kraps-rpc; see GitHub for details.

 

Summary

Out of curiosity, the author dug deep into the internals of Spark RPC and extracted a dedicated project, kraps-rpc, from the Spark 2.1 core, publishing it on GitHub and releasing it to the Maven central repository for learning purposes, with decent getting-started documentation, parameter descriptions, and a performance evaluation. While putting kraps-rpc together, a small improvement was also found and proposed to Spark as a PR — [SPARK-21701] — which has been merged into the trunk, so it counts as a contribution to the community (immensely happy).

The article then dissected the class organization inside the Spark RPC module, using a UML class diagram and sequence diagrams to help readers better understand the core concepts, including RpcEnv, RpcEndpoint, and RpcEndpointRef, as well as the I/O design patterns involved: I/O multiplexing, Reactor, Actor mailbox, and so on. Worth emphasizing once more is Spark RPC's design philosophy: leverage Netty's powerful socket I/O capability to build an asynchronous communication framework. Finally, the wire protocol was analyzed at the binary level of TCP segments.

 

[You are welcome to visit the author's blog at neoremind.com; technical exchanges are welcome.]
