1. Introduction
Hadoop is a distributed computing system, and in a distributed environment the network communication module is one of the core modules. To learn Hadoop well, you need to understand the basic working principles of its underlying communication system. Hadoop provides a complete RPC framework that elegantly encapsulates the underlying network communication process.
This article starts from the concept of RPC and then walks through the implementation details of Hadoop RPC.
First, what is RPC?
In RPC, R stands for Remote, P for Procedure, and C for Call.
Translated literally: remote procedure call. A literal translation alone, however, explains very little.
To understand RPC thoroughly, you first need to understand what "procedure" means:
A procedure can be thought of as a method or a function, or even an object or a subroutine. To keep things simple, this article uses "procedure" to mean "method".
Calls between methods in the same process are called local calls.
So can calls that occur between different processes be considered remote calls? In a broad sense, yes: if "remote" is taken to mean "not in the same process", there is nothing wrong with saying so.
Tips: In a narrow sense, remote calls refer to method calls between processes on computers in different physical locations, as in distributed, microservice, and B/S (browser/server) environments.
How can a procedure call between different processes be implemented?
The answer: with a network communication module.
A procedure call between different processes that is implemented on top of the underlying network communication module is a remote call. Remote call is therefore a broad concept. To borrow an advertising slogan: not all milk is called Telunsu (特仑苏), but Telunsu is certainly milk. In the same way, not every remote call is called RPC, but every RPC is a remote call.
What kind of remote call counts as RPC? To answer that, start with the underlying process of a remote call.
2. Native network communication
What is native network communication?
Start with a question.
Suppose process A needs a piece of business logic and finds that process B already has it. A then wonders: can I use process B's method?
The idea is good, but B is not A's own home, so some agreed procedure is needed.
To make this easier to understand, take a real-life example: you want to borrow your neighbor's washing machine to wash some clothes. What would you do? Let's assume you live next door to a friendly neighbor.
The normal procedure would be something like this:
- First, you go to your neighbor's door and knock.
- Your neighbor opens the door.
- You make a request: Hello, may I borrow your washing machine to wash some clothes?
- Being a good neighbor, they say: Sure, bring the clothes over and I will wash them for you.
- You pack your clothes and hand them to your neighbor.
- The neighbor unpacks your bundle and puts the clothes in the washing machine.
- When the wash is done, the neighbor packs the clean clothes and hands them back to you.
- Finally, don't forget to say thank you when you pick up the clothes.
A method call between different processes is similar to borrowing your neighbor's washing machine. The difference is that between processes, the "knocking" and the "opening of the door" have to be done through the network programming API provided by the programming language.
The process is roughly as follows:
- Process B first creates a socket listener. B must be a very welcoming process, waiting for other processes to knock on its door at any time.
- Process A sends a network connection request to process B. After B responds, the two establish a network connection, much like your neighbor opening the door.
- A packs its own data (the clothes) and sends a processing request to process B (the laundry request).
- Process B receives A's package, unpacks it, and hands the data to its own method for processing.
- After the method finishes, B packs the result and sends it back to A over the network.
This process can be implemented with the API provided by the Java language. The complete code is as follows:
Code for program B: B is the service provider and can be called the server component.
package com.gk.server;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
/*
 * Service provider
 */
public class B {
    /*
     * The method in B that A also needs
     */
    static String hello(String name) {
        return "Hello!" + name;
    }
    /*
     * Network communication on the B side.
     * In essence, B is a server socket.
     */
    public static void main(String[] args) throws IOException {
        // Listen for requests
        ServerSocket serverSocket = new ServerSocket(1234);
        // Wait for a network connection
        Socket socket = serverSocket.accept();
        // Receive the data sent by A
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[20];
        int read = inputStream.read(buffers);
        String name = new String(buffers, 0, read);
        // Call B's own method to fulfil A's remote call
        String info = hello(name);
        // Send the result back to A
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(info.getBytes());
        inputStream.close();
        outputStream.close();
        socket.close();
        serverSocket.close();
    }
}
Code for program A: A is the side that needs the service and can be called the client program.
package com.gk.clien;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.UnknownHostException;
/*
 * Client
 */
public class A {
    public static void main(String[] args) throws UnknownHostException, IOException {
        // Open the network connection
        Socket socket = new Socket("localhost", 1234);
        // Prepare the data
        String name = "rose";
        // Send the data to B
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(name.getBytes());
        // Receive the data processed by B
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[20];
        int read = inputStream.read(buffers);
        // Print the result
        System.out.println(new String(buffers, 0, read));
        outputStream.close();
        inputStream.close();
        socket.close();
    }
}
- Test: run program B first, then run program A. On the A side you can see the data processed by the B side.
In essence, the program structure of A and B is a communication mechanism based on the C/S (client/server) structure.
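As a rough sketch of how this C/S structure behaves when B has to answer several requests in a row, the example below runs a tiny server and client inside one program. The class and method names (`RepeatedCallsSketch`, `serve`, `call`, `demo`) are hypothetical, and the use of `writeUTF`/`readUTF` for framing and of an ephemeral loopback port are assumptions made only to keep the example self-contained. Note that every call still repeats the full connect, send, receive, close sequence.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class RepeatedCallsSketch {
    // Server side: accept and answer a fixed number of connections, one by one.
    static void serve(ServerSocket serverSocket, int connections) throws IOException {
        for (int i = 0; i < connections; i++) {
            try (Socket socket = serverSocket.accept()) {
                DataInputStream in = new DataInputStream(socket.getInputStream());
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                out.writeUTF("Hello!" + in.readUTF());
                out.flush();
            }
        }
    }

    // Client side: one complete round trip (connect, send, receive, close).
    static String call(int port, String name) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            DataInputStream in = new DataInputStream(socket.getInputStream());
            out.writeUTF(name);
            out.flush();
            return in.readUTF();
        }
    }

    // Starts a server on a free loopback port and calls it once per name.
    static List<String> demo(String... names) {
        try (ServerSocket serverSocket =
                new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> {
                try {
                    serve(serverSocket, names.length);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            server.start();
            List<String> replies = new ArrayList<>();
            for (String name : names) {
                replies.add(call(serverSocket.getLocalPort(), name));
            }
            server.join();
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("rose", "jack"));
    }
}
```

Running `main` should print `[Hello!rose, Hello!jack]`. The only thing that differs between the two calls is the name that is sent; everything else is repeated ceremony.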
At this point, a question should come up.
What if A sends such requests to B often, or processes other than A also need to make requests to B? That is like having to borrow your neighbor's washing machine every time you do laundry.
You will find that you have to go through the same tedious sequence of knocking on the door and having it opened every time, while the only thing that actually changes is the laundry. How can these steps be simplified so that "borrowing" becomes more graceful?
In real life, you can hire an agent, and naturally your neighbor can hire one too. That frees both of you from the tedious process.
Tips: Hiring an agent only reduces the requester's workload; it does not remove any step of the actual process.
In the same way, processes can use an agent when communicating with each other. The agent here is simply a component rather than a person.
So, to answer the question: what is native network communication?
Implementing network communication faithfully, step by step, directly on the native API is called native network communication.
As mentioned above, the agent (proxy) pattern can be used to implement network communication; its essence is encapsulation.
3. Agent (proxy) pattern
The basic idea of the agent pattern:
- Encapsulate the common steps of native communication in dedicated components.
- Design an agent for A and for B respectively.
- When A or another process needs B's service, it only passes the data to its agent component and then simply waits for the agent to return B's result.
- B likewise has its own agent component, which receives the data passed by A or other processes, calls B's methods correctly, and returns the results.
This frees A's high-level business component and B's business service component from the tedious communication steps, so that each can concentrate on its own business logic. In essence, this is decoupling based on the single-responsibility principle.
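To make the agent idea concrete before any networking is involved, here is a minimal sketch using Java's built-in dynamic proxy. The `Greeter` interface, the `ProxySketch` class, and the `recordingProxy` helper are hypothetical names introduced only for illustration. The handler merely records the call and forwards it to a local object; a real agent would send the call over the network at that point.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface, standing in for what B would publish.
interface Greeter {
    String hello(String name);
}

class ProxySketch {
    // Names of the methods that passed through the proxy, in call order.
    static final List<String> calls = new ArrayList<>();

    // Wraps a real Greeter in a dynamic proxy that intercepts every call.
    static Greeter recordingProxy(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.add(method.getName());
            // A remote agent would serialize the method name and arguments here;
            // this sketch simply forwards the call to the local target.
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = recordingProxy(name -> "Hello!" + name);
        System.out.println(greeter.hello("rose"));
        System.out.println(calls);
    }
}
```

The caller only ever sees the `Greeter` interface, which is the point of the pattern: the business code does not change when the handler's body is swapped for one that talks to a remote host.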
Based on the agent idea, let's now build a simple version of a remote request framework.
- First, program B needs to tell its consumers, in the form of an interface, what functions it can provide. This is like a company publishing a job posting: it has to tell applicants clearly what the job requires. The service consumer (A) then needs to understand B's requirements and sign a strict contract that spells out its responsibilities. The interface here is the protocol, which constrains the behavior of both the provider and the consumer.
package com.gk.protocol;
/*
 * The code of conduct that both communicating parties follow
 */
public interface MyProtocol {
    String hello(String name);
}
- Program A has at least 2 independent components:
**Business component:** In general terms, it does the specific work.
**Proxy component:** When the business component needs to make a remote call, the proxy component carries it out.
The proxy component is, in essence, a component designed according to the proxy design pattern; here it is generated with Java's dynamic proxy class, Proxy. The proxy component does not provide the actual implementation itself. Instead, it encapsulates the network API in order to access the functional modules on the specified host.
Write the proxy component first:
package com.gk.clien;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.net.Socket;
import com.gk.protocol.MyProtocol;
/*
 * A's proxy component.
 * Purpose: access the functions provided by B on behalf of the service consumer.
 * The proxy must implement the interface defined by B so that it knows what B provides.
 * An agent that does not know what the other side offers is not a competent agent.
 */
public class AProxy implements InvocationHandler {
    // IP address of the remote computer
    private String ip;
    // Port of the remote computer
    private int port;
    public AProxy(String ip, int port) {
        this.ip = ip;
        this.port = port;
    }
    /*
     * Create the dynamic proxy component
     */
    MyProtocol createProxy() {
        // Create the proxy dynamically from the interface defined by program B
        MyProtocol myProtocol = (MyProtocol) Proxy.newProxyInstance(AProxy.class.getClassLoader(),
                new Class[] { MyProtocol.class }, this);
        return myProtocol;
    }
    /*
     * Encapsulate the actual network request
     */
    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Open the network connection
        Socket socket = new Socket(this.ip, this.port);
        // Send the data to B
        OutputStream outputStream = socket.getOutputStream();
        // Send the method name and the arguments in a simple string protocol:
        // "methodName\targ1,arg2". A JSON format would be a better choice.
        // method.getName() is required here: the Method object itself prints the
        // full signature, which B could not look up by name.
        StringBuilder info = new StringBuilder(method.getName() + "\t");
        if (args != null) {
            for (Object arg : args) {
                // Separate the arguments with commas
                info.append(arg).append(",");
            }
            // Drop the trailing comma
            info.deleteCharAt(info.length() - 1);
        }
        outputStream.write(info.toString().getBytes());
        // Receive the data processed by B
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[20];
        int read = inputStream.read(buffers);
        // Convert to a string
        String res = new String(buffers, 0, read);
        outputStream.close();
        inputStream.close();
        socket.close();
        return res;
    }
}
Write A's business component:
package com.gk.clien;
import com.gk.protocol.MyProtocol;
/*
 * A's business component
 */
public class AService {
    // Depends on the functions defined in B's interface
    private MyProtocol myProtocol;
    public AService(MyProtocol myProtocol) {
        this.myProtocol = myProtocol;
    }
    /*
     * Business method
     */
    public void doSomething(String name) {
        // A's own business
        System.out.println("Business that A can handle itself");
        // The other part of the business needs a remote call
        String res = this.myProtocol.hello(name);
        System.out.println("Result from the remote business module: " + res);
    }
}
- B should also have 2 components: B's business component, which provides the service to the outside, and B's agent.
Write B's business component: it is the implementation of the interface that B itself defined.
package com.gk.server;
import com.gk.protocol.MyProtocol;
/*
 * Implements the interface that B itself defined
 */
public class BService implements MyProtocol {
    @Override
    public String hello(String name) {
        return "Hello!" + name;
    }
}
Write B's agent component: B's agent mainly parses the data passed by A and invokes the business module's function dynamically by means of reflection. In theory, B should also provide separate network connection and network response components. Because this article only aims to explain the concept of remote calls, it focuses on the main points and leaves out the secondary ones.
package com.gk.server;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.ServerSocket;
import java.net.Socket;
import com.gk.protocol.MyProtocol;
public class BProxy {
    // The component that actually implements the interface
    private MyProtocol myProtocol;
    public BProxy(MyProtocol myProtocol) {
        this.myProtocol = myProtocol;
    }
    /*
     * The code in this method does 3 things:
     * A. Network connection
     * B. Parsing the data
     * C. Processing the data and returning the result
     * In theory, serving multiple users requires multithreading, and these
     * three parts should be split into 3 separate components.
     */
    void getRes() throws IOException, NoSuchMethodException, SecurityException, IllegalAccessException,
            IllegalArgumentException, InvocationTargetException {
        // Listen for requests
        ServerSocket serverSocket = new ServerSocket(1234);
        // Wait for a network connection
        Socket socket = serverSocket.accept();
        // Receive the data sent by A
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[100];
        int read = inputStream.read(buffers);
        // The request data
        String info = new String(buffers, 0, read);
        // Parse the request data
        String[] strs = info.split("\t");
        // Method name
        String methodName = strs[0];
        // Parse the arguments
        String[] args = strs[1].split(",");
        Class<?> clz = MyProtocol.class;
        // Use reflection to call the method named in the request
        Method method = clz.getMethod(methodName, new Class[] { String.class });
        // Cast to Object[] so the array is passed as the argument list
        String res = String.valueOf(method.invoke(this.myProtocol, (Object[]) args));
        // Return the result to A
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(res.getBytes());
        inputStream.close();
        outputStream.close();
        socket.close();
        serverSocket.close();
    }
}
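The parsing and reflection step on B's side can be tried on its own, without any sockets. The sketch below assumes the same "methodName, tab, comma-separated arguments" text format and further assumes that every parameter is a String. `DispatchSketch`, `EchoService`, and `dispatch` are hypothetical names used only for this illustration.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

class DispatchSketch {
    // Hypothetical service; every method takes only String parameters.
    public static class EchoService {
        public String hello(String name) {
            return "Hello!" + name;
        }

        public String join(String left, String right) {
            return left + "-" + right;
        }
    }

    // Parses "methodName<TAB>arg1,arg2" and invokes the named method by reflection.
    static String dispatch(Object service, String request) {
        String[] parts = request.split("\t", 2);
        String methodName = parts[0];
        String[] args = (parts.length > 1 && !parts[1].isEmpty())
                ? parts[1].split(",")
                : new String[0];
        // Assumption: all parameters are Strings, so the argument count
        // alone determines which signature to look up.
        Class<?>[] parameterTypes = new Class<?>[args.length];
        Arrays.fill(parameterTypes, String.class);
        try {
            Method method = service.getClass().getMethod(methodName, parameterTypes);
            return String.valueOf(method.invoke(service, (Object[]) args));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot dispatch: " + request, e);
        }
    }

    public static void main(String[] args) {
        EchoService service = new EchoService();
        System.out.println(dispatch(service, "hello\tworld")); // prints Hello!world
        System.out.println(dispatch(service, "join\ta,b"));    // prints a-b
    }
}
```

A comma inside an argument would break this format, which is one reason a structured encoding such as JSON is the safer choice for anything beyond a demo.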
- Test:
B-side test code:
package com.gk.server;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
public class B {
    public static void main(String[] args) throws NoSuchMethodException, SecurityException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, IOException {
        // Start the agent
        BProxy bProxy = new BProxy(new BService());
        bProxy.getRes();
    }
}
A-side test code:
package com.gk.clien;
public class A {
    public static void main(String[] args) {
        // Proxy object
        AProxy aProxy = new AProxy("127.0.0.1", 1234);
        // Business component
        AService aService = new AService(aProxy.createProxy());
        // Run the business logic
        aService.doSomething("world");
    }
}
Run the B-side test code first, and then the A-side test code. A first prints its own local message and then the result returned by B's hello method.
With the help of the agent idea, wrapping the native network communication code lets program A access the functional modules in program B without knowing the details of the underlying network communication. On each call, A only needs to pass the data to the agent, which greatly simplifies remote calling.
This is also the goal of RPC. So can native network communication, or a custom remote call framework, be called RPC?
Back to the concept of RPC.
RPC is in essence an idea, or a convention. It provides a standard for uniformly encapsulating native network communication, which is also called the RPC protocol. Under the RPC protocol or standard, both the client and the server have a program called a stub, which is similar to an agent. The access process is as follows:
- The client program calls the system-generated Stub program as if it were a local call;
- The Stub program encapsulates the function call information into a message packet in the form required by the network communication module, and hands it to the communication module to send to the remote server;
- After the remote server receives the message, it passes the message to the corresponding Stub program;
- The Stub program unpacks the message, converts it into the form required by the called procedure, and calls the corresponding function;
- The called function executes with the received parameters and returns the result to the Stub program;
- The Stub program encapsulates the result into a message and passes it back to the client program step by step through the network communication module.
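The stub round trip listed above can be sketched entirely in memory, with a byte array standing in for the network. All names here (`StubSketch`, `clientStubPack`, `serverStubHandle`, `clientStubUnpack`) are hypothetical, and the `DataOutputStream` framing is an assumption chosen for brevity, not a standard RPC wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class StubSketch {
    // The "remote" function that lives on the server side.
    static String hello(String name) {
        return "Hello!" + name;
    }

    // Client stub: packs the function name and its argument into a message.
    static byte[] clientStubPack(String function, String argument) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(function);
            out.writeUTF(argument);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server stub: unpacks the message, calls the function, packs the result.
    static byte[] serverStubHandle(byte[] message) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
            String function = in.readUTF();
            String argument = in.readUTF();
            if (!"hello".equals(function)) {
                throw new IllegalArgumentException("unknown function: " + function);
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(hello(argument));
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client stub: unpacks the reply message into the return value.
    static String clientStubUnpack(byte[] reply) {
        try {
            return new DataInputStream(new ByteArrayInputStream(reply)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] request = clientStubPack("hello", "rose");
        byte[] reply = serverStubHandle(request); // the network hop would sit here
        System.out.println(clientStubUnpack(reply)); // prints Hello!rose
    }
}
```

In a real framework the two byte arrays would travel over a socket, but the division of labor between the two stubs stays the same.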
Tips: RPC is an idea, a specification. A program that implements remote calls according to this specification is called an RPC framework, so RPC frameworks differ at the implementation level.
From this point of view, purely native network communication cannot be considered RPC, while a custom remote access framework based on the agent idea can be regarded as a crude implementation of RPC.
Tips: The servlet specification in J2EE is, in essence, also a remote call specification, with HTTP as its interface protocol. Tomcat and web programs written to the servlet specification are examples of remote calls.
4. Hadoop RPC
4.1 Features and structure
Hadoop RPC is in fact an application of the C/S (Client/Server) model in distributed computing. Hadoop RPC has the following characteristics.
- Transparency: It encapsulates the underlying network communication and simplifies what high-level business components have to do to make a call. The aim is for a client's call to a server-side program to look the same as a local call.
- High performance: Each Hadoop system (such as HDFS, YARN, and MapReduce) adopts a Master/Slave structure, in which the Master is essentially an RPC Server responsible for responding to and processing the requests sent by the Slaves. To give the Master the greatest possible concurrent processing capability, the RPC Server must be a high-performance server.
- Controllability: The JDK already has an RPC framework, RMI (Remote Method Invocation), but RMI is heavyweight and hard to control. Hadoop therefore reimplemented RPC to keep it as lightweight as possible.
Hadoop RPC is designed with a four-layer architecture:
- Serialization layer: To make it easy to transmit data across machines, Hadoop serializes the various kinds of data into byte streams before sending them over the network.
- Function call layer: In essence, the function call layer uses dynamic proxies to implement remote calls.
- Network transport layer: The actual data exchange between the client and the server, based on sockets.
- Server-side processing layer: Gives the server concurrent processing capability. Hadoop adopts an event-driven I/O model based on the Reactor design pattern.
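To give a feel for the serialization layer, the sketch below turns a small request object into bytes and back. `SketchWritable` is a local stand-in that only mirrors the two-method shape of Hadoop's `Writable` interface (`write` and `readFields`) so that the example runs without Hadoop on the classpath. `AddRequest`, `toBytes`, and `fromBytes` are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two-method shape as Hadoop's Writable.
interface SketchWritable {
    void write(DataOutput out) throws IOException;

    void readFields(DataInput in) throws IOException;
}

// Hypothetical request carrying the two operands of an add(num1, num2) call.
class AddRequest implements SketchWritable {
    int num1;
    int num2;

    AddRequest() {
    }

    AddRequest(int num1, int num2) {
        this.num1 = num1;
        this.num2 = num2;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(num1);
        out.writeInt(num2);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        num1 = in.readInt();
        num2 = in.readInt();
    }

    // Serializes a request into the byte stream that would go on the wire.
    static byte[] toBytes(AddRequest request) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            request.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds a request from bytes received from the wire.
    static AddRequest fromBytes(byte[] data) {
        try {
            AddRequest request = new AddRequest();
            request.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return request;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new AddRequest(5, 6));
        AddRequest copy = fromBytes(data);
        System.out.println(data.length + " bytes -> " + (copy.num1 + copy.num2)); // prints 8 bytes -> 11
    }
}
```

Writing the fields by hand like this keeps the encoding compact and under the author's control, which fits the lightweight goal described above.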
4.2 Using Hadoop RPC
The main Hadoop RPC functionality is encapsulated in the RPC class:
org.apache.hadoop.ipc.RPC
Main methods:
getProxy/waitForProtocolProxy: Construct a client proxy object (the object implements a protocol) used to send RPC requests to the server.
public static <T> T getProxy(Class<T> protocol,
long clientVersion,
InetSocketAddress addr, Configuration conf,
SocketFactory factory) throws IOException{
}
public static <T> ProtocolProxy<T> waitForProtocolProxy(Class<T> protocol,
long clientVersion,
InetSocketAddress addr, Configuration conf,
int rpcTimeout,
RetryPolicy connectionRetryPolicy,
long timeout) throws IOException {
}
RPC.Builder: Construct a server object for an instance of a protocol (actually a Java interface) to handle requests sent by clients. With Hadoop RPC you can customize your own network request model.
RPC.Server server = new RPC.Builder(conf).setProtocol(...).setInstance(...).setBindAddress(...).setPort(...).setNumHandlers(...).build();
Apart from the artistry and elegance of its code design and its layered structure, every functional module of Hadoop RPC has a counterpart in the custom framework above. Now let's use the Hadoop RPC API to implement the same hello request. You will find that the whole process is similar to the one in the custom framework.
- First, define the protocol of the RPC server, which must extend VersionedProtocol.
package com.hc.rpc;
import java.io.IOException;
import org.apache.hadoop.ipc.VersionedProtocol;
/*
 * Function definitions
 */
interface MyProtocol extends VersionedProtocol {
    // Version number. By default, an RPC Client and Server with different
    // version numbers cannot communicate with each other.
    public static final long versionID = 1L;
    String hello(String name) throws IOException;
    int add(int num1, int num2) throws IOException;
}
- Implement the RPC protocol. A Hadoop RPC protocol is just an interface; it has to be implemented to provide the actual functionality.
package com.hc.rpc;
import java.io.IOException;
import org.apache.hadoop.ipc.ProtocolSignature;
/*
 * Function implementation class
 */
public class MyProtocolmpl implements MyProtocol {
    // Overridden method that returns the custom protocol version number
    @Override
    public long getProtocolVersion(String protocol, long clientVersion) {
        return MyProtocol.versionID;
    }
    // Overridden method that returns the protocol signature
    @Override
    public ProtocolSignature getProtocolSignature(String protocol, long clientVersion, int hashcode) {
        return new ProtocolSignature(MyProtocol.versionID, null);
    }
    /*
     * Service method exposed to clients
     */
    @Override
    public String hello(String name) throws IOException {
        return "hello" + name;
    }
    /*
     * Service method exposed to clients
     */
    @Override
    public int add(int num1, int num2) throws IOException {
        return num1 + num2;
    }
}
- Build and start the RPC Server (similar to program B). Use the static class Builder to construct an RPC Server, and call start() to start the Server.
package com.hc.rpc;
import java.io.IOException;
import org.apache.hadoop.HadoopIllegalArgumentException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.RPC.Server;
/*
 * Service provider
 */
public class HadoopServer {
    public static void main(String[] args) throws HadoopIllegalArgumentException, IOException {
        Configuration conf = new Configuration();
        /*
         * BindAddress and Port are the server's host and listening port.
         * NumHandlers is the number of server-side threads that process requests.
         * After start(), the server is listening and waiting for client requests.
         */
        Server server = new RPC.Builder(conf).setProtocol(MyProtocol.class).setInstance(new MyProtocolmpl())
                .setBindAddress("127.0.0.1").setPort(1234).setNumHandlers(5).build();
        server.start();
    }
}
- Construct the RPC Client and send an RPC request (similar to program A). Use the static method getProxy to construct the client proxy object, and call the remote methods directly through the proxy object, as follows:
package com.hc.rpc;
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
public class Client {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Dynamic proxy component
        MyProtocol proxy = (MyProtocol) RPC.getProxy(MyProtocol.class, MyProtocol.versionID,
                new InetSocketAddress("127.0.0.1", 1234), conf);
        // Remote call
        int result = proxy.add(5, 6);
        System.out.println(result);
        // Remote call
        String res = proxy.hello("world");
        System.out.println(res);
    }
}
- For testing, start the server program first, and then start the client program.
5. Summary
RPC is an architectural idea for remote access, used to simplify how clients make remote requests.
Hadoop RPC is an RPC framework built on the RPC idea. Because it is used in a distributed computing environment, the server needs to respond to requests from many users quickly and in parallel, while ensuring data security and robustness. Understanding its principles and reading its source code therefore makes the behavior of Hadoop more transparent to its users.