[Big Data: Hadoop] From a Custom RPC to Hadoop RPC: Understanding the Underlying Principles of Distributed Communication

1. Introduction

Hadoop is a distributed computing system, and in a distributed environment the network communication module is one of its core modules. To learn Hadoop well, you need to understand the basic working principles of its underlying communication system. Hadoop provides a complete RPC framework that elegantly encapsulates the underlying network communication.

This article starts from the concept of RPC and then walks through the implementation details of Hadoop RPC.

First, what is RPC?

In RPC, R is the first letter of Remote, P of Procedure, and C of Call.

Translated literally, it means remote procedure call. A bare translation, however, explains almost nothing.

To really understand RPC, you need to understand what a procedure is:

  • A procedure can be thought of as a method or function, or even an object or subroutine. To keep things simple, this article treats a procedure as a method.

Calls between methods within the same process are called local calls.

So can calls between different processes be considered remote calls? Broadly speaking, since the caller and the callee are not in the same process, that is a fair statement.

Tips: In a narrow sense, a remote call is a method call between processes running on physically separate computers, as in distributed systems, microservices, and B/S (browser/server) environments.

How can procedure calls between different processes be implemented?

The answer: through a network communication module.

In other words, a procedure call between different processes carried out through the underlying network communication module is a remote call, which makes "remote call" a broad concept. To borrow a well-known Chinese milk advertisement: "Not all milk is Telunsu," yet Telunsu is always milk. Likewise, not every remote call is RPC, but every RPC is a remote call.

So what kind of remote call counts as RPC? To answer that, we need to start from the underlying process of a remote call.

2. Native network communication

What is native network communication?

Start with a question.

Suppose there is a process A that needs some piece of business logic, and it finds that process B already provides it. A then wonders: can I use B's method?

1.png

The idea is a good one, but B is not A's own home, so some proper procedure is needed.

To make this easier to understand, take a real-life example: you want to borrow your neighbor's washing machine to do your laundry. What would you do? Let's assume your neighbor is a friendly one.

Normally, the procedure would go something like this:

  • First, you go to your neighbor's door and knock.
  • Your neighbor opens the door for you.
  • You make a request: "Hello, may I borrow your washing machine to wash some clothes?"
  • Being a good neighbor, they reply: "Sure, bring your clothes over and I'll wash them for you."
  • You pack up your clothes and hand them to your neighbor.
  • The neighbor unpacks your package and throws your clothes in the washing machine.
  • When the washing is done, the neighbor packs the clean clothes and hands them back to you.
  • Finally, don't forget to say thank you when you pick up the clothes.

A method call between different processes works much like borrowing your neighbor's washing machine. The difference is that, between processes, the "knocking" and the "opening the door" are provided by the programming language's network programming API.

The process is roughly as follows:

  • Process B first creates a socket listener. B is a very welcoming process, ready at any time for other processes to knock on its door.
  • Process A sends a connection request to process B; once B responds, the two establish a network connection, just as your neighbor opens the door.
  • A packs up its own data (the clothes) and sends B a processing request (the laundry request).
  • Process B receives the package, unpacks it, and hands A's data to one of its own methods for processing.
  • When the method finishes, B packages the result and sends it back to A over the network.

This process can be implemented with the networking API that Java provides. The complete code is as follows:

  • Program B: B is the service provider, so it can be called the server component.
package com.gk.server;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
/*
 * Service provider
 */
public class B {
    /*
     * A method in B that A also needs
     */
    static String hello(String name) {
        return "Hello!" + name;
    }

    /*
     * Network communication on B's side.
     * In essence, B is a server socket.
     */
    public static void main(String[] args) throws IOException {
        // Listen for requests on port 1234
        ServerSocket serverSocket = new ServerSocket(1234);
        // Wait for a network connection
        Socket socket = serverSocket.accept();
        // Receive the data sent by A (one read into a small buffer is enough for this demo)
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[20];
        int read = inputStream.read(buffers);
        String name = new String(buffers, 0, read);
        // Call B's own method to fulfil A's remote call
        String info = hello(name);
        // Send the result back to A
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(info.getBytes());
        inputStream.close();
        outputStream.close();
        socket.close();
        serverSocket.close();
    }
}
  • Program A: A is the side that needs the service, so it can be called the client program.
package com.gk.clien;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.UnknownHostException;
/*
 * Client
 */
public class A {
    public static void main(String[] args) throws UnknownHostException, IOException {
        // Open a network connection to B
        Socket socket = new Socket("localhost", 1234);
        // Prepare the data
        String name = "rose";
        // Send the data to B
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(name.getBytes());
        // Receive the data processed by B
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[20];
        int read = inputStream.read(buffers);
        // Print the result
        System.out.println(new String(buffers, 0, read));
        outputStream.close();
        inputStream.close();
        socket.close();
    }
}
  • Test: run program B first, then program A. On A's side you will see the data processed by B.

2.png

In essence, A and B are structured according to the C/S (client/server) communication model.
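Programs A and B above each call read() once into a fixed 20-byte buffer. That works for a short name like "rose", but TCP is a byte stream: a single read() may return only part of a message, and anything longer than the buffer is cut off. One common remedy, shown here as a minimal sketch rather than something the original programs do, is to prefix every message with its length. The class Framing and its methods are hypothetical names; the demo uses in-memory streams in place of a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: length-prefixed framing for a byte stream such as a Socket's streams.
public class Framing {

    // Writes a 4-byte big-endian length followed by the UTF-8 bytes of the message.
    static void writeMessage(OutputStream out, String message) throws IOException {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(body.length);
        data.write(body);
        data.flush();
    }

    // Reads the length, then blocks until exactly that many bytes have arrived.
    static String readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] body = new byte[length];
        data.readFully(body); // unlike a single read(), never returns a partial message
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Simulate the connection with an in-memory buffer instead of a socket.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(wire, "rose");
        writeMessage(wire, "a name much longer than twenty bytes");
        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(readMessage(in)); // rose
        System.out.println(readMessage(in)); // a name much longer than twenty bytes
    }
}
```

In A and B, writeMessage and readMessage would take the place of the raw write() and read() calls on socket.getOutputStream() and socket.getInputStream().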

At this point, a question naturally arises.

What if A needs to call B frequently, or many processes besides A need B's service? It is like having to borrow your neighbor's washing machine every time you do laundry.

You will find that you have to go through the same tedious routine of knocking and opening the door every time, even though the only thing that actually changes is the laundry. How can this routine be simplified and made more elegant?

In real life, you can hire an agent, and your neighbor can hire one too, freeing both of you from the tedious routine.

Tips: Note that hiring an agent only reduces the requester's workload; it does not remove any of the steps that actually take place.

Likewise, processes that communicate with each other can also use an agent; here the agent is not a person but a software component.

Now we can answer the question: what is native network communication?

Implementing network communication directly on top of the native API, honestly and step by step, is called native network communication.

As just mentioned, network communication can also be implemented with the proxy (agent) pattern, which is essentially an application of encapsulation.

3. The proxy (agent) pattern

The basic idea of the proxy pattern:

  • Encapsulate the common steps of native communication in dedicated components.
  • Design one proxy for A and one for B.
  • When A (or any other process) needs B's service, it only has to hand the data to its proxy and then simply wait for the proxy to return B's result.
  • Likewise, B's proxy receives the data sent by A or other processes, calls the correct method of B, and returns the result.

This frees A's high-level business components and B's service components from the tedious communication routine, so they can concentrate entirely on their own business logic. In essence, this is decoupling based on the single-responsibility principle.
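The decoupling can be seen in miniature without any networking. In this sketch (all names are hypothetical, loosely modelled on the MyProtocol/AService pair built later in this section), the business component depends only on an interface, so it runs unchanged whether it is handed a local implementation or a network proxy.

```java
// Hypothetical names, loosely modelled on MyProtocol / AService in this article.
interface Greeter {
    String hello(String name);
}

// The business component only knows the interface, not who implements it or where.
class BusinessComponent {
    private final Greeter greeter;

    BusinessComponent(Greeter greeter) {
        this.greeter = greeter;
    }

    String doSomething(String name) {
        // Local work could happen here; the "remote" part is delegated to the interface.
        return "result: " + greeter.hello(name);
    }
}

public class DecouplingDemo {
    public static void main(String[] args) {
        // A local implementation stands in for B; a network proxy could be swapped in instead.
        Greeter local = name -> "Hello!" + name;
        System.out.println(new BusinessComponent(local).doSomething("world")); // result: Hello!world
    }
}
```

Because BusinessComponent never touches sockets, swapping the lambda for a proxy object changes nothing in its code, which is exactly what the proxy pattern relies on.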

3.png

Based on the proxy idea, let's now build a simple custom remote call framework.

  • First, program B needs to tell its callers, in the form of an interface, what functions it can provide. This is like a company publishing a job advertisement that spells out the specific requirements of the position. The service consumer (A) then needs to understand B's requirements and sign a strict labor contract that clarifies each side's responsibilities.

    The interface here is the protocol: it constrains the behavior of both the provider and the consumer.

package com.gk.protocol;
/*
 * Rules of behavior that both communicating parties follow
 */
public interface MyProtocol {
    String hello(String name);
}
  • Program A has at least two independent components:

    Business component: in plain terms, it does the actual work.

    Proxy component: when the business component needs a remote call, the proxy component carries it out.

The proxy component is designed according to the proxy design pattern; here it is generated dynamically with Java's java.lang.reflect.Proxy class. The proxy component provides no business implementation of its own; it encapsulates the network API so that it can access the functional modules on the specified host.
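Before adding any sockets, it helps to see what an InvocationHandler actually receives. This sketch (hypothetical names, no network) records each call instead of sending it. Note that method.getName() yields the bare name "hello", whereas concatenating the Method object itself produces its full signature string, which the receiving side could not look up by name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

public class ProxyDemo {
    // Hypothetical protocol, same shape as MyProtocol in this article.
    interface Greeting {
        String hello(String name);
    }

    // Records the last call instead of sending it over the network.
    static class RecordingHandler implements InvocationHandler {
        String lastMethodName;
        Object[] lastArgs;

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            lastMethodName = method.getName(); // "hello", not the full signature
            lastArgs = args;
            return "intercepted " + Arrays.toString(args);
        }
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        Greeting proxy = (Greeting) Proxy.newProxyInstance(
                Greeting.class.getClassLoader(), new Class<?>[] { Greeting.class }, handler);
        System.out.println(proxy.hello("world"));  // intercepted [world]
        System.out.println(handler.lastMethodName); // hello
    }
}
```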

Write the proxy component first:

package com.gk.clien;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.net.Socket;
import com.gk.protocol.MyProtocol;
/*
 * A's proxy component.
 * Role: accesses the functions provided by B on behalf of A's business component.
 * The proxy must implement the interface defined by B so that it knows what B offers;
 * a proxy that does not know what the other side can do is not fit for the job.
 */
public class AProxy implements InvocationHandler {
    // IP address of the remote computer
    private String ip;
    // Port of the remote computer
    private int port;

    public AProxy(String ip, int port) {
        this.ip = ip;
        this.port = port;
    }

    /*
     * Create the dynamic proxy component
     */
    MyProtocol createProxy() {
        // Dynamically create the proxy from B's interface definition
        MyProtocol myProtocol = (MyProtocol) Proxy.newProxyInstance(AProxy.class.getClassLoader(),
                new Class[] { MyProtocol.class }, this);
        return myProtocol;
    }

    /*
     * Encapsulate the actual network request
     */
    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Open a network connection
        Socket socket = new Socket(this.ip, this.port);
        // Send the data to B
        OutputStream outputStream = socket.getOutputStream();
        // Send the method name and arguments in a simple string format,
        // "methodName\targ1,arg2,..." (a format such as JSON would be more robust).
        // getName() gives the bare name "hello"; appending the Method object itself
        // would send its full signature, which B could not look up.
        StringBuilder info = new StringBuilder(method.getName()).append("\t");
        if (args != null) {
            for (int i = 0; i < args.length; i++) {
                // Separate arguments with commas
                if (i > 0) {
                    info.append(",");
                }
                info.append(args[i]);
            }
        }
        outputStream.write(info.toString().getBytes());
        // Receive the result processed by B
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[1024];
        int read = inputStream.read(buffers);
        // Convert it to a string
        String res = new String(buffers, 0, read);
        outputStream.close();
        inputStream.close();
        socket.close();
        return res;
    }
}
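The string that AProxy builds and BProxy parses follows an ad-hoc format: the method name, a tab, then the arguments separated by commas. Here is a minimal sketch of both halves of that format, assuming (as the article does) that every argument is a string; the class TextCodec and its methods are hypothetical. As the comment in AProxy suggests, a structured format such as JSON would be safer, because an argument that itself contains a comma or tab would be split incorrectly here.

```java
import java.util.Arrays;

// Hypothetical codec for the "methodName\targ1,arg2,..." format used by AProxy/BProxy.
public class TextCodec {

    // Client side: builds the request from the method name and its arguments.
    static String encode(String methodName, Object[] args) {
        StringBuilder sb = new StringBuilder(methodName).append('\t');
        if (args != null) { // a proxy receives args == null for no-argument methods
            for (int i = 0; i < args.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(args[i]);
            }
        }
        return sb.toString();
    }

    // Server side: element 0 is the method name, the remaining elements are the arguments.
    static String[] decode(String request) {
        String[] parts = request.split("\t", 2);
        String[] args = parts.length < 2 || parts[1].isEmpty() ? new String[0] : parts[1].split(",");
        String[] result = new String[args.length + 1];
        result[0] = parts[0];
        System.arraycopy(args, 0, result, 1, args.length);
        return result;
    }

    public static void main(String[] args) {
        String wire = encode("hello", new Object[] { "world" });
        System.out.println(wire.replace("\t", "\\t"));     // hello\tworld
        System.out.println(Arrays.toString(decode(wire))); // [hello, world]
    }
}
```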

A's business component:

package com.gk.clien;
import com.gk.protocol.MyProtocol;
/*
 * A's business component
 */
public class AService {
    // Depends on the functions defined in B's interface
    private MyProtocol myProtocol;

    public AService(MyProtocol myProtocol) {
        this.myProtocol = myProtocol;
    }

    /*
     * Business method
     */
    public void doSomething(String name) {
        // A's own business logic
        System.out.println("Business logic A can handle itself");
        // The other part of the business requires a remote call
        String res = this.myProtocol.hello(name);
        System.out.println("Result from the remote business module: " + res);
    }
}
  • B should also have two components:

    B's business component.

    B's proxy.

B's business component simply implements the interface that B itself defined:

package com.gk.server;
import com.gk.protocol.MyProtocol;
/*
 * Implements the interface that B itself defined
 */
public class BService implements MyProtocol {
    @Override
    public String hello(String name) {
        return "Hello!" + name;
    }
}

Next, write B's proxy component. B's proxy mainly parses the data sent by A and invokes the business module dynamically through reflection. Strictly speaking, B's network connection and network response handling should also be separate components, but since this article only aims to explain the concept of remote calls, it focuses on the essentials and leaves out the rest.

package com.gk.server;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.ServerSocket;
import java.net.Socket;
import com.gk.protocol.MyProtocol;

public class BProxy {
    // The component that actually implements the interface
    private MyProtocol myProtocol;

    public BProxy(MyProtocol myProtocol) {
        this.myProtocol = myProtocol;
    }

    /*
     * The code in this method has three layers of functionality:
     *  A. Network connection
     *  B. Parsing the data
     *  C. Processing the data and returning the result
     * In principle, serving many users would require multithreading, and these
     * three functions should be split into three separate components.
     */
    void getRes() throws IOException, NoSuchMethodException, SecurityException, IllegalAccessException,
            IllegalArgumentException, InvocationTargetException {
        // Listen for requests
        ServerSocket serverSocket = new ServerSocket(1234);
        // Wait for a network connection
        Socket socket = serverSocket.accept();
        // Receive the data sent by A
        InputStream inputStream = socket.getInputStream();
        byte[] buffers = new byte[100];
        int read = inputStream.read(buffers);
        // The raw request
        String info = new String(buffers, 0, read);
        // Parse the request: "methodName\targ1,arg2,..."
        String[] strs = info.split("\t");
        // Method name
        String methodName = strs[0];
        // Arguments
        String[] args = strs[1].split(",");
        Class<?> clz = MyProtocol.class;
        // Use reflection to call the method the requester asked for
        // (this simple version only supports methods with a single String parameter)
        Method method = clz.getMethod(methodName, new Class[] { String.class });
        String res = String.valueOf(method.invoke(this.myProtocol, (Object[]) args));
        // Return the result to A
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(res.getBytes());
        inputStream.close();
        outputStream.close();
        socket.close();
        serverSocket.close();
    }
}
  • Test:

    B-side test code:

package com.gk.server;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
// Replaces the socket-only B from section 2 (same package and class name)
public class B {
    public static void main(String[] args) throws NoSuchMethodException, SecurityException, IllegalAccessException,
            IllegalArgumentException, InvocationTargetException, IOException {
        // Serve the request through B's proxy
        BProxy bProxy = new BProxy(new BService());
        bProxy.getRes();
    }
}

    A-side test code:

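package com.gk.clien;
// Replaces the socket-only A from section 2 (same package and class name)
public class A {
    public static void main(String[] args) {
        // Proxy object
        AProxy aProxy = new AProxy("127.0.0.1", 1234);
        // Business component
        AService aService = new AService(aProxy.createProxy());
        // Run the business logic
        aService.doSomething("world");
    }
}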

Run the B-side test code first, then the A-side code. Output:

4.png

By encapsulating native network communication in proxies, program A can access the functional modules of program B without knowing anything about the underlying network communication. Each call only needs to hand the data to the proxy, which greatly simplifies remote calls.

This is exactly what RPC aims for. So can native network communication, or the custom remote call framework above, be called RPC?

Let's return to the concept of RPC.
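BProxy above finds its target with getMethod(methodName, String.class), so it can only dispatch methods that take exactly one String. A slightly more general server-side dispatcher is sketched below under the same all-strings assumption (all names hypothetical): it looks the method up on the protocol interface by name and argument count, and rejects anything the interface does not declare.

```java
import java.lang.reflect.Method;

public class Dispatcher {
    // Hypothetical protocol and implementation, shaped like MyProtocol / BService.
    public interface Greeting {
        String hello(String name);
        String join(String a, String b);
    }

    public static class GreetingService implements Greeting {
        public String hello(String name) { return "Hello!" + name; }
        public String join(String a, String b) { return a + "-" + b; }
    }

    // Finds a method declared by the protocol with the given name and arity, then invokes it.
    static Object dispatch(Class<?> protocol, Object target, String methodName, String[] args)
            throws Exception {
        for (Method m : protocol.getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == args.length) {
                // Casting the String[] to Object[] spreads it over the method's parameters.
                return m.invoke(target, (Object[]) args);
            }
        }
        throw new NoSuchMethodException(methodName + " with " + args.length + " argument(s)");
    }

    public static void main(String[] args) throws Exception {
        Greeting service = new GreetingService();
        System.out.println(dispatch(Greeting.class, service, "hello", new String[] { "world" })); // Hello!world
        System.out.println(dispatch(Greeting.class, service, "join", new String[] { "a", "b" }));  // a-b
    }
}
```

Looking methods up on the interface rather than on the implementation class means a caller can only reach what the protocol publishes, which mirrors the "contract" role the article gives MyProtocol.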

RPC is essentially an idea, or a convention: it provides a standard for encapsulating native network communication in a uniform way, which is why it is also referred to as the RPC protocol. Under the RPC protocol or standard, both the client and the server have a program called a stub, which plays a role similar to an agent. A call proceeds as follows:

  • The client program calls the system-generated stub as if it were a local call;
  • The stub packs the call information into a message according to the requirements of the network communication module and hands it to that module, which sends it to the remote server;
  • When the remote server receives the message, it passes it to the corresponding stub;
  • The server-side stub unpacks the message into the form the called procedure expects and calls the corresponding function;
  • The called function runs with the parameters it received and returns the result to the stub;
  • The stub packs the result into a message and sends it back, via the network communication module, to the client program.
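These steps can be traced in a single process by replacing the network with a plain function call. In this sketch every name is hypothetical: the client stub is a dynamic proxy that packs the call into a string, a Function<String, String> plays the communication module, and the server stub unpacks the message, invokes the real implementation and packs the result. Real RPC frameworks swap the Function for sockets and the string for a proper serialization format.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Function;

public class StubDemo {
    public interface Greeting {
        String hello(String name);
    }

    // Server stub: unpacks "method\targ", calls the real implementation, packs the result.
    static Function<String, String> serverStub(Greeting impl) {
        return message -> {
            String[] parts = message.split("\t", 2);
            try {
                Method m = Greeting.class.getMethod(parts[0], String.class);
                return String.valueOf(m.invoke(impl, parts[1]));
            } catch (ReflectiveOperationException e) {
                return "ERROR " + e;
            }
        };
    }

    // Client stub: looks like a local Greeting but sends every call through the transport.
    static Greeting clientStub(Function<String, String> transport) {
        return (Greeting) Proxy.newProxyInstance(
                Greeting.class.getClassLoader(),
                new Class<?>[] { Greeting.class },
                (proxy, method, args) -> transport.apply(method.getName() + "\t" + args[0]));
    }

    public static void main(String[] args) {
        Greeting impl = name -> "Hello!" + name;              // the called procedure
        Function<String, String> network = serverStub(impl); // stands in for the network
        Greeting client = clientStub(network);                // what the client program sees
        System.out.println(client.hello("world"));            // Hello!world
    }
}
```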

Tips: RPC is an idea, a specification; a program that implements remote calls according to that specification is called an RPC framework. Different RPC frameworks therefore differ at the implementation level.

From this point of view, purely native network communication cannot be considered RPC, while a custom remote access framework built on the proxy idea can be regarded as a rudimentary RPC implementation.

Tips: The J2EE servlet specification can also be seen as a remote call specification, with HTTP as its communication protocol. Tomcat, and web programs written against the servlet specification, are examples of remote calls.

4. Hadoop RPC

4.1 Features and structure

In fact, Hadoop RPC is an application of the C/S (Client/Server) model in distributed computing. It has the following characteristics:

  • Transparency: it encapsulates the underlying network communication and simplifies what high-level business components must do, so that calling a server-side program from the client looks the same as a local call.
  • High performance: each Hadoop subsystem (such as HDFS, YARN, and MapReduce) uses a Master/Slave architecture in which the Master is essentially an RPC server responsible for receiving and processing requests sent by the Slaves. To maximize the Master's concurrent processing capacity, the RPC server must be high-performance.
  • Controllability: the JDK already ships an RPC framework, RMI (Remote Method Invocation), but RMI is heavyweight and hard to control. Hadoop therefore implemented its own RPC to keep it as lightweight as possible.

Hadoop RPC is designed as a four-layer architecture:

  • Serialization layer: to make data easy to transfer between machines, Hadoop serializes the various kinds of data into byte streams before sending them over the network.
  • Function call layer: this layer uses dynamic proxies to implement remote calls.
  • Network transport layer: the actual data exchange between client and server, based on sockets.
  • Server-side processing layer: gives the server concurrent processing capability. Hadoop uses an event-driven I/O model based on the Reactor design pattern.

5.jpg
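Hadoop's serialization layer is built around the org.apache.hadoop.io.Writable interface, whose two methods are write(DataOutput) and readFields(DataInput). The sketch below imitates that contract with plain JDK classes so it runs without Hadoop on the classpath; SimpleWritable and AddRequest are hypothetical types, the latter standing for the arguments of a call such as add(5, 6).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableStyle {
    // Same two methods as org.apache.hadoop.io.Writable, declared locally to avoid the dependency.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical request carrying the arguments of add(int, int).
    static class AddRequest implements SimpleWritable {
        int num1;
        int num2;

        public void write(DataOutput out) throws IOException {
            out.writeInt(num1); // fields are written in a fixed order...
            out.writeInt(num2);
        }

        public void readFields(DataInput in) throws IOException {
            num1 = in.readInt(); // ...and must be read back in the same order
            num2 = in.readInt();
        }
    }

    static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        AddRequest sent = new AddRequest();
        sent.num1 = 5;
        sent.num2 = 6;
        byte[] wire = serialize(sent); // two 4-byte ints: 8 bytes in total
        AddRequest received = new AddRequest();
        received.readFields(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(wire.length + " bytes -> " + received.num1 + " + " + received.num2); // 8 bytes -> 5 + 6
    }
}
```

Unlike Java's built-in serialization, nothing but the field values goes on the wire; the receiver must already know the type, which keeps messages compact.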

4.2 Using Hadoop RPC

The main Hadoop RPC functionality is encapsulated in the RPC class:

org.apache.hadoop.ipc.RPC

Main method introduction:

  • getProxy/waitForProtocolProxy: construct a client proxy object (which implements a protocol) for sending RPC requests to the server.
public static <T> T getProxy(Class<T> protocol,
                             long clientVersion,
                             InetSocketAddress addr, Configuration conf,
                             SocketFactory factory) throws IOException { ... }

public static <T> ProtocolProxy<T> waitForProtocolProxy(Class<T> protocol,
                             long clientVersion,
                             InetSocketAddress addr, Configuration conf,
                             int rpcTimeout,
                             RetryPolicy connectionRetryPolicy,
                             long timeout) throws IOException { ... }
  • RPC.Builder: constructs a server object for an instance of a protocol (in practice, a Java interface) that handles requests sent by clients. With Hadoop RPC you can build your own network request model this way.
RPC.Server server = new RPC.Builder(conf) /* .setProtocol(...).setInstance(...) ... */ .build();

Apart from its more elegant code design and clearer layering, Hadoop RPC has a counterpart for each functional module of the custom framework above. Let's now use the Hadoop RPC API to implement the same hello feature. You will find that the whole process closely mirrors the custom framework.

  • First, define the server's RPC protocol, which must extend VersionedProtocol.
package com.hc.rpc;
import java.io.IOException;
import org.apache.hadoop.ipc.VersionedProtocol;
/*
 * Functionality definition
 */
interface MyProtocol extends VersionedProtocol {
    // Version number; by default, an RPC client and server with different version numbers cannot communicate
    public static final long versionID = 1L;
    String hello(String name) throws IOException;
    int add(int num1, int num2) throws IOException;
}
  • Implement the RPC protocol. A Hadoop RPC protocol is just an interface; a class must implement it to provide the actual functionality.
package com.hc.rpc;
import java.io.IOException;
import org.apache.hadoop.ipc.ProtocolSignature;
/*
 * Implementation class
 */
public class MyProtocolmpl implements MyProtocol {
    // Overridden method: returns the custom protocol version number
    @Override
    public long getProtocolVersion(String protocol, long clientVersion) {
        return MyProtocol.versionID;
    }

    // Overridden method: returns the protocol signature
    @Override
    public ProtocolSignature getProtocolSignature(String protocol, long clientVersion, int hashcode) {
        return new ProtocolSignature(MyProtocol.versionID, null);
    }

    /*
     * Service method exposed to clients
     */
    @Override
    public String hello(String name) throws IOException {
        return "hello" + name;
    }

    /*
     * Service method exposed to clients
     */
    @Override
    public int add(int num1, int num2) throws IOException {
        return num1 + num2;
    }
}
  • Build and start the RPC server (the counterpart of program B). Use the static nested class RPC.Builder to construct an RPC server, then call start() to start it.
package com.hc.rpc;
import java.io.IOException;
import org.apache.hadoop.HadoopIllegalArgumentException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.RPC.Server;
/*
 * Service provider
 */
public class HadoopServer {
    public static void main(String[] args) throws HadoopIllegalArgumentException, IOException {
        Configuration conf = new Configuration();
        /*
         * setBindAddress and setPort specify the server's host and listening port.
         * setNumHandlers specifies the number of server-side threads that process requests.
         * After start(), the server is listening and waits for client requests to arrive.
         */
        Server server = new RPC.Builder(conf).setProtocol(MyProtocol.class).setInstance(new MyProtocolmpl())
                .setBindAddress("127.0.0.1").setPort(1234).setNumHandlers(5).build();
        server.start();
    }
}
  • Construct the RPC client and send RPC requests (the counterpart of program A). Use the static method getProxy to construct the client proxy object, then call the remote methods directly through it:
package com.hc.rpc;
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class Client {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Dynamic proxy component
        MyProtocol proxy = RPC.getProxy(MyProtocol.class, MyProtocol.versionID,
                new InetSocketAddress("127.0.0.1", 1234), conf);
        // Remote call
        int result = proxy.add(5, 6);
        System.out.println(result);
        // Remote call
        String res = proxy.hello("world");
        System.out.println(res);
        // Release the proxy's connection resources
        RPC.stopProxy(proxy);
    }
}
  • For testing, start the server program first, and then start the client program.

5.png
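setNumHandlers(5) in HadoopServer controls how many handler threads process calls. Roughly speaking, Hadoop's ipc.Server separates the threads that read and decode requests from a pool of handler threads that take decoded calls off a shared call queue and execute them. The sketch below imitates only that queue-plus-handlers idea with java.util.concurrent; the class names are hypothetical, and all networking, as well as Hadoop's reader and responder threads, is left out.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

public class HandlerPool {
    // A decoded call waiting to be executed, plus the slot for its result.
    static final class Call {
        final Supplier<Object> work;
        final CompletableFuture<Object> result = new CompletableFuture<>();

        Call(Supplier<Object> work) {
            this.work = work;
        }
    }

    private final BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>();

    // Starts numHandlers daemon threads that loop: take a call, run it, publish the result.
    HandlerPool(int numHandlers) {
        for (int i = 0; i < numHandlers; i++) {
            Thread handler = new Thread(() -> {
                while (true) {
                    Call call;
                    try {
                        call = callQueue.take();
                    } catch (InterruptedException e) {
                        return; // stop this handler
                    }
                    try {
                        call.result.complete(call.work.get());
                    } catch (RuntimeException e) {
                        call.result.completeExceptionally(e);
                    }
                }
            }, "handler-" + i);
            handler.setDaemon(true);
            handler.start();
        }
    }

    // What a reader thread would do after decoding a request.
    CompletableFuture<Object> submit(Supplier<Object> work) {
        Call call = new Call(work);
        callQueue.add(call);
        return call.result;
    }

    public static void main(String[] args) throws Exception {
        HandlerPool server = new HandlerPool(5); // like setNumHandlers(5)
        CompletableFuture<Object> sum = server.submit(() -> 5 + 6);
        CompletableFuture<Object> greeting = server.submit(() -> "hello" + "world");
        System.out.println(sum.get());      // 11
        System.out.println(greeting.get()); // helloworld
    }
}
```

The queue decouples receiving a request from executing it, so a slow call occupies one handler while the others keep draining the queue.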

5. Summary

RPC is an architectural idea for remote access, used to simplify how clients make remote requests.

Hadoop RPC is an RPC architecture built on this idea. Because it runs in a distributed computing environment, its server must respond to requests from many users quickly and in parallel while keeping data secure and the system robust. Understanding its principles and reading its source code will therefore make Hadoop far more transparent to work with.


Source: blog.csdn.net/y6123236/article/details/130480457