NodeJs multi-core multi-process parallel framework

An implementation of a nodejs multi-core, multi-process parallel framework.
There is no need to belabor the importance of multi-core programming, so let's get straight to the point: a nodejs network server can currently support multiple processes in the following ways:

#1 Start multiple processes, bind each process to a different port, and use a reverse proxy server such as Nginx for load balancing. The advantage is that we can use the powerful Nginx to do filtering, checking, and other operations, and at the same time achieve better balancing strategies; the downside is obvious: we introduce a layer of indirection.
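For concreteness, here is a minimal sketch of this approach. The file name port_worker.js, the base port, and the worker count are my own assumptions, and the Nginx upstream configuration that would round-robin across these ports is omitted:

var cp = require("child_process");

var BASE_PORT = 3458;   // assumed base port
var WORKER_COUNT = 4;   // assumed worker count

// each worker gets its own port (BASE_PORT + i) via argv; an Nginx
// upstream listing these ports then does the load balancing
for(var i = 0; i < WORKER_COUNT; i++){
    cp.fork("./port_worker.js", [String(BASE_PORT + i)]);
}

// port_worker.js (hypothetical file): listen on the port passed in argv
var http = require("http");
http.createServer(function(req, res){
    res.writeHead(200, {"content-type" : "text/html"});
    res.end("hello,world");
}).listen(Number(process.argv[2]));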

#2 Multiple processes listen on the same port.
nodejs provides the ability to send "file handles" between processes, which is really useful (it appears to have been a patch contributed by Yahoo engineers).
If you are not familiar with passing file descriptors between processes, see: http://www.lst.de/~okir/blackhats/node121.html
In node we can achieve this effect through the following functions:

stream.write(string, encoding='utf8', [fd])

or, after forking a child process in node v0.5.9+:

child.send(message, [sendHandle])

So we design the following scheme: after the master process creates the listening socket, it sends the listen fd to all worker child processes; each worker receives the handle and performs the listen operation on it:

master :

// WORKER_NUMBER, WORKER_PATH, PORT and the output()/about_exit() helpers
// are defined elsewhere in the author's script
var cp = require("child_process");
var net = require("net");

function startWorker(handle){
    output("start workers :" + WORKER_NUMBER);
    worker_succ_count = 0;
    for(var i = 0; i < WORKER_NUMBER; i++){
        var c = cp.fork(WORKER_PATH);
        // pass the listening handle to the worker along with the message
        c.send({"server" : true}, handle);
    }
}

function startServer(){
    var tcpServer = net.createServer();
    tcpServer.on("error", function(err){
        output("server error, check the port...");
        about_exit();
    });
    tcpServer.listen(PORT, function(){
        // hand the internal listening handle to the workers, then close
        // the server in the master: only the workers should accept
        startWorker(tcpServer._handle);
        tcpServer.close();
    });
}

startServer();

Note that we only need one handle: httpServer is really just a layer of encapsulation over netServer, so we start a netServer in the master process and send the listening socket "handle" to each child process.

worker :

var http = require("http");

var child_req_count = 0;

server = http.createServer(function(req, res){
    var i, r;
    for(i = 0; i < 10000; i++){
        r = Math.random();
    }
    res.writeHead(200, {"content-type" : "text/html"});
    res.end("hello,world");
    child_req_count++;
});

process.on("message", function(m, handle){
    if(handle){
        // listen directly on the handle passed down from the master
        server.listen(handle, function(err){
            if(err){
                output("worker listen error");
            }else{
                process.send({"listenOK" : true});
                output("worker listen ok");
            }
        });
    }
});

After a worker process receives the handle it listens on it immediately, so multiple worker processes end up listening on the same socket port, i.e. the same socket is added to the epoll structure of multiple processes. When an external connection arrives, only one lucky worker process gets the wakeup event and accepts the connection. (UNP mentions that this situation leads to the "thundering herd" effect, but rumor has it that on Linux kernels 2.6 and later the thundering herd has been eliminated for a blocking listenfd, while it still exists for a non-blocking listenfd, which means our epoll-based servers still have the problem. Personally, though, I think the epoll structure of a nodejs process usually watches many handles rather than just the listenfd, so the impact of the thundering herd here should be relatively small...)

We run a test with 5 workers (all tests below are local, with keep-alive enabled):

The test workload is as shown in the code above: run Math.random() 10K times, then output "hello,world";

System Configuration:

Linux 2.6.18-164.el5xen x86_64

CPU × 5, Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

free -m
             total       used       free     shared    buffers     cached
Mem:          7500       3672       3827          0        863       1183


siege -c 100 -r 1000 -b localhost:3458/

The result is:

Transactions:                 100000 hits
Availability:                 100.00 %
Elapsed time:                  10.95 secs
Data transferred:               1.05 MB
Response time:                  0.01 secs
Transaction rate:            9132.42 trans/sec
Throughput:                     0.10 MB/sec
Concurrency:                   55.61
Successful transactions:      100000

The requests handled by the five workers are:

child req total : 23000
child req total : 16000
child req total : 17000
child req total : 22000
child req total : 22000

Test again:

child req total : 13000
child req total : 30000
child req total : 14000
child req total : 22000
child req total : 21000

In this scheme our load balancing relies on the "random accept" behavior of the workers, which is guaranteed by the operating system. Over a long run it should be balanced, but in the short term the load can skew, especially when clients use keep-alive connections and hold them open for a long time.
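Incidentally, the per-worker totals above come from the child_req_count counter in each worker; the article does not show how they are printed. One minimal way (purely my assumption, reusing the author's output() helper) is to have each worker report on demand:

// a sketch, not from the original: each worker dumps its counter when the
// master sends {"report": true}, making short-term skew visible
process.on("message", function(m){
    if(m && m.report){
        output("child req total : " + child_req_count);
    }
});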

#3 One process is responsible for listening and accepting connections, then hands the accepted connections to child processes for handling.

Let's first look at how an http server works in the normal case; it can be roughly divided into the following stages:

listenfd is bound and listening -> a TCP connection object is accepted -> it is wrapped into a socket object -> (req, res) objects are generated -> user code is called

(#1) TCP.bind ---> TCP.listen          (process.binding("tcp_wrap"))
          |
          | TCP.emit("connection", handle)
          |
(#2) wrap TCP handle into Socket       (TCP.onconnection)
          |
          | net.Server.emit("connection", socket)
          |
(#3) create req, res based on a Socket (net.Server connectionListener)
          |
          | http.Server.emit("request", req, res)
          |
(#4) your code is written here:
function(req, res){
    res.writeHead(200, {"content-type" : "text/html"});
    res.end("hello,world");
}
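To make the four stages concrete, here is a small illustrative sketch (the port number is an assumption) that taps the events at each layer of a stock http server:

var http = require("http");

var server = http.createServer(function(req, res){   // stage #4: user code
    res.writeHead(200, {"content-type" : "text/html"});
    res.end("hello,world");
});

server.on("connection", function(socket){
    // by the time this fires, stages #1 and #2 are done: the raw TCP
    // handle has already been wrapped into a net.Socket
    console.log("connection from " + socket.remoteAddress);
});

server.listen(3458);                                  // stage #1: bind + listen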

As for nodejs's child.send(message, [sendHandle]), the sendHandle here needs to be a tcp_wrap object, so we cannot directly send the socket that net.createServer gives us; we would first have to "roll back" from the Socket to the underlying TCP handle, which not only wastes resources but is also unsafe. So in tcpMaster we use tcp_wrap directly:

// WORKER_NUMBER, WORKER_PATH, ADDRESS, PORT, BACK_LOG and output()
// are defined elsewhere in the author's script
var cp = require("child_process");
var TCP = process.binding("tcp_wrap").TCP;

var childs = [];
var last_child_pos = 0;

function startWorker(){
    for(var i = 0; i < WORKER_NUMBER; i++){
        var c = cp.fork(WORKER_PATH);
        childs.push(c);
    }
}

function startServer(){
    server = new TCP();
    server.bind(ADDRESS, PORT);
    server.onconnection = onconnection;
    server.listen(BACK_LOG);
}

function onconnection(handle){
    //output("master on connection");
    // round-robin: pick the next worker and hand it the raw TCP handle
    last_child_pos++;
    if(last_child_pos >= WORKER_NUMBER){
        last_child_pos = 0;
    }
    childs[last_child_pos].send({"handle" : true}, handle);
    // close the master's copy of the handle; the worker owns it now
    handle.close();
}

startServer();
startWorker();

The code above shows the tcpMaster process distributing accepted tcp connections evenly to the tcpWorkers. The worker side is:

var net = require("net");
var http = require("http");

var child_req_count = 0;

function onhandle(self, handle){
    // respect maxConnections, just like net.js does internally
    if(self.maxConnections && self.connections >= self.maxConnections){
        handle.close();
        return;
    }
    // wrap the raw TCP handle into a net.Socket
    var socket = new net.Socket({
        handle : handle,
        allowHalfOpen : self.allowHalfOpen
    });
    socket.readable = socket.writable = true;
    socket.resume();
    self.connections++;
    socket.server = self;
    // activate http.Server's "connection" handling
    self.emit("connection", socket);
    socket.emit("connect");
}

server = http.createServer(function(req, res){
    var r, i;
    for(i = 0; i < 10000; i++){
        r = Math.random();
    }
    res.writeHead(200, {"content-type" : "text/html"});
    res.end("hello,world");
    child_req_count++;
});

process.on("message", function(m, handle){
    if(handle){
        onhandle(server, handle);
    }
    if(m.status == "update"){
        process.send({"status" : process.memoryUsage()});
    }
});

The code above shows the tcpWorker wrapping the received tcp handle into a socket. To stay fully compatible with the http.Server class, we also check the connection count and set socket.server to the current server, then trigger http.Server's "connection" event.
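Note that the worker's message handler also answers a {"status" : "update"} query with its memory usage; the article only shows the worker half of that exchange. A minimal master-side sketch (my assumption, reusing the childs array and the author's output() helper) might be:

// master side (a sketch; only the worker half appears in the article):
// poll each worker's memory usage through the {"status" : "update"} message
setInterval(function(){
    childs.forEach(function(c){
        c.send({"status" : "update"});
    });
}, 10000);

childs.forEach(function(c){
    c.on("message", function(m){
        if(m && m.status){
            output("worker mem : " + JSON.stringify(m.status));
        }
    });
});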

In this way, with as little overhead and as little (and hopefully elegant) code as possible, we achieve load balancing and efficient parallelism while fully preserving compatibility with the http.Server class.
The test results are as follows:

Transactions:                 100000 hits
Availability:                 100.00 %
Elapsed time:                  10.47 secs
Data transferred:               1.05 MB
Response time:                  0.01 secs
Transaction rate:            9551.10 trans/sec
Throughput:                     0.10 MB/sec
Concurrency:                   60.68
Successful transactions:      100000

child req total : 20000
child req total : 20000
child req total : 20000
child req total : 20000
child req total : 20000

The numbers fluctuate; overall qps stays in the 8000~11000 range. Note that the number of workers above is 5. If the worker count is increased appropriately, qps can stay at 10k stably, but the system load is then relatively high, so choose carefully for your use case.
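As for choosing the worker count, a common heuristic (my assumption, not something measured above) is to size it from the CPU count:

var os = require("os");

// one worker per core is a reasonable starting point; leave one core of
// headroom for the master, since in scheme #3 it does all the accepting
var WORKER_NUMBER = Math.max(1, os.cpus().length - 1);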

After several test runs we inspect /proc/[tcpMaster]/fd; the file descriptors it holds are as follows:

0 -> /dev/pts/30
1 -> /dev/pts/30
10 -> socket:[71040]
11 -> socket:[71044]
12 -> socket:[71054]
2 -> /dev/pts/30
3 -> eventpoll:[71027]
4 -> pipe:[71028]
5 -> pipe:[71028]
6 -> socket:[71030]
8 -> socket:[71032]
9 -> socket:[71036]

Check out one of the tcpWorkers:

0 -> socket:[71031]
1 -> /dev/pts/30
2 -> /dev/pts/30
3 -> eventpoll:[71049]
4 -> pipe:[71050]
5 -> pipe:[71050]

The tcpMaster's fds have the following meanings:

1 socket is the listenfd

5 sockets are used for parent-child IPC

2 pipe fds (one pair) are used to trigger the async_watcher/signal_watcher

The rest need no explanation...

The tcpWorker's fds have the following meanings:

1 socket (stdin here) is used to communicate with the parent process

The remaining fds are similar to those in the master

So the tcpMaster/tcpWorker fd usage is normal and there is no handle-leak problem, and the load balancing is controllable; the cost is that the master, which accepts the sockets, has to redistribute them to the workers, introducing extra overhead.

Summary:

This article introduced two relatively efficient multi-process approaches. Both have their own pros and cons, and users need to choose for themselves. node v0.5.10+ has the cluster library built in, but in my opinion its publicity value outweighs its practical value: it lets the project officially claim direct support for a multi-process model. To make the API foolproof, the nodejs developers used some rather tricky methods and the code is fairly convoluted; moreover, this kind of multi-process approach inevitably involves inter-process communication, process management, and the like, where we often have requirements of our own. Once nodejs has solidified all this into a lib, we can no longer freely change it or add features.
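For reference, the canonical usage of the built-in cluster module looks roughly like this (the API changed several times across the early 0.x releases, so treat this as a sketch rather than a definitive interface):

var cluster = require("cluster");
var http = require("http");
var os = require("os");

if(cluster.isMaster){
    // master: fork one worker per core
    for(var i = 0; i < os.cpus().length; i++){
        cluster.fork();
    }
}else{
    // worker: listen() is routed through the master transparently
    http.createServer(function(req, res){
        res.writeHead(200, {"content-type" : "text/html"});
        res.end("hello,world");
    }).listen(3458);
}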

In addition, there are two node modules, multi-node and cluster, whose strategies are similar to those described in this article, but using them has some drawbacks:

Updates are not timely

They are large and complex, often bundled with many other features, so users get locked in

They are hard to hack

With the approaches introduced in this article, you can easily build your own high-performance, easy-to-maintain, minimal, elegant, and practical cluster. Enjoy it!
