Node.js inter-process communication

1. Scenario
Node runs JavaScript in a single thread, but that does not mean we cannot take advantage of multiple cores, or multiple machines, through multiple processes

In fact, Node had distributed network scenarios in mind from its initial design:

Node is a single-threaded, single-process system which enforces shared-nothing design with OS process boundaries. It has rather good libraries for networking. I believe this to be a basis for designing very large distributed programs. The “nodes” need to be organized: given a communication protocol, told how to connect to each other. In the next couple months we are working on libraries for Node that allow these networks.

P.S. For why Node is called Node, see Why is Node.js named Node.js?

2. Creating processes
How processes can communicate is tied to how they were created. Node has four ways to create a child process: spawn(), exec(), execFile() and fork()


spawn
const { spawn } = require('child_process');
const child = spawn('pwd');
// with arguments:
// const child = spawn('find', ['.', '-type', 'f']);

spawn() returns an instance of ChildProcess. ChildProcess also provides some events based on the event mechanism (EventEmitter API):

exit: triggered when the child process exits; reports the exit status (code and signal)

disconnect: triggered when the parent process calls child.disconnect()

error: triggered when the child process could not be spawned or killed, or when sending a message to it failed

close: triggered when the stdio streams (standard input/output streams) of the child process have closed

message: triggered when the child process sends a message via process.send(); parent and child can communicate through this built-in message mechanism

The stdio stream of the child process can be accessed through child.stdin, child.stdout and child.stderr. When these streams are closed, the child process will trigger the close event

P.S. The difference between close and exit mainly shows up when multiple processes share the same stdio stream: a process exiting does not mean its stdio streams have closed.

On the ChildProcess handle, child.stdout and child.stderr are Readable while child.stdin is Writable, which is the opposite of the same streams inside the process itself:


child.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`);
});

child.stderr.on('data', (data) => {
  console.error(`child stderr:\n${data}`);
});

Using the pipe capability of stdio streams, you can compose more complex behavior, for example:


const { spawn } = require('child_process');

const find = spawn('find', ['.', '-type', 'f']);
const wc = spawn('wc', ['-l']);

find.stdout.pipe(wc.stdin);

wc.stdout.on('data', (data) => {
  console.log(`Number of files ${data}`);
});

The effect is equivalent to find . -type f | wc -l: recursively count the number of files in the current directory

IPC option
In addition, an IPC channel can be established through the stdio option of spawn():


const { spawn } = require('child_process');

const child = spawn('node', ['./ipc-child.js'], { stdio: [null, null, null, 'ipc'] });
child.on('message', (m) => {
  console.log(m);
});
child.send('Here Here');

// ./ipc-child.js
process.on('message', (m) => {
  process.send(`< ${m}`);
  process.send('> Do not answer x3');
});

For more information about the IPC options of spawn(), please check options.stdio

exec
The spawn() method does not create a shell to execute the command by default (so it performs slightly better), while exec() does create a shell. In addition, exec() is not stream-based: it buffers the command's entire output and then passes it to a callback function.

The feature of the exec() method is that it fully supports shell syntax and can directly pass in any shell script, for example:


const { exec } = require('child_process');

exec('find . -type f | wc -l', (err, stdout, stderr) => {
  if (err) {
    console.error(`exec error: ${err}`);
    return;
  }

  console.log(`Number of files ${stdout}`);
});

However, exec() carries the security risk of command injection; be especially careful when the command contains dynamic content such as user input. So the applicable scenario for exec() is: you want shell syntax directly, and the expected output is small (no memory pressure)

So, is there a way that supports shell syntax and keeps the advantages of stream I/O?

There is. The best of both worlds looks like this:


const { spawn } = require('child_process');
const child = spawn('find . -type f | wc -l', {
  shell: true
});
child.stdout.pipe(process.stdout);

Turn on spawn()'s shell option, then simply pipe the child's standard output into the current process's standard output to see the command result. There is an even simpler way:


const { spawn } = require('child_process');
const child = spawn('find . -type f | wc -l', {
  shell: true,
  stdio: 'inherit'
});

stdio: 'inherit' lets the child process inherit (share) the current process's stdin, stdout and stderr, so the child's output is written straight to the terminal; no pipe() call or 'data' listener is needed

Besides stdio and shell, spawn() supports some other options too, for example:


const child = spawn('find . -type f | wc -l', {
  stdio: 'inherit',
  shell: true,
  // override environment variables (defaults to process.env)
  env: { HOME: '/tmp/xxx' },
  // change the working directory
  cwd: '/tmp',
  // run as an independent (detached) process
  detached: true
});

Note that besides passing data to the child in the form of environment variables, the env option can also implement sandbox-style environment isolation. By default process.env is used as the child's environment, so the child can read every variable the parent can. If you specify a custom object as in the example above, the child can access only those variables and nothing else

Therefore, if you want to add/delete environment variables, you need to do this:


// clone the current environment, then remove the unwanted variables
const spawn_env = JSON.parse(JSON.stringify(process.env));

delete spawn_env.ATOM_SHELL_INTERNAL_RUN_AS_NODE;
delete spawn_env.ELECTRON_RUN_AS_NODE;

const sp = spawn(command, ['.'], { cwd: cwd, env: spawn_env });

The detached option is more interesting:


const { spawn } = require('child_process');

const child = spawn('node', ['stuff.js'], {
  detached: true,
  stdio: 'ignore'
});

child.unref();

The behavior of an independent process created this way depends on the operating system: a detached child on Windows gets its own console window, while on Linux a new process group is created (a feature that can be used to manage the whole child-process family, as tree-kill does)

The unref() method severs the relationship, so the "parent" process can exit independently without causing the child to exit. Note that the child's stdio must also be independent of the "parent" at that point, otherwise the child is still affected after the "parent" exits


execFile
const { execFile } = require('child_process');
const child = execFile('node', ['--version'], (error, stdout, stderr) => {
  if (error) {
    throw error;
  }
  console.log(stdout);
});

Similar to exec(), but it does not go through a shell (so it performs slightly better), which means it requires an executable file. Some files on Windows, such as .bat and .cmd, cannot be executed directly; these cannot be run with execFile(), only with exec() or with spawn() with the shell option turned on

P.S. Like exec(), execFile() is not stream-based, so large output carries the same memory risk

xxxSync
spawn(), exec() and execFile() all have corresponding synchronous blocking versions that wait until the child process exits:

const { 
  spawnSync, 
  execSync, 
  execFileSync,
} = require('child_process');

Synchronous methods simplify scripting tasks, such as startup scripts; outside of those cases they should be avoided

fork
fork() is a variant of spawn() used to create Node processes. Its biggest feature is that parent and child get their own communication channel (an IPC pipe):


The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child. See subprocess.send() for details.

For example:


const { fork } = require('child_process');

const n = fork('./child.js');
n.on('message', function(m) {
  console.log('PARENT got message:', m);
});
n.send({ hello: 'world' });

// ./child.js
process.on('message', function(m) {
  console.log('CHILD got message:', m);
});
process.send({ foo: 'bar' });

Because fork() comes with its own communication channel, it is especially suitable for splitting out time-consuming logic, for example:


const http = require('http');
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  };
  return sum;
};
const server = http.createServer();
server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const sum = longComputation();
    return res.end(`Sum is ${sum}`);
  } else {
    res.end('Ok')
  }
});

server.listen(3000);

The fatal problem: once someone visits /compute, subsequent requests cannot be handled in time, because the event loop is blocked by longComputation, and service capacity is not restored until the long-running calculation finishes.

In order to avoid time-consuming operations blocking the event loop of the main process, longComputation() can be split into child processes:


// compute.js
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  };
  return sum;
};

// gate: only start the computation when a message arrives
process.on('message', (msg) => {
  const sum = longComputation();
  process.send(sum);
});

The main process forks a child process to run longComputation:


const http = require('http');
const { fork } = require('child_process');

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const compute = fork('compute.js');
    compute.send('start');
    compute.on('message', sum => {
      res.end(`Sum is ${sum}`);
    });
  } else {
    res.end('Ok')
  }
});

server.listen(3000);

The main process's event loop is no longer blocked by the heavy computation, but the number of forked processes needs to be limited, otherwise service capacity will still suffer once processes exhaust the machine's resources

P.S. In fact, the cluster module is an encapsulation of exactly this multi-process serving pattern; the idea is similar to this simple example

3. Communication methods
1. JSON over stdin/stdout

The most direct means of communication: once you hold the child-process handle, you can access its stdio streams, agree on a message format, and start communicating happily:


const { spawn } = require('child_process');

const child = spawn('node', ['./stdio-child.js']);
child.stdout.setEncoding('utf8');
// parent: send
child.stdin.write(JSON.stringify({
  type: 'handshake',
  payload: 'hello there'
}));
// parent: receive
child.stdout.on('data', function (chunk) {
  let data = chunk.toString();
  let message = JSON.parse(data);
  console.log(`${message.type} ${message.payload}`);
});

The child process is similar:


// ./stdio-child.js
// child: receive
process.stdin.on('data', (chunk) => {
  let data = chunk.toString();
  let message = JSON.parse(data);
  switch (message.type) {
    case 'handshake':
      // child: send
      process.stdout.write(JSON.stringify({
        type: 'message',
        payload: message.payload + ' : hoho'
      }));
      break;
    default:
      break;
  }
});

P.S. VS Code uses this method for inter-process communication; see access electron API from vscode extension for details

The obvious limitation: you need a handle on the "child" process, so two completely independent processes cannot communicate this way (for example, cross-application or cross-machine scenarios)

P.S. For more on streams and pipe(), see the stream in Node

2. Native IPC
As in the spawn() and fork() examples above, processes can communicate with each other through the built-in IPC mechanism

Parent process:

child.on('message') to receive

child.send() to send

Child process:

process.on('message') to receive

process.send() to send

The restriction is the same as above: one party must be able to get a handle on the other.

3. Sockets
Using the network for inter-process communication works not only across processes but also across machines

node-ipc uses this scheme, for example:


// server
const ipc=require('../../../node-ipc');

ipc.config.id = 'world';
ipc.config.retry= 1500;
ipc.config.maxConnections=1;

ipc.serveNet(
    function(){
        ipc.server.on(
            'message',
            function(data,socket){
                ipc.log('got a message : ', data);
                ipc.server.emit(
                    socket,
                    'message',
                    data+' world!'
                );
            }
        );

        ipc.server.on(
            'socket.disconnected',
            function(data,socket){
                console.log('DISCONNECTED\n\n',arguments);
            }
        );
    }
);
ipc.server.on(
    'error',
    function(err){
        ipc.log('Got an ERROR!',err);
    }
);
ipc.server.start();

// client
const ipc=require('node-ipc');

ipc.config.id = 'hello';
ipc.config.retry= 1500;

ipc.connectToNet(
    'world',
    function(){
        ipc.of.world.on(
            'connect',
            function(){
                ipc.log('## connected to world ##', ipc.config.delay);
                ipc.of.world.emit(
                    'message',
                    'hello'
                );
            }
        );
        ipc.of.world.on(
            'disconnect',
            function(){
                ipc.log('disconnected from world');
            }
        );
        ipc.of.world.on(
            'message',
            function(data){
                ipc.log('got a message from world : ', data);
            }
        );
    }
);

PS For more examples, see RIAEvangelist/node-ipc

Of course, going through the network in a single-machine scenario wastes some performance, but the advantage of network communication lies in cross-environment compatibility and the natural path toward RPC scenarios

4. Message queue
Parent and child processes communicate through an external messaging mechanism; the cross-process reach depends on what the MQ supports

That is, processes do not communicate directly but go through a middle layer (the MQ). Adding this control layer buys flexibility and several advantages:

Stability: the messaging layer provides strong delivery guarantees, such as delivery confirmation (message ACK), retransmission on failure and protection against duplicate delivery

Priority control: lets you adjust the order in which messages are handled

Offline capability: messages can be buffered until the consumer is available

Transactional message processing: related messages can be combined into a transaction to guarantee their delivery order and integrity

P.S. The classic trick: any problem can be solved by adding a layer of indirection; if one layer is not enough, add two...

A popular option is smrchy/rsmq, for example:


// init
const RedisSMQ = require("rsmq");
const rsmq = new RedisSMQ({ host: "127.0.0.1", port: 6379, ns: "rsmq" });
// create queue
rsmq.createQueue({qname:"myqueue"}, function (err, resp) {
    if (resp===1) {
      console.log("queue created")
    }
});
// send message
rsmq.sendMessage({qname:"myqueue", message:"Hello World"}, function (err, resp) {
  if (resp) {
    console.log("Message sent. ID:", resp);
  }
});
// receive message
rsmq.receiveMessage({qname:"myqueue"}, function (err, resp) {
  if (resp.id) {
    console.log("Message received.", resp)  
  }
  else {
    console.log("No messages for me...")
  }
});

It requires a shared Redis server. The basic principle:


Using a shared Redis server multiple Node.js processes can send / receive messages.

The receiving/sending/caching/persistence of messages relies on capabilities provided by Redis, with a complete queue mechanism implemented on top

5. Redis
The basic idea is similar to the message queue approach:


Use Redis as a message bus/broker.

Redis has a built-in Pub/Sub mechanism (publish-subscribe pattern), suitable for simple communication scenarios such as one-to-one or one-to-many where message reliability does not matter

In addition, Redis has a list structure that can be used as a message queue to improve reliability. The usual approach is to produce with LPUSH and consume with BRPOP. It suits simple scenarios that need message reliability, but the drawback is that messages carry no state and there is no ACK mechanism, so complex communication requirements are out of reach

P.S. For a Redis Pub/Sub example, see What's the most efficient node.js inter-process communication library/method?

4. Summary
There are four ways for Node processes to communicate:

Passing JSON through stdin/stdout: the most direct way; requires a handle on the "child" process, so it suits related processes and cannot cross machines

Node native IPC support: the most native (idiomatic?) method, more "regular" than the previous one, with the same limitations

Through sockets: the most general way, with good cross-environment reach, at the cost of network overhead

With a message queue: the most powerful way; if communication is needed at all, the scenario is probably complex anyway, so introducing a layer of message middleware solves all sorts of communication problems elegantly

References
Node.js Child Processes: Everything you need to know

Origin blog.51cto.com/15080030/2592715