Developing Node.js extension modules in C++

1. The Node.js module mechanism

Before starting, here is a brief introduction to how Node.js imports modules.

1. Node.js core module

Modules such as fs, net, and path live in the Node.js source tree (under the lib directory) and are exposed to developers through their APIs. Each core module has a reserved identifier. When the identifier passed to require() matches a core module, the API of that core module is returned.

const fs = require('fs');
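One way to see which identifiers are reserved is the `builtinModules` list exported by the built-in `module` module. The sketch below is only an illustration; `isCoreModule` is a made-up helper name, not a Node.js API.

```javascript
const { builtinModules } = require('module');

// Returns true when `id` names a core module shipped inside the Node.js binary.
// An optional "node:" prefix is stripped before the lookup.
function isCoreModule(id) {
  return builtinModules.includes(id.replace(/^node:/, ''));
}

console.log(isCoreModule('fs'));    // true: reserved core identifier
console.log(isCoreModule('axios')); // false: resolved from node_modules instead
```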

2. File module

There are two types of file modules:

2.1 Third-party modules

These modules are installed as dependencies of a Node.js project, for example common npm packages such as axios and webpack.

When Node.js requires such a module, it looks for the package.json file in the module's directory. If package.json is valid, Node.js resolves the path given in its main field.
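The role of the main field can be checked with a throwaway project. This is a minimal sketch: `demo-pkg` and its file layout are invented for the illustration, and `require.resolve` is used so that the entry file is only located, not executed.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Build a temporary project containing node_modules/demo-pkg (a made-up package).
const projectDir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'main-field-')));
const pkgDir = path.join(projectDir, 'node_modules', 'demo-pkg');
fs.mkdirSync(path.join(pkgDir, 'lib'), { recursive: true });
fs.writeFileSync(
  path.join(pkgDir, 'package.json'),
  JSON.stringify({ name: 'demo-pkg', main: 'lib/entry.js' })
);
fs.writeFileSync(path.join(pkgDir, 'lib', 'entry.js'), 'module.exports = 42;');

// Ask Node.js where require('demo-pkg') would land when called from projectDir.
const resolved = require.resolve('demo-pkg', { paths: [projectDir] });
console.log(path.relative(pkgDir, resolved));
```

The printed path is the file named by main, relative to the package directory.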

When a third-party module such as axios is passed to require(), Node.js looks for the axios directory as follows:

  1. Look in node_modules in the directory of the current file
  2. If it is not found there, look in node_modules in the parent directory
  3. If it is still not found, go up one more level
  4. Repeat step 3 until a matching module is found or the root directory is reached

Take a monorepo project as an example. Package managers used in monorepos, such as yarn workspaces, usually hoist shared dependencies to the outer directory. A sub-project then finds the hoisted dependency like this:

node_modules -> find axios here
packages
   package-a
      node_modules -> axios not found
      index.js  -> const axios = require('axios');
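The upward walk can be sketched in a few lines. This is a simplified model, not the Node.js implementation: `nodeModulesCandidates` is a hypothetical helper and `/repo` is an assumed location for the monorepo above. The list Node.js actually uses for the current file is available as `module.paths`.

```javascript
// POSIX flavour of path, so the output looks the same on every operating system.
const path = require('path').posix;

// List every node_modules directory that would be tried, nearest first.
function nodeModulesCandidates(fromDir) {
  const dirs = [];
  let dir = path.resolve(fromDir);
  while (true) {
    dirs.push(path.join(dir, 'node_modules'));
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the root directory
    dir = parent;
  }
  return dirs;
}

console.log(nodeModulesCandidates('/repo/packages/package-a'));
```

For package-a, the third entry (`/repo/node_modules`) is where the hoisted axios is found.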

2.2 Project modules

Modules whose identifier passed to require() starts with "/", "./" or "../" are project modules. They are loaded from the absolute or relative path that the identifier points to. If the identifier has no file extension, Node.js tries extensions in turn: .js, .json, and then .node. A file with the .node extension is a C++ extension.

For example, if there is an addon.node file in the directory, it can be loaded with require (Node.js supports this by default):

const addon = require('./addon');
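The extension order can be observed without a real addon. The sketch below creates empty placeholder files (the names a, b, c are arbitrary) and relies on `require.resolve`, which only locates a file and never loads it.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'ext-order-')));
// "a" exists with all three extensions, "b" with two, "c" only as .node.
for (const name of ['a.js', 'a.json', 'a.node', 'b.json', 'b.node', 'c.node']) {
  fs.writeFileSync(path.join(dir, name), '');
}

// Resolve each base name without an extension and record which file wins.
const winners = ['a', 'b', 'c'].map((base) =>
  path.basename(require.resolve(path.join(dir, base)))
);
console.log(winners); // [ 'a.js', 'b.json', 'c.node' ]
```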

2. What is a Node.js C++ extension

Essence

Node.js itself is written in C++ (it uses Chrome V8 as the JS engine and libuv to implement the event loop), so the APIs exposed by its underlying header files can also be used from C++.

As mentioned in the previous section, when Node.js resolves a module it also looks for files with the .node extension by default. Such a file is the binary of a compiled C++ module, which is essentially a dynamic link library (a .dll on Windows, a .so on Linux, a .dylib on macOS).

The essential difference between calling a native C++ function built into Node.js and calling a C++ extension function is that the former is compiled directly into the Node.js executable, while the latter lives in a dynamic link library.

How C++ extensions are loaded

For the source code of the C++ extension loading process, see:
https://github.com/nodejs/node/blob/master/src/node_binding.cc#L415

The dynamic link library file is loaded with uv_dlopen.

The loading process of a C++ extension module (a .node binary link library file) is:

  • When the user calls require for the first time, uv_dlopen loads the .node link library of the cpp addon
  • Inside the link library, the module registration function is assigned to mp
  • The module and exports objects passed in by require are handed to the module registration function (the mp instance), which fills in the exports

The relevant loading code:

void DLOpen(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);
  uv_lib_t lib;
  ...

  Local<Object> module = args[0]->ToObject(env->isolate());

  node::Utf8Value filename(env->isolate(), args[1]);

  // Open the .node dynamic link library with uv_dlopen
  const bool is_dlopen_error = uv_dlopen(*filename, &lib);

  

  // Hand the handle of the loaded dynamic link library over to the node_module instance
  node_module* const mp = modpending;
  modpending = nullptr;

  ...

  mp->nm_dso_handle = lib.handle;
  mp->nm_link = modlist_addon;
  modlist_addon = mp;

  Local<String> exports_string = env->exports_string();

  // exports_string is simply "exports",
  // so this line means `exports = module.exports`
  Local<Object> exports = module->Get(exports_string)->ToObject(env->isolate());

  // Pass exports and module to the module registration function, which fills in the exports
  if (mp->nm_context_register_func != nullptr) {
    mp->nm_context_register_func(exports, module, env->context(), mp->nm_priv);
  } else if (mp->nm_register_func != nullptr) {
    mp->nm_register_func(exports, module, mp->nm_priv);
  } else {
    uv_dlclose(&lib);
    env->ThrowError("Module has no declared entry point.");
    return;
  }
}
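On the JavaScript side, the C++ function above is reached through `process.dlopen`, which is what the built-in handler for the .node extension calls. A minimal sketch of that step follows; `loadNativeAddon` is an illustrative name, and the path used in the demo deliberately does not exist so that the failure branch can be shown without a compiled addon.

```javascript
const path = require('path');

// Roughly what require() does for a .node file:
// hand a fresh module object to process.dlopen and return its exports.
function loadNativeAddon(filename) {
  const mod = { exports: {} };
  process.dlopen(mod, path.toNamespacedPath(filename));
  return mod.exports;
}

try {
  loadNativeAddon(path.join(__dirname, 'does-not-exist', 'addon.node'));
} catch (err) {
  // uv_dlopen failed, so no registration function was ever called.
  console.log('dlopen failed:', err.message);
}
```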

Some pros and cons of C++ extensions

Advantages:

  • C++ is more efficient than JS
    • For code that does the same thing, running JS in the JS engine is slower than running a binary compiled from C++ (verified with a demo later)
  • Existing C++ libraries can be reused
    • For example, some commonly used algorithms are only available as C++ implementations and are too complex to reimplement in JS (such as the Bling Hashes string hash digest algorithms, or Open SDK)
  • Some low-level system APIs and V8 APIs cannot be called from JS, but can be wrapped in a cpp addon (for example: a way to alleviate the process exit caused by Node.js generating a heap snapshot)

Disadvantages:

  • Development and maintenance cost more, and a native language has to be mastered
  • Compiling the native addon adds steps to the build and release process
3. Development history

Here are several ways to develop Nodejs extensions:

The raw approach

This approach is the most brute-force: the extension is written directly against the native header files shipped with Node.js, calling Node.js APIs and V8 APIs straight from the C++ code. Developers need to be familiar with the Node.js and V8 documentation, and because those APIs change between releases, the code cannot be used across versions.

NAN

Native Abstractions for Node.js, a set of abstract interfaces for Node.js native modules.

In essence it is a collection of macros that smooth over differences in the libuv and V8 APIs, giving addon authors a relatively stable API. Its drawback is that it is not ABI (Application Binary Interface) stable: the C++ code has to be recompiled for each Node.js version, even on every reinstall of node_modules. In other words, the code is written once but has to be compiled everywhere.

N-API

Compared with NAN, N-API treats all the underlying data structures in Node.js as black boxes and abstracts them into N-API interfaces.

These interfaces are stable and ABI-compatible across Node.js versions, so the code is compiled once and runs on different Node.js versions without recompilation. N-API was introduced in Node.js v8.x.

  • Provides a stable C-style ABI
  • Hides differences between Node.js versions
  • Hides differences between JS engines (e.g. Chrome V8, Microsoft ChakraCore)
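The two kinds of compatibility show up as two different version numbers on `process.versions`. The sketch below is an illustration of how they could be compared; the helper names are invented here and are not part of any library.

```javascript
// NODE_MODULE_VERSION: the ABI number that raw and NAN addons are compiled against.
const abiVersion = process.versions.modules;
// Highest N-API version implemented by the running Node.js.
const napiVersion = process.versions.napi;

// A raw/NAN binary only loads on the ABI version it was built for.
function nanAddonNeedsRebuild(builtForAbi) {
  return String(builtForAbi) !== abiVersion;
}

// An N-API binary loads on any runtime that implements at least the version it targets.
function napiAddonIsSupported(requiredNapiVersion) {
  return Number(napiVersion) >= requiredNapiVersion;
}

console.log(`ABI ${abiVersion}, N-API ${napiVersion}`);
console.log('rebuild needed for a binary built on this runtime:', nanAddonNeedsRebuild(abiVersion));
console.log('N-API v1 addon supported:', napiAddonIsSupported(1));
```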

Node-Addon-API

This is the way of writing cpp addons currently recommended by the Node.js community. It is a C++ wrapper layered on top of N-API (underneath it is still N-API).

The earliest supported version is Node.js v10.x (it became stable gradually after v10.x).

  • Simpler APIs
  • Well-written documentation, making addons easier to write and test
  • Officially maintained

This is also the approach used in the rest of this article.

4. Development demos

Environment setup

  • Install node-gyp
npm i node-gyp -g

node-gyp is a C++ build tool officially maintained by the Node.js project, and almost all Node.js C++ extensions are built with it. It is based on GYP (Generate Your Projects, a build tool from Google). Put simply, it can be thought of as Webpack for C++.

Its job is to compile C++ files into a binary file (the file with the .node extension mentioned earlier).

  • Dependencies required by node-gyp (see the official documentation; macOS is used as the example here)
    • Python (usually preinstalled on Unix systems)
    • Xcode

node-gyp also needs a binding.gyp file in the project for configuration. Its syntax is similar to JSON, but comments are allowed.

For example:

{
  "targets": [
    {
      # Name of the compiled extension file, here addon.node
      "target_name": "addon",
      # The cpp source files to compile
      "sources": [ "src/addon.cpp" ]
    }
  ]
}

Simple demos

This section introduces C++ addon development through some simple demos:

Hello World

With the preparation done, we can first use node-addon-api to write a simple hello world.

  1. Initialization
mkdir hello-world && cd hello-world
npm init -y
# Install the node-addon-api dependency
npm i node-addon-api
# Create a cpp file and a js file
touch addon.cpp index.js
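  2. Configure binding.gyp
{
  "targets": [
    {
      # Name of the compiled xxx.node file, here addon.node
      "target_name": "addon",
      # The cpp source files to compile
      "sources": [
        "addon.cpp"
      ],
      # Remove -fno-exceptions from the default compiler flags
      # (the "!" suffix excludes items from a list)
      "cflags!": [ "-fno-exceptions" ],
      "cflags_cc!": [ "-fno-exceptions" ],
      # Add a header search path so the cpp file can find
      # the node-addon-api headers
      "include_dirs": [
        "<!@(node -p \"require('node-addon-api').include\")"
      ],
      # Define a preprocessor macro that tells node-addon-api not to use C++ exceptions;
      # napi.h fails to compile unless this choice is made explicitly
      "defines": [ "NAPI_DISABLE_CPP_EXCEPTIONS" ],
    }
  ]
}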
  3. Write the native cpp extension

Two versions of the code are shown here for comparison:

Native Node Cpp Addon version:

// Include the node.h header from Node.js
#include <node.h>

namespace demo {
  using v8::FunctionCallbackInfo;
  using v8::Isolate;
  using v8::Local;
  using v8::Object;
  using v8::String;
  using v8::Value;

void Method(const FunctionCallbackInfo<Value>& args) {
  // Get the V8 isolate for this call (an instance of the V8 engine with its own
  // independent state, including heap management and garbage collection)
  Isolate* isolate = args.GetIsolate();
  // Return a V8 string with the value "hello world"
  // (NewFromUtf8 returns a MaybeLocal in current V8, hence ToLocalChecked)
  args.GetReturnValue().Set(
      String::NewFromUtf8(isolate, "hello world").ToLocalChecked());
}

void init(Local<Object> exports) {
  // Node.js helper macro that exports a function
  // Similar to exports = { "hello": Method }
  NODE_SET_METHOD(exports, "hello", Method);
}
// A macro from Node.js: https://github.com/nodejs/node/blob/master/src/node.h#L839
// It registers the addon's initialization function
NODE_MODULE(addon, init);
}

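Node-addon-api version:

// Include the node-addon-api header
#include <napi.h>

// The Napi types wrap data structures from V8 and act as a bridge between JS and V8

// Define a function that returns a Napi::String
// CallbackInfo holds information about the JS call to this function; info is that record
Napi::String Method(const Napi::CallbackInfo& info) {
  // env represents the environment (execution context) of the current call
  Napi::Env env = info.Env();
  // Return a newly constructed Napi::String
  // New is a static method: the first argument is usually the current environment, the second is the value
  // The other parameters are not covered here
  return Napi::String::New(env, "hello world~");
}

// The registration function that sets up the exports
// Equivalent to exports = { hello: Method }
Napi::Object Init(Napi::Env env, Napi::Object exports) {
  exports.Set(
    Napi::String::New(env, "hello"),
    Napi::Function::New(env, Method)
  );
  return exports;
}
// Macro from node-addon-api that registers the module
// hello is the module name and can be any identifier
// Init is the function to register
NODE_API_MODULE(hello, Init);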

Types in the Napi:: namespace wrap native V8 data structures and make them easier to call. For documentation on these data structures, see the API documentation section of https://github.com/nodejs/node-addon-api.

Napi here is essentially a communication bridge between C++ and JS.

Here is a breakdown of what these functions do. Method is the function that does the work: calling it returns the string "hello world~".

CallbackInfo corresponds to the FunctionCallbackInfo type in V8. It holds the information needed when a JS function call reaches this method, and is passed in by reference as info.

  4. Call the cpp addon from JS code

Building the cpp file above with node-gyp produces a build directory that holds the build output, including the compiled binary dynamic link library (extension .node):

$ node-gyp configure build

# For convenience, node-gyp rebuild can be used instead; it cleans the previous build and rebuilds in one step
$ node-gyp rebuild

After compiling, we can import it directly in the js code:

// hello-world/index.js

const { hello } = require('./build/Release/addon');

console.log(hello());

A + B

The previous section mentioned that the Napi::CallbackInfo& info parameter stores context information about the JS call. The arguments passed from JS to the cpp function are therefore also available in info, so we can write the following simple a + b cpp addon demo:

#include <napi.h>
// For this demo, Napi is pulled in with `using namespace`
// As long as this file is not included by other cpp files there is no namespace pollution; it is done here for brevity
using namespace Napi;

// An error may be thrown here, so the return type is Value
// Value is the base type of all Napi data structures
Value Add(const CallbackInfo& info) {
  Env env = info.Env();
  if (info.Length() < 2) {
    // For the error handling API, see
    // https://github.com/nodejs/node-addon-api/blob/main/doc/error_handling.md
    // Throwing from cpp is rather verbose, so it is advisable to validate on the JS side
    TypeError::New(env, "Wrong number of arguments").ThrowAsJavaScriptException();
    return env.Null();
  }

  double a = info[0].As<Number>().DoubleValue();
  double b = info[1].As<Number>().DoubleValue();
  Number num = Number::New(env, a + b);
  return num;
}

// exports = { add: Add };
Object Init(Env env, Object exports) {
  exports.Set(String::New(env, "add"), Function::New(env, Add));
  return exports;
}

NODE_API_MODULE(addon, Init);

The JS side only needs:

const { add } = require('./build/Release/addon');

// output is 5.2
console.log(add(2, 3.2));
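Since throwing from C++ is verbose, argument checks can live in a thin JS wrapper around the native function. The sketch below is one possible shape, not part of the addon itself: `makeSafeAdd` is an invented name, and `fakeNativeAdd` stands in for the compiled `add` so that the example runs without a build step.

```javascript
// Wrap a native `add` so that bad input is rejected in JS before it reaches C++.
function makeSafeAdd(nativeAdd) {
  return function safeAdd(a, b) {
    if (typeof a !== 'number' || typeof b !== 'number') {
      throw new TypeError('add(a, b) expects two numbers');
    }
    return nativeAdd(a, b);
  };
}

// Stand-in for require('./build/Release/addon').add (assumption for the demo).
const fakeNativeAdd = (a, b) => a + b;
const safeAdd = makeSafeAdd(fakeNativeAdd);

console.log(safeAdd(2, 3.2));
try {
  safeAdd('2', 3.2);
} catch (err) {
  console.log(err.message);
}
```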

Callback

Callback functions work the same way: they can also be obtained through info. Here is a cpp addon demo:

// addon.cpp
#include <napi.h>

// This section wraps the code in a namespace and declares the types up front,
// to avoid writing Napi::xxx at every use
namespace CallBackDemo {
using Napi::Value;
using Napi::CallbackInfo;
using Napi::Env;
using Napi::TypeError;
using Napi::Number;
using Napi::Object;
using Napi::String;
using Napi::Function;

void RunCallBack(const CallbackInfo &info) {
  Env env = info.Env();
  // The first argument passed from JS is the callback function
  Function cb = info[0].As<Function>();
  // Call it with globalThis as `this` and "hello world" as the only argument
  cb.Call(env.Global(), { String::New(env, "hello world") } );
}

// The function itself is returned as module.exports
Object Init(Env env, Object exports) {
  return Function::New(env, RunCallBack);
}

NODE_API_MODULE(addon, Init);
}
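On the JS side, the export of this addon is the function itself rather than an object with named properties. The sketch below shows one way to call it; `runCallbackStandIn` is a plain JS stand-in with the same behaviour, used only when the compiled addon is not present, and the build path is assumed to be the default one.

```javascript
// Stand-in that mirrors RunCallBack: call cb once with "hello world".
function runCallbackStandIn(cb) {
  cb.call(globalThis, 'hello world');
}

let runCallback = runCallbackStandIn;
try {
  // Init returns a function, so module.exports is directly callable.
  const addon = require('./build/Release/addon');
  if (typeof addon === 'function') runCallback = addon;
} catch (err) {
  // Addon not built yet; keep the stand-in.
}

const received = [];
runCallback((msg) => received.push(msg));
console.log(received); // [ 'hello world' ]
```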

A practical demo

The sections above covered some simple APIs of Node.js native addons as a basic introduction. Below is a practical case that shows the role node-addon-api plays in a real project:

This case wraps V8 APIs for debugging purposes.

References:

A way to alleviate the process exit caused by Node.js generating heap snapshot

Code address:

https://github.com/bytedance/diat/tree/master/packages/addon

For the main extended source code analysis part, please refer to:

More demos

You can refer to:

Official Node.js addon examples:

https://github.com/nodejs/node-addon-examples

Companion code for the book "Node.js: A Dozen C++ Extensions" (Node.js:来一打 C++ 扩展):

https://github.com/XadillaX/nyaa-nodejs-demo

5. Performance comparison

A simple demo can be used for comparison:

quickSort (O(n log n))

We can hand-write a quicksort and run it in both JS and C++ to compare performance:

First, the cpp addon code can be written like this:

#include <napi.h>
#include <algorithm>
#include <iostream>
#include <vector>

// Quicksort: average time complexity O(n log n), recursion stack O(log n) on average
void quickSort(int a[], int l, int r) {
  if (l >= r) return;
  int x = a[(l + r) >> 1], i = l - 1, j = r + 1;
  while (i < j) {
    while (a[++i] < x);
    while (a[--j] > x);
    if (i < j) {
      std::swap(a[i], a[j]);
    }
  }
  quickSort(a, l, j);
  quickSort(a, j + 1, r);
}


Napi::Value Main(const Napi::CallbackInfo& info) {
  Napi::Env env = info.Env();
  Napi::Array arr = info[0].As<Napi::Array>();
  uint32_t len = arr.Length();
  // Holds the return value
  Napi::Array res = Napi::Array::New(env, len);
  // Native buffer; a std::vector releases its memory automatically
  std::vector<int> arr2(len);
  // Convert the V8 data into native C++ data
  for (uint32_t i = 0; i < len; i++) {
    Napi::Value value = arr[i];
    arr2[i] = value.ToNumber().Int32Value();
  }
  // Run quicksort
  quickSort(arr2.data(), 0, static_cast<int>(len) - 1);

  // for (uint32_t i = 0; i < len; i++) {
  //   std::cout << arr2[i] << " ";
  // }
  // std::cout << std::endl;
  // Convert back to JS data
  for (uint32_t i = 0; i < len; i++) {
    res[i] = Napi::Number::New(env, arr2[i]);
  }
  return res;
}

Napi::Object Init(Napi::Env env, Napi::Object exports) {
  exports.Set(
    Napi::String::New(env, "quicksortCpp"),
    Napi::Function::New(env, Main)
  );
  return exports;
}

NODE_API_MODULE(addon, Init);

The JS side can be written like this:

// The bindings library locates the directory of addon.node automatically,
// so the build directory does not need to be specified
const { quicksortCpp } = require('bindings')('addon.node');

// Build a random array
const arr = Array.from(new Array(1e3), () => Math.random() * 1e4 | 0);

// Make two identical copies of it
let arr1 = arr.slice();
let arr2 = arr.slice();

const solve = (arr) => {
  let n = arr.length;
  const quickSortJS = (arr, l, r) => {
    if (l >= r) {
      return;
    }
    let x = arr[(l + r) >> 1], i = l - 1, j = r + 1;
    while (i < j) {
      while(arr[++i] < x);
      while(arr[--j] > x);
      if (i < j) {
        [arr[i], arr[j]] = [arr[j], arr[i]];
      }
    }
    quickSortJS(arr, l, j);
    quickSortJS(arr, j + 1, r);
  }
  quickSortJS(arr, 0, n - 1);
}

// JS: time only the sort itself
console.time('JS');
solve(arr2);
console.timeEnd('JS');

// C++: call the function in the extension directly
console.time('C++');
const a = quicksortCpp(arr1);
console.timeEnd('C++');
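A single console.time measurement is noisy, so repeated runs are a safer basis for comparison. The following sketch times the JS quicksort several times and reports the median; `medianTime` is an illustrative helper, and the run count of 7 is an arbitrary choice. With a compiled addon, `quicksortCpp` could be passed to it in the same way.

```javascript
const { performance } = require('perf_hooks');

// Same partition scheme as the quicksort above, lifted out so it can be timed repeatedly.
function quickSortJS(arr, l = 0, r = arr.length - 1) {
  if (l >= r) return arr;
  const x = arr[(l + r) >> 1];
  let i = l - 1;
  let j = r + 1;
  while (i < j) {
    while (arr[++i] < x);
    while (arr[--j] > x);
    if (i < j) [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  quickSortJS(arr, l, j);
  quickSortJS(arr, j + 1, r);
  return arr;
}

// Run `fn` on a fresh copy of `input` several times and return the median time in ms.
function medianTime(fn, input, runs = 7) {
  const samples = [];
  for (let k = 0; k < runs; k++) {
    const copy = input.slice();
    const start = performance.now();
    fn(copy);
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

const data = Array.from({ length: 1e4 }, () => (Math.random() * 1e4) | 0);
console.log('JS median ms:', medianTime(quickSortJS, data).toFixed(3));
```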

Both implementations are essentially the same. Running them with different array lengths and comparing the times gives the following data:

Array length    1e2        1e3        1e4        1e5        1e6
JavaScript      0.255ms    4.391ms    10.810ms   26.004ms   116.914ms
C++             0.065ms    0.347ms    2.908ms    23.637ms   234.757ms

We can see that for short arrays the C++ addon's quicksort is far faster than the JS version, but as the array grows the C++ version loses its lead and is eventually overtaken by JS.

The reason is the cost of converting between V8 data structures and native C++ data structures:

(Figure: time taken by the quicksort alone in C++ on 1e5 elements)

At a data size of 1e5, the cpp quickSort itself ran for only about 6.9ms, but with the data conversion included the call took 28.9ms in total.
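Working out the split from the two numbers quoted above makes the point concrete. The values are the article's own measurements; only the variable names are made up.

```javascript
const sortOnlyMs = 6.9; // quickSort alone
const totalMs = 28.9;   // quickSort plus JS <-> C++ data conversion

const conversionMs = totalMs - sortOnlyMs;
const conversionShare = conversionMs / totalMs;

console.log(
  `${conversionMs.toFixed(1)} ms of conversion, ` +
  `${(conversionShare * 100).toFixed(0)}% of the call`
); // 22.0 ms of conversion, 76% of the call
```

Roughly three quarters of the call is spent moving data across the boundary rather than sorting.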

As the data size grows, so does this conversion overhead, and at that point using C++ may cost more than it gains.

In summary, packages written in C++ do sometimes perform better than plain Node.js JS code. But if the gain is smaller than the I/O time Node.js spends opening and executing the C++ addon, or than the time spent converting between V8 and C++ data structures (as in the case above), then using C++ is not worth it.

In general, though, for non-parallel, computation-intensive code, C++ is still more efficient than Node.js.

Summary

With the development of the N-API ecosystem and continued iteration by the Node.js team, the cost of developing native addons will keep falling. In some specific scenarios (for example, when certain V8 APIs are needed, or in Electron + OpenCV applications), Node.js addons may become extremely important, and their use cases will keep expanding.


Origin blog.csdn.net/SE_JW/article/details/130180307