NNG overview

I. Introduction

1.1 Basic introduction

NNG (nanomsg-next-generation) is a messaging library that I have used in recent projects to implement inter-process procedure calls and inter-thread communication; it is very convenient to work with.

NNG is the successor to nanomsg, and nanomsg is itself a C rewrite of the popular ZMQ (ZeroMQ), a simple and easy-to-use socket-like messaging framework that sits just above the transport layer.

NNG separates the communication protocol from the transport: the same protocol can run over different transports, much like the layering of the application layer over the transport layer in TCP/IP. At the same time, the API hides the underlying details and uniformly uses a string URL to describe the transport. When the usage scenario changes, you adapt simply by modifying the URL, which is extremely flexible.

Also, as NNG's self-description "light-weight brokerless messaging" suggests, the communicating parties need no third-party program in between, unlike MQTT or Redis messaging, which require a server. This makes NNG very suitable as a communication library with no external dependencies.

1.2 Communication protocols

  • PAIR: one-to-one, bidirectional communication.
  • PIPELINE (PUSH/PULL): one-way communication, similar to a producer-consumer message queue.
  • PUB/SUB: one-way broadcast.
  • REQ/REP: request-response, similar to RPC.
  • BUS: mesh communication; every node that joins can send and receive broadcast messages.
  • SURVEY: multi-node voting or service discovery.

1.3 Transport modes

  • inproc: transport between threads within one process
  • ipc: transport between processes on one host
  • tcp: transport between hosts over a TCP network

1.4 Communication modes

Except for PAIR, the protocols are basically one-to-many communication modes, which deserves attention. Take PIPELINE and PUB/SUB as examples:

  1. In PIPELINE, the PUSH end is the client: one PUSH socket can connect to multiple PULL ends, and one available peer is selected for each message sent. The PULL end is the server: one PULL socket can accept connections and data from multiple PUSH ends.
  2. In PUB/SUB, the SUB end is the client: one SUB socket can connect to multiple different PUB ends and receive the data each of them broadcasts. The PUB end is the server: one PUB socket can accept multiple SUB connections and broadcast data to all of them.

Because of this, multiple programs cannot share one PUB/SUB channel to broadcast data, which differs from topics in ROS and the channel model in LCM. To achieve similar functionality, you can combine PIPELINE with PUB/SUB:

  • A standalone program publishes the topic, holding one PULL socket and one PUB socket.
  • The PULL socket listens on an agreed URL; every program that wants to publish to the topic PUSHes its data to that URL.
  • The PUB socket listens on another agreed URL; every program that wants the topic data SUBscribes to that URL.
  • A loop inside the program forwards everything read from PULL out through PUB.

The above simulates the topic-merging behavior of ROS, or the similar function of channels in LCM.

Overall, the NNG API is very simple: essentially four calls, open/recv/send/close, where the open function differs by protocol. Configuration is done with setopt/getopt, similar to the UNIX socket API. The API needs no extra context object; a single nng_socket is all you need. This design and implementation are worth studying: the trick is to wrap the handle value in a one-member struct so that the compiler enforces type checking (in NNG the handle is a numeric id, roughly typedef struct { uint32_t id; } nng_socket;).
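The handle-wrapping trick is easy to demonstrate without NNG itself. The names below are mine, chosen only to illustrate the pattern: two handle types carry the same underlying integer, but the struct wrapper makes them distinct types to the compiler:

```c
#include <assert.h>
#include <stdint.h>

/* Illustration of the handle style: wrapping a plain integer id in a
 * single-member struct makes each handle type distinct, so the compiler
 * rejects passing a "context" handle where a "socket" handle is
 * expected. These names are hypothetical, not NNG's. */
typedef struct { uint32_t id; } my_socket;
typedef struct { uint32_t id; } my_ctx;

static int my_socket_send(my_socket s) {
    return (int) s.id; /* stand-in for real work keyed by the id */
}

int handle_demo(void) {
    my_socket s = { 7 };
    my_ctx    c = { 9 };
    (void) c;
    /* my_socket_send(c);  <-- would not compile: incompatible type */
    return my_socket_send(s);
}
```

Compared with passing raw integers or opaque pointers, this costs nothing at runtime but catches handle mix-ups at compile time.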

NNG's protocols basically cover common communication needs, and special requirements can be met by combining protocols, as in the ROS-topic/LCM-channel simulation above. A program built on NNG, whether multi-process or multi-threaded, can be designed for greater modularity and flexibility. If the environment changes, whether from multi-process to multi-threaded or from multi-threaded to multi-host, the change is easy to implement.

For communication between ordinary modules/processes/threads, you can use PIPELINE (message queue) or REQ/REP (procedure call) as appropriate, instead of locks plus global variables. Each module unit then handles only its own specific task and does not need to know the global state.

1.5 Code structure

nng.h:

The public API exposed by NNG.

transport.h:

The definition of the transport layer, mainly exposed so that users can implement extensions; it currently includes related header files under utils. inproc.h/ipc.h/tcp.h are the corresponding transports.

protocol.h:

The definition of the protocol layer, likewise exposed so that users can implement extensions. reqrep.h/pubsub.h/bus.h/pair.h/pipeline.h/survey.h are the corresponding protocols.

utils/:

Utility toolkit: basic data structures (list/queue/hash), mutual exclusion and atomic operations (mutex/atomic), and so on.

transports/:

Transport layer implementations: inproc (intra-process communication), ipc (inter-process communication), tcp (TCP network communication).

protocols/:

Protocol layer implementations: REQREP (request-response), PUBSUB (publish-subscribe), and so on.

core/:

Common core code.

aio/:

Asynchronous operations simulated with a thread pool, event-driven state machines, and so on.

II. Structure introduction

2.1 nng_aio

An asynchronous I/O handle. The details of the aio structure are private to the AIO framework. The structure has a public name (nng_aio) only to minimize pollution of the public API namespace. Accessing any of its members from outside the AIO framework is a coding error; the definitions are exposed solely for the convenience of inlining.

2.2 nni_id_map

We often want a collection of objects indexed by a numeric ID that is usually monotonically increasing, typically a pipe ID. To keep such objects indexed by their IDs (which may start from a very large value), a hash table is provided. The table uses open addressing, but with a better probe sequence (taken from Python) to avoid repeatedly hitting the same slots. The hashing uses only the low bits, and table sizes are powers of two. Note that hash entries must be non-NULL. The table is protected by an internal lock.
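The "probe taken from Python" refers to the perturbed probe sequence popularized by CPython's dict: with a power-of-two table, the first slot is hash & mask, and each collision mixes higher hash bits into the next index so the walk eventually visits every slot. The exact constants in NNG may differ; this sketch follows CPython's scheme:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPython-style open-addressing probe: emit the first slots that would
 * be visited for a given hash, up to `collisions` collisions. The
 * constants (x5+1, shift of 5) are CPython's; NNG's may differ. */
#define PERTURB_SHIFT 5

size_t probe_sequence(uint32_t hash, size_t mask, int collisions,
    size_t *out, size_t out_len) {
    size_t perturb = hash;
    size_t j       = hash & mask; /* low bits pick the first slot */
    size_t n       = 0;

    for (int i = 0; i <= collisions && n < out_len; i++) {
        out[n++] = j;
        perturb >>= PERTURB_SHIFT; /* feed in higher hash bits */
        j = (5 * j + 1 + perturb) & mask;
    }
    return n;
}
```

Because the recurrence j = 5j + 1 (mod 2^k) is a full-period linear congruence, the probe cannot cycle among a few slots the way linear probing can cluster.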

III. Data transmission

3.1 Sending data

The call chain of nng_sendmsg is roughly as follows (nng_* calls are public API, nni_* calls are internal):

    nng_sendmsg
        nng_aio_set_timeout
        nng_aio_set_msg
        nng_send_aio
            nni_aio_get_msg
            nni_sock_find
            nni_sock_send --> sock_send
            nni_sock_rele
        nng_aio_wait
        nng_aio_result

3.2 Receiving data

The call chain of nng_recvmsg is similar:

    nng_recvmsg
        nng_aio_set_timeout
        nng_recv_aio
            nni_sock_find
            nni_sock_recv --> sock_recv
            nni_sock_rele
        nng_aio_wait
        nng_aio_result
        nng_aio_free

IV. AIO

4.1 AIO states

An AIO structure can carry up to 4 different input values, up to 4 different output values, and up to 4 different "private state" values. The meaning of input and output is determined by the I/O function being called.

    typedef enum {
        NNG_INIT_RECV = 0,
        NNG_RECV_RET_SEND,
        NNG_SEND_RET_RECV,
        NNG_RECV_RET_RECV,
    } nng_aio_state_t;

4.2 Introduction to AIOs

An AIO may only be "finished" by the provider, which must call nni_aio_finish. Until that happens, the caller guarantees that the AIO remains valid; the provider, in turn, guarantees that the AIO will eventually be finished (by calling nni_aio_finish).

Note that the cancellation routine may be called multiple times by the framework. The framework (or the consumer) guarantees that the AIO remains valid across these calls, so the provider is free to examine the aio's list membership, etc. However, the provider must not finish the aio more than once.

nni_aio_lk protects the flags on the AIO and the expire list the AIO sits on. An AIO is not allowed to be marked done while an expiration is still outstanding.

To synchronize with expiration, the aio is recorded as expiring, and we wait for that record to be cleared (or at least to no longer equal the aio) before destroying the aio.

The aio framework is tightly coupled to the taskq framework. When the caller marks the aio as started (using nni_aio_begin), the task for the aio is "prepared" and marked busy. Then, to know whether the operation itself is complete, all we have to do is wait for the task to complete (that is, for the busy flag to be cleared).

To prevent an aio from being reused during teardown, the a_stop flag is set. Any attempt to initialize a new operation after that point fails, and the caller gets NNG_ECANCELED to indicate it. A provider calling nni_aio_begin() must check the return value, and if it is non-zero (NNG_ECANCELED), it must simply discard the request and return.

Calling nni_aio_wait waits for the currently outstanding operation to complete, but does not prevent another operation from being started on the same aio. To stop an aio synchronously and prevent any further operations from starting on it, call nni_aio_stop. To prevent new operations from starting without waiting for existing ones to complete, call nni_aio_close.

In some places we want to check whether an aio is unused. Technically, if such a check passes, no lock is needed, since the caller should hold the only reference. However, race detectors are not necessarily aware of these semantics and may complain about potential data races. To suppress the false positives, define NNG_RACE_DETECTOR; note that this acquires additional locks and affects performance, so do not use it in production.


Reprinted from: blog.csdn.net/hfut_zhanghu/article/details/123044306