Build a reliable IoT device connection | MQTT automatic reconnection best practices

Background

MQTT is a publish/subscribe protocol built on TCP. It is widely used in the Internet of Things, sensor networks, and other low-bandwidth, unreliable network environments. In such environments connections are often unstable: network outages, weak signals, and packet loss can all interrupt the connection between an MQTT client and the server. Common scenarios in IoT applications that trigger disconnection and reconnection include:

  1. A poor or lost network causes the MQTT client connection to time out and drop.

  2. The server is upgraded or switched for business reasons, so it shuts down and actively closes the connection.

  3. The device or client restarts, and the client actively reconnects.

  4. Other network factors break the TCP/IP transport-layer connection, so the MQTT connection has to be re-established.

To keep the connection between the MQTT client and the server stable, the client needs reconnection logic that automatically reconnects to the server, restores the previous subscriptions, and maintains session state.

Why MQTT client reconnection code needs careful design

Device reconnection is unavoidable in many IoT applications. When designing the reconnection logic of an MQTT client, use the correct event callbacks and set a reasonable random backoff time for each reconnection attempt, so that both client and server can run stably over the long term and the business is not disrupted.

Poorly designed reconnection logic can cause many problems:

  1. The reconnection logic fails, and the client silently stops receiving messages from the broker.

  2. The client reconnects frequently with no backoff, effectively launching a DDoS attack on the broker.

  3. Clients repeatedly go online and offline, wasting broker resources.

Well-designed reconnection logic not only improves the stability and reliability of the MQTT client and avoids the data loss and delays caused by connection interruptions, but also reduces the load that frequent connections place on the server.

How to design MQTT client reconnection code

The following aspects need to be considered when designing the MQTT client reconnection code:

  • Set an appropriate Keep Alive interval

The MQTT client's connection keep-alive time, Keep Alive, is used to detect the health of the current connection. A Keep Alive timeout causes the client to reconnect and the server to close the client connection. This value therefore determines how long it takes the server and the client to notice that a connection has broken. Choose Keep Alive according to your network conditions and the longest detection delay you can tolerate.
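As a rough worked example of how Keep Alive bounds detection time: the MQTT specification requires the server to close the connection if it receives no control packet from the client within one and a half times the Keep Alive period. The helper below (its name is our own, not an SDK function) is a minimal sketch that computes that server-side window; with the 20-second Keep Alive used in the Paho routine later in this article, the server notices a silent client after at most about 30 seconds.

```c
#include <stdint.h>

/* Hypothetical helper: the longest time, in milliseconds, the server waits for
 * any control packet from a client before closing the connection, which the
 * MQTT spec sets at 1.5 x Keep Alive. A Keep Alive of 0 disables the
 * mechanism, and the function then returns 0 to mean "no timeout". */
uint32_t server_keepalive_timeout_ms(uint16_t keepalive_s)
{
    /* 1.5 x seconds == 1500 ms per second; 65535 * 1500 still fits in 32 bits. */
    return (uint32_t)keepalive_s * 1500u;
}
```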

  • Reconnection strategy and backoff

Choose a reconnection strategy that suits the network environment. For example, when the connection drops, start with an initial wait time and increase it after each failed attempt, so that a network outage does not produce a flood of reconnection attempts. Exponential backoff, or a random delay combined with a step increase, is recommended so that attempts are spread across enough backoff slots.
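The "random + step delay" idea can be sketched as below. This is a minimal illustration under our own assumptions: the struct and function names are hypothetical, and the C standard library rand() stands in for whatever random source the device actually has. The base delay grows linearly by a fixed step per failed attempt, is capped, and receives random jitter so that devices disconnected at the same moment do not all retry at the same moment.

```c
#include <stdlib.h>

/* Hypothetical "random + step" backoff policy. */
typedef struct {
    unsigned initial_ms; /* delay before the first retry */
    unsigned step_ms;    /* added to the delay after each failed attempt */
    unsigned cap_ms;     /* upper bound for the base delay */
    unsigned jitter_ms;  /* maximum random extra delay (keep it below RAND_MAX) */
} step_backoff;

/* Delay before retry number `attempt` (0-based): a capped linear base plus
 * a random jitter in [0, jitter_ms]. */
unsigned step_backoff_delay_ms(const step_backoff *p, unsigned attempt)
{
    /* 64-bit arithmetic so step_ms * attempt cannot overflow before capping. */
    unsigned long long base =
        (unsigned long long)p->initial_ms + (unsigned long long)p->step_ms * attempt;
    if (base > p->cap_ms)
        base = p->cap_ms;

    unsigned jitter = p->jitter_ms ? (unsigned)rand() % (p->jitter_ms + 1u) : 0u;
    return (unsigned)base + jitter;
}
```

With the example values { 1000, 2000, 16000, 500 }, successive retries wait roughly 1 s, 3 s, 5 s, and so on up to a 16 s ceiling, each plus up to 0.5 s of jitter. Seeding rand() with something device-specific (for example a serial number) keeps devices from sharing one jitter sequence.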

  • Connection state management

The client needs to maintain connection state, including the current connection status, the reason for the last disconnection, the list of subscribed topics, and so on. When the connection is interrupted, the client should record the reason and attempt to reconnect accordingly. However, if session persistence is used, the client does not need to store this information itself.
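A minimal sketch of such client-side state, assuming the application (not the SDK) keeps it: the conn_state struct, its fixed-size topic table, and the helper functions are all hypothetical. conn_state_resubscribe takes a caller-supplied function pointer, so it can wrap whichever SDK subscribe call is in use.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOPICS 16   /* hypothetical limits for a small device */
#define TOPIC_LEN  128
#define REASON_LEN 128

typedef enum { CONN_DISCONNECTED, CONN_CONNECTING, CONN_CONNECTED } conn_status;

/* Hypothetical application-side record of the connection. */
typedef struct {
    conn_status status;
    unsigned disconnects;             /* number of times the link has dropped */
    char last_reason[REASON_LEN];     /* cause of the most recent disconnect */
    char topics[MAX_TOPICS][TOPIC_LEN];
    int qos[MAX_TOPICS];
    int topic_count;
} conn_state;

/* Caller-supplied subscribe hook; returns 0 on success. */
typedef int (*subscribe_fn)(void *ctx, const char *topic, int qos);

/* Remember a subscription so it can be restored after reconnecting.
 * Returns 0 on success, -1 if the table is full or the topic is too long. */
int conn_state_add_topic(conn_state *s, const char *topic, int qos)
{
    if (s->topic_count >= MAX_TOPICS || strlen(topic) >= TOPIC_LEN)
        return -1;
    strcpy(s->topics[s->topic_count], topic);
    s->qos[s->topic_count] = qos;
    s->topic_count++;
    return 0;
}

/* Record a disconnection and its cause (cause may be NULL). */
void conn_state_on_disconnect(conn_state *s, const char *cause)
{
    s->status = CONN_DISCONNECTED;
    s->disconnects++;
    snprintf(s->last_reason, sizeof s->last_reason, "%s", cause ? cause : "unknown");
}

/* Mark the link as up and re-issue every remembered subscription.
 * Returns the number of subscribe calls that succeeded. */
int conn_state_resubscribe(conn_state *s, subscribe_fn subscribe, void *ctx)
{
    int ok = 0;
    s->status = CONN_CONNECTED;
    for (int i = 0; i < s->topic_count; i++)
        if (subscribe(ctx, s->topics[i], s->qos[i]) == 0)
            ok++;
    return ok;
}
```

With a persistent session the broker keeps the subscriptions, so in that setup the resubscribe step can be skipped.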

  • Exception handling

Various errors can occur while connecting, such as the server being unavailable, authentication failures, or network errors. The client needs exception-handling logic that responds appropriately to each case. MQTT 5 provides detailed reason codes for such disconnections, which the client can use to decide whether to log the error, give up, or reconnect.
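One way to act on MQTT 5 reason codes is to map each code to a reconnection action. In the sketch below the numeric codes and their names come from the MQTT 5.0 specification, but the grouping into actions is an assumed policy for illustration and should be tuned to your deployment. In the Paho routine in this article, a function like this could be called from the MQTTAsync_setDisconnected callback, which receives the reason code.

```c
/* Hypothetical reconnection actions; the mapping below is an assumed policy. */
typedef enum {
    ACTION_RETRY,        /* transient problem: reconnect with normal backoff */
    ACTION_RETRY_SLOWLY, /* server asked clients to back off: use a longer delay */
    ACTION_STOP          /* retrying cannot help until configuration changes */
} reconnect_action;

/* Classify an MQTT 5 CONNACK/DISCONNECT reason code. */
reconnect_action classify_reason_code(unsigned char code)
{
    switch (code) {
    case 0x84: /* Unsupported Protocol Version */
    case 0x85: /* Client Identifier not valid */
    case 0x86: /* Bad User Name or Password */
    case 0x87: /* Not authorized */
    case 0x8A: /* Banned */
    case 0x8C: /* Bad authentication method */
    case 0x8E: /* Session taken over: another client uses this Client ID */
        return ACTION_STOP;
    case 0x89: /* Server busy */
    case 0x97: /* Quota exceeded */
    case 0x9F: /* Connection rate exceeded */
        return ACTION_RETRY_SLOWLY;
    default:   /* e.g. 0x88 Server unavailable, 0x8B Server shutting down,
                  0x8D Keep Alive timeout */
        return ACTION_RETRY;
    }
}
```

Treating 0x8E (Session taken over) as fatal keeps two devices that share a Client ID from repeatedly kicking each other offline.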

  • Maximum attempts limit

For some low-power devices it may be necessary to limit the maximum number of reconnection attempts so that repeated reconnections do not drain client resources. Once the limit is exceeded, the client should stop trying and enter a dormant state instead of continuing futile reconnection attempts.
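A sketch of a capped reconnection loop follows. try_connect and wait_ms are hypothetical hooks standing in for the real SDK connect call and the platform's sleep, and the doubling delay is just one possible backoff choice; a RECONNECT_GAVE_UP result is the point at which the device would enter its dormant state. The fake_connect and no_wait stubs exist only so the loop can be exercised on a desktop.

```c
/* Hypothetical hooks: try_connect returns 0 on success; wait_ms sleeps. */
typedef int (*try_connect_fn)(void *ctx);
typedef void (*wait_fn)(void *ctx, unsigned ms);

typedef enum { RECONNECT_OK, RECONNECT_GAVE_UP } reconnect_result;

/* Try at most max_attempts times, doubling the delay between attempts up to
 * max_delay_ms. *attempts_used (if non-NULL) receives the number of tries. */
reconnect_result reconnect_with_limit(try_connect_fn try_connect, wait_fn wait_ms,
                                      void *ctx, unsigned max_attempts,
                                      unsigned delay_ms, unsigned max_delay_ms,
                                      unsigned *attempts_used)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(ctx) == 0) {
            if (attempts_used) *attempts_used = attempt;
            return RECONNECT_OK;
        }
        if (attempt < max_attempts) { /* no pointless wait after the last try */
            wait_ms(ctx, delay_ms);
            delay_ms = (delay_ms > max_delay_ms / 2) ? max_delay_ms : delay_ms * 2;
        }
    }
    if (attempts_used) *attempts_used = max_attempts;
    return RECONNECT_GAVE_UP; /* caller should now enter its dormant state */
}

/* Test stubs: fake_connect fails while the counter in ctx is still positive. */
static int fake_connect(void *ctx) { int *fails_left = ctx; return (*fails_left)-- > 0 ? -1 : 0; }
static void no_wait(void *ctx, unsigned ms) { (void)ctx; (void)ms; }
```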

  • Backoff algorithms

There are two commonly used reconnection backoff methods: the exponential backoff algorithm (https://en.m.wikipedia.org/wiki/Exponential_backoff) and random backoff. Exponential backoff uses negative feedback, multiplicatively increasing the waiting time after each failure, to find an acceptable sending/connection rate. Random backoff sets lower and upper limits on the waiting time, and each reconnection waits for a random delay within that range; it is widely used because it is easy to implement.
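Both methods fit in a few lines of C. This is a sketch rather than any library's API: the function names are our own, and rand() is used for brevity even though its range and quality are limited.

```c
#include <stdint.h>
#include <stdlib.h>

/* Exponential backoff: base_ms * 2^attempt, capped at max_ms.
 * Doubling stops as soon as the cap is reached, so it cannot overflow. */
uint32_t exp_backoff_ms(uint32_t base_ms, uint32_t max_ms, unsigned attempt)
{
    uint32_t delay = base_ms;
    for (unsigned i = 0; i < attempt && delay < max_ms; i++)
        delay = (delay > max_ms / 2) ? max_ms : delay * 2;
    return delay < max_ms ? delay : max_ms;
}

/* Random backoff: an approximately uniform delay in [min_ms, max_ms].
 * Assumes min_ms <= max_ms and a range no wider than RAND_MAX; the modulo
 * introduces a small bias, which is acceptable for retry timing. */
uint32_t random_backoff_ms(uint32_t min_ms, uint32_t max_ms)
{
    return min_ms + (uint32_t)rand() % (max_ms - min_ms + 1u);
}
```

The two are often combined by drawing a random delay between 0 and the current exponential value (commonly called "full jitter"), for example random_backoff_ms(0, exp_backoff_ms(1000, 16000, attempt)).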

Reconnection code example

We use the Paho MQTT C library as an example of how to implement automatic reconnection cleanly with the asynchronous programming model. Paho provides a rich set of callbacks. Note that different callbacks have different trigger conditions and are registered in different ways; they include global callbacks, API callbacks, and asynchronous method callbacks. API callbacks are quite flexible, but when automatic reconnection is enabled, it is recommended to use only the asynchronous callbacks.

The following routine uses all three kinds of callbacks; you can use it to verify when each one is triggered.

```c
#include <stdio.h>
#include <stdlib.h>
#include "MQTTAsync.h"

// ADDRESS, CLIENTID, TOPIC, QOS, the global flag `finished`, and the callbacks
// onSubscribe, onSubscribeFailure, onConnectFailure, msgarrvd, msgdeliverd and
// disconnect_lost belong to the full example and are not shown here.

// Asynchronous callback registered with MQTTAsync_setConnected.
// It runs after every successful connection (including automatic reconnections),
// so the Subscribe operation is issued here.
void conn_established(void *context, char *cause)
{
  MQTTAsync client = (MQTTAsync)context;
  MQTTAsync_responseOptions opts = MQTTAsync_responseOptions_initializer;
  int rc;

  printf("client reconnected!\n");
  printf("Successful connection\n");

  printf("Subscribing to topic %s\nfor client %s using QoS%d\n\n"
         "Press Q<Enter> to quit\n\n", TOPIC, CLIENTID, QOS);
  opts.onSuccess = onSubscribe;
  opts.onFailure = onSubscribeFailure;
  opts.context = client;
  if ((rc = MQTTAsync_subscribe(client, TOPIC, QOS, &opts)) != MQTTASYNC_SUCCESS)
  {
    printf("Failed to start subscribe, return code %d\n", rc);
    finished = 1;
  }
}

// Global connection-lost callback for the client (registered with MQTTAsync_setCallbacks).
// With automaticReconnect enabled, as in main(), Paho also retries on its own,
// so this manual reconnect mainly matters when automatic reconnection is off.
void conn_lost(void *context, char *cause)
{
  MQTTAsync client = (MQTTAsync)context;
  MQTTAsync_connectOptions conn_opts = MQTTAsync_connectOptions_initializer;
  int rc;

  printf("\nConnection lost\n");
  if (cause) {
    printf("     cause: %s\n", cause);
  }
  printf("Reconnecting\n");
  conn_opts.keepAliveInterval = 20;
  conn_opts.cleansession = 1;
  conn_opts.maxRetryInterval = 16;
  conn_opts.minRetryInterval = 1;
  conn_opts.automaticReconnect = 1;
  conn_opts.onFailure = onConnectFailure;
  conn_opts.context = client;  // passed to onConnectFailure
  MQTTAsync_setConnected(client, client, conn_established);
  if ((rc = MQTTAsync_connect(client, &conn_opts)) != MQTTASYNC_SUCCESS)
  {
    printf("Failed to start connect, return code %d\n", rc);
    finished = 1;
  }
}

int main(int argc, char* argv[])
{
  // Structures needed to create and drive the asynchronous client
  MQTTAsync client;
  MQTTAsync_connectOptions conn_opts = MQTTAsync_connectOptions_initializer;
  MQTTAsync_disconnectOptions disc_opts = MQTTAsync_disconnectOptions_initializer;
  int rc;
  int ch;
  // Create the asynchronous client without the Paho SDK's built-in persistence for buffered messages
  if ((rc = MQTTAsync_create(&client, ADDRESS, CLIENTID, MQTTCLIENT_PERSISTENCE_NONE, NULL))
      != MQTTASYNC_SUCCESS)
  {
    printf("Failed to create client, return code %d\n", rc);
    rc = EXIT_FAILURE;
    goto exit;
  }
  // Set the client callbacks. These are connection-level global callbacks:
  //   conn_lost fires only when an established connection is lost; failed
  //     reconnection attempts while disconnected do not trigger it.
  //   msgarrvd fires when a message arrives.
  //   msgdeliverd fires when a message has been delivered; it is usually set to NULL.
  if ((rc = MQTTAsync_setCallbacks(client, client, conn_lost, msgarrvd, msgdeliverd)) != MQTTASYNC_SUCCESS)
  {
    printf("Failed to set callbacks, return code %d\n", rc);
    rc = EXIT_FAILURE;
    goto destroy_exit;
  }
  // Connection parameters
  conn_opts.keepAliveInterval = 20;
  conn_opts.cleansession = 1;
  // API callback invoked if this API call fails; a connect follows, so use onConnectFailure
  conn_opts.onFailure = onConnectFailure;
  // API callback invoked if the connect call succeeds. Because the routine uses the
  // asynchronous connected callback, setting this would make both fire, so leave it unset
  //conn_opts.onSuccess = onConnect;
  // Note: if the very first connection attempt fails, automatic reconnection is not
  // triggered; it only applies after a successful connection has been lost
  conn_opts.automaticReconnect = 1;
  // The retry interval starts at minRetryInterval (2 s) and doubles after each
  // failed attempt, up to maxRetryInterval (16 s)
  conn_opts.maxRetryInterval = 16;
  conn_opts.minRetryInterval = 2;
  conn_opts.context = client;
  // Asynchronous callbacks: unlike the API callbacks above, these fire on every
  // connection/disconnection event, not only for this one call
  MQTTAsync_setConnected(client, client, conn_established);
  MQTTAsync_setDisconnected(client, client, disconnect_lost);
  // Start connecting; the API callbacks set above apply only to this one operation
  if ((rc = MQTTAsync_connect(client, &conn_opts)) != MQTTASYNC_SUCCESS)
  {
    printf("Failed to start connect, return code %d\n", rc);
    rc = EXIT_FAILURE;
    goto destroy_exit;
  }

  /* ...... */
}
```


Reply "Auto Reconnect" to the "EMQ Chinese Community" WeChat official account to view the complete code.

More options: NanoSDK's built-in reconnection strategy

NanoSDK is another MQTT SDK option besides Paho. It is built on the NNG-NanoMSG project and released under the MIT License, which is friendly to both open-source and commercial use. Its biggest difference from Paho is its built-in fully asynchronous I/O and support for the Actor programming model, which yields higher message throughput when QoS 1/2 messages are used. NanoSDK also supports MQTT over QUIC, which, combined with EMQX 5.0, a large-scale IoT MQTT broker, can solve data transmission problems on weak networks. These advantages have led to its wide use in Internet of Vehicles and industrial scenarios.

In NanoSDK, the reconnection strategy is fully built in, so users do not need to implement it manually.

```c
// NanoSDK uses an automatic dialing mechanism and reconnects by default
nng_dialer_set_ptr(*dialer, NNG_OPT_MQTT_CONNMSG, connmsg);
nng_dialer_start(*dialer, NNG_FLAG_NONBLOCK);
```

Summary

This article explained why reconnection logic matters when implementing an MQTT client and presented best practices for designing it. With these practices, readers can write more robust MQTT device reconnection code, reduce resource overhead on both the client and the server, and build more stable and reliable IoT device connections.

Origin blog.csdn.net/klandor2008/article/details/131618493