Reference Design for an MQTT-Based System

Author: arctic
Link: https://zhuanlan.zhihu.com/p/28525517
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Looking back on my career, my biggest regret is leaving in a hurry before the system I designed was fully implemented in code. The main purpose of this article is to share my ideas on implementing a communication service, to help others design their own, and to learn through sharing where my design falls short. My company works on electric vehicle charging, which can fairly be called a great Internet of Things (IoT) project: an EVCS (electric vehicle charging system) that includes an app, a cloud platform, charging piles, electric vehicles, and other parts. Among the many services on the cloud platform, the communication service is the middleware that handles embedded gateway access and coordinates with the back-end business services. Today I will share implementation details of the communication service based on my own experience, covering both what we did in practice and my later thoughts on the system's defects. The article uses the electric vehicle charging system as its example but is not limited to it; it can serve as a reference for designing any system based on the MQTT protocol.

Terminology

Embedded gateway: generally consists of four parts: an embedded microprocessor, peripheral hardware, an embedded operating system, and user applications. In this system it is responsible for communicating with the server and for network relaying.

Charging device (charging pile): equipment that charges electric vehicles. It connects to the car through a charging gun and contains an embedded gateway.

comm: the program we have to implement ourselves to extend the broker; short for "communication".

Communication service: the software service responsible for communicating with embedded gateways, made up of the broker and comm.

Requirements analysis

The first requirement: developing an M2M communication service means choosing a communication protocol between the communication service and the embedded gateway

Communication is pervasive and critical in the IoT; both short-range wireless technologies and mobile communication technologies shape its development. Within communication, the protocol matters most: it is the set of rules and conventions that two entities must follow to communicate or provide a service. Commonly used IoT protocols include MQTT, DDS, AMQP, XMPP, JMS, REST, and CoAP. These are all widely used, each has at least ten code implementations, and all claim to be real-time publish/subscribe protocols for the IoT. When designing a concrete IoT system architecture, however, you need to consider the communication needs of the actual scenario and choose the appropriate protocol.

The second requirement: keeping thousands of socket connections highly available

How many charging piles it will take to satisfy the market is hard to predict, so the communication service should be designed from the start to scale horizontally. From the government's point of view, charging is a public-welfare project and cannot be allowed to fail every other day. A highly available communication service is the foundation of an IoT system such as electric vehicle charging.

The third requirement: encrypted data transmission

Data security is discussed at the end; how this part is done depends on how the system is implemented.

The fourth requirement: real-time control and real-time monitoring

An electric vehicle charging system is a real-time interactive system; long waits are as unbearable here as they are for a user browsing the web. Apart from business processing time, transmission time should be as short as possible. In actual tests, a message took about 200 ms to travel from the embedded gateway to the communication server (over a 3G router).

The fifth requirement: communication service upgrades must not affect large numbers of users for long periods

Upgrades or downtime of the communication service affect use of the system, so the design should focus on how to detect problems and restore service quickly. Suppose all the requirements above have been met; what have we missed? At this point the testers spoke up: with so many devices attached to one communication service, a problematic upgrade would affect a huge number of devices. (At the time, the testers handled releases; every release happened in the middle of the night, rollbacks were frequent, and "walking on thin ice" was the best description of a release.) Doesn't this pain point call for gray releases? After an upgrade, connect only a small number of devices to the upgraded server, and roll out the upgrade fully once it is confirmed to work.

The sixth requirement: prevent avalanche failures

An avalanche occurs when one service hangs or goes down, its callers then fail, and eventually the whole system becomes unusable. In the detailed design of the communication service, I will focus on how it prevents avalanches.

To meet the requirements above, I designed the communication service as follows. The broker and comm are deployed on the same server in a one-to-one relationship.

 

In the diagram above, the parts of the communication service that programmers need to implement are the listener and the comm program. The business services are the core of the system and contain the metering and billing logic of the charging system. They are very complex, but from the communication service's point of view they mainly process data uploaded by embedded gateways and issue control commands. Different systems handle their business differently, but the communication service can always guarantee them the underlying data.

Without further ado, here is how the communication service is implemented and why it is designed this way. After comparing the options, I chose MQTT as the communication protocol; in an IoT system, the architecture is settled only after the protocol has been chosen. Having chosen MQTT, the next question is which broker to use. The brokers that implement MQTT are shown below, along with a simple usage survey I ran in a chat group.

 

mosquitto only offers bridging, and enabling logging and persistence is not recommended because the disk I/O reduces broker performance. With logging turned off, however, problems cannot be investigated. Although mosquitto is widely used, overall it did not feel robust enough. emqttd does support distributed deployment, is developed in China, and has rich development documentation, but it cannot flexibly support gray releases. Because no open-source broker currently meets all of our real requirements, we ended up with the system architecture above.

Communication modes used in the communication service implementation:

One-way: the sender sends the data and does not wait for the receiver to return an ACK.
Request-response: the sender and receiver interact through a synchronous call.
Two-way: after the sender transmits data, the receiver returns an ACK within an agreed time, but the whole process is asynchronous.

Detailed design

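When an embedded gateway connects to the communication service (including on reconnect), it first synchronously asks the listener for a URL and only then talks to the specific broker; once connected to the broker, the gateway no longer communicates with the listener. Step <1. Get IP address> can use HTTP REST, borrowing the idea behind HTTPDNS; step <2. Data transfer> uses the MQTT protocol. The listener dynamically assigns a communication server IP to the device based on the device ID, the protocol version, and the importance of the device, and the device establishes a long-lived connection with the broker at the assigned address. The request and response formats that a device exchanges with the listener are listed below: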

 

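The backup server addresses let the embedded gateway cache a list of communication server addresses. If the listener service goes down (connecting to the listener times out), the device must pick a server from the backup list to communicate with.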
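The lookup-then-fallback behavior described above can be sketched on the device side as follows. This is a minimal illustration, not the author's implementation: `resolve_broker`, the `ask_listener` callable, and the `"server"` reply field are hypothetical names, because the original request and response formats are not reproduced here, and picking a random backup is only one possible selection policy.

```python
import random
from typing import Callable, Dict, Sequence


def resolve_broker(
    ask_listener: Callable[[], Dict[str, str]],
    backup_servers: Sequence[str],
) -> str:
    """Return the broker address a gateway should connect to.

    ask_listener stands in for the synchronous HTTP request to the
    listener. This sketch assumes it raises TimeoutError or
    ConnectionError when the listener cannot be reached.
    """
    try:
        reply = ask_listener()
        return reply["server"]  # hypothetical field name
    except (TimeoutError, ConnectionError):
        if not backup_servers:
            raise RuntimeError("listener unreachable and no backup servers cached")
        # Fall back to the cached list; a random pick spreads the load.
        return random.choice(list(backup_servers))
```

A gateway would call this before every connect or reconnect, passing whatever backup list it last cached.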

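What functions does the listener need?

Function 1:

Devices on the mainstream version can be assigned by round robin or weighted round robin. There are too many mainstream devices for a device ID routing table to be practical.

Function 2:

For gray-release devices, the listener can keep an in-memory routing table; if a device ID is in the table, the listener returns the corresponding communication server address.

Function 3:

To support multiple versions and multiple protocols, we can also configure a routing table keyed by protocol and version that routes devices of a particular version to a designated communication server. The main reason for adding this table is that device customization requests are common; folding these non-standard requirements into the standard product makes the coupling ever tighter until the code is hard to maintain. Routing this way keeps the standard product and non-standard projects separate.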
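Functions 1 to 3 can be combined into one address-selection routine. The sketch below is illustrative only: the class and parameter names are made up, and the lookup order (gray table first, then the protocol/version table, then weighted round robin) is an assumption, since the article does not say which table takes precedence.

```python
import itertools
from typing import Dict, List, Tuple


class Listener:
    """Picks a communication server address for a connecting device."""

    def __init__(
        self,
        weighted_servers: List[Tuple[str, int]],
        gray_routes: Dict[str, str],
        version_routes: Dict[Tuple[str, str], str],
    ) -> None:
        # Weighted round robin: repeat each address `weight` times and cycle.
        expanded = [addr for addr, weight in weighted_servers for _ in range(weight)]
        if not expanded:
            raise ValueError("at least one server with a positive weight is required")
        self._cycle = itertools.cycle(expanded)
        self._gray_routes = gray_routes        # device ID -> address
        self._version_routes = version_routes  # (protocol, version) -> address

    def assign(self, device_id: str, protocol: str, version: str) -> str:
        # Function 2: gray-release devices are pinned by device ID.
        if device_id in self._gray_routes:
            return self._gray_routes[device_id]
        # Function 3: customized protocol/version combinations get their own server.
        key = (protocol, version)
        if key in self._version_routes:
            return self._version_routes[key]
        # Function 1: everything else shares the mainstream servers.
        return next(self._cycle)
```

Expanding the weights into a repeated list is the simplest form of weighted round robin; it hands out a server's share in a burst, so a smoother algorithm may be preferable when weights are large.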

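Function 4:

The listener must be able to publish messages directly to a broker. This is mainly to guard against the case where comm has died but the embedded gateways connected to the broker keep talking to it, so their data never reaches comm and is lost. When the listener detects that comm is down, it sends the broker a message telling the embedded gateways to reconnect to the network (not to reboot), so that the listener can reassign these devices to an available communication service. This requires all embedded gateways to subscribe to one common topic so the broker can notify them by broadcast. The same mechanism can also carry reboot, time synchronization, firmware upgrade notification, data polling, and similar messages.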

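What functions does comm need?

Function 1:

comm is an extension of the broker, and together they make up the communication service. comm needs subscribe and publish capabilities to receive and send data. Each embedded gateway needs at least two topics: "data/<device ID>" for sending data to the broker and "order/<device ID>" for receiving commands from the broker.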
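The two-topic convention can be captured in a few helpers. This is a small sketch with hypothetical function names; the wildcard subscription `data/+` is an assumption about how comm might receive uploads from every gateway at once (`+` is MQTT's single-level wildcard), and it only works if device IDs contain no `/`.

```python
def data_topic(device_id: str) -> str:
    """Topic a gateway publishes its data on."""
    return f"data/{device_id}"


def order_topic(device_id: str) -> str:
    """Topic a gateway subscribes to for commands."""
    return f"order/{device_id}"


# Assumed comm-side subscription covering all gateways' uploads.
COMM_DATA_SUBSCRIPTION = "data/+"


def device_id_from_data_topic(topic: str) -> str:
    """Recover the device ID from a data topic, rejecting anything else."""
    prefix, sep, device_id = topic.partition("/")
    if prefix != "data" or not sep or not device_id or "/" in device_id:
        raise ValueError(f"not a data topic: {topic!r}")
    return device_id
```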

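Function 2:

As the diagram above shows, there is a heartbeat between the listener and comm. If comm goes down, the listener must mark the service at that address as unavailable, and it may hand the broker's address to embedded gateways again only after comm recovers. On startup, comm registers its address with the listener; after registration succeeds, the listener actively polls comm, and that poll serves as the heartbeat, which keeps comm simpler to implement. The heartbeat payload can be the number of sockets connected to the service or a load index for the service; with this information the listener can route more intelligently.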
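A minimal sketch of the listener-side bookkeeping for this heartbeat might look like the following. The names are hypothetical, each `probe` callable stands in for the listener's request to one comm instance and is assumed to return the socket count (or raise on failure), and treating a newly registered comm as unavailable until its first successful poll is a choice made for this sketch.

```python
from typing import Callable, Dict, List, Optional


class CommRegistry:
    """Tracks which comm instances are alive and how loaded they are."""

    def __init__(self) -> None:
        self._probes: Dict[str, Callable[[], int]] = {}
        self._load: Dict[str, Optional[int]] = {}

    def register(self, address: str, probe: Callable[[], int]) -> None:
        # Called when comm starts up and announces its address.
        self._probes[address] = probe
        self._load[address] = None  # unknown until the first heartbeat

    def poll_once(self) -> None:
        # The listener polls each comm; the reply doubles as the heartbeat.
        for address, probe in self._probes.items():
            try:
                self._load[address] = probe()
            except (TimeoutError, ConnectionError):
                self._load[address] = None  # mark unavailable

    def available(self) -> List[str]:
        # Live addresses, least loaded first, for use in routing decisions.
        live = [(load, addr) for addr, load in self._load.items() if load is not None]
        return [addr for _, addr in sorted(live)]
```

Running `poll_once` on a timer and routing only to `available()` gives the behavior described: a dead comm drops out of rotation and returns once it answers again.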

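Function 3:

A timer. Embedded gateway data has to be polled periodically by the communication service, although gateways also upload event messages on their own when something happens. The comm timer can be designed along the following lines: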

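Function 4: use a cache and a message queue.

comm uses the cache to hold the last data each embedded gateway uploaded, so that the mobile app can query real-time data. The cache can also hold the gateway's current network state (online or offline), MAC address, GPS location, and so on.

To show an embedded gateway's network state, the gateway publishes an online message when it connects to the broker, indicating that it is online and can accept data, and it also publishes an offline message when it closes the connection on its own initiative. For abnormal disconnects, MQTT provides the Will message, which lets the broker publish the offline state on the gateway's behalf. Note that the broker sends the Will after 1.5 heartbeat (keep-alive) periods, so the interval before each device reconnect should preferably exceed 2 heartbeat periods. This guarantees that the broker will not send the Will after the device is back online, so the gateway's state sequence is "online -> offline -> online -> offline". If the reconnect interval is shorter than 1.5 heartbeat periods, the sequence may become "online -> online -> offline", leaving the platform's state inconsistent with the actual network state.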
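The ordering problem can be illustrated with a toy timeline. This models only the rule as stated above (the Will is published 1.5 keep-alive periods after the last packet, and the reconnect publishes an online message); it is not a simulation of any particular broker, whose handling of a reconnecting client may differ. The function name and parameters are hypothetical.

```python
from typing import List


def presence_sequence(keepalive_s: float, reconnect_delay_s: float) -> List[str]:
    """Order of presence messages comm sees after an abnormal drop at t=0.

    Assumes the gateway was online and sent its last packet at t=0.
    """
    will_at = 1.5 * keepalive_s  # broker publishes the Will (offline)
    events = [(will_at, "offline"), (reconnect_delay_s, "online")]
    events.sort(key=lambda event: event[0])
    return ["online"] + [state for _, state in events]
```

With a 60 s keep-alive, reconnecting after 30 s ends the sequence on "offline" even though the device is connected, while waiting 120 s ends it on "online".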

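The message queue decouples comm from the back-end services. Besides buffering data uploaded by embedded gateways, it can carry commands issued by the back-end business services. You may wonder: if back-end business services send commands to Kafka and the consumers are multiple comm instances, how do we know which broker a device is connected to and which comm should receive the command? As noted above, a device sends an online message to the broker when it comes online. At that moment the comm program can register the embedded gateway ID together with a fixed topic in the Redis cache (each comm program has one fixed, unique Kafka topic for receiving commands). Before sending a command, a back-end business service looks up the Kafka topic for the device ID in Redis and then sends the command to that topic. The comm program subscribed to the topic receives the message and forwards it to the embedded gateway through the broker.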
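The lookup-then-send flow can be sketched with in-memory stand-ins. Nothing here talks to a real Redis or Kafka: a dict plays the role of the Redis mapping and lists play the role of Kafka topics, and all names are hypothetical. Removing the mapping when a device goes offline is an assumption consistent with treating a missing entry as "device offline".

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class CommandRouter:
    """Routes back-end commands to the comm instance serving each device."""

    def __init__(self) -> None:
        self.device_topic: Dict[str, str] = {}  # stand-in for Redis
        self.queues: Dict[str, List[Tuple[str, str]]] = defaultdict(list)  # stand-in for Kafka

    def on_device_online(self, device_id: str, comm_topic: str) -> None:
        # comm registers the device under its own unique command topic.
        self.device_topic[device_id] = comm_topic

    def on_device_offline(self, device_id: str) -> None:
        self.device_topic.pop(device_id, None)

    def send_command(self, device_id: str, command: str) -> bool:
        # Back end: look up the topic first, then publish to it.
        topic = self.device_topic.get(device_id)
        if topic is None:
            return False  # no mapping means the device is offline
        self.queues[topic].append((device_id, command))
        return True
```

Because each comm consumes only its own topic, a command reaches exactly the comm whose broker holds the device's connection.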

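Function 5: how comm prevents avalanches

The comm program depends on the broker, the Redis cache, and the Kafka message queue. A robust comm program should ensure that when the Redis service is unavailable, only real-time data updates are affected and data uploaded through Kafka is not; likewise, Kafka being unavailable should not affect cache updates. The simplest way to prevent avalanches is thread pool isolation: each dependency gets its own sending or receiving thread pool. When a callee is unavailable, the corresponding thread pool in comm blocks, but the other thread pools keep working normally. For the implementation we might use Hystrix. I won't go into detail about Hystrix here, but it is indispensable if you want the communication service to recover from an unavailable state.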
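Thread pool isolation can be shown with Python's standard `concurrent.futures`, used here in place of Hystrix purely for illustration. The `Bulkhead` class and `demo_isolation` function are hypothetical names; the "Redis outage" is simulated by a task that blocks on an event, and the sketch omits the timeouts, queue limits, and circuit breaking that a library like Hystrix adds.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Tuple


class Bulkhead:
    """One dedicated thread pool per dependency."""

    def __init__(self, dependencies: Iterable[str], workers: int = 2) -> None:
        self._pools: Dict[str, ThreadPoolExecutor] = {
            name: ThreadPoolExecutor(max_workers=workers, thread_name_prefix=name)
            for name in dependencies
        }

    def submit(self, dependency: str, fn: Callable[..., Any], *args: Any) -> Future:
        # Work for one dependency can only occupy that dependency's threads.
        return self._pools[dependency].submit(fn, *args)

    def shutdown(self) -> None:
        for pool in self._pools.values():
            pool.shutdown(wait=True)


def demo_isolation() -> Tuple[str, bool]:
    gate = threading.Event()  # simulates a Redis outage until set
    bulkhead = Bulkhead(["redis", "kafka"], workers=1)
    stuck = bulkhead.submit("redis", gate.wait)      # blocks the redis pool
    sent = bulkhead.submit("kafka", lambda: "sent")  # unaffected pool
    result = sent.result(timeout=5)
    redis_still_blocked = not stuck.done()
    gate.set()  # end the simulated outage so the pools can shut down
    bulkhead.shutdown()
    return result, redis_still_blocked
```

`demo_isolation()` returns the Kafka result while the Redis task is still blocked, which is the isolation property the paragraph describes.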

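Load analysis of the listener

In my experience, with 8,000+ socket connections there are about 50 disconnect-and-reconnect events per minute; if one broker goes down, the number of concurrent requests may rise to several thousand. Because embedded gateways have a strategy for coping with listener downtime and the listener loads its policies into memory at initialization, the program rarely performs database I/O while running, so a single listener instance is generally enough. If that is not reassuring enough, use a master-slave setup.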

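How MQTT can keep device and cloud platform state consistent

The QoS (quality of service) defined by MQTT only covers delivery of data to the broker; it cannot guarantee that the consumer (the comm program) does not lose data. To ensure that data reaches the back-end business services safely, add a business-level ACK mechanism when designing the business flow. This acknowledgment is issued by the back-end business service; if the embedded gateway does not receive the ACK within the specified time, it must resend.

The messages in the diagram above are all sent asynchronously, and a message passes through Kafka, comm, and the broker on its way to the other side; a problem at any of them can lose data. A command issued to an embedded gateway needs two acknowledgments to guarantee that both sides end up in the same state.

ACK1 indicates that the gateway has received the command and finished executing it (successfully or not). After receiving ACK1, the back-end business service performs the corresponding business processing based on the result.

ACK2 is the back-end business service's acknowledgment to the embedded gateway, meaning "I have received your result and our states now agree". ACK1 and ACK2 should both contain the data the business needs. Acknowledgments alone are not enough: each command the back-end business service issues should carry a command lifetime T, and both the back-end business service and the embedded gateway must finish this round of processing within it. Anything that takes longer than T is treated as a timeout.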

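1. If the command is lost, the embedded gateway is unaffected and the back-end business system handles it as a timeout.

2. If ACK1 is lost, the back-end business service will not send ACK2. If the embedded gateway has not received ACK2 after one lifetime T, it must roll back the business operation (end charging).

3. Likewise, if ACK2 is lost, the embedded gateway must also roll back. This leaves the platform and device states inconsistent (the back-end business service has received ACK1 and considers execution complete), so after the rollback the gateway must upload a detailed end-of-business record. Because this record is what brings platform and device state back into agreement, it needs a business-level ACK as well. If the embedded gateway did not roll back, it does not need to send the business details immediately; it sends them when the business ends.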
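The three loss cases can be summarized as a small decision table. This sketch only restates the rules above in code; the function name and the result labels are hypothetical, and it assumes at most one message is lost per exchange (the first loss in the command, ACK1, ACK2 order decides the outcome).

```python
from typing import Tuple


def command_outcome(
    command_lost: bool = False,
    ack1_lost: bool = False,
    ack2_lost: bool = False,
) -> Tuple[str, str]:
    """Return (backend_result, gateway_result) once lifetime T has expired."""
    if command_lost:
        # Case 1: the gateway never saw the command.
        return ("timeout", "unaffected")
    if ack1_lost:
        # Case 2: the back end never sends ACK2, so the gateway rolls back.
        return ("timeout", "rollback")
    if ack2_lost:
        # Case 3: the back end thinks it is done; the gateway rolls back and
        # must then upload an acknowledged end-of-business record.
        return ("completed", "rollback_then_report")
    return ("completed", "completed")
```

Only the third case leaves the two sides disagreeing, which is why the follow-up record needs its own business-level ACK.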

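The message flow above is only one example of keeping state consistent; add whatever messages your own business requires.

MQTT already takes security measures such as data encryption and identity authentication into account; see the MQTT protocol manual for details.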

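Design review

1. The listener and comm are both extensions of the broker. The listener makes socket routing rules more flexible and works around the fact that some brokers do not support distributed deployment. The comm program manages and consumes the broker's messages and registers each device ID with the corresponding topic in the cache so that back-end business services can issue commands.

2. Each comm's Kafka topic for receiving commands is unique, which guarantees that only one comm receives a command sent by the back-end business. If the back-end business cannot find a topic for a device ID in the cache, the device is offline and commands cannot be sent to the embedded gateway.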

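3. Kafka message queues are very flexible. If the back-end business services consume with the same consumer group ID, each message is delivered to only one consumer and the others do not receive duplicates; if they use different group IDs, Kafka behaves like a broadcast and every consumer receives the same message.

Finally, thanks to my mentor, who guided me all the way on the communication service.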

 


Origin www.cnblogs.com/coolYuan/p/12161135.html