Some thoughts on data access and data storage based on the Internet of Things

1. Edge computing

Edge computing refers to performing computation on the side close to the source of things or data, using an open platform that integrates core network, computing, storage, and application capabilities. The edge of the network can be any functional entity between the data source and the cloud computing center; these entities run an edge computing platform that provides end users with real-time, dynamic, and intelligent computing services. Unlike cloud computing, which performs processing and algorithmic decision-making in the cloud, edge computing pushes intelligence and computation closer to where the action happens. The main differences show up in multi-source heterogeneous data processing, bandwidth load and resource waste, resource constraints, and security and privacy protection.
Edge computing has outstanding advantages in application scenarios that demand low latency, high bandwidth, high reliability, massive connections, heterogeneous convergence, and local security and privacy protection, such as smart transportation, smart cities, and smart homes. Take smart transportation, specifically a connected autonomous car, as an example: fast data processing is a vital capability, and edge computing is key to realizing autonomous driving. A smart car is essentially a large, high-powered computer on wheels that collects data through many sensors. For such a vehicle to operate safely and reliably, the sensors must respond immediately to the surrounding environment, and any lag in processing speed can be fatal.
A vivid analogy: when each Shanghai citizen sorts and handles garbage by themselves, that is edge computing. When garbage is processed at the garbage can, the garbage truck, or the local transfer station, that is fog computing. When, without further sorting, all garbage is first hauled to a central dump for processing, that is cloud computing.
The application of edge computing addresses the deficiencies of cloud computing. As edge computing rises, more and more scenarios require huge amounts of data to be computed with immediate feedback, and these scenarios have begun to expose the shortcomings of cloud computing, mainly the following:
(1) Big data transmission: it has been estimated that by 2020 each person would generate about 1.5 GB of data per day on average. As more and more devices connect to the Internet and generate data, cloud computing with central servers as its nodes may hit bandwidth bottlenecks.
(2) Real-time data processing: according to statistics, a driverless car generates about 1 GB of data per second, and a Boeing 787 generates more than 5 GB per second; in 2020, China's data storage volume was expected to reach about 39 ZB, roughly 30% of which would come from IoT device access. Real-time processing of such massive amounts of data may leave cloud computing powerless.
(3) Privacy and energy consumption: cloud computing transmits private data collected by wearable, medical, industrial manufacturing, and other equipment over a relatively long path to the data center, which can easily lead to data loss or information leakage; the high energy consumption caused by heavy data center load is also a core issue in data center planning and management.
The development prospects of edge computing are broad; it has been called "the last mile of artificial intelligence". However, it is still at an early stage of development, and many problems remain to be solved, such as the choice of framework, the specification of communication equipment and protocols, the identification of terminal devices, and the need for ever lower latency. With the spread of IPv6 and 5G, some of these problems will be solved, although that will be a long journey. Compared with cloud computing, edge computing has the following advantages: (1) more nodes to carry traffic, making data transmission faster; (2) closer proximity to terminal devices, so transmission is safer and data processing is more immediate; (3) more dispersed nodes, so a single failure has less impact than a cloud computing failure, which also eases equipment heat dissipation. The two are not merely different; they cooperate with each other. Given the shortcomings of cloud computing and the advantages of edge computing listed above, does this mean edge computing will replace cloud computing in the future? Not at all. Cloud computing centers on the interaction between people and computing resources, while edge computing handles the interaction between devices and ultimately serves people indirectly. Edge computing can process large amounts of real-time data, while cloud computing can later access the history or processed results of that real-time data and perform aggregate analysis. Edge computing is a complement and extension of cloud computing.
Edge computing is currently applied mainly in fields such as autonomous driving, smart homes, and the industrial Internet of Things, but some open questions remain:
① devices are widely distributed, and each device must be equipped with this set of computing capabilities;
② devices are heterogeneous, so it must be decided whether the edge processing logic and the required capabilities (configuration) have to be differentiated per device;
③ how much computing power does the edge actually have, and does edge node data still need to be stored?
④ will edge computing really cost less?
⑤ how much embedded development is involved?

2. IoT communication protocols

Network protocol:
For IoT communication based on power distribution network equipment, wired protocols (USB, M-Bus) and short-range wireless protocols (NFC, Bluetooth, Wi-Fi) are excluded first. At present, only long-range wireless communication protocols meet the requirements.
Long-range wireless communication protocols include cellular and non-cellular protocols. Cellular protocols are mainly the standards adopted by telecom operators under technologies such as 2G/3G/4G/5G and NB-IoT. Non-cellular protocols are mainly ZigBee and LoRa.
First of all, ZigBee connects devices within a range of roughly 10-100 meters, which is not suitable here. 2G/3G networks face being phased out. 4G/5G offer long transmission distance and high speed, but the cost may be too high. LoRa is also a good long-range communication technology, but its data rate is only 0.3 kbps to 50 kbps, so it must be considered whether that rate is sufficient.
Then there is NB-IoT. Here is an introduction to NB-IoT:
NB-IoT, short for Narrowband Internet of Things, is a 3GPP LPWA (low-power wide-area) cellular solution tailored for operators. It uses designs such as an ultra-narrow band, retransmission, and a streamlined network protocol, sacrificing some data rate, latency, and mobility in exchange for the carrying capacity needed for LPWA IoT.
From the access-network perspective, the NB-IoT uplink supports two transmission modes: single-tone and multi-tone. The single-tone scheme gives better coverage, capacity, and terminal power consumption; the multi-tone scheme can be used to support higher peak rates.
In terms of technical characteristics, NB-IoT can be deployed quickly and flexibly. In terms of power consumption and performance, NB-IoT terminals consume little power. From a cost and market-promotion perspective, because NB-IoT can be deployed directly on existing 2G/3G/4G networks, the radio frequency units and antennas of existing base stations can be reused.
Compared with traditional 2G, 3G, and 4G cellular communication, NB-IoT offers low power consumption, wide coverage, low cost, and large capacity, making it applicable to a wide range of vertical industries such as remote meter reading, asset tracking, smart parking, and smart agriculture.

Device-cloud communication protocol:
Connecting an IoT device terminal to the network is only the beginning of an IoT application. Once devices are online, they need to communicate with each other and with the cloud; only through interoperability does the value of the Internet of Things emerge. Since intercommunication is required, a common IoT communication protocol is needed: only devices that follow the same protocol can talk to each other and exchange data.
Among commonly used IoT communication protocols, the main one is MQTT, which is based on a message model: communication between devices, and between devices and the cloud, is realized by exchanging messages that carry the communication data.
MQTT (Message Queuing Telemetry Transport) is a lightweight communication protocol based on the publish/subscribe pattern and built on top of TCP/IP. Its biggest advantage is that it can provide real-time, reliable messaging for remote devices with very little code and limited bandwidth. As a low-overhead, low-bandwidth instant messaging protocol, it is widely used in the Internet of Things, small devices, and mobile applications.
MQTT is a client-server publish/subscribe message transport protocol. It is lightweight, simple, open, and easy to implement, which makes it applicable in a very wide range of situations, including constrained environments such as machine-to-machine (M2M) communication and the IoT. It has been widely used for sensors communicating over satellite links, medical devices with occasional dial-up connections, smart homes, and various miniaturized devices.
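As a minimal sketch of the publish/subscribe model described above (using the Python paho-mqtt client; the broker address, topic names, and payload fields are illustrative assumptions, not part of the original text):

```python
# pip install paho-mqtt   (sketch uses the paho-mqtt 1.x callback style)
import json
import paho.mqtt.client as mqtt

BROKER = "broker.example.com"            # hypothetical broker address
TOPIC = "sensors/meter-01/telemetry"     # hypothetical topic

def on_connect(client, userdata, flags, rc):
    # Subscribe once the connection is established
    client.subscribe("sensors/#", qos=1)

def on_message(client, userdata, msg):
    # Every message carries the communication data as its payload
    print(msg.topic, json.loads(msg.payload))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER, 1883, keepalive=60)

# Publish a small telemetry message; QoS 1 asks for at-least-once delivery
client.publish(TOPIC, json.dumps({"voltage": 229.8, "current": 1.2}), qos=1)
client.loop_forever()
```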

A brief introduction to EMQ, an open-source MQTT product: EMQ (EMQX) is an open-source, distributed MQTT broker written in Erlang/OTP. It is designed for very large numbers of concurrent device connections, supports clustering for horizontal scaling, and provides a rule engine and bridges that can forward device messages to backends such as Kafka and databases.

3. Data access

First, an intelligent collection device is required to upload the collected data to the server. How the collected data is ultimately persisted to disk is the key question in data access. A typical chain is: devices speak MQTT to an EMQ broker, and the broker then forwards the data to Kafka, InfluxDB, or similar backends.
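A minimal sketch of such a bridge, assuming an EMQ broker reachable over plain MQTT and a local Kafka cluster (the Python paho-mqtt and kafka-python libraries are used here; host names, topic names, and the payload format are illustrative assumptions):

```python
# pip install paho-mqtt kafka-python
import paho.mqtt.client as mqtt
from kafka import KafkaProducer

# Hypothetical endpoints
EMQ_HOST = "emq.example.com"
KAFKA_BOOTSTRAP = "localhost:9092"
KAFKA_TOPIC = "iot-telemetry"

producer = KafkaProducer(bootstrap_servers=KAFKA_BOOTSTRAP)

def on_connect(client, userdata, flags, rc):
    # Subscribe to everything the collection devices publish
    client.subscribe("sensors/#", qos=1)

def on_message(client, userdata, msg):
    # Forward the raw MQTT payload into Kafka, keyed by the MQTT topic,
    # so downstream consumers (stream processors, time-series writers) can pick it up
    producer.send(KAFKA_TOPIC, key=msg.topic.encode(), value=msg.payload)

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(EMQ_HOST, 1883, keepalive=60)
client.loop_forever()
```

In practice EMQ's built-in rule engine or bridge plugins can play the same forwarding role without a separate process; the standalone consumer above is only meant to make the data path visible.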

4. Time-series data storage

Technical requirements for a time-series database (a minimal write/query sketch follows this list):
(1) Mass data storage (distributed storage)
(2) High concurrency and high throughput writing
(3) Multi-dimensional aggregation query
(4) OLAP analysis
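To make the write and aggregation requirements above concrete, here is a minimal sketch using InfluxDB 2.x and its official Python client (the URL, token, bucket, and measurement names are illustrative assumptions):

```python
# pip install influxdb-client
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Hypothetical connection settings
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Write path: one point per device reading, tagged so it can be aggregated by device later
write_api = client.write_api(write_options=SYNCHRONOUS)
point = Point("meter").tag("device", "meter-01").field("voltage", 229.8)
write_api.write(bucket="telemetry", record=point)

# Multi-dimensional aggregation query: mean voltage per device over 5-minute windows
flux = '''
from(bucket: "telemetry")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "meter" and r._field == "voltage")
  |> group(columns: ["device"])
  |> aggregateWindow(every: 5m, fn: mean)
'''
for table in client.query_api().query(flux):
    for record in table.records:
        print(record.values["device"], record.get_time(), record.get_value())
```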
Time-series database component options:
InfluxDB: the mainstream time-series database; the single-node version is free, while the clustered version is commercial.
TimescaleDB: built on PostgreSQL; suitable when the data volume is not too large, and it provides rich SQL functionality.
KairosDB: built on Cassandra; operations should be relatively simple, scalability should be good, and write performance is expected to be good, but it does not support SQL.
CrateDB: built on Elasticsearch and supports ANSI SQL; write performance and scalability should be good, though SQL support and read performance are expected to be weaker.
OpenTSDB: uses HBase as its underlying distributed storage engine, so it inherits HBase's advantages, but it does not support SQL.
Kudu: columnar storage positioned similarly to HBase; it supports update/insert, and SQL queries can be run on it through Impala or Spark.
Kudu is a relatively new open-source columnar storage system from Cloudera and a member of the Apache Hadoop ecosystem, designed for fast analytics on rapidly changing data. Most of Kudu's use cases are similar to HBase's, but its design trades some random read/write performance for better scan performance: in most scenarios Kudu achieves random read/write performance close to HBase while far exceeding HBase's scan performance, filling a gap in the Hadoop storage layer.
Compared with the HBase storage engine, Kudu has the following advantages (a small Impala + Kudu sketch follows this list):
① fast OLAP query processing;
② high compatibility with common systems in the Hadoop ecosystem such as MapReduce and Spark, with officially maintained connection drivers;
③ deep integration with Impala: compared with the traditional HDFS + Parquet + Impala architecture, Kudu + Impala performs better in most scenarios;
④ it can serve OLTP and OLAP requests at the same time, with good performance for both;
⑤ support for structured data in pure columnar storage, which saves space and provides more efficient queries.
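As a small illustration of the Kudu + Impala combination described above (this sketch assumes an Impala daemon reachable via the impyla Python client on its default port 21050; the table name and schema are hypothetical):

```python
# pip install impyla
from impala.dbapi import connect

# Hypothetical Impala endpoint
conn = connect(host="impala.example.com", port=21050)
cur = conn.cursor()

# Create a Kudu-backed table through Impala: columnar storage with a primary key,
# so both single-row upserts and analytical scans are supported
cur.execute("""
    CREATE TABLE IF NOT EXISTS telemetry (
        device_id STRING,
        ts BIGINT,
        voltage DOUBLE,
        PRIMARY KEY (device_id, ts)
    )
    PARTITION BY HASH (device_id) PARTITIONS 4
    STORED AS KUDU
""")

# OLTP-style upsert of a single reading
cur.execute("UPSERT INTO telemetry VALUES ('meter-01', 1598832000, 229.8)")

# OLAP-style aggregation over the same table
cur.execute("SELECT device_id, AVG(voltage) FROM telemetry GROUP BY device_id")
for device_id, avg_voltage in cur.fetchall():
    print(device_id, avg_voltage)
```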

Source: blog.csdn.net/weixin_44455388/article/details/108243668