7 advantages of using a time series database as the backend for industrial IoT data

Data characteristics and pain points of the Industrial Internet of Things

Data collection in the Industrial Internet of Things is high-frequency, spans many devices, and is high-dimensional. The data volume is very large, which places high demands on system throughput. At the same time, the Industrial Internet of Things often requires the system to process data in real time in order to provide early warning, monitoring, and even feedback control of the equipment. Many systems also need to provide graphical terminals so that operators can monitor equipment in real time, which puts further pressure on the whole system. The massive historical data that is collected usually also needs offline modeling and analysis. The data platform of the Industrial Internet of Things therefore faces very demanding requirements: it must offer very high throughput and low latency; it must process streaming data in real time as well as massive amounts of historical data; and it must serve both simple point queries and complex analysis of batch data.

Traditional transactional databases, such as SQL Server, Oracle, and MySQL, cannot keep up with high-throughput data writes and massive data analysis. Even when the data volume is small enough for them to handle the writes, they cannot serve real-time computing requests at the same time.

The Hadoop ecosystem provides multiple components, such as a message engine, real-time data writing, streaming computation, an offline data warehouse, and offline computation. Combining these big data systems can solve the data platform problem of the Industrial Internet of Things. However, such a solution is large and bloated, and its implementation, operation and maintenance costs are high.

Data is the lifeblood of the Industrial Internet of Things. However, the vast majority of domestic MES systems, including those in so-called smart factories, keep the massive process data generated during production for no more than three months, let alone study and exploit the accumulated data further. Real-time collection, computation and feedback control of data also place high demands on the real-time computing capability of the data platform behind the Industrial Internet of Things. Traditional relational databases, most open source NoSQL databases, and the new generation of NewSQL databases fall far short of these two demanding requirements: long-term storage of massive data and real-time computing.

This is where time series databases come in. Taking DolphinDB as an example, a time series database offers the following 7 natural advantages as an industrial IoT data backend.

1. One-stop data solution

The Industrial Internet of Things not only collects the process data generated by machines, but also performs real-time computation and early warning on it, and displays the results to the operator or feeds them directly back to the machine. At the same time, the raw process data needs to be saved to a database for online or offline queries. Once a large amount of historical data has accumulated, more complex big data mining can be carried out. All of this can be done in one system with DolphinDB. The following figure shows the data processing flow of the DolphinDB database.

[Figure: data processing flow of the DolphinDB database]

For system integrators and enterprises, developing and maintaining one system costs much less than integrating, developing and maintaining multiple systems, in terms of development cost, maintenance cost, and hardware purchase cost alike.

2. Lightweight cross-platform deployment

Industrial IoT platforms are usually very complex. The hardware ranges from cheap industrial computers (low-spec PCs or embedded systems) to servers and server clusters. Deployments include edge computing, on-premises platforms and cloud platforms. The operating systems involved include both Linux and Windows. Many of the open source and commercial time series databases on the market, as well as the related big data ecosystems, consist of many complex components, have a large footprint, and place high demands on software and hardware. Deploying a single system across all of these platforms is therefore very difficult.

DolphinDB is a very lightweight system. It is developed with GNU C++, the whole system is only a little over 20 megabytes in size, it has no dependencies, and it can be deployed on any of the platforms above. This greatly reduces development and maintenance costs for the system integrator.

3. Safe and controllable

The security and controllability of the data and systems of an industrial IoT platform are of vital importance to an enterprise and even to a country. DolphinDB is a distributed time series database developed entirely from scratch: from the underlying distributed file system and storage engine, to the database and core class library, the distributed computing engine, the scripting language, the development interfaces for various programming languages, and even the peripheral GUI integrated development environment and cluster management tools. It is 100% independently developed, has no external dependencies, and is safe and controllable.

In addition to supporting the x86 and ARM instruction sets, DolphinDB is also being adapted to the MIPS instruction set to support domestic CPUs such as Loongson. In this way, both the software and the hardware of an industrial IoT platform can be independently controlled.

4. Massive historical data storage and processing

Industrial IoT data collection is high-dimensional and high-frequency, involves a large number of devices, and produces a particularly large amount of data, all of it with high time precision. At present, most MES systems used in manufacturing rely on relational databases, which can only store process data for a short period and cannot retain all of the high-precision data. This limitation of the database system prevents enterprises from realizing the value of their historical data.

DolphinDB uses columnar storage, supports data compression (the compression rate is about 20%), handles time series data with up to nanosecond precision, and supports millions of partitions in a single table. The storage and computing capacity of a DolphinDB cluster can be scaled horizontally by adding nodes. A DolphinDB cluster supports multi-replica distributed storage and distributed transactions. When one replica of the data is corrupted or lost, another replica is used for recovery, which ensures high availability and strong consistency of the data. Companies can use the historical data accumulated over the years for in-depth data mining and analysis, such as predictive maintenance of equipment, improvement of manufacturing processes, improvement of product quality, and optimization of manufacturing plans.

Simply put, on the same hardware, relational databases (Oracle, SQL Server) can handle hundreds of millions of time series records, while DolphinDB can handle trillions.

5. Real-time stream computing

The real-time data collected by the Internet of Things can be handed to DolphinDB's stream computing engine for cleaning, real-time statistics, real-time storage, and real-time visualization. DolphinDB has natural stream-table duality: publishing a message is equivalent to appending a row to a stream table, and SQL can be used directly to insert into and query stream data, which is extremely convenient. DolphinDB's stream computing engine is based on a publish-subscribe-consume model. Data is published through a stream table, and other data nodes or third-party applications can subscribe to and consume the stream data through DolphinDB scripts or APIs, then feed the results back to the machine or operator in real time. For more on stream computing, please refer to the DolphinDB streaming data tutorial and the DolphinDB streaming data aggregation engine tutorial.
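The stream-table duality described above can be illustrated with a small plain-Python sketch. This is not DolphinDB's API: the StreamTable class, the overheat_alert handler and the 100.0 threshold are hypothetical names and values chosen only to show that appending a row to the table and publishing a message are the same operation.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class StreamTable:
    """Toy in-memory stream table: appending a row also publishes it."""

    def __init__(self) -> None:
        self.rows: List[Row] = []                         # the "table" view
        self.handlers: List[Callable[[Row], None]] = []   # the subscribers

    def subscribe(self, handler: Callable[[Row], None]) -> None:
        """Register a consumer that is called for every new row."""
        self.handlers.append(handler)

    def append(self, row: Row) -> None:
        """Store the row, then push it to every subscriber."""
        self.rows.append(row)
        for handler in self.handlers:
            handler(row)


alerts: List[Row] = []


def overheat_alert(row: Row) -> None:
    # Hypothetical rule: flag readings above 100 degrees.
    if row["temperature"] > 100.0:
        alerts.append(row)


readings = StreamTable()
readings.subscribe(overheat_alert)
readings.append({"equipmentId": 1, "temperature": 49.6})
readings.append({"equipmentId": 3, "temperature": 174.97})
```

After the two appends, readings.rows holds both rows and can be queried like any table, while alerts holds only the overheated reading that the subscriber reacted to.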

6. Rich computing functions

DolphinDB's computing functions are arguably the richest among the time series databases on the market. DolphinDB has a built-in scripting language that can perform complex calculations and interactive analysis directly in the database, avoiding data migration. Most computing functions have been optimized, and their performance far exceeds that of the equivalent functions in other databases. The commonly used computing functions in DolphinDB are listed below.

6.1 Range query

DolphinDB uses data pairs to express ranges. For example, to query the data in a certain time range of a table:

select * from table where date between beginDate:endDate

6.2 Multidimensional query

DolphinDB can filter and aggregate on different columns to support range queries of high or low dimensionality. For example, to filter on the field1 and field2 columns and then group and aggregate by them:

select sum(prc) from table where field1 in (1..100) and field2 = 'A' group by field1, field2

6.3 Sampling query

DolphinDB provides a partition-based sampling query mechanism. Partitions can be sampled by a specified ratio or number; you only need to call the sample function in the where clause. For example, with a table that is range-partitioned by device ID, to sample 10% of the partitions or exactly 10 partitions:

//Sampling 10% of the partitions
select * from trades where sample(equipmentId, 0.1)
//Sampling 10 partitions
select * from trades where sample(equipmentId, 10)

6.4 Precision query

DolphinDB has nanosecond time precision. It supports storing massive high-precision historical data, and also supports aggregating large high-precision data sets into small low-precision data sets for storage. DolphinDB also supports grouping at multiple time precisions. For example, to select the data between two dates and group it by minute:

select avg(tint) from t1 where date(timestamp) between 2018.01.01:2018.10.11 group by minute(timestamp)

DolphinDB also supports grouping at a custom precision. For example, to form one group every 5 seconds:

select avg(tint) from t1 where date(timestamp) between 2018.01.01:2018.10.11 group by bar(timestamp,5000)
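To make the 5-second grouping concrete, here is a minimal plain-Python sketch of the idea. It assumes that bar(x, interval) rounds x down to the nearest multiple of interval and that timestamps are in milliseconds; the bar and avg_by_bar helpers and the sample data are illustrative, not DolphinDB code.

```python
from collections import defaultdict
from statistics import mean


def bar(x: int, interval: int) -> int:
    """Round x down to the start of its interval (assumed bar semantics)."""
    return x - x % interval


def avg_by_bar(samples, interval_ms):
    """Group (timestamp_ms, value) pairs into fixed-width bars and average each."""
    groups = defaultdict(list)
    for ts_ms, value in samples:
        groups[bar(ts_ms, interval_ms)].append(value)
    return {start: mean(values) for start, values in sorted(groups.items())}


# Made-up readings: (milliseconds since start, value)
samples = [(0, 10.0), (1200, 20.0), (4999, 30.0), (5000, 40.0), (9000, 60.0)]
result = avg_by_bar(samples, 5000)
```

Here the first three readings fall into the bar starting at 0 ms and the last two into the bar starting at 5000 ms, so each bar yields one averaged row.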

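6.5 Interpolation query

In industrial settings, collected data often has gaps. DolphinDB provides 4 interpolation methods for filling in missing data during query computation: forward and backward fill with the nearest non-null value (ffill/bfill), linear fill (lfill), and fill with a specified value (nullFill). Users can also add new interpolation functions through scripts or C++ plugins.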
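The four fill strategies can be sketched in plain Python as follows. These helpers only mirror the names ffill, bfill, lfill and nullFill to show the intended behavior; they are not DolphinDB's implementations. In this sketch, lfill only fills gaps that have a known value on both sides, which is an assumption.

```python
from typing import List, Optional

Series = List[Optional[float]]


def ffill(xs: Series) -> Series:
    """Forward fill: replace each gap with the previous non-null value."""
    out, last = [], None
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out


def bfill(xs: Series) -> Series:
    """Backward fill: replace each gap with the next non-null value."""
    return ffill(xs[::-1])[::-1]


def null_fill(xs: Series, value: float) -> Series:
    """Replace every gap with a fixed value."""
    return [value if x is None else x for x in xs]


def lfill(xs: Series) -> Series:
    """Linear fill between neighbouring known values (edges are left as gaps)."""
    out = list(xs)
    known = [i for i, x in enumerate(xs) if x is not None]
    for left, right in zip(known, known[1:]):
        step = (xs[right] - xs[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = xs[left] + step * (i - left)
    return out


# Made-up temperature series with missing samples
temps = [None, 20.0, None, None, 26.0, None]
```

For this series, forward fill leaves the leading gap empty, backward fill leaves the trailing gap empty, and linear fill places 22.0 and 24.0 between the two known readings.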

6.6 Aggregate query

DolphinDB's function library is very rich and supports the following aggregate functions: atImax, atImin, avg, beta, contextCount, contextSum, contextSum2, count, corr, covar, derivative, difference, first, imax, last, lastNot, max, maxPositiveStreak, mean, med, min, mode, percentile, rank, stat, std, sum, sum2, var, wavg, wsum, zscore.

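6.7 Grouped query on panel data

When processing panel data, you sometimes want to generate a value for every row within each group. DolphinDB provides context by and moving-window statistical functions for this.

DolphinDB supports the following moving-window statistical functions: deltas, mavg, mbeta, mcorr, mcount, mcovar, mimax, mimin, mmax, mmed, mmin, mpercentile, mrank, mstd, msum, mvar, ratios.

For example, to compute the moving average temperature over the past 10 readings of each device (t here stands for the table that holds the readings):

select equipmentId, mavg(temperature,10) as mavg_temperature from t context by equipmentId

DolphinDB has optimized some of the moving-window statistical functions: each computation makes full use of the result of the previous window, which minimizes repeated computation.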
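The idea of reusing the previous window's result can be sketched in plain Python with a running sum. This illustrates the general technique only, not DolphinDB's internal implementation; the moving_avg and moving_avg_by_device helpers are hypothetical, and returning None until a full window is available is an assumption of this sketch.

```python
from collections import deque


def moving_avg(values, window):
    """Moving average that updates a running sum instead of re-summing."""
    out = []
    buf = deque()
    running = 0.0
    for v in values:
        buf.append(v)
        running += v                    # add the newest value
        if len(buf) > window:
            running -= buf.popleft()    # drop the value that left the window
        out.append(running / window if len(buf) == window else None)
    return out


def moving_avg_by_device(rows, window):
    """Apply moving_avg separately to each device, like a per-group window."""
    by_device = {}
    for device, temp in rows:
        by_device.setdefault(device, []).append(temp)
    return {device: moving_avg(temps, window) for device, temps in by_device.items()}


# Made-up readings: (device id, temperature)
rows = [("A", 1.0), ("B", 10.0), ("A", 3.0), ("B", 20.0)]
per_device = moving_avg_by_device(rows, 2)
```

Each new value costs one addition and at most one subtraction, regardless of the window size, instead of summing the whole window again.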

6.8 Comparison query

DolphinDB's pivot by can be used to pivot data, in particular to compare metrics across columns at the same time. For example, to compare the average temperature of different devices over the same time periods, you can use the following code:

equipmentId = `A`B`B`B`C`C`A`A`A$symbol;
temperature = 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29;
timestamp = [09:34:07,09:35:42,09:36:51,09:36:59,09:35:47,09:36:26,09:34:16,09:35:26,09:36:12];
t = table(timestamp, equipmentId, temperature)
select avg(temperature) from t pivot by timestamp.minute() as minute, equipmentId

The result is:

minute    A        B        C
09:34m    50.18
09:35m    50.32    29.46    174.97
09:36m    51.29    29.77    175.23
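The pivot result can be cross-checked with a few lines of plain Python. This sketch only reproduces the arithmetic, averaging temperature per (minute, device) cell; the pivot_avg helper is illustrative and is not how DolphinDB evaluates pivot by.

```python
from collections import defaultdict
from statistics import mean

# Same readings as in the example: (time, device, temperature)
rows = [
    ("09:34:07", "A", 49.6), ("09:35:42", "B", 29.46), ("09:36:51", "B", 29.52),
    ("09:36:59", "B", 30.02), ("09:35:47", "C", 174.97), ("09:36:26", "C", 175.23),
    ("09:34:16", "A", 50.76), ("09:35:26", "A", 50.32), ("09:36:12", "A", 51.29),
]


def pivot_avg(rows):
    """Average temperature for every (minute, device) cell that has data."""
    cells = defaultdict(list)
    for ts, device, temp in rows:
        minute = ts[:5]                 # "HH:MM" prefix of "HH:MM:SS"
        cells[(minute, device)].append(temp)
    return {key: round(mean(vals), 2) for key, vals in cells.items()}


pivot = pivot_avg(rows)
```

Cells with no readings, such as device B at 09:34, are simply absent from the dictionary, which corresponds to the empty cells in the result table.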

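6.9 Join query

DolphinDB supports many kinds of joins, including equi join, full join, cross join, left join, asof join and window join. Of these, asof join and window join are designed specifically for time series data and cover a wider range of scenarios.

When the time fields of two tables do not match exactly, you can use asof join: if the time in the left table is t, it automatically selects the most recent time in the right table that does not exceed t. Window join is an extension of asof join: if the window is w1:w2, it selects the data between (t+w1) and (t+w2) in the right table and applies aggregate functions to it. For example:

select equipmentId,t1.temperature,t2.humidity from aj(t1,t2,`timestamp)
select * from wj(t1,t2,-5:0,<avg(temperature)>,`equipmentId`timestamp)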
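The matching rules of the two time series joins can be sketched in plain Python with binary search over a sorted time column. The asof_lookup and window_avg helpers are hypothetical and work on a single key; treating both window bounds as inclusive is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right


def asof_lookup(right_times, right_values, t):
    """Value at the most recent right-table time that does not exceed t."""
    i = bisect_right(right_times, t)
    return right_values[i - 1] if i > 0 else None


def window_avg(right_times, right_values, t, w1, w2):
    """Average of right-table values with time in [t + w1, t + w2]."""
    lo = bisect_left(right_times, t + w1)
    hi = bisect_right(right_times, t + w2)
    window = right_values[lo:hi]
    return sum(window) / len(window) if window else None


# Made-up right table, sorted by time
humidity_times = [1, 4, 6, 9]
humidity_values = [40.0, 42.0, 44.0, 50.0]
```

For a left-table time of 5, the asof lookup returns the reading taken at time 4, and a window of -5:0 around time 6 averages the readings taken at times 1, 4 and 6.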

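6.10 Machine learning and distributed computing

DolphinDB provides distributed computing frameworks such as map-reduce and iterative map-reduce. Users only need to specify the data source, the map function, the reduce function and the final function, and can use them online directly without compiling or deploying anything. For convenience, DolphinDB has built-in implementations of commonly used fitting and classification algorithms, which can be used on both local and distributed data sources. These include linear regression, generalized linear models (GLM), random forest, and logistic regression. More machine learning algorithms will be released in the future.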
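The division of work between the map, reduce and final functions can be sketched in plain Python. This map_reduce helper is a single-process illustration with a hypothetical signature, not DolphinDB's distributed framework; the partitions stand in for the pieces of a partitioned data source.

```python
from functools import reduce


def map_reduce(partitions, map_func, reduce_func, final_func):
    """Run map on each partition, fold the partial results, then finalize."""
    partials = [map_func(p) for p in partitions]
    return final_func(reduce(reduce_func, partials))


# Made-up temperature readings, split into three partitions
partitions = [[20.0, 22.0], [30.0], [25.0, 27.0, 26.0]]

mean_temp = map_reduce(
    partitions,
    map_func=lambda p: (sum(p), len(p)),                  # per-partition sum and count
    reduce_func=lambda a, b: (a[0] + b[0], a[1] + b[1]),  # merge two partial results
    final_func=lambda acc: acc[0] / acc[1],               # turn totals into a mean
)
```

Because each partition only contributes a small (sum, count) pair, the raw readings never need to be moved to one place to compute the overall mean.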

 

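Beyond the existing features, DolphinDB offers several ways to extend the system. DolphinDB has a powerful built-in scripting language that resembles SQL and Python, and users can define their own functions in it. DolphinDB also supports plugins developed in C++. In addition, DolphinDB provides APIs for languages and systems such as C++, C#, Java, Python, R, JS and Excel, making it easy to integrate with other systems.

7. Low overall cost of ownership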

The profit margins of industrial enterprises are not high. If the cost of the data platform (software and hardware purchase, system integration, maintenance, application development, and so on) is too high, it will severely limit the development of the Industrial Internet of Things. The one-stop solution, cross-platform deployment, powerful processing of real-time and massive historical data, rich computing functions and extensibility of a time series database such as DolphinDB greatly reduce the overall cost of ownership of the system.



Origin blog.51cto.com/15022783/2569341