Monitoring and Early Warning: DolphinDB Time Series Database Anomaly Detection Engine Tutorial

1. Overview

IoT devices (such as machine tools, boilers, elevators, water meters, and gas meters) continuously generate massive amounts of device status data and business message data. Collecting, computing, and analyzing these data often involves detecting anomalous values.

As a high-performance distributed time series database, DolphinDB has a built-in streaming data framework that can process and analyze IoT data in real time as well as run computations over historical data, helping users extract value from these data. The framework supports streaming data publishing, subscription, preprocessing, real-time in-memory computation, and rolling-window computation of complex indicators. It is an efficient and convenient streaming data processing framework. For details, please refer to the DolphinDB Streaming Data Tutorial.

To meet the need for anomaly detection, DolphinDB provides an anomaly detection engine built on the streaming data framework. Users only need to specify anomaly indicators, and the engine detects anomalous data in real time.

2. Anomaly detection engine framework

DolphinDB's anomaly detection engine is built on the publish-subscribe model of streaming data. In the following example, an anomaly detection engine is created with the createAnomalyDetectionEngine function, and the stream table is subscribed to with the subscribeTable function. Each time new data arrives, append!{engine} is triggered, continuously feeding the streaming data into the anomaly detection engine. The engine checks in real time whether the data meets the user-defined alarm indicator temp > 65; any anomalous records are written to the table outputTable.

share streamTable(1000:0, `time`device`temp, [TIMESTAMP, SYMBOL, DOUBLE]) as sensor
share streamTable(1000:0, `time`device`anomalyType`anomalyString, [TIMESTAMP, SYMBOL, INT, SYMBOL]) as outputTable
engine = createAnomalyDetectionEngine("engine1", <[temp > 65]>, sensor, outputTable, `time, `device, 10, 1)
subscribeTable(, "sensor", "sensorAnomalyDetection", 0, append!{engine}, true)

Here is a brief introduction to some concepts involved in the anomaly detection engine:

  • Stream table: DolphinDB provides a dedicated table object for streaming data, which publishes the stream. Through the subscribeTable function, other nodes or applications can subscribe to and consume the streaming data.
  • Anomaly detection engine data source: the channel that supplies "raw material" to the engine. The createAnomalyDetectionEngine function returns an abstract table; writing data to this abstract table means the data enters the anomaly detection engine for computation.
  • Anomaly indicators: a set of Boolean expressions, supplied as metacode, for processing the streaming data. They can contain aggregate functions to support complex scenarios.
  • Data window: the length of the stream data window used in each computation. The data window is only meaningful when an indicator contains aggregate functions.
  • Output table: the first column of the engine's output table must be of a time type and stores the timestamp at which an anomaly was detected. If a grouping column is specified, the second column is the grouping column. The following two columns are of INT and STRING or SYMBOL type, recording the anomaly type (the subscript of the anomaly indicator expression in metrics) and the anomaly content.

3. Anomaly indicators

The indicators in the anomaly detection engine must return Boolean values. An indicator is generally a function or an expression. When an indicator contains an aggregate function, the window length and computation interval must be specified; the engine then computes the indicator over a fixed-length moving window at regular intervals. Anomaly indicators generally fall into three types:

  • Indicators containing only column names or non-aggregate functions, such as qty > 10 or lt(qty, prev(qty)). For such indicators, the engine evaluates each arriving record to decide whether it meets the condition and should be output.
  • Indicators in which every column name appears as a parameter of an aggregate function, such as avg(qty - price) > 10, percentile(qty, 90) < 100, or sum(qty) > prev(sum(qty)). For such indicators, the engine performs the aggregation only when the window moves, similar to the time-series aggregator.
  • Indicators in which some column names appear as aggregate-function parameters and some do not, such as avg(qty) > qty or le(med(qty), price). For such indicators, the engine performs the aggregate computations when the window moves, and evaluates each arriving record using the aggregate values from the most recently computed window.
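To illustrate the third (mixed) type, here is a small Python sketch of the evaluation semantics, not DolphinDB code, and simplified to tumbling windows: the aggregate is refreshed only at window boundaries, while the per-record comparison runs on every record against the most recent aggregate.

```python
# Hypothetical Python sketch (simplified to tumbling windows) of how a
# mixed indicator such as avg(qty) > qty is evaluated: the aggregate comes
# from the most recently completed window; the comparison runs per record.
def check_mixed_indicator(rows, window_size):
    anomalies = []          # indices of records flagged as anomalous
    window = []             # records buffered for the current window
    last_window_avg = None  # aggregate value of the last completed window
    for i, qty in enumerate(rows):
        # per-record check: avg(qty) > qty, using the latest window aggregate
        if last_window_avg is not None and last_window_avg > qty:
            anomalies.append(i)
        window.append(qty)
        if len(window) == window_size:  # window boundary: refresh aggregate
            last_window_avg = sum(window) / window_size
            window = []
    return anomalies

print(check_mixed_indicator([10, 20, 5, 30, 2, 40], 2))  # [2, 4]
```

Records 2 and 4 are flagged because their values (5 and 2) fall below the averages of the preceding windows (15 and 17.5).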

4. Data Window

When an anomaly indicator contains an aggregate function, the user must specify the data window. Streaming aggregation is performed over a fixed-length moving window at regular intervals. The window length is set by the parameter windowSize; the computation interval is set by the parameter step.

With multiple groups of data, if each group's data-window boundary were set by the time its first record enters the system, the results of different groups could generally not be compared over the same window. Therefore, the system determines an integer alignment size, alignmentSize, based on the value of the parameter step, and uses it to align the boundary of each group's first data window.

(1) If the time type of the data is MONTH, January of the year of the first record is used as the upper boundary of the window.

(2) If the time type of the data is DATE, the boundary of the first data window is not adjusted.

(3) If the time precision of the data is minutes or seconds (types MINUTE, DATETIME, or SECOND), the value of alignmentSize is determined as follows:

step     alignmentSize
0~2      2
3~5      5
6~10     10
11~15    15
16~20    20
21~30    30
31~60    60

(4) If the time precision of the data is milliseconds (types TIMESTAMP or TIME), the value of alignmentSize is determined as follows:

step       alignmentSize
0~2        2
3~5        5
6~10       10
11~20      20
21~25      25
26~50      50
51~100     100
101~200    200
201~250    250
251~500    500
501~1000   1000

Suppose the value of the smallest time unit of the first record's timestamp is x. Then the value of the smallest time unit of the left boundary of the first window after alignment is x/alignmentSize*alignmentSize, where / denotes integer division. For example, if the time of the first record is 2018.10.08T01:01:01.365, then x=365. If step=100, then according to the table above alignmentSize=100, so the aligned left boundary is 365/100*100=300, and the first data window spans 2018.10.08T01:01:01.300 to 2018.10.08T01:01:01.400.
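The alignment rule for millisecond-precision data can be reproduced in a few lines of Python (an illustrative sketch of the table and formula above, not DolphinDB code):

```python
# Sketch of the window-boundary alignment rule for millisecond precision:
# look up alignmentSize from step, then round the millisecond part of the
# first timestamp down to a multiple of alignmentSize.
ALIGNMENT_MS = [(2, 2), (5, 5), (10, 10), (20, 20), (25, 25), (50, 50),
                (100, 100), (200, 200), (250, 250), (500, 500), (1000, 1000)]

def alignment_size(step):
    for upper, size in ALIGNMENT_MS:
        if step <= upper:
            return size
    raise ValueError("step out of range")

def align_left_boundary(ms_part, step):
    a = alignment_size(step)
    return ms_part // a * a  # integer division, then multiply back

# first record at ...01.365 with step=100 -> boundary at ...01.300
print(align_left_boundary(365, 100))  # 300
```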

5. Application examples

5.1 Application scenarios

Now simulate a sensor device collecting temperatures. Assume the window length is 6 ms, the window moves every 3 ms, and a temperature is collected every 1 ms (matching windowSize=6 and step=3 in the code below). The following anomaly indicators are specified:

  • A single collected temperature exceeds 65;
  • A single collected temperature exceeds the 75th percentile of the previous window;
  • The relative error between the average temperature of the current window and that of the previous window is greater than 1%.

5.2 System design

The collected data is stored in a stream table. The anomaly detection engine obtains real-time data by subscribing to this stream table, performs anomaly detection, and outputs the records that meet an anomaly indicator to another table.

5.3 Implementation steps

(1) Define the stream table sensor to store the collected data:

share streamTable(1000:0, `time`temp, [TIMESTAMP, DOUBLE]) as sensor

(2) Define the anomaly detection engine and the output table outputTable, which is also a stream table:

share streamTable(1000:0, `time`anomalyType`anomalyString, [TIMESTAMP, INT, SYMBOL]) as outputTable
engine = createAnomalyDetectionEngine("engine1", <[temp > 65, temp > percentile(temp, 75), abs((avg(temp) - prev(avg(temp))) / avg(temp)) > 0.01]>, sensor, outputTable, `time, , 6, 3)

(3) Subscribe the anomaly detection engine to the stream table sensor:

subscribeTable(, "sensor", "sensorAnomalyDetection", 0, append!{engine}, true)

(4) Write 10 records to the stream table sensor to simulate temperature collection:

timev = 2018.10.08T01:01:01.001 + 1..10
tempv = 59 66 57 60 63 51 53 52 56 55
insert into sensor values(timev, tempv)

View the contents of the stream table sensor:

time                       temp
2018.10.08T01:01:01.002    59
2018.10.08T01:01:01.003    66
2018.10.08T01:01:01.004    57
2018.10.08T01:01:01.005    60
2018.10.08T01:01:01.006    63
2018.10.08T01:01:01.007    51
2018.10.08T01:01:01.008    53
2018.10.08T01:01:01.009    52
2018.10.08T01:01:01.010    56
2018.10.08T01:01:01.011    55

Then check the result table outputTable:

time                      anomalyType    anomalyString
2018.10.08T01:01:01.003    0             temp > 65
2018.10.08T01:01:01.003    1             temp > percentile(temp, 75)
2018.10.08T01:01:01.005    1             temp > percentile(temp, 75)
2018.10.08T01:01:01.006    2             abs((avg(temp) - prev(avg(temp))) / avg(temp)) > 0.01
2018.10.08T01:01:01.006    1             temp > percentile(temp, 75)
2018.10.08T01:01:01.009    2             abs((avg(temp) - prev(avg(temp))) / avg(temp)) > 0.01

The computation process of the anomaly detection engine is explained in detail below. For readability, the common prefix 2018.10.08T01:01:01 is omitted from the timestamps, and only the millisecond part is shown.

(1) The indicator temp > 65 contains only the column temp, which is not an aggregate-function parameter, so it is evaluated on every arriving record. In the simulated data, only the temperature at 003 meets the condition and is detected as an anomaly.

(2) In the indicator temp > percentile(temp, 75), the column temp appears both as a parameter of the aggregate function percentile and on its own. Therefore, each arriving record's temp is compared with percentile(temp, 75) computed over the most recent window. The first window is aligned based on the time 002 of the first record; after alignment its left boundary is 000. The first window spans 000 to 002 and contains only the record at 002, so percentile(temp, 75) evaluates to 59. Records 003 to 005 are compared with this value, and 003 and 005 meet the condition. The second window spans 000 to 005; percentile(temp, 75) evaluates to 60, and among records 006 to 008 only 006 meets the condition. The third window spans 003 to 008; percentile(temp, 75) evaluates to 63, and none of the records 009 to 011 meets the condition. After the last record at 011 arrives, no new window computation has been triggered.
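The percentile values in this walkthrough (59, 60, 63) are consistent with a nearest-rank percentile; the following Python sketch reproduces them under that assumption (DolphinDB's exact interpolation method is not specified here and may differ):

```python
# Hypothetical nearest-rank percentile; the walkthrough's values are
# consistent with this definition, but it is an assumption, not the
# documented DolphinDB implementation of percentile().
def pct_nearest(vals, p):
    s = sorted(vals)
    return s[round(p / 100 * (len(s) - 1))]

print(pct_nearest([59], 75))                      # 59 (first window)
print(pct_nearest([59, 66, 57, 60], 75))          # 60 (second window)
print(pct_nearest([66, 57, 60, 63, 51, 53], 75))  # 63 (third window)
```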

(3) In the indicator abs((avg(temp) - prev(avg(temp))) / avg(temp)) > 0.01, temp appears only as a parameter of the aggregate function avg, so the indicator is checked only when a window is computed. As in the previous analysis, avg(temp) over the first three windows is 59, 60.5, and 58.33 respectively. The condition abs((avg(temp) - prev(avg(temp))) / avg(temp)) > 0.01 is satisfied at the computation times of the second and third windows, 006 and 009.
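The window averages and relative-error checks above can be recomputed in plain Python (window boundaries as derived in the walkthrough: a partial first window ending before 003, then 6 ms windows computed every 3 ms):

```python
# Recompute the three window averages from the walkthrough (plain Python,
# not DolphinDB). Keys are the millisecond parts of the timestamps.
temps = {2: 59, 3: 66, 4: 57, 5: 60, 6: 63, 7: 51, 8: 53, 9: 52, 10: 56, 11: 55}

def window_avg(start_ms, end_ms):
    vals = [t for ms, t in temps.items() if start_ms <= ms < end_ms]
    return sum(vals) / len(vals)

w1 = window_avg(0, 3)  # first (partial) window: only record 002 -> 59.0
w2 = window_avg(0, 6)  # records 002-005 -> 60.5
w3 = window_avg(3, 9)  # records 003-008 -> 58.33...

# relative-error indicator, checked at window computation times 006 and 009
print(abs(w2 - w1) / w2 > 0.01)  # True -> anomaly output at 006
print(abs(w3 - w2) / w3 > 0.01)  # True -> anomaly output at 009
```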

5.4 Monitoring the status of the anomaly detection engine

getAggregatorStat().AnomalDetectionAggregator
name    user  status lastErrMsg numGroups numRows numMetrics metrics             
------- ----- ------ ---------- --------- ------- ---------- --------------------
engine1 guest OK                0         10      3          temp > 65, temp > percentile(temp, 75), abs((avg(temp) - prev(avg(temp))) / avg(temp)) > 0.01

5.5 Deleting the anomaly detection engine

removeAggregator("engine1")

6. Introduction to the createAnomalyDetectionEngine function

Syntax

createAnomalyDetectionEngine(name, metrics, dummyTable, outputTable, timeColumn, [keyColumn], [windowSize], [step], [garbageSize]) 

Return value

The createAnomalyDetectionEngine function returns a table object. Writing data to this table means the data enters the anomaly detection engine for computation.

Parameters

  • name: a string, the name of the anomaly detection engine and its unique identifier. It can contain letters, numbers, and underscores, but must start with a letter.
  • metrics: metacode whose return values must be of BOOL type. Each element can be a function or an expression, such as <[qty > 5, eq(qty, price)]>. Built-in or user-defined aggregate functions (defined with the defg keyword) can be used, such as <[sum(qty) > 5, lt(avg(price), price)]>. For details, please refer to metaprogramming.
  • dummyTable: a table object. It does not need to contain data, but its schema must be the same as that of the subscribed stream table.
  • outputTable: a table object that stores the computation results. Its first column must be of a time type, storing the timestamp at which the anomaly was detected, and its data type must match that of the time column of dummyTable. If the parameter keyColumn is specified, the second column of outputTable is keyColumn. The next two columns are of INT and STRING/SYMBOL type, recording the anomaly type (the subscript in metrics) and the anomaly content.
  • timeColumn: a string scalar, the name of the time column of the input stream table.
  • keyColumn: a string scalar, the grouping column. The anomaly detection engine groups the input data by keyColumn and performs aggregations within each group. It is an optional parameter.
  • windowSize: a positive integer. When metrics contains an aggregate function, windowSize must be specified, indicating the length of the data window used for aggregation. If metrics contains no aggregate function, this parameter has no effect.
  • step: a positive integer. When metrics contains an aggregate function, step must be specified, indicating the interval between computations. windowSize must be an integer multiple of step, otherwise an exception is thrown. If metrics contains no aggregate function, this parameter has no effect.
  • garbageSize: a positive integer. It is optional, with a default value of 50,000. If keyColumn is not specified, then when the number of historical records in memory exceeds garbageSize, the system cleans up the historical data not needed for the current computation. If keyColumn is specified, memory is cleaned independently for each group: when the number of a group's historical records exceeds garbageSize, the historical data no longer needed by that group is cleaned up; otherwise that group's data is left untouched. If metrics contains no aggregate function, this parameter has no effect.
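To make the garbageSize behavior concrete, here is a hypothetical Python sketch of the idea (names and logic are assumptions for illustration, not DolphinDB internals): a group's buffered history is trimmed only once it exceeds the threshold, keeping the rows that future windows still need.

```python
# Hypothetical illustration of the garbageSize idea: per-group history is
# trimmed only when it exceeds garbage_size, keeping the most recent rows
# that future window computations still need.
def trim_history(history, garbage_size, rows_still_needed):
    if len(history) > garbage_size:
        return history[-rows_still_needed:]  # drop rows no window will reuse
    return history

history = list(range(100))
print(len(trim_history(history, 50, 6)))    # 6 (trimmed: 100 > 50)
print(len(trim_history([1, 2, 3], 50, 6)))  # 3 (untouched: 3 <= 50)
```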

7. Summary

The anomaly detection engine provided by DolphinDB is a lightweight, easy-to-use streaming computation engine. Working together with stream tables, it performs real-time anomaly detection on streaming data and meets the real-time monitoring and early-warning needs of the IoT.

Origin blog.csdn.net/qq_41996852/article/details/110931593