eKuiper 1.10.0 released: scheduled rules and EdgeX v3 adaptation

After two months of development, we are happy to announce that eKuiper 1.10.0 is now officially released!

As a milestone version, eKuiper 1.10.0 upgrades its basic dependencies: the Go version is bumped to 1.20, EdgeX support moves to the latest major release Minnesota (v3), and more. We also continue to improve the product's expressiveness, connectivity, and ease of use, while keeping it lightweight and compact for edge deployment.

The new features and improvements of the latest version mainly include the following aspects:

  • Rule management: Rules can now be scheduled to run during planned time windows, enabling a degree of rule autonomy at the edge.
  • Connection ecology: More data sources and sinks have been added or improved, including EdgeX v3, Kafka sink, and file sink. Sources and sinks now support more efficient data transformations such as data extraction, batching, and compression, helping users connect a wider range of sources and targets and adapt to more complex data structures.
  • Expressiveness: More functions and syntax have been added, such as array and object processing, external state support, and dynamic array subscripts, helping users implement more complex data processing.

Please check the changelog for the full list of changes.

Scheduled execution of rules

In some scenarios, user data arrives periodically. To save runtime resources, users want a rule to stop when there is no data and run only during specified time windows, for example once every morning or once a week. Rules can be started and stopped manually through eKuiper's API, but with large-scale edge deployments manual operation is not feasible, so rule autonomy at the edge becomes essential. The new version therefore adds scheduled execution: users specify when a rule should run, and the rule starts and stops automatically within the given time window without manual intervention.

Two parameters have been added to the rule options: cron and duration.

  • The cron parameter specifies the schedule of the rule, following the cron expression format. For example, 0 0 0 * * * means run once every day at 0:00 am.
  • The duration parameter specifies how long each run lasts, given as a duration string made up of a number and a time unit. For example, 1m means run for 1 minute.

For example, the following rule is scheduled through the parameters in options to run every two minutes, with each run lasting 1 minute.

{
  "id": "ruleSchedule",
  "sql": "SELECT * FROM demo;",
  "options": {
    "cron": "*/2 * * * *",
    "duration": "1m"
  },
  "actions": [{
    "mqtt":  {
      "server": "tcp://broker.emqx.io:1883",
      "topic": "result/rule"
    }
  }]
}

Note that duration should not exceed the interval between two cron triggers, otherwise unexpected behavior will result.

Adding, deleting, modifying, and querying the status of scheduled rules works the same way as for ordinary rules, through the API or CLI. While a scheduled run is in progress, the rule is in the Running state. When the run's duration expires, the rule stops automatically and waits for the next trigger, and its status changes to "Stopped: waiting for next schedule.". Stopping a scheduled rule with the Stop command stops it immediately and removes it from the scheduler.

Flexible adaptation of data sources and targets

eKuiper is the default rule engine implementation of EdgeX Foundry. EdgeX Minnesota (v3) is an important upcoming release, and eKuiper has been updated in step with it. We have also added more data sources and sinks, such as the Kafka sink and the file sink, enabling eKuiper to connect to a wider range of data sources and targets and access various data infrastructures more conveniently.

EdgeX v3 support

EdgeX v3 is the next major version of EdgeX Foundry, and eKuiper 1.10.0 already supports it. eKuiper's EdgeX source and sink have been updated and adapted, so existing rules can be migrated to EdgeX v3 seamlessly.

At the same time, in our tests, eKuiper 1.10 remains compatible with EdgeX v2, so users can choose EdgeX v2 or v3 according to their needs.

Note that since ZeroMQ bus support has been removed from EdgeX, we have also removed the zmq protocol support from eKuiper's EdgeX source/sink. Users of the default Redis bus or the MQTT bus are not affected.

More powerful file sink

The file system is provided by the operating system kernel and requires no external dependencies, so it is applicable in almost any deployment environment, especially on resource-constrained systems. With the file sink, data can be persisted in batches in environments with strict security requirements or no network connection, and then transferred to other systems by other means, bridging network gaps. In low-bandwidth environments, data can be written to files in batches and compressed before transmission, achieving a higher compression ratio and reducing bandwidth consumption.

Continuing the file connector optimization from the previous version, the file sink now supports more file types, such as csv, json, and lines, as well as more data transformations, such as data extraction, batching, and compression, making it easier to adapt to more applications. In addition, file writing supports custom rolling (splitting) strategies, handling larger data volumes with more convenient management.

The main highlights of the new version of File Sink are:

  • Multiple file formats are supported, and the written files can be read back by the File source, enabling round-trip transfer of data.
  • Multiple rolling strategies are supported:
    • Split by time, with a configurable interval between files
    • Split by number of messages
    • Add a timestamp to the rolled file name to avoid name collisions, with a configurable position for the timestamp
  • Writing to multiple files, i.e. dynamic file names, is supported. Messages can be routed into different files based on their content.
  • Write performance is optimized with batch writes. When writing to multiple files, concurrent writing is supported and timers are shared to improve efficiency.
  • Compression is supported, with gzip and zstd available.

All of these capabilities are configurable through properties. Below is an example of a rule using the file sink. The path uses a dynamic file name, so messages are written to different files according to their content. The file type is set to csv, the properties starting with rolling configure the file splitting strategy, and compression configures the compression method, here gzip. For detailed configuration instructions, please refer to the product documentation.

{
  "id": "fileSinkRule",
  "sql": "SELECT * from demo ",
  "actions": [
     {
      "file": {
        "path": "{
   
   {.device}}_{
   
   {.ts}}.csv.gzip",
        "format": "delimited",
        "delimiter": ",",
        "hasHeader": true,
        "fileType": "csv",
        "rollingCount": 10,
        "rollingInterval": 0,
        "rollingNamePattern": "none",
        "compression":"gzip"
      }
    }
  ]
}

Kafka Sink

Kafka is a distributed messaging system with high throughput, high availability, scalability, and durability. The new version adds a Kafka sink, which writes eKuiper's output into Kafka, providing a seamless connection between eKuiper and Kafka. A usage example is as follows:

{
  "id": "kafka",
  "sql": "SELECT * from demo",
  "actions": [
    {
      "kafka":{
        "brokers": "127.0.0.1:9091",
        "topic": "sample_topic",
        "saslAuthType": "none"
      }
    }
  ]
}

Database Support Optimization

The SQL source/sink plugin has received several optimizations in the new version, mainly including:

  1. Updated the ClickHouse driver and verified ClickHouse support.
  2. Added support for the Dameng database.
  3. Added connection pool configuration to improve the efficiency of database connections.

Users can configure the sql/maxConnections property in the configuration file etc/kuiper.yaml, or through environment variables, to specify the maximum number of connections in the database connection pool and avoid performance problems caused by too many connections. An example looks like this:

  sql:
    # maxConnections indicates the max connections for the certain database instance group by driver and dsn sharing between the sources/sinks
    # 0 indicates unlimited
    maxConnections: 0

Sink data transformation

The user's data may have a nested structure. For example, data ingested from Neuron usually contains some metadata, and only the values field in the payload is the data the user needs. Also, when complex SQL statements are used, some intermediate calculation results may be defined in the SELECT clause that do not need to be output to the sink. In these cases, the sink side needs to transform or format the data again. Data templates are a common way to do this and are powerful and flexible, but they require users to be able to write templates, and their runtime performance is relatively poor.

In the new version, the sink side supports more common data transformations, including data extraction and batch-sending properties, now extended to most sink types. For common simple transformations, users only need to configure parameters, which reduces the effort of writing templates and improves runtime efficiency.

Batch sending

By default, the sink produces one output for each event. If the data throughput is high, this causes problems: high I/O overhead, a low compression ratio if compression is used, and significant network overhead when sending to the cloud. A high send rate may also increase the processing pressure on the cloud side. To address this, the new version adds batch sending to the MQTT sink.

With batch sending, the sink caches data according to a configured strategy and then sends it in one batch. Users can control the batch size and the time interval with the batchSize and lingerInterval parameters. An example looks like this:

{
    "id": "batch",
    "sql": "select a,b from demo",
    "actions": [
       {
        "log": {
        },
        "mqtt": {
          "server": "tcp://broker.emqx.io:1883",
          "topic": "devices/messages",
          "lingerInterval": 10000,
          "batchSize": 3
        }
      }
    ]
}

In this example, the sink sends data whenever 3 messages have accumulated or 10 seconds have elapsed. Users can adjust these two parameters according to their needs.

Data extraction

When intermediate data is used, or the format of the calculated data does not match the format to be written, we need to extract the required data on the sink side. In the new version, two common parameters, dataField and fields, are added to all sinks. These two parameters were already supported by the data-storage-related sinks, including SQL, Redis, and InfluxDB, because the target database usually has strict column definitions while the SQL SELECT statement may not match those columns exactly and often selects redundant fields. Other sinks have the same extraction needs, so in the new version the two properties are extended to sinks such as MQTT, Kafka, and File. The dataField parameter specifies the field that carries the actual data, distinguishing it from fields carrying metadata, e.g. dataField: values. The fields parameter specifies the fields to output, so that the result exactly matches the target system's requirements, e.g. fields: ["a","b"].

Example 1: Extract the values section of Neuron data for output. Extract the nested data by configuring the dataField attribute as follows:

{
  "id": "extract",
  "sql": "SELECT * FROM neuronStream",
  "actions": [
    {
      "mqtt": {
        "server": "tcp://broker.emqx.io:1883",
        "topic": "devices/messages",
        "dataField": "values"
      }
    }
  ]
}

Example 2: Extract the required fields and omit intermediate calculation results from the output. As shown below, the specified fields are extracted by configuring the fields property:

{
  "id": "extract",
  "sql": "SELECT temperature, lag(temperature) as lt, humidity FROM demo WHERE lt > 10",
  "actions": [
    {
      "mqtt": {
        "server": "tcp://broker.emqx.io:1883",
        "topic": "devices/messages",
        "fields": ["temperature","humidity"]
      }
    }
  ]
}

In this example, lag(temperature) as lt in the SQL statement produces an intermediate calculation result, which makes the WHERE filter easier to write and simplifies the SQL. On the sink side, however, we only need the temperature and humidity fields, so the fields property is configured to specify the fields to output.

These two attributes can be used at the same time, or combined with a data template to perform more complex transformations. When all three properties are configured, the data template is applied first, then the dataField extraction, and finally the fields selection.
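
For reference, here is a minimal sketch (the topic, template, and field names are assumed for illustration) that configures all three properties on an MQTT sink: the template runs first, then dataField extracts the nested values object, and finally fields keeps only the listed keys.

{
  "id": "transformAll",
  "sql": "SELECT * FROM neuronStream",
  "actions": [
    {
      "mqtt": {
        "server": "tcp://broker.emqx.io:1883",
        "topic": "devices/messages",
        "dataTemplate": "{{json .}}",
        "dataField": "values",
        "fields": ["temperature", "humidity"]
      }
    }
  ]
}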

Array and Object Handling

SQL syntax was originally designed for relational databases, and there are fewer compound data types in the database, so it has limited processing capabilities for arrays and objects. In IoT scenarios, the data format accessed is mostly JSON, and nested composite data types are first-class citizens. eKuiper SQL has built in the ability to access nested data from the very beginning. However, there are still many unmet needs for deeper data transformations. In the new version, we have enhanced the processing capabilities of arrays and objects, including converting array data into multiple rows, array and object processing functions, etc.

Array payload support in data sources

When the data source uses the JSON format, previous versions only supported a JSON object payload; the new version also supports a JSON array payload. No extra configuration is needed: the system automatically identifies the payload type.

For example, array type data can be accessed in the new version of MQTT Source:

[
    {"temperature":23},
    {"temperature":24},
    {"temperature":25}
]

When array data is received, it is split into multiple messages, each containing one array element; the data above, for example, is split into three messages. From then on, processing is the same as for ordinary JSON object data.
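
As an illustration (assuming the array above arrives on a stream named demo), a plain query then runs once per element:

SQL: SELECT temperature FROM demo WHERE temperature > 23
___________________________________________________
{"temperature":24}
{"temperature":25}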

Converting array data into multiple rows

Some data sources deliver data in batches but share some common metadata, so the overall payload is still a JSON object, as in the data below. This format is especially common in the return values of HTTP services.

{
    "device_id": "device1",
    "data": [
        {"temperature":23},
        {"temperature":24},
        {"temperature":25}
    ]
}

Such a payload is processed as a single row in eKuiper, while logically the user needs multiple rows. In the new version we have added a new function type, multi-row functions, which convert a single row into multiple rows, along with the first (and currently only) multi-row function: unnest, which expands an array column into multiple rows.

unnest | unnest(array) | The argument column must be an array. The function expands the array into multiple rows and returns them. If each element of the array is a map[string]interface{} object, its entries become columns in the returned row.

Nested array data can thus be processed as multiple rows, producing multiple output results. For example, the data above produces three outputs.

Usage example

Create a stream demo and feed it the following input.

{"a": [1,2], "b": 3} 

Rules for getting unnest results:

SQL: SELECT unnest(a) FROM demo
___________________________________________________
{"unnest":1}
{"unnest":2}

Rules for getting unnest results with other columns:

SQL: SELECT unnest(a), b FROM demo
___________________________________________________
{"unnest":1, "b":3}
{"unnest":2, "b":3}

Create a stream demo and feed it the following input.

{"x": [{"a": 1,"b": 2}, {"a": 3,"b": 4}], "c": 5} 

Rules for getting unnest results with other columns:

SQL: SELECT unnest(x), c FROM demo
___________________________________________________
{"a":1, "b":2, "c": 5}
{"a":3, "b":4, "c": 5}

Array and Object Handling Functions

In the new version, we have improved array and object handling in the form of functions, for example array_max to get the largest value in a list, array_min to get the smallest value, array_length to get the number of elements, array_element to get an element of a list, and object_element to get an element of an object. For the currently supported functions, please check the function documentation.
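
As a quick sketch (assuming a stream demo whose payload is {"readings": [2, 5, 3]}), these functions can be used directly in the SELECT clause:

SQL: SELECT array_max(readings) as maxVal, array_min(readings) as minVal, array_length(readings) as len FROM demo
___________________________________________________
{"maxVal":5, "minVal":2, "len":3}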

In the next version, we will continue to enhance the ability to handle arrays and objects.

Syntactic sugar for nested structure access

Probably the most common question from users new to eKuiper is how to access data in nested structures. Standard SQL does not define such syntax. In programming languages we usually use a dot (.) to access nested data, but in SQL the dot is used to qualify a column with a table name. Therefore, we extended the SQL syntax with the arrow notation (->) to access embedded structures. This syntax is not very intuitive, though, and has a learning cost for newcomers.

In the new version, we have added syntactic sugar to simplify access to nested structures. Where there is no ambiguity, dot notation can be used. For example, for the following data:

{
    "a": {
        "b": {
            "c": 1
        }
    }
}

The statement can use a.b.c directly to access the nested structure. The original arrow notation is still supported for compatibility, e.g. a->b->c.
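
For example, the following two queries are equivalent on the data above (a sketch assuming the payload arrives on a stream named demo):

SQL: SELECT a.b.c AS c FROM demo
SQL: SELECT a->b->c AS c FROM demo
___________________________________________________
{"c":1}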

External state support

eKuiper is a stateful stream processing engine. The state is mainly used internally, including window state, analytic function state, and so on. In previous releases, we supported coarser-grained (row-based) access to external state through tables. In the new version, we added key (column) based external state storage and access. Through external state access, more functions can be realized, such as dynamic thresholds and dynamic switch states, and users can easily share state with third-party applications for collaborative work.

External state storage can coexist with the system's internal state storage or be used independently, and supports SQLite and Redis; the key-value based Redis is better suited for external state. The type of external state storage is configured in the configuration file etc/kuiper.yaml.

store:
  # Type of store that will be used for keeping state of the application
  extStateType: redis

  • State access: Assume a third-party application has written a cached value $device1$temperatureL: "20". In SQL, the external state can be read with the get_keyed_state function, for example get_keyed_state("$device1$temperatureL", "bigint", 0) as temperatureL, so that external state can participate in the computation.
  • State writing: To write a calculation result to the Redis external state, use the Redis sink. In the new version, the Redis sink can write multiple key-value pairs at a time: by configuring keyType as multiple, as in the example below, several key-value pairs are written at once. The field to be written can also be specified with the field configuration item.

{
  "id": "ruleUpdateState",
  "sql":"SELECT status as `$device1$status`,temperatureH as `$device1$temperatureH`,temperatureL as `$device1$temperatureL` FROM stateStream",
  "actions":[
    {
      "redis": {
        "addr": "{
   
   {localhost}}:6379",
        "dataType": "string",
        "keyType": "multiple",
        "sendSingle": true
      }
    }
  ]
}
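
As a complementary sketch for state access (the key, stream, and topic names are assumed for illustration), a rule can read the shared state to apply a dynamic threshold written by a third-party application:

{
  "id": "ruleReadState",
  "sql": "SELECT temperature FROM demo WHERE temperature < get_keyed_state(\"$device1$temperatureL\", \"bigint\", 0)",
  "actions": [
    {
      "mqtt": {
        "server": "tcp://broker.emqx.io:1883",
        "topic": "alert/low"
      }
    }
  ]
}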

SQL syntax updates

In addition to some SQL syntax updates mentioned earlier, the new version also includes the following SQL syntax updates:

Get the current rule ID

A new function rule_id() has been added to obtain the ID of the current rule, which makes it convenient to trace which rule produced a piece of data.
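
A brief sketch (assuming a stream demo and a rule whose id is ruleTrace):

SQL: SELECT rule_id() as rule, temperature FROM demo
___________________________________________________
{"rule":"ruleTrace", "temperature":23}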

Array dynamic subscript

In the new version, expressions can be used as array subscripts to implement dynamic indexing. For example, in SELECT a[start] FROM stream, start can be a field whose value varies; in fact, any expression can be used as the subscript.

This dynamism enables very flexible array operations that were difficult to achieve in previous versions. For example, multiple sensors along a production line may have their data collected as an array. Once an object enters the line, its position on the line can be computed from the line speed and elapsed time, which in turn determines which sensor reading belongs to that object. This calculation can be done with a dynamically computed array subscript, as sketched below.
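
A simplified sketch of that idea, with field names assumed for illustration: given the input {"sensors": [20, 21, 22, 23], "speed": 2, "elapsed": 1}, the position is computed inside the subscript (indexes are 0-based):

SQL: SELECT sensors[speed * elapsed] as reading FROM demo
___________________________________________________
{"reading":22}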

Delayed execution functions

In the new version, we have added delayed execution functions, which wait for a period of time before returning. For example, the delay function returns the value of its input after a delay.

If the data destination has rate limits, this function can be used to smooth out traffic peaks.
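
A minimal sketch, assuming the function takes the delay in milliseconds as its first argument and the value to return as its second:

SQL: SELECT delay(1000, temperature) as temperature FROM demo

Each result is then emitted roughly one second later, throttling the output rate toward the sink.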

Graph API enhancements

In the new version, the Graph API can reference defined streams and lookup tables, and JoinOp supports joining streams with lookup tables. We have also improved the Graph API validation messages so that users can locate errors more easily. With these changes, the Graph API, and the visual editors built on it, can express more data processing capabilities.

Streams and lookup tables are defined with Create Stream and Create Table. In Graph API rules, the sourceName property can then point to a defined stream or lookup table. For example, in the following rule, demo and alertTable point to a defined stream and a lookup table respectively.

{
  "id": "ruleAlert",
  "graph": {
    "nodes": {
      "demo": {
        "type": "source",
        "nodeType": "mqtt",
        "props": {
          "sourceType": "stream",
          "sourceName": "demo"
        }
      },
      "alertTable": {
        "type": "source",
        "nodeType": "memory",
        "props": {
          "sourceType": "table",
          "sourceName": "alertTable"
        }
      },
      "joinop": {
        "type": "operator",
        "nodeType": "join",
        "props": {
          "from": "demo",
          "joins": [
            {
              "name": "alertTable",
              "type": "inner",
              "on": "demo.deviceKind = alertTable.id"
            }
          ]
        }
      },
      "log": {
        "type": "sink",
        "nodeType": "log",
        "props": {}
      }
    },
    "topo": {
      "sources": ["demo", "alertTable"],
      "edges": {
        "demo": ["joinop"],
        "alertTable": ["joinop"],
        "joinop": ["log"]
      }
    }
  }
}

Dependency update

In addition to EdgeX-related dependencies, eKuiper has also updated the following dependencies:

  • Go version updated to 1.20
  • SQLite dependency switched to a pure Go implementation
  • Redis client dependency (go-redis) updated to v9
  • Removed the default ZeroMQ dependency
  • Updated other dependent libraries

Special thanks

The development of eKuiper version 1.10 has received strong support from the community.

Special thanks to the following contributors:

  • @carlclone: Contributed the Kafka sink implementation and various compression/decompression algorithms.
  • @wangxye: Contributed multiple array/object functions.

Thanks to the development team and all contributors for their hard work and dedication! Have fun with eKuiper!

Copyright statement: This article is an original work by EMQ. Please credit the source when reprinting.
Original link: https://www.emqx.com/zh/blog/ekuiper-v-1-10-0-release-notes
