Basic concepts and modeling ideas of the time-series database TDengine

Table of contents

1. Basic concepts of TDengine database

  1. Collected quantity (metric)
  2. Tag
  3. Data collection point
  4. Table
  5. Super table
  6. Subtable
  7. Database

2. TDengine database modeling strategy

  1. Table creation mode
  2. Architectural patterns

Time series database:

Time-series data: Time-series data is a sequence of data points recorded in chronological order. It is produced mainly in the Internet of Things, electric power, the chemical industry, meteorology, geography and similar fields, and is generated continuously by a measuring point or an event source.

Time series database: A database designed mainly for data that carries a time tag. Each record in a table has a unique timestamp, so the data within a table is ordered by time and unique per timestamp.


The difference between row and column database storage:

The traditional business data we usually work with is stored by row. We create a separate table for each type of object to hold its attributes, and we generally write all attributes of an object together in one operation instead of saving them one by one. This is why a relational database with row storage can guarantee the integrity of each record.

Column storage has to split a row into its individual values and store each one separately, so it needs more write operations than row storage. A time series database is a storage engine built mainly for scenarios such as the Internet of Things and the Industrial Internet, where this trade-off pays off.

When data is stored by column, each column is stored separately, its values share one data type, and their characteristics are similar. The data itself serves as the index, so query efficiency improves significantly.

As a result, the way we approach data modeling and business scenario design differs somewhat from what we are used to with traditional databases.
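The difference between the two layouts can be illustrated with a minimal Python sketch. The field names (`ts`, `current`, `voltage`) and the helper `rows_to_columns` are made up for illustration and do not reflect how TDengine stores data internally.

```python
# Row layout: each record keeps all attributes of one reading together.
rows = [
    {"ts": 1000, "current": 10.3, "voltage": 219},
    {"ts": 2000, "current": 12.6, "voltage": 218},
    {"ts": 3000, "current": 12.3, "voltage": 221},
]


def rows_to_columns(records):
    """Split row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = rows_to_columns(rows)

# A query over one attribute only touches that column's values,
# which all share the same data type.
max_voltage = max(columns["voltage"])
print(columns["ts"])   # [1000, 2000, 3000]
print(max_voltage)     # 221
```

Writing one reading in the column layout touches three lists instead of one record, which is the extra write cost mentioned above; reading one attribute touches a single list.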


1. Basic concepts of TDengine database

Basic attributes

1. Collected quantity (metric)

  • A collected quantity is a physical quantity measured by a sensor, a device or another type of collection point, such as current, voltage, temperature, pressure or GPS position. These are parameters that change over time.

2. Tag

  • Tags are static properties of a sensor, device or other collection point that do not change over time, such as device ID, device model and device location.

3. Data collection point

  • A data collection point is the hardware or software that collects physical quantities at a preset interval or when triggered by an event. A data collection point can collect one or more quantities, but all of them are collected at the same moment and share the same timestamp. Complex equipment often has multiple data collection points, whose collection cycles may differ and which run completely independently and asynchronously.

4. Table

  • TDengine adopts the strategy of one table per data collection point: each data collection point gets its own table (for example, 10 million smart meters require 10 million tables) that stores all the time-series data collected by that point. This design has several advantages:
    • ① Data is generated completely independently at different data collection points. Each point has a single data source, so each table has only one writer and can be written without locks, which greatly improves write speed.
    • ② The data generated by a data collection point is ordered by time, so writes can be implemented as appends, which further improves write speed.
    • ③ The data of a data collection point is stored contiguously in blocks. Reading the data of a time range therefore needs far fewer random reads, which can improve read and query speed by an order of magnitude.
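Advantages ② and ③ can be sketched in a few lines of Python. The class `PointTable` is hypothetical and only models the idea of one append-only, time-ordered table per data collection point; it is not TDengine's storage engine.

```python
from bisect import bisect_left, bisect_right


class PointTable:
    """Toy model of the table of a single data collection point."""

    def __init__(self):
        self.timestamps = []  # kept in ascending order
        self.values = []

    def append(self, ts, value):
        # Assumption of this sketch: data arrives in time order, so a write
        # is a plain append at the end.
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("this sketch only accepts increasing timestamps")
        self.timestamps.append(ts)
        self.values.append(value)

    def read_range(self, start, end):
        # Because rows are time-ordered, a time range maps to one
        # contiguous slice instead of scattered lookups.
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return self.values[lo:hi]


table = PointTable()
for ts, current in [(1000, 10.3), (2000, 12.6), (3000, 12.3), (4000, 11.8)]:
    table.append(ts, current)

print(table.read_range(2000, 3000))  # [12.6, 12.3]
```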

5. Super table

  • Because each data collection point has its own table, the number of tables grows dramatically. Applications often need to aggregate across collection points, and with so many tables these aggregations become complicated. To solve this problem, TDengine introduces the concept of the Super Table (STable for short).
  • A super table is a collection of data collection points of a certain type. Data collection points of the same type share the same table structure, but the static attributes (tags) of each table (data collection point) differ. To describe a super table, you define not only the table structure for the collected quantities but also the schema of its tags. A tag's data type can be integer, floating point, string or JSON. A super table can have multiple tags, and tags can be added, deleted or modified later. If the whole system has N different types of data collection points, N super tables need to be created.
  • In TDengine's design, a table represents one specific data collection point, and a super table represents a group of data collection points of the same type.

6. Subtable

  • When creating a table for a specific data collection point, the user can use the definition of a super table as a template and specify the tag values of that collection point (table). Tables created from a super table are called subtables.
  • A super table contains multiple subtables. These subtables have the same schema for the collected quantities but different tag values.
  • The schema of the data or tags cannot be changed through a subtable. A schema change on the super table takes effect immediately on all of its subtables.
  • A super table only defines a template and does not itself store any data or tag information. Data therefore cannot be written to a super table, only to subtables.
  • Queries can be run on tables or on super tables. For a super table query, TDengine treats the data in all subtables as one data set. It first finds the tables that satisfy the tag filter conditions, then scans the time-series data of only those tables to perform the aggregation. This greatly reduces the amount of data to scan and significantly improves query performance. In essence, super table queries are how TDengine aggregates many data collection points of the same type efficiently.
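The two-step super table query described above (select subtables by tag, then scan only the matching tables) can be sketched as follows. The dictionaries, tag names, tag values and the function `avg_on_super_table` are illustrative assumptions; a real query is executed by TDengine itself.

```python
# Each subtable carries its own tag values plus its time-series rows.
subtables = {
    "d1001": {"tags": {"location": "Beijing", "group_id": 2},
              "rows": [(1000, 10.0), (2000, 12.0)]},
    "d1002": {"tags": {"location": "Beijing", "group_id": 3},
              "rows": [(1000, 11.0), (2000, 13.0)]},
    "d1003": {"tags": {"location": "Shanghai", "group_id": 2},
              "rows": [(1000, 20.0), (2000, 22.0)]},
}


def avg_on_super_table(tables, **tag_filter):
    """Average the value column over all subtables matching the tag filter."""
    # Step 1: use the tags to pick the subtables, without touching any rows.
    matching = [
        t for t in tables.values()
        if all(t["tags"].get(k) == v for k, v in tag_filter.items())
    ]
    # Step 2: scan only the time-series data of the matching subtables.
    values = [value for t in matching for _, value in t["rows"]]
    return sum(values) / len(values) if values else None


print(avg_on_super_table(subtables, location="Beijing"))  # 11.5
```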
     

In the smart meter example, we can create subtables d1001, d1002, d1003, d1004, etc. from the super table meters. To better understand the relationship between collected quantities, tags, super tables and subtables, you can refer to the schematic diagram of the smart meter data model.
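As a rough sketch, the statements for this example could be generated as below. The column and tag names (`current`, `voltage`, `phase`, `location`, `group_id`), the tag values and the two helper functions are assumptions made for illustration. The `CREATE STABLE ... TAGS` and `CREATE TABLE ... USING ... TAGS` forms follow TDengine's SQL, but check the documentation of your TDengine version before relying on them.

```python
def create_stable_sql(name, columns, tags):
    """Build a CREATE STABLE statement; the first column is the timestamp."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    tag_defs = ", ".join(f"{tag} {typ}" for tag, typ in tags)
    return f"CREATE STABLE IF NOT EXISTS {name} ({cols}) TAGS ({tag_defs});"


def create_subtable_sql(table, stable, tag_values):
    """Build a CREATE TABLE ... USING statement with concrete tag values."""
    # Values are interpolated without escaping, which is fine for a sketch only.
    rendered = ", ".join(
        f"'{v}'" if isinstance(v, str) else str(v) for v in tag_values
    )
    return f"CREATE TABLE IF NOT EXISTS {table} USING {stable} TAGS ({rendered});"


# Illustrative schema: three collected quantities and two tags.
stable_sql = create_stable_sql(
    "meters",
    columns=[("ts", "TIMESTAMP"), ("current", "FLOAT"),
             ("voltage", "INT"), ("phase", "FLOAT")],
    tags=[("location", "BINARY(64)"), ("group_id", "INT")],
)
# One subtable per smart meter, each with its own (made-up) tag values.
subtable_sqls = [
    create_subtable_sql(f"d{1001 + i}", "meters", (f"site-{i + 1}", i + 1))
    for i in range(4)
]

print(stable_sql)
for sql in subtable_sqls:
    print(sql)
```

The super table is defined once; every additional meter only needs one more `CREATE TABLE ... USING` statement with its own tag values.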
 

7. Database

  • TDengine allows one running instance to contain multiple databases, and each database can be configured with its own storage strategy. Different types of data collection points often have different data characteristics, including collection frequency, retention period, number of replicas, data block size, whether updates are allowed, and so on. To let TDengine work at maximum efficiency in every scenario, it is recommended to create super tables with different data characteristics in different databases.

2. TDengine database modeling strategy

1. Table creation mode

  1. Single-column mode:

     Each collected physical quantity gets its own table, so each type of physical quantity has its own super table.

  2. Multi-column mode:

     As long as the physical quantities are collected by one data collection point at the same time (with identical timestamps), they can be placed in one super table as different columns.
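To make the two modes concrete, here is a small sketch of how one reading from a data collection point would be laid out in each. The metric names and the table naming scheme `<device>_<metric>` are assumptions for illustration.

```python
reading = {"ts": 1000, "current": 10.3, "voltage": 219, "phase": 0.31}


def to_multi_column(device, sample):
    """Multi-column mode: one row in one table holds all quantities."""
    return {device: [sample]}


def to_single_column(device, sample):
    """Single-column mode: one (ts, value) row per quantity, one table each."""
    ts = sample["ts"]
    return {
        f"{device}_{metric}": [(ts, value)]
        for metric, value in sample.items()
        if metric != "ts"
    }


multi = to_multi_column("d1001", reading)
single = to_single_column("d1001", reading)

print(len(multi), "table,", len(single), "tables")  # 1 table, 3 tables
print(single["d1001_voltage"])                       # [(1000, 219)]
```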

2. Architectural patterns

1. Create tables by device, that is, one table per device. This scenario has the following characteristics:

  1. For a given device, all metrics are collected at the same time and share the same collection timestamp.
  2. For a given device, the data of all metrics is best reported to TDengine in one message rather than separately.
  3. For devices of the same type, the metrics are exactly the same.

In this scenario, use the multi-column mode: create one multi-column super table for each type of device.

Create a subtable for each device of that type.

In the subtable, every column from the second onwards is a metric (the first column is the timestamp).

2. Create tables by device, but the metrics of devices of the same type are not exactly the same

The difference from scenario 1 is that each device may have its own individual metrics:

  • ① For devices of the same type, the metrics are largely the same, but each device may have a small number of individual metrics.
  • (Prerequisite: the number of individual metrics plus the number of common metrics does not exceed 4096 in total.)

The way to handle this situation is often called a "big wide table":

  1. Create a multi-column super table that contains every distinct metric column, that is, the complete set of all metrics.

  2. Create a subtable for each device.

  3. In the subtable, every column from the second onwards is a metric.

  4. When writing data, fill in NULL for the metrics that the device does not have.
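Step 4 can be sketched as follows. The column list, the device table name and the helper `insert_sql` are hypothetical; the point is only that metrics a device does not report are written as NULL.

```python
# Full set of metric columns defined on the wide super table (assumed names).
ALL_METRICS = ["temperature", "pressure", "humidity", "vibration"]


def insert_sql(table, ts, sample):
    """Build an INSERT for one row, writing NULL for metrics not reported."""
    values = [str(ts)]
    for metric in ALL_METRICS:
        value = sample.get(metric)
        values.append("NULL" if value is None else str(value))
    return f"INSERT INTO {table} VALUES ({', '.join(values)});"


# This device has no humidity or vibration sensor.
sql = insert_sql("dev_001", 1700000000000, {"temperature": 21.5, "pressure": 101.3})
print(sql)
# INSERT INTO dev_001 VALUES (1700000000000, 21.5, 101.3, NULL, NULL);
```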

3. Create tables by metric, so that each collected parameter corresponds to one table

In more complex scenarios it is hard to abstract a fixed table structure for a device, or the table structure changes so often that it cannot be fixed. A more flexible approach is needed here. Consider creating tables by metric when any of the following holds:

  1. For devices of the same type, the metrics cannot be fixed, or each device has a large number of individual metrics.

  2. For a given device, each metric has its own collection timestamp (that is, different metrics of the same device are not guaranteed to share the same collection time and collection cycle).

  3. For a given device, the data of its metrics is reported to TDengine in multiple messages with unpredictable delays between them.

This situation requires greater flexibility. The general approach is as follows:

  1. Create a single-column super table, that is, timestamp + collected value, and define the device ID the metric belongs to as a tag.

  2. Create a separate table for each metric of each device, and set the specific device ID in its tag.

  3. Put metrics of different data types into different super tables. For example, numeric, Boolean and string data go into three corresponding super tables.
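The three steps above can be sketched like this. The super table names (`metrics_numeric`, `metrics_bool`, `metrics_string`) and the subtable naming scheme are assumptions chosen for illustration.

```python
def super_table_for(value):
    """Pick the single-column super table by the value's data type."""
    # bool must be checked before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return "metrics_bool"
    if isinstance(value, (int, float)):
        return "metrics_numeric"
    if isinstance(value, str):
        return "metrics_string"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def route(device_id, metric, ts, value):
    """Return (super table, subtable, tags, row) for one reported value."""
    stable = super_table_for(value)
    subtable = f"{device_id}_{metric}"   # one table per metric per device
    tags = {"device_id": device_id}      # device ID stored as a tag
    return stable, subtable, tags, (ts, value)


print(route("dev42", "temperature", 1000, 21.5))
# ('metrics_numeric', 'dev42_temperature', {'device_id': 'dev42'}, (1000, 21.5))
print(route("dev42", "door_open", 1005, True)[0])  # metrics_bool
```

Because each value is routed on its own, metrics can arrive in separate messages with their own timestamps, and a new metric only requires a new subtable.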

4. Create tables by data collection point

Each data collection point needs its own table. Create it with a super table as the template and specify the concrete tag values.

Application scenario:

  1. For devices of the same type, the metrics are not fixed.

  2. For a given device, each metric has its own collection period, but the period is known. For example, PLC1~PLC10 are collected every 10 ms and PLC11~PLC20 every 20 ms.

  3. For a given device, PLC channels may be added dynamically later. The columns of the super table are expected to grow dynamically without code changes.

A super table can be created for each type of data collection point. In the Internet of Things, a device may have multiple data collection points (on a wind turbine, for example, some collection points collect current, voltage and other electrical parameters, while others collect environmental parameters such as temperature, humidity and wind direction). For this type of device, multiple super tables need to be created.

Treat the metrics of one device that share a collection period, that is, parameters with the same timestamp, as one collection point and create a super table for it.

The data of each collection point corresponds to one subtable.
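A minimal sketch of this grouping, using the PLC example from above. The function name and the naming of the resulting collection points are assumptions.

```python
from collections import defaultdict

# Collection period in milliseconds for each channel of one device:
# PLC1..PLC10 every 10 ms, PLC11..PLC20 every 20 ms.
periods_ms = {f"PLC{i}": 10 if i <= 10 else 20 for i in range(1, 21)}


def group_into_collection_points(device, channel_periods):
    """Group channels sharing a collection period into one collection point."""
    groups = defaultdict(list)
    for channel, period in channel_periods.items():
        groups[period].append(channel)
    # Channels in one group share timestamps, so each group can be modeled as
    # one collection point whose subtable has these channels as columns.
    return {f"{device}_{period}ms": cols for period, cols in sorted(groups.items())}


points = group_into_collection_points("dev1", periods_ms)
for name, cols in points.items():
    print(name, len(cols), cols[0], "...", cols[-1])
# dev1_10ms 10 PLC1 ... PLC10
# dev1_20ms 10 PLC11 ... PLC20
```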

The modeling ideas above cover most business scenarios. Real businesses vary widely, of course, and more complex scenarios are open for discussion.

Origin blog.csdn.net/weixin_42599091/article/details/128674166