KaiwuDB multi-model database: time-series performance optimization

With the rapid development of the Internet of Things (IoT), the volume of time-series data that must be generated and processed keeps growing. To meet requirements for real-time operation, efficiency, and accuracy, a database must be optimized for time-series performance so that it provides fast data ingestion, real-time queries, and efficient data storage and processing.

This live broadcast introduces the characteristics of time-series data and time-series databases, analyzes the TSBS time-series benchmark, and, building on that analysis, explains the architecture and optimization design of KaiwuDB's time-series model.

1. Time-series fundamentals

1. Basic concepts of time series

Time-series data is timestamped data. It is mainly collected and generated by real-time monitoring, inspection, and analysis equipment in industries such as electric power, chemicals, meteorology, and geographic information.

To illustrate the basic concepts, consider the solar panels of a microgrid, a typical time-series scenario. Assume there are multiple solar panels and that each panel collects three quantities: current, voltage, and temperature.

  • Measurement: a collection of devices of the same type (here, the solar panels);

  • Data source: one specific device (here, one panel);

  • Tags: descriptive attributes of a device;

  • Timestamp: the time at which a data point was collected.
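To make the four concepts concrete, here is a minimal sketch of one record in the solar-panel example. The class name `Point` and the tag keys `panel_id` and `site` are hypothetical and only illustrate the terms; they are not KaiwuDB's data model or API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Point:
    """One time-series record in the solar-panel example (illustrative only)."""
    measurement: str          # the device type, e.g. "solar_panel"
    tags: Dict[str, str]      # descriptive attributes that identify one data source
    timestamp: int            # collection time, Unix epoch seconds
    fields: Dict[str, float]  # the collected quantities

    def source_key(self) -> tuple:
        # A data source (one device) is identified by its measurement plus tag values.
        return (self.measurement, tuple(sorted(self.tags.items())))


# Hypothetical tag keys "panel_id" and "site", chosen for illustration.
p1 = Point("solar_panel", {"panel_id": "P001", "site": "microgrid-A"},
           1_700_000_000, {"current": 8.2, "voltage": 36.5, "temperature": 41.0})
p2 = Point("solar_panel", {"panel_id": "P001", "site": "microgrid-A"},
           1_700_000_010, {"current": 8.1, "voltage": 36.4, "temperature": 41.2})
print(p1.source_key() == p2.source_key())  # True: same device, two timestamps
```

Two records from the same panel share a data source and differ only in timestamp and field values, which is what lets a time-series database store the tags once and the readings as a time-ordered series.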

2. Characteristics of time-series databases

A time-series database (TSDB) is mainly used to process timestamped data, that is, data that changes in time order (time-serialized data).

Basic features:

  • Processing of large data volumes;

  • High compression ratio;

  • Redundant, duplicated data is stored only once;

  • Data is partitioned and processed by time range;

  • Generally no transaction processing.
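One reason time-series data compresses well is that timestamps arrive at a near-constant interval. The sketch below shows delta-of-delta encoding, a generic technique used by many time-series stores (popularized by Facebook's Gorilla paper). It is not confirmed to be KaiwuDB's codec, and the function names are hypothetical. A steady 10-second interval turns every delta-of-delta into 0, and run-length encoding then stores that repeated value only once.

```python
from typing import List, Tuple


def encode_timestamps(ts: List[int]) -> Tuple[int, int, List[int]]:
    """Delta-of-delta encode sorted timestamps: (first ts, first delta, deltas of deltas)."""
    if len(ts) < 2:
        raise ValueError("need at least two timestamps")
    first_delta = ts[1] - ts[0]
    dods: List[int] = []
    prev_delta = first_delta
    for i in range(2, len(ts)):
        delta = ts[i] - ts[i - 1]
        dods.append(delta - prev_delta)  # 0 whenever the sampling interval is steady
        prev_delta = delta
    return ts[0], first_delta, dods


def decode_timestamps(start: int, first_delta: int, dods: List[int]) -> List[int]:
    """Invert encode_timestamps."""
    ts = [start, start + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts


def run_length(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal values into (value, count) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


ts = [1_700_000_000 + 10 * i for i in range(360)]  # one hour at a 10 s interval
start, d0, dods = encode_timestamps(ts)
print(run_length(dods))  # [(0, 358)]: 358 deltas-of-deltas collapse to one run
assert decode_timestamps(start, d0, dods) == ts
```

Irregular intervals still round-trip correctly; they simply produce non-zero deltas-of-deltas and compress less.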

2. The TSBS benchmark

1. TSBS overview

TSBS (Time Series Benchmark Suite) is an open-source project from Timescale that covers:

  • Generating and writing time-series data;

  • Typical queries for time-series scenarios.

It defines two typical application scenarios:

  • DevOps: ordered time-series data from server CPU monitoring;

  • IoT: a truck-fleet scenario whose time-series data arrives out of order and has missing points.
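The IoT scenario is harder because data arrives out of order and some points never arrive. The sketch below shows the two basic steps an ingest path needs for such data: re-sorting by timestamp and detecting gaps. The function name and the fixed-interval assumption are hypothetical; this is not how TSBS or KaiwuDB implements it.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


def reorder_and_find_gaps(points: List[Point], interval: int) -> Tuple[List[Point], List[int]]:
    """Sort points that arrived out of order and list the expected timestamps
    that never arrived, assuming readings should come every `interval` seconds."""
    if not points:
        return [], []
    ordered = sorted(points, key=lambda p: p[0])
    present = {t for t, _ in ordered}
    start, end = ordered[0][0], ordered[-1][0]
    missing = [t for t in range(start, end + 1, interval) if t not in present]
    return ordered, missing


arrived = [(20, 1.5), (0, 1.0), (40, 2.0), (10, 1.2)]  # the reading at t=30 is lost
ordered, missing = reorder_and_find_gaps(arrived, interval=10)
print([t for t, _ in ordered])  # [0, 10, 20, 40]
print(missing)                  # [30]
```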

2. DevOps scenario (CPU-only)

CPU-only scenario features:

  • Data points are sampled at a fixed 10-second interval;

  • Scenario five has the largest data volume, 180 million records, and scenario four the smallest, 18 million records;

  • Scenarios four and five have more devices and cover a time span of only 3 minutes.
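The figures above fit together through simple arithmetic. This sketch assumes one record per device per 10-second tick over a half-open 3-minute span (18 ticks); under that assumption the record counts imply the device counts below. The device counts are derived here, not stated in the original.

```python
INTERVAL_S = 10  # sampling interval stated for the CPU-only scenario
SPAN_S = 3 * 60  # 3-minute time span of scenarios four and five

# Assumption: one record per device per sampling tick over a half-open span.
points_per_device = SPAN_S // INTERVAL_S  # 18


def implied_devices(records: int) -> int:
    """Number of devices implied by a record count under the assumption above."""
    return records // points_per_device


for name, records in [("scenario four", 18_000_000), ("scenario five", 180_000_000)]:
    print(f"{name}: {records:,} records -> {implied_devices(records):,} devices")
# scenario four: 18,000,000 records -> 1,000,000 devices
# scenario five: 180,000,000 records -> 10,000,000 devices
```

With only 18 points per device, these scenarios stress per-device overhead (metadata, tag lookups, series creation) much more than per-point overhead.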

3. Analysis of the different categories of TSBS queries

3. The time-series engine of the KaiwuDB multi-model database

1. Basic execution architecture

  • Application layer;

  • SQL engine;

  • Distribution layer;

  • Storage engine.
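The four layers can be pictured as a call chain in which each layer only talks to the one below it. The sketch below is a toy model under stated assumptions: the class names, hash-based routing by device, and the `avg` "query" are hypothetical stand-ins chosen for illustration, not KaiwuDB's actual components.

```python
import zlib
from statistics import mean
from typing import List, Tuple

Row = Tuple[str, int, float]  # (device, timestamp, value)


class StorageEngine:
    """Bottom layer: stores rows on one node."""
    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, row: Row) -> None:
        self.rows.append(row)

    def scan(self, device: str) -> List[Row]:
        return [r for r in self.rows if r[0] == device]


class DistributionLayer:
    """Routes each device to one storage node using a stable hash (an assumption)."""
    def __init__(self, num_nodes: int) -> None:
        self.nodes = [StorageEngine() for _ in range(num_nodes)]

    def _node(self, device: str) -> StorageEngine:
        return self.nodes[zlib.crc32(device.encode()) % len(self.nodes)]

    def write(self, row: Row) -> None:
        self._node(row[0]).write(row)

    def scan(self, device: str) -> List[Row]:
        return self._node(device).scan(device)


class SqlEngine:
    """Stands in for the SQL layer: turns requests into distribution-layer calls."""
    def __init__(self, dist: DistributionLayer) -> None:
        self.dist = dist

    def insert(self, device: str, ts: int, value: float) -> None:
        self.dist.write((device, ts, value))

    def avg(self, device: str) -> float:
        # Roughly what "SELECT avg(value) ... WHERE device = ?" would do.
        return mean(v for _, _, v in self.dist.scan(device))


# Application layer: only talks to the SQL engine.
sql = SqlEngine(DistributionLayer(num_nodes=3))
for ts, v in [(0, 1.0), (10, 2.0), (20, 3.0)]:
    sql.insert("panel-1", ts, v)
sql.insert("panel-2", 0, 9.0)
print(sql.avg("panel-1"))  # 2.0
```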

2. Time-series optimization and re-engineering

2.1 Storage structure optimization

Because time-series data is large and keeps growing, and some of it consists of static values, the storage structure has evolved through the following stages:

  • Large table: all devices are written to one table;

  • Separate tables: one table per device;

  • Partitioning: data is divided into regions by time;

  • Combined tables: devices are grouped, their static attributes are stored once in merged form, and the data is also partitioned by time.
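The last stage, the combined table, can be sketched as follows: static attributes are stored once per device, and measurements go into time partitions. The class name, the one-day partition width, and the in-memory layout are hypothetical; KaiwuDB's on-disk format is not described in the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PARTITION_SECONDS = 86_400  # hypothetical partition width: one day

Row = Tuple[int, Tuple[float, ...]]  # (timestamp, measured values)


class CombinedTable:
    """One table for a group of devices: static attributes stored once per device,
    measurements partitioned by time."""

    def __init__(self) -> None:
        self.tags: Dict[str, Dict[str, str]] = {}  # device -> static attributes
        # partition start time -> device -> rows in that partition
        self.partitions: Dict[int, Dict[str, List[Row]]] = defaultdict(lambda: defaultdict(list))

    def write(self, device: str, attrs: Dict[str, str], ts: int, values: Tuple[float, ...]) -> None:
        self.tags.setdefault(device, attrs)  # static attributes are not repeated per row
        start = ts - ts % PARTITION_SECONDS
        self.partitions[start][device].append((ts, values))

    def scan(self, device: str, t_from: int, t_to: int) -> List[Row]:
        """Read one device's rows in [t_from, t_to), touching only overlapping partitions."""
        out: List[Row] = []
        for start in sorted(self.partitions):
            if start >= t_to or start + PARTITION_SECONDS <= t_from:
                continue  # partition lies entirely outside the time range
            rows = self.partitions[start].get(device, [])
            out.extend(r for r in rows if t_from <= r[0] < t_to)
        return sorted(out)


table = CombinedTable()
site = {"site": "microgrid-A", "model": "X1"}  # hypothetical static attributes
table.write("P001", site, 0, (8.2, 36.5, 41.0))
table.write("P001", site, 10, (8.1, 36.4, 41.2))
table.write("P002", site, 86_400, (7.9, 36.0, 40.1))
print(len(table.tags), len(table.partitions))  # 2 2
print(table.scan("P001", 0, 86_400))  # [(0, (8.2, 36.5, 41.0)), (10, (8.1, 36.4, 41.2))]
```

Compared with one large table, the static attributes are not duplicated on every row; compared with one table per device, the number of tables stays small even with millions of devices.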

2.2 KaiwuDB execution architecture optimization

For the time-series model, KaiwuDB has made a series of adjustments to its execution architecture:

  • The executor is pushed down toward the storage layer;

  • mmap is used to reduce data copying;

  • Partition-level parallelism;

  • Data pruning;

  • Customized execution plans;

  • Special time-series operators such as Timebucket;

  • Multi-level dynamic parallelism.
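Three of these ideas (data pruning, partition-level parallelism, and a time-bucket operator) combine naturally in a single aggregate query. The sketch below shows that combination under stated assumptions: all function names are hypothetical, partitions are in-memory lists, and a thread pool stands in for the engine's parallel workers. In CPython the GIL limits CPU parallelism for pure-Python code, so this illustrates the structure only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (timestamp, value)


def time_bucket(ts: int, width: int) -> int:
    """Align a timestamp to the start of its bucket, like a time-bucket operator."""
    return ts - ts % width


def prune(partitions: Dict[int, List[Row]], part_width: int, t_from: int, t_to: int) -> List[List[Row]]:
    """Data pruning: keep only partitions that overlap [t_from, t_to)."""
    return [rows for start, rows in partitions.items()
            if start < t_to and start + part_width > t_from]


def partial_agg(rows: List[Row], width: int, t_from: int, t_to: int) -> Dict[int, list]:
    """Per-partition partial aggregate: bucket start -> [sum, count]."""
    acc: Dict[int, list] = defaultdict(lambda: [0.0, 0])
    for ts, v in rows:
        if t_from <= ts < t_to:
            b = acc[time_bucket(ts, width)]
            b[0] += v
            b[1] += 1
    return acc


def bucket_avg(partitions: Dict[int, List[Row]], part_width: int,
               t_from: int, t_to: int, width: int, workers: int = 4) -> Dict[int, float]:
    selected = prune(partitions, part_width, t_from, t_to)
    # Partition parallelism: each surviving partition is aggregated independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda rows: partial_agg(rows, width, t_from, t_to), selected))
    # Merge step: a bucket may straddle two partitions, so sums and counts are combined.
    merged: Dict[int, list] = defaultdict(lambda: [0.0, 0])
    for p in partials:
        for b, (s, c) in p.items():
            merged[b][0] += s
            merged[b][1] += c
    return {b: s / c for b, (s, c) in sorted(merged.items())}


# Two one-minute partitions of 10-second readings, plus an older one that gets pruned.
parts = {
    0: [(t, 1.0) for t in range(0, 60, 10)],
    60: [(t, 3.0) for t in range(60, 120, 10)],
    -60: [(t, 99.0) for t in range(-60, 0, 10)],
}
print(bucket_avg(parts, part_width=60, t_from=0, t_to=120, width=30))
# {0: 1.0, 30: 1.0, 60: 3.0, 90: 3.0}
```

Aggregating to (sum, count) per partition and dividing only after the merge keeps the result correct when a bucket spans several partitions; averaging per partition first would not.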

2.3 KaiwuDB time-series statistics

To suit the characteristics of time-series queries, KaiwuDB implements its own set of precomputed time-series statistics, with the following characteristics:

  • A time-series table is a special composite table;

  • The template table corresponds to the tag table;

  • An instance table is only an index into the corresponding tag table, not a complete table;

  • When data is written, tag entries can be created dynamically and the data written at the same time;

  • The tag table supports basic statistics, for example for TSBS;

  • Queries support general data reads;

  • Special queries can be pushed down, such as multi-tag query push-down and specific aggregates for a single tag;

  • Data blocks are partitioned by time, with statistics added for each block.
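Per-block statistics pay off when an aggregate can be answered from precomputed values instead of raw rows. The sketch below assumes each time block keeps count, sum, min, and max. Blocks fully inside the query range use those statistics, edge blocks are scanned row by row, and blocks outside the range are skipped. The `Block` class and `range_stats` function are hypothetical; the source does not specify which statistics KaiwuDB keeps per block.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    start: int                     # block covers [start, start + width)
    rows: List[Tuple[int, float]]  # (timestamp, value)
    count: int = 0
    total: float = 0.0
    vmin: float = float("inf")
    vmax: float = float("-inf")

    def __post_init__(self) -> None:
        # Precompute the block statistics once, when the block is built.
        for _, v in self.rows:
            self.count += 1
            self.total += v
            self.vmin = min(self.vmin, v)
            self.vmax = max(self.vmax, v)


def range_stats(blocks: List[Block], width: int, t_from: int, t_to: int):
    """Return (count, sum, min, max, blocks_scanned) over [t_from, t_to)."""
    count, total = 0, 0.0
    vmin, vmax = float("inf"), float("-inf")
    scanned = 0
    for b in blocks:
        end = b.start + width
        if end <= t_from or b.start >= t_to:
            continue  # block entirely outside the range
        if t_from <= b.start and end <= t_to:
            # Fully covered: use precomputed statistics, no row access.
            count += b.count
            total += b.total
            vmin, vmax = min(vmin, b.vmin), max(vmax, b.vmax)
        else:
            # Partially covered: fall back to scanning the rows.
            scanned += 1
            for ts, v in b.rows:
                if t_from <= ts < t_to:
                    count += 1
                    total += v
                    vmin, vmax = min(vmin, v), max(vmax, v)
    return count, total, vmin, vmax, scanned


WIDTH = 60
blocks = [Block(s, [(t, float(t)) for t in range(s, s + WIDTH, 10)]) for s in range(0, 300, WIDTH)]
print(range_stats(blocks, WIDTH, 30, 240))  # (21, 2730.0, 30.0, 230.0, 1)
```

Here only one of the four overlapping blocks is read row by row; the other three are answered entirely from their statistics.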

Origin my.oschina.net/u/5148943/blog/10150346