Analysis of the principles of Apache IoTDB, the time series database developed at Tsinghua University

The Cloud Intelligence AIOps community was initiated by Cloud Intelligence for operation and maintenance business scenarios. It provides an overall service system of algorithms, computing power, and data sets, as well as a solution exchange community for intelligent operation and maintenance scenarios. The community is committed to spreading AIOps technology, working with customers, users, researchers, and developers across industries to solve technical problems in intelligent operation and maintenance, promote the adoption of AIOps in enterprises, and build a healthy, win-win AIOps developer ecosystem.

Data characteristics in the field of intelligent operation and maintenance

Indicator (metric) data is an important observation item in operation and maintenance scenarios and the main data source for service availability monitoring, system health measurement, and similar use cases. As the architecture diagram below shows, collectors gather various indicator data on the servers and send it to a message queue; after real-time stream processing and offline computing, the data is finally stored in a database.

In the scenario above, we often encounter the following data challenges:

1) The number of indicators we need to monitor on a daily basis exceeds one million, and even reaches tens of millions at peak times. The indicator data accumulated every day reaches the GB level, or even the TB level.

2) Daily analysis of indicator data usually involves time spans such as the last 1 hour, last 1 day, last 7 days, last 30 days, and last 1 year, which places certain demands on range-query performance.

3) During data transmission, network issues, constrained device resources, and other factors can cause out-of-order arrival, missing points, short-term peaks and valleys, and duplicate data.

4) Because of the server or device itself, the timestamps of collected indicator data are often not precise enough, resulting in uneven data granularity. For example, for a second-level indicator, one data point may be collected at 2021-01-01 10:00:00.000 and the next at 2021-01-01 10:00:01.015; two different indicators collected at the "same" time may carry timestamps of 2021-01-01 10:00:00.000 and 2021-01-01 10:00:00.015 respectively.

With the data challenges clear, database selection needs to satisfy the following requirements:

1) Support for long-term data storage;

2) Support for fast retrieval over large time spans;

3) High data write throughput;

4) A high data compression ratio;

5) The ability to effectively handle data quality problems such as out-of-order data, missing values, uneven granularity, and duplicate data.

How to store time series data in the field of intelligent operation and maintenance

How should we choose a database for the above requirements: a traditional relational database, a general-purpose NoSQL database, or a dedicated time series database? Can they meet the selection requirements above?

How data is stored also depends on the characteristics of the data itself. Here is a real-world case: an operator has about 30 million monitoring indicators, with null data, missing data, duplicate data, and similar issues occurring during collection, and new indicators may appear at any time. If the indicators are collected once a minute and a certain data delay is allowed, the write rate must exceed 500,000 points per second, and about 43.2 billion points need to be stored every day. This is very hard for a relational database to satisfy, both in write rate and in query timeliness.
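
As a quick sanity check of those figures, the arithmetic works out as follows (a back-of-the-envelope estimate only; it ignores delays and retries):

```java
// Back-of-the-envelope check of the write rate and daily volume quoted above.
public class ThroughputEstimate {
    public static void main(String[] args) {
        long series = 30_000_000L;                         // ~30 million monitored indicators
        long intervalSeconds = 60L;                        // one sample per series per minute
        long pointsPerSecond = series / intervalSeconds;   // 500,000 writes/s
        long pointsPerDay = pointsPerSecond * 86_400L;     // 43,200,000,000 points/day
        System.out.printf("write rate: %,d points/s%n", pointsPerSecond);
        System.out.printf("daily volume: %,d points/day%n", pointsPerDay);
    }
}
```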

Let's take a look at general-purpose NoSQL databases. First, let's briefly sort out the characteristics of this indicator data: besides a timestamp and an indicator value, each data point also carries some tags identifying, for example, which machine the data comes from. A sample of the collected data is shown in the following figure:

Although a general-purpose NoSQL database can meet the throughput and query performance requirements, in order to accommodate dynamically changing indicators, we can only model one table per device or one table for multiple devices, as shown in the following figure.

Whether we store one table per device or one table for multiple devices, in order to distinguish which indicator a data point belongs to, we can only store the tags as a column, which clearly leads to a large amount of redundant tag storage. In addition, general-purpose NoSQL databases are often not friendly to data deduplication and mostly rely on external deduplication strategies. There are usually two such strategies: one relies on external means to ensure the data has already been deduplicated by the time it is stored; the other stores everything and relies on SQL to deduplicate at query time. The first approach undoubtedly increases system complexity, while the second leads to redundant data being stored. General-purpose NoSQL databases also lack native operations such as granularity alignment or linear filling to address poor data quality.

However, some of the data challenges above are exactly the typical problems that time series databases are built to solve. There are many excellent time series databases on the market, such as InfluxDB and Apache IoTDB, which offer high throughput, low-latency queries, data deduplication, data filling, data downsampling, high compression ratios, and other functions that can satisfy the storage requirements listed in the first section. Next, let's focus on how the fully open-source Apache IoTDB is designed.
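
To give a taste of how these functions look in practice, here is a minimal sketch of a downsampling-plus-fill query issued through the IoTDB Java Session API. The host, credentials, and series name (root.ln.d1.s1) are placeholder assumptions, and the exact FILL syntax and import paths vary between IoTDB versions.

```java
import org.apache.iotdb.session.Session;
import org.apache.iotdb.session.SessionDataSet;

public class DownsampleExample {
    public static void main(String[] args) throws Exception {
        Session session = new Session("127.0.0.1", 6667, "root", "root");
        session.open(false);

        // Aggregate raw points into aligned 1-minute buckets and fill empty buckets
        // with the previous value, smoothing out missing points and uneven timestamps.
        // (FILL syntax differs slightly across IoTDB versions.)
        String sql = "SELECT avg(s1) FROM root.ln.d1 "
                + "GROUP BY ([2021-01-01T10:00:00, 2021-01-01T11:00:00), 1m) FILL(previous)";
        SessionDataSet dataSet = session.executeQueryStatement(sql);
        while (dataSet.hasNext()) {
            System.out.println(dataSet.next());
        }

        dataSet.closeOperationHandle();
        session.close();
    }
}
```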

Design of IoTDB

Architecture of IoTDB

IoTDB is a columnar storage database designed on the LSM-Tree (Log-Structured Merge Tree) architecture. The core idea of the LSM-Tree is to give up some read performance in exchange for maximum write performance. From the overall architecture diagram of IoTDB below, we can see that IoTDB consists of three main parts: the database engine, the storage engine, and the analysis engine.

The database engine is mainly responsible for SQL statement parsing, data writing, data querying, data deletion, and related functions.

The storage engine is mainly built around TsFile, which is also IoTDB's most distinctive design. TsFile is not only used by the IoTDB storage engine; it can also be consumed directly by the analysis engine through connectors, and its contents can be accessed directly through the API.

The analysis engine is mainly used to integrate with open-source data processing platforms.

Data reading and writing process of IoTDB

As mentioned above, IoTDB is implemented on the LSM-Tree idea. The data write flow chart below shows that incoming data first passes through a time detector, which decides whether the data is in order based on the maximum timestamp maintained in memory; accordingly, the in-memory buffer (memtable) is split into a sequential part and an out-of-order part. To ensure that data is not lost after a power failure, IoTDB also writes the data to the WAL (Write-Ahead Log), and only then is the client's write considered complete. As writes continue and a memtable reaches a certain size, IoTDB submits a flush task that turns the memtable into an immutable one and finally flushes it to disk as an SSTable, that is, a TsFile. When the number of persisted TsFiles reaches a certain threshold, a merge is triggered.
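
For illustration, here is a minimal write sketch against the Java Session API (host, credentials, and the series name are placeholder assumptions). The second insert deliberately carries an earlier timestamp; the time detector described above would route such a point to the out-of-order memtable, and it is still accepted.

```java
import java.util.Arrays;
import org.apache.iotdb.session.Session;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        Session session = new Session("127.0.0.1", 6667, "root", "root");
        session.open(false);

        String device = "root.ln.d1";
        // In-order point.
        session.insertRecord(device, 1000L, Arrays.asList("s1"), Arrays.asList("36.5"));
        // A later-arriving point with an earlier timestamp (out of order) is still accepted.
        session.insertRecord(device, 500L, Arrays.asList("s1"), Arrays.asList("36.1"));

        session.close();
    }
}
```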

After introducing IoTDB's write path, let's look at its core query path. As shown in the figure below, when a client sends a query request, the SQL is first parsed with Antlr4, and the query then searches the MemTable and immutable MemTable in memory as well as the TsFiles on disk. IoTDB uses a BloomFilter and indexes to improve query efficiency: if the Bloom filter says the hashed entry does not exist, the data definitely does not exist; if it says the entry exists, the data may exist, and IoTDB continues searching with the index.
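
To make the Bloom filter argument concrete, here is a toy, self-contained sketch of the principle (an illustration only, not IoTDB's implementation): a miss is definitive, while a hit only means the data may exist and the index must still be consulted.

```java
import java.util.BitSet;

public class ToyBloomFilter {
    private final BitSet bits = new BitSet(1 << 16);

    // Two simple hash positions derived from the key (toy quality, for illustration).
    private int[] hashes(String key) {
        int h = key.hashCode();
        return new int[] { h & 0xFFFF, (h * 31 + 17) & 0xFFFF };
    }

    public void add(String key) {
        for (int h : hashes(key)) bits.set(h);
    }

    public boolean mightContain(String key) {
        for (int h : hashes(key)) {
            if (!bits.get(h)) return false;   // definitely not present
        }
        return true;                          // possibly present: check the index next
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter();
        filter.add("root.ln.d1.s1");
        System.out.println(filter.mightContain("root.ln.d1.s1")); // true
        System.out.println(filter.mightContain("root.ln.d9.s9")); // almost certainly false (false positives possible)
    }
}
```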

TsFile structure

As seen in the previous section, IoTDB's read and write paths are inseparable from TsFile, so let's look at the structure of this core file format. The schematic diagram of the TsFile structure below shows that a TsFile is divided into two parts: a data area and an index area.

The data area mainly consists of Pages (data pages), Chunks (data blocks), and ChunkGroups (data groups). A Page consists of a PageHeader and a piece of data (time-value pairs); a Chunk consists of a ChunkHeader and multiple Pages; a ChunkGroup stores one entity's data over a period of time and consists of several Chunks, a one-byte separator 0x00, and a ChunkFooter.

The index area mainly includes the TimeseriesIndex, the IndexOfTimeseriesIndex, and the BloomFilter. A TimeseriesIndex contains header information and a list of data block indexes (ChunkIndex): the header records the data type and statistics (maximum and minimum timestamps, etc.) of one time series in the file, while the ChunkIndex list records the offset of each Chunk of that series in the file along with related statistics (again, maximum and minimum timestamps, etc.). The IndexOfTimeseriesIndex indexes the offset of each TimeseriesIndex in the file, and the BloomFilter is a Bloom filter over the entities. In the figure below, the TsFile contains two entities, d1 and d2, each with three physical quantities s1, s2, and s3, for a total of 6 time series, and each time series contains two Chunks.
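
To help visualize the nesting described above, here is a purely illustrative sketch of how the pieces relate. These are not the actual IoTDB classes, only a simplified model.

```java
import java.util.List;

// Illustrative model of the TsFile layout; NOT the real IoTDB classes.
class Page { Object pageHeader; long[] timestamps; Object[] values; }      // time-value pairs
class Chunk { Object chunkHeader; List<Page> pages; }                      // one measurement's data
class ChunkGroup { List<Chunk> chunks; }                                   // one entity over a time range
class TimeseriesIndex { Object statistics; List<Object> chunkIndex; }      // per-series statistics + Chunk offsets
class TsFileLayout {
    List<ChunkGroup> dataArea;               // data area: ChunkGroups of Chunks of Pages
    List<TimeseriesIndex> timeseriesIndexes; // index area: one entry per time series
    Object indexOfTimeseriesIndex;           // tree over TimeseriesIndex offsets
    Object bloomFilter;                      // entity Bloom filter
}
```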

TsFile index build

All the index nodes in a TsFile form a multi-way index tree with a B+ tree structure. The tree consists of two parts: the entity index part and the measurement (physical quantity) index part. Let's use an example to show how the index tree is composed. Suppose we set the degree of the tree to 10 and we have 150 devices, each with 150 measurement points, for a total of 22,500 time series. Then an index tree with a depth of 6 is enough, and we need 6 disk I/Os to locate the data, as shown in the following figure.

The above approach seems to need quite a few disk I/Os, but only because the degree we chose is small, which makes the overall tree deep. If we increase the degree of the tree to 300, a height-2 subtree in the entity index part can hold 90,000 devices, and likewise a height-2 subtree in the measurement index part can hold 90,000 measurements, so the whole index tree can hold 8.1 billion time series. In that case we only need 4 disk I/Os to locate the data we want to read.
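
A quick back-of-the-envelope check of these numbers (a simplified estimate that ignores header nodes and assumes full fan-out at every level):

```java
public class IndexDepthEstimate {
    // Smallest height h such that degree^h >= leaves.
    static int levels(long leaves, int degree) {
        int height = 1;
        long capacity = degree;
        while (capacity < leaves) {
            capacity *= degree;
            height++;
        }
        return height;
    }

    public static void main(String[] args) {
        // Degree 10: 150 devices x 150 measurements -> 3 + 3 = 6 levels, ~6 disk reads.
        System.out.println(levels(150, 10) + levels(150, 10));
        // Degree 300: 90,000 devices x 90,000 measurements (8.1 billion series) -> 2 + 2 = 4.
        System.out.println(levels(90_000, 300) + levels(90_000, 300));
    }
}
```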

TsFile query process

Now that we understand the structure of TsFile and how its index is built, how does IoTDB complete a query inside a TsFile? The following uses a concrete query, select s1 from root.ln.d1 where time>100 and time<200, to demonstrate how the required data is located in a TsFile. The specific steps and schematic diagram are as follows:

1) Read the TsFile MetadataSize information.

2) Obtain the position of the TsFile MetaData according to the MetadataSize and offset.

3) Read the data in the Metadata IndexNode and locate the device root.ln.d1 through the name in the MetadataIndexEntry.

4) Read the offset recorded for device root.ln.d1, use it to find the TimeseriesMetadata, and then find s1.

5) Compare the startTime and endTime statistics recorded in the ChunkMetadata with the queried interval (100, 200) to obtain the offsets of the Chunks of measurement s1 under device root.ln.d1.

6) Using the offsets obtained for s1, the ChunkGroup can be located directly.

7) Locate the Chunk data through the ChunkGroupHeader and ChunkHeader in turn, then read the PageHeader of each Page; if its time range falls within (100, 200), read the PageData directly.

Summary

This article has briefly introduced IoTDB's read and write paths and the design of its core TsFile format. The overall design and implementation of IoTDB are considerably more complicated, with many details that cannot all be covered here.


Written at the end

In recent years, against the backdrop of the rapid development of AIOps, urgent needs for IT tools, platform capabilities, solutions, AI scenarios, and usable data sets have emerged across industries. Against this background, Cloud Wisdom launched the AIOps community in August 2021, aiming to raise an open-source banner and build an active community of users and developers for customers, users, researchers, and developers in all industries, to jointly contribute to and solve industry problems and promote technological development in this field.

The community has open-sourced the data visualization orchestration platform FlyFish, the operation and maintenance management platform OMP, the cloud service management platform Moore, the Hours algorithm, and other products.

Visual Orchestration Platform-FlyFish:

Project introduction: https://www.cloudwise.ai/flyFish.html

Github address: https://github.com/CloudWise-OpenSource/FlyFish

Gitee address: https://gitee.com/CloudWise/fly-fish

Industry case: https://www.bilibili.com/video/BV1z44y1n77Y/


Please get to know us through the links above, and add the community assistant (xiaoyuerwie) with the note "FlyFish" to join the developer exchange group and have 1-on-1 exchanges with industry experts!

Through the assistant you can also obtain Cloud Intelligence AIOps materials and learn about the latest progress of Cloud Intelligence FlyFish!


Origin my.oschina.net/yunzhihui/blog/5507844