HiTSDB Time Series Database Technical Architecture and Product Analysis

Abstract: At the Alibaba Cloud Database Technology Summit on August 24, Zhong Yu, a senior expert in the Alibaba Database Division, gave a talk on the HiTSDB time series database. This article starts with time series data and its characteristics, then introduces time series business scenarios and OpenTSDB's optimizations on top of HBase, and finally shares the optimizations and improvements made in HiTSDB.

Introduction to time series data

       Time series data is a series of values distributed over time; time and value are the two keywords. Time series data generally refers to metric data, such as stock prices, advertising data, temperature changes, website PV/UV, personal health data, industrial sensor data, application performance monitoring (server metrics such as CPU and memory usage), and Internet of Vehicles data.

       According to statistics, time series data will account for more than half of all data in the big data field.

 

       The picture shows advertising monitoring data. Three advertising sources are tracked in the example, and each source tracks three metrics: how many times the ad is displayed, how many times it is clicked, and how much revenue it generates. Advertising sources are differentiated by tags, such as publisher, advertiser, the gender of the target users, and the country of publication. Each metric has different values at different points in time, which together form a time series. The left side is called the data source, the middle is called the metric, and the right side is called the time series: a sequence of values over time.

 

       There are two ways to model a time series: single-valued and multi-valued. The single-value model treats each value of each metric of each data source as a separate row. The multi-value model puts the different metrics of the same data source into different columns, so each data source produces only one row per time point.

       A multi-value model can always be simulated by a single-value model. Multi-value models are more convenient for certain data, but single-value modeling can express all scenarios.
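A minimal sketch of the two models in Python, using the advertising example; the field names (`impressions`, `clicks`, `revenue`) and values are illustrative, not from any real schema:

```python
# Single-value model: one row per (source, metric, timestamp, value).
single_value_rows = [
    ("ad_source_1", "impressions", 1534000000, 120),
    ("ad_source_1", "clicks",      1534000000, 8),
    ("ad_source_1", "revenue",     1534000000, 3.5),
]

# Multi-value model: one row per (source, timestamp), metrics as columns.
multi_value_rows = [
    ("ad_source_1", 1534000000, {"impressions": 120, "clicks": 8, "revenue": 3.5}),
]

def to_multi_value(rows):
    """Simulate the multi-value model from single-value rows."""
    merged = {}
    for source, metric, ts, value in rows:
        merged.setdefault((source, ts), {})[metric] = value
    return [(source, ts, metrics) for (source, ts), metrics in merged.items()]

assert to_multi_value(single_value_rows) == multi_value_rows
```

The conversion function illustrates the point in the text: the single-value model loses nothing, since the multi-value form can always be reconstructed from it.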

 

       Processing time series data differs from general database processing. A general database is row-oriented: each data point is a row. Time series processing works along the timeline, where the data points on each timeline are strongly related. For example, the revenue of a given advertising source at different times forms a time series; that series can be drawn as a curve, and we can apply time series transformations to it. The most common are interpolation and downsampling (precision reduction). Because of how data sources are sampled, points are often missing, so we fill the gaps by interpolation, commonly linear interpolation or zero filling. And if the advertising data does not require the finest time granularity, we can downsample it; the way precision is reduced varies by use case.
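The two operations can be sketched briefly; this assumes each series is a sorted list of (timestamp, value) pairs, and the function names are ours, not from any library:

```python
def linear_interpolate(points, ts):
    """Estimate a value at ts by linear interpolation between neighbors.
    points: sorted list of (timestamp, value) pairs."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= ts <= t1:
            return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)
    raise ValueError("ts outside the series range")

def downsample(points, interval, agg=sum):
    """Reduce precision: aggregate points into buckets of `interval` seconds."""
    buckets = {}
    for t, v in points:
        buckets.setdefault(t - t % interval, []).append(v)
    return sorted((t, agg(vs)) for t, vs in buckets.items())

series = [(0, 1.0), (10, 3.0), (30, 5.0)]
assert linear_interpolate(series, 20) == 4.0          # halfway between 3.0 and 5.0
assert downsample(series, 20) == [(0, 4.0), (20, 5.0)]
```

The `agg` parameter hints at the point made below: different use cases want different reductions (sum, average, max, ...).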

 

       There is one more very common operation on time series data: aggregation. We often look at more than one data source's metrics. For example, to get the total revenue generated by a certain kind of advertising source in North America over a period of time, we need to pick out all the timelines matching the relevant tags, and then add up the revenue values at each time point to obtain a new summed curve, as shown in the figure: many timelines are found and then combined with some aggregate function. The aggregation each business needs also differs: advertising data might be summed, or averaged per source; we might want the maximum and minimum, or statistical measures such as the value that 99% of points fall above.
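The aggregation step described above can be sketched as follows, assuming the selected timelines are already aligned to the same timestamps (the data and names are illustrative):

```python
from collections import defaultdict

def aggregate(timelines, agg):
    """Combine several time series into one by applying `agg`
    to the values that share each timestamp."""
    merged = defaultdict(list)
    for series in timelines:
        for t, v in series:
            merged[t].append(v)
    return sorted((t, agg(vs)) for t, vs in merged.items())

# Two revenue timelines picked out by tag, e.g. country=US and country=CA.
na_revenue = [
    [(0, 1.0), (60, 2.0)],
    [(0, 3.0), (60, 1.5)],
]
assert aggregate(na_revenue, sum) == [(0, 4.0), (60, 3.5)]
assert aggregate(na_revenue, max) == [(0, 3.0), (60, 2.0)]
```

Swapping the `agg` function (sum, max, an average, a percentile) gives the different business aggregations mentioned in the text.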

Characteristics of time series data

       To summarize, time series data has the following characteristics:

1. Large amounts of data are generated continuously. Whether it is advertising monitoring or sensor readings such as temperature, this applies to many situations. For example, to monitor the power consumption of the lights in an industrial park, each light has a sensor that reports its power consumption in real time. If the sampling interval is one second, each light generates one data point per second, so tens of thousands of lights produce tens of thousands of writes per second, and with many buildings involved this can reach millions of writes per second.

2. The data generation rate is stable, without obvious peaks and valleys. This has both advantages and disadvantages for optimization. The advantage is that, with no obvious peaks, capacity planning is easier; the disadvantage is that there is no idle period in which to do data merging and compaction.

3. Recent data is of much greater interest than old data.

4. Old data is rarely accessed, and eventually not needed at all. Time series databases therefore generally need rolling expiration of old data.

5. The data has labels of multiple dimensions.

6. It is often necessary to aggregate data for display or use.

Time series data business scenarios

Alibaba Eagle Eye System

The Alibaba Eagle Eye system traces the calls within a distributed system and monitors application metrics, including system memory and CPU usage as well as the application's own TPS and QPS. In Alibaba's actual internal systems, the write peak last year was 5.7 million points per second, with an average of 3.5 million points per second, producing tens of thousands of distinct metrics and tens of millions of time series, each with an average of 5 dimensions (tags), plus hundreds of aggregated queries per second for display.

Alibaba Smart Park

 

       There is also the Internet of Things. IoT data is very similar to system monitoring data: you can think of the sensors in a device as the different metrics of a server application, for example one metric per lamp, or the temperature at each air conditioner outlet. The incoming data spans two cities and three campuses, with tens of thousands of devices generating millions of collected points per second, and written data must be readable immediately.

When traditional databases encounter time series data

 

       Some problems arise when time series data is stored in a traditional database, for example stored directly in a relational database (such as MySQL's InnoDB engine) and analyzed with SQL statements. You will encounter the following problems:

1. Tags are stored repeatedly, so the storage cost is high. When the model is mapped to MySQL rows, each data point becomes an independent row, meaning the tags must be repeated on every data point of a series so that all points of the series can be found by their tags.

2. Multi-dimensional queries can be partially addressed with composite indexes, but that further increases the storage overhead.

3. B-tree indexes generate a large amount of random IO under continuous writes, so write performance degrades rapidly, and multiple composite indexes aggravate the problem. InnoDB indexes with a B+Tree; different tag combinations create different indexes, and since incoming data is not ordered the same way as each index, most writes turn into random insertions into the indexes.

4. The large data volume means the indexes and data easily exceed memory capacity, so search and aggregation performance is poor and queries incur heavy disk IO.

5. Downsampling requires SQL subqueries that the SQL optimizer struggles to optimize.
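A back-of-the-envelope sketch of the repeated-tag cost from point 1; all sizes here are illustrative assumptions (string tags, 8-byte timestamp and value, 4-byte IDs), not measurements:

```python
# One hypothetical series with three string tags.
tags = {"publisher": "acme", "gender": "male", "country": "us"}
tag_bytes = sum(len(k) + len(v) for k, v in tags.items())

points_per_series = 86400  # one point per second for a day

# Row-per-point with tags repeated on every row: tags + timestamp + value.
naive = points_per_series * (tag_bytes + 8 + 8)

# Tag-to-ID indirection: store each tag ID once, then timestamp + value per point.
with_ids = len(tags) * 4 + points_per_series * (8 + 8)

print(f"tags repeated per row: {naive} bytes; tag IDs stored once: {with_ids} bytes")
```

Even in this toy calculation the repeated tags dominate, which is exactly the overhead the tag-compression schemes in the next section attack.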

Optimization of OpenTSDB on HBase

 

       The OpenTSDB time series database architecture is shown in the figure. Its storage is based on HBase, which gives it high performance and linear scalability. TSD is designed as a stateless node: any node can take over for another at any time. The RPC protocol exposed externally is an HTTP/JSON interface, which is very convenient to use. TSD relies on HBase for consistency: when a TSD receives a write, it promptly persists it to HBase, and on a cache miss it reads the data back from HBase.

 

       The OpenTSDB storage format is shown in the figure; its layout reflects a lot of time-series-specific consideration and optimization.

1. The core point is tag compression, done in two levels. First, a table converts metric names and tag strings into integer IDs. The tag IDs, metric ID, and a time value are then combined into the row key, so what each row repeatedly stores is only this row key, and since every tag has been converted to an integer, the row key is relatively short.

2. Each row key corresponds to one time series plus one hour-boundary timestamp. Since the points of a series always belong to that series, the series identity in theory need not be repeated per point, so one hour of data is stored in the same row: OpenTSDB gives each row up to 3600 columns, one per second within the hour, while the hour boundary, metric ID, and tag IDs together form the row key. The space spent on repeated row keys thus becomes 1/3600 of a row-per-point design, greatly compressing it.

3. The construction of the row key is also well designed. The time sits between the metric and the tags, and no data format needs to be pre-defined, which preserves flexibility. When scanning by tags, as aggregation often requires, we need to find all timelines with certain tags; in OpenTSDB, when the tags in the search condition happen to form a row-key prefix, the scan can be optimized well. For example, with a row key built from three tags such as campus, building, and floor, a search by campus scans exactly the data we want and excludes the data of other campuses. A common HBase problem is hot-spotting; OpenTSDB uses a salt mechanism to avoid hot spots.
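The row-key layout described above can be sketched as follows. This is a simplified illustration, not OpenTSDB's exact on-disk format: real OpenTSDB uses 3-byte UIDs and an optional salt prefix, while this sketch uses 4-byte big-endian integers throughout:

```python
import struct

def make_row_key(metric_id, hour_base_ts, tag_ids):
    """OpenTSDB-style row key: metric ID, then the hour-boundary
    timestamp, then sorted (tagk ID, tagv ID) pairs."""
    key = struct.pack(">I", metric_id) + struct.pack(">I", hour_base_ts)
    for tagk_id, tagv_id in sorted(tag_ids):
        key += struct.pack(">II", tagk_id, tagv_id)
    return key

def column_qualifier(ts, hour_base_ts):
    """The offset within the hour (0..3599 at one-second resolution)
    selects the column inside the row."""
    return ts - hour_base_ts

key = make_row_key(metric_id=1, hour_base_ts=1534032000, tag_ids=[(2, 7), (1, 5)])
assert len(key) == 4 + 4 + 2 * 8          # metric + hour + two tag pairs
assert column_qualifier(1534032042, 1534032000) == 42
```

Because the tag pairs are sorted and appended after metric and time, queries whose tag conditions form a prefix of this byte string map directly onto an efficient HBase range scan, as the text notes.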

Disadvantages of OpenTSDB

OpenTSDB also has many disadvantages, as follows:

1. The metadata of all time series is cached on every TSD node; when there are too many time series, the memory pressure is very high.

2. Multi-dimensional queries are done via RowScan; when the query condition does not match a RowKey prefix, many useless RowKeys are scanned.

3. Time points within an hour are stored in fixed columns, and the column qualifiers carry extra overhead.

4. Aggregation on a single node easily hits performance bottlenecks (CPU & memory).

5. With a general-purpose compression algorithm, the compression ratio is still not ideal (each data point consumes about 20 bytes, including the RowKey overhead).

Optimization and improvement of HiTSDB

Inverted index

       Borrowing from the inverted index of search engines, each time series is treated as a document: an inverted index over tags yields the time series ID, and timestamp + time series ID is used as the RowKey, replacing the RowKey formed by concatenating timestamp + metric ID + tag IDs.
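A minimal sketch of such a tag inverted index; the class and method names are ours for illustration, not HiTSDB's API:

```python
from collections import defaultdict

class TagInvertedIndex:
    """Each time series is a 'document'; posting lists map each
    tag=value pair to the set of series IDs carrying it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.next_id = 0

    def add_series(self, tags):
        sid = self.next_id
        self.next_id += 1
        for k, v in tags.items():
            self.postings[(k, v)].add(sid)
        return sid

    def query(self, **tags):
        """AND over all given tag conditions = intersect posting lists."""
        sets = [self.postings[(k, v)] for k, v in tags.items()]
        return set.intersection(*sets) if sets else set()

idx = TagInvertedIndex()
a = idx.add_series({"campus": "A", "floor": "3", "device": "lamp"})
b = idx.add_series({"campus": "A", "floor": "5", "device": "lamp"})
assert idx.query(campus="A", device="lamp") == {a, b}
assert idx.query(campus="A", floor="5") == {b}
```

Unlike a RowKey prefix scan, intersecting posting lists works for any combination of tag conditions, which is why the list below credits the inverted index with speeding up arbitrary multi-dimensional queries.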

The problems solved by the inverted index, and the related trade-offs, are as follows:

1. Sharding and consistency of the inverted index across the cluster; solution: a BinLog written to HDFS, one BinLog file per shard.

2. The sharding-strategy problem: shard by metric, by a specific tag, or by metric + tags?

3. The inverted index speeds up multi-dimensional queries with arbitrary conditions.

4. The inverted index makes it easy to implement auto-completion for metric names and tag keys/values.

5. RowScan vs. mget.

6. Reading data from HBase is the bottleneck, in both network throughput and disk IO.

High compression ratio algorithm

 

       We generally consider the most recent data to be the hottest and want it fully cached in memory, but the volume of time series data is large, so a high-compression-ratio algorithm is needed: on average, each data point is compressed to 1.37 bytes. Timestamps are compressed with delta-of-delta encoding, and values with an XOR encoding of their binary representation.

       The high compression ratio allows the most recent few hours of data to be cached entirely in memory, avoiding HBase mget operations at query time. Decompression is very fast, and downsampling can be performed during decompression, reducing memory overhead.
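A simplified sketch of the two encodings. Real implementations (such as Facebook's Gorilla, which popularized this scheme) additionally bit-pack the results into variable-length codes, which this sketch omits:

```python
import struct

def delta_delta_encode(timestamps):
    """Delta-of-delta encoding: with a regular sampling interval, every
    delta is the same, so the delta-of-deltas collapse to zeros that
    bit-pack into very few bits."""
    out = [timestamps[0]]
    prev, prev_delta = timestamps[0], 0
    for ts in timestamps[1:]:
        delta = ts - prev
        out.append(delta - prev_delta)
        prev, prev_delta = ts, delta
    return out

def xor_encode(values):
    """XOR each float's bit pattern with its predecessor; identical or
    nearby values yield mostly-zero words that compress well."""
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return [bits[0]] + [a ^ b for a, b in zip(bits, bits[1:])]

# Regular 10-second sampling: everything after the header becomes zero.
assert delta_delta_encode([100, 110, 120, 130]) == [100, 10, 0, 0]
# A repeated value XORs to zero.
assert xor_encode([3.5, 3.5])[1] == 0
```

Both transforms are trivially reversible, and the decoder can downsample while it walks the stream, which matches the point above about combining decompression with precision reduction.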

Pre-downsampling

       We implemented pre-downsampling: before data is written, HiTSDB computes rollups at a number of predefined downsampling levels. Pre-downsampling raises some logical questions, including the following:

• Data expiration vs. pre-downsampling

• Pre-downsampling levels and the extra space overhead

• Combining pre-downsampling with on-the-fly downsampling

• Problems with averages

• Exact vs. approximate calculation, e.g. computing P99 on pre-downsampled data

• Time windows and data modifications
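A minimal sketch of maintaining rollups at write time, with illustrative levels of 1 minute and 1 hour. Storing (sum, count) rather than the average itself is one standard answer to the averaging problem in the list above, since averages of averages are generally wrong:

```python
from collections import defaultdict

LEVELS = [60, 3600]  # rollup bucket widths in seconds (illustrative choices)

# level -> bucket start timestamp -> [sum, count]
rollups = {lvl: defaultdict(lambda: [0.0, 0]) for lvl in LEVELS}

def write_point(ts, value):
    """Fold one incoming point into every pre-downsampling level."""
    for lvl in LEVELS:
        bucket = rollups[lvl][ts - ts % lvl]
        bucket[0] += value   # running sum
        bucket[1] += 1       # running count

def read_avg(lvl, bucket_ts):
    """Reconstruct the exact average for one bucket from (sum, count)."""
    s, c = rollups[lvl][bucket_ts]
    return s / c if c else None

write_point(30, 2.0)
write_point(90, 4.0)
assert read_avg(60, 0) == 2.0      # minute bucket 0 saw only the first point
assert read_avg(3600, 0) == 3.0    # hour bucket saw both points
```

Note that sum/count rollups cannot answer exact percentile queries such as P99; that is the exact-vs-approximate trade-off listed above.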

This article is the original content of Yunqi Community, and may not be reproduced without permission. If you need to reprint, please send an email to [email protected]
