HUAWEI CLOUD GaussDB (for Influx) Revealed, Part 6: Tiered Data Storage

Abstract: GaussDB (for Influx) stores hot and cold data separately, providing high-performance storage for massive data while cutting storage costs by 85%, and efficiently serves a wide range of time-series scenarios.

This article is shared from the Huawei Cloud Community post "HUAWEI CLOUD GaussDB (for Influx) Revealed, Part 6: Tiered Data Storage", by the Gauss Influx official blog.

"It costs more than 2 million a year just to store this data?"

Facing the boss's doubts, Xiao Wang walked through the evaluation plan again. To support production analysis and system operation and maintenance, each device reports data from dozens of detection points. All devices collect data around the clock, so the daily data volume reaches the TB level. The data must be kept for at least 2 years, and with 3 copies for high availability, the total data volume reaches the PB level.
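As a rough check on Xiao Wang's estimate, the sketch below multiplies out the figures. The 1 TB/day rate is only an assumed placeholder for "TB level"; the 2-year retention and the 3 copies come from the plan above.

```python
# Back-of-the-envelope storage estimate.
# ASSUMPTION: 1 TB/day stands in for the "TB level" daily volume;
# the article does not give an exact figure.
DAILY_TB = 1.0
RETENTION_DAYS = 2 * 365   # data is kept for at least 2 years
REPLICAS = 3               # 3 copies for high availability


def total_storage_tb(daily_tb: float, retention_days: int, replicas: int) -> float:
    """Total stored volume in TB once the retention window is full."""
    return daily_tb * retention_days * replicas


total_tb = total_storage_tb(DAILY_TB, RETENTION_DAYS, REPLICAS)
print(f"{total_tb:.0f} TB, about {total_tb / 1000:.2f} PB")
```

Even at the low end of "TB level", the total lands in the PB range, which is why the storage bill dominates the discussion.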

Xiao Wang also showed the results of his survey comparing storage prices and performance across current cloud vendors:

Performance varies greatly across storage types. For example, an NVMe disk delivers 7 times the throughput and more than 20 times the IOPS of a SATA disk. Of course, it also costs about 10 times as much. According to the test and evaluation, low-cost storage cannot keep up with the heavy write load and the monitoring of real-time services, so high-performance SSDs have to be used, which drives up storage costs.

With costs rising, the boss is naturally dissatisfied. So how can we meet performance requirements while keeping costs under control? Xiao Wang thought, "Actually, not all data processing requires high performance. If we put high-value data on high-performance disks to meet business needs, and low-value data on low-cost disks to reduce costs, wouldn't that both meet demand and reduce costs?"

The idea is attractive, but reality is harsh. To carry out this plan, Xiao Wang has to solve several more problems:

(1) How to use both high-performance storage and low-cost storage in one system?
(2) How to distinguish high-value data?
(3) How to automatically move data to low-cost storage once its value declines?
(4) How to keep changes to the existing business as small as possible?

1. GaussDB (for Influx) solution

Data is the foundation of enterprise digital transformation. To track equipment and system status in real time, a large amount of data must be collected and processed in real time. All of this is time-series data with distinctive characteristics: it carries timestamps, is rarely updated, and each series comes from a unique source. Beyond the characteristics of the data itself, it also behaves as follows in business applications:

  • Over time, the probability that data is queried and analyzed becomes lower and lower.
  • Over time, the real-time requirements for data analysis become lower and lower.
  • Over time, the accuracy requirements for the data become lower and lower.
  • The data is kept only for a certain period and is deleted when it expires.

How can these characteristics of time-series data be used to fulfil Xiao Wang's wish of meeting business performance while controlling costs? The tiered data storage function of HUAWEI CLOUD's GaussDB (for Influx) time-series database addresses each of the problems that troubled Xiao Wang.

1. HUAWEI CLOUD GaussDB (for Influx) relies on cloud-native capabilities to implement a distributed architecture that separates compute from storage. The storage layer is built on Huawei's distributed storage DFV and object storage OBS, which solves the problem of using both high-performance storage and low-cost storage in one system. The architecture is as follows:

DFV distributed storage is the high-performance tier: hot data is stored in DFV to meet business performance requirements. OBS is the low-cost tier: cold data is stored in OBS to reduce customer costs.

2. It provides automatic separation of hot and cold data. When creating a retention policy, the user can specify the time boundary between hot and cold data. The system then automatically divides the data into hot and cold according to that setting, which solves the problem of how to distinguish the data.

3. As time passes and hot data becomes cold, the system automatically dumps it to cold storage.

4. Throughout this process, the only thing that needs to be specified is the hot and cold data policy when the retention policy (RP) is created. The process is transparent to the business, so no business adaptation or transformation is required.
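The time-based division described in points 2 and 3 can be sketched as follows. This is a simplified illustration, not GaussDB (for Influx)'s actual implementation: the function name `classify_tier` is hypothetical, and the real system works on shards, whose exact boundary handling is not described in this article.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_tier(
    data_time: datetime,
    now: datetime,
    retention: timedelta,
    warm_duration: Optional[timedelta] = None,
) -> str:
    """Classify data by age (hypothetical helper, for illustration only).

    retention     -- how long data is kept before it is deleted
    warm_duration -- how long data stays hot; None means no cold tier
    """
    age = now - data_time
    if age >= retention:
        return "expired"   # past the retention period, to be deleted
    if warm_duration is not None and age >= warm_duration:
        return "cold"      # older than the hot window, belongs in cold storage
    return "hot"


now = datetime(2022, 4, 30, tzinfo=timezone.utc)
for days in (1, 10, 40):
    tier = classify_tier(now - timedelta(days=days), now,
                         retention=timedelta(days=30),
                         warm_duration=timedelta(days=6))
    print(f"{days:>2} days old -> {tier}")
```

Because the decision depends only on the age of the data and the policy, the application never has to choose a storage tier itself.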

2. Using hot and cold storage in GaussDB (for Influx)

The storage tiering function of GaussDB (for Influx) is simple to use. After purchasing cold storage, you specify how long data stays hot when setting the RP. The system then automatically dumps cold data to low-cost storage according to the RP. When the business accesses cold data, the system automatically reads it from cold storage. The whole process is transparent to the business and has no impact on it.

2.1 Purchasing Cold Storage

GaussDB (for Influx) supports one-click purchase of cold storage space. You can choose whether to purchase cold storage space when purchasing an instance. If you select "Yes", you can choose the size of cold storage according to business needs, as shown in the following figure:

You can also purchase cold storage independently after purchasing an instance. Go to the instance details page and click Create cold storage space, as shown in the following figure:

On the page that opens, select the storage space size according to your business needs:

The cold storage space also supports online expansion, and the expansion process does not affect the business.

2.2 Setting hot and cold data rules

After purchasing the cold storage space, you can set the rules for cold data according to business requirements. The system automatically divides hot and cold data according to the rules and stores the cold data in the cold storage space. You specify the rules for hot and cold data when creating or altering RPs. Examples:

// Create an RP named myrp on the database mydb, explicitly setting WARM DURATION to 6d, meaning data older than 6 days is cold data.
create retention policy myrp on mydb duration 30d replication 1 warm duration 6d shard duration 3d

// Create an RP named myrp on the database mydb without specifying WARM DURATION, meaning there is no cold data.
create retention policy myrp on mydb duration 30d replication 1 shard duration 3d

// Create a database named mydb with an RP named myrp, explicitly setting WARM DURATION to 3d, meaning data older than 3 days is cold data.
create database mydb with duration 6d warm duration 3d name myrp

// Change WARM DURATION to 7d, meaning data older than 7 days is cold data.
alter retention policy myrp on mydb warm duration 7d

After the rules are set, the system will automatically determine which data is cold data according to the specified rules, and automatically dump the data to cold storage.
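If RPs are created from scripts, the statement can be assembled programmatically. The sketch below only builds the statement text in the clause order used in the examples above; the helper name `create_rp_statement` is hypothetical, it does no quoting or validation of identifiers, and it does not send anything to the database.

```python
from typing import Optional


def create_rp_statement(
    rp: str,
    db: str,
    duration: str,
    shard_duration: str,
    warm_duration: Optional[str] = None,
    replication: int = 1,
) -> str:
    """Build a 'create retention policy' statement (hypothetical helper).

    The 'warm duration' clause is included only when warm_duration is given;
    leaving it out means the RP has no cold data.
    """
    parts = [
        f"create retention policy {rp} on {db}",
        f"duration {duration}",
        f"replication {replication}",
    ]
    if warm_duration is not None:
        parts.append(f"warm duration {warm_duration}")
    parts.append(f"shard duration {shard_duration}")
    return " ".join(parts)


print(create_rp_statement("myrp", "mydb", "30d", "3d", warm_duration="6d"))
print(create_rp_statement("myrp", "mydb", "30d", "3d"))
```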

2.3 Viewing the status of hot and cold data

Once the cold data rule is set and data has been written for a while, the system automatically checks whether data has become cold. If it has, the system automatically dumps it to cold storage. You can view the status of the data with the show shards command, as follows:

> show shards
name: hsdb
id database retention_policy shard_group start_time           end_time             expiry_time          owners tier
-- -------- ---------------- ----------- ----------           --------             -----------          ------ ----
5  hsdb     myrp             2           2019-08-12T00:00:00Z 2019-08-19T00:00:00Z 2019-08-19T00:00:00Z 4      cold
6  hsdb     myrp             2           2019-08-12T00:00:00Z 2019-08-19T00:00:00Z 2019-08-19T00:00:00Z 5      moving
7  hsdb     myrp             2           2019-08-12T00:00:00Z 2019-08-19T00:00:00Z 2019-08-19T00:00:00Z 6      warm
8  hsdb     myrp             2           2019-08-12T00:00:00Z 2019-08-19T00:00:00Z 2019-08-19T00:00:00Z 7

cold: the data is cold data and has been stored in cold storage.

moving: the data is cold data and is currently being dumped to cold storage.

warm: the data is hot data.
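To monitor how much data sits in each state, the text output of show shards can be summarized with a few lines of code. The sketch below assumes whitespace-separated columns with a `tier` column, as in the sample output above; the function name `count_tiers` is hypothetical, and rows without a tier value are counted as "unknown".

```python
from collections import Counter

# Sample rows copied from the show shards output above.
SAMPLE = """\
name: hsdb
id database retention_policy shard_group start_time           end_time             expiry_time          owners tier
-- -------- ---------------- ----------- ----------           --------             -----------          ------ ----
5  hsdb     myrp             2           2019-08-12T00:00:00Z 2019-08-19T00:00:00Z 2019-08-19T00:00:00Z 4      cold
6  hsdb     myrp             2           2019-08-12T00:00:00Z 2019-08-19T00:00:00Z 2019-08-19T00:00:00Z 5      moving
7  hsdb     myrp             2           2019-08-12T00:00:00Z 2019-08-19T00:00:00Z 2019-08-19T00:00:00Z 6      warm
"""


def count_tiers(text: str) -> Counter:
    """Count shards per tier in show shards text output (hypothetical helper)."""
    # Drop blank lines, the CLI prompt line, and the "name: <db>" line.
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.startswith((">", "name:"))
    ]
    header = lines[0].split()
    tier_idx = header.index("tier")
    counts: Counter = Counter()
    for ln in lines[2:]:  # skip the header and the dashed separator
        fields = ln.split()
        tier = fields[tier_idx] if len(fields) > tier_idx else "unknown"
        counts[tier] += 1
    return counts


print(dict(count_tiers(SAMPLE)))
```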

3. Summary

After applying the hot and cold tiered storage solution of GaussDB (for Influx) to 100 TB of data retained for one year, with data from the most recent month treated as hot and the rest as cold, the overall storage cost drops from 2.5 million to 375,000, a saving of 85%.
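The 85% figure follows directly from the two totals. The sketch below reproduces it and, as an illustration only, derives the cold-to-hot price ratio those totals would imply if one month out of twelve (1/12 of the data) is hot and cost scales linearly with volume. That ratio is an inference under these assumptions, not a published price.

```python
ALL_HOT_COST = 2_500_000   # yearly cost with everything on hot storage (from the article)
TIERED_COST = 375_000      # yearly cost with tiered storage (from the article)
HOT_FRACTION = 1 / 12      # ASSUMPTION: one month of a one-year window is hot


def savings_ratio(before: float, after: float) -> float:
    """Fraction of the original cost that is saved."""
    return 1 - after / before


def implied_cold_price_ratio(before: float, after: float, hot_fraction: float) -> float:
    """Solve after = before * (hot_fraction + (1 - hot_fraction) * r) for r."""
    return (after / before - hot_fraction) / (1 - hot_fraction)


print(f"savings: {savings_ratio(ALL_HOT_COST, TIERED_COST):.0%}")
print(f"implied cold/hot price ratio: "
      f"{implied_cold_price_ratio(ALL_HOT_COST, TIERED_COST, HOT_FRACTION):.3f}")
```

Under these assumptions, the cold tier would cost roughly 7% of the hot tier per unit of data, which is in line with the order-of-magnitude price gap described earlier.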

In addition to hot and cold tiered storage, GaussDB (for Influx) has been deeply optimized in clustering, read and write performance, compression ratio, and high availability, so it can better serve the various scenarios of time-series applications.

Origin my.oschina.net/u/4526289/blog/5516600