Interpretation of HUAWEI CLOUD GaussDB (for Influx): Best Practice Data Modeling

This article is shared from the Huawei Cloud Community post "Huawei Cloud GaussDB (for Influx) Revealing Issue 7: Best Practice Data Modeling", author: GaussDB database.

HUAWEI CLOUD's GaussDB (for Influx) time series database provides data security, high performance, low storage cost, and O&M-free operation for industrial IoT scenarios with massive time series data, and it is attracting the attention of more and more enterprises. Developers increasingly value its ease of use, SQL-like query statements, and schemaless design, which suits rapid business iteration.

However, as the business scales up, problems appear: the number of timelines skyrockets, query latency rises, and queries sometimes return inconsistent data because a Tag and a Field share the same name. The root cause is a poorly designed data model. This issue starts from the GaussDB (for Influx) data model, shares best practices for data modeling, and shows how to avoid some common problems.

0 1  Data Model and Key Concepts

  • Database

It is the same concept as Database in MySQL.

Create command: CREATE DATABASE "mydb".

User permissions and data retention policies are set at the granularity of Database. For example, give the user read-only permission to the "mydb" database: GRANT read ON mydb TO username.

  • Measurement

Similar to the Table concept in MySQL. The difference is that GaussDB (for Influx) is schemaless: a Measurement does not need to be created in advance, and the fields and types of the table do not need to be designed up front. Measurements are created automatically when data is written, and fields can be added or removed freely, but the data type of a given field must stay consistent.

  • Retention Policy(RP)

The data retention policy is a concept that does not exist in relational databases; it is designed specifically for time series scenarios. It specifies the maximum time that data is retained in the database, and expired data is cleaned up automatically.
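
As a rough illustration, the helper below builds the InfluxQL statement that creates a retention policy. It assumes GaussDB (for Influx) accepts the InfluxDB 1.x `CREATE RETENTION POLICY` syntax; the function name and the policy name `rp_30d` are hypothetical and not taken from the article.

```python
def create_rp_statement(db, name, duration, replication=1, default=False):
    """Build a CREATE RETENTION POLICY statement (InfluxDB 1.x syntax assumed)."""

    def quote(identifier):
        # Identifiers are double-quoted; embedded double quotes are backslash-escaped.
        return '"' + identifier.replace('"', '\\"') + '"'

    stmt = (
        f"CREATE RETENTION POLICY {quote(name)} ON {quote(db)} "
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        # DEFAULT makes this the policy used when a write does not name one.
        stmt += " DEFAULT"
    return stmt


# Hypothetical policy: keep data in "mydb" for 30 days.
print(create_rp_statement("mydb", "rp_30d", "30d"))
```

The statement would then be sent through whatever client or console is used to run InfluxQL against the instance.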

  • Tag

Identifies the data source. Only the string type is supported.

  • Field

The collected indicator. The string, float, int, and bool types are supported.

  • Line Protocol (Data Model)

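As shown in the figure, when writing data to GaussDB (for Influx), a single piece of data consists of six parts: measurement, Tag_key, Tag_value, Field_key, Field_value, and timestamp. There can be one or more <Tag_key=Tag_value> pairs and one or more <Field_key=Field_value> pairs, and each piece of data must carry a timestamp. The tag pairs follow the measurement, separated by commas; a single space separates the tag set from the field set, and another separates the field set from the timestamp.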
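
The layout described above can be sketched as a small formatter. This is a minimal sketch that assumes the standard InfluxDB line protocol escaping rules apply to GaussDB (for Influx); the function names are hypothetical.

```python
def _escape(text, chars):
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value):
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; backslash is escaped first, then the quote.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one record as: measurement,tags fields timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable series key
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field_value(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


print(to_line("monitorInfo", {"area": "葡萄花", "device": "钻机A"},
              {"pressure": 1.8, "level": 35.0}, 1650443961100400200))
```

Tag values are written without quotes, because in line protocol a quote inside a tag value becomes part of the value.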

  • Point

A Point usually contains four parts: measurement + Tags + Field + timestamp. For example, the following data contains 2 Points (葡萄花 is the Putaohua oilfield and 钻机A is rig A).

monitorInfo,area=葡萄花,device=钻机A pressure=1.8,level=35 1650443961100400200
Point1:
monitorInfo,area=葡萄花,device=钻机A pressure=1.8 1650443961100400200
Point2:
monitorInfo,area=葡萄花,device=钻机A level=35 1650443961100400200

That is, the number of Field keys in a piece of data can simply be taken as the number of Points it holds. In GaussDB (for Influx), a piece of data can contain one Point or multiple Points.
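
The one-Point-per-Field-key rule can be sketched in a few lines. The tuple layout and the function name are illustrative assumptions, not the database's internal representation.

```python
def split_into_points(measurement, tags, fields, timestamp_ns):
    """Return one (measurement, tags, field_key, field_value, timestamp) tuple per Field."""
    return [
        (measurement, dict(tags), key, value, timestamp_ns)
        for key, value in fields.items()
    ]


record = ("monitorInfo", {"area": "葡萄花", "device": "钻机A"},
          {"pressure": 1.8, "level": 35}, 1650443961100400200)
points = split_into_points(*record)
print(len(points))  # 2, one Point per Field key
```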

  • Series (Timeline)

In GaussDB (for Influx), the combination of an indicator and a set of tags is called a timeline. Within a timeline, the data sampled at successive points in time is the time series data. For example, given the data:

monitorInfo,area=葡萄花,device=钻机A pressure=1.8 1650443961100400200
monitorInfo,area=葡萄花,device=钻机B pressure=1.6 1650443961100400200
monitorInfo,area=榆树林,device=钻机B pressure=1.7 1650443961100400200
monitorInfo,area=榆树林,device=钻机A pressure=1.5 1650443961100400200

These represent 4 timelines (葡萄花 is the Putaohua oilfield, 榆树林 is the Yushulin oilfield, and 钻机 means rig), namely:

  • Pressure sensor (pressure) on rig A in the Putaohua oilfield

  • Pressure sensor (pressure) on rig B in the Putaohua oilfield

  • Pressure sensor (pressure) on rig B in the Yushulin oilfield

  • Pressure sensor (pressure) on rig A in the Yushulin oilfield
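
To make the counting concrete, the sketch below derives a timeline key from the measurement, the sorted tag set, and the indicator (Field key), following the definition above, and counts the distinct keys. The key format and function names are assumptions made for illustration only.

```python
def series_key(measurement, tags, field_key):
    """Identify a timeline by measurement + sorted tag set + indicator."""
    tag_part = ",".join(f"{k}={tags[k]}" for k in sorted(tags))
    return f"{measurement},{tag_part}#{field_key}"


def count_timelines(records):
    """records: iterable of (measurement, tags, fields) tuples."""
    keys = set()
    for measurement, tags, fields in records:
        for field_key in fields:
            keys.add(series_key(measurement, tags, field_key))
    return len(keys)


records = [
    ("monitorInfo", {"area": "葡萄花", "device": "钻机A"}, {"pressure": 1.8}),
    ("monitorInfo", {"area": "葡萄花", "device": "钻机B"}, {"pressure": 1.6}),
    ("monitorInfo", {"area": "榆树林", "device": "钻机B"}, {"pressure": 1.7}),
    ("monitorInfo", {"area": "榆树林", "device": "钻机A"}, {"pressure": 1.5}),
    # A later sample from an existing source does not create a new timeline.
    ("monitorInfo", {"area": "葡萄花", "device": "钻机A"}, {"pressure": 1.9}),
]
print(count_timelines(records))  # 4
```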

0 2  Best Practices for Data Modeling

Often, data modeling is done to make queries simpler and more efficient. For most use cases, we recommend the following design guidelines:

1. Design Tags and Fields sensibly

  • Tag only supports the string type; numeric and boolean data should be designed as Fields;

  • Design common query conditions and grouping conditions as Tags;

    This is because Tags are indexed and Fields are not. For example, a business often queries the average CPU utilization of a particular machine:

SELECT mean(cpu)
FROM monitor
WHERE host='192.168.1.1' AND time > now() - 1h

Or query the average hourly power generation of each wind turbine in a wind farm:

SELECT mean(elect)
FROM monitor
WHERE farm_id='737f738a-bd63' AND time > now() - 24h
GROUP BY time(1h),device_id

The host, farm_id, and device_id in the queries above should be set as Tags, provided their values are strings, since only strings can be stored as Tags.

  • time is a built-in keyword and cannot be used as a Tag_key or Field_key;

  • Values that will be computed with InfluxQL functions (MAX, MIN, COUNT, etc.) should be stored as Fields.
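
The guidelines above can be sketched as a small helper that splits a raw sample into Tags and Fields. It assumes the caller already knows which keys are used as query or grouping conditions; the function name and the sample values (for example the device ID `wt-07`) are hypothetical.

```python
def split_tags_and_fields(sample, query_dimensions):
    """Split one raw sample (a dict) into (tags, fields)."""
    tags, fields = {}, {}
    for key, value in sample.items():
        if key.lower() == "time":
            raise ValueError("'time' is built in and cannot be a Tag or Field key")
        # Only string values used for filtering or grouping become Tags;
        # numeric and boolean data always stays in Fields.
        if key in query_dimensions and isinstance(value, str):
            tags[key] = value
        else:
            fields[key] = value
    return tags, fields


sample = {"farm_id": "737f738a-bd63", "device_id": "wt-07",
          "elect": 12.5, "online": True}
tags, fields = split_tags_and_fields(sample, {"farm_id", "device_id"})
print(tags)
print(fields)
```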

2. Follow the naming conventions for Tag_key and Field_key

  • Do not use reserved keywords as Tag or Field keys (names);

  • Do not give a Tag and a Field the same name; otherwise queries can return unexpected results;

  • Keep Tag and Field names as short and clear as possible, which saves index memory and makes queries more efficient;

  • Avoid packing several meanings into one Tag. For example, machine = "192.168.2.1-Ubuntu" contains both an IP address and an operating system name; it is recommended to split it into two Tags, host and os;

  • Store data that rarely changes as a Tag. For example, the process name can be a Tag, while the process ID is better stored as a Field.
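
A minimal sketch of how these naming rules could be checked before data is written. The `RESERVED` set is only an illustrative subset of InfluxQL keywords, not the full list, and both function names are hypothetical.

```python
# Illustrative subset of InfluxQL keywords; the real list is longer.
RESERVED = {"select", "from", "where", "group", "by",
            "limit", "order", "into", "drop", "create"}


def check_keys(tag_keys, field_keys):
    """Return a list of naming problems found in the Tag and Field keys."""
    problems = []
    for key in sorted(set(tag_keys) | set(field_keys)):
        if key.lower() in RESERVED:
            problems.append(f"{key}: reserved keyword")
    for key in sorted(set(tag_keys) & set(field_keys)):
        problems.append(f"{key}: used as both Tag and Field")
    return problems


def split_machine(machine):
    """Split a combined value such as '192.168.2.1-Ubuntu' into two Tags."""
    host, os_name = machine.split("-", 1)
    return {"host": host, "os": os_name}


print(check_keys(["host", "from"], ["cpu", "host"]))
print(split_machine("192.168.2.1-Ubuntu"))
```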

3. Avoid exceeding the number of timelines that the node specification can handle

The corresponding relationship between GaussDB (for Influx) specifications and the number of timelines is as follows:

If the number of timelines exceeds the limit, performance drops sharply and business operation may be affected. In that case, consider expanding node capacity.
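
Before writing data, the number of timelines can be estimated roughly from the number of distinct values per Tag. The sketch below computes a worst-case upper bound by assuming every tag-value combination occurs for every Field; the tag counts and the node limit are made-up numbers, not GaussDB (for Influx) specifications.

```python
from math import prod


def estimate_timelines(tag_cardinalities, field_count):
    """Upper bound: every tag-value combination occurs, for every Field."""
    return prod(tag_cardinalities.values()) * field_count


def fits(tag_cardinalities, field_count, node_limit):
    """Check the estimate against a node's timeline limit (hypothetical value)."""
    return estimate_timelines(tag_cardinalities, field_count) <= node_limit


# Hypothetical model: 20 areas, 500 devices, 4 indicators per device.
estimate = estimate_timelines({"area": 20, "device": 500}, field_count=4)
print(estimate)  # 40000
print(fits({"area": 20, "device": 500}, 4, node_limit=100_000))
```

Real data rarely contains every combination, so the actual count is usually lower; the bound is mainly useful for spotting a Tag whose values grow without limit.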

4. Avoid too many Tags or Fields

It is recommended to store only business data of the same type in one table, for example logistics vehicle monitoring data. Putting too many kinds of business data in the same table causes the number of Tags and Fields to surge, which directly hurts query efficiency. When there are too many Fields, each Field is computed separately, so a fuzzy query may time out.

5. Avoid storing data from multiple businesses in the same Retention Policy

Different business data expires at different times, so it should be stored in different RPs according to the specific needs of the business. Otherwise, expired data cannot be deleted in time and keeps occupying storage space, which raises storage cost and hurts query efficiency.
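
One way to route each kind of business data to its own RP is to pick the policy when building the write request. The sketch below assumes the instance exposes the InfluxDB 1.x HTTP `/write` endpoint with its `db`, `rp`, and `precision` parameters; the host, business names, and RP names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical mapping from business type to retention policy.
RP_BY_BUSINESS = {
    "vehicle_monitoring": "rp_90d",
    "debug_logs": "rp_7d",
}


def write_url(base_url, db, business, precision="ns"):
    """Build the write URL that targets the RP chosen for this business."""
    rp = RP_BY_BUSINESS[business]
    query = urlencode({"db": db, "rp": rp, "precision": precision})
    return f"{base_url}/write?{query}"


print(write_url("https://influx.example.com:8086", "mydb", "debug_logs"))
```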

6. Avoid storing data of multiple users in the same Database

Because permission control in GaussDB (for Influx) currently works at Database granularity, keeping the data of several users in one Database makes it easy for data to be accessed and modified by other users. It is recommended to use a separate Database for each user and grant access only to that user.
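
A per-user layout can be sketched as a helper that emits the InfluxQL statements for one user. It assumes the InfluxDB 1.x `CREATE DATABASE` and `GRANT` syntax; the `db_<username>` naming scheme is a hypothetical convention, not something the article prescribes.

```python
def provision_statements(username, privilege="READ"):
    """Return the statements that give one user a dedicated Database."""
    if privilege not in {"READ", "WRITE", "ALL"}:
        raise ValueError(f"unsupported privilege: {privilege}")
    db = f"db_{username}"  # hypothetical naming scheme
    return [
        f'CREATE DATABASE "{db}"',
        f'GRANT {privilege} ON "{db}" TO "{username}"',
    ]


for statement in provision_statements("alice"):
    print(statement)
```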

0 3  Summary

In industrial IoT sectors such as manufacturing, energy, agriculture, and electric power, most digital information systems are built on relational databases such as MySQL. However, as enterprises expand in business and scale and data volumes grow rapidly, relational databases such as MySQL run into problems with concurrency, storage cost, query performance, scalability, and maintenance, and are gradually being replaced by time series databases.

GaussDB (for Influx) drops the complicated normalization rules of relational database design and supports schemaless design, so a business can be modeled simply and efficiently. For industrial IoT scenarios where the business changes quickly and the connected devices are highly diverse, GaussDB (for Influx) data modeling is more flexible, accommodates different devices without changing services, and is a better fit.

0 4  End

The author of this article: HUAWEI CLOUD Database Innovation Lab & HUAWEI CLOUD Spatiotemporal Database Team


Origin blog.csdn.net/devcloud/article/details/124420697