What is the performance of TDengine 3.0? Reproduce the TSBS test results in the IoT scenario with one click

Not long ago we released the first installment of our TSBS-based TDengine 3.0 test report series, "TDengine 3.0 Comparative Test Report in the DevOps Scenario". That report verified, on a time-series workload, the performance advantages and cost control that TDengine's unique architecture delivers. In this installment we continue with "TDengine 3.0 Performance Comparative Analysis Report in the IoT Scenario", which compares the write and query performance of TDengine against TimescaleDB and InfluxDB in the IoT scenario, as a reference for developers selecting a time-series database (TSDB).

The report shows that in all five scenarios the write performance of TDengine is better than that of TimescaleDB and InfluxDB: up to 3.3 times that of TimescaleDB and 16.2 times that of InfluxDB. TDengine also consumes the least CPU resources and disk I/O during writing. On the query side, TDengine outperforms InfluxDB and TimescaleDB for most query types, with a huge advantage in complex mixed queries: the avg-load and breakdown-frequency queries are 426 times and 53 times faster than InfluxDB, and the daily-activity and avg-load queries are 34 times and 23 times faster than TimescaleDB.

To make the results easy to verify, this article walks through the test data, environment setup, and the other steps one by one, so that developers who need to can reproduce them. Once the physical environment is prepared, all of the data in the report can be generated by running a single script; the test steps are also covered below.

1. Test background

1. Introduction to test scenarios

This test uses the IoT scenario of TSBS as the base data set. Under the TSBS framework, it simulates the time-series data of a fleet of trucks belonging to a fictional freight company. Each truck's diagnostics record contains 3 measurements, 1 timestamp (nanosecond resolution), and 8 tag values; each truck's readings record contains 7 measurements, 1 timestamp (nanosecond resolution), and 8 tag values. The data schema is shown in the figure below, and a record is generated every 10 seconds. Because the IoT scenario introduces environmental factors, each truck's time-series data can be out of order or have gaps.

(Figure: sample data schema)

The benchmark evaluation covers the following five scenarios. The data scale and characteristics of each are shown in the table below; because of missing data, the number of records per truck is an average:

| | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 |
| --- | --- | --- | --- | --- | --- |
| Devices × metrics | 100 × 10 | 4,000 × 10 | 100,000 × 10 | 1,000,000 × 10 | 10,000,000 × 10 |
| Data interval | 10 seconds | 10 seconds | 10 seconds | 10 seconds | 10 seconds |
| Duration | 31 days | 4 days | 3 hours | 3 minutes | 3 minutes |
| Number of trucks | 100 | 4,000 | 100,000 | 1,000,000 | 10,000,000 |
| Records per truck | 241,145 | 31,118 | 972 | 16 | 16 |
| Records in the dataset | 48,229,186 | 248,944,316 | 194,487,997 | 32,414,619 | 324,145,090 |

As the table shows, the five scenarios differ mainly in the total number of trucks and the number of records per truck, while the data interval stays at 10 seconds. Overall the data volumes are not large: scenario 5 is the largest and scenario 4 the smallest. In scenarios 4 and 5 the number of trucks is so large that the dataset covers a time span of only 3 minutes.

2. Data modeling

In the TSBS framework, TimescaleDB and InfluxDB automatically create their data models and generate data in the corresponding formats, so this article does not repeat their modeling details and only introduces the data modeling strategy for TDengine. An important innovation of TDengine is its unique data model: an independent table (sub-table) is created for each device, and devices of the same collection type are managed logically and semantically through a super table. For the IoT scenario we created two tables for each truck (device and truck are used interchangeably below), storing the time series of its diagnostics and readings respectively. The truck's name serves as its identifier, and since there are two super tables, the sub-table name in TDengine is the truck name prefixed with d_ or r_. We use the following statements to create two super tables named diagnostics and readings, containing 3 and 7 measurements respectively, each with 8 tags.
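The exact statements from the test are not reproduced in this excerpt; the sketch below is based on the measurement and tag names of the TSBS IoT data set, with column types chosen for illustration:

```sql
-- Sketch: super table for readings (7 measurements, 8 tags)
CREATE STABLE readings (
  ts TIMESTAMP, latitude DOUBLE, longitude DOUBLE, elevation DOUBLE,
  velocity DOUBLE, heading DOUBLE, grade DOUBLE, fuel_consumption DOUBLE
) TAGS (
  name VARCHAR(30), fleet VARCHAR(30), driver VARCHAR(30), model VARCHAR(30),
  device_version VARCHAR(30), load_capacity DOUBLE, fuel_capacity DOUBLE,
  nominal_fuel_consumption DOUBLE
);

-- Sketch: super table for diagnostics (3 measurements, 8 tags)
CREATE STABLE diagnostics (
  ts TIMESTAMP, fuel_state DOUBLE, current_load DOUBLE, status BIGINT
) TAGS (
  name VARCHAR(30), fleet VARCHAR(30), driver VARCHAR(30), model VARCHAR(30),
  device_version VARCHAR(30), load_capacity DOUBLE, fuel_capacity DOUBLE,
  nominal_fuel_consumption DOUBLE
);
```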

 

Then we create sub-tables such as r_truck_1 and d_truck_1 using statements of the following form:
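A sketch of the sub-table creation, binding each sub-table to its super table through its tag values (the tag values shown are illustrative):

```sql
-- Sketch: two sub-tables for one truck, one per super table
CREATE TABLE r_truck_1 USING readings
  TAGS ('truck_1', 'South', 'Albert', 'F-150', 'v1.5', 2000, 200, 15);
CREATE TABLE d_truck_1 USING diagnostics
  TAGS ('truck_1', 'South', 'Albert', 'F-150', 'v1.5', 2000, 200, 15);
```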

 

Thus for scenario 1 with 100 devices, 100 sub-tables are created per super table; for scenario 2 with 4,000 devices, 4,000 sub-tables are created in the system to store their data. In the data generated by the TSBS framework, the truck name tag can be null, so we created the d_truck_null (r_truck_null) tables to store all data whose truck could not be identified.

3. Software version and configuration

This report compares three databases: TDengine, InfluxDB, and TimescaleDB. The versions and configurations used are described below.

01 TDengine 

We use TDengine 3.0 directly, cloning the TDengine source code from GitHub and compiling it as the version for the performance comparison (gitinfo: 1bea5a53c27e18d19688f4d38596413272484900). Compile, install, and run it on the server:
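The build commands are not reproduced in this excerpt; a sketch of a typical TDengine source build and server start (the checkout of the commit listed above is shown for completeness):

```shell
# Sketch: build and install TDengine from source, then start the server
git clone https://github.com/taosdata/TDengine.git
cd TDengine
git checkout 1bea5a53c27e18d19688f4d38596413272484900
mkdir -p debug && cd debug
cmake .. && make -j"$(nproc)"
sudo make install
sudo systemctl start taosd
```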

 

In the TDengine configuration file, six query-related parameters are set:
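The configuration fragment itself is not shown in this excerpt; below is a sketch of the six settings described in the next paragraph. The first three parameter names are given in the text; the last three names are our assumption of how they appear in taos.cfg:

```
# Sketch of taos.cfg query-related settings
numOfVnodeFetchThreads 4        # number of vnode fetch threads
queryRspPolicy         1        # enable fast return of query responses
compressMsgSize        128000   # compress transport messages larger than 128000 bytes
SIMD-builtins          1        # assumed name: enable FMA/AVX/AVX2 acceleration
tagFilterCache         1        # assumed name: enable the tag-column filter cache
numOfTaskQueueThreads  24       # assumed name: task queue thread count
```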

 

The first parameter, numOfVnodeFetchThreads, sets the number of fetch threads per vnode (virtual node) to 4; the second, queryRspPolicy, enables the fast-return mechanism for query responses; the third, compressMsgSize, makes TDengine automatically compress transport-layer messages larger than 128,000 bytes; the fourth enables the built-in FMA/AVX/AVX2 hardware acceleration when the CPU supports it; the fifth enables the filter cache for tag columns; the sixth sets the number of threads in the task queue to 24.

Since the number of tables in the IoT scenario is twice the number of trucks, TDengine creates 12 vnodes by default; the created tables are assigned to the 12 virtual nodes by hashing their table names, and the LRU cache is set to last_row mode. For scenarios 1 and 2, stt_trigger is set to 1: TDengine prepares a Sorted Time-series Table (STT) file to hold data whenever a single table writes fewer rows than the minimum, and when the STT file can no longer hold new data, the system reorganizes the data in it and writes it into a data file. In this report stt_trigger is set to 8 for scenario 3 and to 16 for scenarios 4 and 5, i.e. at most 16 STT files may be generated. For scenarios with many low-frequency tables, moderately increasing the STT value yields better write performance.
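These database-level options map onto TDengine 3.0's CREATE DATABASE syntax; a sketch for scenarios 1 and 2 (the database name is illustrative):

```sql
-- Sketch: 12 vgroups, last_row LRU cache mode, stt_trigger = 1
CREATE DATABASE benchmark VGROUPS 12 CACHEMODEL 'last_row' STT_TRIGGER 1;
```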

02 TimescaleDB

To ensure comparable results, we chose TimescaleDB version 2.10.1. For better performance, TimescaleDB needs different chunk parameters in different scenarios; the settings are shown in the following table.

| | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 |
| --- | --- | --- | --- | --- | --- |
| Number of devices | 100 | 4,000 | 100,000 | 1,000,000 | 10,000,000 |
| Number of chunks | 12 | 12 | 12 | 12 | 12 |
| Chunk duration | 2.58 days | 8 hours | 15 minutes | 15 seconds | 15 seconds |
| Records per chunk | 2,009,550 | 10,372,680 | 8,103,667 | 1,350,610 | 13,506,045 |

These parameter settings follow the configuration recommended in the "TimescaleDB vs. InfluxDB" comparison report, to ensure the write performance numbers can be maximized. TimescaleDB vs. InfluxDB: Purpose Built Differently for Time-Series Data: https://www.timescale.com/blog/timescaledb-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877/
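In TimescaleDB the chunk duration is normally set via the chunk_time_interval argument when the hypertable is created; a sketch for scenario 2 (the table and time-column names are illustrative):

```sql
-- Sketch: 8-hour chunks for the readings hypertable (scenario 2)
SELECT create_hypertable('readings', 'time',
                         chunk_time_interval => INTERVAL '8 hours');
```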

03 InfluxDB

For InfluxDB, this report uses version 1.8.10. The newer 2.x versions are not used because TSBS has not been adapted to them; 1.8.10 is the latest version that runs on the TSBS framework. The configuration again follows the method recommended in the "TimescaleDB vs. InfluxDB" comparison report: the buffer is set to 80 GB so that 10 million devices can write smoothly, the Time Series Index (TSI) is enabled, and compression is configured to start 30 seconds after data is inserted:
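The configuration fragment is not included in this excerpt; a sketch of the corresponding settings in the [data] section of InfluxDB 1.8's influxdb.conf:

```
# Sketch of influxdb.conf settings
[data]
  index-version = "tsi1"                        # enable the Time Series Index
  cache-max-memory-size = "80g"                 # 80 GB write buffer
  cache-snapshot-write-cold-duration = "30s"    # snapshot/compress 30s after writes go cold
```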

 

2. Test steps

1. Hardware preparation

To stay close to the environment used in the "TimescaleDB vs. InfluxDB" comparison report, we use r4.8xlarge instances on Amazon AWS EC2 as the base platform: an environment of two nodes, one server and one client, with identical hardware, connected by a 10 Gbps network. The configuration is as follows:

| | CPU | Memory | Disk |
| --- | --- | --- | --- |
| Server | Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 32 vCPU | 244 GiB | 800 GB SSD, 3,000 IOPS, throughput limit 125 MiB/s |
| Client | Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 32 vCPU | 244 GiB | 800 GB SSD, 3,000 IOPS, throughput limit 125 MiB/s |

2. Server environment preparation

To run the test script, the server OS needs to be Ubuntu 20.04 or later. The AWS EC2 server system information is as follows:

  1. OS: Linux tv5931 5.15.0-1028-aws #32~20.04.1-Ubuntu SMP Mon Jan 9 18:02:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  2. GCC: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04)
  3. Base environment versions: Go 1.16.9, Python 3.8, pip 20.0.2 (no manual installation required; the test script installs them automatically)
  4. Compilation dependencies: gcc, cmake, build-essential, git, libssl-dev (no manual installation required; the test script installs them automatically)

In addition, the following two configurations must be done:

  1. Configure passwordless SSH access between client and server, so that the script does not need to expose passwords; see this configuration guide: https://blog.csdn.net/qq_38154295/article/details/121582534.
  2. Make sure all ports between client and server are open.
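As a sketch, passwordless SSH is typically set up like this on the client (replace the placeholder address with the server's IP):

```shell
# Sketch: generate a key pair and copy the public key to the server
ssh-keygen -t rsa -b 4096       # accept the defaults, empty passphrase
ssh-copy-id root@<server-ip>    # enter the server root password once
ssh root@<server-ip> hostname   # should log in without a password prompt
```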

3. Get the test script

To make repeated testing easy and hide tedious details such as downloading, installing, configuring, starting, and summarizing results, the entire TSBS testing process is encapsulated in a test script. To repeat this test report, first download the test script; for now the script supports only Ubuntu 20.04. The following operations require root privileges. On the client machine, enter the test directory (/usr/local/src/ by default) and pull the code:
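The exact commands are not included in this excerpt; a sketch, assuming the scripts live in the taosdata fork of tsbs on GitHub (the clone URL and directory layout are assumptions):

```shell
# Sketch: fetch the test scripts on the client
cd /usr/local/src/
git clone https://github.com/taosdata/tsbs.git
cd tsbs/scripts/tsdbComp
```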

 

In the configuration file test.ini, you need to set the IP addresses and hostnames of the server and client (the AWS private network addresses can be used here). If passwordless access to the server is not configured, you also need to set the server's root password. Since this test is the IoT scenario, change caseType to iot.
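A sketch of the relevant test.ini entries; apart from caseType, the key names and addresses below are assumptions based on the description above:

```
# Sketch of test.ini (illustrative key names and addresses)
clientIP=192.168.0.1      # client private IP
serverIP=192.168.0.2      # server private IP
clientHost=client-host
serverHost=server-host
serverPass=yourPassword   # only needed without passwordless SSH
caseType=iot
```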

4. One-click execution of comparison test

Execute the following command:
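The command itself is not included in this excerpt; a sketch, assuming the entry script is named tsdbComparison.sh:

```shell
# Sketch: run the full comparison in the background; expect roughly three days
nohup bash tsdbComparison.sh > test.log 2>&1 &
```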

 

The test script automatically installs TDengine, InfluxDB, TimescaleDB, and the other software, and runs all of the comparison test items. On the hardware configuration above, the whole test takes about three days. When it finishes, comparison reports in CSV format are generated automatically under the /data2 directory on the client, in folders whose names are prefixed with load and query.

3. Conclusion

By now you should have a clearer picture of TDengine's data modeling, the versions and configurations of the three databases under test, and how to reproduce the results with the one-click test script. If you want to verify the results of the three databases in the IoT scenario, just follow the steps above; if you run into any problems, contact us. You can also add the TDengine assistant on WeChat (ID: tdengine1) to join the TDengine user group and discuss technology and practice with other developers.


Origin blog.csdn.net/taos_data/article/details/131681110