TimescaleDB vs. TDengine performance comparison report: a comprehensive analysis of writes and queries in IoT scenarios

Using the standard datasets of the third-party benchmark platform TSBS (Time Series Benchmark Suite), the TDengine team compared the time-series database TDengine 3.0 with TimescaleDB 2.10.1 on the five preset scales of truck fleet data in the TSBS IoT scenario. This article summarizes and analyzes the test results along several dimensions: writing, storage, query, and resource overhead.

In order to allow TimescaleDB to achieve better performance and ensure that the results are comparable, TimescaleDB needs to set different Chunk parameters for different scenarios. The parameter settings in different scenarios are shown in the following table:

| | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 |
|---|---|---|---|---|---|
| Number of devices | 100 | 4,000 | 100,000 | 1,000,000 | 10,000,000 |
| Number of Chunks | 12 | 12 | 12 | 12 | 12 |
| Chunk duration | 2.58 days | 8 hours | 15 minutes | 15 seconds | 15 seconds |
| Number of records in the Chunk | 2,009,550 | 10,372,680 | 8,103,667 | 1,350,610 | 13,506,045 |
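As a rough cross-check of these settings, multiplying the number of Chunks by the Chunk duration gives the time range each configuration covers. The sketch below is only illustrative arithmetic on the values in the table, reading the Scenario 3 duration as 15 minutes. The names `CHUNK_SETTINGS` and `covered_span` are made up for this example and do not come from the report or from TimescaleDB.

```python
from datetime import timedelta

# Chunk settings copied from the table above: (scenario, chunk count, chunk duration).
CHUNK_SETTINGS = [
    ("Scenario 1", 12, timedelta(days=2.58)),
    ("Scenario 2", 12, timedelta(hours=8)),
    ("Scenario 3", 12, timedelta(minutes=15)),
    ("Scenario 4", 12, timedelta(seconds=15)),
    ("Scenario 5", 12, timedelta(seconds=15)),
]


def covered_span(chunks: int, duration: timedelta) -> timedelta:
    """Total time range covered when every chunk has the same duration."""
    return chunks * duration


for name, chunks, duration in CHUNK_SETTINGS:
    print(f"{name}: {chunks} chunks x {duration} = {covered_span(chunks, duration)}")
```

Under this reading, Scenario 2 spans 4 days and Scenarios 4 and 5 span 3 minutes each.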

These parameters follow the configuration recommended in the TimescaleDB vs. InfluxDB comparison report below, so that TimescaleDB's write performance is as well optimized as possible.

TimescaleDB vs. InfluxDB test report: TimescaleDB vs. InfluxDB: Purpose-built for time-series data

For details about the system configuration, how to reproduce the test results with one click, and a detailed description of the test data, see the article "Get Test Script with One Click, Easily Verify TSBS Test Report in TDengine 3.0 IoT Scenario". This article will not repeat them.

Write performance

Overall, TDengine's write performance is better than TimescaleDB's in all five preset truck fleet scenarios. TDengine's write speed is 3.3 times that of TimescaleDB in its best scenario (Scenario 1) and 1.04 times in its weakest (Scenario 4). For Scenario 4, if the number of records per collection point is increased from 18 to 576 and vgroups=24, TDengine writes 7 times as fast as TimescaleDB. In addition, TDengine has the lowest CPU and disk IO overhead during the writing process.

Write Performance Comparison in Different Scenarios

Comparison of write performance in different scenarios (metrics/sec; the larger the value, the better)

From the figure above, we can see that TDengine's write performance exceeds that of TimescaleDB in all five scenarios. In Scenario 2, TDengine's write performance is up to 3.3 times that of TimescaleDB, and in Scenario 5, where the gap is smallest, it is still 1.04 times that of TimescaleDB.

Write process resource consumption comparison

Write speed alone cannot fully reflect the overall write performance of the two systems in different scenarios. We therefore use 1,000,000 devices × 10 metrics (Scenario 4) as the data template, monitor the overall load on the server and the client during the writing process, and use it to compare the resource usage of the two systems on the server and client nodes. The resource usage considered here mainly covers server-side CPU overhead, server-side disk IO overhead, and client CPU overhead.

Server CPU overhead

The figure below shows the server-side CPU load during the writing process of Scenario 4. Both systems continue to use server resources for follow-up processing after returning the write-complete message to the client. TimescaleDB reported to the client that the write was complete in 7x seconds, but its server kept using CPU resources for data compression and reorganization. The CPU load of this follow-up work is relatively low, only about half of the peak CPU overhead, but it lasts a long time, nearly 4 times the net write time.

Server CPU overhead during writes

Of the two systems, TDengine places the least CPU demand on the server, using only about 17% of the server CPU resources at its peak. The benefit of TDengine's unique data model therefore shows up not only in time-series write performance but also in overall resource overhead.

Disk I/O comparison

The figure below shows the server-side disk write activity while writing 1,000,000 devices × 10 metrics (Scenario 4). Taken together with the server-side CPU overhead above, it shows that disk IO activity and CPU activity rise and fall in step.

Server IO overhead during writing

When writing datasets of the same scale, TDengine uses much less disk write capacity than TimescaleDB, taking up only part of the disk's write capacity (125 MiB/sec, 3000 IOPS). As the figure above shows, a disk IO bottleneck does exist during the writing process, and TimescaleDB's demand for disk write capacity far exceeds that of TDengine.

Client CPU overhead

Client CPU overhead during writes

As can be seen from the figure above, TDengine's CPU demand on the client is greater than that of TimescaleDB. TimescaleDB puts less pressure on the client, with a CPU peak of about 20%. TDengine has the largest client overhead, peaking briefly at 70% and then falling back quickly; its overhead on the client is twice that of TimescaleDB. However, taking server and client resource overhead together, TDengine's write duration is shorter, and TDengine still has an advantage in the overall CPU overhead of the system.

Query performance

Across the 15 different types of queries in Scenario 1 (containing only 4 days of data) and Scenario 2, TDengine's average query response time is better than that of TimescaleDB. The advantage is more pronounced in complex queries, and TDengine also has the smallest computing resource overhead. Compared with TimescaleDB, TDengine's query performance is 1.1 to 16.4 times as high in Scenario 1 and 1.02 to 87 times as high in Scenario 2.

In the query performance evaluation, we use Scenario 1 and Scenario 2 as the benchmark datasets. For TimescaleDB, we adopted the configuration recommended in the TimescaleDB vs. InfluxDB comparison report cited above and set it to 8 Chunks before the evaluation, so that it could deliver its full query performance. Throughout the query comparison, the number of virtual nodes (vnodes) of the TDengine database is kept at the default of 6 (1 is configured when scale=100), and all other database parameters are left at their default values.

4,000 devices × 10 metrics query performance comparison

Since the response time of most types of single queries is very short, we increased the number of runs per query to get a more accurate and stable response time for each query type. Based on the number of trucks, each query is run 2,000 times in Scenario 1 and 500 times in Scenario 2, and TSBS then collects and outputs the results automatically. The final result is the arithmetic mean over all runs, with 4 concurrent client Workers. The following table shows the query performance comparison for Scenario 2 (4,000 devices).
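The measurement idea (many repetitions of one query, 4 concurrent workers, arithmetic mean of the latencies) can be sketched as follows. This is not how TSBS is implemented; it is a minimal illustration under stated assumptions. `timed_call`, `mean_latency_ms`, and `fake_query` are hypothetical names, and `fake_query` is a placeholder standing in for a real database call.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def timed_call(run_query: Callable[[], None]) -> float:
    """Run one query and return its latency in milliseconds."""
    start = time.perf_counter()
    run_query()
    return (time.perf_counter() - start) * 1000.0


def mean_latency_ms(run_query: Callable[[], None], runs: int, workers: int = 4) -> float:
    """Issue `runs` queries from `workers` concurrent threads and average the latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies: List[float] = list(
            pool.map(lambda _: timed_call(run_query), range(runs))
        )
    return statistics.mean(latencies)


if __name__ == "__main__":
    # Placeholder for a real query (hypothetical); sleeps about 1 ms per call.
    def fake_query() -> None:
        time.sleep(0.001)

    print(f"mean latency: {mean_latency_ms(fake_query, runs=500):.2f} ms")
```

Averaging over hundreds or thousands of runs smooths out the jitter that dominates a single measurement of a query that takes only a few milliseconds.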

| Query type | TDengine | TimescaleDB | TimescaleDB/TDengine |
|---|---|---|---|
| last-loc | 11.52 | 11.77 | 102.17% |
| low-fuel | 30.72 | 416.75 | 1356.61% |
| high-load | 10.74 | 11.62 | 108.19% |
| stationary-trucks | 23.9 | 195.46 | 817.82% |
| long-driving-sessions | 59.44 | 2938.54 | 4943.71% |
| long-daily-sessions | 218.97 | 19080.95 | 8713.96% |
| avg-vs-projected-fuel-consumption | 3111.18 | 37127.24 | 1193.35% |
| avg-daily-driving-duration | 4402.15 | 73781.97 | 1676.04% |
| avg-daily-driving-session | 4034.09 | 80765.04 | 2002.06% |
| avg-load | 1295.97 | 30452.26 | 2349.77% |
| daily-activity | 2314.64 | 79242.14 | 3423.52% |
| breakdown-frequency | 5416.3 | 70205.29 | 1296.19% |
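The last column of the table is simply the TimescaleDB response time divided by the TDengine response time, expressed as a percentage. A small sketch that recomputes it for a few rows (values copied from the table; `RESPONSE_TIMES` and `ratio_percent` are illustrative names):

```python
# Mean response times copied from the Scenario 2 table above:
# query type -> (TDengine, TimescaleDB)
RESPONSE_TIMES = {
    "last-loc": (11.52, 11.77),
    "low-fuel": (30.72, 416.75),
    "stationary-trucks": (23.9, 195.46),
    "long-daily-sessions": (218.97, 19080.95),
    "daily-activity": (2314.64, 79242.14),
}


def ratio_percent(tdengine: float, timescaledb: float) -> float:
    """TimescaleDB response time as a percentage of TDengine's, to two decimals."""
    return round(timescaledb / tdengine * 100, 2)


for query, (tdengine, timescaledb) in RESPONSE_TIMES.items():
    print(f"{query}: {ratio_percent(tdengine, timescaledb):.2f}%")
```

A value of 8713.96% for long-daily-sessions, for example, means TimescaleDB's mean response time is about 87 times that of TDengine.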

Next, we analyze and explain each group of query results:

Note: Query 1=daily-activity; Query 2=avg-daily-driving-session; Query 3=avg-daily-driving-duration; Query 4=avg-vs-projected-fuel-consumption

4000 devices query response time (the smaller the value, the better)

In the group selection queries, TDengine uses a one-table-per-device (truck) design and queries the latest data with the last_row function in cache mode. The results show that TDengine's query response time is better than that of TimescaleDB.

4000 devices Aggregates query response time (the smaller the value, the better)

In complex grouping and aggregation queries, TDengine has a large query performance advantage over TimescaleDB. In time window aggregation queries, TimescaleDB performs poorly on large-scale datasets: both long-driving-sessions and long-daily-sessions performed poorly. TDengine's query performance is 8 times that of TimescaleDB in stationary-trucks and 87 times that of TimescaleDB in long-daily-sessions.

4000 devices Double rollups query response time (the smaller the value, the better)

4000 devices query response time (the smaller the value, the better)

In complex mixed queries, TDengine shows a huge performance advantage. Measured by query response time, TDengine is 34 times faster than TimescaleDB in daily-activity queries and 23 times faster in avg-load queries.

Resource overhead comparison

Because some queries finish extremely quickly, the server's IO, CPU, and network status cannot be fully observed while they run. For Scenario 2, we therefore take the daily-activity query as an example, execute it 50 times, and record and compare the server CPU, memory, and network overhead of the two systems over the entire query run.

Server CPU overhead

Server CPU overhead during query

As can be seen from the figure above, the CPU usage of both systems is relatively stable throughout the query process. TDengine's overall CPU usage is about 70% during the query process, while TimescaleDB has the lowest instantaneous CPU usage, at about 22%. In terms of overall CPU overhead, although TimescaleDB has the lowest instantaneous CPU usage, it takes the longest to complete the queries, so it consumes the most CPU resources overall. TDengine completes all queries in only 1/30 of the time TimescaleDB needs, and its overall CPU overhead is the lowest.
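The argument is that overall CPU cost is roughly usage multiplied by duration, not instantaneous usage alone. A back-of-the-envelope sketch using only the figures quoted above (about 70% vs. about 22% usage, 1/30 of the run time), and assuming usage is constant over each run, which the report describes only as "relatively stable":

```python
def relative_cpu_time(usage: float, duration: float) -> float:
    """CPU time consumed, assuming constant usage over the whole run."""
    return usage * duration


# Durations are in arbitrary units: TDengine = 1, TimescaleDB = 30 (TDengine needs 1/30 of the time).
tdengine_cpu = relative_cpu_time(usage=0.70, duration=1.0)
timescaledb_cpu = relative_cpu_time(usage=0.22, duration=30.0)

ratio = timescaledb_cpu / tdengine_cpu
print(f"TimescaleDB consumes about {ratio:.1f}x the CPU time of TDengine")
```

Under these simplifying assumptions, TimescaleDB's total CPU time comes out at roughly 9 times that of TDengine despite its lower instantaneous usage.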

Server Memory Status

Server memory status during query

As shown in the figure above, TDengine's memory usage stays relatively stable throughout the query process, averaging about 12GB. TimescaleDB's memory usage also remains stable, averaging about 10GB, and it makes heavier use of buffer and cache.

Server network bandwidth

Network usage during the query process

The figure above shows the uplink and downlink network bandwidth of the two systems' servers during the query process. The load pattern is broadly similar to the CPU pattern: TDengine has the highest network bandwidth overhead because it completes all queries in the shortest time and has to return the query results to the client within that time.

100 devices × 10 metrics query performance comparison

For scenario 1 (100 devices x 10 metrics), the comparison results of 15 queries of TSBS are as follows:

| Query type | TDengine | TimescaleDB | TimescaleDB/TDengine |
|---|---|---|---|
| last-loc | 1.03 | 1.35 | 131.07% |
| low-fuel | 4.61 | 6.74 | 146.20% |
| high-load | 1.03 | 1.31 | 127.18% |
| stationary-trucks | 3.59 | 4.02 | 111.98% |
| long-driving-sessions | 5.4 | 61.87 | 1145.74% |
| long-daily-sessions | 13.88 | 228.38 | 1645.39% |
| avg-vs-projected-fuel-consumption | 267.03 | 830.79 | 311.12% |
| avg-daily-driving-duration | 278.62 | 1049.07 | 376.52% |
| avg-daily-driving-session | 166.49 | 1066.69 | 640.69% |
| avg-load | 102.31 | 487.39 | 476.39% |
| daily-activity | 146.5 | 1245.05 | 849.86% |
| breakdown-frequency | 413.82 | 955.2 | 230.82% |

As the table above shows, TDengine also performs very well overall on the smaller dataset (Scenario 1). It is better than TimescaleDB in every query, and for some queries its performance is 16 times that of TimescaleDB.

Disk space usage

After the data of both systems had been completely written to disk, we compared the disk space usage of TimescaleDB and TDengine in the different scenarios.

Disk space occupied (the smaller the value, the better)

As can be seen from the figure above, TimescaleDB's data size is significantly larger than TDengine's in all scenarios, and the gap widens as the data scale increases. In Scenarios 4 and 5, TimescaleDB takes up 11 times more disk space than TDengine.

There was also a small surprise during the test. The following table shows TimescaleDB's compression ratio. The ratio is normal at small data scales, but in Scenarios 4 and 5, with large data scales, the disk space occupied after compression actually grew to about 3.4 times the size before compression, which is suspected to be a bug.

| Disk space occupied after compression (KB) | Disk space occupied before compression (KB) | Compression ratio |
|---|---|---|
| 998312 | 6907312 | 14% |
| 4246528 | 36490408 | 12% |
| 6035528 | 26290904 | 23% |
| 16612380 | 4841552 | 343% |
| 165769964 | 48305396 | 343% |
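The compression ratio in the table is the space occupied after compression divided by the space occupied before compression, so a value above 100% means the data grew. A short sketch that recomputes the column from the table's values (`SIZES_KB` and `compression_ratio_percent` are illustrative names):

```python
# (after compression, before compression) in KB, copied row by row from the table above.
SIZES_KB = [
    (998312, 6907312),
    (4246528, 36490408),
    (6035528, 26290904),
    (16612380, 4841552),
    (165769964, 48305396),
]


def compression_ratio_percent(after_kb: int, before_kb: int) -> int:
    """Size after compression as a whole-number percentage of the size before."""
    return round(after_kb / before_kb * 100)


for after_kb, before_kb in SIZES_KB:
    ratio = compression_ratio_percent(after_kb, before_kb)
    note = "grew" if ratio > 100 else "shrank"
    print(f"{before_kb} KB -> {after_kb} KB: {ratio}% ({note})")
```

The last two rows are the ones the text flags: the compressed data is about 3.4 times the size of the uncompressed data.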

Closing remarks

It is worth mentioning that TSBS, the benchmark platform used in this performance test, was created by Timescale, which supports the fairness and impartiality of the test results. From the TSBS test report for the IoT scenario above, we can conclude that TDengine has the edge over TimescaleDB in write performance, query performance, and storage performance, and that in the statistics for server CPU, server IO, and client overhead, TDengine is far ahead of TimescaleDB.

In practice, the new energy power Internet of Things platform project of 85 Information originally used TimescaleDB as its database. Later, for various reasons, the team chose TDengine to upgrade its data architecture. For details of this case, see "Replacement TimescaleDB, TDengine takes over the photovoltaic daily power system with a daily increase of 4 billion data".

To make it easy for everyone to verify the test results, this test report supports one-click reproduction by running the test scripts, and you are welcome to try it. You are also welcome to add "Little T" on WeChat (ID: tdengine), join the TDengine user group, and discuss data processing problems with like-minded developers.


Origin blog.csdn.net/taos_data/article/details/132076402