Performance comparison test report between InfluxDB and TDengine in the IoT scenario

To verify the performance of TDengine 3.0 in the IoT scenario, we used the truck fleet data sets of five sizes that the third-party benchmark platform TSBS (Time Series Benchmark Suite) presets for the IoT scenario, and ran a comparative analysis of TDengine 3.0 and InfluxDB 1.8 (the latest InfluxDB version that can run the TSBS framework) in the same AWS cloud environment. This article summarizes and analyzes the test results of the two databases in four dimensions: writing, storage, query, and resource overhead.

To get the best performance out of InfluxDB, we configured it as recommended in the [TimescaleDB vs. InfluxDB] comparison report listed below. The buffer is set to 80GB so that writes for 10 million devices can proceed smoothly, the Time Series Index (TSI) is enabled, and the system is configured to start data compression 30s after data is inserted.

TimescaleDB vs. InfluxDB test report: TimescaleDB vs. InfluxDB: Purpose-built for time-series data

For details about the system configuration, how to reproduce the test results with one click, and the test data itself, refer to the article "Get Test Script with One Click, Easily Verify TSBS Test Report in TDengine 3.0 IoT Scenario". This article does not repeat them.

Write performance

Overall, TDengine's write performance is better than InfluxDB's in all five preset truck fleet scenarios. TDengine's write speed ranges from 1.82 times that of InfluxDB (Scenario 3, the smallest gap) to 16.2 times (Scenario 5, the largest gap). TDengine also consumes less server CPU and disk IO during writing. The detailed analysis follows.

Write Performance Comparison in Different Scenarios

Comparison of write performance in different scenarios (metrics/sec. The larger the value, the better)

In all five scenarios, TDengine's write performance exceeds that of InfluxDB. The gap is largest in Scenario 5, where TDengine is about 16 times as fast as InfluxDB, and smallest in Scenario 3, where it is still about 1.8 times as fast.

Write process resource consumption comparison

Write speed alone does not fully reflect the overall performance of the two systems when writing data in different scenarios. We therefore take 1,000,000 devices × 10 metrics (Scenario 4) as an example, examine the overall load on the server and the client during the write process, and compare the resource usage of TDengine and InfluxDB on the server and client nodes. The resource usage considered here consists of the server's CPU and disk IO overhead and the client's CPU overhead.

  • Server CPU overhead

The figure below shows the CPU load on the server during the write process of Scenario 4. Both TDengine and InfluxDB continue to use server resources for follow-up processing after they return the write-complete message to the client. InfluxDB uses a large share of the CPU, and its instantaneous peaks consume all of it. Its write load is high and lasts much longer than TDengine's. Of the two systems, TDengine places the lower CPU demand on the server, using only about 17% of the server CPU at its peak. TDengine's unique data model therefore pays off not only in time series write performance but also in overall resource overhead.

Server CPU overhead during writing

  • Disk IO comparison

The figure below shows disk writes on the server while writing 1,000,000 devices × 10 metrics (Scenario 4). Compared with the server CPU overhead above, disk IO activity and CPU activity rise and fall together.

Server IO overhead during writing

For the same amount of data, TDengine uses much less of the disk's write capacity than InfluxDB, occupying only part of the available capacity (125MiB/Sec. 3000IOPS) during writing. The figure above shows that disk IO is a real bottleneck during writing for both databases. However, InfluxDB saturates the disk's write capacity for a long time, far exceeding TDengine's demand.

  • Client CPU overhead

Client CPU overhead during writes

As the figure above shows, TDengine places a higher CPU demand on the client than InfluxDB does. Throughout the write process, the InfluxDB client uses few computing resources and is under little pressure, because the write load is concentrated almost entirely on the server. This mode, however, can easily turn the server into the bottleneck. TDengine has the higher client overhead, peaking briefly at 70% and then falling back quickly. Even so, when the server and the client are considered together, TDengine's write duration is shorter and it still has the advantage in overall system CPU overhead.

Query performance

In the query performance evaluation, we use Scenario 1 (containing only 4 days of data, a modification consistent with the requirements in [TimescaleDB vs. InfluxDB]) and Scenario 2 as the benchmark data sets. Throughout the query comparison, the TDengine database keeps the default of 6 virtual nodes (vnodes), except that 1 is configured when scale=100, and all other database parameters are left at their default values.

In general, across the 15 different types of queries in Scenario 1 (containing only 4 days of data) and Scenario 2, TDengine's average query response time is better than InfluxDB's. The advantage is more pronounced in complex queries, and TDengine also has the lower computational resource overhead. TDengine's query performance is 2.4 to 155.9 times that of InfluxDB in Scenario 1, and 6.3 to 426.3 times in Scenario 2.

4,000 devices × 10 metrics query performance comparison

Because a single run of many of these queries takes very little time, we increased the number of runs per query according to the number of trucks, to 2,000 runs in Scenario 1 and 500 runs in Scenario 2, in order to measure a stable response time for each query type. TSBS then automatically collects and outputs the results. The final result is the arithmetic mean over all runs, with 4 concurrent client workers. We first present the query performance comparison for Scenario 2 (4,000 devices).
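The measurement procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not how TSBS itself is implemented: `timed_call` and `mean_latency_ms` are hypothetical helper names, and `run_query` stands in for whatever function actually sends a query to the database.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def timed_call(run_query):
    """Run one query and return its latency in milliseconds."""
    start = time.perf_counter()
    run_query()
    return (time.perf_counter() - start) * 1000.0


def mean_latency_ms(run_query, repetitions, workers=4):
    """Issue `repetitions` queries from `workers` concurrent clients and
    return the arithmetic mean of the per-query latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(
            pool.map(lambda _: timed_call(run_query), range(repetitions))
        )
    return fmean(latencies)
```

Repeating a short query many times and averaging reduces the influence of any single unusually fast or slow run, which is the reason given above for raising the run count.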

| Query type | TDengine (ms) | InfluxDB (ms) | InfluxDB/TDengine |
| --- | --- | --- | --- |
| last-loc | 11.52 | 562.86 | 4885.94% |
| low-fuel | 30.72 | 635 | 2067.06% |
| high-load | 10.74 | 861.13 | 8017.97% |
| stationary-trucks | 23.9 | 3156.65 | 13207.74% |
| long-driving-sessions | 59.44 | 374.98 | 630.85% |
| long-daily-sessions | 218.97 | 1439.19 | 657.25% |
| avg-vs-projected-fuel-consumption | 3111.18 | 40842.05 | 1312.75% |
| avg-daily-driving-duration | 4402.15 | 43588.02 | 990.15% |
| avg-daily-driving-session | 4034.09 | 84494.79 | 2094.52% |
| avg-load | 1295.97 | 552493.78 | 42631.68% |
| daily-activity | 2314.64 | 15248.66 | 658.79% |
| breakdown-frequency | 5416.3 | 288804.93 | 5332.14% |
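The last column of the table above is the InfluxDB response time divided by the TDengine response time, expressed as a percentage. The short Python sketch below recomputes it for three rows taken from the table. The function name `ratio_percent` is chosen here for illustration only.

```python
def ratio_percent(tdengine_time, influxdb_time):
    """InfluxDB response time as a percentage of TDengine's, two decimals."""
    return round(influxdb_time / tdengine_time * 100, 2)


# (TDengine, InfluxDB) response times copied from the Scenario 2 table.
scenario2 = {
    "last-loc": (11.52, 562.86),
    "long-driving-sessions": (59.44, 374.98),
    "avg-load": (1295.97, 552493.78),
}

for query, (tdengine_time, influxdb_time) in scenario2.items():
    print(f"{query}: {ratio_percent(tdengine_time, influxdb_time)}%")
```

A value of 4885.94% for last-loc therefore means that InfluxDB took roughly 48.9 times as long as TDengine to answer that query.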

Next, we analyze each group of query results:

4000 devices query response time (the smaller the value, the better)

In the group selection queries, TDengine uses a one-table-per-device (truck) design and, with cache mode enabled, uses the last_row function to fetch the latest data, which gives it better query response times than InfluxDB.
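To see why a per-device table with a cached last row makes "latest value" queries cheap, consider the toy Python model below. It is a conceptual sketch only, not TDengine's implementation: the class `DeviceTables` and its methods are hypothetical names invented for this illustration.

```python
from collections import defaultdict


class DeviceTables:
    """Toy model: one append-only table per device plus a cached last row."""

    def __init__(self):
        self.tables = defaultdict(list)  # device name -> list of (ts, row)
        self.last_row = {}               # device name -> newest (ts, row)

    def insert(self, device, ts, row):
        """Append a row and refresh the cache if the row is the newest."""
        self.tables[device].append((ts, row))
        cached = self.last_row.get(device)
        if cached is None or ts >= cached[0]:
            self.last_row[device] = (ts, row)

    def last_location(self, device):
        """Return the newest row for a device without scanning its table."""
        return self.last_row.get(device)
```

In this model, looking up the latest row is a single dictionary access per device, regardless of how many rows the device has written.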

4000 devices Aggregates query response time (the smaller the value, the better)

In complex grouping and aggregation queries, TDengine has a large query performance advantage over InfluxDB. Its query performance is 132 times that of InfluxDB in stationary-trucks and 6.5 times in long-daily-sessions.

4000 devices Double rollups query response time (the smaller the value, the better)

4000 devices query response time (the smaller the value, the better)

In complex mixed queries, TDengine also shows a large performance advantage. Measured by query response time, its performance in avg-load and breakdown-frequency is 426 times and 53 times that of InfluxDB, respectively.

Resource overhead comparison

Because some queries finish very quickly, the server's IO, CPU, and network status during a single query cannot be observed in full. We therefore take the daily-activity query as an example, execute it 50 times against Scenario 2, and record the entire process. The server CPU, memory, and network overheads of the two systems during query execution are compared below.

  • Server CPU overhead

Server CPU overhead during query

As the figure above shows, the CPU usage of both systems is relatively stable throughout the query process. TDengine occupies about 70% of the CPU during the queries, while InfluxDB's steady CPU usage is higher, at about 98% (with frequent instantaneous peaks of 100%). In terms of overall CPU overhead, InfluxDB uses essentially all of the CPU for about three times as long as TDengine. TDengine completes all queries in a shorter time and has the lower overall CPU overhead.
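A rough way to combine the two observations above (utilization and duration) is to multiply them into a relative CPU-time figure. The Python sketch below does this under simplifying assumptions that are not part of the report: utilization is treated as constant over the run (about 70% for TDengine, about 98% for InfluxDB), and TDengine's run time is taken as one unit with InfluxDB's as three units.

```python
def relative_cpu_time(utilization, duration):
    """CPU time consumed, as utilization (0..1) times relative duration."""
    return utilization * duration


# Assumed steady-state figures read from the paragraph above.
tdengine_cpu_time = relative_cpu_time(0.70, 1.0)
influxdb_cpu_time = relative_cpu_time(0.98, 3.0)

ratio = influxdb_cpu_time / tdengine_cpu_time
print(f"InfluxDB uses about {ratio:.1f}x the CPU time of TDengine")
```

Under these assumptions, InfluxDB consumes roughly four times the CPU time of TDengine for the same 50 queries, which is consistent with the conclusion that TDengine has the lower overall CPU overhead.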

  • Server Memory Status

Server memory status during query

As the figure above shows, the memory usage of both systems stays relatively stable throughout the query process: about 12GB on average for TDengine and about 10GB on average for InfluxDB.

  • Server network bandwidth

Network usage during the query process

The figure above shows the uplink and downlink network bandwidth on the server side during the query process of the two systems. The load pattern is broadly similar to the CPU pattern. TDengine has the higher network bandwidth overhead, because it completes all queries in the shorter time and the query results must be returned to the client within that time.

100 devices × 10 metrics query performance comparison

For Scenario 1 (100 devices × 10 metrics), the comparison results of the 15 TSBS queries are as follows:

| Query type | TDengine (ms) | InfluxDB (ms) | InfluxDB/TDengine |
| --- | --- | --- | --- |
| last-loc | 1.03 | 14.94 | 1450.49% |
| low-fuel | 4.61 | 17.45 | 378.52% |
| high-load | 1.03 | 18.33 | 1779.61% |
| stationary-trucks | 3.59 | 69.1 | 1924.79% |
| long-driving-sessions | 5.4 | 13 | 240.74% |
| long-daily-sessions | 13.88 | 42.91 | 309.15% |
| avg-vs-projected-fuel-consumption | 267.03 | 1033.72 | 387.12% |
| avg-daily-driving-duration | 278.62 | 942.47 | 338.26% |
| avg-daily-driving-session | 166.49 | 1707.27 | 1025.45% |
| avg-load | 102.31 | 15956.73 | 15596.45% |
| daily-activity | 146.5 | 510.3 | 348.33% |
| breakdown-frequency | 413.82 | 6953.83 | 1680.40% |

As the table above shows, TDengine also performs very well on the smaller data set (Scenario 1). It is faster than InfluxDB in every query, and in some queries its performance is more than 155 times that of InfluxDB.

Disk space usage

After both systems had fully flushed their data to disk, we also compared the disk space used by TDengine and InfluxDB in the different scenarios.
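The range of speedups in Scenario 1 can be recovered directly from the table. The Python sketch below divides each InfluxDB response time by the corresponding TDengine response time and picks out the smallest and largest ratios. The variable names are chosen here for illustration, and the numbers are copied from the Scenario 1 table.

```python
# (TDengine, InfluxDB) response times from the Scenario 1 table.
scenario1 = {
    "last-loc": (1.03, 14.94),
    "low-fuel": (4.61, 17.45),
    "high-load": (1.03, 18.33),
    "stationary-trucks": (3.59, 69.1),
    "long-driving-sessions": (5.4, 13),
    "long-daily-sessions": (13.88, 42.91),
    "avg-vs-projected-fuel-consumption": (267.03, 1033.72),
    "avg-daily-driving-duration": (278.62, 942.47),
    "avg-daily-driving-session": (166.49, 1707.27),
    "avg-load": (102.31, 15956.73),
    "daily-activity": (146.5, 510.3),
    "breakdown-frequency": (413.82, 6953.83),
}

# Speedup = how many times longer InfluxDB takes than TDengine.
speedups = {name: influx / td for name, (td, influx) in scenario1.items()}

smallest = min(speedups, key=speedups.get)
largest = max(speedups, key=speedups.get)
print(f"smallest gap: {smallest} ({speedups[smallest]:.2f}x)")
print(f"largest gap:  {largest} ({speedups[largest]:.2f}x)")
```

The smallest gap is in long-driving-sessions and the largest is in avg-load, matching the 2.4 to 155.9 times range quoted for Scenario 1.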

Disk space occupied (the smaller the value, the better)

As the figure above shows, in the first three scenarios the size of InfluxDB's data files on disk is close to that of TDengine's. In Scenarios 4 and 5, however, InfluxDB's files occupy more than twice the disk space of TDengine's.

Closing remarks

Based on this test report, we can conclude that TDengine outperforms InfluxDB in both write and query performance in all the IoT data scenarios tested. While delivering higher write and query throughput, TDengine also uses less server CPU and IO than InfluxDB throughout the write process. Even when the client's overhead is included, TDengine's total write overhead is lower than InfluxDB's.

From a practical point of view, the results of this report also help explain why many companies and developers choose TDengine when evaluating InfluxDB and TDengine as their time series database. The article "From InfluxDB to TDengine, why do we make this choice" describes its author's reasoning during such a selection process.

To make it easy for everyone to verify the results, this test report supports one-click reproduction through the test scripts. You are welcome to share your questions and feedback in the comment area. You can also add the TDengine assistant on WeChat (ID: tdengine1) to join the TDengine user group and discuss data processing problems with other like-minded developers.


Origin blog.csdn.net/taos_data/article/details/131845442