How Does TDengine's Query Performance Compare with Established Time-Series Databases?

In the previous article, "Writing Performance in IoT Scenario: TDengine=16.2 x InfluxDB", we used the TSBS (Time Series Benchmark Suite) performance report for the IoT scenario to interpret the write performance of the three major databases, which showed TDengine's write advantages more directly. This article focuses on query performance, and we hope it helps readers who are struggling with data analysis pain points in IoT scenarios.

For the query performance evaluation, we use Scenario 1 (restricted to 4 days of data, consistent with the setup in [TimescaleDB vs. InfluxDB]) and Scenario 2 as the benchmark datasets. For the characteristics of these datasets, see "Get the Test Scripts in One Click to Easily Verify the TSBS Test Report for TDengine 3.0 in IoT Scenarios".

Before evaluating query performance, we made sure the other two databases could perform at their best. For TimescaleDB, we used the configuration recommended in [TimescaleDB vs. InfluxDB], with 8 chunks; for InfluxDB, we enabled TSI (Time Series Index). Throughout the query comparison, TDengine used 6 virtual nodes (vnodes) by default (1 vnode at scale=100, i.e. the 100-device Scenario 1), and all other database parameters were left at their default values.

[TimescaleDB vs. InfluxDB] TimescaleDB vs. InfluxDB: Purpose Built Differently for Time-Series Data. https://www.timescale.com/blog/TimescaleDB-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877/

4,000 devices × 10 metrics query performance comparison

Because a single execution of most query types does not give a stable measurement, we raised the number of executions of each query to 2,000 in Scenario 1 and 500 in Scenario 2 (scaled to the number of trucks) and let TSBS collect and report the statistics automatically. Each result is the arithmetic mean over all executions, with 4 concurrent client workers. The following table shows the query performance results for Scenario 2 (4,000 devices).

Mean query response time in milliseconds (lower is better). The ratio columns give each system's response time as a percentage of TDengine's, so a value above 100% means TDengine is faster.

Query type                         TDengine (ms)  InfluxDB (ms)  InfluxDB/TDengine  TimescaleDB (ms)  TimescaleDB/TDengine
last-loc                           11.52          562.86         4885.94%           11.77             102.17%
low-fuel                           30.72          635            2067.06%           416.75            1356.61%
high-load                          10.74          861.13         8017.97%           11.62             108.19%
stationary-trucks                  23.9           3156.65        13207.74%          195.46            817.82%
long-driving-sessions              59.44          374.98         630.85%            2938.54           4943.71%
long-daily-sessions                218.97         1439.19        657.25%            19080.95          8713.96%
avg-vs-projected-fuel-consumption  3111.18        40842.05       1312.75%           37127.24          1193.35%
avg-daily-driving-duration         4402.15        43588.02       990.15%            73781.97          1676.04%
avg-daily-driving-session          4034.09        84494.79       2094.52%           80765.04          2002.06%
avg-load                           1295.97        552493.78      42631.68%          30452.26          2349.77%
daily-activity                     2314.64        15248.66       658.79%            79242.14          3423.52%
breakdown-frequency                5416.3         288804.93      5332.14%           70205.29          1296.19%
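The mean response times above come from running each query many times with 4 concurrent client workers and averaging the results. The Python sketch below illustrates only that measurement procedure under stated assumptions: it is not TSBS's implementation (TSBS is a separate Go tool), `run_query_stub` is a hypothetical placeholder for a real database call, and thread-based workers are an assumption.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_query_stub() -> None:
    """Hypothetical placeholder for one query; swap in a real client call."""
    time.sleep(0.001)


def timed_call(query: Callable[[], None]) -> float:
    """Run one query and return its response time in milliseconds."""
    start = time.perf_counter()
    query()
    return (time.perf_counter() - start) * 1000.0


def mean_response_ms(query: Callable[[], None], repetitions: int,
                     workers: int = 4) -> tuple[float, int]:
    """Run `query` `repetitions` times spread over `workers` concurrent
    threads; return (arithmetic mean response time in ms, sample count)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(lambda _: timed_call(query), range(repetitions)))
    return statistics.fmean(latencies), len(latencies)


if __name__ == "__main__":
    # The article used 500 runs per query type in Scenario 2 and 2,000 in Scenario 1.
    mean_ms, runs = mean_response_ms(run_query_stub, repetitions=500)
    print(f"{runs} runs, mean response time {mean_ms:.2f} ms")
```

Threads are a reasonable fit for client calls that mostly wait on the network; a CPU-heavy client would call for processes instead.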

Below, we analyze the results for each query category.

Note: in the charts, Query 1 = daily-activity, Query 2 = avg-daily-driving-session, Query 3 = avg-daily-driving-duration, and Query 4 = avg-vs-projected-fuel-consumption.

Figure: Query response time, 4,000 devices (lower is better)

In the group selection queries, TDengine uses a one-table-per-device (truck) design and retrieves the latest data with the last_row function backed by its last-row cache mode. As the results show, TDengine's response time is better than that of both InfluxDB and TimescaleDB.
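To illustrate why a one-table-per-device layout combined with a last-row cache makes last-loc cheap, here is a conceptual Python model. It is a hypothetical sketch, not TDengine's internal code: the `LastRowCache` class and the `Reading` fields are invented for illustration, and the point is only that the newest row per device can be answered with one lookup instead of a scan over history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    ts: int            # timestamp, e.g. epoch milliseconds
    latitude: float
    longitude: float


class LastRowCache:
    """Hypothetical model: one 'table' (list) per truck plus a cache that
    holds each truck's newest row, so last-location never scans history."""

    def __init__(self) -> None:
        self.tables: dict[str, list[Reading]] = {}   # full history per truck
        self.last_row: dict[str, Reading] = {}        # newest row per truck

    def insert(self, truck: str, row: Reading) -> None:
        self.tables.setdefault(truck, []).append(row)
        cached = self.last_row.get(truck)
        # Keep the cached entry at the newest timestamp, even for late arrivals.
        if cached is None or row.ts >= cached.ts:
            self.last_row[truck] = row

    def last_loc(self, trucks: list[str]) -> dict[str, Reading]:
        """Latest reading per requested truck: one dict lookup each."""
        return {t: self.last_row[t] for t in trucks if t in self.last_row}


cache = LastRowCache()
cache.insert("truck_1", Reading(ts=1, latitude=52.1, longitude=4.3))
cache.insert("truck_1", Reading(ts=2, latitude=52.2, longitude=4.4))
print(cache.last_loc(["truck_1"]))
```

The cost of last_loc here grows with the number of trucks requested, not with the amount of stored history, which is the property the benchmark rewards.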

Figure: Aggregate query response time, 4,000 devices (lower is better)

In complex grouping and aggregation queries, TDengine has a large advantage over both TimescaleDB and InfluxDB. In time-window aggregation queries on large datasets, TimescaleDB performs poorly: both long-driving-sessions and long-daily-sessions are slow. On stationary-trucks, TDengine is about 132 times faster than InfluxDB and about 8 times faster than TimescaleDB; on long-daily-sessions, it is about 87 times faster than TimescaleDB and about 6.6 times faster than InfluxDB.
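Time-window aggregation, as used by queries such as long-driving-sessions and long-daily-sessions, groups each truck's readings into fixed, non-overlapping intervals before aggregating. The Python sketch below shows only that grouping step on hypothetical velocity samples; the actual TSBS queries add their own filters and thresholds, and each database runs this server-side (TDengine, for example, through its `INTERVAL` clause).

```python
from collections import defaultdict


def tumbling_window_avg(points, window_ms):
    """Group (timestamp_ms, value) points into fixed, non-overlapping windows
    and return {window_start_ms: average value}, ordered by window start."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        start = ts - ts % window_ms          # align to the window boundary
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical velocity samples for one truck: (timestamp in ms, km/h).
samples = [(0, 10.0), (300_000, 20.0), (600_000, 30.0), (900_000, 50.0)]
print(tumbling_window_avg(samples, window_ms=600_000))
# {0: 15.0, 600000: 40.0}
```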

Figure: Double-rollup query response time, 4,000 devices (lower is better)

Figure: Query response time, 4,000 devices (lower is better)

In complex mixed queries, TDengine shows a large advantage in response time: on avg-load and breakdown-frequency it is about 426 and 53 times faster than InfluxDB, and compared with TimescaleDB it is about 34 times faster on daily-activity and about 23 times faster on avg-load.

Resource overhead comparison

Because some queries finish extremely quickly, their effect on the server's IO, CPU, and network cannot be observed clearly. We therefore took the daily-activity query in Scenario 2 as an example, executed it 50 times, and recorded the server CPU, memory, and network usage of the three systems throughout query execution.

Server CPU overhead

Figure: Server CPU usage during the query run

As the figure above shows, the CPU usage of all three systems is fairly stable throughout the query run. TDengine uses about 70% of total CPU while querying; TimescaleDB has the lowest instantaneous CPU usage, about 22%; and InfluxDB has the highest usage during the steady phase, about 98% (with frequent spikes to 100%). In terms of total CPU cost, although TimescaleDB has the lowest instantaneous usage, it takes the longest to finish the queries, so its overall CPU consumption is the highest. InfluxDB runs at nearly 100% CPU for about three times as long as TDengine, giving it the second-highest overall cost. TDengine finishes all queries in only 1/30 of the time TimescaleDB needs and has the lowest overall CPU cost.
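The argument that total CPU cost depends on both utilization and duration can be checked with rough arithmetic. The sketch below treats the approximate figures quoted above (about 70%, 98%, and 22% utilization; run times of 1x, 3x, and 30x TDengine's) as constants, which is a simplification since real usage fluctuates during the run.

```python
# Approximate figures from the daily-activity run described above.
# Durations are relative to TDengine's run time (TDengine = 1).
systems = {
    # name: (typical CPU utilization, relative duration)
    "TDengine": (0.70, 1),
    "InfluxDB": (0.98, 3),
    "TimescaleDB": (0.22, 30),
}


def relative_cpu_time(utilization: float, duration: float) -> float:
    """Total CPU work is roughly utilization multiplied by elapsed time."""
    return utilization * duration


totals = {name: relative_cpu_time(u, d) for name, (u, d) in systems.items()}
for name, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {total:.2f}")
```

The resulting order (TDengine about 0.70, InfluxDB about 2.94, TimescaleDB about 6.60, in units of TDengine's run time) matches the ranking in the text: the lowest instantaneous usage does not imply the lowest total cost.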

Server memory status

Figure: Server memory usage during the query run

As the figure above shows, TDengine's memory usage stayed relatively stable throughout the query run, averaging about 12 GB. The memory usage of TimescaleDB and InfluxDB was also stable, averaging about 10 GB; TimescaleDB devotes a larger share of its memory to buffers and cache.

Server network bandwidth

Figure: Server network usage during the query run

The figure above shows the upstream and downstream network bandwidth of the three systems' servers during the query run. The pattern mirrors the CPU results: TDengine shows the highest network bandwidth because it completes all queries in the shortest time, so the query results are returned to the client within a shorter window. InfluxDB and TimescaleDB use roughly the same bandwidth.

100 devices × 10 metrics query performance comparison

For Scenario 1 (100 devices × 10 metrics), the results for each TSBS query type are as follows:

Mean query response time in milliseconds (lower is better). The ratio columns give each system's response time as a percentage of TDengine's.

Query type                         TDengine (ms)  InfluxDB (ms)  InfluxDB/TDengine  TimescaleDB (ms)  TimescaleDB/TDengine
last-loc                           1.03           14.94          1450.49%           1.35              131.07%
low-fuel                           4.61           17.45          378.52%            6.74              146.20%
high-load                          1.03           18.33          1779.61%           1.31              127.18%
stationary-trucks                  3.59           69.1           1924.79%           4.02              111.98%
long-driving-sessions              5.4            13             240.74%            61.87             1145.74%
long-daily-sessions                13.88          42.91          309.15%            228.38            1645.39%
avg-vs-projected-fuel-consumption  267.03         1033.72        387.12%            830.79            311.12%
avg-daily-driving-duration         278.62         942.47         338.26%            1049.07           376.52%
avg-daily-driving-session          166.49         1707.27        1025.45%           1066.69           640.69%
avg-load                           102.31         15956.73       15596.45%          487.39            476.39%
daily-activity                     146.5          510.3          348.33%            1245.05           849.86%
breakdown-frequency                413.82         6953.83        1680.40%           955.2             230.82%
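Each ratio column is the other system's mean response time divided by TDengine's, expressed as a percentage; 1450.49% for last-loc, for example, means TDengine answered that query about 14.5 times faster than InfluxDB. A minimal Python sketch that recomputes three rows of the table above (values copied from it) should print the same percentages:

```python
# Mean response times in ms copied from the Scenario 1 table above:
# query type -> (TDengine, InfluxDB, TimescaleDB)
SCENARIO_1 = {
    "last-loc": (1.03, 14.94, 1.35),
    "long-daily-sessions": (13.88, 42.91, 228.38),
    "avg-load": (102.31, 15956.73, 487.39),
}


def ratio_percent(other_ms: float, tdengine_ms: float) -> float:
    """Another system's response time as a percentage of TDengine's."""
    return other_ms / tdengine_ms * 100


for query, (td, influx, timescale) in SCENARIO_1.items():
    print(f"{query:20s} InfluxDB/TDengine={ratio_percent(influx, td):.2f}%  "
          f"TimescaleDB/TDengine={ratio_percent(timescale, td):.2f}%")
```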

As the table shows, on the smaller-scale dataset (Scenario 1) TDengine again performs well overall: it is faster than both TimescaleDB and InfluxDB on every query, by up to 16.4 times over TimescaleDB and up to 155.9 times over InfluxDB.

Final thoughts

In summary, across all query types in Scenario 1 (4 days of data only) and Scenario 2, TDengine's mean query response time is better than that of both InfluxDB and TimescaleDB, its advantage is larger on complex queries, and it consumes the least computing resources. Compared with InfluxDB, TDengine is 2.4 to 155.9 times faster in Scenario 1 and 6.3 to 426.3 times faster in Scenario 2; compared with TimescaleDB, it is 1.1 to 16.4 times faster in Scenario 1 and 1.02 to 87 times faster in Scenario 2.
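The Scenario 1 ranges follow directly from the ratio columns of the Scenario 1 table: divide each percentage by 100 to get a speed-up factor, then take the minimum and maximum. A short Python check, with the percentages copied from that table:

```python
# "X/TDengine" ratio columns (percent) copied from the Scenario 1 table.
INFLUX_S1 = [1450.49, 378.52, 1779.61, 1924.79, 240.74, 309.15,
             387.12, 338.26, 1025.45, 15596.45, 348.33, 1680.40]
TIMESCALE_S1 = [131.07, 146.20, 127.18, 111.98, 1145.74, 1645.39,
                311.12, 376.52, 640.69, 476.39, 849.86, 230.82]


def speedup_range(ratios_percent):
    """Convert percentage ratios to speed-up factors and return (min, max)."""
    factors = [r / 100 for r in ratios_percent]
    return min(factors), max(factors)


for name, ratios in (("InfluxDB", INFLUX_S1), ("TimescaleDB", TIMESCALE_S1)):
    lo, hi = speedup_range(ratios)
    print(f"vs {name}: {lo:.2f}x to {hi:.2f}x")
```

It prints 2.41x to 155.96x for InfluxDB and 1.12x to 16.45x for TimescaleDB; the summary truncates these to 2.4 to 155.9 and 1.1 to 16.4.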

TDengine 3.0's efficient query performance has also been verified in enterprise practice. The article "China Mobile Internet of Things Project, Application in TDengine 3.0" shows that TDengine's read performance remains outstanding from 2.0 to 3.0: for the single-device, single-day query most commonly used in China Mobile's IoT scenarios, 3.0 returns results within 0.1 s. If you are also facing data processing problems or want to upgrade your data architecture, add Little T on WeChat (ID: tdengine1) to join the TDengine user group and tackle challenges together with like-minded developers.


Source: blog.csdn.net/taos_data/article/details/131779494