Comparison between TimescaleDB and InfluxDB

Time series database

As the name implies, time series databases are designed to store data that changes over time. This can be any type of data collected over time, such as metrics gathered from a system. In fact, every trending system is an example of time series data.

How do I choose between different types of time series databases?

This article focuses on the differences between two time series databases, TimescaleDB and InfluxDB.

InfluxDB

InfluxDB was created by InfluxData. It is a custom, open source, NoSQL time series database written in Go. The datastore provides a SQL-like query language called InfluxQL, which makes it easy for developers to integrate into their applications. It also has a newer custom query language called Flux, which can make certain tasks easier, although any custom query language comes with a learning curve.

The following is an example of a Flux query:

from(bucket: "testing")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> exponentialMovingAverage(n: 5) // n (number of points) is an illustrative choice

In this database, each measurement has a timestamp together with an associated set of tags and a set of fields. The fields represent the actual measurement readings, and the tags represent metadata describing the measurement. Field data types are limited to float, int, string and boolean, and cannot be changed without rewriting the data. Tag values are indexed; they are stored as strings and cannot be updated.
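
To make the data model concrete, here is a minimal Python sketch that renders one point (measurement, tags, fields, timestamp) in InfluxDB's line protocol text format. The helper name `to_line_protocol` and the sample values are made up for illustration, and the sketch assumes names and values contain no characters that would need escaping.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as an InfluxDB line-protocol string.

    Simplification (assumption): names and values contain no spaces, commas,
    equals signs or quotes, so no escaping is performed.
    """
    def format_field(value):
        # bool must be tested before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"      # integers carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        return f'"{value}"'         # strings are double-quoted

    # Tags are always strings; sorting by key keeps the output deterministic.
    tag_part = "".join(f",{key}={val}" for key, val in sorted(tags.items()))
    field_part = ",".join(f"{key}={format_field(val)}" for key, val in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


point = to_line_protocol(
    measurement="cpu",
    tags={"host": "server01", "region": "us-west"},
    fields={"usage_idle": 92.5, "cores": 4, "throttled": False},
    timestamp_ns=1556813561098000000,
)
print(point)
```

Note how the tags end up as plain strings next to the measurement name, while each field keeps one of the four supported types.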

Getting started with InfluxDB is very easy, because you don't have to worry about creating schemas or indexes. However, it is quite rigid and limited: you cannot create additional indexes or indexes on continuous fields, update metadata after the fact, or enforce data validation.

It is not schemaless, though. It automatically creates a basic schema based on the input data.

InfluxDB had to implement several fault tolerance tools from scratch, such as replication, high availability and backup/restore, and it is responsible for its own on-disk reliability. Users are limited to these tools, and many of these features (such as HA) are only available in the Enterprise Edition.

The InfluxDB backup tool can perform full or incremental backups and can be used for point-in-time recovery.

InfluxDB also provides better disk compression than PostgreSQL and TimescaleDB.

TimescaleDB

TimescaleDB is an open source time series database optimized for fast ingest and complex queries, with full SQL support. It is based on PostgreSQL and offers the best of both the NoSQL and relational worlds for time series data.

The following is an example of a TimescaleDB query:

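-- exponential_moving_average is not built into PostgreSQL;
-- it is assumed here to be defined as a custom aggregate.
SELECT time,
       exponential_moving_average(value, 0.5) OVER (ORDER BY time)
FROM testing
WHERE measurement = 'cpu' AND time > now() - interval '1 hour';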
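
To show what a query like this computes, here is a small Python sketch of an exponential moving average. It assumes the common recursive definition with a smoothing factor of 0.5 and the first reading as the seed; the aggregate used in the query above may be defined differently, and the sample readings are invented.

```python
def exponential_moving_average(values, alpha):
    """Return the running EMA of `values`, seeded with the first value."""
    averages = []
    ema = None
    for value in values:
        if ema is None:
            ema = value  # the first reading seeds the average
        else:
            # New average = weighted mix of the new reading and the old average.
            ema = alpha * value + (1 - alpha) * ema
        averages.append(ema)
    return averages


cpu_readings = [10.0, 20.0, 40.0, 40.0]  # made-up values ordered by time
smoothed = exponential_moving_average(cpu_readings, alpha=0.5)
print(smoothed)  # [10.0, 15.0, 27.5, 33.75]
```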

As a PostgreSQL extension, TimescaleDB is a relational database. This gives new users a shorter learning curve and lets them reuse PostgreSQL tools such as pg_dump or pg_basebackup for backups, as well as its high availability tools, which is an advantage over other time series databases. It also supports streaming replication as its primary replication method, which can be used in a high availability setup. Failover and backups can be automated with an external system such as ClusterControl.

In TimescaleDB, each time series measurement is recorded in its own row, with a time field followed by any number of other fields, which can be floats, integers, strings, booleans, arrays, JSON blobs, geospatial dimensions, dates/times/timestamps, currencies, binary data and more.

You can create indexes on any field (standard indexes) or on multiple fields (composite indexes), on expressions such as functions, or even limit an index to a subset of rows (partial index). Any of these fields can be used as a foreign key to secondary tables, which can then store additional metadata.

With this approach, you need to choose a schema and decide which indexes your system will need.
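
The four kinds of index and the foreign key to a metadata table can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL/TimescaleDB; SQLite also supports these index kinds, but this is only an illustration, and the table, column and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Secondary table holding metadata about each device.
    CREATE TABLE devices (
        id       INTEGER PRIMARY KEY,
        location TEXT
    );

    -- One row per measurement; device_id is a foreign key to the metadata.
    CREATE TABLE readings (
        time      TEXT    NOT NULL,
        device_id INTEGER NOT NULL REFERENCES devices (id),
        metric    TEXT    NOT NULL,
        value     REAL
    );

    -- Standard index on a single field.
    CREATE INDEX idx_time ON readings (time);
    -- Composite index on multiple fields.
    CREATE INDEX idx_device_time ON readings (device_id, time);
    -- Index on an expression.
    CREATE INDEX idx_metric_lower ON readings (lower(metric));
    -- Partial index limited to a subset of rows.
    CREATE INDEX idx_high_values ON readings (time) WHERE value > 90;
""")

index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'readings'"
    )
)
print(index_names)
```

The point of the sketch is the design work it implies: none of these indexes exists until you decide you need it.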

Performance

For performance, you can check the TimescaleDB blog, which compares the two databases in detail with graphs and metrics. Below are some of the most important findings from that blog.

Insert performance

  • For workloads with very low cardinality (for example, 100 devices), InfluxDB performs better than TimescaleDB.
  • As the cardinality increases, InfluxDB's insert performance drops off faster than TimescaleDB's.
  • For medium to high cardinality workloads (for example, 100 devices sending 10 metrics), TimescaleDB performs better than InfluxDB.
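
As a rough illustration of what cardinality means in these bullets, the Python sketch below counts the distinct series in a stream of points, taking one series to be one unique (device, metric) pair. That definition and the generated sample data are assumptions made for the example.

```python
def series_cardinality(points):
    """Count distinct (device, metric) pairs seen in a stream of points."""
    return len({(p["device"], p["metric"]) for p in points})


# 100 devices, each sending 10 metrics, each series reporting three times.
points = [
    {"device": f"device-{d}", "metric": f"metric-{m}", "value": 0.0}
    for d in range(100)
    for m in range(10)
] * 3

print(series_cardinality(points))  # 1000
```

Repeated readings do not raise the cardinality; only new devices or new metrics do.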

Read performance

  • For simple queries, the results vary widely: in some cases one database is clearly better than the other, while in others the outcome depends on the cardinality of the dataset. The difference here is usually in the range of single-digit to double-digit milliseconds.
  • For complex queries, TimescaleDB performs much better than InfluxDB and supports a wider range of query types. The difference here is usually in the range of a few seconds to tens of seconds.
  • With that in mind, the best way to test properly is to benchmark with the queries you actually plan to run.
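
One simple way to follow that advice is a small timing harness like the Python sketch below. The `benchmark` helper and the `placeholder_query` function are hypothetical; in a real test you would replace the placeholder with a call that runs one of your own queries through the database driver you use.

```python
import statistics
import time


def benchmark(run_query, repetitions=5):
    """Return the median wall-clock latency of `run_query` in milliseconds."""
    latencies_ms = []
    for _ in range(repetitions):
        started = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - started) * 1000.0)
    return statistics.median(latencies_ms)


def placeholder_query():
    # Stand-in workload; replace with a real query against your database.
    return sum(range(100_000))


median_ms = benchmark(placeholder_query)
print(f"median latency: {median_ms:.3f} ms")
```

Using the median rather than the mean keeps one slow outlier run from dominating the result.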

Stability

  • InfluxDB has stability and performance issues at high cardinality (100K+).

Summary

If your data fits the InfluxDB data model and you don't expect that to change in the future, you should consider using InfluxDB: the model is easier to get started with and, like most databases that use a column-oriented approach, it offers better on-disk compression than PostgreSQL and TimescaleDB.

However, the relational model is more versatile than the InfluxDB model, and provides more functionality, flexibility, and control. This is especially important as the application develops. When planning your system, you should consider current and future needs.

This blog has given a short comparison between TimescaleDB and InfluxDB. TimescaleDB, as a PostgreSQL extension, looks mature and feature-rich because it inherits so much from PostgreSQL. Still, you should make your own decision based on the pros and cons discussed above, and be sure to benchmark your own workload. Good luck in this new world of time series databases!

Origin blog.csdn.net/sl285720967/article/details/103135109