Storage and Computation of Time Series Data - Analysis of Open Source Time Series Database (1)

Open source time series database

(Figure: db-engines ranking of time series databases, June 2017)

  The figure shows the ranking of time series databases on db-engines as of June 2017. For detailed analysis I will focus on databases that are both open source and distributed. Among the top ten, RRD is an old single-node storage engine, and Graphite is backed by Whisper, which can be regarded as an optimized and more capable RRD. kdb+, eXtremeDB and Axibase are not open source, so I will not analyze them. The open source version of InfluxDB and the storage layer of Prometheus are both built on self-developed single-node storage engines based on LevelDB; the commercial version of InfluxDB supports distributed deployment, and a distributed storage engine is also on the Prometheus roadmap.
  Taking all of this into account, I chose OpenTSDB, KairosDB and InfluxDB for detailed analysis. I am most familiar with OpenTSDB and have studied its source code, so I will describe it in the greatest detail; I know the other time series databases less well, so please correct me if anything here is described incorrectly.

OpenTSDB

  OpenTSDB is a distributed, scalable time series database that supports millions of writes per second, stores data with millisecond precision, and can retain data permanently without reducing precision. Its excellent write performance and storage capacity come from its underlying dependence on HBase: HBase's LSM-tree storage engine and distributed architecture provide the write throughput, and its reliance on HDFS underneath provides the storage capacity. OpenTSDB is deeply coupled to HBase and applies many clever optimizations based on the characteristics of HBase's underlying storage structure; I analyze these storage optimizations in detail in this article. The latest release also extends support to BigTable and Cassandra.

Architecture

(Figure: OpenTSDB architecture)

  The figure shows the architecture of OpenTSDB; its core components are the TSDs and HBase. The TSDs are a set of stateless nodes that can be scaled out arbitrarily and have no dependency other than HBase. A TSD exposes HTTP and Telnet interfaces for writing and querying data. Thanks to the stateless design, deploying and operating the TSDs is very simple; operating HBase, however, is not, which is one of the reasons support was extended to BigTable and Cassandra.

Data model

  OpenTSDB models data around metrics; each data point contains the following components:

  • metric: The name of the time series data metric, such as sys.cpu.user, stock.quote, etc.
  • timestamp: A Unix timestamp in seconds or milliseconds, giving the time at which the point was recorded.
  • tags: One or more tags, i.e. the different dimensions describing the subject. A tag consists of a TagKey and a TagValue; the TagKey is the dimension and the TagValue is that dimension's value.
  • value: The value of the metric; currently only numeric values are supported.
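
  As a minimal illustration of this model (a hypothetical sketch for this article, not OpenTSDB's actual internal classes), a data point can be represented like this:

    import java.util.Map;

    // Hypothetical representation of a single data point with the four
    // components listed above; OpenTSDB's real code uses its own classes.
    public record DataPoint(
            String metric,             // e.g. "sys.cpu.user"
            long timestamp,            // Unix timestamp in seconds or milliseconds
            Map<String, String> tags,  // e.g. {host=web01, dc=lga}
            double value) {            // only numeric values are supported

        public static void main(String[] args) {
            DataPoint p = new DataPoint(
                    "proc.loadavg.1m", 1499720430L, Map.of("host", "web01"), 0.36);
            System.out.println(p);
        }
    }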

Storage model

  For the ideas behind OpenTSDB's underlying storage optimizations, you can refer to this article. In brief, the key ideas are:

  • Data optimization: Assign a UniqueID to each Metric, TagKey and TagValue, maintain an index between the original values and their UniqueIDs, and store the UniqueIDs in the data table instead of the original values.
  • KeyValue count optimization: If you know HBase's underlying storage model well, you know that each column within a row is stored as a separate KeyValue; reducing the number of rows and columns therefore saves a great deal of storage space and improves query efficiency.
  • Query optimization: Use HBase's server side filters to optimize multi-dimensional queries, and use pre-aggregation and rollup to optimize GroupBy and reduced-precision (downsampling) queries.

UIDTable

  Next, let's look at the design of the key tables OpenTSDB uses in HBase. The first is the tsdb-uid table, whose structure is as follows:

(Figure: tsdb-uid table structure)

  Metric, TagKey and TagValue are each assigned a fixed-length UniqueID of the same length, three bytes by default. The tsdb-uid table uses two ColumnFamilies to store both the forward and reverse mappings between Metric/TagKey/TagValue and their UniqueIDs, six maps of data in total.

  From the example in the figure we can read the following:

  • The TagKey 'host' maps to the UniqueID '001'
  • The TagValue 'static' maps to the UniqueID '001'
  • The Metric 'proc.loadavg.1m' maps to the UniqueID '052'

  Assigning a UniqueID to every Metric, TagKey and TagValue has two benefits. First, it greatly reduces storage space and the amount of data transferred, since each value can be represented in only 3 bytes, a considerable compression ratio. Second, because the IDs have a fixed byte length, the required values can easily be parsed out of the row key by byte offset, and memory usage on the Java heap is greatly reduced (byte arrays save a lot of memory compared with Strings), lowering GC pressure.
  However, fixed-length UID encoding puts an upper limit on the number of UIDs: 3 bytes allow at most 16,777,216 distinct values, which is enough in most scenarios. The length can of course be adjusted, but it cannot be changed dynamically.
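
  As a rough sketch of what fixed-length UID encoding looks like (hypothetical helper code, not OpenTSDB's actual UniqueId implementation), a 3-byte UID can be packed into and parsed back out of a byte array like this:

    import java.util.Arrays;

    // Hypothetical helpers illustrating 3-byte UID encoding and decoding.
    public class UidCodec {
        static final int UID_WIDTH = 3;                           // default UID width
        static final long MAX_UID = (1L << (8 * UID_WIDTH)) - 1;  // 16,777,215

        // Encode a numeric UID into a fixed-length big-endian byte array.
        static byte[] encode(long uid) {
            if (uid < 0 || uid > MAX_UID) {
                throw new IllegalArgumentException("UID out of range: " + uid);
            }
            return new byte[] { (byte) (uid >>> 16), (byte) (uid >>> 8), (byte) uid };
        }

        // Decode a fixed-length UID found at a given byte offset inside a row key.
        static long decode(byte[] rowKey, int offset) {
            long uid = 0;
            for (int i = 0; i < UID_WIDTH; i++) {
                uid = (uid << 8) | (rowKey[offset + i] & 0xFF);
            }
            return uid;
        }

        public static void main(String[] args) {
            byte[] metricUid = encode(52);                    // e.g. 'proc.loadavg.1m' -> 052
            System.out.println(Arrays.toString(metricUid));   // [0, 0, 52]
            System.out.println(decode(metricUid, 0));          // 52
        }
    }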

DataTable

  The second key table is the data table, whose structure is as follows:

(Figure: tsdb data table structure)

  In this table, data within the same hour is stored in the same row, and each column in the row represents one data point. With second-level precision a row can hold at most 3,600 points; with millisecond precision, at most 3,600,000 points.
  The ingenuity of this table lies in the design of the row key and the qualifier (column name), as well as the compaction strategy applied to a whole row. The row key format is:

<metric><timestamp><tagk1><tagv1><tagk2><tagv2>...<tagkn><tagvn>

  Here metric, tagk and tagv are all represented by their UIDs. Because UIDs have a fixed byte length, the corresponding values can easily be extracted from the row key by byte offset when parsing it. The qualifier holds the data point's time offset within that hour: with second-level precision, for example, data for the 30th second has a time offset of 30, so the column name is 30. The main benefit of using the time offset as the column name is that it saves a great deal of storage space: second-precision data needs only 2 bytes and millisecond-precision data only 4 bytes, whereas storing the full timestamp would take 6 bytes. After a whole row has been written, OpenTSDB also applies a compaction strategy that merges all of the row's columns into a single column, mainly to reduce the number of KeyValues.
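
  Under these assumptions (3-byte UIDs, an hour-aligned base timestamp, second-level precision), a row key and qualifier could be assembled roughly as follows. This is an illustrative sketch only; the real qualifier format also packs flag bits, which are omitted here:

    import java.nio.ByteBuffer;
    import java.util.List;

    // Illustrative sketch of OpenTSDB-style row key / qualifier construction,
    // assuming 3-byte UIDs already looked up from the tsdb-uid table.
    public class RowKeyBuilder {
        // Row key: <metric uid><4-byte base hour><tagk1 uid><tagv1 uid>...
        // tagUids holds alternating tagk/tagv UIDs, already sorted by tagk UID.
        static byte[] rowKey(byte[] metricUid, long timestampSec, List<byte[]> tagUids) {
            long baseHour = timestampSec - (timestampSec % 3600);   // align to the hour
            ByteBuffer buf = ByteBuffer.allocate(metricUid.length + 4 + tagUids.size() * 3);
            buf.put(metricUid);
            buf.putInt((int) baseHour);
            for (byte[] uid : tagUids) {
                buf.put(uid);            // 3 bytes per tagk or tagv UID
            }
            return buf.array();
        }

        // Qualifier for second precision: the offset within the hour fits in 2 bytes.
        static byte[] qualifier(long timestampSec) {
            int offset = (int) (timestampSec % 3600);   // e.g. 30 for the 30th second
            return new byte[] { (byte) (offset >>> 8), (byte) offset };
        }

        public static void main(String[] args) {
            byte[] hostUid = {0, 0, 1};      // tagk 'host'
            byte[] staticUid = {0, 0, 1};    // tagv 'static'
            byte[] key = rowKey(new byte[] {0, 0, 52}, 1499720430L, List.of(hostUid, staticUid));
            byte[] q = qualifier(1499720430L);
            System.out.println(key.length + " bytes, offset " + (((q[0] & 0xFF) << 8) | (q[1] & 0xFF)));
            // prints: 13 bytes, offset 30
        }
    }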

Query optimization

  HBase only provides simple query operations: single-row gets and range scans. A single-row get requires the complete row key; a range scan requires a row key range and scans all data within that range. Single-row gets are generally very fast, while range scans depend on the size of the scanned range: scanning a few thousand or tens of thousands of rows is not a problem, but scanning hundreds of thousands or millions of rows makes read latency much higher.
  OpenTSDB offers rich query functionality: filtering on any TagKey, GroupBy, and downsampling (reduced precision). TagKey filtering is part of the query itself, while GroupBy and downsampling are computations on the query results. The main query parameters are the metric name, the tag key filter conditions, and the time range. As noted in the previous section, the data table's row key has the format <metric><timestamp><tagk1><tagv1><tagk2><tagv2>...<tagkn><tagvn>. Given the metric name and the time range, we can at least determine a scan range over the row keys. But that scan range covers every tag key combination sharing the same metric name and time range; if there are many tag key combinations, the scan range becomes uncontrollable and possibly very large, and query efficiency is basically unacceptable.
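
  To make this concrete, here is a rough sketch (illustrative only, under the same 3-byte UID assumptions as earlier) of how the scan start and stop keys are derived from just the metric UID and the time range, which is exactly why every tag combination inside that range ends up being scanned:

    import java.nio.ByteBuffer;
    import org.apache.hadoop.hbase.client.Scan;

    // Illustrative sketch: the scan range over the data table is determined by
    // the metric UID plus the hour-aligned time range; tags are not part of the
    // start/stop keys, so every tag combination in the range gets scanned.
    public class ScanRangeExample {
        static byte[] prefix(byte[] metricUid, long timestampSec) {
            ByteBuffer buf = ByteBuffer.allocate(metricUid.length + 4);
            buf.put(metricUid);
            buf.putInt((int) (timestampSec - timestampSec % 3600));  // base hour
            return buf.array();
        }

        public static void main(String[] args) {
            byte[] metricUid = {0, 0, 52};               // e.g. 'proc.loadavg.1m'
            long startSec = 1499720400L, endSec = 1499731200L;

            Scan scan = new Scan();
            scan.withStartRow(prefix(metricUid, startSec));
            scan.withStopRow(prefix(metricUid, endSec + 3600));  // exclusive stop
            // Everything between these two keys is scanned, regardless of tags.
        }
    }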

  Let's look more closely at how OpenTSDB optimizes queries:

  • Server side filter
    HBase provides rich and extensible filters. A filter works on the server side: after data has been scanned it is filtered before the results are returned to the client. A server side filter cannot reduce the amount of data scanned, but it can greatly reduce the amount of data transferred. OpenTSDB converts certain tag key filters into underlying HBase server side filters, but the benefit is limited, because the key factor affecting query performance is the efficiency of the underlying range scan, not the transfer efficiency.

  • Reducing the amount of data scanned within a range query
    To really improve query efficiency, the amount of data scanned must be reduced at the source. Note that this does not mean shrinking the query range, but reducing the amount of data scanned within that range. This relies on a key HBase filter, FuzzyRowFilter, which can dynamically skip a certain amount of data during a range scan based on specified conditions (see the sketch after this list). Not every query OpenTSDB supports can apply this optimization, however; certain conditions must be met. I won't enumerate them here; if you are interested, look into how FuzzyRowFilter works.

  • Turning a range query into single-row queries
    This optimization is even more aggressive than the previous one. The idea is easy to understand: if we know the row keys of all the data we want, no range scan is needed at all; single-row gets are enough. Again, not every query OpenTSDB supports can apply this optimization; certain conditions must be met. A single-row get requires a fully determined row key, and the data table's row key is composed of the metric name, the timestamp and the tags. The metric name and timestamp are known, so if the tags are also known we can assemble the complete row key. The condition is therefore simple: to apply this optimization, you must supply the tag value for every tag key.
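
  As a rough illustration of the FuzzyRowFilter idea mentioned above (a hedged sketch of the HBase filter itself, not OpenTSDB's actual query code), assume the row key layout <metric uid 3B><base hour 4B><tagk uid 3B><tagv uid 3B> and that we only want rows whose tag bytes equal host=static, while the hour bytes may take any value within the scan range:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
    import org.apache.hadoop.hbase.util.Pair;

    // Hedged sketch: FuzzyRowFilter lets the scan skip rows whose fixed positions
    // don't match the template, while "fuzzy" positions may hold any byte.
    public class FuzzyScanExample {
        public static void main(String[] args) throws Exception {
            // Template row key: only the fixed positions need meaningful values.
            byte[] template = {
                0, 0, 52,      // metric uid (fixed)
                0, 0, 0, 0,    // base hour (varies across the scan range)
                0, 0, 1,       // tagk uid for 'host' (fixed)
                0, 0, 1        // tagv uid for 'static' (fixed)
            };
            // Mask: 0 = byte must match the template, 1 = byte may be anything.
            byte[] mask = {
                0, 0, 0,
                1, 1, 1, 1,
                0, 0, 0,
                0, 0, 0
            };
            List<Pair<byte[], byte[]>> fuzzyKeys = new ArrayList<>();
            fuzzyKeys.add(new Pair<>(template, mask));

            Scan scan = new Scan();
            scan.setFilter(new FuzzyRowFilter(fuzzyKeys));

            try (Connection conn = ConnectionFactory.createConnection();
                 Table table = conn.getTable(TableName.valueOf("tsdb"));
                 ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(r);   // only rows matching host=static
                }
            }
        }
    }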

  These are the main optimizations OpenTSDB applies to its HBase queries. Beyond the query itself, however, the results still need GroupBy and downsampling, whose computational cost is also considerable and depends on the size of the result set. For GroupBy and downsampling, almost all time series databases adopt the same optimization: pre-aggregation and auto-rollup, i.e. computing in advance rather than after the query. The latest released version of OpenTSDB does not yet support pre-aggregation or rollup, and the 2.4 version under development offers only a half-finished solution: it adds a new interface for writing pre-aggregation and rollup results, but the computation of those results still has to be implemented by the user outside the database.
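
  To make the rollup idea concrete, here is a minimal sketch of user-side rollup computation (not an OpenTSDB API; as noted above, the aggregation itself has to be implemented outside the database), rolling second-level points up to one average per hour:

    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.stream.Collectors;

    // Minimal sketch of user-side rollup: aggregate raw second-level points into
    // one average per hour, which could then be written back through a rollup
    // write interface.
    public class HourlyRollup {
        record Point(long timestampSec, double value) {}

        static Map<Long, Double> rollupToHourlyAverage(List<Point> rawPoints) {
            return rawPoints.stream().collect(Collectors.groupingBy(
                    p -> p.timestampSec() - (p.timestampSec() % 3600),  // hour bucket
                    TreeMap::new,
                    Collectors.averagingDouble(Point::value)));
        }

        public static void main(String[] args) {
            List<Point> raw = List.of(
                    new Point(1499720400L, 1.0),    // hour h, second 0
                    new Point(1499720430L, 3.0),    // hour h, second 30
                    new Point(1499724000L, 5.0));   // hour h+1
            System.out.println(rollupToHourlyAverage(raw));
            // {1499720400=2.0, 1499724000=5.0}
        }
    }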

Summary

  OpenTSDB's strength lies in its write and storage capabilities, thanks to the underlying HBase. Its weakness is in query and analysis: although many query optimizations have been made, not all query scenarios can benefit from them. It is fair to say that among the time series databases compared here, OpenTSDB has the weakest optimization for TagValue filter queries, and it provides no pre-aggregation or auto-rollup support for GroupBy and downsampling queries. In terms of feature richness, however, OpenTSDB's API is the most complete, which has made it something of a benchmark.

This article is original content from the Yunqi Community and may not be reproduced without permission. To request permission, send an email to [email protected].
