Understanding Alibaba Cloud's time series and spatio-temporal database in five minutes

Brief introduction

Time Series & Spatial Temporal Database (TSDB for short) is a high-performance, low-cost, stable, and reliable online time series and spatio-temporal database service. It provides efficient reads and writes, storage with a high compression ratio, and interpolation and aggregation computation over time series data. It is widely used in Internet of Things (IoT) device monitoring systems, enterprise energy management systems (EMS), production safety monitoring systems, power detection systems, and similar industry scenarios. It also provides spatio-temporal query and analysis capabilities.

Three databases

The documentation for the time series and spatio-temporal database has recently gone through several major changes and is somewhat disorganized, so read it with care.

Time series database, TSDB edition

This time series database has been validated at large scale within Alibaba Group. It supports a distributed cluster architecture with horizontal scaling, supports access from tens of millions of IoT devices, and uses a self-developed compression algorithm with a high compression ratio.

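- Optimized for time series data, including the storage model, a multi-value data model, time series compression, aggregation, and sampling, efficient compression algorithms, columnar storage, and edge integration;
- High performance: memory-first data processing, distributed MPP SQL parallel computation, dynamic schema, a real-time streaming computation engine, and adaptive indexing for massive numbers of timelines;
- High scalability: dynamic data partitioning, horizontal scaling, dynamic elastic expansion, and dynamic upgrading and downgrading of instance specifications. High reliability: automatic cluster control, thread-level read/write separation, multi-layer data backup, and tiered storage;
- Targets large-scale metric data and event data scenarios.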

The protocol is compatible with OpenTSDB, but the core behind it is Alibaba's own implementation. You can still treat it as the Alibaba Cloud version of OpenTSDB; see the official comparison of its advantages over OpenTSDB.

InfluxDB®

More than just a database, it is a storage and computation system for events and metrics, built around monitoring services such as collection, visualization, and analysis. It follows the TICK ecosystem and targets real-time analysis scenarios for metrics, events, traces, and logs.

InfluxDB® went online only recently and is still in the beta stage. In my write-speed test, a write of 500 data points could be performed about 26 times per second, for an average of roughly 10,000 points/s; increasing the number of points per write should improve the speed further. In addition, the request address was on the public network; going through the VPC network should also be much faster.

Note: on Alibaba Cloud, InfluxDB limits the number of timelines (at most 10,000 per database). The definition of a timeline is introduced later.

Spatio-temporal database

The spatio-temporal database can store and manage time series data and data related to geographic locations. Spatio-temporal data is high-dimensional data. The database has a spatio-temporal data model, spatio-temporal indexes, and spatio-temporal operators, is fully compatible with SQL and the SQL/MM standard, and supports storing spatio-temporal data together with business data, so integration is seamless and easy to use.

The spatio-temporal database mainly targets space-related scenarios, such as heat maps and store site selection.

Introduction to time series databases (mainly InfluxDB)

A time series database (Time Series Database in full) is a data management system that provides efficient storage, retrieval, and statistical analysis of time series data. The main time series databases are OpenTSDB, Druid, InfluxDB, and Beringei. I mainly know a little about OpenTSDB and InfluxDB, but time series databases have a lot in common.

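Basic terms

measurement:

A container for tags, fields, and the time column.
In InfluxDB: a measurement is conceptually similar to a table in a traditional database.
  In principle it is closer to the concept of a table in SQL, which differs from many other time series databases.
In other time series databases: a measurement is equivalent to a metric.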

Field (value column):

A point in TSDB for InfluxDB® must have at least one field.
Note: fields are not indexed.
To some extent, a field can be thought of as the value in a key/value table.

Tag (dimension column):

Tags are optional.
Tags are indexed, which means that queries filtering on tags are faster.
To some extent, a tag can be thought of as the key in a key/value table.

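Timestamp:

Defaults to the server's local timestamp.
Timestamps are UNIX timestamps, in nanoseconds.
The minimum valid timestamp is -9223372036854775806, or 1677-09-21T00:12:43.145224194Z.
The maximum valid timestamp is 9223372036854775806, or 2262-04-11T23:47:16.854775806Z.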
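
Because timestamps are nanoseconds rather than the more familiar seconds or milliseconds, it is easy to get the unit wrong on the client side. The following is a minimal Python sketch of the conversion and of the range check described above. The function names are my own and are not part of any InfluxDB client.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Bounds copied from the valid timestamp range listed above.
MIN_VALID_NS = -9223372036854775806
MAX_VALID_NS = 9223372036854775806


def to_epoch_ns(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch."""
    # Integer division of timedeltas avoids the rounding errors of float seconds.
    # datetime only carries microseconds, so the last three digits are always 0.
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros * 1000


def is_valid_timestamp(ns: int) -> bool:
    """Check a nanosecond timestamp against the valid range."""
    return MIN_VALID_NS <= ns <= MAX_VALID_NS


ts = to_epoch_ns(datetime(2015, 6, 12, 0, 4, 27, tzinfo=timezone.utc))
print(ts)  # 1434067467000000000
```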

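Point (data point):

Consists of the fields within a series (timeline). Each point is uniquely identified by its series and its timestamp.
You cannot store more than one point with the same timestamp in the same series.

Series (timeline)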

The series is the most important concept in InfluxDB. A timeline of time series data is one metric from one data source, emitted continuously as time passes; the data points produced this way form a line, which is called a timeline.

In the figure there are two data sources, and each data source collects two metrics:

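A series is formed by combining a measurement with its tags;
the combination of tags uniquely identifies a series within the measurement.
That is:
1. Different measurements are different timelines.
2. The same measurement with different tags is also a different timeline.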
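
The two rules above, together with the rule that a point is identified by its series and timestamp, can be sketched with a toy in-memory store in Python. This is only an illustration of the identity rules, not how InfluxDB stores data; `series_key` and `TinyStore` are hypothetical names. The merge-on-rewrite behavior is how I understand InfluxDB to treat a duplicate point, so treat it as an assumption.

```python
from typing import Dict, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(measurement: str, tags: Dict[str, str]) -> SeriesKey:
    """A series is the measurement plus its tag set."""
    # Sort the tags so the key does not depend on the order they were given in.
    return (measurement, tuple(sorted(tags.items())))


class TinyStore:
    """Toy store that keeps at most one point per (series, timestamp)."""

    def __init__(self) -> None:
        self.points: Dict[Tuple[SeriesKey, int], Dict[str, float]] = {}

    def write(self, measurement: str, tags: Dict[str, str],
              fields: Dict[str, float], timestamp_ns: int) -> None:
        key = (series_key(measurement, tags), timestamp_ns)
        # Assumption: a second write to the same series and timestamp merges
        # its fields into the existing point instead of creating a new one.
        self.points.setdefault(key, {}).update(fields)

    def series_count(self) -> int:
        return len({series for series, _ in self.points})


store = TinyStore()
store.write("cpu", {"host": "a"}, {"load1": 1.0}, 100)
store.write("cpu", {"host": "b"}, {"load1": 2.0}, 100)  # same measurement, new tags
store.write("mem", {"host": "a"}, {"used": 3.0}, 100)   # new measurement
store.write("cpu", {"host": "a"}, {"load5": 4.0}, 100)  # same series and timestamp
print(store.series_count())  # 3
```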

Retention policy (RP)

A retention policy describes:

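  1. How long InfluxDB keeps the data (DURATION)
  2. How many copies of the data are stored in the cluster (REPLICATION)
  3. The shard group duration (SHARD DURATION)
Note: replication factors do not apply to single-node instances.
autogen: infinite retention, with the replication factor set to 1.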

The statement for creating an RP is as follows:

CREATE RETENTION POLICY <retention_policy_name> ON <database_name>
DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]

Example:
CREATE RETENTION POLICY "one_day_only" ON "water_database"
DURATION 1d REPLICATION 1 SHARD DURATION 1h DEFAULT

Writing with a specified RP:

# If no RP is specified, the default RP is used
curl -X POST 'http://localhost:8086/write?db=mydb&rp=six_month_rollup' \
    --data-binary 'disk,host=server01 value=442221834240i 1435362189575692182'
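
When the write is issued from a program rather than from curl, the same `db` and `rp` query parameters have to be put on the URL. The Python sketch below only builds the URL and sends nothing; `write_url` is a hypothetical helper, and the host and port are the ones from the curl example.

```python
from typing import Optional
from urllib.parse import urlencode


def write_url(base: str, db: str, rp: Optional[str] = None) -> str:
    """Build the /write URL; leaving rp out means the default RP is used."""
    params = {"db": db}
    if rp is not None:
        params["rp"] = rp
    return f"{base}/write?{urlencode(params)}"


print(write_url("http://localhost:8086", "mydb", "six_month_rollup"))
# http://localhost:8086/write?db=mydb&rp=six_month_rollup
```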

Shard Group

The shard group is an important logical concept in InfluxDB:

A shard group contains multiple shards, and each shard group stores data only for a specific time range.
The time ranges of different shard groups do not overlap.

The length of the time range covered by each shard group is set by the SHARD DURATION field of the retention policy:

If it is not specified, it is derived from the retention duration (the data expiration time) as follows:

Retention Duration              Shard Duration
< 2 days                        1 hour
>= 2 days and <= 6 months       1 day
> 6 months                      7 days
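
The table can be written as a small lookup function. The following Python sketch follows the table as given; it assumes that "6 months" means 180 days and that an infinite retention duration (passed as `None` here) counts as longer than 6 months. Neither assumption is stated in the table.

```python
from datetime import timedelta
from typing import Optional

# Assumption: the table does not say how long "6 months" is; 180 days is used here.
SIX_MONTHS = timedelta(days=180)


def default_shard_duration(retention: Optional[timedelta]) -> timedelta:
    """Return the shard group duration implied by a retention duration.

    None stands for infinite retention (assumed to fall in the last row).
    """
    if retention is None or retention > SIX_MONTHS:
        return timedelta(days=7)
    if retention >= timedelta(days=2):
        return timedelta(days=1)
    return timedelta(hours=1)


print(default_shard_duration(timedelta(days=1)))   # 1:00:00
print(default_shard_duration(timedelta(days=30)))  # 1 day, 0:00:00
```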

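Shard:

Similar to the concept of a Region in HBase or a Tablet in Kudu.
1. A shard is the implementation of InfluxDB's storage engine, specifically the TSM (Time-Structured Merge Tree) engine,
    which is responsible for encoding and storing data and for serving reads and writes.
TSM is similar to LSM, so a shard, like an HBase Region, contains components such as a cache, a WAL, and data files,
    and it also has data operations such as flush and compaction.
2. Shard groups partition the data by time.
    InfluxDB then uses hash partitioning to partition the data that falls into the same shard group once more.
    InfluxDB maps data to shards by hash(Series), not by hashing on the measurement.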
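
The two-level partitioning described in item 2 can be sketched in Python as follows: first by time into a shard group, then by a hash of the series into a shard inside that group. This is an illustration of the idea only. The FNV-1a hash is a stand-in of my choosing, the series is represented as a plain string, and `shards_per_group` is a hypothetical parameter.

```python
from typing import Tuple

HOUR_NS = 3600 * 10**9


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash, used here only as a stand-in hash function."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def shard_group_start(timestamp_ns: int, shard_duration_ns: int) -> int:
    """Level 1: floor the timestamp to the start of its shard group's time range."""
    return timestamp_ns - timestamp_ns % shard_duration_ns


def pick_shard(series: str, timestamp_ns: int,
               shard_duration_ns: int, shards_per_group: int) -> Tuple[int, int]:
    """Return (shard group start, shard index inside the group)."""
    group = shard_group_start(timestamp_ns, shard_duration_ns)
    # Level 2: hash the whole series (measurement plus tags), not just the measurement.
    shard = fnv1a_64(series.encode("utf-8")) % shards_per_group
    return group, shard


print(pick_shard("cpu,host=serverA", 1434067467000000000, HOUR_NS, 4))
```

With this scheme, all points of one series within the same time range land in the same shard, while two series of the same measurement may land in different shards.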

InfluxQL

Line Protocol

Format:

<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] 
  <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]

The following are examples of data written in a format that conforms to TSDB for InfluxDB®:

cpu,host=serverA,region=us_west value=0.64
payment,device=mobile,product=Notepad,method=credit billed=33,licenses=3i 1434067467100293230
stock,symbol=AAPL bid=127.46,ask=127.48
temperature,machine=unit42,type=assembly external=25,internal=37 1434067467000000000
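
To make the format concrete, here is a minimal Python sketch that assembles one line from a measurement, tags, fields, and an optional timestamp. `to_line` is a hypothetical helper, not a client library function. The escaping rules (backslash before commas, spaces, and equals signs; double quotes around string field values) reflect my understanding of the line protocol and should be checked against the official reference before relying on them.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]

MEASUREMENT_CHARS = ", "  # characters escaped in a measurement name
KEY_CHARS = ", ="         # characters escaped in tag keys, tag values, field keys


def _escape(text: str, chars: str) -> str:
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # the trailing i marks an integer field
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_line(measurement: str, fields: Dict[str, FieldValue],
            tags: Optional[Dict[str, str]] = None,
            timestamp_ns: Optional[int] = None) -> str:
    if not fields:
        raise ValueError("a point needs at least one field")
    tags = tags or {}
    line = _escape(measurement, MEASUREMENT_CHARS)
    for key in sorted(tags):
        line += "," + _escape(key, KEY_CHARS) + "=" + _escape(tags[key], KEY_CHARS)
    pairs = [_escape(k, KEY_CHARS) + "=" + _format_field(v) for k, v in fields.items()]
    line += " " + ",".join(pairs)
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line("stock", {"bid": 127.46, "ask": 127.48}, {"symbol": "AAPL"}))
# stock,symbol=AAPL bid=127.46,ask=127.48
```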

Log in

// Log in
$> influx -ssl -username <account name> -password <password> -host <network address> -port 3242
// Create a user
> create user gordon with password '1QAZ2wsx'
// Grant privileges
grant all privileges to gordon
// Create a database
create database testdb

Basic QL

# Show series (timelines)
show series
# Show measurements
show measurements
# Show tag keys
show tag keys
# Show field keys
show field keys

Query:

select * from metrics
show tag keys from metrics
show field keys from metrics

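# View the data of a specific measurement; the official recommendation is to wrap identifiers in double quotes
select * from "CPU" order by time desc

# View specific fields and tags
select "load1","role" from "CPU" order by time desc

# View fields only
select *::field from "CPU"

# Query the data for a given tag. Note that string values in the WHERE clause must use single quotes;
# if a string value is unquoted or double-quoted, no data is returned
select * from "CPU" where role = 'FrontServer'

# Query all data where the field load1 > 20
select * from "CPU" where "load1" > 20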
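
Since mixing up the two kinds of quotes silently returns no data, it can help to build queries through small quoting helpers. The Python sketch below uses hypothetical helper names; the backslash escaping of embedded quotes is my assumption about InfluxQL and should be verified.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier (measurement, tag key, or field key)."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_string(value: str) -> str:
    """Single-quote a string value, as required in WHERE clauses."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def select_where_tag(measurement: str, tag_key: str, tag_value: str) -> str:
    return (f"select * from {quote_ident(measurement)} "
            f"where {quote_ident(tag_key)} = {quote_string(tag_value)}")


print(select_where_tag("CPU", "role", "FrontServer"))
# select * from "CPU" where "role" = 'FrontServer'
```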

Insert:

INSERT weather,location=us-midwest temperature=82 1465839830100400200

Basic operations:

# Perform basic arithmetic
select ("load1" * 2) + 0.5 from "CPU"

// SELECT statements support basic mathematical operators such as +, -, /, *, and ().
SELECT field_key1 + field_key2 AS "field_key_sum"
  FROM "measurement_name" WHERE time < now() - 15m

SELECT (key1 + key2) - (key3 + key4) AS "some_calculation"
  FROM "measurement_name" WHERE time < now() - 15m

// Calculate a percentage using aggregate functions:
SELECT (sum(field_key1) / sum(field_key2)) * 100 AS "calculated_percentage"
  FROM "measurement_name" WHERE time < now() - 15m GROUP BY time(1m)
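
To show what the last query computes, here is a rough client-side equivalent in Python: points are grouped into one-minute buckets, the two fields are summed per bucket, and the ratio is turned into a percentage. It is a sketch of the arithmetic only. It leaves out the time filter, skips buckets where the denominator is zero (my own choice), and takes a hypothetical list of `(timestamp_ns, field_key1, field_key2)` tuples as input.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MINUTE_NS = 60 * 10**9


def percentage_per_minute(
    points: Iterable[Tuple[int, float, float]]
) -> Dict[int, float]:
    """Map each one-minute bucket start to sum(field1) / sum(field2) * 100."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for timestamp_ns, field1, field2 in points:
        bucket = timestamp_ns - timestamp_ns % MINUTE_NS  # like GROUP BY time(1m)
        sums[bucket][0] += field1
        sums[bucket][1] += field2
    return {
        bucket: total1 / total2 * 100
        for bucket, (total1, total2) in sorted(sums.items())
        if total2 != 0
    }


sample = [(0, 1.0, 4.0), (30 * 10**9, 1.0, 4.0), (60 * 10**9, 3.0, 4.0)]
print(percentage_per_minute(sample))  # {0: 25.0, 60000000000: 75.0}
```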

Origin yq.aliyun.com/articles/703560