1. Introduction to metric queries
- Metric queries build on the result set of log queries to create metrics.
- They can be used, for example, to calculate the rate of error logs, or the top N log lines printed over the past 3 hours.
- Combined with the parsers available in log queries, metric queries can compute over simple values extracted from log lines, such as latency or request size. All labels, including those newly created by parsers, can be used for aggregation or to generate new series.
2. Range vector aggregations
LogQL borrows the range vector concept from Prometheus: logs that have already been filtered can be aggregated again along the time dimension. For example, after selecting the past 3 hours of logs, you can compute per-second statistics over them; "per second" here is a time duration. LogQL supports the same time durations as Prometheus:
- ms - milliseconds
- s - seconds
- m - minutes
- h - hours
- d - days - assuming a day always has 24h
- w - weeks - assuming a week always has 7d
- y - years - assuming a year always has 365d
Examples: 5h, 1h30m, 5m, 10s.
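As a rough illustration of how these duration strings combine, here is a small Python sketch (not part of LogQL itself; the function name and unit table are my own) that converts a Prometheus-style duration such as 1h30m into seconds:

```python
import re

# Seconds per unit, matching the Prometheus/LogQL duration units above.
# In the regex alternation, "ms" must come before "m" so it matches first.
UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
         "d": 86400, "w": 604800, "y": 31536000}

def parse_duration(text):
    """Parse a Prometheus-style duration like "1h30m" into seconds."""
    pattern = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")
    pos, total = 0, 0.0
    for match in pattern.finditer(text):
        if match.start() != pos:          # reject gaps / junk between parts
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * UNITS[match.group(2)]
        pos = match.end()
    if pos != len(text) or pos == 0:      # reject trailing junk or empty input
        raise ValueError(f"invalid duration: {text!r}")
    return total

print(parse_duration("1h30m"))  # 5400.0
print(parse_duration("10s"))    # 10.0
```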
Loki supports two types of range aggregation:
- a. log range aggregations (#2.1)
- b. unwrapped range aggregations (#2.2)
2.1 Log range aggregations
A function aggregates the logs over a given interval. The interval is written after the log stream selector, or after the log pipeline.
List of functions:
- rate(log-range): calculates the number of entries per second.
- count_over_time(log-range): counts the entries for each log stream within the given range.
- bytes_rate(log-range): calculates the number of bytes per second for each stream.
- bytes_over_time(log-range): counts the amount of bytes used by each log stream for a given range.
- absent_over_time(log-range): returns an empty vector if the range vector passed to it has any elements, and a 1-element vector with the value 1 if it has no elements. (absent_over_time is useful for alerting on when no time series and log streams exist for a label combination for a certain amount of time.)
【Examples】
- Count the number of job="mysql" logs in 5-minute windows:
count_over_time({job="mysql"}[5m])
More concretely, suppose the log range is:
00:01 --> log1
00:02 --> log2
...
00:10 --> log10
Then count_over_time produces:
00:01 ~ 00:05: 5 log lines in this window.
00:06 ~ 00:10: also 5 log lines in this window.
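The bucketed counts above can be reproduced with a short Python sketch. Note this is a simplified illustration (real Loki evaluates the range vector at each query step); it uses minute offsets as timestamps and mirrors the two fixed 5-minute windows in the walkthrough:

```python
# Ten log lines, one per minute, at 00:01 .. 00:10.
log_timestamps = list(range(1, 11))

def count_over_time(timestamps, window_end, window_minutes=5):
    """Count entries in the half-open window (end - window, end]."""
    return sum(window_end - window_minutes < t <= window_end
               for t in timestamps)

print(count_over_time(log_timestamps, 5))   # 5 -> logs from 00:01 to 00:05
print(count_over_time(log_timestamps, 10))  # 5 -> logs from 00:06 to 00:10
```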
- Over 1-minute windows, take logs with label job="mysql" whose content contains "error" but not "timeout", parse each line as JSON, keep entries whose duration is greater than 10s, and compute the per-second rate. Finally, use sum to add up the values, grouped by host.
sum by (host) (rate({job="mysql"} |= "error" != "timeout" | json | duration > 10s [1m]))
2.2 Unwrapped range aggregations
Unwrapped ranges operate on the values of labels newly extracted by a parser (so the object of aggregation is no longer just the log line itself). The syntax is | unwrap label_identifier.
Built-in conversion functions:
- duration_seconds(label_identifier) (or its short equivalent duration), which will convert the label value into seconds from the go duration format (e.g. 5m, 24s30ms).
- bytes(label_identifier), which will convert the label value to raw bytes applying the bytes unit (e.g. 5 MiB, 3k, 1G).
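To make the bytes() conversion concrete, here is an illustrative Python sketch. The unit table is an assumption covering only the units in the examples above (following the usual Go convention where k/M/G are decimal and Ki/Mi/Gi are binary), not Loki's full table:

```python
import re

# Assumed unit table: decimal k/M/G, binary Ki/Mi/Gi -- just the units
# needed for the examples, not everything Loki accepts.
UNIT_BYTES = {
    "": 1, "b": 1,
    "k": 1000, "kb": 1000, "kib": 1024,
    "m": 1000**2, "mb": 1000**2, "mib": 1024**2,
    "g": 1000**3, "gb": 1000**3, "gib": 1024**3,
}

def to_bytes(value: str) -> float:
    """Convert a size string like "5 MiB", "3k" or "1G" into raw bytes."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([a-zA-Z]*)\s*", value)
    if not match or match.group(2).lower() not in UNIT_BYTES:
        raise ValueError(f"invalid size: {value!r}")
    return float(match.group(1)) * UNIT_BYTES[match.group(2).lower()]

print(to_bytes("5 MiB"))  # 5242880.0
print(to_bytes("3k"))     # 3000.0
print(to_bytes("1G"))     # 1000000000.0
```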
Many other functions are supported as well:
- rate(unwrapped-range): calculates the per-second rate of the sum of all values in the specified interval.
- rate_counter(unwrapped-range): calculates the per-second rate of the values in the specified interval, treating them as a "counter metric".
- sum_over_time(unwrapped-range): the sum of all values in the specified interval.
- avg_over_time(unwrapped-range): the average value of all points in the specified interval.
- max_over_time(unwrapped-range): the maximum value of all points in the specified interval.
- min_over_time(unwrapped-range): the minimum value of all points in the specified interval.
- first_over_time(unwrapped-range): the first value of all points in the specified interval.
- last_over_time(unwrapped-range): the last value of all points in the specified interval.
- stdvar_over_time(unwrapped-range): the population standard variance of the values in the specified interval.
- stddev_over_time(unwrapped-range): the population standard deviation of the values in the specified interval.
- quantile_over_time(scalar, unwrapped-range): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
- absent_over_time(unwrapped-range): returns an empty vector if the range vector passed to it has any elements, and a 1-element vector with the value 1 if it has no elements. (absent_over_time is useful for alerting on when no time series and log streams exist for a label combination for a certain amount of time.)
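A few of these unwrapped-range aggregations are easy to sketch in Python over the values extracted in a single window. The quantile here uses linear interpolation between sorted samples, following the Prometheus convention; the sample data is made up for illustration:

```python
def sum_over_time(values):
    return sum(values)

def avg_over_time(values):
    return sum(values) / len(values)

def quantile_over_time(phi, values):
    """phi-quantile (0 <= phi <= 1) with linear interpolation."""
    ordered = sorted(values)
    rank = phi * (len(ordered) - 1)      # fractional index into sorted samples
    low, frac = int(rank), rank - int(rank)
    if low + 1 >= len(ordered):
        return ordered[low]
    return ordered[low] + frac * (ordered[low + 1] - ordered[low])

# e.g. unwrapped request durations (seconds) observed in one window
durations = [2.0, 4.0, 6.0, 8.0, 10.0]
print(sum_over_time(durations))            # 30.0
print(avg_over_time(durations))            # 6.0
print(quantile_over_time(0.5, durations))  # 6.0
```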
【Examples】
- Over 1-minute windows, take logs containing metrics.go, parse each line with logfmt, unwrap the newly extracted bytes_processed label and sum its values, then sum the result by org_id.
sum by (org_id) (
sum_over_time(
{cluster="ops-tools1",container="loki-dev"}
|= "metrics.go"
| logfmt
| unwrap bytes_processed [1m])
)
3. Built-in aggregation operators
Like PromQL, LogQL supports applying a further aggregation on top of a range aggregation. The following operators can aggregate the computed results into new series:
- sum: Calculate sum over labels
- avg: Calculate the average over labels
- min: Select minimum over labels
- max: Select maximum over labels
- stddev: Calculate the population standard deviation over labels
- stdvar: Calculate the population standard variance over labels
- count: Count number of elements in the vector
- topk: Select largest k elements by sample value
- bottomk: Select smallest k elements by sample value
Syntax:
<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]
topk and bottomk require a parameter: for example, passing 10 to topk returns the top 10 elements.
To group the input vector by certain labels, use by or without: without removes the listed labels from the result (keeping the remaining ones), while by aggregates by the listed labels.
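A minimal Python sketch of these grouping semantics, using sum as the operator (the function name and sample data are made up for illustration): by keeps only the listed labels, without drops them, and samples sharing the remaining label set are summed together.

```python
from collections import defaultdict

def aggregate_sum(samples, by=None, without=None):
    """samples: list of (labels_dict, value). Returns {grouping_key: sum}."""
    grouped = defaultdict(float)
    for labels, value in samples:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            kept = {k: v for k, v in labels.items()
                    if k not in (without or ())}
        grouped[tuple(sorted(kept.items()))] += value
    return dict(grouped)

samples = [
    ({"host": "a", "level": "error"}, 3.0),
    ({"host": "a", "level": "info"},  5.0),
    ({"host": "b", "level": "error"}, 2.0),
]
# by (host): one series per host, levels merged together
print(aggregate_sum(samples, by={"host"}))
# without (host): host dropped, one series per remaining label set (level)
print(aggregate_sum(samples, without={"host"}))
```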
【Examples】
- Over 5-minute windows, for logs under region="us-east1", take the 10 streams with the highest per-second log rate, grouped by name. In other words: list the names of the 10 applications with the highest log throughput.
topk(10,sum(rate({region="us-east1"}[5m])) by (name))
- Count the logs of the mysql job over the past 5 minutes, grouped by level:
sum(count_over_time({job="mysql"}[5m])) by (level)