【Monitoring】LogQL Learning Notes (Part 2): Metric queries

【Prerequisite article】

1. Introduction to Metric queries

Official documentation: grafana.com/docs/loki/l…

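  • Metric queries build on log queries: they take the result set of a log query and turn it into metrics.
  • They can be used, for example, to calculate the rate of error log lines, or to find the top N log sources with the most log lines over the last 3 hours.
  • Combined with the parsers available in log queries, metric queries can also compute metrics from sample values inside log lines, such as latency or request size. All labels, including those newly extracted by parsers, can be used in aggregations and to generate new series.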

2. Range Vector aggregation

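LogQL borrows the range vector concept from Prometheus: logs that have already been filtered can be aggregated further along the time axis. For example, you can query the logs of the last 3 hours and compute a per-second rate over windows of a given length. The window length (such as the 5m in [5m]) is written as a time duration, and LogQL supports the same duration units as Prometheus: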

  • ms - milliseconds
  • s - seconds
  • m - minutes
  • h - hours
  • d - days - assuming a day always has 24h
  • w - weeks - assuming a week always has 7d
  • y - years - assuming a year always has 365d

Examples: 5h, 1h30m, 5m, 10s.
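To make the unit table concrete, here is a minimal Python sketch that converts such a duration string into seconds. It is an illustration only, not Loki's parser: the function name parse_duration is hypothetical, and unlike Prometheus it does not check that units appear in descending order or only once.

```python
import re

# Seconds per unit, following the list above (d = 24h, w = 7d, y = 365d).
UNIT_SECONDS = {
    "ms": 0.001,
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 7 * 86400,
    "y": 365 * 86400,
}

# "ms" is listed first in the alternation so "10ms" is not read as 10 minutes.
_TOKEN = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '1h30m' into seconds (hypothetical helper)."""
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        # Every number+unit token must directly follow the previous one.
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


for d in ["5h", "1h30m", "5m", "10s"]:
    print(d, parse_duration(d))
```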

Loki supports two types of range vector aggregations:

  • a. log range aggregations (#2.1)
  • b. unwrapped range aggregations (#2.2)

2.1 Log range aggregations

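A log range aggregation applies a function to the log lines within a time range. The range is written in square brackets after the log stream selector or the log pipeline.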

Functions:

  • rate(log-range): calculates the number of entries per second
  • count_over_time(log-range): counts the entries for each log stream within the given range.
  • bytes_rate(log-range): calculates the number of bytes per second for each stream.
  • bytes_over_time(log-range): counts the amount of bytes used by each log stream for a given range.
  • absent_over_time(log-range): returns an empty vector if the range vector passed to it has any elements and a 1-element vector with the value 1 if the range vector passed to it has no elements. (absent_over_time is useful for alerting on when no time series and logs stream exist for label combination for a certain amount of time.)
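The definitions above can be sketched in Python for a single log stream. This is a simplified model, not Loki's implementation: the Entry type and function signatures are hypothetical, bytes are counted as the UTF-8 length of each line, the window is assumed to be left-open (end - range < ts <= end), and None stands in for the empty vector.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    ts: float   # timestamp in seconds
    line: str   # raw log line


def in_range(entries, end, range_s):
    """Entries inside the window (end - range_s, end]; boundary handling is assumed."""
    return [e for e in entries if end - range_s < e.ts <= end]


def count_over_time(entries, end, range_s):
    return len(in_range(entries, end, range_s))


def rate(entries, end, range_s):
    # Number of entries per second over the window.
    return count_over_time(entries, end, range_s) / range_s


def bytes_over_time(entries, end, range_s):
    return sum(len(e.line.encode("utf-8")) for e in in_range(entries, end, range_s))


def bytes_rate(entries, end, range_s):
    return bytes_over_time(entries, end, range_s) / range_s


def absent_over_time(entries, end, range_s):
    # None models the empty vector; 1 is returned only when the window is empty.
    return None if in_range(entries, end, range_s) else 1


stream = [Entry(10, "error: disk full"), Entry(40, "ok"), Entry(70, "ok")]
print(count_over_time(stream, 60, 60))
print(rate(stream, 60, 60))
print(bytes_over_time(stream, 60, 60))
print(absent_over_time(stream, 200, 60))
```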

【Examples】

  • Count the log lines of job=mysql over each 5-minute range:
    count_over_time({job="mysql"}[5m])
    

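More concretely, suppose one log line is written per minute:
00:01 --> log1
00:02 --> log2
...
00:10 --> log10

If the query is evaluated every 5 minutes (a 5m step), at 00:05 and 00:10, each evaluation counts the lines in the preceding 5 minutes:
00:01 ~ 00:05: 5 log lines.
00:06 ~ 00:10: 5 log lines.
With a smaller step (for example 1m), the 5-minute windows overlap and the same log line is counted in several points.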
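The walkthrough above can be reproduced with a small Python simulation that evaluates count_over_time at a series of step timestamps, each looking back over the range. It is a sketch under stated assumptions: the function name is hypothetical, timestamps are plain seconds, and the left-open window (t - range, t] is assumed rather than taken from Loki's source.

```python
def count_over_time_series(timestamps, start, end, step, range_s):
    """Evaluate count_over_time at start, start+step, ..., end (hypothetical helper)."""
    results = []
    t = start
    while t <= end:
        # Each evaluation looks back range_s seconds from t.
        n = sum(1 for ts in timestamps if t - range_s < ts <= t)
        results.append((t, n))
        t += step
    return results


minute = 60
logs = [i * minute for i in range(1, 11)]  # log1 at 00:01 ... log10 at 00:10

# Step equal to the range: non-overlapping windows, as in the text.
print(count_over_time_series(logs, 5 * minute, 10 * minute, 5 * minute, 5 * minute))
# Step of 1 minute: windows overlap, so each line is counted in several points.
print(count_over_time_series(logs, 5 * minute, 10 * minute, minute, 5 * minute))
```

With the 1-minute step the six points add up to more than the ten log lines, which is expected: the points are overlapping window counts, not a partition of the logs.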

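  • Using 1-minute ranges, take the mysql job's logs that contain "error" but not "timeout" and whose parsed duration is greater than 10s, and compute their per-second rate; finally, sum the rates by host so that there is one value per host: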
    sum by (host) (rate({job="mysql"} |= "error" != "timeout" | json | duration > 10s [1m]))
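To show what each stage of that query does, here is a Python sketch that applies the same steps to a few hypothetical JSON log lines. The log lines, host names, and helper names are all invented for illustration. The duration parser is a minimal stand-in for Go duration parsing (ms, s, m, h only). Lines that fail JSON parsing or lack a duration field are simply skipped here; Loki handles parse failures differently, by attaching an __error__ label.

```python
import json
import re

_GO_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def go_duration_seconds(text):
    """Minimal parser for Go-style durations such as '12s' or '1m30s' (ms/s/m/h only)."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(float(n) * _GO_UNITS[u] for n, u in parts)


def sum_by_host_rate(entries, end, range_s=60):
    """entries: (timestamp, host, line). Models sum by (host)(rate(... [1m]))."""
    counts = {}
    for ts, host, line in entries:
        if not (end - range_s < ts <= end):
            continue
        # |= "error" != "timeout"
        if "error" not in line or "timeout" in line:
            continue
        # | json  (unparsable lines are skipped in this sketch)
        try:
            fields = json.loads(line)
        except json.JSONDecodeError:
            continue
        # | duration > 10s  (lines without a duration field are dropped here)
        if go_duration_seconds(fields.get("duration", "0s")) <= 10:
            continue
        counts[host] = counts.get(host, 0) + 1
    # Summing per-stream rates within a host equals the host's total count / range.
    return {host: n / range_s for host, n in counts.items()}


logs = [
    (10, "db1", '{"level": "error", "msg": "slow query", "duration": "12s"}'),
    (20, "db1", '{"level": "error", "msg": "query failed", "duration": "3s"}'),
    (30, "db2", '{"level": "error", "msg": "lock wait", "duration": "1m30s"}'),
    (40, "db2", '{"level": "error", "msg": "timeout", "duration": "20s"}'),
    (50, "db1", '{"level": "error", "msg": "slow query", "duration": "15s"}'),
]
print(sum_by_host_rate(logs, 60))
```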
    

2.2 Unwrapped range aggregations

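Unwrapped ranges use the values of labels extracted by a parser as sample values, instead of counting the log lines themselves. The label to aggregate is selected with an unwrap expression at the end of the log pipeline. Syntax: | unwrap label_identifier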

Built-in conversion functions for unwrap:

  • duration_seconds(label_identifier) (or its short form duration) converts the label value from the Go duration format (e.g. 5m24s30ms) to seconds.
  • bytes(label_identifier) converts the label value to raw bytes, applying the bytes unit (e.g. 5 MiB, 3k, 1G).
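The bytes conversion can be sketched in Python as follows. The suffix rules are an assumption: SI suffixes (k, M, G, T, optionally followed by B) are treated as powers of 1000 and IEC suffixes (Ki, Mi, Gi, Ti, optionally followed by B) as powers of 1024, case-insensitively. This mirrors a common convention but has not been checked against Loki's own parsing, and parse_bytes is a hypothetical name.

```python
import re

# Assumed suffix tables: SI = powers of 1000, IEC = powers of 1024.
_SI = {"": 1, "b": 1, "k": 10**3, "kb": 10**3, "m": 10**6, "mb": 10**6,
       "g": 10**9, "gb": 10**9, "t": 10**12, "tb": 10**12}
_IEC = {"ki": 2**10, "kib": 2**10, "mi": 2**20, "mib": 2**20,
        "gi": 2**30, "gib": 2**30, "ti": 2**40, "tib": 2**40}


def parse_bytes(text: str) -> float:
    """Convert strings such as '5 MiB', '3k' or '1G' into a number of bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not m:
        raise ValueError(f"invalid byte size: {text!r}")
    number, unit = float(m.group(1)), m.group(2).lower()
    factor = _SI.get(unit) or _IEC.get(unit)
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return number * factor


for s in ["5 MiB", "3k", "1G"]:
    print(s, parse_bytes(s))
```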

The following range functions are supported on unwrapped ranges:

  • rate(unwrapped-range): calculates per second rate of the sum of all values in the specified interval.
  • rate_counter(unwrapped-range): calculates per second rate of the values in the specified interval and treating them as “counter metric”
  • sum_over_time(unwrapped-range): the sum of all values in the specified interval.
  • avg_over_time(unwrapped-range): the average value of all points in the specified interval.
  • max_over_time(unwrapped-range): the maximum value of all points in the specified interval.
  • min_over_time(unwrapped-range): the minimum value of all points in the specified interval.
  • first_over_time(unwrapped-range): the first value of all points in the specified interval.
  • last_over_time(unwrapped-range): the last value of all points in the specified interval.
  • stdvar_over_time(unwrapped-range): the population standard variance of the values in the specified interval.
  • stddev_over_time(unwrapped-range): the population standard deviation of the values in the specified interval.
  • quantile_over_time(scalar,unwrapped-range): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
  • absent_over_time(unwrapped-range): returns an empty vector if the range vector passed to it has any elements and a 1-element vector with the value 1 if the range vector passed to it has no elements. (absent_over_time is useful for alerting on when no time series and logs stream exist for label combination for a certain amount of time.)
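Most of these functions reduce the unwrapped values in one window to a single number. The Python sketch below computes them for one window of (timestamp, value) samples. Assumptions: samples are already restricted to the window and ordered oldest first; the quantile uses linear interpolation in the style of Prometheus, which is assumed (not verified) to match Loki; rate_counter and absent_over_time are left out. Function names are hypothetical.

```python
import math
import statistics


def quantile(phi, values):
    """Linear-interpolation quantile (Prometheus-style; assumed to match Loki)."""
    if not values:
        return math.nan
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    s = sorted(values)
    rank = phi * (len(s) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(s) - 1)
    weight = rank - lower
    return s[lower] * (1 - weight) + s[upper] * weight


def unwrapped_aggregations(samples, range_s):
    """samples: (timestamp, value) pairs inside one window, oldest first."""
    values = [v for _, v in samples]
    return {
        "rate": sum(values) / range_s,                 # per-second rate of the sum
        "sum_over_time": sum(values),
        "avg_over_time": statistics.fmean(values),
        "max_over_time": max(values),
        "min_over_time": min(values),
        "first_over_time": values[0],
        "last_over_time": values[-1],
        "stdvar_over_time": statistics.pvariance(values),  # population variance
        "stddev_over_time": statistics.pstdev(values),     # population std deviation
        "quantile_over_time(0.5)": quantile(0.5, values),
    }


samples = [(1, 2.0), (2, 4.0), (3, 4.0), (4, 4.0), (5, 5.0), (6, 5.0), (7, 7.0), (8, 9.0)]
for name, value in unwrapped_aggregations(samples, 60).items():
    print(name, value)
```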

【Example】

  • Using 1-minute ranges, select the logs that contain metrics.go, parse them with logfmt, sum the values of the extracted label bytes_processed, and finally sum the results by org_id:
sum by (org_id) (
  sum_over_time(
    {cluster="ops-tools1",container="loki-dev"}
      |= "metrics.go"
      | logfmt
      | unwrap bytes_processed [1m]
  )
)
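As a rough model of what this query computes for one window, the Python sketch below filters lines, parses them with a toy logfmt parser, unwraps bytes_processed, and sums by org_id. The log lines and helper names are hypothetical, and the parser (built on shlex.split) only handles simple key=value pairs with optional double quotes, not the full logfmt format.

```python
import shlex
from collections import defaultdict


def parse_logfmt(line):
    """Toy logfmt parser: key=value pairs, values may be double-quoted."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def bytes_processed_by_org(entries, end, range_s=60):
    """Models sum by (org_id)(sum_over_time(... | logfmt | unwrap bytes_processed [1m]))."""
    totals = defaultdict(float)
    for ts, line in entries:
        if not (end - range_s < ts <= end):
            continue
        if "metrics.go" not in line:                 # |= "metrics.go"
            continue
        fields = parse_logfmt(line)                  # | logfmt
        if "bytes_processed" not in fields:          # skipped in this sketch
            continue
        value = float(fields["bytes_processed"])     # | unwrap bytes_processed
        totals[fields.get("org_id", "")] += value    # sum_over_time, then sum by (org_id)
    return dict(totals)


entries = [
    (10, 'level=info caller=metrics.go:81 org_id=29 bytes_processed=1000 msg="query done"'),
    (20, 'level=info caller=metrics.go:81 org_id=29 bytes_processed=500'),
    (30, 'level=info caller=metrics.go:81 org_id=7 bytes_processed=250'),
    (40, 'level=info caller=engine.go:12 org_id=7 bytes_processed=9999'),
]
print(bytes_processed_by_org(entries, 60))
```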

3. Built-in aggregation operators

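Like PromQL, LogQL supports built-in aggregation operators that aggregate the elements of a single vector (for example, the result of a range aggregation) into a new vector with fewer elements: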

  • sum: Calculate sum over labels
  • avg: Calculate the average over labels
  • min: Select minimum over labels
  • max: Select maximum over labels
  • stddev: Calculate the population standard deviation over labels
  • stdvar: Calculate the population standard variance over labels
  • count: Count number of elements in the vector
  • topk: Select largest k elements by sample value
  • bottomk: Select smallest k elements by sample value

Syntax:

<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]

topk and bottomk require the parameter, which is k: for example, passing 10 to topk returns the top 10 elements.

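To group the input vector by labels, use by or without. without removes the listed labels from the result vector while keeping all other labels; by does the opposite, keeping only the listed labels and dropping the rest.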
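The difference between by and without can be sketched in Python with sum over a small vector. Each element is a (labels, value) pair; the label values and function names are hypothetical, and the output keys are sorted label tuples so that series with identical remaining labels are merged.

```python
from collections import defaultdict


def sum_by(vector, labels):
    """Keep only `labels` and sum series that share the same values for them."""
    out = defaultdict(float)
    for lbls, value in vector:
        key = tuple(sorted((k, v) for k, v in lbls.items() if k in labels))
        out[key] += value
    return dict(out)


def sum_without(vector, labels):
    """Drop `labels`, keep every other label, and sum series that become identical."""
    out = defaultdict(float)
    for lbls, value in vector:
        key = tuple(sorted((k, v) for k, v in lbls.items() if k not in labels))
        out[key] += value
    return dict(out)


vector = [
    ({"job": "mysql", "host": "db1", "level": "error"}, 3.0),
    ({"job": "mysql", "host": "db1", "level": "info"}, 5.0),
    ({"job": "mysql", "host": "db2", "level": "error"}, 1.0),
]
print(sum_by(vector, {"host"}))       # one series per host
print(sum_without(vector, {"host"}))  # host removed, job and level kept
```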

【Examples】

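  • Using 5-minute ranges, compute the per-second log rate in region=us-east1, sum it by name, and keep the 10 largest values. In other words, list the 10 applications with the highest log throughput: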
topk(10,sum(rate({region="us-east1"}[5m])) by (name))
  • Count the log lines of the mysql job over the last 5 minutes, grouped by level:
sum(count_over_time({job="mysql"}[5m])) by (level)
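The topk example can be modeled in Python: first sum per-stream rates by name, then keep the k largest values. The per-stream rates below are hypothetical numbers standing in for what rate({region="us-east1"}[5m]) might return at one evaluation time, and k = 2 is used instead of 10 only to keep the data small.

```python
import heapq
from collections import defaultdict


def topk(k, vector):
    """Return the k (key, value) pairs with the largest values, largest first."""
    return heapq.nlargest(k, vector, key=lambda item: item[1])


# Hypothetical per-stream rates in lines/second.
per_stream = [
    ({"name": "api", "pod": "api-1"}, 4.0),
    ({"name": "api", "pod": "api-2"}, 3.0),
    ({"name": "web", "pod": "web-1"}, 5.0),
    ({"name": "worker", "pod": "worker-1"}, 1.0),
]

# sum(...) by (name): merge all streams that share the same name.
by_name = defaultdict(float)
for labels, value in per_stream:
    by_name[labels["name"]] += value

print(topk(2, list(by_name.items())))
```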


Reposted from juejin.im/post/7126800358931693599