An Introduction to Prometheus Data Types

1. Introduction

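Prometheus stores all collected samples as time series in an in-memory database and periodically persists them to disk. Each sample in a time series consists of the following three parts.

Metric: a metric name plus a label set describing the characteristics of the sample, in the format <metric name>{<label name>=<label value>, ...}. The suggested naming pattern for the metric name is: <application name prefix>_<monitored object>_<value type>_<unit>;
Timestamp: a timestamp with millisecond precision;
Value: a float64 number representing the current sample value.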
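
The three parts above can be pictured as a small value type. The following is a minimal sketch for illustration only; the class `Sample` and its method `seriesId` are hypothetical names and are not part of any Prometheus client library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration type; not part of any Prometheus client library.
final class Sample {
    final String metricName;           // e.g. "http_requests_total"
    final Map<String, String> labels;  // label name -> label value
    final long timestampMillis;        // millisecond-precision timestamp
    final double value;                // float64 sample value

    Sample(String metricName, Map<String, String> labels, long timestampMillis, double value) {
        this.metricName = metricName;
        this.labels = new LinkedHashMap<>(labels); // keep insertion order for stable output
        this.timestampMillis = timestampMillis;
        this.value = value;
    }

    // Renders the series identifier as <metric name>{<label name>="<label value>",...}.
    String seriesId() {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : labels.entrySet()) {
            joiner.add(e.getKey() + "=\"" + e.getValue() + "\"");
        }
        return metricName + joiner.toString();
    }
}
```

Two samples belong to the same time series exactly when their metric name and label set are identical; only the timestamp and value change from sample to sample.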

2. The Four Prometheus Data Types

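2.1 Counter
A Counter works like a tally counter: it only increases and never decreases (unless it is reset, for example when the process restarts). Counters are typically used for cumulative values such as the number of requests served, tasks completed, or errors that occurred. In the Go client, for example, a Counter exposes two main methods: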

// Inc increments the counter by 1.
Inc()
// Add adds the given value to the counter. It panics if the value is < 0.
Add(float64)

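When instrumenting an application with custom Prometheus metrics, a Counter can be used as follows (Java client in a Spring interceptor):

import io.prometheus.client.Counter;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Counter requestCounter = Counter.build()
            .name("io_namespace_http_requests_total").labelNames("path", "method", "code") // counter names should end with _total
            .help("Total requests.").register();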

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        String requestURI = request.getRequestURI();
        String method = request.getMethod();
        int status = response.getStatus();

        requestCounter.labels(requestURI, method, String.valueOf(status)).inc(); // call inc() to add 1 for every completed request
        super.afterCompletion(request, response, handler, ex);
    }
}

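Counter data makes it easy to see the rate at which events occur and how that rate changes. PromQL provides built-in functions for this kind of analysis. Taking HTTP request volume as an example: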

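# Use rate() to get the per-second average growth rate of HTTP requests over the last 5 minutes
rate(http_requests_total[5m])
# Query the 10 HTTP endpoints with the highest request counts in the current system
topk(10, http_requests_total)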
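
To make the idea behind rate() more concrete, here is a simplified sketch of a per-second rate computed from raw counter samples, including the counter-reset case mentioned earlier. It is an assumption-laden model, not Prometheus's implementation: the real rate() also extrapolates to the window boundaries, which is omitted here, and the class and method names are hypothetical.

```java
// Hypothetical helper: a simplified model of rate(), without window extrapolation.
final class RateSketch {
    // timestampsSec[i] is the scrape time in seconds of counter sample values[i], ascending.
    static double rate(double[] timestampsSec, double[] values) {
        if (values.length < 2 || timestampsSec.length != values.length) {
            throw new IllegalArgumentException("need at least two samples with matching timestamps");
        }
        double elapsed = timestampsSec[timestampsSec.length - 1] - timestampsSec[0];
        if (elapsed <= 0) {
            throw new IllegalArgumentException("timestamps must be ascending");
        }
        double increase = 0.0;
        for (int i = 1; i < values.length; i++) {
            double diff = values[i] - values[i - 1];
            // A drop means the counter was reset (e.g. a process restart):
            // treat the new value as the increase since the reset.
            increase += diff >= 0 ? diff : values[i];
        }
        return increase / elapsed; // average increase per second
    }
}
```

Because a reset is detected as a decrease between neighbouring samples, the computed rate stays meaningful across restarts, which is one reason counters are queried through rate() rather than read directly.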

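2.2 Gauge
A Gauge is a metric that can go both up and down, and it reflects the current state of an application. When monitoring a host, examples are the current free memory (node_memory_MemFree) and the available memory (node_memory_MemAvailable); for a container, examples are the current CPU usage and memory usage.
A Gauge object mainly provides inc() and dec() to increase or decrease its value; the value can also be assigned directly with set().
When instrumenting an application with custom Prometheus metrics, a Gauge can be used as follows: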

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    // ... code omitted ...
    static final Gauge inprogressRequests = Gauge.build()
            .name("io_namespace_http_inprogress_requests").labelNames("path", "method", "code")
            .help("Inprogress requests.").register();

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // ... code omitted ...
        // Increment the gauge by 1 when a request starts.
        // Note: inc() and dec() must use identical label values to affect the same series.
        inprogressRequests.labels(requestURI, method, String.valueOf(status)).inc();
        return super.preHandle(request, response, handler);
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // ... code omitted ...
        // Decrement the gauge by 1 when the request completes.
        inprogressRequests.labels(requestURI, method, String.valueOf(status)).dec();
        super.afterCompletion(request, response, handler, ex);
    }
}

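For Gauge metrics, the PromQL built-in function delta() returns how much a series changed over a period of time, and predict_linear() extrapolates its trend. For example: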

delta(cpu_temp_celsius{host="zeus"}[2h]) # difference in CPU temperature over the last two hours
predict_linear(node_filesystem_free{job="node"}[1h], 4*3600) # predict the free disk space 4 hours from now
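
The two functions above can be sketched in plain Java to show the underlying arithmetic. This is a simplified illustration with hypothetical names: `delta` here is just last value minus first value, and `predictLinear` fits a least-squares line and evaluates it a given number of seconds after the last sample. Prometheus itself extrapolates delta() to the window boundaries and predicts relative to the query evaluation time, which this sketch does not model.

```java
// Hypothetical helper illustrating the arithmetic behind delta() and predict_linear().
final class GaugeTrendSketch {
    // Simplified delta: difference between the last and the first sample in the window.
    static double delta(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        return values[values.length - 1] - values[0];
    }

    // Fits value = intercept + slope * time by least squares and evaluates the
    // line horizonSec seconds after the last sample (an assumption of this sketch).
    static double predictLinear(double[] timesSec, double[] values, double horizonSec) {
        int n = timesSec.length;
        if (n < 2 || values.length != n) {
            throw new IllegalArgumentException("need at least two samples with matching timestamps");
        }
        double meanT = 0.0, meanV = 0.0;
        for (int i = 0; i < n; i++) {
            meanT += timesSec[i];
            meanV += values[i];
        }
        meanT /= n;
        meanV /= n;
        double covariance = 0.0, variance = 0.0;
        for (int i = 0; i < n; i++) {
            covariance += (timesSec[i] - meanT) * (values[i] - meanV);
            variance += (timesSec[i] - meanT) * (timesSec[i] - meanT);
        }
        double slope = covariance / variance;
        double intercept = meanV - slope * meanT;
        return intercept + slope * (timesSec[n - 1] + horizonSec);
    }
}
```

For instance, a disk that loses 100 units of free space every 30 minutes and currently has 800 units left would be predicted to reach 0 after another 4 hours, which is the kind of early warning predict_linear() is used for.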

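2.3 Histogram
A Histogram consists of <basename>_bucket{le="<upper inclusive bound>"}, <basename>_bucket{le="+Inf"}, <basename>_sum, and <basename>_count. It samples observations over a period of time (typically request durations or response sizes) and counts them per configured bucket as well as in total; the collected data is usually displayed as a histogram.
When instrumenting an application with custom Prometheus metrics, a Histogram can be used as follows.
Take the request latency requests_latency_seconds as an example, and suppose we need to record how many HTTP request latencies fall within the bucket boundaries {0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10}: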

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Histogram requestLatencyHistogram = Histogram.build().labelNames("path", "method", "code")
            .name("io_namespace_http_requests_latency_seconds_histogram").help("Request latency in seconds.")
            .register();

    // Note: an instance field is shared by concurrent requests; it is used here only to keep the example short.
    private Histogram.Timer histogramRequestTimer;

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // ... code omitted ...
        histogramRequestTimer = requestLatencyHistogram.labels(requestURI, method, String.valueOf(status)).startTimer();
        // ... code omitted ...
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // ... code omitted ...
        histogramRequestTimer.observeDuration();
        // ... code omitted ...
    }
}

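When a Histogram is created with the Histogram builder, the default buckets are {0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10}. To change them, override the defaults with .buckets(double... buckets).
A Histogram automatically creates three kinds of series:

The total number of events, basename_count.

# Meaning: 2 HTTP requests have occurred so far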
io_namespace_http_requests_latency_seconds_histogram_count{path="/",method="GET",code="200",} 2.0

The sum of all observed values, basename_sum.

# Meaning: the total response time of the 2 HTTP requests is 13.107670803000001 seconds
io_namespace_http_requests_latency_seconds_histogram_sum{path="/",method="GET",code="200",} 13.107670803000001

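The number of observations falling into each bucket, basename_bucket{le="<upper inclusive bound>"}. Buckets are cumulative: each one counts every observation less than or equal to its upper bound.

# Of the 2 requests in total, 0 had a response time <= 0.005 seconds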
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.005",} 0.0
# Of the 2 requests in total, 0 had a response time <= 0.01 seconds
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.01",} 0.0
# Of the 2 requests in total, 0 had a response time <= 0.025 seconds
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.025",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.05",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.075",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.1",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.25",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.5",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.75",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="1.0",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="2.5",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="5.0",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="7.5",} 2.0
# Of the 2 requests in total, 2 had a response time <= 10 seconds
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="10.0",} 2.0
# Of the 2 requests in total, 2 fall into the +Inf bucket, which always equals the total count
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="+Inf",} 2.0
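
The cumulative behaviour of the buckets in the output above can be reproduced with a small stand-alone sketch. The class `HistogramSketch` is hypothetical and is not the client library's implementation; it only shows how one observation updates every bucket whose upper bound is greater than or equal to the observed value.

```java
// Hypothetical illustration of cumulative histogram buckets; not the client library's code.
final class HistogramSketch {
    private final double[] upperBounds;     // ascending finite bounds, +Inf is implicit
    private final long[] cumulativeCounts;  // one slot per bound plus a final slot for +Inf
    private double sum;

    HistogramSketch(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.cumulativeCounts = new long[upperBounds.length + 1];
    }

    void observe(double value) {
        sum += value;
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                cumulativeCounts[i]++; // le is an inclusive upper bound
            }
        }
        cumulativeCounts[upperBounds.length]++; // the +Inf bucket counts every observation
    }

    // Cumulative count of bucket i; index upperBounds.length is the +Inf bucket.
    long bucket(int i) {
        return cumulativeCounts[i];
    }

    long count() {
        return cumulativeCounts[upperBounds.length];
    }

    double sum() {
        return sum;
    }
}
```

Observing two requests that each took between 5 and 7.5 seconds leaves every bucket up to le="5.0" at 0 and every bucket from le="7.5" upwards at 2, matching the sample output.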

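2.4 Summary
A Summary is similar to a Histogram. It consists of <basename>{quantile="<φ>"}, <basename>_sum, and <basename>_count, and it represents sampled observations over a period of time (typically request durations or response sizes). It stores quantile values directly instead of deriving them from bucket counts. Compared with a Histogram:

Both include <basename>_sum and <basename>_count;
A Histogram needs its quantiles computed from <basename>_bucket, whereas a Summary stores the quantile values directly.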
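
To illustrate what computing a quantile from <basename>_bucket involves, the sketch below estimates a quantile by linear interpolation inside the bucket that contains the target rank. It follows the general approach of PromQL's histogram_quantile(), but it is a simplified illustration with hypothetical names, not the actual implementation.

```java
// Hypothetical helper: estimate a quantile from cumulative histogram buckets.
final class HistogramQuantileSketch {
    // upperBounds is ascending and must end with Double.POSITIVE_INFINITY;
    // cumulativeCounts[i] is the number of observations <= upperBounds[i].
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || cumulativeCounts.length != n || q < 0 || q > 1) {
            throw new IllegalArgumentException("invalid buckets or quantile");
        }
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations
        }
        double rank = q * total;
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++; // find the first bucket whose cumulative count reaches the rank
        }
        if (b == n - 1) {
            return upperBounds[n - 2]; // rank falls into +Inf: report the highest finite bound
        }
        double lower = (b == 0) ? 0.0 : upperBounds[b - 1];
        double previous = (b == 0) ? 0.0 : cumulativeCounts[b - 1];
        double inBucket = cumulativeCounts[b] - previous;
        if (inBucket == 0) {
            return lower; // degenerate case, e.g. q = 0 with an empty first bucket
        }
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[b] - lower) * (rank - previous) / inBucket;
    }
}
```

With the bucket counts from section 2.3 (0 observations up to 5 seconds, 2 up to 7.5 seconds), the estimated median is 6.25 seconds. The result is an estimate whose accuracy depends on how finely the buckets are chosen, which is the trade-off against a Summary's directly stored quantiles.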
When instrumenting an application with custom Prometheus metrics, a Summary can be used as follows:

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Summary requestLatency = Summary.build()
            .name("io_namespace_http_requests_latency_seconds_summary")
            .quantile(0.5, 0.05)
            .quantile(0.9, 0.01)
            .labelNames("path", "method", "code")
            .help("Request latency in seconds.").register();

    // Timer for the in-flight request; observeDuration() records its elapsed time.
    private Summary.Timer requestTimer;

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // ... code omitted ...
        requestTimer = requestLatency.labels(requestURI, method, String.valueOf(status)).startTimer();
        // ... code omitted ...
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // ... code omitted ...
        requestTimer.observeDuration();
        // ... code omitted ...
    }
}

A Summary metric exposes the following data:

The total number of events

# Meaning: 12 HTTP requests have occurred so far
io_namespace_http_requests_latency_seconds_summary_count{path="/",method="GET",code="200",} 12.0

The sum of all observed values

# Meaning: the total response time of these 12 HTTP requests is 51.029495508s
io_namespace_http_requests_latency_seconds_summary_sum{path="/",method="GET",code="200",} 51.029495508

The distribution of the observed values

# Meaning: the median (0.5 quantile) response time of these 12 HTTP requests is 3.052404983s
io_namespace_http_requests_latency_seconds_summary{path="/",method="GET",code="200",quantile="0.5",} 3.052404983
# Meaning: the 90th percentile (0.9 quantile) response time of these 12 HTTP requests is 8.003261666s
io_namespace_http_requests_latency_seconds_summary{path="/",method="GET",code="200",quantile="0.9",} 8.003261666
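
To clarify what the quantile values and the sum/count pair express, here is a small sketch that computes an exact nearest-rank quantile and a mean from raw observations. It is only an illustration with hypothetical names: the Java client does not keep all observations but maintains a streaming approximation, where the second argument of .quantile(0.5, 0.05) is the tolerated error.

```java
import java.util.Arrays;

// Hypothetical helper showing what a φ-quantile and a mean latency mean.
final class SummarySketch {
    // Exact nearest-rank quantile: the smallest observation such that at least
    // a fraction phi of all observations is less than or equal to it.
    static double quantile(double phi, double[] observations) {
        if (observations.length == 0 || phi < 0 || phi > 1) {
            throw new IllegalArgumentException("invalid input");
        }
        double[] sorted = observations.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(phi * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    // Mean observation derived from the _sum and _count series.
    static double mean(double sum, double count) {
        return sum / count;
    }
}
```

Applied to the sample output, the mean latency is 51.029495508 / 12, roughly 4.25 seconds, which is noticeably higher than the 3.05 second median and suggests a few slow requests pulling the average up.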
