Prometheus in practice

Common scenarios

Growth rate

increase(range vector of a counter)

Calculates the increase of a time series over the range vector
Algorithm: last sample - first sample
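A minimal Python sketch of this idea: the increase over a window is the last sample minus the first. It assumes samples are (timestamp, value) pairs and ignores counter resets and extrapolation, both of which the real increase() takes into account. The name naive_increase is hypothetical.

```python
def naive_increase(samples):
    """Increase over a window: last sample value minus first sample value.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Hypothetical helper; ignores counter resets and extrapolation.
    """
    if len(samples) < 2:
        return None  # a single sample says nothing about growth
    return samples[-1][1] - samples[0][1]


# Assumed example data: a counter scraped every 15 seconds.
window = [(0, 100), (15, 130), (30, 160), (45, 190)]
print(naive_increase(window))
```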

rate(range vector of a counter)

Calculates the average per-second growth rate of the time series in the range vector
Algorithm: (last sample - first sample) / time (seconds)
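As a rough sketch, the average rate is the difference between the last and first samples divided by the elapsed seconds. The Python below assumes (timestamp, value) pairs and divides by the time between the two samples; it skips the reset handling and extrapolation that the real rate() applies. naive_rate is a hypothetical name.

```python
def naive_rate(samples):
    """Average per-second growth between the first and last sample.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Hypothetical helper; no reset handling, no extrapolation.
    """
    if len(samples) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    elapsed = t_last - t_first
    if elapsed <= 0:
        return None  # cannot divide by a zero or negative duration
    return (v_last - v_first) / elapsed


# Assumed example: the counter grew by 600 over 60 seconds.
print(naive_rate([(0, 100), (60, 700)]))
```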

If the source data is coarse-grained, calculating a per-second rate produces many empty values, which show up as gaps and broken lines in graphs.
rate() is better suited to fine-grained source data than increase()

Long-tail effect: because rate() uses an average, it tends to flatten peaks and does not reflect sudden surges in traffic.
It is not very sensitive, which makes it suitable for showing long-term trends and for alerting rules

  • Usage Recommendations
    It is recommended that the range of the range vector used for the rate calculation be at least four times the scrape interval. This ensures that even if scrapes are slow and one scrape fails, you will always have two samples available. Such problems arise frequently in practice, so it is important to maintain this resilience. For example, for a scrape interval of 1 minute, you can use a 4-minute rate calculation, but this is usually rounded up to 5 minutes.
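The arithmetic behind this recommendation can be sketched as follows. This is only an illustration that assumes perfectly evenly spaced scrapes; both function names are hypothetical.

```python
def min_rate_window(scrape_interval_s, factor=4):
    """Smallest recommended range for rate(): factor times the scrape interval."""
    return factor * scrape_interval_s


def worst_case_samples(window_s, scrape_interval_s, failed_scrapes=0):
    """Samples left in the window in the worst case.

    Assumes evenly spaced scrapes; real scrapes also jitter, which is
    why some extra headroom is useful.
    """
    return window_s // scrape_interval_s - failed_scrapes


interval = 60  # 1-minute scrape interval, as in the example above
window = min_rate_window(interval)
print(window)
print(worst_case_samples(window, interval, failed_scrapes=1))
# A window of only 2x the interval leaves a single sample after one failure,
# which is not enough to compute a rate.
print(worst_case_samples(2 * interval, interval, failed_scrapes=1))
```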

irate(range vector)

Calculates the instantaneous per-second growth rate of the time series in the range vector
Algorithm: take the last two samples and divide their difference by the time between them
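A minimal sketch of the same idea in Python, assuming (timestamp, value) pairs and no counter reset between the last two samples. naive_irate is a hypothetical name.

```python
def naive_irate(samples):
    """Per-second rate computed from the last two samples only.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Hypothetical helper; assumes no counter reset between the two samples.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    elapsed = t_last - t_prev
    if elapsed <= 0:
        return None
    return (v_last - v_prev) / elapsed


# Earlier samples in the window are ignored; only the last two matter.
print(naive_irate([(0, 0), (15, 15), (30, 330)]))
```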

Looks back up to 5 minutes:
sensitive to change, so it suits short-term views

Prediction

predict_linear(): predicts future growth using linear regression over the range vector
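To illustrate what a linear prediction does, here is a small least-squares sketch in Python. It is not the Prometheus implementation; linear_prediction and its parameters are hypothetical names, and the evaluation time is passed in explicitly.

```python
def linear_prediction(samples, eval_time, seconds_ahead):
    """Fit a straight line through the samples and read it off in the future.

    samples: list of (timestamp_seconds, value) pairs.
    Returns the fitted value at eval_time + seconds_ahead.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, no slope to fit
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + seconds_ahead)


# Assumed data: a value growing by 1 per second, evaluated at t=20.
print(linear_prediction([(0, 0), (10, 10), (20, 20)], eval_time=20, seconds_ahead=10))
```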

Common problems

Long-tail problem

  1. An average does not show instantaneous changes

  2. To reflect instantaneous changes, the sampling frequency must be high enough
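The flattening effect can be seen with a few made-up samples. In this sketch the counter grows steadily and then bursts in the last interval; the average over the whole window hides most of the burst, while the rate over the last two samples shows it. The data and function names are assumptions for illustration.

```python
# Counter sampled every 15 s: steady growth of 1/s, then a burst at the end.
samples = [(0, 0), (15, 15), (30, 30), (45, 45), (60, 660)]


def average_rate(samples):
    """Average per-second rate between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def instant_rate(samples):
    """Per-second rate between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


print(average_rate(samples))  # burst is averaged over the whole window
print(instant_rate(samples))  # burst is visible in the last interval
```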

Counter reset (jump) problem

When a counter is reset to 0, a naive calculation would interpret the drop as a negative rate

The jump is handled at calculation time, not when the data is stored: any drop in the counter value is compensated for automatically

  • rate() handles counter resets automatically.
    A counter normally only increases, but it restarts from 0 when, for example, an exporter starts and later crashes.
    If a counter was incrementing at about 10 per second but the process only ran for half an hour, rate(x_total[1h]) returns about 5 per second.
    In addition, any decrease in a counter is treated as a counter reset. For example, a time series with the values [5,10,4,6] is treated as [5,10,14,16].
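The reset handling described above can be sketched in a few lines of Python: every time a value is lower than the previous one, the previous value is added to a running offset. correct_resets is a hypothetical name, and this is only an illustration of the idea, not the Prometheus code.

```python
def correct_resets(values):
    """Turn raw counter values into a monotonically non-decreasing series.

    Any drop is treated as a reset to zero, so the value seen just
    before the drop is carried forward as an offset.
    """
    corrected = []
    offset = 0
    previous = None
    for value in values:
        if previous is not None and value < previous:
            offset += previous  # counter restarted; keep what was counted so far
        corrected.append(value + offset)
        previous = value
    return corrected


print(correct_resets([5, 10, 4, 6]))
```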

Data extrapolation (fitting)

  • Problem:
    Even for counters that only increase by integers, increase() may return non-integer results, such as 2.5883

  • Reason:
    This follows from how the rate() and increase() functions calculate their results

  • Data Extrapolation
    The first and last samples in a time window never coincide exactly with the start and end of the specified time window.
    increase() and rate() therefore extrapolate the slope between the first and last data points out to the window boundaries,
    yielding a value that is, on average, closer to the expected growth over the entire window (as if there were samples exactly at the window boundaries).
    In other words, a virtual value is inferred at each boundary and used in the calculation.

  • Source code of the extrapolation: extrapolatedRate
    https://github.com/prometheus/prometheus/blob/main/promql/functions.go#L66
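To see how extrapolation leads to non-integer results, here is a simplified Python sketch loosely modeled on the idea behind extrapolatedRate. It is not a port of the Go code and the real function may differ in details. It assumes no counter resets inside the window, and the 1.1 threshold and the half-interval fallback are written from memory, so treat them as assumptions. The function name is hypothetical.

```python
def extrapolated_increase(samples, range_start, range_end):
    """Increase between first and last sample, stretched to the window bounds.

    samples: list of (timestamp_seconds, value) pairs inside the window,
    oldest first. Assumes no counter resets.
    """
    if len(samples) < 2:
        return None
    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]
    sampled_interval = last_t - first_t
    if sampled_interval <= 0:
        return None
    raw_increase = last_v - first_v
    avg_gap = sampled_interval / (len(samples) - 1)

    gap_to_start = first_t - range_start
    gap_to_end = range_end - last_t

    # A counter cannot be negative, so do not extrapolate backward past
    # the point where the fitted line would reach zero.
    if raw_increase > 0 and first_v >= 0:
        gap_to_zero = sampled_interval * (first_v / raw_increase)
        gap_to_start = min(gap_to_start, gap_to_zero)

    # Extrapolate all the way to a boundary only if it is close to the
    # nearest sample; otherwise extend by half an average sample gap.
    threshold = avg_gap * 1.1
    extrapolate_to = sampled_interval
    extrapolate_to += gap_to_start if gap_to_start < threshold else avg_gap / 2
    extrapolate_to += gap_to_end if gap_to_end < threshold else avg_gap / 2

    return raw_increase * (extrapolate_to / sampled_interval)


# Integer counter, 15 s scrape interval, 60 s window from t=0 to t=60.
samples = [(5, 1000), (20, 1001), (35, 1003), (50, 1004)]
print(extrapolated_increase(samples, range_start=0, range_end=60))
```

The raw difference here is 4, but the samples only cover 45 of the 60 seconds, so the result is scaled by 60/45 and is no longer an integer.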

irate() only looks at the per-second increase between two samples, so it doesn't do this extrapolation.

Origin blog.csdn.net/xyc1211/article/details/129799569