One Introduction to functions
Gauge data is a value that can go up or down arbitrarily; unlike a Counter, it does not increase continuously.
1 increase()
The increase() function is used in Prometheus to take the increase of a Counter value over a period of time. increase(node_cpu[1m]) => this gives the increment of CPU time within 1 minute, one value for each CPU series. increase() and rate() are very similar:
rate() over [1m] is the average increment per second during that minute,
increase() over [1m] is the total increment during that minute.
Example:
increase(node_network_receive_bytes[1m]) returns the total increment over 1 minute
rate(node_network_receive_bytes[1m]) returns the increment over 1 minute divided by 60 seconds, i.e. the per-second amount
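To make the relationship between the two concrete, here is a minimal Python sketch of the arithmetic. It is a simplification and not how Prometheus implements these functions: the real ones also extrapolate to the edges of the time window. The helper names (simple_increase, simple_rate) and the sample values are made up for illustration.

```python
def simple_increase(samples):
    """Total counter increase across (timestamp, value) samples.

    A drop in value is treated as a counter reset (restart from zero).
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter only goes up; if it went down, it restarted from zero.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples, window_seconds):
    """Average per-second increase over the window."""
    return simple_increase(samples) / window_seconds


# Hypothetical received-bytes counter, scraped every 15 seconds for 1 minute.
samples = [(0, 1000.0), (15, 1600.0), (30, 2200.0), (45, 2800.0), (60, 4000.0)]

inc = simple_increase(samples)      # total increment over the minute
per_sec = simple_rate(samples, 60)  # same increment divided by 60 seconds
```

With these numbers the total increment is 3000 bytes, so the per-second value is 3000 / 60 = 50 bytes.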
2 sum()
As the name suggests, sum() adds values together. sum(increase(node_cpu[1m])) wraps a sum around the expression to add up the values of all cores, giving the increment for all CPUs over one minute.
Usage: sum(rate(node_network_receive_bytes[1m]))
3 by (instance)
This clause takes the values that sum() adds together and splits them one level by the specified label. instance represents the machine name.
For example:
sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance)
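The difference between a plain sum() and sum() ... by (instance) can be sketched in Python as a group-by. This is only an illustration of the idea; the sum_by helper, the machine names and the values are hypothetical.

```python
from collections import defaultdict


def sum_by(series, label):
    """Add up series values, grouped by the value of one label."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels[label]] += value
    return dict(totals)


# Hypothetical 1-minute idle increases, one series per CPU core.
idle_increase = [
    ({"instance": "web01", "cpu": "0"}, 50.0),
    ({"instance": "web01", "cpu": "1"}, 40.0),
    ({"instance": "db01", "cpu": "0"}, 30.0),
]

per_instance = sum_by(idle_increase, "instance")     # like sum(...) by (instance)
overall = sum(value for _, value in idle_increase)   # like a plain sum(...)
```

Without the by clause everything collapses into one number (120 here); with it, each machine keeps its own total (90 for web01, 30 for db01).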
4 Use the above functions to view CPU usage
idle represents the CPU idle time
(1-((sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance))/(sum(increase(node_cpu_seconds_total[1m])) by (instance)))) * 100
sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) => the 1-minute increment of idle CPU time
sum(increase(node_cpu_seconds_total[1m])) by (instance) => the 1-minute increment of total CPU time
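The formula is (1 - idle / total) * 100, evaluated once per machine. A small Python sketch of that arithmetic follows; the function name and the numbers are invented for illustration, and the inputs stand for the two per-instance sums described above.

```python
def cpu_usage_percent(idle_by_instance, total_by_instance):
    """Per-instance CPU usage: (1 - idle / total) * 100."""
    usage = {}
    # Only instances present on both sides are kept, similar to how a
    # binary operation between two vectors pairs up matching labels.
    for instance in idle_by_instance.keys() & total_by_instance.keys():
        total = total_by_instance[instance]
        if total == 0:
            continue  # this sketch skips empty windows instead of dividing by zero
        usage[instance] = (1 - idle_by_instance[instance] / total) * 100
    return usage


# Hypothetical 1-minute increases on two 4-core machines (4 * 60 = 240 CPU-seconds).
idle = {"web01": 180.0, "db01": 60.0}
total = {"web01": 240.0, "db01": 240.0}

usage = cpu_usage_percent(idle, total)
```

Here web01 was idle for 180 of 240 CPU-seconds, so its usage is 25%; db01 was idle for only 60, so its usage is 75%.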
5 Use of the rate() function
The rate() function is designed specifically for Counter data. Given a time period, it takes the Counter's increment over that period and returns the average increment per second.
Example:
rate(node_network_receive_bytes[1m])
This gives the average increment per second over 1 minute.
So whenever we work with any Counter data in the future, the first thing to remember is to wrap it in rate() or increase().
6 topk()
Definition: take the top few highest values
Usage:
With a Gauge: topk(3, count_netstat_wait_connections)
With a Counter: topk(3, rate(node_network_receive_bytes[20m]))
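What topk(3, ...) does to a set of series can be sketched in Python as "keep the 3 largest values". The topk helper below is a stand-in written for this note, not part of any Prometheus client library, and the machine names and values are hypothetical.

```python
import heapq


def topk(k, series):
    """Return the k (name, value) pairs with the largest values, largest first."""
    return heapq.nlargest(k, series.items(), key=lambda item: item[1])


# Hypothetical current values of count_netstat_wait_connections per machine.
wait_connections = {"web01": 120, "web02": 340, "log": 15, "db01": 260}

top3 = topk(3, wait_connections)
```

For a Counter the input would be the per-second rates rather than the raw counter values, which is why the Counter usage above wraps the metric in rate() first.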
7 count()
Definition: count the number of output series whose values meet the condition.
For example, find the current (or historical) number of machines whose TCP wait count is greater than 200:
count(count_netstat_wait_connections > 200)
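The same filter-then-count idea, as a small Python sketch with made-up values (count_above is a hypothetical helper name):

```python
def count_above(series, threshold):
    """Count how many series have a value strictly greater than the threshold."""
    return sum(1 for value in series.values() if value > threshold)


# Hypothetical current values of count_netstat_wait_connections per machine.
wait_connections = {"web01": 120, "web02": 340, "log": 15, "db01": 260}

busy_machines = count_above(wait_connections, 200)
```

One difference to keep in mind: when no series passes the filter, this sketch returns 0, while the Prometheus query returns an empty result.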
Two Prometheus query format
1 Exact match
A query starts from the metric name as entered, then uses {} to filter further:
count_netstat_wait_connections{exported_instance="log"}
exported_instance identifies the monitored server; "log" is the machine name of a log server
2 Fuzzy matching
count_netstat_wait_connections{exported_instance=~"web.*"}
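Displays all machines whose name starts with web (Prometheus regex matches are anchored to the whole label value, so a name containing web somewhere in the middle needs ".*web.*")
.* is a regular expression
Fuzzy match: =~
Fuzzy non-match: !~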
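Because the regular expression has to match the entire label value, the behaviour of =~ and !~ is close to Python's re.fullmatch. The sketch below only illustrates that anchoring; Prometheus uses RE2 syntax, which differs from Python's re module in some details, and the helper names and machine names are invented.

```python
import re


def label_matches(value, pattern):
    """Mimic =~ : the pattern must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


def label_not_matches(value, pattern):
    """Mimic !~ : true when the whole-value match fails."""
    return not label_matches(value, pattern)


names = ["web01", "web-api", "myweb", "log"]

matched = [n for n in names if label_matches(n, "web.*")]        # like =~"web.*"
unmatched = [n for n in names if label_not_matches(n, "web.*")]  # like !~"web.*"
```

Note that "myweb" ends up on the non-matching side: it contains web, but does not start with it.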
3 Numerical filtering
After label filtering comes numerical filtering. For example, to find only the series where wait_connections is greater than 200:
count_netstat_wait_connections{exported_instance=~"web.*"} > 200