Meanings of commonly used Prometheus node-exporter metrics

I. Description

I recently set up Prometheus to monitor new CentOS 6 and CentOS 7 hosts and found that the two systems do not expose the same memory metrics, so the monitoring system was calculating memory usage with different formulas. After analyzing both, I unified the calculation. Memory usage (percent) is now calculated as follows:

100 - (node_memory_MemFree_bytes + node_memory_Cached_bytes + node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100
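
To make the arithmetic of the memory usage calculation above concrete, here is a minimal Python sketch. The function name and the sample byte counts are made up for illustration and do not come from a real host.

```python
def memory_usage_percent(mem_total, mem_free, cached, buffers):
    """Return used memory as a percentage of total memory.

    Treats free, cached and buffer memory as not in use, i.e.
    100 - (MemFree + Cached + Buffers) / MemTotal * 100.
    All arguments are byte counts.
    """
    if mem_total <= 0:
        raise ValueError("mem_total must be positive")
    not_in_use = mem_free + cached + buffers
    return 100 - not_in_use / mem_total * 100


# Hypothetical host: 8 GiB total, 1 GiB free, 2 GiB cached, 1 GiB buffers.
GIB = 1024 ** 3
print(memory_usage_percent(8 * GIB, 1 * GIB, 2 * GIB, 1 * GIB))
```

With these sample values half of the memory counts as free, cached or buffered, so the function reports 50 percent used.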

 

II. Meanings of commonly used node-exporter metrics (reference documentation)

https://www.gitbook.com/book/songjiayang/prometheus/details (Prometheus in Practice)

https://github.com/1046102779/prometheus (unofficial Chinese Prometheus manual)

http://www.bubuko.com/infodetail-2004088.html (monitoring a k8s cluster with Prometheus)

http://www.cnblogs.com/sfnz/p/6566951.html (installing Prometheus + Grafana to monitor MySQL, Redis, Kubernetes, etc.; non-Docker installation)

https://github.com/kayrus/prometheus-kubernetes (Prometheus-kubernetes) 

https://github.com/prometheus/node_exporter (prometheus/node_exporter)

http://dockone.io/article/2579 (Prometheus monitoring practices on Kubernetes)

https://github.com/prometheus/prometheus/releases (prometheus download list)

https://github.com/prometheus/node_exporter/releases/ (node_exporter download list)

 

 

 

Prerequisite concepts:

1. Time series: a sequence formed by arranging the values of the same statistical metric in the chronological order in which they occurred

2. Expressions (label matchers)

=: selects labels that are exactly equal to the given string

!=: selects labels that are not equal to the given string

=~: selects labels that match the given regular expression

!~: selects labels that do not match the given regular expression
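
The four matchers can be illustrated with a small Python sketch. The helper name `matches` is hypothetical. Prometheus anchors regular expressions at both ends, which `re.fullmatch` imitates here; Python's `re` syntax is not identical to the RE2 syntax Prometheus uses, so treat this as an approximation.

```python
import re


def matches(label_value, op, pattern):
    """Apply one PromQL-style label matcher to a single label value."""
    if op == "=":
        return label_value == pattern
    if op == "!=":
        return label_value != pattern
    if op == "=~":
        # Regex matchers must match the whole value, not a substring.
        return re.fullmatch(pattern, label_value) is not None
    if op == "!~":
        return re.fullmatch(pattern, label_value) is None
    raise ValueError(f"unknown matcher operator: {op}")


print(matches("ext4", "=~", "xfs|ext4"))  # whole value matches one alternative
print(matches("ext4", "=~", "ext"))       # a substring match is not enough
```

Because of the anchoring, a pattern such as `ext` does not select `ext4`; you would write `ext.*` instead.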

3. Time definitions

s:seconds

m:minutes

h:hours

d:days

w:weeks

y:years

Note: [5m] refers to the last 5 minutes
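
As a rough sketch of what these units mean, the Python helper below converts a single-unit duration such as `5m` into seconds. The function name is made up; it assumes fixed-length units (a day is 24 hours, a week 7 days, a year 365 days) and does not handle compound durations such as `1h30m`.

```python
import re

# Seconds per unit, assuming fixed-length days, weeks and years.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def duration_to_seconds(text):
    """Convert a duration like '5m' or '1w' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


print(duration_to_seconds("5m"))
```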

4. Operators

bool

and

or

unless

on

without: without (label) removes the labels listed in parentheses from the result and keeps all other labels

by: by (label) keeps only the labels listed in parentheses in the result
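
The difference between by and without is easiest to see on a toy data set. The following Python sketch groups hypothetical series (a label dictionary plus a value) the way a sum aggregation would; the function names and sample data are assumptions made for the example.

```python
from collections import defaultdict


def sum_by(series, labels):
    """Sum values, grouping only on the listed label names."""
    groups = defaultdict(float)
    for series_labels, value in series:
        key = tuple((name, series_labels.get(name, "")) for name in sorted(labels))
        groups[key] += value
    return dict(groups)


def sum_without(series, labels):
    """Sum values, grouping on every label except the listed ones."""
    groups = defaultdict(float)
    for series_labels, value in series:
        key = tuple(sorted((k, v) for k, v in series_labels.items() if k not in labels))
        groups[key] += value
    return dict(groups)


# Hypothetical per-CPU values for two instances.
samples = [
    ({"instance": "a", "cpu": "0"}, 1.0),
    ({"instance": "a", "cpu": "1"}, 2.0),
    ({"instance": "b", "cpu": "0"}, 4.0),
]
print(sum_by(samples, ["instance"]))
print(sum_without(samples, ["cpu"]))
```

For this data both calls produce the same groups, because `instance` is the only label left once `cpu` is removed.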

 

1. CPU idle rate

sum(irate(node_cpu{mode="idle", instance="134node"}[1m])) * 100 / count_scalar(node_cpu{mode="user", instance="134node"})

Comment:

## instance: a label; its value depends on your actual configuration, and it can also be matched with a regular expression

## mode: the CPU mode. node-exporter already collects it; you can see the values on the metrics page at <IP where node-exporter is deployed>:9100

                    For example: http://172.17.123.134:9100/metrics

## sum() function: sums the values of the expression in parentheses

## irate() function: calculates the per-second instant rate of increase of the time series in the range vector

## count_scalar() function: returns the number of elements in a time series vector as a scalar (available in Prometheus 1.x; it was removed in Prometheus 2.0)
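
To show what an instant rate means in practice, here is a simplified Python sketch. It computes the rate from the last two samples of a counter and, for comparison, a plain average over the whole window. Both functions are illustrations, not the Prometheus implementation: the real rate() also extrapolates to the window boundaries and handles counter resets. The sample data is made up.

```python
def instant_rate(samples):
    """Per-second rate from the last two (timestamp, value) samples."""
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    increase = v_last - v_prev
    if increase < 0:
        # The counter went backwards, so assume it was reset to zero.
        increase = v_last
    return increase / (t_last - t_prev)


def average_rate(samples):
    """Per-second increase between the first and last sample (no reset handling)."""
    if len(samples) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last <= t_first:
        return None
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical idle-seconds counter scraped every 15 seconds.
window = [(0, 0.0), (15, 30.0), (30, 60.0), (45, 75.0)]
print(instant_rate(window), average_rate(window))
```

The instant rate reacts only to the most recent interval, which is why it is better suited to fast-moving graphs than to alerting.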

 

2. CPU load ratio

node_load1{instance="134node"} / count by (job, instance) (count by (job, instance, cpu) (node_cpu{instance="134node"}))

Comment:

## node_load1: the load average over the last 1 minute; likewise, node_load5 is the load average over the last 5 minutes and node_load15 the load average over the last 15 minutes

## count: an aggregation that counts the number of elements in the vector

## further notes to be added
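
The nested count in the query above first collapses the per-mode series of each CPU into one entry and then counts the CPUs. A small Python sketch of the same idea follows; the function name and the label dictionaries are invented for the example.

```python
def load_per_core(load1, cpu_series):
    """Divide the 1-minute load by the number of distinct CPUs.

    cpu_series is a list of label dictionaries, one per node_cpu series
    (one series per CPU and mode).
    """
    # Inner count by (job, instance, cpu): one entry per CPU, whatever the mode.
    cpus = {(s["job"], s["instance"], s["cpu"]) for s in cpu_series}
    if not cpus:
        raise ValueError("no CPU series given")
    # Outer count by (job, instance): number of CPUs (a single instance is assumed).
    return load1 / len(cpus)


# Hypothetical host with two CPUs and two modes each.
series = [
    {"job": "node", "instance": "134node", "cpu": "cpu0", "mode": "idle"},
    {"job": "node", "instance": "134node", "cpu": "cpu0", "mode": "user"},
    {"job": "node", "instance": "134node", "cpu": "cpu1", "mode": "idle"},
    {"job": "node", "instance": "134node", "cpu": "cpu1", "mode": "user"},
]
print(load_per_core(1.0, series))
```

A result near or above 1 means the run queue is about as long as the number of cores.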

 

3. Available memory

node_memory_MemAvailable{instance="88node"}

Comment:

## node_memory_MemAvailable: Memory information field MemAvailable. node-exporter already collects it, so the query only has to display it.

    Note: the metrics collected differ between systems. This metric cannot be collected on CentOS 6.x, but it can on CentOS 7.

 

4. Free file system space

sum(node_filesystem_free{fstype="xfs", instance="88node"})

sum(node_filesystem_free{fstype="ext4", instance="134node"})

## node_filesystem_free: Filesystem free space in bytes

## fstype can be one of the following:

## aufs: a union file system, which joins two originally separate file systems together

## cgroup: Cgroups (control groups) is a Linux kernel feature used to limit, account for, and isolate the resources (CPU, memory, disk I/O) of a process group.

## tmpfs: a file system that lives in virtual memory rather than on a block device.

## overlay: an overlay file system consists of two file systems, an upper one and a lower one; it is a newer union file system

## proc, xfs, mqueue, and so on.

 

5. Swap: the disk swap area; data is swapped from disk to memory or from memory to disk (virtual memory swapping)

Swap free:

node_memory_SwapFree{instance="134node"}

## node_memory_SwapTotal: Memory information field SwapTotal.

## swap: disk space that can be used like memory; this area is generally called swap

 

Swap usage:

node_memory_SwapTotal{instance="134node"} - node_memory_SwapFree{instance="134node"}

## node_memory_SwapFree: Memory information field SwapFree

 

Swap I/O (in):

rate(node_vmstat_pswpin{instance="88node"}[1m]) * 4096 or irate(node_vmstat_pswpin{instance="88node"}[5m]) * 4096

 

Swap I/O (out):

rate(node_vmstat_pswpout{instance="88node"}[1m]) * 4096 or irate(node_vmstat_pswpout{instance="88node"}[5m]) * 4096

 

## vmstat: the vmstat command is one of the most common Linux/Unix monitoring tools. It shows the state of the server at a given interval, including CPU usage, memory usage, virtual memory swapping, and I/O reads and writes.

## pswpin/s: the number of pages swapped in per second from the disk swap area into memory.

## pswpout/s: the number of pages swapped out per second from memory to the disk swap area.

## pswpin/s and pswpout/s describe swap activity against the disk swap area. Swapping affects the efficiency of the system. Reading and writing the swap area on disk is much slower than reading and writing memory, so to improve efficiency you should keep swapping to a minimum.

     The usual approach is to add memory so that swap activity is zero or close to zero. If swpot/s is greater than 1, you may need to add memory or reduce buffers (reducing buffers frees part of the memory).
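
The factor 4096 in the swap I/O queries appears to convert pages per second into bytes per second, which assumes a page size of 4 KiB. That is an assumption about the host rather than something the queries verify. The Python sketch below shows the conversion with a hypothetical helper and prints the page size of the machine it runs on for comparison.

```python
import mmap


def swap_bytes_per_second(pages_per_second, page_size=4096):
    """Convert a swap rate in pages per second into bytes per second."""
    return pages_per_second * page_size


# 25 pages per second with the assumed 4 KiB page size.
print(swap_bytes_per_second(25))
# Page size of the local machine, to check whether 4096 applies here.
print(mmap.PAGESIZE)
```

If the host uses a different page size, pass it as `page_size` and adjust the multiplier in the queries accordingly.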

 

Swap free rate (percent)

(node_memory_SwapFree{instance=~"$server"} / node_memory_SwapTotal{instance=~"$server"}) * 100

 

6. CPU usage

avg without (cpu) (irate(node_cpu{instance="88node", mode!="idle"}[5m]))

## avg: calculates the average

 

7. Network usage

Upload speed:

irate(node_network_transmit_bytes{device!="lo", instance="88node"}[1m])

Download speed:

irate(node_network_receive_bytes{device!="lo", instance="88node"}[1m])

## eth0: short for ethernet; generally the Ethernet interface.

## wifi0: Wi-Fi is a wireless local area network, so wifi0 generally refers to a wireless network interface.

## ath0: short for Atheros; generally a wireless network interface with an Atheros chip.

## tunl0: a tunnel interface, which encapsulates the data it carries

## lo: short for local; generally the local loopback interface.

 

8. Memory usage

Used memory (total memory - free memory - cache = used memory):

      node_memory_MemTotal{instance="88node"} -  

      node_memory_MemFree{instance="88node"} - 

      node_memory_Cached{instance="88node"} - 

      node_memory_Buffers{instance="88node"} - 

      node_memory_Slab{instance="88node"}

 

Buffer cache:

     node_memory_Buffers{instance="88node"}

Cache:

     node_memory_Cached{instance="88node"}

     + node_memory_Slab{instance="88node"}

Free memory:

     node_memory_MemFree{instance="88node"}

 

Percentage of available memory:

(node_memory_MemAvailable{instance="88node"} / node_memory_MemTotal{instance="88node"}) * 100

 

## total: total physical memory size.

## Free: the amount of free memory.

## Shared: the total memory shared by multiple processes.

## Buffers: the amount of memory used as buffer cache; generally only reads and writes to block devices need buffering

## Cached: the amount of memory used as page cache; generally the file system caches frequently accessed files here. A large Cached value means that many files are cached. If bi (blocks read in, as reported by vmstat) is relatively small at the same time, the file system is working efficiently

## Slab: the slab allocator provides dynamic memory management and also serves as a cache for memory that is frequently allocated and released

## MemAvailable: Free + Buffers + Cached - the unreclaimable part. The unreclaimable part includes shared memory segments, tmpfs, ramfs, etc.

 

9. Disk reads and writes (IOPS)

Disk reads per second (within 5 minutes)

sum by (instance) (irate(node_disk_reads_completed{instance="88node"}[5m]))

## node_disk_reads_completed: The total number of reads completed successfully

Disk writes per second (within 5 minutes)

sum by (instance) (irate(node_disk_writes_completed{instance="88node"}[5m]))

## node_disk_writes_completed: The total number of writes completed successfully.

Milliseconds spent doing I/O (within 5 minutes)

sum by (instance) (irate(node_disk_io_time_ms{instance="88node"}[5m]))

## node_disk_io_time_ms: Total Milliseconds spent doing I/Os

Total disk reads and writes per second (within 5 minutes)

sum by (instance) (irate(node_disk_reads_completed{instance="88node"}[5m])) + sum by (instance) (irate(node_disk_writes_completed{instance="88node"}[5m]))

 

10. I/O usage

Total disk reads (1 minute)

sum(irate(node_disk_bytes_read{instance="88node"}[1m]))

## node_disk_bytes_read: The total number of bytes read successfully

Total disk writes (1 minute)

sum(irate(node_disk_bytes_written{instance="88node"}[1m]))

## node_disk_bytes_written: The total number of bytes written successfully

Milliseconds spent doing I/O (1 minute)

sum(irate(node_disk_io_time_ms{instance="88node"}[1m]))

## node_disk_io_time_ms: Total Milliseconds spent doing I/Os.

 

11. File system free space ratio

Lowest value:

min(node_filesystem_free{fstype=~"xfs|ext4", instance="88node"} / node_filesystem_size{fstype=~"xfs|ext4", instance="88node"})

Highest value:

max(node_filesystem_free{fstype=~"xfs|ext4", instance="88node"} / node_filesystem_size{fstype=~"xfs|ext4", instance="88node"})

## ext4: the fourth extended file system (Fourth EXtended filesystem, abbreviated ext4) is a journaling file system for Linux. An ext4 file system can be as large as 1 EB, and a single file as large as 16 TB

## XFS: a 64-bit file system that supports a single file system of up to 8 EB minus 1 byte; the actual limit depends on the maximum block limits of the host operating system. On a 32-bit Linux system, file and file system sizes are limited to 16 TB.
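
The lowest and highest free space ratios can be reproduced by hand on a toy data set. The Python sketch below uses hypothetical file system records and a made-up function name; the fstype filter imitates the `=~"xfs|ext4"` matcher with a fully anchored regular expression.

```python
import re


def free_ratio_extremes(filesystems, fstype_pattern="xfs|ext4"):
    """Return (lowest, highest) free/size ratio over the matching file systems."""
    ratios = [
        fs["free"] / fs["size"]
        for fs in filesystems
        if re.fullmatch(fstype_pattern, fs["fstype"]) and fs["size"] > 0
    ]
    if not ratios:
        return None
    return min(ratios), max(ratios)


# Hypothetical mounts; sizes are in arbitrary units.
mounts = [
    {"fstype": "xfs", "free": 50, "size": 100},
    {"fstype": "ext4", "free": 10, "size": 40},
    {"fstype": "tmpfs", "free": 9, "size": 10},
]
print(free_ratio_extremes(mounts))
```

The tmpfs mount is ignored by the filter, so only the xfs and ext4 ratios take part in the comparison.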

Original: https://blog.csdn.net/ffzhihua/article/details/88131507

Origin www.cnblogs.com/linyouyi/p/11242478.html