Prometheus usage
Environment
- See the previous installation chapter
- macOS
- CentOS 7
- prometheus-2.12.0.linux-amd64.tar.gz
- grafana-6.3.5-1.x86_64
- node_exporter-0.18.1.linux-amd64
- pushgateway-0.9.1.linux-amd64
Query expression examples
-
CPU usage calculation
Total CPU time used over the period t1 to t2 =
(user2 + nice2 + system2 + idle2 + iowait2 + irq2 + softirq2) - (user1 + nice1 + system1 + idle1 + iowait1 + irq1 + softirq1)
CPU idle time over the period t1 to t2 = (idle2 - idle1)
CPU utilization over the period t1 to t2 =
1 - (CPU idle time / total CPU time)
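The formula above can be checked numerically. A minimal bash sketch with made-up counter samples (all numbers are hypothetical, in the /proc/stat order user nice system idle iowait irq softirq):

```shell
#!/bin/bash
# Hypothetical cumulative CPU counters at t1 and t2
t1=(4000 50 1200 90000 300 20 30)
t2=(4600 55 1350 95000 340 22 35)

# Total CPU time at each sample = sum of all modes
total1=0; total2=0
for v in "${t1[@]}"; do total1=$((total1 + v)); done
for v in "${t2[@]}"; do total2=$((total2 + v)); done

idle_delta=$(( t2[3] - t1[3] ))       # idle2 - idle1
total_delta=$(( total2 - total1 ))    # total2 - total1

# utilization = 1 - idle_delta / total_delta, as a percentage
util=$(awk -v i="$idle_delta" -v t="$total_delta" 'BEGIN { printf "%.1f", (1 - i/t) * 100 }')
echo "CPU utilization: ${util}%"
```

With these samples, 1 - 5000/5802 gives roughly 13.8% utilization.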
increase()
Purpose: takes the increment of a counter-type metric over a time window; each core of a multicore CPU reports its own series, so the per-core results must be combined
sum()
Sums the results
- Get idle time: the series with mode="idle"
- Get total CPU time: the sum over all modes
-
The total CPU utilization on a single machine
1-(sum(increase(node_cpu_seconds_total{instance="192.168.9.232:9100",mode="idle"}[1m]))/sum(increase(node_cpu_seconds_total{instance="192.168.9.232:9100"}[1m])))
-
by (instance): group the results per instance
-
(1-( sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance) )) * 100
-
Time ratios for other CPU modes
-
iowait: time spent waiting for I/O
sum(increase(node_cpu_seconds_total{mode="iowait"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
-
irq: hardware interrupt time
sum(increase(node_cpu_seconds_total{mode="irq"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
-
softirq: software interrupt time
sum(increase(node_cpu_seconds_total{mode="softirq"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
-
steal: time stolen by the hypervisor in virtualized environments
sum(increase(node_cpu_seconds_total{mode="steal"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
-
nice: time spent on processes with an adjusted nice value
sum(increase(node_cpu_seconds_total{mode="nice"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
-
idle: idle time
sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
-
user: user-mode time
sum(increase(node_cpu_seconds_total{mode="user"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
-
system: kernel-mode time
sum(increase(node_cpu_seconds_total{mode="system"}[1m])) by(instance) / sum(increase(node_cpu_seconds_total{}[1m]) ) by(instance)
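The per-mode queries above are identical except for the mode label, so they could be precomputed with a recording rule. A sketch, assuming a rule file loaded via rule_files (the rule name is my own convention):

```yaml
groups:
  - name: cpu-mode-ratios
    rules:
      # one series per (instance, mode): fraction of CPU time in that mode over 1m
      - record: instance_mode:node_cpu_seconds:ratio_1m
        expr: |
          sum(increase(node_cpu_seconds_total[1m])) by (instance, mode)
            / ignoring(mode) group_left
          sum(increase(node_cpu_seconds_total[1m])) by (instance)
```

Dashboards can then query the precomputed series instead of re-running the heavy expression.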
-
Extended query usage
-
filter
- Label filtering
key{label=""}
- Fuzzy matching (regex)
key{label=~"web.*"}
- Value filtering
- Comparison operators
key{.} > 400
-
function
-
rate(.[5m])
For counter-type data: the per-second average rate of increase over the given window
- The window should be no shorter than the data collection (scrape) interval
-
increase(.[5m])
For counter-type data: the total increment over the given window
-
sum()
Sums the results
by()
- Groups the combined result, e.g. by (instance)
-
topk(x, key)
Returns the top x series by value
- Not suitable for the graph view; suitable for console viewing
- Suitable for instantaneous alerting
-
count()
- Counts matching series; useful for rough/aggregate monitoring checks
-
Data collection
Start the server (for production use)
-
Prometheus: reloading the configuration file
- Send a signal to the prometheus process
- kill -HUP pid
- Send an HTTP request to prometheus
- curl -XPOST http://prometheus.chenlei.com/-/reload
-
Background process
-
> yum install -y kernel-devel
> yum groupinstall -y "Development tools"
> git clone https://github.com/bmc/daemonize.git
> cd daemonize
> ./configure && make && make install
-
Start
prometheus
with additional parameters:
- --web.listen-address: listen address, e.g.
0.0.0.0:9090
- --web.read-timeout: maximum wait time for a request connection, e.g.
2m
- --web.max-connections: maximum number of connections, e.g.
10
- --storage.tsdb.retention: data retention period, e.g.
90d
- --storage.tsdb.path: data storage path, e.g.
/data/prometheus/server/data
- --query.max-concurrency: maximum number of concurrent queries, e.g.
20
- --query.timeout: query timeout, e.g.
2m
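Put together, a startup command using these parameters might look like the following sketch (running from the unpacked prometheus directory; the config file path is an assumption):

```
./prometheus \
  --config.file=prometheus.yml \
  --web.listen-address="0.0.0.0:9090" \
  --web.read-timeout=2m \
  --web.max-connections=10 \
  --storage.tsdb.retention=90d \
  --storage.tsdb.path=/data/prometheus/server/data \
  --query.max-concurrency=20 \
  --query.timeout=2m
```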
-
Storage structure
server/
└── data
    ├── 01DM9HP1PHHK2BD1MGC7J1C0YC
    │   ├── chunks
    │   │   └── 000001
    │   ├── index
    │   ├── meta.json
    │   └── tombstones
    ├── 01DM9ZDG8QKWTPYZ86K7XW6FKZ
    │   ├── chunks
    │   │   └── 000001
    │   ├── index
    │   ├── meta.json
    │   └── tombstones
    ├── 01DMAM0NM51YSQ4EVRRV46X2E1
    │   ├── chunks
    │   │   └── 000001
    │   ├── index
    │   ├── meta.json
    │   └── tombstones
    ├── 01DMAM0P4CGJWSSA15QPWJGZXF
    │   ├── chunks
    │   │   └── 000001
    │   ├── index
    │   ├── meta.json
    │   └── tombstones
    ├── lock
    ├── queries.active
    └── wal
        ├── 00000011
        ├── 00000012
        ├── 00000013
        ├── 00000014
        ├── 00000015
        ├── 00000016
        ├── 00000017
        ├── 00000018
        └── checkpoint.000010
            └── 00000000
-
Recent data is stored in the
wal/
directory so that data held in memory can be recovered after a sudden power failure or restart
Writing the server configuration file
global:
  scrape_interval: 5s      # scrape frequency
  evaluation_interval: 1s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
rule_files:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: '233-node-exporter'
    static_configs:
      - targets: ['192.168.9.233:9100']
  - job_name: '232-node-exporter'
    static_configs:
      - targets: ['192.168.9.232:9100']
  - job_name: '239-node-exporter'
    static_configs:
      - targets: ['192.168.9.239:9200']
node_exporter
- Collects server metrics
- The default collectors cover most needs
- Individual collectors can be enabled or disabled at startup
pushgateway
-
Introduction
Actively pushes data to the prometheus server
It can run on a separate node; it is not required to run on the monitored node
-
installation
- 0.9.1 / 2019-08-01
- Download: Link
- Unpack
- Run
-
Pushing custom collection scripts to pushgateway
-
Install pushgateway
-
Configure the pushgateway-related job in prometheus
-
Write the data-collection script on the target host
-
Periodically execute the script to push metric data to pushgateway
#!/bin/bash
instance_name=instance_name
label=label
value=123
echo "$label $value" | curl --data-binary @- http://192.168.9.233:9091/metrics/job/test/instance/$instance_name
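For the pushed metrics to reach Prometheus, the server also needs a scrape job pointing at the gateway. A sketch (the job name is my own; the address matches the script above). honor_labels: true keeps the job/instance labels supplied by the pushing clients instead of overwriting them:

```yaml
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['192.168.9.233:9091']
```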
-
-
Shortcomings
- The gateway is a single point and can become a bottleneck
- No filtering of the pushed data
Custom exporter
-
Development process
- See the official website
- Provide an HTTP service that responds to external GET requests
- Run in the background and periodically collect monitoring data locally
- Responses must follow the prometheus metrics exposition format
-
[Java] Spring-based exporter
-
- Example of developing a Prometheus exporter in Go
Interface visualization
grafana
-
Introduction
An open-source data visualization and charting tool
-
installation
- grafana official website
- Official website installation guide
- Default Port: 3000
-
Configuration
-
Add the
prometheus
data source
-
Add a
dashboard
-
Create a Dashboard
- Data source configuration
-
-
Graphical Configuration
- Visualization
- Axes
- Legend
- Thresholds & Time Regions
- Data link
-
Other operations
-
Alarm Configuration
-
Backup
- Export json
- save as
-
Restore
- Import json / paste json
-
Alerts
Alerting is a new feature in grafana 4.0
- DingTalk alerts
- PagerDuty
practice
-
Memory usage
- Data source: node_exporter
- Formula: actual available memory = free + buffers + cached
- Formula implementation:
((node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes)/node_memory_MemTotal_bytes)*100
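The available-memory formula can be sanity-checked with sample values. A bash sketch (all byte counts are made up for illustration):

```shell
#!/bin/bash
# Hypothetical memory values in bytes
free=$((2 * 1024 * 1024 * 1024))      # 2 GiB free
buffers=$((256 * 1024 * 1024))        # 256 MiB in buffers
cached=$((1 * 1024 * 1024 * 1024))    # 1 GiB in page cache
total=$((8 * 1024 * 1024 * 1024))     # 8 GiB total

# (free + buffers + cached) / total * 100
pct=$(awk -v f="$free" -v b="$buffers" -v c="$cached" -v t="$total" \
  'BEGIN { printf "%.1f", (f + b + c) / t * 100 }')
echo "available memory: ${pct}%"
```

With these numbers, 3.25 GiB of 8 GiB is available, i.e. about 40.6%.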
-
Monitoring disk I/O
- Data source: node_exporter
- Formula implementation
Function: predict_linear(), used to predict a trend
(rate(node_disk_read_bytes_total[1m])+rate(node_disk_written_bytes_total[1m]))
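As a usage sketch of predict_linear(): the query below extrapolates free filesystem space four hours ahead from the last hour's trend, and matches when the prediction goes negative (the mountpoint label value is an example):

```
predict_linear(node_filesystem_free_bytes{mountpoint="/"}[1h], 4 * 3600) < 0
```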
-
Network monitoring
-
Data source
bash script + pushgateway
-
Writing the script
Collects ping latency and packet-loss rate for intranet traffic:
instance=`hostname -f`
# external connectivity (packet loss)
lostpk=`timeout 5 ping -q -A -s 500 -W 1000 -c 100 baidu.com | grep transmitted | awk '{print $6}'`
# round-trip time
rrt=`timeout 5 ping -q -A -s 500 -W 1000 -c 100 baidu.com | grep transmitted | awk '{print $10}'`
# the value must be numeric: strip the trailing % and ms
value_lostpk=${lostpk%%\%}
value_rrt=${rrt%%ms}
# send to prometheus via pushgateway
echo "lostpk_$instance : $value_lostpk"
echo "lostpk_$instance $value_lostpk" | curl --data-binary @- http://192.168.9.233:9091/metrics/job/network-traffic/instance/$instance
echo "rrt_$instance : $value_rrt"
echo "rrt_$instance $value_rrt" | curl --data-binary @- http://192.168.9.233:9091/metrics/job/network-traffic/instance/$instance
-
Scheduled execution
Reference
Steps for scheduled execution:
- Install crontab
- In
/etc/crontab
configure cron to run the corresponding executable script
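A scheduling entry in /etc/crontab might look like the following sketch (the script path is an example; /etc/crontab requires the user field):

```
# run the collection script every minute as root
* * * * * root /usr/local/scripts/network_traffic.sh >/dev/null 2>&1
```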
-
Checking the results
- Check in prometheus whether the targets are online; if not, add them to the prometheus configuration, and remember to reload the configuration
- Check the configuration
- Check the metrics: typing the custom keys on the query line should bring up suggestions
lostpk
rrt
-