[Cloud native] Preliminary understanding and system construction of Prometheus monitoring system

foreword


Prometheus is an open-source system monitoring and alerting system. It has joined the CNCF and became the second CNCF-hosted project after Kubernetes. In Kubernetes container management systems, Prometheus is commonly used for monitoring. It supports many exporters for collecting data and a Pushgateway for data reporting. Prometheus performs well enough to support clusters on the scale of tens of thousands of nodes.
 

 1. Prometheus related knowledge

 1.1 Knowledge and understanding of Prometheus

Prometheus is an open-source service monitoring system and time series database that provides a common data model together with fast interfaces for data collection, storage, and querying. Its core component, the Prometheus server, periodically pulls data from statically configured monitoring targets or from targets discovered automatically via service discovery, and persists the newly pulled data to storage.

Each monitored host can expose monitoring data through a dedicated exporter program. The exporter collects monitoring data on the target and exposes an HTTP interface for the Prometheus server to query; Prometheus collects the data periodically via HTTP-based pull.
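As a concrete illustration of what an exporter serves on its HTTP interface, here is a minimal hand-rolled sketch of the Prometheus text exposition format (purely illustrative — the real client libraries additionally handle HELP/TYPE metadata, metric types, and label escaping):

```python
# Render a list of (metric_name, labels, value) tuples as Prometheus
# text exposition format, one sample per line.
def render_metrics(metrics):
    lines = []
    for name, labels, value in metrics:
        if labels:
            # Labels are rendered as key="value" pairs inside braces.
            label_str = ",".join('%s="%s"' % (k, v) for k, v in sorted(labels.items()))
            lines.append("%s{%s} %s" % (name, label_str, value))
        else:
            lines.append("%s %s" % (name, value))
    return "\n".join(lines) + "\n"

exposition = render_metrics([
    ("node_load1", {}, 0.42),
    ("http_requests_total", {"method": "get", "code": "200"}, 1027),
])
```

An exporter simply serves text like this at `/metrics`, and the Prometheus server scrapes it on its configured interval.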


If alert rules are defined, the scraped data is evaluated against them; when an alert condition is met, an alert is generated and sent to Alertmanager, which handles aggregation and distribution of the alert.


When a monitored target needs to push data actively, the Pushgateway component can receive and temporarily store the data until the Prometheus server collects it.

Any monitored target must be registered with the monitoring system in advance before its time-series data can be collected, stored, alerted on, and displayed. Targets can be specified statically through configuration, or managed dynamically by Prometheus through the service discovery mechanism.


In Kubernetes, Prometheus can use the API Server directly as a service discovery system, dynamically discovering and monitoring all monitorable objects in the cluster.
 

1.2 Features of Prometheus 


● Multidimensional data model: time-series data identified by metric name and key-value label pairs. Time-series data records system and device state changes in chronological order, and each data point is called a sample; server metrics, application performance monitoring data, network data, and so on are all time-series data

● Built-in time series (TSDB) database; for long-term retention, external remote storage such as InfluxDB or OpenTSDB is commonly used

● PromQL, a flexible query language that leverages the multidimensional data model for complex queries

● HTTP-based pull model for collecting time-series data

● Also supports collecting pushed data via the Pushgateway component

● Discovers targets through static configuration or service discovery

● Can serve as a data source for Grafana
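The multidimensional data model in the list above can be made concrete with a small sketch: a time series is identified by its metric name plus its full label set, independent of label ordering (a simplified illustration, not Prometheus internals):

```python
# A canonical series identity: metric name plus sorted label pairs.
def series_key(name, labels):
    return (name, tuple(sorted(labels.items())))

# Same labels in different order -> same series.
s1 = series_key("http_requests_total", {"method": "get", "code": "200"})
s2 = series_key("http_requests_total", {"code": "200", "method": "get"})
# A different label value -> a distinct series.
s3 = series_key("http_requests_total", {"method": "post", "code": "200"})
```

Each such series then holds a stream of timestamped samples, which is what PromQL selects and aggregates over.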
 

1.3 Features and advantages of Prometheus storage engine TSDB

As the storage engine of Prometheus, TSDB is a close fit for the characteristics of monitoring data:


● The volume of stored data is very large
● Most operations are writes
● Writes are almost always sequential appends; most data arrives ordered by time
● Data is rarely updated; most samples are written to the database within seconds or minutes of being collected
● Deletes are usually bulk block deletions: a starting time is chosen and the subsequent blocks are dropped; deleting individual samples at arbitrary times is rare
● The base data set is large, generally exceeding memory; queries usually touch only a small, irregular subset, so caching is of little benefit
● Reads are typically sequential scans in ascending or descending time order
● Highly concurrent read operations are common
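These access patterns can be sketched with a toy in-memory series store (purely illustrative — the real TSDB uses compressed on-disk blocks, a write-ahead log, and an index, none of which appear here):

```python
import bisect

class TinySeries:
    """Toy store mirroring the TSDB access pattern described above:
    sequential timestamped appends, ascending range reads, and
    retention that drops whole leading ranges rather than random samples."""
    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Writes arrive almost strictly in time order.
        self.timestamps.append(ts)
        self.values.append(value)

    def range_query(self, start, end):
        # Typical read: a sequential scan over an ascending time window.
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def drop_before(self, ts):
        # Retention: delete everything older than ts in one block.
        lo = bisect.bisect_left(self.timestamps, ts)
        self.timestamps = self.timestamps[lo:]
        self.values = self.values[lo:]
```

The design point is that ordered appends and range scans let the storage layer avoid random I/O almost entirely, which is why a purpose-built TSDB beats a general relational database for this workload.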


1.4 Ecological components of Prometheus 

Prometheus Server is responsible for collecting and storing time-series metric data, but data analysis, aggregation, visualization, and alerting are not handled by the server itself. The Prometheus ecosystem contains multiple components, some of which are optional:


(1) Prometheus Server
The core component. It collects monitoring data over HTTP using the pull model, stores the time-series data, and generates alert notifications based on "alert rules".
Prometheus Server consists of three parts: Retrieval, Storage, and PromQL.

Retrieval: responsible for scraping metric data from active target hosts.
Storage: persists the collected data to disk; the default retention is 15 days.
PromQL: the query language module provided by Prometheus.
(2) Client Library
Client libraries provide a convenient development path for applications that want to expose instrumentation natively; they are used to build measurement directly into the application.

(3) Exporters
Metric exposers, responsible for collecting performance metrics from applications or services that have no built-in instrumentation, and exposing them over HTTP for Prometheus Server to fetch.
In other words, an exporter collects and aggregates data in the target application's native format, converts or aggregates it into Prometheus-format metrics, and exposes them externally.
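That conversion role can be sketched as follows, using a made-up raw status dict and hypothetical metric names (real exporters of course read the application's actual status endpoint or protocol):

```python
# Map an application's raw, non-Prometheus statistics onto
# Prometheus-format metric lines. Both the raw keys and the metric
# names below are illustrative.
def convert(raw_status):
    mapping = {
        "active_connections": "nginx_connections_active",
        "requests_total": "nginx_http_requests_total",
    }
    lines = []
    for raw_key, metric_name in mapping.items():
        if raw_key in raw_status:
            lines.append("%s %s" % (metric_name, raw_status[raw_key]))
    return "\n".join(lines) + "\n"

exposition = convert({"active_connections": 3, "requests_total": 1500})
```

An exporter pairs a converter like this with an HTTP handler serving the result at `/metrics`.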

Commonly used Exporters: 

Node-Exporter: collects physical metrics of server nodes, such as load average, CPU, memory, disk, and network resource data; it needs to be deployed on every compute node. Detailed introduction to its metrics: https://github.com/prometheus/node_exporter
mysqld-exporter/nginx-exporter
Kube-State-Metrics: an exporter that collects Kubernetes resource data for Prometheus. By watching the API Server it gathers status metrics for resource objects in the cluster, such as Pods, Deployments, and Services. It also exposes metrics about itself, mainly counts of resources collected and of collection errors. Note that kube-state-metrics only exposes metric data and does not store it, so Prometheus is used to scrape and store these metrics. Its focus is business-related metadata such as Deployment, Pod, and replica status: how many replicas are scheduled and how many are currently available; how many Pods are running/stopped/terminated; how many times a Pod has restarted; how many Jobs are running.
cAdvisor: monitors resource usage inside containers, such as CPU, memory, network I/O, and disk I/O.
blackbox-exporter: probes the availability of business containers and endpoints.
 
(4) Service Discovery 
Service discovery is used to dynamically find targets to monitor. Prometheus supports multiple service discovery mechanisms: files, DNS, Consul, Kubernetes, and so on. Service discovery can also use interfaces provided by third parties: Prometheus queries them for the list of targets to monitor, then polls those targets for monitoring data. This functionality is currently built into Prometheus Server.

 
(5) Alertmanager

Alertmanager is an independent alerting module. After receiving "alert notifications" from the Prometheus server, it deduplicates and groups them, then routes them to the matching receiver to deliver the alert. Common delivery channels include email, DingTalk, and WeChat.
Prometheus Server is only responsible for generating alerts; the actual alerting behavior is handled by a separate application, Alertmanager. Prometheus Server periodically evaluates user-provided alert rules to generate alerts, and once Alertmanager receives them, it sends alert messages to the receivers according to user-defined alert routes.

 

(6) Pushgateway
The Pushgateway is similar to a relay station. The Prometheus server can only pull data, but some nodes can only push data for various reasons, so the Pushgateway is used to receive pushed data and expose it for the Prometheus server to scrape.
In other words, target hosts report metrics from short-lived jobs to the Pushgateway, and the Prometheus server then pulls the data from the Pushgateway in a unified way.
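Assuming a Pushgateway at a hypothetical address, the push a short-lived job would make can be sketched like this (the Pushgateway listens on port 9091 by default and accepts PUT/POST at `/metrics/job/<job>` with optional grouping labels; the host IP below is a placeholder):

```python
from urllib.parse import quote

def build_push(base_url, job, metrics_text, grouping=None):
    """Build the URL and body for a Pushgateway push.
    The body is the metric samples in the text exposition format."""
    path = "/metrics/job/" + quote(job, safe="")
    for k, v in (grouping or {}).items():
        # Extra grouping labels are appended as /label/value path segments.
        path += "/%s/%s" % (quote(k, safe=""), quote(v, safe=""))
    return base_url.rstrip("/") + path, metrics_text.encode()

url, body = build_push("http://192.168.50.20:9091", "batch_job",
                       "job_duration_seconds 12.7\n",
                       grouping={"instance": "node1"})
# An actual push would then issue the request, e.g.:
#   import urllib.request
#   urllib.request.urlopen(urllib.request.Request(url, data=body, method="PUT"))
```

Prometheus then scrapes the Pushgateway like any other target, so the pushed samples flow into the same storage and query path.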

 

(7) Grafana
Grafana is a cross-platform open-source metrics analysis and visualization tool. It visualizes the collected data and can notify alert receivers in time, and its official library offers a wealth of dashboard plugins.

 

1.5 Prometheus working mode 

Prometheus Server obtains the targets to monitor via the service discovery (Service Discovery) mechanism or static configuration, and scrapes (Scrape) metric data from the exporter on each target;


Prometheus Server has built-in file-based time-series storage to persist metric data. Users can retrieve data through the PromQL interface, and alert requests can be sent to Alertmanager on demand to deliver alert content;


Some short-running jobs have life cycles too short to reliably supply metric data to the server. They generally push (Push) their metrics instead: Prometheus receives the pushed data via the Pushgateway, from which the server then scrapes it.
 

1.6 Prometheus workflow 


(1) Prometheus uses Prometheus Server as its core to collect and store time-series data. Prometheus Server pulls metric data directly from monitoring targets, or pulls data that targets have pushed to a Pushgateway.
(2) Prometheus Server stores the collected metric data on local HDD/SSD via TSDB.
(3) The collected metrics are stored as time series; alert rules are configured, and triggered alert notifications are sent to Alertmanager.
(4) Alertmanager sends alerts via email, DingTalk, WeChat, etc., according to the configured receivers.
(5) Prometheus's built-in Web UI provides the PromQL query language for querying monitoring data.
(6) Grafana can use Prometheus as a data source to display the monitoring data graphically.

 

1.7 Limitations of Prometheus 


Prometheus is a metrics monitoring system and is not suited to storing events or logs; it emphasizes trend monitoring over exact values. Prometheus assumes that only recent monitoring data needs to be queried, and its local storage is designed to hold only short-term data (for example, one month), so it does not support storing large amounts of historical data. If long-term history is required, it is recommended to store the data in systems such as InfluxDB or OpenTSDB via the remote storage mechanism. High availability and federation of Prometheus clusters can be achieved with Thanos.

 

 2. How to choose between Prometheus and Zabbix

2.1 First understand the background of the two monitoring systems 


(1) Development background of zabbix 
Zabbix is an enterprise-grade open-source monitoring product developed in C. It can monitor targets such as servers, operating systems, networks, and applications. For monitoring and data collection it supports multiple methods, including the Zabbix agent, SNMP, ping, and port checks.

Zabbix is a large, all-in-one system with a complete web interface that integrates visualization, alerting, and other functions. Users can complete most operations in the interface, which makes it easy to learn and quick to master. The downside of this high level of integration, however, is that customization is very difficult and it does not extend well.

(2) Development background of Prometheus 
Prometheus is a monitoring system that has become very popular in recent years. It is developed in Go, and its design is inspired by Google's Borgmon (a system for monitoring container platforms). Besides traditional targets such as servers, networks, and operating systems, it natively supports cloud-native products such as Kubernetes and Docker, which has made it shine in the cloud-native era.

Compared with Zabbix's all-in-one approach, Prometheus is much simpler. The product focuses only on monitoring and provides a simple web interface for queries, while visualization and alerting are delegated to third-party products such as Grafana and Alertmanager. This simplicity makes Prometheus small and flexible: it can be deployed and upgraded very conveniently and customized with third-party open-source products.

Operating Prometheus, however, is largely done through configuration files and external tooling, and you must also master the built-in PromQL language, so the learning threshold is higher and it is harder to get started.
 

2.2 Functional comparison between the two 


(1) Indicator collection method 
Zabbix

Zabbix is divided into two parts, the server side and the agent side. The agent is deployed on the target machine and provides metric data to the server; the two communicate over TCP.

The agent supports passive polling and active push modes. In passive mode, the server periodically sends requests to the agent, which processes each request and returns the value to the server. In active mode, the agent sends results to the server at regular intervals.

Prometheus

Prometheus collects data from clients: the server interacts with them periodically and obtains monitoring metrics by pull.

Prometheus communicates over HTTP, which makes it easy to integrate with other tools: any component can be monitored as long as it exposes the corresponding HTTP interface. Many open-source products already support Prometheus and can expose metrics in its format, for example Kubernetes and Harbor. When that is not possible, many libraries can help export existing metrics. These libraries are called exporters; commonly used ones include node exporter, mysql exporter, and redis exporter.

(2) Data storage 
Zabbix

Zabbix uses an external database to store data. Currently supported databases include MySQL, PostgreSQL, Oracle, etc. In terms of stored data types, Zabbix supports text, log and other formats in addition to the key-value format.

Prometheus

Prometheus stores data in its built-in time series database (TSDB), which saves a lot of storage space compared with relational databases, processes data more efficiently, and can quickly search for complex results.

However, the native TSDB is not well suited to very large volumes of data, so by default Prometheus keeps only 15 days of data. If longer-term storage is required, the remote storage mode can be configured so that a third-party storage medium keeps the metric data.

It should be noted that Prometheus only supports storing time-series values.

(3) Query Performance
Zabbix

Zabbix has weak query functions, and can only do some limited operations through the web interface, or directly query the database using SQL.

Prometheus

Prometheus is much more powerful than Zabbix at querying. It provides its own query language, PromQL, which is flexible, concise, and powerful: combined with functions and operators it can calculate, filter, and group, and it supports regular expressions.

The Prometheus web interface can perform expression queries, and the query results are displayed in the form of graphs or tabular data.
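Those expression queries also work outside the web interface, through Prometheus's HTTP API. A small sketch of building an instant-query URL (the server address is a placeholder; the `/api/v1/query` endpoint is Prometheus's standard query API):

```python
from urllib.parse import urlencode

def instant_query_url(server, promql):
    """Build the URL for an instant query against Prometheus's HTTP API
    (GET /api/v1/query). The PromQL expression is URL-encoded."""
    return server.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})

# Per-mode CPU usage rate over the last 5 minutes, excluding idle.
url = instant_query_url("http://192.168.50.20:9090",
                        'rate(node_cpu_seconds_total{mode!="idle"}[5m])')
# A client would then fetch `url` with urllib/requests and parse the
# JSON response (fields: status, data.resultType, data.result).
```

The same expressions work unchanged in the web UI's expression box and in Grafana panels.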

(4) Alarm function 
Zabbix

As in the case of visualization, Zabbix has built-in alerting capabilities and supports sending across multiple media. Zabbix alert system allows managing events in different ways: sending messages, executing remote commands, escalating problems according to service level, etc.

Prometheus

For alerting, Prometheus needs to be used together with Alertmanager, because Prometheus alerting is split into two parts: alert rules are defined on the Prometheus Server side, and when a rule fires, the alert is sent to Alertmanager, which delivers it to the corresponding recipients.

Alertmanager can manage alarm information, has functions such as silence, grouping, aggregation, etc., and supports email, IM and other media to send.
 

2.3 Summary 


First of all, both Zabbix and Prometheus are excellent monitoring systems; the appropriate one should be chosen according to actual needs.

Zabbix has been in production longer. Developed in C, it is much easier to get started with. It has excellent monitoring capabilities for traditional servers, systems, and networks, supports monitoring via custom templates, and includes alerting mechanisms. It suits traditional enterprises with modest monitoring requirements and weaker overall technical capability. However, Zabbix's cloud-native support has not developed as far as Prometheus's; for cloud-native workloads, its usability and efficiency are less than ideal.

Prometheus is the second project hosted by the CNCF, practically a sibling of Kubernetes. Its support for Kubernetes and other container products is very friendly and highly customizable. However, it is harder to get started with, so it is better suited to Internet companies with stronger technical capability and complex monitoring requirements. To use Prometheus well, mastery of the PromQL query language is essential.
 

 3. Deployment and setup of Prometheus

 (1) Upload prometheus-2.35.0.linux-amd64.tar.gz and decompress it

mkdir -p /opt/prometheus
cd /opt/prometheus
tar xf prometheus-2.35.0.linux-amd64.tar.gz
mv prometheus-2.35.0.linux-amd64 /usr/local/prometheus
 
cat /usr/local/prometheus/prometheus.yml | grep -v "^#"
global:                    # Global configuration for Prometheus, e.g. scrape intervals and timeouts
  scrape_interval: 15s            # How often to scrape monitoring data from targets; default 1m
  evaluation_interval: 15s         # How often to evaluate rules and generate alerts; default 1m
  # scrape_timeout is set to the global default (10s).
  scrape_timeout: 10s            # Timeout for a single scrape; default 10s
 
alerting:                # Configuration of Alertmanager instances; supports static configuration and dynamic service discovery
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
 
rule_files:                # Paths of files containing alerting rules; filename globbing is supported
  # - "first_rules.yml"
  # - "second_rules.yml"
 
scrape_configs:            # Configuration of scrape targets (time-series data sources)
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"        # Each set of monitored instances is named by job_name; supports static configuration (static_configs) and dynamic service discovery (*_sd_configs)
 
    # metrics_path defaults to '/metrics'
    metrics_path: '/metrics'    # Path for scraping metric data; defaults to /metrics
    # scheme defaults to 'http'.
 
    static_configs:                # Static target configuration: pull data from fixed targets
      - targets: ["localhost:9090"]
 
 

(2) Add Prometheus to the system service 

cat > /usr/lib/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \
--storage.tsdb.path=/usr/local/prometheus/data/ \
--storage.tsdb.retention.time=15d \
--web.enable-lifecycle
  
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF
 
 
systemctl start prometheus
systemctl enable prometheus
 
netstat -natp | grep :9090

(3) Access the web interface

Visit http://192.168.73.108:9090 (the host IP) to open Prometheus's Web UI.

 

Visit http://192.168.50.20:9090/metrics to view the metric data Prometheus collects about itself.

4. Deploy Exporters and add monitoring hosts

Below we take common service applications as examples: the nodes of a Kubernetes cluster (using one master and two worker nodes deployed with kubeadm), plus Nginx and MySQL, and add them to Prometheus monitoring.

4.1 Deploy Node Exporter to monitor system-level metrics (on each node)

(1) Upload node_exporter-1.3.1.linux-amd64.tar.gz for decompression

mkdir -p /opt/prometheus
cd /opt/prometheus
tar xf node_exporter-1.3.1.linux-amd64.tar.gz
mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin

(2) Add node_exporter to the system service

cat > /usr/lib/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.ntp \
--collector.mountstats \
--collector.systemd \
--collector.tcpstat
 
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF
 
(3) Start the service
systemctl start node_exporter
systemctl enable node_exporter
 
netstat -natp | grep :9100

 (4) Modify the Prometheus configuration file and add the nodes to Prometheus monitoring

vim /usr/local/prometheus/prometheus.yml
#Add the following content at the end
  - job_name: nodes
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - 192.168.50.20:9100
      - 192.168.50.21:9100
      - 192.168.73:50.22 :9100
      labels:
        service: kubernetes
        
(5) Reload the configuration
curl -X POST http://192.168.50.20:9090/-/reload    # or: systemctl reload prometheus

Then check Status -> Targets in the Prometheus web page.

 

4.2 Monitoring MySQL Configuration Example 

Operate on the MySQL server

(1) Upload mysqld_exporter-0.14.0.linux-amd64.tar.gz and decompress it

mkdir /opt/prometheus
cd /opt/prometheus
tar xf mysqld_exporter-0.14.0.linux-amd64.tar.gz
mv mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter /usr/local/bin/

(2) Add system services 

cat > /usr/lib/systemd/system/mysqld_exporter.service <<'EOF'
[Unit]
Description=mysqld_exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter --config.my-cnf=/etc/my.cnf
 
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF

 (3) Modify the MySQL configuration file and authorize the exporter user

vim /etc/my.cnf
[client]
......
host=localhost
user=exporter
password=abc123
 
######## Grant privileges to the exporter user
mysql -uroot -pabc123
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost' IDENTIFIED BY 'abc123';

(4) Restart the mysqld service and exporter service 

systemctl restart mysqld
systemctl start mysqld_exporter
systemctl enable mysqld_exporter

 (5) Add mysqld monitoring items on the Prometheus host 

vim /usr/local/prometheus/prometheus.yml
#Add the following content at the end
  - job_name: mysqld
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - 192.168.73.109:9104
      labels:
        service: mysqld
 
curl -X POST http://192.168.73.108:9090/-/reload    # or: systemctl reload prometheus

Then check Status -> Targets in the Prometheus web page.

4.3 Monitoring Nginx Configuration Example 

Preparation on the Nginx server:

Download nginx-exporter address: https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.10.3/nginx-vts-exporter-0.10.3.linux-amd64.tar.gz
Download nginx address: http://nginx.org/download/
Download nginx plugin address: https://github.com/vozlt/nginx-module-vts/tags

(1) Unzip related plug-ins of nginx

cd /opt/prometheus
tar xf nginx-module-vts-0.1.18.tar.gz
mv nginx-module-vts-0.1.18 /usr/local/nginx-module-vts

(2) Compile and install nginx from source code and set exposure monitoring items 

yum -y install pcre-devel zlib-devel openssl-devel gcc gcc-c++ make
useradd -M -s /sbin/nologin nginx
 
cd /opt/prometheus
tar xf nginx-1.18.0.tar.gz
 
cd nginx-1.18.0/
./configure --prefix=/usr/local/nginx \
--user=nginx \
--group=nginx \
--with-http_stub_status_module \
--with-http_ssl_module \
--add-module=/usr/local/nginx-module-vts
 
make -j 2 && make install
 
(3) Modify the nginx configuration file and start nginx

vim /usr/local/nginx/conf/nginx.conf
http {
    vhost_traffic_status_zone;                    # Add: enables the traffic status module
    vhost_traffic_status_filter_by_host on;       # Add: when multiple server_name entries are configured, traffic is counted per server_name; otherwise all traffic is attributed to the first server_name
    ......
    server {
        ......
    }
    server {
        vhost_traffic_status off;        # Disable vhost_traffic_status in server blocks whose traffic should not be counted
        listen 8080;
        allow 127.0.0.1;
        allow 192.168.73.110;            # Set to the IP address of the Prometheus server
        location /nginx-status {
            stub_status on;
            access_log off;
        }
        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }
    }
}

# If nginx has no server_name configured, or a server does not need to be monitored, it is recommended to disable the statistics function on that vhost; otherwise monitoring entries for domain names such as 127.0.0.1 and the hostname will appear.
 
ln -s /usr/local/nginx/sbin/nginx /usr/local/sbin/
nginx -t
 
cat > /lib/systemd/system/nginx.service <<'EOF'
[Unit]
Description=nginx
After=network.target
 
[Service]
Type=forking
PIDFile=/usr/local/nginx/logs/nginx.pid
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
 
[Install]
WantedBy=multi-user.target
EOF
 
systemctl start nginx
systemctl enable nginx
 
 
Browser access: http://192.168.73.110:8080/status, you can see the page information of Nginx Vhost Traffic Status
 

 (4) Install the exporter plugin on the nginx host 

cd /opt/
tar -zxvf nginx-vts-exporter-0.10.3.linux-amd64.tar.gz
mv nginx-vts-exporter-0.10.3.linux-amd64/nginx-vts-exporter /usr/local/bin/
 
cat > /usr/lib/systemd/system/nginx-exporter.service <<'EOF'
[Unit]
Description=nginx-exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
ExecStart=/usr/local/bin/nginx-vts-exporter -nginx.scrape_uri=http://localhost:8080/status/format/json
 
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF
 
systemctl start nginx-exporter
systemctl enable nginx-exporter
 
netstat -natp | grep :9913

(5) Add nginx monitoring items on the Prometheus host 

#########Modify the prometheus configuration file and add it to prometheus monitoring
vim /usr/local/prometheus/prometheus.yml
#Add the following content at the end
  - job_name: nginx
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - 192.168.73.110:9913
      labels:
        service: nginx
        
################ Reload the configuration
curl -X POST http://192.168.73.108:9090/-/reload    # or: systemctl reload prometheus

Then check Status -> Targets in the Prometheus web page.

Origin: blog.csdn.net/zhangchang3/article/details/131796896