Automated monitoring system: Prometheus & Grafana

Prometheus is an all-rounder with native support for container monitoring. It is just as capable of monitoring traditional applications, so it covers both containerized and non-containerized workloads. Every monitoring system follows the same pipeline: data collection → data processing → data storage → data display → alerting.

Prometheus features

  • Multidimensional data model: time series identified by a metric name and key-value pairs
  • PromQL: a flexible query language that uses the multidimensional data to express complex queries
  • No reliance on distributed storage; a single server node works on its own
  • Time series are collected through an HTTP-based pull model
  • Pushing time series is supported through the Pushgateway component
  • Targets are discovered through service discovery or static configuration
  • Multiple graphing modes and dashboard support (Grafana)

Prometheus components and architecture


Name | Description
Prometheus Server | Collects metrics, stores them as time series data, and provides a query interface
Pushgateway | Holds metric data for a short time; mainly used for short-lived jobs
Exporters | Collect monitoring data from existing third-party services and expose it as metrics
Alertmanager | Alerting
Web UI | Simple web console

Together, these components already cover the whole pipeline of data collection, processing, storage, display, and alerting.

Data model

Prometheus stores all data as time series; samples with the same metric name and the same set of labels belong to the same time series. In other words, after Prometheus scrapes data from a source, it stores it in its built-in TSDB as time series data. Every series has a metric name: if you are monitoring nginx, for example, you first give the metric a name. A series can also carry any number of labels. As a rough analogy, the metric name is like a table name and the labels are like fields. Each time series is therefore uniquely identified by its metric name and a set of key-value pairs (also known as labels).
The format of a time series identifier is:
<metric name>{<label name>=<label value>, ...}
There can be multiple labels. For example, in
jvm_memory_max_bytes{area="heap",id="Eden Space",}
the metric name is jvm_memory_max_bytes, followed by two labels, area and id, with their values. You can attach further labels; the more labels a series carries, the more dimensions you can query by.
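To make the identifier format concrete, here is a minimal Python sketch that splits such a string into its metric name and labels. The function name parse_series is made up for this illustration, and the regular expression assumes label values contain no escaped quotes, so it is a simplification rather than a full parser for the Prometheus text format.

```python
from __future__ import annotations

import re

# Matches one label pair such as area="heap". Values are assumed not to
# contain escaped quotes (a simplification for this sketch).
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_series(identifier: str) -> tuple[str, dict[str, str]]:
    """Split '<metric name>{<label>="<value>",...}' into name and labels."""
    name, brace, rest = identifier.partition("{")
    labels = dict(_LABEL_RE.findall(rest)) if brace else {}
    return name.strip(), labels


name, labels = parse_series('jvm_memory_max_bytes{area="heap",id="Eden Space",}')
print(name)    # jvm_memory_max_bytes
print(labels)  # {'area': 'heap', 'id': 'Eden Space'}
```

Two strings that yield the same name and the same label dictionary refer to the same time series; changing any label value yields a different series.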

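Metric types

Type | Description
Counter | Monotonically increasing counter, suitable for values such as the number of requests to an interface
Gauge | A value that can go up or down arbitrarily, suitable for values such as CPU usage
Histogram | Counts observations in configurable buckets, suitable for values such as request durations
Summary | Similar to Histogram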
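The difference between a Counter and a Gauge is easiest to see in code. The two classes below are toy stand-ins written for this article, not the API of any Prometheus client library; they only illustrate that a counter may only grow while a gauge can be set to any value.

```python
class Counter:
    """Monotonically increasing value, e.g. total requests served."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Value that may go up or down, e.g. current CPU usage."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


requests_total = Counter()
for _ in range(3):
    requests_total.inc()  # one call per handled request

cpu_usage = Gauge()
cpu_usage.set(0.75)
cpu_usage.set(0.40)  # a gauge may drop again

print(requests_total.value)  # 3.0
print(cpu_usage.value)       # 0.4
```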

Jobs and instances

An instance is a target that Prometheus can scrape; it appears in the Prometheus configuration file. A job is a collection of instances that serve the same purpose; think of it as a group. For example, the machines running multiple instances of the order service can be placed in one job, which then scrapes all of those targets.

Prometheus deployment

Install Prometheus with Docker: create a new directory docker-monitor and create the file docker-compose.yml in it with the following content:

version: "3"
services:
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
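    - ./prometheus/:/etc/prometheus/    # mount the Prometheus configuration directory
    - /etc/localtime:/etc/localtime:ro  # sync the container clock with the host; this matters, because a time mismatch can leave Prometheus unable to scrape data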
    ports:
    - '9090:9090'

Monitor web application performance metrics

Add a new prometheus directory under the docker-monitor directory, and create a prometheus configuration file prometheus.yml in it, with the following content:

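global:  # global configuration
  scrape_interval:   15s  # default interval between scrapes for all jobs

scrape_configs:  # scrape job configuration
- job_name:       'mall-order'  # job that scrapes the order service's performance metrics; one job can list several targets, e.g. multiple machines running the order service
  scrape_interval: 10s  # scrape every 10s
  metrics_path: '/actuator/prometheus'  # URL path to scrape
  static_configs:
  - targets: ['192.168.31.60:8844']  # address of the server to scrape
    labels:
      application: 'mall-order-label'  # label attached to the scraped series
  #- targets: ['192.168.31.60:8844']  # add further targets below in the same way
  #  labels:
  #    application: 'mall-order-label'  # label attached to the scraped series

- job_name: 'prometheus'  # job that scrapes Prometheus's own performance metrics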
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9090']
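To see how jobs, instances, and intervals fit together, the following Python sketch works out which URL each job scrapes and how often. The dictionary is a hand-written mirror of the configuration above, not parsed from the file, and scrape_plan is a name invented for this illustration. It relies on two Prometheus defaults: the scheme is http and metrics_path is /metrics when a job does not set them.

```python
from __future__ import annotations

# Hand-written Python mirror of the prometheus.yml above (not parsed from it).
config = {
    "global": {"scrape_interval": "15s"},
    "scrape_configs": [
        {
            "job_name": "mall-order",
            "scrape_interval": "10s",
            "metrics_path": "/actuator/prometheus",
            "static_configs": [{"targets": ["192.168.31.60:8844"]}],
        },
        {
            "job_name": "prometheus",
            "scrape_interval": "5s",
            "static_configs": [{"targets": ["localhost:9090"]}],
        },
    ],
}


def scrape_plan(cfg: dict) -> list[tuple[str, str, str]]:
    """Return (job, url, interval) for every target of every job."""
    default_interval = cfg["global"]["scrape_interval"]
    plan = []
    for job in cfg["scrape_configs"]:
        # Prometheus defaults: scheme "http" and metrics_path "/metrics".
        path = job.get("metrics_path", "/metrics")
        # A per-job interval overrides the global one.
        interval = job.get("scrape_interval", default_interval)
        for group in job["static_configs"]:
            for target in group["targets"]:
                plan.append((job["job_name"], f"http://{target}{path}", interval))
    return plan


for job, url, interval in scrape_plan(config):
    print(f"{job}: {url} every {interval}")
```

Each target in the resulting list is one instance; all targets listed under the same job_name form one job.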

Execute the following command in the docker-monitor directory to start prometheus

docker-compose up -d

Open Prometheus in the browser at http://192.168.31.60:9090.
Click the Status drop-down and select Targets.
The page lists the two scrape jobs configured in Prometheus, but the mall-order job has failed and its state is DOWN. Next, we configure the mall-order service so that Prometheus can scrape its data.
First, add the following pom dependencies to the tulingmall-order service:

<!-- enable Spring Boot application monitoring -->
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- add the Prometheus integration -->
<dependency>
   <groupId>io.micrometer</groupId>
   <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

You also need to enable the Spring Boot Actuator monitoring endpoints in the mall-order service configuration file, as follows:

management:  # enable Spring Boot Actuator monitoring endpoints
  endpoints:
    web:
      exposure:
        include: '*'
  endpoint:
    prometheus:
      enabled: true
    health:
      show-details: always

Restart the mall-order service and refresh the Prometheus page.
Click the link under mall-order, http://192.168.31.60:8844/actuator/prometheus, to open the performance metrics that the order service exposes.
Take one of the metrics as an example: jvm_threads_states_threads{state="runnable",} 13.0. The metric name is jvm_threads_states_threads, the label state has the value runnable, and the sample value is 13, meaning that 13 threads are currently in the runnable state.
Click the Graph link on the Prometheus page to open the query page. Enter the metric in the query box and click the Execute button to see the result. Then click the Graph tab under the Execute button to view the chart for the metric.
This is the query interface built into Prometheus. It is quite basic, so Prometheus is usually paired with Grafana for graphical display.
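The same query can also be issued without the web page, through the Prometheus HTTP API endpoint /api/v1/query. The Python sketch below builds the query URL and unpacks an instant-vector response. The helper names are invented for this illustration, and the sample response is written by hand in the documented response shape; it is not output captured from the server in this article.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS = "http://192.168.31.60:9090"  # address used in this article


def query_url(base: str, promql: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(body: dict) -> list[tuple[dict, float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in body["data"]["result"]]


def query(base: str, promql: str) -> list[tuple[dict, float]]:
    """Run the query against a live server (needs network access)."""
    with urlopen(query_url(base, promql), timeout=5) as resp:
        return extract_samples(json.load(resp))


# Illustrative response body in the documented shape; not captured output.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "jvm_threads_states_threads", "state": "runnable"},
                "value": [1700000000.0, "13"],
            }
        ],
    },
}

print(query_url(PROMETHEUS, 'jvm_threads_states_threads{state="runnable"}'))
print(extract_samples(sample))
```

With a running server, query(PROMETHEUS, 'jvm_threads_states_threads{state="runnable"}') would return the live values in the same form.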

Grafana deployment

Grafana is also installed with Docker. Add a grafana service to the docker-compose.yml file above, so that it reads as follows:

version: "3"
services:
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
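    - ./prometheus/:/etc/prometheus/    # mount the Prometheus configuration directory
    - /etc/localtime:/etc/localtime:ro  # sync the container clock with the host; this matters, because a time mismatch can leave Prometheus unable to scrape data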
    ports:
    - '9090:9090'
  grafana:  
    image: grafana/grafana:5.2.4
    container_name: 'grafana'
    ports:  
    - '3000:3000'
    volumes: 
    - ./grafana/config/grafana.ini:/etc/grafana/grafana.ini  # Grafana alert email configuration
    - ./grafana/provisioning/:/etc/grafana/provisioning/  # provisions the Prometheus data source for Grafana
    - /etc/localtime:/etc/localtime:ro
    env_file:
    - ./grafana/config.monitoring  # Grafana login settings
    depends_on:
    - prometheus  # Grafana must start after Prometheus

Add a new grafana directory under the docker-monitor directory and create the file config.monitoring in it with the following content:

# Password for logging in to the Grafana admin UI; the user name is admin
GF_SECURITY_ADMIN_PASSWORD=password
# Whether users may sign up in the Grafana UI; disabled here
GF_USERS_ALLOW_SIGN_UP=false

Create the provisioning directory in the grafana directory, create the datasources directory in it, and create a new file datasource.yml in the datasources directory with the following content:

# config file version
apiVersion: 1

deleteDatasources:  # first delete any existing data source named Prometheus with orgId 1
- name: Prometheus
  orgId: 1

datasources:  # configure the Prometheus data source
- name: Prometheus
  type: prometheus
  access: proxy
  orgId: 1
  url: http://prometheus:9090  # within the same docker compose project, the prometheus service name can be used directly
  basicAuth: false
  isDefault: true
  version: 1
  editable: true

Create the directory config in the grafana directory, and create the file grafana.ini in it with the following content:

#################################### SMTP / Emailing ##########################
# Mail server configuration
[smtp]
enabled = true
# Outgoing mail server
host = smtp.qq.com:465
# SMTP account
user = 135*****[email protected]
# SMTP authorization code
password = fyjucfwgwjadgfdj
# Sender address
from_address = 135*****[email protected]
# Sender name
from_name = yuyang

The authorization code is obtained in the QQ Mail settings.

Start Grafana with docker compose and open the Grafana page at http://192.168.31.60:3000. The user name is admin and the password is password. After logging in, you land on the home page.
Click the plus sign on the left and import web-dashboard.json, a dashboard file prepared in advance (the file is in the course materials for this lesson; it contains common operations metrics, and ready-made dashboards can also be found online).
After importing web-dashboard.json, select Prometheus on the page and click the Import button. The dashboard is then displayed (it may not contain any data yet).

Next, set up an example alert on a monitored metric: when the number of 5XX errors reported by the system reaches a certain level, an email notification is sent.

Click the Errors panel and select Edit to open the detail view of the Errors panel.
Add a new notification channel and select email as its type. Alternatively, choose the webhook type and configure an HTTP endpoint to be called on each alert; through a webhook you can indirectly implement any notification method.
Finally, click the Save button.
Then go back to the Errors detail page and configure the alert rule. Several settings need to be filled in there:
[Screenshots: alert rule settings on the Errors panel]
The alert email looks like this:

[Screenshot: alert email]
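For the webhook channel mentioned above, the receiving side can be very small. The Python sketch below accepts a POST with a JSON body and prints a one-line summary. The field names ruleName, state, and message are assumptions about the JSON that Grafana sends and should be checked against a real request; port 8000 and the names summarize_alert, AlertHandler, and serve are arbitrary choices for this illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alert(payload: dict) -> str:
    """Build a one-line summary; missing fields fall back to placeholders."""
    # Field names are assumptions about the webhook body, see the text above.
    rule = payload.get("ruleName", "<unknown rule>")
    state = payload.get("state", "<unknown state>")
    message = payload.get("message", "")
    summary = f"[{state}] {rule}"
    return f"{summary}: {message}" if message else summary


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # body was not a JSON object
            self.end_headers()
            return
        # Forward to chat, SMS, or any other channel here.
        print(summarize_alert(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block and handle webhook calls until interrupted."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Calling serve() starts the receiver; the webhook URL configured in the notification channel would then point at this host and port.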

Origin blog.csdn.net/Forbidden_City/article/details/132713385