Prometheus is an all-rounder with native support for container monitoring. Monitoring traditional applications is no small task either, so it supports both containerized and non-containerized workloads. Like every monitoring system, it follows the pipeline: data collection → data processing → data storage → data display → alerting.
Prometheus features
- Multidimensional data model: time series data identified by a metric name and key-value pairs (labels)
- PromQL: a flexible query language that can express complex queries over the multi-dimensional data
- No reliance on distributed storage; a single server node works on its own
- Collects time series data over HTTP using a pull model
- Pushing time series data is supported through the Pushgateway component
- Targets are discovered via service discovery or static configuration
- Multiple graphing modes and dashboard support (e.g. Grafana)
Prometheus components and architecture
Component | Description |
---|---|
Prometheus Server | Scrapes metrics, stores the time series data, and provides a query interface |
Pushgateway | Short-term holding point for metric data, mainly used by short-lived jobs |
Exporters | Collect monitoring metrics from existing third-party services and expose them in Prometheus format |
Alertmanager | Handles alerting |
Web UI | A simple built-in web console |
Together these components cover the whole pipeline: data collection, processing, storage, display, and alerting.
Data model
Prometheus stores all data as time series; samples with the same metric name and the same set of labels belong to the same series. After Prometheus pulls data from a target, it stores it in its built-in TSDB as time series data. Every stored series has a metric name. For example, if you are monitoring an nginx instance, you must first give the metric a name, and it will also carry any number of labels. You can think of the metric name as a table name and the labels as its columns. Each time series is therefore uniquely identified by its metric name and a set of key-value pairs (also known as labels).
The format of a time series looks like this:
<metric name>{<label name>=<label value>, …}
Here the metric name is the name of the metric, and each label name is a label; there can be multiple labels. For example, in
jvm_memory_max_bytes{area="heap",id="Eden Space",}
the metric name is jvm_memory_max_bytes, followed by two labels and their corresponding values. You can of course add further labels; the more labels you specify, the more dimensions you can query by.
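As an illustration, label matchers in PromQL let you filter along exactly those dimensions. A short sketch (the `id` values here are only examples of memory pool names and may differ on your JVM):

```promql
# exact match on one label
jvm_memory_max_bytes{area="heap"}

# regex match combining several label values
jvm_memory_max_bytes{area="heap", id=~"Eden Space|Survivor Space"}
```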
Metric types
Type | Description |
---|---|
Counter | A monotonically increasing counter, e.g. the number of requests to an interface |
Gauge | A value that can go up or down arbitrarily, e.g. CPU usage |
Histogram | Samples observations and counts them in configurable buckets, e.g. request latencies |
Summary | Similar to the Histogram type, but quantiles are computed on the client side |
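Counters are rarely graphed directly; instead you look at their per-second rate of increase (what PromQL's `rate()` computes). A minimal Python sketch of the idea, much simplified compared with Prometheus's real implementation (which uses every sample in the window and extrapolates to its boundaries), including how a counter reset is handled:

```python
# Sketch: per-second increase of a monotonically increasing counter,
# given samples as (timestamp_seconds, value) pairs.
def simple_rate(samples):
    """Per-second increase between the first and last sample, reset-aware."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    # If the value went down, the counter was reset (e.g. the process
    # restarted), so everything since the reset counts from zero.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)

print(simple_rate([(0, 100), (60, 160)]))  # → 1.0 (requests/sec)
```

The reset handling is why Counters must only ever increase: a decrease is interpreted as a restart, not as negative traffic.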
Jobs and instances
An instance is a single target that Prometheus can scrape; it appears in the Prometheus configuration file. A job is a collection of instances serving the same purpose — you can think of it as a group. For example, the multiple instance machines of an order service can be placed in one job, and that job scrapes all of those instance targets.
Prometheus deployment
Install with Docker: create a new directory docker-monitor and create the file docker-compose.yml inside it with the following content:
version: "3"
services:
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
      - ./prometheus/:/etc/prometheus/  # mount the Prometheus configuration directory
      - /etc/localtime:/etc/localtime:ro  # sync container time with the host; this matters — if the clocks differ, Prometheus may fail to scrape data
    ports:
      - '9090:9090'
Monitor web application performance metrics
Add a new prometheus directory under the docker-monitor directory, and create a prometheus configuration file prometheus.yml in it, with the following content:
global:  # global configuration
  scrape_interval: 15s  # default interval between scrapes
scrape_configs:  # scrape job configuration
  - job_name: 'mall-order'  # job that scrapes the order service's metrics; one job can list multiple targets, e.g. several order-service instances
    scrape_interval: 10s  # scrape every 10s
    metrics_path: '/actuator/prometheus'  # URL path to scrape
    static_configs:
      - targets: ['192.168.31.60:8844']  # server address to scrape
        labels:
          application: 'mall-order-label'  # label attached to this scrape job
      #- targets: ['192.168.31.60:8844']  # add further target groups below in the same way
      #  labels:
      #    application: 'mall-order-label'
  - job_name: 'prometheus'  # job that scrapes Prometheus's own metrics
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
Execute the following command in the docker-monitor directory to start Prometheus:
docker-compose up -d
Visit prometheus in the browser: http://192.168.31.60:9090, as shown in the figure below:
Click the Status drop-down, select Targets, the interface is as follows:
This shows the two scrape jobs configured in Prometheus, but the mall-order job has failed and its state is DOWN. Next we need to configure the mall-order service so that Prometheus can scrape its data.
First, add the following pom dependencies to the tulingmall-order service:
<!-- enable Spring Boot application monitoring -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- add the Prometheus integration -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
You also need to add configuration to the mall-order service's configuration file to expose the actuator monitoring endpoints, as follows:
management:  # expose the actuator monitoring endpoints
  endpoints:
    web:
      exposure:
        include: '*'
  endpoint:
    prometheus:
      enabled: true
    health:
      show-details: always
Restart the mall-order service and refresh the prometheus page, as shown below:
Click the prometheus link under mall-order (http://192.168.31.60:8844/actuator/prometheus) to open the performance metrics the order service now exposes, as shown below:
Take one metric as an example: jvm_threads_states_threads{state="runnable",} 13.0 means that the jvm_threads_states_threads metric, for the series where state equals runnable, currently has the value 13. Click the Graph link on the Prometheus page to enter the metric query page, as follows:
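The exposition format shown above is plain text and straightforward to parse. A minimal Python sketch that parses one such line (it assumes simple label values with no escaped quotes, unlike a full parser):

```python
import re

# One line of the Prometheus text format: name{label="value",...} value
LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                  r'\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')

def parse_metric(line):
    """Return (metric_name, labels_dict, sample_value) for one line."""
    m = LINE.match(line)
    labels = dict(re.findall(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"',
                             m.group('labels')))
    return m.group('name'), labels, float(m.group('value'))

print(parse_metric('jvm_threads_states_threads{state="runnable",} 13.0'))
# → ('jvm_threads_states_threads', {'state': 'runnable'}, 13.0)
```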
Enter the metric name into the query box and click the Execute button, as follows:
Click the Graph tab under the Execute button to view the chart corresponding to the metric, as follows:
The above is Prometheus's built-in query interface, but it is quite basic; in practice, we usually pair Prometheus with the Grafana visualization tool.
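A couple of queries worth trying in this box before moving on (the metric names come from the actuator output scraped above; exact names can differ between Micrometer versions):

```promql
# current number of runnable JVM threads
jvm_threads_states_threads{state="runnable"}

# per-second HTTP request rate over the last minute
rate(http_server_requests_seconds_count[1m])
```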
Grafana deployment
First install Grafana with Docker: add the Grafana configuration to the docker-compose.yml file above, as shown below:
version: "3"
services:
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
      - ./prometheus/:/etc/prometheus/  # mount the Prometheus configuration directory
      - /etc/localtime:/etc/localtime:ro  # sync container time with the host; this matters — if the clocks differ, Prometheus may fail to scrape data
    ports:
      - '9090:9090'
  grafana:
    image: grafana/grafana:5.2.4
    container_name: 'grafana'
    ports:
      - '3000:3000'
    volumes:
      - ./grafana/config/grafana.ini:/etc/grafana/grafana.ini  # Grafana alert e-mail configuration
      - ./grafana/provisioning/:/etc/grafana/provisioning/  # provisions the Prometheus data source for Grafana
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - ./grafana/config.monitoring  # Grafana login configuration
    depends_on:
      - prometheus  # Grafana must start after Prometheus
Add a new grafana directory under the docker-monitor directory and create the file config.monitoring in it with the following content:
# password for the Grafana admin UI; the username is admin
GF_SECURITY_ADMIN_PASSWORD=password
# whether the Grafana UI allows sign-up; disabled by default
GF_USERS_ALLOW_SIGN_UP=false
Create the provisioning directory in the grafana directory, create the datasources directory in it, and create a new file datasource.yml in the datasources directory with the following content:
# config file version
apiVersion: 1
deleteDatasources:  # first delete any existing data source named Prometheus with orgId 1
  - name: Prometheus
    orgId: 1
datasources:  # configure the Prometheus data source
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090  # inside the same docker compose network, the prometheus service name can be used directly
    basicAuth: false
    isDefault: true
    version: 1
    editable: true
Create the directory config in the grafana directory, and create the file grafana.ini in it with the following content:
#################################### SMTP / Emailing ##########################
# mail server configuration
[smtp]
enabled = true
# outgoing mail server
host = smtp.qq.com:465
# SMTP account
user = 135*****[email protected]
# SMTP authorization code
password = fyjucfwgwjadgfdj
# sender address
from_address = 135*****[email protected]
# sender name
from_name = yuyang
The authorization code is obtained from the QQ mailbox settings.
Start Grafana with docker compose and visit the Grafana page at http://192.168.31.60:3000; the username is admin and the password is password. After logging in, you land on the home page. Click the plus sign on the left and import the dashboard file web-dashboard.json prepared in advance (the file is in the course materials for this lesson; it contains common operations metrics, and ready-made dashboards can also be found online). After importing web-dashboard.json, select Prometheus as the data source on the page and click the Import button. The page then displays as follows (there may not be any data yet):
Let's write an example of a monitoring alert: if the system's 5XX error rate reaches a certain level, send an e-mail notification.
Click the Errors panel and select Edit to enter the detailed panel of the Errors indicator, as follows:
Click on the picture below to add a new alarm channel:
Then select the e-mail alert channel, or choose the webhook method to configure an HTTP callback interface for alert notification, which can indirectly implement any notification method, as follows:
Finally, click the save button. Then enter the Errors detail page and configure the alert; the places that need to be configured are shown in the figure:
The alarm email is as follows: