Prometheus & Grafana monitoring

Table of contents

1. Introduction to Prometheus
    1.1 What is Prometheus?
    1.2 Features of Prometheus
        Easy to manage
        Monitor the internal running status of the service
        Powerful data model
        Powerful query language PromQL
        Efficient
        Scalable
        Easy to integrate
        Visualization
        Openness
    1.3 Prometheus architecture
        1.3.1 Prometheus ecosystem components
        1.3.2 Architecture understanding
            1. Storage computing layer
            2. Collection layer
            3. Application layer
2. Installation of Prometheus
    2.1 Install Prometheus Server
        1. Upload the installation package
        2. Unzip the installation package
        3. Modify the directory name
        4. Modify the configuration file prometheus.yml
    2.2 Install Pushgateway
        1. Upload the Pushgateway installation package
        2. Unzip the installation package
    2.3 Install Node Exporter (optional)
        1. Upload the installation package
        2. Unzip the installation package
    2.4 Start Prometheus Server, Pushgateway and Alertmanager
        1. Execute the startup command in the Prometheus Server directory
        2. Execute the startup command in the Pushgateway directory
        3. Start in the Alertmanager directory
    2.5 Open the web page to view
3. Introduction to PromQL
    3.1 Query time series
4. Integration of Prometheus and Grafana
    4.1 Upload and decompress Grafana (Method 1)
    4.2 Install Grafana using docker (method 2)
        1. Pull the image
        2. Start the container
        3. View running containers
    4.3 Access Grafana on port 3000; the default username and password are both admin
    4.4 Add data source Prometheus
    4.5 Manually add dashboard

1. Introduction to Prometheus

1.1 What is Prometheus?

        Prometheus was inspired by Google's Borgmon monitoring system (similarly, Kubernetes evolved from Google's Borg system). It was created in 2012 as open source software by former Google engineers working at SoundCloud, and an early version was released to the public in early 2015. In May 2016 it became the second project, after Kubernetes, to officially join the CNCF, and version 1.0 was released in July of the same year. Version 2.0, built on a new storage layer, was released at the end of 2017 and works better with container platforms and cloud platforms. Prometheus is a new-generation, cloud-native monitoring system. Currently, more than 650 contributors have taken part in its development, and there are more than 120 third-party integrations.

1.2 Features of Prometheus

        Prometheus is a complete open source monitoring solution. It breaks with the check-and-alert model of traditional monitoring systems and replaces it with a model based on centralized rule evaluation, unified analysis, and alerting. Compared with traditional monitoring systems, Prometheus has the following advantages:

  • Easy to manage:

  1. The core of Prometheus is a single binary with no third-party dependencies (database, cache, and so on). All it needs is a local disk, so there is no risk of cascading failures.
  2. Prometheus is built on a pull model, so a monitoring system can be set up anywhere (local computer, development environment, test environment).
  3. For more complex setups, Prometheus service discovery can manage monitoring targets dynamically.
  • Monitor the internal running status of the service:

  1. Prometheus encourages users to monitor the internal state of their services. With its rich set of client libraries, users can easily add Prometheus support to their applications and see the real running state of their services and applications.
  • Powerful data model:

  1. All collected monitoring data is stored as metrics in the built-in time series database (TSDB). Besides the metric name, every sample carries a set of labels that describe its characteristics, for example:
http_request_status{code='200',content_path='/api/path',environment='production'} => 
[value1@timestamp1,value2@timestamp2...] 

http_request_status{code='200',content_path='/api/path2',environment='production'} => 
[value1@timestamp1,value2@timestamp2...]

 Each time series is uniquely identified by a metric name and a set of labels, and it stores a sequence of sample values in chronological order.

  1. http_request_status: the metric name.
  2. {code='200',content_path='/api/path',environment='production'}: the labels, which represent dimensions. Based on these labels we can easily aggregate, filter, and slice monitoring data.
  3. [value1@timestamp1,value2@timestamp2...]: the sample values, stored in time order.
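To make the identity rule concrete, here is a minimal Python sketch of an in-memory store keyed by metric name plus label set. It is an illustration only, not how the Prometheus TSDB is implemented; the names `series_key`, `store`, and `append`, as well as the sample values, are hypothetical.

```python
from collections import defaultdict

def series_key(name, labels):
    """Build a hashable identity from a metric name and its labels."""
    # Sorting makes the key independent of the order the labels are given in.
    return (name, tuple(sorted(labels.items())))

# Hypothetical in-memory store: series identity -> list of (timestamp, value).
store = defaultdict(list)

def append(name, labels, timestamp, value):
    """Append one sample to the series identified by name + labels."""
    store[series_key(name, labels)].append((timestamp, value))

# Same name and labels (in any order) land in the same series.
append("http_request_status", {"code": "200", "content_path": "/api/path"}, 1000, 3.0)
append("http_request_status", {"content_path": "/api/path", "code": "200"}, 1015, 4.0)
# A different label value creates a separate series.
append("http_request_status", {"code": "200", "content_path": "/api/path2"}, 1000, 7.0)

for key, samples in store.items():
    print(key, samples)
```

Changing any label value produces a new key, which is why label sets with many distinct values multiply the number of stored series.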
  • Powerful query language PromQL

        Prometheus has a powerful built-in query language, PromQL, for querying and aggregating monitoring data. PromQL is also used for data visualization (for example in Grafana) and for alerting. Questions such as the following can be answered easily with PromQL:

  1. What was the 95th-percentile application latency over a recent period?
  2. What will disk space usage be in 4 hours?
  3. Which 5 services have the highest CPU usage?
  • Efficient

        For a monitoring system, a large number of monitoring tasks inevitably produces a large amount of data, and Prometheus handles this data efficiently. A single Prometheus Server instance can handle:

  1. Millions of metrics
  2. Hundreds of thousands of data points per second
  • Scalable

        An independent Prometheus Server can be run for each data center and each team. Federation support lets multiple Prometheus instances form one logical cluster. When a single Prometheus Server instance is overloaded, it can be scaled out with functional partitioning (sharding) plus federation.

  • Easy to integrate

        Monitoring services can be built quickly with Prometheus and integrated easily into applications. Client SDKs are currently available for Java, JMX, Python, Go, Ruby, .NET, Node.js, and others. With these SDKs, applications can be brought under Prometheus monitoring quickly, or you can develop your own data collection programs. The monitoring data collected by these clients can be consumed not only by Prometheus but also by other monitoring tools such as Graphite. Prometheus also supports integration with other monitoring systems: Graphite, StatsD, collectd, Scollector, Munin, Nagios, and so on. In addition, the Prometheus community provides data collection support for a large number of third-party systems: JMX, CloudWatch, EC2, MySQL, PostgreSQL, Haskell, Bash, SNMP, Consul, HAProxy, Mesos, Bind, CouchDB, Django, Memcached, RabbitMQ, Redis, RethinkDB, Rsyslog, and so on.

  • Visualization

  1. The UI built into Prometheus Server makes it easy to query data directly and to display it as graphs. Prometheus also provides PromDash, a standalone dashboard solution based on Ruby on Rails.
  2. Grafana provides complete Prometheus support, and better-looking monitoring charts can be built with it.
  3. Based on the API provided by Prometheus, you can also implement your own monitoring visualization UI.
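As a rough illustration of building on that API, the sketch below assembles the URL for an instant query against the `/api/v1/query` endpoint of the Prometheus HTTP API. The helper name `instant_query_url` and the server address `http://localhost:9090` are assumptions for this example; the request is only built, not sent.

```python
from urllib.parse import urlencode

def instant_query_url(base_url, promql, time=None):
    """Return the URL for an instant query against /api/v1/query."""
    params = {"query": promql}
    if time is not None:
        params["time"] = str(time)  # evaluation timestamp in Unix seconds
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"

# Assumed local server address; adjust to your own host and port.
url = instant_query_url("http://localhost:9090", "up")
print(url)
```

A custom UI would fetch this URL and read the series from the JSON response; `urlencode` takes care of escaping characters such as braces and quotes in PromQL expressions.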
  • Openness

        Normally, monitoring an application requires the application to support the protocol of a particular monitoring system, which ties the application to that system. To loosen this coupling, decision makers can either build support for the monitoring system directly into the application or run a separate external service that adapts to different monitoring systems. With Prometheus, the client libraries can output data not only in the Prometheus format but also in formats supported by other monitoring systems, such as Graphite. You can therefore use a Prometheus client library to add monitoring data collection to an application even if you do not use Prometheus itself.

1.3 Prometheus architecture

1.3.1 Prometheus ecosystem components

  • Prometheus Server: the main server, responsible for collecting and storing time series data
  • Client libraries: instrument application code, embedding metrics in the monitored application
  • Pushgateway: a push gateway that supports short-lived jobs
  • Exporters: data collection components developed for specific applications, such as HAProxy, StatsD, and Graphite
  • Alertmanager: the component dedicated to handling alerts

1.3.2 Architecture understanding

Since Prometheus is designed around a dimensional storage model, it can be thought of as an OLAP system.

1. Storage computing layer

  • Prometheus Server contains the storage engine and the computing engine.
  • The Retrieval component actively pulls metric data from Pushgateway or exporters.
  • Service discovery dynamically discovers the targets to be monitored.
  • TSDB is the core data store and query engine.
  • The HTTP server exposes HTTP services to the outside world.

2. Collection layer

The collection layer handles two kinds of jobs: short-lived jobs and long-lived jobs.

  • Short-lived jobs: push their metrics to Pushgateway directly through its API when they exit.
  • Long-lived jobs: the Retrieval component pulls data directly from the job or from an exporter.
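The push path for a short-lived job can be sketched with the Python standard library. The example below builds, but does not send, an HTTP PUT to the Pushgateway `/metrics/job/<job>` endpoint with a body in the text exposition format. The gateway address, the job name `nightly_backup`, and the metric names are all hypothetical.

```python
import urllib.request

def build_push_request(gateway, job, metrics):
    """Build (but do not send) a request that replaces the metrics of `job`."""
    # Text exposition format: one "name value" pair per line, each ending in a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return urllib.request.Request(
        url=f"{gateway.rstrip('/')}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="PUT",  # PUT replaces every metric previously pushed for this job
    )

# Hypothetical batch job reporting at exit; the gateway address is an assumption.
req = build_push_request(
    "http://localhost:9091",
    "nightly_backup",
    {"backup_duration_seconds": 12.5, "backup_files_total": 340},
)
print(req.get_method(), req.full_url)
print(req.data.decode())
# To actually push the metrics: urllib.request.urlopen(req, timeout=5)
```

Prometheus then scrapes Pushgateway like any other target, so the job itself does not need to stay alive until the next scrape.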

3. Application layer

The application layer has two main parts: Alertmanager and data visualization.

  • Alertmanager can integrate with PagerDuty, a paid monitoring and alerting service. This makes escalation possible: send an SMS alert, place a phone call if nobody acknowledges within 5 minutes, and notify the on-duty manager if there is still no acknowledgement. Alertmanager can also send alerts by email.
  • Data visualization:

        Prometheus built-in web UI

        Grafana

        Other clients developed on top of the API

2. Installation of Prometheus

Official website: https://prometheus.io/

Download address: https://prometheus.io/download

2.1 Install Prometheus Server

        Prometheus is written in Go, and the compiled package has no third-party dependencies. You only need to download the binary package for your platform, extract it, and add a basic configuration to start Prometheus Server.

1. Upload the installation package

        Upload prometheus-2.29.1.linux-amd64.tar.gz to the /opt/software directory of the virtual machine

2. Unzip the installation package

        Unzip it to the /opt/module directory

tar -zxvf prometheus-2.29.1.linux-amd64.tar.gz -C /opt/module

3. Modify the directory name

cd /opt/module

mv prometheus-2.29.1.linux-amd64 prometheus-2.29.1

4. Modify the configuration file prometheus.yml

[root@VM-12-8-centos prometheus-2.29.1]#  vim prometheus.yml

Add the following under the scrape_configs configuration item:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  # Add the Pushgateway scrape configuration
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['<ip-of-service-to-monitor>:<port>']
        labels:
          instance: pushgateway

  # Add the Node Exporter scrape configuration
  - job_name: 'node exporter'
    static_configs:
      - targets: ['<ip-of-service-to-monitor>:<port>']

Pay close attention to the YAML indentation. Because I am testing on a single machine, every target uses the local IP. 9090 is the default port of Prometheus; the later sections of this article use 9091 for Pushgateway and 9100 for Node Exporter.

Configuration instructions:

1. global configuration block: controls the global configuration of the Prometheus server

  • scrape_interval: how often targets are scraped; the default is 1 minute.
  • evaluation_interval: how often rules are evaluated (to generate alerts); the default is 1 minute.

2. rule_files configuration block: the rule files to load

3. scrape_configs configuration block: the scrape targets, that is, what Prometheus monitors. Prometheus exposes its own runtime information over HTTP, so it can monitor itself.

  • job_name: the name of the monitoring job.
  • static_configs: a static target configuration, meaning data is pulled from fixed targets.
  • targets: the monitoring targets, that is, where to pull data from. With the configuration above, Prometheus pulls its own data from http://localhost:9090/metrics. Prometheus can also reload its configuration at runtime over HTTP; to enable this, start it with --web.enable-lifecycle.
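When the server is started with `--web.enable-lifecycle`, a reload is triggered by sending an HTTP POST to its `/-/reload` endpoint. The following Python sketch builds that request without sending it; the helper name `build_reload_request` and the address `http://localhost:9090` are assumptions for this example.

```python
import urllib.request

def build_reload_request(base_url):
    """Build a POST to the lifecycle reload endpoint (not sent here)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/-/reload",
        data=b"",       # empty body; the endpoint only needs the POST itself
        method="POST",
    )

req = build_reload_request("http://localhost:9090")
print(req.get_method(), req.full_url)
# After editing prometheus.yml: urllib.request.urlopen(req, timeout=5)
```

If the flag was not passed at startup, the server rejects this request, so an error here usually means the flag is missing rather than that the configuration is invalid.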

2.2 Install Pushgateway

        Normally, Prometheus uses the pull model to fetch monitoring data from jobs or exporters that expose metrics (such as Node Exporter, which monitors hosts). What we want to monitor here, however, is a Flink on YARN job, and it is difficult for Prometheus to discover when such jobs are submitted and finished and to pull their data automatically. Pushgateway is an intermediary component: the Flink on YARN job is configured to push its metrics to Pushgateway, and Prometheus then pulls them from Pushgateway.

1. Upload the Pushgateway installation package

        Upload pushgateway-1.4.1.linux-amd64.tar.gz to the /opt/software directory of the virtual machine

2. Unzip the installation package

        Unzip it to the /opt/module directory

tar -zxvf pushgateway-1.4.1.linux-amd64.tar.gz -C /opt/module

        Modify directory name

cd /opt/module

mv pushgateway-1.4.1.linux-amd64 pushgateway-1.4.1

2.3 Install Node Exporter (optional)

        In the Prometheus architecture, Prometheus Server is mainly responsible for scraping and storing data and for serving queries, while the monitoring samples themselves are produced by exporters. To monitor something, such as the CPU usage of a host, we therefore need an exporter. Prometheus periodically pulls monitoring samples from the HTTP endpoint that the exporter exposes (usually /metrics).

         An exporter is a fairly loose concept. It can be a standalone program that runs separately from the monitored target, or it can be built directly into the target, as long as it provides monitoring samples to Prometheus in the standard format. To collect host metrics such as CPU, memory, and disk usage, we can use Node Exporter. Node Exporter is also written in Go and has no third-party dependencies; it only needs to be downloaded and extracted to run. The latest Node Exporter binary package can be obtained from https://prometheus.io/download/.

1. Upload the installation package

        Upload node_exporter-1.2.2.linux-amd64.tar.gz to the /opt/software directory of the virtual machine

2. Unzip the installation package

        Unzip it to the /opt/module directory

tar -zxvf node_exporter-1.2.2.linux-amd64.tar.gz -C /opt/module

        Modify directory name

cd /opt/module

mv node_exporter-1.2.2.linux-amd64 node_exporter-1.2.2

        Start Node Exporter and check the page to confirm that it is running

        Run ./node_exporter in /opt/module/node_exporter-1.2.2

        In a browser, open http://xxx.xxx.xxx.xxx:9100/metrics to see all the monitoring data that Node Exporter collects for the current host
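The page served at /metrics is plain text, with one sample per line plus `# HELP` and `# TYPE` comment lines. The Python sketch below parses a small hand-written excerpt in that format. It is deliberately simplified: it assumes no trailing timestamps on the lines, and the sample values are illustrative rather than real host data.

```python
def parse_metrics(text):
    """Parse simple `name{labels} value` lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # HELP/TYPE comments carry no samples
            continue
        # The value is the last space-separated field (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples

# Hand-written excerpt in the exposition format; the values are made up.
sample = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

for series, value in parse_metrics(sample).items():
    print(series, value)
```

For anything beyond a quick check, a maintained parser is a better choice, because the real format also allows timestamps and escaped characters in label values.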

 2.4 Start Prometheus Server, Pushgateway and Alertmanager

1. Execute the startup command in the Prometheus Server directory

nohup ./prometheus --config.file=prometheus.yml > ./prometheus.log 2>&1 &

2. Execute the startup command in the Pushgateway directory

nohup ./pushgateway --web.listen-address :9091 > ./pushgateway.log 2>&1 &

3. Start in the Alertmanager directory

nohup ./alertmanager --config.file=alertmanager.yml > ./alertmanager.log 2>&1 &

2.5 Open the web page to view

  • In a browser, open http://xxx.xxx.xxx.xxx:9090/
  • Click Status and select Targets.

  • If prometheus, pushgateway and node exporter are all UP, the services were installed and started successfully.
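The same check can be scripted against the `/api/v1/targets` endpoint of the Prometheus HTTP API, which reports each active target with a `health` field. The sketch below works on a trimmed, hand-written response of that shape instead of calling a live server; the function name `unhealthy_targets` and the instance addresses are hypothetical.

```python
import json

def unhealthy_targets(payload):
    """Return (job, instance, health) for every active target that is not up."""
    targets = json.loads(payload)["data"]["activeTargets"]
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t["health"])
        for t in targets
        if t["health"] != "up"
    ]

# Trimmed, hand-written response in the shape returned by GET /api/v1/targets.
payload = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "prometheus", "instance": "localhost:9090"}, "health": "up"},
        {"labels": {"job": "pushgateway", "instance": "pushgateway"}, "health": "up"},
        {"labels": {"job": "node exporter", "instance": "localhost:9100"}, "health": "down"},
    ]},
})
print(unhealthy_targets(payload))
```

An empty list corresponds to the all-UP state described above.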

3. Introduction to PromQL

Prometheus uniquely identifies a time series by a metric name and a corresponding set of labels. The metric name gives the basic identity of the monitored sample, and the labels add feature dimensions to the collected data on top of that identity. Users can filter, aggregate, and count along these dimensions to produce new, computed time series. PromQL is the query language built into Prometheus. It supports rich querying, aggregation, and logical operations on time series data, and it is used throughout everyday Prometheus work, including data queries, visualization, and alerting. PromQL is the foundation of every Prometheus use case, so understanding it is the first lesson in getting started with Prometheus.

3.1  Query time series

        After Prometheus has collected sample data through an exporter, we can query it with PromQL. Querying by metric name alone returns every time series with that name. For example:

prometheus_http_requests_total

is equivalent to:

prometheus_http_requests_total{}

This expression returns all time series whose metric name is prometheus_http_requests_total.

Use prometheus_http_requests_total{}[5m] to return all samples from the last 5 minutes for the matched time series.

PromQL will not be described in detail here.
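To give a feel for what the [5m] range selector does, here is a small Python sketch that keeps only the samples inside a time window ending at the evaluation time. It models the idea only: the function name `range_window`, the sample data, and the choice to include both window boundaries are assumptions of this sketch, not a statement about how Prometheus treats the boundaries.

```python
def range_window(samples, now, window_seconds):
    """Keep samples whose timestamp lies within `window_seconds` before `now`."""
    start = now - window_seconds
    return [(ts, value) for ts, value in samples if start <= ts <= now]

# Hypothetical counter scraped every 60 seconds (timestamps in seconds).
samples = [(t, 100 + t // 60) for t in range(0, 601, 60)]

# The equivalent of a [5m] selector evaluated at t=600.
recent = range_window(samples, now=600, window_seconds=300)
print(recent)
```

Functions such as rate() operate on a window like this, which is why they take a range selector instead of a bare metric name.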

4. Integration of Prometheus and Grafana

        Grafana is an open source application written in Go, used mainly to visualize large volumes of metric data. It is one of the most popular tools for displaying time series data in network architecture and application analysis, and it supports most commonly used time series databases. Download address: https://grafana.com/grafana/download

4.1 Upload and decompress Grafana (Method 1)

        Upload grafana-enterprise-8.1.2.linux-amd64.tar.gz to the /opt/software/ directory and extract it:

 tar -zxvf grafana-enterprise-8.1.2.linux-amd64.tar.gz -C /opt/module/

        Once Grafana is running, open http://xxx.xxx.xxx.xxx:3000 in a browser. The default username and password are both admin.

4.2 Install Grafana using docker (method 2)

  1. Pull the image

[root@VM-12-8-centos ~]# docker pull grafana/grafana

2. Start the container

docker run -d -p 3000:3000 --name=grafana -v /data/docker/grafana:/var/lib/grafana grafana/grafana

3. View running containers

[root@VM-12-8-centos ~]# docker ps
CONTAINER ID   IMAGE             COMMAND                  CREATED         STATUS         PORTS                                                  NAMES
6f766b468f19   grafana/grafana   "/run.sh"                3 minutes ago   Up 3 seconds   0.0.0.0:3000->3000/tcp, :::3000->3000/tcp              grafana
1f205439cb44   mysql:5.7         "docker-entrypoint.s…"   3 weeks ago     Up 3 weeks     0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   mysql
122ba76732eb   redis             "docker-entrypoint.s…"   3 weeks ago     Up 3 weeks     0.0.0.0:6379->6379/tcp, :::6379->6379/tcp              redis

4.3 Access Grafana on port 3000; the default username and password are both admin

4.4 Add data source Prometheus

Click Configuration, then click Data Sources.

Enter the address and port of the Prometheus server.

Click Save & test to save the data source and test the connection.

4.5 Manually add dashboard

Click the "+" icon in the left sidebar and select Dashboard.

In the new dashboard, click Add an empty panel.

 


The above is a summary of what I learned from Shang Silicon Valley.


Origin blog.csdn.net/weixin_53922163/article/details/126911141