Prometheus principle and secondary development

Prometheus functions, architecture, components, configuration

1 Introduction

1.1 Introduction

Prometheus is an open-source system monitoring and alerting framework inspired by Google's Borgmon. It was created by SoundCloud employees in 2012, officially released in 2015, and joined the Cloud Native Computing Foundation in 2016.

1.2 Monitoring purpose

Trend analysis: continuously collect and aggregate monitoring metrics to analyze trends. For example, use the disk space growth rate to judge when to expand disk capacity.
Comparative analysis: track and compare resource usage or performance data across different versions of the system.
Alerting: detect faults promptly and respond quickly to avoid a major impact on the business.
Fault analysis and location: analyze and troubleshoot problems to find their root cause.
Data visualization: display system running status, resource usage, service status, etc. on dashboards.

1.3 Features

Multidimensional data model: time series are identified by a metric name and key-value pairs (labels).
Flexible and powerful query language: PromQL, which supports mathematical and logical operations on metrics.
Each server node is autonomous; a single-node deployment does not depend on distributed storage.
Data collection is based on HTTP and uses a pull model.
Scrape targets and Alertmanager instances can be configured statically or found through dynamic service discovery.
Pushgateway lets jobs push time series data, which Prometheus Server then scrapes.
A built-in dashboard is available for basic viewing and management.

1.4 Components

Prometheus Server: scrapes and stores time series data, and evaluates alerting rules against it
Client libraries: let applications expose their own metrics
Pushgateway: supports short-lived jobs; a job pushes its data to the Pushgateway, which then exposes it to Prometheus Server
Exporters: expose metrics of third-party services without modifying them (non-intrusive)
Alertmanager: deduplicates, groups, and sends the alerts generated by Prometheus Server

1.5 Architecture

Process: Prometheus discovers scrape targets through service discovery, fetches their metrics over HTTP, and stores them persistently.

It periodically evaluates local rules over the data, producing new (aggregated) time series or alerts.

It sends alerts to Alertmanager, which groups and deduplicates them and sends firing or resolved notifications.

The HTTP API provided by Prometheus Server can be called to obtain the collected data for visualization.
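
To make the pull model concrete, here is a minimal sketch of a scrape target written with the Go standard library only. It renders one counter in the Prometheus text exposition format and serves it on /metrics. The metric name demo_requests_total and the helper names are assumptions made for illustration; a real application would normally use client_golang instead of formatting the text by hand.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// requests counts handled requests. The metric name used below
// (demo_requests_total) is hypothetical.
var requests uint64

// renderMetrics formats one counter in the text exposition format.
func renderMetrics(total uint64) string {
	return fmt.Sprintf("# HELP demo_requests_total Total number of handled requests.\n"+
		"# TYPE demo_requests_total counter\n"+
		"demo_requests_total %d\n", total)
}

// newMux wires an application endpoint and the /metrics endpoint that
// Prometheus would scrape.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddUint64(&requests, 1)
		fmt.Fprintln(w, "hello")
	})
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(atomic.LoadUint64(&requests)))
	})
	return mux
}

func main() {
	// Print what a scrape would return. To serve it for real, pass
	// newMux() to http.ListenAndServe on a port of your choice.
	_ = newMux()
	fmt.Print(renderMetrics(3))
}
```

Prometheus would then be pointed at this endpoint as a scrape target and would pull the text on every scrape interval.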

1.6 Usage scenarios

Prometheus can record any purely numeric time series.

It is commonly used for machine-centric monitoring and for monitoring highly dynamic service-oriented architectures.
It is not suitable when 100% accuracy is required, such as per-request billing.

Prometheus environment construction and use

2 Deployment and installation

Components to set up: Prometheus Server, Alertmanager, node_exporter, mysqld_exporter

Run them under a process manager such as systemd or supervisord


3 Basic concepts

3.1 Data model

Prometheus stores all data as time series, and also generates temporary time series for query results.
Each time series is uniquely identified by its metric name and its set of label key-value pairs.
Format: <metric name>{<label name>=<label value>, ...} <sample value> [<timestamp in milliseconds>]
Metric name: must match [a-zA-Z_:][a-zA-Z0-9_:]*
Label name: must match [a-zA-Z_][a-zA-Z0-9_]*
Label value: any Unicode string
Sample value: float64
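
The naming rules above can be checked with regular expressions. The following is a small sketch using Go's regexp package; the function names are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns taken from the data model rules above, anchored so that the
// whole string must match.
var (
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validMetricName reports whether name is a legal metric name.
func validMetricName(name string) bool { return metricNameRE.MatchString(name) }

// validLabelName reports whether name is a legal label name.
func validLabelName(name string) bool { return labelNameRE.MatchString(name) }

func main() {
	for _, n := range []string{"http_requests_total", "job:rate5m", "9lives"} {
		fmt.Printf("metric %q valid: %v\n", n, validMetricName(n))
	}
	for _, n := range []string{"instance", "job:x"} {
		fmt.Printf("label %q valid: %v\n", n, validLabelName(n))
	}
}
```

Note that colons are allowed in metric names but not in label names.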

3.2 Metric types

The Prometheus client libraries provide four core metric types. Prometheus Server itself does not use the type information.
Counter: a cumulative metric that only increases; it is reset to zero only when the target restarts.
For example: number of requests served, number of tasks completed, number of errors
Gauge: a measurement whose value can go up and down.
For example: current memory usage, number of concurrent requests
Histogram: samples observations and counts them in configurable buckets.
For example: latency, response size
Exposes per-bucket counts, the sum of all observed values, and the number of observations: <basename>_bucket{le="<upper bound>"}, <basename>_sum, <basename>_count
Summary: similar to a histogram, but computes quantiles on the client side.
For example: latency, response size
Exposes per-quantile values, the sum of all observed values, and the number of observations: <basename>{quantile="<φ>"}, <basename>_sum, <basename>_count
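
The histogram series are cumulative: each _bucket{le="x"} value counts every observation less than or equal to x. The sketch below illustrates that bookkeeping with the standard library only. It is not how client_golang implements histograms, and the type, its methods, and the metric name demo_latency_seconds are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram keeps per-bucket counts plus the running sum and count.
type histogram struct {
	bounds []float64 // upper bounds in ascending order
	counts []uint64  // one slot per bound plus a final +Inf slot
	sum    float64
	count  uint64
}

func newHistogram(bounds []float64) *histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// observe records one value in the first bucket whose bound is >= v.
func (h *histogram) observe(v float64) {
	i := sort.SearchFloat64s(h.bounds, v)
	h.counts[i]++
	h.sum += v
	h.count++
}

// cumulative returns the counts the way the _bucket series expose them.
func (h *histogram) cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

func main() {
	h := newHistogram([]float64{0.1, 0.5, 1})
	for _, v := range []float64{0.05, 0.3, 0.5, 2} {
		h.observe(v)
	}
	cum := h.cumulative()
	for i, b := range h.bounds {
		fmt.Printf("demo_latency_seconds_bucket{le=\"%g\"} %d\n", b, cum[i])
	}
	fmt.Printf("demo_latency_seconds_bucket{le=\"+Inf\"} %d\n", cum[len(cum)-1])
	fmt.Printf("demo_latency_seconds_sum %g\n", h.sum)
	fmt.Printf("demo_latency_seconds_count %d\n", h.count)
}
```

The +Inf bucket always equals the _count series, because every observation falls at or below infinity.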











3.3 Jobs and instances

Instance: an endpoint that is scraped
Job: a set of instances with the same purpose
When scraping, Prometheus automatically attaches the job and instance labels to the time series.

For each instance scrape, Prometheus also generates these time series:
up: whether the instance is reachable (1) or not (0)
scrape_duration_seconds: how long the scrape took
scrape_samples_post_metric_relabeling: number of samples remaining after metric relabeling
scrape_samples_scraped: number of samples the target exposed
scrape_series_added: approximate number of new series in this scrape

4 Configuration

Command-line flags: cannot be changed after startup
Configuration file: specified with the command-line flag --config.file (default prometheus.yml); it can be modified and reloaded through a signal or the API
Signal: kill -s SIGHUP $PID
API: /-/reload, which must be enabled with the command-line flag --web.enable-lifecycle
global: common configuration
Rule file:

Field descriptions

groups: list of rule groups
name: rule group name
interval: how often the rules in the group are evaluated
rules: list of rules

record: a recording rule, which evaluates an expression and stores the result as a new time series

record: metric name of the new time series
expr: the PromQL expression used to query the time series data
labels: labels to attach to the result


alert: an alerting rule, which generates an alert


Example

Scrape target list


Checking the rules file


Reloading the configuration


Alert metrics

5 PromQL

Prometheus provides the PromQL query language for querying and aggregating time series data.
Expression data types:

Instant vector: a set of time series, each with a single sample, all sharing the same timestamp
Range vector: a set of time series, each with a range of samples over time
Scalar: a floating-point number
String: a string value

Instant vector queries
Range vector queries
Offset
Subqueries
Operators
Functions


6 HTTP API

Queries:

Instant query
Range query
Series query
Label name query
Label value query
Scrape target query
Rule query
Alert query
Target metadata query
Metric metadata query
Alertmanager query
Configuration query
Command-line flag query
Runtime information query
Build information query
TSDB status query
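
As an illustration of the instant query, the sketch below builds the request URL for the /api/v1/query endpoint using only the standard library. The function name and the base address http://localhost:9090 are assumptions; the query and time parameters are the ones the endpoint accepts, with time given here as a Unix timestamp in seconds.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// instantQueryURL builds the URL of an instant query evaluated at the
// given Unix time. url.Values takes care of escaping the expression.
func instantQueryURL(baseURL, expr string, unixSeconds int64) string {
	params := url.Values{}
	params.Set("query", expr)
	params.Set("time", strconv.FormatInt(unixSeconds, 10))
	return strings.TrimRight(baseURL, "/") + "/api/v1/query?" + params.Encode()
}

func main() {
	// The base address is an assumption for this sketch.
	fmt.Println(instantQueryURL("http://localhost:9090", "up", 1700000000))
	fmt.Println(instantQueryURL("http://localhost:9090", `rate(http_requests_total[5m])`, 1700000000))
}
```

The resulting URL can be fetched with any HTTP client; the response is JSON with a status field and a data field.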












Admin API: enabled with the command-line flag --web.enable-admin-api
Snapshot: path /api/v1/admin/tsdb/snapshot
request method: POST, PUT
query parameter: skip_head (skip data in the head block)

Delete series: path /api/v1/admin/tsdb/delete_series
request method: POST, PUT
query parameters: match[], start, end
Note: the data is not actually deleted from disk immediately; it is cleaned up later during compaction

Clean tombstones: removes the deleted data from disk


Lifecycle management: reload and quit are enabled with the command-line flag --web.enable-lifecycle
Health check: path /-/healthy
request method: GET
Readiness check: path /-/ready
request method: GET
Reload configuration: path /-/reload
request method: PUT, POST
Quit: path /-/quit
request method: PUT, POST
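
The reload endpoint can be called from Go with the standard HTTP client. The following is a minimal sketch; the function names and the address http://localhost:9090 are assumptions, and the call only succeeds if Prometheus was started with --web.enable-lifecycle.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newReloadRequest builds a POST request for the /-/reload endpoint.
func newReloadRequest(baseURL string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, strings.TrimRight(baseURL, "/")+"/-/reload", nil)
}

// reload sends the request and treats any status other than 200 as an error.
func reload(client *http.Client, baseURL string) error {
	req, err := newReloadRequest(baseURL)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("reload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Only build and print the request here; call
	// reload(http.DefaultClient, ...) against a running server.
	req, err := newReloadRequest("http://localhost:9090/")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```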

7 Federation


8 Alert management

  

Prometheus's main code interpretation and service discovery and control

go-demo/cmdb at main · yunixiangfeng/go-demo · GitHub

server development

prometheus model management

Prometheus node check and delete

Prometheus job addition, deletion, modification and query

Prometheus target addition, deletion, modification and query

prometheus terminal registration

Prometheus configuration acquisition

Promagent framework construction

promagent registration

Prometheus configuration file update

Prometheus client introduction

mysql_exporter development

Alarm notification

Alarm problem handling

CMDB management Prometheus Target

Prometheus target addition, deletion, modification and query

prometheus terminal registration

http://localhost:8080/v1/prometheus/register

Prometheus configuration acquisition

http://localhost:8080/v1/prometheus/config?uuid=xyz

Promagent framework construction

promagent registration

prometheus exporter development

mysql_exporter development

Features for service discovery

Alarm function

exporter collects data

day16

readme

exported_instance

push gateway
prometheus cascading

Rules
    generate alerts
    for generating new time series


Operators => instant vectors

on(url)

request {url=abc, code=200}
request {url=abc, code=400}
request {url=abc1, code=200}
request {url=abc1, code=400}

request {url=abc} 1
request {url=abc} 2
request {url=abc1} 3
request {url=abc1} 4

+

request_total {url=abc} 10
request_total {url=abc1} 11

                on()
left + right group_left()
             group_right()

left join
right join
inner join
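
The join analogy can be sketched in Go. The code below imitates left + on(url) group_left right for a single matching label: each left-hand series is paired with the one right-hand series that has the same url, and the left-hand labels are kept. This is a simplified model for illustration only, not Prometheus's evaluation code; the type and function names are made up.

```go
package main

import "fmt"

// sample is one element of an instant vector.
type sample struct {
	labels map[string]string
	value  float64
}

// addOnGroupLeft adds the matching right-hand value to every left-hand
// sample. Left-hand samples without a match are dropped, like an inner join.
func addOnGroupLeft(left, right []sample, label string) []sample {
	rightByKey := make(map[string]float64, len(right))
	for _, r := range right {
		rightByKey[r.labels[label]] = r.value
	}
	var out []sample
	for _, l := range left {
		rv, ok := rightByKey[l.labels[label]]
		if !ok {
			continue
		}
		out = append(out, sample{labels: l.labels, value: l.value + rv})
	}
	return out
}

func main() {
	// Values follow the request / request_total example above.
	left := []sample{
		{map[string]string{"url": "abc", "code": "200"}, 1},
		{map[string]string{"url": "abc", "code": "400"}, 2},
		{map[string]string{"url": "abc1", "code": "200"}, 3},
		{map[string]string{"url": "abc1", "code": "400"}, 4},
	}
	right := []sample{
		{map[string]string{"url": "abc"}, 10},
		{map[string]string{"url": "abc1"}, 11},
	}
	for _, s := range addOnGroupLeft(left, right, "url") {
		fmt.Println(s.labels, s.value)
	}
}
```

Without group_left, the four-to-two match above would be rejected, because the default matching is one-to-one.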

Aggregation operators
count
sum
min
max
avg
topk
bottomk

to develop


1. Develop applications on top of the Prometheus API, configuration, and PromQL
    Service discovery => CMDB manages the Prometheus target configuration
    Graphical management => visualization
    Alert rule management
    Alertmanager => CMDB
         email
         SMS

    Multiple Prometheus


    Configuration file operation yml, json
    API => http client

    file_sd_configs:


    1. Service Discovery

    a. prometheus CRUD
        , name, ip
    b. job management job CRUD
        , prometheus, jobname, scheme, metrics, basic_auth, tls
    c. target
        target

    DB

    prometheus => config

    Agent

    systemctl reload exec
    http client api

    2. API to obtain data
        display data, graphical, js (js + host resource monitoring)

    3. Exporter
        Agent => prometheus client => statistical
                 collection target for communication (socket)

    4. Alarm => Db storage
              notification SMS => Tencent Cloud


2. Prometheus
    service discovery

Data collection method

Prometheus component diagram

go-demo/client at main · yunixiangfeng/go-demo · GitHub

prometheus client-go

https://github.com/prometheus/client_golang

prometheus package - github.com/prometheus/client_golang/prometheus - Go Packages

prometheus exporter development

https://github.com/yunixiangfeng/go-demo/tree/main/mysql_exporter
 

day17

network request

readme 

1. prometheus
    Node
        => query, delete

        Agent => API register => does not exist to add
                                 exists to update

        => attributes
            uuid
            hostname
            addr https://host:port/
            #username
            #password
            created_at
            updated_at
            deleted_at

    Job
        => CRUD
        => attribute
            task ID [a-zA-Z][0-9a-zA-Z_]
            remarks
            Node node_id
            created_at
            updated_at
            deleted_at

    Target
        => add, delete, modify, check
        name
        remark
        Addr
        Job
        created_at
        updated_at
        deleted_at

Questions:
    1. prometheus.yaml => modify => prometheus reload
        => api => enable lifecycle nginx => req
            Authorization: Basic base64(name:password)
        => systemctl reload =>
            systemctl is required
    2. job: other configuration; target and other settings are added according to your own business needs
    3. Rewrite problem: the file is rewritten every interval; write and reload only when the content has changed
        prometheus:
            job: job1 job2 => no change, no reload
        target:
            no change => do not write to the file

        sort jobs and targets, json.Marshal => []byte => compare => write
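
The "sort, marshal, compare, write" idea can be sketched as follows. Sorting before json.Marshal makes the output deterministic, so the bytes can be compared with the current file content and the write (and reload) skipped when nothing changed. The targetGroup shape follows the JSON format used by file_sd_configs; treating that as the file being written is an assumption, as are the function names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// targetGroup mirrors one entry of a file_sd_configs JSON file.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// canonicalJSON sorts targets and groups so that equal content always
// serializes to equal bytes. Groups are ordered by their "job" label.
func canonicalJSON(groups []targetGroup) ([]byte, error) {
	sorted := make([]targetGroup, len(groups))
	for i, g := range groups {
		targets := make([]string, len(g.Targets))
		copy(targets, g.Targets)
		sort.Strings(targets)
		sorted[i] = targetGroup{Targets: targets, Labels: g.Labels}
	}
	sort.SliceStable(sorted, func(a, b int) bool {
		return sorted[a].Labels["job"] < sorted[b].Labels["job"]
	})
	return json.Marshal(sorted)
}

// changed reports whether the new content differs from the current file
// content and returns the bytes to write.
func changed(current []byte, groups []targetGroup) (bool, []byte, error) {
	next, err := canonicalJSON(groups)
	if err != nil {
		return false, nil, err
	}
	return !bytes.Equal(current, next), next, nil
}

func main() {
	groups := []targetGroup{
		{Targets: []string{"b:9100", "a:9100"}, Labels: map[string]string{"job": "node"}},
		{Targets: []string{"db:9104"}, Labels: map[string]string{"job": "mysql"}},
	}
	_, data, err := changed(nil, groups)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(data))
	again, _, _ := changed(data, groups)
	fmt.Println("needs rewrite:", again)
}
```

encoding/json already writes map keys in sorted order, so only the slices need explicit sorting.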

day 18

1. prometheus client_golang
    client
        a. prometheus client(java, go, php, python)
        b. http exposure => processor
2. mysqld_exporter
3. web basic auth
    a. The server responds that authentication is required => the browser pops up a username and password input box
        401
        WWW-Authenticate: Basic realm="my site"
    b. Authorization: Basic xx => the server reads the header and verifies it
        Basic => xxx => Base64 decode => split into username:password (plain text) => verify

        Username: password => configuration file
            hash => bcrypt/md5
            md5

    cmdb user login
        form
            user/password (plain text)
        db: user/password (bcrypt=>hash)
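
The Basic authentication steps in item 3 can be sketched with the standard library. In a real handler, r.BasicAuth() from net/http already does the parsing; the hand-written version below only makes the steps visible. The function names are made up, and the plain-text comparison stands in for the bcrypt hash check the notes recommend (bcrypt lives in golang.org/x/crypto, outside the standard library).

```go
package main

import (
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"strings"
)

// parseBasicAuth extracts the user name and password from an
// Authorization header value of the form "Basic <base64(user:pass)>".
func parseBasicAuth(header string) (user, pass string, ok bool) {
	const prefix = "Basic "
	if len(header) < len(prefix) || !strings.EqualFold(header[:len(prefix)], prefix) {
		return "", "", false
	}
	decoded, err := base64.StdEncoding.DecodeString(header[len(prefix):])
	if err != nil {
		return "", "", false
	}
	// Split at the first colon only; the password may contain colons.
	i := strings.IndexByte(string(decoded), ':')
	if i < 0 {
		return "", "", false
	}
	return string(decoded[:i]), string(decoded[i+1:]), true
}

// credentialsMatch compares in constant time to avoid leaking how many
// leading characters were correct.
func credentialsMatch(user, pass, wantUser, wantPass string) bool {
	u := subtle.ConstantTimeCompare([]byte(user), []byte(wantUser))
	p := subtle.ConstantTimeCompare([]byte(pass), []byte(wantPass))
	return u&p == 1
}

func main() {
	user, pass, ok := parseBasicAuth("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==")
	fmt.Println(user, pass, ok)
	fmt.Println(credentialsMatch(user, pass, "Aladdin", "open sesame"))
}
```

When the check fails, the handler would answer with status 401 and the WWW-Authenticate header so that the browser prompts again.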

4. Application monitoring
    total number of HTTP requests: Counter
    response time per HTTP URL: Histogram/Summary
    number of occurrences of each HTTP response status code: Counter + labels

    Filter:

5. alertmanager
6. Alert management
    a. Alert reception & storage
        authentication:
            Authorization: Token xxxxxx
            basic auth: Basic xxxx
            bearer auth: Bearer xxxxx
        webhook
            API => alertmanager => json => db

            id => generated from the labels (the same labels generate the same id)
            01:00 => instance 1.1.1.1:9999 offline
                recovered
            10:00 => instance 1.1.1.1:9999 offline => same id

            a => a
            a,b => a,b

        Notification: notifications are sent per group, keyed by groupKey
    b. Alarm query
    c. Page
7. Notification
    a. email
    b. Tencent sms
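
Item 6 requires that the same labels always generate the same alert id. One way to get that property is to hash the labels in sorted order, as in the sketch below. The function name and the choice of SHA-256 are assumptions; Alertmanager also sends its own fingerprint field for each alert, which could be used instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// alertID derives a stable id from a label set. Label names are sorted
// first because Go map iteration order is random.
func alertID(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		// A zero byte separates names and values so that
		// {"a": "bc"} and {"ab": "c"} hash differently.
		h.Write([]byte(name))
		h.Write([]byte{0})
		h.Write([]byte(labels[name]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	first := alertID(map[string]string{"alertname": "InstanceDown", "instance": "1.1.1.1:9999"})
	second := alertID(map[string]string{"instance": "1.1.1.1:9999", "alertname": "InstanceDown"})
	fmt.Println(first == second)
}
```

With such an id, an alert that fires again later for the same instance updates the existing database record instead of creating a new one.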

Development function => meet business needs + technology (go, beego)
           business logic + technology
           1. Repeat manual work (design) => automation (development)
           2. Requirements => requirements analysis => design => development


Monitoring:
1. Availability
2. Latency
    time spent per request
    time spent per operation
3. Error count
4. Capacity
    current number of requests / total number of requests
    current number of connections / total number of connections

What does mysql_exporter monitor?
mysql => exporter =>
    monitored object API => get metric information (computed)
    SQL query =>
                show global status

MySQL availability
    operation failed
        select 1;
        ping
Slow query count: show global status where variable_name='Slow_queries';

capacity 

qps:

counter

show global status where variable_name='Queries';
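
Queries is a cumulative counter, so QPS is the increase between two readings divided by the elapsed time. The sketch below shows that arithmetic, including the case where MySQL restarted and the counter started again from zero. In practice the exporter would expose the raw counter and PromQL's rate() would do this calculation; the function name is made up.

```go
package main

import "fmt"

// perSecondRate returns the average per-second increase of a cumulative
// counter between two readings taken elapsedSeconds apart.
func perSecondRate(prev, cur, elapsedSeconds float64) (float64, bool) {
	if elapsedSeconds <= 0 {
		return 0, false
	}
	delta := cur - prev
	if delta < 0 {
		// The counter went backwards, so the server was restarted.
		// Everything counted since the restart is the increase.
		delta = cur
	}
	return delta / elapsedSeconds, true
}

func main() {
	qps, _ := perSecondRate(1000, 1600, 60)
	fmt.Println("qps:", qps)

	afterRestart, _ := perSecondRate(5000, 120, 60)
	fmt.Println("qps after restart:", afterRestart)
}
```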

tps:

insert, delete, update *

com_insert

com_delete

com_update

com_select

com_replace

connections:

show global status where variable_name='Threads_running';

show global variables where variable_name='max_connections';

Alert => Prometheus Association (Node)

promagent => prometheus.yaml labels => uuid => xx

traffic:

show global status where variable_name='Bytes_received'

show global status where variable_name='Bytes_sent'


json
form  => controller => parseForm/unmarshal => object => server.insert/


                      unmarshal => AlertForm


groupby
alertname

a => a
    labels =>

b => a,b
    a => labels =>
    b =>

Paging
    pageSize => 5
    pageNum
    query condition


    Offset Limit
    pageNum => 1,2,3
    offset = (pageNum - 1) * pageSize
    limit = pageSize

    1 => offset 0 limit 5
    2 => offset 5 limit 5

querytable.SetCond(cond).Offset().Limit()
querytable.SetCond(cond).Count()
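
The paging arithmetic can be written as two small helpers. This is a sketch with made-up function names; the returned offset and page size are what would be passed to Offset() and Limit(), and the row count from Count() feeds totalPages.

```go
package main

import "fmt"

// pageOffset converts a 1-based page number into a row offset.
// Page numbers below 1 are treated as page 1.
func pageOffset(pageNum, pageSize int) int {
	if pageNum < 1 {
		pageNum = 1
	}
	return (pageNum - 1) * pageSize
}

// totalPages returns how many pages are needed for count rows.
func totalPages(count, pageSize int) int {
	if pageSize <= 0 {
		return 0
	}
	return (count + pageSize - 1) / pageSize
}

func main() {
	const pageSize = 5
	for pageNum := 1; pageNum <= 3; pageNum++ {
		fmt.Printf("page %d => offset %d limit %d\n", pageNum, pageOffset(pageNum, pageSize), pageSize)
	}
	fmt.Println("pages for 11 rows:", totalPages(11, pageSize))
}
```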

Page
    datas
    Paging related data page URL parameters

prometheus process

day19

1. Alertmanager alert notification
    a. SMS (Tencent)
    b. email (SMTP)
    alert notification is done per group
    alert processing is done per alert

Mail and SMS sending
    mail:
        SMTP server: send mail over the SMTP protocol
        mail gateway: HTTP API, mail parameters + URL + authentication

        From:
        To:
        cc:
        Subject:
        Content:
        Attachments:

Tencent authorization code:
 

smtp.qq.com


SecretId: <redacted>

SecretKey: <redacted>


API url/request/response
authentication

Request structure
structure name => url
structure attribute => request parameter

http request => http response

response structure


Alert sending:
    1. The API receives the alert information
    2. JSON deserialization => alert group and its alerts
    3. Alert group => send notification
        Send email
            content generation => html/template
            subject => alert name
            recipients => configuration

            Send:
                SMTP server, port, username, password

        Send SMS
            SMS content => a Tencent Cloud SMS template only accepts parameters
            recipients => configuration
            Send:
                SDK call
                address, structure/object, function call
        Recipients
            always notified => configured (operations team)
            business recipients => the owner of the specific service =>
                        managed in notification settings; recipients are filtered by the labels in the alert
    4. Alert information => alert processing
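
Steps 2 and 3 can be sketched with encoding/json and html/template. The struct below covers only a subset of the fields Alertmanager sends to a webhook receiver; the type names, the template, and the sample payload are assumptions made for illustration, and sending the mail itself (for example with net/smtp) is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"html/template"
	"time"
)

// webhookAlert and webhookMessage model part of the webhook payload.
type webhookAlert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    time.Time         `json:"startsAt"`
}

type webhookMessage struct {
	Status      string            `json:"status"`
	GroupKey    string            `json:"groupKey"`
	GroupLabels map[string]string `json:"groupLabels"`
	Alerts      []webhookAlert    `json:"alerts"`
}

// mailBody renders one notification per alert group.
var mailBody = template.Must(template.New("mail").Parse(
	`<h3>[{{.Status}}] {{.GroupLabels.alertname}}</h3><ul>{{range .Alerts}}<li>{{.Labels.instance}}: {{.Annotations.summary}}</li>{{end}}</ul>`))

// parseWebhook deserializes the JSON body posted by Alertmanager.
func parseWebhook(body []byte) (*webhookMessage, error) {
	var msg webhookMessage
	if err := json.Unmarshal(body, &msg); err != nil {
		return nil, err
	}
	return &msg, nil
}

// renderMail returns the subject (the alert name) and the HTML body.
func renderMail(msg *webhookMessage) (subject, body string, err error) {
	var buf bytes.Buffer
	if err := mailBody.Execute(&buf, msg); err != nil {
		return "", "", err
	}
	return msg.GroupLabels["alertname"], buf.String(), nil
}

// sampleWebhook is a hand-written payload for this sketch.
const sampleWebhook = `{"version":"4","groupKey":"{}:{alertname=\"InstanceDown\"}","status":"firing","receiver":"cmdb","groupLabels":{"alertname":"InstanceDown"},"alerts":[{"status":"firing","labels":{"alertname":"InstanceDown","instance":"1.1.1.1:9999"},"annotations":{"summary":"instance offline"},"startsAt":"2023-05-10T01:00:00Z"}]}`

func main() {
	msg, err := parseWebhook([]byte(sampleWebhook))
	if err != nil {
		fmt.Println(err)
		return
	}
	subject, body, err := renderMail(msg)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(subject)
	fmt.Println(body)
}
```

html/template escapes label and annotation values, which matters because they come from monitored systems and end up in an HTML mail.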

template function call = "

Origin blog.csdn.net/niwoxiangyu/article/details/130607161