Prometheus functions, architecture, components, configuration
1 Introduction
1.1 Introduction
Prometheus is an open-source system monitoring and alerting framework inspired by Google's Borgmon. It was created by SoundCloud employees in 2012, publicly released in 2015, and joined the Cloud Native Computing Foundation in 2016.
1.2 Monitoring purpose
Trend analysis: continuously collect monitoring metrics and analyze their trends, for example using the disk space growth rate to decide when to expand disk capacity.
Comparative analysis: track and compare the operating resource usage or performance data of different versions of the system.
Alerting: detect faults as soon as they occur and respond quickly to avoid a major impact on the business.
Fault analysis and location: analyze and troubleshoot problems, and find the root cause of the problem.
Data visualization: Visually display the system running status, resource usage, service status, etc. through the dashboard.
1.3 Features
Multidimensional data model: time series are identified by a metric name and key-value pairs (labels).
Flexible and powerful query language: PromQL, which supports mathematical and logical operations on metrics.
Single-node deployment that does not depend on distributed storage.
Data collection is based on HTTP and uses a pull model.
Scrape targets and Alertmanager targets can be configured statically or found through dynamic service discovery.
Jobs that cannot be scraped can push their time series to the Pushgateway, which Prometheus Server then scrapes.
A built-in web UI is provided for basic viewing and management.
1.4 Components
Prometheus Server: collects and stores time series data and evaluates alerting rules against it
Client libraries: let applications expose their own metrics
Pushgateway: supports short-lived jobs; the job pushes its data to the Pushgateway, which then exposes it to Prometheus Server
Exporters: expose metrics of third-party services without modifying them (non-intrusive)
Alertmanager: deduplicates, groups, and sends the alerts generated by Prometheus Server
1.5 Architecture
Process: Prometheus discovers scrape targets through service discovery,
fetches the metrics of each target over HTTP, and stores them persistently
It regularly evaluates local rules that aggregate data into new time series or generate alerts
Alerts are sent to Alertmanager, which groups and deduplicates them and then sends alert or recovery notifications
The HTTP API provided by Prometheus Server can be called to obtain the collected data for visualization
1.6 Usage scenarios:
Prometheus can record any purely numeric time series
It is commonly used for machine-centric monitoring and for monitoring highly dynamic service architectures
It is not suitable when 100% accuracy is required, such as per-request billing
Prometheus environment construction and use
2 Deployment and installation
Components to set up: Prometheus Server, Alertmanager, node_exporter, mysqld_exporter
Process supervision: systemd or supervisord
3 Basic concepts
3.1 Data model:
Each time series is uniquely identified by its metric name and its set of label key-value pairs.
Prometheus also generates temporary time series for query results.
Format: <metric name>{<label name>=<label value>, ...} <sample value> [<timestamp in milliseconds>]
Metric name: must match [a-zA-Z_:][a-zA-Z0-9_:]*
Label name: must match [a-zA-Z_][a-zA-Z0-9_]*
Label value: any Unicode characters
Sample value: float64
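The naming rules above can be checked mechanically. The following is a minimal sketch using only the Go standard library; the function names validMetricName and validLabelName are made up for this illustration and are not part of any Prometheus package.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the data model rules, anchored to the whole string.
var (
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validMetricName reports whether name is a legal metric name.
func validMetricName(name string) bool { return metricNameRE.MatchString(name) }

// validLabelName reports whether name is a legal label name.
// Unlike metric names, label names may not contain a colon.
func validLabelName(name string) bool { return labelNameRE.MatchString(name) }

func main() {
	for _, n := range []string{"http_requests_total", "node:cpu:rate5m", "9lives", "has-dash"} {
		fmt.Printf("metric %q valid: %v\n", n, validMetricName(n))
	}
	for _, n := range []string{"instance", "status_code", "job:name"} {
		fmt.Printf("label %q valid: %v\n", n, validLabelName(n))
	}
}
```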
3.2 Indicator Types
The Prometheus client libraries provide four core metric types. Prometheus Server does not use the type information.
Counter: a counter
Used for cumulative metrics that only ever increase; the value is reset to zero only when the target restarts.
For example: number of requests served, number of completed tasks, number of errors
Gauge: a gauge
Used for values that can go up and down
For example: current memory usage, number of concurrent requests
Histogram: a histogram
Counts observations in buckets on the client side
For example: latency, response size
Contains the cumulative count of each bucket, the sum of all observed values, and the total number of observations: <basename>_bucket{le=""}, <basename>_sum, <basename>_count
Summary: a summary, similar to a histogram, but the quantiles are computed on the client side
For example: latency, response size
Contains the value of each quantile, the sum of all observed values, and the total number of observations: <basename>{quantile=""}, <basename>_sum, <basename>_count
3.3 Jobs and instances
Instance: an endpoint from which data is scraped
Job: a set of instances with the same purpose
When scraping, Prometheus automatically adds the job and instance labels to the time series:
job
instance
For each scraped instance, Prometheus also generates the following time series:
up: whether the instance is reachable
scrape_duration_seconds: duration of the scrape
scrape_samples_post_metric_relabeling: number of samples remaining after metric relabeling
scrape_samples_scraped: number of samples the target exposed
scrape_series_added: approximate number of new series in this scrape
4 configuration
Command-line flags: cannot be changed after startup
Configuration file: specified by the command-line flag --config.file; the default is prometheus.yml
It can be modified and reloaded through a signal or the API:
signal: kill -s SIGHUP $PID
API: /-/reload, which must be enabled with the command-line flag --web.enable-lifecycle
global: global (shared) configuration
Rule file
Field: description
groups: list of rule groups
name: rule group name
interval: how often the rules in the group are evaluated
rules: list of rules
record: name of the new time series that a recording rule computes and stores
expr: PromQL expression used to query the time series data
labels: labels to attach
alert: name of the alert that an alerting rule generates
Example
Scrape configuration list
Rule file check
Reload configuration
Alert metrics
5 PromQL
Prometheus provides the PromQL query language for querying and aggregating time series data
Expression data types:
Instant vector: a set of time series, each containing a single sample, all at the same timestamp
Range vector: a set of time series, each containing a range of samples over time
Scalar: a floating-point number
String: a string value
Instant vector query
Range vector query
Offset
Subquery
Operators
Functions
6 HTTPAPI
Query:
Instant vector query
Range vector query
Query series, query label names, query label values
Query scrape targets, rules, alerts, target metadata, metric metadata, Alertmanagers, configuration, command-line flags, runtime information, build information, TSDB status
Admin: enabled with the command-line flag --web.enable-admin-api
Snapshot
path: /api/v1/admin/tsdb/snapshot
request method: POST, PUT
query parameter: skip_head (skip the data in the head block)
Delete series
path: /api/v1/admin/tsdb/delete_series
request method: POST, PUT
query parameters: match[], start, end
Note: the data is not actually deleted from disk right away; it is cleaned up later during compaction
Clean tombstones: removes the deleted data from disk
path: /api/v1/admin/tsdb/clean_tombstones
Lifecycle management: reload and quit are enabled with the command-line flag --web.enable-lifecycle
Health check
path: /-/healthy
request method: GET
Readiness check
path: /-/ready
request method: GET
Reload configuration
path: /-/reload
request method: PUT, POST
Quit
path: /-/quit
request method: PUT, POST
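As an illustration of the lifecycle endpoints, the sketch below calls /-/healthy and /-/reload with the Go standard library. To keep it runnable offline it talks to a small stand-in server (fakePrometheus, a made-up helper that only mimics the two paths); against a real server you would pass its base URL instead, and /-/reload only works when --web.enable-lifecycle is set.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthy reports whether GET /-/healthy answers with 200 OK.
func healthy(client *http.Client, baseURL string) (bool, error) {
	resp, err := client.Get(baseURL + "/-/healthy")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

// reload asks the server to re-read its configuration via POST /-/reload.
func reload(client *http.Client, baseURL string) error {
	resp, err := client.Post(baseURL+"/-/reload", "", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("reload failed: %s", resp.Status)
	}
	return nil
}

// fakePrometheus is a stand-in that mimics the two paths used above.
func fakePrometheus() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/-/healthy", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/-/reload", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost && r.Method != http.MethodPut {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakePrometheus()
	defer srv.Close()

	ok, err := healthy(srv.Client(), srv.URL)
	fmt.Println("healthy:", ok, "error:", err)
	fmt.Println("reload error:", reload(srv.Client(), srv.URL))
}
```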
7 Federation
8 Alert management
Prometheus's main code interpretation and service discovery and control
go-demo/cmdb at main · yunixiangfeng/go-demo · GitHub
server development
prometheus model management
Prometheus node check and delete
Prometheus job addition, deletion, modification and query
Prometheus target addition, deletion, modification and query
prometheus terminal registration
Prometheus configuration acquisition
Promagent framework construction
promagent registration
Prometheus configuration file update
Prometheus client introduction
mysql_exporter development
Alarm notification
Alarm problem handling
CMDB management Prometheus Target
Prometheus target addition, deletion, modification and query
prometheus terminal registration
http://localhost:8080/v1/prometheus/register
Prometheus configuration acquisition
http://localhost:8080/v1/prometheus/config?uuid=xyz
Promagent framework construction
promagent registration
prometheus exporter development
mysql_exporter development
Features for service discovery
Alarm function
exporter collects data
day16
readme
exported_instance
push gateway
Prometheus federation (cascading)
Rules
generate alerts
generate new time series
Operators => instant vectors
on(url)
request{url="abc", code="200"} 1
request{url="abc", code="400"} 2
request{url="abc1", code="200"} 3
request{url="abc1", code="400"} 4
+
request_total{url="abc"} 10
request_total{url="abc1"} 11
on()
left + on(url) group_left() right
group_right()
left join
right join
inner join
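The matching in the request / request_total example can be simulated in a few lines of Go. This sketch mimics what `request + on(url) group_left() request_total` computes; it assumes the four request series carry the values 1 to 4 in the order listed, and the sample type and addOnGroupLeft function are invented for the illustration. Like an inner join, left-hand samples without a partner on the right are dropped.

```go
package main

import "fmt"

// sample is one element of an instant vector.
type sample struct {
	labels map[string]string
	value  float64
}

// addOnGroupLeft pairs every left sample with the right sample that has the
// same value for the label key and adds the two values (many-to-one match).
func addOnGroupLeft(left, right []sample, key string) []sample {
	byKey := make(map[string]float64, len(right))
	for _, r := range right {
		byKey[r.labels[key]] = r.value
	}
	var out []sample
	for _, l := range left {
		rv, ok := byKey[l.labels[key]]
		if !ok {
			continue // no partner on the right side: dropped
		}
		out = append(out, sample{labels: l.labels, value: l.value + rv})
	}
	return out
}

func main() {
	request := []sample{
		{map[string]string{"url": "abc", "code": "200"}, 1},
		{map[string]string{"url": "abc", "code": "400"}, 2},
		{map[string]string{"url": "abc1", "code": "200"}, 3},
		{map[string]string{"url": "abc1", "code": "400"}, 4},
	}
	requestTotal := []sample{
		{map[string]string{"url": "abc"}, 10},
		{map[string]string{"url": "abc1"}, 11},
	}
	for _, s := range addOnGroupLeft(request, requestTotal, "url") {
		fmt.Println(s.labels, s.value)
	}
}
```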
Aggregation operators
count
sum
min
max
avg
topk
bottomk
To develop
1. Develop the application: API, configuration, and PromQL
Service discovery based on Prometheus => the CMDB manages the Prometheus target configuration
Graphical management => visualization
Alert rule management
Alertmanager => CMDB
email
SMS
Multiple Prometheus
Configuration file operation yml, json
API => http client
file_sd_configs:
1. Service Discovery
a. prometheus CRUD
, name, ip
b. job management job CRUD
, prometheus, jobname, scheme, metrics, basic_auth, tls
c. target
target
DB
prometheus => config
Agent
systemctl reload exec
http client api
2. API to obtain data
display data, graphical, js (js + host resource monitoring)
3. Exporter
Agent => prometheus client => statistical
collection target for communication (socket)
4. Alarm => Db storage
notification SMS => Tencent Cloud
2. Prometheus
service discovery
Data collection method
Prometheus component diagram
go-demo/client at main · yunixiangfeng/go-demo · GitHub
prometheus client-go
https://github.com/prometheus/client_golang
prometheus package - github.com/prometheus/client_golang/prometheus - Go Packages
prometheus exporter development
https://github.com/yunixiangfeng/go-demo/tree/main/mysql_exporter
day17
network request
readme
1. prometheus
Node
=> query, delete
Agent => registers through the API => added if it does not exist,
updated if it exists
=> attributes
uuid
hostname
addr https://host:port/
#username
#password
created_at
updated_at
deleted_at
Job
=> CRUD
=> attribute
task ID [a-zA-Z][0-9a-zA-Z_]
remarks
Node node_id
created_at
updated_at
deleted_at
Target
=> add, delete, modify, check
name
remark
Addr
Job
created_at
updated_at
deleted_at
Questions:
1. prometheus.yaml => modify => reload Prometheus
=> API => enable lifecycle, nginx in front => request with
Authorization: Basic base64(name:password)
=> systemctl reload =>
requires systemctl
2. job: other configuration, target and further settings are added according to your own business needs
3. The file is regenerated at every interval; reload only when the content has changed
prometheus:
job: job1 job2, no change => no reload
target:
no change => do not write the file
sort jobs and targets, json.Marshal => []byte => write
day 18
1. prometheus client_golang
client
a. prometheus client(java, go, php, python)
b. expose over HTTP => handler
2. mysqld_exporter
3. web basic auth
a. The server's response tells the browser that authentication is required => the browser pops up a username/password input box
401
WWW-Authenticate: Basic realm="my site"
b. Authorization: Basic xx => read the header and verify it
Basic => xxx => Base64-decode, split, verify username:password (plain text)
username:password => configuration file
hash => bcrypt/md5
md5
cmdb user login
form
user/password (plain text)
db: user/password (bcrypt=>hash)
4. Application monitoring
total number of HTTP requests: Counter
response time per HTTP URL: Histogram/Summary
number of occurrences of each HTTP response status code: Counter + labels
Filter:
5. alertmanager
6. Alert management
a. Alert reception & storage
authentication:
Authorization: Token xxxxxx
basic auth: Basic xxxx
bearer auth: Bearer xxxxx
webhook
API => alertmanager => json => db
id => generated from the labels (the same labels produce the same id)
01:00 => instance 1.1.1.1:9999 offline
recovered
10:00 => instance 1.1.1.1:9999 offline, same id
a => a
a,b => a,b
Notification: notifications are sent per group, keyed by groupKey
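One way to get "the same labels produce the same id" is to hash the labels in sorted key order. The sketch below is an assumption about how such an id could be derived, not the scheme used by Alertmanager or by go-demo; alertID and the 16-character truncation are choices made for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// alertID derives a stable id from a label set. Keys are sorted first, so
// the id does not depend on map iteration order.
func alertID(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// NUL separators keep {"a": "bc"} and {"ab": "c"} apart.
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(labels[k]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	first := map[string]string{"alertname": "InstanceDown", "instance": "1.1.1.1:9999"}
	again := map[string]string{"instance": "1.1.1.1:9999", "alertname": "InstanceDown"}
	other := map[string]string{"alertname": "InstanceDown", "instance": "2.2.2.2:9999"}

	fmt.Println(alertID(first))
	fmt.Println(alertID(again)) // same labels, same id
	fmt.Println(alertID(other)) // different labels, different id
}
```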
b. Alarm query
c. Page
7. Notification
a. email
b. Tencent sms
Development function => meet business needs + technology (go, beego)
business logic + technology
1. Repeat manual work (design) => automation (development)
2. Requirements => requirements analysis => design => development
Monitoring:
1. Availability
2. Latency
time consumed by a request
time consumed by an operation
3. Error count
4. Capacity
current number of requests / total number of requests
current number of connections / total number of connections
What does mysql_exporter monitor?
mysql => exporter =>
monitoring object api => get indicator information (calculation)
sql query =>
show global status
mysql availability
operation failed
select 1;
ping
slow query times show global status where variable_name='Slow_queries';
capacity
qps:
counter
show global status where variable_name='Queries';
tps:
insert, delete, update, ...
com_insert
com_delete
com_update
com_select
com_replace
connections:
show global status where variable_name='Threads_running';
show global variables where variable_name='max_connections';
Alert => Prometheus Association (Node)
promagent => prometheus.yaml labels => uuid => xx
traffic:
show global status where variable_name='Bytes_received';
show global status where variable_name='Bytes_sent';
json
form => controller => parseForm/unmarshal => object => server.insert/
unmarshal => ALertForm
groupby
alertname
a => a
labels =>
b => a,b
a => labels =>
b =>
Paging
pageSize => 5
pageNum
query condition
Offset, Limit
pageNum => 1, 2, 3
offset = (pageNum - 1) * pageSize
limit = pageSize
1 => offset 0 limit 5
2 => offset 5 limit 5
querytable.SetCond(cond).Offset().Limit()
querytable.SetCond(cond).Count()
Page
datas
Paging-related data, page URL parameters
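The paging arithmetic can be written down as two small helpers. This is a sketch: pageWindow and totalPages are names chosen for the example, and falling back to a page size of 5 and to page 1 for invalid input is an assumption based on the values in the notes.

```go
package main

import "fmt"

// pageWindow converts a 1-based page number into offset and limit.
func pageWindow(pageNum, pageSize int) (offset, limit int) {
	if pageSize < 1 {
		pageSize = 5 // assumed default
	}
	if pageNum < 1 {
		pageNum = 1
	}
	return (pageNum - 1) * pageSize, pageSize
}

// totalPages returns how many pages are needed for total rows.
func totalPages(total, pageSize int) int {
	if pageSize < 1 {
		pageSize = 5
	}
	if total <= 0 {
		return 0
	}
	return (total + pageSize - 1) / pageSize // integer ceiling
}

func main() {
	for page := 1; page <= 3; page++ {
		offset, limit := pageWindow(page, 5)
		fmt.Printf("page %d => offset %d limit %d\n", page, offset, limit)
	}
	fmt.Println("pages for 11 rows:", totalPages(11, 5))
}
```

The offset and limit would then be passed to the query, and the count query supplies the total for totalPages.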
prometheus process
day19
1. Alertmanager alert notification
a. SMS: Tencent
b. email: SMTP
Alert notification is done per group
Alert handling is done per alert
Mail and SMS sending
mail:
smtp server smtp protocol sending mail
mail gateway http api mail parameter+url+authentication
From:
To:
cc:
Subject:
Content:
Attachments:
Tencent authorization code:
smtp.qq.com
SecretId: <your SecretId>
SecretKey: <your SecretKey>
API url/request/response
authentication
Request structure
structure name => url
structure attribute => request parameter
http request => http response
response structure
Alert sending:
1. The API receives the alert information
2. JSON deserialization => alert group, alert information
3. Alert group => send alerts
Send email
alert content generation => html/template
subject => alert name
notifier => configuration
Send:
smtp server, port, username, password
Send SMS
SMS content => a Tencent Cloud SMS template can only be filled in through parameters
Notifier => configuration
Sending:
SDK call
address, structure/object, function call
Notifier
mandatory recipients => configured (operations and maintenance)
business recipients => the owner of the specific business =>
notification management filters the notifiers by the labels on the alert
4. Alarm information => Alarm processing
template function call = "