Installing the Prometheus Monitoring System

Prometheus is a time-series database and a monitoring system, and beyond that a complete monitoring ecosystem.

This article briefly covers how to install and use Prometheus.

Download

Download the release for your operating system from the Download page and extract it:

tar xvfz prometheus-*.tar.gz
cd prometheus-*/   # trailing slash: match only the extracted directory, not the tarball
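To script the "download the release for your system" step, here is a minimal sketch. It assumes the GitHub release naming scheme `prometheus-<version>.<os>-<arch>.tar.gz` and uses version 2.43.0 (the version in the startup log below) purely as an example; `map_arch`, `prometheus_tarball_url` and `fetch_prometheus` are hypothetical helpers, not part of Prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Prometheus) for fetching a release tarball.

# Map `uname -m` output to the architecture names used in release file names.
map_arch() {
  case "$1" in
    x86_64|amd64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "$1" ;;  # assumption: other names are passed through unchanged
  esac
}

# Build the download URL; assumes the naming scheme
# prometheus-<version>.<os>-<arch>.tar.gz on GitHub releases.
prometheus_tarball_url() {
  local version=$1 os=$2 arch=$3
  echo "https://github.com/prometheus/prometheus/releases/download/v${version}/prometheus-${version}.${os}-${arch}.tar.gz"
}

# Download and unpack the release matching the current machine.
fetch_prometheus() {
  local version=${1:-2.43.0}  # example version, taken from the startup log
  local os arch url
  os=$(uname -s | tr '[:upper:]' '[:lower:]')   # Darwin -> darwin, Linux -> linux
  arch=$(map_arch "$(uname -m)")
  url=$(prometheus_tarball_url "$version" "$os" "$arch")
  # -O keeps the remote file name, so the prometheus-*.tar.gz pattern still matches
  curl -fLO "$url" && tar xvfz "${url##*/}"
}
```

For example, `fetch_prometheus 2.43.0` downloads and extracts the archive into the current directory, after which `cd prometheus-*/` works as above.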

Start

./prometheus --config.file=prometheus.yml

output

ts=2023-04-30T12:53:24.032Z caller=main.go:520 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2023-04-30T12:53:24.032Z caller=main.go:564 level=info msg="Starting Prometheus Server" mode=server version="(version=2.43.0, branch=HEAD, revision=edfc3bcd025dd6fe296c167a14a216cab1e552ee)"
ts=2023-04-30T12:53:24.032Z caller=main.go:569 level=info build_context="(go=go1.19.7, platform=darwin/amd64, user=root@1fd07b70056a, date=20230321-12:56:36, tags=netgo,builtinassets)"
ts=2023-04-30T12:53:24.032Z caller=main.go:570 level=info host_details=(darwin)
ts=2023-04-30T12:53:24.032Z caller=main.go:571 level=info fd_limits="(soft=61440, hard=unlimited)"
ts=2023-04-30T12:53:24.032Z caller=main.go:572 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-04-30T12:53:24.035Z caller=web.go:561 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2023-04-30T12:53:24.036Z caller=main.go:1005 level=info msg="Starting TSDB ..."
ts=2023-04-30T12:53:24.037Z caller=tls_config.go:232 level=info component=web msg="Listening on" address=[::]:9090
ts=2023-04-30T12:53:24.037Z caller=tls_config.go:235 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2023-04-30T12:53:24.042Z caller=head.go:587 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2023-04-30T12:53:24.042Z caller=head.go:658 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=7.217µs
ts=2023-04-30T12:53:24.042Z caller=head.go:664 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2023-04-30T12:53:24.042Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2023-04-30T12:53:24.042Z caller=head.go:772 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=141.939µs wal_replay_duration=394.666µs wbl_replay_duration=151ns total_replay_duration=574.341µs
ts=2023-04-30T12:53:24.044Z caller=main.go:1026 level=info fs_type=1a
ts=2023-04-30T12:53:24.044Z caller=main.go:1029 level=info msg="TSDB started"
ts=2023-04-30T12:53:24.044Z caller=main.go:1209 level=info msg="Loading configuration file" filename=prometheus.yml
ts=2023-04-30T12:53:24.100Z caller=main.go:1246 level=info msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=56.598769ms db_storage=2.783µs remote_storage=4.154µs web_handler=302ns query_engine=525ns scrape=56.05599ms scrape_sd=151.651µs notify=69.339µs notify_sd=25.365µs rules=5.821µs tracing=35.529µs
ts=2023-04-30T12:53:24.100Z caller=main.go:990 level=info msg="Server is ready to receive web requests."
ts=2023-04-30T12:53:24.101Z caller=manager.go:974 level=info component="rule manager" msg="Starting rule manager..."
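When Prometheus is launched from a script, it helps to wait for the "Server is ready to receive web requests." state shown above before using it. A minimal sketch, assuming the server listens on localhost:9090 and curl is installed: it polls Prometheus's `/-/ready` endpoint, which answers HTTP 200 once the server is ready. The function name and the 30-second default timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll /-/ready until it answers with HTTP 200
# or the timeout (in seconds) runs out. Returns 0 when ready, 1 on timeout.
wait_for_prometheus() {
  local base_url=${1:-http://localhost:9090}
  local timeout=${2:-30}
  local i
  for ((i = 0; i < timeout; i++)); do
    # -f: treat non-2xx responses (e.g. while the server is still starting) as failures
    if curl -fs -o /dev/null --max-time 2 "$base_url/-/ready"; then
      return 0
    fi
    sleep 1
  done
  echo "Prometheus at $base_url not ready after ${timeout}s" >&2
  return 1
}
```

For example, start the server in the background with `./prometheus --config.file=prometheus.yml > prometheus.log 2>&1 &` and then call `wait_for_prometheus`.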

At startup Prometheus loads the configuration file prometheus.yml. In this default file, the only scrape target is Prometheus itself:

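# global holds default settings; a job in scrape_configs can override scrape settings such as scrape_interval and scrape_timeout
global:
  scrape_interval: 15s # How often to scrape targets: 15s here (default 1m; 10s-60s is typical).
  evaluation_interval: 15s # How often to evaluate rules: 15s here (default 1m).
  # scrape_timeout # Timeout when scraping a target (default 10s).
  # external_labels # Labels added to any time series or alert when communicating with external systems.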

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs: # Alertmanager addresses configured statically; service discovery can be used instead
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to every time series scraped by this job
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
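To experiment with these settings, a minimal sketch that writes an equivalent configuration (one job named "prometheus", configurable targets) and validates it with `promtool check config` when promtool, which ships in the same release archive, is available. The helper name `write_minimal_config` and passing targets as arguments are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal prometheus.yml with a single job
# ("prometheus") scraping the given targets (default: localhost:9090).
write_minimal_config() {
  local path=$1
  shift
  if [ $# -eq 0 ]; then set -- localhost:9090; fi
  local list="" t
  for t in "$@"; do list+="\"$t\", "; done
  list=${list%, }   # drop the trailing ", "

  cat > "$path" <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: [${list}]
EOF

  # Validate with promtool if it is on PATH or in the current directory.
  if command -v promtool >/dev/null 2>&1; then
    promtool check config "$path"
  elif [ -x ./promtool ]; then
    ./promtool check config "$path"
  fi
}
```

For example, `write_minimal_config prometheus.yml localhost:9090` reproduces the scrape part of the default file.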

Usage

Once started, Prometheus listens on port 9090. The Prometheus server's own metrics are available at localhost:9090/metrics; the response looks like this:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 8.3116e-05
go_gc_duration_seconds{quantile="0.25"} 0.000103581
go_gc_duration_seconds{quantile="0.5"} 0.000208772
go_gc_duration_seconds{quantile="0.75"} 0.000275054
go_gc_duration_seconds{quantile="1"} 0.000334647
go_gc_duration_seconds_sum 0.001401928
go_gc_duration_seconds_count 7
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 31
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.19.7"} 1
... ...
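In this text format, `# HELP` lines describe a metric, `# TYPE` lines give its type (summary, gauge, ...), and every other line is one series: `name{labels} value`. A minimal sketch that pulls a single value out of such output with awk; `series_value` is a hypothetical helper and assumes label values contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the text exposition format on stdin and print the
# value of one series, given exactly as it appears in the output, e.g.
# go_goroutines or go_gc_duration_seconds{quantile="0.5"}.
# Simplification: assumes label values contain no spaces.
series_value() {
  awk -v series="$1" '
    /^#/ { next }               # skip "# HELP" and "# TYPE" comment lines
    $1 == series { print $2 }   # field 1: name{labels}, field 2: sample value
  '
}
```

For example, `curl -s http://localhost:9090/metrics | series_value go_goroutines` prints the current goroutine count.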

Prometheus also provides a query UI, available at:

http://localhost:9090/graph

The UI is shown in the figure below:
[Figure: the Prometheus Graph query page]

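On the Graph page, enter a PromQL expression such as `up` to see the health of every monitored target in each job: 1 means the last scrape succeeded (healthy), 0 means it failed (unhealthy).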
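The same `up` query can be run without the UI through the HTTP query API, `GET /api/v1/query`. A minimal sketch, assuming the server is at localhost:9090 and curl is installed; `prom_query` is a hypothetical helper that prints the raw JSON response.

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluate a PromQL expression via GET /api/v1/query
# and print the raw JSON response. Fails on connection or HTTP errors.
prom_query() {
  local base_url=$1 expr=$2
  # -G turns the --data-urlencode pair into a URL-encoded query parameter
  curl -fsG --max-time 5 "$base_url/api/v1/query" --data-urlencode "query=$expr"
}
```

`prom_query http://localhost:9090 up` returns JSON with `"status":"success"` and a `data.result` list in which each entry carries the target's labels and a `value` of the form `[timestamp, "1"]` or `[timestamp, "0"]`. Querying `up == 0` instead returns only the targets whose last scrape failed.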

The Status menu also offers pages such as Runtime & Build Information, TSDB Status, Command-Line Flags, Configuration, Rules, Targets, and Service Discovery.

If Prometheus is started with the --web.enable-lifecycle flag:

./prometheus --config.file=prometheus.yml --web.enable-lifecycle

then you no longer need to restart Prometheus whenever prometheus.yml changes; run the following command to reload the configuration instead:

curl -X POST http://localhost:9090/-/reload
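A slightly safer variant validates the file before reloading, so a typo in prometheus.yml is caught before Prometheus sees it. A minimal sketch, assuming promtool from the release archive sits in the current directory and the server runs on localhost:9090 with --web.enable-lifecycle; `reload_prometheus` is a hypothetical helper. Without that flag, sending SIGHUP to the Prometheus process also triggers a configuration reload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the config file, then ask a running Prometheus
# (started with --web.enable-lifecycle) to reload it.
reload_prometheus() {
  local config=${1:-prometheus.yml}
  local base_url=${2:-http://localhost:9090}
  # Refuse to reload a config that promtool rejects (skipped if ./promtool is absent).
  if [ -x ./promtool ] && ! ./promtool check config "$config"; then
    echo "invalid config: $config" >&2
    return 1
  fi
  # -f makes curl fail on HTTP errors, e.g. when the lifecycle API is disabled.
  curl -fsS --max-time 10 -X POST "$base_url/-/reload"
}
```

For example, after editing the file run `reload_prometheus prometheus.yml` from the extracted release directory.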

References

Getting started

Download


Reposted from blog.csdn.net/lanyang123456/article/details/130451843