Kubernetes + Prometheus + Grafana Cluster Monitoring

Source: http://www.mknight.cn/post/631/

Introduction: Welcome to Prometheus! Prometheus is a monitoring platform that collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets.

Prometheus

Download

 
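    # Create the target directory first; tar -C fails if it does not exist
    mkdir -p /opt/prometheus
    wget https://github.com/prometheus/prometheus/releases/download/v1.5.2/prometheus-1.5.2.linux-amd64.tar.gz
    tar -zxvf prometheus-1.5.2.linux-amd64.tar.gz -C /opt/prometheus --strip-components=1
    cd /opt/prometheus
    # Back up the configuration file (copy it, so prometheus.yml stays in place for editing)
    cp prometheus.yml prometheus.yml-bak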

Edit the configuration file and replace the target address with your own IP.

 
    static_configs:
      - targets: ['10.10.30.102:9090']

Start

 
    /opt/prometheus/prometheus --config.file=prometheus.yml

Check whether port 9090 is listening; if it is, Prometheus started normally. Then open port 9090 in a browser.
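The port check can also be scripted. The following is a minimal sketch, not part of the original post: `port_is_listening` is a hypothetical helper name, and it only confirms that something accepts TCP connections on the port, not that it is Prometheus.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, unreachable, timeout) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the example address used in this post, the call would be `port_is_listening("10.10.30.102", 9090)`; substitute your own IP.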

Grafana

Download

 
    wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.0.1-1480694114.x86_64.rpm
    yum localinstall grafana-4.0.1-1480694114.x86_64.rpm

    service grafana-server start

Configure the data source


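Save the data source and test it. If the test fails, there is a problem with the Prometheus endpoint on port 9090 set up earlier.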
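The data source can also be created through Grafana's HTTP API instead of the web form. The sketch below rests on assumptions that are not in the original post: Grafana listens on its default port 3000, basic authentication with an admin account is enabled, and the `POST /api/datasources` endpoint accepts the fields shown. The function names are hypothetical.

```python
import base64
import json
import urllib.request


def datasource_payload(name: str, prometheus_url: str) -> dict:
    """Build the JSON body describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # Grafana's backend queries Prometheus on the browser's behalf
        "isDefault": True,
    }


def build_create_request(grafana_url: str, payload: dict,
                         user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates the data source."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` sends it. Building the request separately makes it possible to inspect the URL and body before touching a live Grafana instance.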

Add dashboard templates

 
Download the dashboards (https://github.com/percona/grafana-dashboards):

    git clone https://github.com/percona/grafana-dashboards.git
    cp -r grafana-dashboards/dashboards /var/lib/grafana/

Edit the Grafana config:

    vi /etc/grafana/grafana.ini

    [dashboards.json]
    enabled = true
    path = /var/lib/grafana/dashboards

Restart Grafana:

    systemctl restart grafana-server
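Before restarting Grafana it can help to confirm that the copied directory really contains parseable JSON files. This is a minimal sketch with a hypothetical helper name, `find_invalid_dashboards`; it only checks that each `*.json` file parses to a JSON object and says nothing about whether Grafana accepts the dashboard.

```python
import json
from pathlib import Path


def find_invalid_dashboards(directory: str) -> list:
    """Return the names of *.json files in `directory` that are not valid JSON objects."""
    invalid = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                data = json.load(handle)
        except (OSError, ValueError):
            # Unreadable file, bad encoding, or malformed JSON
            invalid.append(path.name)
            continue
        if not isinstance(data, dict):
            # A dashboard export is expected to be a JSON object at the top level
            invalid.append(path.name)
    return invalid
```

Calling `find_invalid_dashboards("/var/lib/grafana/dashboards")` should return an empty list when every file is well-formed.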

Kubernetes monitoring

Import the JSON template

Download the JSON template from the download link, then load it in Grafana with Import.

Configure Prometheus

The final configuration file looks like this:

 
    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'codelab-monitor'

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first.rules"
      # - "second.rules"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ['10.10.30.102:9090']

      # The jobs below configure Kubernetes monitoring; remember to change the addresses.
      - job_name: 'kubernetes-nodes-cadvisor'
        kubernetes_sd_configs:
          - api_server: 'http://10.10.30.102:8080'
            role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - source_labels: [__meta_kubernetes_role]
            action: replace
            target_label: kubernetes_role
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10255'
            target_label: __address__
      - job_name: 'kubernetes_node'
        kubernetes_sd_configs:
          - role: node
            api_server: 'http://10.10.30.102:8080'
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
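The relabel rules in this configuration rewrite each discovered node address from port 10250 to 10255 (first Kubernetes job) or 9100 (second job), and the `labelmap` rule copies node labels onto the scraped series. The sketch below imitates that behaviour in Python to show what the rules do to a sample target. It is an approximation, not Prometheus code: Prometheus uses Go's RE2 syntax, the function names are hypothetical, and the `zone` label is made up for illustration.

```python
import re


def relabel_replace(labels: dict, source_labels: list, regex: str,
                    replacement: str, target_label: str, separator: str = ";") -> dict:
    """Imitate the `replace` action: rewrite target_label if the regex matches fully."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors relabel regexes at both ends
    if match is None:
        return dict(labels)  # no match: the labels are left unchanged
    # Translate Prometheus-style ${1} references into Python's \g<1> form
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    new_labels = dict(labels)
    new_labels[target_label] = match.expand(template)
    return new_labels


def relabel_labelmap(labels: dict, regex: str) -> dict:
    """Imitate the `labelmap` action: copy matching labels under the captured name."""
    new_labels = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            new_labels[match.group(1)] = value
    return new_labels


# A hypothetical discovered node target
node = {"__address__": "10.10.30.103:10250", "__meta_kubernetes_node_label_zone": "a"}
cadvisor = relabel_replace(node, ["__address__"], r"(.*):10250", "${1}:10255", "__address__")
exporter = relabel_replace(node, ["__address__"], r"(.*):10250", "${1}:9100", "__address__")
mapped = relabel_labelmap(node, r"__meta_kubernetes_node_label_(.+)")
```

Here `cadvisor["__address__"]` becomes `10.10.30.103:10255`, `exporter["__address__"]` becomes `10.10.30.103:9100`, and `mapped` gains a `zone` label. An address that does not end in `:10250` passes through untouched.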

Restart the service, then select the Kubernetes dashboard in Grafana.
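If the dashboard stays empty after the restart, querying the `up` metric through the Prometheus HTTP API shows which targets are actually being scraped. The following is a minimal sketch with hypothetical function names. It assumes the `/api/v1/query` endpoint and its standard JSON response shape, and it keys results by the `job` and `instance` labels.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, expr: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_instant_vector(payload: dict) -> dict:
    """Map (job, instance) -> sample value from an instant-vector API response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    values = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        key = (labels.get("job", ""), labels.get("instance", ""))
        # Each sample value is a [timestamp, "string value"] pair
        values[key] = float(sample["value"][1])
    return values


def query_up(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the `up` metric; 1.0 means the last scrape succeeded, 0.0 means it failed."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

With the address used in this post, `query_up("http://10.10.30.102:9090")` should list entries for the `prometheus`, `kubernetes-nodes-cadvisor`, and `kubernetes_node` jobs. Any entry with value 0.0 points at a target Prometheus cannot reach.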


Reposted from my.oschina.net/binges/blog/1617425