Introduction to Kubernetes + Docker monitoring

https://my.oschina.net/fufangchun/blog/714530

Kubernetes + Docker monitoring

   Docker's monitoring principle: Docker officially recommends running only one process per container, so running a monitoring agent (zabbix agent, etc.) inside the container is not recommended; the agent should run on the host machine and obtain monitoring data through cgroups or the Docker API.

 

1. Classification of monitoring approaches:

 

①、Self-developed:

  Collect the data by calling Docker's API, then process and display it with your own code; this approach is not covered in detail here (a minimal sketch of reading the raw data follows this item).

  For example:

   1), iQIYI's dadvisor, modelled on cAdvisor, writes its data to Graphite, which is roughly equivalent to cAdvisor + InfluxDB; iQIYI has not open-sourced dadvisor.
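
As a minimal sketch of what such self-developed tooling consumes, the commands below pull a one-shot stats snapshot from the Docker remote API and read the same counters from the cgroup files on the host. The container name "myapp", the container ID placeholder, and the cgroup v1 path are only examples and depend on your Docker and cgroup setup:

# One-shot stats snapshot for a single container via the Docker remote API
# (requires a curl build with --unix-socket support)
curl --unix-socket /var/run/docker.sock "http://localhost/containers/myapp/stats?stream=false"

# The same counters can be read directly from the cgroup files on the host (cgroup v1 layout)
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes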

②、Docker——cAdvisor:

    Google's cAdvisor is another well-known open-source container monitoring tool.

    Simply deploy the cAdvisor container on a host, and users can view very detailed performance data (CPU, memory, network, disk, file system, etc.) for that node and its containers through the web interface or the REST API (a sample deployment and query follow at the end of this item).

    By default, cAdvisor caches data in memory, so its ability to present data is limited; it also supports several persistent storage backends and can save and aggregate monitoring data to Google BigQuery, InfluxDB, or Redis.

    In newer Kubernetes versions, the cAdvisor functionality has been integrated into the kubelet component.

    Note that cAdvisor's web interface only shows the containers of the single machine it runs on; for other machines you have to open the URL of the corresponding IP. This works well with a few nodes but becomes troublesome with many, so you need to aggregate and display cAdvisor's data centrally; see [cAdvisor + InfluxDB + Grafana] below.
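
For reference, a minimal sketch of running the cAdvisor container on a host and then pulling the same data from its REST API. The image tag and port are the common defaults of the time; adjust them to your environment:

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

# machine-level and container-level data over the REST API
curl http://localhost:8080/api/v1.3/machine
curl http://localhost:8080/api/v1.3/containers/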

 

③、Docker——cAdvisor + InfluxDB + Grafana:

    cAdvisor: collects the data and writes it to InfluxDB.

    InfluxDB: a time-series database that provides data storage, kept in the specified directory.

    Grafana: provides a web console where you define the query metrics, query the data from InfluxDB, and display it.

This combination only monitors Docker itself, not Kubernetes (a sample cAdvisor invocation with the InfluxDB backend is sketched below).
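
A minimal sketch of the cAdvisor side of this combination, assuming InfluxDB is reachable at 192.168.16.100:8086 and a database named cadvisor already exists (credentials can be passed with -storage_driver_user / -storage_driver_password if they differ from the defaults):

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 --detach=true --name=cadvisor \
  google/cadvisor:latest \
  -storage_driver=influxdb \
  -storage_driver_db=cadvisor \
  -storage_driver_host=192.168.16.100:8086

Grafana is then pointed at the same InfluxDB instance as its data source.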

 

④、Kubernetes——Heapster + InfluxDB + Grafana:

    Heapster: obtains the metrics and event data of the k8s cluster and writes them to InfluxDB. Heapster collects more data than cAdvisor, but the data is complete and less of it is stored in InfluxDB (a sample invocation is sketched after this list).

    InfluxDB: a time-series database that provides data storage, kept in the specified directory.

    Grafana: provides a web console where you define the query metrics, query the data from InfluxDB, and display it.
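
A minimal sketch of starting Heapster with the Kubernetes source and the InfluxDB sink. The API server and InfluxDB addresses are examples, and the exact source/sink URL syntax should be checked against the Heapster version in use:

# example heapster invocation (addresses are illustrative)
heapster \
  --source=kubernetes:http://192.168.16.100:8080?inClusterConfig=false \
  --sink=influxdb:http://192.168.16.100:8086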

 

2. Notes on cAdvisor + Heapster + InfluxDB + Grafana:

①、cAdvisor notes:

    For cAdvisor you only need to enable it and configure the related information on the kubelet command line.

    It does not need to be started as a separate pod or command.

--cadvisor-port=4194 --storage-driver-db="cadvisor" --storage-driver-host="localhost:8086"
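
In context, these flags simply go on the kubelet command line together with its usual options, roughly as sketched below for an older kubelet that still accepts --cadvisor-port and the storage-driver flags; the API server address, node IP, and the remaining flags are illustrative:

kubelet \
  --api-servers=http://192.168.16.100:8080 \
  --hostname-override=192.168.16.101 \
  --cadvisor-port=4194 \
  --storage-driver-db="cadvisor" \
  --storage-driver-host="localhost:8086"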

 

②、InfluxDB notes:

1), InfluxDB must be version 0.8.8; otherwise the cAdvisor log will show:

E0704 14:29:14.163238 05655 memory.go:94] failed to write stats to influxDb - Server returned (404): 404 page not found

http://blog.csdn.net/llqkk/article/details/50555442

    It is said that cAdvisor does not support InfluxDB 0.9, so version 0.8.8 is used here. [ok]

 

Comparison of working cAdvisor and InfluxDB version combinations (tested ok):

cAdvisor version    InfluxDB version
0.7.1               0.8.8
0.23.2              0.9.6 (and above)

[If the cAdvisor and InfluxDB versions do not correspond, you will see the 404 error above in the cAdvisor log.]

2) The InfluxDB data needs to be cleaned out regularly; a single cAdvisor produces roughly 600 MB in half a day (a cron-able cleanup sketch follows the two examples below).

#Units: hour [h], day [d]

#Delete data from the last hour:

delete from /^stats.*/ where time > now() - 1h 

#Delete data older than one hour:

delete from /^stats.*/ where time < now() - 1h
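
A sketch of a cleanup script that can be run from cron, using the InfluxDB 0.8 HTTP query API; the host, credentials, and the one-day retention window are examples:

#!/bin/bash
# Delete cAdvisor series data older than one day from the cadvisor database (InfluxDB 0.8 API)
INFLUX="http://192.168.16.100:8086"
curl -G "${INFLUX}/db/cadvisor/series?u=root&p=root" \
     --data-urlencode "q=delete from /^stats.*/ where time < now() - 1d"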

3) Regarding InfluxDB availability, you can write a script that regularly checks whether the required databases and tables exist and creates them if they do not (see the sketch after these commands).

#Check whether the database exists:

curl -G 'http://192.168.16.100:8086/db?u=root&p=root&q=list+databases&pretty=true'

curl -G 'http://192.168.16.100:8086/db?u=root&p=root&q=show+databases&pretty=true'

#List the series (tables) in a database [look at the points part of the response]:

curl -G 'http://192.168.16.100:8086/db/cadvisor/series?u=root&p=root&q=list+series&pretty=true'

#Create the database:

Database name: cadvisor

curl "http://www.perofu.com:8086/db?u=root&p=root" -d "{\"name\": \"cadvisor\"}"

③、Grafana notes:

    Building queries in Grafana takes some effort; you can look up the relevant statements on the official website, or simply borrow other people's dashboard templates (an example raw query is sketched after the link below).

    InfluxDB query language:

https://docs.influxdata.com/influxdb/v0.8/api/query_language/
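
As an illustration, a raw query of the kind a Grafana panel ends up issuing against InfluxDB 0.8, run here with curl. The series pattern and the cpu_cumulative_usage column are assumptions based on cAdvisor's old InfluxDB schema, so check your own series first with the list series command shown above:

curl -G "http://192.168.16.100:8086/db/cadvisor/series?u=root&p=root&pretty=true" \
     --data-urlencode "q=select derivative(cpu_cumulative_usage) from /^stats.*/ where time > now() - 1h group by time(30s)"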

 

④、Heapster notes:

    For larger k8s clusters, Heapster's current caching approach consumes a great deal of memory.

    Because it has to fetch the container information of the entire cluster periodically, keeping all of it in memory becomes a problem, and Heapster also needs to support an API for fetching instantaneous metrics.

    If Heapster runs as a pod, OOM is likely, so the current recommendation is to disable the cache and run it standalone, separate from the k8s platform; running the container separately on each node is recommended.

    Heapster's biggest advantage is that the monitoring data it captures can be grouped by pod, container, namespace, and so on.

In this way the monitoring information can be kept private, that is, each k8s user only sees the resource usage of his own applications.

    Heapster collects more data than cAdvisor, but the data is complete and less of it is stored in InfluxDB. Although both are Google projects, their roles differ.

    When the Heapster container is started on its own, it will connect to InfluxDB and create a database named k8s.

Heapster collects two kinds of metrics [pay attention to this when building Grafana queries]:

    1), cumulative: the aggregated value is a [cumulative value], for example CPU usage time and network traffic in/out.

    2), gauge: the aggregated value is an [instantaneous value], for example memory usage.

Reference: https://github.com/kubernetes/heapster/blob/master/docs/storage-schema.md (an example query against the metrics listed below follows the table).

 

Metric                           Description                                            Classification
cpu/limit                        CPU limit; can be set in the yaml file                 Instantaneous value
cpu/node_reservation             CPU reserved on the kube node, similar to cpu/limit    Instantaneous value
cpu/node_utilization             Node CPU utilization                                   Instantaneous value
cpu/request                      CPU requested resources; can be set in the yaml file   Instantaneous value
cpu/usage                        CPU usage                                              Cumulative value
cpu/usage_rate                   CPU usage rate                                         Instantaneous value
filesystem/limit                 File system limit                                      Instantaneous value
filesystem/usage                 File system usage                                      Instantaneous value
memory/limit                     Memory limit; can be set in the yaml file              Instantaneous value
memory/major_page_faults         Major page faults                                      Cumulative value
memory/major_page_faults_rate    Major page fault rate                                  Instantaneous value
memory/node_reservation          Memory reserved on the node                            Instantaneous value
memory/node_utilization          Node memory utilization                                Instantaneous value
memory/page_faults               Page faults                                            Instantaneous value
memory/page_faults_rate          Page fault rate                                        Instantaneous value
memory/request                   Memory request; can be set in the yaml file            Instantaneous value
memory/usage                     Memory usage                                           Instantaneous value
memory/working_set               Memory working set                                     Instantaneous value
network/rx                       Total network bytes received                           Cumulative value
network/rx_errors                Network receive errors                                 Uncertain
network/rx_errors_rate           Network receive error rate                             Instantaneous value
network/rx_rate                  Network receive rate                                   Instantaneous value
network/tx                       Total network bytes sent                               Cumulative value
network/tx_errors                Network transmit errors                                Uncertain
network/tx_errors_rate           Network transmit error rate                            Instantaneous value
network/tx_rate                  Network transmit rate                                  Instantaneous value
uptime                           Time since the container started, in milliseconds      Instantaneous value
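
As an example of how the cumulative/gauge distinction shows up in practice, the sketch below queries the k8s database that Heapster creates, averaging the memory/usage gauge per pod over one-minute windows (InfluxDB 0.9 HTTP API). The value field and the pod_name and namespace_name tags follow the storage-schema document referenced above, but verify them with SHOW TAG KEYS on your own installation:

curl -G 'http://192.168.16.100:8086/query?u=root&p=root&pretty=true' \
     --data-urlencode "db=k8s" \
     --data-urlencode "q=select mean(value) from \"memory/usage\" where time > now() - 1h and namespace_name = 'default' group by time(1m), pod_name"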
