Building a big-screen operations monitoring dashboard by hand

As the company's business keeps growing, the types of business multiply, the number of servers increases, the network environment becomes more complex, and releases become more frequent. This inevitably leads to more production incidents, so all-round monitoring and early warning, from servers up to applications, is needed.

Server monitoring and basic application monitoring (mysql, redis, ES, etc.) are built on Zabbix. Its monitoring and alerting cover the basic needs of the underlying layer: when a set threshold is exceeded, the relevant people are notified in advance so they can resolve the issue.

With Zabbix in place, why is Grafana also needed?

Zabbix is weak at aggregating charts. That is not its strength, and its data comes only from its own collectors. Graphical display is exactly where Grafana is strong.

Logs are monitored with ELG. Kibana runs into performance problems once the log volume reaches a certain scale, and its presentation is not as strong as Grafana's, so Grafana is used in place of Kibana.

Microservice containers are monitored with tools from the Prometheus ecosystem, which show CPU, memory, JVM, and other metrics for containerized applications.

There is also APM for service call-chain monitoring: a system that traces, alerts on, and analyzes the business operation of distributed application clusters and shows the state of the call chain between microservices.

The current state of the whole monitoring system is that each platform monitors its own content. The information is scattered, there is no unified real-time view, and attention is divided, so the key metrics of each platform need to be pulled out and shown on one unified platform.

When the company's developer resources are tight and a big-screen operations dashboard has to be built quickly, Grafana is a good choice.

Grafana is an open source suite for analyzing and visualizing monitoring data. It is most commonly used for visual analysis of time-series data from infrastructure and applications, and it can also be used in other areas that need data visualization. Grafana helps you query, visualize, alert on, and understand your metrics and data. Dashboards can be shared with the whole team, which helps build a data-driven culture.

Grafana has strong community support and a wealth of plug-ins and templates, enough to meet most functional needs. It integrates with almost every common data source, including ElasticSearch, Mysql, Zabbix, InfluxDB, Prometheus, and OpenTSDB.

What follows is a detailed walkthrough of connecting Grafana to each platform in practice.

Displaying the server available-memory metric

Available memory is a very important server metric, so it needs to be watched in real time to keep a sudden steep drop from going unnoticed.

Memory information can be pulled from Zabbix, so first add Zabbix as a data source.

In Grafana, add a data source, select Zabbix, and then fill in the Zabbix API address, username, and password.

  url:http://192.168.0.1:8080/zabbix/php/api_jsonrpc.php
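
Before entering the address in Grafana, it can help to confirm that the URL really is the Zabbix JSON-RPC endpoint. The following is a minimal sketch in Python, assuming the API is reachable at the address above and using the `apiinfo.version` method, which needs no login. The function names are illustrative and are not part of Zabbix or Grafana.

```python
import json
import urllib.request

# URL copied from the article; replace it with your own Zabbix front end.
ZABBIX_API_URL = "http://192.168.0.1:8080/zabbix/php/api_jsonrpc.php"


def build_version_request(request_id: int = 1) -> dict:
    """Build the JSON-RPC 2.0 body for the unauthenticated apiinfo.version call."""
    return {
        "jsonrpc": "2.0",
        "method": "apiinfo.version",
        "params": [],
        "id": request_id,
    }


def fetch_api_version(url: str = ZABBIX_API_URL, timeout: float = 5.0) -> str:
    """POST the request and return the version string reported by Zabbix."""
    body = json.dumps(build_version_request()).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["result"]
```

Calling `fetch_api_version()` from a machine that can reach the Zabbix server should return a version string. An HTTP error or a timeout suggests that the address entered in Grafana would fail as well.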

Once saved, add a panel to a dashboard and choose Graph.

Enter the edit page.

Select Zabbix as the data source.

Select the Group and Host. Grafana automatically pulls the contents of each drop-down box from the data source.

Group corresponds to the host group in Zabbix, Host to the host, Application to the application set, and Item to the metric.

Here we select the server to monitor, then select the item for the available-memory metric: Available memory.

 

Switch to Axes and select the unit.

Switch to Legend and choose to show the minimum and maximum.

Switch to Display to adjust the lines and the shade of the background color.

Switch to Thresholds to set the alert levels: above 20G is safe, 5G to 20G is a warning, and below 5G is an alarm shown in red (ignore the values shown in the figure).
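
The three bands can be written down as a small function, which is handy for checking the boundaries before setting them in the panel. This is only an illustrative sketch in Python. The function name and the level labels are made up here, and treating exactly 20G and exactly 5G as warnings is an assumption, since the text does not say which side the boundaries fall on.

```python
def classify_available_memory(available_gb: float) -> str:
    """Map available memory (in GB) to the three bands described above."""
    if available_gb > 20:
        return "ok"        # safe: more than 20G available
    if available_gb >= 5:
        return "warning"   # warning: between 5G and 20G
    return "critical"      # alarm, drawn in red: less than 5G


if __name__ == "__main__":
    for sample in (32.0, 12.5, 3.0):
        print(sample, classify_available_memory(sample))
```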

 

 

The configuration is now complete, and the full trend of available memory is displayed.

 

 

Do dozens of servers need to be configured one by one?

If you want to see the available-memory metric for every server, do you have to add them one at a time?

Grafana provides a repeat feature that copies a finished graph according to a rule. First add the server categories as variables.

Click Add.

The specific settings:

In the Host option, because Windows servers are also present and the server names start with B, the rule first filters for servers whose names begin with B. Note that the regular expression here follows JavaScript regular expression syntax.
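
To see what such a rule keeps and drops, the filter can be tried on a few names first. The sketch below uses Python's `re` module rather than JavaScript. For a simple prefix pattern like this one the two engines behave the same, but that is not true of every pattern. The host names are hypothetical.

```python
import re

# Same idea as entering /^B.*/ in the variable's Regex field.
HOST_FILTER = re.compile(r"^B.*")


def filter_hosts(hosts: list[str]) -> list[str]:
    """Keep only the host names that match the prefix rule."""
    return [host for host in hosts if HOST_FILTER.match(host)]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(filter_hosts(["B-APP-01", "WIN-DB-01", "B-WEB-02"]))
```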

 

 

After saving and returning, you will see two drop-down boxes that filter what the graphs display.

In the graph's Repeat option, choose to repeat horizontally by the server host name variable (configured in the previous step), with a minimum width of 24 / 4 = 6.
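
The 24 / 4 = 6 arithmetic generalizes to other layouts. The short Python sketch below takes the 24-unit grid width from the text and assumes that the number of graphs per row divides it evenly. The function names are illustrative.

```python
import math

GRID_COLUMNS = 24  # dashboard grid width, from the "24 / 4 = 6" in the text


def panel_width(panels_per_row: int) -> int:
    """Width of each repeated graph so that panels_per_row fit on one row."""
    if panels_per_row < 1 or GRID_COLUMNS % panels_per_row != 0:
        raise ValueError("panels_per_row must divide the grid width evenly")
    return GRID_COLUMNS // panels_per_row


def rows_needed(host_count: int, panels_per_row: int) -> int:
    """Number of rows the repeated graphs occupy for host_count servers."""
    return math.ceil(host_count / panels_per_row)
```

For example, 30 servers at four graphs per row give a width of 6 and take 8 rows, which is useful to know when judging whether everything fits on one screen.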

Change the monitored item so that it matches items containing the memory keyword. The graph then shows both total and available memory.

Save and refresh the page, and the memory of every server is displayed.

Adjust the other properties yourself.

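Traffic monitoring

The steps for building a big screen that monitors inbound and outbound traffic for all servers follow the memory monitoring section, but the monitored item is changed as shown in the figure below: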

 

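Log monitoring

Log monitoring covers the business access log (accesslog) and custom info/error logs.

From the access log you can extract a service's request volume, response time, client IP, response code, and so on.

Only one of these is covered here.

First add an ElasticSearch data source. If authentication is enabled, fill in the credentials.

Query the 10 most-visited services and show their shares in a pie chart.

Add a graph component and select the data source added in the previous step.

For the metric, choose Count, group by servername (the field that holds the service name recorded in ES; change it if yours is customized), and select Top 10.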
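
Grafana builds the request to ElasticSearch itself, but it can be useful to know roughly what a "count by servername, top 10" query corresponds to. The Python sketch below builds a terms aggregation body with that meaning. It is an approximation for understanding and debugging, not the exact request Grafana sends. The aggregation name `top_services` is made up, and on text-mapped fields the field name may need to be `servername.keyword`.

```python
import json


def top_services_query(field: str = "servername", size: int = 10) -> dict:
    """Build a search body that counts documents per service name.

    A terms aggregation returns buckets ordered by document count,
    largest first, so `size` buckets are the top `size` services.
    """
    return {
        "size": 0,  # no individual hits, only the aggregation result
        "aggs": {
            "top_services": {
                "terms": {"field": field, "size": size},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(top_services_query(), indent=2))
```

The printed JSON can be sent to the `_search` endpoint of the log index to compare the bucket counts with what the pie chart shows.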

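Switch to Options and show the total on the right side of the chart.

That completes a chart backed by ElasticSearch.

For content related to service access, Grafana officially offers dashboard templates for Nginx and others. Download a template, select the data source, and the related metrics are displayed, and they look very good.

How do you exclude content that is not business-related from the request counts?

Use the ES Query syntax. The blunt, direct approach is to exclude irrelevant or noisy content with NOT.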
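
As an illustration of the NOT approach, the Python sketch below assembles a query string that excludes a list of values from one field. The field name follows the servername field used earlier. The excluded values are hypothetical, and the helper name is made up.

```python
def exclude_values(field: str, values: list[str], base_query: str = "*") -> str:
    """Return a query string that drops documents whose field equals any value."""
    clauses = [base_query]
    for value in values:
        # Escape backslashes and double quotes so the value stays one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'NOT {field}:"{escaped}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Hypothetical noise sources: health checks and static file requests.
    print(exclude_values("servername", ["health-check", "static-assets"]))
```

The resulting text goes into the panel's Query field, so the excluded services no longer appear in the counts.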

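Showing a log list as a table with query filters

Logs can be filtered by condition when querying, for example by only the services or keywords you care about.

Add a panel and select Table.

First add a service list, a log level, and a keyword input box.

The details are as follows:

The second parameter

The log levels (Info and so on) are defined by hand, so they are not read from the data.

The third parameter uses the text box type.

Edit the chart and filter the query by the following conditions, where $ marks a selected variable.

Select Json Data, then add the columns to display.

Because the column names are all codes and not very intuitive, they can be mapped to Chinese names. Switch tabs, fill in the column name to map and its Chinese name, and select a type. Values can be formatted, empty values can be handled, and values can be colored according to the range they fall into.

The final result looks like this: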

 

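Displaying memory monitoring for services inside Docker containers

Monitoring inside containers uses the Prometheus + cAdvisor approach. Only the display after collection is covered here.

Add a data source that points to the deployed Prometheus.

Prometheus queries use PromQL (Prometheus Query Language), a data query DSL developed by Prometheus itself. The language is highly expressive, has many built-in functions, and is used both in everyday data visualization and in alerting rules.

On the page http://localhost:9099/graph, enter a query such as the following and view the result: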

  http_requests_total{code="200"}

Compared with a Mysql query, here is a fuzzy match for data whose code is 2xx:

# PromQL
http_requests_total{code=~"2.."}

-- MySQL
SELECT * FROM http_requests_total WHERE code LIKE "2%" AND created_at BETWEEN 1495435700 AND 1495435710;
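
The same kind of expression can also be run outside the /graph page through the Prometheus HTTP API, which is convenient for checking a query before putting it into a Grafana panel. The sketch below is a minimal Python example that only builds the request URL for the instant-query endpoint `/api/v1/query`. The base address is the one used above, and the function name is illustrative.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL of an instant query against the Prometheus HTTP API."""
    # urlencode percent-escapes braces, quotes and the equals sign in the selector.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # The regex matcher =~ with "2.." selects three-character codes starting with 2.
    print(instant_query_url("http://localhost:9099", 'http_requests_total{code=~"2.."}'))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns the matching series as JSON.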

Add a chart and select Prometheus as the data source.

 

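To monitor the memory of services inside containers, use the container_memory_rss metric. For syntax details, open the Prometheus page to inspect each metric, or see https://songjiayang.gitbooks.io/prometheus/content/promql/summary.html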
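
A panel usually narrows the metric down to particular containers with label matchers. The Python sketch below assembles such a selector. It assumes that the collected series carry a `name` label with the container name, which should be verified on the Prometheus page. The container name is hypothetical, and the helper is made up for illustration.

```python
def build_selector(metric: str, **labels: str) -> str:
    """Return a PromQL selector such as metric{label="value"}."""
    matchers = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{matchers}}}" if matchers else metric


if __name__ == "__main__":
    # "order-service" is a hypothetical container name.
    print(build_selector("container_memory_rss", name="order-service"))
```

The resulting expression goes into the panel's query field. Because the metric is reported in bytes, a bytes unit on the axis keeps the chart readable.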

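The other chart properties are set the same way as before and are not expanded on here. Finally, save and view the result.

In practice you would not draw every chart yourself. Instead, download templates that others have uploaded, or official templates, from the Grafana template marketplace: https://grafana.com/plugins?utm_source=grafana_plugin_list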

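About year-over-year and period-over-period comparison

Grafana does not provide a chart for year-over-year or period-over-period comparison. This also depends on each data source: if the data source does not support it, Grafana cannot display it. Among the many data sources, PromQL is based on time series and can implement such comparisons, so the comparison data can first be queried with PromQL and then displayed.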
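
One way to express such a comparison in PromQL is the `offset` modifier, which evaluates a selector as of an earlier time. The Python sketch below builds a ratio of the current request rate to the rate one period earlier. The metric name comes from the earlier example, the 5m window and the offsets are assumptions, and the function name is illustrative.

```python
def comparison_ratio_query(selector: str, window: str = "5m", offset: str = "1d") -> str:
    """Build a PromQL ratio of the current rate to the rate `offset` ago.

    The offset modifier has to follow the range selector directly,
    so it is placed inside rate() rather than after the whole expression.
    """
    current = f"sum(rate({selector}[{window}]))"
    previous = f"sum(rate({selector}[{window}] offset {offset}))"
    return f"{current} / {previous}"


if __name__ == "__main__":
    # Day-over-day and week-over-week comparisons of the same metric.
    print(comparison_ratio_query("http_requests_total", offset="1d"))
    print(comparison_ratio_query("http_requests_total", offset="1w"))
```

A result of 1.2 would mean that the current rate is 20% higher than at the same moment one period earlier. The expression can be pasted into a Prometheus-backed panel like any other query.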

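Combined big-screen display

Everything above is organized by module. Now we want to show servers, business access traffic, and container status on a single big screen, with each block drawing on its own data source.

The key is deciding which critical information a big screen should show and discarding what is unimportant. Below is one such big screen, built the same way as above. The size and layout of the graphs depend on the resolution of the screen they are projected onto and need to be tuned on site.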

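Tips for big-screen display

Grafana provides a rotation feature for big screens that automatically switches among several dashboards: Playlists.

Give the playlist a name and a switching interval, then add the dashboards to rotate.

After saving, return to the list and choose a play mode.

Unlike normal mode, these two modes go full screen and hide unrelated content such as the address bar, taskbar, and icons, and the charts adapt to the screen size. For a description of the two modes, see the official site: https://grafana.com/docs/reference/playlist/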

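About Grafana's alerting

Grafana's alerting is relatively weak. The biggest problem is that alert configuration does not support template variables. Take an alert for memory below 2G: if the chart is built from a template and contains the $host variable, it cannot alert. Alerting only works for charts without variables and is less convenient than Zabbix's alerting, so Zabbix is recommended for alerts.

Grafana can connect to many more data sources, which are left for you to explore. Those with the ability can do custom development on top of it to build their own monitoring big screen.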


Origin www.cnblogs.com/zhangs1986/p/11180694.html