Prometheus, the Sharpest "Swiss Army Knife" of Monitoring

Original: https://mp.weixin.qq.com/s/Cujn6_4w8ZcXCOWpoAStvQ

I. Prometheus: the Standard for Kubernetes Container Monitoring

  1. Introduction

  Prometheus is an open source monitoring and alerting system with a built-in time-series database, originally developed at SoundCloud and written in Go; it can be seen as an open source counterpart of Google's internal BorgMon monitoring system. In 2016 the Cloud Native Computing Foundation (CNCF), founded under the Linux Foundation with Google's sponsorship, accepted Prometheus as its second hosted project (after Kubernetes). Prometheus is also very active in the open source community: it has more than 20,000 stars on GitHub, and a minor release lands every few weeks.

  With Kubernetes' leading position in container scheduling and management now settled, Prometheus has become the standard for Kubernetes container monitoring.

  2. Advantages

  Prometheus has many advantages, described below.

  (1) It provides a multi-dimensional data model and flexible querying. By attaching multiple tags (labels) to a metric, monitoring data can be combined and queried along any dimension. It offers the simple query language PromQL and an HTTP query API, so GUI components such as Grafana can easily be integrated to visualize the data.
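
As an illustration of this multi-dimensional model (the metric, job, and label names here are hypothetical), a single PromQL expression can slice a labeled metric along any dimension:

```promql
# Per-second HTTP request rate over the last 5 minutes,
# summed across all instances and broken down by status code:
sum by (code) (rate(http_requests_total{job="api-server"}[5m]))
```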

  (2) It does not depend on external storage. Each Prometheus server node supports local storage through its embedded time-series database and can store tens of millions of samples per second. Moreover, in scenarios that require keeping large amounts of historical data, Prometheus can be connected to third-party time-series databases such as OpenTSDB.

  (3) It defines an open standard for metrics data and collects time series over HTTP in Pull mode: any target that exposes its monitoring data in the Prometheus format can be scraped and aggregated by Prometheus. It also supports a Push mode in which time series are pushed to an intermediate gateway, so a wide variety of monitoring scenarios can be handled flexibly.
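
A minimal sketch of that open text format, using only the Go standard library (metric name, HELP text, and values are made up; a real target would serve this string over HTTP at a /metrics endpoint, typically via the official client library):

```go
package main

import (
	"fmt"
	"sort"
)

// renderMetrics formats samples as "# HELP", "# TYPE", and
// `name{labels} value` lines - the open text exposition format
// that Prometheus scrapes from targets over HTTP.
func renderMetrics(requests map[string]float64) string {
	out := "# HELP app_http_requests_total Total HTTP requests by status code.\n"
	out += "# TYPE app_http_requests_total counter\n"
	codes := make([]string, 0, len(requests))
	for c := range requests {
		codes = append(codes, c)
	}
	sort.Strings(codes) // deterministic line order
	for _, code := range codes {
		out += fmt.Sprintf("app_http_requests_total{code=%q} %g\n", code, requests[code])
	}
	return out
}

func main() {
	// In a real exporter this payload would be served on /metrics.
	fmt.Print(renderMetrics(map[string]float64{"200": 1027, "500": 3}))
}
```

Because the format is plain text over HTTP, any component can make itself scrapable without linking against Prometheus code.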

  (4) It supports discovering monitored targets both through static configuration files and through dynamic discovery mechanisms, after which data collection proceeds automatically. Prometheus currently supports Kubernetes, etcd, Consul and other service discovery mechanisms, which reduces manual configuration work for operations staff and is especially important in container runtime environments.

  (5) It is easy to maintain: it can be started directly from a single binary, and container images are provided for containerized deployment.

  (6) It supports sharded scraping of data and federated deployment, enabling the monitoring of large-scale clusters.

  3. Architecture

  The following briefly describes the Prometheus architecture.

  Prometheus's basic principle is to periodically scrape the state of monitored components over HTTP: any component can be brought under Prometheus's monitoring as long as it provides an HTTP interface whose data format conforms to what Prometheus defines.

  Figure 1-9 shows the overall architecture of Prometheus (taken from the official Prometheus website), illustrating the relationships between Prometheus's internal modules and the surrounding components.


Figure 1-9

  The Prometheus Server is responsible for periodically scraping metrics data from targets, so each target must expose an HTTP service interface for Prometheus to scrape on schedule. This way of having the monitoring system call the monitored objects to obtain their data is called Pull. The Pull model embodies Prometheus's distinctive design philosophy and differs from the Push model used by most monitoring systems.

  The advantages of the Pull model are automatic upstream and horizontal monitoring, less configuration, better scalability, more flexibility, and easier high availability. Put simply, Pull reduces coupling. In a Push-based system, a failure while pushing data to the monitoring system can easily paralyze the monitoring system itself. With Pull, the collected endpoints need not even be aware that the monitoring system exists, and data acquisition is controlled entirely by the monitoring system, which remains completely independent; this improves the controllability of the whole system.

  Since Prometheus collects data in Pull mode, how does it learn which objects to monitor? Prometheus supports two approaches. The first is static configuration through configuration files, text files, and the like. The second is dynamic discovery through ZooKeeper, Consul, Kubernetes, and so on. Taking Kubernetes dynamic discovery as an example, Prometheus uses the Kubernetes API to query and watch for changes in container information and dynamically updates its monitored targets, so the creation and deletion of containers can be perceived by Prometheus.
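
A minimal sketch of both approaches in a prometheus.yml scrape configuration (job names and target addresses are made up for illustration; the keys follow the documented Prometheus configuration format):

```yaml
scrape_configs:
  # Static configuration: a fixed list of targets.
  - job_name: "node"
    static_configs:
      - targets: ["192.168.1.10:9100", "192.168.1.11:9100"]

  # Dynamic discovery: watch the Kubernetes API for pods, so
  # created and deleted containers are picked up automatically.
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
```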

  The storage layer cleans and organizes the data according to configured rules and saves the results as new time series. There are two storage options.

  One is local storage. Prometheus saves data to local disk through its embedded time-series database; for performance reasons, SSDs are recommended. But local storage capacity is limited, and it is recommended not to keep more than a month of data locally.

  The other is remote storage, which is suited to keeping large amounts of monitoring data. Through an intermediate adapter layer, Prometheus currently supports OpenTSDB, InfluxDB, Elasticsearch and other back-ends as remote storage: by implementing Prometheus's remote-write and remote-read interfaces, an adapter allows Prometheus to use the remote system for storage. The remote databases currently supported by Prometheus are shown in Figure 1-10.


Figure 1-10
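
Remote storage is wired up in prometheus.yml through the remote_write and remote_read sections, which point at the adapter's endpoints (the URLs below are made-up placeholders):

```yaml
remote_write:
  - url: "http://remote-adapter.example.com:9201/write"

remote_read:
  - url: "http://remote-adapter.example.com:9201/read"
```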

  Prometheus visualizes the collected data through PromQL, APIs, and other means. It supports several forms of graphing, e.g. Grafana, the bundled PromDash, and its own template engine. Prometheus also provides an HTTP query API so that the required output can be customized.

  Prometheus pulls its data, but some existing systems are implemented around the Push model. To accommodate such systems, Prometheus provides the Pushgateway: these systems actively push their metrics to the Pushgateway, and Prometheus simply scrapes the gateway for the data on its regular schedule.

  AlertManager is a component independent of Prometheus. After an alerting rule configured in advance in Prometheus fires, Prometheus pushes the alert information to AlertManager. AlertManager provides very flexible alerting methods and can push notifications via email, Slack, DingTalk, or other channels. AlertManager also supports highly available deployment; to solve the problem of multiple AlertManager instances sending duplicate alerts, the Gossip protocol was introduced, and the AlertManager instances synchronize alerts among themselves via Gossip.
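
On the Prometheus side, such a rule is defined in a rule file: when the expr has held for the for duration, the alert is sent to AlertManager. A minimal hypothetical rule (names and thresholds are illustrative; the fields follow the documented alerting-rule format):

```yaml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0          # target failed its last scrape
        for: 5m                # must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```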

  The overall AlertManager workflow is shown in Figure 1-11.

  II. Other Open Source Monitoring Tools

  In addition to the many commercial SaaS monitoring products on the market, such as Jiankongbao, TingYun and others, there are also open source monitoring tools such as Zabbix and Open-Falcon.

  1. Zabbix

  Zabbix is a distributed monitoring system open-sourced by Alexei Vladishev. It supports multiple client-side collection methods as well as protocols such as SNMP, IPMI, JMX, Telnet and SSH. The collected data is stored in a database and then analyzed; if the data matches an alarm rule, the corresponding alarm is triggered.

  Zabbix is built around the following core concepts.

  Host: the abstract object that Zabbix monitors. Every monitored object has an IP address. A host here is not limited to a physical server; it may also be a virtual machine, a container, or a network device.

  Host Group: a set of hosts, used mainly for resource isolation among multiple users. User groups are associated with host groups, so different Zabbix users see and manage only their own resources.

  Item: a metric with associated monitoring data. Each item has a key that identifies it and distinguishes it from other metrics.

  Application: a collection of items; an item may belong to one or more applications.

  Template: a tool for quick configuration in Zabbix. A template presets a set of entries so that monitored hosts can be defined quickly; it usually contains items, triggers, graphs, applications, and discovery rules. By associating a template with hosts, repeated per-host configuration is avoided.

  Trigger: Zabbix's alarm rule, which evaluates whether the data received for a particular item crosses a threshold. If it does, the trigger's state transitions from OK to Problem; when the data returns within reasonable limits, the state transitions back from Problem to OK.
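
For illustration, a trigger expression in the classic pre-5.0 `{host:item-key.function(params)}` syntax (the host name and threshold are hypothetical) that fires when the 5-minute average CPU load exceeds 2:

```
{web01:system.cpu.load[all,avg1].avg(5m)}>2
```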

  Action: the operation Zabbix executes after an event occurs. Events mainly include alarm events, discovery events (a new host joining the network), auto-registration events (of active Agents), and internal events (for example, a monitored item becoming unsupported). Actions include sending email notifications, executing remote commands, or carrying out additional Zabbix operations.

  The Zabbix components are shown in Figure 1-12.


Figure 1-12

  As shown in Figure 1-12, Zabbix consists of the following components.

  1) Zabbix Server

  The core component of Zabbix, written in C. It is mainly responsible for receiving the monitoring data sent by Agents, aggregating it, and storing it. Zabbix Server performs the following three main tasks.

  Device registration. There are two ways to register monitored objects. One is to manually configure the Agent addresses; but if hundreds of machines in a data center need to be added at once, this means a lot of work. The other is to use the auto-discovery mechanism: an entire network segment can be configured, and the Server will probe every host in that segment and can automatically apply templates, triggers, and a whole series of related configuration.

  Data collection. This includes both active collection and passive reception; collected data is first placed in memory and then written to the database in batches.

  Periodic data evaluation and alarm triggering. Collected data is matched against the configured triggers; if the conditions are satisfied, an alarm is raised.

  2) Zabbix Database

  Used to store all of Zabbix's configuration information as well as the monitoring data it collects. Supported back-end databases include MySQL, PostgreSQL, Oracle and others; MySQL is the most commonly used, and it also serves data queries for the Zabbix Web pages. Because time-series data is kept in a relational database, Zabbix is often stretched for data storage when monitoring large clusters.

  3) Zabbix Web

  Zabbix's GUI component, written in PHP, usually running on the same host as the Server. It provides system configuration and monitoring data display, including monitoring templates, alarm configuration, and so on.

  4) Proxy

  An optional component, commonly used in distributed monitoring environments. A Proxy collects monitoring data from a subset of the monitored Agents at a certain frequency and forwards it to the Server in a unified way.

  A Proxy has its own database. Proxies were introduced mainly to solve the following two problems.

  The network between Server and Agent is not directly reachable, which is common in cross-network and cross-datacenter scenarios.

  Reducing the load on the Server in large-scale deployments: after all, when that many Agents connect simultaneously, the Server must maintain far more connections. In Proxy mode, the Agents are configured with the Proxy's address as their upstream, and the Proxy is configured with the Server's address.

  5) Agent

  Deployed on the monitored host, the Agent is responsible for collecting local data and sending it to the Proxy or directly to the Server; it runs as a daemon named Agentd.

  The Agent has two modes, active and passive. Active means the Agent collects data on its own initiative and sends it to the Server/Proxy; passive means the Server calls the Agent each time to obtain the data. Users can also have the Agent execute custom scripts to provide features the native Agent does not offer.

  One more Zabbix tool is worth explaining here: zabbix_sender. zabbix_sender is a command-line tool that can actively send data to the Zabbix server, avoiding the need for the Agent to keep waiting for a monitoring task to complete. Interestingly, Zabbix itself is maintained by a commercial team: the Zabbix software is offered for free, while support and maintenance are charged for. This is exactly the same idea as Red Hat: as system software matures, the software business model gradually shifts from selling licenses to charging service fees, which is the general trend.

  2. Nagios

  Nagios, formerly known as NetSaint, was developed and is maintained by Ethan Galstad. Nagios is a veteran monitoring tool, written in C, used mainly for host monitoring (CPU, memory, disk, etc.) and network monitoring (SMTP, POP3, HTTP, NNTP, etc.); of course, it also supports user-defined monitoring scripts, as shown in Figure 1-13.


Figure 1-13

  The overall Nagios architecture is very clear: all monitoring data is collected through plugins. For SNMP monitoring, for example, the SNMP plugin communicates with the snmpd running on the monitored object to obtain network information. Nagios also supports a more versatile and secure collection method, NRPE (Nagios Remote Plugin Executor): an NRPE daemon is first started on the remote host to run detection commands there, and the Nagios server uses its check_nrpe plugin to connect to the NRPE daemon over SSL and carry out the corresponding monitoring. Compared with executing commands remotely over SSH, this approach is more secure; of course, Nagios supports an SSH plugin as well.

  Nagios data is stored in RRD (Round Robin Database) files. RRD is well suited to storing time-series data and supports the following four data source types.

  gauge: instantaneous, dashboard-style values.

  counter: a monotonically increasing counter.

  absolute: the rate of change over the period, always a positive number.

  derive: the rate of change, computed from the difference between the current value and the previous one; it may be positive or negative.

  The RRD storage principle is relatively simple: the entire data storage space is organized as a ring, with a pointer to the latest data position that moves as data is read and written. If no monitoring data is acquired for an interval, RRD fills in the default value Unknown to keep the data aligned. Each database file ends in .rrd and has a fixed size, as shown in Figure 1-14.


Figure 1-14

  3. Open-Falcon

  Open-Falcon is an enterprise-grade open source monitoring tool from Xiaomi, developed in Go. Internet companies including Xiaomi, Didi and Meituan are using it. It is a flexible, scalable, high-performance monitoring solution; its overall architecture is shown in Figure 1-15.


Figure 1-15

  Next, the components shown in Figure 1-15 are briefly described.

  Falcon-agent: a daemon program developed in Go that runs on every Linux server and collects the host's various metrics, including CPU, memory, disk, file system, kernel parameters and socket connections; at present it supports more than 200 monitoring metrics. The Agent also supports user-defined monitoring scripts, whose output must follow the format the Agent specifies. The data the Agent collects is reported to Transfer via RPC. To avoid the failure of a single Transfer, the Agent supports configuring multiple Transfer addresses, and it can also ignore unneeded metrics. The Agent itself can additionally act as a proxy gateway (Proxy-gateway), receiving HTTP requests from third parties and forwarding them to Transfer.

  Heartbeat server: HBS for short. Each Agent periodically reports its state to the HBS via RPC, including hostname, host IP, Agent version and plugin version; the Agent also obtains from the HBS the collection tasks and custom plugins it needs to execute.

  Transfer: responsible for receiving the monitoring data sent by Agents and, after collation and filtering, distributing it through a consistent hashing algorithm to Judge or Graph. To support storing large amounts of historical data, Transfer also supports OpenTSDB. Transfer itself is stateless and can be scaled out freely.

  Judge: the alarm module. The data Transfer forwards to Judge is evaluated against user-defined alarm rules; if a rule matches, email, SMS, WeChat, or a callback interface is triggered. To avoid duplicate alarms, Redis is introduced to hold alarms temporarily, enabling alarm merging and suppression.

  Graph: the component for reporting, archiving and storing RRD data. After Graph receives data, it archives and stores it using RRDtool and provides an RPC query interface for the monitoring data.

  API: mainly provides query interfaces. It can not only read monitoring data from Graph, but also connect to MySQL, which is used to store alarms, users, and other information.

  Dashboard: developed in Python, it provides data and alarm display for Open-Falcon. The monitoring data comes from Graph, and the Dashboard allows users to customize their monitoring panels.

  Aggregator: the aggregation component, which aggregates the values of a metric across all machines in a cluster, providing a cluster-perspective monitoring view. It periodically obtains data from Graph, aggregates it along cluster dimensions, and sends the newly produced monitoring data to Transfer.

  III. Monitoring System Comparison

  Below is a side-by-side comparison of the four monitoring systems Prometheus, Zabbix, Nagios and Open-Falcon, as shown in Table 1-1.

Table 1-1

| Monitoring system | Language | Maturity | Scalability | Performance | Community activity | Container support | Enterprise adoption |
| ----------------- | -------- | -------- | ----------- | ----------- | ------------------ | ----------------- | ------------------- |
| Zabbix            | C + PHP  | High     | High        | Low         | Medium             | Low               | High                |
| Nagios            | C        | High     | Medium      | Medium      | Low                | Low               | Low                 |
| Open-Falcon       | Go       | Medium   | High        | High        | Medium             | Medium            | Medium              |
| Prometheus        | Go       | Medium   | High        | High        | High               | High              | High                |

  In terms of development language, to meet the needs of high concurrency and rapid iteration, the development language of monitoring systems has gradually moved from C to Go.

  It must be said that Go, with its simple syntax and elegant concurrency support, occupies the ground between Java, which dominates business application development, and C, which dominates low-level systems development. Middleware development needs exactly this positioning, so Go is widely used in today's open source middleware products.

  In terms of system maturity, Zabbix and Nagios are long-established monitoring systems: Zabbix appeared in 1998 and Nagios in 1999. Their functionality is stable and their maturity is high. Prometheus and Open-Falcon were born only in recent years; although their features are still being iterated and updated, they stand on the shoulders of giants, drawing on much of the architectural design experience of the older monitoring systems.

  In terms of system extensibility, Zabbix and Open-Falcon can both be extended with custom monitoring scripts; Zabbix supports not only active push but also passive pull. Prometheus defines a specification for monitoring data and extends its collection capabilities through a variety of exporters.

  In terms of data storage, Zabbix stores data in a relational database, which greatly limits its data collection performance. Nagios and Open-Falcon both use RRD for data storage; Open-Falcon additionally applies consistent hashing to shard the data and can be connected to OpenTSDB. Prometheus uses a self-developed, high-performance time-series database that, from its V3 storage engine onward, can reach a level of tens of millions of samples stored per second, and it can be connected through third-party extensions to time-series databases for historical data.

  In terms of community activity, Zabbix and Nagios community activity is now relatively low, especially for Nagios. The Open-Falcon community is fairly active, but the participants are mostly domestic Chinese companies. Prometheus community activity is high, and with the support of the CNCF, its future development is worth looking forward to.

  In terms of container support, Zabbix and Nagios appeared early, before containers were born, so their support for containers is naturally relatively poor. Open-Falcon provides some container monitoring capability, but its support is limited. Prometheus's dynamic discovery mechanism supports not only native Swarm clusters but also Kubernetes cluster monitoring, making it the best current solution for container monitoring. Zabbix holds an absolutely dominant position in traditional monitoring, especially server-related monitoring, and Nagios is likewise widely used in network monitoring. With the development of containers, Prometheus has become the standard for container monitoring and will be ever more widely adopted. Overall, Prometheus can be called the sharpest "Swiss Army knife" of monitoring.

  Adapted from the book Prometheus in Plain Language: Principles, Applications, Source Code and Extensions, by Chen Xiaoyu, Yang Chuanhu and Chen Xiao, published by Publishing House of Electronics Industry (Broadview).


Origin: www.cnblogs.com/Irving/p/11828561.html