Open-source monitoring of service internals (1): Introduction to Graphite

Among open-source monitoring systems, the best known are Nagios and Cacti. Our company uses Nagios in operations to monitor server and service status, with plug-ins providing email and SMS alerts; Cacti monitors servers over SNMP and uses RRDTool to draw attractive graphs for performance analysis.
These are powerful tools for operations staff, but service developers rarely use them, because they make it hard to monitor the internal running state of the services we build. If you want to chart your own service's response time every five minutes, or track its internal cache hit rate over time, these tools are of little help.

Common approaches

To meet such needs, developers often build a monitoring system themselves: the service periodically sends its internal state to a monitoring server, the state is stored in a database, and the reports are written by hand. If the service you want to monitor is a cluster, you also have to solve the problem of aggregating the monitoring data.
Another common approach is to write the status data to log files, collect the logs periodically, and run analysis jobs over the collected results (some teams load them into Hadoop and run MapReduce jobs). Monitoring done this way has poor real-time performance.
To avoid this work, we tried two open-source systems that can monitor a service's internal data and produce excellent graphs: Graphite and Ganglia.

Graphite

What struck me most about Graphite was how easy it is to use. Cacti relies on SNMP, which means installing an SNMP agent on every monitored node; Ganglia likewise requires gmond on each monitored node to collect information. Graphite uses a simple text protocol: you just send a line of text to the Graphite server over a TCP socket, for example:

quentinxxz.server.count  1234 1440245016

Here quentinxxz.server.count is the name of the metric, 1234 is the metric's value at that moment, and 1440245016 is the timestamp (in Unix seconds) at which the data was generated. The corresponding curve then appears in Graphite web.
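
The line format described above can be sketched in a few lines of Python. This is a minimal sketch, not an official client: the helper names format_metric and send_metric are made up for illustration, and the host 127.0.0.1 is an assumption. Port 2003 is the port carbon normally listens on for plaintext data, but check your own carbon configuration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())  # default to "now" in Unix seconds
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, timestamp=None, host="127.0.0.1", port=2003):
    """Send a single data point over TCP (host and port are assumptions)."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example call (needs a reachable carbon instance, so it is left commented out):
# send_metric("quentinxxz.server.count", 1234)
```

Because each data point is just one line of text, the same thing can be done from almost any language or even from a shell script, which is much of what makes Graphite easy to adopt.
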
Graphite is implemented in Python and consists of three main parts:

1. whisper

Whisper is a fixed-size database, similar in design to RRD (round-robin database): its data files are created at a fixed size. For example, we configure the quentinxxz.server.count entry in the /opt/graphite/conf/storage-schemas.conf file as follows:

[quentinxxz]
pattern = quentinxxz.server.count
retentions = 1min:50d,10min:50d

1min means one point is recorded per minute, and 50d means the points are kept for 50 days. The file is therefore created with room for 1 * 60 * 24 * 50 = 72,000 points for this precision.
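
The arithmetic above can be checked with a short script. This is only a sketch: the parser below is a hypothetical helper that handles just the unit spellings used in this article, not whisper's own code. The byte sizes (a 16-byte header, 12 bytes of metadata per archive, 12 bytes per point) are my understanding of the whisper file layout and should be treated as an assumption.

```python
import re

# Seconds per unit, limited to the spellings used in this article.
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400}


def to_seconds(token):
    """Convert a token such as '1min' or '50d' to seconds."""
    match = re.fullmatch(r"(\d+)([a-z]+)", token)
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {token!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def points_per_archive(retentions):
    """Return the number of points each 'precision:duration' pair needs."""
    counts = []
    for archive in retentions.split(","):
        precision, duration = archive.strip().split(":")
        counts.append(to_seconds(duration) // to_seconds(precision))
    return counts


def estimated_file_size(counts):
    """Rough file size in bytes, under the layout assumption stated above."""
    return 16 + 12 * len(counts) + 12 * sum(counts)


counts = points_per_archive("1min:50d,10min:50d")
print(counts)                       # [72000, 7200]
print(estimated_file_size(counts))  # 950440
```

One caveat, as far as I know: whisper expects each lower-precision archive to cover a longer period than the one before it, so a real configuration would probably give the 10min archive a longer retention than 50d. The numbers here simply follow the configuration as written.
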
Another interesting feature of whisper is its aggregation. In the configuration above, 10min defines a second precision: whisper applies the aggregation method we specify (for example the maximum, minimum, or average of 10 one-minute points) and stores the result in a separate storage area with one point per 10 minutes. The aggregation method is configured in the /opt/graphite/conf/storage-aggregation.conf file.
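
To make the idea concrete, here is a small sketch of rolling 10 one-minute values up into one 10-minute value. It is an illustration of the concept, not whisper's implementation: the downsample function is hypothetical, and it ignores missing points (which whisper handles through its xFilesFactor setting) as well as any incomplete trailing window.

```python
from statistics import mean

# Aggregation methods by name; these mirror the kinds of methods the text mentions.
AGGREGATORS = {
    "average": mean,
    "max": max,
    "min": min,
    "sum": sum,
    "last": lambda values: values[-1],
}


def downsample(values, factor=10, method="average"):
    """Collapse every `factor` consecutive values into one aggregated value."""
    aggregate = AGGREGATORS[method]
    result = []
    # Step through complete windows only; a partial window at the end is dropped.
    for start in range(0, len(values) - factor + 1, factor):
        result.append(aggregate(values[start:start + factor]))
    return result


one_minute_values = list(range(1, 21))  # 20 one-minute samples: 1, 2, ..., 20
print(downsample(one_minute_values, method="average"))  # [5.5, 15.5]
print(downsample(one_minute_values, method="max"))      # [10, 20]
```

Which method is appropriate depends on the metric: an average suits response times, while a counter of events per minute is usually better served by a sum.
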
In addition, RRD does not accept updates whose timestamp is earlier than its most recent update, whereas whisper does (though there seems to be little demand for this). For more comparisons between RRD and whisper, see http://graphite.wikidot.com/whisper

2. carbon (a Twisted daemon that listens for data)

Carbon is built on Twisted and is the back end of Graphite.
Carbon's main job is to accept connections from monitored nodes, collect the data for each metric, write it to a cache, and finally persist it to whisper storage files. Carbon also ensures that Graphite web can draw newly received points in real time. The principle is simple and similar to Lucene's: data received by carbon goes into the cache first and is written to whisper files on disk afterwards. Graphite web then queries both the data on disk and the data still in the cache, the latter by sending a request to carbon-cache.
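
The cache-then-disk behaviour described above can be modelled with a toy class. This is purely illustrative and assumes nothing about carbon's real classes: MetricStore and its methods are hypothetical names, and a plain dictionary stands in for the whisper files.

```python
class MetricStore:
    """Toy model of carbon's write path: cache first, disk later."""

    def __init__(self):
        self.cache = {}  # metric name -> {timestamp: value}, not yet persisted
        self.disk = {}   # stands in for whisper files on disk

    def receive(self, name, timestamp, value):
        """A new point arrives and is held in memory."""
        self.cache.setdefault(name, {})[timestamp] = value

    def flush(self, name):
        """Persist the cached points of one metric and drop them from the cache."""
        points = self.cache.pop(name, {})
        self.disk.setdefault(name, {}).update(points)

    def query(self, name):
        """Merge persisted and cached points so fresh data is visible at once."""
        merged = dict(self.disk.get(name, {}))
        merged.update(self.cache.get(name, {}))  # cached points win on conflict
        return sorted(merged.items())


store = MetricStore()
store.receive("quentinxxz.server.count", 1440245016, 1234)
store.flush("quentinxxz.server.count")
store.receive("quentinxxz.server.count", 1440245076, 1250)  # still only in cache
print(store.query("quentinxxz.server.count"))
# [(1440245016, 1234), (1440245076, 1250)]
```

The point of the design is that a graph does not have to wait for the next disk write: the second point above is returned by the query even though it has not been flushed yet.
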

3. graphite-web

Graphite web is a Django-based web application whose main job is to draw and display graphs. I don't recommend using Graphite web directly: it is powerful, but I find its interface painfully ugly. Instead I recommend Tessera, a third-party open-source front end for Graphite.
Tessera still requires Graphite web to be installed, because it requests its data directly from Graphite web. Tessera's interface is quite slick and better suited to engineers' tastes, which was a big part of what drew me to Graphite.
Its flexible configuration also lets us freely compose our own dashboards. Enough said; a picture shows it best.



 

Graphite usage summary

In my experience, the main advantages of Graphite combined with Tessera are a clean, attractive interface and a simple transport protocol. The disadvantage is that when your application is a large cluster, Graphite, as far as I could tell, does not consolidate the data from the different servers in the cluster for you. For example, to monitor the cache hits of 10 search nodes in a cluster you need 10 different metric names (usually distinguished by host name), which gives you 10 different curves; I found no direct way to have Graphite combine them into a single metric or curve showing the overall cache hit situation of the search cluster.
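
For what it is worth, graphite-web's render API documents functions such as sumSeries that accept wildcard targets, which may cover part of this case at render time rather than at storage time. The sketch below only builds such a request URL and does not contact any server. The host graphite.example.com and the metric naming scheme quentinxxz.search.<host>.cache_hits are assumptions made up for this example, and whether the function behaves as needed depends on your Graphite version.

```python
from urllib.parse import urlencode


def render_url(base_url, target, since="-1h"):
    """Build a graphite-web render URL that asks for JSON data."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


# One wildcard target instead of ten per-host metric names (naming is hypothetical).
url = render_url(
    "http://graphite.example.com",
    "sumSeries(quentinxxz.search.*.cache_hits)",
)
print(url)
```

If this works in your setup, the ten per-host curves can stay as they are for drill-down, while a dashboard requests the combined curve through the single wildcard target.
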

 

First published 2015-08-22 on 3dobe.com: http://3dobe.com/archives/160/

iteye link:  http://quentinXXZ.iteye.com/blog/2237318
