Online Application Monitoring, Explained in One Article

"The production service is down, restart it first"? As a programmer gains experience at work, their attention gradually shifts to how applications are running in production. Imagine you are fast asleep at two in the morning when the WeChat group suddenly blows up: "The service is down, restart it first..." You get up fuming and the dream is over. Can you bear that?

 

Today's post has three parts: application service status monitoring, log-based application monitoring, and a level-up part (the old hand takes you for a ride). Together they give a brief tour of what you need to know about application monitoring.

 

Tips:

1. Today's content is quite brain-burning, so please drink plenty of Six Walnuts walnut milk in advance!

2. But I believe that if you keep reading to the end, it will definitely be worth the trip!

 

I. Application Service Status Monitoring

 

Applications and services running in production are generally required to run 7 x 24 with 99.99% availability. Besides making the application itself robust, this of course also relies on some kind of daemon to do the monitoring. Otherwise, what do you do once the service hangs (the process is still alive but no longer responds)?

 

The first thing that comes to mind is a lean shell script made of a few Linux commands, optionally paired with a crontab scheduled task, acting as the guardian of the application process. Without further ado, let's open a typical monitor.sh script and take a look (using Tomcat as the example).

 

[Figure: the monitor.sh script]

 

Small as it is, it has everything it needs. Let's dissect it.

 

How do we determine that the application is hung?

Configure a health check URL that serves as a heartbeat probe. As long as each request returns a normal 200 status code, the application is considered able to serve normally. If the returned status code is anything other than 200, check whether the application's process ID exists. If it does, the application is hung.
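The decision just described can be sketched as two small shell functions. This is only a minimal sketch, not the original monitor.sh: the function names `http_status` and `classify_state`, the 5-second timeout, and the example URL are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the heartbeat decision; names and values are illustrative.

# Print the HTTP status code returned by the health check URL in $1.
# curl prints 000 when the request fails or times out.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Classify the application from a status code ($1) and a PID ($2, may be empty).
classify_state() {
    if [ "$1" = "200" ]; then
        echo "healthy"     # heartbeat answered normally
    elif [ -n "$2" ]; then
        echo "hung"        # process exists but the heartbeat failed
    else
        echo "stopped"     # no process at all
    fi
}

# Example (hypothetical URL and PID):
#   classify_state "$(http_status http://localhost:8080/health)" "12345"
```

Keeping the decision separate from the curl call makes it easy to try out with made-up status codes before pointing it at a real service.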

 

How do we restart a hung application?

Use ps -ef | grep "tomcat" | grep -w 'tomcat' | grep -v "grep" | awk '{print $2}' to obtain the process ID. If a process ID exists, kill it, then call the start command to restart the service.
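As a rough sketch of that step (again, not the original script): `TOMCAT_HOME` and its default `/app/tomcat` are assumed values, `extract_pids` and `restart_tomcat` are made-up names, and `kill -9` is my assumption since the text only says "kill". The first `grep "tomcat"` from the pipeline is dropped because `grep -w 'tomcat'` already covers it.

```shell
#!/bin/sh
# Sketch of the restart step; TOMCAT_HOME is an assumed path, adjust it.
TOMCAT_HOME="${TOMCAT_HOME:-/app/tomcat}"

# Read `ps -ef` output on stdin and print the PID column of every line
# that contains the whole word "tomcat", skipping grep processes.
extract_pids() {
    grep -w 'tomcat' | grep -v 'grep' | awk '{print $2}'
}

# Kill every matching process, then start Tomcat again.
restart_tomcat() {
    for pid in $(ps -ef | extract_pids); do
        kill -9 "$pid"     # -9 is an assumption; the source only says "kill"
    done
    "$TOMCAT_HOME/bin/startup.sh"
}
```

Because `extract_pids` reads from stdin, you can feed it saved `ps -ef` output and check which PIDs it would pick before letting it kill anything.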

 

The approach above has the shell script itself check the service status once every 60 seconds. Alternatively, I often use the Linux crontab to invoke the monitoring script on a schedule. Taking the monitor.sh script above as the example, the only change needed is to comment out the loop.
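The difference between the two variants can be sketched like this. It is a simplified illustration: `check_once` is a placeholder that merely counts calls, and the round limit in `monitor_loop` is something I added so the loop can be tried without running forever.

```shell
#!/bin/sh
# Sketch of the polling loop; check_once stands in for the real check.

CHECKS=0
check_once() {
    CHECKS=$((CHECKS + 1))   # placeholder: count calls instead of probing
}

# Run check_once every $1 seconds; stop after $2 rounds (0 = run forever).
monitor_loop() {
    interval="$1"; max="$2"; n=0
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        check_once
        n=$((n + 1))
        sleep "$interval"
    done
}

# Looping variant, last line of the script:   monitor_loop 60 0
# crontab variant, last line of the script:   check_once
```

With crontab doing the scheduling, the script runs one check and exits, so a crashed monitor cannot silently stop watching the service.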

 

[Figure: monitor.sh with the loop commented out]

 

 

With the script written, the next step is to schedule it with crontab (if this is the first time you have heard of crontab, look it up on Google or Baidu and study it yourself).

 

*/1 * * * * /app/script/monitor.sh > /dev/null 2>&1

 

If you want to try the scheme above, there are two things to note:

Note 1: Adjust the relevant settings to your environment, including the Tomcat directory, the script directory, and the heartbeat URL;

Note 2: Make sure the shell script has been given execute permission.

 

A small script solves a big problem, so don't look down on humble tools: a little leverage can move a great weight.

 

In fact, if you calm down and think about it, the same scheme works just as well for monitoring services other than Tomcat.

 

That wraps up the most basic, simplest, and most practical application service status monitoring scheme. Did you get it?

 

II. Log-Based Application Monitoring

 

Anyone who has worked on financial projects knows that logs are the last Aladdin's lamp for solving system bugs.

 

Today, with microservices development in full swing, services are split at ever finer granularity and module responsibilities are ever more clearly divided. The consequence is that troubleshooting from logs becomes cumbersome.

 

Can't the logs of all the microservices be collected in one place? The industry has plenty of mature solutions. So let's talk: how are logs collected? How are collected logs stored? How are stored logs displayed? How is alerting implemented?

 

How are logs collected?

Common log collection schemes in the industry fall into two types: direct collection and agent-based collection.

Direct collection means the application uploads its logs directly to the storage layer or a server, for example through a Log4j appender.

Agent-based collection means deploying an agent service on the machine that hosts the application. The agent is dedicated to collecting logs and pushing them to the storage layer or a server, while the application itself is only responsible for producing logs.

Direct collection suits cases where no separate agent can be deployed to collect logs, such as load-balancing appliances. There, direct collection is the option to consider.

Agent-based collection suits everything else: as long as the application writes its logs to disk as files, the agent can collect them, and the agent stays loosely coupled to the application. Compared with direct collection, the agent approach is better in extensibility and maintainability.
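To make the loose coupling concrete, here is a toy agent-style collector. It is purely my own sketch and not how any of the tools below are implemented: `collect_new` and the offset file are hypothetical, and printing to stdout stands in for pushing to a storage layer. It assumes `head -c` is available, which holds on common Linux and BSD systems.

```shell
#!/bin/sh
# Toy agent-style collector: it only reads the log file on disk, so the
# application needs no knowledge of it.

# Print what was appended to log file $1 since the previous call, using
# offset file $2 to remember how many bytes have already been shipped.
collect_new() {
    logfile="$1"; offsetfile="$2"
    offset=0
    if [ -f "$offsetfile" ]; then
        offset=$(cat "$offsetfile")
    fi
    size=$(( $(wc -c < "$logfile") ))
    # A file smaller than last time was rotated or truncated: start over.
    if [ "$size" -lt "$offset" ]; then
        offset=0
    fi
    # Ship exactly the bytes between the old offset and the measured size.
    tail -c "+$((offset + 1))" "$logfile" | head -c "$((size - offset))"
    echo "$size" > "$offsetfile"
}

# Example (hypothetical paths):
#   collect_new /app/logs/app.log /tmp/app.log.offset
```

The application never calls this code. It just keeps appending to its log file, which is the whole point of the agent approach.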

 

What are the common log collection tools in the industry?

Plenty of ready-made wheels.

Elastic's Logstash and Filebeat, Apache Flume, the Syslog / Rsyslog / Syslog-ng family that ships with Linux systems, Facebook's Scribe, and so on.

If you have read this far you are probably looking bewildered (laughing-crying face), but that's okay. Just treat today as a chance to broaden your knowledge.

Today I will mainly mention the two I have used: Elastic's Filebeat and Apache Flume.

Filebeat is written in Go. It is a single binary with no deployment dependencies and a tiny footprint of a little over 3 MB. It works out of the box, and in my own testing it is especially convenient to use. It also has quite a reputation in the industry as the upgraded product of the ELK architecture. May I ask, have you heard of ELK (laughing-crying face)?

Flume is written in Java. I mainly use Flume by integrating it into the project framework to provide log collection. To do that I stripped some redundancy out of Flume, extended some functions, and did secondary development on it (when I have time I will write a dedicated post about Flume secondary development, stay tuned).

 

How are collected logs stored?

Another bunch of ready-made wheels.

Elasticsearch, MongoDB, HDFS, and time-series databases such as InfluxDB, OpenTSDB, RRD, and so on.

My scenario calls for locating log entries by keyword query, and Elasticsearch is the best fit for that kind of query. Every wheel has its own usage scenarios, so I won't go through them one by one here.

 

What log visualization and analysis tools are there?

Yes, you guessed it: another bunch of ready-made wheels.

Kibana, a presentation tool built on Node.js, provides log display, aggregation, search, dashboards, and other features.

Grafana, written in Go, focuses on time-series graphs for specific metrics such as CPU and IO utilization.

 

How is alerting implemented?

This is the last step of the long march. With log collection done, suppose you want to check whether a keyword such as error or exception appears, and send an alert notification when it does. Isn't that easy to implement now?
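A bare-bones version of that keyword check might look like the sketch below. The function names are made up, the keyword list is just the two examples from the text, and printing a line stands in for a real notification channel such as mail or a chat webhook.

```shell
#!/bin/sh
# Sketch of keyword-based alerting over collected log text.

# Count the lines on stdin that contain "error" or "exception",
# ignoring case.
count_alert_lines() {
    grep -Eic 'error|exception'
}

# Print an alert message when log file $1 contains any matching line.
check_keywords() {
    # grep exits non-zero when nothing matches; "|| true" keeps that
    # from aborting scripts that run with "set -e".
    hits=$(count_alert_lines < "$1" || true)
    if [ "$hits" -gt 0 ]; then
        echo "ALERT: $hits suspicious line(s) in $1"
    fi
}

# Example (hypothetical path):
#   check_keywords /app/logs/app.log
```

In an ELK-style setup the same idea would run as a query against the stored logs instead of a grep over a file, but the rule "keyword seen, send notice" stays the same.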

 

I have rambled on a lot about log collection. ELK is what I use most often, and I will find time later to write a detailed article on log collection.

 

III. Leveling Up: The Old Hand Helps You Show Off and Takes You Flying

 

So far, you have learned how to monitor application service status and how to think about log-based monitoring. But have you ever puzzled over these questions: What does the call chain of a request look like? Roughly how many systems does a request pass through? Roughly which node does a request spend its time on?

 

Let me throw out a concept: APM, Application Performance Monitoring (if you don't know it yet, look it up yourself to fill the gap). If you have time, I would suggest looking into the following three APM components.

 

First: Zipkin, a distributed tracing system open-sourced by Twitter, covering data collection, storage, search, and display.

Second: Pinpoint, a distributed tracing component open-sourced in Korea. It is an APM tool written in Java for large-scale distributed systems.

Third: SkyWalking, an outstanding Chinese-made APM component. It is a system for tracing, alerting on, and analyzing the business operation of distributed Java applications.

 

There are ten million kinds of wheels, and one of them is bound to suit you.

 

IV. Closing Words

 

In the Internet winter, when the environment is bad, all you can do is improve yourself! Improve yourself!! Improve yourself!!!

 

Without noticing, I have typed out this many words. I wonder how much of it you took in.

 

 


Origin www.cnblogs.com/socoool/p/12629806.html