How to configure Tengine log monitoring with ARMS

Abstract: ARMS, the business real-time monitoring service, gives the company a way to query the running status of Tengine in real time. Going forward, ARMS and MaxCompute (formerly ODPS) will form complementary parts of the company's monitoring. Tengine logs can be delivered to both at the same time: ARMS raises multi-dimensional alarms first, while MaxCompute handles in-depth analysis, such as per-interface request counts and response times, to guide targeted tuning.

Wang Xinyan from Shenzhen Xiaoyi Network Co., Ltd.

Recently, the company successfully built a Tengine log monitoring system on ARMS, the business real-time monitoring service. Here is a brief account of our experience using ARMS to monitor Tengine logs.

At this stage, the company uses Alibaba's Tengine as the web server in front of all its interfaces. Like nginx, it records the host, URL, IP, body size, response time, and other information in its logs. The current business requirement is a system that monitors interface anomalies, detects system anomalies promptly, and identifies which projects, which servers, and even which URLs are abnormal, so that problems can be analyzed and solved faster.

The Tengine logs are spread across different servers, so Log Service is used first to collect them, and LogHub then serves as the log source for monitoring. One of the most important reasons for choosing ARMS is that our Tengine log format is customized to some extent, and ARMS, as an end-to-end monitoring product, provides particularly strong customizable data cleaning, along with aggregate calculation and alarm functions.

Here is a detailed introduction to the log segmentation function of ARMS.

The first step is the start node.
The second step is JSON parsing: LogHub data is in JSON format by default, so each record is parsed as JSON.

Pay attention to the time format of date-type fields, especially the hour part: the default format is hh (12-hour) and needs to be changed to HH (24-hour).
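To see why the hour pattern matters, here is a minimal Python sketch of the same parsing step outside ARMS. The record layout and field names (host, time, status) are assumptions made for illustration, not the actual LogHub schema. Python's %I and %H directives play the roles that hh and HH play in the ARMS configuration.

```python
import json
from datetime import datetime

# Hypothetical LogHub record; the field names are assumptions for illustration.
raw_line = '{"host": "api.example.com", "time": "2017-08-15 14:05:09", "status": "200"}'

# JSON parsing step: turn the raw line into a dictionary of fields.
record = json.loads(raw_line)

# %H is the 24-hour field (the counterpart of HH), so hour 14 parses correctly.
parsed = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")

# %I is the 12-hour field (the counterpart of hh); hour 14 does not fit it.
try:
    datetime.strptime(record["time"], "%Y-%m-%d %I:%M:%S")
    twelve_hour_ok = True
except ValueError:
    twelve_hour_ok = False

print(parsed.hour, twelve_hour_ok)
```

With a 12-hour pattern, afternoon timestamps either fail to parse or are misread, which is why the note above asks for HH.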
The third step is cleaning: filter out the hosts that do not need to be monitored.

A function is used to filter out hosts that do not end with the specified domain name.
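The filtering rule can be sketched in Python as follows. This is an illustration of the logic only, not the ARMS function itself; the domain .example.com and the sample hosts are hypothetical.

```python
# Hypothetical domain suffix; replace with the domain actually being monitored.
MONITORED_SUFFIX = ".example.com"

def keep_record(record: dict) -> bool:
    """Return True only for records whose host ends with the monitored domain."""
    return record.get("host", "").endswith(MONITORED_SUFFIX)

# Hypothetical sample records.
records = [
    {"host": "api.example.com"},
    {"host": "static.other-site.net"},
    {"host": "pay.example.com"},
    {},  # a record with no host field is dropped as well
]

kept = [r for r in records if keep_record(r)]
print([r["host"] for r in kept])
```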
The fourth step is to parse the URL. Use the "single delimiter splitter" to split the request field on spaces, yielding the request method (GET/POST), the full URL, and the protocol version.
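As a rough equivalent of the single delimiter splitter, the sketch below splits a request line on spaces in Python. The function name split_request and the sample request are made up for illustration.

```python
from typing import Optional

def split_request(request: str) -> Optional[dict]:
    """Split a request line such as 'GET /path HTTP/1.1' on single spaces."""
    parts = request.split(" ")
    if len(parts) != 3:
        # Malformed request lines are skipped rather than guessed at.
        return None
    method, url, protocol = parts
    return {"method": method, "url": url, "protocol": protocol}

# Hypothetical request field from an access log.
fields = split_request("GET /api/v1/users?id=7 HTTP/1.1")
print(fields)
```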

The fifth step is to classify return codes as normal or error, since only error return codes are monitored here. In this example, a return code of 4xx or 5xx indicates an error, and any other code indicates normal. A new field, is_error, records the result: 1 for an error code and 0 otherwise.
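The classification rule is simple enough to express in a few lines of Python. This is a sketch of the rule described above, assuming the return code is available as an integer.

```python
def is_error(status: int) -> int:
    """Return 1 for 4xx and 5xx return codes, 0 for everything else."""
    return 1 if 400 <= status <= 599 else 0

# A few representative return codes and their is_error values.
flags = {code: is_error(code) for code in (200, 302, 404, 499, 500, 503)}
print(flags)
```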

The sixth step is to configure the correspondence between server IPs and server names, using a mapping table.

Note that a key-value entry for 127.0.0.1 must be configured here; otherwise the "server name" field will not be visible in the log split preview.
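Conceptually, the mapping table is a key-value lookup from IP to server name. The Python sketch below shows the idea; all IPs and server names are hypothetical, and the 127.0.0.1 entry simply mirrors the note above.

```python
# Hypothetical mapping table; the IPs and names are made up for illustration.
IP_TO_SERVER = {
    "127.0.0.1": "localhost",  # entry mirroring the note about the preview
    "10.0.0.11": "web-01",
    "10.0.0.12": "web-02",
}

def server_name(ip: str) -> str:
    """Look up the server name for an IP, falling back to 'unknown'."""
    return IP_TO_SERVER.get(ip, "unknown")

print(server_name("10.0.0.11"), server_name("192.168.1.1"))
```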

Finally, simplify the host again: remove the domain-name suffix and keep only the prefix.
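For illustration, stripping the suffix could look like this in Python, again assuming the hypothetical domain .example.com.

```python
def host_prefix(host: str, suffix: str = ".example.com") -> str:
    """Strip the (hypothetical) domain suffix, keeping only the prefix."""
    if host.endswith(suffix):
        return host[: -len(suffix)]
    # Hosts without the suffix are returned unchanged.
    return host

print(host_prefix("api.example.com"))
```

Shorter host values keep chart legends and alarm messages easier to read.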

At this point, log splitting is complete. Click "Log Split Preview" to check the result, then click Save and Next.

Next, configure the data set. For example, to monitor the number of requests to an interface, use count(_line); to monitor the average packet size, use sum(packet size) ÷ count(_line); to monitor the average response time, use sum(response time) ÷ count(_line). Dimensions are configured as required; here the server comes first, followed by the interface domain name.
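The three metrics and the two dimensions can be sketched as a plain Python aggregation. This only illustrates the arithmetic, not how ARMS computes it internally; the field names (server, host, bytes, rt) and the sample values are assumptions.

```python
from collections import defaultdict

def aggregate(records):
    """Group by (server, host) and compute count, average size, average time."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "rt": 0.0})
    for r in records:
        t = totals[(r["server"], r["host"])]
        t["count"] += 1            # count(_line)
        t["bytes"] += r["bytes"]   # sum(packet size)
        t["rt"] += r["rt"]         # sum(response time)
    return {
        key: {
            "count": t["count"],
            "avg_bytes": t["bytes"] / t["count"],
            "avg_rt": t["rt"] / t["count"],
        }
        for key, t in totals.items()
    }

# Hypothetical cleaned records, one per log line.
sample = [
    {"server": "web-01", "host": "api", "bytes": 100, "rt": 0.25},
    {"server": "web-01", "host": "api", "bytes": 300, "rt": 0.75},
    {"server": "web-02", "host": "pay", "bytes": 50, "rt": 0.5},
]
result = aggregate(sample)
print(result)
```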

The configuration is now complete. Click Save, and the task can be started. The above is a simple example.

Here are some of the monitors I built.

Request count chart:

Response time chart:

Packet size chart:

Error code count chart:

By observing and analyzing these charts, operations and maintenance staff can quickly find the causes of abnormal situations and deal with them in time.

A young Internet company needs continuous growth, and the same is true of its systems, which need continuous improvement and optimization. ARMS gives the company a way to query the running status of Tengine in real time. Going forward, ARMS and MaxCompute (formerly ODPS) will form complementary parts of the company's monitoring. Tengine logs can be delivered to both at the same time: ARMS raises multi-dimensional alarms first, while MaxCompute handles in-depth analysis, such as per-interface request counts and response times, to guide targeted tuning.


Original link: http://click.aliyun.com/m/28209/
