CAT Getting Started: Organized Study Notes


This article is a set of introductory study notes for CAT, an open source distributed real-time monitoring system developed in Java by Meituan Dianping.

Code repository for this article: https://gitee.com/DaHuYuXiXi/cat-demo-project


Introduction to Call Chain Tracing

In a microservice architecture, the call relationships between services are very complicated, and different microservices may be developed and maintained by different teams. Once a node in a call chain fails, troubleshooting becomes very difficult, mainly because we have to reconstruct the scene of the business execution at that moment before we can analyze what happened and locate the problem.

Call chain monitoring is therefore a very important part of a microservice architecture. Besides helping us locate problems, it also helps project members understand the deployment structure of the project. In a system with tens or hundreds of microservices that has been running for a long time, the call chains become very complicated; at that point the developers, and even the architects, may no longer have a clear picture of the project's network structure, let alone be able to optimize the system.

Under a microservice architecture, the following problems tend to arise without a powerful call chain monitoring tool:

  • Problems are not handled in time, which hurts the user experience
  • The owners of different applications do not accept that a failure was caused by their own service, which easily leads to buck-passing
  • The call relationships between services are hard to sort out, and there may be many incorrect ones
  • Without concrete data, team members pay no attention to the performance of their own applications

We therefore need a call chain monitoring tool that can answer at least the following questions:

  • Are the online services running normally? Has some service gone down without our noticing? How can we quickly discover services that have gone down?

  • When a user's call fails, which service caused the error? We need to locate it quickly in order to fix it.

  • Users report that our system is "slow". How do we find out where the slowness is?


A Brief Look at How Call Chain Monitoring Works

In 2010, Google published the paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure", which describes the design of Dapper, the tracing system for large-scale distributed systems in Google's production environment, and the experience of using it. Many call chain systems today, such as Zipkin and Pinpoint, are based on this paper.

Next, let's briefly introduce the principle of call chain monitoring in Dapper:
[Figure: call flow of the order query]
As shown in the figure above, this is a simple order query. It consists of the following steps:

  1. The front-end browser sends a request to the order service, and the order service queries the corresponding order data from the database. The order data contains the ID of the product, so the product information must be queried as well.
  2. The order service calls the product information query interface of the product service remotely through RPC.
  3. The order service assembles the data and returns it to the front end.

In these steps, several core concepts need to be understood:

  • Trace:

    A trace is the complete call path of one request, and the trace id is the ID of that request. A globally unique trace id is generated at the entry point of the request. It stays unchanged no matter how many nodes the request passes through, and it is passed along with every downstream call. In the end, the trace id strings together every path the user request took through the system.

  • Span:

    A span is one call of a module, generally identified by a span id. A request calls different nodes/modules/services, and a new span id is generated to record each call. Through the span id, the position of the current call in the whole call chain, together with its upstream and downstream nodes, can be located.

Back to the case above: querying the order data and querying the product data are two spans, which we call span A and span B. The parent span of B is A. Both spans have the same trace id: 1.

During information collection, the start time and end time of each call are recorded, and the duration of the call is computed from them.

In this way, for each request you can clearly see:

  • Which services it passed through and the order in which they were called
  • How long each service took
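
The Trace and Span concepts above can be sketched in a few lines of Java. This is only an illustrative model under stated assumptions: the class and field names (SpanDemo, Span, parentSpanId and so on) and the timestamps are made up for this example and are not the data structures of Dapper, CAT, or any other tracer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of the Trace/Span concepts; not the data structure of any real tracer. */
class SpanDemo {

    /** One call of a module within a request. */
    static final class Span {
        final String traceId;      // shared by every span of the same request
        final String spanId;       // identifies this call
        final String parentSpanId; // null for the root span
        final String name;
        final long startMillis;
        final long endMillis;

        Span(String traceId, String spanId, String parentSpanId,
             String name, long startMillis, long endMillis) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.name = name;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        /** Duration computed from the recorded start and end time. */
        long durationMillis() {
            return endMillis - startMillis;
        }
    }

    /** Returns the spans whose parent is the given span id. */
    static List<Span> childrenOf(List<Span> spans, String parentSpanId) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (parentSpanId.equals(s.parentSpanId)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Span A: the order service handles the request (trace id 1, no parent).
        Span a = new Span("1", "A", null, "query order", 0, 50);
        // Span B: the RPC to the product service, started inside span A.
        Span b = new Span("1", "B", "A", "query product", 10, 40);

        List<Span> trace = Arrays.asList(a, b);
        for (Span s : trace) {
            System.out.println(s.name + " took " + s.durationMillis()
                    + " ms, parent=" + s.parentSpanId);
        }
    }
}
```

Because every span carries the same trace id and a pointer to its parent, the call order and the time spent in each service can be reconstructed from the collected spans alone.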

Common Call Chain Tracing Frameworks

CAT is a call chain monitoring system open sourced by Dianping. Its server is developed in Java. Many Internet companies use it, and it is very popular. It has a powerful and rich visual report interface, which matters a great deal for a call chain monitoring system. The report interface offers many functions, and you can see report data in almost any dimension you want.

  • Features: rich aggregation reports, good Chinese support, many adopters in China

Pinpoint is an open source project by a Korean team, designed for large-scale distributed systems written in Java. It uses the Java agent mechanism to instrument bytecode, which lets it add trace ids and collect performance data with zero intrusion into application code.

SkyWalking is an open source APM project under the Apache Foundation, designed for microservice and cloud native architectures. It automatically collects the required metrics through probes and performs distributed tracing. From these call chains and metrics, SkyWalking works out the relationships between applications and services and computes the corresponding statistics. Its tracing and monitoring plugins cover most mainstream frameworks and containers, such as the Chinese RPC frameworks Dubbo and Motan as well as the internationally used Spring Boot and Spring Cloud.

  • Features: supports a variety of plugins, strong UI, no code intrusion on the client side

  • Official website: http://skywalking.apache.org/

Zipkin, open sourced by Twitter, is a distributed call chain monitoring system that aggregates call latency data from business systems in order to monitor and trace call chains. Zipkin is based on Google's Dapper paper and covers data collection, storage, search, and UI display.


CAT Report Introduction

This section is excerpted from the report introduction chapter of the official documentation (Official documentation: report introduction).

CAT supports the following reports:

Report name         Report content
Transaction report  Running time and count of a piece of code, such as the response time and execution count of URL/cache/SQL calls
Event report        Number of times a piece of code runs, such as the number of times an exception occurs
Problem report      Slow operations that may have occurred in the system, derived from the Transaction/Event data
Heartbeat report    JVM status information
Business report     Business indicators, customizable by the user

Transaction report

[Screenshots: Transaction report]


Event report

[Screenshots: Event report]


Problem report

[Screenshot: Problem report]


Heartbeat report

[Screenshot: Heartbeat report]


Business report

[Screenshot: Business report]


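CAT in Practice

Docker installation

# The repository has a long git history. New users who only care about the latest version,
# or who want to contribute on top of it, can add --depth=1 to the first clone to speed up the download:
git clone --depth=1 https://github.com/dianping/cat.git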

Module introduction:

  • cat-client: the client, which reports monitoring data
  • cat-consumer: the server, which collects monitoring data, performs statistical analysis, and builds the statistical reports
  • cat-alarm: real-time alarms on report indicators
  • cat-hadoop: data storage, which stores logview data in HDFS
  • cat-home: the management console, covering report display, configuration management, and so on

Server installation:

Official Documentation: Server Deployment Tutorial

The environment requirements of the CAT server are as follows:

  • Linux 2.6 or above (the 2.6 kernel supports epoll). Use Linux for online server deployment; Mac and Windows can serve as development environments. Meituan Dianping uses CentOS 6.5 internally
  • Java 6, 7, or 8. JDK 7 is recommended on the server side; the client supports JDK 6, 7, and 8
  • Maven 3 or above
  • MySQL 5.6 or 5.7. Higher versions are not recommended because their compatibility is unclear
  • Tomcat is recommended as the J2EE container, in version 7.* or 8.0.*
  • The Hadoop environment is optional. Small companies are generally advised to use disk mode directly: give the CAT server a disk of 500GB or more, mounted on the /data/ directory

Database installation:

  • Database script file script/CatApplication.sql

    mysql -uroot -Dcat < CatApplication.sql
    
  • Note: use utf8mb4 as the database encoding; otherwise Chinese text may appear garbled

App packaging:

This article installs CAT 3.0.0. With a different version you may run into compatibility problems.

The cat.war for CAT 3.0.0 is included in the code repository linked at the beginning of this article; pull the project to get it.


Linux source code installation steps:

  • Create the /data directory, which holds the CAT configuration files and the run-time data
mkdir /data
chmod -R 777 /data/

Note:

  • The /data/ directory must be readable and writable. If /data/ cannot be written, use a Linux soft link to point it to a fixed writable directory. Every machine that runs an integrated client program and every CAT server machine needs this permission set up.
  • CAT supports the CAT_HOME environment variable, and the default path can be changed through JVM parameters.

  • Configure /data/appdatas/cat/client.xml ($CAT_HOME/client.xml)
mkdir -p /data/appdatas/cat
cd /data/appdatas/cat
vi client.xml

Modify the client.xml file:

<?xml version="1.0" encoding="utf-8"?>
<config mode="client">
    <servers>
    	<!-- Replace the IP address below with the IP address of the host -->
        <server ip="192.168.1.101" port="2280" http-port="8080"/>
    </servers>
</config>
  • Configure /data/appdatas/cat/datasources.xml ($CAT_HOME/datasources.xml)
vi datasources.xml
<?xml version="1.0" encoding="utf-8"?>

<data-sources>
	<data-source id="cat">
		<maximum-pool-size>3</maximum-pool-size>
		<connection-timeout>1s</connection-timeout>
		<idle-timeout>10m</idle-timeout>
		<statement-cache-size>1000</statement-cache-size>
		<properties>
			<driver>com.mysql.jdbc.Driver</driver>
			<url><![CDATA[jdbc:mysql://127.0.0.1:3306/cat]]></url>  <!-- Replace with the real database URL and port -->
			<user>root</user>  <!-- Replace with the real database user name -->
			<password>root</password>  <!-- Replace with the real database password -->
			<connectionProperties><![CDATA[useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&socketTimeout=120000]]></connectionProperties>
		</properties>
	</data-source>
</data-sources>
  • Create a database named cat and import CatApplication.sql into it


  • Pull the tomcat image
# 1. Pull the image
docker pull tomcat:8.5.40
  • Choose a directory, put the cat.war package, client.xml, server.xml, and datasources.xml in it, then create and edit a Dockerfile in the same directory

Note: Rename cat-home.war to cat.war.

FROM tomcat:8.5.40
RUN rm -rf /usr/local/tomcat/webapps/*
COPY cat.war   /usr/local/tomcat/webapps
RUN mkdir -p /data/appdatas/cat
RUN mkdir -p /data/applogs/cat
RUN chmod -R 777 /data/
COPY client.xml   /data/appdatas/cat
COPY datasources.xml   /data/appdatas/cat
COPY server.xml   /data/appdatas/cat
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
EXPOSE 8080
EXPOSE 2280
  • Build cat image based on tomcat image
docker build -t cat:3.0 .
  • Start the cat container from the image built in the previous step
docker run -di --name=tomcat -p 8080:8080 -p 2280:2280 -v /root/tomcat/webapps:/usr/local/tomcat/webapps  -v /root/tomcat/conf:/usr/local/tomcat/conf -v /data:/data cat:3.0

Remember to open ports 8080 and 2280 in the firewall.

  • To prevent garbled Chinese characters, use the docker exec command to enter the cat container, modify the server.xml configuration file, and then restart the container
# Modify server.xml under /root/tomcat/conf to prevent garbled Chinese characters
<Connector port="8080" protocol="HTTP/1.1"
           URIEncoding="utf-8"    connectionTimeout="20000"
           redirectPort="8443" />  <!-- add URIEncoding="utf-8" -->

Note: After modifying the server configuration below, remove the comments and paste it directly into the corresponding configuration page of the CAT console.

<?xml version="1.0" encoding="utf-8"?>
<server-config>
   <server id="default">
      <properties>
         <property name="local-mode" value="false"/>
         <property name="job-machine" value="false"/>
         <property name="send-machine" value="false"/>
         <property name="alarm-machine" value="false"/>
         <property name="hdfs-machine" value="false"/>
            <!-- Replace with the IP of the server running CAT; for a cluster deployment, list the address of every node here -->
         <property name="remote-servers" value="10.1.1.1:8080,10.1.1.2:8080,10.1.1.3:8080"/>
      </properties>
       <!-- HDFS (the distributed file system) is not used here, so the storage configuration below can be left as it is -->
      <storage  local-base-dir="/data/appdatas/cat/bucket/" max-hdfs-storage-time="15" local-report-storage-time="7" local-logivew-storage-time="7">
        <hdfs id="logview" max-size="128M" server-uri="hdfs://10.1.77.86/" base-dir="user/cat/logview"/>
        <hdfs id="dump" max-size="128M" server-uri="hdfs://10.1.77.86/" base-dir="user/cat/dump"/>
        <hdfs id="remote" max-size="128M" server-uri="hdfs://10.1.77.86/" base-dir="user/cat/remote"/>
      </storage>
      <consumer>
         <long-config default-url-threshold="1000" default-sql-threshold="100" default-service-threshold="50">
            <domain name="cat" url-threshold="500" sql-threshold="500"/>
            <domain name="OpenPlatformWeb" url-threshold="100" sql-threshold="500"/>
         </long-config>
      </consumer>
   </server>
   <!-- Replace with the IP of the server running CAT -->
   <server id="10.1.1.1">
      <properties>
         <property name="job-machine" value="true"/>
         <property name="alarm-machine" value="true"/>
         <property name="send-machine" value="true"/>
      </properties>
   </server>
</server-config>

Note: After modifying the router configuration below, remove the comments and paste it directly into the corresponding configuration page of the CAT console.

<?xml version="1.0" encoding="utf-8"?>
<router-config backup-server="CAT-server-IP" backup-server-port="2280">
   <default-server id="CAT-server-IP" weight="1.0" port="2280" enable="true"/>
   <network-policy id="default" title="default" block="false" server-group="default_group">
   </network-policy>
   <server-group id="default_group" title="default-group">
      <group-server id="CAT-server-IP"/>
   </server-group>
   <domain id="cat">
      <!-- Servers are grouped here; if several servers are configured, list the address of each one below -->
      <group id="default"> 
         <server id="CAT-server-IP" port="2280" weight="1.0"/>
      </group>
   </domain>
</router-config>

client integration

  • Add maven dependency
        <dependency>
            <groupId>com.dianping.cat</groupId>
            <artifactId>cat-client</artifactId>
            <version>3.0.1</version>
        </dependency>
  • Create the file src/main/resources/META-INF/app.properties and add the following:
app.name={appkey}

appkey may only contain English letters (a-z, A-Z), digits (0-9), underscores (_), and dashes (-)
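
The naming rule above is easy to check before a value goes into app.properties. The following is a minimal sketch; the class name AppKeyCheck and the method isValid are made up for this example and are not part of cat-client.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that checks an appkey against the rule stated above. */
class AppKeyCheck {

    // English letters, digits, underscores and dashes only.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_-]+");

    static boolean isValid(String appkey) {
        return appkey != null && VALID.matcher(appkey).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("dhy-cat"));       // prints true
        System.out.println(isValid("order.service")); // prints false: '.' is not allowed
    }
}
```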

  • On Windows, all of the following files must be created on the drive from which the project is started. (If the project is deployed on the C drive, create a data directory in the root of the C drive.)

    1. Create the /data/appdatas/cat directory and make sure you have read and write permissions for it.
    2. Create the /data/applogs/cat directory (optional). It stores runtime logs, which are very helpful for debugging; it also requires read and write permissions.
    3. Create /data/appdatas/cat/client.xml with the following content:
<?xml version="1.0" encoding="utf-8"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema" xsi:noNamespaceSchemaLocation="config.xsd">
    <servers>
        <server ip="127.0.0.1" port="2280" http-port="8080" />
    </servers>
</config>
  • Write code
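import com.dianping.cat.Cat;
import com.dianping.cat.message.Event;
import com.dianping.cat.message.Transaction;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TestController {

    @GetMapping("/test")
    public String test() {
        Transaction t = Cat.newTransaction("URL", "pageName");

        try {
            Cat.logEvent("URL.Server", "serverIp", Event.SUCCESS, "ip=${serverIp}");
            Cat.logMetricForCount("metric.key");
            Cat.logMetricForDuration("metric.key", 5);

            // Deliberately throw an exception (division by zero)
            int i = 1 / 0;
            t.setStatus(Transaction.SUCCESS);
        } catch (Exception e) {
            // Mark the transaction as failed and report the error to CAT
            t.setStatus(e);
            Cat.logError(e);
        } finally {
            // Always complete the transaction
            t.complete();
        }

        return "test";
    }
}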
  • Start the Spring Boot project, call the interface, and then check the results in CAT

[Figure: CAT report for the dhy-cat application]
As shown in the figure above, the dhy-cat application has recorded 4 failed interface calls. View the specific error message:
[Figure: error list]
[Figure: error details]
The figure above shows that the error is a division-by-zero exception. At this point, integrating CAT into the Spring Boot client is complete.


API introduction

Transaction

Transaction is suitable for recording program access behaviors that cross system boundaries, such as remote calls and database calls, and also for monitoring business logic with a long execution time. A Transaction records the execution time and the number of executions of a piece of code.

Our project does not yet integrate Dubbo or MyBatis, so we write a local method by hand to test the usage of Transaction, and create a TransactionController for the test.

    @GetMapping("/test1")
    public String test1() {
        // Open the first Transaction: type URL, name test
        Transaction t = Cat.newTransaction("URL", "test");

        try {
            dubbo();
            t.setStatus(Transaction.SUCCESS);
        } catch (Exception e) {
            t.setStatus(e);
            Cat.logError(e);
        } finally {
            t.complete();
        }

        return "test";
    }

    private String dubbo() {
        // Open the second Transaction: type DUBBO, name dubbo
        Transaction t = Cat.newTransaction("DUBBO", "dubbo");

        try {
            t.setStatus(Transaction.SUCCESS);
        } catch (Exception e) {
            t.setStatus(e);
            Cat.logError(e);
        } finally {
            t.complete();
        }

        return "test";
    }

The code above opens two Transactions. The first is the interface call received by the Controller. The second is opened inside the local method dubbo, which we wrote to simulate a remote call.
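
Why the second Transaction shows up as a child of the first can be illustrated with a toy model: transactions that are started but not yet completed are kept on a stack, and each newly started one becomes a child of the one on top. This is only a sketch of the general idea under that assumption, not CAT's actual implementation; all names (MessageTreeDemo, Node, start, complete) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of nested transactions forming a tree; NOT CAT's implementation. */
class MessageTreeDemo {

    static final class Node {
        final String type;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    // Transactions that have been started but not yet completed, innermost on top.
    private final Deque<Node> open = new ArrayDeque<>();
    private Node root;

    Node start(String type, String name) {
        Node node = new Node(type, name);
        if (open.isEmpty()) {
            root = node;                    // first transaction becomes the root
        } else {
            open.peek().children.add(node); // nested transaction becomes a child
        }
        open.push(node);
        return node;
    }

    void complete(Node node) {
        if (open.peek() != node) {
            throw new IllegalStateException("transactions must complete innermost first");
        }
        open.pop();
    }

    Node root() {
        return root;
    }

    int openCount() {
        return open.size();
    }

    public static void main(String[] args) {
        MessageTreeDemo tree = new MessageTreeDemo();
        Node url = tree.start("URL", "test");      // opened in the controller
        Node dubbo = tree.start("DUBBO", "dubbo"); // opened inside dubbo()
        tree.complete(dubbo);
        tree.complete(url);
        System.out.println(tree.root().type + " -> " + tree.root().children.get(0).type);
    }
}
```

In this model, a transaction that is never completed stays on the stack, so the tree is never finished. This mirrors why the complete() call is placed in a finally block in the code above.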

Start the project and access the interface http://localhost:9200/test1

Click the Transaction report in the left menu and select the Log View for the URL type to view the call chain.
[Figure: Log View of the call chain]

As shown in the figure, the call chain has been formed: the transaction test of type URL calls the dubbo method of type DUBBO, and they take 0.47ms and 1.02ms respectively.


Extension API

CAT provides a series of APIs to modify a Transaction.

  • addData: add additional data to display
  • setStatus: set the status; pass SUCCESS on success or the exception on failure
  • setDurationInMillis: set the duration (in milliseconds)
  • setTimestamp: set the timestamp of the execution
  • complete: end the Transaction

Write the following code to test:

    @GetMapping("/api")
    public String api() {
        Transaction t = Cat.newTransaction("URL", "pageName");

        try {
            // Set the duration to 1 second
            t.setDurationInMillis(1000);
            t.setTimestamp(System.currentTimeMillis());
            // Add additional data
            t.addData("content");
            t.setStatus(Transaction.SUCCESS);
        } catch (Exception e) {
            t.setStatus(e);
            Cat.logError(e);
        } finally {
            t.complete();
        }

        return "api";
    }

Start the project and access the interface: http://localhost:9200/api

Click the Transaction report in the left menu and select the Log View for the URL type to view the call chain.

[Figure: Log View of the /api call]
As shown in the figure, the call duration has been manually set to 1000ms, and the additional data content has been added.

When using the Transaction API, pay attention to the following points:

  1. You can call addData multiple times; the added data is concatenated with &.
  2. Don't forget to call transaction.complete(), or you'll get a broken message tree and a memory leak!
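
The first point can be pictured with a small stand-in. The class below is hypothetical and only mimics the behavior described above (repeated addData calls joined with &); it is not the cat-client Transaction class.

```java
/** Hypothetical stand-in that mimics the "&"-concatenation described above. */
class AddDataDemo {

    private final StringBuilder data = new StringBuilder();

    void addData(String keyValuePairs) {
        if (data.length() > 0) {
            data.append('&'); // separate this entry from the previous one
        }
        data.append(keyValuePairs);
    }

    String getData() {
        return data.toString();
    }

    public static void main(String[] args) {
        AddDataDemo t = new AddDataDemo();
        t.addData("orderId=42");
        t.addData("userId=7");
        System.out.println(t.getData()); // prints orderId=42&userId=7
    }
}
```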

Event

Event is used to record the number of times an event occurs, such as recording system exceptions. Compared with transaction, it lacks time statistics, and its overhead is smaller than transaction.

  • Cat.logEvent: Log an event.
Cat.logEvent("URL.Server", "serverIp", Event.SUCCESS, "ip=${serverIp}");
  • Cat.logError: Log an Error with error stack information.

Error is a special event whose type depends on the Throwable e that is passed in:

1. If `e` is an `Error`, `type` is set to `Error`.
2. If `e` is a `RuntimeException`, `type` is set to `RuntimeException`.
3. Otherwise, `type` is set to `Exception`.

At the same time, the error stack information is collected and written to the data property.
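
The three rules above amount to a simple instanceof cascade. The sketch below only restates those rules in Java; ErrorTypeDemo and eventType are hypothetical names, and the real cat-client code may be organized differently.

```java
import java.io.IOException;

/** Restates the type rules above; hypothetical helper, not cat-client code. */
class ErrorTypeDemo {

    static String eventType(Throwable e) {
        if (e instanceof Error) {
            return "Error";
        }
        if (e instanceof RuntimeException) {
            return "RuntimeException";
        }
        return "Exception";
    }

    public static void main(String[] args) {
        try {
            int zero = 0;
            int i = 1 / zero; // throws ArithmeticException, a RuntimeException
        } catch (Throwable e) {
            System.out.println(eventType(e)); // prints RuntimeException
        }
        System.out.println(eventType(new IOException("disk")));    // prints Exception
        System.out.println(eventType(new StackOverflowError()));   // prints Error
    }
}
```

This is why a division by zero, which raises ArithmeticException, appears in the Event report under the RuntimeException type.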

try {
    int i = 1 / 0;
} catch (Throwable e) {
    Cat.logError(e);
}

You can add your own error messages to the top of the error stack, as shown in the following code:

Cat.logError("error(X) := exception(X)", e);

Write a case to test the above API:

@RestController
@RequestMapping("/event")
public class EventController {

    @RequestMapping("/logEvent")
    public String logEvent() {
        Cat.logEvent("URL.Server", "serverIp",
                Event.SUCCESS, "ip=127.0.0.1");
        return "test";
    }

    @RequestMapping("/logError")
    public String logError() {
        try {
            int i = 1 / 0;
        } catch (Throwable e) {
            Cat.logError("error(X) := exception(X)", e);
        }
        return "test";
    }
}

Start the project and access the interfaces http://localhost:9200/event/logEvent and http://localhost:9200/event/logError.

[Figure: Event report]
As the figure above shows, two events have been added: URL.Server and RuntimeException. Click to view the log.

[Figure: event details in Log View]
Here are the details of the two events:

1. URL.Server is a normal event; the information ip=127.0.0.1 is printed.

2. RuntimeException is an error event; besides the error stack, the message

error(X) := exception(X)

is printed at the top of the stack for easy viewing.


Metric

Metric is used to record business indicators. Indicators may include the number of records for a metric, the average value of records, and the sum of records. The minimum statistical granularity of business indicators is 1 minute.

// Counter
Cat.logMetricForCount("metric.key");
Cat.logMetricForCount("metric.key", 3);

// Duration
Cat.logMetricForDuration("metric.key", 5);

Metrics are aggregated every second.

For example, if you call count three times with the same name within the same second, the values are accumulated and reported to the server once.

For duration, the average value is used instead of the accumulated value.
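
The difference between the two aggregation rules (sum for count, average for duration) can be sketched as follows. This is a simplified illustration, not the cat-client aggregator: the class MetricAggregationDemo and its methods are made up, and the per-second windowing is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified illustration of "sum counts, average durations"; not cat-client code. */
class MetricAggregationDemo {

    private final Map<String, Long> countSums = new HashMap<>();
    // For each key: element 0 holds the sum of durations, element 1 the number of samples.
    private final Map<String, long[]> durations = new HashMap<>();

    void logCount(String key, int quantity) {
        countSums.merge(key, (long) quantity, Long::sum);
    }

    void logDuration(String key, long millis) {
        long[] acc = durations.computeIfAbsent(key, k -> new long[2]);
        acc[0] += millis;
        acc[1]++;
    }

    /** Counts with the same key are accumulated. */
    long count(String key) {
        return countSums.getOrDefault(key, 0L);
    }

    /** Durations with the same key are averaged. */
    double averageDuration(String key) {
        long[] acc = durations.get(key);
        return acc == null ? 0.0 : (double) acc[0] / acc[1];
    }

    public static void main(String[] args) {
        MetricAggregationDemo metrics = new MetricAggregationDemo();
        for (int i = 0; i < 5; i++) {
            metrics.logCount("count", 1);
        }
        for (int i = 0; i < 3; i++) {
            metrics.logDuration("duration", 1000);
        }
        System.out.println(metrics.count("count"));              // prints 5
        System.out.println(metrics.averageDuration("duration")); // prints 1000.0
    }
}
```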

Write a case to test the above API:

@RestController
@RequestMapping("/metric")
public class MetricController {

    @RequestMapping("/count")
    public String count() {
        Cat.logMetricForCount("count");
        return "test";
    }

    @RequestMapping("/duration")
    public String duration() {
        Cat.logMetricForDuration("duration", 1000);
        return "test";
    }
}

Start the project, access the interface http://localhost:9200/metric/count 5 times, and access http://localhost:9200/metric/duration.

[Figure: Business report with the count and duration metrics]
The figure above shows the values of count and duration.

count was accessed 5 times in total, so its value for this minute is 5. duration is always 1000 no matter how many times the interface is accessed, because the average value is used.

The statistical granularity is one minute.


CAT monitoring interface introduction

DashBoard

The DashBoard shows, minute by minute, which systems had errors, how many errors occurred, and when.

  • Click the time buttons in the upper right corner to switch the displayed time: -7d means 7 days earlier, -1h means 1 hour earlier, and now jumps to the current time
  • The time axis at the top is arranged by minute. Click a minute to see the errors from that minute to the end of the period
  • Below it, the systems with errors are listed together with the time and number of errors. Click a system name to jump to its Problem report

Transaction

The Transaction report monitors the execution of a piece of code: run count, QPS, error count, failure rate, response time statistics (average response time, TP quantiles), and so on.

  • Instrumentation points reported by default after the application starts:

Instrumentation point  Source component     Description
System                 cat-client           Instrumentation data about the reporting of monitoring data
URL                    requires cat-filter  Instrumentation data for URL access

  • Hourly report: the Type statistics page shows the first-level classification of Transactions. It tells you how many times each classification ran during the period, together with its average response time, latency, and quantile lines.

[Figure: Transaction hourly report]
Reading the report from top to bottom:

  1. Report time span: CAT uses one hour as the default statistical time span. Click [Switch to History Mode] to change it: the default is hour mode; in history mode, the quick navigation on the right changes to month (monthly report), week (weekly report), and day (daily report). Note that the time span differs between these reports.

  2. Time selection: choose the time through the navigation bar in the upper right corner. [+1h]/[-1h] switches to the next/previous hour; [+1d]/[-1d] switches to the same hour of the next/previous day; [+7d]/[-7d] switches to the same hour of the next/previous week; [now] returns to the current hour.

  3. Project selection: enter a project name to view its data; to switch to another project, enter its name and press Enter.

  4. Machine grouping: CAT can treat several machines as one group for statistics. By default there is an All group, which represents the statistics of all machines, that is, of the whole cluster.

  5. Summary table of all Types: the first-level classification is Type; click a Type to view the data of its second-level classification (Name).

  • The Type and Name of a Transaction instrumentation point are defined by the business code. In Cat.newTransaction(type, name), type is the first-level classification and name is the second-level classification.
  • The second-level classification collects the data of all names under the same type, displayed in the same style as the first-level (type) data.

  6. Single Type indicator chart: click show to view minute-level statistics of all names under a Type.

  7. Indicator description: shows hourly data for each first-level classification (type), such as call count, error count, and failure rate.

  8. Sample logview: L stands for logview, a sampled call chain.

  9. Quantile line description: hourly statistics for each first-level classification (type).
  • 95line means that 95% of requests have a response time shorter than the reference value; 999line means that 99.9% of requests do. 95line and 99line are also called tp95 and tp99.
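
To make the quantile lines concrete, the sketch below computes them with the simple nearest-rank method over a list of response times. The nearest-rank method is chosen here for illustration only; how CAT actually computes its quantile lines is not covered by these notes, and the class PercentileDemo is hypothetical.

```java
import java.util.Arrays;

/** Nearest-rank percentile over response times; illustrative, not CAT's algorithm. */
class PercentileDemo {

    /**
     * Returns the smallest sample such that at least `percent` percent
     * of all samples are less than or equal to it.
     */
    static long percentile(long[] durationsMillis, double percent) {
        if (durationsMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = durationsMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percent * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // 100 requests with response times of 1 ms, 2 ms, ..., 100 ms.
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i + 1;
        }
        System.out.println("tp95  = " + percentile(samples, 95));   // prints tp95  = 95
        System.out.println("tp999 = " + percentile(samples, 99.9)); // prints tp999 = 100
    }
}
```

A single slow request barely moves the average but shows up clearly in tp999, which is why quantile lines are reported alongside the average response time.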

  • Historical report

Transaction history reports support daily, weekly, and monthly statistics and trend graphs. Click [Switch to History Mode] in the navigation bar to query them. History reports are displayed in three dimensions: response time, number of calls, and number of errors. Taking the daily report as an example: select a type and click show to view its daily report.


Event

The Event report monitors the number of times a piece of code runs: for example, how many times an event was recorded in the program and how many times it failed. The overall structure of the Event report is almost the same as that of the Transaction report; only the response time statistics are missing.

  • The first-level classification (Type) statistics page

The Type statistics page shows the first-level classification of Events. Compared with Transactions, Events have no running time statistics. The page tells you how many times each classification ran during the period, the number of failures, the failure rate, a sampled logview, and the QPS.

  • The second-level classification (Name) statistics page

The second-level classification is reached by clicking a specific Type on the Type statistics page. It shows the data of all names under that type, and can be understood as a finer classification within the type.
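
The two-level Type/Name structure can be sketched as a nested map in which the Type-level numbers are sums over the names of that type. This is an illustrative model only; the class EventReportDemo, its methods, and the sample event names are assumptions made for this example and do not describe how the CAT server stores its reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative two-level (type, name) event statistics; not CAT's report model. */
class EventReportDemo {

    static final class Stats {
        long total;
        long failed;

        double failureRate() {
            return total == 0 ? 0.0 : (double) failed / total;
        }
    }

    // type -> name -> statistics
    private final Map<String, Map<String, Stats>> byType = new LinkedHashMap<>();

    void record(String type, String name, boolean success) {
        Stats stats = byType
                .computeIfAbsent(type, t -> new LinkedHashMap<>())
                .computeIfAbsent(name, n -> new Stats());
        stats.total++;
        if (!success) {
            stats.failed++;
        }
    }

    /** Second-level view: statistics of one name under a type. */
    Stats nameStats(String type, String name) {
        Map<String, Stats> names = byType.get(type);
        Stats stats = names == null ? null : names.get(name);
        return stats == null ? new Stats() : stats;
    }

    /** First-level view: sum over all names of the type. */
    Stats typeStats(String type) {
        Stats sum = new Stats();
        Map<String, Stats> names = byType.get(type);
        if (names != null) {
            for (Stats s : names.values()) {
                sum.total += s.total;
                sum.failed += s.failed;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        EventReportDemo report = new EventReportDemo();
        report.record("URL.Server", "serverIp", true);
        report.record("URL.Server", "serverIp", true);
        report.record("RuntimeException", "java.lang.ArithmeticException", false);

        Stats urlServer = report.typeStats("URL.Server");
        System.out.println("URL.Server: total=" + urlServer.total
                + ", failureRate=" + urlServer.failureRate());
    }
}
```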



Problem

Problem records the problems that occur during the operation of the entire project, including some exceptions, errors, and long-term access behaviors. The Problem report is integrated with the existing features of logview, which is convenient for users to locate problems. source:

  1. Business code explicitly calls the Cat.logError(e) API to record the error. For details, refer to the instrumentation documentation.
  2. Integration with the logging framework: log entries that carry an exception stack are captured.
  3. long-url: slow requests of Transactions with type URL
  4. long-sql: slow requests of Transactions with type SQL
  5. long-service: slow requests of Transactions with type Service or PigeonService
  6. long-call: slow requests of Transactions with type Call or PigeonCall
  7. long-cache: slow requests of Transactions whose type starts with Cache
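
The mapping from a slow Transaction to its Problem type can be sketched as follows. This only restates the list above in code; the threshold is a made-up value (the text does not give CAT's actual thresholds), and the class is hypothetical.

```java
import java.util.Optional;

/** Maps a slow Transaction to the Problem type listed above (illustrative). */
final class SlowProblemClassifier {

    /** Hypothetical threshold; CAT's real values are configurable and not stated here. */
    static final long ASSUMED_THRESHOLD_MS = 1000;

    static Optional<String> classify(String transactionType, long durationMs) {
        if (durationMs < ASSUMED_THRESHOLD_MS) {
            return Optional.empty(); // fast enough: not reported as a Problem
        }
        switch (transactionType) {
            case "URL":
                return Optional.of("long-url");
            case "SQL":
                return Optional.of("long-sql");
            case "Service":
            case "PigeonService":
                return Optional.of("long-service");
            case "Call":
            case "PigeonCall":
                return Optional.of("long-call");
            default:
                // long-cache applies to any type that starts with "Cache"
                return transactionType.startsWith("Cache")
                        ? Optional.of("long-cache")
                        : Optional.empty();
        }
    }
}
```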

All-errors summary report: the first-level classification (Type) represents the kind of error, such as error or long-url; the second-level classification (called Status) corresponds to a specific error, such as an exception class name.


Error count distribution: click show for a type or a status to display its minute-level error count distribution.



HeartBeat

The Heartbeat report shows runtime status that the CAT client reports to the server once per minute.

  • JVM related indicators

All of the following indicators are values within one minute; the minimum statistical granularity of CAT is one minute.

JVM GC related indicators          Description
NewGc Count / PS Scavenge Count    Number of young-generation GCs
NewGc Time / PS Scavenge Time      Time spent in young-generation GC
OldGc Count                        Number of old-generation GCs
PS MarkSweepTime                   Time spent in old-generation GC
Heap Usage                         Java virtual machine heap usage
None Heap Usage                    Java virtual machine non-heap (Perm) usage


JVM Thread related indicators      Description
Active Thread                      Threads currently active in the system
Daemon Thread                      Background (daemon) threads in the system
Total Started Thread               Total number of threads the system has started
Started Thread                     Threads newly started by the system per minute
CAT Started Thread                 Threads started by the CAT client

You can refer to the definition of java.lang.management.ThreadInfo
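
Most of the JVM indicators above have close counterparts in the standard java.lang.management API. The sketch below reads them directly from the running JVM. It is not the CAT client's code, and the mapping of report labels to MXBean calls is an assumption made for illustration. The MXBean values are cumulative since JVM start, so a per-minute figure such as Started Thread would be the difference between two snapshots taken a minute apart.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads heartbeat-style JVM figures from the standard MXBeans (illustrative). */
final class JvmSnapshot {

    static Map<String, Long> collect() {
        Map<String, Long> m = new LinkedHashMap<>();

        // One entry pair per collector, e.g. "PS Scavenge Count" / "PS Scavenge Time";
        // the collector names depend on the JVM and the GC in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            m.put(gc.getName() + " Count", gc.getCollectionCount());
            m.put(gc.getName() + " Time", gc.getCollectionTime());
        }

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        m.put("Heap Usage", memory.getHeapMemoryUsage().getUsed());
        m.put("None Heap Usage", memory.getNonHeapMemoryUsage().getUsed());

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        m.put("Active Thread", (long) threads.getThreadCount());
        m.put("Daemon Thread", (long) threads.getDaemonThreadCount());
        m.put("Total Started Thread", threads.getTotalStartedThreadCount());
        return m;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```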


  • System indicators

System related indicators          Description
System Load Average                System load average
Memory Free                        Free system memory
FreePhysicalMemory                 Free physical memory
/ Free                             Free space on the / (root) partition
/data Free                         Free space on the /data disk
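
Two of the system indicators can be read with the standard library alone, as the sketch below shows. It is an illustration rather than the CAT client's code, and the class name is hypothetical. Free physical memory is not included because it requires the JDK-specific com.sun.management extension.

```java
import java.io.File;
import java.lang.management.ManagementFactory;

/** Reads system load and free disk space with the standard library (illustrative). */
final class SystemSnapshot {

    /** System load average over the last minute; negative if the platform cannot report it. */
    static double loadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    /** Free bytes on the partition that contains the path, for example "/" or "/data". */
    static long freeBytes(String path) {
        return new File(path).getFreeSpace();
    }

    public static void main(String[] args) {
        System.out.println("System Load Average = " + loadAverage());
        System.out.println("/ Free (bytes)      = " + freeBytes("/"));
    }
}
```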



Business

Business reports correspond to business indicators, such as order indicators. Unlike Transaction, Event, and Problem, which focus on code execution at the micro level, Business focuses on macro-level indicators.

Scenario example:

1. I want to monitor the number of orders.
2. I want to monitor how long orders take.


  • Baseline: A baseline is a predicted value of a business metric.
  • Baseline generation algorithm: the baseline is a weighted sum of the data from the same day of the week over the last four weeks. On the principle of trusting newer data more, cat corrects abnormal points based on historical data and discards points that are significantly above or below the mean.

Example: today is 2018-10-25 (Thursday). The baseline for the whole of today is computed from the per-minute data of the last four Thursdays (2018-10-18, 2018-10-11, 2018-10-04, 2018-09-27) as a weighted average, with weights 1, 2, 3, 4 assigned from the oldest week to the most recent. For example, if the data for 19:56 in the previous four weeks (from oldest to most recent) are A, B, C, and D, then the baseline value for 19:56 today is value = (A+2B+3C+4D) / 10.
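
The weighted average in this example can be written as a short function. This is a sketch of the formula only: the class name is hypothetical, and the abnormal-point correction mentioned above is left out.

```java
/** Weighted baseline for one minute of the day (illustrative; no outlier correction). */
final class Baseline {

    /**
     * @param samplesOldestFirst values of the same minute in previous weeks,
     *                           ordered from the oldest week to the most recent
     * @return the weighted average with weights 1, 2, 3, ... in that order
     */
    static double weighted(double[] samplesOldestFirst) {
        if (samplesOldestFirst.length == 0) {
            throw new IllegalArgumentException("no history, so no baseline");
        }
        double weightedSum = 0;
        int weightTotal = 0;
        for (int i = 0; i < samplesOldestFirst.length; i++) {
            int weight = i + 1; // newer data gets a larger weight
            weightedSum += weight * samplesOldestFirst[i];
            weightTotal += weight;
        }
        return weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        // A=100, B=110, C=120, D=130 gives (100 + 220 + 360 + 520) / 10
        System.out.println(weighted(new double[] {100, 110, 120, 130}));
    }
}
```

With four weeks of history the divisor is 1+2+3+4 = 10, which matches the formula in the example.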

For a newly launched application, there is no baseline on the first day, and the baseline on the second day is the data from the previous day, and so on.

  • How to enable the baseline: Only the indicators configured with baseline alarms will automatically calculate the baseline. If you need the baseline function, please configure the baseline alarm.
  • Precautions:
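  1. Use plain English for metric names wherever possible and avoid special characters such as spaces ( ), colons (:), vertical bars (|), slashes (/), commas (,), ampersands (&), asterisks (*), angle brackets (<>), and other unusual characters
  2. If a separator is needed, use an underscore (_), a hyphen (-), or a period (.)
  3. Because the database is case-insensitive, keep the case of names consistent and do not change it later
  4. Decimal values may appear: each point in the trend chart represents the value for one minute. If the monitoring interval is 10 minutes and 5 reports were made in those 10 minutes, the value of the point in the chart is 5/10 = 0.5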
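
One way to follow the naming rules is to normalize metric names before reporting them. The helper below is a hypothetical sketch, not part of CAT: it keeps ASCII letters, digits, underscores, hyphens, and periods, and replaces every other character with an underscore.

```java
import java.util.regex.Pattern;

/** Normalizes metric names to the characters recommended above (illustrative). */
final class MetricNames {

    // Anything other than ASCII letters, digits, '_', '.', and '-' is considered unsafe.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9_.-]");

    static String sanitize(String raw) {
        return UNSAFE.matcher(raw.trim()).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("order count"));
        System.out.println(sanitize("pay/ok|v1"));
    }
}
```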

State

The State report shows information related to CAT.



Cat integrates mainstream frameworks

For the complete code, please refer to the repository link given at the beginning of this article.


Cat integrated with Dubbo

  • Install the cat-monitor plug-in from the plug-in directory into the local Maven repository

Note: before installing, you can edit the pom file and change the cat-client dependency version to 3.0.1.
If compilation fails because CatLogger does not exist, replace the call with Cat.logError.

Producer configuration:

  • In the dubbo project, add the following dependencies to import the dubbo plug-in
        <dependency>
            <groupId>net.dubboclub</groupId>
            <artifactId>cat-monitor</artifactId>
            <version>0.0.6</version>
        </dependency>

        <dependency>
            <groupId>org.apache.dubbo</groupId>
            <artifactId>dubbo-spring-boot-starter</artifactId>
            <version>3.0.7</version>
        </dependency>
  • dubbo server configuration
server.port=9300
spring.application.name=dubbo_provider_cat
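# The dubbo protocol is the default
dubbo.protocol.name=dubbo
# The default communication port of the dubbo protocol is 20880
dubbo.protocol.port=20880
# To simplify the setup, the service is called directly; N/A means it is not registered with a registry
dubbo.registry.address=N/A
  • Don't forget META-INF/app.properties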


  • Design the interface and implementation class
public interface HelloService {

    String hello();
}

@DubboService(interfaceClass = HelloService.class)
public class HelloServiceImpl implements HelloService {

    @Override
    public String hello() {
        return "hello cat";
    }
}
  • Producer startup class
@EnableDubbo(scanBasePackages = "org.example.dubbo")
@SpringBootApplication
public class CatProviderDemo {

    public static void main(String[] args) {
        SpringApplication.run(CatProviderDemo.class, args);
    }
}

Consumer configuration:

  • Introduce dependencies
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.dubbo</groupId>
            <artifactId>dubbo-spring-boot-starter</artifactId>
            <version>3.0.7</version>
        </dependency>
        <dependency>
            <groupId>net.dubboclub</groupId>
            <artifactId>cat-monitor</artifactId>
            <version>0.0.6</version>
        </dependency>
  • configuration
server.port=9500
spring.application.name=dubbo_consumer_cat
  • Don't forget META-INF/app.properties

  • For convenience, copy the HelloService interface being called (note that its fully qualified class name must match that of HelloService in the producer)
public interface HelloService {

    String hello();
}
  • startup class
@EnableDubbo(scanBasePackages = "org.example.dubbo")
@SpringBootApplication
public class CatConsumerDemo {

    public static void main(String[] args) {
        SpringApplication.run(CatConsumerDemo.class, args);
    }
}
  • Write a test case, start the producer first, and then run the test case
@SpringBootTest(classes = CatConsumerDemo.class)
public class ConsumerTest {

    // Connect directly to the provider instead of looking up its address in a registry;
    // the direct URL is declared in the @DubboReference annotation
    @DubboReference(url = "dubbo://127.0.0.1:20880")
    private HelloService helloService;

    @Test
    public void test() {
        for (int i = 0; i < 1000; i++) {
            System.out.println(helloService.hello());
        }
    }
}


  • View the cat page

insert image description here
As shown in the figure, the call of dubbo has been correctly displayed in the transaction report. Click log view to view detailed calls.
insert image description here
As shown in the figure, the call log has been successfully printed.

The log output of the dubbo plug-in is fairly limited. For enterprise use, the plug-in can be extended through secondary development.


Cat integrated with MyBatis

  • Looking at the plug-in directory provided by cat, we can see that cat integrates with MyBatis through a MyBatis interceptor. We create a project that integrates Spring Boot and MyBatis, and copy the provided interceptor class into it.


  • test table structure
CREATE TABLE `t_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `username` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `password` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
  • Introduce dependencies
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.mybatis.spring.boot</groupId>
            <artifactId>mybatis-spring-boot-starter</artifactId>
            <version>2.1.2</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <scope>runtime</scope>
            <version>5.1.27</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.dianping.cat</groupId>
            <artifactId>cat-client</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.1.10</version>
        </dependency>
    </dependencies>
  • configuration file
# datasource
spring:
  datasource:
   url: jdbc:mysql://119.91.143.140:3306/test?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=false&serverTimezone=Asia/Shanghai
   username: root
   password: xfx123xfx
   driver-class-name: com.mysql.jdbc.Driver
   type: com.alibaba.druid.pool.DruidDataSource

# mybatis
mybatis:
  mapper-locations: classpath:mapper/*Mapper.xml # path of the mapper mapping files
  type-aliases-package: org.example.dao
server:
  port: 9500
  • mapper interface
@Mapper
public interface UserXmlMapper {
    
    
    List<User> findAll();
}
  • userMapper.xml mapping file
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="org.example.mapper.UserXmlMapper">
    <select id="findAll" resultType="user">
        select * from t_user
    </select>
</mapper>
  • startup class
@SpringBootApplication
public class CatMybaitsDemo {

    public static void main(String[] args) {
        SpringApplication.run(CatMybaitsDemo.class, args);
    }
}
  • Test case, run the test case to make sure there are no problems so far
@SpringBootTest(classes = CatMybaitsDemo.class)
public class CatMybaitsDemoTest {

    @Resource
    private UserXmlMapper userXmlMapper;

    @Test
    public void testSearchUser() throws InterruptedException {
        try {
            userXmlMapper.findAll().forEach(System.out::println);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Keep the JVM alive for a while so that the cat client has time to report
        Thread.sleep(30000);
    }
}

  • Introduce the MyBatis plugin provided by cat
import com.alibaba.druid.pool.DruidDataSource;
import com.dianping.cat.Cat;
import com.dianping.cat.message.Message;
import com.dianping.cat.message.Transaction;
import org.apache.ibatis.datasource.pooled.PooledDataSource;
import org.apache.ibatis.datasource.unpooled.UnpooledDataSource;
import org.apache.ibatis.executor.Executor;
import org.apache.ibatis.mapping.*;
import org.apache.ibatis.plugin.*;
import org.apache.ibatis.reflection.MetaObject;
import org.apache.ibatis.session.Configuration;
import org.apache.ibatis.session.ResultHandler;
import org.apache.ibatis.session.RowBounds;
import org.apache.ibatis.type.TypeHandlerRegistry;

import javax.sql.DataSource;
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.text.DateFormat;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


/**
 *  1.Cat-Mybatis plugin:  Rewrite on the version of Steven;
 *  2.Support DruidDataSource,PooledDataSource(mybatis Self-contained data source);
 * @author zhanzehui([email protected])
 */

@Intercepts({
        @Signature(method = "query", type = Executor.class, args = {
                MappedStatement.class, Object.class, RowBounds.class,
                ResultHandler.class }),
        @Signature(method = "update", type = Executor.class, args = {
                MappedStatement.class, Object.class })
})
public class CatMybatisPlugin implements Interceptor {

    private static final Pattern PARAMETER_PATTERN = Pattern.compile("\\?");
    private static final String MYSQL_DEFAULT_URL = "jdbc:mysql://UUUUUKnown:3306/%s?useUnicode=true";
    private Executor target;

    @Override
    public Object intercept(Invocation invocation) throws Throwable {
        // One "SQL" transaction per mapper call, named like "UserXmlMapper.findAll"
        MappedStatement mappedStatement = this.getStatement(invocation);
        String          methodName      = this.getMethodName(mappedStatement);
        Transaction t = Cat.newTransaction("SQL", methodName);

        // Record the statement kind (select, update, ...) together with the resolved SQL
        String sql = this.getSql(invocation,mappedStatement);
        SqlCommandType sqlCommandType = mappedStatement.getSqlCommandType();
        Cat.logEvent("SQL.Method", sqlCommandType.name().toLowerCase(), Message.SUCCESS, sql);

        // Record the JDBC URL of the database being accessed
        String url = this.getSQLDatabaseUrlByStatement(mappedStatement);
        Cat.logEvent("SQL.Database", url);

        return doFinish(invocation,t);
    }

    private MappedStatement getStatement(Invocation invocation) {
        // The first argument of Executor.query/update is the MappedStatement
        return (MappedStatement)invocation.getArgs()[0];
    }

    private String getMethodName(MappedStatement mappedStatement) {
        // The statement id is "<mapper class name>.<method>"; keep only "Class.method"
        String[] strArr = mappedStatement.getId().split("\\.");
        String methodName = strArr[strArr.length - 2] + "." + strArr[strArr.length - 1];

        return methodName;
    }

    private String getSql(Invocation invocation, MappedStatement mappedStatement) {
        // The second argument, when present, is the parameter object of the statement
        Object parameter = null;
        if(invocation.getArgs().length > 1){
            parameter = invocation.getArgs()[1];
        }

        BoundSql boundSql = mappedStatement.getBoundSql(parameter);
        Configuration configuration = mappedStatement.getConfiguration();
        String sql = sqlResolve(configuration, boundSql);

        return sql;
    }

    private Object doFinish(Invocation invocation,Transaction t) throws InvocationTargetException, IllegalAccessException {
        Object returnObj = null;
        try {
            returnObj = invocation.proceed();
            t.setStatus(Transaction.SUCCESS);
        } catch (Exception e) {
            Cat.logError(e);
            throw e;
        } finally {
            // Always complete the transaction so that it is reported
            t.complete();
        }

        return returnObj;
    }


    private String getSQLDatabaseUrlByStatement(MappedStatement mappedStatement) {
        String url = null;
        DataSource dataSource = null;
        try {
            Configuration configuration = mappedStatement.getConfiguration();
            Environment environment = configuration.getEnvironment();
            dataSource = environment.getDataSource();

            url = switchDataSource(dataSource);

            return url;
        } catch (NoSuchFieldException|IllegalAccessException|NullPointerException e) {
            Cat.logError(e);
        }

        // Reached only when the lookup above failed; dataSource may still be null here
        String dataSourceType = dataSource == null ? "null" : dataSource.getClass().toString();
        Cat.logError(new Exception("UnSupport type of DataSource : " + dataSourceType));
        return MYSQL_DEFAULT_URL;
    }

    private String switchDataSource(DataSource dataSource) throws NoSuchFieldException, IllegalAccessException {
        String url = null;

        if(dataSource instanceof DruidDataSource) {
            url = ((DruidDataSource) dataSource).getUrl();
        }else if(dataSource instanceof PooledDataSource) {
            // PooledDataSource wraps an UnpooledDataSource in a private field; read it reflectively
            Field dataSource1 = dataSource.getClass().getDeclaredField("dataSource");
            dataSource1.setAccessible(true);
            UnpooledDataSource dataSource2 = (UnpooledDataSource)dataSource1.get(dataSource);
            url =dataSource2.getUrl();
        }else {
            //other dataSource expand
        }

        return url;
    }

    public String sqlResolve(Configuration configuration, BoundSql boundSql) {
        Object parameterObject = boundSql.getParameterObject();
        List<ParameterMapping> parameterMappings = boundSql.getParameterMappings();
        // Collapse whitespace so that the SQL is reported on a single line
        String sql = boundSql.getSql().replaceAll("[\\s]+", " ");
        if (parameterMappings.size() > 0 && parameterObject != null) {
            TypeHandlerRegistry typeHandlerRegistry = configuration.getTypeHandlerRegistry();
            if (typeHandlerRegistry.hasTypeHandler(parameterObject.getClass())) {
                // A single simple parameter: substitute it for the first "?"
                sql = sql.replaceFirst("\\?", Matcher.quoteReplacement(resolveParameterValue(parameterObject)));

            } else {
                // A parameter object: substitute each "?" with the matching property value
                MetaObject metaObject = configuration.newMetaObject(parameterObject);
                Matcher matcher = PARAMETER_PATTERN.matcher(sql);
                StringBuffer sqlBuffer = new StringBuffer();
                for (ParameterMapping parameterMapping : parameterMappings) {
                    String propertyName = parameterMapping.getProperty();
                    Object obj = null;
                    if (metaObject.hasGetter(propertyName)) {
                        obj = metaObject.getValue(propertyName);
                    } else if (boundSql.hasAdditionalParameter(propertyName)) {
                        obj = boundSql.getAdditionalParameter(propertyName);
                    }
                    if (matcher.find()) {
                        matcher.appendReplacement(sqlBuffer, Matcher.quoteReplacement(resolveParameterValue(obj)));
                    }
                }
                matcher.appendTail(sqlBuffer);
                sql = sqlBuffer.toString();
            }
        }
        return sql;
    }

    private String resolveParameterValue(Object obj) {
        String value = null;
        if (obj instanceof String) {
            value = "'" + obj.toString() + "'";
        } else if (obj instanceof Date) {
            DateFormat formatter = DateFormat.getDateTimeInstance(DateFormat.DEFAULT, DateFormat.DEFAULT, Locale.CHINA);
            value = "'" + formatter.format((Date) obj) + "'";
        } else {
            if (obj != null) {
                value = obj.toString();
            } else {
                value = "";
            }

        }
        return value;
    }

    @Override
    public Object plugin(Object target) {
        // Only Executor instances are wrapped; other MyBatis components pass through unchanged
        if (target instanceof Executor) {
            this.target = (Executor) target;
            return Plugin.wrap(target, this);
        }
        return target;
    }

    @Override
    public void setProperties(Properties properties) {
    }

}

A better option is to package this file, together with the other cat plug-ins, and publish it to a private repository.
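
The core of the plugin's sqlResolve method is the substitution of each ? placeholder with a printable parameter value. The standalone sketch below isolates that technique using only java.util.regex. The class SqlPreview is hypothetical and simplified (no Date handling, no MyBatis types), and the rendered SQL is meant for display in the monitoring report, not for execution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Renders a parameterized SQL string with its values filled in (illustrative). */
final class SqlPreview {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\?");

    static String render(String sql, List<Object> params) {
        // Collapse whitespace first, as the plugin does, to get single-line SQL
        Matcher matcher = PLACEHOLDER.matcher(sql.replaceAll("\\s+", " "));
        StringBuffer out = new StringBuffer();
        for (Object param : params) {
            if (!matcher.find()) {
                break; // more parameters than placeholders
            }
            // quoteReplacement keeps '$' and '\' in values from being treated as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(literal(param)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    private static String literal(Object param) {
        if (param == null) {
            return "";
        }
        return (param instanceof String) ? "'" + param + "'" : param.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(
                "select *\n  from t_user where id = ? and username = ?",
                Arrays.<Object>asList(7, "tom")));
    }
}
```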

  • Write the mybatis-config.xml configuration file and place the file in the resources/mybatis folder:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
    <plugins>
        <plugin interceptor="org.example.cat.CatMybatisPlugin"/>
    </plugins>
</configuration>
  • Modify the application.yml file:
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*Mapper.xml # path of the mapper mapping files
  type-aliases-package: org.example.dao
  # config-location: location of the MyBatis core configuration file
  config-location: classpath:mybatis/mybatis-config.xml
  • Place META-INF/app.properties

  • Test: change the SQL statement to an invalid one, restart, and run the test again

Statement execution errors are now displayed. To view the specific error, click Log View:

insert image description here
In the figure you can see not only the specific SQL statement but also the stack trace of the error.


Cat Integrated Logging Framework

The approach for integrating CAT with a logging framework is much the same in every case, so this article uses logback, the default logging framework of Spring Boot, as the example. If log4j or log4j2 is used, the procedure is similar.

  • Introduce dependencies
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>com.dianping.cat</groupId>
            <artifactId>cat-client</artifactId>
            <version>3.0.1</version>
        </dependency>
  • application.yml configuration
logging:
  level:
    root: info
  path: ./logs
  config: classpath:logback-spring.xml
server:
  port: 9600
  • Write the configuration file logback-spring.xml and place it in the resources directory:
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
    <!-- Property: read the matching entry from the Spring configuration file -->
    <springProperty scope="context" name="logging.path" source="logging.path"/>
    <contextName>cat</contextName>
    <appender name="consoleLog" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <!-- Formatted (colored) output: %d is the date, %thread the thread name, %-5level the level left-aligned in a 5-character column, %msg the log message, %n a line break -->
            <pattern>%yellow(%d{yyyy-MM-dd HH:mm:ss}) %red([%thread]) %highlight(%-5level) %cyan(%logger{50}) -
                %magenta(%msg) %n
            </pattern>
            <charset>UTF-8</charset>
        </encoder>
    </appender>

    <!-- Split logs by level and write them to separate files -->
    <appender name="fileInfoLog" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <filter class="ch.qos.logback.classic.filter.LevelFilter">
            <level>ERROR</level>
            <onMatch>DENY</onMatch>
            <onMismatch>ACCEPT</onMismatch>
        </filter>
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <pattern>
                %d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{50} - %msg%n
            </pattern>
            <charset>UTF-8</charset>
        </encoder>
        <!-- Rolling policy -->
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <!-- Roll log files by time; change the pattern to roll by hour, day, or month -->
            <fileNamePattern>${logging.path}/cat.info.%d{yyyy-MM-dd}.log</fileNamePattern>
            <!-- Retention period -->
            <MaxHistory>90</MaxHistory>
            <!-- Total size limit -->
            <totalSizeCap>1GB</totalSizeCap>
        </rollingPolicy>
    </appender>

    <appender name="fileErrorLog" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>ERROR</level>
        </filter>
        <encoder>
            <pattern>
                %d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{50} - %msg%n
            </pattern>
        </encoder>
        <!-- Rolling policy -->
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <!-- Path -->
            <fileNamePattern>${logging.path}/cat.error.%d{yyyy-MM-dd}.log</fileNamePattern>
            <MaxHistory>90</MaxHistory>
        </rollingPolicy>
    </appender>
    <root level="info">
        <appender-ref ref="consoleLog"/>
        <appender-ref ref="fileInfoLog"/>
        <appender-ref ref="fileErrorLog"/>
    </root>
</configuration>
  • View the related log framework integration plug-ins officially provided by cat


  • Copy CatLogbackAppender to our own project
package org.example.appender;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.classic.spi.ThrowableProxy;
import ch.qos.logback.core.AppenderBase;
import ch.qos.logback.core.LogbackException;
import com.dianping.cat.Cat;

import java.io.PrintWriter;
import java.io.StringWriter;

public class CatLogbackAppender extends AppenderBase<ILoggingEvent> {

    @Override
    protected void append(ILoggingEvent event) {
        try {
            boolean isTraceMode = Cat.getManager().isTraceMode();
            Level level = event.getLevel();
            if (level.isGreaterOrEqual(Level.ERROR)) {
                // ERROR and above are always reported to CAT
                logError(event);
            } else if (isTraceMode) {
                // Lower levels are reported only when trace mode is enabled
                logTrace(event);
            }
        } catch (Exception ex) {
            throw new LogbackException(event.getFormattedMessage(), ex);
        }
    }

    private void logError(ILoggingEvent event) {
        // Only log events that carry a throwable are reported as errors
        ThrowableProxy info = (ThrowableProxy) event.getThrowableProxy();
        if (info != null) {
            Throwable exception = info.getThrowable();

            Object message = event.getFormattedMessage();
            if (message != null) {
                Cat.logError(String.valueOf(message), exception);
            } else {
                Cat.logError(exception);
            }
        }
    }

    private void logTrace(ILoggingEvent event) {
        String type = "Logback";
        String name = event.getLevel().toString();
        Object message = event.getFormattedMessage();
        String data;
        if (message instanceof Throwable) {
            data = buildExceptionStack((Throwable) message);
        } else {
            data = event.getFormattedMessage().toString();
        }

        // Append the stack trace when the log call passed an exception
        ThrowableProxy info = (ThrowableProxy) event.getThrowableProxy();
        if (info != null) {
            data = data + '\n' + buildExceptionStack(info.getThrowable());
        }

        Cat.logTrace(type, name, "0", data);
    }

    private String buildExceptionStack(Throwable exception) {
        if (exception != null) {
            StringWriter writer = new StringWriter(2048);
            exception.printStackTrace(new PrintWriter(writer));
            return writer.toString();
        } else {
            return "";
        }
    }

}
  • Modify the logback-spring.xml configuration file:
    <appender name="CatAppender" class="org.example.appender.CatLogbackAppender"/>
    <root level="info">
        <appender-ref ref="consoleLog"/>
        <appender-ref ref="fileInfoLog"/>
        <appender-ref ref="fileErrorLog"/>
        <appender-ref ref="CatAppender" />
    </root>
</configuration>
  • Don't forget META-INF/app.properties

  • Write test cases
@SpringBootTest(classes = CatDemoMain.class)
@Slf4j
public class CatLogBackDemoTest {

    @Test
    public void testLog() throws InterruptedException {
        // Trace mode must be enabled for non-error logs to be reported;
        // see the append method of CatLogbackAppender
        Cat.getManager().setTraceMode(true);
        log.info("cat info");
        try {
            int i = 1 / 0;
        } catch (Exception e) {
            log.error("cat error", e);
        }
        // Sleep for a while so that the cat client has time to report the error
        Thread.sleep(100000);
    }
}

  • Run to view the cat interface


The information listed by Cat is relatively detailed. There are INFO level logs and ERROR level logs. The ERROR level logs show all the stack information to facilitate problem analysis.



Cat integrated with Spring Boot

The Spring Boot integration is relatively simple. We reuse the MyBatis project built above for testing.

  • Add the following configuration class to the config package:
@Configuration
public class CatFilterConfigure {

    @Bean
    public FilterRegistrationBean catFilter() {
        // Register CatFilter for every URL so that each request is reported to CAT
        FilterRegistrationBean registration = new FilterRegistrationBean();
        CatFilter filter = new CatFilter();
        registration.setFilter(filter);
        registration.addUrlPatterns("/*");
        registration.setName("cat-filter");
        registration.setOrder(1);
        return registration;
    }
}
  • access test

insert image description here
The call in the figure first passes through the Controller, so the following information is printed:

  • /mybatis: the address of the interface
  • URL.Server: information about the server, the browser, and so on
  • URL.Method: the request method (GET, POST, etc.) and the URL

Cat integrates Spring AOP

Spring AOP can simplify instrumentation: by adding a single annotation, a method can be monitored by CAT.

  • Usage is simple: copy the following two classes into the project and put the annotation on each method to be monitored
@Retention(RUNTIME)
@Target(ElementType.METHOD)
public @interface CatAnnotation {
}

@Component
@Aspect
public class CatAopService {

    @Around(value = "@annotation(CatAnnotation)")
    public Object aroundMethod(ProceedingJoinPoint pjp) throws Throwable {
        MethodSignature joinPointObject = (MethodSignature) pjp.getSignature();
        Method method = joinPointObject.getMethod();

        // One transaction of type "method" per annotated call, named after the method
        Transaction t = Cat.newTransaction("method", method.getName());

        try {
            Object res = pjp.proceed();
            t.setSuccessStatus();
            return res;
        } catch (Throwable e) {
            t.setStatus(e);
            Cat.logError(e);
            throw e;
        } finally {
            t.complete();
        }
    }

}
  • Annotation demo
@RestController
public class AopController {

    @RequestMapping("aop")
    @CatAnnotation
    public String aop1() {
        return "aop";
    }
}
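
The control flow of the aspect (mark success, record the exception on failure, always complete) does not depend on Spring. The sketch below imitates it in plain Java so that the pattern can be tried without Spring or a CAT server. Everything in it is hypothetical: RECORDS stands in for the CAT report, and the status strings are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Plain-Java imitation of the around advice; no Spring or CAT classes involved. */
final class MonitoredCall {

    /** Hypothetical stand-in for the CAT report: one "name:status" entry per call. */
    static final List<String> RECORDS = new ArrayList<>();

    static <T> T around(String name, Supplier<T> body) {
        String status = "unset";
        try {
            T result = body.get();
            status = "success";                    // plays the role of t.setSuccessStatus()
            return result;
        } catch (RuntimeException e) {
            status = e.getClass().getSimpleName(); // plays the role of t.setStatus(e)
            throw e;
        } finally {
            RECORDS.add(name + ":" + status);      // plays the role of t.complete(); always runs
        }
    }

    public static void main(String[] args) {
        System.out.println(around("aop1", () -> "aop"));
        System.out.println(RECORDS);
    }
}
```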

Cat integrates Spring MVC

The official integration method of Spring MVC is to use AOP for integration. The source code is as follows:

  • AOP annotation:
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface CatTransaction {
    String type() default "Handler"; // "URL MVC Service SQL" is reserved for Cat Transaction Type
    String name() default "";
}
  • AOP processing code:
    @Around("@annotation(catTransaction)")
    public Object catTransactionProcess(ProceedingJoinPoint pjp, CatTransaction catTransaction) throws Throwable {
        // Default name is "Class.method"; an explicit name on the annotation overrides it
        String transName = pjp.getSignature().getDeclaringType().getSimpleName() + "." + pjp.getSignature().getName();
        if (StringUtils.isNotBlank(catTransaction.name())) {
            transName = catTransaction.name();
        }
        Transaction t = Cat.newTransaction(catTransaction.type(), transName);
        try {
            Object result = pjp.proceed();
            t.setStatus(Transaction.SUCCESS);
            return result;
        } catch (Throwable e) {
            t.setStatus(e);
            throw e;
        } finally {
            t.complete();
        }
    }

Cat alarms

CAT provides us with a complete alarm function. Reasonable and flexible monitoring rules can help find business line faults faster and more accurately.

Alarm general configuration

  • Alarm server configuration

Only machines configured as alarm servers execute the alarm logic, and only machines configured as sending servers send alarms.

In the console, go to global system configuration - server configuration, edit the server type, and add the <property name="alarm-machine" value="true"/> and <property name="send-machine" value="true"/> properties to the alarm server.


Alert Policy

Alarm policy: for a given alarm type, project, and error level, it configures the channels through which the alarm is sent and the suspension time between consecutive alarms.

Example: the following configuration shows that, for Transaction alarms on the project named demo-project:

  • When the alarm level is error, the sending channels are email, WeChat, and SMS, and the interval between consecutive alarms is 5 minutes
  • When the alarm level is warning, the sending channels are email and WeChat, and the interval between consecutive alarms is 10 minutes
<alert-policy>
    <type id="Transaction">
        <group id="default">
            <level id="error" send="mail,weixin" suspendMinute="5"/>
            <level id="warning" send="mail,weixin" suspendMinute="5"/>
        </group>
        <group id="demo-project">
            <level id="error" send="mail,weixin,sms" suspendMinute="5"/>
            <level id="warning" send="mail,weixin" suspendMinute="10"/>
        </group>
    </type>
</alert-policy>
  • Configuration instructions:

    • type: the alarm type; one of Transaction, Event, Business, Heartbeat
    • group id attribute: either default, which applies to all projects, or a project name, which defines the policy for that project; when a project-specific group exists, the default policy does not take effect for that project
    • level id attribute: the error level, either warning or error
    • level send attribute: the alarm channels; mail for email, weixin for WeChat, sms for text message
    • level suspendMinute attribute: the suspension time, in minutes, between consecutive alarms
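
The lookup described above can be sketched in plain Java. This is only an illustration of the fallback behaviour, not CAT's own code: the class AlertPolicySketch, its Level type, and the resolve and suspensionElapsed methods are hypothetical names, and the policy values are copied from the XML example. Treating the suspension as elapsed once exactly suspendMinute minutes have passed is also an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the <alert-policy> example; not CAT's own classes.
class AlertPolicySketch {

    // One <level> element: channels to send on and the pause between repeated alarms.
    static final class Level {
        final List<String> send;
        final int suspendMinute;

        Level(int suspendMinute, String... send) {
            this.suspendMinute = suspendMinute;
            this.send = Arrays.asList(send);
        }
    }

    // group id -> (level id -> Level), mirroring the <type id="Transaction"> block.
    static final Map<String, Map<String, Level>> TRANSACTION = new HashMap<>();

    static {
        Map<String, Level> defaults = new HashMap<>();
        defaults.put("error", new Level(5, "mail", "weixin"));
        defaults.put("warning", new Level(5, "mail", "weixin"));
        TRANSACTION.put("default", defaults);

        Map<String, Level> demo = new HashMap<>();
        demo.put("error", new Level(5, "mail", "weixin", "sms"));
        demo.put("warning", new Level(10, "mail", "weixin"));
        TRANSACTION.put("demo-project", demo);
    }

    // A project-specific group replaces the default group; otherwise fall back to "default".
    static Level resolve(String project, String levelId) {
        Map<String, Level> group = TRANSACTION.getOrDefault(project, TRANSACTION.get("default"));
        return group.get(levelId);
    }

    // Assumption: a new alarm may be sent once suspendMinute minutes have passed.
    static boolean suspensionElapsed(long lastSentMinute, long nowMinute, Level level) {
        return nowMinute - lastSentMinute >= level.suspendMinute;
    }

    public static void main(String[] args) {
        System.out.println(resolve("demo-project", "error").send);
        System.out.println(resolve("some-other-project", "warning").suspendMinute);
    }
}
```

A project without its own group, such as some-other-project in the sketch, gets the default policy, while demo-project never sees the default values.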

Alert recipient

  • The recipients of an alarm are the contacts of the project to which the alarm belongs:

    • Project team email: the email address of the project leader, or the email address of the project team's product line; separate multiple addresses with ASCII commas and no spaces; used for sending alarm emails and WeChat messages
    • Project team number: the mobile phone number of the person in charge of the project; separate multiple numbers with ASCII commas and no spaces; used for sending alarm text messages

Alarm server

This section configures the alarm sending center, a service that sends SMS, email, and WeChat messages and exposes an HTTP API.

After CAT generates an alarm, it calls the HTTP interface of the alarm sending center to send it. CAT does not ship with an alarm sending center, so you need to build one yourself.

  • Configuration example
<sender-config>
   <sender id="mail" url="http://test/" type="post" successCode="200" batchSend="true">
      <par id="type=1500"/>
      <par id="key=title,body"/>
      <par id="[email protected]"/>
      <par id="to=${receiver}"/>
      <par id="value=${title},${content}"/>
   </sender>
   <sender id="weixin" url="http://test/" type="post" successCode="success" batchSend="true">
      <par id="domain=${domain}"/>
      <par id="email=${receiver}"/>
      <par id="title=${title}"/>
      <par id="content=${content}"/>
      <par id="type=${type}"/>
   </sender>
   <sender id="sms" url="http://test/" type="post" successCode="200" batchSend="false">
      <par id="jsonm={type:808,mobile:'${receiver}',pair:{body='${content}'}}"/>
   </sender>
</sender-config>
  • Configuration instructions:

    • sender id attribute: the alarm channel; one of mail, sms, weixin
    • sender url attribute: the URL of the alarm sending center
    • sender batchSend attribute: whether alarm messages can be sent in batches
    • par: the HTTP parameters required by the alarm sending center. ${argument} denotes a dynamic parameter attached when the alert object is built; these dynamic parameters must be added to m_paras of AlertEntity in the code, according to what the alarm sending center requires
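
To make the ${argument} mechanism more concrete, here is a minimal sketch of how such placeholders could be filled in before the HTTP call. It is not CAT's implementation: SenderParamSketch, fill, and toPair are hypothetical names, and splitting each par entry at the first "=" into a key and a value is an assumption about how the entries are read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing placeholder substitution for <par id="..."/> entries.
class SenderParamSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces each ${name} in the template with the matching dynamic parameter.
    // Unknown placeholders are left untouched so a misconfiguration stays visible.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value != null ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumption: an entry is "key=value", split at the first '='.
    static String[] toPair(String filled) {
        int eq = filled.indexOf('=');
        if (eq < 0) {
            return new String[] { filled, "" };
        }
        return new String[] { filled.substring(0, eq), filled.substring(eq + 1) };
    }

    public static void main(String[] args) {
        Map<String, String> alert = new LinkedHashMap<>();
        alert.put("receiver", "testUser1@test.com,testUser2@test.com");
        alert.put("title", "Transaction alarm");
        String[] pair = toPair(fill("to=${receiver}", alert));
        System.out.println(pair[0] + " -> " + pair[1]);
    }
}
```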

Alert rules

CAT's monitoring rules currently have five elements:

  • Alarm time period. The same business indicator may follow different trends at different times of the day. This setting lets CAT apply different monitoring rules at different times of the day. Note: the alarm time period is not the time range of the monitored data; it is the period during which the alarm checks the data

  • Combination of rules. Within a period, an alarm may need to be issued when an indicator triggers any one of several monitoring rules, or only when it triggers several monitoring rules at the same time

  • Monitoring rule type. Indicators are monitored with the following six types: maximum value, minimum value, fluctuation rise percentage, fluctuation decline percentage, sum maximum value, and sum minimum value

  • Number of most recent minutes monitored. Once this is set (in minutes), an alarm is issued only when the indicator triggers the monitoring rule continuously throughout that most recent period. For example, a value of 3 means the values of three consecutive minutes must all meet the conditions before an alarm is issued; a value of 1 means the alarm is triggered as soon as the most recent minute meets the conditions

  • Matching of rules to monitored indicators. A monitoring rule can match its monitored objects (indicators) by name or by regular expression
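
The "combination of rules" and "most recent minutes" elements can be illustrated with a small sketch. It assumes one possible reading of the text: a rule holds several conditions, any one of which may raise the alarm, and each condition holds sub-conditions that must all be met over its own window of recent minutes. The names RuleCombinationSketch, Condition, alarms, and allAtLeast are hypothetical and not taken from CAT.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: conditions are OR-ed, sub-conditions inside one condition are AND-ed.
class RuleCombinationSketch {

    static final class Condition {
        final int minutes;                              // how many recent minutes to check
        final List<Predicate<double[]>> subConditions;  // all must hold on that window

        Condition(int minutes, List<Predicate<double[]>> subConditions) {
            this.minutes = minutes;
            this.subConditions = subConditions;
        }

        boolean triggered(double[] perMinuteValues) {
            if (perMinuteValues.length < minutes) {
                return false; // not enough data yet
            }
            double[] recent = Arrays.copyOfRange(
                    perMinuteValues, perMinuteValues.length - minutes, perMinuteValues.length);
            for (Predicate<double[]> sub : subConditions) {
                if (!sub.test(recent)) {
                    return false;
                }
            }
            return true;
        }
    }

    // The rule alarms as soon as any one of its conditions is triggered.
    static boolean alarms(List<Condition> conditions, double[] perMinuteValues) {
        for (Condition condition : conditions) {
            if (condition.triggered(perMinuteValues)) {
                return true;
            }
        }
        return false;
    }

    // Example sub-condition: every value in the window is >= threshold.
    static Predicate<double[]> allAtLeast(double threshold) {
        return window -> Arrays.stream(window).allMatch(v -> v >= threshold);
    }

    public static void main(String[] args) {
        List<Condition> rule = Arrays.asList(
                new Condition(3, Arrays.asList(allAtLeast(100))),
                new Condition(1, Arrays.asList(allAtLeast(500))));
        System.out.println(alarms(rule, new double[] { 10, 120, 130, 140 }));
    }
}
```

In the usage above, the first condition requires three consecutive minutes at or above 100, while the second fires on a single minute at or above 500.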

Sub-condition types:

There are six types. The content of a sub-condition is its threshold. Note that the threshold may contain only digits; when it expresses a percentage, do not append a percent sign. The six types are as follows:

  • MaxVal: maximum value (current value). An upper bound on the actual values. For example, when checking the data of the last 3 minutes there are 3 values, and all 3 values must be >= the configured value at the same time.
  • MinVal: minimum value (current value). A lower bound on the actual values. For example, when checking the data of the last 3 minutes there are 3 values, and all 3 values must be <= the configured value at the same time.
  • FluAscPer: fluctuation rise percentage (current value). An upper bound on the rising fluctuation. The alarm is triggered when the percentage increase of the last N minute values over the other M-N minute values in the monitoring period is greater than or equal to the configured percentage. For example, when checking the data of the last 10 minutes with a trigger count of 3, each value of the last 3 minutes is compared with each value of the previous 7 minutes, and the rise percentages in all 3 groups of 7 comparisons must be >= the configured threshold. For a rise of 50%, enter 50 as the threshold.
  • FluDescPer: fluctuation decline percentage (current value). An upper bound on the falling fluctuation. The alarm is triggered when the percentage decrease of the last N minute values compared with the other M-N minute values in the monitoring period is greater than or equal to the configured percentage. For example, when checking the data of the last 10 minutes with a trigger count of 3, each value of the last 3 minutes is compared with each value of the previous 7 minutes, and the decline percentages in all 3 groups of 7 comparisons must be >= the configured threshold. For a drop of 50%, enter 50 as the threshold.
  • SumMaxVal: sum maximum value (current value). An upper bound on the sum of the values. For example, when checking the data of the last 3 minutes, an alarm is issued if the sum over the 3 minutes is >= the configured value.
  • SumMinVal: sum minimum value (current value). A lower bound on the sum of the values. For example, when checking the data of the last 3 minutes, an alarm is issued if the sum over the 3 minutes is <= the configured value.
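
The six checks can be written down as small functions, which makes the comparisons easier to verify by hand. The sketch below follows the descriptions above rather than CAT's source code: SubConditionSketch and its method names are hypothetical, percentages are computed as (recent - earlier) / earlier * 100, and returning false when an earlier value is zero or negative is an assumption made to avoid undefined percentages.

```java
import java.util.Arrays;

// Hypothetical implementations of the six sub-condition types described above.
class SubConditionSketch {

    // MaxVal: every value in the window is >= threshold.
    static boolean maxVal(double[] window, double threshold) {
        return Arrays.stream(window).allMatch(v -> v >= threshold);
    }

    // MinVal: every value in the window is <= threshold.
    static boolean minVal(double[] window, double threshold) {
        return Arrays.stream(window).allMatch(v -> v <= threshold);
    }

    // SumMaxVal: the sum over the window is >= threshold.
    static boolean sumMaxVal(double[] window, double threshold) {
        return Arrays.stream(window).sum() >= threshold;
    }

    // SumMinVal: the sum over the window is <= threshold.
    static boolean sumMinVal(double[] window, double threshold) {
        return Arrays.stream(window).sum() <= threshold;
    }

    // FluAscPer: each of the last n values rose by >= percent over each earlier value.
    static boolean fluAscPer(double[] period, int n, double percent) {
        return fluctuation(period, n, percent, true);
    }

    // FluDescPer: each of the last n values fell by >= percent below each earlier value.
    static boolean fluDescPer(double[] period, int n, double percent) {
        return fluctuation(period, n, percent, false);
    }

    // Compares the last n values (N) with the preceding period.length - n values (M-N).
    private static boolean fluctuation(double[] period, int n, double percent, boolean rising) {
        int split = period.length - n;
        if (n <= 0 || split <= 0) {
            return false; // need at least one recent and one earlier value
        }
        for (int i = split; i < period.length; i++) {
            for (int j = 0; j < split; j++) {
                double base = period[j];
                if (base <= 0) {
                    return false; // assumption: percentage undefined, do not alarm
                }
                double change = (period[i] - base) / base * 100.0;
                double magnitude = rising ? change : -change;
                if (magnitude < percent) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] lastTenMinutes = { 100, 100, 100, 100, 100, 100, 100, 150, 160, 200 };
        System.out.println(fluAscPer(lastTenMinutes, 3, 50));
    }
}
```

With M = 10 and N = 3, the nested loops perform exactly the 3 groups of 7 comparisons mentioned in the FluAscPer and FluDescPer descriptions.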

Transaction alarm

Transaction alarms support three indicators: count, latency, and failure rate; monitoring period: one minute

  • In this example, Transaction monitoring rules are configured for the springboot-cat project.

Configuration instructions:

  • Project name: the name of the project to be monitored

  • type: the type of the monitored transaction

  • name: the name of the monitored transaction; All means all names

  • Monitoring indicators: count, latency, failure rate

  • Alert rules: see the Alert rules section for details


Event alarm

Event alarms are raised on the number of Events; monitoring period: one minute

Configuration instructions:

  • Project name: the name of the project to be monitored
  • type: the type of the monitored event
  • name: the name of the monitored event; if it is All, it means all names
  • Alert rules: see the Alert Rules section for details

Heartbeat alarm

Heartbeat alarms monitor the current state of the server, such as system load and GC count; monitoring period: one minute

Configuration instructions:

  • Project name: the name of the project to be monitored
  • Indicator: the name of the monitored heartbeat indicator; heartbeat alarms are matched at two levels: first by project, then by indicator
  • Alert rules: see the Alert rules section for details

Exception alarm

Exception alarms are raised on the number of exceptions; monitoring period: one minute

Configuration instructions:

  • Project name: the name of the project to be monitored
  • Exception name: the name of the monitored exception; "Total" sets a threshold on the total number of all exceptions in the current project group, while a specific exception name sets a threshold on all exceptions with that name in the current project group
  • warning threshold: when this threshold is reached, a warning level alarm is sent; when the number of exceptions is below this threshold, no alarm is issued
  • error threshold: when this threshold is reached, an error level alarm is sent
  • If the total number is greater than the warning threshold but less than the error threshold, a warning level alarm is issued; if it is greater than the error threshold, an error level alarm is issued.
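
The two thresholds amount to a simple three-way classification, sketched below. ExceptionLevelSketch and classify are hypothetical names, and treating a count exactly equal to a threshold as having "reached" it is an assumption based on the wording above.

```java
// Hypothetical classification of an exception count against the two thresholds.
class ExceptionLevelSketch {

    enum Level { NONE, WARNING, ERROR }

    // Assumption: a count equal to a threshold counts as having reached it.
    static Level classify(int count, int warningThreshold, int errorThreshold) {
        if (count >= errorThreshold) {
            return Level.ERROR;
        }
        if (count >= warningThreshold) {
            return Level.WARNING;
        }
        return Level.NONE;
    }

    public static void main(String[] args) {
        // With warning = 10 and error = 20, a count of 15 falls between the two thresholds
        System.out.println(classify(15, 10, 20));
    }
}
```

The error check comes first so that a count above both thresholds yields a single error level alarm rather than a warning.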

Writing the alarm interface

  • Write the controller interface:
@RestController
public class AlertController {

    @RequestMapping(value = "/alert/msg")
    public String sendAlert(@RequestParam String to) {
        // "to" receives the value of the to=${receiver} parameter from the sender configuration
        System.out.println("Alert triggered: " + to);
        // Returns the value configured as successCode for this sender
        return "200";
    }
}
  • Modify the alarm server configuration and fill in the interface address, taking email as an example:
   <sender id="mail" url="http://localhost:8085/alert/msg" type="post" successCode="200" batchSend="true">
      <par id="type=1500"/>
      <par id="key=title,body"/>
      <par id="[email protected]"/>
      <par id="to=${receiver}"/>
      <par id="value=${title},${content}"/>
   </sender>
  • Test result; the output is as follows:
Alert triggered: testUser1@test.com,testUser2@test.com

Origin blog.csdn.net/m0_53157173/article/details/130041284