Spring Cloud Alibaba in Practice and Source Code (7): SkyWalking

What is SkyWalking

  SkyWalking is an open-source product created by the Chinese open-source developer Wu Sheng and donated to the Apache Incubator; it has since become an Apache top-level project. It draws on the design ideas of Zipkin, Pinpoint, and CAT. Its main features are support for a wide range of plug-ins, a powerful UI, and non-intrusive instrumentation. It is now used by many companies, and new versions are released frequently.

  Supported data stores: Elasticsearch, MySQL, H2, and TiDB. The default is H2, which keeps data in memory; in practice, data is usually stored in Elasticsearch.

Home page: http://skywalking.apache.org/
Downloads: https://skywalking.apache.org/downloads/
github: https://github.com/apache/skywalking
documentation: https://github.com/apache/skywalking/tree/master/docs
configuration: https://github.com/apache/skywalking/tree/master/docs/en/setup/backend

APM

  APM stands for Application Performance Management. It collects data through various probes, gathers key metrics, and presents the data together, providing a systematic solution for application performance management and fault management.

  Monitoring systems such as Zabbix, Prometheus, and Open-Falcon focus mainly on server hardware metrics and the running status of system services, while APM systems focus on metrics of a program's internal execution and on the call links between services. APM therefore makes it easier to dig into the code and find the root cause of slow responses, and it complements monitoring systems such as Zabbix. The main open-source APM systems today are CAT, Zipkin, Pinpoint, and SkyWalking, most of which are modeled on Google's Dapper.

Comparison of link tracing tools

Link tracing tools generally provide the following functions:

  • Heartbeat detection (to determine if the application is still running)
  • Record the execution process and execution time of the request
  • Resource monitoring (CPU, memory, bandwidth, disk)
  • Alarm function (monitor execution time, success rate, etc. and notify via email, DingTalk, SMS, WeChat, etc.)
  • Visualization page
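The second item, recording the execution process and execution time of a request, is commonly modeled as a tree of timed spans that share one trace ID. The following is a minimal sketch of that idea only; it is not how any of the tools discussed here implement it, and the names `Span`, `root`, `child`, and `finish` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical, simplified span model: one trace ID shared by a tree of timed spans.
class Span {
    final String traceId;
    final String operation;
    final Span parent;
    final List<Span> children = new ArrayList<>();
    final long startNanos;
    long endNanos = -1;

    private Span(String traceId, String operation, Span parent) {
        this.traceId = traceId;
        this.operation = operation;
        this.parent = parent;
        this.startNanos = System.nanoTime();
    }

    // Start a new trace with a fresh trace ID.
    static Span root(String operation) {
        return new Span(UUID.randomUUID().toString(), operation, null);
    }

    // Start a child span that inherits the trace ID of its parent.
    Span child(String operation) {
        Span c = new Span(traceId, operation, this);
        children.add(c);
        return c;
    }

    // Mark the end of the operation.
    void finish() {
        endNanos = System.nanoTime();
    }

    long durationNanos() {
        if (endNanos < 0) {
            throw new IllegalStateException("span not finished: " + operation);
        }
        return endNanos - startNanos;
    }
}

class SpanDemo {
    public static void main(String[] args) {
        Span request = Span.root("GET /order");
        Span db = request.child("SELECT orders");
        db.finish();
        request.finish();
        System.out.println(request.traceId + " " + request.operation
                + " took " + request.durationNanos() + " ns");
        System.out.println("  " + db.operation + " took " + db.durationNanos() + " ns");
    }
}
```

A real tracer also propagates the trace ID across process boundaries (for example in HTTP headers), which this sketch leaves out.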

Commonly used tools include:

Zipkin
  Twitter's open-source call chain analysis tool. It is widely used today through Spring Cloud Sleuth, and is lightweight and easy to deploy.
Pinpoint
  An open-source call chain analysis and application monitoring tool from South Korea, based on bytecode injection. It supports many plug-ins, has a powerful UI, and requires no code changes in the monitored application.
SkyWalking
  An open-source call chain analysis and application monitoring tool from China, based on bytecode injection. It supports many plug-ins, has a powerful UI, and requires no code changes in the monitored application. It entered the Apache Incubator and is now an Apache top-level project.
CAT
  Dianping's open-source monitoring platform, based on code instrumentation and configuration. Its features include call chain analysis, application monitoring and analysis, log collection, and monitoring with alerting.

Comparison across dimensions

Comparison item | Zipkin | Pinpoint | SkyWalking | CAT
Implementation | Intercepts requests and sends data (HTTP, MQ) to the Zipkin service | Java probe, bytecode enhancement | Java probe, bytecode enhancement | Code instrumentation (interceptors, annotations, filters, etc.)
Access method | Based on linkerd or sleuth | javaagent bytecode | javaagent bytecode | Code intrusion
Agent-to-collector protocol | HTTP, MQ | Thrift | gRPC | HTTP/TCP
OpenTracing | Supported | Not supported | Supported | Not supported
Granularity | Interface level | Method level | Method level | Code level
Global call statistics | Not supported | Supported | Supported | Supported
Trace ID query | Supported | Not supported | Supported | Not supported
Alerting | Not supported | Supported | Supported | Supported
JVM monitoring | Not supported | Not supported | Supported | Supported
UI | Supported | Supported | Supported | Supported
Data storage | ES, MySQL, etc. | HBase | ES/H2/MySQL | MySQL/HDFS

Performance comparison chart

Features of SkyWalking

  1. Multiple monitoring methods: monitoring data can be obtained through language probes and through a service mesh
  2. Automatic probes for multiple languages, including Java, .NET Core, and Node.js
  3. Lightweight and efficient: no big data platform or large amount of server resources is required
  4. Modular design, with multiple options for the UI, storage, and cluster management
  5. Alerting support
  6. An excellent visualization solution

SkyWalking architecture

Explanation:

  • The SkyWalking agent is attached to the business system and is responsible for collecting monitoring data.
  • The SkyWalking oapservice processes monitoring data: it receives data from the SkyWalking agent and stores it in the database, and it accepts requests from the SkyWalking webapp front end, queries the database, and returns the results. The oapservice is usually deployed as a cluster.
  • The SkyWalking webapp is the UI service that displays the data visually.
  • The database persists the monitoring data; Elasticsearch, MySQL, and others can be used.

Installation and deployment

Official website

http://skywalking.apache.org/

Download

http://skywalking.apache.org/downloads/

Startup

Attaching the agent (probe) to a service

Script

#!/bin/sh
# Production environment
# SkyWalking agent configuration
export SW_AGENT_NAME=boot-micrometer # Agent name; usually the value of `spring.application.name`
export SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800 # Collector address
export SW_AGENT_SPAN_LIMIT=2000 # Maximum number of spans in a trace; the default is 300
export JAVA_AGENT=-javaagent:/root/apache-skywalking-apm-bin/agent/skywalking-agent.jar
java $JAVA_AGENT -jar springcloudalibaba-0.0.1-SNAPSHOT.jar # Start the jar

IDE integration

# JVM options used when starting the Java application
-Xmx512m
-javaagent:E:/environment/SpringCloudAlibaba/skywalking/skywalking-agent/skywalking-agent.jar
-Dskywalking.agent.service_name=provider
-Dskywalking.collector.backend_service=127.0.0.1:11800
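Both forms above set the same agent options: the `SW_AGENT_*` environment variables fill placeholders of the form `${ENV_NAME:default}` in the agent's `agent.config` (for example `agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}`), while `-Dskywalking.<key>` system properties override the value directly. The sketch below illustrates that lookup order as I understand it from the agent documentation; it is not the agent's actual code, and `AgentSetting.resolve` is a hypothetical helper.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics how one agent.config value could be resolved.
class AgentSetting {
    // Matches "${ENV_NAME:default}"; the default may itself contain ':' (e.g. host:port).
    private static final Pattern PLACEHOLDER =
            Pattern.compile("^\\$\\{([A-Za-z0-9_]+):(.*)\\}$");

    static String resolve(String key, String configValue,
                          Map<String, String> systemProperties,
                          Map<String, String> environment) {
        // 1. A -Dskywalking.<key> system property wins (assumed precedence).
        String fromProperty = systemProperties.get("skywalking." + key);
        if (fromProperty != null) {
            return fromProperty;
        }
        // 2. Otherwise use the environment variable named in the placeholder.
        Matcher m = PLACEHOLDER.matcher(configValue);
        if (!m.matches()) {
            return configValue; // plain literal value, no placeholder
        }
        String fromEnv = environment.get(m.group(1));
        // 3. Fall back to the default written after the first ':'.
        return fromEnv != null ? fromEnv : m.group(2);
    }

    public static void main(String[] args) {
        String configured = "${SW_AGENT_NAME:Your_ApplicationName}";
        System.out.println(resolve("agent.service_name", configured,
                Map.of(), Map.of("SW_AGENT_NAME", "boot-micrometer")));
    }
}
```

Passing the maps in explicitly, instead of reading `System.getenv()` and `System.getProperties()`, keeps the sketch easy to try with different combinations.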

SkyWalking: tracing the gateway across multiple microservices (bug)

Three concepts in SkyWalking

  • Service: a set or group of workloads that provide the same behavior for incoming requests. When using the agent, you can define the service name yourself.
  • Service Instance: each individual workload in that group is called an instance. A service instance is a real operating system process.
  • Endpoint: a request path handled by a specific service, such as the URI path of an HTTP request or the class name plus method signature of a gRPC service.

Monitoring dashboard

Dashboard: http://127.0.0.1:8080/

Data collection port:

  • HTTP default port 12800

  • gRPC default port 11800


Custom SkyWalking link

  By default, SkyWalking does not record business methods. To add link monitoring for business methods, add the following dependency.

<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-trace</artifactId>
    <version>8.8.0</version>
</dependency>

Then add the @Trace annotation to the business method, and the method will be traced.

In the details of the traced method, however, neither the return value nor the parameters are shown.
This can be solved with @Tags and @Tag.

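import org.apache.skywalking.apm.toolkit.trace.Tag;
import org.apache.skywalking.apm.toolkit.trace.Tags;
import org.apache.skywalking.apm.toolkit.trace.Trace;

@Trace // the annotated method will be traced by SkyWalking
@Tags({
        // record the specified return value and parameter
        @Tag(key = "process", value = "returnedObj"), // key: the method name; value "returnedObj" selects the return value
        @Tag(key = "param", value = "arg[0]")         // value "arg[0]" selects the first parameter
})

key: the tag name (here, the method name); value = returnedObj: the return value
arg[0]: the first parameter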

Integrated logging framework

  The goal of integrating the microservice's logging framework with SkyWalking is to record the ID of the current call link in the microservice's log, so that the ID can be searched in the SkyWalking UI to find the corresponding call link record.

  Spring Boot uses logback as its default logging framework, so logback is used as the example here.

Add the Maven dependency to the microservice

<!-- SkyWalking logging -->
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-logback-1.x</artifactId>
    <version>8.5.0</version>
</dependency>

Create a logback-spring.xml file in the project's resources directory

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <!-- Log formatting -->
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
                <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level logger_name:%logger{36} - [%tid] - message:%msg%n</pattern>
            </layout>
        </encoder>
    </appender>

    <!-- Configure appenders -->
    <root level="INFO">
        <appender-ref ref="console" />
    </root>

</configuration>
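With the pattern above, each log line carries the trace ID inside square brackets. The sketch below pulls that ID out of a log line so it can be pasted into the trace search of the SkyWalking UI. It assumes the `%tid` converter renders as `TID:<trace id>` and as `TID:N/A` when no trace is active, which is my understanding of the toolkit's output; `TraceIdExtractor` and the sample log line are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the trace ID from a log line written with "[%tid]".
class TraceIdExtractor {
    // Assumes %tid is rendered as "TID:<id>", so the bracketed part looks like "[TID:...]".
    private static final Pattern TID = Pattern.compile("\\[TID:([^\\]]+)\\]");

    static Optional<String> traceId(String logLine) {
        Matcher m = TID.matcher(logLine);
        // "N/A" is assumed to mean the line was logged outside of any trace.
        if (m.find() && !"N/A".equals(m.group(1))) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample line in the layout of the pattern above.
        String line = "10:15:30.123 [http-nio-8080-exec-1] INFO  "
                + "logger_name:c.e.OrderController - [TID:abc.123.456] - message:order created";
        System.out.println(traceId(line).orElse("no trace id"));
    }
}
```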

Displaying log information in the log menu of the SkyWalking UI (commonly used)

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <!-- Add the tid to the console log output format -->
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <!-- Log formatting -->
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
                <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level logger_name:%logger{36} - [%tid] - message:%msg%n</pattern>
            </layout>
        </encoder>
    </appender>

    <!-- SkyWalking gRPC log collection, supported since version 8.4.0 -->
    <!-- https://skywalking.apache.org/docs/skywalking-java/latest/en/setup/service-agent/java-agent/application-toolkit-logback-1.x/  -->
    <!-- Report logs to the SkyWalking OAP over gRPC -->
    <appender name="grpc-log" class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.log.GRPCLogClientAppender">
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.mdc.TraceIdMDCPatternLogbackLayout">
                <Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%tid] [%thread] %-5level %logger{36} -%msg%n</Pattern>
            </layout>
        </encoder>
    </appender>
    
    <!-- Configure appenders -->
    <root level="INFO">
        <appender-ref ref="console" />
        <appender-ref ref="grpc-log" />
    </root>

</configuration>

Alarm service

Alarm log

Global dimension

Services load: number of requests per minute for each service

Slow Services: the slowest-responding services, in ms

Un-Health services (Apdex): Apdex performance index; 1 is a perfect score.

  • Apdex is an alliance of network analysis and measurement companies that jointly developed the Application Performance Index. In short, Apdex is a quantified measure of user satisfaction with application performance.
  • http://www.apdex.org/
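As a worked example of the index, the standard Apdex formula counts requests that finish within a threshold T as satisfied, requests that finish within 4T as tolerating, and computes (satisfied + tolerating / 2) / total. The sketch below applies that formula; the 500 ms threshold and the sample response times are assumptions chosen for illustration, and `Apdex.score` is a hypothetical helper, not SkyWalking code.

```java
// Hypothetical helper that applies the standard Apdex formula.
class Apdex {
    static double score(long[] responseTimesMs, long thresholdMs) {
        if (responseTimesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int satisfied = 0;
        int tolerating = 0;
        for (long t : responseTimesMs) {
            if (t <= thresholdMs) {
                satisfied++;            // response within T
            } else if (t <= 4 * thresholdMs) {
                tolerating++;           // response between T and 4T
            }                           // slower responses count as frustrated
        }
        return (satisfied + tolerating / 2.0) / responseTimesMs.length;
    }

    public static void main(String[] args) {
        long[] samples = {100, 300, 800, 2500};
        // 2 satisfied, 1 tolerating, 1 frustrated: (2 + 0.5) / 4 = 0.625
        System.out.println(Apdex.score(samples, 500));
    }
}
```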

Slow Endpoints: the slowest-responding endpoints, in ms

Global Response Latency: response latency percentiles, that is, the latency at different percentiles, in ms

Global Heatmap: heat map of service response times; the color depth shows how many requests fell into each response-time range within a time period
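To make the percentile latency concrete: a p90 latency of 90 ms means that 90% of requests completed in 90 ms or less. The sketch below uses the nearest-rank definition of a percentile purely for illustration; the OAP server may compute its percentiles differently, and `Percentile.nearestRank` and the sample latencies are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper using the nearest-rank percentile definition.
class Percentile {
    static long nearestRank(long[] samples, int percent) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 1 and 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // rank = ceil(percent / 100 * n), computed with integer arithmetic
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] latenciesMs = {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000};
        System.out.println("p50 = " + nearestRank(latenciesMs, 50)); // 50
        System.out.println("p90 = " + nearestRank(latenciesMs, 90)); // 90
        System.out.println("p99 = " + nearestRank(latenciesMs, 99)); // 1000
    }
}
```

The single 1000 ms outlier only shows up at p99, which is why high percentiles are useful next to an average.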

Service dimension

Service Apdex (number): Apdex score of the current service

Service Avg Response Times: average response latency, in ms

Successful Rate (number): request success rate

Service Load (number): number of requests per minute

Service Apdex (line chart): Apdex scores over time

Service Response Time Percentile: response latency percentiles

Successful Rate (line chart): request success rate over time

Service Load (line chart): number of requests per minute over time

Service Instances Load: number of requests per minute for each service instance

Slow Service Instance: maximum latency of each service instance

Service Instance Successful Rate: request success rate of each service instance

Instance

Service Instance Load: number of requests per minute for the current instance

Service Instance Successful Rate: request success rate of the current instance

Service Instance Latency: response latency of the current instance

JVM CPU: percentage of CPU used by the JVM

JVM Memory: JVM memory size, in MB

JVM GC Time: JVM garbage collection time, including young GC (YGC) and old GC (OGC)

JVM GC Count: number of JVM garbage collections, including young GC (YGC) and old GC (OGC)

Endpoint

Endpoint Load in Current Service: number of requests per minute for each endpoint

Slow Endpoints in Current Service: slowest request time of each endpoint, in ms

Successful Rate in Current Service: request success rate of each endpoint

Endpoint Load: number of requests to the current endpoint in each time period

Endpoint Avg Response Time: average response time of the current endpoint in each time period

Endpoint Response Time Percentile: response time percentiles of the current endpoint in each time period

Endpoint Successful Rate: request success rate of the current endpoint in each time period

Origin blog.csdn.net/Forbidden_City/article/details/132405831