What is SkyWalking
SkyWalking is an open-source APM system created by the Chinese open-source developer Wu Sheng and donated to the Apache Incubator. It draws on design ideas from Zipkin, Pinpoint, and CAT. Its main features are support for a wide range of plug-ins, a capable UI, and non-intrusive instrumentation (no changes to application code). It is now used by many companies, and new versions are released frequently.
Supported storage backends: Elasticsearch, MySQL, H2, and TiDB. The default is H2, which keeps data in memory. In practice, Elasticsearch (ES) is the usual choice.
Home page: http://skywalking.apache.org/
Downloads: https://skywalking.apache.org/downloads/
GitHub: https://github.com/apache/skywalking
Documentation: https://github.com/apache/skywalking/tree/master/docs
Configuration: https://github.com/apache/skywalking/tree/master/docs/en/setup/backend
APM
APM stands for Application Performance Management. An APM system collects data through various probes, gathers key metrics, and presents them together, providing a systematic approach to application performance management and fault management.
Monitoring systems such as Zabbix, Prometheus, and Open-Falcon focus mainly on server hardware metrics and the running status of system services, while APM systems pay more attention to what happens inside a program as it executes and to the call links between services. APM makes it easier to dig into the code and find the root cause of slow responses, so it complements monitoring systems such as Zabbix. The main open-source APM systems today are CAT, Zipkin, Pinpoint, and SkyWalking, most of which are modeled on Google's Dapper.
Comparison of link tracing tools
Link tracing tools generally provide the following functions:
- Heartbeat detection (to determine if the application is still running)
- Record the execution process and execution time of the request
- Resource monitoring (CPU, memory, bandwidth, disk)
- Alarm function (monitor execution time, success rate, etc. and notify via email, DingTalk, SMS, WeChat, etc.)
- Visualization page
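The "record the execution process and execution time of the request" function can be pictured as a set of timed spans that share one trace ID. The following is a minimal sketch of that idea. `MiniTracer`, `Span`, `startSpan`, and `stopSpan` are hypothetical names and are not SkyWalking APIs; real agents collect this data through bytecode instrumentation rather than explicit calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.UUID;

/** Hypothetical, simplified tracer; not part of SkyWalking or any real library. */
final class MiniTracer {

    /** One timed operation inside a trace. */
    static final class Span {
        final String traceId;
        final int spanId;
        final int parentSpanId; // -1 for the root span
        final String operation;
        final long startNanos;
        long endNanos;

        Span(String traceId, int spanId, int parentSpanId, String operation) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.operation = operation;
            this.startNanos = System.nanoTime();
        }

        long durationNanos() {
            return endNanos - startNanos;
        }
    }

    private final String traceId = UUID.randomUUID().toString();
    private final Deque<Span> active = new ArrayDeque<>();
    private final List<Span> finished = new ArrayList<>();
    private int nextSpanId = 0;

    /** Opens a span whose parent is the span currently on top of the stack. */
    Span startSpan(String operation) {
        int parentId = active.isEmpty() ? -1 : active.peek().spanId;
        Span span = new Span(traceId, nextSpanId++, parentId, operation);
        active.push(span);
        return span;
    }

    /** Closes the most recently opened span and records its end time. */
    void stopSpan() {
        Span span = active.pop();
        span.endNanos = System.nanoTime();
        finished.add(span);
    }

    List<Span> finishedSpans() {
        return finished;
    }

    public static void main(String[] args) {
        MiniTracer tracer = new MiniTracer();
        tracer.startSpan("GET /orders");      // root span: the incoming request
        tracer.startSpan("OrderDao.findAll"); // child span: a call made while handling it
        tracer.stopSpan();
        tracer.stopSpan();
        for (Span s : tracer.finishedSpans()) {
            System.out.printf("%s span=%d parent=%d %s took %d ns%n",
                    s.traceId, s.spanId, s.parentSpanId, s.operation, s.durationNanos());
        }
    }
}
```

Spans are reported in the order they finish, so the child span appears before the root span that encloses it.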
Commonly used tools include:
Zipkin
Twitter's open-source call chain analysis tool. It is widely used today through Spring Cloud Sleuth, and it is lightweight and easy to deploy.
Pinpoint
An open-source call chain analysis and application monitoring tool from Korea, based on bytecode injection. It supports many plug-ins, has a powerful UI, and requires no code changes in the monitored application.
SkyWalking
An open-source call chain analysis and application monitoring tool from China, based on bytecode injection. It supports many plug-ins, has a powerful UI, and requires no code changes in the monitored application. It has joined the Apache Incubator.
CAT
Dianping's open-source monitoring platform, integrated through code and configuration. Its features include call chain analysis, application monitoring and analysis, log collection, and monitoring and alerting.
Comparison of various dimensions
Comparison item | Zipkin | Pinpoint | SkyWalking | CAT |
---|---|---|---|---|
Implementation | Intercepts requests and sends data (HTTP, MQ) to the Zipkin service | Java probe, bytecode enhancement | Java probe, bytecode enhancement | Code instrumentation (interceptors, annotations, filters, etc.) |
Access method | Based on linkerd or Sleuth | javaagent bytecode | javaagent bytecode | Code intrusion |
Agent-to-collector protocol | HTTP, MQ | Thrift | gRPC | HTTP/TCP |
OpenTracing | Supported | Not supported | Supported | Not supported |
Granularity | Interface level | Method level | Method level | Code level |
Global call statistics | Not supported | Supported | Supported | Supported |
Trace ID query | Supported | Not supported | Supported | Not supported |
Alerting | Not supported | Supported | Supported | Supported |
JVM monitoring | Not supported | Not supported | Supported | Supported |
UI | Supported | Supported | Supported | Supported |
Data storage | ES, MySQL, etc. | HBase | ES/H2/MySQL | MySQL/HDFS |
Performance comparison chart
Features of SkyWalking
- Multiple monitoring approaches: monitoring data can be obtained through language probes and through a service mesh
- Automatic probes for multiple languages, including Java, .NET Core, and Node.js
- Lightweight and efficient: no big data platform or large amount of server resources is required
- Modular: multiple options are available for the UI, storage, and cluster management
- Supports alerting
- Excellent visualization
SkyWalking architecture
Description:
- SkyWalking agent: attached to the business system and responsible for collecting monitoring data.
- SkyWalking oapservice: processes monitoring data. It receives data from the SkyWalking agent and stores it in the database, and it handles requests from the SkyWalking webapp front end by querying the database and returning the results. The oapservice is usually deployed as a cluster.
- SkyWalking webapp: the UI service, used to display the data visually.
- Database: persists the monitoring data. Elasticsearch, MySQL, and others can be used.
Installation and deployment
Official website
http://skywalking.apache.org/
download
http://skywalking.apache.org/downloads/
start up
Attaching the probe to a service
Script
#!/bin/sh
# Production environment
# SkyWalking agent configuration
export SW_AGENT_NAME=boot-micrometer # Agent (service) name; usually the value of `spring.application.name`
export SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800 # Address of the collector
export SW_AGENT_SPAN_LIMIT=2000 # Maximum number of spans in a trace; the default is 300
export JAVA_AGENT=-javaagent:/root/apache-skywalking-apm-bin/agent/skywalking-agent.jar
java $JAVA_AGENT -jar springcloudalibaba-0.0.1-SNAPSHOT.jar # Start the jar
IDE integration
# JVM options to add when starting the Java application
-Xmx512m
-javaagent:E:/environment/SpringCloudAlibaba/skywalking/skywalking-agent/skywalking-agent.jar
-Dskywalking.agent.service_name=provider
-Dskywalking.collector.backend_service=127.0.0.1:11800
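The two variants above set the same agent options: the script uses environment variables and the IDE uses `-D` system properties. The following is a minimal sketch of how such a layered lookup can work, assuming a system property takes precedence over an environment variable and a built-in default is the last resort. It is an illustration only, not the agent's actual configuration loader. `AgentSettingSketch`, `resolve`, and the `demo-service` default are hypothetical, and the environment is passed in as a map so the logic is easy to test.

```java
import java.util.Map;

/** Hypothetical helper illustrating layered configuration lookup. */
final class AgentSettingSketch {

    /**
     * Returns the first non-empty value found, checking the JVM system property,
     * then the given environment map, then the default.
     */
    static String resolve(String propertyKey, String envKey,
                          Map<String, String> env, String defaultValue) {
        String fromProperty = System.getProperty(propertyKey);
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return fromProperty;
        }
        String fromEnv = env.get(envKey);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        // "demo-service" is a placeholder default chosen for this sketch.
        String serviceName = resolve("skywalking.agent.service_name",
                "SW_AGENT_NAME", env, "demo-service");
        String backend = resolve("skywalking.collector.backend_service",
                "SW_AGENT_COLLECTOR_BACKEND_SERVICES", env, "127.0.0.1:11800");
        System.out.println("service name = " + serviceName);
        System.out.println("backend      = " + backend);
    }
}
```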
Skywalking tracks gateways across multiple microservices (bug)
Three concepts in SkyWalking
- Service: Represents a series or a group of workloads that provide the same behavior for requests. When using Agent, you can define the name of the service;
- Service Instance: Each workload in the above set of workloads is called an instance. A service instance is actually a real process on the operating system;
- Endpoint: The request path received by a specific service, such as the URI path of HTTP and the class name + method signature of gRPC service;
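One way to picture how these three concepts relate is as a small data model: a service groups its instances and exposes endpoints. The sketch below is purely illustrative. The class names and the sample values (`provider`, the instance names, the endpoint path) are hypothetical and do not mirror SkyWalking's internal classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical data model for the three concepts; not SkyWalking's own classes. */
final class ServiceModelSketch {

    /** A logical service: a group of workloads that behave the same way. */
    static final class Service {
        final String name;
        final List<String> instances = new ArrayList<>(); // one entry per running process
        final List<String> endpoints = new ArrayList<>(); // request paths handled by the service

        Service(String name) {
            this.name = name;
        }
    }

    /** Builds a sample service with two instances and one endpoint. */
    static Service sample() {
        Service provider = new Service("provider");
        provider.instances.add("provider@host-a");
        provider.instances.add("provider@host-b");
        provider.endpoints.add("GET /orders/{id}");
        return provider;
    }

    public static void main(String[] args) {
        Service s = sample();
        System.out.println(s.name + " has " + s.instances.size()
                + " instances and " + s.endpoints.size() + " endpoint(s)");
    }
}
```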
Monitoring dashboard
Dashboard: http://127.0.0.1:8080/
Data collection ports:
- HTTP default port: 12800
- gRPC default port: 11800
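Before pointing an agent at the backend, it can help to confirm that these ports are reachable. The following is a minimal sketch using only the JDK. `PortCheckSketch` and `isReachable` are hypothetical names, and the host `127.0.0.1` is taken from the examples in this article. The check only confirms that a TCP connection can be opened, not that a healthy backend is answering.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper that checks whether a TCP port accepts connections. */
final class PortCheckSketch {

    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        String host = "127.0.0.1";
        System.out.println("gRPC 11800 reachable: " + isReachable(host, 11800, 1000));
        System.out.println("HTTP 12800 reachable: " + isReachable(host, 12800, 1000));
        System.out.println("UI   8080  reachable: " + isReachable(host, 8080, 1000));
    }
}
```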
Custom SkyWalking traces
By default, SkyWalking does not trace business methods. To add tracing for business methods, add the following dependency.
<dependency>
<groupId>org.apache.skywalking</groupId>
<artifactId>apm-toolkit-trace</artifactId>
<version>8.8.0</version>
</dependency>
Then add the @Trace annotation to the business method, and the method will be traced.
In the details of the traced method, however, neither the return value nor the parameters are shown.
You can add them with @Tags and @Tag.
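@Trace // This method will be traced by SkyWalking
@Tags({
    // Record the return value and the parameter on the span
    @Tag(key = "process", value = "returnedObj"), // record the return value
    @Tag(key = "param", value = "arg[0]") // record the first parameter
})
In @Tag, key is the name under which the value is recorded (the example uses the method name), and value selects what to record: returnedObj is the method's return value, and arg[0] is the first parameter.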
Integrating a logging framework
Integrating the microservice's logging framework with SkyWalking lets the ID of the current call link (the trace ID) be written to the microservice's log. You can then search for that ID in the SkyWalking UI to find the corresponding trace.
Because Spring Boot uses Logback as its default logging framework, Logback is used as the example here.
Add the Maven dependency to the microservice:
<!-- SkyWalking logging -->
<dependency>
<groupId>org.apache.skywalking</groupId>
<artifactId>apm-toolkit-logback-1.x</artifactId>
<version>8.5.0</version>
</dependency>
Create a logback-spring.xml file in the project's resources directory
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<appender name="console" class="ch.qos.logback.core.ConsoleAppender">
<!-- Log format -->
<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
<layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level logger_name:%logger{36} - [%tid] - message:%msg%n</pattern>
</layout>
</encoder>
</appender>
<!-- Configure the appender -->
<root level="INFO">
<appender-ref ref="console" />
</root>
</configuration>
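With this layout, each log line carries the trace ID in the `[%tid]` position, so the ID can be pulled out of a line and used to search for the trace in the SkyWalking UI. The sketch below shows one way to extract it with a regular expression. It assumes the layout renders the ID as `TID:<id>`, and as `TID:N/A` when no trace is active. The sample log line and the ID in it are made up for illustration, and `TraceIdExtractorSketch` is a hypothetical helper.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pulls the trace ID out of a formatted log line. */
final class TraceIdExtractorSketch {

    // Matches "[TID:<id>]" and tolerates whitespace after the colon.
    private static final Pattern TID = Pattern.compile("\\[TID:\\s*([^\\]\\s]+)\\s*\\]");

    /** Returns the trace ID, or empty if the line has none or the ID is "N/A". */
    static Optional<String> extract(String logLine) {
        Matcher m = TID.matcher(logLine);
        if (m.find() && !"N/A".equals(m.group(1))) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up line following the pattern configured above.
        String line = "10:15:30.123 [http-nio-8080-exec-1] INFO  "
                + "logger_name:com.example.OrderController - "
                + "[TID:abc123.45.16000000000000001] - message:order created";
        System.out.println(extract(line).orElse("no trace id"));
    }
}
```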
Displaying logs in the Log menu of the SkyWalking UI (commonly used)
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<!-- Add the tid to the console log output format -->
<appender name="console" class="ch.qos.logback.core.ConsoleAppender">
<!-- Log format -->
<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
<layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level logger_name:%logger{36} - [%tid] - message:%msg%n</pattern>
</layout>
</encoder>
</appender>
<!-- SkyWalking gRPC log collection, supported since version 8.4.0 -->
<!-- https://skywalking.apache.org/docs/skywalking-java/latest/en/setup/service-agent/java-agent/application-toolkit-logback-1.x/ -->
<!-- Report logs to the SkyWalking OAP via gRPC -->
<appender name="grpc-log" class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.log.GRPCLogClientAppender">
<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
<layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.mdc.TraceIdMDCPatternLogbackLayout">
<Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%tid] [%thread] %-5level %logger{36} -%msg%n</Pattern>
</layout>
</encoder>
</appender>
<!-- Configure the appenders -->
<root level="INFO">
<appender-ref ref="console" />
<appender-ref ref="grpc-log" />
</root>
</configuration>
Alarm service
Alarm log
Global dimension
Services load : Number of requests per minute for each service
Slow Services : Services with slow responses, unit ms
Un-Health services (Apdex) : Apdex performance index; 1 is a perfect score.
- Apdex (Application Performance Index) is a standard developed jointly by an alliance of network analysis and measurement companies. In short, Apdex is a quantified measure of user satisfaction with application performance.
- http://www.apdex.org/
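For reference, the Apdex score is computed from a response-time threshold T: requests that finish within T count as satisfied, those that finish within 4T count as tolerating, and the rest count as frustrated. The score is (satisfied + tolerating / 2) / total. The following is a minimal sketch of that formula. The class name and the 500 ms threshold in the usage are illustrative assumptions, not a statement of how a given SkyWalking deployment is configured.

```java
/** Hypothetical helper implementing the standard Apdex formula. */
final class ApdexSketch {

    /**
     * @param responseTimesMs observed response times in milliseconds
     * @param thresholdMs     the target threshold T in milliseconds
     * @return a score between 0.0 and 1.0, where 1.0 is a perfect score
     */
    static double score(long[] responseTimesMs, long thresholdMs) {
        if (responseTimesMs.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        int satisfied = 0;
        int tolerating = 0;
        for (long t : responseTimesMs) {
            if (t <= thresholdMs) {
                satisfied++;
            } else if (t <= 4 * thresholdMs) {
                tolerating++;
            } // anything slower counts as frustrated and adds nothing
        }
        return (satisfied + tolerating / 2.0) / responseTimesMs.length;
    }

    public static void main(String[] args) {
        // Three satisfied, two tolerating, one frustrated with T = 500 ms.
        long[] samples = {120, 300, 480, 900, 1500, 2500};
        System.out.println(score(samples, 500));
    }
}
```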
Slow Endpoints : Endpoints with slow responses, unit ms
Global Response Latency : Response latency by percentile, unit ms
Global Heatmap : Heat map of service response times; the color depth reflects how many requests fell into each response-time range during a time period
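The percentile latency panels answer questions such as "how fast were 99% of requests?". The following is a minimal sketch of the nearest-rank method for computing such a percentile from raw samples. It is an illustration only: the SkyWalking backend may use a different estimation method, and `PercentileSketch` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical helper computing percentiles with the nearest-rank method. */
final class PercentileSketch {

    /**
     * @param samplesMs  response times in milliseconds (not modified)
     * @param percentile a value in (0, 100], for example 99 for p99
     */
    static long percentile(long[] samplesMs, double percentile) {
        if (samplesMs.length == 0 || percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("need samples and 0 < percentile <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample with at least `percentile` percent
        // of all samples at or below it.
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = {15, 20, 35, 40, 50, 60, 75, 90, 120, 400};
        for (double p : new double[] {50, 75, 90, 95, 99}) {
            System.out.println("p" + (int) p + " = " + percentile(samples, p) + " ms");
        }
    }
}
```

A single slow request (400 ms here) dominates the high percentiles while leaving the median untouched, which is why these panels show several percentiles side by side.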
Service dimension
Service Apdex (number) : Apdex score of the current service
Service Avg Response Times : Average response latency, unit ms
Successful Rate (number) : Request success rate
Service Load (number) : Number of requests per minute
Service Apdex (line chart) : Apdex score over time
Service Response Time Percentile : Response latency by percentile
Successful Rate (line chart) : Request success rate over time
Service Load (line chart) : Number of requests per minute over time
Service Instances Load : Number of requests per minute for each service instance
Slow Service Instance : The maximum latency of each service instance
Service Instance Successful Rate : The request success rate of each service instance
Instance
Service Instance Load : Number of requests per minute for the current instance
Service Instance Successful Rate : The request success rate of the current instance
Service Instance Latency : response latency of the current instance
JVM CPU : The percentage of CPU used by the JVM
JVM Memory : JVM memory usage, unit MB
JVM GC Time : JVM garbage collection time, covering both young GC (YGC) and old GC (OGC)
JVM GC Count : Number of JVM garbage collections, covering both YGC and OGC
Endpoint
Endpoint Load in Current Service : Number of requests per minute per endpoint
Slow Endpoints in Current Service : The slowest request time of each endpoint, in ms
Successful Rate in Current Service : The request success rate of each endpoint
Endpoint Load : Number of requests to the current endpoint in each time period
Endpoint Avg Response Time : Average response time of the current endpoint in each time period
Endpoint Response Time Percentile : Response time percentiles of the current endpoint in each time period
Endpoint Successful Rate : The request success rate of the current endpoint in each time period
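Most of the endpoint panels above are simple aggregates over the requests seen in a time window. The following is a minimal sketch that computes load (requests per minute), success rate, and average response time from a list of request records. `EndpointStatsSketch` and `Request` are hypothetical names, and the sample data is made up. It illustrates the arithmetic only, not how the SkyWalking backend stores or aggregates metrics.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical aggregation of request records for one endpoint. */
final class EndpointStatsSketch {

    /** One observed request: its latency and whether it succeeded. */
    static final class Request {
        final long latencyMs;
        final boolean success;

        Request(long latencyMs, boolean success) {
            this.latencyMs = latencyMs;
            this.success = success;
        }
    }

    /** Requests per minute over a window of the given length. */
    static double load(List<Request> requests, int windowMinutes) {
        return (double) requests.size() / windowMinutes;
    }

    /** Percentage of requests that succeeded, from 0 to 100. */
    static double successRate(List<Request> requests) {
        long ok = requests.stream().filter(r -> r.success).count();
        return 100.0 * ok / requests.size();
    }

    /** Mean latency in milliseconds, or 0 when there are no requests. */
    static double avgResponseTime(List<Request> requests) {
        return requests.stream().mapToLong(r -> r.latencyMs).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Four requests observed during a two-minute window.
        List<Request> window = Arrays.asList(
                new Request(100, true), new Request(200, true),
                new Request(300, false), new Request(400, true));
        System.out.println("load (per minute) = " + load(window, 2));
        System.out.println("success rate (%)  = " + successRate(window));
        System.out.println("avg latency (ms)  = " + avgResponseTime(window));
    }
}
```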