SkyWalking deployment and usage: this document is all you need

JDK: 1.8
Elasticsearch: 6.6+
Package used here (the latest at the time of writing): apache-skywalking-apm-8.1.0

1. Stand-alone deployment

1.1 Download the installation package:

wget https://mirrors.tuna.tsinghua.edu.cn/apache/skywalking/8.1.0/apache-skywalking-apm-8.1.0.tar.gz

1.2 Extract the archive and enter the config directory:

tar -zxvf apache-skywalking-apm-8.1.0.tar.gz
cd apache-skywalking-apm-bin/config

1.3 Modify the application.yml configuration file

storage:
  selector: ${SW_STORAGE:elasticsearch}
  elasticsearch:
    nameSpace: ${SW_NAMESPACE:"namespace-"}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:127.0.0.1:9002}
    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
    trustStorePath: ${SW_STORAGE_ES_SSL_JKS_PATH:""}
    trustStorePass: ${SW_STORAGE_ES_SSL_JKS_PASS:""}
    user: ${SW_ES_USER:"username"}
    password: ${SW_ES_PASSWORD:"password"}

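1.4 Start SkyWalking

Enter the bin directory. The relevant scripts are:

oapService.sh: starts the back-end data (OAP) service
webappService.sh: starts the front-end web UI service
startup.sh: starts both services

At this point the stand-alone deployment is ready to use; open the default ip:port of the web UI in a browser (port 8080 unless changed in webapp/webapp.yml).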

2. Cluster deployment

2.1 Modify application.yml configuration file

In the cluster section, choose ZooKeeper or Nacos as the registry.

2.2 Modify the front-end configuration

/apache-skywalking-apm-bin/webapp/webapp.yml
listOfServers: set to the comma-separated ip:port list of the nodes in the current cluster (ip:port,ip:port)

2.3 Start each node

Introduction to SkyWalking

The SkyWalking architecture consists of:

  • SkyWalking Agent: collects tracing (call chain) data and metrics and reports them to the SkyWalking Collector over HTTP or gRPC
  • SkyWalking Collector: the trace data collector. It aggregates and analyzes the tracing and metric data sent by agents in the Analysis Core module and writes the results to the configured storage; the Query Core module serves secondary statistics and monitoring alarms on that data
  • Storage: SkyWalking's storage layer, which supports Elasticsearch, MySQL, TiDB, H2, and others as storage media
  • UI: the web visualization platform used to display the persisted data; RocketBot is currently the official main UI of SkyWalking

3. Main functions of SkyWalking:

  1. Trace (call chain) monitoring
  2. Metrics: runtime data; health data; self-check

3.1 Bytecode enhancement

Java loads a class by reading its binary bytecode into JVM memory. To create a class dynamically while the JVM is running, we have to produce binary data that follows the format and structure of a compiled .class file, then load that data and convert it into the corresponding class. There are several ways to do this:

  1. Use a Java agent to inject bytecode
  2. Use Javassist to modify the bytecode
  3. Use ASM to modify the bytecode
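
The goal of all three approaches, attaching monitoring logic to a class at runtime without touching its source, can be sketched in a few lines. The sketch below is only an analogy written in Python rather than Java: it wraps a method at runtime instead of rewriting bytecode, and the names instrument, recorded_spans, and OrderService are hypothetical, not part of SkyWalking.

```python
import functools
import time

# Hypothetical in-memory stand-in for the data an agent would report.
recorded_spans = []


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that times every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the operation name and its duration, even if the call raised.
            elapsed_ms = (time.perf_counter() - start) * 1000
            recorded_spans.append((f"{cls.__name__}.{method_name}", elapsed_ms))

    setattr(cls, method_name, wrapper)


class OrderService:
    """Hypothetical business class that knows nothing about monitoring."""

    def create_order(self, item):
        return f"order:{item}"


# Instrumentation is applied from the outside, after the class is defined.
instrument(OrderService, "create_order")
result = OrderService().create_order("book")
```

The business class stays unchanged, which is the property a bytecode-enhancing agent gives Java applications: monitoring is added when the class is loaded, not when it is written.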

3.2 Introduction to the SkyWalking UI

The official SkyWalking UI provides default, powerful visualization for observing a distributed cluster with SkyWalking. The dashboard includes the following parts:

  1. Function tab selection area. The main features are listed here. More details will be introduced below.
  2. Reload area. Control the reloading mechanism, including periodic reloading or manual reloading.
  3. Time selector. Control time zone and time range. There is also a Chinese/English switch button; by default, the UI follows the browser language setting.

The Database dashboard displays the response time, response time distribution, throughput, SLA, slow SQL, and other details of a database, which makes the database status easy to see at a glance:

  • Database Avg Response Time: average database response time
  • Database Access Successful Rate: database access success rate
  • Database Traffic: database accesses per minute
  • Database Access Latency Percentile: percentiles of database access latency
  • Slow Statements: slow SQL statements
  • All Database Loads: load of every database
  • Un-Health Databases (Successful Rate): the databases with the lowest success rate

3.3 Dashboard

The dashboard provides metrics for services, service instances, and endpoints. A few metric terms need explaining:

  • Throughput: CPM, meaning calls per minute
  • Apdex score: see Apdex on Wikipedia
  • Response time percentiles: p99, p95, p90, p75, and p50; see percentile on Wikipedia
  • SLA: the success rate. For HTTP, a request counts as successful when the response code is 200
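
To make these terms concrete, here is a minimal sketch of how each could be computed from raw request data. It assumes the standard Apdex formula (satisfied at or below a threshold T, tolerating up to 4T) and the nearest-rank percentile method; the function names, the sample data, and the 500 ms threshold are illustrative assumptions, not SkyWalking's internal implementation.

```python
def cpm(call_count, minutes):
    """Throughput: calls per minute."""
    return call_count / minutes


def sla(status_codes):
    """Success rate in percent; a request succeeds when its status is 200."""
    ok = sum(1 for code in status_codes if code == 200)
    return 100 * ok / len(status_codes)


def apdex(latencies_ms, threshold_ms):
    """Apdex = (satisfied + tolerating / 2) / total."""
    satisfied = sum(1 for t in latencies_ms if t <= threshold_ms)
    tolerating = sum(1 for t in latencies_ms if threshold_ms < t <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)


def percentile(latencies_ms, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(latencies_ms)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) using integers
    return ordered[max(rank, 1) - 1]


# Hypothetical sample: ten requests observed over one window.
latencies = [100, 200, 300, 400, 500, 800, 1200, 2500, 3000, 6000]
statuses = [200] * 9 + [500]

throughput = cpm(600, 5)
success_rate = sla(statuses)
score = apdex(latencies, threshold_ms=500)
p50, p90, p99 = (percentile(latencies, p) for p in (50, 90, 99))
```

With this sample, half of the requests are within the threshold and two more are tolerable, so the Apdex score sits well below 1 even though the success rate is high. That is why the dashboard shows both.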

The service, instance, and endpoint selectors can be reloaded manually instead of reloading the entire page. Note that the reload area does not reload these selectors.

3.4 Dashboard APM

  • The first column: monitoring panels for different topics, such as applications, databases, and containers
  • The second column: operations, including editing, exporting the current data, importing display data, and filtering by service or endpoint
  • The third column: display by dimension: services, instances, or endpoints

3.4.2 Global dimension

  • The first column: the display panels for Global, Service, Instance, and Endpoint; the content of each can be adjusted
  • Services Load: service requests per minute
  • Slow Services: the slowest-responding services, in ms
  • Un-Health Services (Apdex): Apdex performance score; 1 is a perfect score
  • Global Response Latency: response latency percentiles, i.e. the latency at each percentile, in ms
  • Global Heatmap: heat map of service response times; the color depth reflects how many requests fell into each response time range during the period
  • Bottom bar: the time interval of the displayed data; click to adjust
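
The Global Heatmap can be read as a two-dimensional histogram: one axis is time, the other is response time range, and the color of each cell comes from the number of requests in it. A rough sketch of that counting step follows; the 100 ms bucket width, the overflow bucket, and the function name are assumptions for illustration, not values taken from SkyWalking.

```python
from collections import Counter


def heatmap_counts(samples, bucket_ms=100, max_bucket=20):
    """Count requests per (minute, latency bucket) cell.

    samples: iterable of (minute, latency_ms) pairs.
    Latencies beyond the last bucket are collected in the overflow bucket.
    """
    counts = Counter()
    for minute, latency_ms in samples:
        bucket = min(latency_ms // bucket_ms, max_bucket)
        counts[(minute, bucket)] += 1
    return counts


# Hypothetical requests: (minute offset, response time in ms).
samples = [(0, 40), (0, 90), (0, 250), (1, 120), (1, 130), (1, 5000)]
cells = heatmap_counts(samples)
```

A cell with a higher count would be drawn darker, so a dark band drifting upward over time means responses are getting slower.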

3.4.3 Service dimension

  • Service Apdex (number): current Apdex score of the service
  • Service Apdex (line chart): Apdex scores over time
  • Successful Rate (number): request success rate
  • Successful Rate (line chart): request success rate over time
  • Service Load (number): requests per minute
  • Service Load (line chart): requests per minute over time
  • Service Avg Response Times: average response latency, in ms
  • Global Response Time Percentile: response latency percentiles
  • Service Instances Load: requests per minute of each service instance
  • Slow Service Instance: the maximum latency of each service instance
  • Service Instance Successful Rate: request success rate of each service instance

3.4.4 Instance dimension

  • Service Instance Load: requests per minute of the current instance
  • Service Instance Successful Rate: request success rate of the current instance
  • Service Instance Latency: response latency of the current instance
  • JVM CPU: percentage of CPU used by the JVM
  • JVM Memory: JVM memory usage, in MB
  • JVM GC Time: JVM garbage collection time, for both young GC (YGC) and old GC (OGC)
  • JVM GC Count: number of JVM garbage collections, for both YGC and OGC
  • CLR XX: the same kind of panels for the .NET CLR, analogous to the JVM ones

3.4.5 Endpoint (API) dimension

  • Endpoint Load in Current Service: requests per minute of each endpoint
  • Slow Endpoints in Current Service: the slowest response time of each endpoint, in ms
  • Successful Rate in Current Service: request success rate of each endpoint
  • Endpoint Load: request volume of the current endpoint in each time period
  • Endpoint Avg Response Time: average response time of the current endpoint in each time period
  • Endpoint Response Time Percentile: response time percentiles of the current endpoint in each time period
  • Endpoint Successful Rate: request success rate of the current endpoint in each time period

3.5 Topology

The topology map shows the relationships among services and instances, annotated with metrics.

  • Topology shows the global topology, including all services, by default.
  • The service selector shows the direct relationships of a service, both upstream and downstream.
  • Custom groups let you view the sub-topology of any group of services.
  • Service drill-down opens when you click any service. From there you can query the metrics and traces of the selected service.
  • Service relationship metrics show the metrics of the RPC interaction between two services and of the instances of those two services.

3.6 Trace query

  1. The trace segment list is not the trace list. Each trace has several segments that belong to different services. If you query by all services or by trace id, the different segments with the same trace id are listed.
  2. Spans are clickable; the details of each span pop up on the left.
  3. The trace view provides 3 typical views, each suited to a different use, for visualizing a trace.
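
The difference between a segment list and a trace list is easier to see with data. The sketch below assumes a simplified segment record with only three fields; the field names, the sample values, and both helper functions are hypothetical and only mirror the behavior described in point 1.

```python
from collections import defaultdict

# Hypothetical segment records: one trace (t1) crosses three services.
segments = [
    {"trace_id": "t1", "segment_id": "s1", "service": "gateway"},
    {"trace_id": "t1", "segment_id": "s2", "service": "order-service"},
    {"trace_id": "t1", "segment_id": "s3", "service": "stock-service"},
    {"trace_id": "t2", "segment_id": "s4", "service": "gateway"},
]


def query_segments(segments, service=None, trace_id=None):
    """Return the segment list, optionally filtered by service or trace id."""
    return [
        s for s in segments
        if (service is None or s["service"] == service)
        and (trace_id is None or s["trace_id"] == trace_id)
    ]


def group_by_trace(segments):
    """Collapse the segment list into traces: trace id -> segment ids."""
    traces = defaultdict(list)
    for s in segments:
        traces[s["trace_id"]].append(s["segment_id"])
    return dict(traces)


by_service = query_segments(segments, service="gateway")
by_trace = query_segments(segments, trace_id="t1")
traces = group_by_trace(segments)
```

Filtering by one service returns one segment per trace, while querying by trace id returns every segment of that trace across services, which matches what the list in the UI shows.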

4. Alarm

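# Sample alarm rules.
# 1: The average service response time exceeded 1 second in the past 3 minutes
# 2: The service success rate was below 80% in the past 2 minutes
# 3: The service p90 response time was below 1000 ms in the past 3 minutes
# 4: The average response time of a service instance exceeded 1 second in the past 2 minutes
# 5: The average endpoint response time exceeded 1 second in the past 2 minutes
# 6: The database response time exceeded 1 second in 2 of the last 10 minutes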
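
Rules of this kind share one shape: compare a per-minute metric against a threshold and raise an alarm when enough minutes inside a recent window match. The sketch below evaluates that shape in plain Python. The field names threshold, op, period, and count are modeled loosely on the fields of SkyWalking's alarm-settings.yml; the AlarmRule class, the should_alarm function, and the sample values are assumptions for illustration and do not reproduce the OAP server's actual evaluation code.

```python
import operator
from dataclasses import dataclass

OPS = {">": operator.gt, "<": operator.lt}


@dataclass
class AlarmRule:
    name: str
    threshold: float
    op: str      # ">" or "<"
    period: int  # how many of the most recent minutes to inspect
    count: int   # how many of those minutes must match to trigger


def should_alarm(rule, per_minute_values):
    """Return True when at least rule.count of the last rule.period values match."""
    window = per_minute_values[-rule.period:]
    matches = sum(1 for v in window if OPS[rule.op](v, rule.threshold))
    return matches >= rule.count


# Rule 6 above: database response time over 1 second in 2 of the last 10 minutes.
db_rule = AlarmRule("database_resp_time", threshold=1000, op=">", period=10, count=2)

# Hypothetical average response times in ms, one value per minute, oldest first.
two_spikes = [300, 1200, 400, 500, 350, 420, 1500, 380, 410, 390]
one_spike = [300, 1200, 400, 500, 350, 420, 900, 380, 410, 390]
old_spikes = [1200, 1500] + [300] * 10  # both spikes are older than the window

alarm_two = should_alarm(db_rule, two_spikes)
alarm_one = should_alarm(db_rule, one_spike)
alarm_old = should_alarm(db_rule, old_spikes)
```

Requiring several matching minutes rather than a single one keeps a lone slow minute from triggering an alarm, at the cost of reacting slightly later.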

Origin blog.csdn.net/qq_38130094/article/details/111314266