Golang's pprof performance analysis

1. Overview of pprof

      pprof is a tool for visualizing and analyzing profiling data. Every Gopher has probably heard the name: it is the tool you reach for to hunt memory leaks, to find performance bottlenecks, to read flame graphs, and so on.

      This article introduces the basics of using pprof, followed by a practical performance analysis and optimization case, so that you can learn and understand pprof from real examples.

If you are interested in trace analysis, see Golang's trace performance analysis.

2. Enabling pprof in a service

      There are two ways to use pprof. The first is to call the profiling API directly in code and write the collected data to a file, using the package:

runtime/pprof

Reference: pprof, from the "You Don't Know Go" series:
https://darjun.github.io/2021/06/09/youdontknowgo/pprof/
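For comparison, here is a minimal sketch of the runtime/pprof (API-based) approach; the output file name and the sleep standing in for a workload are placeholders:

package main

import (
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	// Write a CPU profile of this process to a local file.
	f, err := os.Create("cpu.pprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	time.Sleep(10 * time.Second) // stand-in for the workload you want to profile
}

The resulting cpu.pprof file can then be opened with go tool pprof just like a profile downloaded over HTTP.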

The second is HTTP-based collection from a running service, using the package:

net/http/pprof

This time we use the HTTP-based flavor of pprof.

1. Import pprof in the code

Add a blank import of net/http/pprof; its init function registers the pprof handlers on http.DefaultServeMux:
_ "net/http/pprof"


2. Open a port in the service to serve pprof

// Start an HTTP server to expose pprof data
pprofAddr := "0.0.0.0:6060"
go func(addr string) {
	if err := http.ListenAndServe(addr, nil); err != http.ErrServerClosed {
		logger.Fatalf("Pprof server ListenAndServe: %v", err)
	}
}(pprofAddr)
logger.Infof("HTTP Pprof start at %s", pprofAddr)

3. View in the browser
http://<service-ip>:6060/debug/pprof

The web index page looks like this:
[Screenshot: the /debug/pprof index page]

3. Use pprof to collect CPU time consumption

(1) Import pprof.
(2) Listen on port 6060.
(3) Take a sample, e.g. sample the CPU for 30s:
	http://10.91.2.111:6060/debug/pprof/profile?seconds=30&sample_index=0&top=20
Parameters:
	seconds: sample for 30s
	top: collect the top 20 most time-consuming functions
Exercise the interface during those 30s; after 30s the profile file is downloaded to your machine.

(4) Analyze the generated profile file: serve it locally on port 8080 and view the analysis in the browser:
go tool pprof -http=localhost:8080 profile   (replace "profile" with the downloaded file name)
(5) If you see "Could not execute dot; may need to install graphviz.", install the graph rendering tool:
brew cleanup
brew update
brew install graphviz

1. Call flow chart

[Screenshot: pprof call graph]

Main indicators: the share of time consumed by each call, and the total time along the call chain.

2. View the flame graph

Click VIEW in the upper-left corner and select Flame Graph.
[Screenshot: pprof flame graph view]

Main indicators:

1. The wider a block in the flame graph, the more time it consumes.
2. The taller the flame graph, the longer the call chain.
What you want: moderate width and moderate height. Optimize blocks that are too wide, or calls that take unexpectedly long.

4. Use pprof to analyze memory leaks

View the memory usage of the current program

Reference: Use pprof to troubleshoot Golang memory leaks

1. Run the command:
 go tool pprof -inuse_space http://127.0.0.1:6060/debug/pprof/heap

2. In the interactive console:
	top: rank the functions currently holding the most memory
	list <function name>: show the memory-consuming code inside a function

Example top output:
Type: inuse_space
Time: Jun 16, 2023 at 3:29pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2706.84kB, 100% of 2706.84kB total
Showing top 10 nodes out of 32
      flat  flat%   sum%        cum   cum%
  650.62kB 24.04% 24.04%   650.62kB 24.04%  github.com/valyala/fasthttp/stackless.NewFunc
     515kB 19.03% 43.06%      515kB 19.03%  vendor/golang.org/x/net/http2/hpack.init
  514.63kB 19.01% 62.07%   514.63kB 19.01%  math/rand.newSource (inline)
  514.38kB 19.00% 81.08%   514.38kB 19.00%  google.golang.org/protobuf/reflect/protoregistry.(*Types).register
  512.20kB 18.92%   100%   512.20kB 18.92%  runtime.malg
         0     0%   100%   514.63kB 19.01%  github.com/nacos-group/nacos-sdk-go/v2/clients.NewConfigClient
         0     0%   100%   514.63kB 19.01%  github.com/nacos-group/nacos-sdk-go/v2/clients/config_client.NewConfigClient
         0     0%   100%   514.63kB 19.01%  github.com/nacos-group/nacos-sdk-go/v2/clients/config_client.NewConfigProxy
         0     0%   100%   514.63kB 19.01%  github.com/nacos-group/nacos-sdk-go/v2/common/nacos_server.NewNacosServer
         0     0%   100%   514.63kB 19.01%  github.com/spf13/cobra.(*Command).Execute

3. By inspecting the service's current memory usage this way, you can analyze memory leaks and similar issues.

Check how long goroutines have been running

[Screenshot: goroutine running times]

      Be careful when you see a long-running goroutine. If it is not a system goroutine, it may be a business goroutine that was never released, which is one of the typical sources of memory leaks.
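As an illustration only (not code from the service above), here is a classic leak pattern: a send on an unbuffered channel that nobody ever receives. Every leaked goroutine shows up in the goroutine profile and pins its stack and any memory it references:

package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// slowCall simulates a slow downstream call.
func slowCall() string {
	time.Sleep(5 * time.Second)
	return "result"
}

// query leaks one goroutine per timed-out request: the caller returns early,
// but the sender stays blocked on the unbuffered channel forever.
// Fix: use a buffered channel, make(chan string, 1), so the send never blocks.
func query(ctx context.Context) (string, error) {
	ch := make(chan string) // unbuffered: the send blocks until someone receives
	go func() {
		ch <- slowCall()
	}()

	select {
	case res := <-ch:
		return res, nil
	case <-ctx.Done():
		return "", ctx.Err() // the goroutine above is now leaked
	}
}

func main() {
	for i := 0; i < 100; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
		_, _ = query(ctx)
		cancel()
	}
	time.Sleep(time.Second)
	fmt.Println("goroutines:", runtime.NumGoroutine()) // roughly 100 leaked goroutines
}

After running something like this, /debug/pprof/goroutine?debug=1 should show the leaked goroutines parked in a chan send.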

5. A performance optimization case

Background

      When the Go service performs a batch query, the returned data contains a large amount of OSS-signed content, so the measured QPS is very low and latency is relatively high. We therefore decided to profile and optimize this interface.

The general flow of the query is:

1. Parameter validation
2. Batch query from the database
3. OSS signing and data processing
4. JSON serialization of the response

      The flow is very simple, a typical query interface, but when the batch is relatively large, for example more than 30 records per query, the QPS is really not satisfactory. The load-test results are as follows:
[Screenshot: baseline load-test results]

This round of analysis and optimization does not involve database caching for now; due to time constraints, we only cover optimizations other than the database.

1. Introduction to stress testing tools

go-wrk

      go-wrk is a command-line tool for load testing web servers, written in Go. It is built on the standard library's net/http package and uses goroutines to issue highly concurrent requests.

go-wrk [flags] url: start a test against the given URL.
-c value: concurrency. Defaults to 50.
-d value: test duration, in seconds.
-t value: request timeout. Defaults to 20 seconds.
-H "header:value": set an HTTP request header.
-v: verbose output.
-m: HTTP request method, e.g. GET, POST, PUT, DELETE.


For example:
go-wrk -c 100 -d 30 -T 2000 -H "Content-Type: application/json, Authorization: Bearer token" \
http://localhost:8080/test -m POST -body '{"name":"bob","age":30}'

Output:
15841 requests in 38.848392606s, 117.13MB read
Requests/sec:           407.76
Transfer/sec:           3.01MB
Avg Req Time:           2.452395215s
Fastest Request:        66.236651ms
Slowest Request:        19.945042378s
Number of Errors:       232

It is easy to install and well suited to local use; the main number to watch is the request throughput (Requests/sec).

jmeter

      JMeter is an open-source Java application for functional and performance testing. It requires a GUI to configure automated test scenarios. We currently use the JMeter built into MeterSphere, and the main metric we look at is TPS.

      Originally I wanted a command-line tool I could install on Linux for load testing. In practice JMeter is more complicated to set up, but its features are very comprehensive. Since our QA team already tests with JMeter, the load-test numbers later in this article are based on JMeter.

2. Analyzing the flame graph with pprof

[Screenshot: flame graph before optimization]

The main problems are as follows:

1. OSS signing accounts for nearly 40% of the time.
2. GC overhead is fairly high, more than 10%.
3. go-restful writing the response accounts for more than 20%, which is essentially JSON serialization.

3. The first wave of optimization

OSS signature

      Since the signature expiration is set to 1 hour and the data written to OSS is not updated frequently, we can add a cache.
Here we use the lightweight go-cache; OSS signature caching does not need a distributed cache, so go-cache is enough.

go get github.com/patrickmn/go-cache

var c = cache.New(50*time.Minute, 10*time.Minute) // default expiration 50 min, purge expired items every 10 min
c.Set("key", value, cache.DefaultExpiration) // write to the cache
cachedValue, found := c.Get("key") // read from the cache
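Putting the pieces together, a minimal cache-aside sketch for signed URLs; the signURL helper, the key scheme, and the bucket URL are placeholders, not the service's real code:

package main

import (
	"fmt"
	"time"

	cache "github.com/patrickmn/go-cache"
)

// Keep cached signatures a bit shorter than the 1h signature TTL so a cached
// URL is never handed out after the signature itself has expired.
var signCache = cache.New(50*time.Minute, 10*time.Minute)

// signURL stands in for the real OSS signing call.
func signURL(objectKey string) string {
	return "https://bucket.oss.example.com/" + objectKey + "?signature=..."
}

// SignedURL returns a cached signed URL when possible, otherwise signs and caches it.
func SignedURL(objectKey string) string {
	if v, ok := signCache.Get(objectKey); ok {
		return v.(string)
	}
	url := signURL(objectKey)
	signCache.Set(objectKey, url, cache.DefaultExpiration)
	return url
}

func main() {
	fmt.Println(SignedURL("images/avatar.png")) // first call signs the URL
	fmt.Println(SignedURL("images/avatar.png")) // second call hits the cache
}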

JSON serialization optimization

      The standard encoding/json library relies on reflection, which can be slow. We switch to the high-performance json-iterator/go library for JSON serialization.

Reference: Go common packages (33): a high-performance JSON parser

// Very simple to use: install it and it works out of the box
var fastJson = jsoniter.ConfigFastest
jsonData, _ := fastJson.Marshal(data)

The effect after optimization

[Screenshot: flame graph after the first round of optimization]

1. OSS signing now accounts for less than 1%.
2. The batch SQL query is expensive, mostly in scanAll; this cost depends on how many columns the query returns.
3. JSON serialization still takes a lot of time, but is better than the native encoding/json before.

4. The second wave of optimization

Optimization points

1. Reduce the number of JSON fields and return only what is needed.
2. SQL optimization: trim the returned columns.
3. Since the interface is not paginated, the count query can be dropped.

Optimization effect

[Screenshot: flame graph after the second round of optimization]

Remaining problems

1. The database accounts for 37%.
2. JSON time dropped from 27% to 17%.
3. GC runs far too often: with 4G of memory available, only about 100M is used before a GC is triggered.
4. Log printing takes 10% of the time.
5. gvalid parameter validation accounts for 6.6%.

5. The third wave of optimization

Optimize SQL and enlarge the connection pool

unsafeDb.SetMaxOpenConns(200) // set the maximum number of open connections to 200
unsafeDb.SetMaxIdleConns(10)  // set the maximum number of idle connections to 10
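For reference, the same pool settings on a plain database/sql handle; the MySQL driver and DSN are placeholders, and SetConnMaxLifetime is an additional knob (not mentioned above) that recycles long-lived connections:

package main

import (
	"database/sql"
	"time"

	_ "github.com/go-sql-driver/mysql" // driver choice is an assumption
)

func openDB() (*sql.DB, error) {
	// The DSN is a placeholder.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/app?parseTime=true")
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(200)                 // cap concurrent connections, as in the text
	db.SetMaxIdleConns(10)                  // keep a small idle pool
	db.SetConnMaxLifetime(30 * time.Minute) // recycle connections periodically
	return db, nil
}

func main() {
	db, err := openDB()
	if err != nil {
		panic(err)
	}
	defer db.Close()
}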

Optimize json encoder

// Before: marshal into a []byte first, then write it to resp.
// This adds an extra variable and an extra copy/assignment.
jsonData, _ := fastJson.Marshal(common.SuccessRESTResp(data))
resp.Header().Set(restful.HEADER_ContentType, "application/json")
resp.WriteHeader(200)
resp.Write(jsonData)

// After: pass go-restful's resp (*restful.Response) directly to the encoder.
// The encoder writes the encoded result straight into resp,
// saving the intermediate copy and assignment.
enc := fastJson.NewEncoder(resp)
enc.Encode(common.SuccessRESTResp(data))

In practice, however, the measured JSON serialization cost did not improve.

json.Marshal

[Screenshot: flame graph detail with json.Marshal]

Use json.NewEncoder

[Screenshot: flame graph detail with json.NewEncoder]

      In theory, writing directly into the response should reduce data copying, but measurements show that the JSON share actually rose from 27% to 33%. Looking at the details, most of the extra time goes into dynamically growing slices, as shown below:
[Screenshot: flame graph detail showing slice growth]

Reduce unnecessary log printing

For logrus, part of the info-level output was demoted to debug: unnecessary Infof calls that print parameters were removed or converted to Debugf.
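A small sketch of the logrus side of this; the LOG_LEVEL environment variable is an assumption:

package main

import (
	"os"

	"github.com/sirupsen/logrus"
)

func initLogger() {
	// Keep the production level at Info so the converted Debugf calls are
	// filtered out cheaply; enable them again via an env var when debugging.
	if os.Getenv("LOG_LEVEL") == "debug" {
		logrus.SetLevel(logrus.DebugLevel)
	} else {
		logrus.SetLevel(logrus.InfoLevel)
	}
}

func main() {
	initLogger()
	logrus.Debugf("request params: %+v", map[string]int{"page": 1}) // dropped at Info level
	logrus.Infof("service started")
}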

Manually set the GOGC threshold

      At the moment the service uses far too little memory: out of 4G, only tens of MB are allocated before a GC runs. The GC threshold of a Go service can be configured through the GOGC environment variable, which defaults to 100. That is to say: if the current live heap is 10M, the next GC is triggered at 10M + 10M * 100% = 20M. (For more on GC, see Golang's trace performance analysis.)

We raise the threshold by a factor of 10, so GC is expected to run only when memory reaches roughly the 100-200M range.
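A sketch of the two equivalent ways to raise the threshold; the value 1000 matches the 10x factor above:

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=1000: the next GC triggers
	// once the heap has grown by 10x the live heap (e.g. a 10M live heap
	// triggers GC at about 110M) instead of the default 2x.
	old := debug.SetGCPercent(1000)
	fmt.Println("previous GOGC value:", old)
}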

Optimized effect

[Screenshot: flame graph after the third round of optimization]

Load-test results

[Screenshot: load-test results after the third round of optimization]

Conclusions
1. The database accounts for 31%.
2. The logging share dropped to about 4%.
3. JSON serialization sits at around 20%.
4. Most of the time is still spent in SQL, which calls for a cache.
5. Parameter validation is below 1%.
6. The GC share dropped slightly, and the trace shows the GC frequency dropped noticeably.

6. Follow-up optimization directions

1. Add a cache

      High QPS is impossible without a cache; the database alone can only sustain a limited QPS. Redis is a good fit for this caching layer.
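A minimal cache-aside sketch with go-redis; the key scheme, the TTL, and the loadFromDB helper are assumptions for illustration:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type Record struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
}

var rdb = redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379"})

// loadFromDB stands in for the real batch query.
func loadFromDB(ctx context.Context, id int64) (Record, error) {
	return Record{ID: id, Name: "from-db"}, nil
}

// getRecord tries Redis first and falls back to the database on a miss.
func getRecord(ctx context.Context, id int64) (Record, error) {
	key := fmt.Sprintf("record:%d", id)

	if data, err := rdb.Get(ctx, key).Bytes(); err == nil {
		var r Record
		if json.Unmarshal(data, &r) == nil {
			return r, nil // cache hit
		}
	}

	r, err := loadFromDB(ctx, id) // cache miss: query the database
	if err != nil {
		return Record{}, err
	}
	if data, err := json.Marshal(r); err == nil {
		rdb.Set(ctx, key, data, 10*time.Minute) // TTL is an assumption
	}
	return r, nil
}

func main() {
	r, err := getRecord(context.Background(), 42)
	fmt.Println(r, err)
}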

2. Limit the number of records per query

      As the number of queried records grows, JSON serialization takes an ever larger share of the time. Appropriately reducing the amount of returned data will significantly improve serialization.

3. Switch to a high-performance log library

The importance of logging goes without saying, and abandoning logs for the sake of performance is not advisable. Since logs are the key to metrics and troubleshooting, we can instead choose a higher-performance log library, such as Uber's open-source zap.
[Screenshot: zap benchmark comparison]
To be honest, the performance of logrus really is poor by comparison.
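A minimal sketch of what switching to zap could look like; the configuration and messages are assumptions:

package main

import "go.uber.org/zap"

func main() {
	// zap.NewProduction returns a fast, JSON-formatted logger; the sugared
	// logger keeps a printf-style API similar to logrus.
	logger, err := zap.NewProduction()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	sugar := logger.Sugar()
	sugar.Infof("service started on %s", ":8080")
	sugar.Debugf("dropped at the default production level")
}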

4. GC tuning

      Before Go 1.19, the only GC knob was the GOGC threshold. Since Go 1.19 you can also set a soft memory limit (GOMEMORYLIMIT) directly: GC is not forced as long as memory stays below the limit.
      The open question is whether a high threshold is really a good thing: each GC cycle then has to scan a larger heap, so the mark phase takes longer, and the actual effect may not be satisfactory. Finding the optimal threshold requires careful, repeated verification.
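A sketch of the Go 1.19+ soft memory limit; the 3 GiB value is an assumption for a 4G container:

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to GOMEMORYLIMIT=3GiB: a soft limit on the Go runtime's total
	// memory use. GC stays lazy (per GOGC) while usage is well below the limit
	// and becomes more aggressive as usage approaches it.
	debug.SetMemoryLimit(3 << 30) // bytes
	fmt.Println("soft memory limit set to 3 GiB")
}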

end

Source: blog.csdn.net/LJFPHP/article/details/131261957