Article directory

- 1. Overview of pprof
- 2. Enabling pprof in a service
- 3. Using pprof to collect CPU profiles
- 4. Using pprof to analyze memory leaks
- 5. A performance optimization case
- Subsequent optimization directions
1. Overview of pprof

pprof is a tool for visualizing and analyzing profiling data. Every Gopher has probably heard of it: you reach for pprof to hunt memory leaks, to find performance bottlenecks, to read flame graphs, and so on.

This article introduces the basics of using pprof, along with a real performance-analysis and optimization case, so you can learn and understand pprof through practical examples.

If you are interested in trace analysis, see "Golang trace performance analysis".
2. Enabling pprof in a service

There are two ways to use pprof. The first is to call specific APIs in code to collect profiles and write them to files, using the runtime/pprof package.

Reference: "The pprof of Go you don't know"
https://darjun.github.io/2021/06/09/youdontknowgo/pprof/
The second is HTTP-based collection, using the net/http/pprof package. This article uses the HTTP-based pprof.
1. Import pprof in the code

_ "net/http/pprof"

2. The service opens a port to listen for pprof

// Start an HTTP server to expose pprof data
pprofAddr := "0.0.0.0:6060"
go func(addr string) {
	if err := http.ListenAndServe(addr, nil); err != http.ErrServerClosed {
		logger.Fatalf("Pprof server ListenAndServe: %v", err)
	}
}(pprofAddr)
logger.Infof("HTTP Pprof start at %s", pprofAddr)

3. View in a browser

http://<service-ip>:6060/debug/pprof
The web interface is as follows
3. Using pprof to collect CPU profiles

(1) Import pprof

(2) Listen on port 6060

(3) Sample, e.g. sample the CPU for 30s:

http://10.91.2.111:6060/debug/pprof/profile?seconds=30&sample_index=0&top=20

Parameter explanation:

seconds: sample for 30s
top: collect the top 20 entries by time

Hit the endpoint under load during the 30s; when the 30s is up, a profile file is downloaded locally.

(4) Analyze the generated profile file: open port 8080 locally and view the analysis in a browser:

go tool pprof -http=localhost:8080 profile(filename)

(5) If you see "Could not execute dot; may need to install graphviz.", install the graph-rendering tool:

brew cleanup
brew update
brew install graphviz
1. The call graph

Main indicators: the time share of each call, and the total time of the call chain.

2. The flame graph

Click VIEW in the upper-left corner and select Flame Graph.

Main indicators:

1. The wider a frame in the flame graph, the more time it consumes.
2. The taller the flame graph, the longer the call chain.

What you want: moderate width and moderate height. Optimize the frames that are too wide, or that take unexpectedly long.
4. Using pprof to analyze memory leaks

View the current program's memory usage.

Reference: "Using pprof to troubleshoot Golang memory leaks"

1. Run the command:

go tool pprof -inuse_space http://127.0.0.1:6060/debug/pprof/heap

2. In the interactive shell:

top: rank functions by current memory usage
list <function>: show the memory-consuming code inside a function

A top example looks like this:
Type: inuse_space
Time: Jun 16, 2023 at 3:29pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2706.84kB, 100% of 2706.84kB total
Showing top 10 nodes out of 32
flat flat% sum% cum cum%
650.62kB 24.04% 24.04% 650.62kB 24.04% github.com/valyala/fasthttp/stackless.NewFunc
515kB 19.03% 43.06% 515kB 19.03% vendor/golang.org/x/net/http2/hpack.init
514.63kB 19.01% 62.07% 514.63kB 19.01% math/rand.newSource (inline)
514.38kB 19.00% 81.08% 514.38kB 19.00% google.golang.org/protobuf/reflect/protoregistry.(*Types).register
512.20kB 18.92% 100% 512.20kB 18.92% runtime.malg
0 0% 100% 514.63kB 19.01% github.com/nacos-group/nacos-sdk-go/v2/clients.NewConfigClient
0 0% 100% 514.63kB 19.01% github.com/nacos-group/nacos-sdk-go/v2/clients/config_client.NewConfigClient
0 0% 100% 514.63kB 19.01% github.com/nacos-group/nacos-sdk-go/v2/clients/config_client.NewConfigProxy
0 0% 100% 514.63kB 19.01% github.com/nacos-group/nacos-sdk-go/v2/common/nacos_server.NewNacosServer
0 0% 100% 514.63kB 19.01% github.com/spf13/cobra.(*Command).Execute
3. By inspecting the service's memory usage like this, you can analyze memory leaks and similar issues.
Check how long goroutines have been running

Be wary of long-running goroutines. If a non-system goroutine has been running for a long time, a business goroutine may not have been released, which is also a common source of memory leaks.
5. A performance optimization case

Background

When our Go service performs batch queries, the response contains a large amount of OSS signature data, so the measured QPS is very low and latency is high. We decided to profile and optimize this endpoint.

The rough flow of the query is:

1. Parameter validation
2. Batch query from the database
3. OSS signing and data processing
4. JSON serialization of the response

The flow is very simple; it is an ordinary query endpoint. But when the query size is relatively large, say more than 30 records, the QPS is really not satisfactory. The load-test results are as follows:

This round of analysis and optimization does not touch database caching for now; due to time constraints, we only do the optimizations other than the database.
1. Introduction to the load-testing tools

go-wrk

go-wrk is a command-line tool for load-testing web servers, written in Go. It is built on the standard library's net/http package and uses goroutines to achieve high request concurrency.

go-wrk [flags] url: start a test against the given URL.
-c value: concurrency; default 50.
-d value: test duration, in seconds.
-t value: request timeout; default 20 seconds.
-H "header:value": set an HTTP request header.
-v: verbose output.
-m: HTTP request method, e.g. GET, POST, PUT, DELETE.

For example:

go-wrk -c 100 -d 30 -T 2000 -H "Content-Type: application/json, Authorization: Bearer token" \
http://localhost:8080/test -m POST -body '{"name":"bob","age":30}'
Output:
15841 requests in 38.848392606s, 117.13MB read
Requests/sec: 407.76
Transfer/sec: 3.01MB
Avg Req Time: 2.452395215s
Fastest Request: 66.236651ms
Slowest Request: 19.945042378s
Number of Errors: 232
go-wrk is suitable for local use and easy to install. The metric of primary concern is Requests/sec.
jmeter

jmeter is an open-source Java application for functional and performance testing. It requires a GUI to configure test scenarios. We currently use the jmeter built into meterSphere, and the main metric we watch is TPS.

Originally I wanted to install a command-line load tool on Linux; in practice the installation turned out to be complicated, although the features are very comprehensive. Since the company's QA team uses jmeter, our subsequent load tests are based on jmeter results.
2. Analyzing the flame graph with pprof

The main problems are as follows:

1. OSS signing accounts for nearly 40%.
2. GC time is fairly high, accounting for more than 10%.
3. go-restful response writing accounts for more than 20%, which is essentially JSON serialization.
3. The first wave of optimization

OSS signing

Since the signature's expiration time is set to one hour and the data written to OSS is not updated frequently, we can consider adding a cache.

The lightweight go-cache is used here. The OSS signature cache does not need to be distributed, so go-cache is sufficient.

go get github.com/patrickmn/go-cache

var c = cache.New(50*time.Minute, 10*time.Minute) // default expiration 50min, cleanup every 10min
c.Set("key", value, cache.DefaultExpiration)      // write to the cache
cachedValue, found := c.Get("key")                // read from the cache
JSON serialization optimization

The native json library uses reflection, and its performance can be poor. We use the high-performance json-iterator/go library for JSON serialization.

Reference: Go common packages (33): a high-performance JSON parser

// Very simple to use; works out of the box after installation
var fastJson = jsoniter.ConfigFastest
jsonData, _ := fastJson.Marshal(data)
The effect after optimization

1. OSS signing now accounts for less than 1%.
2. The batch SQL query takes a lot of time, mostly in the scanAll part; this cost depends on how many columns the query returns.
3. JSON serialization still takes a lot of time, but is better than the native json before.
The second wave of optimization

Optimization points

1. Reduce the fields in the JSON; return only what is needed.
2. SQL optimization: trim the returned columns.
3. Since there is no pagination, the count query can be dropped.

Optimization effect

Remaining problems

1. The database accounts for 37%.
2. JSON time dropped from 27% to 17%.
3. GC runs far too often: with 4 GB of memory available, GC triggers after only about 100 MB is used.
4. Log printing takes 10%.
5. gvalid parameter validation takes 6.6%.
The third wave of optimization

Optimize SQL and enlarge the connection pool

unsafeDb.SetMaxOpenConns(200) // set the maximum number of open connections to 200
unsafeDb.SetMaxIdleConns(10)  // set the maximum number of idle connections to 10
Optimize the JSON encoder

// Before: marshal into a []byte first, then write it to resp
// (an extra variable and an extra copy)
jsonData, _ := fastJson.Marshal(common.SuccessRESTResp(data))
resp.Header().Set(restful.HEADER_ContentType, "application/json")
resp.WriteHeader(200)
resp.Write(jsonData)

// After: pass go-restful's resp *restful.Response directly;
// the encoder writes its output straight into resp, saving the copy
enc := fastJson.NewEncoder(resp)
enc.Encode(common.SuccessRESTResp(data))
In practice, though, measured JSON serialization did not improve. Replacing json.Marshal with json.NewEncoder should, in theory, merge the writes into the response and reduce data copying, but measurement showed JSON time actually rose from 27% to 33%. Looking into the details, the main cost turned out to be dynamic slice growth, as shown below:
Reduce unnecessary log printing

Change logrus from the info level to debug where possible: remove unnecessary infof parameter printing, or convert it to debugf.
Manually set the GOGC threshold

At present too little memory is used: of the 4 GB available, only tens of MB are allocated before a GC runs. The Go GC threshold can be configured through the GOGC environment variable; the default is 100, which means: if the current live heap is 10M, the next GC runs at 10M + 10M * 100% = 20M. For the GC part, see "Golang trace performance analysis".

Raising the threshold by a factor of 10, we estimate GC will only run when memory reaches the 100-200M range.
Optimization effect

Load-test results

Conclusions

1. The database accounts for 31%.
2. The logging share dropped; it is now 4%.
3. JSON serialization accounts for about 20%.
4. Most of the time is still in SQL; a cache is needed.
5. Parameter validation is below 1%.
6. The GC share dropped slightly; the trace shows GC frequency clearly reduced.
Subsequent optimization direction

1. Add a cache

High QPS is inseparable from caching; the database alone can only support so much QPS. Redis is a good fit for the cache.

2. Limit the number of records per query

As the query size grows, JSON serialization accounts for an ever larger share. Returning less data, where appropriate, will greatly improve JSON serialization.
3. Replace the log library with a high-performance one

The importance of logs needs no emphasis, and abandoning logs for the sake of performance is not advisable. As the key to metrics and troubleshooting, we can instead choose a higher-performance log library.

Reference: Uber's open-source high-performance log library

To be honest, logrus performance is really poor.
4. GC tuning

Before Go 1.19, the only way to tune GC was to adjust the GOGC threshold; since 1.19 you can also set a MemoryLimit directly, and the runtime paces GC against that limit. The question is: is a high GC threshold really good? Each GC then has more memory to scan, so the mark phase takes longer, and the actual effect may not be satisfactory. It takes careful, repeated verification to find the optimal threshold.
end