Using Go's pprof profiling tool to track down a memory leak


Foreword: Recently, one of our company's projects has been using a lot of memory. Right after startup it takes 8 to 10 GB, and the footprint keeps growing as the program runs. Within a day or two it can reach 15 GB, at which point the server is maxed out and the production service goes down, although it works again after a restart. My boss asked me to track the problem down and optimize it. Ha, I said that if memory is short, we can just add more machines. (Just kidding!)

As the saying goes: out of 10 memory leaks in Go, 8 are goroutine leaks, 1 is a real memory leak, and 1 is cgo.

Who said it? I don't know.

Back to business: the assigned task still has to get done. My initial analysis was a memory leak: perhaps a cache somewhere, or objects that are never released. The next step is to locate it. What follows only describes the approach.

The tests below were run in local debug and test environments.

Related commands:

# View in the browser
http://localhost:7777/debug/pprof

# View memory usage
go tool pprof --text http://localhost:7777/debug/pprof/heap
# Open the web analysis UI
go tool pprof -http=:7778 http://localhost:7777/debug/pprof/heap

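1. Enable Go's pprof tool. Just add this blank import to the project. It registers the /debug/pprof handlers on the default HTTP mux, so the program must also be serving HTTP (here on port 7777):

_ "net/http/pprof"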
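For reference, here is a minimal sketch of a program that exposes the pprof endpoints used in this article. It assumes the process has no other HTTP server on this port. The `pprofAddr` constant is a name introduced here for illustration, and port 7777 is simply the one used in the commands above.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// pprofAddr is an illustrative name; the port matches the commands in this article.
const pprofAddr = "localhost:7777"

func main() {
	// A nil handler means http.DefaultServeMux, which is where the blank
	// import above registered the pprof handlers.
	log.Fatal(http.ListenAndServe(pprofAddr, nil))
}
```

If the service already serves HTTP through the default mux, the blank import alone should be enough and no extra listener is needed.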

2. Start the program and open this URL in the browser:

http://localhost:7777/debug/pprof/heap

There is a lot of heap memory information here; click through to take a look.

The details are hard to make out because of the mosaic, ha. I added it myself for confidentiality reasons, not because I don't want to show you. Roughly, it says that objects created at this location in the program take up a lot of memory. That is a lot, so let's open the project and see what it is.

Sure enough, the code there reads a model file that takes up more than 1 GB. That simple? Problem solved?

Scroll to the bottom of the page to view:

Something doesn't add up. The alloc value shown above (the number of bytes of allocated space) is 1.2 GB, which matches the size of the model, and the total memory obtained from the system is only 2.1 GB. But in the same runtime environment, the top command clearly shows the process using 5 GB. So who ate the other 3 GB?
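The summary at the bottom of the heap page is taken from `runtime.MemStats`. As a small sketch, assuming the two figures quoted above correspond to the `Alloc` and `Sys` fields, the same numbers can be read from inside the process. The helper name `heapSummary` is made up for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSummary returns the bytes of live heap objects (Alloc) and the total
// bytes the Go runtime has obtained from the OS (Sys).
func heapSummary() (alloc, sys uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Alloc, m.Sys
}

func main() {
	alloc, sys := heapSummary()
	fmt.Printf("Alloc = %d MiB, Sys = %d MiB\n", alloc>>20, sys>>20)
}
```

Both counters only cover memory managed by the Go runtime. Memory allocated by C code through cgo is not included, so a process can look much larger in top than in these figures.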

Never mind, let's comment out the model-loading code and try. After commenting it out and restarting, memory really did drop by more than 1 GB. But more than 3 GB was still unaccounted for.

3. Restore the code and use the pprof web UI to see object sizes and memory usage visually

Command:

go tool pprof -http=:7778 http://localhost:7777/debug/pprof/heap

The object memory allocation graph shows the same thing: only about 1 GB is in use.

4. Back to square one: think about Go memory leaks

Out of 10 memory leaks in Go, 8 are goroutine leaks, 1 is a real memory leak, and 1 is cgo.

Because the project rarely uses goroutines, I ruled that cause out early on by reading the code. So some memory is simply not being recorded. Could it be cgo?
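The article rules out goroutine leaks by reading the code. As a complementary check, the goroutine count can also be measured. The sketch below is only an illustration with made-up helper names (`leakGoroutines`, `goroutineGrowth`): it leaks goroutines on purpose and shows that `runtime.NumGoroutine` reflects them.

```go
package main

import (
	"fmt"
	"runtime"
)

// leakGoroutines starts n goroutines that block forever on a channel
// nobody ever sends to, which is a typical goroutine leak.
func leakGoroutines(n int) {
	for i := 0; i < n; i++ {
		ch := make(chan struct{})
		go func() { <-ch }()
	}
}

// goroutineGrowth reports how much the goroutine count rose after leaking n.
func goroutineGrowth(n int) int {
	before := runtime.NumGoroutine()
	leakGoroutines(n)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines added:", goroutineGrowth(100))
}
```

In a running service with `net/http/pprof` enabled, the same information is available at http://localhost:7777/debug/pprof/goroutine. A count that keeps climbing over time would point to a goroutine leak, while a stable count supports ruling it out.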

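5. Follow this lead to the end

Check whether the project pulls in any cgo-related packages.

I searched and searched, and it looks like I really found one.

The project pulls in a Jieba word segmenter, and the Jieba segmenter has to be freed after use.

OK, in that case, let's first comment out the loading of the Jieba segmenter and see. Comment it out, restart, and hey! It really does seem to be the problem. Memory finally came down; hard work pays off. But when should this thing be freed? In principle, freeing it once you are done with it is enough. However, to keep requests fast, the segmenter is loaded into memory for every active merchant, together with that merchant's custom keyword dictionary. In other words, as long as a merchant stays active, its segmenter stays in memory and keeps growing, which is why the project's memory keeps increasing. So I found the problem, but it doesn't feel like I've solved it.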
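The article does not show how the segmenter is held or freed, so the following is only a hypothetical sketch of the "free it when you are done" idea. It assumes the cgo-backed segmenter exposes a `Free` method; the `freer` interface, `registry` type, and `fakeSegmenter` are all invented here so the example runs without cgo.

```go
package main

import (
	"fmt"
	"sync"
)

// freer is a hypothetical stand-in for a cgo-backed segmenter whose native
// memory is only returned when Free is called explicitly.
type freer interface{ Free() }

// registry keeps one segmenter per active merchant.
type registry struct {
	mu    sync.Mutex
	items map[string]freer
}

func newRegistry() *registry { return &registry{items: make(map[string]freer)} }

// put stores the segmenter for a merchant, freeing any instance it replaces.
func (r *registry) put(merchantID string, s freer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if old, ok := r.items[merchantID]; ok {
		old.Free()
	}
	r.items[merchantID] = s
}

// release frees and forgets the segmenter of a merchant that went inactive.
// It reports whether there was anything to release.
func (r *registry) release(merchantID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.items[merchantID]
	if !ok {
		return false
	}
	s.Free()
	delete(r.items, merchantID)
	return true
}

// fakeSegmenter counts Free calls so the sketch runs without cgo.
type fakeSegmenter struct{ freed *int }

func (f fakeSegmenter) Free() { *f.freed++ }

func main() {
	freed := 0
	r := newRegistry()
	r.put("merchant-1", fakeSegmenter{&freed})
	r.release("merchant-1")
	fmt.Println("Free calls:", freed)
}
```

This only helps for merchants that actually become inactive. For merchants that stay active the memory stays allocated, which is the situation the article goes on to address by spreading merchants across nodes.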

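Optimization: the project currently runs on 4 nodes, which means every node loads all active users, along with the Jieba segmenter objects for them. If I change this to distributed loading, wouldn't that save three quarters of the memory? Each node used to need 10 GB; with distributed loading, each node only needs to load 2.5 GB.

6. Solution

Use Python to do hash routing on the user ID. Every request first hits the Python service's endpoint; the Python service discovers the live nodes automatically via DNS and then forwards the request based on a hash of the user ID. Problem solved!

The Python service uses the Flask framework. In production I simply switched the original endpoint over to the Python service's endpoint. It fit in seamlessly, and memory finally came down!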
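The author's router is a Python/Flask service whose code is not shown. To make the routing idea concrete, here is a rough sketch of the same hash-by-user-ID step written in Go instead. The function name `pickNode`, the FNV hash, and the node addresses are assumptions made for this example; the article does not say which hash was used, and the DNS discovery step is left out.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// pickNode maps a user ID to one of the live nodes by hashing the ID, so the
// same user always lands on the same node while the node list is unchanged.
func pickNode(userID string, nodes []string) (string, error) {
	if len(nodes) == 0 {
		return "", errors.New("no live nodes")
	}
	h := fnv.New32a()
	h.Write([]byte(userID)) // Write on a hash.Hash never returns an error
	idx := int(h.Sum32() % uint32(len(nodes)))
	return nodes[idx], nil
}

func main() {
	// Hypothetical node addresses; the article discovers live nodes via DNS.
	nodes := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080", "10.0.0.4:8080"}
	for _, id := range []string{"merchant-1", "merchant-2", "merchant-3"} {
		node, err := pickNode(id, nodes)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%s -> %s\n", id, node)
	}
}
```

With plain modulo hashing, most users move to a different node whenever the number of live nodes changes, and each move means loading that user's segmenter again on the new node. Consistent hashing is a common way to reduce that churn.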

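7. Summary:

Out of 10 memory leaks in Go, 8 are goroutine leaks, 1 is a real memory leak, and 1 is cgo.

pprof is only useful for analyzing pure Go code. It cannot locate cgo problems; those can only be found through familiarity with the code or by debugging, or by using BCC tools to trace the operating system kernel. My own approach leans toward deleting the code I suspect, restarting, and comparing memory usage. That is more direct, though it can cost quite a bit of time when the hunch is wrong.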


Origin juejin.im/post/7083412727498014734