1: Background
1. The story
A few days ago a friend came to me saying that the memory of his production program was exploding, and he asked me to help find out what was going on. The simplest and crudest approach is to have him capture a dump while the memory is blowing up and take a look; that usually tells you roughly what is happening.
2: WinDbg Analysis
1. Who ate the memory?
I have said this countless times: first see clearly exactly how this program is eating memory, using the !address -summary command.
0:000> !address -summary
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 255 7dfb`064e1000 ( 125.981 TB) 98.42%
<unknown> 529 204`d53ac000 ( 2.019 TB) 99.97% 1.58%
Heap 889 0`170f0000 ( 368.938 MB) 0.02% 0.00%
Image 1214 0`07a9a000 ( 122.602 MB) 0.01% 0.00%
Stack 192 0`05980000 ( 89.500 MB) 0.00% 0.00%
Other 10 0`001d8000 ( 1.844 MB) 0.00% 0.00%
TEB 64 0`00080000 ( 512.000 kB) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 255 7dfb`064e1000 ( 125.981 TB) 98.42%
MEM_RESERVE 709 204`43eab000 ( 2.017 TB) 99.86% 1.58%
MEM_COMMIT 2190 0`b5c64000 ( 2.840 GB) 0.14% 0.00%
From the output, the committed memory of the process is only 2.84 GB, which strictly speaking is not that much; maybe my friend was just a bit anxious. Judging by the <unknown> indicator above, it is likely the managed heap that is skyrocketing. Continue with !eeheap -gc to observe the managed heap.
0:000> !eeheap -gc
========================================
Number of GC Heaps: 4
----------------------------------------
Heap 0 (000001d0adf50a20)
generation 0 starts at 1d0b3fad350
generation 1 starts at 1d0b3f9be88
generation 2 starts at 1d0ae5d1000
ephemeral segment allocation context: none
Small object heap
segment begin allocated committed allocated size committed size
01d0ae5d0000 01d0ae5d1000 01d0b4046258 01d0b48ac000 0x5a75258 (94851672) 0x62dc000 (103661568)
Large object heap starts at 1d4ae5d1000
segment begin allocated committed allocated size committed size
01d4ae5d0000 01d4ae5d1000 01d4b6d0c4e8 01d4b6d2d000 0x873b4e8 (141800680) 0x875d000 (141938688)
Pinned object heap starts at 1d4ee5d1000
segment begin allocated committed allocated size committed size
01d4ee5d0000 01d4ee5d1000 01d4ee5e4f08 01d4ee5f2000 0x13f08 (81672) 0x22000 (139264)
------------------------------
...
Heap 3 (000001d0ae4fd000)
generation 0 starts at 1d3b26929e0
generation 1 starts at 1d3b2687ad8
generation 2 starts at 1d3ae5d1000
ephemeral segment allocation context: none
Small object heap
segment begin allocated committed allocated size committed size
01d3ae5d0000 01d3ae5d1000 01d4179a5980 01d418021000 0x693d4980 (1765624192) 0x69a51000 (1772425216)
Large object heap starts at 1d4de5d1000
segment begin allocated committed allocated size committed size
01d4de5d0000 01d4de5d1000 01d4df8836d8 01d4df884000 0x12b26d8 (19605208) 0x12b4000 (19611648)
Pinned object heap starts at 1d51e5d1000
segment begin allocated committed allocated size committed size
01d51e5d0000 01d51e5d1000 01d51e5dd7e0 01d51e5e2000 0xc7e0 (51168) 0x12000 (73728)
------------------------------
GC Allocated Heap Size: Size: 0x8a6b9060 (2322305120) bytes.
GC Committed Heap Size: Size: 0x8c6b1000 (2355826688) bytes.
The GC heap numbers confirm that the problem is on the managed side. Continue with !dumpheap -stat to observe the current state of the managed heap and see which object is the culprit.
0:000> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
...
7fff32e81db8 43 68,801,032 SmartMeter.Mem.TerminalInfo[]
7fff329f7470 200,000 110,400,000 SmartMeter.Model.MeterInfo_Model
7fff3227d708 2,285,392 116,193,998 System.String
01d0ae46b350 543 1,857,281,320 Free
Total 3,947,969 objects, 2,314,533,332 bytes
Fragmented blocks larger than 0.5 MB:
Address Size Followed By
01d0ae935870 723,384 01d0ae9e6228 System.SByte[]
01d1b41d3cd0 23,081,616 01d1b57d6f60 System.Byte[]
01d3b274eb40 1,696,943,656 01d4179a3968 System.Byte[]
You don't know until you look, and one look gives you a shock: of the 2.3 GB here, 1.69 GB has been swallowed by a single Free block. If you don't believe it, verify with !do.
0:000> !do 01d3b274eb40
Free Object
Size: 1696943656(0x65254e28) bytes
2. Why is there such a big Free?
This question is worth pondering, because it determines the direction of the next round of analysis. The next step is to look at where this Free block has settled and how the objects around it are distributed, which can be observed with !gcwhere.
0:000> !gcwhere 01d3b274eb40
Address Heap Segment Generation Allocated Committed Reserved
01d3b274eb40 3 01d3ae5d0000 0 1d3ae5d1000-1d4179a5980 1d3ae5d0000-1d418021000 1d418021000-1d4ae5d0000
0:000> !dumpheap -segment 1d3ae5d0000
...
01d3b274e948 7fff32468658 96
01d3b274e9a8 7fff3227d708 28
01d3b274e9c8 7fff3227d708 28
01d3b274e9e8 7fff32d0c8d8 80
01d3b274ea38 7fff3227d708 96
01d3b274ea98 7fff32d0aa38 40
01d3b274eac0 01d0ae46b350 128 Free
01d3b274eb40 01d0ae46b350 1,696,943,656 Free
01d4179a3968 7fff323e1638 8,216
What a pity. If the Free block fell at the very end of the segment, the tail of the segment could be decommitted and the memory returned to the OS. Unfortunately, the last position is occupied by an 8,216-byte object that prevents the memory from being reclaimed. Experienced readers may already sense that this object is no ordinary citizen: there is a high probability that it is pinned, which can be verified with !gcroot.
0:000> !gcroot 01d4179a3968
HandleTable:
000001d0ae3927f8 (async pinned handle)
-> 01d3b26706f0 System.Threading.OverlappedData
-> 01d4179a3968 System.Byte[]
Found 1 unique roots.
0:000> !dumpobj /d 1d4179a3968
Name: System.Byte[]
MethodTable: 00007fff323e1638
EEClass: 00007fff323e15b8
Tracked Type: false
Size: 8216(0x2018) bytes
Array: Rank 1, Number of elements 8192, Type Byte (Print Array)
Content: ............L.o.g.\.2.0.2.3.0...
Fields:
None
Judging from the async pinned handle and the buffer contents, this is the pinned buffer of a file-monitoring callback. So on the surface we can say: it is this 8,216-byte object that keeps the memory from being reclaimed.
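How does such a buffer arise? A minimal sketch, assuming the log directory path below (the actual application code was not available): on Windows, a FileSystemWatcher keeps an internal buffer, 8,192 bytes by default, that the runtime pins for the pending overlapped ReadDirectoryChangesW call. That matches the dump: a System.Byte[] of 8,192 elements (8,216 bytes with object header) rooted by an async pinned handle through System.Threading.OverlappedData, with a log path visible in its contents.

using System;
using System.IO;

class LogWatcherSketch
{
    static void Main()
    {
        // Hypothetical path; the buffer contents in the dump hinted at "Log\2023...".
        var watcher = new FileSystemWatcher(@"D:\Log")
        {
            // 8192 is the default; this buffer backs the pending overlapped
            // directory-change read, so the runtime pins it for as long as
            // the watcher is running.
            InternalBufferSize = 8192,
            EnableRaisingEvents = true
        };
        watcher.Created += (s, e) => Console.WriteLine($"New log file: {e.Name}");

        Console.ReadLine(); // keep the process (and the pinned buffer) alive
    }
}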
3. Is 8216 really responsible?
If you really want 8216 to take the blame, you are only looking at the surface. The surge of unreclaimable memory merely happened to be blocked by it; that is not the essential cause. What really needs thinking about is why the GC produced such a huge single Free block in the first place. That points to a burst of large object allocation in this program over a short period. Yes, that is the key phrase.
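To make the phenomenon concrete, here is a rough sketch (sizes and names are invented for illustration) of the pattern the dump suggests: a burst of short-lived large allocations collapses into one giant Free block after a GC, while a small pinned buffer allocated behind them keeps the tail of the segment from being trimmed.

using System;
using System.Runtime.InteropServices;

class FreeBlockSketch
{
    static void Main()
    {
        // Burst of short-lived large allocations, e.g. rows pulled from a database.
        var burst = new byte[2000][];
        for (int i = 0; i < burst.Length; i++)
            burst[i] = new byte[80_000]; // below 85,000 bytes, so these land on the SOH

        // A pinned buffer allocated after the burst ends up behind the
        // soon-to-be-dead objects, like the 8,216-byte overlapped buffer.
        var pinned = GCHandle.Alloc(new byte[8192], GCHandleType.Pinned);

        burst = null;  // the burst dies...
        GC.Collect();  // ...and its space merges into Free blocks, but the GC
                       // cannot compact or trim past the pinned buffer.

        Console.WriteLine("A dump taken now should show a huge Free entry in !dumpheap -stat.");
        pinned.Free();
    }
}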
The next question is how to find this large object allocation. The best way is to use PerfView's .NET allocation sampling to gain insight. If all you have is WinDbg, you can only look at what this Free block held in its previous life; the leftover bytes may reveal the answer, and they can be dumped with the .writemem command.
0:000> !do 01d3b274eb40
Free Object
Size: 1696943656(0x65254e28) bytes
0:000> .writemem D:\testdump\1.txt 01d3b274eb40 L?0x65254e28
Writing 65254e28 bytes................
The dumped data shows a lot of billing information, so it appears the program grabbed a large amount of data from the database in a short period and churned it through the managed heap. Once the essential cause is known, the solution is relatively simple. There are usually two approaches, sketched after this list:
- Switch the GC mode to Workstation.
- Break the large batch of data into small steps that run quickly.
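For the first approach, a minimal config sketch: the Pinned object heap in the !eeheap output implies .NET 5 or later, so Server GC (the mode behind the 4 GC heaps seen above) can be switched off in the project file, or equivalently via "System.GC.Server": false in runtimeconfig.json:

<!-- .csproj: switch from Server GC (one heap per core) to Workstation GC -->
<PropertyGroup>
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>

For the second approach, a hedged sketch of the idea; BillingRecord, LoadBillingPage, and Process are hypothetical stand-ins for the actual data access code. The point is to page through the rows instead of materializing the whole result set on the managed heap at once:

using System;
using System.Collections.Generic;

// All names below are hypothetical stand-ins for the real code.
record BillingRecord(long Id, string Payload);

static class BatchProcessor
{
    const int PageSize = 1_000;

    static void Main()
    {
        long lastId = 0;
        while (true)
        {
            // e.g. SELECT TOP (@PageSize) ... WHERE Id > @lastId ORDER BY Id
            List<BillingRecord> page = LoadBillingPage(lastId, PageSize);
            if (page.Count == 0) break;
            foreach (var r in page)
                Process(r);          // each small batch dies young and is cheap to collect
            lastId = page[^1].Id;
        }
    }

    static List<BillingRecord> LoadBillingPage(long afterId, int take) => new();
    static void Process(BillingRecord r) { }
}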
3: Summary
On the surface, this memory explosion was caused by the 8,216-byte pinned object blocking the segment tail so the memory could not be decommitted; in essence, it is still the classic "memory black hole" phenomenon on the managed heap.