Recording and analyzing the cause of a memory explosion in a .NET power-industry system

1: Background

1. The story

A few days ago a friend came to me saying that the memory of his production program was exploding and asked me to help figure out what was going on. The simplest and crudest approach is to have him capture a dump while the memory is ballooning and take a look; the dump usually tells the whole story.

2: WinDbg analysis

1. Who ate the memory?

This question has to be asked first in every memory case: figure out who is eating the memory, which you can do with the  !address -summary command.


0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                    255     7dfb`064e1000 ( 125.981 TB)           98.42%
<unknown>                               529      204`d53ac000 (   2.019 TB)  99.97%    1.58%
Heap                                    889        0`170f0000 ( 368.938 MB)   0.02%    0.00%
Image                                  1214        0`07a9a000 ( 122.602 MB)   0.01%    0.00%
Stack                                   192        0`05980000 (  89.500 MB)   0.00%    0.00%
Other                                    10        0`001d8000 (   1.844 MB)   0.00%    0.00%
TEB                                      64        0`00080000 ( 512.000 kB)   0.00%    0.00%
PEB                                       1        0`00001000 (   4.000 kB)   0.00%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                255     7dfb`064e1000 ( 125.981 TB)           98.42%
MEM_RESERVE                             709      204`43eab000 (   2.017 TB)  99.86%    1.58%
MEM_COMMIT                             2190        0`b5c64000 (   2.840 GB)   0.14%    0.00%

Judging from the output, the committed process memory is only  2.84G, which strictly speaking is not that much; maybe my friend was a little impatient. The large  <unknown> region suggests that it is the managed heap that is ballooning, so continue with  !eeheap -gc to look at the managed heap.


0:000> !eeheap -gc

========================================
Number of GC Heaps: 4
----------------------------------------
Heap 0 (000001d0adf50a20)
generation 0 starts at 1d0b3fad350
generation 1 starts at 1d0b3f9be88
generation 2 starts at 1d0ae5d1000
ephemeral segment allocation context: none
Small object heap
         segment            begin        allocated        committed allocated size          committed size         
    01d0ae5d0000     01d0ae5d1000     01d0b4046258     01d0b48ac000 0x5a75258 (94851672)    0x62dc000 (103661568)  
Large object heap starts at 1d4ae5d1000
         segment            begin        allocated        committed allocated size          committed size         
    01d4ae5d0000     01d4ae5d1000     01d4b6d0c4e8     01d4b6d2d000 0x873b4e8 (141800680)   0x875d000 (141938688)  
Pinned object heap starts at 1d4ee5d1000
         segment            begin        allocated        committed allocated size          committed size         
    01d4ee5d0000     01d4ee5d1000     01d4ee5e4f08     01d4ee5f2000 0x13f08 (81672)         0x22000 (139264)       
------------------------------
...
Heap 3 (000001d0ae4fd000)
generation 0 starts at 1d3b26929e0
generation 1 starts at 1d3b2687ad8
generation 2 starts at 1d3ae5d1000
ephemeral segment allocation context: none
Small object heap
         segment            begin        allocated        committed allocated size          committed size         
    01d3ae5d0000     01d3ae5d1000     01d4179a5980     01d418021000 0x693d4980 (1765624192) 0x69a51000 (1772425216)
Large object heap starts at 1d4de5d1000
         segment            begin        allocated        committed allocated size          committed size         
    01d4de5d0000     01d4de5d1000     01d4df8836d8     01d4df884000 0x12b26d8 (19605208)    0x12b4000 (19611648)   
Pinned object heap starts at 1d51e5d1000
         segment            begin        allocated        committed allocated size          committed size         
    01d51e5d0000     01d51e5d1000     01d51e5dd7e0     01d51e5e2000 0xc7e0 (51168)          0x12000 (73728)        
------------------------------
GC Allocated Heap Size:    Size: 0x8a6b9060 (2322305120) bytes.
GC Committed Heap Size:    Size: 0x8c6b1000 (2355826688) bytes.

The GC heap sizes confirm that the problem lies in the managed layer. Next, use  !dumpheap -stat to look at the current state of the managed heap and see which objects are the culprits.


0:000> !dumpheap -stat
Statistics:
          MT     Count     TotalSize Class Name
...
7fff32e81db8        43    68,801,032 SmartMeter.Mem.TerminalInfo[]
7fff329f7470   200,000   110,400,000 SmartMeter.Model.MeterInfo_Model
7fff3227d708 2,285,392   116,193,998 System.String
01d0ae46b350       543 1,857,281,320 Free
Total 3,947,969 objects, 2,314,533,332 bytes

Fragmented blocks larger than 0.5 MB:
         Address           Size      Followed By
    01d0ae935870        723,384     01d0ae9e6228 System.SByte[]
    01d1b41d3cd0     23,081,616     01d1b57d6f60 System.Byte[]
    01d3b274eb40  1,696,943,656     01d4179a3968 System.Byte[]

The output is shocking once you look at it: of the  2.3G managed heap,  1.69G has been swallowed by a single Free block. If you don't believe it, you can verify it with  !do .


0:000> !do 01d3b274eb40
Free Object
Size:        1696943656(0x65254e28) bytes

2. Why is there such a big Free?

This is a question worth thinking about, and it determines the direction of the next step of the analysis. The next step is to find where this Free block lives and how the objects around it are laid out, which you can do with  !gcwhere .


0:000> !gcwhere 01d3b274eb40
Address          Heap   Segment          Generation Allocated               Committed               Reserved               
01d3b274eb40     3      01d3ae5d0000     0          1d3ae5d1000-1d4179a5980 1d3ae5d0000-1d418021000 1d418021000-1d4ae5d0000

0:000> !dumpheap -segment 1d3ae5d0000
    ...
    01d3b274e948     7fff32468658             96 
    01d3b274e9a8     7fff3227d708             28 
    01d3b274e9c8     7fff3227d708             28 
    01d3b274e9e8     7fff32d0c8d8             80 
    01d3b274ea38     7fff3227d708             96 
    01d3b274ea98     7fff32d0aa38             40 
    01d3b274eac0     01d0ae46b350            128 Free
    01d3b274eb40     01d0ae46b350  1,696,943,656 Free
    01d4179a3968     7fff323e1638          8,216 

The output is a pity: if the Free block had fallen at the very end of the segment, the segment could have been decommitted and the memory returned to the system. However, the last position is occupied by an 8,216-byte object, which blocks the decommit. Experienced readers will suspect that this object is no ordinary one; there is a high probability that it is pinned, which you can confirm with  !gcroot .
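
For readers less familiar with pinning: a pinned object is one the GC is not allowed to move, typically because native code holds a raw pointer into it. The following is a minimal, self-contained illustration of pinning with GCHandle; it is not the program's actual code (which, as shown next, is pinned through an async/overlapped I/O handle), just a demonstration of the concept.

using System;
using System.Runtime.InteropServices;

class PinnedBufferDemo
{
    static void Main()
    {
        // A pinned object cannot be moved by the GC; if it happens to sit at
        // the end of a segment, it also keeps that tail of the segment committed.
        byte[] buffer = new byte[8192];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine($"Buffer pinned at 0x{address.ToInt64():X}");
        }
        finally
        {
            handle.Free();   // unpin so the GC can move or reclaim the buffer again
        }
    }
}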


0:000> !gcroot 01d4179a3968
HandleTable:
    000001d0ae3927f8 (async pinned handle)
          -> 01d3b26706f0     System.Threading.OverlappedData 
          -> 01d4179a3968     System.Byte[] 

Found 1 unique roots.

0:000> !dumpobj /d 1d4179a3968
Name:        System.Byte[]
MethodTable: 00007fff323e1638
EEClass:     00007fff323e15b8
Tracked Type: false
Size:        8216(0x2018) bytes
Array:       Rank 1, Number of elements 8192, Type Byte (Print Array)
Content:     ............L.o.g.\.2.0.2.3.0...
Fields:
None

The  async pinned handle above shows that this buffer belongs to a file-monitoring callback. So the surface-level explanation is: it is this 8,216-byte object that prevents the memory from being reclaimed.
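
The dump does not name the component, but the size is suggestive: on Windows a FileSystemWatcher hands its internal buffer (InternalBufferSize, 8,192 bytes by default) to overlapped I/O, where it stays pinned while the watch is active, and an 8,192-element byte[] plus its object header is exactly the 8,216 bytes seen here. Below is a minimal sketch of such a watcher; the log directory path is borrowed from the buffer contents shown above, and everything else is an assumption rather than the program's actual code.

using System;
using System.IO;

class LogWatcherExample
{
    static void Main()
    {
        // Hypothetical reconstruction: watch a log directory for new files.
        // The watcher's internal 8 KB buffer is handed to overlapped I/O and
        // therefore pinned while the watch is active; if it happens to land at
        // the end of a GC segment, that segment cannot be decommitted.
        var watcher = new FileSystemWatcher(@"D:\Log")
        {
            Filter = "*.log",
            EnableRaisingEvents = true
        };

        watcher.Created += (s, e) => Console.WriteLine($"New log file: {e.FullPath}");

        Console.ReadLine();
    }
}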

3. Is the 8216-byte object really responsible?

If you blame the 8,216-byte object, you are only looking at the surface. The memory that suddenly surged merely happens to be blocked from reclamation by that object; it is not the essential cause. The real question is why the GC produced such a huge single Free block in the first place, and the answer is that the program performed large object allocation over a short period of time. Yes, that is the key phrase.
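
The dump alone cannot show the exact call site, but a typical pattern that leaves a single Free block of this size behind is to materialize an entire result set and flatten it into one buffer. A purely hypothetical sketch follows; the type name MeterInfo_Model is borrowed from the !dumpheap output, while the method and fields are invented for illustration.

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Minimal stand-in for the real model seen in the dump.
public class MeterInfo_Model
{
    public string MeterNo { get; set; }
    public decimal Reading { get; set; }
    public DateTime SampleTime { get; set; }
}

public static class BillingExport
{
    // Hypothetical anti-pattern: all rows are materialized at once and then
    // flattened into a single huge buffer.
    public static byte[] LoadAllMeterReadings(IReadOnlyList<MeterInfo_Model> allRows)
    {
        var sb = new StringBuilder();
        foreach (var row in allRows)
            sb.AppendLine(JsonSerializer.Serialize(row));

        return Encoding.UTF8.GetBytes(sb.ToString());   // one giant allocation
    }
}

Any allocation of 85,000 bytes or more goes to the large object heap, so a buffer like this can easily reach GB scale and, once collected, becomes exactly the kind of giant Free block seen above.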

The next question is how to find this large object allocation. The best way is to use PerfView's .NET SampAlloc view to gain insight. If you must stay in WinDbg, all you can do is look at what the Free block held during its lifetime; you may be able to find the answer by dumping its memory with the  .writemem command.


0:000> !do 01d3b274eb40
Free Object
Size:        1696943656(0x65254e28) bytes

0:000> .writemem D:\testdump\1.txt 01d3b274eb40 L?0x65254e28
Writing 65254e28 bytes................

The exported data contains a lot of billing information, so it looks like the program pulled a large amount of data from the database in a short period of time and dumped it onto the managed heap. Once the essential cause is known, the solution is relatively simple; there are usually two approaches.

  • Change the GC mode to Workstation.

  • Break large batches of data into small, fast-running steps (see the sketch below).
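
Both are cheap to apply. Workstation GC can be selected with the ServerGarbageCollection MSBuild property, or with "System.GC.Server": false in runtimeconfig.json. For the second approach, here is a minimal sketch of a batched export, reusing the hypothetical MeterInfo_Model stand-in from the earlier sketch; the paging delegate and page size are assumptions, not the program's actual code.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public static class BillingExportBatched
{
    // Hypothetical batched rewrite: rows are fetched and written out in small
    // pages, so no single allocation ever approaches the large object heap
    // threshold (85,000 bytes) and no GB-scale Free block can be left behind.
    public static void ExportInBatches(
        Func<int, int, IReadOnlyList<MeterInfo_Model>> fetchPage,  // (offset, size) -> rows
        string outputPath,
        int pageSize = 1000)
    {
        using var writer = new StreamWriter(outputPath);
        int offset = 0;
        while (true)
        {
            var page = fetchPage(offset, pageSize);
            if (page.Count == 0) break;

            foreach (var row in page)
                writer.WriteLine(JsonSerializer.Serialize(row));

            offset += page.Count;
        }
    }
}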

3: Summary

On the surface, this memory explosion was caused by the 8,216-byte pinned object blocking the segment from being decommitted; in essence, it is still a managed heap "memory black hole" produced by short-lived large object allocation.

Origin blog.csdn.net/qq_41221596/article/details/133000430