Use dotTrace to analyze the performance of .NET applications

A few days ago, a friend asked me how I usually troubleshoot a program's performance problems. Don't get me wrong, this "friend" is not me in disguise; I really do have a friend named Toby. When it comes to performance, you may immediately think of metrics such as concurrency, throughput, response time, QPS, and TPS. These metrics do reflect how a system performs, but as our systems grow more complex, it becomes harder and harder to pinpoint where the performance is actually being lost. Different people also judge performance by different criteria. To a front-end developer, performance means how fast a page opens; to a back-end developer, it means concurrency, throughput, and response time; and to a DBA, it means how efficiently a SQL statement executes. On top of that, real-world programs have to shuttle back and forth between hardware and the network, so taking a feature from 80% to 100% complete is fairly easy, while pushing performance from 80% to 85% is not. The reason is simple: our systems are never a simple 1 + 1 = 2. This is where a performance analysis tool comes in, and the one I want to share today is dotTrace, produced by JetBrains.

Quick Start

The installation process is not shown here. I recommend installing dotTrace and dotMemory at the same time. Both are part of the JetBrains all-in-one bundle, so you only need to tick them in the installer options, which takes almost no effort. After installation, the interface looks like this. Notice that it can profile .NET applications running in an existing process, local .NET applications, and remote .NET applications. Since the demonstration here uses a .NET Core application, we choose Profile Local App:

dotTrace main interface

Here, we have prepared a simple console program:

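using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Program
{
    static void Main(string[] args)
    {
        // CPUHack() never returns, so enable only one of these calls
        // per run, depending on which profiler you are testing.
        CPUHack();
        MemeryHack();
    }

    // Fills memory with 85,000-byte arrays until the runtime gives up.
    public static void MemeryHack() {
        Console.ReadLine(); // wait for Enter before allocating
        var bytes = GC.GetTotalAllocatedBytes();
        Console.WriteLine($"AllocatedBytes: { bytes } bytes");
        var list = new List<byte[]>(); // keeps every array reachable
        try
        {
            while (true) {
                list.Add(new byte[85000]);
            }
        } catch (OutOfMemoryException) {
            Console.WriteLine(nameof(OutOfMemoryException));
            Console.WriteLine(list.Count);
            bytes = GC.GetTotalAllocatedBytes();
            Console.WriteLine($"AllocatedBytes: { bytes } bytes");
        }

        Console.ReadLine(); // keep the process alive for inspection
    }

    // Keeps up to one worker per logical core busy.
    public static void CPUHack() {
        Parallel.For(0, Environment.ProcessorCount,
            new ParallelOptions() {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            },
            i => {
                // Assumption: the extracted listing had an empty body here;
                // an endless busy loop is what would pin the CPU as described.
                while (true) { }
            });
    }
}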

The CPUHack() method comes from the post "blow up your CPU", and the MemeryHack() method comes from the post "realize OutOfMemory through code". As the names suggest, we will use these two methods to test dotTrace and dotMemory respectively.
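For a profiling session that should end on its own, a time-bounded CPU load can be handy. The following is only a minimal sketch and not part of the original post: the class name BoundedCpuLoad and the method Spin are hypothetical, and the five-second duration is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper (not from the original post): keeps the CPU busy
// for a fixed duration and then returns, so the process can exit normally.
public static class BoundedCpuLoad
{
    public static TimeSpan Spin(TimeSpan duration)
    {
        // Stopwatch.GetTimestamp() is static, so it is safe to call from many threads.
        long start = Stopwatch.GetTimestamp();
        long deadline = start + (long)(duration.TotalSeconds * Stopwatch.Frequency);

        Parallel.For(0, Environment.ProcessorCount,
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            i =>
            {
                // Busy-wait until the shared deadline has passed.
                while (Stopwatch.GetTimestamp() < deadline) { }
            });

        long ticks = Stopwatch.GetTimestamp() - start;
        return TimeSpan.FromSeconds(ticks / (double)Stopwatch.Frequency);
    }

    public static void Main()
    {
        TimeSpan elapsed = Spin(TimeSpan.FromSeconds(5));
        Console.WriteLine($"Spun for {elapsed.TotalSeconds:F1} s on up to {Environment.ProcessorCount} workers");
    }
}
```

Because every worker watches the same deadline, the snapshot still shows the busy loop as the hot spot, but nothing has to be killed by hand afterwards.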

dotTrace currently supports the following platforms: .NET, .NET Core, WPF, UWP (Universal Windows Platform), ASP.NET, Windows Services, WCF, Mono, and Unity. Notice that it offers four profiling modes: Sampling, Tracing, Line-by-Line, and Timeline. According to the descriptions in the interface, Sampling gives accurate measurements of call time and suits most scenarios; Tracing gives accurate call counts and suits algorithm complexity analysis; Line-by-Line is intended for advanced use cases; and Timeline suits accurate measurement of data processing, including multithreaded code. So here we choose an executable file, select Sampling, and click "Run":

Sample the program and generate snapshots
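To make the idea behind Tracing more concrete, here is a hand-rolled sketch of the kind of data a tracing profiler collects: an exact call count and accumulated time per method. It is only an illustration and does not show how dotTrace works internally; ManualTrace, Measure, CallCount, and Fib are hypothetical names, and the code is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical illustration of tracing-style instrumentation.
public static class ManualTrace
{
    // Method name -> (number of calls, accumulated inclusive time).
    private static readonly Dictionary<string, (int Calls, TimeSpan Total)> Stats =
        new Dictionary<string, (int Calls, TimeSpan Total)>();

    public static void Reset() => Stats.Clear();

    // Wraps a call, recording that it happened and how long it took.
    public static T Measure<T>(string name, Func<T> body)
    {
        var watch = Stopwatch.StartNew();
        try
        {
            return body();
        }
        finally
        {
            watch.Stop();
            Stats.TryGetValue(name, out var entry); // missing key yields (0, TimeSpan.Zero)
            Stats[name] = (entry.Calls + 1, entry.Total + watch.Elapsed);
        }
    }

    public static int CallCount(string name) =>
        Stats.TryGetValue(name, out var entry) ? entry.Calls : 0;

    // Naive recursive Fibonacci: every call, including nested ones, is recorded.
    // Nested calls are timed inside their callers, so the total over-counts time.
    public static int Fib(int n) =>
        Measure("Fib", () => n < 2 ? n : Fib(n - 1) + Fib(n - 2));

    public static void Main()
    {
        int result = Fib(10);
        Console.WriteLine($"Fib(10) = {result}, calls = {CallCount("Fib")}");
    }
}
```

Computing Fib(10) this way records 177 calls, which is exactly the kind of figure that matters in algorithm complexity analysis. A sampling profiler, by contrast, only inspects the call stacks at intervals, so it estimates where time goes without counting every call.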

We will now see a toolbar for the profiled program. Click "Get Snapshot and Wait" to take a sample; each sample generates a snapshot, and by default the snapshot opens automatically. We can also click "Start" to sample again until we have collected a satisfactory sample, and once sampling is done we can click "Kill" to end it. Let's take a look at the generated snapshot:

dotTrace performance snapshot

From these two pictures, we can clearly see that the most time-consuming method is our CPUHack() method, and that there are four threads in total, because the blogger's computer has a 4-core i3 processor. You can also view the relevant code snippets directly in dotTrace; of course, the premise is that the application has not been obfuscated. With that, we have completed a simple performance analysis. Next, we start dotMemory in the same way, which gives the following results:

dotMemory memory analysis

Here, we set the GC heap hard limit to 1 MB through the <YourApp>.runtimeconfig.json file, and each byte array added to the list is 85,000 bytes, which is large enough to be allocated on the large object heap (LOH). From this chart, we can clearly see that the LOH, shown as the blue area, dominates the entire curve. In other words, almost all memory is allocated on the LOH. In addition, some small objects have been promoted from generation 0 to generation 1. In this example, because there was not enough allocatable memory, an OutOfMemoryException was eventually thrown, which is consistent with the output we saw. The configuration file looks like this:

{
  "runtimeOptions": {
    "tfm": "netcoreapp3.1",
    "framework": {
      "name": "Microsoft.NETCore.App",
      "version": "3.1.0"
    },
    "configProperties": {
      "System.GC.HeapHardLimit": 1048576
    }
  }
}
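The large object heap behavior can also be checked from code, without a profiler. The sketch below is not from the original post, and LohDemo and GenerationOfNewArray are hypothetical names. It relies on the fact that the runtime reports LOH objects as belonging to the highest generation, and it assumes .NET Core 3.0 or later for GC.GetGCMemoryInfo().

```csharp
using System;

// Hypothetical demo: shows which generation a freshly allocated array reports.
public static class LohDemo
{
    public static int GenerationOfNewArray(int length)
    {
        var buffer = new byte[length];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer); // keep the array alive until after the query
        return generation;
    }

    public static void Main()
    {
        // A small array normally starts in generation 0.
        Console.WriteLine($"byte[1000]  -> generation {GenerationOfNewArray(1_000)}");

        // An 85,000-byte array goes to the LOH, which is reported as GC.MaxGeneration.
        Console.WriteLine($"byte[85000] -> generation {GenerationOfNewArray(85_000)}");

        // Assumption: when System.GC.HeapHardLimit is configured, this value is
        // expected to reflect that limit rather than the machine's physical memory.
        GC.Collect();
        Console.WriteLine($"Memory available to the GC: {GC.GetGCMemoryInfo().TotalAvailableMemoryBytes} bytes");
    }
}
```

Running this with and without the hard limit in the runtimeconfig.json file is a quick way to confirm that the setting has actually been picked up.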

Analyze from Dump file

That basically covers the use of dotTrace and dotMemory. Some of you may now be wondering: what if the performance problem occurs in the production environment? Indeed, so far we have been debugging a local program, and a production environment gives you no chance to do that. In that case we can use a memory dump file, which is a memory image of the process; the execution state of the program can be saved to a dump file through a debugger. Just imagine: the program crashed a second ago, and now you hold its state at that very moment, which is like obtaining the "criminal evidence" the fault left at the scene. Creating a dump file on Windows is very simple: in Task Manager, right-click the process and choose "Create dump file". Let's continue with the example above:

Create dump file

Once you have the dump file, there are many tools that can analyze it, such as WinDbg and DebugDiag. Here we can use dotMemory directly, because it supports importing dump files and is friendlier to use than the other two. After importing the dump file, we get the following results:

Large object heap distribution

First and second generation GC distribution

This is consistent with our earlier conclusion, namely that almost all memory is allocated on the large object heap (LOH). In addition, for .NET Core there are two official command-line tools, dotnet-dump and dotnet-gcdump, which can be installed with the following commands:

dotnet tool install -g dotnet-dump
dotnet tool install -g dotnet-gcdump

Both tools can also be used to analyze memory. For more .NET Core diagnostics tutorials, please refer to https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/event-counter-perf. These details are specific to .NET Core and may not apply elsewhere; interested readers can explore them on their own. Like most JetBrains applications, these tools have Visual Studio extensions and can be integrated directly into Visual Studio. That is a matter of personal preference, so I will not go into detail here.

Summary of this article

Using a simple example program, this article briefly introduced the basic use of two JetBrains tools, dotTrace and dotMemory, and showed how to diagnose memory problems in a production environment through a memory dump file. In my past performance optimization work I have used ANTS Performance Profiler, but in my experience dotTrace and dotMemory are slightly easier to use. For more general, code-level performance analysis, I recommend the lightweight project MiniProfiler. Performance optimization cannot be done by guessing, and the "control variable method" we learned in junior high school is not a bad idea here. One of my biggest insights from grinding LeetCode is that a program's performance really is improved bit by bit. Take sorting, the simplest example: you really have to submit many times before you gradually understand why some sorting algorithms are called "unstable". Perhaps, now that hardware keeps getting better, we no longer have to be as frugal as our predecessors, but it all evens out: however wasteful you are when writing code, that is how much you will suffer when playing games. Special recognition here goes to Ubisoft's optimization of "Betrayal". Okay, that is all for this post. Thank you, and good night!

Origin blog.csdn.net/qinyuanpei/article/details/109450653