Key things to consider when troubleshooting complex application issues

1. Reactive incidents: "Bring the engineer on site, because that makes it easier to isolate the problem."

This is the most common misconception I hear. Let me explain: most complex problems require an in-depth debugging session.
Collecting the necessary information is easy, and it can be done remotely or by the customer. Debugging the dump files, however, may take several hours or even days. On site we may not have access to our private symbols, and we cannot collaborate with colleagues who have specific technical knowledge, so working on site can actually slow the process down.
In many cases, the real value of being on site is to act as the eyes and ears of the remote engineer, or to get a better understanding of a complex issue that cannot be understood well by e-mail or telephone.

2. "We need a code review, because our application has performance problems."

Sometimes I get a code review request when what the customer actually needs is problem isolation. So what is the difference, and how do I know which one I need?
The goal of a code review is to examine the source code and point out the parts that do not follow best practices, that contain security vulnerabilities, or that can be optimized for speed.
The goal of problem isolation is to isolate the problem that is causing a particular application symptom, for example crashes, hangs, memory leaks, or performance bottlenecks.
Let me explain: imagine an ASP.NET application that performs poorly. If I review the application's code, I will probably find ways to optimize it for speed. However, if the application is slow because of a bottleneck on the database side or in the network, the performance gained from the code review is not the answer. In the worst case, the gain might not even be noticeable.
If you want to make sure the application has no potential problems that could be avoided by following best practices, or if you think the application can be optimized further for speed, a code review is a great choice. However, you need a baseline taken while the application has no problems in order to measure the speed gain. Improving performance is important, but it is generally less important than eliminating bottlenecks.
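To make the distinction concrete, here is a minimal sketch of problem isolation by timing each phase of a request. It is written in Python purely for brevity; the phase names and the sleep-based stand-ins for the database call and page rendering are hypothetical and do not come from any real application.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per phase, in seconds.
timings = {}

@contextmanager
def timed(phase):
    """Add the elapsed time of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = timings.get(phase, 0.0) + time.perf_counter() - start

def query_database():
    time.sleep(0.05)   # hypothetical stand-in for a slow database call

def render_page():
    time.sleep(0.005)  # hypothetical stand-in for application code

def handle_request():
    with timed("database"):
        query_database()
    with timed("render"):
        render_page()

def slowest_phase(measured):
    """Return the name of the phase that consumed the most time."""
    return max(measured, key=measured.get)

handle_request()
print("Slowest phase:", slowest_phase(timings))
```

If most of the time is spent in the database phase, optimizing the rendering code would barely change the total, which is why the measurement should come before the code review.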

3. "So after this issue is fixed, the performance / memory problem will go back to normal, right?"

The fact is that different problems can cause the same symptom, such as poor performance, hangs, or memory problems.
What does that mean? It means that after solving the most important and most obvious problem, we need to keep monitoring the application, because other, smaller problems can cause the same symptom. Once the main bottleneck is fixed, these smaller problems should become visible and easier to isolate. Identifying and resolving application problems is an iterative process.

4. "We are using .NET, so I do not need to worry about memory management."

If you have a pure .NET application, I tend to agree. However, most business applications interact with the native world in some way, for example through C DLLs, COM objects, or API calls.

The CLR is very good at managing memory for pure .NET applications. If the application interacts with native code, the developer is responsible for making sure that those resources are released or closed.

5. What information do I need to collect? How much of it do I need?

There is a thin line between too little information and too much. For us, the most important thing is to get the right information. A single dump file collected from the application while the symptom is occurring is very valuable. Five dump files collected while the application is running normally probably will not help much.
If the application crashes, you want to collect the dump file at the moment it crashes. A dump file collected at any other time will not contain information about the exception. If you force a full memory dump, you end up with a huge dump file that contains every process on the machine, but again, it will not contain information about the exception that causes the application to crash.

6. "We need an architecture review, because our application has performance problems."

This is similar to item 2 above. An architecture review is not the best way to solve a problem that is happening right now.
In addition, an architecture review may not even be the right approach for most application problems, because those problems are often too fine-grained. The customer's application may be designed correctly from an architectural perspective, while the problem has nothing to do with how the architecture was designed.
Let me give some examples. Suppose you have not installed an important .NET Framework update that affects the application. Or suppose your SharePoint application does not release the internal SharePoint objects it uses. An architecture review will not reveal problems like these.

7. Sometimes finding the right starting point is the hardest part.

Imagine this scenario:
"We need an IIS engineer, because my W3WP.EXE is using too much memory. This may be an IIS bug." How do users, administrators, and developers each experience this problem?

  • End User: I think the problem is in the browser; the application is very slow.
  • IIS Admin: I think the problem is in the ASP.NET application.
  • Developer: The ASP.NET application runs fine; the problem may be in the database.
  • DBA: SQL Server is running fine; I think the bottleneck is network-related.
  • Network Admin: There are no network problems.

As Developer PFEs, our goal is to help our customers isolate problems across different technologies, and to provide cross-group collaboration between the different teams, on site or remotely.

8. What skills do I need to help me debug applications?

If you need to debug an application, you do not need an engineer who knows how to administer or install the product. What you need is an engineer who understands application internals and how to debug them. The good news is that this knowledge does not depend on the application.
Even a Microsoft engineer who has never seen your application before can debug it. The same applies to our own products.
If at some point we isolate the problem to one of our products, we then need an engineer from the team that supports that product, because he or she has deep knowledge of the product and its known defects.

9. "I ran !clrstack. Most of the threads have been running for a long time and are trying to retrieve data from the database. The bottleneck may be in the database."

Let me tell you one thing. I always say to our new engineers, and to anyone who wants to learn more about .NET debugging, that if you want to excel at .NET debugging you must learn native debugging, which also implies some knowledge of C / C++ programming.
Don't believe me? If you follow bloggers who write about .NET debugging, ask them how they approach it.
That said, !clrstack is the favorite command of people who are learning .NET debugging. That is fine: it shows you the managed side of the call stack, which is usually at a higher level than the native side. Sometimes, however, you still need to look at the native side to truly understand what the threads are doing. If you look only at the managed side, you may draw the wrong conclusions.
The bottom line is: if you want to improve your .NET debugging skills, learn more about native debugging.

10. "My two servers are identical, but the problem only occurs on server XYZ."

When troubleshooting this scenario, never assume that the servers are identical. Instead, collect data to prove it.
A good start is to run the MPSReport / SPSReport tool. It collects information from each server so that the results can be compared. Even when the servers really are identical, the root cause may be that the application is sending its traffic to only one of them, so that server is overloaded.
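As an illustration of "collect data to prove it", the sketch below compares two server inventories and reports only the settings that differ. It is a hand-rolled Python example, not what MPSReport / SPSReport does internally; the inventory keys and values are hypothetical placeholders for whatever data you export from each server.

```python
MISSING = "<missing>"

def diff_inventories(a, b):
    """Return {setting: (value_on_a, value_on_b)} for every setting that differs."""
    return {
        key: (a.get(key, MISSING), b.get(key, MISSING))
        for key in sorted(set(a) | set(b))
        if a.get(key, MISSING) != b.get(key, MISSING)
    }

# Hypothetical inventories exported from the two "identical" servers.
server_abc = {".NET Framework": "4.8", "OS build": "17763.1", "hotfix-A": "installed"}
server_xyz = {".NET Framework": "4.7.2", "OS build": "17763.1"}

differences = diff_inventories(server_abc, server_xyz)
for setting, (on_abc, on_xyz) in differences.items():
    print(f"{setting}: ABC={on_abc} XYZ={on_xyz}")
```

An empty result supports the claim that the servers match for the settings you collected; anything else is a concrete lead to investigate.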

11. "From the event log I can see what caused the application to crash, and the call stack points to a Windows exception. I think this is a Windows bug."

This is a common misconception, and it is related to item 7 above. Sometimes the call stack of a second-chance exception (an unhandled exception that causes the application to crash) has a Windows DLL in its top frame. This is normal and does not mean that Windows caused the crash.

Example:

 ChildEBP RetAddr
0013bcd0 7c90de7a ntdll!KiFastSystemCall+0x2
0013bdd0 7c81cdfe kernel32!_ExitProcess+0x62
0013bde4 79f944b0 kernel32!ExitProcess+0x14
0013c00c 79f2c09a mscorwks!SafeExitProcess+0x11b
0013c018 79eff585 mscorwks!DisableRuntime+0xd1
0013c0a8 79011628 mscorwks!CorExitProcess+0x242
0013c0b8 77c39d3c mscoree!CorExitProcess+0x46
0013c0c4 77c39e78 msvcrt!__crtExitProcess+0x29
0013c0d4 77c39e90 msvcrt!_cinit+0xee
0013c0e8 0e68d21e msvcrt!exit+0x12
0013c580 0e256834 testappl!FuTestInterface::init+0x34 <<< This is where you should start the investigation.
0013c5a4 0e1d8c01 testapp!WBNARiskReportInterface::getResults+0x442a

Therefore, do not assume that ntdll or kernel32 caused the problem. The operating system DLLs and APIs are called as a result of the exception in the application. Identify the most recent call into an application method and use it as the starting point of the investigation. In the example above, that is testappl!FuTestInterface::init. Analyze it and, if necessary, the frames that called it, and so on.
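The rule of thumb above can be sketched as a small Python helper that walks a stack listing from the top and returns the first frame whose module is not an operating system or runtime module. The list of modules to skip is an assumption made for this example; adjust it to the modules that appear in your own dumps.

```python
# Assumption for this sketch: modules treated as OS / runtime code, not application code.
SYSTEM_MODULES = {"ntdll", "kernel32", "mscorwks", "mscoree", "msvcrt"}

def first_application_frame(stack_lines, system_modules=SYSTEM_MODULES):
    """Return the first 'module!symbol+offset' frame that is not in system_modules."""
    for line in stack_lines:
        parts = line.split()
        # Expected layout: ChildEBP RetAddr module!symbol+offset [annotation...]
        if len(parts) < 3 or "!" not in parts[2]:
            continue  # header or unrecognized line
        module = parts[2].split("!", 1)[0]
        if module.lower() not in system_modules:
            return parts[2]
    return None

stack = """\
 ChildEBP RetAddr
0013bcd0 7c90de7a ntdll!KiFastSystemCall+0x2
0013bdd0 7c81cdfe kernel32!_ExitProcess+0x62
0013c0d4 77c39e90 msvcrt!_cinit+0xee
0013c0e8 0e68d21e msvcrt!exit+0x12
0013c580 0e256834 testappl!FuTestInterface::init+0x34
""".splitlines()

print(first_application_frame(stack))
```

This only automates the first step; the frame it returns is where the manual analysis starts, not a verdict on the root cause.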

12. "We collected a crash dump file from a C++ application. We think it is heap corruption, so the call stack should point to the culprit, right?"

Heap corruption is not as common as it used to be, because .NET applications are becoming more common. In the era of COM objects and C DLLs, however, heap corruption was a typical problem. In an ordinary crash dump, the call stack usually shows where the corruption was detected, not the code that caused it.
To get the call stack of the method that actually corrupts the heap, enable page heap, restart the application so that it picks up the new heap manager settings, and then collect the dump file. With this approach, heap corruption problems are much easier to isolate.
You can enable page heap with different tools, such as PageHeap.exe, GFlags.exe, Application Verifier, and others. Some page heap settings (such as full page heap) place an inaccessible guard page after each memory allocation, so whenever the application overruns the buffer it hits that page, which results in an access violation.

Origin www.cnblogs.com/yilang/p/12170689.html