The Complete Guide for Debugging a Stop 0x9F

转自:www.sysnative.com

Purpose: A collective summary of on the workings of a Stop 0x9F, and how to debug the two most common Stop 0x9F bugcheck types. I originally wrote this article in a Word document which spans 10 pages, so there may be most grammar mistakes or spelling mistakes.

Note: Since we're dealing with Stop 0x9F's, I switch interchangeably Watchdog Timer and Timer when speaking about the Timer types. Thanks to Jared for providing me with the Kernel Mode Dump files.

How it Works?

The Stop 0x9F common refers to the system being in a inconsistent or unsupported power state. This bugcheck is almost always exclusively related to device drivers not handling Power IRPs correctly. It's important to remember that filter and bus drivers are commonly blamed with these bugchecks, because these drivers tend to be the only drivers which will handle the actual Power IRP.

Stop 0x9Fs are largely caused by a device driver which isn't passing the Power IRP through it's driver stack to appropriate drivers which are able to process the Power IRP. Windows monitors this activity with the use of timer dispatcher objects. If we examine the stack of a Stop 0x9F, we can notice a few interesting functions. 

The most two most important functions are PopCheckIrpWatchdog and PopCheckForIdleness. The Pop prefix indicates we're dealing with the Power Manager, an important subsystem of the Windows kernel. The PopCheckForIdleness function periodically checks for inactivity, and is built upon a DPC object. If examine the lower stack frames then we can see a timer object has expired, the timer object is associated with the previously stated function, so when the timer object expires, the PopCheckForIdleness is called and calls it's own code. The function will run in intervals of 100 seconds, each time it will call another function called PopCheckIrpWatchdog. The Watchdog will check a linked list where the IRPs with the pending status are stored (PopIrpList), and then increment a counter. Once the counter threshold has been breached, the KeBugCheckEx function will be called and a crash with the Stop 0x9F stop code will be produced. The counter threshold is stored within a variable called PopWatchdogMaxTicks. 

0: kd> ? poi(nt!PopWatchdogMaxTicks)
Evaluate expression: 41 = 00000029

We can view all pending timers with either the !timer extension or !ms_timers (requires SwishDbgExt.dll). All timers are created using a structure called _KTIMER or IO_TIMER.

Before Windows 7, the timers were stored inside a doubly linked list which has a different expiration time associated with each list head. There were 512 list heads, and each list head was stored within an array called KiTimerListHead. However, from Windows 7 onwards, the timer lists are now stored within a structure called KTIMER_TABLE.

0: kd> dt nt!_KTIMER_TABLE
   +0x000 TimerExpiry      : [64] Ptr64 _KTIMER
   +0x200 TimerEntries     : [256] _KTIMER_TABLE_ENTRY

The TimerExpiry field corresponds to the list of timers which are set to be wakened at a given time interval. The TimerEntries field indicates the list heads for the given time interval. The KTIMER_TABLE_ENTRY has the following structure:

0: kd> dt nt!_KTIMER_TABLE_ENTRY
   +0x000 Lock             : Uint8B
   +0x008 Entry            : _LIST_ENTRY
   +0x018 Time             : _ULARGE_INTEGER

The Entry field is the position of a timer object within the linked list as discussed above, and the Time entry is the time where the timer object is set to expire.

The _KTIMER data structure has the following entries:

0: kd> dt nt!_KTIMER
   +0x000 Header           : _DISPATCHER_HEADER
   +0x018 DueTime          : _ULARGE_INTEGER
   +0x020 TimerListEntry   : _LIST_ENTRY
   +0x030 Dpc              : Ptr64 _KDPC
   +0x038 Processor        : Uint4B
   +0x03c Period           : Uint4B

The DueTime field is self-explanatory, it's the time which the timer has been set to expire. The Processor field is which processor number the timer object has been associated with, and the Dpc field is which DPC object is being used to schedule when the appropriate driver should be notified of the timer expiration, this for synchronization purposes. However, threads attached to timer objects can be awoken immediately after the timer expiration. The Period field is the same as the DueTime field. It's also important to remember that timer objects can be episodic (repeatedly) or periodic (only once).

There is another timer structure called _IO_TIMER, and is used with a associated device object. These timer objects can be used by driver developers to check specific operations within their device. 

0: kd> dt nt!_DEVICE_OBJECT
   +0x000 Type             : Int2B
   +0x002 Size             : Uint2B
   +0x004 ReferenceCount   : Int4B
   +0x008 DriverObject     : Ptr64 _DRIVER_OBJECT
   +0x010 NextDevice       : Ptr64 _DEVICE_OBJECT
   +0x018 AttachedDevice   : Ptr64 _DEVICE_OBJECT
   +0x020 CurrentIrp       : Ptr64 _IRP
   +0x028 Timer            : Ptr64 _IO_TIMER
   +0x030 Flags            : Uint4B
   +0x034 Characteristics  : Uint4B
   +0x038 Vpb              : Ptr64 _VPB
   +0x040 DeviceExtension  : Ptr64 Void
   +0x048 DeviceType       : Uint4B
   +0x04c StackSize        : Char
   +0x050 Queue            : <unnamed-tag>
   +0x098 AlignmentRequirement : Uint4B
   +0x0a0 DeviceQueue      : _KDEVICE_QUEUE
   +0x0c8 Dpc              : _KDPC
   +0x108 ActiveThreadCount : Uint4B
   +0x110 SecurityDescriptor : Ptr64 Void
   +0x118 DeviceLock       : _KEVENT
   +0x130 SectorSize       : Uint2B
   +0x132 Spare1           : Uint2B
   +0x138 DeviceObjectExtension : Ptr64 _DEVOBJ_EXTENSION
   +0x140 Reserved         : Ptr64 Void

 The Timer field within the DEVICE_OBJECT structure contains a 64-bit pointer to the _IO_TIMER structure, which is defined as follows:

0: kd> dt nt!_IO_TIMER
   +0x000 Type             : Int2B
   +0x002 TimerFlag        : Int2B
   +0x008 TimerList        : _LIST_ENTRY
   +0x018 TimerRoutine     : Ptr64     void 
   +0x020 Context          : Ptr64 Void
   +0x028 DeviceObject     : Ptr64 _DEVICE_OBJECT

The timer object described is used to monitor any important operations, and ensure that these operations are successful, otherwise the operating system is informed to crash and produce a bugcheck. The timer is created with the IoIntializeTimer function, taking parameters which include the address of the specified device object and callback routine which is to be called every second (see TimerRoutine field), once the timer object has been started with the IoStartTimer. If the operation is successful or unsuccessful, then the timer can be stopped with IoStopTimer, and the appropriate code will be executed to handle the exception. The same timer can be restarted (called 'kicking the dog') by calling IoStartTimer again.

To summarise quickly, the timer object is used when the driver notices an operation hasn't been completed for a given timer interval, and this operation is vital for the stability of the device, thereby the Watchdog Timer is called once every second until that Timer has timed out - or the operation completes successfully - leading to an exception and an IRP with status code STATUS_IO_TIMEOUT being used to implement any exception handling for the device object. The IoCompleteRequest function will check the stated status, by looking into the _IO_STATUS_BLOCK structure. 

0: kd> dt nt!_IO_STATUS_BLOCK
   +0x000 Status           : Int4B
   +0x000 Pointer          : Ptr64 Void
   +0x008 Information      : Uint8B

We have now discussed how the Stop 0x9F is created and have briefly examined how timer objects work. We'll now discuss a little about Power IRPs and particularly how they handled within a Stop 0x9F. This last section will wrap up about the internals of a Stop 0x9F. In preceding sections, we will investigate the common causes of a Stop 0x9F and how to debug these crashes accordingly.

A Stop 0x9F is generally caused by a Power IRP being blocked somewhere within the device stack. Power IRPs are sent by the Power Manager or the Power Policy Device Manager. This is achieved by PoRequestPowerIrp. Power IRPs are used to achieve several things, but to say within the topic of discussion, we'll let Power IRPs main purpose being handling power transitions, wherever from sleep or to sleep. We can check power transition states with the !poaction.

Power IRPs are filtered through the device stack until they reach a bus driver, which is typically the only type of driver which can handle such IRPs. We can view a device stack for a specified physical device object by using the !devstack.

0: kd> !devstack fffffa80`05823060
  !DevObj   !DrvObj            !DevExt   ObjectName
  fffffa80058882c0  \Driver\partmgr    fffffa8005888410  
  fffffa8005888790  \Driver\Disk       fffffa80058888e0  DR0
  fffffa80057399b0  \Driver\ACPI       fffffa80039c5d50  
> fffffa8005823060  \Driver\atapi      fffffa80058231b0  IdeDeviceP2T0L0-2
!DevNode fffffa800573d900 :
  DeviceInst is "IDE\DiskST3250824A______________________________3.AAH___\5&3731d328&0&0.0.0"
  ServiceName is "disk"

The Power IRP is passed down the device stack until it reaches the bus driver, which as stated before, tends to be only driver that is able to process the Power IRP, and therefore tends to be the driver which is blamed for the crash. With some IRPs, several different device stacks may be used to process an IRP, and therefore it can become more difficult in tracking where the IRP has become blocked, or if a preceding driver has caused the IRP to become blocked in some other location.

The above device stack belongs to the device node stated below it. Each device node (represents a physical device) will have a device stack associated with it. 

0: kd> !devnode fffffa800573d900
DevNode 0xfffffa800573d900 for PDO 0xfffffa8005823060
  Parent 0xfffffa8005726900   Sibling 0xfffffa800573e900   Child 0000000000
  InstancePath is "IDE\DiskST3250824A______________________________3.AAH___\5&3731d328&0&0.0.0"
  ServiceName is "disk"
  State = DeviceNodeStarted (0x308)
  Previous State = DeviceNodeEnumerateCompletion (0x30d)
  StateHistory[08] = DeviceNodeEnumerateCompletion (0x30d)
  StateHistory[07] = DeviceNodeEnumeratePending (0x30c)
  StateHistory[06] = DeviceNodeStarted (0x308)
  StateHistory[05] = DeviceNodeStartPostWork (0x307)
  StateHistory[04] = DeviceNodeStartCompletion (0x306)
  StateHistory[03] = DeviceNodeResourcesAssigned (0x304)
  StateHistory[02] = DeviceNodeDriversAdded (0x303)
  StateHistory[01] = DeviceNodeInitialized (0x302)
  StateHistory[00] = DeviceNodeUninitialized (0x301)
  StateHistory[19] = Unknown State (0x0)
  StateHistory[18] = Unknown State (0x0)
  StateHistory[17] = Unknown State (0x0)
  StateHistory[16] = Unknown State (0x0)
  StateHistory[15] = Unknown State (0x0)
  StateHistory[14] = Unknown State (0x0)
  StateHistory[13] = Unknown State (0x0)
  StateHistory[12] = Unknown State (0x0)
  StateHistory[11] = Unknown State (0x0)
  StateHistory[10] = Unknown State (0x0)
  StateHistory[09] = Unknown State (0x0)
  Flags (0x2c000130)  DNF_ENUMERATED, DNF_IDS_QUERIED, 
                      DNF_NO_RESOURCE_REQUIRED, DNF_NO_LOWER_DEVICE_FILTERS, 
                      DNF_NO_LOWER_CLASS_FILTERS, DNF_NO_UPPER_DEVICE_FILTERS
  UserFlags (0x00000008)  DNUF_NOT_DISABLEABLE
  DisableableDepends = 1 (including self)

The drivers used by the device objects to process the IRP form another called the driver stack. Every IRP has a Major Function Code and a Minor Function Code, in the case of a Stop 0x9F, the very common IRP which is seen is the [16,2], where 0x16 is the Major Function Code (IRP_MJ_POWER) and 0x2 is the Minor Function Code (IRP_MN_SET_POWER). The 0x16 Major Function Code shows us that the IRP is in fact a Power IRP, with the 0x2 indicating that the specific purpose of the Power IRP was to notify a driver of a power state change in the system.

0x4 – How it Works?

Let's have a quick look at the internal works of a Stop 0x9F in relation to the PnP Manager. I've explained most of this Stop 0x9F in the debugging example, but this will provide a small generic description of how PnPs work.

There are two types of PnP IRPs: State Changing and Non-State Changing. With this particular type of bugcheck, we're most interested in the State Changing PnP IRPs. 

The main difference between State Changing and Non-State Changing IRPs, is how these IRPs are handled by the I/O Manager. State Changing IRPs must be synchronized against each other, this prevents two state changing IRPs from existing in the same stack. The most obvious problem this helps to prevent, this when an removal request and a start request are being processed at the same time. The State Changing IRPs have to be synchronized with System Power IRPs, a power transition must be processed before a PnP IRP. The State Changing PnP IRPs are only reserved for the use of system drivers.

On the other hand, there is a few Power IRPs which the PnPs aren't synchronized with, and these IRPs are the following:
 

  • IRP_MN_SET_POWER
  • IRP_MN_WAIT_WAKE
  • IRP_MN_QUERY_POWER
  • IRP_MN_POWER_SEQUENCE


Causes and Debugging

There is a number of causes for a Stop 0x9F, but the main two causes are a device driver blocking a IRP within the device stack, or a timer object has timed out and failed to notify a thread of a power transition state change and synchronize with the PnP Manager.

A complete list of parameters can be found on the MSDN page and in the BSOD Index.

Stop 0x9F – 0x3:

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa8005823060, Physical Device Object of the stack
Arg3: fffff80000b9c518, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: fffff9801c458dc0, The blocked IRP

The two most important arguments are the second and the fourth argument. The fourth argument contains the address of the blocked IRP, and the second argument contains the address of the PDO (Physical Device Object) of the stack location where the IRP has become blocked.

Using the !irp extension on the fourth parameter, we can view the current stack where the IRP has become stuck. 

0: kd> !irp fffff9801c458dc0
Irp is active with 5 stacks 1 is current (= 0xfffff9801c458e90)
 No Mdl: No System Buffer: Thread 00000000:  Irp stack trace.  
     cmd  flg cl Device   File     Completion-Context
>[ 16, 2]   0 e1 fffffa8005823060 00000000 fffff88000e0957c-fffff88000e093c4 Success Error Cancel pending
           \Driver\atapi    ACPI!ACPIDeviceIrpDeviceFilterRequest
            Args: 00000000 00000001 00000004 00000000
 [ 16, 2]   0 e1 fffffa80057399b0 00000000 fffff880018f27d0-fffffa8005888cd0 Success Error Cancel pending
           \Driver\ACPI    CLASSPNP!ClasspStartNextPowerIrpCompletion
            Args: 00000000 00000001 00000004 00000000
 [ 16, 2]   0 e1 fffffa8005888790 00000000 fffff88000eaff50-00000000 Success Error Cancel pending
           \Driver\Disk    partmgr!PmPowerCompletion
            Args: 00000000 00000001 00000004 00000000
 [ 16, 2]   0 e1 fffffa80058882c0 00000000 00000000-00000000    pending
           \Driver\partmgr
            Args: 00000000 00000001 00000004 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-fffffa8006c71c50    

            Args: 00000000 00000000 00000000 00000000

The output doesn't give us much indication, in most cases, a potential third party driver can be found, however, we'll need to investigate further using the !stacks extension to display all the kernel stacks in the system. I don't regularly use this extension simply because of the sheer volume of output it produces, there is hundreds of threads to check through.

As a side note, the fffff9801c458e90 address seen with the IRP is the current I/O Stack Location, which is stored with the IO_STACK_LOCATION structure:

0: kd> dt nt!_IO_STACK_LOCATION
   +0x000 MajorFunction    : UChar
   +0x001 MinorFunction    : UChar
   +0x002 Flags            : UChar
   +0x003 Control          : UChar
   +0x008 Parameters       : <unnamed-tag>
   +0x028 DeviceObject     : Ptr64 _DEVICE_OBJECT
   +0x030 FileObject       : Ptr64 _FILE_OBJECT
   +0x038 CompletionRoutine : Ptr64     long 
   +0x040 Context          : Ptr64 Void

There is one IO_STACK_LOCATION for each driver within the driver stack for the associated IRP.

!stacks
Proc.Thread  .Thread  Ticks   ThreadState Blocker
                            [fffff80002c0d180 Idle]
   0.000000  fffff80002c0ccc0 ff1674ec RUNNING    nt!KeBugCheckEx
   0.000000  fffff880009f3fc0 ff187841 RUNNING    amdppm!C1Halt+0x2
   0.000000  fffff88002f6efc0 ff186575 RUNNING    amdppm!C1Halt+0x2
   0.000000  fffff88002fdffc0 ff184b88 RUNNING    nt!KiIdleLoop+0x10d
                            [fffffa8003989b30 System]
*** ERROR: Module load completed but symbols could not be loaded for avgidsfiltera.sys
   4.000018  fffffa80039fbb50 ff1849a6 Blocked    Ntfs!NtfsWaitOnIo+0x28
   4.00001c  fffffa80039fb660 ff1848b8 Blocked    Ntfs!NtfsWaitOnIo+0x28
   4.000020  fffffa80039fb170 ff1845b7 Blocked    Ntfs!NtfsWaitOnIo+0x28
   4.000024  fffffa80039fd040 ff184537 Blocked    Ntfs!NtfsWaitOnIo+0x28
   4.000028  fffffa80039fdb50 ff167729 Blocked    tdtcp!TdiSubmitRequest+0xca
   4.00002c  fffffa80039fd660 ff1681d2 Blocked    +0xfffffa80075dfcda
   4.000038  fffffa80039fe660 ff184b88 Blocked    nt!IopCloseFile+0x260
   4.00003c  fffffa80039ff040 ff184a83 Blocked    nt!CmpDoFileWrite+0x281
   4.000040  fffffa80039ffb50 ff1849b5 STANDBY    nt!KeSetSystemGroupAffinityThread+0x18a
   4.000044  fffffa80039ff660 ff1797f0 Blocked    disk!DiskPerformSmartCommand+0x1f0
   4.000060  fffffa800460eb50 ff167536 Blocked    nt!ViKeTrimWorkerThreadRoutine+0x1e
   4.000064  fffffa800460e660 ff184b89 Blocked    nt!ViPoolDelayFreeTrimThreadRoutine+0x29
   4.000068  fffffa800460f040 ff167ff5 Blocked    nt!ViPoolDelayFreeTrimThreadRoutine+0x29
   4.00006c  fffffa800460fb50 ff168ad9 Blocked    nt!ViPendingWorkerThread+0x20
   4.000070  fffffa800460f660 ff168ad9 Blocked    nt!ViPendingWorkerThread+0x20
   4.000074  fffffa8004610040 ff167ff5 Blocked    nt!ViPendingWorkerThread+0x20
   4.000078  fffffa8004610b50 ff167ff5 Blocked    nt!ViPendingWorkerThread+0x20
   4.00007c  fffffa8004610660 ff167ff5 Blocked    nt!ViPendingWorkerThread+0x20
   4.000080  fffffa8004611040 ff167ff5 Blocked    nt!ViPendingWorkerThread+0x20
   4.000084  fffffa8004611b50 ff167ff5 Blocked    nt!ViPendingWorkerThread+0x20
   4.000088  fffffa8004611660 ff167ff5 Blocked    nt!ViPendingWorkerThread+0x20
   4.000090  fffffa8004612660 ff18486c Blocked    nt!MiModifiedPageWriter+0xcf
   4.000094  fffffa8004613b50 ff183446 Blocked    Ntfs!NtfsWaitOnIo+0x28
   4.0000b0  fffffa8004617610 ff1843b6 Blocked    nt!CcQueueLazyWriteScanThread+0x85
   4.0000bc  fffffa800461e940 ff167ff5 Blocked    nt!AlpcpReceiveMessagePort+0x189
   4.0000c0  fffffa800466f040 ff184bfa Blocked    nt!EtwpLogger+0xf2
   4.0000c4  fffffa800466f790 ff16753a Blocked    nt!EtwpLogger+0xf2
   4.0000c8  fffffa800467c5d0 ff16753a Blocked    nt!EtwpLogger+0xf2
   4.0000cc  fffffa80046bd040 ff16753a Blocked    nt!EtwpLogger+0xf2
   4.0000d4  fffffa80047c9b50 ff1674fa Blocked    nt!EtwpLogger+0xf2
   4.0000d8  fffffa80047ec040 ffffe2ed Blocked    nt!EtwpLogger+0xf2
   4.0000dc  fffffa80047f1b50 ff167d9a Blocked    nt!WdipSemCheckTimeout+0x12c
   4.0000e0  fffffa80039a65c0 fffffb53 Blocked    ACPI!ACPIWorkerThread+0x74
   4.0000e8  fffffa8005636b50 fffff583 Blocked    ndis!ndisThreadPoolTimerHandler+0x10c
   4.0000f0  fffffa8005639740 ff167644 Blocked    ndis!ndisCmWaitThread+0x6e
   4.0000f4  fffffa80057fc110 ffffff8d Blocked    ACPI!PciRootBusBiosMethodDispatcherOnResume+0x51
   4.000104  fffffa8005766b50 ffcd71dd Blocked    volsnap!VspWorkerThread+0x80
   4.000108  fffffa8005769b50 ff197630 Blocked    volsnap!VspWorkerThread+0x80
   4.00010c  fffffa800576ab50 ff1880a0 Blocked    volsnap!VspWorkerThread+0x80
   4.000110  fffffa800576bb50 fffffdf4 Blocked    volsnap!VspWorkerThread+0x80
   4.000114  fffffa800576cb50 fffffea1 Blocked    volsnap!VspWorkerThread+0x80
   4.000118  fffffa800576db50 fffffea1 Blocked    volsnap!VspWorkerThread+0x80
   4.00011c  fffffa800576eb50 fffffea1 Blocked    volsnap!VspWorkerThread+0x80
   4.000120  fffffa800576fb50 fffffea1 Blocked    volsnap!VspWorkerThread+0x80
   4.000124  fffffa8005770b50 fffffea1 Blocked    volsnap!VspWorkerThread+0x80
   4.00012c  fffffa8005c0a930 fffffde0 Blocked    watchdog!SMgrGdiCalloutThread+0x4f
*** ERROR: Module load completed but symbols could not be loaded for avgtdia.sys
   4.000130  fffffa8005c3d040 fffffddc Blocked    nt!VerifierKeWaitForMultipleObjects+0x1dc
   4.000134  fffffa8005c2cb50 ff16759e Blocked    avgtdia+0x31676

[...]

Threads Processed: 939

This is a small subset of the stacks extension, the first column contains the Process ID and the Thread ID, in this case it is a System process. The second column contains the address of the _ETHREAD block which we can pass to the !thread extension. The threads which are of a interest to us are the ones which have the Blocked status.

I haven't been able to find a direct cause, but based on the device stack for the blocked IRP, I'm assuming that AVG may a culprit driver.

0: kd> lmvm avgrkx64

start             end                 module name
fffff880`01911000 fffff880`0191d000   avgrkx64   (deferred)             
    Image path: \SystemRoot\system32\DRIVERS\avgrkx64.sys
    Image name: avgrkx64.sys
    Timestamp:        Tue Jan 31 03:11:41 2012 (4F275BED)
    CheckSum:         0000D438
    ImageSize:        0000C000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

Stop 0x9F – 0x4:

This bugcheck indicates that we have failed to synchronize our power transition with the PnP Manager. This is common for the removal PnP devices, since after the removal of the device we need to update the device tree to show that the device object has been removed from the system.

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000004, The power transition timed out waiting to synchronize with the Pnp
    subsystem.
Arg2: 0000000000000258, Timeout in seconds.
Arg3: fffffa8007005660, The thread currently holding on to the Pnp lock.
Arg4: fffff800053e83d0, nt!TRIAGE_9F_PNP on Win7

The timeout is specified in the second argument, and the thread which is holding onto the lock required to access the PnP is shown in the third argument. We'll begin by using the !thread extension with the thread address, and then examining the stack.

0: kd> !thread fffffa8007005660
THREAD fffffa8007005660  Cid 0004.0048  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
    fffffa800d035ee8  NotificationEvent
IRP List:
    fffffa8008f5cc10: (0006,03e8) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 fffff8a000008c10
Owning Process            fffffa8006f8d890       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      396427         Ticks: 38463 (0:00:10:00.026)
Context Switch Count      44059          IdealProcessor: 0  NoStackSwap
UserTime                  00:00:00.000
KernelTime                00:00:00.343
Win32 Start Address nt!ExpWorkerThread (0xfffff80003298150)
Stack Init fffff88003bd2c70 Current fffff88003bd2280
Base fffff88003bd3000 Limit fffff88003bcd000 Call 0
Priority 15 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`03bd22c0 fffff800`032845f2 : fffffa80`07005660 fffffa80`07005660 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`03bd2400 fffff800`0329599f : fffffa80`0d0df208 fffff880`0ae9e10b fffffa80`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
fffff880`03bd2490 fffff880`0ae915dd : fffffa80`0d035000 00000000`00000000 fffffa80`0dd8ca00 00000000`00000000 : nt!KeWaitForSingleObject+0x19f
fffff880`03bd2530 fffff880`0ae92627 : fffffa80`0d035000 00000000`00000000 fffffa80`0c0891a0 fffff880`03bd2670 : ZTEusbnet+0x35dd
fffff880`03bd2580 fffff880`0215d809 : fffffa80`0c0891a0 fffff880`020f0ecd fffff880`03bd2670 fffffa80`091c5550 : ZTEusbnet+0x4627
fffff880`03bd25b0 fffff880`0215d7d0 : fffffa80`091c54a0 fffffa80`0c0891a0 fffff880`03bd2670 fffffa80`08fc2ac0 : ndis!NdisFDevicePnPEventNotify+0x89
fffff880`03bd25e0 fffff880`0215d7d0 : fffffa80`08fc2a10 fffffa80`0c0891a0 fffffa80`091f9010 fffffa80`091f90c0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2610 fffff880`0219070c : fffffa80`0c0891a0 00000000`00000000 00000000`00000000 fffffa80`0c0891a0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2640 fffff880`021a1da2 : 00000000`00000000 fffffa80`08f5cc10 00000000`00000000 fffffa80`0c0891a0 : ndis! ?? ::LNCPHCLB::`string'+0xddf
fffff880`03bd26f0 fffff800`034fb121 : fffffa80`091c7060 fffffa80`0c089050 fffff880`03bd2848 fffffa80`070bfa00 : ndis!ndisPnPDispatch+0x843
fffff880`03bd2790 fffff800`0367b3a1 : fffffa80`070bfa00 00000000`00000000 fffffa80`0dc19990 fffff880`03bd2828 : nt!IopSynchronousCall+0xe1
fffff880`03bd2800 fffff800`03675d78 : fffffa80`09196e00 fffffa80`070bfa00 00000000`0000030a 00000000`00000308 : nt!IopRemoveDevice+0x101
fffff880`03bd28c0 fffff800`0367aee7 : fffffa80`0dc19990 00000000`00000000 00000000`00000003 00000000`00000136 : nt!PnpSurpriseRemoveLockedDeviceNode+0x128
fffff880`03bd2900 fffff800`0367b000 : 00000000`00000000 fffff8a0`11d1c000 fffff8a0`049330d0 fffff880`03bd2a58 : nt!PnpDeleteLockedDeviceNode+0x37
fffff880`03bd2930 fffff800`0370b97f : 00000000`00000002 00000000`00000000 fffffa80`09122010 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xa0
fffff880`03bd29a0 fffff800`0370c53c : fffff880`03bd2b78 fffffa80`114ab700 fffffa80`07005600 fffffa80`00000000 : nt!PnpProcessQueryRemoveAndEject+0x6cf
fffff880`03bd2ae0 fffff800`035f573e : 00000000`00000000 fffffa80`114ab7d0 fffff8a0`123a25b0 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x4c
fffff880`03bd2b10 fffff800`03298261 : fffff800`034f9f88 fffff8a0`11d1c010 fffff800`034342d8 fffff800`034342d8 : nt! ?? ::NNGAKEGL::`string'+0x54d9b
fffff880`03bd2b70 fffff800`0352b2ea : 00000000`00000000 fffffa80`07005660 00000000`00000080 fffffa80`06f8d890 : nt!ExpWorkerThread+0x111
fffff880`03bd2c00 fffff800`0327f8e6 : fffff880`03965180 fffffa80`07005660 fffff880`0396ffc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`03bd2c40 00000000`00000000 : fffff880`03bd3000 fffff880`03bcd000 fffff880`03bd2410 00000000`00000000 : nt!KxStartSystemThread+0x16

As we can see in the call stack for the thread, there is two device removal functions, which in turn supports our argument from earlier on. More importantly, if we check the pending IRP for the current thread, we'll find some more evidence relating to PnP and the third party driver shown in our call stack.

0: kd> !irp fffffa8008f5cc10
Irp is active with 10 stacks 10 is current (= 0xfffffa8008f5cf68)
 No Mdl: No System Buffer: Thread fffffa8007005660:  Irp stack trace.  
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
>[ 1b,17]   0  0 fffffa800c089050 00000000 00000000-00000000    
           \Driver\ZTEusbnet
            Args: 00000000 00000000 00000000 00000000

The Major Function code corresponds to IRP_MJ_PNP, and the Minor Function Code corresponds to IRP_MN_SURPRISE_REMOVAL. These IRP function codes can be found within the WinDbg documentation, and an explanation can be found on the appropriate MSDN pages. I will briefly explain these function codes, and how they link in with the nature of the bugcheck. PnP IRPs are handled by DispatchPnP driver dispatch rountine.

The Minor Function Code is used to notify important Kernel components and User Mode applications that the device is being removed from the system, and no other IRPs will be able to be sent to that device. The IRP is exclusively reserved for the system, and thus no other drivers should be sending the IRP. The IRP is processed from the top layer of the device stack down to the bottom layer of the device stack.

By checking the IRP, the call stack makes much more sense to us, at the moment the thread is currently waiting on a lock. We need to investigate what this lock is, and why these locks are required. Using the !locks extension, we can list all the currently acquired locks on executive resources (_ERESOURCE).

0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks..

Resource @ nt!IopDeviceTreeLock (0xfffff80003492ce0)    Shared 1 owning threads
    Contention Count = 1
     Threads: fffffa8007005660-01<*> 
KD: Scanning for held locks.

Resource @ nt!PiEngineLock (0xfffff80003492be0)    Exclusively owned
    Contention Count = 21
    NumberOfExclusiveWaiters = 1
     Threads: fffffa8007005660-01<*> 
     Threads Waiting On Exclusive Access:
              fffffa800f308b50       

KD: Scanning for held locks.....

The current thread holding both locks is our PnP thread. The most interesting lock is the IopDeviceTreeLock. The other thread waiting on the PiEngineLock is a kernel related thread which is handling other device removal operations and power transitions.

Instead of going in depth about the various locks and synchronization mechanisms used in Windows, it would be better to explain why the thread is using a lock to begin with. The answer with our PnP device.

The main reason of acquiring the lock is to ensure that the driver will not use DispatchPnP to handle a IRP_MN_REMOVE_DEVICE IRP while another driver is performing an I/O operation at the same time. Furthermore, by using these lock routines, the driver can determine when all I/O operations have been completed for that device, and therefore when it is safe to remove that device.

The IO_REMOVE_LOCK illustrates a counter which is used to check when all the pending I/O operations have been completed for that device. Once the I/O operations have been completed, then the device can be safely removed from the system. A safe removal of the device is indicated with the IoRemoveLockAndWait. The IO_REMOVE_LOCK is destroyed when the device is removed from the system.

In conclusion, the ZTEusbnet.sys driver seems like the most likely culprit, it is a USB NDIS Miniport driver developed by a company called ZTE.

0: kd> lmvm ZTEusbnet

start             end                 module name
fffff880`0ae8e000 fffff880`0aebc000   ZTEusbnet   (no symbols)           
    Loaded symbol image file: ZTEusbnet.sys
    Image path: \SystemRoot\system32\DRIVERS\ZTEusbnet.sys
    Image name: ZTEusbnet.sys
    Timestamp:        Mon Oct 13 06:50:10 2008 (48F2E192)
    CheckSum:         000329ED
    ImageSize:        0002E000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

Commands and Extensions

This a quick overview of the extensions and commands which can be used for debugging Stop 0x9F. Some of the extensions are added for informational purposes i.e. you wish to learn a little about the nature of the device etc.

!irp, !irpfind, !devext, !devobj, !drvobj, !popolicy, !timer, !ms_timer, !thread, !stacks, !devnode,
!devstack, !podev, !poaction, !popolicy, !poreqlist, !locks and !pnptriage.

Data Structures

*Discussed in article
_IO_TIMER*, _DEVICE_OBJECT*, _DRIVER_OBJECT, _KTIMER*, _KTIMER_TABLE_ENTRY*, _KTIMER_TABLE*, _SYSTEM_POWER_STATE, _DEVICE_POWER_STATE, _SYSTEM_POWER_POLICY, _IO_STATUS_BLOCK*, _TRIAGE_9F_POWER

References:
Managing PnP State Transistions
Using Remove Locks
Debugging Stop 0x9F - CodeMachine
Which PnP IRPs are State Changing? - Hole In My Head
A Performance Issue in Windows Timer Management?
Ain't Nuthin But A K(Timer) Thing, Baby
Power IRPs for Individual Devices
I/O Stack Locations

==================================================

The !pnptriage is an useful extension along with its data structure which is nt!_TRIAGE_9F_POWER.

0: kd> !pnptriage fffff80309c66960

********************************************************************************
Dumping PnP DeviceAction Queue @ 0xfffff8030793a3a0
********************************************************************************


Dumped 0 entries
PnP DeviceActionWorker active!


********************************************************************************
Dumping PnP DeviceEvent Queue @ 0xffffd7889b1ffd70
********************************************************************************

List = 0xffffd7889b1ffde8, 0xffffd7889b1ffde8
Event list is empty

********************************************************************************
Dumping PnP DeviceCompletion Queue @ 0xfffff8030793b740
********************************************************************************

0 Pnp operation(s) dispatched (IRP pending) currently.

Dumping pending asynchronous request list...

Dumping completed asynchronous request list...

********************************************************************************
Dumping devnodes with problems...
********************************************************************************

Dumping IopRootDeviceNode (= 0xffffd7889b1fdd30)

********************************************************************************
Dumping PnP locks...
********************************************************************************


Resource @ nt!PiEngineLock (0xfffff8030793b8e0)    Exclusively owned
    Contention Count = 109
     Threads: ffffd788a850e040-01<*> 
1 total locks, 1 locks currently held

Resource @ nt!IopDeviceTreeLock (0xfffff8030793b7e0)    Shared 1 owning threads
    Contention Count = 2
     Threads: ffffd788a850e040-01<*> 
1 total locks, 1 locks currently held

Resource @ nt!PnpRegistryDeviceResource (0xfffff8030793ba00)    Available
    Contention Count = 390
1 total locks

********************************************************************************
If one or more of above are NOT available, do !thread on the owner thread to find the thread hung in PnP
********************************************************************************


********************************************************************************
Dumping currently active PnP thread (if any)...
********************************************************************************

Please note that I have used a different dump file, hence the different address being provided with the extension.

0: kd> dt nt!_TRIAGE_9F_POWER fffff80309c66960
   +0x000 Signature        : 0x8000
   +0x002 Revision         : 1
   +0x008 IrpList          : 0xfffff803`0791fa30 _LIST_ENTRY [ 0xffffd788`9b080a70 - 0xffffd788`a7598920 ]
   +0x010 ThreadList       : 0xfffff803`0791e670 _LIST_ENTRY [ 0xffffaa81`7caa3bb8 - 0xffffaa81`866d0bb8 ]
   +0x018 DelayedWorkQueue : 0xfffff803`079db280 _TRIAGE_EX_WORK_QUEUE

猜你喜欢

转载自blog.csdn.net/lixiangminghate/article/details/86238266