Debugging Bug Check 0x9F: DRIVER_POWER_STATE_FAILURE

In Windows Vista and later version of Windows, when the system changes power states i.e. transitions from a working state (S0) to a lower power state like standby (S3), hibernate (S4) or shutdown (S5) or vice versa, every driver in the system is notified about the transition via a system power IRP (S-IRP). In response to these system power state change requests, drivers power down their devices by requesting device power IRPs (D-IRP) and then sending them down to the underlying bus driver. For every power IRP that is sent to a driver, the power manager starts a watchdog timer that fires if the IRP is not completed within 10 minutes. The resultant timer DPC routine that runs at the end of this period crashes the system with the following bug-check code and parameters as displayed by the kernel debugger command "!analyze -v".

DRIVER_POWER_STATE_FAILURE (9f)
A driver is causing an inconsistent power state.
Arguments:
Arg1: 00000003, A device object has been blocking an Irp for too long a time
Arg2: 86b3e6e8, Physical Device Object of the stack
Arg3: 82d3fae0, Functional Device Object of the stack
Arg4: 852bc6f0, The blocked IRP

In the above output the description displayed for Arg1 alludes to the situation that led to the crash. The kernel debugger command "!poaction" displays information about the power state transition in progress. The output shown below indicates that system is going to sleep i.e. S3 power state.

0: kd> !poaction
PopAction: 82d50a60
  State..........: 3 - Set System State
  Updates........: 0 
  Action.........: Sleep
  Lightest State.: Sleeping3
  Flags..........: 80000004 OverrideApps|Critical
  Irp minor......: SetPower
  System State...: Sleeping3
  Hiber Context..: 00000000

Power Watchdog Timer

Although the crashing thread stack below shows the sequence of calls that led to the call to KeBugCheckEx(), it fails to identify the component that is responsible for the crash. Finding that culprit will require some more work beyond just running "!analyze -v"

nt!KeBugCheckEx
nt!PopCheckIrpWatchdog
nt!PopCheckForIdleness
nt!KiProcessTimerDpcTable
nt!KiProcessExpiredTimerList
nt!KiTimerExpiration
nt!KiRetireDpcList
nt!KiIdleLoop

The above sequence of calls is trigged by timer expiration as identified by the presence of KiProcessExpiredTimerList(). The kernel debugger command "!timer" lists all the pending timers in the system, one of which is the timer that has the DPC routine PopCheckForIdleness() associated with it, as shown below :

kd> !timer
.
.
.
83154a40 P 4f9aa283 00000014 [ 2/18/2011 13:47:08.239]  nt!PopCheckForIdleness (DPC @ 83154a20)

The DPC routine PopCheckForIdleness() fires periodically at an interval of 100 sec. It calls PopCheckIrpWatchdog() to scan the list of IRPs at nt!PopIrpList and increments a watchdog counter. Every time the DPC runs, it displays a debug message in the kernel debugger similar to the one below to warn about the impending bug-check:

Power Irp Watchdog: warning for PDO=86b3e6e8 Current=82d3fae0 IRP=852bc6f0 (2) status c00000bb
Power system watchdog warning (2 ticks)

When the count of timer expirations exceed the threshold stored in PopWatchdogMaxTicks (shown below), PopCheckIrpWatchdog() calls KeBugCheckEx(). So effectively, if a power IRP is not completed within 10 minutes from the time the IRP was sent to a driver, the system will bug-check with DRIVER_POWER_STATE_FAILURE.

0: kd> ? poi(nt!PopWatchdogMaxTicks)
Evaluate expression: 41 = 00000029

The kernel debugger output below displays details about the stuck power IRP that is responsible for the crash. Note that the pointer to this IRP is in Arg4 of the bug-check parameters.

0: kd> !irp 852bc6f0
Irp is active with 4 stacks 3 is current (= 0x852bc7a8)
 No Mdl: No System Buffer: Thread 00000000:  Irp stack trace.  
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
>[ 16, 2]   0 e1 86359028 00000000 82f39f58-866040b0 Success Error Cancel pending
	       \Driver\vwifimp	nt!PopSystemIrpCompletion
			Args: 00014400 00000000 00000004 00000002
 [  0, 0]   0  0 00000000 00000000 00000000-866040b0    

			Args: 00000000 00000000 00000000 00000000

Component Identification

The device objects identified by the bug-check parameters Arg2 (Physical Device Object of the stack) and Arg3 (Functional Device Object of the stack), typically point to the lowest level device objects in the device stack. These device objects are owned by the bus and function drivers, respectively, in the device stack. This can be seen in the output of the kernel debugger command "!devstack" executed on the device object in Arg2 of the bug-check parameters.

0: kd> !devstack 86b3e6e8
  !DevObj   !DrvObj            !DevExt   ObjectName
  86359028  \Driver\vwifimp    863590e0  NDMP19
> 86b3e6e8  \Driver\vwifibus   86b3eff0  00000088
!DevNode 863996a0 :
  DeviceInst is "{5d624f94-8850-40c3-a3fa-a4fd2080baf3}\vwifimp\5&e3fd124&0&01"
  ServiceName is "vwifimp"

Although the bug-check parameters seem to imply that the bus and the function drivers are the culprits, more often than not their involvement in the crash is limited to the fact that they are the only drivers in the stack that actually process power IRPs and hence get blamed for any power IRP related crashes.

As an example, consider the networking sub-system in Windows - the components that drive the Network Interface Card (NIC) is the NDIS Miniport driver and the NDIS library (NDIS.sys). Since these two drivers are the power policy owners of the network stack, they are responsible for handing all of the S-IRPs and D-IRPs dispatched by the Power Manager. However, the drivers that are actually in the network data path i.e. the ones that process the networks data packets, comprise of NDIS Lightweight Filter Drivers, NDIS Intermediate Drivers, NDIS Protocol Drivers, TDI Filter Drivers, TDI Clients, WFP Filters, WSK Clients as well as the kernel mode socket driver (AFD.sys).

If any of these drivers block a network data packet for any reason, the adapter unbinding process that occurs when the system transitions from S0 to SX power state will not be able to complete. This in turn would result in power IRPs getting blocked at the miniport/NDIS level.

A key step in tracking down the root cause of power watchdog related bug-checks is to identify threads that are blocked on synchronization objects (i.e. events, semaphores, mutexes, push locks etc.) and would be signaled either when an outstanding operation completes or when the reference count on some data structure drops down to zero. These threads could be blocked in the hardware driver (function driver) or in higher level drivers that send requests to drivers below.

In order to identify threads that meet the above criteria, it is necessary to manually examine the kernel mode call stacks of all threads in the system, especially the threads in the system process that dispatch power requests. The kernel debugger commands "!stacks 2" and "!process 0 7" display the information required to identify such stacks.

The following example shows a thread stuck in the network stack. In this specific case, the NDIS function ndisUnbindProtocol() is requesting the NDIS intermediate driver (imdriver.sys) to tear down its binding to the underlying adapter. The NDIS intermediate driver's protocol unbind adapter entry point (i.e. ProtocolUnbindAdapter()) seems to be blocked on an event.

0: kd> !stacks 2
Proc.Thread  .Thread  Ticks   ThreadState Blocker
                            [84c3aab8 System]
.
.
.
   4.000014  84ca7850 0009811 Blocked   nt!KiSwapContext+0x26
                                        nt!KiSwapThread+0x266
                                        nt!KiCommitThreadWait+0x1df
                                        nt!KeWaitForSingleObject+0x393
                                        ndis!NdisWaitEvent+0x55
                                        imdriver+0x98ddc
                                        ndis!ndisUnbindPrototol+0x251
                                        ndis!ndisPnPNotifyAllTransports+0x1f6
                                        ndis!ndisSetPower+0x9aa
                                        ndis!ndisPowerDispatch+0x1ce
                                        nt!PopIrpWorker+0x351
                                        nt!PspSystemThreadStartup+0x9e
                                        nt!KiThreadStartup+0x19

To ascertain that the right thread has been identified, the thread's wait time can be observed, keeping in mind that the power IRP watchdog would allow the IRP about 10 minutes to complete from the time of dispatch, so such threads should be blocked for at least 10 minutes.

0: kd> !thread 84ca7850
THREAD 84ca7850  Cid 0004.0014  Teb: 00000000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
    863ad89c  NotificationEvent
Not impersonating
DeviceMap                 89a08ae0
Owning Process            84c3aab8       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      71276          Ticks: 38929 (0:00:10:07.296)
Context Switch Count      148            IdealProcessor: 0             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address nt!PopIrpWorker (0x82c244f2)
Stack Init 807c1fd0 Current 807c19c8 Base 807c2000 Limit 807bf000 Call 0
Priority 13 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

The above thread prevents NDIS from tearing down binding for one of the NICs in the system causing it to block the power IRP completion. This, in turn, causes the power watch dog timer to bug-check the system. In this particular case, the NIC in question is a Virtual WiFi adapter driven by vwifimp.sys.

Further investigation is required to understand the set of conditions under which imdriver.sys is blocking the unbind operation. This typically requires access to the private symbols and source code of the driver in question as well as a good understanding of the driver's architecture.

Sample Stack Traces

The stack depicted above is, by no means, the only stack signature that could lead to blocked power IRPs. There could be a myriad of such conditions involving pretty much any driver in the system. This section lists similar thread stacks from systems that have bug-checked due to power IRP watchdog timeouts. Although most of the examples shown below are from the networking subsystem on Windows, similar issues can occur in any subsystem in the kernel.

NDIS miniport driver (driver.sys) blocking on a synchronization object in the MiniportHaltAdapter() callback.

nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
nt!KiSwapThread+0x266
nt!KiCommitThreadWait+0x1df
nt!KeWaitForSingleObject+0x393
driver+0xf0a9
driver+0x120dc
driver+0xb096
ndis!ndisMCommonHaltMiniport+0x54f (FPO: [Non-Fpo])
ndis!ndisPmHaltMiniport+0xfc (FPO: [Non-Fpo])
ndis!ndisSetPower+0x63a (FPO: [Non-Fpo])
ndis!ndisPowerDispatch+0x1ce (FPO: [Non-Fpo])
nt!PopIrpWorker+0x351
nt!PspSystemThreadStartup+0x9e
nt!KiThreadStartup+0x19

NDIS protocol driver (tcpip.sys) waiting for the reference count in a Rundown Protection Lock to drop to zero. Kernel mode components use Rundown Protection Locks to ensure that data structures protected by the lock are kept around for as long as there are references on the structure implying that the structure is in use by one or more threads in system.

nt!KiSwapContext
nt!KiSwapThread
nt!KiCommitThreadWait
nt!KeWaitForSingleObject
nt!ExWaitForRundownProtectionReleaseCacheAware
tcpip!FlUnbindAdapter
ndis!ndisUnbindProtocol
ndis!ndisCloseMiniportBindings
ndis!ndisPnPRemoveDevice
ndis!NdisIMDeInitializeDeviceInstance
driver!XxUnbindAdapter
ndis!ndisUnbindProtocol
ndis!ndisPnPNotifyAllTransports
ndis!ndisSetPower
ndis!ndisPowerDispatch
nt!PopIrpWorker
nt!PspSystemThreadStartup
nt!KiThreadStartup

NDIS waiting for a light weight filter driver to unbind from an underlying adapter.

nt!KiSwapContext
nt!KiSwapThread
nt!KeWaitForSingleObject
ndisPnPNotifyBinding
ndis!ndisPnPNotifyAllTransports
ndis!NdisFNetPnPEvent
ndis!ndisDevicePnPEventNotifyFiltersAndAllTransports
ndis!ndisQueryPower
ndis!ndisPowerDispatch
nt!PopIrpWorker
nt!PspSystemThreadStartup
nt!KiStartSystemThread

TDX.sys (TDI Compatibility Driver) waiting for a reference count on the "Transport Address" to drop to zero. The call to tdx!TdxDeactivateTransportAddress() is preventing power IRPs from being completed in the network stack.

nt!KiSwapContext
nt!KiSwapThread
nt!KeWaitForSingleObject
tdx!TdxDeactivateTransportAddress
tdx!TdxDeleteTransportAddress
tdx!TdxTdiDispatchCleanup
nt!IopCloseFile
nt!ObpDecrementHandleCount
nt!ObpCloseHandleTableEntry
nt!ObpCloseHandle
nt!KiSystemServiceCopyEnd
nt!KiServiceLinkage
netbt!DelayedNbtCloseFileHandles
netbt!CloseAddressesWithTransport
netbt!NbtSetNewAddress
netbt!TdiAddressNotification
TDI!TdiNotifyPnpClientList
TDI!TdiExecuteRequest
nt!ExpWorkerThread
nt!PspSystemThreadStartup
nt!KiStartSystemThread

AFD.sys (Windows Kernel Socket Driver) waiting for a file handle to be closed. The call to IopCloseFile() blocks, causing the power IRPs to be blocked in the network stack.

nt!KiSwapContext+0x26
nt!KiSwapThread+0x44f
nt!KeWaitForSingleObject+0x492
nt!IopCloseFile+0x39f
nt!ObpDecrementHandleCount+0x146
nt!ObpCloseHandleTableEntry+0x234
nt!ObpCloseHandle+0x73
nt!NtClose+0x20
nt!KiFastCallEntry+0x12a
nt!ZwClose+0x11
afd!AfdFreeConnectionResources+0x33
afd!AfdFreeConnectionEx+0xbf
afd!AfdFreeConnection+0x1d
afd!AfdDoWork+0x51
nt!IopProcessWorkItem+0x23
nt!ExpWorkerThread+0xfd
nt!PspSystemThreadStartup+0x9d
nt!KiThreadStartup+0x16

Upon further analysis, most of the issues depicted in the above call stacks were tracked down to the fact that one more Net Buffer Lists (NBLs) were stuck in some queue causing the driver that owns the queue to wait indefinitely for the NBLs to complete. The kernel debugger command "!ndiskd.pendingnbls" can prove to be useful in locating such pending/leaked NBLs. This command only works on Win7/Win2008R2 and later versions of Windows.

Lastly, the variables and function names referred to above have changed in Windows 8 and Windows Server 2012.

发布了233 篇原创文章 · 获赞 106 · 访问量 56万+

猜你喜欢

转载自blog.csdn.net/lixiangminghate/article/details/90732885