Saturday, 21 June 2014

0x101 bugcheck analysis

Today we'll take a look at the 0x101 bugcheck error and what it means.

 BugCheck 101, {31, 0, fffff88003165180, 2}

So first of all what does all this mean?
Well this bugcheck indicates a clock interrupt was not received on a processor within the allocated time interval and therefore crashed the system.

Well a clock interrupt is a synchronization mechanism that lets the processors stay in sync to improve performance, when it's sent out all the processors have to respond within the allocated time interval which in this case is 31 clock ticks.

The third parameter is the address that contains processor information of the hung processor.
The fourth parameter is mostly the processor that was responsible for not responding.

0: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`06e53328 fffff800`03b2da4a : 00000000`00000101 00000000`00000031 00000000`00000000 fffff880`03165180 : nt!KeBugCheckEx
fffff880`06e53330 fffff800`03ae06f7 : 00000000`00000000 fffff800`00000002 00000000`00002710 fffff880`06e53450 : nt! ?? ::FNODOBFM::`string'+0x4e3e
fffff880`06e533c0 fffff800`03a22895 : fffff800`03a48460 fffff880`06e53570 fffff800`03a48460 00000000`00000000 : nt!KeUpdateSystemTime+0x377
fffff880`06e534c0 fffff800`03ad3113 : fffff800`03c51e80 00000000`00000001 00000000`00000001 fffff800`03a61000 : hal!HalpHpetClockInterrupt+0x8d
fffff880`06e534f0 fffff800`03aab937 : fffff800`03ae0aa5 00000000`000406f8 fffff880`03165180 fffff880`06e53710 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff880`06e534f0)
fffff880`06e53680 fffff800`03de1d4f : 00000000`00000000 fffff880`06e53b60 00000000`00000000 00000000`00000000 : nt!KeFlushProcessWriteBuffers+0x63
fffff880`06e536f0 fffff800`03de23ad : 00000000`0307b2d0 fffff800`03dcccce 00000000`00000000 00000000`00000286 : nt!ExpQuerySystemInformation+0x13af
fffff880`06e53aa0 fffff800`03ad5e53 : 00000000`00000000 fffff880`06e53b60 ffffffff`fffe7960 000007fe`f3a80b90 : nt!NtQuerySystemInformation+0x4d
fffff880`06e53ae0 00000000`7799161a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`06e53ae0)
00000000`02e1f5b8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7799161a

So the primary processor called the clock interrupt then bugchecked because the 2nd processor didn't respond.
Lets look at the registers at the time of the interrupt dispatch.

0: kd> .trap fffff880`06e534f0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000001 rbx=0000000000000000 rcx=0000000000000202
rdx=00000000000c00e1 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80003aab937 rsp=fffff88006e53680 rbp=0000000000000000
 r8=00000000000000e1  r9=0000000000000001 r10=0000000000000000
r11=fffff880031d7180 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
fffff800`03aab937 f390            pause
Here we see the the processor trying to flush the write buffers which happens when the CPU isn't synced properly to try and help it become available to execute handler instructions and reorder the buffer(s) to hopefully allow the execution of the necessary instructions. When it attempts to flush the Trnaslation Lookaside Buffers the processor can try and get through its work and sync with the rest of the cores before the system calls a bugcheck, which is what has happened here.
So this is where I'll do some unassembling.

0: kd> u @rip
fffff800`03aab937 f390            pause 
fffff800`03aab939 8b8380200000    mov     eax,dword ptr [rbx+2080h]
fffff800`03aab93f 3bc5            cmp     eax,ebp
fffff800`03aab941 75f4            jne     nt!KeFlushProcessWriteBuffers+0x63 (fffff800`03aab937
fffff800`03aab943 400fb6c6        movzx   eax,sil
fffff800`03aab947 440f22c0        mov     cr8,rax
fffff800`03aab94b 4c8d5c2460      lea     r11,[rsp+60h]
fffff800`03aab950 498b5b10        mov     rbx,qword ptr [r11+10h]
Lets unassemble the exact function.

0: kd> u fffff800`03aab937 fffff800`03aab943
fffff800`03aab937 f390            pause <-- Pause (CPU delay) probably waiting for the spinlock to be released
fffff800`03aab939 8b8380200000    mov     eax,dword ptr [rbx+2080h]
fffff800`03aab93f 3bc5            cmp     eax,ebp
fffff800`03aab941 75f4            jne     nt!KeFlushProcessWriteBuffers+0x63 (fffff800`03aab937) <-- Jump if not zero trying to stay in the loop
fffff800`03aab943 400fb6c6        movzx   eax,sil 
Now lets look at the other processors.
This processor is #1 which is actually the second one given that they actually start with #0.
1: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`0a5b3bb0 fffff800`03b0b3cd : 00000000`00000000 00000000`00000001 00000000`476ca000 00000000`476ca000 : nt!KxFlushEntireTb+0x93 <-- flushing the translation lookaside buffers.
fffff880`0a5b3bf0 fffff800`03b2f9b0 : 00000000`00000002 fffff680`0023b670 00000000`00000001 00000000`00015e7e : nt!KeFlushTb+0x119
fffff880`0a5b3c70 fffff800`03ae678f : fffff680`0023b670 fffff880`0a5b3d00 fffff700`01080000 fffffa80`0af873f8 : nt! ?? ::FNODOBFM::`string'+0xada2
fffff880`0a5b3cb0 fffff800`03af3bfe : 00000000`00000000 00000000`476ce000 fffff880`0a5b4010 fffff680`0023b670 : nt!MiResolveDemandZeroFault+0x1ff
fffff880`0a5b3da0 fffff800`03ae3179 : fffffa80`0baa6ab0 ffffffff`ffffffff fffff8a0`0c1fe740 00000000`00007000 : nt!MiDispatchFault+0x8ce
fffff880`0a5b3eb0 fffff800`03ad4cee : 00000000`00000000 00000000`476ce000 00000000`00000000 ffffffff`ffffffff : nt!MmAccessFault+0x359
fffff880`0a5b4010 fffff800`03af2798 : fffff8a0`0bebf960 fffffa80`0918f0f0 00000000`00000000 fffff800`03c0ae80 : nt!KiPageFault+0x16e (TrapFrame @ fffff880`0a5b4010)
fffff880`0a5b41a0 fffff880`0ff88ba1 : fffffa80`0ba99520 00000000`00000000 fffff880`00000002 00000000`476cc000 : nt!MmProbeAndLockPages+0x118
fffff880`0a5b42b0 fffff880`0ff87d0d : fffffa80`0918f1e0 fffff8a0`10969ad0 fffff880`4b677844 00000000`00007000 : dxgmms1!VIDMM_SEGMENT::SafeProbeAndLockPages+0x229
fffff880`0a5b4340 fffff880`0ff827d8 : fffff8a0`10969ad0 fffff8a0`10969ad0 fffff8a0`00000000 00000000`00000001 : dxgmms1!VIDMM_SEGMENT::LockAllocationBackingStore+0x8d
fffff880`0a5b43b0 fffff880`0ff76c2b : fffffa80`06ac0c80 fffffa80`095ec000 fffffa80`095ec000 fffff880`0ff76820 : dxgmms1!VIDMM_APERTURE_SEGMENT::CommitResource+0x1c4
fffff880`0a5b4400 fffff880`0ff7a59f : fffffa80`095ec000 fffff880`0a5b44d8 fffffa80`06ac0c00 00000000`00000000 : dxgmms1!VIDMM_GLOBAL::PageInAllocations+0xbb
fffff880`0a5b4460 fffff880`0ff74dd6 : fffff8a0`10969ad0 fffffa80`06ac0c80 fffff8a0`10969ad0 fffffa80`095ec000 : dxgmms1!VIDMM_GLOBAL::PageInOneAllocation+0x107
fffff880`0a5b44d0 fffff880`0ff744b3 : fffff880`0a5b4828 fffffa80`0b6a7010 fffff880`0a5b47a0 fffff880`0a5b4838 : dxgmms1!VIDMM_GLOBAL::ProcessDeferredCommand+0x3d2
fffff880`0a5b45f0 fffff880`0ff8e3ad : 00000000`00000001 fffff880`0ff600e0 fffffa80`095e0d50 fffffa80`095e0d50 : dxgmms1!VidMmiProcessSystemCommand+0x23
fffff880`0a5b4620 fffff880`0ff8d538 : fffff880`0a5b4780 00000000`00000000 fffffa80`095e0c00 00000000`00000001 : dxgmms1!VidSchiSubmitSystemCommand+0x39
fffff880`0a5b4650 fffff880`0ff5f786 : 00000000`00000000 fffffa80`095e0d50 fffffa80`095e0c00 fffffa80`095e0c00 : dxgmms1!VidSchiSubmitQueueCommand+0x74
fffff880`0a5b4680 fffff880`0ff8faa3 : fffffa80`095d4af0 fffffa80`095dd000 fffffa80`095e0c00 fffff8a0`10e11000 : dxgmms1!VidSchiSubmitQueueCommandDirect+0x1e6
fffff880`0a5b4710 fffff880`0ff745dd : fffffa80`00000001 fffffa80`09187410 fffffa80`095ec000 fffffa80`095ec000 : dxgmms1!VidSchiSubmitCommandPacketToQueue+0x15f
fffff880`0a5b4780 fffff880`0ff749f0 : fffff880`0a5b48b8 00000000`00000001 fffff880`0a5b48b8 00000000`00000001 : dxgmms1!VIDMM_GLOBAL::QueueSystemCommandAndWait+0xf9
fffff880`0a5b47f0 fffff880`0ff71cb2 : fffffa80`095ec000 fffffa80`06ac0c80 00000000`00000000 00000000`00040000 : dxgmms1!VIDMM_GLOBAL::QueueDeferredCommandAndWait+0x4c
fffff880`0a5b4860 fffff880`0ff57260 : fffffa80`095ec000 fffff8a0`10e12301 00000000`00000000 fffff880`02ee207f : dxgmms1!VIDMM_GLOBAL::BeginCPUAccess+0xcfa
fffff880`0a5b4930 fffff880`02f3f0e7 : 00000000`000ae260 fffffa80`095d5000 00000000`000ae260 fffff880`02ee107b : dxgmms1!VidMmBeginCPUAccess+0x28
fffff880`0a5b4980 fffff880`02f3f7ae : fffff8a0`0f3e34e0 fffff8a0`0f3e34e0 fffff880`0a5b4b60 00000000`00000000 : dxgkrnl!DXGDEVICE::Lock+0x287
fffff880`0a5b49e0 fffff960`002513a2 : 00000000`000ae260 00000000`40000101 00000000`000007db fffff8a0`10e11000 : dxgkrnl!DxgkLock+0x22a
fffff880`0a5b4ab0 fffff800`03ad5e53 : fffffa80`0a2d7750 fffff880`0a5b4b60 00000000`fffdb000 00000000`28d3c000 : win32k!NtGdiDdDDILock+0x12
fffff880`0a5b4ae0 00000000`7518156a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`0a5b4ae0)
00000000`000ae238 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7518156a

Here we see some display routines by the DirectX wrappers, it looks like they're allocating memory then locking it into memory using the MmProbeAndLockPages routine.

There's not much unassembling to be done in this processor so I'll not put the code in.

Now lets look at the problematic processor which is #2.

2: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
2: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up di pl nz na pe nc
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000
00000000`00000000 ??              ???
The callstack and registers are zeroed out which is never good, so for this we'll have to look at the raw stack. This is either due to the IRQL being too high or the processor was too hung to record the information.

2: kd> !irql
Debugger saved IRQL for processor 0x2 -- 0 (LOW_LEVEL)
The IRQL is at passive level so it's not the IRQL causing the problem but most likely being too hung to record the information.
fffff880`0318a978  fffff880`04285e6f*** ERROR: Module load completed but symbols could not be loaded for cfosspeed6.sys
fffff880`0318a980  fffff880`0318aa38
fffff880`0318a988  fffff880`042ca86d cfosspeed6+0x8986d
fffff880`0318a990  fffff880`0318aa30
fffff880`0318a998  fffff880`0318aa38
fffff880`0318a9a0  fffffa80`0b0c5580
fffff880`0318a9a8  fffff880`04284cb7 cfosspeed6+0x43cb7

The cfosspeed6.sys is an internet accelerator which I find to be pointless anyway.

fffff880`0318bae8  fffff880`02fd6061*** ERROR: Symbol file could not be found.  Defaulted to export symbols for VBoxNetFlt.sys - 
fffff880`0318baf0  fffffa80`06c6f830
fffff880`0318baf8  fffffa80`071efe70
fffff880`0318bb00  00000000`00000000
fffff880`0318bb08  fffffa80`071f1010
fffff880`0318bb10  fffffa80`070681a0
fffff880`0318bb18  fffff880`017b3a22 ndis!ndisMSendPacketCompleteToOpen+0x102
fffff880`0318bb20  00000000`00000000
fffff880`0318bb28  fffffa80`071efe70
fffff880`0318bb30  fffffa80`071f1010
fffff880`0318bb38  fffffa80`071efe70
fffff880`0318bb40  fffffa80`071a4010
fffff880`0318bb48  fffff880`05488848*** ERROR: Module load completed but symbols could not be loaded for tap0901.sys

The Virtual Box network driver and the TAP-Win32 Adapter V9 are being flagged as well.
It looks like the Internet accelerator is conflicting with NETBIOS (Network Basic Input/Output System)

Finally lets look at the last processor.

3: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`09ff6020 fffff800`03b0b3cd : 00000000`00000000 00000000`00000001 00000000`00000002 fffffa80`08da7c58 : nt!KxFlushEntireTb+0xcd
fffff880`09ff6060 fffff800`03b2f9b0 : 00000000`00000001 fffff680`0001e088 00000000`00000001 ffffffff`ffffffff : nt!KeFlushTb+0x119
fffff880`09ff60e0 fffff800`03ae389d : fffffa80`01e33c30 fffff880`09ff6180 fffffa80`0a3ec450 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0xada2
fffff880`09ff6120 fffff800`03ad4cee : 00000000`00000001 00000000`03c11000 00000000`00000000 00000000`00000001 : nt!MmAccessFault+0xa7d
fffff880`09ff6280 fffff960`0011ad9f : fffff960`0011ac60 fffff900`c264e670 00000000`00000000 00000000`03c10000 : nt!KiPageFault+0x16e (TrapFrame @ fffff880`09ff6280)
fffff880`09ff6410 fffff960`0011ab34 : fffff880`09ff6860 00000000`0000021d 00000000`03c10000 00000000`00000000 : win32k!vSolidFillRect1+0x13f
fffff880`09ff6450 fffff960`000f5d61 : 00000000`00000005 fffff880`09ff6860 fffff900`c2976870 fffff880`00000000 : win32k!vDIBSolidBlt+0x204
fffff880`09ff6630 fffff960`000df7d9 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff900`c264e670 : win32k!EngBitBlt+0x79d
fffff880`09ff6740 fffff960`000df3c0 : 00000000`00000000 fffff880`09ff6b60 fffff880`09ff69d0 fffff900`0000f0f0 : win32k!GrePatBltLockedDC+0x2f9
fffff880`09ff67f0 fffff960`002b9538 : fffff880`09ff69d0 00000000`fffdb000 00000000`0008ebc0 fffff960`00000000 : win32k!GrePolyPatBltInternal+0x2ec
fffff880`09ff6940 fffff800`03ad5e53 : 00000000`0008e308 00000000`00f00021 00000000`0008ebc0 00000000`00000001 : win32k!NtGdiPolyPatBlt+0x308 
fffff880`09ff6a70 00000000`751804ca : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`09ff6ae0)
00000000`0008e2e8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x751804ca

Some Win32k.sys routines, lets look at the trap frame for the page fault.

3: kd> .trap fffff880`09ff6280
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000003c10ff8 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000006 rsi=0000000000000000 rdi=0000000000000000
rip=fffff9600011ad9f rsp=fffff88009ff6410 rbp=00000000ffffffff
 r8=0000000000000016  r9=0000000003c10874 r10=0000000000000003
r11=000000000000021d r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz ac po cy
fffff960`0011ad9f ??              ???
3: kd> u @rip
fffff960`0011ad9f ??              ???
            ^ Memory access error in 'u @rip'
The information can't be recorded.

So with all this said, the cause looks pretty clear that some network related drivers are causing the issues.
This dump file was in my downloads so I can't recall the exact solution.