Sunday, 29 June 2014

Debugging 0x124

Sorry for not posting in a while I've been distracted by other means.
Anyway, lets get into this.
So what's a 0x124 bugcheck?

Well it means the CPU has raised the flag saying a fatal hardware error has occurred, this is noticed to windows via a standard messaging interface normally through a Machine Check Exception to notify of the error, Windows then bugchecks and stops the system in its tracks.

To sufficiently debug 0x124s we need to have multiple dump files to acquire enough evidence of the cause which is normally the CPU.

BugCheck 124, {0, fffffa80031f8028, b6472000, 1a000135}
So the first parameter which is null indicates the machine check exception which means the CPU has found the hardware error and has bugchecked, this is normally the case with 0x124s.
The second parameter is the address that contains the WHEA error record which should give us insight into the cause of the error.

3: kd> !errrec fffffa80031f8028
===============================================================================
Common Platform Error Record @ fffffa80031f8028
-------------------------------------------------------------------------------
Record Id     : 01cf93c95694ec87
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 6/29/2014 20:39:44 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ fffffa80031f80a8
Section       @ fffffa80031f8180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : Cache error
Operation     : Data Read
Flags         : 0x00
Level         : 1
CPU Version   : 0x0000000000100f53
Processor ID  : 0x0000000000000003

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ fffffa80031f80f0
Section       @ fffffa80031f8240
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x0000000000000003
CPU Id        : 53 0f 10 00 00 08 04 03 - 09 20 80 00 ff fb 8b 17
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ fffffa80031f8240

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa80031f8138
Section       @ fffffa80031f82c0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : DCACHEL1_DRD_ERR (Proc 3 Bank 0)
  Status      : 0xb64720001a000135
  Address     : 0x0000000013a87500
  Misc.       : 0x0000000000000000

So what does this mean?

Well the error you're looking at is a Level 1 data read cache error which means the CPU failed to retrieve data stored in the Level 1 cache.
This is normally the first sign of a bad CPU but one single dump file isn't enough to go on to fully determine the cause.
The other dump files (3 more) all indicate the same error on the same memory bank (0) on the same processor (3), this is enough to determine a bad CPU.

This can be caused by overclocking however which can cause a lot of problems, it can however be resolved if no permanent damage is caused.

3: kd> !sysinfo cpuinfo
[CPU Information]
~MHz = REG_DWORD 3206
Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0
Identifier = REG_SZ AMD64 Family 16 Model 5 Stepping 3
ProcessorNameString = REG_SZ AMD Phenom(tm) II X4 840 Processor
VendorIdentifier = REG_SZ AuthenticAMD
 The CPU isn't overclocked as this processor should be running at 3.2GHz which is 3200 MHz.

I would say its safe to say the CPU is bad but we can always remove the CMOs battery to remove any improper timings, and check for overheating.
If none of those help the CPU should be replaced.

I hope this has cleared some questions about these types of bugchecks.