Sunday, 6 July 2014

0xF4 debugging



This will be a short post, since I haven't managed to get hold of a Kernel memory dump that involves memory leakage. I have seen it happen, but I think that memory dump was from a while ago and I deleted it; I can't remember exactly.
BugCheck F4, {3, fffffa800bb94b30, fffffa800bb94e10, fffff800035e3270}
This bugcheck (CRITICAL_OBJECT_TERMINATION) indicates that a process or thread which is critical to the system's operation has terminated, so the system has to crash.

The first parameter, 3, indicates that the terminated object is a process, so we can use the !process command on the second parameter, which is the address of that process.
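To show how the four arguments line up, here is a minimal Python sketch that labels them. It assumes the parameter meanings from Microsoft's bug check reference for 0xF4 (1: object type, 3 for a process and 6 for a thread; 2: the terminating object; 3: the process image file name; 4: a pointer to an explanatory message). The function name decode_f4 is made up for this example.

```python
import re

# Parameter 1 object types for bugcheck 0xF4 (CRITICAL_OBJECT_TERMINATION).
OBJECT_TYPES = {0x3: "process", 0x6: "thread"}

def decode_f4(line: str) -> dict:
    """Label the four arguments of a 'BugCheck F4, {a, b, c, d}' line."""
    m = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line.strip())
    if m is None or int(m.group(1), 16) != 0xF4:
        raise ValueError("not a 0xF4 bugcheck line")
    args = [int(a, 16) for a in m.group(2).split(",")]
    if len(args) != 4:
        raise ValueError("expected four bugcheck arguments")
    return {
        "object_type": OBJECT_TYPES.get(args[0], f"unknown (0x{args[0]:x})"),
        "object_address": hex(args[1]),  # feed this to !process (or !thread)
        "image_name": hex(args[2]),      # address of the process image file name
        "message": hex(args[3]),         # pointer to an explanatory ASCII string
    }

info = decode_f4("BugCheck F4, {3, fffffa800bb94b30, fffffa800bb94e10, fffff800035e3270}")
print(f"!{info['object_type']} {info['object_address']}")
```

For this dump it prints the same !process command used below, just with a 0x prefix on the address.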

2: kd> !process fffffa800bb94b30

GetPointerFromAddress: unable to read from fffff80003515000
PROCESS fffffa800bb94b30
    SessionId: none  Cid: 0174    Peb: 7fffffda000  ParentCid: 0154
    DirBase: 321389000  ObjectTable: fffff8a00b4f9840  HandleCount: <Data Not Accessible>
    Image: csrss.exe
The process that crashed is csrss.exe (Client/Server Runtime Subsystem), which is the Windows subsystem. Although Windows was designed to support multiple subsystems, having each subsystem call out to perform functions such as display I/O would duplicate functions and inevitably reduce performance, so the designers put a lot of basic functions into this primary subsystem. As a result, the Windows subsystem (implemented in csrss.exe) is marked as a critical process, even on servers where display I/O isn't needed, and if it exits for any reason the system must bugcheck.


This bugcheck is mainly caused by disk I/O errors. So what is a disk I/O error?

When a drive cannot perform basic operations such as reads and writes, Windows cannot carry out its basic routines, so the system fails and crashes. This is usually caused by a failing disk.
EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.

X64_0xF4_IOERR_IMAGE_csrss.exe
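The exception code can also be pulled apart by hand. 0xC0000006 is STATUS_IN_PAGE_ERROR, and the sketch below, a hypothetical Python helper (decode_ntstatus is my own name for it), splits any NTSTATUS into the standard severity, customer, facility and code fields. The top two bits both being set is what marks it as an error.

```python
def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severities = {0: "success", 1: "informational", 2: "warning", 3: "error"}
    return {
        "severity": severities[(status >> 30) & 0x3],  # bits 31-30
        "customer": bool((status >> 29) & 0x1),        # bit 29: 0 = Microsoft-defined
        "facility": (status >> 16) & 0xFFF,            # bits 27-16
        "code": status & 0xFFFF,                       # bits 15-0
    }

fields = decode_ntstatus(0xC0000006)
print(fields)  # {'severity': 'error', 'customer': False, 'facility': 0, 'code': 6}
```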
Secondly, severe memory leakage can cause this problem, as it can drain the system's resources (normally the nonpaged memory pool) until the system can no longer function and crashes.

Leaks are caused by programs not freeing their memory after they've finished using it: the memory is no longer in use by the application, but it can't be used by anything else either, because it hasn't been freed.
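To make the idea concrete, here's a toy Python model, not how the Windows pool allocator actually works. Each allocation is charged to a four-character tag (the way pool allocations are); one hypothetical caller ("Good") frees its memory and the other ("Leak") never does. The leaking tag keeps growing while the well-behaved one drops back to zero, which is the same pattern you'd look for in Poolmon or !poolused.

```python
from collections import defaultdict

class ToyPool:
    """Tracks outstanding bytes per tag; a simplification for illustration."""
    def __init__(self):
        self.outstanding = defaultdict(int)  # tag -> bytes still allocated
        self.live = {}                       # handle -> (tag, size)
        self._next = 0

    def alloc(self, tag: str, size: int) -> int:
        self._next += 1
        self.live[self._next] = (tag, size)
        self.outstanding[tag] += size
        return self._next

    def free(self, handle: int) -> None:
        tag, size = self.live.pop(handle)
        self.outstanding[tag] -= size

pool = ToyPool()
for _ in range(1000):
    h = pool.alloc("Good", 256)
    pool.free(h)               # well-behaved: frees after use
    pool.alloc("Leak", 256)    # leaky: the handle is dropped and never freed

# Sort tags by bytes still allocated, largest first.
print(sorted(pool.outstanding.items(), key=lambda kv: kv[1], reverse=True))
```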


To determine whether or not you have a memory leak, you can use several programs; Pool Monitor (Poolmon.exe) is one of them.

It breaks down the pool memory used on the system by pool tag, and you can sort the list in different ways, for example by paged or nonpaged pool usage.

For more information on Pool Monitor, look here:


How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks


Another way is to use a Kernel Debugger, which is my personal favourite. You will need Kernel memory dumps to find pool leaks.

You can start by using the !poolused 2 command.


I'll show an example using a 0xF4 Kernel dump file I found, although this crash wasn't caused by a memory leak.


(This was due to a disk I/O failure.)

EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.
Using the !vm command, we can look at virtual memory usage at the time of the crash.

1: kd> !vm

*** Virtual Memory Usage ***
    Physical Memory:     4175860 (  16703440 Kb)
    Page File: \??\C:\pagefile.sys
      Current:  16703440 Kb  Free Space:  16703436 Kb
      Minimum:  16703440 Kb  Maximum:     50110320 Kb
    Available Pages:     3833632 (  15334528 Kb)
    ResAvail Pages:      4052929 (  16211716 Kb)
    Locked IO Pages:           0 (         0 Kb)
    Free System PTEs:   33497223 ( 133988892 Kb)
    Modified Pages:       137120 (    548480 Kb)
    Modified PF Pages:     37803 (    151212 Kb)
    NonPagedPool Usage:    13122 (     52488 Kb)
    NonPagedPool Max:    3116636 (  12466544 Kb)
    PagedPool 0 Usage:     27172 (    108688 Kb)
    PagedPool 1 Usage:      5597 (     22388 Kb)
    PagedPool 2 Usage:         0 (         0 Kb)
    PagedPool 3 Usage:         0 (         0 Kb)
    PagedPool 4 Usage:        56 (       224 Kb)
    PagedPool Usage:       32825 (    131300 Kb)
    PagedPool Maximum:  33554432 ( 134217728 Kb)


Although we can see the paged pool usage here, it isn't normally the cause of crashes due to memory leaks, as paged pool can be paged out to disk. It's nonpaged pool leaks, usually caused by device drivers, that cause these issues.
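Because !vm reports page counts, the Kb figures in brackets are just the page count times 4 on this x64 system. The Python sketch below assumes 4 KB pages (which matches every pair of numbers above) and compares pool usage with the pool maximum. With the values copied from this dump, nonpaged pool is at well under 1% of its maximum, so there's no sign of pool exhaustion here.

```python
PAGE_KB = 4  # assumption: 4 KB pages, consistent with the !vm figures above

# Page counts copied from the !vm output of this dump.
vm = {
    "NonPagedPool Usage": 13122,
    "NonPagedPool Max": 3116636,
    "PagedPool Usage": 32825,
    "PagedPool Maximum": 33554432,
}

def pct(used: int, maximum: int) -> float:
    """Percentage of the pool maximum that is in use."""
    return 100.0 * used / maximum

for name, pages in vm.items():
    print(f"{name}: {pages} pages = {pages * PAGE_KB} Kb")
print(f"Nonpaged pool in use: {pct(vm['NonPagedPool Usage'], vm['NonPagedPool Max']):.2f}%")
print(f"Paged pool in use: {pct(vm['PagedPool Usage'], vm['PagedPool Maximum']):.2f}%")
```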
Let's look at which pool tags are using the nonpaged pool.
Note that the list is very long and is sorted by the amount of memory used, so only the top few lines are of interest.

The 2 flag sorts the output by nonpaged pool usage; 4 would sort it by paged pool usage.

1: kd> !poolused 2

....

Sorting by NonPaged Pool Consumed

               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used

 VfPT         1      8388608          0            0    Verifier Allocate/Free Pool stack traces , Binary: nt!Vf
 XENO        30      2955056          0            0    UNKNOWN pooltag 'XENO', please update pooltag.txt
 Obtd         1      2625536          0            0    UNKNOWN pooltag 'Obtd', please update pooltag.txt
 NVRM      3064      2461228          1       528384    UNKNOWN pooltag 'NVRM', please update pooltag.txt
 4KBS       564      2319168          0            0    UNKNOWN pooltag '4KBS', please update pooltag.txt
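If you want to rank the tags yourself, for example after copying the output into a text file, a small parser is enough. This is a minimal Python sketch that assumes each row has the layout shown above (tag, nonpaged allocs, nonpaged used, paged allocs, paged used, then a description) and skips anything else, such as the header lines; parse_poolused is a hypothetical name.

```python
ROWS = """\
Sorting by NonPaged Pool Consumed
 Tag     Allocs         Used     Allocs         Used
 VfPT         1      8388608          0            0    Verifier Allocate/Free Pool stack traces , Binary: nt!Vf
 XENO        30      2955056          0            0    UNKNOWN pooltag 'XENO', please update pooltag.txt
 Obtd         1      2625536          0            0    UNKNOWN pooltag 'Obtd', please update pooltag.txt
 NVRM      3064      2461228          1       528384    UNKNOWN pooltag 'NVRM', please update pooltag.txt
 4KBS       564      2319168          0            0    UNKNOWN pooltag '4KBS', please update pooltag.txt
"""

def parse_poolused(text: str) -> list:
    """Return one dict per tag row; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(maxsplit=5)  # the description may contain spaces
        if len(parts) < 5 or not all(p.isdigit() for p in parts[1:5]):
            continue
        tag, np_allocs, np_used, p_allocs, p_used = parts[:5]
        rows.append({
            "tag": tag,
            "nonpaged_allocs": int(np_allocs),
            "nonpaged_bytes": int(np_used),
            "paged_allocs": int(p_allocs),
            "paged_bytes": int(p_used),
        })
    return rows

rows = sorted(parse_poolused(ROWS), key=lambda r: r["nonpaged_bytes"], reverse=True)
total = sum(r["nonpaged_bytes"] for r in rows)
for r in rows:
    share = 100.0 * r["nonpaged_bytes"] / total
    print(f"{r['tag']}: {r['nonpaged_bytes'] / 1024:.0f} KB ({share:.1f}% of the rows listed)")
```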


The only pool usage that stands out here is VfPT, which belongs to Driver Verifier: when Verifier is running it keeps separate pool allocations (here, allocate/free stack traces) to monitor the drivers it is verifying.

We can confirm this by running the !verifier command.
1: kd> !verifier

Verify Flags Level 0x00000dbb

  STANDARD FLAGS:
    [X] (0x00000000) Automatic Checks
    [X] (0x00000001) Special pool
    [X] (0x00000002) Force IRQL checking
    [X] (0x00000008) Pool tracking
    [X] (0x00000010) I/O verification
    [X] (0x00000020) Deadlock detection
    [X] (0x00000080) DMA checking
    [X] (0x00000100) Security checks
    [X] (0x00000800) Miscellaneous checks

  ADDITIONAL FLAGS:
    [ ] (0x00000004) Randomized low resources simulation
    [ ] (0x00000200) Force pending I/O requests
    [X] (0x00000400) IRP logging

  [X] Indicates flag is enabled
We can see that every option is enabled apart from Force pending I/O requests and Randomized low resources simulation. These two are left off because they create unrealistic conditions that can make drivers crash when in reality the drivers might never crash at all, which produces false positive reports.
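The Verify Flags Level is simply these flag values ORed together, so it can be decoded by hand. This Python sketch uses only the flag values printed by !verifier above (Automatic Checks has no bit and is always on, and any bit missing from the table is ignored). Decoding 0xdbb gives back exactly the two disabled options.

```python
# Flag values as printed by !verifier in this dump.
FLAGS = {
    0x001: "Special pool",
    0x002: "Force IRQL checking",
    0x004: "Randomized low resources simulation",
    0x008: "Pool tracking",
    0x010: "I/O verification",
    0x020: "Deadlock detection",
    0x080: "DMA checking",
    0x100: "Security checks",
    0x200: "Force pending I/O requests",
    0x400: "IRP logging",
    0x800: "Miscellaneous checks",
}

def decode_verifier_level(level: int) -> tuple:
    """Split a flags level into (enabled, disabled) option names."""
    enabled = [name for bit, name in FLAGS.items() if level & bit]
    disabled = [name for bit, name in FLAGS.items() if not level & bit]
    return enabled, disabled

enabled, disabled = decode_verifier_level(0x00000DBB)
print("Enabled:", ", ".join(enabled))
print("Disabled:", ", ".join(disabled))
```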


For more information on Driver Verifier options, look here:


Driver Verifier Options (Windows Drivers)


Lastly, if pool usage is high enough to cause system crashes, we can take a look at the IRPs (I/O request packets) in use, since IRPs are allocated from nonpaged pool and a driver that keeps issuing them can use up that memory.


We can do this with the !irpfind command. Unfortunately, the IRPs don't seem to be saved in this dump file for some reason, which I've never seen before.


A great example of this bugcheck can be found here.

This was originally recommended by my friend Vir Gnarus.


As for this crash, I'll see what happens and whether replacing the disk drive solves the issue.


I forgot to mention the last way of tracking pool usage: Driver Verifier's Pool Tracking option monitors all the selected drivers and checks whether they have freed their allocations when they unload. If a driver hasn't, the system crashes with a 0xC4 bugcheck, hopefully catching the culprit.
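Here's a toy Python model of that idea, not the real Verifier implementation: allocations are recorded per driver, and unloading a driver that still owns allocations raises a simulated 0xC4 bugcheck. The driver names tidy.sys and leaky.sys and the BugCheck class are made up for illustration.

```python
class BugCheck(Exception):
    """Stands in for a bugcheck in this toy model."""
    def __init__(self, code: int, driver: str, leaked: int):
        super().__init__(f"BugCheck {code:X}: {driver} unloaded with {leaked} bytes still allocated")
        self.code = code
        self.driver = driver
        self.leaked = leaked

class TrackedPool:
    """Records every allocation against the driver that made it."""
    def __init__(self):
        self.by_driver = {}  # driver name -> {handle: size}
        self._next = 0

    def alloc(self, driver: str, size: int) -> int:
        self._next += 1
        self.by_driver.setdefault(driver, {})[self._next] = size
        return self._next

    def free(self, driver: str, handle: int) -> None:
        del self.by_driver[driver][handle]

    def unload(self, driver: str) -> None:
        # On unload, anything the driver still owns counts as leaked.
        leaked = sum(self.by_driver.pop(driver, {}).values())
        if leaked:
            raise BugCheck(0xC4, driver, leaked)

pool = TrackedPool()
h = pool.alloc("tidy.sys", 64)
pool.free("tidy.sys", h)
pool.unload("tidy.sys")  # everything was freed, so it unloads cleanly

caught = None
pool.alloc("leaky.sys", 128)
try:
    pool.unload("leaky.sys")  # 128 bytes were never freed
except BugCheck as bc:
    caught = bc
    print(bc)
```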

Pool Tracking is not to be confused with Special Pool, where Driver Verifier allocates driver memory from a special pool that can be monitored for incorrect usage. For example, if a driver tries to access memory that it has already freed, the system crashes straight away instead of letting the driver silently use memory that may already have been handed to someone else.


If a driver allocates 100 bytes but writes 110 bytes, the extra bytes spill into another driver's pool header. The corruption is only noticed later, so a different driver can get blamed long after the culprit has left the scene: when the police come to investigate the crime scene, the wrong person is locked up.
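The sketch below is a toy Python model of the 100 versus 110 byte case, with a simplified 8-byte header that only stores a tag (the real pool header holds more than that, and these tags are hypothetical). Nothing stops driver A's write, and the damage only shows up when something later checks driver B's header, which is why the crash gets pinned on the wrong driver.

```python
HEADER = 8   # simplified header size for this toy model
BLOCK = 100  # each driver asked for 100 bytes

# Two allocations back to back: [A header][A data][B header][B data]
memory = bytearray(2 * (HEADER + BLOCK))
A_DATA = HEADER
B_HEADER = HEADER + BLOCK
memory[0:4] = b"DrvA"
memory[B_HEADER:B_HEADER + 4] = b"DrvB"

def write(offset: int, data: bytes) -> None:
    # Like raw kernel memory, nothing checks that the write stays inside the block.
    memory[offset:offset + len(data)] = data

def header_intact(offset: int, tag: bytes) -> bool:
    # The kind of check that only happens later, e.g. when block B is freed.
    return bytes(memory[offset:offset + 4]) == tag

write(A_DATA, b"A" * 110)                 # driver A overruns its 100-byte block by 10
print(header_intact(0, b"DrvA"))          # A's own header is untouched
print(header_intact(B_HEADER, b"DrvB"))   # B's header has been overwritten
```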

Special Pool changes how things are set up. When a driver allocates memory, the allocation is placed next to a guard page, and the unused bytes around the buffer are filled with a known pattern (slop bytes). If the driver tries to write more than it allocated into the guard page, the system immediately bugchecks. When the memory is later freed, the slop bytes are checked, and if they have been modified the system bugchecks and blames that driver.
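Here's a simplified Python model of that layout, assuming a 4 KB page, a hypothetical 0xAA fill pattern and no alignment padding (the real Special Pool differs in these details). An overrun is caught at the moment of the write, while an underrun into the slop bytes is only caught when the block is freed.

```python
PAGE = 4096  # assumed page size
SLOP = 0xAA  # hypothetical fill pattern for this sketch

class SpecialPoolBlock:
    """One allocation on its own page, ending right before a guard page."""
    def __init__(self, size: int):
        self.start = PAGE - size              # buffer ends at the end of the page
        self.page = bytearray([SLOP]) * PAGE  # unused bytes hold the slop pattern

    def write(self, offset: int, data: bytes) -> None:
        pos = self.start + offset
        if pos < 0:
            raise MemoryError("write before the start of the page")
        if pos + len(data) > PAGE:
            # Crossing the page end means touching the guard page: caught at once.
            raise MemoryError("guard page hit: immediate bugcheck")
        self.page[pos:pos + len(data)] = data

    def free(self) -> None:
        # Slop bytes are only verified when the block is released.
        if any(b != SLOP for b in self.page[:self.start]):
            raise MemoryError("slop bytes modified: bugcheck on free")

good = SpecialPoolBlock(100)
good.write(0, b"x" * 100)
good.free()  # pattern intact, nothing happens

results = []
over = SpecialPoolBlock(100)
try:
    over.write(0, b"x" * 110)  # the 100 vs 110 byte overrun
except MemoryError as err:
    results.append(str(err))

under = SpecialPoolBlock(100)
under.write(-4, b"zzzz")  # underrun into the slop bytes: not noticed yet
try:
    under.free()
except MemoryError as err:
    results.append(str(err))

print(results)
```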


For more information on Special Pool, visit the previous link.