Monday, 25 August 2014

Debugging and my story

I thought I'd write a blog post about myself and how I started debugging, as I haven't posted in a long time. The reason is that I'm actually on holiday in Greece, so I can't write much about debugging as I don't have access to my computer.

So here's my story on how I got to where I am today...

Before time began... Wait, that's not right.
Okay, I first got my own computer in March 2013 from a local computer shop. I saved up my Xmas and birthday money. My parents didn't agree that getting my own computer was a good idea, but I insisted, so I went to this shop and handed over £590 for a gaming computer.
At the time I knew nothing, literally... I mean nothing about computers. I just wanted to play better games than what was on my Xbox.
So I carried it upstairs into my bedroom (my Uncle fetched the computer in his car for me). Then came the phone call from my mum, and to her disbelief I told her I'd bought a computer.
She came home and, to no surprise, wasn't happy at all. We had nowhere to put it, but eventually she decided to move everything around to fit a desk in.
Here's where the irony comes in: she said it was a bad idea to buy the computer, especially from this local shop. So there I was setting it up, buying a few games on Steam, starting to play them and... (you guessed it) blue screen!

Oh no, what am I going to do?
Panic, panic...
I eventually rang the shop and told them what had happened, then took it down to the shop. He looked at it and told me to come back in a few days. I came back and it ran fine until it happened again.
I remember him saying it was a driver issue and that I was installing something that was causing problems. Funny how he never told me what it was.

This is the point where I decided to go online and see if I could find any solutions. It was all gibberish, as I knew nothing about computing.
That's when I stumbled upon www.sevenforums.com

I asked for help and got all these solutions that wouldn't work from somebody called Arc. I then contacted somebody else called x BlueRobot to help me, who happens to be a good friend now over at www.sysnative.com

Anyway, the problems still persisted, so I joined another forum, www.windowsforum.com, and made a thread. Vir Gnarus and two others helped me, and it appeared to be a hardware failure, especially given it was a 0x124 (discussed in another post).

I still couldn't fix the problem and got numerous errors, to the point where I contacted the supplier the local shop got its computers from. They told me to send it to them free of charge. They contacted me days later and said they couldn't replicate the issue, but they replaced the PSU and GPU. The GPU had a loose bearing and wasn't actually supplied with the computer. The local shop had bought it separately, so the 450W PSU couldn't really handle it. That PSU was replaced with an el cheapo 750W Ace switching PSU, which can be bought for around £10, that's good...

I gave up and changed what software I could as the supplier still couldn't find any problems.

I then started posting BSOD instructions over at www.sevenforums.com to help some BSOD analysts.
I got a few thanks. I then installed the Windows Debugger to take a look at some files. It was still gibberish and I had no luck finding useful commands on my own, but thanks to some of x BlueRobot's posts (Harry Miller) I managed to use some commands to work out simple BSOD cases.
I then started learning the basics and reading blog posts by Harry on learning debugging. Afterwards I bought Windows Internals and read a bit of that.
Without causing more wars, I got into a large disagreement over at sevenforums.com and got banned. As I already had an account on Sysnative (about freezing), I decided to take my knowledge over there.
Shortly after, I made friends with two fellow BSOD analysts, Patrick Barker and John Griffith.
I've been there ever since at my new online home. I then joined www.techsupportforum.com and helped people out there (which I still do).
And now here I am (in Greece) helping people out with BSODs and hopefully starting a Computer Science degree at Sheffield Hallam University this time next year.
Oh, and over at www.windowsforum.com there's Vir Gnarus, who is now a good friend. He recently switched to IT Infrastructure as opposed to debugging, but I'm hoping he'll return soon.

I'm undecided in what to do as a career at the moment but an Escalation Engineer at Microsoft looks like a very interesting job.

So that's my story so far. Debugging is very interesting, and it's hard to believe that if I hadn't bought that specific computer I wouldn't be here today.


Tuesday, 5 August 2014

Memory Management - Stacks

In this blog post I'll talk about stacks: what they are and how they are used in Windows.
We've come across the term before, but you don't learn that much about them unless you really look into them.

So a stack is an abstract data type that is implemented as a LIFO structure, which means Last In First Out.

So from good old Wikipedia here's a very good simple picture of a LIFO mechanism. We can see it uses Push to add data onto the stack and Pop to remove it. Now you know how the simple stack works, let's go a bit more advanced.

A stack has a fixed origin within memory called (you know it) the stack origin, and it then uses the push instruction to initialize the stack. It also contains a stack pointer, which points to the address of the last item added to the stack. The pointer moves further away from the origin as more data is added, although this doesn't necessarily mean it's moving up, it can move down.
Now the pointer cannot cross the origin at any time. If this happens a stack underrun occurs, which is normally caused by using pop more times than push.
A stack overflow occurs when push is used more times than the stack has room for, so the pointer moves past the boundary of the stack. In other words, it spills data outside of the allocated region and into whatever sits next to it, which can be another stack.
This is a very big problem on Kernel stacks, as there are no separate process address spaces to protect the memory. In Kernel mode everything is run from a single system address space that has access to the entire Kernel. When this overflow happens it can and will corrupt data on another stack elsewhere, which may belong to a thread completely unrelated to the one that overflowed. The culprit essentially flees, the thread whose stack was corrupted gets the blame, and a bugcheck is called once the corruption is detected. If this is the case, Driver Verifier should be enabled.
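
To make the pointer, underrun and overflow ideas a bit more concrete, here's a minimal Python sketch of a fixed-size stack that grows downward from its origin. The class and exception names are made up for illustration, and it raises an error where real code would silently spill into neighbouring memory.

```python
class StackOverflow(Exception):
    """Raised when a push would move the pointer past the allocated region."""


class StackUnderrun(Exception):
    """Raised when a pop would move the pointer back across the origin."""


class FixedStack:
    """A toy fixed-size stack that grows downward from its origin."""

    def __init__(self, size):
        self.memory = [None] * size   # the allocated region
        self.origin = size            # fixed origin, one past the highest slot
        self.pointer = self.origin    # empty stack: pointer sits at the origin

    def push(self, value):
        if self.pointer == 0:
            # No room left; real code would spill into whatever lies below.
            raise StackOverflow("push past the end of the allocated region")
        self.pointer -= 1             # move away from the origin (downward)
        self.memory[self.pointer] = value

    def pop(self):
        if self.pointer == self.origin:
            raise StackUnderrun("pop on an empty stack")
        value = self.memory[self.pointer]
        self.pointer += 1             # move back toward the origin
        return value


stack = FixedStack(size=2)
stack.push("a")
stack.push("b")
print(stack.pop())  # prints b (last in, first out)
```

A third push on a two-slot stack raises StackOverflow, and popping an empty stack raises StackUnderrun.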

A good picture to show how this works is as follows.



I rambled a bit here but I just tried to briefly explain how stack overflows cause corruption that can bring the system to a halt so device drivers and other kernel objects should be written carefully and correctly to prevent these situations from happening.

Stacks can also be implemented within arrays, where the first element at offset zero is the stack origin and the stack builds from there.
Implementing stacks in linked lists differs in that there's no index to move. Instead, nodes are added and removed at the head of the list, so the top element of the stack changes each time. AFAIK it's still LIFO, as the last node added is the first one removed, but I need to look into that a bit more.
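
For what it's worth, here's a minimal sketch of one common way to build a stack on a singly linked list. Push and pop both work on the head node, so the behaviour is still last in, first out. The class names are hypothetical and just for illustration.

```python
class Node:
    """One element of the list, pointing at the node pushed before it."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class LinkedStack:
    def __init__(self):
        self.head = None  # top of the stack; None means the stack is empty

    def push(self, value):
        # The new node points at the old head and becomes the new head.
        self.head = Node(value, self.head)

    def pop(self):
        if self.head is None:
            raise IndexError("pop from an empty stack")
        node = self.head
        self.head = node.next_node  # unlink the head node
        return node.value
```

Pushing 1, 2, 3 and then popping three times gives back 3, 2, 1.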

There are generally three major types of stacks: User Stacks, Kernel Stacks and DPC Stacks.

For User Stacks, when a thread is created the memory manager reserves 1MB of memory by default, which can be altered when calling the CreateThread function. Once the thread is created, only the first page and a guard page are committed. More data can then be added to the stack until the guard page is hit, at which point an exception occurs. This allows the stack to grow with demand, but it will never shrink back.
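
Here's a rough Python model of that grow-on-demand behaviour. It's only a sketch under some assumptions: pages are 4KB, they're numbered from the start of the stack, and the class and method names are mine rather than anything from Windows.

```python
PAGE_SIZE = 4 * 1024        # assumed 4KB pages
RESERVED = 1024 * 1024      # default 1MB reservation


class UserStack:
    """Toy model: pages 0..committed-1 are usable, the next one is the guard."""

    def __init__(self, reserved=RESERVED):
        self.total_pages = reserved // PAGE_SIZE
        self.committed = 1  # only the first page is committed up front

    @property
    def guard_page(self):
        return self.committed  # the guard page sits right after the committed ones

    def touch(self, page):
        """Simulate the thread touching a page of its stack."""
        if page < self.committed:
            return "ok"
        if page == self.guard_page:
            # Guard page hit: commit it and set up a new guard page after it.
            if self.committed + 1 >= self.total_pages:
                raise MemoryError("stack overflow: no room for a new guard page")
            self.committed += 1
            return "grew"
        raise MemoryError("access beyond the guard page")


stack = UserStack()
stack.touch(1)              # hits the guard page, so the stack grows
print(stack.committed)      # prints 2
```

Notice there's no method to give pages back, which matches the stack never shrinking.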

Kernel stacks are a lot smaller than user stacks. They typically range from 12KB (x86) to 16KB (x64), which excludes a guard page table entry that consumes an additional 4KB.
Code running in the Kernel tends to have less recursion than user mode code and is written more efficiently, which keeps stack buffer sizes smaller. As stated before, Kernel code has a much larger impact on the system as it runs in a single system address space.

However, because interactions between the graphics system calls in win32k.sys and its subsequent calls back into user mode are recursive, the Kernel implements a way for stacks to be added when nearing the guard page. These stacks each provide an additional 16KB, and when the calls return the memory manager frees the stacks afterwards.

The DPC stack is a per-processor stack (one for each processor) which is available for use every time DPCs are executed. DPCs stay on their own stack, as a DPC is generally unrelated to the current kernel stack's operations because it runs in an arbitrary thread context.

I believe I've covered pretty much everything on stacks, I hope that's helped your understanding.


References:
http://en.wikipedia.org/wiki/Stack_%28abstract_data_type%29
Windows Internals

Saturday, 2 August 2014

Interrupt dispatching and handling

In this post I'll talk about interrupt dispatching and the types of interrupts. Interrupts have always been interesting yet slightly confusing at the same time, so I'll try and explain what they are and the different types they come in.

So what is an interrupt?

It's kind of in the name: it's an asynchronous event that diverts the processor's flow of control.
Interrupts generally come in two forms, hardware interrupts and software interrupts.
Interrupts can come from I/O devices, timers or processor clocks.

Hardware Interrupts

These interrupts come from external I/O devices and arrive on lines of the interrupt controller. When an IRQ (Interrupt Request) is received, it enters through a line on the interrupt controller, which converts the IRQ into a number that is matched with an index in the IDT (Interrupt Dispatch Table). The trap handler then saves the context of the currently executing thread and the ISR (Interrupt Service Routine) is invoked. Once the interrupt is completed the context is restored, so the thread continues execution like nothing ever happened.
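
The flow above can be sketched in a few lines of Python. This is only a toy model: the IRQ number, the vector 0x31, the register names and the function names are all hypothetical, picked just to show the IRQ to IDT index to ISR lookup and the save/restore of context.

```python
idt = {}  # toy Interrupt Dispatch Table: vector number -> ISR

# Hypothetical mapping done by the interrupt controller: IRQ line -> vector.
IRQ_TO_VECTOR = {1: 0x31}


def register_isr(vector, isr):
    idt[vector] = isr


def keyboard_isr(cpu):
    cpu["rax"] = 0xDEAD  # the ISR is free to clobber registers while it runs
    return "keyboard serviced"


def dispatch(irq, cpu):
    vector = IRQ_TO_VECTOR[irq]   # controller turns the IRQ into a number
    saved = dict(cpu)             # trap handler saves the thread's context
    try:
        return idt[vector](cpu)   # ISR is looked up in the IDT and run
    finally:
        cpu.clear()
        cpu.update(saved)         # context restored, thread carries on


register_isr(0x31, keyboard_isr)
cpu = {"rax": 1, "rip": 0x1000}
print(dispatch(1, cpu))           # prints keyboard serviced
```

After dispatch returns, cpu holds exactly the values it had before the interrupt.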

Interrupt controllers

Hardware interrupts use interrupt controllers, which generally speaking come in two forms: PIC (Programmable Interrupt Controller) and APIC (Advanced Programmable Interrupt Controller). The PIC is a uniprocessor controller that is generally used on x86 systems and uses 8 lines. However, another PIC called a slave can be added, which provides an additional 7 lines for a total of 15 lines.
The APIC is a multiprocessor interrupt controller which is generally used on x64 systems and contains 256 lines. With this in play the PIC is quickly being phased out.





Here is an example of the IDT, which contains lots of different entries for specific interrupts. Trap handlers for exceptions also use the IDT for events such as page faults.
I will discuss later on how page faults come into play with bugchecks and IRQLs, but for a more in-depth explanation of how page faults are handled take a look at my friend Patrick's post over at Sysnative.com

http://www.sysnative.com/forums/bsod-kernel-dump-analysis-debugging-information/10551-page-faults-explained.html


Software Interrupts

Although interrupt controllers implement their own prioritisation mechanisms, Windows uses its own technique for doing so: IRQLs.


These are IRQLs for x64, IA64 and x86 systems.
IRQLs are a way for interrupts to be prioritised appropriately. They aren't handled on a first come, first served basis. Rather, the higher the IRQL the higher the priority, so an interrupt at IRQL 15 would get serviced before one at IRQL 2.

To put this into perspective, an IRQ at IRQL 2 would have to wait for any IRQs at 3 or above to be serviced first. Only once the processor's IRQL has been lowered can it be serviced, and the level cannot be lowered while a higher interrupt is still being serviced.
For example, if an interrupt is being serviced and another interrupt needs servicing, two things can happen.

One is the current IRQ is put on a waiting list and the new one is serviced.
Two is the current IRQ finishes being serviced, then the next highest IRQL further down the list is serviced.

It depends on the IRQLs of the interrupts: the first happens when the new interrupt has a higher IRQL than the current one, the second when it is equal or lower.
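
Here's a small Python simulation of those two outcomes. It's a sketch, not how Windows implements it: the Cpu class, the `during` parameter (interrupts that arrive while one is being serviced) and the IRQL numbers used are all assumptions for illustration.

```python
import heapq


class Cpu:
    def __init__(self):
        self.irql = 0        # lowest level, nothing being serviced
        self.pending = []    # waiting list, highest IRQL first (negated for heapq)
        self.log = []        # order in which interrupts finish being serviced
        self._seq = 0        # tie-breaker so equal IRQLs stay in arrival order

    def interrupt(self, irql, name, during=None):
        """Deliver an interrupt; `during` lists interrupts arriving mid-service."""
        if irql <= self.irql:
            # Not high enough to preempt: wait until the IRQL drops below it.
            heapq.heappush(self.pending, (-irql, self._seq, name))
            self._seq += 1
            return
        previous = self.irql
        self.irql = irql                             # raise IRQL to service it
        for other_irql, other_name in during or []:
            self.interrupt(other_irql, other_name)   # may preempt or wait
        self.log.append(name)                        # servicing finished
        self.irql = previous                         # lower the IRQL again
        self._drain()

    def _drain(self):
        # Service waiting interrupts that are now above the current IRQL.
        while self.pending and -self.pending[0][0] > self.irql:
            neg_irql, _, name = heapq.heappop(self.pending)
            self.interrupt(-neg_irql, name)


cpu = Cpu()
cpu.interrupt(5, "disk", during=[(8, "clock"), (3, "net")])
print(cpu.log)  # prints ['clock', 'disk', 'net']
```

The clock interrupt at 8 preempts the disk interrupt at 5, while the network interrupt at 3 has to wait until the disk interrupt has finished and the IRQL has dropped.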

Back to page faults.
A page fault occurs when memory that is not present is referenced. When a page fault occurs, the page fault handler requests that the memory being referenced is brought into physical memory, but in order to do that the IRQL must be at 1 or below, as this is when pageable memory can be accessed.
Now when the IRQL is higher than this (servicing an interrupt, for example) and a page fault occurs, this is when we bugcheck with either 0xA (IRQL_NOT_LESS_OR_EQUAL) or 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL).
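
As a rough illustration of that rule, here's a Python sketch. The function name, the set of resident pages and the `from_driver` flag are my own simplifications (real Windows decides between 0xA and 0xD1 in more detail than a single flag), but the bugcheck codes and the IRQL limit are the ones described above.

```python
APC_LEVEL = 1        # highest IRQL at which pageable memory can be touched
PAGE_SIZE = 4096     # assumed 4KB pages


class BugCheck(Exception):
    def __init__(self, code, name):
        super().__init__(f"{name} ({code:#x})")
        self.code = code


def access(address, resident_pages, irql, from_driver=False):
    """Simulate touching an address at a given IRQL."""
    page = address // PAGE_SIZE
    if page in resident_pages:
        return "ok"                       # already in memory, no page fault
    # Page fault: the page has to be brought in, which needs a low IRQL.
    if irql > APC_LEVEL:
        if from_driver:
            raise BugCheck(0xD1, "DRIVER_IRQL_NOT_LESS_OR_EQUAL")
        raise BugCheck(0xA, "IRQL_NOT_LESS_OR_EQUAL")
    resident_pages.add(page)              # fault serviced, page is now present
    return "paged in"
```

Touching a non-resident page at IRQL 0 just pages it in, while the same access at IRQL 2 raises the bugcheck.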

So why can't we just lower the IRQL to service the page fault or wait for the current interrupt to finish?

Well, IRQLs cannot be lowered while an interrupt at that level is being serviced, as that has priority, and a page fault cannot wait as it must be serviced immediately.
You see the problem?
It's an endless cycle so the system crashes as it can't compute anything else.

I hope I've covered pretty much everything and I hope you've learned something.

I forgot to add, hardware interrupts (IRQs) are serviced at IRQLs above DPC/dispatch level, so DPC/dispatch level and anything below it is not used for servicing hardware interrupts.