Wednesday, 30 July 2014

Instruction pointer misalignments

This time I'll talk about instruction pointer misalignment.
So what is an instruction pointer misalignment?

Well, when code references memory it uses a pointer to (you guessed it) point to a certain memory address. Reading the data stored at that address through the pointer is known as dereferencing.

When a pointer is misaligned it grabs data from the wrong address, which can cause severe memory corruption depending on the contents of the address being referenced. If the write is allowed, it can completely corrupt the data at that address; the culprit escapes, and some innocent pointer comes along, tries to use the address and gets blamed by the computer police.
This is why bugchecks are called: to prevent such memory corruption. Data structures are arranged and accessed so that reads and writes happen in chunks of 4 bytes (sometimes larger), so a memory offset will be a multiple of the word size. This is done to maximise performance by utilising the way the CPU handles memory.
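
The "multiple of the word size" rule is just a remainder check. Here's a minimal Python sketch, assuming a 4-byte word; `is_aligned` and the addresses are made up for illustration and aren't part of any Windows API.

```python
def is_aligned(address: int, size: int = 4) -> bool:
    """Return True if address is a multiple of size (assumed 4-byte word)."""
    return address % size == 0


# Hypothetical addresses, chosen only to show the arithmetic.
for address in (0x1000, 0x1004, 0x1006):
    state = "aligned" if is_aligned(address) else "misaligned"
    print(f"{address:#x} is {state}")
```

The last address leaves a remainder of 2 when divided by 4, so it is reported as misaligned.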


When the address being referenced isn't a multiple of 4, that's when things go wrong. It generally results in an alignment fault, which is also known as a bus error.

This line taken from a crash dump can explain this a little bit.
    nt!MmCleanProcessAddressSpace+0xe6

So nt is the module, in this case the Windows kernel, and the Mm prefix marks it as a Memory Manager function.
MmCleanProcessAddressSpace is the actual function; as the name suggests, it cleans up a process's address space.
The +0xe6 is the offset, which is like a house number on a street: it's how far into the function, in bytes, the instruction is located.
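
To make the three parts concrete, here's a rough Python sketch that pulls the line apart. `split_symbol` is a hypothetical helper of my own, not a debugger command, and it assumes the simple `module!function+0xoffset` layout shown above.

```python
def split_symbol(symbol: str) -> tuple:
    """Split 'module!function+0xoffset' into (module, function, offset)."""
    module, rest = symbol.split("!", 1)
    function, _, offset_text = rest.partition("+")
    # The offset is written in hexadecimal; a missing offset is treated as 0.
    offset = int(offset_text, 16) if offset_text else 0
    return module, function, offset


module, function, offset = split_symbol("nt!MmCleanProcessAddressSpace+0xe6")
print(module)    # nt
print(function)  # MmCleanProcessAddressSpace
print(offset)    # 230
```

So 0xe6 means the instruction sits 230 bytes past the start of the function.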

I was actually looking at the differences between a segmentation fault and a bus error, as both involve the CPU being unable to access the memory being referenced.

  • The segmentation fault (or access violation exception) occurs when memory outside the allowed region is referenced (not to be confused with a buffer overrun, which involves writing past the end of allocated memory into another buffer).
  • The bus error occurs when an address that is not aligned is referenced, which, as you know, is when the address isn't a multiple of 4.
Another thing to note: in dump files, where it says misaligned pointer, it also mentions that it's probably caused by hardware. As I've mentioned, that's probably because the CPU cannot address memory that isn't aligned to a multiple of 4, so it looks like the CPU isn't able to read it at all.
Misaligned IPs don't always come from a bus error. They can also be caused by drivers writing more data into a buffer on the stack than it can hold (a stack buffer overflow), which also results in a bugcheck to prevent critical memory corruption.
I hope this has helped you understand the differences and more about instruction pointers.

Hexadecimal and Binary

This blog will be a little different to my usual debugging blogs.
I will be talking about hexadecimal and binary. They can be difficult to fully understand, but we should be able to get through it.

Now, at school, I was never really good at Maths. I struggled with a lot of things, but I've picked up a few things through debugging, as Windows Internals uses these number systems everywhere: they match the way the hardware stores numbers far more closely than decimal does.

Binary can be difficult to get your head round, but computers use it because it makes things a lot simpler.
Remember at school when you had to use a T chart and count in tens?
So "10" would be 1 ten and 0 ones; in binary, "10" means 1 two and 0 ones.
"100" in binary would be 4 (2x2), "1000" would be eight (4x2), etc.
Generally, binary suits computers because each digit has only two possible values, which map neatly onto a circuit being on or off.
There are also far fewer rules compared to decimal, which actually simplifies things (for the computer), but for us it means we need tools such as a compiler to convert between code we can make sense of and the binary the machine uses.
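
The T chart idea works the same way in binary, just with columns worth 1, 2, 4, 8 and so on. Here's a small Python sketch that shows what each column contributes; `place_values` is a name I've picked for illustration.

```python
def place_values(digits: str, base: int) -> list:
    """Return the value each digit contributes, leftmost digit first."""
    top = len(digits) - 1
    return [int(d, base) * base ** (top - i) for i, d in enumerate(digits)]


for text in ("10", "100", "1000"):
    parts = place_values(text, 2)
    print(text, "->", parts, "=", sum(parts))
```

Adding up the columns gives 2, 4 and 8 for the three examples above.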

Hexadecimal is used because it lets us write large numbers with fewer digits, and it converts to and from binary easily, since each hexadecimal digit stands for exactly four binary digits. Instead of powers of 10, hexadecimal uses powers of 16, so "10" in hexadecimal would be 16, as it's 1 sixteen and 0 ones.
"25" in hexadecimal would be 2 sixteens and 5 ones, so it would be 37.

But how does that work, as there aren't digit symbols for 10 to 15?
This is why we have the letters A to F for 10 to 15. Here's a good conversion chart to help you understand.
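
A chart like this can also be generated with a few lines of Python. This is only a sketch of the 0 to 15 range, and `conversion_rows` is a made-up helper name.

```python
def conversion_rows(limit: int = 16) -> list:
    """Build (decimal, hexadecimal, binary) text rows for 0 up to limit - 1."""
    return [(str(n), format(n, "X"), format(n, "04b")) for n in range(limit)]


print(f"{'Dec':>3}  {'Hex':>3}  {'Bin':>4}")
for dec, hexa, binary in conversion_rows():
    print(f"{dec:>3}  {hexa:>3}  {binary:>4}")
```
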

Here we can see how they all convert into each other. Obviously, the higher the figures, the more difficult they become to read.
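
For bigger numbers you don't have to do it in your head. Each hexadecimal digit stands for exactly four binary digits, so you can convert one digit at a time. Here's a quick Python sketch of that; `hex_to_binary` is my own helper name, not a built-in.

```python
def hex_to_binary(hex_text: str) -> str:
    """Convert each hex digit to its own group of four binary digits."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_text)


print(int("10", 16))        # 16
print(int("25", 16))        # 37
print(hex_to_binary("25"))  # 0010 0101
```
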

Hopefully this has helped a lot of you understand if you didn't already.