How the stack works

Programming HOW TOs and in-depth guides for programmers of all levels. Programming is an essential skill for hackers, so start learning today!
Post Reply
User avatar
IceDane
Because I Can
Posts: 2652
Joined: Wed May 12, 2004 9:25 am

How the stack works

Post by IceDane » Mon Dec 07, 2009 1:13 am

I woke up yesterday, and a guy in IRC was having trouble understanding how the stack works. I started explaining it to him, but soon found out that IRC is not a good platform for long explanations with ASCII diagrams.
So I told him I'd whip up a guide, and a couple of hours later I had 2000 words.
It is by no means advanced, but if you don't understand the stack, there's not much you can do when it comes to 'analyzing vulnerabilities' or exploiting them.

Code: Select all

How the stack works - An informative guide that might be your first step in learning how to hack the gibson, by IceDane

I'll just go ahead and dive right in.
We're going to use this small C program as an example to work with:

int function(int a, int b, int c, int d)
{
    int integer;
    char buffer[16];
}

int main(void)
{
    function(1, 2, 3, 4);
}

When this function is called, the way it is called follows is dictated by the calling convention used by the compiler. There are ways to change calling conventions for single functions, but that is not within the scope of this small tutorial.
stdcall is the default calling convention for Win32 API, while cdecl is the default for C functions and also global C++ functions. These two are almost exactly identical for all our intents and purposes, except some minor details that are more or less irrelevant, at least in this guide.

Both cdecl and stdcall calling conventions have this in common:

    * Arguments to the functions are pushed on the stack right-to-left
    * Return values are put in the eax register(Mostly)
    * edx, ecx and eax are designated for use within the function.

When a function is called, there are a few things that happen, that are called the 'function prolog'. That's just a fancy term for 'stuff that happens before the function's code is executed'. 
The stack works in frames. Every function that is called has a stack frame, to the base of which the ebp register points. 
If we take a look at the source-code above, and use that as an example, this is what the compiled code will resemble:

    push 4         = 300
    push 3         = 304
    push 2         = 308
    push 1         = 312
    call function  = 316
    ret            = 320

As I said before, the stack grows down the the address space, so the address of 1, on the stack, is actually lower than the address of 2, 3 and 4. 
The ESP register always points to the top of the stack. If you think about it, this makes perfect sense. If you push a value on to the stack, and then pop it back, you expect to get the same value you just pushed on. 
What follows is a chart that shows what the stack looks like, before, during, and after we push those variables to the stack. We assume the stack starts at the address of 100, and that every variable takes 4 bytes(This is actually somewhat true, as the stack is always aligned on certain boundaries defined by the compiler. Often, 4 byte boundaries are used.)

    | ... | = ...
    | --- | = 092
    | --- | = 096 
    | XXX | = 100 <- esp
    | XXX | = 104
    | ... | = ...

This is what the stack looks before we start pushing arguments to it. Like I said, esp always points to the top of the stack. This means that anything below it(Address wise) is unused, and anything above it is stack space that is in use.
On this chart, XXX = used, but irrelevant, and --- = unused.
When we push something to the stack, what basically happens is that the size of the variable is basically subtracted from esp, and the variable is then moved to where esp points.
So, this means that

    push 123

is basically equivalent to

    sub esp, 4
    mov [esp], 123

Except that push does all of that for us in a single instruction. 
Here's what the stack looks like when we push our first variable to the stack(which is 4):

    | ... | = ...
    | --- | = 092
    |  4  | = 096 <- esp
    | XXX | = 100
    | ... | = ...

And the, after all variables have been pushed:
    
    | ... | = ...
    | --- | = 080
    |  1  | = 084 <- esp
    |  2  | = 088
    |  3  | = 092
    |  4  | = 096
    | XXX | = 100
    | ... | = ...

All variables have been pushed to the stack, and esp contains 084.
Now, the call instruction is executed. What the call instruction does, is basically pushing the return address to the stack, so that when the function is finished executing, it can resume execution where it was before.
The return address is actually the address of the instruction after the call instruction. As you can imagine, if it returned directly to the call instruction again, it would be stuck in an infinite loop, which is pretty useless.

If you take a look at the code from before, where I showed you what the disassembly would resemble, you'll see that it has addresses to the right of it. The address of the instruction right after the call instruction is actually 320.
Here's what the stack would look like after the call instruction:

    | ... | = ...
    | --- | = 076
    | 320 | = 080 <- esp (old eip = 320)
    |  1  | = 084
    |  2  | = 088
    |  3  | = 092
    |  4  | = 096
    | XXX | = 100
    | ... | = ...

Like I said before, the stack works in frames. Every function has its own stack frame, and ebp points to the base of that frame. This might be vague and ambiguous, but you will understand if you read just a tiny bit further.
The calling conventions this guide applies to also dictate what the first few instructions are in the function that you call. Those instructions serve to preserve the old stack frame(The caller's stackframe, or in our example, main's stackframe), and set up a new one. So this is what the start of function looks like:

    push ebp
    mov ebp, esp

By now, you should be able to deduct that after ebp has been pushed, using our example stack, it would be stored at address 076, which means that esp points to 076. 
So, after the "mob ebp, esp" instruction, both esp and ebp point the same place.
I promise 'stackframe' will make sense to you soon, but here's another ASCII masterpiece to remove all doubt:

    | ... | = ...
    | --- | = 072
    | EBP | = 076 <- esp, ebp (old ebp)
    | 320 | = 080
    |  1  | = 084
    | ... | = ...

Let's take a look at the rest of "function":

    push ebp
    mov ebp, esp
    sub esp, 0x14
    leave
    ret

The leave instruction is actually part of the 'epilog', which again is just a fancty term for 'doing shit before the function ends'. Leave just restores the old stack frame, and is in fact exactly equivalent to:

    mov esp, ebp
    pop ebp

Which is the opposite of the prolog, as you can see.
From going through the code, step-by-step, we know that the only thing left to do is actually allocate memory for the local variables, but the only thing that is done is subtracting 0x14(20) bytes esp?
That's perfectly valid.
As you now know, esp points to the top of the stack. When that instruction is called, it will be pointing at the old ebp that had been pushed to the stack(076 in our chart above). 
If we subtract 20 bytes from it, we will have reserved room for 20 bytes for our variables. Our buffer is 16 bytes, and our integer is 4 bytes. That makes 20 bytes.
Doing it all at once, or in two subtracts is really irrelevant. The compiler knows that the integer is at a certain offset, and the buffer is at another offset.

This is actually where ebp comes in, and the term 'stackframe' is explained. To facilitate the explanation, I will write out the whole stack.

    | ... | = ...
    | --- | = 052
    |     | = 056 <- ebp-20, esp, buffer
    |     | = 060
    |     | = 064
    |     | = 068
    |     | = 072 <- ebp-4, integer
    | EBP | = 076 <- ebp - base of stack frame
    | 320 | = 080 (old eip = 320)
    |  1  | = 084 <- ebp+8
    |  2  | = 088 <- ebp+12
    |  3  | = 092 <- ebp+16
    |  4  | = 096 <- ebp+20
    | XXX | = 100
    | ... | = ...

Can you understand why ebp is called the basepointer, now? And why the whole shebang is called a stackframe?
Througout the whole function, ebp with various offsets is used to reference both the local variables and function parameters. 

Out of curiosity, we can try compiling our code to check out our theory. It will explain a couple of things that you might encounter doing disassembly on your own, that you may or may not understand immediately.

Let's start by compiling our code:

    $ gcc foo.c -o foo -fno-stack-protector

The flag, '-fno-stack-protector' is there to tell the compiler to not use any stack protection mechanisms. If it is enabled, you will see instructions whose purpose might not be obvious, along with calls to functions that check the stack, and make sure that a buffer overflow hasn't occurred. 

Now, let's open it up in gdb:

    $ gdb foo
    -- Version, gpl propaganda from gdb --
    (gdb) set disassembly-flavor intel

.. because real men use intel syntax ..

    (gdb) disas main
    0x080483bc <main+0>:	push   ebp
    0x080483bd <main+1>:	mov    ebp,esp
    0x080483bf <main+3>:	sub    esp,0x10
    0x080483c2 <main+6>:	mov    DWORD PTR [esp+0xc],0x4
    0x080483ca <main+14>:	mov    DWORD PTR [esp+0x8],0x3
    0x080483d2 <main+22>:	mov    DWORD PTR [esp+0x4],0x2
    0x080483da <main+30>:	mov    DWORD PTR [esp],0x1
    0x080483e1 <main+37>:	call   0x80483b4 <function>
    0x080483e6 <main+42>:	leave  
    0x080483e7 <main+43>:	ret

The addresses to the left probably won't match yours, but the code should look almost exactly like this.
What you might notice is that instead of using push to put data on the stack, the data is actually being mov'd onto it.
Notice that first, 0x10(16) is subtracted from esp. 4 ints take 16 bytes. Everything good so far. As arguments should be pushed from right to left, 4 should be in the highest address. In fact, if we didn't subtract 16 from esp, 4 would be put at "esp-4", if you look at esp before it is pushed on the stack. 
What is 0xc-0x10? 12-16 = -4. In effect, 4 gets put at esp-4, like it should.

The point of all this is that it doesn't matter whether we use the push instruction to push it to the stack, or do it manually with sub and mov.

Let's take a look at function:

    (gdb) disas function
    0x080483b4 <function+0>:	push   ebp
    0x080483b5 <function+1>:	mov    ebp,esp
    0x080483b7 <function+3>:	sub    esp,0x20
    0x080483ba <function+6>:	leave  
    0x080483bb <function+7>:	ret

That is completely and absolutely identical to what I displayed before(And that was merely a prediction), except for one minor detail. Instead of exactly 16+4 bytes being allocated, we are infact allocating 32 bytes. 
This is, again, due to byte alignment boundaries. In reality, buffer probably has about 28 bytes allocated for it, but you should still operate as if it just has 16, because it is implementation-dependent how it works out the boundaries.

Using static disassembly, we have been able to confirm that stuff works like I said it would, but there is one thing that happens that we can't see happen without running the program - eip being put on the stack. 

We know that it should return to the instruction after "call function" in main, which is at the address 0x080483e6 (For me). 
We also know that esp should point at it right after call is executed. Let's put a breakpoint at the very start of "function"

    (gdb) break *function
    (gdb) run
    Starting program: /home/icedane/leethaxoringsecrets/foo
    Breakpoint 1, 0x080483b4 in function ()
    (gdb) x $esp
    0xbffff5b4:	0x080483e6

Lo and behold. =)

So kids, that's the way the cookie crumbles. If you are able to fully understand this guide, you should easily be able to apply that knowledge to exploiting buffer overflow vulnerabilities. Those, and many other types of exploits, are all about understanding the stack.
Any further questions regarding the subject are welcomed on our boards, at www.hackerthreads.org. 

Yours truly, 

IceDane, Elitist Hacker

horze
Hacker in Training
Posts: 53
Joined: Wed Aug 26, 2009 8:33 am

Re: How the stack works

Post by horze » Wed Dec 09, 2009 1:03 am

Good tutorial :)

User avatar
infinite_
Bat Country
Posts: 1353
Joined: Fri Jun 04, 2004 7:19 pm
Location: Australia

Re: How the stack works

Post by infinite_ » Tue Dec 15, 2009 1:43 pm

No matter how many times I read about pointers, buffer overflows or memory handling, it's always good to have a dedicated explanation of how the stack works. Understanding the stack is crucial to being able to understand the three items in the previous sentence, yet it's amazing how many people don't fully understand how the stack works.
My effort to help you will never exceed your effort to explain the problem.

cid_kagenou
n00b
Posts: 1
Joined: Fri Jan 19, 2024 8:27 am

Re: How the stack works

Post by cid_kagenou » Fri Jan 19, 2024 10:19 am

If I'm understanding it correctly, the stack memory is a dedicated system on your cpu,
therefore there are cpu instructions on your compiled code specifically to work with the stack, for speed and efficiency I guess,
but there is a limit of how much stack memory it can take, you can see it with `ulimit -s`, a common one is 8MiB,
I think this limit is because the stack memory is automatically handled and the stack can overflow without you being aware (stackoverflow.com moment), that is, when the stack memory reaches the limit unexpectedly due to a bug of excessive memory usage,
the counterparty of the stack is the heap, unlike the stack, it enforces you to manage it manually,
requesting the memory, deleting it, etc. the heap doesn't have a limit but it is less likely to have some unexpected overload on your memory,
still the memory can leak if you don't delete it, each thread has its own private stack,
but the heap is shared. the stack is faster due to dedicated cpu instructions that manage it automatically,
doesn't has memory leaks, and is simpler to use, so you should keep using it whenever is possible.
memory is divided on "pages", the size can vary but a common one is 4KiB,
you can get it with `getconf PAGESIZE`, whenever your program wants memory,
it must take more pages, but 4KiB (4096 bytes) is a lot considering that an int is only around a nibble (4 bits, half of a byte, exact int size is compiler-dependent),
that means that when you're allocating you're using a preallocated page until
it is filled and then your program request another one

Post Reply