How does a "stack overflow" occur and how do you prevent it?
A stack overflow occurs when Jeff and Joel want to give the world a better place to get answers to technical questions. It's too late to prevent this stack overflow. That "other site" could have prevented it by not being scuzzy. ;)
A stack overflow in real code occurs very rarely. Most situations in which it occurs are recursions where the termination has been forgotten. It might however rarely occur in highly nested structures, e.g. particularly large XML documents. The only real help here is to refactor the code to use an explicit stack object instead of the call stack.
Most people will tell you that a stack overflow occurs with recursion without an exit path - while mostly true, if you work with big enough data structures, even a proper recursion exit path won't help you.
Some options in this case:
- Breadth-first search
- Tail recursion, .Net-specific great blog post (sorry, 32-bit .Net)
Stack
A stack, in this context, is the last in, first out buffer you place data while your program runs. Last in, first out (LIFO) means that the last thing you put in is always the first thing you get back out - if you push 2 items on the stack, 'A' and then 'B', then the first thing you pop off the stack will be 'B', and the next thing is 'A'.
When you call a function in your code, the next instruction after the function call is stored on the stack, and any storage space that might be overwritten by the function call. The function you call might use up more stack for its own local variables. When it's done, it frees up the local variable stack space it used, then returns to the previous function.
Stack overflow
A stack overflow is when you've used up more memory for the stack than your program was supposed to use. In embedded systems you might only have 256 bytes for the stack, and if each function takes up 32 bytes then you can only have function calls 8 deep - function 1 calls function 2 who calls function 3 who calls function 4 .... who calls function 8 who calls function 9, but function 9 overwrites memory outside the stack. This might overwrite memory, code, etc.
Many programmers make this mistake by calling function A that then calls function B, that then calls function C, that then calls function A. It might work most of the time, but just once the wrong input will cause it to go in that circle forever until the computer recognizes that the stack is overblown.
Recursive functions are also a cause for this, but if you're writing recursively (ie, your function calls itself) then you need to be aware of this and use static/global variables to prevent infinite recursion.
Generally, the OS and the programming language you're using manage the stack, and it's out of your hands. You should look at your call graph (a tree structure that shows from your main what each function calls) to see how deep your function calls go, and to detect cycles and recursion that are not intended. Intentional cycles and recursion need to be artificially checked to error out if they call each other too many times.
Beyond good programming practices, static and dynamic testing, there's not much you can do on these high level systems.
Embedded systems
In the embedded world, especially in high reliability code (automotive, aircraft, space) you do extensive code reviews and checking, but you also do the following:
- Disallow recursion and cycles - enforced by policy and testing
- Keep code and stack far apart (code in flash, stack in RAM, and never the twain shall meet)
- Place guard bands around the stack - empty area of memory that you fill with a magic number (usually a software interrupt instruction, but there are many options here), and hundreds or thousands of times a second you look at the guard bands to make sure they haven't been overwritten.
- Use memory protection (ie, no execute on the stack, no read or write just outside the stack)
- Interrupts don't call secondary functions - they set flags, copy data, and let the application take care of processing it (otherwise you might get 8 deep in your function call tree, have an interrupt, and then go out another few functions inside the interrupt, causing the blowout). You have several call trees - one for the main processes, and one for each interrupt. If your interrupts can interrupt each other... well, there be dragons...
High-level languages and systems
But in high level languages run on operating systems:
- Reduce your local variable storage (local variables are stored on the stack - although compilers are pretty smart about this and will sometimes put big locals on the heap if your call tree is shallow)
- Avoid or strictly limit recursion
- Don't break your programs up too far into smaller and smaller functions - even without counting local variables each function call consumes as much as 64 bytes on the stack (32 bit processor, saving half the CPU registers, flags, etc)
- Keep your call tree shallow (similar to the above statement)
Web servers
It depends on the 'sandbox' you have whether you can control or even see the stack. Chances are good you can treat web servers as you would any other high level language and operating system - it's largely out of your hands, but check the language and server stack you're using. It is possible to blow the stack on your SQL server, for instance.
-Adam