safe C programming
First of all, is it safe for me to run programs like this?
Your example: No. Absolutely not. Why would you even try? What are you expecting it to do? More generally, negative indexes are fine as long as the resulting address stays inside the same array object, for example when you index backwards from a pointer into the middle of an array.
Also, I had expected at least some kind of segfault error like I have encountered in the past, but quietly ignoring an error like this really scares me. How come this program doesn't segfault on me?
Blind luck. (Actually, not exactly: Ira Baxter explains it nicely.)
And finally, out of curiosity (this might be the silliest question), is there a method to the madness?
If you set up pointers into the middle of arrays, negative indexes are legal (as long as they stay inside the array), but they are a nightmare for others to understand and maintain! I've seen it done in embedded systems.
Can I expect all ANSI C compilers to work this way?
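Yep, for negative indexes that stay inside the array. Out-of-bounds accesses like the one in your example are undefined behavior, so no compiler is required to behave the same way twice.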
How about gcc on different platforms?
Yep
Is the layout of memory well defined that it is exploitable (perhaps if you were out to write cross-platform obfuscated code)?
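Only partly. The layout within a single array (and within a struct, apart from padding) is defined, but where the compiler places separate variables, such as m and its neighbours on the stack, is not, so I wouldn't rely on it.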
You may be interested in INRIA's CompCert C, a formally verified compiler for the C language: its correctness has been proved mathematically in the Coq proof assistant, which was also developed at INRIA. A related project is Verifiable C.
I don't know much about it, but I know that aircraft engineers in France use it to program the upcoming embedded computers in planes, so at least in France it's an officially accepted language for critical systems programming.
Lastly, note that a formally verifiable language is different from a safe language.
For example, MISRA C is said to be a safe subset of C (although this is debated), and there are also Safe-C, Microsoft's Checked C and Cyclone, as well as safe libraries that don't require changing the compiler, such as Safe C Library and libsrt. Alternatively, you can keep the standard compiler and libraries and add a source code analyzer such as Frama-C.
Safe languages fix some issues, such as buffer overflows, but they give no guarantee of consistent logic flow as is needed for critical systems. For example, CompCert C guarantees that the generated assembly code behaves exactly as the semantics of the C source prescribe. Formally verifiable languages and tools such as CompCert C and Ada (notably its SPARK subset) provide such formal guarantees.
You may also be interested in these articles:
- What's the Difference Between Sound and Unsound Static Analysis?
- A Guide to Undefined Behavior in C and C++, Part 1
- https://github.com/stanislaw/awesome-safety-critical
The C language defines the behavior of certain programs as "undefined". They can do anything. We'll call such programs erroneous.
One of them is a program that accesses outside the declared/allocated bounds of an array, which your program very carefully does.
Your program is erroneous; what you see is simply what your erroneous program happens to do :-} It could "overwrite the OS"; in practice, most modern OSes prevent that, but you can overwrite critical values in your own process space, and your process could crash, die or hang.
The simple response is, "don't write erroneous programs". Then the behavior you see will make "C" sense.
In this particular case, with your particular compiler, the array indexing "sort of" works: you index outside the array and it picks up some value. The space allocated to m is in the stack frame; m[0] is at some location in the stack frame and so is "m[-1]" based on machine arithmetic combining the array address and the index, so a segfault does not occur and a memory location is accessed. This lets the compiled program read and write that memory location ... as an erroneous program. Basically, compiled C programs don't check to see if your array access is out of bounds.
Our CheckPointer tool when applied to this program will tell you the array index is illegal at execution time. So, you can either eyeball the program yourself to see if you've made a mistake, or let CheckPointer tell you when you make a mistake. I strongly suggest you do the eyeballing in any case.