Of Memory Management, Heap Corruption, and C++

These are relatively cheap mechanisms for possibly solving the problem:

  1. Keep an eye on my heap corruption question - I'm updating with the answers as they shake out. The first was balancing new[] and delete[], but you're already doing that.
  2. Give valgrind more of a go; it's an excellent tool, and I only wish it was available under Windows. I only slows your program down by about half, which is pretty good compared to the Windows equivalents.
  3. Think about using the Google Performance Tools as a replacement malloc/new.
  4. Have you cleaned out all your object files and started over? Perhaps your make file is... "suboptimal"
  5. You're not assert()ing enough in your code. How do I know that without having seen it? Like flossing, no-one assert()s enough in their code. Add in a validation function for your objects and call that on method start and method end.
  6. Are you compiling -wall? If not, do so.
  7. Find yourself a lint tool like PC-Lint. A small app like yours might fit in the PC-lint demo page, meaning no purchase for you!
  8. Check you're NULLing out pointers after deleteing them. Nobody likes a dangling pointer. Same gig with declared but unallocated pointers.
  9. Stop using arrays. Use a vector instead.
  10. Don't use raw pointers. Use a smart pointer. Don't use auto_ptr! That thing is... surprising; its semantics are very odd. Instead, choose one of the Boost smart pointers, or something out of the Loki library.

Oh, if you want to know how to debug the problem, that's simple. First, get a dead chicken. Then, start shaking it.

Seriously, I haven't found a consistent way to track these kinds of bugs down. Because there's so many potential problems, there's not a simple checklist to go through. However, I would recommend the following:

  1. Get comfortable in a debugger.
  2. Start tromping around in the debugger to see if you can find anything that looks fishy. Check especially to see what's happening during the exampleString = hello; line.
  3. Check to make sure it's actually crashing on the exampleString = hello; line, and not when exiting some enclosing block (which could cause destructors to fire).
  4. Check any pointer magic you might be doing. Pointer arithmetic, casting, etc.
  5. Check all of your allocations and deallocations to make sure they are matched (no double-deallocations).
  6. Make sure you aren't returning any references or pointers to objects on the stack.

There are lots of other things to try, too. I'm sure some other people will chime in with ideas as well.


We once had a bug which eluded all of the regular techniques, valgrind, purify etc. The crash only ever happened on machines with lots of memory and only on large input data sets.

Eventually we tracked it down using debugger watch points. I'll try to describe the procedure here:

1) Find the cause of the failure. It looks from your example code, that the memory for "exampleString" is being corrupted, and so cannot be written to. Let's continue with this assumption.

2) Set a breakpoint at the last known location that "exampleString" is used or modified without any problem.

3) Add a watch point to the data member of 'exampleString'. With my version of g++, the string is stored in _M_dataplus._M_p. We want to know when this data member changes. The GDB technique for this is:

(gdb) p &exampleString._M_dataplus._M_p
$3 = (char **) 0xbfccc2d8
(gdb)  watch *$3
Hardware watchpoint 1: *$3

I'm obviously using linux with g++ and gdb here, but I believe that memory watch points are available with most debuggers.

4) Continue until the watch point is triggered:

Continuing.
Hardware watchpoint 2: *$3

Old value = 0xb7ec2604 ""
New value = 0x804a014 ""
0xb7e70a1c in std::string::_M_mutate () from /usr/lib/libstdc++.so.6
(gdb) where

The gdb where command will give a back trace showing what resulted in the modification. This is either a perfectly legal modification, in which case just continue - or if you're lucky it will be the modification due to the memory corruption. In the latter case, you should now be able to review the code that is really causing the problem and hopefully fix it.

The cause of our bug was an array access with a negative index. The index was the result of a cast of a pointer to an 'int' modulos the size of the array. The bug was missed by valgrind et al. as the memory addresses allocated when running under those tools was never "> MAX_INT" and so never resulted in a negative index.