How do I debug an MPI program?
As someone else said, TotalView is the standard for this. But it will cost you an arm and a leg.
The Open MPI site has a great FAQ on MPI debugging. Item #6 in the FAQ describes how to attach GDB to MPI processes. Read the whole thing; there are some great tips.
If you find that you have far too many processes to keep track of, though, check out the Stack Trace Analysis Tool (STAT). We use this at Livermore to collect stack traces from potentially hundreds of thousands of running processes and to represent them intelligently to users. It's not a full-featured debugger (a full-featured debugger would never scale to 208k cores), but it will tell you which groups of processes are doing the same thing. You can then step through a representative from each group in a standard debugger.
Many of the posts here are about GDB, but don't mention how to attach to a process from startup. Obviously, you can attach to all processes:
mpiexec -n X gdb ./a.out
But that is wildly inefficient, since you'll have to bounce around to start up all of your processes. If you just want to debug one (or a small number of) MPI processes, you can add that as a separate executable on the command line using the : operator:
mpiexec -n 1 gdb ./a.out : -n X-1 ./a.out
Now only one of your processes will get GDB.
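If you script this, the only thing to get right is the X-1 arithmetic. Here is a rough sketch of a helper that prints the command line for a given total process count. The function name mpi_gdb_cmd is made up for this example, and it only echoes the command so you can inspect it before running it yourself.

```shell
# mpi_gdb_cmd TOTAL PROGRAM
# Hypothetical helper: prints (does not run) an mpiexec line that starts
# one process under gdb and the remaining TOTAL-1 processes normally.
mpi_gdb_cmd() {
  total=$1
  prog=$2
  if [ "$total" -lt 1 ]; then
    echo "total must be at least 1" >&2
    return 1
  fi
  if [ "$total" -eq 1 ]; then
    # Only one process: no second group after the ':' is needed.
    echo "mpiexec -n 1 gdb $prog"
  else
    echo "mpiexec -n 1 gdb $prog : -n $((total - 1)) $prog"
  fi
}

mpi_gdb_cmd 4 ./a.out
# prints: mpiexec -n 1 gdb ./a.out : -n 3 ./a.out
```

Once the printed line looks right, paste it into your terminal or job script.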
I have found gdb quite useful. I use it as
mpirun -np <NP> xterm -e gdb ./program
This then launches xterm windows in which I can do
run <arg1> <arg2> ... <argN>
This usually works fine.
You can also package these commands together using:
mpirun -n <NP> xterm -hold -e gdb -ex run --args ./program [arg1] [arg2] [...]
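With many ranks, one xterm per process gets unwieldy. One possible variation is a small wrapper that opens the xterm with GDB for a single rank and runs the others directly. This is only a sketch under assumptions: it assumes your launcher exports the rank as OMPI_COMM_WORLD_RANK (Open MPI) or PMI_RANK (MPICH-style launchers), and the names launch_rank, DEBUG_RANK, and DRY_RUN are invented for this example.

```shell
# launch_rank PROGRAM [ARGS...]
# Hypothetical wrapper: runs PROGRAM under gdb in an xterm only on the rank
# named by DEBUG_RANK (default 0); all other ranks run PROGRAM directly.
# If no rank variable is set, the program runs without a debugger.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
launch_rank() {
  # Assumption: the MPI launcher exports one of these variables.
  rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-}}"
  if [ "$rank" = "${DEBUG_RANK:-0}" ]; then
    ${DRY_RUN:+echo} xterm -hold -e gdb -ex run --args "$@"
  else
    ${DRY_RUN:+echo} "$@"
  fi
}

# Dry run in a subshell, pretending to be rank 0:
( DRY_RUN=1; DEBUG_RANK=0; OMPI_COMM_WORLD_RANK=0; launch_rank ./program arg1 )
# prints: xterm -hold -e gdb -ex run --args ./program arg1
```

To use it for real, you could put the function in an executable script that ends with a call to launch_rank "$@", and pass that script to mpirun in place of ./program.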