Munin vs Nagios
Solution 1:
Munin and Nagios are really different tools.
From the official Munin website:
Munin is a networked resource monitoring tool that can help analyze resource trends and "what just happened to kill our performance?" problems. It is designed to be very plug and play. A default installation provides a lot of graphs with almost no work.
Nagios is a monitoring (alerting) tool. Munin could be considered a replacement for Cacti.
We use both of them: Nagios and Munin.
- Nagios tells us in real time if something is wrong: a web server is down, the load average on a database server is too high, etc.
- With Munin you can see the trends and the history that explain why it happened.
Solution 2:
Munin definitely works best in parallel with Nagios. It can also tie into it, sending notifications of exceeded thresholds into the Nagios notification system. We use it because it is virtually trivial to set up new monitors. Nagios requires a little more effort.
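To give an idea of how little a new Munin monitor needs, here is a minimal sketch of a plugin. It relies on the usual Munin plugin convention: called with the argument "config" the script prints the graph description, and called without arguments it prints one "<field>.value <number>" line per field. The graph title, the field name "load" and the function names are assumptions made up for this illustration, and os.getloadavg() is only available on Unix-like systems.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plugin sketch: graphs the 1-minute load average."""
import os
import sys


def config_lines():
    # Printed when Munin calls the plugin with the "config" argument.
    # Title, category and field name are illustrative choices.
    return [
        "graph_title Load average (example)",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ]


def value_lines(load=None):
    # Printed on a normal run: one "<field>.value <number>" line per field.
    if load is None:
        load = os.getloadavg()[0]  # 1-minute load average (Unix only)
    return ["load.value {:.2f}".format(load)]


def main(argv):
    # Choose the output according to the first command-line argument.
    if argv[1:2] == ["config"]:
        lines = config_lines()
    else:
        lines = value_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

On a typical installation a script like this would be made executable and linked into the node's plugin directory, after which the graph shows up with the others. The exact directory depends on how Munin was packaged.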
Note, though, that PNP4Nagios adds graphing capabilities to Nagios: most plugins report performance data, and PNP4Nagios stores that data in RRD databases and displays it as graphs in the Nagios interface. We use it in addition to Munin, as it gives us graphs of network services (Munin's main strength is monitoring the local box).
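For illustration, this is roughly what a Nagios-style check that reports performance data looks like. The sketch follows the common plugin conventions: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL), and everything after the "|" in the output line is performance data in the form label=value;warn;crit;min, which is the part a grapher such as PNP4Nagios picks up. The thresholds, the label "load1" and the function names are assumptions chosen for the example, not values from our setup.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-style check sketch with performance data."""
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_load(load, warn=4.0, crit=8.0):
    # Thresholds are illustrative defaults, not recommendations.
    if load >= crit:
        status = CRITICAL
    elif load >= warn:
        status = WARNING
    else:
        status = OK
    # Performance data: label=value;warn;crit;min
    perfdata = "load1={:.2f};{:.2f};{:.2f};0".format(load, warn, crit)
    message = "LOAD {} - load1 is {:.2f} | {}".format(
        NAMES[status], load, perfdata
    )
    return status, message


def main():
    # Reads the 1-minute load average (Unix only) and prints the result.
    status, message = check_load(os.getloadavg()[0])
    print(message)
    return status

# As a real plugin, the script would end with: sys.exit(main())
# so that Nagios receives the state through the exit code.
```

With these assumed thresholds, a load of 5.0 produces the line "LOAD WARNING - load1 is 5.00 | load1=5.00;4.00;8.00;0" and status 1.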
One final note: we also use Cacti, as it is the most useful tool for graphing switch and router ports via SNMP. We have <10 devices monitored by it. It's too much of a pain to set it up to manage actual servers; Munin and Nagios/NRPE are much easier to manage than SNMP agents.