How to monitor RAM ECC errors on an Ivy Bridge Xeon E3 processor in Linux?
Since version 3.17 of the Linux kernel, ECC errors on E3 Xeons can be monitored using the ie31200_edac
driver, introduced by this commit. This uses the standard EDAC interface, so errors can be listed using edac-util.
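The EDAC core also exposes its counters through sysfs, so the totals can be read without edac-util. A minimal sketch, assuming the usual /sys/devices/system/edac/mc layout with per-controller ce_count and ue_count files; the function name edac_counts and its optional root argument are only illustrative:

```shell
#!/bin/sh
# Print corrected (CE) and uncorrected (UE) error totals for each
# memory controller registered with EDAC.
# edac_counts is a hypothetical helper; the optional argument overrides
# the sysfs root, which is assumed to be /sys/devices/system/edac/mc.
edac_counts() {
    root="${1:-/sys/devices/system/edac/mc}"
    for mc in "$root"/mc[0-9]*; do
        # Skip the unexpanded glob (no controller registered) and any
        # entry whose counter files cannot be read.
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        printf '%s: CE=%s UE=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}

edac_counts "$@"
```

If no EDAC driver is loaded, the loop finds no mc directories and the script prints nothing.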
In a little more detail:
sudo modprobe ie31200-edac
loads the driver, which will result in lines like
[ 14.635299] EDAC MC: Ver: 3.0.0
[ 14.637898] EDAC MC0: Giving out device to module ie31200_edac controller IE31200: DEV 0000:00:00.0 (POLLED)
appearing in the kernel log (that’s on a C226 Haswell system); then
edac-util
will report any errors.
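To confirm from a script that the driver registered a memory controller, the "Giving out device to module" line quoted above can be filtered out of the kernel log. A rough sketch, assuming the message keeps the wording shown in that sample; edac_modules is a hypothetical helper name, and reading the log with dmesg may require root on some systems:

```shell
#!/bin/sh
# edac_modules (hypothetical helper): read kernel log text on stdin and
# print "<controller> <module>" for every EDAC registration message.
# Assumes the "EDAC MCn: Giving out device to module <name>" wording
# from the sample log; other kernel versions may phrase it differently.
edac_modules() {
    sed -n 's/.*EDAC \(MC[0-9][0-9]*\): Giving out device to module \([^ ]*\).*/\1 \2/p'
}

# Typical use:
#   dmesg | edac_modules
```

An empty result suggests that no EDAC driver has claimed the memory controller, or that the message has already rotated out of the log buffer.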
Xeon D, E5 and E7 memory controllers are supported using the sb_edac or skx_edac modules.