Got "input output error" when executing any command
I found my server can't run any command, and it shows "input output error"
The error code EIO ("Input/output error") on command launch happens when your filesystem is damaged or, worse, when you are running on faulty storage.
Cross your fingers; either way, be aware that at this point you should NOT power on the server unless really necessary. [1]
The Test
There is one sure-fire way to distinguish between the two root causes: conduct a block-level read scan of the system disk, and watch out for kernel messages.
- Boot your system with a GNU/Linux recovery boot disk.
- Switch to the plain old text console (press Ctrl+Alt+F1); don't use a graphical terminal for this.
- Log in as root.
- Run
  dmesg -E
  to enable live kernel message display on the console.
- Run
  dmesg -n debug
  to let low-level kernel messages through.
- Run
  blkid
  to see which disk contains the system partition. (Note that blkid will list partitions; strip the number off the end of the partition path and you will get the disk.)
- Run
  time -p dd if=/dev/sda of=/dev/null bs=4M
  to conduct an entire-disk read test (please type this carefully). If your system disk is not /dev/sda, substitute accordingly.
- Watch the screen (it will take a long while)...
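The "strip the number off the partition path" step can be sketched as a small shell helper. This is only an illustration: the function name disk_of is hypothetical, and the handling of NVMe and MMC names (which put a "p" before the partition number) is an assumption that goes beyond the /dev/sdXN case described above.

```shell
# Hypothetical helper: print the whole-disk path for a given partition path.
# Assumes the common Linux naming schemes only:
#   /dev/sda1       -> /dev/sda
#   /dev/nvme0n1p2  -> /dev/nvme0n1
disk_of() {
    case "$1" in
        /dev/nvme*p[0-9]*|/dev/mmcblk*p[0-9]*)
            # NVMe/MMC partitions end in "p<number>"; drop that suffix.
            printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//' ;;
        /dev/nvme*|/dev/mmcblk*)
            # Already a whole-disk path; leave it alone.
            printf '%s\n' "$1" ;;
        *)
            # Classic sdXN / hdXN style: drop the trailing digits.
            printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//' ;;
    esac
}
```

For example, if blkid shows the system partition as /dev/sdb3, then disk_of /dev/sdb3 prints /dev/sdb, which is the path to give to dd as if=. Double-check the result by eye before running the read test.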
Results
In the best case, where dd completes successfully and uneventfully, it is likely a filesystem problem.
- If you are comfortable doing a filesystem check from the boot disk, you can do it now (recommended).
- If you would rather let the system sort it out by itself, reboot (also remove the boot disk), and boot your usual system with fsck.mode=force appended to the end of the kernel command line. (See this question for details)
Discussing the result of the filesystem check will warrant a different question though.
However, in the worst case, you would see kernel messages like this spewing on the screen:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000001
ata2.00: failed command: READ DMA EXT
ata2.00: cmd 25/00:08:78:15:c5/00:00:6c:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:78:15:c5/00:00:6c:00:00/e0 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/100
sd 1:0:0:0: [sda] Unhandled sense code
sd 1:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
        6c c5 15 78
sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
sd 1:0:0:0: [sda] CDB: Read(10): 28 00 6c c5 15 78 00 00 08 00
end_request: I/O error, dev sda, sector 1824855416
Buffer I/O error on device sda, logical block 228106927
ata2: EH complete
Look for the key parts:
- DRDY, ERR and UNC in braces
- Medium Error status
- Unrecovered read error sense message
If you spot these in the messages, even once, you are facing a physical disk error.
When this is the case, don't let dd finish: press Ctrl+C to stop it, NOW; shut down your system, and bring your disk to a data recovery shop you trust.
If you did not find the above worst-case telltales, and instead found this kind of kernel message repeated:
ata2: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
ata2: irq_stat 0x00000040, connection status changed
ata2: SError: { CommWake DevExch }
ata2: hard resetting link
ata2: link is slow to respond, please be patient (ready=0)
Key parts:
- hard resetting link
- link is slow to respond
Then you are more likely facing a SATA link problem (e.g. bad cabling): press Ctrl+C to stop, shut down your system, fix your disk cable and connection, and try again.
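If the messages scroll by too fast to read, the key parts from both cases can be searched for mechanically. The following is a minimal sketch, not a definitive diagnostic: the function name classify_kernel_log and its three result labels are hypothetical, and it only matches the exact strings quoted above.

```shell
# Hypothetical helper: read kernel messages on stdin and report which
# of the telltale patterns described above appears in them.
classify_kernel_log() {
    log=$(cat)
    # Media-error telltales are checked first: a single hit is enough,
    # even if link resets also show up in the same log.
    if printf '%s\n' "$log" | grep -Fq \
            -e '{ UNC }' \
            -e 'Medium Error' \
            -e 'Unrecovered read error'; then
        echo "media-error"
    elif printf '%s\n' "$log" | grep -Fq \
            -e 'hard resetting link' \
            -e 'link is slow to respond'; then
        echo "link-problem"
    else
        echo "no-known-pattern"
    fi
}
```

A possible use, from a second console while dd is running, is dmesg | classify_kernel_log. The sketch deliberately does not key on { DRDY ERR } alone, on the assumption that this status can accompany other kinds of errors too; a "no-known-pattern" result only means none of the quoted strings were found, not that the disk is healthy.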
Side Notes
And I ran a smartctl test to confirm whether there is any problem with the hard disk. And it passed without error.
Beware that some hard disks tell straight lies in their S.M.A.R.T. status (I'm looking at you, Toshiba); my previous laptop hard disk just ground to a halt when reading, spewing read errors, and it still said "nothing's wrong" in its status registers.
If your server is mission-critical, then you should consider a RAID-based setup.
[1] Cautionary tale: My housemate once ignored this warning, and kept the filesystem checker grinding on his desktop system anyway. He didn't wait for me to check it until it eventually failed to boot. By the time I got a chance to check it, the disk damage was already beyond recovery (the 500 GB disk could barely be read at a snail-pace few KB/s, and no significant continuous readable area was found even after several days).
On the other hand, in another case with the same symptom, the machine owner heeded my warning and left the thing off until I could check it. Of course, it was a hard disk failure. After half a day of a GNU ddrescue session and one new hard disk, I brought him the good news that his system and data were 100% recovered at block level, i.e. all files intact, and ready to boot again without any modification.