Got "input output error" when executing any command

I found my server can't run any command, and it shows "input output error"

The error code EIO ("Input/output error") on command launch happens when your filesystem is damaged or, worse, when you are running on faulty storage.

Cross your fingers; either way, be aware that at this point you should NOT try to power on the server unless really necessary.¹

The Test

There is one sure-fire way to distinguish between the two root causes: run a block-level read scan of the disk, and watch out for kernel messages.

Boot your system from a GNU/Linux recovery boot disk.
Switch to the plain old text console (press Ctrl+Alt+F1); don't use a graphical terminal for this.
Log in as root.
Run dmesg -E to enable live kernel message display on the console.
Run dmesg -n debug to let low-level kernel messages through.
Run blkid to see which disk contains the system partition. (Note that blkid lists partitions; strip the number off the end of the partition path and you get the disk, e.g. /dev/sda1 is on /dev/sda.)
Run time -p dd if=/dev/sda of=/dev/null bs=4M to read-test the entire disk (please type this carefully). If your system disk is not /dev/sda, substitute accordingly.
Watch the screen (it will take a long while)...
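
The dmesg, blkid and dd steps above can be gathered into a couple of shell helpers. This is only a minimal sketch: the function names disk_of_partition and read_scan are made up for illustration, it assumes GNU dd and util-linux dmesg, and the "pN" suffix handling for NVMe and MMC devices is an assumption about the usual Linux naming. Timing (the time -p prefix) is left out to keep it short.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, not from any standard tool.

# Map a partition path (as listed by blkid) to its parent disk.
# /dev/sda1 -> /dev/sda; /dev/nvme0n1p2 -> /dev/nvme0n1 (assumed naming).
disk_of_partition() {
    case "$1" in
        /dev/nvme*|/dev/mmcblk*)
            # These disk names already end in a digit, so partitions
            # are assumed to carry a "pN" suffix instead of a bare number.
            printf '%s\n' "$1" | sed -E 's/p[0-9]+$//' ;;
        *)
            printf '%s\n' "$1" | sed -E 's/[0-9]+$//' ;;
    esac
}

# Read every block of the given device (or file) and discard the data.
# Nothing is written to the source; dd's exit status is returned.
read_scan() {
    dd if="$1" of=/dev/null bs=4M
}

# Intended use from the recovery text console, as root (not run here):
#   dmesg -E            # show kernel messages live on the console
#   dmesg -n debug      # let low-level messages through
#   blkid               # find the system partition, e.g. /dev/sda1
#   read_scan "$(disk_of_partition /dev/sda1)"
```

Double-check the path printed by disk_of_partition against the blkid output before starting the scan; the scan itself only reads.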

Results

In the best case, where dd completes successfully and uneventfully, it is likely a filesystem problem.
- If you are comfortable running a filesystem check from the boot disk, you can do it now (recommended).
- If you would rather let the system sort it out by itself, remove the boot disk, reboot into your usual system, and append fsck.mode=force to the end of the kernel command line. (See this question for details.)
- Discussing the result of the filesystem check would warrant a different question, though.

However, in the worst case, you will see kernel messages like these spewing onto the screen:

```
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000001
ata2.00: failed command: READ DMA EXT
ata2.00: cmd 25/00:08:78:15:c5/00:00:6c:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:78:15:c5/00:00:6c:00:00/e0 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/100
sd 1:0:0:0: [sda] Unhandled sense code
sd 1:0:0:0: [sda]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:0:0: [sda]  
Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        6c c5 15 78 
sd 1:0:0:0: [sda]  
Add. Sense: Unrecovered read error - auto reallocate failed
sd 1:0:0:0: [sda] CDB: 
Read(10): 28 00 6c c5 15 78 00 00 08 00
end_request: I/O error, dev sda, sector 1824855416
Buffer I/O error on device sda, logical block 228106927
ata2: EH complete
```

Look for the key parts:

- DRDY, ERR and UNC in braces
- Medium Error status
- Unrecovered read error sense message

If you spot any of these in the messages (even once), you are facing a physical disk error.

When this is the case, don't let dd finish: press Ctrl+C to stop it NOW, shut down your system, and bring your disk to a data recovery shop you trust.

If you did not find the worst-case telltales above, but instead see kernel messages like these repeated:
```
ata2: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
ata2: irq_stat 0x00000040, connection status changed
ata2: SError: { CommWake DevExch }
ata2: hard resetting link
ata2: link is slow to respond, please be patient (ready=0)
```
Key parts:
- hard resetting link
- link is slow to respond
Then you are instead facing a SATA link problem (e.g. bad cabling): press Ctrl+C to stop, shut down your system, fix your disk cable and connection, and try again.
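
The two sets of telltales can also be checked mechanically against a saved copy of the kernel log. The following is a rough sketch, assuming the messages are piped in from dmesg or from a file; the function name classify_disk_log and its three output labels are invented here, and the patterns are simply the key strings quoted above, so other drivers or kernel versions may word things differently.

```shell
#!/bin/sh
# Sketch only: classify_disk_log and its labels are hypothetical.
# Reads kernel messages on stdin and prints one of:
#   media-error   (DRDY ERR / UNC / Medium Error / Unrecovered read error)
#   link-problem  (hard resetting link / link is slow to respond)
#   no-telltales
classify_disk_log() {
    log=$(cat)
    # A single media-error line is enough, so it is checked first.
    if printf '%s\n' "$log" | grep -Fq \
            -e 'Medium Error' \
            -e 'Unrecovered read error' \
            -e '{ DRDY ERR }' \
            -e '{ UNC }'; then
        echo media-error
    elif printf '%s\n' "$log" | grep -Fq \
            -e 'hard resetting link' \
            -e 'link is slow to respond'; then
        echo link-problem
    else
        echo no-telltales
    fi
}

# Example (from the recovery console):
#   dmesg | classify_disk_log
```

A no-telltales result only means none of these particular strings appeared; it is no substitute for reading the messages yourself.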

Side Notes

And I made a smartctl test to confirm if there is any problem with hard disk. And it passed without error.

Beware that some hard disks tell straight lies in their S.M.A.R.T. status (I'm looking at you, Toshiba); my previous laptop hard disk just ground to a halt when reading, spewing read errors, and it still said "nothing's wrong" in its status registers.

If your server is mission-critical, then you should consider a RAID-based setup.

¹ Cautionary tale: My housemate once ignored this warning and kept the filesystem checker grinding away on his desktop system anyway. He didn't have me check it until it eventually failed to boot. By the time I got a chance to look at it, the disk damage was already beyond recovery (the 500 GB disk could barely be read, at a snail's pace of a few KB/s, and no significant contiguous readable area was found even after several days).

On the other hand, in another case with the same symptom, the machine's owner heeded my warning and left the thing off until I could check it. Of course, it was a hard disk failure. After a half-day GNU ddrescue session and one new hard disk, I brought him the good news that his system and data were 100% recovered at block level, i.e. all files intact and ready to boot again without any modification.

