Got "input output error" when executing any command

I found my server can't run any command, and it shows "input output error"

The error code EIO ("Input/output error") on command launch typically means that your filesystem is damaged or, worse, that you are running on faulty storage.

Cross your fingers; either way, be aware that at this point you should NOT try to power on the server unless really necessary.1

The Test

There is one sure-fire way to distinguish between the two root causes: conduct a block-level read scan of the system disk, and watch for kernel messages.

  1. Boot your system from a GNU/Linux recovery boot disk.
  2. Switch to the plain old text console (press Ctrl+Alt+F1); don't use a graphical terminal for this.
  3. Log in as root.
  4. Run dmesg -E to enable live kernel message display on the console.
  5. Run dmesg -n debug to let low-level kernel messages through.
  6. Run blkid to see which disk contains the system partition. (Note that blkid lists partitions; strip the number off the end of the partition path and you get the disk.)
  7. Run time -p dd if=/dev/sda of=/dev/null bs=4M to conduct an entire-disk read test (please type this carefully). If your system disk is not /dev/sda, substitute accordingly.
  8. Watch the screen (it will take a long while)...
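
Step 6 (stripping the partition number to get the disk) is easy to get wrong on NVMe or SD/MMC devices, whose partitions end in "p<number>" rather than a bare number. Here is a minimal sketch of that step; disk_of_partition is a hypothetical helper name, not part of blkid, and it assumes the usual Linux naming schemes (/dev/sdXN, /dev/vdXN, /dev/nvmeXnYpN, /dev/mmcblkXpN):

```shell
#!/bin/sh
# disk_of_partition: hypothetical helper, not a standard tool.
# Prints the whole-disk device for a given partition path.
disk_of_partition() {
    case "$1" in
        # NVMe and MMC/SD partitions end in "p<number>", e.g. /dev/nvme0n1p2
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*)
            printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//' ;;
        # Already a whole NVMe or MMC/SD disk: print it unchanged
        /dev/nvme*|/dev/mmcblk*)
            printf '%s\n' "$1" ;;
        # Classic names such as /dev/sda1 or /dev/vdb3: drop the trailing digits
        *)
            printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//' ;;
    esac
}
```

For example, disk_of_partition /dev/sda1 prints /dev/sda, which is what goes after if= in the dd command of step 7. Still double-check the result against the blkid output before running dd.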

Results

  • In the best case, where dd completes successfully and uneventfully, it is likely a filesystem problem.

    • If you are comfortable running a filesystem check from the boot disk, you can do it now (recommended).
    • If you would rather let the system sort it out by itself, remove the boot disk, reboot, and boot your usual system with fsck.mode=force appended to the end of the kernel command line. (See this question for details)
    • Discussing the result of the filesystem check would warrant a different question, though.
  • However, in the worst case, you will see kernel messages like this spewing onto the screen:

    ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    ata2.00: irq_stat 0x40000001
    ata2.00: failed command: READ DMA EXT
    ata2.00: cmd 25/00:08:78:15:c5/00:00:6c:00:00/e0 tag 0 dma 4096 in
             res 51/40:00:78:15:c5/00:00:6c:00:00/e0 Emask 0x9 (media error)
    ata2.00: status: { DRDY ERR }
    ata2.00: error: { UNC }
    ata2.00: configured for UDMA/100
    sd 1:0:0:0: [sda] Unhandled sense code
    sd 1:0:0:0: [sda]  
    Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 1:0:0:0: [sda]  
    Sense Key : Medium Error [current] [descriptor]
    Descriptor sense data with sense descriptors (in hex):
            72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
            6c c5 15 78 
    sd 1:0:0:0: [sda]  
    Add. Sense: Unrecovered read error - auto reallocate failed
    sd 1:0:0:0: [sda] CDB: 
    Read(10): 28 00 6c c5 15 78 00 00 08 00
    end_request: I/O error, dev sda, sector 1824855416
    Buffer I/O error on device sda, logical block 228106927
    ata2: EH complete
    

    Look for the key parts:

    • DRDY, ERR and UNC in braces
    • Medium Error status
    • Unrecovered read error sense message

    If you spot any of these in the messages (even once), you are facing a physical disk error.

    When this is the case, don't let dd finish: press Ctrl+C to stop it NOW, shut down your system, and bring your disk to a data recovery shop you trust.

  • If you did not find the worst-case telltales above, and instead see this kind of kernel message repeated:

    ata2: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
    ata2: irq_stat 0x00000040, connection status changed
    ata2: SError: { CommWake DevExch }
    ata2: hard resetting link
    ata2: link is slow to respond, please be patient (ready=0)
    

    Key parts:

    • hard resetting link
    • link is slow to respond

    Then you are instead facing a SATA link problem (e.g. bad cabling): press Ctrl+C to stop, shut down your system, fix your disk cable and connections, and try again.
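
If the messages scroll by too fast to read, the telltales listed above can also be searched for mechanically. The following is only a rough sketch: classify_ata_log is a hypothetical helper name, and the patterns are taken from the two sample logs in this answer, so other drivers or kernel versions may word their errors differently. A "clean" verdict therefore does not prove the disk is healthy.

```shell
#!/bin/sh
# classify_ata_log: hypothetical helper, not a standard tool.
# Reads kernel messages on stdin and prints one of: media, link, clean.
classify_ata_log() {
    log=$(cat)
    # Media error telltales are checked first because they are the more serious case
    if printf '%s\n' "$log" | grep -Eq 'error: .*UNC|Medium Error|Unrecovered read error'; then
        echo media
    # SATA link telltales (e.g. bad cabling)
    elif printf '%s\n' "$log" | grep -Eq 'hard resetting link|link is slow to respond'; then
        echo link
    else
        echo clean
    fi
}
```

A typical use would be dmesg | classify_ata_log from a second text console while dd is running. If it prints media, stop dd immediately as described above.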

Side Notes

And I made a smartctl test to confirm if there is any problem with hard disk. And it passed without error.

Beware that some hard disks tell straight lies in their S.M.A.R.T. status (I'm looking at you, Toshiba); my previous laptop hard disk just ground to a halt when reading, spewing read errors, and it still said "nothing's wrong" in its status registers.

If your server is mission-critical, then you should consider a RAID-based setup.


  • 1 Cautionary tale: My housemate once ignored this warning and kept the filesystem checker grinding on his desktop system anyway. He didn't wait for me to check it, and kept going until it eventually failed to boot. By the time I got a chance to look at it, the disk damage was already beyond recovery (the 500 GB disk could barely be read, at a snail's pace of a few KB/s, and no significant contiguous readable area was found even after several days).

    On the other hand, in another case with the same symptom, the machine owner heeded my warning and left the thing off until I could check it. Of course, it was a hard disk failure. After a half-day GNU ddrescue session and one new hard disk, I brought him the good news that his system and data were 100% recovered at block level, i.e. all files intact and ready to boot again without any modification.