What's wrong with XOR encryption?
The problem with XOR encryption is that for long runs of the same characters, it is very easy to see the password. Such long runs are most commonly spaces in text files. Say your password is 8 chars, and the text file has 16 spaces in some line (for example, in the middle of ASCII-graphics table). If you just XOR that with your password, you'll see that output will have repeating sequences of characters. The attacker would just look for any such, try to guess the character in the original file (space would be the first candidate to try), and derive the length of the password from length of repeating groups.
Binary files can be even worse as they often contain repeating sequences of 0x00
bytes. Obviously, XORing with those is no-op, so your password will be visible in plain text in the output! An example of a very common binary format that has long sequences of nulls is .doc
.
I concur with Pavel Minaev's explanation of XOR's weaknesses. For those who are interested, here's a basic overview of the standard algorithm used to break the trivial XOR encryption in a few minutes:
Determine how long the key is. This is done by XORing the encrypted data with itself shifted various numbers of places, and examining how many bytes are the same.
If the bytes that are equal are greater than a certain percentage (6% according to Bruce Schneier's Applied Cryptography second edition), then you have shifted the data by a multiple of the keylength. By finding the smallest amount of shifting that results in a large amount of equal bytes, you find the keylength.
Shift the cipher text by the keylength, and XOR against itself. This removes the key and leaves you with the plaintext XORed with the plaintext shifted the length of the key. There should be enough plaintext to determine the message content.
Read more at Encryption Matters, Part 1
XOR encryption can be reasonably* strong if the following conditions are met:
- The plain text and the password are about the same length.
- The password is not reused for encrypting more than one message.
- The password cannot be guessed, IE by dictionary or other mathematical means. In practice this means the bits are randomized.
*Reasonably strong meaning it cannot be broken by trivial, mathematical means, as in GeneQ's post. It is still no stronger than your password.