How do I convert LF to CRLF?
You can use the replace method of strings. Like
txt.replace('\n', '\r\n')
EDIT :
in your case :
with open('input.txt') as inp, open('output.txt', 'w') as out:
txt = inp.read()
txt = txt.replace('\n', '\r\n')
out.write(txt)
Instead of converting, you should be able to just open the file using Python's universal newline support:
f = open('words.txt', 'rU')
(Note the U
.)
You don't need to convert the line endings in the files in order to be able to iterate over them. As suggested by NPE, simply use python's universal newlines mode.
The UnicodeDecodeError happens because the files you are processing are encoded as UTF-8 and when you attempt to decode the contents from bytes to a string, via str(line)
, Python is using the cp1252
encoding to convert the bytes read from the file into a Python 3 string (i.e. a sequence of unicode code points). However, there are bytes in those files that cannot be decoded with the cp1252
encoding and that causes a UnicodeDecodeError.
If you change str(line)
to line.decode('utf-8')
you should no longer get the UnicodeDecodeError. Check out the Text Vs. Data Instead of Unicode Vs. 8-bit writeup for some more details.
Finally, you might also find The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky useful.