How to convert CRLF to LF on a Windows machine in Python
Convert line endings in-place (with Python 3)
Line endings:
- Windows - \r\n, called CRLF
- Linux/Unix/MacOS - \n, called LF
Windows to Linux/Unix/MacOS (CRLF ➡ LF)
Here is a short Python script for directly converting Windows line endings to Linux/Unix/MacOS line endings. The script works in-place, i.e., without creating an extra output file.
# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'

# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"

with open(file_path, 'rb') as open_file:
    content = open_file.read()

# Windows ➡ Unix
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)

# Unix ➡ Windows
# content = content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)

with open(file_path, 'wb') as open_file:
    open_file.write(content)
Linux/Unix/MacOS to Windows (LF ➡ CRLF)
To convert from Linux/Unix/MacOS to Windows instead, simply comment the Unix ➡ Windows replacement back in (remove the # in front of the line).
DO NOT comment out the Windows ➡ Unix replacement, as it ensures a correct conversion. When converting from LF to CRLF, it is important that there are no CRLF line endings already present in the file. Otherwise, those lines would be converted to CRCRLF. Converting all lines from CRLF to LF first and then doing the intended conversion from LF to CRLF avoids this issue (thanks @neuralmer for pointing that out).
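The two-step order described above can be wrapped in a small helper. This is only a minimal sketch: the function name to_crlf and the sample bytes are made up for illustration and are not part of the original script.

```python
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'


def to_crlf(content: bytes) -> bytes:
    """Convert line endings to CRLF without producing CRCRLF (hypothetical helper)."""
    # Step 1: normalize any CRLF that is already present to LF.
    content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
    # Step 2: every remaining LF is a bare line ending, so this replacement is safe.
    return content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)


# Sample data with mixed line endings (illustrative only).
mixed = b'first\r\nsecond\nthird\r\n'

# A naive one-step replacement doubles the CR on lines that already ended in CRLF.
naive = mixed.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)

print(naive)           # b'first\r\r\nsecond\r\nthird\r\r\n'
print(to_crlf(mixed))  # b'first\r\nsecond\r\nthird\r\n'
```

Because of the normalization step, running to_crlf on content that is already CRLF-terminated leaves it unchanged.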
Code Explanation
Binary Mode
Important: We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.
When opening files in text mode (mode='r' or mode='w' without b), Python translates line endings by default: on reading, \r\n (Windows) and \r (old Mac OS versions) are converted to Python's Unix-style line ending \n, and on writing, \n is converted to the platform's native line ending. So the call to content.replace() couldn't find any \r\n line endings to replace.
In binary mode, no such conversion is done. Therefore the call to bytes.replace() can do its work.
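To see the difference, the following sketch writes a throwaway temporary file with Windows line endings and reads it back in both modes. The file content and variable names are illustrative assumptions, not part of the original script.

```python
import os
import tempfile

# Create a small file with CRLF line endings (binary mode, so nothing is translated).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\r\ntwo\r\n')

# Text mode with the default newline=None translates \r\n to \n while reading.
with open(path, 'r', encoding='ascii') as f:
    as_text = f.read()

# Binary mode returns the bytes exactly as they are stored on disk.
with open(path, 'rb') as f:
    as_bytes = f.read()

os.remove(path)

print(repr(as_text))  # 'one\ntwo\n'
print(as_bytes)       # b'one\r\ntwo\r\n'
```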
Binary Strings
In Python 3, string literals are Unicode text (type str) unless declared otherwise. But a file opened in binary mode yields bytes, not str - therefore we need to add b in front of our replacement strings to make them bytes literals, too.
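A short sketch of what happens when the b prefix is left out; the sample content is made up for illustration.

```python
# This is the kind of object that open(..., 'rb').read() returns.
content = b'line one\r\nline two\r\n'

raised_type_error = False
try:
    # str arguments are rejected by bytes.replace()
    content.replace('\r\n', '\n')
except TypeError as exc:
    raised_type_error = True
    print('TypeError:', exc)

# bytes arguments work as expected
converted = content.replace(b'\r\n', b'\n')
print(converted)  # b'line one\nline two\n'
```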
Raw Strings
On Windows the path separator is a backslash (\), which we would need to escape in a normal Python string as \\. By adding r in front of the string we create a so-called "raw string", which doesn't need any escaping. So you can directly copy/paste the path from Windows Explorer into your script.
(Hint: Inside Windows Explorer press CTRL+L to automatically select the path from the address bar.)
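The effect of the r prefix can be checked with a quick comparison. The path below is hypothetical and was chosen because it contains the escape sequences \n and \t.

```python
normal = "c:\new\table.txt"      # \n becomes a newline, \t becomes a tab
escaped = "c:\\new\\table.txt"   # every backslash escaped by hand
raw = r"c:\new\table.txt"        # raw string: backslashes are kept as typed

print(raw == escaped)          # True
print(raw == normal)           # False
print(len(normal), len(raw))   # 14 16
```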
Alternative solution
We open the file twice to avoid having to reposition the file pointer. We could also have opened the file once with mode='rb+', but then we would have needed to move the pointer back to the start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).
Simply opening the file again in write mode does that automatically for us.
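For comparison, the single-open variant could look roughly like this. It is a sketch under the assumptions stated in the paragraph above; the helper name convert_in_place and the temporary demo file are hypothetical.

```python
import os
import tempfile

WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'


def convert_in_place(file_path):
    """Convert CRLF to LF using a single open() call (hypothetical helper)."""
    with open(file_path, 'rb+') as open_file:
        content = open_file.read()
        content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
        open_file.seek(0)      # move the pointer back to the start
        open_file.truncate(0)  # discard the original content
        open_file.write(content)


# Demo on a throwaway temporary file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'a\r\nb\r\n')

convert_in_place(demo_path)

with open(demo_path, 'rb') as f:
    result = f.read()
os.remove(demo_path)

print(result)  # b'a\nb\n'
```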
Cheers and happy programming,
winklerrr
Python 3:
The default newline mode for open is universal, in which case it doesn't matter which sort of newline each line has.
You can also request a specific form of newline with the newline argument of open.
Translating from one form to the other is thus rather simple in Python:
with open('filename.in', 'r') as infile, \
     open('filename.out', 'w', newline='\n') as outfile:
    outfile.writelines(infile.readlines())
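The same idea can be written as a reusable function that copies line by line instead of loading everything with readlines(). This is a sketch, not the answer's own code: the name convert_newlines, the explicit encoding parameter, and its 'utf-8' default are assumptions added for illustration.

```python
import os
import tempfile


def convert_newlines(src, dst, newline='\n', encoding='utf-8'):
    """Copy src to dst, writing every line ending as `newline` (hypothetical helper)."""
    # On the input, the default newline=None translates \r\n, \r and \n to \n.
    with open(src, 'r', encoding=encoding) as infile, \
         open(dst, 'w', encoding=encoding, newline=newline) as outfile:
        for line in infile:  # iterate lazily instead of calling readlines()
            outfile.write(line)


# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'filename.in')
    dst_lf = os.path.join(tmp_dir, 'filename.lf')
    dst_crlf = os.path.join(tmp_dir, 'filename.crlf')

    with open(src, 'wb') as f:
        f.write(b'x\r\ny\nz\r\n')  # mixed line endings

    convert_newlines(src, dst_lf)                    # to LF
    convert_newlines(src, dst_crlf, newline='\r\n')  # to CRLF

    with open(dst_lf, 'rb') as f:
        lf_bytes = f.read()
    with open(dst_crlf, 'rb') as f:
        crlf_bytes = f.read()

print(lf_bytes)    # b'x\ny\nz\n'
print(crlf_bytes)  # b'x\r\ny\r\nz\r\n'
```

Unlike the binary approach, this goes through text decoding, so the encoding has to match the file.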
Python 2:
The open function supports universal newlines via the 'rU' mode. The built-in open in Python 2 has no newline argument, so the output file is opened in binary mode, which keeps \n from being translated on Windows.
Again, translating from one form to the other:
with open('filename.in', 'rU') as infile, \
     open('filename.out', 'wb') as outfile:
    outfile.writelines(infile.readlines())
(In Python 3, mode U is deprecated and was removed in Python 3.11; the equivalent form is newline=None, which is the default.)