How can I know if a text file ends with carriage return or not?
After reading the file through ReadLine(), you can seek back to two characters before the end of the file and compare those characters to CR-LF:
string s;
using (StreamReader sr = new StreamReader(@"C:\Users\User1\Desktop\a.txt", encoding: System.Text.Encoding.UTF8))
{
    while (!sr.EndOfStream)
    {
        s = sr.ReadLine();
        // process the line we read...
    }
    if (sr.BaseStream.Length >= 2) // make sure the file has at least two bytes to seek back over
    {
        // go back 2 bytes from the end of the file
        sr.BaseStream.Seek(-2, SeekOrigin.End);
        sr.DiscardBufferedData(); // resync the reader with the new stream position
        int s1 = sr.Read(); // read the second-to-last char
        int s2 = sr.Read(); // read the last char
        if (s2 == 10) // file ends with CR-LF or LF (CR=13, LF=10)
        {
            if (s1 == 13) { } // file ends with CR-LF (Windows EOL format)
            else { } // file ends with just LF (UNIX/OSX format)
        }
    }
}
So you're processing a text file, meaning you need to read all text, and want to preserve any newline characters, even at the end of the file.
You've correctly concluded that ReadLine() eats those, so its results look the same whether or not the file ends with one. In fact, ReadLine() eats the last newline when a file ends with one (StreamReader.EndOfStream is true after reading the penultimate line, so no empty last line is returned). File.ReadAllLines() also eats the last newline. Given you're potentially dealing with large files, you also don't want to read the entire file into memory at once.
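To see why the strings returned by ReadLine() can't answer the question, here is a minimal sketch. The class and method names (ReadLineDemo, ReadAllLinesFrom) are made up for illustration, and it uses a StringReader on an in-memory string instead of a file to keep it short:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Hypothetical helper: collects every string that ReadLine() returns for the given text.
    public static List<string> ReadAllLinesFrom(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine() returns null once there is nothing left to read.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Calling ReadLineDemo.ReadAllLinesFrom("a\nb\n") and ReadLineDemo.ReadAllLinesFrom("a\nb") both give the two lines "a" and "b", so the trailing newline is lost either way.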
You also can't just compare the last two bytes of the file, because there are encodings that use more than one byte to encode a character, such as UTF-16. So you'll need to read the file in an encoding-aware way. A StreamReader does just that.
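As a quick illustration of the encoding problem, the following sketch dumps the bytes that different encodings produce for the same newline. The class and method names (NewlineBytesDemo, ToHex) are made up for this example:

```csharp
using System;
using System.Text;

public static class NewlineBytesDemo
{
    // Hypothetical helper: hex dump of how the given encoding stores a string.
    public static string ToHex(string text, Encoding encoding)
    {
        // BitConverter.ToString formats the bytes as uppercase hex pairs separated by hyphens.
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

NewlineBytesDemo.ToHex("\r\n", Encoding.UTF8) gives 0D-0A, but NewlineBytesDemo.ToHex("\r\n", Encoding.Unicode) (UTF-16 little-endian) gives 0D-00-0A-00. The last two bytes of a UTF-16 file ending in CR-LF are therefore 0A 00, not 0D 0A, and a plain byte comparison would miss it.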
So a solution would be to create your own version of ReadLine(), which includes the newline character(s) at the end:
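using System.IO;
using System.Text;

public static class StreamReaderExtensions
{
    // Like ReadLine(), but keeps the line terminator in the returned string.
    public static string ReadLineWithNewLine(this StreamReader reader)
    {
        var builder = new StringBuilder();
        while (!reader.EndOfStream)
        {
            int c = reader.Read();
            builder.Append((char) c);
            if (c == 10) // LF ends the line; a preceding CR is already in the builder
            {
                break;
            }
        }
        return builder.ToString();
    }
}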
Then you can check whether the last returned line ends in \n:
string line = "";
using (var stream = new StreamReader(@"D:\Temp\NewlineAtEnd.txt"))
{
    while (!stream.EndOfStream)
    {
        line = stream.ReadLineWithNewLine();
        Console.Write(line);
    }
}
Console.WriteLine();
if (line.EndsWith("\n"))
{
    Console.WriteLine("Newline at end of file");
}
else
{
    Console.WriteLine("No newline at end of file");
}
Though the StreamReader is heavily optimized, I can't vouch for the performance of reading one character at a time. A quick test using two equal 100 MB text files showed a quite drastic slowdown compared to ReadLine() (~1800 vs ~400 ms).
This approach does preserve the original line endings though, meaning you can safely rewrite a file using strings returned by this extension method, without changing all \n to \r\n or vice versa.
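A small round-trip sketch of that last point, under a few assumptions: the names RoundTripDemo and CopyLineByLine are invented here, the input is an in-memory UTF-8 stream rather than a real file, and the reading loop is repeated as a private method only so the snippet compiles on its own:

```csharp
using System.IO;
using System.Text;

public static class RoundTripDemo
{
    // Same loop as the ReadLineWithNewLine extension method, repeated to keep this sketch self-contained.
    private static string ReadLineWithNewLine(StreamReader reader)
    {
        var builder = new StringBuilder();
        while (!reader.EndOfStream)
        {
            int c = reader.Read();
            builder.Append((char) c);
            if (c == 10) // stop after LF so the terminator stays part of the line
            {
                break;
            }
        }
        return builder.ToString();
    }

    // Hypothetical helper: reads the text line by line and concatenates the lines again.
    public static string CopyLineByLine(string input)
    {
        var output = new StringBuilder();
        using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(input))))
        {
            while (!reader.EndOfStream)
            {
                // A real program would process or write each line here.
                output.Append(ReadLineWithNewLine(reader));
            }
        }
        return output.ToString();
    }
}
```

With mixed line endings, RoundTripDemo.CopyLineByLine("a\r\nb\nc") returns the input unchanged, including the missing newline after the last line.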