C# - Check if File is Text Based

In general: there is no way to tell.

A text file stored in UTF-16 will likely look like binary if you open it with an 8-bit encoding. Equally someone could save a text file as a .doc (it is a document).

While you could open the file and look at some of the content all such heuristics will sometimes fail (eg. notepad tries to do this, by careful selection of a few characters notepad will guess wrong and display completely different content).

If you have a specific scenario, rather than being able to open and process anything, you should be able to do much better.


I guess you could just check through the first 1000 (arbitrary number) characters and see if there are unprintable characters, or if they are all ascii in a certain range. If the latter, assume that it is text?

Whatever you do is going to be a guess.


To get the real type of a file, you must check its header, which won't be changed even the extension is modified. You can get the header list here, and use something like this in your code:

using(var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
   using(var reader = new BinaryReader(stream))
   {
     // read the first X bytes of the file
     // In this example I want to check if the file is a BMP
     // whose header is 424D in hex(2 bytes 6677)
     string code = reader.ReadByte().ToString() + reader.ReadByte().ToString();
     if (code.Equals("6677"))
     {
        //it's a BMP file
     }
   }
}

Tags:

C#

File Io