What could cause the file command in Linux to report a text file as binary data?
Vim tries very hard to make sense of whatever you throw at it without complaining. This makes it a relatively poor tool for diagnosing the output of the file command.
Vim's "[converted]" notice indicates there was something in the file that vim wouldn't expect to see in the text encoding suggested by your locale settings (LANG etc).
Others have already suggested cat -v and xxd.
You could try grepping for non-ASCII characters.
grep -P '[\x7f-\xff]' filename
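One caveat, offered as an assumption about GNU grep built with PCRE support: in a UTF-8 locale the \x.. escapes may be read as code points rather than raw bytes, so a stray byte can slip past the pattern above. A minimal sketch that forces byte-wise matching follows; find_non_ascii and the demo file are hypothetical names used only for illustration.

```shell
# Hypothetical helper: list lines that contain any byte outside 7-bit ASCII.
# LC_ALL=C makes grep treat the input as raw bytes instead of decoding it.
# -a forces text mode so grep prints the line instead of "Binary file matches".
# Assumes GNU grep with PCRE support (-P).
find_non_ascii() {
    LC_ALL=C grep -naP '[\x80-\xff]' "$1"
}

# Demo: line 2 holds a lone 0xE9 byte (Latin-1 "e acute"), written in octal.
demo=$(mktemp)
printf 'plain\ncaf\351\nplain again\n' > "$demo"

# Print only the line numbers of the matching lines.
find_non_ascii "$demo" | cut -d: -f1
```

The -n flag prefixes each match with its line number, which makes it easy to jump to the spot in an editor.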
The other possibility is line endings that are non-standard for the platform (i.e. CRLF or CR), but I'd expect file to cope with that and report "DOS text file" or similar.
If you run file -D filename, file displays debugging information, including the tests it performs. Near the end, it shows which test succeeded in determining the file type.
For a regular text file, it looks like this:
[31> 0 regex,=^package[ \t]+[0-9A-Za-z_:]+ *;,""]
1 == 0 = 0
ascmagic 1
filename.txt: ISO-8859 text, with CRLF line terminators
This tells you which test matched, and therefore why file settled on that type.
I found the issue using binary search to locate the problematic lines.
head -n "$(( $(wc -l < file.cpp) / 2 ))" file.cpp > a.txt
tail -n +"$(( $(wc -l < file.cpp) / 2 + 1 ))" file.cpp > b.txt
Running file against each half, and repeating the process, helped me locate the offending line. I found a Control+P (^P) character embedded in it. Removing it solved the problem. I'll write myself a Perl script to search for these characters (and other extended characters) in the future.
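For anyone who wants to skip the manual halving, a search for stray control characters can be sketched in shell rather than Perl. This is only a sketch, assuming GNU grep with PCRE support; find_control_chars and the demo file are hypothetical names, and the character class deliberately leaves out tab, newline, and carriage return.

```shell
# Hypothetical helper: print the numbers of the lines that contain ASCII
# control characters other than tab (0x09), newline (0x0a), and CR (0x0d).
# LC_ALL=C matches raw bytes; -a stops grep from bailing out on "binary" input.
# Assumes GNU grep with PCRE support (-P).
find_control_chars() {
    LC_ALL=C grep -naP '[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]' "$1" | cut -d: -f1
}

# Demo: the second line carries a Control+P byte (0x10, octal 020).
demo=$(mktemp)
printf 'int a;\nint b;\020\nint c;\n' > "$demo"

find_control_chars "$demo"
```

Piping a reported line through cat -v then shows the culprit in caret notation, such as ^P.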
A big thanks to everyone who provided an answer for all the tips!