Finding all "Non-Binary" files

I'd use file and pipe the output into grep or awk to find text files, then extract just the filename portion of file's output and pipe that into xargs.

Something like:

file * | awk -F: '/ASCII text/ {print $1}' | xargs -d'\n' -r flip -u

Note that the pattern matches 'ASCII text' rather than just any 'text' - you probably don't want to mess with Rich Text documents or Unicode text files, etc.
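If you do want to pick up UTF-8 text files as well, you can widen the pattern. A sketch - the exact wording file uses for UTF-8 files varies between versions, so this matches loosely on 'UTF-8'; check what your file prints first:

file * | awk -F: '/ASCII text|UTF-8/ {print $1}' | xargs -d'\n' -r flip -u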

You can also use find (or whatever) to generate a list of files to examine with file:

find /path/to/files -type f -exec file {} + | \
  awk -F: '/ASCII text/ {print $1}' | xargs -d'\n' -r flip -u

The -d'\n' argument to xargs makes xargs treat each input line as a separate argument, thus catering for filenames with spaces and other problematic characters. In other words, it's an alternative to xargs -0 for when the input source doesn't or can't generate NUL-separated output (the way find's -print0 option does). According to the changelog, xargs gained the -d/--delimiter option in September 2005, so it should be in any non-ancient Linux distro (I wasn't sure, which is why I checked - I just vaguely remembered it was a "recent" addition).

Note that a linefeed is a valid character in filenames, so this will break if any filenames have linefeeds in them. For typical Unix users this is pathologically insane, but it isn't unheard of if the files originated on Mac or Windows machines.
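If you do need to cope with such names, one option is to keep everything NUL-separated from end to end. A rough sketch (assumes bash for read -d ''; it also runs file once per file, so it's slower than the -exec ... + form above):

find /path/to/files -type f -print0 |
  while IFS= read -r -d '' f; do
    # file -b prints just the description, without the "filename:" prefix
    case $(file -b "$f") in
      ('ASCII text'*) printf '%s\0' "$f" ;;
    esac
  done |
  xargs -0 -r flip -u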

Also note that file is not perfect. It's very good at detecting the type of data in a file but can occasionally get confused.

I have used numerous variations of this method many times in the past with success.


No. There is nothing intrinsic to a file that marks it as binary or non-binary. You can use heuristics like 'contains only characters in 0x01–0x7F', but that will call text files with non-ASCII characters binary files, and the occasional unlucky binary file a text file.
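To make that concrete, here is a crude sketch of such a heuristic (hypothetical function name, not a recommendation): it calls a file "text" if its first 8000 bytes contain nothing outside printable ASCII and whitespace, so a UTF-8 text file fails the test while a binary file that happens to start with 8000 ASCII bytes passes it.

looks_like_text() {
  # "text" here means: no byte outside printable ASCII / whitespace in the first 8000 bytes
  ! head -c 8000 "$1" | LC_ALL=C grep -aq '[^[:print:][:space:]]'
}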

Now, once you've ignored that...

zip files

If it's coming from your Windows user as a zip file, the zip format supports marking files as either binary or text in the archive itself. You can use unzip's -a option to pay attention to this flag and convert the files marked as text. Of course, see the first paragraph for why this may not be a good idea (the zip program may have guessed wrong when it made the archive).

zipinfo will tell you which files are binary (b) or text (t) in its zipfile listing.
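For example (hypothetical archive name; assumes Info-ZIP's zipinfo and unzip):

zipinfo files-from-windows.zip                  # look for 't' (text) or 'b' (binary) in the listing
unzip -a files-from-windows.zip -d converted/   # convert entries marked as text while extracting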

other files

The file command will look at a file and try to identify it. In particular, you'll probably find its -i (output MIME type) option useful; only convert files whose type is text/*.
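For example, something along these lines - dos2unix is just a stand-in for whatever converter you're using (flip -u from the first answer would work the same way), and the xargs -d'\n' caveats discussed above still apply:

find /path/to/files -type f -exec file -i {} + |
  awk -F: '$2 ~ /text\// {print $1}' |
  xargs -d'\n' -r dos2unix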


The accepted answer didn't find all of them for me. Here is an example that uses grep's -I option to ignore binary files, and also skips all hidden files...

find . -type f -not -path '*/\.*' -exec grep -Il '.' {} \; | xargs -L 1 echo 

Here it is in use in a practical application: dos2unix

https://unix.stackexchange.com/a/365679/112190
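The same filter can feed dos2unix directly - a sketch, with echo swapped for dos2unix and xargs -d'\n' used so that filenames containing spaces survive:

find . -type f -not -path '*/\.*' -exec grep -Il '.' {} \; | xargs -d'\n' -r dos2unix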