Why doesn't shell automatically fix "useless use of cat"?

"Useless use of cat" is more about how you write your code than about what actually runs when you execute the script. It's a sort of design anti-pattern, a way of going about something that could probably be done in a more efficient manner. It's a failure in understanding of how to best combine the given tools to create a new tool. I'd argue that stringing several sed and/or awk commands together in a pipeline also sometimes could be said to be a symptom of this same anti-pattern.

Fixing instances of "useless use of cat" in a script is a primarily matter of fixing the source code of the script manually. A tool such as ShellCheck can help with this by pointing out the obvious cases:

Click to copy

$ cat script.sh
#!/bin/sh
cat file | cat

Click to copy

$ shellcheck script.sh

In script.sh line 2:
cat file | cat
    ^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.

Getting the shell to do this automatically would be difficult due to the nature of shell scripts. The way a script executes depends on the environment inherited from its parent process, and on the specific implementation of the available external commands.

The shell does not necessarily know what cat is. It could potentially be any command from anywhere in your $PATH, or a function.

If it was a built-in command (which it may be in some shells), it would have the ability to reorganise the pipeline as it would know of the semantics of its built-in cat command. Before doing that, it would additionally have to make assumptions about the next command in the pipeline, after the original cat.

Note that reading from standard input behaves slightly differently when it's connected to a pipe and when it's connected to a file. A pipe is not seekable, so depending on what the next command in the pipeline does, it may or may not behave differently if the pipeline was rearranged (it may detect whether the input is seekable and decide to do things differently if it is or if it isn't, in any case it would then behave differently).

This question is similar (in a very general sense) to "Are there any compilers that attempt to fix syntax errors on their own?" (at the Software Engineering StackExchange site), although that question is obviously about syntax errors, not useless design patterns. The idea about automatically changing the code based on intent is largely the same though.

Because it's not useless.

In the case of cat file | cmd, the fd 0 (stdin) of cmd will be a pipe, and in the case of cmd <file it may be a regular file, device, etc.

A pipe has different semantics from a regular file, and its semantics are not a subset of those of a regular file:

a regular file cannot be select(2)ed or poll(2)ed on in a meaningful way; a select(2) on it will always return "ready". Advanced interfaces like epoll(2) on Linux will simply not work with regular files.
on Linux there are system calls (splice(2), vmsplice(2), tee(2)) which only work on pipes [1]

Since cat is so much used, it could be implemented as a shell built-in which will avoid an extra process, but once you started on that path, the same thing could be done with most commands -- transforming the shell into a slower & clunkier perl or python. it's probably better to write another scripting language with an easy to use pipe-like syntax for continuations instead ;-)

[1] If you want a simple example not made up for the occasion, you can look at my "exec binary from stdin" git gist with some explanations in the comment here. Implementing cat inside it in order to make it work without UUoC would have made it 2 or 3 times bigger.

The 2 commands are not equivalent: consider error handling:

cat <file that doesn't exist> | less will produce an empty stream that will be passed to the piped program... as such you end up with a display showing nothing.

< <file that doesn't exist> less will fail to open bar, and then not open less at all.

Attempting to change the former to the latter could break any number of scripts that expect to run the program with a potentially blank input.

Why doesn't shell automatically fix "useless use of cat"?

Tags:

Performance

Posix

Shell Script

Related

Recent Posts