When should I use input redirection?
From the man grep page (on Debian):
DESCRIPTION
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
In the first case (file name passed as an argument), grep opens the file; in the second (redirection), the shell opens the file and assigns it to the standard input of grep, and grep, not being passed any file name argument, assumes it needs to grep its standard input.
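To make the two cases concrete, here is a minimal sketch. The temporary directory and the file contents are made up for illustration; only the two grep invocations matter.

```shell
# Scratch file in a temporary directory (names and contents are illustrative).
tmpdir=$(mktemp -d)
printf 'first line\nsecond\n' > "$tmpdir/file"

# Case 1: grep receives the file name and opens the file itself.
by_name=$(grep line "$tmpdir/file")

# Case 2: the shell opens the file on fd 0; grep gets no file argument
# and therefore reads its standard input.
by_stdin=$(grep line < "$tmpdir/file")

printf '%s\n' "$by_name" "$by_stdin"
rm -rf "$tmpdir"
```

Both invocations find the same line; what differs is which process opens the file.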
Pros of 1:
- grep can grep more than one file¹.
- grep can display the file name where each occurrence of line is found.
- grep could² (but I don't know of any implementation that does) do a fadvise(POSIX_FADV_SEQUENTIAL) on the file descriptors it opens.
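A small sketch of the first two points, using two made-up files (file1 and file2 and their contents are only for illustration):

```shell
tmpdir=$(mktemp -d)
printf 'a line\n' > "$tmpdir/file1"
printf 'nothing here\nanother line\n' > "$tmpdir/file2"

# With more than one file operand, grep prefixes each match with the
# name of the file it was found in.
multi=$(cd "$tmpdir" && grep line file1 file2)

# Through a redirection, grep only sees an anonymous standard input,
# so there is no file name to report.
single=$(grep line < "$tmpdir/file1")

printf '%s\n' "$multi" "$single"
rm -rf "$tmpdir"
```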
Pros of 2:
- If the file can't be opened, the shell returns an error which will include more relevant information (like line number in the script) and in a more consistent way (if you let the shell open files for other commands as well) than when grep opens it. And if the file can't be opened, grep is not even called (which for some commands, maybe not grep, can make a big difference).
- In grep line < in > out, if in can't be opened, out won't be created or truncated.
- There's no problem with some files with unusual names (like - or file names starting with -)³.
- Cosmetic: you can put <file anywhere on the command line to show the command flow more naturally, like <in grep line >out if you prefer.
- Cosmetic: with GNU grep, you can choose what label to use in front of the matching line instead of just the file name, as in: <file grep --label='Found in file at line' -Hn line
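The points about out not being truncated and about a file named - can be checked with a sketch like the following. The file names and contents are invented for the demonstration, and the error messages printed on stderr will vary between shells and grep implementations.

```shell
tmpdir=$(mktemp -d)
printf 'precious\n' > "$tmpdir/out"

# Redirections are processed left to right: opening the missing "in"
# fails first, so "out" is never opened and grep is never started.
status_redirect=0
grep line < "$tmpdir/in" > "$tmpdir/out" || status_redirect=$?
kept=$(cat "$tmpdir/out")

# With the name passed as an argument, the shell has already truncated
# "out" by the time grep fails to open "in".
status_arg=0
grep line "$tmpdir/in" > "$tmpdir/out" || status_arg=$?
if [ -s "$tmpdir/out" ]; then truncated=no; else truncated=yes; fi

# A file literally named "-": as an argument grep would take it to mean
# standard input, whereas a redirection opens the actual file.
printf 'a line in dash\n' > "$tmpdir/-"
dash_match=$(cd "$tmpdir" && grep line < -)

printf 'kept=%s truncated=%s dash=%s\n' "$kept" "$truncated" "$dash_match"
rm -rf "$tmpdir"
```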
In terms of performance, if the file can't be opened, you save the execution of grep when using redirection, but otherwise for grep I don't expect much difference.
With redirection, you save having to pass an extra argument to grep, and you make grep's argument parsing slightly easier. On the other hand, the shell will need (at least) an extra system call to dup2() the file descriptor onto file descriptor 0.
In { grep -m1 line; next command; } < file, grep (here GNU grep) will want to seek() back to just after the matching line so the next command sees the rest of the file (it will also need to determine whether the file is seekable or not). In other words, the position within stdin is another one of grep's outputs. With grep -m1 line file, it can optimise that out; that's one fewer thing for grep to care about.
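A sketch of that behaviour, assuming GNU grep and a regular (seekable) file; here cat stands in for the next command, and the file contents are made up. Other grep implementations may not reposition stdin, in which case cat could see nothing.

```shell
tmpdir=$(mktemp -d)
printf 'skip\nline 1\nrest A\nrest B\n' > "$tmpdir/file"

# grep and cat share the same open file description on fd 0.
# GNU grep prints the first match, then leaves the offset just after
# that line, so cat picks up from there.
shared=$({ grep -m1 line; cat; } < "$tmpdir/file")

printf '%s\n' "$shared"
rm -rf "$tmpdir"
```

With GNU grep this should print line 1 followed by rest A and rest B. Feeding the group from a pipe instead of a regular file removes the possibility of seeking back, so how much the next command gets to read is then not something to rely on.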
Notes
¹ With zsh, you can do:
grep line < file1 < file2
but that's doing the equivalent of cat file1 file2 | grep line (without invoking the cat utility) and so is less efficient, can cause confusion if the first file doesn't end in a newline character, and won't let you know in which file the pattern is found.
² That is to tell the system that grep is going to read the file sequentially, so the I/O scheduler can make more educated decisions, for instance as to how to read the data. grep can do that on its own fd, but it would be wrong to do it on the fd 0 that it borrows from its caller, as that fd (or rather the open file description it references) could be used later, or even at the same time, for non-sequential reads.
³ In the case of ksh93 and bash though, there are files like /dev/tcp/host/port (and /dev/fd/x on some systems in bash) which, when used as the target of a redirection, the shell intercepts for special purposes instead of really opening the file on the file system (though generally, those files don't exist on the file system). /dev/stdin serves the same purpose as the - recognised by grep, but at least here it's more properly namespaced (anybody can create a file called - in any directory, while only administrators can create a file called /dev/tcp/host/port, and administrators should know better).
The answer by StephaneChazelas covers grep(1), and most commands of Unix lineage work that way, but not all. It is standard to read either from standard input (from the keyboard, from a file redirected via < file, or from the output piped by another command; a silly example: ls * | grep '^ab*c$'), or from the file(s) given as arguments, like grep comment file1 file2 file3. Some commands use the convention that a file named - is standard input, so you can say gen-middle | cat head - tail to get a stream with head, whatever gen-middle generates, followed by tail. This is by design, to give flexibility in the use of the commands.
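As a minimal sketch of the - convention with cat: the gen_middle function below is a hypothetical stand-in for whatever command produces the middle part, and the head and tail files are created just for the example.

```shell
tmpdir=$(mktemp -d)
printf 'HEAD\n' > "$tmpdir/head"
printf 'TAIL\n' > "$tmpdir/tail"

# Hypothetical generator for the middle of the stream.
gen_middle() { printf 'middle 1\nmiddle 2\n'; }

# cat reads the head file, then its standard input (the "-" operand),
# then the tail file, in the order the operands are given.
combined=$(gen_middle | cat "$tmpdir/head" - "$tmpdir/tail")

printf '%s\n' "$combined"
rm -rf "$tmpdir"
```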
Which is better? As long as it works, cmd file is shorter than cmd < file; there could be a tiny difference in time between the shell doing the file frobbing (<) and the command doing it by itself, but it is probably unnoticeable unless you do nothing else all day long. It will depend on considerations like the pros mentioned in Stephane's answer.