Why is the behavior of the `#!` syntax unspecified by POSIX?
I think primarily because the behaviour varies greatly between implementations. See https://www.in-ulm.de/~mascheck/various/shebang/ for all the details.
It could, however, now specify a minimum subset common to most Unix-like implementations, such as:
#! *[^ ]+( +[^ ]+)?\n
(with only characters from the portable filename character set in those one or two words), where the first word is an absolute path to a native executable, the line is not too long, behaviour is unspecified if the executable is setuid/setgid, and it is implementation-defined whether the interpreter path or the script path is passed as `argv[0]` to the interpreter.

POSIX doesn't specify the path of executables anyway. Several systems have pre-POSIX utilities in `/bin` / `/usr/bin` and have the POSIX utilities somewhere else (like on Solaris 10, where `/bin/sh` is a Bourne shell and the POSIX one is in `/usr/xpg4/bin`; Solaris 11 replaced it with ksh93, which is more POSIX compliant, but most of the other tools in `/bin` are still ancient non-POSIX ones). Some systems are not POSIX ones but have a POSIX mode/emulation. All POSIX requires is that there be a documented environment in which a system behaves POSIXly. See Windows+Cygwin for instance. Actually, with Windows+Cygwin, the she-bang is honoured when a script is invoked by a Cygwin application, but not by a native Windows application.
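The minimal form suggested above can be checked mechanically. Here is a rough sketch in `sh`, assuming `head` and `grep -E` are available. `is_minimal_shebang` is a hypothetical name, and the sketch only looks at the shape of the first line: it does not check the line length, whether the interpreter is a native executable, or any setuid/setgid bits.

```sh
#! /bin/sh -
# Hypothetical helper: succeed if the first line of file "$1" has the
# minimal shape "#!", optional spaces, an absolute path, and at most one
# argument, all built from the portable filename character set (plus "/").
is_minimal_shebang() {
  LC_ALL=C head -n 1 < "$1" |
    LC_ALL=C grep -Eq '^#! *(/[A-Za-z0-9._-]+)+( +[A-Za-z0-9._/-]+)?$'
}
```

For instance, `is_minimal_shebang ./myscript || echo 'non-portable #! line' >&2`, with `./myscript` standing in for any file you want to check.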
So even if POSIX specified the shebang mechanism, it could not be used to write POSIX `sh`/`sed`/`awk`... scripts (also note that the shebang mechanism cannot be used to write reliable `sed`/`awk` scripts, as it doesn't allow passing an end-of-option marker).
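To illustrate the end-of-option problem: with a `#! /bin/sed -f` script, a typical implementation ends up running `sed -f ./script` followed by the caller's arguments, so an operand such as `-n` is parsed as an option, and nothing on the `#!` line can place `--` after the script path. A common workaround is an `sh` wrapper that supplies `--` itself. A minimal sketch, where `replace_foo` and the `s/foo/bar/g` expression are placeholders of mine (in a standalone script the function body would be the last line, prefixed with `exec`):

```sh
#! /bin/sh -
# Hypothetical wrapper, written as a function so that it can be sourced.
replace_foo() {
  # The sed script is given with -e, and "--" marks the end of options,
  # so an operand such as "-n" is taken as a file name, not as an option.
  sed -e 's/foo/bar/g' -- "$@"
}
```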
Now the fact that it's unspecified doesn't mean you can't use it (well, it says you shouldn't have the first line start with `#!` if you expect it to be only a regular comment and not a she-bang), but that POSIX gives you no guarantee if you do.
In my experience, using shebangs gives you more guarantee of portability than using POSIX's way of writing shell scripts: leave off the she-bang, write the script in POSIX `sh` syntax and hope that whatever invokes the script invokes a POSIX compliant `sh` on it, which is fine if you know the script will be invoked in the right environment by the right tool but not otherwise.
You may have to do things like:
#! /bin/sh -
if : ^ false; then : fine, POSIX system by default
else
  # cover Solaris 10 or older. ": ^ false" returns false
  # in the Bourne shell as ^ is an alias for | there for
  # compatibility with the Thompson shell.
  PATH=`getconf PATH`:$PATH; export PATH
  exec /usr/xpg4/bin/sh - "$0" ${1+"$@"}
fi
# rest of script
If you want to be portable to Windows+Cygwin, you may have to name your file with a `.bat` or `.ps1` extension and use some similar trick for `cmd.exe` or `powershell.exe` to invoke the Cygwin `sh` on the same file.
[T]he behavior seems consistent between all POSIX-compliant shells. I don't see the need for wiggle room here.
You aren't looking deeply enough.
Back in the 1980s, this mechanism was not de facto standardized. Although Dennis Ritchie had implemented it, that implementation had not reached the public in the AT&T side of the universe. It was effectively only publicly available and known in BSD, with executable shell scripts not available on AT&T Unix. Thus it was not reasonable to standardize it. The state of affairs is exemplified by this contemporary doco, one of many such:
Note that BSD allows files which begin with `#! interpreter` to be executed directly, while SysV allows only a.out files to be executed directly. This means that an instance of one of the `exec…()` routines in a BSD program may have to be changed under SysV to execute the interpreter (typically `/bin/sh`) for that program instead.

Stephen Frede (1988). "Programming on System X Release Y". Australian Unix Systems User Group Newsletter. Volume 9. Number 4. p. 111.
An important point here is that you are looking at shells, whereas the existence of executable shell scripts is actually a matter for the `exec…()` functions. What shells do includes the precursors of the executable script mechanism, still to be found in some shells even today (and also nowadays mandated for the `exec…p()` subset of functions), and is somewhat misleading. What the standard needs to address in this regard is how `exec…()` on an interpreted script works, and at the time that POSIX was originally created it simply did not work in the first place across a major part of the spectrum of target operating systems.
A subordinate question is why this has not been standardized since, especially as the magic number mechanism for script interpreters had reached the public in the AT&T side of the universe and had been documented for `exec…()` in the System 5 Interface Definition by the turn of the 1990s:
An interpreter file begins with a line of the form

# ! pathname [arg]

where pathname is the path of the interpreter, and arg is an optional argument. When you `exec` an interpreter file, the system `exec`s the specified interpreter.

`exec`. System V Interface Definition. Volume 1. 1991.
Unfortunately, the behaviour remains today almost as widely divergent as it was in the 1980s, and there is no truly common behaviour to standardize. Some Unices (famously HP-UX and FreeBSD, for example) do not support scripts as interpreters for scripts. Whether the first line is one, two, or many elements separated by whitespace varies between MacOS (and versions of FreeBSD before 2005) and others. The maximum supported path length varies. ␀ and characters outwith the POSIX portable filename character set are tricky, as are leading and trailing whitespace. What the 0th, 1st, and 2nd argument end up being is also tricky, with significant variation across systems. Some currently POSIX-conformant but non-Unix systems still do not support any such mechanism, and mandating it would convert them into no longer being POSIX conformant.
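To make the "one, two, or many elements" point concrete, here is a small `sh` sketch that shows how the same `#!` line breaks down under two conventions: everything after the interpreter kept as a single argument, or split on spaces. `shebang_args` is a hypothetical helper, it only handles spaces (not tabs), and it makes no claim about which convention any particular system follows.

```sh
#! /bin/sh -
# Hypothetical helper: print the interpreter named on a "#!" line and the
# argument list it would receive under two different conventions.
# The body runs in a subshell so the IFS and "set -f" changes stay local.
shebang_args() (
  line=${1#'#!'}
  line=${line#"${line%%[! ]*}"}   # drop spaces before the interpreter path
  interp=${line%% *}              # first word
  rest=${line#"$interp"}
  rest=${rest#"${rest%%[! ]*}"}   # drop spaces before the argument(s)
  printf 'interpreter: %s\n' "$interp"
  [ -n "$rest" ] || exit 0        # no argument at all
  printf 'kept whole: [%s]\n' "$rest"
  set -f; IFS=' '                 # split on spaces only, without globbing
  printf 'split on spaces:'
  for word in $rest; do printf ' [%s]' "$word"; done
  printf '\n'
)
```

Calling `shebang_args '#! /usr/bin/env perl -w'` reports `/usr/bin/env` as the interpreter, then `[perl -w]` for the single-argument convention and `[perl] [-w]` for the splitting one.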
Further reading
- Which shell interpreter runs a script with no shebang?
- Why am I able to pass arguments to /usr/bin/env in this case?
- `script`. NetBSD Miscellaneous Information Manual. 2005-05-06.
- https://unix.stackexchange.com/a/605761/5132