Why can spaces between options and parameters be omitted?

When you write the command line parsing bit of your code, you specify which options take arguments and which do not. For example, in a shell script accepting an -h option (for help, say) and an -a option that takes an argument, you would write:

opt_h=0     # default value
opt_a=""

while getopts 'a:h' opt; do
    case $opt in
        h)  opt_h=1 ;;
        a)  opt_a="$OPTARG" ;;
    esac
done

echo "h: $opt_h"
echo "a: $opt_a"

The a:h bit says "I'm expecting to parse two options, -a and -h, and -a should take an argument" (it's the : after a that tells the parser that -a takes an argument).

Therefore, there is never any ambiguity in where an option ends, where its value starts, and where another one starts after that.

Running it:

$ bash test.sh -h -a hello
h: 1
a: hello

$ bash test.sh -h -ahello
h: 1
a: hello

$ bash test.sh -hahello
h: 1
a: hello

This is why, most of the time, you shouldn't write your own command line parser.

There is only one tricky case in this example. Parsing normally stops at the first non-option argument, but an argument that starts with a dash is still treated as a group of options:

$ bash test.sh -a hello -world
test.sh: illegal option -- w
test.sh: illegal option -- o
test.sh: illegal option -- r
test.sh: illegal option -- l
test.sh: illegal option -- d
h: 0
a: hello

The following solves that:

$ bash test.sh -a hello -- -world
h: 0
a: hello

The -- signals the end of the command line options, and the -world bit is left for the program to do whatever it wants with (it's in one of the positional variables).
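To show where the leftover -world ends up, here is a minimal sketch of the same parsing loop wrapped in a function. The function name parse_args and the "rest:" output line are made up for illustration; the getopts loop is the one from the script above, with a catch-all case added.

```shell
# Hypothetical helper: parse -h and -a ARG, then report what is left over.
# The body is a subshell ( ... ) so its variables do not leak to the caller.
parse_args() (
    opt_h=0
    opt_a=""
    OPTIND=1                # start parsing at the first argument

    while getopts 'a:h' opt; do
        case $opt in
            h)  opt_h=1 ;;
            a)  opt_a="$OPTARG" ;;
            *)  exit 1 ;;   # unknown option: getopts has already complained
        esac
    done

    # OPTIND is the index of the first argument getopts did not consume
    # (a -- has already been skipped), so drop the parsed options.
    shift "$((OPTIND - 1))"

    echo "h: $opt_h"
    echo "a: $opt_a"
    echo "rest: $*"
)

parse_args -a hello -- -world
```

After the shift, -world is simply $1, and the program can treat it as a file name or any other operand.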

The same -- convention is, by the way, how you use rm to remove a file whose name starts with a dash.
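A small sketch of that, run in a throwaway directory so nothing real is touched. It assumes mktemp -d is available (it is on most systems, but it is not required by POSIX), and the file name -world is just the one from the example above.

```shell
# Work in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

touch -- -world     # create a file whose name starts with a dash
rm -- -world        # without --, rm would try to parse -world as options

touch ./-world      # alternative: with a ./ prefix the argument no longer starts with a dash
rm ./-world

# rmdir only succeeds if both removals above worked and the directory is empty.
cd / && rmdir "$tmpdir"
```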

EDIT:

Utilities written in C call getopt() (declared in unistd.h), which works in much the same way. In fact, for all we know, the bash builtin getopts may be implemented on top of the C library function getopt(). Perl, Python and other languages have similar command line parsing libraries, and they most likely perform their parsing in similar ways.

Some of these getopt and getopt-like library routines also handle "long" options. These are usually preceded by a double dash (--), and long options that take arguments often do so after an equals sign, for example the --block-size=SIZE option of [some implementations of] the du utility (which also allows -B SIZE to specify the same thing).

The reason manuals are often written to show a space in between the short options and their arguments is probably for readability.

EDIT: Really old tools, such as the dd and tar utilities, have options without dashes in front of them. This is purely for historical reasons and for maintaining compatibility with software that relies on them to work in exactly that way. The tar utility has gained the ability to take options with dashes in more recent times. The BSD manual for tar calls the old-style options "bundled flags".


xargs is one of the POSIX utilities. As commented by @drewbenn, POSIX documents the option-parsing behavior for most of its utilities to match getopt, with some allowances for other implementations, saying in 12.1 Utility Argument Syntax:

This section describes the argument syntax of the standard utilities and introduces terminology used throughout POSIX.1-2008 for describing the arguments processed by the utilities.

Within POSIX.1-2008, a special notation is used for describing the syntax of a utility's arguments. Unless otherwise noted, all utility descriptions use this notation, which is illustrated by this example (see XCU Simple Commands):

and concluding with

It is recommended that all future utilities and applications use these guidelines to enhance user portability. The fact that some historical utilities could not be changed (to avoid breaking existing applications) should not deter this future goal.

Within POSIX (remember that it covers only the most commonly used utilities), there are exceptions that take what would be options in other utilities as positional parameters, or as parameters with special syntax:

  • dd - convert and copy a file
  • expr - evaluate arguments as an expression
  • find [-H|-L] path... [operand_expression...]

POSIX allows for optional option-values:

Option-arguments are shown separated from their options by <blank> characters, except when the option-argument is enclosed in the '[' and ']' notation to indicate that it is optional.

Offhand, I do not recall which POSIX utilities use the feature. The ncurses tic and infocmp utilities use the feature for levels of the -v (verbose/debug) option.

The specific point which you asked about is detailed in the remainder of that paragraph, going on several lines.

Before POSIX, some implementations of ps accepted options without a leading hyphen. The POSIX description does not mention that in the description of the utility or in the syntax rationale:

  • ps - report process status

Aside from POSIX, there are long-option implementations (such as GNU getopt_long, or X Toolkit), using a variety of ways to separate or join an option's value to the option. For instance, the value may be joined with punctuation or given as a separate argument:

--option=value
--option value

Depending upon the implementation, a double-dash may or may not be used to distinguish long options from short (getopt-style) ones: lynx and X Toolkit use a single dash, while GNU getopt_long uses a double-dash. Also, a + may be used to indicate that an option is negated.

POSIX's description does not appear to mention any of these, but you are likely to encounter them.
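To make the difference between the joined and separated spellings concrete, here is a rough hand-written sketch that accepts both for a single long option. The function name parse_long and the option name --option are placeholders, and this is not how getopt_long or X Toolkit are implemented; a real program would normally rely on such a library rather than a loop like this.

```shell
# Hypothetical parser for one long option, --option, in either spelling.
# The body is a subshell ( ... ), so exit only leaves the function.
parse_long() (
    value=""
    while [ "$#" -gt 0 ]; do
        case $1 in
            --option=*)     # joined form: strip everything up to the first =
                value=${1#*=}
                shift ;;
            --option)       # separated form: the value is the next argument
                [ "$#" -ge 2 ] || { echo "missing value for --option" >&2; exit 1; }
                value=$2
                shift 2 ;;
            --)             # explicit end of options
                shift
                break ;;
            *)              # anything else: stop and leave it in the remaining arguments
                break ;;
        esac
    done
    echo "option: $value"
    echo "rest: $*"
)

parse_long --option=value file
parse_long --option value file
```

Both calls print "option: value" followed by "rest: file". The only difference for the parser is whether the value is cut out of the same argument or taken from the next one.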
