Why do shell script comparisons often use x$VAR = xyes?

If you're using a shell that does simple substitution and the SHELL_VAR variable does not exist (or is blank), then you need to watch out for the edge cases. The following translations will happen:

if test $SHELL_VAR = yes; then        -->  if test = yes; then
if test x$SHELL_VAR = xyes; then      -->  if test x = xyes; then

The first of these will generate an error, since the first argument to test has gone missing. The second does not have that problem.
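A quick way to see the difference for yourself (the exact error wording varies from shell to shell):

SHELL_VAR=
if test $SHELL_VAR = yes; then echo match; fi
# bash: test: =: unary operator expected
if test x$SHELL_VAR = xyes; then echo match; fi
# no error; the comparison is simply false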

Your case translates as follows:

if test "x$SHELL_VAR" = "xyes"; then  -->  if test "x" = "xyes"; then

The x, at least for POSIX-compliant shells, is actually redundant, since the quotes ensure that both an empty argument and one containing spaces are passed to test as a single argument.
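On any such shell, quoting alone is therefore enough. A minimal sketch:

SHELL_VAR=
if test "$SHELL_VAR" = yes; then
  echo "SHELL_VAR is yes"
else
  echo "SHELL_VAR is empty or set to something else"
fi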


The other reason, which no-one else has mentioned yet, relates to option processing. If you write:

if [ "$1" = "abc" ]; then ...

and $1 has the value '-n', the syntax of the test command is ambiguous; it is not clear what you were testing. The 'x' at the front prevents a leading dash from causing trouble.
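As a sketch of the hazard (modern POSIX shells resolve the three-argument case unambiguously, but older test implementations did not):

set -- -n
# An old test could parse [ "$1" = "abc" ] as the unary -n test applied
# to "=", with "abc" left over, rather than as a string comparison.
if [ "x$1" = "xabc" ]; then
  echo "first argument is abc"
else
  echo "first argument is not abc"    # prints this: 'x-n' != 'xabc'
fi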

You have to be looking at really ancient shells to find one where the test command does not have support for -n or -z; the Version 7 (1978) test command included them. It isn't quite irrelevant - some Version 6 UNIX stuff escaped into BSD, but these days, you'd be extremely hard pressed to find anything that ancient in current use.

Not using double quotes around values is dangerous, as a number of other people have pointed out. Indeed, if there's a chance that file names might contain spaces (Mac OS X and Windows both encourage that to some extent, and Unix has always supported it, though tools like xargs make it harder), then you should enclose file names in double quotes every time you use them too.

Unless you are in charge of the value (e.g. during option handling, where you set the variable to 'no' at startup and 'yes' when a flag is included on the command line), it is not safe to use unquoted forms of variables until you've proved them safe -- and you may as well quote them all the time for many purposes. Or document that your scripts will fail horribly if users attempt to process files with blanks in the names. (And there are other characters to worry about -- backticks could be rather nasty, for instance.)
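For example, a sketch of defensive quoting when file names may contain spaces (the backup/ directory is just hypothetical here):

mkdir -p backup
for f in *.txt; do
  cp "$f" backup/    # quotes keep each name a single argument
done

# With xargs, NUL separators sidestep whitespace entirely
# (-print0 and -0 are widely supported extensions):
find . -name '*.txt' -print0 | xargs -0 ls -l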


There are two reasons that I know of for this convention:

First, from the Advanced Bash-Scripting Guide (http://tldp.org/LDP/abs/html/comparison-ops.html):

In a compound test, even quoting the string variable might not suffice. [ -n "$string" -o "$a" = "$b" ] may cause an error with some versions of Bash if $string is empty. The safe way is to append an extra character to possibly empty variables, [ "x$string" != x -o "x$a" = "x$b" ] (the "x's" cancel out).
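As a runnable sketch of that advice (the variable values are just for illustration):

string=""
a="foo"
b="foo"
# Fragile in some older Bash versions when $string is empty:
#   [ -n "$string" -o "$a" = "$b" ]
# Safe form with the extra character prepended:
if [ "x$string" != x -o "x$a" = "x$b" ]; then
  echo "string is non-empty, or a equals b"
fi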

Second, in shells other than Bash, especially older ones, test conditions like '-z' for checking whether a variable is empty did not exist, so while this:

if [ -z "$SOME_VAR" ]; then
  echo "this variable is empty or not defined"
fi

will work fine in Bash, if you're aiming for portability across various UNIX environments, where you can't be sure that the default shell will be Bash or that it supports the -z test condition, it's safer to use the form if [ "x$SOME_VAR" = "x" ], since that will always have the intended effect. Essentially this is an old shell scripting trick for detecting an empty variable, and it's still used today for backwards compatibility, despite there being cleaner methods available.
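Put together, a portable sketch of the same check:

# Portable equivalent of [ -z "$SOME_VAR" ]:
if [ "x$SOME_VAR" = "x" ]; then
  echo "this variable is empty or not defined"
fi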