Why doesn't rc.local run all my commands, and what can I do about it?

You can skip straight down to A Quick Fix, but that's not necessarily the best option, so I recommend reading all of this first.

rc.local doesn't tolerate errors.

rc.local doesn't provide a way to intelligently recover from errors. If any command fails, it stops running. The first line, #!/bin/sh -e, causes it to be executed in a shell invoked with the -e flag. The -e flag is what makes a script (in this case, rc.local) stop running the first time a command fails within it.
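You can see the effect of -e on a throwaway script. This is only a sketch: the temporary file and its three commands are made up for illustration and have nothing to do with your actual rc.local.

```shell
# Hypothetical three-line script: the middle command reports failure.
demo=$(mktemp)
cat > "$demo" <<'EOF'
echo "first command"
false
echo "third command"
EOF

# With -e, the shell stops at `false` and exits with a nonzero status.
status_e=0
out_e=$(sh -e "$demo") || status_e=$?

# Without -e, the shell carries on to the last command.
status_plain=0
out_plain=$(sh "$demo") || status_plain=$?

echo "sh -e: status $status_e, output: $out_e"
echo "sh:    status $status_plain, output: $out_plain"
rm -f "$demo"
```

With -e, only the first echo runs. Without it, both do.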

You want rc.local to behave like this. If a command fails, you do not want it to continue on with whatever other startup commands might be relying on it having succeeded.

So if any command fails, the subsequent commands will not run. The problem here is that /script.sh did not run (not that it failed, see below), so most likely some command before it failed. But which one?

Was it /bin/chmod +x /script.sh?

No.

chmod runs fine at any time. Provided the filesystem that contains /bin was mounted, you can run /bin/chmod. And /bin is mounted before rc.local runs.

When run as root, /bin/chmod rarely fails. It will fail if the filesystem holding the file is mounted read-only, and might fail if that filesystem doesn't support permissions. Neither is likely here.

By the way, sh -e is the only reason it would actually be a problem if chmod failed. When you run a script file by explicitly invoking its interpreter, it doesn't matter if the file is marked executable. Only if it said /script.sh would the file's executable bit matter. Since it says sh /script.sh, it doesn't (unless of course /script.sh calls itself while it runs, which could fail from it not being executable, but it's unlikely it calls itself).
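If you want to convince yourself that the executable bit doesn't matter when the interpreter is named explicitly, here is a small sketch. The temporary file stands in for /script.sh and is purely hypothetical.

```shell
# Hypothetical stand-in for /script.sh, with every executable bit removed.
script=$(mktemp)
echo 'echo "ran without the executable bit"' > "$script"
chmod a-x "$script"

# Naming the interpreter works: sh only needs to read the file.
out_interp=$(sh "$script")

# Running the file directly fails, because the kernel refuses to execute it.
status_direct=0
"$script" 2>/dev/null || status_direct=$?

echo "sh script: $out_interp"
echo "direct run exit status: $status_direct"
rm -f "$script"
```

The direct run reports a nonzero status (typically 126, "permission denied"), while the sh invocation runs normally.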

So what failed?

sh /home/incero/startup_script.sh failed. Almost certainly.

We know it ran, because it downloaded /script.sh.

(Otherwise, it would be important to make sure it did run, in case somehow /bin was not in PATH--rc.local doesn't necessarily have the same PATH as you have when you're logged in. If /bin weren't in rc.local's path, this would require sh to be run as /bin/sh. Since it did run, /bin is in PATH, which means you can run other commands that are located in /bin, without fully qualifying their names. For example, you can run just chmod rather than /bin/chmod. However, in keeping with your style in rc.local, I've used fully qualified names for all commands except sh, whenever I am suggesting you run them.)

We can be pretty sure /bin/chmod +x /script.sh never ran (or you would see that /script.sh was executed). And we know sh /script.sh wasn't run either.

But it downloaded /script.sh. It succeeded! How could it fail?

Two Meanings of Success

There are two different things a person might mean when they say a command succeeded:

  1. It did what you wanted it to do.
  2. It reported that it succeeded.

And so it is for failure. When a person says a command failed, it could mean:

  1. It did not do what you wanted it to do.
  2. It reported that it failed.

A script run with sh -e, like rc.local, will stop running the first time a command reports that it failed. It doesn't make a difference what the command actually did.

Unless you intend for startup_script.sh to report failure when it does what you want, this is a bug in startup_script.sh.

  • Some bugs prevent a script from doing what you want it to do. They affect what programmers call its side effects.
  • And some bugs prevent a script from reporting correctly whether or not it succeeded. They affect what programmers call its return value (which in this case is an exit status).

Most likely, startup_script.sh did everything it should, except that it reported failure.

How Success Or Failure Is Reported

A script is a list of zero or more commands. Each command has an exit status. Assuming there are no failures in actually running the script (for example, if the interpreter couldn't read the next line of the script while running it), the exit status of a script is:

  • 0 (success) if the script was blank (i.e., had no commands).
  • N, if the script ended as a result of the command exit N, where N is some exit code.
  • The exit code of the last command that ran in the script, otherwise.

When an executable runs, it reports its own exit code--they're not just for scripts. (And technically, exit codes from scripts are the exit codes returned by the shells that run them.)

For example, if a C program ends with exit(0);, or return 0; in its main() function, the code 0 is given to the operating system, which provides it to the calling process (which may, for example, be the shell from which the program was run).

0 means the program succeeded. Every other number means it failed. (This way, different numbers can sometimes refer to different reasons the program failed.)
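The three rules for a script's exit status can be checked with one-line "scripts" passed to sh -c. This is just a sketch; the one-liners are invented for the demonstration.

```shell
# `|| status=$?` records the exit status only when it is nonzero.
status_empty=0
sh -c '' || status_empty=$?               # blank script

status_exit=0
sh -c 'exit 3; true' || status_exit=$?    # ended by `exit 3`; `true` never runs

status_last=0
sh -c 'true; false' || status_last=$?     # status of the last command, `false`

echo "blank script:       $status_empty"
echo "ended by exit 3:    $status_exit"
echo "last command false: $status_last"
```

The blank script reports 0, the second reports 3, and the third reports whatever nonzero status false returns.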

Commands Meant to Fail

Sometimes, you run a program with the intention that it will fail. In these situations, you might think of its failure as success, even though it is not a bug that the program reports failure. For example, you might use rm on a file you suspect already does not exist, just to make sure it's deleted.
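To make the rm example concrete, here is a sketch that uses a path in a fresh temporary directory, so the file is guaranteed not to exist. The path name is made up. It also shows rm -f, which treats a missing file as success, in case you'd rather not have a "failure" there at all.

```shell
# Hypothetical path inside a new, empty temporary directory.
dir=$(mktemp -d)
missing="$dir/no-such-file"

# Plain rm reports failure: there was nothing to delete.
status_rm=0
rm "$missing" 2>/dev/null || status_rm=$?

# rm -f reports success even though the file was never there.
status_rm_f=0
rm -f "$missing" || status_rm_f=$?

echo "rm:    $status_rm"
echo "rm -f: $status_rm_f"
rmdir "$dir"
```

Either way the file is gone afterwards. Only the reported status differs.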

Something like this is probably happening in startup_script.sh, just before it stops running. The last command to run in the script is probably reporting failure (even though its "failure" might be totally fine or even necessary), which makes the script report failure.

Tests Meant to Fail

One special kind of command is a test, by which I mean a command run for its return value rather than its side effects. That is, a test is a command that is run so that its exit status can be examined (and acted upon).

For example, suppose I forget if 4 is equal to 5. Fortunately, I know shell scripting:

if [ 4 -eq 5 ]; then
    echo "Yeah, they're totally the same."
fi

Here, the test [ 4 -eq 5 ] fails because it turns out 4 ≠ 5 after all. That doesn't mean the test didn't perform correctly; it did. Its job was to check if 4 = 5, then report success if so, and failure if not.

You see, in shell scripting, success can also mean true, and failure can also mean false.

Even though the echo statement never runs, the if block as a whole does return success.

However, suppose I'd written it shorter:

[ 4 -eq 5 ] && echo "Yeah, they're totally the same."

This is common shorthand. && is a boolean and operator. An && expression, which consists of && with statements on either side, returns false (failure) unless both sides return true (success). Just like a normal and.

If someone asks you, "did Derek go to the mall and think about a butterfly?" and you know Derek didn't go to the mall, you don't have to bother figuring out if he thought of a butterfly.

Similarly, if the command to the left of && fails (false), the whole && expression immediately fails (false). The statement on the right side of && is never run.

Here, [ 4 -eq 5 ] runs. It "fails" (returning false). So the whole && expression fails. echo "Yeah, they're totally the same." never runs. Everything behaved as it should, but this command reports failure (even though the otherwise equivalent if conditional above reports success).

If that were the last statement in a script (and the script got to it, rather than terminating at some point before it), the whole script would report failure.
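You can compare the two forms side by side by running each as the only (and therefore last) statement of a one-line script. This is a sketch; the one-liners are the 4-versus-5 examples from above, shortened.

```shell
# The if form: the test is false, no branch runs, and the if block reports success.
status_if=0
sh -c 'if [ 4 -eq 5 ]; then echo same; fi' || status_if=$?

# The && form: the failed test becomes the status of the whole expression.
status_and=0
sh -c '[ 4 -eq 5 ] && echo same' || status_and=$?

echo "if form: $status_if"
echo "&& form: $status_and"
```

Neither prints "same", but only the && form makes its script report failure.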

There are lots of tests besides this. For example, there are tests with || ("or"). However, the above example should be sufficient to explain what tests are, and make it possible for you to effectively use documentation to determine if a particular statement/command is a test.

sh vs. sh -e, Revisited

Since the #! line at the top of /etc/rc.local has sh -e, the operating system runs the script as though it were invoked with the command:

sh -e /etc/rc.local

In contrast, your other scripts, such as startup_script.sh, run without the -e flag:

sh /home/incero/startup_script.sh

Consequently they keep running even when a command in them reports failure.

This is normal and good. rc.local should be invoked with sh -e and most other scripts--including most scripts run by rc.local--should not.

Just make sure to remember the difference:

  • Scripts run with sh -e exit reporting failure the first time a command they contain exits reporting failure.

    It's as though the script were a single long command consisting of all the commands in the script joined with && operators.

  • Scripts run with sh (without -e) continue running until they get to a command that terminates (exits out of) them, or to the very end of the script. The success or failure of every command is essentially irrelevant (unless the next command checks it). The script exits with the exit status of the last command run.

Helping Your Script Understand It's Not Such a Failure After All

How can you keep your script from thinking it failed when it didn't?

You look at what happens just before it's done running.

  1. If a command failed when it ought to have succeeded, figure out why, and fix the problem.

  2. If a command failed, and that was the right thing to happen, then prevent that failure status from propagating.

    • One way to keep a failure status from propagating is to run another command that succeeds. /bin/true has no side effects and reports success (just as /bin/false has no side effects and reports failure).

    • Another is to make sure the script is terminated by exit 0.

      That's not necessarily the same thing as exit 0 being at the end of the script. For example, there might be an if-block where the script exits inside.

It's best to know what causes your script to report failure, before making it report success. If it really is failing in some way (in the sense of not doing what you want it to do), you don't really want it to report success.
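Both ways of stopping a failure status from propagating can be sketched with a one-line script body. The body here is hypothetical: it ends in a test that is meant to "fail", and true is the shell's built-in equivalent of /bin/true.

```shell
# Hypothetical script body whose last command is a test that fails on purpose.
body='[ 4 -eq 5 ] && echo "same"'

# As written, the script reports failure.
status_bare=0
sh -c "$body" || status_bare=$?

# Followed by a command that succeeds, it reports success.
status_true=0
sh -c "$body; true" || status_true=$?

# Terminated by exit 0, it also reports success.
status_exit0=0
sh -c "$body; exit 0" || status_exit0=$?

echo "unchanged:         $status_bare"
echo "followed by true:  $status_true"
echo "ended with exit 0: $status_exit0"
```

Only do this once you know the "failure" really is harmless.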

A Quick Fix

If you can't make startup_script.sh exit reporting success, you can change the command in rc.local that runs it, so that command reports success even though startup_script.sh did not.

Currently you have:

sh /home/incero/startup_script.sh

This command has the same side effects (i.e., the side effect of running startup_script.sh), but always reports success:

sh /home/incero/startup_script.sh || /bin/true

Remember, it's better to know why startup_script.sh reports failure, and fix it.

How the Quick Fix Works

This is actually an example of a || test, an or test.

Suppose you ask if I took out the trash or brushed the owl. If I took out the trash, I can truthfully say "yes," even if I don't remember whether or not I brushed the owl.

The command to the left of || runs. If it succeeds (true), then the right-hand side doesn't have to run. So if startup_script.sh reports success, the true command never runs.

However, if startup_script.sh reports failure [I didn't take out the trash], then the result of /bin/true [if I brushed the owl] matters.

/bin/true always returns success (or true, as we sometimes call it). Consequently, the whole command succeeds, and the next command in rc.local can run.
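Here is a sketch of the quick fix at work. Two temporary files stand in for startup_script.sh and rc.local. Both are hypothetical, and true is used in place of /bin/true so the example doesn't depend on where that program lives.

```shell
# Stand-in for startup_script.sh: does its work, then reports failure.
failing=$(mktemp)
printf '%s\n' 'echo "work done"' 'false' > "$failing"

# Stand-in for rc.local: the || true keeps the failure from stopping it.
rc=$(mktemp)
cat > "$rc" <<EOF
sh "$failing" || true
echo "next command ran"
EOF

# Run it the way rc.local is run, with -e.
status_rc=0
out_rc=$(sh -e "$rc") || status_rc=$?

echo "$out_rc"
echo "exit status: $status_rc"
rm -f "$failing" "$rc"
```

Without the || true, the second line of the stand-in rc.local would never run.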

An additional note on success/failure, true/false, zero/nonzero.

Feel free to ignore this. You might want to read this if you program in more than one language, though (i.e., not just shell scripting).

It is a point of great confusion for shell scripters taking up programming languages like C, and for C programmers taking up shell scripting, to discover:

  • In shell scripting:

    1. A return value of 0 means success and/or true.
    2. A return value of something other than 0 means failure and/or false.
  • In C programming:

    1. A return value of 0 means false.
    2. A return value of something other than 0 means true.
    3. There is no simple rule for what means success and what means failure. Sometimes 0 means success, other times it means failure, other times it means that the sum of the two numbers you just added is zero. Return values in general programming are used to signal a wide variety of different sorts of information.
    4. A program's numeric exit status should indicate success or failure in accordance with the rules for shell scripting. That is, even though 0 means false in your C program, you still make your program return 0 as its exit code if you want it to report it succeeded (which the shell then interprets as true).

You can (in Unix variants I use) override the default behavior by changing:

#!/bin/sh -e

To:

#!/bin/sh

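At the start of the script. The "-e" flag instructs sh to exit on the first error. The long name of that option is "errexit". POSIX sh does not accept it as a --errexit flag, so the long form is set from inside the script instead. The original line is equivalent to #!/bin/sh followed by:

set -o errexit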