Why was `cp` designed to silently overwrite existing files?

The default overwrite behavior of cp is specified in POSIX:

If source_file is of type regular file, the following steps shall be taken:

3.a. The behavior is unspecified if dest_file exists and was written by a previous step. Otherwise, if dest_file exists, the following steps shall be taken:

3.a.i. If the -i option is in effect, the cp utility shall write a prompt to the standard error and read a line from the standard input. If the response is not affirmative, cp shall do nothing more with source_file and go on to any remaining files.

3.a.ii. A file descriptor for dest_file shall be obtained by performing actions equivalent to the open() function defined in the System Interfaces volume of POSIX.1-2017 called using dest_file as the path argument, and the bitwise-inclusive OR of O_WRONLY and O_TRUNC as the oflag argument.

3.a.iii. If the attempt to obtain a file descriptor fails and the -f option is in effect, cp shall attempt to remove the file by performing actions equivalent to the unlink() function defined in the System Interfaces volume of POSIX.1-2017 called using dest_file as the path argument. If this attempt succeeds, cp shall continue with step 3b.
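
The quoted steps can be read as a small decision procedure. The following is a minimal Python sketch of step 3.a, not the code of any real cp: the function name `open_existing_dest`, its parameters, and the returned status strings are invented for illustration, the "affirmative" check is simplified to "starts with y", and the prompt is delegated to a caller-supplied `confirm` callback rather than written to standard error as the standard requires.

```python
import os
import tempfile


def open_existing_dest(dest_file, interactive=False, force=False, confirm=input):
    """Sketch of POSIX cp step 3.a for a dest_file that already exists.

    Returns (status, fd), where status is "opened", "skip", or "recreate".
    All names here are hypothetical; this is not taken from any real cp.
    """
    # 3.a.i: with -i, ask first and skip the file unless the answer is affirmative.
    # (Assumption: "affirmative" is simplified to an answer starting with "y".)
    if interactive:
        answer = confirm(f"overwrite {dest_file}? ")
        if not answer.strip().lower().startswith("y"):
            return ("skip", None)
    try:
        # 3.a.ii: O_WRONLY | O_TRUNC discards the old contents as soon as the open succeeds.
        return ("opened", os.open(dest_file, os.O_WRONLY | os.O_TRUNC))
    except OSError:
        if not force:
            raise
        # 3.a.iii: with -f, remove the file; the caller would continue with step 3.b.
        os.unlink(dest_file)
        return ("recreate", None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dest.txt")
        with open(path, "w") as f:
            f.write("old contents")
        status, fd = open_existing_dest(path)
        os.close(fd)
        print(status, os.path.getsize(path))  # prints: opened 0
```

Note that without -i nothing in this path ever asks a question: the existing contents are gone the moment the open succeeds.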

When the POSIX specification was written, a large number of scripts already existed that assumed the default overwrite behavior. Many of those scripts were designed to run without a user present, e.g. as cron jobs or other background tasks. Changing the behavior would have broken them. Reviewing and modifying them all to add an option to force overwriting wherever needed was probably considered a huge task with minimal benefit.

Also, the Unix command line was always designed to allow an experienced user to work efficiently, even at the expense of a hard learning curve for a beginner. When the user enters a command, the computer is to expect that the user really means it, without any second-guessing; it is the user's responsibility to be careful with potentially destructive commands.

When the original Unix was developed, systems had so little memory and mass storage compared to modern computers that overwrite warnings and prompts were probably seen as wasteful and unnecessary luxuries.

When the POSIX standard was being written, the precedent was firmly established, and the writers of the standard were well aware of the virtues of not breaking backwards compatibility.

Besides, as others have described, any user can add/enable those features for themselves, by using shell aliases or even by building a replacement cp command and modifying their $PATH to find the replacement before the standard system command, and get the safety net that way if desired.

But if you do so, you'll find that you are creating a hazard for yourself. If the cp command behaves one way when used interactively and another way when called from a script, you may not remember that the difference exists. On another system, you might end up being careless because you've become used to the warnings and prompts on your own system.

If the behavior in scripts still matches the POSIX standard, you're likely to get used to the prompts in interactive use, then write a script that does some mass copying, and then find you've again inadvertently overwritten something.

If you enforce prompting in scripts too, what will the command do when run in a context that has no user around, e.g. background processes or cron jobs? Will the script hang, abort, or overwrite?

Hanging or aborting means that a task that was supposed to get done automatically will not be done. Not overwriting may sometimes also cause a problem by itself: for example, it might cause old data to be processed twice by another system instead of being replaced with up-to-date data.
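
To make that choice concrete, here is a small Python sketch of the decision a prompting wrapper would have to make. Everything in it is hypothetical: `should_overwrite`, its `policy` values, and the `ask` callback are invented for illustration and do not correspond to any option of a real cp. A wrapper that simply read from standard input with nobody there would be the "hang" case; the sketch covers the other outcomes.

```python
import sys


def should_overwrite(dest_exists, policy, stdin_is_tty=None, ask=input):
    """Decide whether a prompting copy wrapper may overwrite an existing file.

    Hypothetical helper: policy ("overwrite", "skip", or "abort") only matters
    when there is no terminal to prompt on.
    """
    if not dest_exists:
        return True
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin.isatty()
    if stdin_is_tty:
        # Interactive use: a person is there to answer the prompt.
        return ask("overwrite? ").strip().lower().startswith("y")
    # Unattended use: nobody will answer, so a fixed policy has to decide.
    if policy == "overwrite":
        return True   # same result as plain cp
    if policy == "skip":
        return False  # the old data stays in place
    raise RuntimeError("destination exists and nobody is present to confirm")
```

Whichever policy is picked, the unattended branch behaves differently from the interactive one, which is exactly the inconsistency at issue here.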

A large part of the power of the command line comes from the fact that once you know how to do something on the command line, you'll implicitly also know how to make it happen automatically by scripting. But that is only true if the commands you use interactively also work exactly the same when invoked in a script context. Any significant differences in behavior between interactive use and scripted use will create a sort of cognitive dissonance which is annoying to a power user.

cp dates from the beginning of Unix. It was there well before the POSIX standard was written. Indeed, POSIX just formalized the existing behavior of cp in this regard.

We're talking around the Epoch (1970-01-01), when men were real men, women were real women and furry little creatures ... (I digress). In those days, adding extra code made a program bigger. That was an issue then, because the first computer that ran Unix was a PDP-7 (upgradable to 144 KB RAM!). So things were small and efficient, without safety features.

So, in those days, you had to know what you were doing, because the computer just did not have the power to prevent you from doing anything you would regret later.

(There is a nice cartoon by Zevar; search for "zevar cerveaux assiste par ordinateur" to find the evolution of the computer. Or try http://a54.idata.over-blog.com/2/07/74/62/dessins-et-bd/le-CAO-de-Zevar---reduc.jpg for as long as it exists)

For those really interested (I saw some speculation in the comments): The original cp on the first Unix was about two pages of assembler code (C came later). The relevant part was:

sys open; name1: 0; 0   " Open the input file
spa
  jmp error         " File open error
lac o17         " Why load 15 (017) into AC?
sys creat; name2: 0     " Create the output file
spa
  jmp error         " File create error

(So, a hard sys creat)

And, while we're at it: Version 2 of Unix used (code snippet)

mode = buf[2] & 037;
if((fnew = creat(argv[2],mode)) < 0){
    stat(argv[2], buf);

which is also a hard creat without tests or safeguards. Note that the C code for cp in V2 Unix is less than 55 lines!

cp overwrites by default because these commands are also meant to be used in scripts, possibly running without any kind of human supervision, and because there are plenty of cases where you do want to overwrite the target (the philosophy of the Linux shells is that the human knows what s/he is doing).

There are still a few safeguards:

GNU cp has a -n (--no-clobber) option, which leaves an existing destination file untouched.
If you copy several files to a single target that is not a directory, cp will complain that the target is not a directory.
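
For comparison, a no-clobber copy can be sketched in a few lines of Python. This is not how GNU cp implements -n; it is one common way to get the same guarantee, using O_CREAT together with O_EXCL so that the open itself fails when the destination already exists. The function name `copy_no_clobber` and its return convention are invented for this example.

```python
import os
import shutil


def copy_no_clobber(src, dest):
    """Copy src to dest unless dest already exists.

    Returns True if the copy was made, False if dest was left untouched.
    Hypothetical helper, not the implementation used by GNU cp.
    """
    with open(src, "rb") as inp:
        try:
            # O_CREAT | O_EXCL: create the file, or fail if the path already exists.
            out_fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        except FileExistsError:
            return False
        with os.fdopen(out_fd, "wb") as out:
            shutil.copyfileobj(inp, out)
    return True
```

Because the existence check and the creation happen in a single open call, there is no window in which another process could create the destination between a separate "does it exist?" test and the write.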
