Why is it possible to move a running program in Ubuntu?
Let me break it down.
When you run an executable, a sequence of system calls is executed, most notably fork() and execve():
fork() creates a child process of the calling process, which is (mostly) an exact copy of the parent, both still running the same executable (using copy-on-write memory pages, so it's efficient). It returns twice: in the parent, it returns the child's PID; in the child, it returns 0. Normally, the child process calls execve() right away.
execve() takes a full path to the executable as an argument and replaces the calling process with the executable. At this point the process gets a fresh virtual address space (i.e. virtual memory), and execution begins at the executable's entry point (in a state specified by the platform ABI's rules for fresh processes).
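The fork-then-exec sequence can be sketched in Python, whose os module wraps these system calls fairly directly. This is only an illustration under assumptions: the helper name spawn() is made up here, and the exit code 127 for a failed exec is borrowed from shell convention rather than from the text above.

```python
import os
import sys

def spawn(argv):
    """Run argv[0] in a child process via fork() + execv(); return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Replace this process image with the new program.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if execv() failed; leave without running parent cleanup.
            os._exit(127)
    # Parent: fork() returned the child's PID. Wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal

if __name__ == "__main__":
    # sys.executable is the full path of the running Python interpreter.
    print(spawn([sys.executable, "-c", "print('hello from the child')"]))
```

Note that execv() needs a full path, just like execve(); it does not search PATH.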
At this point, the kernel's ELF loader has mapped the text and data segments of the executable into memory, as if it had used the mmap() system call (with shared read-only and private read-write mappings respectively). The BSS is also mapped as if with MAP_ANONYMOUS. (BTW, I'm ignoring dynamic linking here for simplicity: the dynamic linker open()s and mmap()s all the dynamic libraries before jumping to the main executable's entry point.)
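On Linux you can look at these mappings yourself through /proc/PID/maps. The sketch below is Linux-only and assumes the usual six-column layout of that file; the function name file_backed_mappings() is made up for illustration.

```python
import os

def file_backed_mappings(pid="self"):
    """Return the set of file paths mapped into a process, per /proc/<pid>/maps."""
    paths = set()
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            # Columns: address perms offset dev inode [pathname]
            fields = line.split(None, 5)
            if len(fields) == 6 and fields[5].startswith("/"):
                paths.add(fields[5].rstrip("\n"))
    return paths

if __name__ == "__main__":
    # Expect the interpreter binary plus the shared libraries it has loaded.
    for path in sorted(file_backed_mappings()):
        print(path)
```

Anonymous mappings (heap, stack, BSS) have no path in the last column, so they are skipped here.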
Only a few pages are actually loaded into memory from disk before a newly exec()ed process starts running its own code. Further pages are demand-paged in as needed, if and when the process touches those parts of its virtual address space. (Pre-loading any pages of code or data before starting to execute user-space code is just a performance optimization.)
At the lower level, the executable file is identified by its inode. Once the file has started executing, the kernel keeps the file's contents intact through the inode reference, not the file name, just as it does for open file descriptors or file-backed memory mappings. So you can easily move the executable to another location on the filesystem or even to a different filesystem. As a side note, to check a process's various stats you can peek into the /proc/PID directory (PID is the process ID of the given process). You can even open the executable file as /proc/PID/exe, even if it's been unlinked from disk.
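To see this in action, here is a rough Linux-only experiment: it runs a private copy of sleep, renames and then deletes that copy, and checks what /proc/PID/exe reports. It assumes a sleep binary is on PATH and that the temporary directory is not mounted noexec; the function name and the returned keys are made up for illustration.

```python
import os
import shutil
import subprocess
import tempfile

def move_while_running():
    """Start a copy of `sleep`, rename and then unlink it, and report what /proc sees."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "sleep")
        moved = os.path.join(tmp, "sleep-moved")
        shutil.copy(shutil.which("sleep"), original)  # private copy we may delete
        inode = os.stat(original).st_ino

        proc = subprocess.Popen([original, "30"])
        exe = f"/proc/{proc.pid}/exe"
        try:
            os.rename(original, moved)       # move the running program
            after_rename = os.readlink(exe)  # the link follows the new name
            os.unlink(moved)                 # now remove the last name
            after_unlink = os.readlink(exe)  # typically ends in " (deleted)"
            return {
                "still_running": proc.poll() is None,
                "same_inode": os.stat(exe).st_ino == inode,
                "after_rename": after_rename,
                "after_unlink": after_unlink,
            }
        finally:
            proc.kill()
            proc.wait()

if __name__ == "__main__":
    print(move_while_running())
```

The process keeps running throughout, and /proc/PID/exe still resolves to the original inode after the file has lost both of its names.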
Now let's dig into the move itself:
When you move a file within the same filesystem, the system call that is executed is rename(), which just gives the file another name; the file's inode remains the same.
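A quick way to convince yourself of this is to compare inode numbers before and after a rename. This is a minimal sketch; the file names and the function name are arbitrary.

```python
import os
import tempfile

def inode_before_and_after_rename():
    """Rename a file within one directory and return its inode before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "old-name")
        new = os.path.join(tmp, "new-name")
        with open(old, "w") as f:
            f.write("payload")
        before = os.stat(old).st_ino
        os.rename(old, new)  # same directory, hence the same filesystem
        after = os.stat(new).st_ino
        return before, after

if __name__ == "__main__":
    before, after = inode_before_and_after_rename()
    print(before, after, before == after)
```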
Whereas between two different filesystems, two things happen:
The content of the file is copied to the new location first, using read() and write().
After that, the file is unlinked from the source directory using unlink(), and obviously the file will get a new inode on the new filesystem.
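The copy-then-unlink sequence looks roughly like this when written out by hand. It is a simplified sketch, not what mv literally does: a real mv also carries over permissions, timestamps and so on, and the function name and chunk size here are arbitrary.

```python
import os
import tempfile

def copy_then_unlink(src, dst, chunk_size=64 * 1024):
    """Move src to dst the way a cross-filesystem move does: copy, then unlink."""
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            while True:
                chunk = os.read(src_fd, chunk_size)
                if not chunk:
                    break  # end of file
                view = memoryview(chunk)
                while view:
                    # write() may accept fewer bytes than offered; loop until done.
                    written = os.write(dst_fd, view)
                    view = view[written:]
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
    os.unlink(src)  # only now does the source name disappear

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = os.path.join(tmp, "a"), os.path.join(tmp, "b")
        with open(src, "wb") as f:
            f.write(b"hello")
        old_inode = os.stat(src).st_ino
        copy_then_unlink(src, dst)
        print(old_inode, os.stat(dst).st_ino)  # the copy is a different inode
```

Because the destination is a brand-new file, it has its own inode even when, as in this demo, both paths happen to live on the same filesystem.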
rm is actually just unlink()-ing the given file from the directory tree, so having write (and search) permission on the directory gives you sufficient rights to remove any file from that directory, unless the directory has the sticky bit set, as /tmp does.
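The point that removal depends on the directory, not on the file, can be checked with a file you have no permissions on at all. This is a small sketch with an arbitrary function name; the temporary directory it creates is owned by you and has no sticky bit.

```python
import os
import tempfile

def remove_file_without_permissions():
    """Unlink a mode-000 file from a writable directory; return whether it still exists."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "locked")
        with open(path, "w") as f:
            f.write("you cannot read or write me")
        os.chmod(path, 0o000)  # no permissions on the file itself
        os.unlink(path)        # still succeeds: only the directory entry changes
        return os.path.exists(path)

if __name__ == "__main__":
    print(remove_file_without_permissions())
```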
Now for fun, imagine what happens when you are moving a file between two filesystems and you do not have permission to unlink() the file from the source.
Well, the file will be copied to the destination first (read(), write()) and then unlink() will fail due to insufficient permission. So the file will remain on both filesystems!
Well, that is pretty straightforward. Let's take an executable named /usr/local/bin/whoopdeedoo. That name is only a reference to a so-called inode (the basic structure of files on Unix filesystems). It's the inode that gets marked "in use".
Now when you delete or move the file /usr/local/bin/whoopdeedoo, the only thing that is moved (or wiped) is the reference to the inode. The inode itself remains unchanged. That's basically it.
I should verify it, but I believe you can do this on Mac OS X filesystems too.
Windows takes a different approach. Why? Who knows...? I am not familiar with the internals of NTFS. Theoretically, all filesystems that use references to internal structures for filenames should be able to do this.
I admit I have oversimplified, but go read the section "Implications" on Wikipedia, which does a much better job than I do.
One thing that seems to be missing from all the other answers: once a file is opened and a program holds an open file descriptor for it, the file will not be removed from the system until that file descriptor is closed.
Attempts to delete the referenced inode are delayed until the file is closed. Renaming the file, within the same filesystem or to a different one, cannot affect the open file, regardless of how the rename is carried out, and neither can explicitly deleting the file or replacing it with a new one. The only way you can mess a file up is by explicitly opening its inode and changing the contents, not by operations on the directory such as renaming or deleting the file.
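Here is a small sketch of that behaviour: a file is unlinked while a descriptor is still open, and the data stays readable through the descriptor. The function name is arbitrary, and the link count in the middle of the returned tuple is typically 0 on local filesystems but may differ on network filesystems such as NFS.

```python
import os
import tempfile

def read_after_unlink():
    """Unlink a file that is still open and read it back through the descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                # the name is gone from the directory
        links = os.fstat(fd).st_nlink  # number of names left for this inode
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)        # the inode's data is still readable
        return os.path.exists(path), links, data
    finally:
        os.close(fd)                   # only now can the kernel free the inode

if __name__ == "__main__":
    print(read_after_unlink())
```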
Moreover, when the kernel executes a file, it keeps a reference to the executable file, and this again protects it from modification during execution.
So in the end, even if it looks as though you are able to delete or move the files that make up a running program, the contents of those files are actually kept available (the inode and its data are not freed) until the program ends.