Fork vs Clone on 2.6 Kernel Linux

fork() is the original UNIX system call for creating processes. It can only create new processes, not threads. It is also portable across UNIX-like systems.

In Linux, clone() is a newer, Linux-specific and much more versatile system call which can be used to create a new thread of execution. Depending on the options passed, the new thread of execution can adhere to the semantics of a UNIX process, a POSIX thread, something in between, or something completely different (like a different container). You can specify all sorts of options dictating whether memory, file descriptors, various namespaces, signal handlers, and so on get shared or copied.

Since clone() is the superset system call, the fork() wrapper in glibc actually calls clone(), but this is an implementation detail that programmers don't need to know about. The real fork() system call still exists in the Linux kernel for backward compatibility, even though it has become redundant, because programs that use very old versions of libc, or a libc other than glibc, might still call it.

clone() is also used to implement the pthread_create() POSIX function for creating threads.

Portable programs should call fork() and pthread_create(), not clone().


It appears that there are two clone() things floating around in Linux 2.6.

There's what the man page presents as a system call:

int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

This is the "clone()" described by doing man 2 clone.

If you read that man page closely enough, you will see this:

It is actually a library function layered on top of the
underlying clone() system call.

Apparently, you're supposed to implement threading using the "library function" layered on the confusingly identically named system call.

I wrote a short program:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int
main(int ac, char **av)
{
    pid_t cpid;
    switch (cpid = fork()) {
    case 0:   // Child process
        break;
    case -1:  // Error
        break;
    default:  // parent process
        break;
    }
    return 0;
}

I compiled it with c99 -Wall -Wextra and ran it under strace -f to see which system calls forking actually makes. This is what strace printed on a Linux 2.6.18 machine (x86_64 CPU):

20097 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b4ee9213770) = 20098
20097 exit_group(0)                     = ?
20098 exit_group(0)

No "fork" call appears in the strace output. The clone() call that does show up has very different arguments from the man-page clone(): its first argument is child_stack=0, not int (*fn)(void *).

It appears that fork(), as called from a C program, is implemented in terms of the real clone() system call, just as the "library function" clone() is. The real clone() takes a different set of arguments from the man-page clone().

Simplistically, both of your apparently contradictory statements about fork() and clone() are correct. The "clone" involved is different, though.


fork() is just a particular set of flags to the system call clone(). clone() is general enough to create either a "process" or a "thread" or even weird things that are somewhere between processes and threads (for example, different "processes" that share the same file descriptor table).

Essentially, for every "type" of information associated with an execution context in the kernel, clone() gives you the choice of aliasing that information or copying it. Threads correspond to aliasing, processes correspond to copying. By specifying intermediate combinations of flags to clone(), you can create weird things that aren't threads or processes. You usually shouldn't do this, and I imagine there was some debate during the development of the Linux kernel about whether it should allow for such a general mechanism as clone().
