How are the fs/gs registers used in Linux AMD64?

What is then the use of GS?

The x86_64 Linux kernel uses the GS register as an efficient way to find the kernel-space stack on system call entry.

In the kernel, the GS base holds the base address of the per-CPU area. To switch to the kernel-space stack, entry_SYSCALL_64 executes:

movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp

After expanding PER_CPU_VAR, we get the following:

movq    %gs:cpu_current_top_of_stack, %rsp
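The base-plus-offset lookup behind that instruction can be modeled in plain user-space C. This is only a rough sketch, not kernel code: struct percpu_area, NR_CPUS_MODEL and the model_* functions are hypothetical names made up for illustration, and the "GS base" here is an ordinary pointer rather than a segment register.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the layout of one per-CPU area. */
struct percpu_area {
    uint64_t scratch;
    uint64_t cpu_current_top_of_stack;
};

enum { NR_CPUS_MODEL = 2 };

/* One private copy of the area per modeled CPU. */
static struct percpu_area percpu_areas[NR_CPUS_MODEL];

/* Models the GS base: the start of the given CPU's private area. */
const unsigned char *model_gs_base(int cpu) {
    return (const unsigned char *)&percpu_areas[cpu];
}

/* Models `movq %gs:offset, reg`: load 8 bytes at base + offset. */
uint64_t model_gs_load(const unsigned char *gs_base, size_t offset) {
    uint64_t value;
    memcpy(&value, gs_base + offset, sizeof value);
    return value;
}

/* Stores a stack top for one modeled CPU. */
void model_set_top_of_stack(int cpu, uint64_t top) {
    percpu_areas[cpu].cpu_current_top_of_stack = top;
}

/* Same offset on every CPU; only the base differs. */
uint64_t model_top_of_stack(int cpu) {
    return model_gs_load(model_gs_base(cpu),
                         offsetof(struct percpu_area, cpu_current_top_of_stack));
}
```

The point of the model is that the offset of cpu_current_top_of_stack is the same on every CPU, so one instruction with a fixed offset reads the right value as long as each CPU has its own base.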

To actually answer your fs:0 question: The x86_64 ABI requires that fs:0 contains the address "pointed to" by fs itself, that is, the FS base. An access such as fs:-4 therefore reads the memory located 4 bytes below the address stored at fs:0. This self-pointer is necessary because you cannot easily get the address pointed to by fs without going through kernel code. Having the address stored at fs:0 thus makes working with thread local storage much more efficient: a single load yields the base from which the addresses of thread-local variables are computed.

You can see this in action when you take the address of a thread local variable:

static __thread int test = 0;

int *f(void) {
    return &test;
}

int g(void) {
    return test;
}

compiles to

f:
    movq    %fs:0, %rax
    leaq    -4(%rax), %rax
    retq

g:
    movl    %fs:-4, %eax
    retq

i686 does the same but with %gs. On aarch64 this is not necessary because the address can be read directly from the TLS register itself (TPIDR_EL0).
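The self-pointer can be checked from user space. The following is a minimal sketch, assuming x86-64 Linux with GCC or Clang; the function names are made up for illustration. It reads fs:0 with inline assembly and compares the result with the FS base reported by the kernel through the arch_prctl system call.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>   /* ARCH_GET_FS */

static __thread int test = 0;

/* Read the self-pointer stored at %fs:0 (no system call involved). */
uintptr_t fs_base_from_self_pointer(void) {
    uintptr_t base;
    __asm__ volatile ("movq %%fs:0, %0" : "=r"(base));
    return base;
}

/* Ask the kernel for the FS base instead. */
uintptr_t fs_base_from_kernel(void) {
    unsigned long base = 0;
    syscall(SYS_arch_prctl, ARCH_GET_FS, &base);
    return (uintptr_t)base;
}

/* Signed distance from the FS base to the thread-local `test`. */
intptr_t tls_offset_of_test(void) {
    return (intptr_t)((uintptr_t)&test - fs_base_from_self_pointer());
}
```

When this is built as an ordinary executable, both functions should return the same address, and tls_offset_of_test() should be negative, which matches the negative displacement in the compiler output above.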


On x86-64 there are 3 TLS entries, two of them accessible via FS and GS. FS is used internally by glibc (on IA32, apparently, FS is used by Wine and GS by glibc).

Glibc makes its TLS entry point to a struct pthread that contains some internal structures for threading. Glibc usually refers to a struct pthread variable as pd, presumably for pthread descriptor.

On x86-64, struct pthread starts with a tcbhead_t (this depends on the architecture; see the macros TLS_DTV_AT_TP and TLS_TCB_AT_TP). This Thread Control Block Header, AFAIU, contains some fields that are needed even when there is a single thread. One of them points to the DTV, the Dynamic Thread Vector, which contains pointers to TLS blocks for DSOs loaded via dlopen(). Before or after the TCB (before it on x86-64) there is a static TLS block for the executable and the DSOs linked at program load time. The TCB and DTV are explained pretty well in Ulrich Drepper's TLS document (look for the diagrams in chapter 3).
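The start of that header can be inspected from a running program. This is a sketch under assumptions, not glibc's API: struct tcb_prefix is a hypothetical mirror of what I believe are the first three fields of glibc's x86-64 tcbhead_t (tcb, dtv, self), and the check that pthread_self() equals the thread pointer relies on glibc's x86-64 behaviour rather than on anything POSIX guarantees.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical mirror of the first fields of glibc's x86-64 tcbhead_t. */
struct tcb_prefix {
    void *tcb;   /* pointer to the TCB itself (the value read via %fs:0) */
    void *dtv;   /* pointer to the Dynamic Thread Vector */
    void *self;  /* pointer to the thread descriptor */
};

/* Fetch the thread pointer through the self-pointer at %fs:0. */
const struct tcb_prefix *current_tcb(void) {
    const struct tcb_prefix *tp;
    __asm__ volatile ("movq %%fs:0, %0" : "=r"(tp));
    return tp;
}

/* Returns 1 if both the tcb and self fields point back at the header. */
int tcb_points_to_itself(void) {
    const struct tcb_prefix *tp = current_tcb();
    return tp->tcb == (const void *)tp && tp->self == (const void *)tp;
}

/* Returns 1 if pthread_self() is the same address as the thread pointer. */
int pthread_self_is_thread_pointer(void) {
    return (uintptr_t)pthread_self() == (uintptr_t)current_tcb();
}
```

If both functions return 1, the pd that glibc passes around internally is the same address that FS points to, which is consistent with struct pthread starting with the tcbhead_t.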