Why use the Global Offset Table for symbols defined in the shared library itself?
The Global Offset Table serves two purposes. One is to allow the dynamic linker "interpose" a different definition of the variable from the executable or other shared object. The second is to allow position independent code to be generated for references to variables on certain processor architectures.
ELF dynamic linking treats the entire process, the executable and all of the shared objects (dynamic libraries), as sharing one single global namespace. If multiple components (executable or shared objects) define the same global symbol then the dynamic linker normally chooses one definition of that symbol and all references to that symbol in all components refer to that one definition. (However, the ELF dynamic symbol resolution is complex and for various reasons different components can end up using different definitions of the the same global symbol.)
To implement this, when building a shared library the compiler will access global variables indirectly through the GOT. For each variable an entry in the GOT will be created containing a pointer to the variable. As your example code shows, the compiler will then use this entry to obtain the address of variable instead of trying to access it directly. When the shared object is loaded into a process the dynamic linker will determine whether any of the global variables have been superseded by variable definitions in another component. If so those global variables will have their GOT entries updated to point at the superseding variable.
By using the "hidden" or "protected" ELF visibility attributes it's possible to prevent global defined symbol from being superseded by a definition in another component, and thus removing the need to use the GOT on certain architectures. For example:
extern int global_visible;
extern int global_hidden __attribute__((visibility("hidden")));
static volatile int local; // volatile, so it's not optimized away
int
foo() {
return global_visible + global_hidden + local;
}
when compiled with -O3 -fPIC
with the x86_64 port of GCC generates:
foo():
mov rcx, QWORD PTR global_visible@GOTPCREL[rip]
mov edx, DWORD PTR local[rip]
mov eax, DWORD PTR global_hidden[rip]
add eax, DWORD PTR [rcx]
add eax, edx
ret
As you can see, only global_visible
uses the GOT, global_hidden
and local
don't use it. The "protected" visibility works similarly, it prevents the definition from being superseded but makes it still visible to the dynamic linker so it can be accessed by other components. The "hidden" visibility hides the symbol completely from the dynamic linker.
The necessity of making code relocatable in order allow shared objects to be loaded a different addresses in different process means that statically allocated variables, whether they have global or local scope, can't be accessed with directly with a single instruction on most architectures. The only exception I know of is the 64-bit x86 architecture, as you see above. It supports memory operands that are both PC-relative and have large 32-bit displacements that can reach any variable defined in the same component.
On all the other architectures I'm familiar with accessing variables in position dependent manner requires multiple instructions. How exactly varies greatly by architecture, but it often involves using the GOT. For example, if you compile the example C code above with x86_64 port of GCC using the -m32 -O3 -fPIC
options you get:
foo():
call __x86.get_pc_thunk.dx
add edx, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
push ebx
mov ebx, DWORD PTR global_visible@GOT[edx]
mov ecx, DWORD PTR local@GOTOFF[edx]
mov eax, DWORD PTR global_hidden@GOTOFF[edx]
add eax, DWORD PTR [ebx]
pop ebx
add eax, ecx
ret
__x86.get_pc_thunk.dx:
mov edx, DWORD PTR [esp]
ret
The GOT is used for all three variable accesses, but if you look closely global_hidden
and local
are handled differently than global_visible
. With the later, a pointer to the variable is accessed through the GOT, with former two variables they're accessed directly through the GOT. This a fairly common trick among architectures where the GOT is used for all position independent variable references.
The 32-bit x86 architecture is exceptional in one way here, since it has large 32-bit displacements and a 32-bit address space. This means that anywhere in memory can be accessed through the GOT base, not just the GOT itself. Most other architectures only support much smaller displacements, which makes the maximum distance something can be from the GOT base much smaller. Other architectures that use this trick will only put small (local/hidden/protected) variables in the GOT itself, large variables are stored outside the GOT and the GOT will contain a pointer to the variable just like with normal visibility global variables.