Where are the symbols etext, edata and end defined?

Note that on Mac OS X the code above may not work. Instead, you can use the accessor functions declared in <mach-o/getsect.h>:

#include <stdio.h>
#include <stdlib.h>
#include <mach-o/getsect.h>

int main(int argc, char *argv[])
{
    printf("    program text (etext)      %10p\n", (void*)get_etext());
    printf("    initialized data (edata)  %10p\n", (void*)get_edata());
    printf("    uninitialized data (end)  %10p\n", (void*)get_end());

    exit(EXIT_SUCCESS);
}

These symbols are defined in a linker script file (the original link is dead; a copy is available at archive.org).


What GCC does

Expanding on kgiannakakis's answer a bit more.

Those symbols are defined by the PROVIDE keyword of the linker script, documented at https://sourceware.org/binutils/docs-2.25/ld/PROVIDE.html#PROVIDE

The default scripts are generated when you build Binutils and are embedded into the ld executable. External files that your distribution may install, such as those in /usr/lib/ldscripts, are not used by default.

Echo the linker script to be used:

ld -verbose | less

In binutils 2.24 it contains:

.text           :
{
  *(.text.unlikely .text.*_unlikely .text.unlikely.*)
  *(.text.exit .text.exit.*)
  *(.text.startup .text.startup.*)
  *(.text.hot .text.hot.*)
  *(.text .stub .text.* .gnu.linkonce.t.*)
  /* .gnu.warning sections are handled specially by elf32.em.  */
  *(.gnu.warning)
}
.fini           :
{
  KEEP (*(SORT_NONE(.fini)))
}
PROVIDE (__etext = .);
PROVIDE (_etext = .);
PROVIDE (etext = .);
.rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
.rodata1        : { *(.rodata1) }

So we also discover that:

  • __etext and _etext will also work
  • etext does not mark the end of the .text section but the end of .fini, which also contains code
  • etext is not at the end of the segment: .rodata follows it, because Binutils places all read-only sections in the same segment

PROVIDE behaves much like a weak definition: the symbol is defined only if it is referenced and no object in the link defines it. If you also define those symbols in your C code, your definition will win and hide this one.

Minimal Linux 32-bit example

To truly understand how things work, I like to create minimal examples!

main.S:

.section .text
    /* Exit system call. */
    mov $1, %eax
    /* Exit status: load the value stored at address sdata. */
    mov sdata, %ebx
    int $0x80
.section .data
    .byte 2

link.ld:

SECTIONS
{
    . = 0x400000;
    .text :
    {
        *(.text)
        sdata = .;
        *(.data)
    }
}

Compile and run:

as --32 -o main.o main.S
ld -m elf_i386 -o main -T link.ld main.o
./main
echo $?

Output:

 2

Explanation: sdata is assigned the address at which the .data input section is placed, so it points to the first byte of that section.

So by controlling the first byte of that section, we control the exit status!

This example on GitHub.
