How to decrease the size of generated binaries?

Apart from the obvious (-Os -s), lowering function alignment with -falign-functions to the smallest value that will not crash (I don't know the ARM alignment requirements) might squeeze out a few bytes per function.
-Os should already disable function alignment, but the target may still default to a value like 4 or 8. If a smaller alignment, e.g. 1, is possible on ARM, that might save some bytes.

-ffast-math (or the less abrasive -fno-math-errno) stops math functions from setting errno and avoids some checks, which reduces code size. If, like most people, you don't read errno anyway, that's an option.
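
To see the effect, a minimal sketch along these lines can be compiled with and without -fno-math-errno and the output of objdump -d compared. The function name root is made up for illustration, and the exact instructions depend on the target and the GCC version.

```cpp
#include <cmath>

// Hypothetical helper, not taken from any real code base.
// With default flags, GCC has to keep a fallback call to the library
// sqrt() so that errno can be set to EDOM for negative input.
// With -fno-math-errno (or -ffast-math) it may emit only the hardware
// square-root instruction, on targets that have one.
double root(double x) {
    return std::sqrt(x);
}
```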

Properly using __restrict (or restrict) and const removes redundant loads, making code both faster and smaller (and more correct). Properly marking pure functions as such lets the compiler eliminate redundant function calls.
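
A rough sketch of both ideas, assuming GCC or Clang (the __restrict keyword and __attribute__((pure)) are compiler extensions in C++). All function names here are made up for illustration.

```cpp
#include <cstddef>

// __restrict promises that the three pointers never alias, so the
// compiler does not have to reload *scale after every store to dst[i].
void scale_into(float* __restrict dst, const float* __restrict src,
                const float* __restrict scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * *scale;
}

// pure: the result depends only on the arguments and on memory that is
// read but never written, so repeated identical calls may be merged.
__attribute__((pure)) int table_sum(const int* table, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += table[i];
    return sum;
}

int twice_sum(const int* table, std::size_t n) {
    // The compiler is allowed to call table_sum only once here.
    return table_sum(table, n) + table_sum(table, n);
}
```

Note that both annotations are promises to the compiler: if the pointers do overlap, or the "pure" function has side effects, the result is undefined or simply wrong.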

Enabling LTO (-flto) may help. If that is not available, building everything as a single translation unit can have a similar effect, because it makes everything visible to the optimizer at one time, possibly allowing it to work better. Note that gcc foo.c bar.c baz.c -o program on its own still compiles each file separately before linking, so in practice this means one source file that #includes the others.

-fdelete-null-pointer-checks may be an option (note that this is normally enabled with any "O", but not on embedded targets).

Putting static globals (you hopefully don't have that many, but still) into a struct can eliminate a lot of overhead initializing them. I learned that when writing my first OpenGL loader. Having all the function pointers in a struct and initializing the struct with = {} generates one call to memset, whereas initializing the pointers the "normal way" generates a hundred kilobytes of code just to set each one to zero individually.
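
A minimal sketch of the struct approach, with made-up names (gl_api, gl_proc, reset_api are not from any real loader). Whether the compiler really emits a single memset depends on its version and flags, so check the disassembly.

```cpp
// Hypothetical function-pointer table; a real loader would have
// hundreds of entries instead of three.
typedef void (*gl_proc)(void);

struct gl_api {
    gl_proc clear;
    gl_proc draw_arrays;
    gl_proc bind_buffer;
};

// One aggregate instead of many separate static pointers.
static gl_api api = {};

// Clearing the whole table is a single aggregate assignment, which the
// compiler can lower to one block fill instead of one store per pointer.
void reset_api(gl_api& a) {
    a = gl_api{};
}

// Stand-in for a real entry point, only used to populate the table.
void dummy_proc(void) {}
```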

Avoid static local variables with non-trivial constructors like the devil (POD types are no problem). Unless you compile with -fno-threadsafe-statics, gcc initializes such static locals in a thread-safe way, and that guard code links in a lot of extra code (even if you don't use threads at all).
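
The difference can be sketched as follows; the function names are made up for illustration. Compiling this with g++ -Os -S and looking for __cxa_guard_acquire in the output should show which function gets the guard, although details vary between compiler versions.

```cpp
#include <string>

// Non-trivial constructor and destructor: GCC normally wraps the
// first-use initialization in __cxa_guard_acquire/__cxa_guard_release
// unless -fno-threadsafe-statics is given.
const std::string& greeting_guarded() {
    static const std::string text("hello");
    return text;
}

// Plain constant data: nothing to construct at run time, so no guard
// and no extra runtime support is needed.
const char* greeting_plain() {
    static const char text[] = "hello";
    return text;
}
```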

Using something like libowfat instead of the normal crt can greatly reduce your binary size.


If you want to squeeze every last drop of space out of your binaries, you'll probably have to learn assembly. For a very interesting (and entertaining) intro, see this link:

A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux


You can also use -nostartfiles and/or -nodefaultlibs, or -nostdlib, which combines both. If you drop the standard start files, you must write your own _start function. See also this thread (archived) on oompf:

(quoting Perrin)

# man syscalls
# cat phat.cc
extern "C" void _start() {
        asm("int $0x80" :: "a"(1), "b"(42));
}
# g++ -fno-exceptions -Os -c phat.cc
# objdump -d phat.o

phat.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_start>:
   0:   53                      push   %rbx
   1:   b8 01 00 00 00          mov    $0x1,%eax
   6:   bb 2a 00 00 00          mov    $0x2a,%ebx
   b:   cd 80                   int    $0x80
   d:   5b                      pop    %rbx
   e:   c3                      retq
# ld -nostdlib -nostartfiles phat.o -o phat
# sstrip phat
# ls -l phat
-rwxr-xr-x 1 tbp src 294 2007-04-11 22:47 phat
# ./phat; echo $?
42

Summary: Above snippet yielded a binary of 294 bytes, each byte 8 bits.


Assuming that another tool is also allowed ;-)

Then consider UPX, the Ultimate Packer for eXecutables, which compresses the binary and decompresses it at run time.

Happy coding.