Remove dead code when linking static library into dynamic library
You can use a version script to mark the entry points, in combination with -ffunction-sections and --gc-sections.
For example, consider this C file (example.c):
int
foo (void)
{
  return 17;
}

int
bar (void)
{
  return 251;
}
And this version script, called version.script:
{
  global: foo;
  local: *;
};
Compile and link the sources like this:
gcc -Wl,--gc-sections -shared -ffunction-sections -Wl,--version-script=version.script example.c
If you look at the output of objdump -d --reloc a.out, you will notice that only foo is included in the shared object, but not bar.
When removing functions in this way, the linker will take indirect dependencies into account. For example, if you turn foo into this:
void *
foo (void)
{
  extern int bar (void);
  return bar;
}
the linker will put both foo and bar into the shared object because both are needed, even though only foo is exported.
(Obviously, this will not work on all platforms, but ELF supports this.)
You're creating a library, and your symbols aren't static, so it's normal that the linker doesn't remove any global symbols.
The --gc-sections option is designed for executables. The linker starts from the entry point (main) and follows the function calls from there. It marks the sections that are used and discards the others.
A library doesn't have a single entry point; it has as many entry points as it has global symbols, which is why the linker cannot discard your symbols. What if someone uses your .h file in their program and calls the "unused" functions?
To find out which functions aren't "used", I'd suggest that you convert void func_in_my_prog() to int main() (or copy the source into a modified one containing a main()), then build an executable from the sources, adding the -Wl,-Map=mapfile.txt option when linking to create a map file:
gcc -Wl,--gc-sections -Wl,-Map=mapfile.txt -fdata-sections -ffunction-sections libmy_static_lib.c my_prog.c
This mapfile contains the discarded symbols:
Discarded input sections
.drectve 0x00000000 0x54 c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-mingw32/6.2.1/crt2.o
.drectve 0x00000000 0x1c c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-
...
.text$unused_func1
0x00000000 0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
.text$unused_func2
0x00000000 0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
.rdata$zzz 0x00000000 0x38 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
...
Now we can see that the unused functions have been removed: they no longer appear in the final executable.
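If the map file is long, the discarded functions can be pulled out of it with a few lines of Python. This is only a minimal sketch: the helper name discarded_functions is made up for illustration, the sample text is copied from the excerpt above, and it assumes the GNU ld layout in which the "Discarded input sections" listing ends at the "Memory Configuration" heading. It accepts both the MinGW-style `.text$name` and the ELF-style `.text.name` section names.

```python
import re

# Section names look like ".text$name" with MinGW/PE (as in the excerpt above)
# and ".text.name" with ELF toolchains; accept either separator.
# Caveat: on ELF this also matches sections such as ".text.startup".
_FUNC_SECTION = re.compile(r"^\s*\.text[.$]([A-Za-z_]\w*)")

# Headings assumed to end the "Discarded input sections" listing in a GNU ld map.
_NEXT_HEADINGS = ("Memory Configuration", "Linker script and memory map")


def discarded_functions(map_text):
    """Return the function names whose .text sections the linker discarded."""
    names = []
    in_discarded = False
    for line in map_text.splitlines():
        stripped = line.strip()
        if stripped == "Discarded input sections":
            in_discarded = True
            continue
        if in_discarded and stripped in _NEXT_HEADINGS:
            break
        if in_discarded:
            match = _FUNC_SECTION.match(line)
            if match:
                names.append(match.group(1))
    return names


# Sample copied from the map file excerpt above (raw string because of the
# backslashes in the Windows paths).
SAMPLE_MAP = r"""
Discarded input sections

 .text$unused_func1
                0x00000000       0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
 .text$unused_func2
                0x00000000       0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
 .rdata$zzz     0x00000000       0x38 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o

Memory Configuration
"""

if __name__ == "__main__":
    print(discarded_functions(SAMPLE_MAP))  # ['unused_func1', 'unused_func2']
```

For a real build you would pass the contents of mapfile.txt instead of SAMPLE_MAP.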
There are existing tools that do this (using this technique but without requiring a main), for instance Callcatcher. One can also easily write a tool that disassembles the library and checks for symbols that are defined but never called (I've written such tools in Python several times, and assembly is much easier to parse than high-level code).
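To give an idea of what such a disassembly-based check might look like, here is a rough Python sketch. It is not the tool referred to above: the function name unreferenced_functions and the sample text are invented for illustration, and the sample is hand-written in the shape of x86-64 objdump -d output rather than captured from a real build. It assumes that definitions appear as `address <name>:`, that calls show their target as `<name>`, and that unlinked objects show relocations such as `R_X86_64_PLT32 name-0x4` when --reloc is used. Indirect calls and symbols used only from outside the library are not detected, so known entry points have to be passed in explicitly.

```python
import re

# "0000000000001119 <helper>:" starts the disassembly of a function.
_DEFINITION = re.compile(r"^[0-9a-fA-F]+ <([^>+]+)>:\s*$")
# "call 1119 <helper>": an operand naming the exact start of a symbol.
# Targets with an offset, such as <helper+0x10>, are jumps inside a function.
_REFERENCE = re.compile(r"<([^>+]+)>")
# "R_X86_64_PLT32 helper-0x4": relocation printed by --reloc for unlinked objects.
_RELOCATION = re.compile(r"\bR_[A-Z0-9_]+\s+([A-Za-z_]\w*)")


def unreferenced_functions(disassembly, entry_points=()):
    """Return defined functions that no other function refers to."""
    defined = []
    referenced = set()
    current = None
    for line in disassembly.splitlines():
        definition = _DEFINITION.match(line)
        if definition:
            current = definition.group(1)
            if "@" not in current:  # skip PLT stubs such as <puts@plt>
                defined.append(current)
            continue
        for target in _REFERENCE.findall(line) + _RELOCATION.findall(line):
            name = target.split("@")[0]  # "bar@plt" counts as a use of "bar"
            if name != current:  # ignore recursion and jumps to own start
                referenced.add(name)
    keep = set(entry_points)
    return [n for n in defined if n not in referenced and n not in keep]


# Hand-written sample in the shape of objdump -d output (hypothetical names).
SAMPLE = """\
0000000000001119 <helper>:
    1119:  b8 11 00 00 00        mov    $0x11,%eax
    111e:  c3                    ret

000000000000111f <unused>:
    111f:  b8 fb 00 00 00        mov    $0xfb,%eax
    1124:  c3                    ret

0000000000001125 <entry>:
    1125:  e8 ef ff ff ff        call   1119 <helper>
    112a:  c3                    ret
"""

if __name__ == "__main__":
    print(unreferenced_functions(SAMPLE, entry_points=["entry"]))  # ['unused']
```

Treat the result as a list of candidates to review, not as a list of functions that are safe to delete.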
To clean up, you can delete the unused functions manually from your sources. (Be careful with object-oriented languages and dynamically dispatched calls when using existing or custom assembly analysis tools. The linker, on the other hand, isn't going to remove a section that could be used, so that approach is safe.)
You can also remove the relevant sections from the library file itself, which avoids changing the source code:
$ objcopy --remove-section '.text$unused_func1' --remove-section '.text$unused_func2' libmy_static_lib.a stripped.a
$ nm stripped.a
libmy_static_lib.o:
00000000 b .bss
00000000 d .data
00000000 r .rdata
00000000 r .rdata$zzz
00000000 t .text
00000000 t .text$func1
00000000 T _func1
U _puts