__attribute__((weak)) and static libraries

To explain what's going on here, let's talk first about your original source files, with

a.h (1):

void foo() __attribute__((weak));

and:

a.c (1):

#include "a.h"
#include <stdio.h>

void foo() { printf("%s\n", __FILE__); }

The mixture of .c and .cpp files in your sample code is irrelevant to the issues, and all the code is C, so we'll say that main.cpp is main.c and do all compiling and linking with gcc:

$ gcc -Wall -c main.c a.c b.c
$ ar rcs a.a a.o
$ ar rcs b.a b.o

First let's review the differences between a weakly declared symbol, like your:

void foo() __attribute__((weak));

and a strongly declared symbol, like

void foo();

which is the default:

  • When a weak reference to foo (i.e. a reference to weakly declared foo) is linked in a program, the linker need not find a definition of foo anywhere in the linkage: it may remain undefined. If a strong reference to foo is linked in a program, the linker needs to find a definition of foo.

  • A linkage may contain at most one strong definition of foo (i.e. a definition of foo that declares it strongly). Otherwise a multiple-definition error results. But it may contain multiple weak definitions of foo without error.

  • If a linkage contains one or more weak definitions of foo and also a strong definition, then the linker chooses the strong definition and ignores the weak ones.

  • If a linkage contains just one weak definition of foo and no strong definition, inevitably the linker uses the one weak definition.

  • If a linkage contains multiple weak definitions of foo and no strong definition, then the linker chooses one of the weak definitions arbitrarily.

Next, let's review the differences between inputting an object file in a linkage and inputting a static library.

A static library is merely an ar archive of object files that we may offer to the linker from which to select the ones it needs to carry on the linkage.

When an object file is input to a linkage, the linker unconditionally links it into the output file.

When a static library is input to a linkage, the linker examines the archive to find any object files within it that provide definitions it needs for unresolved symbol references that have accrued from input files already linked. If it finds any such object files in the archive, it extracts them and links them into the output file, exactly as if they were individually named input files and the static library had not been mentioned at all.

With these observations in mind, consider the compile-and-link command:

gcc main.c a.o b.o

Behind the scenes gcc breaks it down, as it must, into a compile-step and link step, just as if you had run:

gcc -c main.c     # compile
gcc main.o a.o b.o  # link

All three object files are linked unconditionally into the (default) program ./a.out. a.o contains a weak definition of foo, as we can see:

$ nm --defined a.o
0000000000000000 W foo

Whereas b.o contains a strong definition:

$ nm --defined b.o
0000000000000000 T foo

The linker will find both definitions and choose the strong one from b.o, as we can also see:

$ gcc main.o a.o b.o -Wl,-trace-symbol=foo
main.o: reference to foo
a.o: definition of foo
b.o: definition of foo
$ ./a.out
b.c

Reversing the linkage order of a.o and b.o will make no difference: there's still exactly one strong definition of foo, the one in b.o.

By contrast consider the compile-and-link command:

gcc main.c a.a b.a

which breaks down into:

gcc -c main.c       # compile
gcc main.o a.a b.a  # link

Here, only main.o is linked unconditionally. That puts an undefined weak reference to foo into the linkage:

$ nm --undefined main.o
                 w foo
                 U _GLOBAL_OFFSET_TABLE_
                 U puts

That weak reference to foo does not need a definition. So the linker will not attempt to find a definition that resolves it in any of the object files in either a.a or b.a and will leave it undefined in the program, as we can see:

$ gcc main.o a.a b.a -Wl,-trace-symbol=foo
main.o: reference to foo
$ nm --undefined a.out
                 w __cxa_finalize@@GLIBC_2.2.5
                 w foo
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U __libc_start_main@@GLIBC_2.2.5
                 U puts@@GLIBC_2.2.5

Hence:

$ ./a.out
no foo

Again, it doesn't matter if you reverse the linkage order of a.a and b.a, but this time it is because neither of them contributes anything to the linkage.

Let's turn now to the different behavior you discovered by changing a.h and a.c to:

a.h (2):

void foo();

a.c (2):

#include "a.h"
#include <stdio.h>

void __attribute__((weak)) foo() { printf("%s\n", __FILE__); }

Once again:

$ gcc -Wall -c main.c a.c b.c
main.c: In function ‘main’:
main.c:4:18: warning: the address of ‘foo’ will always evaluate as ‘true’ [-Waddress]
 int main() { if (foo) foo(); else printf("no foo\n"); }

See that warning? main.o now contains a strongly declared reference to foo:

$ nm --undefined main.o
                 U foo
                 U _GLOBAL_OFFSET_TABLE_

so the code (when linked) must have a non-null address for foo. Proceeding:

$ ar rcs a.a a.o
$ ar rcs b.a b.o

Then try the linkage:

$ gcc main.o a.o b.o
$ ./a.out
b.c

And with the object files reversed:

$ gcc main.o b.o a.o
$ ./a.out
b.c

As before, the order makes no difference. All the object files are linked. b.o provides a strong definition of foo, a.o provides a weak one, so b.o wins.

Next try the linkage:

$ gcc main.o a.a b.a
$ ./a.out
a.c

And with the order of the libraries reversed:

$ gcc main.o b.a a.a
$ ./a.out
b.c

That does make a difference. Why? Let's redo the linkages with diagnostics:

$ gcc main.o a.a b.a -Wl,-trace,-trace-symbol=foo
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
main.o
(a.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
main.o: reference to foo
a.a(a.o): definition of foo

Ignoring the default libraries, the only object files of ours that got linked were:

main.o
(a.a)a.o

And the definition of foo was taken from the archive member a.o of a.a:

a.a(a.o): definition of foo

Reversing the library order:

$ gcc main.o b.a a.a -Wl,-trace,-trace-symbol=foo
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
main.o
(b.a)b.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
main.o: reference to foo
b.a(b.o): definition of foo

This time the object files linked were:

main.o
(b.a)b.o

And the definition of foo was taken from b.o in b.a:

b.a(b.o): definition of foo

In the first linkage, the linker had an unresolved strong reference to foo for which it needed a definition when it reached a.a. So it looked in the archive for an object file that provides a definition, and found a.o. That definition was a weak one, but that didn't matter. No strong definition had been seen. a.o was extracted from a.a and linked, and the reference to foo was thus resolved. Next b.a was reached, where a strong definition of foo would have been found in b.o, if the linker still needed one and looked for it. But it didn't need one any more and didn't look. The linkage:

gcc main.o a.a b.a

is exactly the same as:

gcc main.o a.o

And likewise the linkage:

$ gcc main.o b.a a.a

is exactly the same as:

$ gcc main.o b.o

Your real problem...

... emerges in one of your comments to the post:

I want to override [the] original function implementation when linking with a testing framework.

You want to link a program inputting some static library lib1.a which has some member file1.o that defines a symbol foo, and you want to knock out that definition of foo and link a different one that is defined in some other object file file2.o.

__attribute__((weak)) isn't applicable to that problem. The solution is more elementary. You just make sure to input file2.o to the linkage before you input lib1.a (and before any other input that provides a definition of foo). Then the linker will resolve references to foo with the definition provided in file2.o and will not try to find any other definition when it reaches lib1.a. The linker will not consume lib1.a(file1.o) at all. It might as well not exist.

And what if you have put file2.o in another static library lib2.a? Then inputting lib2.a before lib1.a will do the job of linking lib2.a(file2.o) before lib1.a is reached and resolving foo to the definition in file2.o.

Likewise, of course, every definition provided by members of lib2.a will be linked in preference to a definition of the same symbol provided in lib1.a. If that's not what you want, then don't link lib2.a: link file2.o itself.

Finally

Is it possible to use [the] weak attribute with static linking at all?

Certainly. Here is a first-principles use-case:

foo.h (1)

#ifndef FOO_H
#define FOO_H

int __attribute__((weak)) foo(int i)
{
    return i != 0;
}

#endif

aa.c

#include "foo.h"

int a(void)
{
    return foo(0);
}

bb.c

#include "foo.h"

int b(void)
{
    return foo(42);
}

prog.c

#include <stdio.h>

extern int a(void);
extern int b(void);

int main(void)
{
    puts(a() ? "true" : "false");
    puts(b() ? "true" : "false");
    return 0;
}

Compile all the source files, requesting a separate ELF section for each function:

$ gcc -Wall -ffunction-sections -c prog.c aa.c bb.c

Note that the weak definition of foo is compiled via foo.h into both aa.o and bb.o, as we can see:

$ nm --defined aa.o
0000000000000000 T a
0000000000000000 W foo
$ nm --defined bb.o
0000000000000000 T b
0000000000000000 W foo

Now link a program from all the object files, requesting the linker to discard unused sections (and give us the map-file, and some diagnostics):

$ gcc prog.o aa.o bb.o -Wl,--gc-sections,-Map=mapfile,-trace,-trace-symbol=foo
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
prog.o
aa.o
bb.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
aa.o: definition of foo

This linkage is no different from:

$ ar rcs libaabb.a aa.o bb.o
$ gcc prog.o libaabb.a

Despite the fact that both aa.o and bb.o were loaded, and each contains a definition of foo, no multiple-definition error results, because each definition is weak. aa.o was loaded before bb.o and the definition of foo was linked from aa.o.

So what happened to the definition of foo in bb.o? The mapfile shows us:

mapfile (1)

...
...
Discarded input sections
...
...
 .text.foo      0x0000000000000000       0x13 bb.o
...
...

The linker discarded the function section that contained the definition in bb.o.

Let's reverse the linkage order of aa.o and bb.o:

$ gcc prog.o bb.o aa.o -Wl,--gc-sections,-Map=mapfile,-trace,-trace-symbol=foo
...
prog.o
bb.o
aa.o
...
bb.o: definition of foo

And now the opposite thing happens. bb.o is loaded before aa.o. The definition of foo is linked from bb.o and:

mapfile (2)

...
...
Discarded input sections
...
...
 .text.foo      0x0000000000000000       0x13 aa.o
...
...

the definition from aa.o is chucked away.

There you see how the linker arbitrarily chooses one of multiple weak definitions of a symbol, in the absence of a strong definition. It simply picks the first one you give it and ignores the rest.

What we've just done here is effectively what the GCC C++ compiler does for us when we define a global inline function. Rewrite:

foo.h (2)

#ifndef FOO_H
#define FOO_H

inline int foo(int i)
{
    return i != 0;
}

#endif

Rename our source files *.c -> *.cpp and compile:

$ g++ -Wall -c prog.cpp aa.cpp bb.cpp

Now there is a weak definition of foo (C++ mangled) in each of aa.o and bb.o:

$ nm --defined aa.o bb.o

aa.o:
0000000000000000 T _Z1av
0000000000000000 W _Z3fooi

bb.o:
0000000000000000 T _Z1bv
0000000000000000 W _Z3fooi

The linkage uses the first definition it finds:

$ g++ prog.o aa.o bb.o -Wl,-Map=mapfile,-trace,-trace-symbol=_Z3fooi
...
prog.o
aa.o
bb.o
...
aa.o: definition of _Z3fooi
bb.o: reference to _Z3fooi

and throws away the other one:

mapfile (3)

...
...
Discarded input sections
...
...
 .text._Z3fooi  0x0000000000000000       0x13 bb.o
...
...

And as you may know, every instantiation of a C++ function template in global scope (or instantiation of a class template member function) is treated like an inline global function at link time. Rewrite again:

#ifndef FOO_H
#define FOO_H

template<typename T>
T foo(T i)
{
    return i != 0;
}

#endif

Recompile:

$ g++ -Wall -c prog.cpp aa.cpp bb.cpp

Again:

$ nm --defined aa.o bb.o

aa.o:
0000000000000000 T _Z1av
0000000000000000 W _Z3fooIiET_S0_

bb.o:
0000000000000000 T _Z1bv
0000000000000000 W _Z3fooIiET_S0_

each of aa.o and bb.o has a weak definition of:

$ c++filt _Z3fooIiET_S0_
int foo<int>(int)

and the linkage behaviour is now familiar. One way:

$ g++ prog.o aa.o bb.o -Wl,-Map=mapfile,-trace,-trace-symbol=_Z3fooIiET_S0_
...
prog.o
aa.o
bb.o
...
aa.o: definition of _Z3fooIiET_S0_
bb.o: reference to _Z3fooIiET_S0_

and the other way:

$ g++ prog.o bb.o aa.o -Wl,-Map=mapfile,-trace,-trace-symbol=_Z3fooIiET_S0_
...
prog.o
bb.o
aa.o
...
bb.o: definition of _Z3fooIiET_S0_
aa.o: reference to _Z3fooIiET_S0_

Our program's behavior is unchanged by the rewrites:

$ ./a.out
false
true

So the application of the weak attribute to symbols in the linkage of ELF objects - whether static or dynamic - enables the GCC implementation of C++ templates for the GNU linker. You could fairly say it enables the GCC implementation of modern C++.


I find that the best explanation is the following:

The linker will only search through libraries to resolve a reference if it cannot resolve that reference after searching all input objects. If required, the libraries are searched from left to right according to their position on the linker command line. Objects within the library will be searched by the order in which they were archived. As soon as armlink finds a symbol match for the reference, the searching is finished, even if it matches a weak definition.
The ELF ABI section 4.6.1.2 says:
"A weak definition does not change the rules by which object files are selected from libraries. However, if a link set contains both a weak definition and a non-weak definition, the non-weak definition will always be used."
The "link set" is the set of objects that have been loaded by the linker. It does not include objects from libraries that are not required.
Therefore archiving two objects where one contains the weak definition of a given symbol and the other contains the non-weak definition of that symbol, into a library or separate libraries, is not recommended.

Observe the following. I basically renamed the original files: mv a.c definition.c, mv b.c noweak.c and mv second_a.c declaration.c.

> for i in Makefile *.c; do echo "cat $i <<EOF"; cat $i; echo EOF; done
cat Makefile <<EOF
tgt=
tgt+=only_weak_1.out only_weak_2.out
tgt+=definition.out declaration.out noweak.out
tgt+=definition_static.out declaration_static.out noweak_static.out
tgt+=1.out 2.out 3.out 4.out
tgt+=5.out 6.out 7.out 8.out
tgt+=10.out 11.out 12.out
tgt+=13.out
tgt+=14.out

only_weak_1_obj= definition.o declaration.o
only_weak_2_obj= declaration.o definition.o
definition_obj= definition.o
declaration_obj= declaration.o
noweak_obj= noweak.o
definition_static_obj= definition.a
declaration_static_obj= declaration.a
noweak_static_obj= noweak.a
1_obj= declaration.o noweak.o
2_obj= noweak.o declaration.o
3_obj= declaration.a noweak.a
4_obj= noweak.a declaration.a
5_obj= definition.o noweak.o
6_obj= noweak.o definition.o
7_obj= definition.a noweak.a
8_obj= noweak.a definition.a
10_obj= noweak.a definition.a declaration.a
11_obj= definition.a declaration.a noweak.a
12_obj= declaration.a definition.a noweak.a
13_obj= all.a
14_obj= all.o


.PRECIOUS: % %.o %.c %.a
def: run
.PHONY: run
run: $(tgt)
	{ $(foreach a,$^,echo "$($(a:.out=)_obj)#->#$(a)#:#$$(./$(a))";) } | { echo; column -t -s'#' -N 'objects, ,executable, ,output' -o' '; echo; }
.SECONDEXPANSION:
%.out: main.o $$(%_obj)
	$(CC) -o $@ $^
%.o: %.c
	$(CC) -c -o $@ $^
%.a: %.o
	ar cr $@ $^
all.a: declaration.o definition.o noweak.o
	ar cr $@ $^
all.o: declaration.o definition.o noweak.o
	$(LD) -i -o $@ $^
clean:
	rm -fv *.o *.a *.out
EOF

cat declaration.c <<EOF
#include <stdio.h>
__attribute__((__weak__)) void foo();
void foo() { printf("%s\n", __FILE__); }
EOF
cat definition.c <<EOF
#include <stdio.h>
__attribute__((__weak__)) void foo() { printf("%s\n", __FILE__); }
EOF
cat main.c <<EOF
#include <stdio.h>
void foo();
int main() {
    if (foo) foo(); else printf("no foo\n");
    return 0;
}
EOF
cat noweak.c <<EOF
#include <stdio.h>
void foo() { printf("%s\n", __FILE__); }
EOF

> make
cc -c -o definition.o definition.c
cc -c -o declaration.o declaration.c
cc -c -o main.o main.c
cc -o only_weak_1.out main.o definition.o declaration.o
cc -o only_weak_2.out main.o declaration.o definition.o
cc -o definition.out main.o definition.o
cc -o declaration.out main.o declaration.o
cc -c -o noweak.o noweak.c
cc -o noweak.out main.o noweak.o
ar cr definition.a definition.o
cc -o definition_static.out main.o definition.a
ar cr declaration.a declaration.o
cc -o declaration_static.out main.o declaration.a
ar cr noweak.a noweak.o
cc -o noweak_static.out main.o noweak.a
cc -o 1.out main.o declaration.o noweak.o
cc -o 2.out main.o noweak.o declaration.o
cc -o 3.out main.o declaration.a noweak.a
cc -o 4.out main.o noweak.a declaration.a
cc -o 5.out main.o definition.o noweak.o
cc -o 6.out main.o noweak.o definition.o
cc -o 7.out main.o definition.a noweak.a
cc -o 8.out main.o noweak.a definition.a
cc -o 10.out main.o noweak.a definition.a declaration.a
cc -o 11.out main.o definition.a declaration.a noweak.a
cc -o 12.out main.o declaration.a definition.a noweak.a
ar cr all.a declaration.o definition.o noweak.o
cc -o 13.out main.o all.a
ld -i -o all.o declaration.o definition.o noweak.o
cc -o 14.out main.o all.o
{ echo "definition.o declaration.o#->#only_weak_1.out#:#$(./only_weak_1.out)"; echo "declaration.o definition.o#->#only_weak_2.out#:#$(./only_weak_2.out)"; echo "definition.o#->#definition.out#:#$(./definition.out)"; echo "declaration.o#->#declaration.out#:#$(./declaration.out)"; echo "noweak.o#->#noweak.out#:#$(./noweak.out)"; echo "definition.a#->#definition_static.out#:#$(./definition_static.out)"; echo "declaration.a#->#declaration_static.out#:#$(./declaration_static.out)"; echo "noweak.a#->#noweak_static.out#:#$(./noweak_static.out)"; echo "declaration.o noweak.o#->#1.out#:#$(./1.out)"; echo "noweak.o declaration.o#->#2.out#:#$(./2.out)"; echo "declaration.a noweak.a#->#3.out#:#$(./3.out)"; echo "noweak.a declaration.a#->#4.out#:#$(./4.out)"; echo "definition.o noweak.o#->#5.out#:#$(./5.out)"; echo "noweak.o definition.o#->#6.out#:#$(./6.out)"; echo "definition.a noweak.a#->#7.out#:#$(./7.out)"; echo "noweak.a definition.a#->#8.out#:#$(./8.out)"; echo "noweak.a definition.a declaration.a#->#10.out#:#$(./10.out)"; echo "definition.a declaration.a noweak.a#->#11.out#:#$(./11.out)"; echo "declaration.a definition.a noweak.a#->#12.out#:#$(./12.out)"; echo "all.a#->#13.out#:#$(./13.out)"; echo "all.o#->#14.out#:#$(./14.out)"; } | { echo; column -t -s'#' -N 'objects, ,executable, ,output' -o' '; echo; }

objects                                executable               output
definition.o declaration.o          -> only_weak_1.out        : definition.c
declaration.o definition.o          -> only_weak_2.out        : declaration.c
definition.o                        -> definition.out         : definition.c
declaration.o                       -> declaration.out        : declaration.c
noweak.o                            -> noweak.out             : noweak.c
definition.a                        -> definition_static.out  : definition.c
declaration.a                       -> declaration_static.out : declaration.c
noweak.a                            -> noweak_static.out      : noweak.c
declaration.o noweak.o              -> 1.out                  : noweak.c
noweak.o declaration.o              -> 2.out                  : noweak.c
declaration.a noweak.a              -> 3.out                  : declaration.c
noweak.a declaration.a              -> 4.out                  : noweak.c
definition.o noweak.o               -> 5.out                  : noweak.c
noweak.o definition.o               -> 6.out                  : noweak.c
definition.a noweak.a               -> 7.out                  : definition.c
noweak.a definition.a               -> 8.out                  : noweak.c
noweak.a definition.a declaration.a -> 10.out                 : noweak.c
definition.a declaration.a noweak.a -> 11.out                 : definition.c
declaration.a definition.a noweak.a -> 12.out                 : declaration.c
all.a                               -> 13.out                 : declaration.c
all.o                               -> 14.out                 : noweak.c

When only weak symbols are used (cases only_weak_1 and only_weak_2), the first definition is used.
When only static libraries are used (cases 3, 4, 7, 8, 10, 11, 12, 13), the first definition is used.
When only object files are used (cases 1, 2, 5, 6, 14), the weak symbols are omitted and only the symbol from noweak is used.
From the link I provided:

There are different ways to guarantee armlink selecting a non-weak version of a given symbol:
- Do not archive such objects
- Ensure that the weak and non-weak symbols are contained within the same object before archiving
- Use partial linking as an alternative.
