Is there a way to verify a binary against the sources?
Compilation is a mostly one-way operation, and it is not deterministic, at least not in a robust way.
You could recompile the source code and check whether it yields the same binary. However, the exact binary can vary depending on many parameters, including the compilation options and the exact version of the compiler used. Moreover, some compilers embed "comments" in binary files; these usually include the compiler version, but may also include a "build number" (if such a number is maintained) and, possibly, the build date and time. In that case, you will not get the same binary, not down to the last byte. If you want to see whether you got the "same" binary, you may thus first have to strip such comments from both files (the Unix strip command may be useful).
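As a rough illustration of the comparison step, here is a minimal Python sketch that hashes two files and reports where they first differ. The helper names (sha256_of, first_difference) are made up for this example, and it assumes both files are small enough to read into memory; it does not strip embedded comments for you.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical.

    If one file is a strict prefix of the other, the offset returned is
    the length of the shorter file.
    """
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

If the digests differ, the offset from first_difference gives a starting point for checking whether the difference sits in an embedded version string or timestamp rather than in the code itself.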
Strictly speaking, compilation could be randomized; since generating optimal code is a hard problem, some compilers employ randomized algorithms which, heuristically, are good on average. Such a compiler could generate a distinct binary each time. Since such behaviour makes debugging much harder, many compilers that use such heuristic algorithms will still try to be reproducible (i.e. they will get their randomness from a PRNG seeded with a specific, configurable value).
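To make the seeding idea concrete, here is a toy Python sketch. It is not how any real compiler works: toy_layout is a hypothetical stand-in for a randomized heuristic (here, just choosing an order for functions). The point is only that drawing randomness from a PRNG seeded with a fixed value makes the result repeatable. GCC's -frandom-seed option is one real-world example of such a configurable seed.

```python
import random


def toy_layout(functions, seed):
    """Pick a pseudo-random ordering of functions, repeatable for a given seed.

    A private Random instance is used so that the result depends only on
    the seed and the input, not on global PRNG state.
    """
    rng = random.Random(seed)
    order = list(functions)
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    funcs = ["main", "parse", "emit", "optimize", "link"]
    # Same seed, same input: the two layouts are identical.
    print(toy_layout(funcs, seed=1234) == toy_layout(funcs, seed=1234))
```

Without the fixed seed (for example, seeding from the clock), two runs over the same input could legitimately produce different layouts, which is exactly what breaks byte-for-byte comparison.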
There is a much simpler solution: if you have the source code and can recompile it, then just use the output of your recompilation.
Of course, this does not completely solve the problem of trust; it just moves it around. When compiling from source:
- you have to trust that the source code does not contain backdoors;
- you have to trust the compiler itself for not playing nasty tricks on you.
At least, source code is nominally readable by humans (that's the point of source code), so you could perform some analysis of the code by reading it (or having it read by some specialist that you trust). There is no known way to make sure that a given piece of code does not contain any backdoor or vulnerability (otherwise, this would mean that we knew how to produce bug-free code); however, it is much harder to conceal a backdoor in source code than in a compiled binary.
As for the compiler, see this very classic article.
The concept of reproducible builds seems to offer a solution to this problem, at least a theoretical one.
It means that every run of a build (or compilation) process should produce identical output, given the same input source.
With it, every newly published binary could be cross-checked by me or others to confirm that it really corresponds to the source code it claims to represent.
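A minimal Python sketch of such a cross-check, assuming the project publishes a SHA-256 digest of its binary and that several people have rebuilt it independently. The function names and the rebuilder labels are hypothetical; real projects have their own tooling and formats for this.

```python
import hashlib


def matches_published(rebuilt, published_hex):
    """Return True if the rebuilt bytes hash to the published SHA-256 digest."""
    digest = hashlib.sha256(rebuilt).hexdigest()
    return digest == published_hex.strip().lower()


def cross_check(published_hex, rebuilds):
    """Map each rebuilder's label to whether their output matches.

    `rebuilds` is a dict of {label: bytes produced by that rebuilder}.
    """
    return {
        label: matches_published(data, published_hex)
        for label, data in rebuilds.items()
    }
```

A mismatch from one rebuilder does not by itself prove tampering; it may only mean that their build environment differed, which is why the build process has to be reproducible for this check to be meaningful.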
However, as of February 2017, only a few projects (mainly operating systems) have implemented this concept in their build processes. So in most cases this solution is still a theoretical one.