Reconstructing an accurate control flow graph from machine code is, in general, an undecidable problem, even at link time. For a quick overview of the difficulties involved, see the FAQ item "How do you reconstruct the control flow graph from the object files at link time?".
To make CFG construction more reliable, we apply some patches to the tool chain:
- To let Diablo differentiate easily between
instructions and embedded data in the code section, we emit
markers that indicate data in the code section.
- The GNU assembler uses a technique called symbol relaxing to
speed up the linking process. Unfortunately, a lot of information
about the relocations that is important to Diablo (but not to a
regular linker) is lost when applying this technique. Our patch
disables this relaxing.
- With a small patch to the GCC specs file (which unfortunately
introduces a dependency on Perl), we make the compiler emit
markers for inline assembly code. These markers help Diablo
judge whether or not its optimizations may rely on calling
conventions for a particular piece of code.
- We also patch ld/libbfd to turn off string table compaction.
There is no fundamental need to do this; we just haven't yet found the
time to support this feature in Diablo.
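
To illustrate why the data markers from the first patch help, here is a minimal sketch of how a tool could use such markers to split a code section into instruction and data regions before building the CFG. This is not Diablo's actual implementation or marker format: the `(offset, kind)` representation, the `split_section` function, and the rule that bytes before the first marker count as code are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical marker representation: (offset within the section, kind),
# where kind is "code" or "data". The real marker format may differ.
Marker = Tuple[int, str]
Region = Tuple[int, int, str]  # (start, end exclusive, kind)


def split_section(size: int, markers: List[Marker]) -> List[Region]:
    """Partition a code section of `size` bytes into code/data regions."""
    regions: List[Region] = []
    ordered = sorted(markers)
    # Assumption: bytes before the first marker are treated as code.
    if not ordered or ordered[0][0] > 0:
        ordered.insert(0, (0, "code"))
    for i, (start, kind) in enumerate(ordered):
        # A region runs until the next marker, or to the end of the section.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else size
        if end <= start:
            continue  # empty region: another marker sits at the same offset
        if regions and regions[-1][2] == kind:
            # Merge with the previous region when the kind does not change.
            prev_start, _, _ = regions[-1]
            regions[-1] = (prev_start, end, kind)
        else:
            regions.append((start, end, kind))
    return regions


if __name__ == "__main__":
    # A 32-byte section with an 8-byte data island at offset 16.
    for region in split_section(32, [(16, "data"), (24, "code")]):
        print(region)
```

With region boundaries like these, only the "code" regions need to be disassembled; the "data" regions can be kept as opaque bytes instead of being misinterpreted as instructions.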