Static Compilation. Where is my printf?

Static compilation (more precisely, static linking) is the process of building a computer program so that all the library code it depends on is included within the program's executable file. This is done by linking the program with static libraries (.a files on Unix-like systems, .lib files on Windows) rather than dynamic libraries (.so files on Unix-like systems, .dll files on Windows).

When you statically compile a program, the linker copies the library routines the program references directly into the executable. The advantages of static compilation include:

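  • Portability: The resulting executable is self-contained: it does not depend on the system's shared libraries and can run on any system with the same CPU architecture and a compatible kernel without additional dependencies. (One known exception: with glibc, functions such as getaddrinfo still load shared libraries at run time even in a static build.)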

  • Performance: Statically compiled programs can start and run slightly faster, because symbols are resolved at link time rather than by the dynamic loader at startup, and library calls do not go through an extra indirection.

  • Reliability: Since all the code the program needs is contained within its own executable, it's not susceptible to issues like "dependency hell" or problems arising from the wrong version of a shared library being present on the system.

However, there are also disadvantages:

  • Size: Statically compiled executables are typically larger because they include all the code they use, rather than sharing common libraries across the system.

  • Updates: If a library bug is fixed or the library is improved, you need to relink the program against the updated static library and redistribute it to benefit from the changes. With dynamic libraries, you can simply update the library on the system.

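  • Memory Usage: Different statically compiled programs each carry their own copy of the library code, so those copies cannot be shared in memory the way a single shared library is, leading to higher overall memory usage. (Multiple instances of the same executable still share its code pages.)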

Overall, static compilation produces self-contained executables that include all required library code. This approach has advantages in simplicity and reliability, but it leads to larger files and to the same library code being duplicated across programs.

On a Linux system, tools like objdump and nm can be used to examine a statically built C program and find the printf function and its call sites within the binary. The steps are:

  1. Compile the Program Statically: First, you need to compile your C program statically. You can do this using the -static flag with gcc:

     gcc -static -o myprogram myprogram.c
    
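  2. Identify the printf Function in the Binary: Use the nm tool to list symbols in the binary. Because the program is statically linked, printf is defined in the binary itself (provided the binary has not been stripped):

     nm --defined-only myprogram | grep -w printf

    This should give you the address of the printf function within your binary; a T in the second column marks a symbol in the text (code) section. If nothing is printed, the compiler may have replaced a simple call such as printf("hello\n") with puts; compiling with -fno-builtin-printf prevents that substitution.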

  3. Disassemble the Binary: Use objdump to disassemble the binary and find the printf code:

     objdump -d myprogram > myprogram.asm
    

    Then search myprogram.asm for the address found with nm, or for the label <printf>: that objdump prints at the start of the function, to see the disassembled code for printf.

  4. Find Calls to printf: To find where printf is being called from, you can search for the call instruction in the disassembly:

     grep -B 5 'call.*<printf>' myprogram.asm
    

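    The -B 5 flag shows the 5 lines before each call instruction, which is usually where the arguments are set up. The address at the start of each matching line is the address of the instruction calling printf. To identify the calling function, look upward in myprogram.asm for the nearest preceding label of the form <name>:. Note that a tail call may appear as jmp rather than call and will not match this pattern.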

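  5. Analyze the Call Sites: Each call site has an address in the disassembly. Look around these addresses to understand the context of the call, such as which function is making the call and what parameters are being passed. On AMD64 Linux, the pointer to the format string is passed in %rdi and the next arguments in %rsi, %rdx, %rcx, %r8 and %r9; because printf is variadic, %al is also set to an upper bound on the number of vector registers used for arguments.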

The disassembly shows the assembly language representation of the machine code, that is, the actual instructions that make up printf. The objdump output can be difficult to read if you are not familiar with assembly language.

These steps assume that you are using an AMD64 machine with a Unix-like environment and the necessary tools (gcc and binutils) installed. The process and tools may differ on other systems or architectures.