Generating Small Static Binaries

I recently came across a shell script that tries to detect your CPU architecture by running executables embedded within it. There’s one for i386 (x86 platforms) and a few for ARM variants.

test_i386()
{
cat << EOF | $_base64 > /tmp/archtest && chmod a+x /tmp/archtest
f0VMRgEBAQAAAAAAAAAAAAIAAwABAAAA5oAECDQAAACoEAAAAAAAADQAIAAEAC
AAAAAIAECACABAgUBwAAFAcAAAUAAAAAEAAAAQAAANwPAADcnwQI3J8ECDgAAA
  .  .  .
AAAAAAAgAAAAAAAAAFYAAAABAAAAMAAAAAAAAAAUEAAAMgAAAAAAAAAAAAAAAQ
AwAAAAAAAAAAAAAARhAAAF8AAAAAAAAAAAAAAAEAAAAAAAAA
EOF
/tmp/archtest > /dev/null 2>&1 && arch=i386
}

What is terrible (well, to me at least) is that these executables are huge:

$ ls -l *.bin
-rwxrwx--- 1 root root 4832 Jul 17 06:58 archtest-armv6.bin
-rwxrwx--- 1 root root 4820 Jul 17 06:59 archtest-armv7.bin
-rwxrwx--- 1 root root 4992 Jul 17 06:59 archtest-armv8.bin
-rwxrwx--- 1 root root 4824 Jul 17 06:57 archtest-x86.bin

For one, I’m not sure why inspecting /proc/cpuinfo or running uname -a isn’t sufficient for their needs. For another, I don’t see why such large binaries are required.
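For comparison, a uname-based check might look something like the sketch below. The function name classify_machine and the list of machine strings are assumptions for illustration, not taken from the original script. Keep in mind that uname -m reports what the kernel was built for, which is not always the same as what the userland can run.

```shell
#!/bin/sh
# Map a `uname -m` machine string to a coarse architecture name.
# The patterns are common Linux values, not an exhaustive list.
classify_machine() {
    case "$1" in
        i?86)            echo i386 ;;
        x86_64|amd64)    echo x86_64 ;;
        armv6*)          echo armv6 ;;
        armv7*)          echo armv7 ;;
        aarch64|armv8*)  echo armv8 ;;
        *)               echo unknown ;;
    esac
}

arch=$(classify_machine "$(uname -m)")
echo "detected: $arch"
```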

After all, what you want to do is just to check that it executes successfully. Were they trying to test for the presence of a working libc? Nope, because the binaries are statically-linked:

$ file ./archtest-x86.bin
./archtest-x86.bin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
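The "32-bit" and "Intel 80386" parts of that description come straight from the first few bytes of the ELF header. As a rough sketch, the same fields can be read with od. The function name elf_ident is made up for illustration, and it assumes a little-endian ELF file of at least 20 bytes, which holds for x86 and the common ARM targets.

```shell
#!/bin/sh
# Print the ELF class and machine number of a file.
# Offsets follow the ELF header layout: EI_CLASS is byte 4 and
# e_machine is the 16-bit field at byte 18.
# Some e_machine values: 3 = i386, 40 = ARM, 62 = x86-64, 183 = AArch64.
elf_ident() {
    f=$1
    magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file"
        return 1
    fi
    class=$(od -An -tu1 -j4 -N1 "$f" | tr -d ' \n')
    # Read the two bytes of e_machine separately and combine them by
    # hand, so the byte order of the host does not matter.
    set -- $(od -An -tu1 -j18 -N2 "$f")
    machine=$(( $1 + $2 * 256 ))
    case "$class" in
        1) bits=32-bit ;;
        2) bits=64-bit ;;
        *) bits=unknown-class ;;
    esac
    echo "$bits machine=$machine"
}

# Only runs when a file name is given on the command line.
if [ $# -gt 0 ]; then
    elf_ident "$1"
fi
```

For the x86 binary from the script this should report a 32-bit file with machine number 3, which matches what file says.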

I think this just adds unnecessary bloat.

There are ways to make smaller binaries.

Now, I am not talking about crazy techniques like using assembly language instead of C, or making a weird ELF that might load only on Linux, but just using normal C and the standard gcc and binutils.

Let’s get started.

Smaller Binaries

The main job of this minimal binary is to exit with an exit code, which is passed to the Linux kernel via a syscall. Process creation and termination are handled by the kernel, and making system calls (syscalls) is the most direct way of talking to the kernel.

Since that’s all we really need to do, we don’t need a main() function, nor the startup code that sets up argc and argv before calling it. We just need a symbol that tells the linker where execution should start; by default that symbol is _start.

#include <sys/syscall.h>
#include <unistd.h>

void _start() {
    syscall(SYS_exit, 0);
}

If you have read any of the linked articles, you will know that making an exit syscall directly will remove a lot of boilerplate code that gcc adds for a normal program.

We then need to compile this C file with a flag that tells gcc not to link in the standard startup files, which are what would otherwise expect and call main():

$ gcc -Os -nostartfiles t.c

$ ls -l
total 36
drwxrwxr-x 2 d d 4096 Jul 18 07:58 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 6256 Jul 18 07:58 a.out*
-rw-rw-r-- 1 d d   87 Jul 18 07:57 t.c

It’s still quite big, 6256 bytes, even bigger than the original program.

That’s because it’s a dynamic binary that will try to perform runtime linking. Let’s remove that included code by specifying -static.

$ gcc -Os -nostartfiles -static t.c

$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:00 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 1560 Jul 18 08:00 a.out*

1560 bytes. That’s already much better.

Let’s see what’s in the ELF binary:

$ readelf -S a.out
There are 9 section headers, starting at offset 0x3d8:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000000000400120  00000120
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .text             PROGBITS         0000000000400150  00000150
       0000000000000043  0000000000000000  AX       0     0     16
  [ 3] .eh_frame         PROGBITS         0000000000400198  00000198
       0000000000000044  0000000000000000   A       0     0     8
  [ 4] .tbss             NOBITS           0000000000601000  000001dc
       0000000000000004  0000000000000000 WAT       0     0     4
  [ 5] .comment          PROGBITS         0000000000000000  000001dc
       000000000000002b  0000000000000001  MS       0     0     1
  [ 6] .symtab           SYMTAB           0000000000000000  00000208
       0000000000000150  0000000000000018           7     7     8
  [ 7] .strtab           STRTAB           0000000000000000  00000358
       0000000000000032  0000000000000000           0     0     1
  [ 8] .shstrtab         STRTAB           0000000000000000  0000038a
       000000000000004d  0000000000000000           0     0     1

Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

The only section that really matters here is .text, where our executable code is. Everything else is technically deadweight.

Let’s see what can be removed.

Removing Useless Sections

The first thing to remove is information that aids debugging. This is mostly the symbol table: the names of the functions, where their boundaries are, etc.

You can do this by calling strip:

$ strip a.out

$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:02 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 1032 Jul 18 08:02 a.out*

We can try to remove other sections that are not really required. One of them is the build ID that is generated. We can pass --build-id=none to the linker, specified through gcc like so:

$ gcc -Os -nostartfiles -static -s -Wl,--build-id=none t.c

$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:04 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d  856 Jul 18 08:04 a.out*

Note that we can skip the separate strip call by telling gcc to do that using -s.

856 bytes. Now we are below the 1000 bytes mark.

There are still a few more sections that were added by gcc. There do not seem to be any arguments that can be passed to omit them. One that stands out immediately is .comment:

$ readelf -S a.out
There are 6 section headers, starting at offset 0x1d8:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000f0  000000f0
       0000000000000043  0000000000000000  AX       0     0     16
  [ 2] .eh_frame         PROGBITS         0000000000400138  00000138
       0000000000000044  0000000000000000   A       0     0     8
  [ 3] .tbss             NOBITS           0000000000601000  0000017c
       0000000000000004  0000000000000000 WAT       0     0     4
  [ 4] .comment          PROGBITS         0000000000000000  0000017c
       000000000000002b  0000000000000001  MS       0     0     1
  [ 5] .shstrtab         STRTAB           0000000000000000  000001a7
       000000000000002a  0000000000000000           0     0     1

We now need to use strip again to remove those sections post-compilation:

$ strip -R .comment -R .eh_frame -R .tbss -s a.out

$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:08 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d  520 Jul 18 08:08 a.out*

520 bytes.

That seems to be as far as removing sections can take us. Only .text and .shstrtab (the table of section names) are left, so there are no more redundant sections in the ELF to get rid of.

$ readelf -S a.out
There are 3 section headers, starting at offset 0x148:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000f0  000000f0
       0000000000000043  0000000000000000  AX       0     0     16
  [ 2] .shstrtab         STRTAB           0000000000000000  00000133
       0000000000000011  0000000000000000           0     0     1

Maybe one final act of desperation. The section names don’t really matter, since the kernel only needs to load the correct segments into memory and never looks at section names. Let’s rename .text to something shorter, and since objcopy can also work as a strip replacement, we shall just use that.

$ objcopy -R .comment -R .eh_frame -R .tbss --rename-section .text=a a.out

$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:09 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d  512 Jul 18 08:09 a.out*

$ readelf -S a.out
There are 3 section headers, starting at offset 0x140:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] a                 PROGBITS         00000000004000f0  000000f0
       0000000000000043  0000000000000000  AX       0     0     16
  [ 2] .shstrtab         STRTAB           0000000000000000  00000133
       000000000000000d  0000000000000000           0     0     1

The final size is 512 bytes. That is about 9.4 times smaller than the original 4824-byte x86 binary.
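The sizes along the way can be cross-checked against the readelf output. In every listing above the section header table turns out to be the last thing in the file, and a 64-bit ELF section header is 64 bytes, so the file size should be the table offset plus 64 bytes per header. A small sketch, with the function name elf64_size picked just for illustration:

```shell
#!/bin/sh
# Expected file size when the section header table is the last thing in
# a 64-bit ELF: table offset + number of headers * 64 bytes per header.
elf64_size() {
    echo $(( $1 + $2 * 64 ))
}

elf64_size 0x3d8 9   # -static build           -> 1560
elf64_size 0x1d8 6   # -s and no build ID      -> 856
elf64_size 0x148 3   # after strip -R          -> 520
elf64_size 0x140 3   # .text renamed to "a"    -> 512
```

This also shows why the rename saved 8 bytes and not 4. The name table .shstrtab shrank from 0x11 to 0xd bytes, so it now ends at 0x133 + 0xd = 0x140. Before, it ended at 0x144 and the header table started at 0x148, presumably because the table is aligned to 8 bytes.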

Because this method is portable, it can easily be applied to all four binaries embedded in the shell script.

I won’t be describing the process here, but doing the same steps on ARM results in a binary that is 592 bytes. That’s similar to the savings we achieved for x86, using the exact same process, hence “portable”.
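To repeat the x86 steps per target, they can be collected into one small function. This is only a sketch: the function name build_tiny and the CROSS variable are invented here, and the example toolchain prefix arm-linux-gnueabihf- is an assumption about what might be installed, not necessarily what was used for the ARM number above.

```shell
#!/bin/sh
# Build a minimal static binary from one C file, using the same gcc
# flags and objcopy options as in the x86 walkthrough.
# CROSS is an optional toolchain prefix (hypothetical example:
# CROSS=arm-linux-gnueabihf-); leave it empty for a native build.
build_tiny() {
    src=$1
    out=$2
    "${CROSS:-}gcc" -Os -nostartfiles -static -s -Wl,--build-id=none \
        -o "$out" "$src" || return 1
    "${CROSS:-}objcopy" -R .comment -R .eh_frame -R .tbss \
        --rename-section .text=a "$out"
}

# Example usage (needs the toolchains to be installed):
#   build_tiny t.c archtest-x86.bin
#   CROSS=arm-linux-gnueabihf-
#   build_tiny t.c archtest-armv7.bin
```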

And since this is being included into a shell script, why not use gzip to compress the binary even further?

$ gzip -9 < a.out | wc --bytes
202

One worry is that gunzip might not be available on the target platform, which could be the case on some embedded devices.
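Here is a sketch of how the compressed binary could be packed for the script and unpacked on the target. The function names are illustrative, and it assumes base64 accepts -d for decoding (true for GNU coreutils and BusyBox) and that gunzip exists on the target.

```shell
#!/bin/sh
# Turn a binary into gzip-compressed base64 text that can be pasted
# into a heredoc.
pack_binary() {
    gzip -9 < "$1" | base64
}

# Reverse step: base64 text on stdin -> executable file at path $1.
unpack_binary() {
    base64 -d | gunzip > "$1" && chmod a+x "$1"
}

# Length of the base64 text for n input bytes, not counting line
# breaks: 4 characters for every group of up to 3 bytes.
base64_len() {
    echo $(( ($1 + 2) / 3 * 4 ))
}

base64_len 4824   # original x86 binary, uncompressed -> 6432
base64_len 512    # minimized binary, uncompressed    -> 684
base64_len 202    # minimized binary, gzipped         -> 272
```

So in terms of characters in the script, the embedded test shrinks from roughly 6.4 KB to under 300 bytes.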

But there we have it: 202 bytes is probably the best that we can reasonably do.

Further Reading

Note that your small binary can do more than just exit. See this for a source of inspiration: https://github.com/docker-library/hello-world

For large, real-world applications, you can use flags like -ffunction-sections -fdata-sections to put each function and data item into its own section, followed by -Wl,--gc-sections to have the linker discard unreferenced sections. More tips here: https://wiki.wxwidgets.org/Reducing_Executable_Size

Modern compilers also feature link-time optimization (LTO), which optimizes across the entire program rather than one source file at a time. This can help to save space, but that’s a whole other topic.
