I recently came across a shell script that tests for your CPU architecture by running executables embedded within it. There’s one for i386 (x86 platforms) and a few for ARM variants.
test_i386() {
cat << EOF | $_base64 > /tmp/archtest && chmod a+x /tmp/archtest
f0VMRgEBAQAAAAAAAAAAAAIAAwABAAAA5oAECDQAAACoEAAAAAAAADQAIAAEAC
AAAAAIAECACABAgUBwAAFAcAAAUAAAAAEAAAAQAAANwPAADcnwQI3J8ECDgAAA
. . .
AAAAAAAgAAAAAAAAAFYAAAABAAAAMAAAAAAAAAAUEAAAMgAAAAAAAAAAAAAAAQ
AwAAAAAAAAAAAAAARhAAAF8AAAAAAAAAAAAAAAEAAAAAAAAA
EOF
/tmp/archtest > /dev/null 2>&1 && arch=i386
}
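The same probe pattern can be sketched in a more generic form. This is only an illustration, not the original script’s code: probe_b64 is a hypothetical helper, it assumes a base64 tool that accepts -d (the original hides this behind $_base64), and it uses mktemp instead of a fixed /tmp/archtest path.

```shell
#!/bin/sh
# probe_b64 LABEL: read a base64-encoded executable on stdin, run it,
# and print LABEL only if it ran and exited with status 0.
# Hypothetical helper for illustration; assumes `base64 -d` is available.
probe_b64() {
    label=$1
    tmp=$(mktemp) || return 1
    # Decode, mark executable, then try to run the result quietly.
    base64 -d > "$tmp" && chmod u+x "$tmp" && "$tmp" > /dev/null 2>&1
    status=$?
    rm -f "$tmp"
    [ "$status" -eq 0 ] && echo "$label"
    return "$status"
}
```

Feeding it one of the encoded executables on stdin, with i386 as the label, would print i386 only when the kernel is able to run that binary.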
What is terrible (well, to me at least) is that these executables are huge:
$ ls -l *.bin
-rwxrwx--- 1 root root 4832 Jul 17 06:58 archtest-armv6.bin
-rwxrwx--- 1 root root 4820 Jul 17 06:59 archtest-armv7.bin
-rwxrwx--- 1 root root 4992 Jul 17 06:59 archtest-armv8.bin
-rwxrwx--- 1 root root 4824 Jul 17 06:57 archtest-x86.bin
For one, I’m not sure why inspecting /proc/cpuinfo or the output of uname -a is not sufficient for their needs. For another, I don’t see why such large binaries are required.
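As a rough sketch of that alternative, the machine string from uname -m could be mapped onto the same labels the script uses. The map_arch helper and its pattern list are my own assumptions for illustration, not an exhaustive table. Keep in mind that uname reports what the kernel was built for, which is not always the same as what userland can execute.

```shell
#!/bin/sh
# map_arch MACHINE: map a `uname -m` style string to a coarse label.
# Hypothetical helper; the pattern list is illustrative, not complete.
map_arch() {
    case "$1" in
        i?86)                   echo i386 ;;
        x86_64|amd64)           echo x86_64 ;;
        armv6*)                 echo armv6 ;;
        armv7*)                 echo armv7 ;;
        aarch64|arm64|armv8*)   echo armv8 ;;
        *)                      echo unknown ;;
    esac
}

# Example: classify the machine this script is running on.
arch=$(map_arch "$(uname -m)")
echo "detected: $arch"
```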
After all, what you want to do is just to check that it executes successfully. Were they trying to test for the presence of a working libc? Nope, because the binaries are statically-linked:
$ file ./archtest-x86.bin
./archtest-x86.bin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
I think this just adds unnecessary bloat.
There are ways to make smaller binaries.
Now, I am not talking about crazy techniques like using assembly language instead of C, or making a weird ELF that might load only on Linux, but just using normal C and the standard gcc and binutils.
Let’s get started.
Smaller Binaries
The main job of this minimal binary is to exit with an exit code, which is signaled to the Linux kernel via a syscall. Process creation and termination are handled by the kernel, and making system calls (syscalls) is the most direct way of talking to the kernel.
Since that’s all we really need to do, we don’t need a main() function, nor the startup code that normally prepares argc and argv before calling it. We just need a symbol that tells the linker where the executable code starts; by default that symbol is _start.
#include <sys/syscall.h>
#include <unistd.h>

void _start() {
    syscall(SYS_exit, 0);
}
If you have read any of the linked articles, you will know that making an exit syscall directly will remove a lot of boilerplate code that gcc adds for a normal program.
We then need to compile this C file with a flag that tells gcc not to link its standard startup files, which are the part that expects a main function:
$ gcc -Os -nostartfiles t.c
$ ls -l
total 36
drwxrwxr-x 2 d d 4096 Jul 18 07:58 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 6256 Jul 18 07:58 a.out*
-rw-rw-r-- 1 d d   87 Jul 18 07:57 t.c
It’s still quite big, 6256 bytes, even bigger than the original program.
That’s because it’s a dynamically linked binary, which carries the extra data needed to perform runtime linking. Let’s remove that by specifying -static.
$ gcc -Os -nostartfiles -static t.c
$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:00 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 1560 Jul 18 08:00 a.out*
1560 bytes. That’s already much better.
Let’s see what’s in the ELF binary:
$ readelf -S a.out
There are 9 section headers, starting at offset 0x3d8:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000000000400120  00000120
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .text             PROGBITS         0000000000400150  00000150
       0000000000000043  0000000000000000  AX       0     0     16
  [ 3] .eh_frame         PROGBITS         0000000000400198  00000198
       0000000000000044  0000000000000000   A       0     0     8
  [ 4] .tbss             NOBITS           0000000000601000  000001dc
       0000000000000004  0000000000000000 WAT       0     0     4
  [ 5] .comment          PROGBITS         0000000000000000  000001dc
       000000000000002b  0000000000000001  MS       0     0     1
  [ 6] .symtab           SYMTAB           0000000000000000  00000208
       0000000000000150  0000000000000018           7     7     8
  [ 7] .strtab           STRTAB           0000000000000000  00000358
       0000000000000032  0000000000000000           0     0     1
  [ 8] .shstrtab         STRTAB           0000000000000000  0000038a
       000000000000004d  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
The only section that really matters here is .text, where our executable code is. Everything else is technically deadweight.
Let’s see what can be removed.
Removing Useless Sections
The first thing to remove is information that aids debugging. This is mostly symbol information: the names of the functions, where their boundaries are, and so on.
You can do this by calling strip:
$ strip a.out
$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:02 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 1032 Jul 18 08:02 a.out*
We can try to remove other sections that are not really required. One of them is the build ID that is generated. We can pass --build-id=none to the linker, specified through gcc like so:
$ gcc -Os -nostartfiles -static -s -Wl,--build-id=none t.c
$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:04 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 856 Jul 18 08:04 a.out*
Note that we can skip the separate strip call by telling gcc to do that using -s.
856 bytes. Now we are below the 1000-byte mark.
There are still a few more sections that were added by gcc. I could not find any arguments that omit all of them at build time. One that stands out immediately is .comment:
$ readelf -S a.out
There are 6 section headers, starting at offset 0x1d8:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000f0  000000f0
       0000000000000043  0000000000000000  AX       0     0     16
  [ 2] .eh_frame         PROGBITS         0000000000400138  00000138
       0000000000000044  0000000000000000   A       0     0     8
  [ 3] .tbss             NOBITS           0000000000601000  0000017c
       0000000000000004  0000000000000000 WAT       0     0     4
  [ 4] .comment          PROGBITS         0000000000000000  0000017c
       000000000000002b  0000000000000001  MS       0     0     1
  [ 5] .shstrtab         STRTAB           0000000000000000  000001a7
       000000000000002a  0000000000000000           0     0     1
We now need to use strip again to remove those sections post-compilation:
$ strip -R .comment -R .eh_frame -R .tbss -s a.out
$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:08 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 520 Jul 18 08:08 a.out*
520 bytes.
That seems to be as far as we can go. Now that most of the sections are gone, there is nothing redundant left in the ELF to get rid of.
$ readelf -S a.out
There are 3 section headers, starting at offset 0x148:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000f0  000000f0
       0000000000000043  0000000000000000  AX       0     0     16
  [ 2] .shstrtab         STRTAB           0000000000000000  00000133
       0000000000000011  0000000000000000           0     0     1
Maybe one final act of desperation. The section names don’t really matter, since the OS just needs to load the correct segments into memory. Let’s rename the sections, and since objcopy can also work as a strip replacement, we shall just use that.
$ objcopy -R .comment -R .eh_frame -R .tbss --rename-section .text=a a.out
$ ls -l
total 32
drwxrwxr-x 2 d d 4096 Jul 18 08:09 ./
drwxr-xr-x 9 d d 4096 Jul 18 06:20 ../
-rwxrwxr-x 1 d d 512 Jul 18 08:09 a.out*
$ readelf -S a.out
There are 3 section headers, starting at offset 0x140:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] a                 PROGBITS         00000000004000f0  000000f0
       0000000000000043  0000000000000000  AX       0     0     16
  [ 2] .shstrtab         STRTAB           0000000000000000  00000133
       000000000000000d  0000000000000000           0     0     1
The final size is 512 bytes. That is about 9.4x smaller than the original 4824-byte x86 binary.
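Put together, the steps so far could be wrapped up roughly as follows. build_tiny is a hypothetical helper name of mine, and this is only a sketch: it assumes gcc, binutils and a static libc are installed, and the resulting size will vary with the toolchain version, so don’t expect to reproduce the exact numbers above.

```shell
#!/bin/sh
# build_tiny SRC OUT: hypothetical wrapper around the steps in this post.
# Prints the size of OUT in bytes on success.
build_tiny() {
    src=$1
    out=$2
    # Static link, no startup files, stripped symbols, no build ID.
    gcc -Os -nostartfiles -static -s -Wl,--build-id=none \
        -o "$out" "$src" || return 1
    # Drop the remaining non-essential sections and shorten the .text name.
    objcopy -R .comment -R .eh_frame -R .tbss \
        --rename-section .text=a "$out" || return 1
    wc -c < "$out"
}
```

Calling build_tiny t.c archtest and then running ./archtest should leave an exit status of 0 in $? if everything worked.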
Because this method is portable, it can easily be applied to all four binaries embedded in the shell script.
I won’t be describing the process here, but doing the same steps on ARM results in a binary that is 592 bytes. That’s similar to the savings we achieved for x86, using the exact same process, hence “portable”.
And since this is being included into a shell script, why not use gzip to compress the binary even further?
$ gzip -9 < a.out | wc --bytes
202
One worry is that gunzip might not be available on the target platform, which could be the case on some embedded devices.
But there. 202 bytes is probably the best that we can reasonably do.
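For illustration, embedding the compressed binary could look roughly like the round trip below. The helper names pack_payload and unpack_payload are made up for this sketch, and it assumes a base64 tool that accepts -d. Base64 turns every 3 bytes into 4 characters, so a 202-byte payload would take up about 272 characters in the script, not counting line breaks.

```shell
#!/bin/sh
# pack_payload FILE: print FILE as gzip-compressed base64 text,
# suitable for pasting into a here-document. Hypothetical helper.
pack_payload() {
    gzip -9 < "$1" | base64
}

# unpack_payload OUT: read base64 text on stdin and write the original
# bytes to OUT. Fails early if gunzip is missing on this system.
unpack_payload() {
    command -v gunzip > /dev/null 2>&1 || return 1
    base64 -d | gunzip > "$1"
}
```

When unpack_payload fails, a script could fall back to an uncompressed copy of the binary, at the cost of carrying both.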
Further Reading
Note that your small binary can do more than just exit. See this for a source of inspiration: https://github.com/docker-library/hello-world
For large, real-world applications, you can use flags like -ffunction-sections -fdata-sections to put functions and data into their own separate sections, followed by -Wl,--gc-sections to remove unreferenced sections. More tips here: https://wiki.wxwidgets.org/Reducing_Executable_Size
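A minimal sketch of such a build, assuming gcc with GNU ld on Linux, might look like this. build_gc is a hypothetical helper name, and the flags are spelled the way gcc and ld accept them: -ffunction-sections, -fdata-sections and -Wl,--gc-sections.

```shell
#!/bin/sh
# build_gc SRC OUT: compile with one section per function and per data
# object, then let the linker discard sections nothing refers to.
# Hypothetical helper; assumes gcc with GNU ld.
build_gc() {
    gcc -Os -ffunction-sections -fdata-sections -Wl,--gc-sections \
        -o "$2" "$1"
}
```

One way to check whether it had any effect is to look for a function you know is never called in the output of nm on the resulting binary; if the linker collected it, the name should no longer be listed.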
Modern compilers also feature link-time optimization (LTO), which optimizes across the entire program. This can help to save space, but that’s a whole other topic.
You take a big binary, cut it into small pieces. Then you rub those pieces against nylon or wool. Voila!! Small static binaries.