It is easy to script analysis steps with IDAPython, but now we want to automate this analysis over, let’s say, 10,000 files. I did a quick Google search and couldn’t find a guide on how to perform batch binary analysis tasks by automating IDA Pro 7.x.
Being unfamiliar with this, I was constantly guessing whether it was the command-line arguments, the script, or a combination of both that was not working. I’m sharing my experience here so you won’t have to fumble around like I did.
I will be using IDA Pro for Windows here, but the same approach should apply to the other supported platforms, macOS and Linux.
Simple Binary Analysis
Let’s write a simple IDAPython analysis script and run it within the IDA Pro console. This script loops through all functions in the executable and prints each function’s address and name:
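import idautils
import idc
# Print the number of functions, then each function's start address and name
print('count %d' % len(list(idautils.Functions())))
for ea in idautils.Functions():
    print('%s %s' % (hex(ea), idc.get_func_name(ea)))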
The idautils module contains higher-level functionality like getting a list of functions, or finding code and data references to addresses. If you are familiar with IDC scripting, most of the functions of the same name can be found within the idc module. This is not really meant to be an IDAPython or IDC scripting tutorial, so you will need to look elsewhere for that.
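To run a script like the one above over many files, each file needs its own IDA invocation. Here is a minimal sketch of a driver that only builds the command lines. It assumes the text-mode executable (idat64.exe here) and the -A, -S and -L switches from IDA's command-line documentation; the script name list_funcs.py and the helper names are made up for illustration.

```python
import os

def build_ida_command(ida_exe, script, target, log_file=None):
    """Build an argv list that runs `script` on `target` without prompts."""
    cmd = [ida_exe, "-A"]             # -A: autonomous mode, no dialog boxes
    if log_file:
        cmd.append("-L" + log_file)   # -L: send IDA's output to a log file
    cmd.append("-S" + script)         # -S: run this script after loading
    cmd.append(target)                # the binary to analyse goes last
    return cmd

def iter_targets(root):
    """Yield every file under `root` in a stable order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)

def batch_commands(ida_exe, script, root):
    """One command per file, each with its own log next to the sample."""
    for path in iter_targets(root):
        yield build_ida_command(ida_exe, script, path, path + ".log")
```

Each command can then be passed to subprocess.call(). For IDA to close on its own, the script most likely has to wait for auto-analysis and then exit explicitly (idc.auto_wait() and idc.qexit(0) in the 7.x API), so verify those names against your IDA version.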
I’ve recently seen a shell script that tries to detect your OS architecture by running executables embedded within it. There’s one for i386 (x86 platforms) and a few for ARM variants.
cat << EOF | $_base64 > /tmp/archtest && chmod a+x /tmp/archtest
. . .
/tmp/archtest > /dev/null 2>&1 && arch=i386
What is terrible (well, to me at least) is that these executables are huge:
$ ls -l *.bin
-rwxrwx--- 1 root root 4832 Jul 17 06:58 archtest-armv6.bin
-rwxrwx--- 1 root root 4820 Jul 17 06:59 archtest-armv7.bin
-rwxrwx--- 1 root root 4992 Jul 17 06:59 archtest-armv8.bin
-rwxrwx--- 1 root root 4824 Jul 17 06:57 archtest-x86.bin
For one, I’m not sure why inspecting uname -a is not sufficient for their needs. I’m also not sure why such large binaries are required.
After all, all you want to do is check that the binary executes successfully. Were they trying to test for the presence of a working libc? Nope, because the binaries are statically linked:
$ file ./archtest-x86.bin
./archtest-x86.bin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
I think this just adds unnecessary bloat.
There are ways to make smaller binaries.
Now, I am not talking about crazy techniques like using assembly language instead of C, or making a weird ELF that might load only on Linux, but just using normal C and the standard
Let’s get started.
collectd has always been able to grab interface traffic statistics on Linux. But what if we want to collect data about the individual WiFi clients that connect to the access point? How much bandwidth is each of the clients using?
That information is already being recorded by the wireless driver; all we need to do is to query it. Turns out you can do that with the wl utility. This is Broadcom’s proprietary tool to control and query the wireless interfaces.
To do this, first use wl to get the associated stations:
wl -i eth2 assoclist
Given the MAC address of a station that is associated with the AP, query its info using sta_info:
# wl -i eth2 sta_info d4:a3:00:aa:bb:cc
rateset [ 6 9 12 18 24 36 48 54 ]
idle 0 seconds
in network 16 seconds
state: AUTHENTICATED ASSOCIATED AUTHORIZED
flags 0x11e03b: BRCM WME N_CAP VHT_CAP AMPDU AMSDU
HT caps 0x6f: LDPC 40MHz SGI20 SGI40
VHT caps 0x43: LDPC SGI80 SU-BFE
tx data pkts: 663916
tx data bytes: 68730715
tx ucast pkts: 155
tx ucast bytes: 42699
tx mcast/bcast pkts: 663761
tx mcast/bcast bytes: 68688016
tx failures: 0
rx data pkts: 234
rx data bytes: 73557
rx ucast pkts: 192
rx ucast bytes: 62971
rx mcast/bcast pkts: 42
rx mcast/bcast bytes: 10586
rate of last tx pkt: 866667 kbps
rate of last rx pkt: 780000 kbps
rx decrypt succeeds: 195
rx decrypt failures: 1
tx data pkts retried: 19
tx data pkts retry exhausted: 0
per antenna rssi of last rx data frame: -61 -56 -59 0
per antenna average rssi of rx data frames: -61 -56 -57 0
per antenna noise floor: -104 -98 -98 0
The “easy way” is probably to write a shell script, invoked via the Exec plugin, that calls wl multiple times (once per interface, and once for each WiFi client) and uses awk to extract the information we need. This won’t perform well, of course, because every call spawns a new process.
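For illustration, the parsing half of that “easy way” could look like this in Python 3 instead of awk. This is only a sketch: it assumes the label: value layout shown in the sta_info output above, and the choice of counters in WANTED is mine.

```python
import re

# Counters to keep; the labels are copied from the sta_info output above.
WANTED = ("tx data pkts", "tx data bytes", "rx data pkts",
          "rx data bytes", "tx failures")

# Matches lines of the form "<label>: <integer>" and nothing else.
LINE_RE = re.compile(r"^\s*([^:]+):\s*(\d+)\s*$")

def parse_sta_info(text):
    """Return the wanted counters from sta_info output as a dict of ints."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1).strip() in WANTED:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters

SAMPLE = """\
idle 0 seconds
tx data pkts: 663916
tx data bytes: 68730715
tx failures: 0
rx data pkts: 234
rx data bytes: 73557
rate of last tx pkt: 866667 kbps
"""

print(parse_sta_info(SAMPLE))
```

Lines that carry a unit or a list of flags after the colon do not match the pattern and are skipped, which is why the rate and flags lines never reach the result.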
wl itself has a fair bit of overhead. It verifies the provided interface name and checks for the Broadcom driver magic to ensure that the interface is a Broadcom device. It then needs to convert the MAC address from the argument string to binary, and vice versa. Sure, that’s not really much “these days”, but we can definitely do better.
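The MAC address conversion mentioned here is small on its own. A sketch in Python 3 of both directions follows; the function names are hypothetical and this is not how wl implements it.

```python
import string

def mac_to_bytes(mac):
    """Convert 'd4:a3:00:aa:bb:cc' into its 6-byte binary form."""
    parts = mac.split(":")
    valid = len(parts) == 6 and all(
        len(p) == 2 and all(c in string.hexdigits for c in p) for p in parts
    )
    if not valid:
        raise ValueError("expected six colon-separated hex octets: %r" % mac)
    return bytes(int(p, 16) for p in parts)

def bytes_to_mac(raw):
    """Convert 6 raw bytes back into lowercase colon-separated text."""
    if len(raw) != 6:
        raise ValueError("expected exactly 6 bytes")
    return ":".join("%02x" % b for b in raw)
```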
Instead, let’s short-circuit the process and write a plugin that directly collects the data, without going through wl. This way, we avoid creating several new processes for every query.
I have been using collectd on my server to monitor traffic (inbound, outbound and to/from the Internet), as well as disk stats because it’s being used as a NAS. So far it has been helpful: I can watch the graphs to understand patterns and detect problems as they happen.
I’m also recording video from a WiFi camera, so I can constantly see traffic that comes into the server. But without visibility into the router itself, I am unable to determine whether the traffic is from the 5 GHz or 2.4 GHz band, or the guest network.
By getting a collectd instance onto the router, we can get those detailed interface statistics separately.
If you haven’t heard, Firefox Send is a service that solves the problem of sending large attachments without going through email. It does this in a privacy-preserving manner by encrypting the file in your browser first, before upload.
The concept is simple:
- An encryption key is generated in your browser.
- Your file is encrypted with that key before being uploaded to the server.
- The download URL is returned by the server, but will only work after the browser appends the secret key to the URL fragment.
Note that URL fragments are never sent to the server. They are often used for page anchors, and sometimes to keep track of local state in single-page applications (SPAs).
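To see why the fragment stays local, here is a small Python 3 sketch that splits a URL the way an HTTP client does before building its request. The URL and the request_target helper are made up for illustration; they are not Firefox Send's real URL format or code.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the path (plus query) that a client puts on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

# Hypothetical download link with the secret key in the fragment.
url = "https://send.example/download/abc123/#c2VjcmV0LWtleQ"

print(urlsplit(url).fragment)   # the part that stays in the browser
print(request_target(url))      # the part that goes to the server
```

The fragment is available to scripts running in the page, which is how the browser can read the key back out for decryption.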
The code that powers Firefox Send is actually open source, so you can run your own server, or read the code to figure out exactly how it works. The encryption details are documented in docs/encryption.md.
A master key is first generated and from it, a few keys are derived using HKDF SHA-256. The derived key length depends on its purpose, so for AES-128 encryption, the key will be 128-bit. Oddly though, the SubtleCrypto API returns a 512-bit key for HMAC SHA-256, which had me stumped for a while. I wrote some code that you can try out online.
Because HKDF is built on a hash algorithm, the derivation is one-way: a derived key cannot be used to recover the master key from which it was derived (unless the algorithm itself is somehow broken).
Three keys are derived from the master key:
- Data Encryption key. Used to encrypt the actual file contents.
- Authentication key. Given to the service and used to authenticate future downloaders.
- Metadata key. Used to encrypt the upload manifest (filename and size information) for display.
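The derivation of these three keys can be sketched with nothing but the Python 3 standard library. This is HKDF as specified in RFC 5869, not Firefox Send's actual code: the info labels, the empty salt and the 128-bit master key are assumptions for illustration, and the key lengths follow the 128-bit and 512-bit figures mentioned above.

```python
import hashlib
import hmac
import os

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def hkdf_sha256(master_key, info, length, salt=b""):
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    # Extract: an empty salt is treated as HASH_LEN zero bytes.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, master_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Assumed inputs: a random 128-bit master key and one info label per purpose.
master = os.urandom(16)
encrypt_key = hkdf_sha256(master, b"encryption", 16)     # 128-bit AES key
auth_key = hkdf_sha256(master, b"authentication", 64)    # 512-bit HMAC key
meta_key = hkdf_sha256(master, b"metadata", 16)          # 128-bit AES key
```

Only the info label differs between the three calls, which is what makes the keys independent of each other even though they share one master key.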