Batch Binary Analysis with IDA Pro 7.4 Automation

It is easy to script analysis steps with IDAPython, but now we want to automate this analysis over, let’s say, 10,000 files. I did a quick Google and I couldn’t find a guide on how to perform batch binary analysis tasks by automating IDA Pro 7.x.

Unfamiliar with this, I was constantly guessing whether it was the command-line arguments, the script, or a combination of both that was not working. I’m sharing my experience here so you won’t have to be fumbling around like I was.

I will be using IDA Pro for Windows here, but it should be applicable to any of their supported platforms like Mac or Linux.

Simple Binary Analysis

Let’s write some simple IDAPython analysis script and run it within the IDA Pro console. This script loops through all functions in the executable and prints out its address and name:

import idc
import idautils

print 'count %d' % len(list(idautils.Functions()))
for ea in idautils.Functions():
    print hex(ea), idc.get_func_name(ea)

The idautils module contains higher-level functionality like getting a list of functions, or finding code & data references to addresses. If you are familiar with IDC scripting, most of the functions by the same name can be found within the idc module. This is not really meant to be an IDAPython or IDC scripting tutorial, so you will need to look elsewhere for that.

Continue reading

Generating Small Static Binaries

I’ve recently seen some shell script that tries to test for your OS architecture by running executables encoded within. There’s one for i386 (x86 platforms) and a few for ARM variants.

test_i386()
{
cat << EOF | $_base64 > /tmp/archtest && chmod a+x /tmp/archtest
f0VMRgEBAQAAAAAAAAAAAAIAAwABAAAA5oAECDQAAACoEAAAAAAAADQAIAAEAC
AAAAAIAECACABAgUBwAAFAcAAAUAAAAAEAAAAQAAANwPAADcnwQI3J8ECDgAAA
  .  .  .
AAAAAAAgAAAAAAAAAFYAAAABAAAAMAAAAAAAAAAUEAAAMgAAAAAAAAAAAAAAAQ
AwAAAAAAAAAAAAAARhAAAF8AAAAAAAAAAAAAAAEAAAAAAAAA
EOF
/tmp/archtest > /dev/null 2>&1 && arch=i386
}

What is terrible (well, to me at least) is that these executables are huge:

$ ls -l *.bin
-rwxrwx--- 1 root root 4832 Jul 17 06:58 archtest-armv6.bin
-rwxrwx--- 1 root root 4820 Jul 17 06:59 archtest-armv7.bin
-rwxrwx--- 1 root root 4992 Jul 17 06:59 archtest-armv8.bin
-rwxrwx--- 1 root root 4824 Jul 17 06:57 archtest-x86.bin

For one, I’m not sure why inspecting /proc/cpuinfo or uname -a is not really sufficient for their needs. And also, why such large binaries are required.

After all, what you want to do is just to check that it executes successfully. Were they trying to test for the presence of a working libc? Nope, because the binaries are statically-linked:

$ file ./archtest-x86.bin
./archtest-x86.bin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped

I think this just adds unnecessary bloat.

There are ways to make smaller binaries.

Now, I am not talking about crazy techniques like using assembly language instead of C, or making a weird ELF that might load only on Linux, but just using normal C and the standard gcc and binutils.

Let’s get started.

Continue reading

Detailed Wireless Client Stats with collectd

collectd has always been able to grab interface traffic statistics from Linux. But what if we want to collect data about individual WiFi clients that connect to it? How much bandwidth is each of the clients using?

That information is already being recorded by the wireless driver; all we need to do is to query it. Turns out you can do that with the wl utility. This is Broadcom’s proprietary tool to control and query the wireless interfaces.

To do this, first use wl to get associated stations:

wl -i eth2 assoclist

Given a particular MAC address that is associated to the AP, query its info using sta_info:

# wl -i eth2 sta_info d4:a3:00:aa:bb:cc
 STA d4:a3:00:aa:bb:cc:
     aid:2
     rateset [ 6 9 12 18 24 36 48 54 ]
     idle 0 seconds
     in network 16 seconds
     state: AUTHENTICATED ASSOCIATED AUTHORIZED
     flags 0x11e03b: BRCM WME N_CAP VHT_CAP AMPDU AMSDU
     HT caps 0x6f: LDPC 40MHz SGI20 SGI40
     VHT caps 0x43: LDPC SGI80 SU-BFE
     tx data pkts: 663916
     tx data bytes: 68730715
     tx ucast pkts: 155
     tx ucast bytes: 42699
     tx mcast/bcast pkts: 663761
     tx mcast/bcast bytes: 68688016
     tx failures: 0
     rx data pkts: 234
     rx data bytes: 73557
     rx ucast pkts: 192
     rx ucast bytes: 62971
     rx mcast/bcast pkts: 42
     rx mcast/bcast bytes: 10586
     rate of last tx pkt: 866667 kbps
     rate of last rx pkt: 780000 kbps
     rx decrypt succeeds: 195
     rx decrypt failures: 1
     tx data pkts retried: 19
     tx data pkts retry exhausted: 0
     per antenna rssi of last rx data frame: -61 -56 -59 0
     per antenna average rssi of rx data frames: -61 -56 -57 0
     per antenna noise floor: -104 -98 -98 0

The “easy way” is probably to write a shell script, invoked via the Exec plugin that calls wl multiple times (once per interface, and once for each WiFi client) and uses grep or awk to get the information we need. This won’t be performant, of course.

wl itself does have quite a fair bit of overhead. It does some verification of the provided interface name. It checks for the Broadcom driver magic to ensure that the interface is a Broadcom device. It then needs to convert the MAC address from the argument string to binary, and vice-versa. Sure, that’s not really much “these days”, but we can definitely do better.

Instead, let’s short-circuit the process and write a plugin that directly collects the data, without going through wl. This way, we avoid creating several new processes for every query.

Continue reading

Cross-compiling collectd for ASUSWRT

I have been using collectd on my server to monitor traffic (inbound, outbound and to/from the Internet), as well as disk stats because it’s being used as a NAS. So far it has been helpful, observing various graphs to understand patterns, and detecting problems when they happen.

I’m also recording video from a WiFi camera, so I can constantly see traffic that comes into the server. But without visibility on the router itself, I am unable to determine whether the traffic is from the 5 GHz or 2.4 GHz band, or the guest network.

By getting a collectd instance onto the router, we can get those detailed interface statistics separately.

Continue reading

Data Encryption on Firefox Send

If you haven’t heard, Firefox Send is a service that solves the problem of sending large attachments without going through email. It does this in a privacy-preserving manner by encrypting the file in your browser first, before upload.

The concept is simple:

  1. An encryption key is generated in your browser
  2. Your file is encrypted with that key before being uploaded to the server.
  3. The download URL is returned by the server,
    but will only work after the browser appends the secret key to the URL fragment.

Note that URL fragments are never sent to the server. They are often used for page anchors, and sometimes to keep track of local state in SPA.

This has been made possible through the use of Web Crypto API exposed via JavaScript.

Technical Details

The code that powers Firefox Send is actually open source, so you can run your own server, or read the code to figure out exactly how it works. The encryption details are documented in docs/encryption.md.

A master key is first generated and from it, a few keys are derived using HKDF SHA-256. The derived key length depends on its purpose, so for AES-128 encryption, the key will be 128-bit. Oddly though, the Subtle Crypto API returns a a 512-bit key for HMAC SHA-256, which had me stumped for a while. I wrote some code that you can try out online.

Because HKDF is based on a hash algorithm, derived keys are inherently not reversible to obtain the master key from which they were derived (unless the algorithm itself is somehow broken).

3 keys are derived from the master key:

  1. Data Encryption key. Used to encrypt the actual file contents.
  2. Authentication key. Given to the service and used to authenticate future downloaders.
  3. Metadata key. Used to encrypt the upload manifest (filename and size information) for display.

keys derived in Firefox Send

Continue reading