Visualizing Binary Features with matplotlib

Some time ago, I started playing around with data analysis and machine learning. One of the more popular tools for such tasks is IPython Notebook, a browser-based interactive REPL shell based on IPython. Each session becomes a “notebook” that records the entire REPL session with both inputs and (cached) outputs, which can be saved and reviewed at a later time, or exported into another format like HTML. This capability, combined with matplotlib for plotting and pandas for slicing and dicing data, makes it a handy tool for analyzing and visualizing data. To give you an idea of how useful this tool can be, take a look at some example notebooks using the online notebook viewer.

In this quick post, I’ll describe how I visualize binary features (present/not present) and clustering of such data. I am assuming that you already have experience with all of the above-mentioned libraries. For this example, I’ve extracted permissions (uses-permission) and features (uses-feature) used by a set of Android apps using Androguard. The resulting visualization looks like this:

visualization of binary features

Each row represents one app and each column represents one feature. More specifically, each column represents whether a permission or feature is used by the app. Such a visualization makes it easy to see patterns, such as which permissions or features are used more frequently by apps (these show up as vertical streaks), or whether an app uses more or fewer features than other apps (which shows up as horizontal streaks).

While this may look relatively trivial, when the number of samples increases to thousands of apps, it becomes difficult to make sense of all the rows and columns in the data table just by staring at it.
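As a rough illustration of the row-per-app, column-per-feature layout described above, here is a minimal sketch. The app names, the sample permissions, and the helper names `build_binary_matrix` and `plot_binary_matrix` are all made up for this example; it is not the code behind the figure, just one plausible way to get a similar picture with matplotlib's `imshow`.

```python
from typing import Dict, List, Set, Tuple


def build_binary_matrix(
    app_features: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn {app: set of features} into a 0/1 matrix (rows = apps, columns = features)."""
    apps = sorted(app_features)
    features = sorted(set().union(*app_features.values()))
    matrix = [[1 if f in app_features[a] else 0 for f in features] for a in apps]
    return apps, features, matrix


def plot_binary_matrix(matrix: List[List[int]], features: List[str]):
    """Draw the matrix as an image: dark cell = feature present."""
    # Imported here so the matrix helper above still works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.imshow(matrix, cmap="Greys", aspect="auto", interpolation="nearest")
    ax.set_xlabel("permission / feature")
    ax.set_ylabel("app")
    ax.set_xticks(range(len(features)))
    ax.set_xticklabels(features, rotation=90)
    return fig


# Hypothetical sample data, only for illustration.
SAMPLE = {
    "com.example.camera": {"CAMERA", "INTERNET"},
    "com.example.notes": {"INTERNET"},
    "com.example.torch": {"CAMERA"},
}
apps, features, matrix = build_binary_matrix(SAMPLE)
```

Calling `plot_binary_matrix(matrix, features)` followed by `plt.show()` (or letting the notebook render the figure inline) should give one row per app; a column that is dark in most rows is one of the vertical streaks mentioned above.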

Bruteforcing LUKS Volumes Explained

Some weeks back, we were forced to reboot one of our server machines because it stopped responding. When the machine came back up, we were greeted with a password prompt to decrypt the partition. No problem, since we always used a password combination (ok, permutation) that consisted of a few words, something along the lines of “john”, “doe”, “1954”, and the server’s serial number. Except that it didn’t work, and we forgot the permutation rules AND whether we used “john” “doe” or “jack” “daniels”.
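To make the size of the search space concrete, here is a small sketch of how such a candidate list could be generated. It assumes the words are simply concatenated with no separator, which is my assumption and not a rule stated above; the serial number `SERIAL123` and the function name `candidate_passphrases` are placeholders.

```python
from itertools import permutations
from typing import Iterable, Iterator, Sequence


def candidate_passphrases(
    name_pairs: Iterable[Sequence[str]],
    fixed_words: Sequence[str],
    separator: str = "",
) -> Iterator[str]:
    """Yield every ordering of (name pair + fixed words) for each possible name pair."""
    for pair in name_pairs:
        words = list(pair) + list(fixed_words)
        for ordering in permutations(words):
            yield separator.join(ordering)


# The two name pairs we could not decide between, plus the words we were sure of.
NAME_PAIRS = [("john", "doe"), ("jack", "daniels")]
FIXED_WORDS = ["1954", "SERIAL123"]  # "SERIAL123" is a made-up serial number

candidates = list(candidate_passphrases(NAME_PAIRS, FIXED_WORDS))
print(len(candidates))
```

With four words per passphrase and two possible name pairs, that is 2 × 4! = 48 candidates, which is small enough to try exhaustively once there is a way to test each one.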

All the search results for bruteforcing LUKS are largely the same: “use cryptsetup luksOpen --test-passphrase”. In my case, the physical server is in the server room, and I don’t want to stand in front of the rack trying to figure all this out. My question is, can I do this offline on another machine? None of those blog entries were helpful in this regard.

The LUKS Header

To answer this question, I took a look at the LUKS header. This header is what provides multiple “key slots”, allowing you to specify up to 8 passwords or key files that can decrypt the volume. cryptsetup is the standard userspace tool (and library) to manipulate and mount LUKS volumes. Since LUKS was designed based on TKS1, the TKS1 document referenced by the cryptsetup project was very helpful. After consulting the documentation and code, I came up with the following diagram that describes the LUKS key verification process:

LUKS encryption flowchart

Cloud-Enabling a Bathroom Scale

Last week as I was making my rounds at the supermarket, I came across this digital bathroom scale on sale. With some membership card, the discount was almost 50% and at S$16, I thought that was a pretty good deal. It is “wireless” in that it has a separate display unit that could be detached from the scale itself. This bathroom scale had “HACK ME” written all over it.

It turns out that this bathroom scale is the EB9121 made by a Chinese (OEM?) company called Zhongshan Camry Electronic Co. Ltd (or simply Camry). The box specifically mentions that it uses infrared for transmission, and given that I had some experience looking at IR signals, I thought it would be rather straightforward.

Creating Minimal Throw-away CentOS 6 VMs

Whether you are using CentOS for a build server or simply testing out a new configuration, you can quickly create a VM (virtual machine) that is under 1GB. You can do this without downloading any special tools or ISO files — just the CentOS installation DVD and VirtualBox (or VMware if you prefer).

I like the text-based console, so you won’t be getting a GUI or fancy Linux desktop with this one. Given its small size, you could also archive the entire environment (or even several of them) for future use without having to waste gigabytes of free space. These environments also serve as a base which can be upgraded or added onto to provide more functionality later.

Encrypt All the Drives

I have always been an advocate of storage security (all types of security, actually). I like how iOS devices keep all files encrypted, even if you do not set a passcode on the device. They do this to facilitate quick erasure of files on the device: to erase all the data, they simply wipe the master key.

Erasing magnetic storage media isn’t difficult, but it is time-consuming. For solid state media such as SSDs and flash drives, the wear-leveling makes it difficult to ensure that all flash blocks have been securely overwritten. The answer to this is to encrypt everything.

Encrypt all the drives!! (meme)

Recently I have been busy building a Linux-based NAS and I decided to put this into practice.

