Patching ZIP Files

I have recently been working with VM images and to transport & distribute them conveniently I had to zip them up. I mainly work in a Windows environment and I use 7-Zip for packing and unpacking archives. It’s actually quite nice (and free), if you don’t mind the spartan interface. On my workstation, I have NTFS file encryption (EFS) enabled on my home directory. The way this works is that you can selectively encrypt files and folders by setting a special attribute on these items.

NTFS encryption attribute

I was in a hurry to prepare a VM image for a class and I used 7-Zip to archive the entire VM folder. The problem was, this encryption attribute was also recorded in the ZIP file and I only realized it when I unpacked it on a machine for testing right before the class started. On a machine that does not use NTFS, it warns the user and asks if it should continue extracting unencrypted. On a machine with NTFS, the destination files get encrypted and Windows starts nagging the user to back up the EFS keys. Compressing a 9 GB image into a 3 GB ZIP file took about 40 minutes, so there was really no time left to decrypt the files (EFS really sucks in this regard) and re-compress it into a ZIP file without encryption.

You can check whether a file is marked for encryption by checking the Attributes column in 7-Zip. (Note that this encryption attribute is native to NTFS and is different from ZIP file encryption.) The attribute string is somewhat cryptic but E means encrypted and A is archive. Interestingly, 7-Zip itself does not restore the encrypted attribute, but the native Windows unzipping functionality does.

ZIP attributes

Patch All The Things!

While I did not manage to solve this annoyance in time for the class, I had this idea to write a tool that would just patch the ZIP file. No changes need to be made to the file data and so there is actually no need to re-compress it.

Continue reading

PlaidCTF 2015 “RE GEX” Write-up

This challenge came with a simple Python TCP server and a monstrous regex in a separate file. It consists of multiple sub-patterns (2563 to be exact), separated by OR operators1. To get the flag, the input supplied must NOT match any of these sub-patterns. If you break up the large regex expression, you see something like this:

^(.*[^plaidctf].*|
.{,170}|
.{172,}|
.{88}[padt].{60}[licf].{6}[plai].{14}|
.{60}[aitf].{17}[pldc].{85}[dctf].{6}|
    ...
.{54}[pldc].{88}[plai][aitf].{26}|
.{11}[aitf].{41}[aitf].{97}[padt].{19})$

The 3 conditions listed up front give you a hint to the requirements: must be made up of the letters plaidctf and be of length 171. Any patterns that DO match will NOT reveal the flag.

I tried the dumb way, which is to brute force the sequence. Python’s itertools.permutations killed the process, so I thought to represent the string using a large number instead, composed of 3 bits (to represent one of plaidctf) for each “character”. Python supports arbitrary sized integers, so manipulating and counting integers up to 171 * 3 bits wasn’t an issue. Well, of course that process took forever…

Z3 to the rescue!

The sane (and probably only) way to solve it was to use a theorem prover like Z3, so I decided to invest some time to play around with it, even though it was too late to submit the flag.

The regex patterns were probably emitted by some program itself, so it followed some regular syntax, which meant that I didn’t have to resort to using a regex parser. There were only 3 types of expressions used: a single character (.), n-characters (.{n}), and character classes ([xyz]). The patterns were parsed into Z3 constraints, in which we only had to restrict the character classes imposed by the each pattern at particular positions. A regex pattern like .{88}[padt].{60}[licf].{6}[plai].{14} is turned into a constraint expression like this:

    Or(And(pattern >> 246 & 7 != 0,
           pattern >> 246 & 7 != 2,
           pattern >> 246 & 7 != 4,
           pattern >> 246 & 7 != 6),
       And(pattern >> 63 & 7 != 1,
           pattern >> 63 & 7 != 3,
           pattern >> 63 & 7 != 5,
           pattern >> 63 & 7 != 7),
       And(pattern >> 42 & 7 != 0,
           pattern >> 42 & 7 != 1,
           pattern >> 42 & 7 != 2,
           pattern >> 42 & 7 != 3))

Solving this contraint yields the correct solution for this particular regex pattern. By combining all constraints with an And operator, the model will provide the correct solution for this challenge. You can see that I retained the representation of the pattern as a BitVector of length 171 * 3.

You can find my code here, which expects the regex patterns to be pre-processed and provided as input. The model took 11min 50sec to solve on my machine. With that, the server returns flag{np_hard_reg3x_ftw!!!1_ftdtatililactldtadf}.

Seeing as how Z3 is getting more popular in CTFs, I hope to use this new found ability for future challenges.

If you’d like to see what else Z3 can do, check out some of the inspiring entries on Diary of a reverse-engineer.


  1. Well it’s not really called that, but you get the point 

Paper Shredder Repair

Now is probably a good time to mention that I have a paper shredder. When I was shopping for a shredder, the basic requirement is that it must be relatively “secure”. Straight cut shredders (that produce long straight strips) are definitely not secure.

Ultimately I settled on the CARL DS-3000 personal paper shredder. The DS-3000 is a cross-cut shredder which produces “particles” no larger than 2mm x 4.5mm and this meets DIN security level 4. These days, the NSA mandates 1mm x 5mm “particles” for classified documents.

At this point, it’s probably helpful to show you what my shredder bin looks like:

shredder confetti

From the particles, you can make out various truncated words such as “A/C”, “exp” and the number “5”, but it’s almost impossible to reconstruct any bank balances or personal information from it.

This particular model was the right balance between my budget and the level of security. Plus, the shredder is compact enough to sit on your desk. I bought it in 2009 and I use it every couple of months when I have accumulated enough material that needs to be destroyed.

I was in the middle of shredding papers when it suddenly stopped working. Now the shredder does not respond when I stick paper into its slot. The LED indicator looks dimmer than usual when it is turned on.

But I’m not ready to give up on it just yet…

Continue reading

Interesting 31C3 Talks

31C3 logo "a new dawn"

The 31st Chaos Communication Congress (31C3) ended just 3 days ago, and there were several interesting talks.

They have got live streaming of the event over the web, as well as encourage you to use an external player with RTMP or HLS support. The video streams were very reliable and best of all, it’s available in HD. In comparison, I tried the Apple live event once and it was really crappy. For one, the HLS1 URL is not publicly available , so someone had to dig that out and post it. Even after that, the audio stream was (I believe, unintentionally) a mix of both English and Chinese simultaneously.

The 31C3 video recordings were also uploaded very quickly after the event. This is much quicker than other events such as Black Hat (although as an attendee, you do get a copy of the stuff on a DVD). A really big kudos to the organizers and the video production team!

If you don’t have time to listen to each and every talk, here are a few selected talks that were interesting to me, as well as a short summary to see if it’s worth 30 or 60 minutes of your time.

A full list of talks can be found here: http://media.ccc.de/browse/congress/2014/index.html

Continue reading

Visualizing Binary Features with matplotlib

Some time ago, I started playing around with data analysis and machine learning. One of the more popular tools for such tasks is IPython Notebook, a browser-based interactive REPL shell based on IPython. Each session becomes a “notebook” that records the entire REPL session with both inputs and (cached) outputs, which can be saved and reviewed at a later time, or exported into another format like HTML. This capability, combined with matplotlib for plotting and pandas for slicing and dicing data makes this a handy tool for analyzing and visualizing data. To give you an idea of how useful this tool can be, take a look at some example notebooks using the online notebook viewer.

In this quick post, I’ll describe how I visualize binary features (present/not present) and clustering of such data. I am assuming that you already have experience with all of the above-mentioned libraries. For this example, I’ve extracted permissions (uses-permission) and features (uses-feature) used by a set of Android apps using Androguard. The resulting visualization looks like this:

visualization of binary features

Each row represents one app and each column represents one feature. More specifically, each column represent whether a permission or feature is used by the app. Such a visualization makes it easy to see patterns, such as which permission or feature is more frequently used by apps (shown as downward streaks), or whether an app uses more or less features compared to other apps (which shows up as horizontal streaks).

While this may look relatively trivial, when the number of samples increase to thousands of apps, it becomes difficult to make sense of all the rows & columns in the data table by staring at it.

Continue reading