PlaidCTF 2015 “RE GEX” Write-up

This challenge came with a simple Python TCP server and a monstrous regex in a separate file. It consists of multiple sub-patterns (2563 to be exact), separated by OR operators1. To get the flag, the input supplied must NOT match any of these sub-patterns. If you break up the large regex expression, you see something like this:

^(.*[^plaidctf].*|
.{,170}|
.{172,}|
.{88}[padt].{60}[licf].{6}[plai].{14}|
.{60}[aitf].{17}[pldc].{85}[dctf].{6}|
    ...
.{54}[pldc].{88}[plai][aitf].{26}|
.{11}[aitf].{41}[aitf].{97}[padt].{19})$

The 3 conditions listed up front give you a hint to the requirements: must be made up of the letters plaidctf and be of length 171. Any patterns that DO match will NOT reveal the flag.

I tried the dumb way, which is to brute force the sequence. Python’s itertools.permutations killed the process, so I thought to represent the string using a large number instead, composed of 3 bits (to represent one of plaidctf) for each “character”. Python supports arbitrary sized integers, so manipulating and counting integers up to 171 * 3 bits wasn’t an issue. Well, of course that process took forever…

Z3 to the rescue!

The sane (and probably only) way to solve it was to use a theorem prover like Z3, so I decided to invest some time to play around with it, even though it was too late to submit the flag.

The regex patterns were probably emitted by some program itself, so it followed some regular syntax, which meant that I didn’t have to resort to using a regex parser. There were only 3 types of expressions used: a single character (.), n-characters (.{n}), and character classes ([xyz]). The patterns were parsed into Z3 constraints, in which we only had to restrict the character classes imposed by the each pattern at particular positions. A regex pattern like .{88}[padt].{60}[licf].{6}[plai].{14} is turned into a constraint expression like this:

    Or(And(pattern >> 246 & 7 != 0,
           pattern >> 246 & 7 != 2,
           pattern >> 246 & 7 != 4,
           pattern >> 246 & 7 != 6),
       And(pattern >> 63 & 7 != 1,
           pattern >> 63 & 7 != 3,
           pattern >> 63 & 7 != 5,
           pattern >> 63 & 7 != 7),
       And(pattern >> 42 & 7 != 0,
           pattern >> 42 & 7 != 1,
           pattern >> 42 & 7 != 2,
           pattern >> 42 & 7 != 3))

Solving this contraint yields the correct solution for this particular regex pattern. By combining all constraints with an And operator, the model will provide the correct solution for this challenge. You can see that I retained the representation of the pattern as a BitVector of length 171 * 3.

You can find my code here, which expects the regex patterns to be pre-processed and provided as input. The model took 11min 50sec to solve on my machine. With that, the server returns flag{np_hard_reg3x_ftw!!!1_ftdtatililactldtadf}.

Seeing as how Z3 is getting more popular in CTFs, I hope to use this new found ability for future challenges.

If you’d like to see what else Z3 can do, check out some of the inspiring entries on Diary of a reverse-engineer.


  1. Well it’s not really called that, but you get the point 

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.