I have recently been working with VM images, and to transport and distribute them conveniently, I had to zip them up. I mainly work in a Windows environment and use 7-Zip for packing and unpacking archives. It’s actually quite nice (and free), if you don’t mind the spartan interface. On my workstation, I have NTFS file encryption (EFS) enabled on my home directory. EFS lets you selectively encrypt files and folders by setting a special attribute on them.
I was in a hurry to prepare a VM image for a class, so I used 7-Zip to archive the entire VM folder. The problem was that the encryption attribute was also recorded in the ZIP file, and I only realized it when I unpacked the archive on a test machine right before the class started. On a machine that does not use NTFS, the extractor warns the user and asks whether to continue extracting the files unencrypted. On a machine with NTFS, the extracted files get encrypted and Windows starts nagging the user to back up the EFS keys. Compressing a 9 GB image into a 3 GB ZIP file took about 40 minutes, so there was no time left to decrypt the files (EFS really sucks in this regard) and re-compress them into a ZIP file without encryption.
You can check whether a file is marked for encryption by looking at the Attributes column in 7-Zip. (Note that this encryption attribute is native to NTFS and is different from ZIP file encryption.) The attribute string is somewhat cryptic, but E means encrypted and A means archive. Interestingly, 7-Zip itself does not restore the encrypted attribute, but the native Windows unzipping functionality does.
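The same check can be scripted. Below is a minimal sketch using Python's standard zipfile module; it assumes, as with archives written by Windows tools such as 7-Zip, that the Windows attribute bits sit in the low 16 bits of each entry's external_attr. The function name encrypted_entries is hypothetical.

```python
import zipfile

# Windows FILE_ATTRIBUTE_ENCRYPTED constant (MSDN "File Attribute Constants").
FILE_ATTRIBUTE_ENCRYPTED = 0x4000


def encrypted_entries(path):
    """Return the names of entries whose external attributes carry the EFS bit.

    Assumption: Windows-style attributes live in the low 16 bits of
    ZipInfo.external_attr, as written by Windows archivers.
    `path` may be a filename or a seekable binary file object.
    """
    with zipfile.ZipFile(path) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.external_attr & FILE_ATTRIBUTE_ENCRYPTED
        ]
```

For example, `encrypted_entries("vm.zip")` (placeholder file name) lists every entry that would come out encrypted on an NTFS machine. zipfile can read external_attr, but it offers no way to rewrite that field in an existing archive without writing the entries out again, which is exactly the re-compression we want to avoid.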
Patch All The Things!
While I did not manage to solve this annoyance in time for the class, I had the idea of writing a tool that would simply patch the ZIP file. Only the attribute fields need to change; the compressed file data stays untouched, so there is no need to re-compress anything.
The goal is (as always) to write as little code as possible. Python has a zipfile module, which provides ZipInfo.header_offset. That looks promising, but the offset actually refers to the local file header. The structure that contains the file attribute information is the central directory file header; these headers are stored together near the end of the file. To get the location of the central directory header for each file within the ZIP file, the archive has to be parsed manually (the zipfile.py source code was a great help).
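The manual parsing can be sketched as follows. This is not the author's modzip.py but a minimal illustration that assumes the whole archive is available in memory as bytes and that it is a single-disk archive without a ZIP64 end record. It locates the end of central directory record, then walks each fixed 46-byte central directory file header and reports where its external attributes field lives. All function and constant names here are hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"          # end of central directory record
CDH_SIG = b"PK\x01\x02"           # central directory file header
EOCD_FMT = "<4sHHHHIIH"           # 22-byte fixed part of the EOCD record
CDH_FMT = "<4sHHHHHHIIIHHHHHII"   # 46-byte fixed part of each header
EXTERNAL_ATTR_OFFSET = 38         # byte offset of external attributes in a header


def central_directory(data):
    """Yield (filename, absolute offset of external attr field, external attr).

    Sketch only: multi-disk archives and ZIP64 end records are not handled.
    """
    eocd_size = struct.calcsize(EOCD_FMT)
    # The EOCD record is followed by an archive comment of at most 65535 bytes.
    start = data.rfind(EOCD_SIG, max(0, len(data) - eocd_size - 0xFFFF))
    if start < 0:
        raise ValueError("end of central directory record not found")
    (_, _, _, _, total, _, cd_offset, _) = struct.unpack_from(EOCD_FMT, data, start)
    if total == 0xFFFF or cd_offset == 0xFFFFFFFF:
        raise ValueError("ZIP64 end record: not handled by this sketch")

    cdh_size = struct.calcsize(CDH_FMT)
    pos = cd_offset
    for _ in range(total):
        fields = struct.unpack_from(CDH_FMT, data, pos)
        if fields[0] != CDH_SIG:
            raise ValueError("bad central directory header at offset %d" % pos)
        flags, name_len, extra_len, comment_len = fields[3], fields[10], fields[11], fields[12]
        raw_name = data[pos + cdh_size:pos + cdh_size + name_len]
        # Bit 11 of the general purpose flags marks UTF-8 names; otherwise CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        yield name, pos + EXTERNAL_ATTR_OFFSET, fields[15]
        # Variable-length name, extra field and comment follow the fixed part.
        pos += cdh_size + name_len + extra_len + comment_len
```

Entries that carry ZIP64 extra fields are still walked correctly, because the extra field length is part of each header and the fixed layout does not change.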
File attributes are stored in the 4-byte external file attributes field of the corresponding central directory file header. The list of Windows file attribute constants can be found on MSDN. The attribute that we are interested in is FILE_ATTRIBUTE_ENCRYPTED (0x4000). The external file attributes field just needs to be modified to mask off this bit.
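In code, masking off the bit is a single bitwise AND with the complement of FILE_ATTRIBUTE_ENCRYPTED. A tiny sketch follows (clear_encrypted is a hypothetical helper name); every other bit, including any Unix mode bits in the high 16 bits, is left alone.

```python
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_ENCRYPTED = 0x4000


def clear_encrypted(external_attr):
    """Return external_attr with only the EFS bit cleared."""
    return external_attr & ~FILE_ATTRIBUTE_ENCRYPTED


# An entry shown as "AE" in 7-Zip becomes plain "A".
print(hex(clear_encrypted(FILE_ATTRIBUTE_ENCRYPTED | FILE_ATTRIBUTE_ARCHIVE)))  # 0x20
```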
I wrote a Python tool that runs through a ZIP file to remove the encrypted attribute from all file entries. This script should come in handy in the future.
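A tool along those lines might look like the following sketch. It is an illustration, not the actual modzip.py, and strip_efs_attribute is a hypothetical name. It assumes a single-disk archive without a ZIP64 end record. Rather than loading a multi-gigabyte archive into memory, it reads only the tail of the file and the central directory, clears the bit in each header, and writes the central directory back in place; the local file headers have no external attributes field, so they need no changes.

```python
import os
import struct
import sys

FILE_ATTRIBUTE_ENCRYPTED = 0x4000
EOCD_FMT = "<4sHHHHIIH"            # end of central directory record (22 bytes)
CDH_FMT = "<4sHHHHHHIIIHHHHHII"    # central directory file header (46 bytes)
EXTERNAL_ATTR_OFFSET = 38          # position of external attributes in a header


def strip_efs_attribute(path):
    """Clear FILE_ATTRIBUTE_ENCRYPTED on every entry, patching the file in place.

    Only the central directory is read and rewritten; compressed data is never
    touched. Returns the number of entries that were changed.
    """
    with open(path, "r+b") as f:
        # Read just enough of the tail to find the EOCD record and its comment.
        size = f.seek(0, os.SEEK_END)
        eocd_size = struct.calcsize(EOCD_FMT)
        f.seek(max(0, size - eocd_size - 0xFFFF))
        tail = f.read()
        i = tail.rfind(b"PK\x05\x06")
        if i < 0:
            raise ValueError("not a ZIP file (no end of central directory)")
        _, _, _, _, total, cd_size, cd_offset, _ = struct.unpack_from(EOCD_FMT, tail, i)
        if total == 0xFFFF or cd_offset == 0xFFFFFFFF:
            raise ValueError("ZIP64 end record: not handled by this sketch")

        # Load the central directory and patch each header's attribute field.
        f.seek(cd_offset)
        cd = bytearray(f.read(cd_size))
        hdr = struct.calcsize(CDH_FMT)
        pos = changed = 0
        for _ in range(total):
            fields = struct.unpack_from(CDH_FMT, cd, pos)
            if fields[0] != b"PK\x01\x02":
                raise ValueError("corrupt central directory")
            attr = fields[15]
            if attr & FILE_ATTRIBUTE_ENCRYPTED:
                struct.pack_into("<I", cd, pos + EXTERNAL_ATTR_OFFSET,
                                 attr & ~FILE_ATTRIBUTE_ENCRYPTED)
                changed += 1
            # Skip the fixed header plus file name, extra field and comment.
            pos += hdr + fields[10] + fields[11] + fields[12]

        if changed:
            f.seek(cd_offset)
            f.write(cd)
    return changed


if __name__ == "__main__":
    for archive in sys.argv[1:]:
        print("%s: %d entries patched" % (archive, strip_efs_attribute(archive)))
```

Run as, for example, `python strip_efs.py vm.zip` (both names are placeholders). Because the archive is modified in place, it is worth keeping a copy until the patched file has been tested.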
You can find the modzip.py script here.
- Wikipedia – Zip (file format)
- APPNOTE.TXT – .ZIP File Format Specification
- File Attribute Constants – MSDN