Over the years I’ve reverse engineered quite a few file formats, but I’ve never really sat down and picked apart why a format was designed the way it was. With that said, I wanted to show the ZFS archive file format and highlight some of the peculiarities I saw and perhaps you guys can answer some of my questions.
struct ZfsHeader { uint32 magic; uint32 unknown1; uint32 maxNameLength; uint32 filesPerBlock; uint32 fileCount; byte xorKey[4]; uint32 fileSectionOffset; };
- magic and unknown1 are self explanatory
- maxNameLength refers to the length of the block that stores a file’s name. Any extra spaces are null.
- The archive is split into ‘pages’ or ‘blocks’. Each ‘page’ contains, at max, filesPerBlock files
- fileCount is total number of files the archive contains
- xorKey is the XOR cipher used for encryption of the files
- fileSectionOffset is the offset of the main data section, aka fileLength – mainHeaderLength
struct ZfsEntryHeader { char name[16]; uint32 offset; uint32 id; uint32 size; uint32 time; uint32 unknown; };
- name is the file name right-padded with null characters
- offset is the offset to the actual file data
- id is a the numeric id of the file. The id’s increment from 0 to fileCount
- size is the length of the file
- unknown is self explanatory
[Main Header] [uint32 offsetToPage2] [Page 1 File Entry Headers] [Page 1 File Data] [uint32 offsetToPage3] [Page 2 File Entry Headers] [Page 2 File Data] etc.
maxNameLength
Why have a fixed size name block vs. null terminated or [size][string]? Was that just the popular thing to do back then so the entire header to could be cast directly to a struct?
filesPerBlock
What is the benefit to pagination? The only explanation I can see atm is that it was some artifact of their asset compiler max memory. Maybe I’m missing something since I’ve never programmed for that type of hardware.
fileSectionOffset
I’ve seen things like this a lot in my reverse engineering; they give the offset to a section that’s literally just after the header. Even if they were doing straight casting instead of incremental reading, a simple sizeof(mainHeader) would give them the offset to the next section. Again, if I’m missing something, please let me know.
Well that’s it for now,
-RichieSams