Welcome to this week's blog update. This week was focused on fixing bugs related to seeding and `set.dat` uploading, improving filtering mechanisms, and rewriting parts of the processing logic for `scan.dat`.
After some manual checks with the recent `set.dat` updates, a few issues surfaced:
1. Identical Detection Fix
Some detection entries had identical file sets, sizes, and checksums, which led to issues during the automatic merging of `set.dat`. Previously, we used a megakey to prevent such duplicates, but since it included not only file data but also language and platform information, some identical versions still slipped through.
To solve this, I replaced the megakey check with a more focused comparison based on filename, size, and checksum only, and added logging of the details of any fileset that clashes.
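The narrowed comparison can be sketched as a key built from file data only. This is a minimal illustration, not the actual implementation: the dict-based registry, field names (`name`, `size`, `md5`), and case-insensitive name handling are assumptions for the example.

```python
import hashlib

# Hypothetical registry of already-seen filesets: key -> fileset id.
existing = {}

def fileset_key(files):
    """Build a comparison key from filename, size, and checksum only,
    deliberately ignoring the language/platform metadata that let
    duplicates slip past the old megakey."""
    # Sort so the key does not depend on file ordering within the set.
    parts = sorted((f["name"].lower(), str(f["size"]), f["md5"]) for f in files)
    return hashlib.md5(repr(parts).encode("utf-8")).hexdigest()

def is_duplicate(fileset_id, files):
    """Return the id of an identical fileset if one was already seen,
    logging the clash; otherwise register this fileset and return None."""
    key = fileset_key(files)
    if key in existing:
        print(f"Fileset {fileset_id} clashes with fileset {existing[key]}")
        return existing[key]
    existing[key] = fileset_id
    return None
```

With this key, two entries that differ only in language or platform metadata now collide and get logged instead of being inserted twice.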
2. Punycode Encoding Misplacement
Filenames were being Punycode-encoded every time they were processed for database insertion. There was no need for the encoding at that point; it should happen earlier, at the parsing stage, either in the scanning utility that generates the `.dat` files or in the application's upload interface. I have removed the encoding during database updates. Adding it on the scan-utility side is still pending, which I'll do this week.
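The parse-time encoding could look roughly like the helper below. This is a hedged sketch, not the scan utility's actual code: the `encode_filename` name, the ASCII pass-through rule, and the `xn--` marker prefix are assumptions for illustration, using Python's built-in `punycode` codec.

```python
def encode_filename(name: str) -> str:
    """Punycode-encode a filename once, at parse time, and only when it
    actually contains non-ASCII characters (hypothetical helper)."""
    try:
        name.encode("ascii")
        return name  # plain ASCII name: store unchanged
    except UnicodeEncodeError:
        # Prefix marks the name as encoded so later stages can detect
        # and decode it instead of re-encoding on every DB update.
        return "xn--" + name.encode("punycode").decode("ascii")
```

Doing this once at the boundary means the database layer can treat filenames as opaque ASCII strings and never re-encode them.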
3. Path Format Normalization
Another issue was related to inconsistent file paths. Some `set.dat` entries used Windows-style paths (`xyz\abc`), while their corresponding detection entries used Unix-style paths (`xyz/abc`). Since filtering was done using simple string matching, these mismatches caused failures. I resolved this by normalizing all paths to forward slashes (`/`) before storing them in the database.
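The normalization itself is a one-liner applied at insertion time; a minimal sketch (the function name is illustrative):

```python
def normalize_path(path: str) -> str:
    """Convert Windows-style backslash separators to forward slashes
    before storing, so string matching is consistent across sources."""
    return path.replace("\\", "/")
```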
4. Improving Extra File Detection with `cloneof` and `romof`
While analyzing filesets, I encountered a previously unnoticed `cloneof` field (similar to `romof`). These fields indicate that extra files might be listed elsewhere in the `.dat` file. The previous logic only looked in the `resource` section, but I found that:

- Extra files could also exist in the `game` section.
- The file references could chain across multiple sections (e.g., A → B → C).
To address this, I implemented an iterative lookup for extra files, ensuring all relevant files across multiple levels are properly detected.
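The iterative lookup can be sketched as a worklist that follows both reference fields until the chain ends. This is an illustration under assumed data shapes, not the real code: `sections` mapping a section name to its files and `cloneof`/`romof` targets is a hypothetical structure.

```python
def collect_extra_files(game_name, sections):
    """Iteratively follow cloneof/romof references (A -> B -> C ...) and
    gather files from every section in the chain. `sections` maps a
    section name to {"files": [...], "cloneof": ..., "romof": ...}."""
    files, seen = [], set()
    queue = [game_name]
    while queue:
        name = queue.pop()
        # Skip absent references, unknown sections, and cycles.
        if name is None or name in seen or name not in sections:
            continue
        seen.add(name)
        entry = sections[name]
        files.extend(entry.get("files", []))
        # Either field may point at a further section holding extra files.
        queue.append(entry.get("cloneof"))
        queue.append(entry.get("romof"))
    return files
```

The `seen` set guards against reference cycles, so a malformed `.dat` file cannot send the lookup into an infinite loop.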
scan.dat Processing Improvements
For `scan.dat`, I introduced a file-update strategy that runs before full fileset matching: all files that match on size and checksum are updated first. This allows us to update matching files early, without relying solely on complete fileset comparisons.
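The pre-pass can be sketched as below. This is a simplified model, not the actual database code: `db_files` keyed by `(size, md5)` and the in-place name refresh are assumptions for the example.

```python
def update_matching_files(scan_files, db_files):
    """Before full fileset matching, update any scanned file whose size
    and checksum already match a database entry (hypothetical sketch:
    `db_files` maps (size, md5) -> stored row)."""
    updated, remaining = [], []
    for f in scan_files:
        key = (f["size"], f["md5"])
        if key in db_files:
            db_files[key]["name"] = f["name"]  # refresh the stored entry
            updated.append(f)
        else:
            remaining.append(f)  # left for the full fileset comparison
    return updated, remaining
```

Only the `remaining` files then need the more expensive complete-fileset comparison.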
Minor Fixes & UI Enhancements
- Prevented reprocessing of filesets in `set.dat` if a key already exists in subsequent runs.
- Passed the `--skiplog` CLI argument through to `set.dat` processing to suppress verbose logs during fileset creation and automatic merging.
- Improved filtering in the dashboard by adding more fields, such as engine id, transaction number, and fileset id, and fixed some older issues.
- Introduced a new “Possible Merges” button in the filesets dashboard to manually inspect and confirm suggested merges. This feature is backed by a new database table that stores fileset matches for later manual review.