
Week 2

Welcome to the weekly blog.
After wrapping up work on macfiles last week, I finally moved on to testing the first two phases of the data upload workflow, starting with scummvm.dat (which contains detection entries from ScummVM used for populating the database) and set.dat (data from older collections, which provides additional information).

Database Changes: File Size Enhancements

Before diving into the testing, there was a change Sev asked for — to extend the database schema to store three types of file sizes instead of just one. This was necessary due to the nature of macfiles, which have:

  • A data fork

  • A resource fork

  • A third size: the data section of the resource fork itself

This change introduced significant modifications to db_functions.py, which contains the core logic for working with .dat files. I had to be careful to ensure nothing broke during this transition.
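
To give an idea of the change, here is a minimal sketch of what the schema extension could look like; the column names (size_r for the resource fork, size_rd for its data section) are illustrative and may not match the actual schema.

    import pymysql

    # Hypothetical sketch: extend the file table so three sizes can be stored
    # per file instead of one. Table and column names are illustrative only.
    conn = pymysql.connect(host="localhost", user="user", password="pass", database="integrity")
    with conn.cursor() as cursor:
        cursor.execute("ALTER TABLE file ADD COLUMN size_r BIGINT DEFAULT 0")   # resource fork size
        cursor.execute("ALTER TABLE file ADD COLUMN size_rd BIGINT DEFAULT 0")  # data section of the resource fork
    conn.commit()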

Punycode Encoding Logic Fixes

At the same time, I fixed the punycode logic in db_functions.py. Punycode encoding (or rather, an extended version of the standard used for internationalized domain names) is employed by ScummVM to convert filenames into filesystem-independent, ASCII-only representations.

There were inconsistencies between punycode logic in db_functions.py and the original implementation in the Dumper Companion. I made sure both implementations now align, and I ran unit tests from the Dumper Companion to verify correctness.
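
For anyone unfamiliar with punycode, here is a tiny illustration of the base standard using Python's built-in codec; ScummVM's variant extends this scheme to also handle characters that are problematic on some filesystems, which is not shown here.

    # Plain punycode (RFC 3492) as shipped with Python, for illustration only.
    # ScummVM uses an extended variant of this scheme for filenames.
    name = "bücher"
    encoded = name.encode("punycode")     # e.g. b'bcher-kva'
    decoded = encoded.decode("punycode")  # back to 'bücher'
    print(encoded, decoded)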

Feeding the Database – scummvm.dat

With those fixes in place, I moved on to populating the database with data from scummvm.dat. While Sev was working on the C++ side to add the correct filesize tags for detections, I ran manual tests using existing data. The parsing logic worked well, though I had to add support for the new “extra size” fields.

Additionally, I fixed the megakey calculation, which is used later when scummvm.dat is re-uploaded with updates. This involved sorting files alphabetically before computing the key to ensure consistent results.
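
Roughly, the calculation now looks like the sketch below; the field names and the exact string that gets hashed are illustrative, not the actual ones from db_functions.py.

    import hashlib

    def calculate_megakey(files):
        # Sort by filename first so the megakey does not depend on the order
        # in which files appear in the .dat file.
        ordered = sorted(files, key=lambda f: f["name"].lower())
        combined = ":".join(f"{f['name']}:{f['size']}:{f['md5-5000']}" for f in ordered)
        return hashlib.md5(combined.encode("utf-8")).hexdigest()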

I also introduced a small optimization: if a file is smaller than 5000 bytes, we can safely assume that all checksum types (e.g., md5-full_file, md5-5000B, md5-tail-5000B, md5-oneMB, or the macfile variants like -d/-r) will be the same, since for such a small file each variant ends up hashing the same bytes. In such cases, we now automatically fill all checksum fields with the same value used in detection.
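
Expressed as code, the optimization is essentially the following; the checksum key names and the helper itself are hypothetical, written just to show the idea.

    CHECKSUM_KEYS = ["md5-full_file", "md5-5000B", "md5-tail-5000B", "md5-oneMB"]

    def fill_checksums_for_small_file(file_entry, detection_checksum, size):
        # For a file smaller than 5000 bytes every checksum variant hashes the
        # same bytes, so all of them can be copied from the detection checksum
        # instead of being left empty.
        if size < 5000:
            for key in CHECKSUM_KEYS:
                file_entry.setdefault(key, detection_checksum)
        return file_entry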

Uploading and Matching – set.dat

Finally, I worked on uploading set.dat to the database. A set.dat usually contains the following: metadata (mostly irrelevant), full-file checksums only, and file sizes.

    • scummvm.dat doesn’t contain full-file checksums the way set.dat does, so a match between files from set.dat and scummvm.dat is only possible when a file’s size is smaller than the detection checksum size, generally 5000 bytes; in that case the detection checksum covers the whole file and therefore equals the full-file checksum (see the sketch after this list).
    • This transitions the status from “detection” to “partial” — we now know all files in the game, but not all checksum types.

    • If there is no match, we create a new entry in the database with the status dat.
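
A condensed sketch of that decision is below; the field names are hypothetical and the real logic in db_functions.py is more involved.

    def match_set_entry(set_file, detection_files, checksum_size=5000):
        # set.dat only carries full-file checksums, so comparing against a
        # detection checksum (computed over the first checksum_size bytes) is
        # only meaningful when the file is smaller than that prefix, i.e. both
        # checksums cover exactly the same bytes.
        for det in detection_files:
            if (set_file["size"] == det["size"]
                    and set_file["size"] < checksum_size
                    and set_file["md5"] == det["md5-5000"]):
                return det   # match: the fileset moves from "detection" to "partial"
        return None          # no match: a new fileset is created with status "dat"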

Fixes :

There was an issue with the session variable @fileset_last, which was mistakenly referencing the filechecksum table instead of the latest entry in the fileset table. This broke the logic for matching entries.
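
Conceptually, the fix just makes sure the variable is set right after the insert into the fileset table; a simplified sketch with hypothetical column names:

    def insert_fileset(cursor, game_id, status, src):
        # @fileset_last must point at the row just inserted into the fileset
        # table, not at a row in the filechecksum table.
        cursor.execute(
            "INSERT INTO fileset (game, status, src) VALUES (%s, %s, %s)",
            (game_id, status, src),
        )
        cursor.execute("SET @fileset_last = LAST_INSERT_ID()")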

When a detection matched a file, only one checksum was previously being transferred. I fixed this to include all relevant checksums from the detection file.

 Bug Fixes and Improvements

Fixed redirection logic in the logs: previously, when a matched detection entry was removed, the log URL still pointed to the deleted fileset ID. I updated this to redirect correctly to the matched fileset.

Updated the dashboard to show unmatched dat entries. These were missing earlier because the SQL query used an INNER JOIN with the game table, and since set.dat filesets don’t have game table references, they were filtered out. I replaced it with a LEFT JOIN on fileset to include them.
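
The change boils down to the following query difference (simplified; table and column names may not match the real schema exactly):

    # Before: the INNER JOIN silently drops filesets without a game
    # reference, which is exactly the case for set.dat entries.
    old_query = """
        SELECT fileset.id, game.name
        FROM fileset
        JOIN game ON game.id = fileset.game
    """

    # After: the LEFT JOIN keeps every fileset and leaves the game columns
    # NULL when no game row matches.
    new_query = """
        SELECT fileset.id, game.name
        FROM fileset
        LEFT JOIN game ON game.id = fileset.game
    """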


That’s everything I worked on this past week. I’m still a bit unsure about the set.dat matching logic, so I’ll be discussing it further with Sev to make sure everything is aligned.

Thanks for reading!


Week 1

Welcome to this week’s blog. Most of my time this week was spent fixing the portability of the Mac files. The plan was to test how the Mac files are handled on both the Python and the C++ side. While checking the C++ side, we realised halfway through that some code was broken and giving incorrect results, so Sev decided to take a look at it himself while I started working on the same task on the Python side.

On the Python side, the code had three main issues:

  • Not all Mac file variants were being covered.

    Fig. 1 : The 7 Mac file variants (image taken from macresman.cpp -> MacResManager::open())
  • Instead of using the data section of the resource fork, the entire resource fork was being used for the checksum calculations, which differed from what the C++ side was doing (a minimal sketch of the extraction follows this list).

    Fig. 2 : The data section of the resource fork had to be separately extracted
  • There was no file filtering, which caused problems when Mac files were present – specifically, AppleDouble and raw resource fork files, which had their forks spread over multiple files. Instead of showing a single file entry with all the checksums, extra entries were incorrectly displayed as non-Mac files.
Fig. 3 : First file entry should not be a part of this game entry.
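
For reference, getting at the data section of a resource fork only needs the 16-byte resource header, which stores the offset and length of that section. A minimal sketch, assuming rsrc already holds the raw resource fork bytes:

    import struct

    def resource_fork_data_section(rsrc: bytes) -> bytes:
        # The resource fork header is four big-endian 32-bit integers: offset
        # of the data section, offset of the resource map, length of the data
        # section, and length of the map.
        data_offset, _map_offset, data_length, _map_length = struct.unpack(">IIII", rsrc[:16])
        return rsrc[data_offset:data_offset + data_length]

    # The checksum is then computed over this data section rather than the
    # whole fork, matching the C++ side.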

I corrected all these issues. For filtering, I added 7 different categories for each file – NON_MAC, MAC_BINARY, APPLE_DOUBLE_RSRC, APPLE_DOUBLE_MACOSX, APPLE_DOUBLE_DOT_, RAW_RSRC and ACTUAL_FORK_MAC.
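
A rough sketch of how the categories can be represented is below; the name-based heuristics shown (the __MACOSX folder, the ._ prefix, the .rsrc extension) are simplified assumptions and not the exact checks used in the code, which also inspects file contents.

    from enum import Enum, auto
    from pathlib import PurePosixPath

    class FileCategory(Enum):
        NON_MAC = auto()
        MAC_BINARY = auto()
        APPLE_DOUBLE_RSRC = auto()
        APPLE_DOUBLE_MACOSX = auto()
        APPLE_DOUBLE_DOT_ = auto()
        RAW_RSRC = auto()
        ACTUAL_FORK_MAC = auto()

    def classify_by_name(filepath: str) -> FileCategory:
        # Simplified, name-only heuristics for illustration; MacBinary and the
        # remaining variants need the file contents to be detected reliably.
        p = PurePosixPath(filepath)
        if "__MACOSX" in p.parts:
            return FileCategory.APPLE_DOUBLE_MACOSX
        if p.name.startswith("._"):
            return FileCategory.APPLE_DOUBLE_DOT_
        if p.suffix.lower() == ".rsrc":
            return FileCategory.RAW_RSRC
        return FileCategory.NON_MAC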

Fig. 4 shows consistent output for all the Mac file variants. The next task is to create proper test suites for verification and to check the workflow with the C++ side.

Fig. 4 : Checksum calculation of all 7 macfile variants on python side

Thank you for reading.


Week 0

Hi, I’m Shivang Nagta, a pre-final year Computer Science undergraduate. I’ll be sharing my weekly blogs here, with updates on my GSoC project — “System for Checking Game Files Integrity.”

My mentors for this project are Sev and Rvanlaar, and I’m really grateful to have them guiding me. This project has been part of the last two GSoC years, so a lot of work has already been done. Here’s the current status:

Work done by the previous developers :
1. Server Side – 
The server has been written in Flask. There’s a dashboard for proper visualization. The database schema and logic for feeding/updating the database have been implemented.

2. Client Side / ScummVM App :
There’s a Check Integrity button in the ScummVM application, which hits the server endpoint for validation with the checksum data of the game files.

Work done by me previously :
1. Client Side / ScummVM App :

  • Fixed the freezing issue in the Check Integrity dialog box. It was caused by the MD5 calculation of large files, which blocked synchronous screen updates. I solved it by implementing a callback system (the idea is sketched after this list).
  • Engines like GLK and Scumm don’t use the Advanced Detector, so I worked on implementing a custom system to dump their detection entries. Some verification is still needed, as the current logic of these engines introduces complications in the implementation of the custom dumping systems.
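
To illustrate the callback idea (in Python rather than the actual C++ code): hash the file in chunks and hand control back to the UI between chunks.

    import hashlib

    def md5_with_progress(path, progress_callback, chunk_size=1024 * 1024):
        # Hash the file in chunks and invoke the callback after each one, so
        # the dialog can repaint instead of freezing until the hash is done.
        # (Illustration only; the real fix lives on the C++/ScummVM side.)
        md5 = hashlib.md5()
        done = 0
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                md5.update(chunk)
                done += len(chunk)
                progress_callback(done)
        return md5.hexdigest()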

2. Server Side :

  • I worked on two particular tasks: Punycode names and the portability of the different Mac file formats. Both tasks require final verification and testing. I’ve already mentioned them in the last section of this blog.

Work plan for Official Coding Phase:
1. Testing all the workflows on the server side :

  • Initial seeding by scummvm.dat (checksum data from the detection entries)
  • Uploading set.dat (checksum data from some old collections)
  • Uploading scan.dat (checksums uploaded by developers by scanning local files using a command line utility provided on the server)
  • user.dat from the API (checksums coming from the client via the Check Integrity feature added to the ScummVM application)
  • Reupload scummvm.dat / set.dat

2. Moderation features :

  • Review the user-submitted filesets
  • Have a list of unmatched filesets
  • Manual merge with a search feature for a particular fileset ID, followed by a merge screen
  • Remove filesets / undo changes on a new upload (rollback feature)
  • Easy searching and filtering of filesets by different fields

3. Some fixes :

  • Different types of Mac files (like AppleDouble, MacBinary, and raw .rsrc files) have their forks represented differently for the same game data. The checksums of resource forks and data forks need to be extracted separately to create correct entries.
  • Often, filenames from one OS are not supported on another. To tackle this, Sev built a method on top of the classic punycode encoding (used for internationalized domain names), but it needs proper integration and testing in this project.

Tomorrow marks the beginning of the official coding phase. Thank you for reading.