Categories
Uncategorized

Week 11

Welcome to this week’s blog. This week, I worked on testing the workflow for scan data as well as the user file integrity service.

For the scan data, we tested with the WAGE game archives, which gave a good opportunity to exercise both the scan utility and the scan matching for Mac files. The matching process needed some fixes. Initially, I was filtering filesets on both size (the data fork size) and size-rd (the size of the resource fork’s data section) simultaneously. This was incorrect, since detection filesets contain only one of these at a time. I also fixed how matched entries were processed. Previously, entries matched with detection were queued for manual merge, so that specific files could be added while skipping unnecessary ones such as license or readme files from commercial games. It made more sense to merge them automatically and remove such files later if necessary, especially since archives like WAGE do not have the issue of extra files from commercial games.
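
The size filtering fix can be illustrated with a minimal sketch. It is not the project's actual code: the dict layout, the use of None for an absent field, and the helper names size_matches and filter_candidates are all assumptions made for illustration. The only idea taken from the text is that a detection entry carries just one of size or size-rd, so the comparison should use whichever one is present.

```python
def size_matches(detection_file, scanned_file):
    """Compare a scanned Mac file against one detection entry by size.

    Assumption: each file is a dict, and a detection entry sets exactly one
    of "size" (data fork) or "size-rd" (resource fork data section); the
    other key is absent or None.
    """
    for key in ("size", "size-rd"):
        expected = detection_file.get(key)
        if expected is not None:
            # Compare only on the field the detection entry actually has.
            return scanned_file.get(key) == expected
    # The detection entry carries no size information at all.
    return False


def filter_candidates(detection_files, scanned_file):
    """Keep only the detection entries whose size field matches the scan."""
    return [d for d in detection_files if size_matches(d, scanned_file)]
```

Requiring both fields to match at once would reject every detection entry in this model, because one of the two fields is always missing on the detection side.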

I also tested the user file integrity service, focusing on different response cases:

  1. All files are okay when a full fileset matches.

  2. Extra files are present.

  3. Some files are missing.

Another missing piece was reporting files with checksum mismatches, which were previously classified as extra files. This is now fixed. I also reviewed the manual merge process for user filesets. Unlike set filesets, the source fileset (here, the user fileset) should not be deleted after a manual merge, since it could be a new variant that needs additional metadata. To support this, I implemented a feature to update fileset metadata, though it still requires some refinement. I also still need to add an endpoint in the web server that the mail server can trigger. Through this endpoint, the web server will receive the mail information, particularly the ID of the user fileset for which the user has provided additional information via the pre-drafted email that is prompted when the user uses the ‘check integrity’ feature in the ScummVM application.
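
The response cases of the integrity service, including the checksum-mismatch case, can be sketched roughly as follows. This is a simplified illustration under assumptions, not the service's real code: filesets are modelled as plain dicts mapping a file name to a single checksum string, and the function name classify_files and the result keys are hypothetical.

```python
def classify_files(expected, submitted):
    """Classify submitted files against a matched fileset.

    Both arguments are assumed to be dicts of {file name: checksum}.
    Returns a dict with four sorted lists of file names.
    """
    result = {"ok": [], "missing": [], "extra": [], "mismatch": []}

    for name, checksum in expected.items():
        if name not in submitted:
            result["missing"].append(name)
        elif submitted[name] != checksum:
            # Same file name but different contents: report it on its own,
            # rather than lumping it in with the extra files.
            result["mismatch"].append(name)
        else:
            result["ok"].append(name)

    for name in submitted:
        if name not in expected:
            result["extra"].append(name)

    for names in result.values():
        names.sort()
    return result
```

A fileset is fully okay in this model when the missing, extra, and mismatch lists are all empty.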

A few other fixes this week included:

  • Deleting multiple files from a fileset through the dashboard: Previously, the query was generated incorrectly. Instead of "DELETE FROM file WHERE id IN ('1', '2', '3')", it generated "DELETE FROM file WHERE id IN ('1, 2, 3')", which, of course, did not work. This issue is now fixed.

  • Search filter issue: A bug occurred when a single quote (') was used in a search filter value, which broke the query because the quote was not escaped. This has also been fixed.
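
Both query bugs above belong to the same family, and one common way to avoid them is to let the database driver bind the values instead of building them into the SQL string. The sketch below shows that technique with Python's built-in sqlite3 module purely so that it runs on its own; it is not necessarily how the fixes were made in the project, and the table layout and the helper names make_demo_db, delete_files, and search_by_name are assumptions. Placeholder syntax also differs between drivers (sqlite3 uses ?, several MySQL drivers use %s).

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up 'file' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO file (id, name) VALUES (?, ?)",
        [(1, "readme.txt"), (2, "license.txt"), (3, "game.dat"), (4, "King's Quest")],
    )
    return conn


def delete_files(conn, file_ids):
    """Delete several rows using one placeholder per id."""
    file_ids = list(file_ids)
    if not file_ids:
        return 0  # "IN ()" is not valid SQL, so skip the query entirely
    placeholders = ", ".join("?" for _ in file_ids)
    cur = conn.execute(f"DELETE FROM file WHERE id IN ({placeholders})", file_ids)
    return cur.rowcount


def search_by_name(conn, value):
    """Substring search; the bound value may safely contain a single quote."""
    cur = conn.execute(
        "SELECT id, name FROM file WHERE name LIKE ? ORDER BY id", (f"%{value}%",)
    )
    return cur.fetchall()
```

Only the placeholder list is interpolated into the query text; the ids and the search value never are, so a quote in the input cannot change the structure of the statement.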

Week 10

Welcome to this week’s blog. This week, my work focused on enhancing API security, adding GitHub authentication, refining the project structure, and introducing a faster Python package manager (UV).

API Security Improvements

I implemented some checks on the validation endpoint, which processes the game file data sent by users from the ScummVM application. These checks are designed to prevent brute-force attempts.

Checks on validation endpoint

On top of that, I introduced rate limiting using Flask-Limiter. Currently, the validation endpoint allows a maximum of 3 requests per minute per user.
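
To make the "3 requests per minute per user" rule concrete, here is a small standalone sketch of a sliding-window limiter. It only illustrates the idea of a per-user limit; it is not Flask-Limiter's API or its internal algorithm, and the class name SlidingWindowLimiter and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within the last window_seconds."""

    def __init__(self, max_requests=3, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In a web application the key would typically be something that identifies the user, such as a remote address, and a rejected request would be answered with HTTP status 429 (Too Many Requests).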

GitHub OAuth & Role-Based Access

GitHub OAuth authentication is now in place. I have tested it with my own dummy organisation, but the integration with the ScummVM organisation is still pending. It introduces a three-level role-based system:

  • Admin – Full access, plus the ability to clear the database.

  • Moderators – Same permissions as Admin, except database clearing.

  • Read-Only – Logged-in users with viewing rights only.
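
The three roles above can be modelled as an ordered enum with a minimum role per action. This is a sketch under assumptions rather than the project's implementation: the action names "view", "edit", and "clear_database" and the helper is_allowed are made up for illustration, with "edit" standing in for everything a moderator can do.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered from least to most privileged."""
    READ_ONLY = 1
    MODERATOR = 2
    ADMIN = 3


# Hypothetical action names mapped to the minimum role they require.
REQUIRED_ROLE = {
    "view": Role.READ_ONLY,
    "edit": Role.MODERATOR,
    "clear_database": Role.ADMIN,
}


def is_allowed(role, action):
    """Return True if the role is at least as privileged as the action needs."""
    return role >= REQUIRED_ROLE[action]
```

Keeping the roles ordered means a new action only needs one entry in the mapping, and every higher role inherits it automatically.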

GitHub OAuth

Project Restructuring & UV Integration

As suggested by my mentor Rvanlaar, I restructured the project into a Python module, which makes the import logic cleaner and improves overall modularity. I also added UV, a high-performance Python package and project manager that handles dependencies faster than pip.

Other Fixes & Improvements
  • Updated the Apache config file to use the Python virtual environment instead of the global installation.

  • Fixed the decoding of MacBinary filenames from headers to use MacRoman instead of UTF-8.

  • Improved error handling for the scan utility.

  • For Mac files, used one of size or size-rd, instead of both simultaneously, when filtering filesets for scan.dat.
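
The MacRoman fix in the list above can be sketched as follows. In a MacBinary header (128 bytes), byte 1 holds the filename length (1 to 63) and the name itself starts at byte 2. The function name macbinary_filename is hypothetical, and the sketch does no other header validation.

```python
def macbinary_filename(header: bytes) -> str:
    """Extract the filename from a MacBinary header, decoded as MacRoman."""
    if len(header) < 65:
        raise ValueError("header too short to contain the filename field")
    length = header[1]
    if not 1 <= length <= 63:
        raise ValueError(f"invalid filename length: {length}")
    # Classic Mac OS filenames are stored in MacRoman, so bytes above 0x7F
    # decode to the wrong characters (or fail outright) if read as UTF-8.
    return header[2:2 + length].decode("mac_roman")
```

For example, the byte 0x8E is "é" in MacRoman, but on its own it is not valid UTF-8.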

Week 9

Welcome to this week’s blog. This week was a busy one due to my college workload, but I mostly focused on enhancing the webpage. I worked on the configuration page, the manual merge dashboard, filtering, search-related improvements, and more.

  • Configuration Page:
    I added a new configuration page that allows users to customize their preferences, including:

    • Number of filesets per page

    • Number of logs per page

    • Column width percentages for the fileset search page

    • Column width percentages for the log page

    All these preferences are stored in cookies for persistence.

    User Configuration Page
  • Manual Merge Dashboard:
    I performed some refactoring of the codebase for manual merging. Additionally, I added options to:

    • Show either all files or only the common ones

    • Display either all fields of the files, or just the full-size MD5 and size (or size-rd in the case of Mac files)

  • Search Functionality:
    I improved the search system with the following features:

    • Exact match: Values wrapped in double quotes are matched exactly

    • OR search: Multiple terms separated by spaces are treated as an OR

    • AND search: Terms separated by + are treated as an AND

  • Sorting Enhancements:
    The sorting feature now includes three states for each column: ascending, descending, and default (unsorted).
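
The search rules described above can be sketched as a small parser. Several details here are assumptions, since the post does not spell them out: spaces are taken to bind more loosely than +, quoted values may contain spaces, and non-exact terms are treated as case-insensitive substring matches. The function names parse_search and matches are hypothetical.

```python
def parse_search(query):
    """Parse a query into OR-groups of AND-terms.

    Returns a list of groups; each group is a list of (text, exact) tuples.
    A value matches if it satisfies every term of at least one group.
    """
    groups, group, buf = [], [], []
    in_quotes = False
    exact = False

    def flush_term():
        nonlocal exact
        text = "".join(buf)
        buf.clear()
        if text:
            group.append((text, exact))
        exact = False

    def flush_group():
        nonlocal group
        flush_term()
        if group:
            groups.append(group)
        group = []

    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
            exact = True  # the term being built was wrapped in quotes
        elif ch == "+" and not in_quotes:
            flush_term()  # "+" separates AND terms
        elif ch.isspace() and not in_quotes:
            flush_group()  # a space separates OR alternatives
        else:
            buf.append(ch)
    flush_group()
    return groups


def matches(value, groups):
    """Check a single value against the parsed query (assumed semantics)."""
    def term_ok(text, exact):
        return value == text if exact else text.lower() in value.lower()
    return any(all(term_ok(t, e) for t, e in group) for group in groups)
```

For instance, the query scumm+"Monkey Island" wage parses into two alternatives: one requiring both terms, and one requiring only wage.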

Minor Fixes & Improvements
  • Added a favicon to display on the browser tab
  • Implemented checksum-based filtering in the fileset search page
  • Included metadata information in seeding logs (unless --skiplog is passed)
Goals for Next Week
  • Add GitHub-based authentication
  • Implement a three-tier user system: admin, moderator, and read-only
  • Add validation checks on user data to prevent brute force attacks
  • Refactor the entire project into a Python module for better structure and cleaner imports