{"id":31,"date":"2025-06-30T09:31:54","date_gmt":"2025-06-30T09:31:54","guid":{"rendered":"https:\/\/blogs.scummvm.org\/shivangnagta\/?p=31"},"modified":"2025-06-30T09:31:54","modified_gmt":"2025-06-30T09:31:54","slug":"week-4","status":"publish","type":"post","link":"https:\/\/blogs.scummvm.org\/shivangnagta\/2025\/06\/30\/week-4\/","title":{"rendered":"Week 4"},"content":{"rendered":"<p data-start=\"221\" data-end=\"429\">Welcome to this week\u2019s blog update. This week was focused on fixing bugs related to seeding and <code data-start=\"317\" data-end=\"326\">set.dat<\/code> uploading, improving filtering mechanisms, and rewriting parts of the processing logic for <code data-start=\"418\" data-end=\"428\">scan.dat<\/code>.<\/p>\n<p data-start=\"471\" data-end=\"553\">After some manual checks with the recent <code data-start=\"512\" data-end=\"521\">set.dat<\/code> updates, a few issues surfaced:<\/p>\n<h5 data-start=\"555\" data-end=\"589\">1. <strong data-start=\"562\" data-end=\"589\">Identical Detection Fix<\/strong><\/h5>\n<p data-start=\"590\" data-end=\"1087\">Some detection entries had identical file sets, sizes, and checksums, which led to issues during the automatic merging of <code data-start=\"712\" data-end=\"721\">set.dat<\/code>. Previously, we used a <code data-start=\"745\" data-end=\"754\">megakey<\/code> to prevent such duplicates, but since it included not only file data but also language and platform information, some identical versions still slipped through.<br data-start=\"910\" data-end=\"913\" \/>To solve this, I replaced the <code data-start=\"943\" data-end=\"952\">megakey<\/code> check with a more focused comparison of filename, size, and checksum only, and now log the details of any fileset that clashes.<\/p>\n<h5 data-start=\"1089\" data-end=\"1130\">2. 
<strong data-start=\"1096\" data-end=\"1130\">Punycode Encoding Misplacement<\/strong><\/h5>\n<p data-start=\"1131\" data-end=\"1504\">Filenames were being Punycode-encoded every time they were processed for database insertion. This encoding was not needed at that stage, since it should have happened earlier, ideally at the parsing stage, either in the scanning utility that generates <code data-start=\"1378\" data-end=\"1384\">.dat<\/code> files or in the application\u2019s upload interface. I have removed the encoding from database updates. I still have to add it on the scan utility side, which I&#8217;ll do this week.<\/p>\n<h5 data-start=\"1506\" data-end=\"1542\">3. <strong data-start=\"1513\" data-end=\"1542\">Path Format Normalization<\/strong><\/h5>\n<p data-start=\"1543\" data-end=\"1930\">Another issue was related to inconsistent file paths. Some <code data-start=\"1602\" data-end=\"1611\">set.dat<\/code> entries used Windows-style paths (<code data-start=\"1646\" data-end=\"1655\">xyz\\abc<\/code>), while their corresponding detection entries used Unix-style (<code data-start=\"1719\" data-end=\"1728\">xyz\/abc<\/code>). Since filtering was done using simple string matching, these mismatches caused failures. I resolved this by normalizing all paths to use forward slashes (<code data-start=\"1887\" data-end=\"1890\">\/<\/code>) before storing them in the database.<\/p>\n<h5 data-start=\"1932\" data-end=\"1999\">4. <strong data-start=\"1939\" data-end=\"1999\">Improving Extra File Detection with <code data-start=\"1977\" data-end=\"1985\">clonof<\/code> and <code data-start=\"1990\" data-end=\"1997\">romof<\/code><\/strong><\/h5>\n<p data-start=\"2000\" data-end=\"2260\">While analyzing filesets, I encountered a previously unnoticed <code data-start=\"2063\" data-end=\"2071\">clonof<\/code> field (similar to <code data-start=\"2090\" data-end=\"2097\">romof<\/code>). 
These fields indicate that extra files might be listed elsewhere in the <code data-start=\"2172\" data-end=\"2178\">.dat<\/code> file. The previous logic only looked in the <code data-start=\"2223\" data-end=\"2233\">resource<\/code> section, but I found that:<\/p>\n<ul data-start=\"2261\" data-end=\"2392\">\n<li data-start=\"2261\" data-end=\"2314\">\n<p data-start=\"2263\" data-end=\"2314\">Extra files could also exist in the <code data-start=\"2299\" data-end=\"2305\">game<\/code> section.<\/p>\n<\/li>\n<li data-start=\"2315\" data-end=\"2392\">\n<p data-start=\"2317\" data-end=\"2392\">The file references could chain across multiple sections (e.g., A \u2192 B \u2192 C).<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"2394\" data-end=\"2539\">To address this, I implemented an iterative lookup for extra files, ensuring all relevant files across multiple levels are properly detected.<\/p>\n<h4 data-start=\"2546\" data-end=\"2584\">scan.dat Processing Improvements<\/h4>\n<p data-start=\"67\" data-end=\"162\">For <code data-start=\"73\" data-end=\"83\">scan.dat<\/code>, I introduced a file update strategy that runs before full fileset matching. All files that match based on file size and checksum are updated first. 
This allows us to update matching files early, without relying solely on complete fileset comparisons.<\/p>\n<h4 data-start=\"2907\" data-end=\"2942\">Minor Fixes &amp; UI Enhancements<\/h4>\n<ul data-start=\"2944\" data-end=\"3435\">\n<li data-start=\"2944\" data-end=\"3018\">\n<p data-start=\"2946\" data-end=\"3018\">Prevented reprocessing of filesets in <code data-start=\"2984\" data-end=\"2993\">set.dat<\/code> on subsequent runs if their key already exists.<\/p>\n<\/li>\n<li data-start=\"3019\" data-end=\"3125\">\n<p data-start=\"3021\" data-end=\"3125\">Passed the <code data-start=\"3029\" data-end=\"3040\">--skiplog<\/code> CLI argument to set.dat processing to suppress verbose logs during fileset creation and automatic merging.<\/p>\n<\/li>\n<li data-start=\"3126\" data-end=\"3196\">\n<p data-start=\"3128\" data-end=\"3196\">Improved filtering in the dashboard by adding more fields such as engineid, transaction number, and fileset id, and fixed some older issues.<\/p>\n<\/li>\n<li data-start=\"3197\" data-end=\"3435\">\n<p data-start=\"3199\" data-end=\"3327\">Introduced a new &#8220;Possible Merges&#8221; button in the filesets dashboard to manually inspect and confirm suggested merges. This feature is backed by a new database table that stores fileset matches for later manual review.<\/p>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Welcome to this week\u2019s blog update. This week was focused on fixing bugs related to seeding and set.dat uploading, improving filtering mechanisms, and rewriting parts of the processing logic for scan.dat. After some manual checks with the recent set.dat updates, a few issues surfaced: 1. 
Identical Detection Fix Some detection entries had identical file sets, [&hellip;]<\/p>\n","protected":false},"author":29,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-31","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts\/31","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/comments?post=31"}],"version-history":[{"count":1,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts\/31\/revisions"}],"predecessor-version":[{"id":32,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts\/31\/revisions\/32"}],"wp:attachment":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/media?parent=31"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/categories?post=31"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/tags?post=31"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}