{"id":26,"date":"2025-06-16T23:43:18","date_gmt":"2025-06-16T23:43:18","guid":{"rendered":"https:\/\/blogs.scummvm.org\/shivangnagta\/?p=26"},"modified":"2025-06-16T23:43:18","modified_gmt":"2025-06-16T23:43:18","slug":"week-2","status":"publish","type":"post","link":"https:\/\/blogs.scummvm.org\/shivangnagta\/2025\/06\/16\/week-2\/","title":{"rendered":"Week 2"},"content":{"rendered":"<p data-start=\"330\" data-end=\"365\">Welcome to the weekly blog.<br \/>
After wrapping up work on macfiles last week, I finally moved on to testing the first two phases of the data upload workflow \u2014 starting with <code data-start=\"516\" data-end=\"529\">scummvm.dat<\/code> (which contains detection entries from ScummVM, used for populating the database) and <code data-start=\"582\" data-end=\"591\">set.dat<\/code> (data from older collections, which provides additional information).<\/p>\n<h4 data-start=\"660\" data-end=\"707\">Database Changes: File Size Enhancements<\/h4>\n<p data-start=\"709\" data-end=\"927\">Before diving into the testing, there was a change Sev asked for \u2014 to extend the database schema to store three types of file sizes instead of just one.
This was necessary due to the nature of macfiles, which have:<\/p>\n<ul data-start=\"929\" data-end=\"1033\">\n<li data-start=\"929\" data-end=\"946\">\n<p data-start=\"931\" data-end=\"946\">A <strong data-start=\"933\" data-end=\"946\">data fork<\/strong><\/p>\n<\/li>\n<li data-start=\"947\" data-end=\"968\">\n<p data-start=\"949\" data-end=\"968\">A <strong data-start=\"951\" data-end=\"968\">resource fork<\/strong><\/p>\n<\/li>\n<li data-start=\"969\" data-end=\"1033\">\n<p data-start=\"971\" data-end=\"1033\">A third size: the <strong data-start=\"989\" data-end=\"1026\">data section of the resource fork<\/strong> itself<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"1035\" data-end=\"1234\">This change introduced significant modifications to <code data-start=\"1087\" data-end=\"1104\">db_functions.py<\/code>, which contains the core logic for working with <code data-start=\"1153\" data-end=\"1159\">.dat<\/code> files. I had to be careful to ensure nothing broke during this transition.<\/p>\n<h4 data-start=\"1236\" data-end=\"1272\">Punycode Encoding Logic Fixes<\/h4>\n<p data-start=\"1274\" data-end=\"1344\">At the same time, I fixed the punycode logic in <code data-start=\"1326\" data-end=\"1343\">db_functions.py<\/code>. 
Punycode encoding (or rather, an extended version of the standard used in URL encoding) is employed by ScummVM to convert filenames into filesystem-independent ASCII-only representations.<\/p>\n<p data-start=\"1539\" data-end=\"1827\">There were inconsistencies between the punycode logic in <code data-start=\"1596\" data-end=\"1613\">db_functions.py<\/code> and the original implementation in the Dumper Companion. I made sure both implementations now align, and I ran unit tests from the Dumper Companion to verify correctness.<\/p>\n<h4 data-start=\"1829\" data-end=\"1875\">Feeding the Database &#8211; scummvm.dat<\/h4>\n<p data-start=\"1877\" data-end=\"1971\">With those fixes in place, I moved on to populating the database with data from <code data-start=\"1957\" data-end=\"1970\">scummvm.dat<\/code>. While Sev was working on the C++ side to add the correct filesize tags for detections, I ran manual tests using existing data. The parsing logic worked well, though I had to add support for the new &#8220;extra size&#8221; fields.<\/p>\n<p data-start=\"2201\" data-end=\"2429\">Additionally, I fixed the <strong data-start=\"2227\" data-end=\"2250\">megakey calculation<\/strong>, which is used later when re-uploading scummvm.dat with updates. This involved sorting files alphabetically before computing the key to ensure consistent results.<\/p>\n<p data-start=\"2431\" data-end=\"2788\">I also introduced a small optimization: if a file is smaller than 5000 bytes, we can safely assume that all checksum types (e.g., <code data-start=\"2562\" data-end=\"2577\">md5-full_file<\/code>, <code data-start=\"2579\" data-end=\"2590\">md5-5000B<\/code>, <code data-start=\"2592\" data-end=\"2608\">md5-tail-5000B<\/code>, <code data-start=\"2610\" data-end=\"2621\">md5-oneMB<\/code>, or the macfile variants like <code data-start=\"2647\" data-end=\"2651\">-d<\/code>\/<code data-start=\"2652\" data-end=\"2656\">-r<\/code>) will be the same, since each of them then covers the entire file.
In such cases, we now automatically fill all checksum fields with the same value used in detection.<\/p>\n<h4 data-start=\"2790\" data-end=\"2829\">Uploading and Matching &#8211; set.dat<\/h4>\n<p data-start=\"2831\" data-end=\"2990\">Finally, I worked on uploading <code data-start=\"2862\" data-end=\"2871\">set.dat<\/code> to the database, which usually contains the following &#8211; metadata (mostly irrelevant), full-file checksums only, and file sizes.<\/p>\n<ul data-start=\"3351\" data-end=\"3595\">\n<li data-start=\"3351\" data-end=\"3459\"><code data-start=\"3138\" data-end=\"3151\">scummvm.dat<\/code> doesn&#8217;t contain full-file checksums like <code>set.dat<\/code> does, so a match between files from <code data-start=\"3224\" data-end=\"3233\">set.dat<\/code> and <code data-start=\"3238\" data-end=\"3251\">scummvm.dat<\/code> is only possible when a file\u2019s size is smaller than the detection checksum size, generally 5000 bytes.<\/li>\n<li data-start=\"3463\" data-end=\"3595\">\n<p data-start=\"3465\" data-end=\"3595\">A successful match transitions the status from \u201cdetection\u201d to \u201cpartial\u201d \u2014 we now know all files in the game, but not all checksum types.<\/p>\n<\/li>\n<li data-start=\"3463\" data-end=\"3595\">\n<p data-start=\"3465\" data-end=\"3595\">If there is no match, we create a <strong data-start=\"3645\" data-end=\"3658\">new entry<\/strong> in the database with the status <code data-start=\"3691\" data-end=\"3696\">dat<\/code>.<\/p>\n<\/li>\n<\/ul>\n<h6>Fixes:<\/h6>\n<p data-start=\"3791\" data-end=\"4013\">There was an issue with the session variable <code data-start=\"3836\" data-end=\"3851\">@fileset_last<\/code>, which was mistakenly referencing the <strong data-start=\"3890\" data-end=\"3914\"><code data-start=\"3892\" data-end=\"3906\">filechecksum<\/code><\/strong> table instead of the latest entry in the <strong data-start=\"3950\" data-end=\"3969\"><code data-start=\"3952\" 
data-end=\"3961\">fileset<\/code><\/strong> table. This broke the logic for matching entries.<\/p>\n<p data-start=\"4016\" data-end=\"4180\">Previously, when a detection entry matched a file, only one checksum was transferred. I fixed this to include all relevant checksums from the detection file.<\/p>\n<h4 data-start=\"3699\" data-end=\"3732\">Bug Fixes and Improvements<\/h4>\n<p data-start=\"4183\" data-end=\"4388\">Fixed the redirection logic in the logs: previously, when a matched detection entry was removed, the log URL still pointed to the deleted fileset ID. I updated it to redirect to the matched fileset instead.<\/p>\n<p data-start=\"4391\" data-end=\"4683\">Updated the dashboard to show unmatched <strong data-start=\"4416\" data-end=\"4448\"><code data-start=\"4433\" data-end=\"4438\">dat<\/code><\/strong> entries. These were missing earlier because the SQL query used an inner JOIN with the <code data-start=\"4527\" data-end=\"4533\">game<\/code> table, and since <code data-start=\"4551\" data-end=\"4560\">set.dat<\/code> files don\u2019t have game table references, they were filtered out. I replaced it with a LEFT JOIN on <code data-start=\"4655\" data-end=\"4664\">fileset<\/code> to include them.<\/p>\n<hr data-start=\"4685\" data-end=\"4688\" \/>\n<p data-start=\"4690\" data-end=\"4872\">That\u2019s everything I worked on this past week. I\u2019m still a bit unsure about the <code data-start=\"4769\" data-end=\"4778\">set.dat<\/code> matching logic, so I\u2019ll be discussing it further with Sev to make sure everything is aligned.<\/p>\n<p data-start=\"4874\" data-end=\"4893\">Thanks for reading!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Welcome to the weekly blog. 
After wrapping up work on macfiles last week, I finally moved on to testing the first two phases of the data upload workflow \u2014 starting with scummvm.dat (which contains detection entries from ScummVM used for populating the database) and set.dat (data from older collections, provides more information). Database Changes: File [&hellip;]<\/p>\n","protected":false},"author":29,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-26","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts\/26","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/comments?post=26"}],"version-history":[{"count":1,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts\/26\/revisions"}],"predecessor-version":[{"id":27,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/posts\/26\/revisions\/27"}],"wp:attachment":[{"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/media?parent=26"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/categories?post=26"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.scummvm.org\/shivangnagta\/wp-json\/wp\/v2\/tags?post=26"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}