Categories
Uncategorized

Week 4

Finishing up the foundation

It’s already week 4 – and I’ve almost completed the first major part of the project! This week I finished the implementation of exporting detection entries from engines based on AdvancedDetector, and revisited the database schema and DAT parser.

This means I have now finished the code for creating DATs from game files, exporting detection entries from ScummVM and loading DATs from various sources. The only work with DATs that remains (for now) is creating logic for matching entries from untrusted sources with the detection entries.

Dealing with Detection Entries

Once I had the exporting working, I simply had to load it into the parser that I had already written. Only it wasn’t as simple as I had hoped…

The detection entries proved to be quite an interesting foe to tackle, causing all sorts of bugs in every nook and cranny. The first snag that I encountered was filenames having spaces and brackets, which were special characters that the parser depended on. Seems my idea to use regex to keep the code clean was going to make this quite a challenge to tackle!

The solution we came up with is to enclose filenames (and other metadata like game titles) in quotes, but getting regex to ignore matches inside quotes is a little challenging, and I wasn’t going to dig myself deeper into this hole I made for myself. So I ended up rewriting the entire first part of the parser into a match_outermost_brackets() function.
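To give a rough idea of what such a function does, here is a minimal sketch in Python. The real parser is written in PHP and I'm not reproducing it here; this version simply assumes double quotes around text and a backslash as the escape character inside them.

```python
def match_outermost_brackets(text):
    """Return every top-level "( ... )" group, ignoring brackets inside double quotes."""
    groups = []
    depth = 0          # how many unquoted brackets are currently open
    start = None       # index of the current top-level "("
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes and ch == "\\":
            i += 2     # assumption: a backslash inside quotes escapes the next character
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                if depth == 0:
                    start = i
                depth += 1
            elif ch == ")" and depth > 0:
                depth -= 1
                if depth == 0:
                    groups.append(text[start:i + 1])
        i += 1
    return groups


# Made-up sample data: note the brackets hiding inside the quoted title.
sample = 'game ( name "Foo (CD)" rom ( name "a b.dat" size 10 ) ) game ( name "Bar" )'
groups = match_outermost_brackets(sample)
```

Keeping a depth counter instead of a regex means the quote handling is just one extra flag, which is a lot easier to reason about.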

We put text into quotes to get the parser to ignore their contents, but what about text that contains quotes itself? The answer is using an escape character, like a backslash (\). What if the text has a backslash in it? Well, escape that too! I had to write the function to add backslashes before exporting in ScummVM, but PHP has a neat built-in function stripslashes() that gets the original string for you!
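The round trip looks roughly like this. It's a Python sketch with hypothetical helper names (add_slashes and strip_slashes), not the ScummVM exporter or PHP's actual stripslashes(), and it only escapes double quotes and backslashes.

```python
def add_slashes(text):
    """Escape backslashes and double quotes so the text can sit inside quotes."""
    # Backslashes go first, otherwise the ones added for quotes would get doubled.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def strip_slashes(text):
    """Undo add_slashes: drop each escaping backslash and keep the character after it."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 1  # skip the backslash itself
        out.append(text[i])
        i += 1
    return "".join(out)


original = 'He said "hi" \\ (v1)'
escaped = add_slashes(original)
restored = strip_slashes(escaped)
```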

Back to the tables

Exporting and parsing the entries is one thing, but now I needed to actually put the data into the db.

After some discussion, we decided to modify the schema a little bit to accommodate some more metadata, and also make it easier to use. After updating schema.php to fit the new design, I worked on getting the data into the right places, and added some conditionals to alter the insertion behavior depending on the source of the DAT file.

One last addition to the parser was handling different sizes and types of checksums, from full checksums to the last 5000 bytes (tail) checksums. Once I got this done, I could finally get the data into the db without a hitch.
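For the curious, here is a small Python sketch of the difference between a full checksum and a tail checksum. MD5 and the key names are my assumptions for the example, and file_checksums is a hypothetical helper rather than code from the project.

```python
import hashlib


def file_checksums(path, tail_size=5000):
    """Return the size, full checksum and tail checksum of one file."""
    # Reading the whole file keeps the sketch short; a real tool might stream it.
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        # The slice covers the last tail_size bytes, or the whole file if it is shorter.
        "md5-tail": hashlib.md5(data[-tail_size:]).hexdigest(),
    }
```

For files smaller than 5000 bytes the two values come out identical, which is something the matching logic has to keep in mind.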

That’s all for now. Next week I’ll be writing the matching logic and moving on to creating the website to browse these entries. Hope you check in next week for more updates!

Thanks for reading!

Week 3

My expectations for the week

Week 3 of the GSoC summer has rolled around! This week was spent writing the CLI tool for developers to create their own DATs from game files on their computer, and dumping detection entries from ScummVM, also into DATs.

Part 1: Creating the CLI

The CLI needs to do a couple things – taking in the directory path that contains game files, scanning it for files in the root folder, calculating the checksums for these files, and finally combining them all in the right format so it can be parsed by the parser we wrote previously.
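Those steps can be sketched in a few lines of Python using only the standard library. This is not the actual tool: the function names, the MD5 choice and the exact field layout of the output are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def scan_directory(directory):
    """Yield (name, size, md5) for each file directly inside directory (no recursion)."""
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            data = entry.read_bytes()
            yield entry.name, len(data), hashlib.md5(data).hexdigest()


def format_dat(game_name, files):
    """Combine the scanned files into one DAT-style game block (layout is assumed)."""
    rom_lines = [
        f'\trom ( name "{name}" size {size} md5 {md5} )'
        for name, size, md5 in files
    ]
    return f'game (\n\tname "{game_name}"\n' + "\n".join(rom_lines) + "\n)\n"
```

Calling format_dat("mygame", scan_directory("path/to/game")) would then give one block per game that can be written straight to a file.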

Most of the difficulty of this part came from trying to keep the script free from external dependencies, while also keeping it from getting too bloated. Long python scripts can be a real eyesore. But after dropping in some list comprehensions and iterators to substitute my rudimentary implementation, I got the code looking pretty decent I think.

Part 2: ScummVM detection entries

If you’re not aware, ScummVM detection entries are multiple file-checksum pairs that can be used to identify which variant of game you are running. Every variant has some files specific to it, and the detection entries contain these files. This lets ScummVM run the game correctly.

Each engine has its own detection entries, and our goal is to extract these entries, format them for our parser in the same way we did with the data in the CLI application, and write them into DAT files. This functionality will be accessed with a command (--dump-all-detection-entries) while running ScummVM from the command line.

Almost all engines in ScummVM, bar three of them (SCUMM, Sky, Glk), use AdvancedDetector for identifying the right game variant. We need to declare a virtual method in the MetaEngineDetection class, which all other detection classes inherit from, that we can then define in the various detector classes.

While I haven’t implemented the dumping on the special engines yet, for engines that use AdvancedDetector I have overridden the virtual method that I declared in the MetaEngineDetection class so that it returns the protected _gameDescriptors variable.

This function is then called for every engine, and the data is formatted and dumped into DAT files, which will be run through our parser and inserted into the DB. Quite the pipeline!

That’s all for now, hope you check in next week for more progress!

Thanks for reading!

Week 2

My expectations for the week

It’s the second week of the GSoC summer! After creating the database as defined by the schema, it was time to populate it with some real values!

This week was dedicated to writing the parser for DAT files, inserting them into the db, and a CLI tool to create DATs from directories containing game files.

Part 1: Parsing DAT files

The first order of business was to actually come up with a good way to parse the DAT files. The hardest thing to do was to get the text inside the outermost brackets.

I could simply try something like \((.|\s)+\) to match opening and closing brackets, but because the quantifier is greedy, that would end up matching from the very first ‘(’ to the very last ‘)’ – something we don’t want if the file has multiple top-level brackets. So I had to get a little creative.
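Here is a quick way to see the problem for yourself. I'm using Python's re module for the demo since this simple pattern behaves the same way there, and the input string is made up.

```python
import re

# Two separate top-level blocks, like two games in one DAT file.
text = "game ( name a ) game ( name b )"

naive = re.compile(r"\((.|\s)+\)")
match = naive.search(text)

# The greedy + runs to the last ")" it can find, so both blocks get swallowed.
print(match.group(0))  # ( name a ) game ( name b )
```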

I had done some research in Week 1, and decided that recursive regex was the best option to keep the code small and maintainable (it’s not really a regular grammar anymore, but that’s beside the point). PHP’s regex functions are built on PCRE (Perl Compatible Regular Expressions), so recursion is built in.

Now, to figure out how that works…

\((?:[^)(]+|(?R))*+\)

This is what I came up with. The start and end are quite similar to my first guess, but let’s take a look at the inner part of the expression.

The (?:[^)(]+|(?R))*+ is a non-capturing group, and it matches either [^)(]+ or (?R) zero or more times. [^)(]+ simply matches anything that isn’t a bracket, but (?R) is the real special sauce. It will cause the pattern to recursively match itself, which means any nested brackets are matched in this group.

This leaves only the top-level closing bracket to match the outermost (the first) opening bracket, giving us exactly what we want!

regex101 – Yay it works!

Once we have this data, we can extract the checksum data inside the brackets using a much simpler parsing technique, splitting by spaces. Checksum data is in the format rom ( values ). We then split the values by spaces to get the name, size, and checksum value. Quite straightforward compared to what we just did! We can store this data as key-value pairs, which are called associative arrays in PHP.
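As a rough illustration, here is the same idea in Python, where the associative array becomes a dict. The parse_rom name and the sample field names (name, size, crc) are assumptions for the example, and the real parser does this in PHP.

```python
def parse_rom(entry):
    """Turn 'rom ( key value key value ... )' into a dict of key-value pairs."""
    # Keep only what sits between the first "(" and the last ")".
    inner = entry[entry.index("(") + 1 : entry.rindex(")")]
    tokens = inner.split()
    # Even positions are keys, odd positions are their values.
    return dict(zip(tokens[0::2], tokens[1::2]))


rom = parse_rom("rom ( name intro.dat size 1024 crc 1a2b3c4d )")
```

Note that everything stays a string here, and a filename containing a space would throw the key-value pairing off.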

Part 2: Inserting the data into the DB

I’ll keep this part short – I simply needed to loop through the data, extract the metadata we need (only the engine name for now) and insert into the right tables.

Everything was easy to do, but when I was testing it out with large DAT files, it took forever to actually run. Why? Because insert queries, when executed one by one, are very slow. The largest of the DAT files have ~100k files with 3 checksums each, and each file needs 4 insertions. That’s well over a million queries.

The fix for this was easy enough: just wrap it in a transaction. This reduced the running time to a much more manageable 2 minutes. Good enough for now!
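The idea, sketched in Python with an in-memory SQLite database standing in for the real MySQL one: the table, its columns and the insert_checksums helper are all hypothetical, the point is only that the whole batch is committed once instead of once per row.

```python
import sqlite3


def insert_checksums(conn, rows):
    """Insert every row inside a single transaction."""
    # "with conn" commits once when the block succeeds and rolls back on an exception.
    with conn:
        conn.executemany(
            "INSERT INTO checksum (file, kind, value) VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checksum (file TEXT, kind TEXT, value TEXT)")

# Made-up rows, just to have something to insert.
rows = [(f"file{i}.dat", "md5", f"{i:032x}") for i in range(1000)]
insert_checksums(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM checksum").fetchone()[0]
```

As a bonus, a failure halfway through a DAT file leaves the database untouched instead of half-filled.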

Part 3: CLI application

The CLI application is still a work in progress at the moment, but I wanted to mention it here since I got the most important functionality out of the way – calculating the checksums of all the files in a given directory.

This gives the devs that have game files an easy way to create DAT files similar to the ones we were parsing earlier, so that they can then add the checksum data into the database.

There’s still stuff left to do – along with actually creating the interface part of the application, I also have to write the data into a DAT file. Shouldn’t be too hard, since it is basically the inverse of the parsing functionality we made earlier.

That’s all for now, hope you check in next week for more progress!

Thanks for reading!

Week 1

My expectations for the week

The first week of the official coding period has arrived!

This week I wanted to focus on the implementation of the DB schema, and filling it with its initial seed values.

Part 1: Database Schema

At the start of the summer, the ScummVM team gave me a database schema that they had decided on for the system, and tasked me with implementing the schema in the form of a MySQL database using PHP for the backend.

I spent the first day trying to decipher the complex-looking diagram and converting it to code. First order of business was to brush up on my database design knowledge. Figuring out the purpose of each table, understanding the relationships between their entities, and poking and prodding to see if I could spot any holes in the design. (Nothing as of yet!)

I wrote down the queries to create the db and its tables with the correct relations, but it was still missing something major – data!

Part 2: Adding data to the DB

I spent days 2 and 3 creating dummy values for the db in an attempt to properly understand how the various entities related to each other. I ended up finding some issues with my schema implementation, so it was worth the effort!

The rest of the week was spent on a more pressing task – parsing checksum data from DAT files, created in clrmamepro, into the database. This is something I had no idea how to do in PHP, so I spent quite a while learning the ropes on how to handle strings and regex in PHP. (It’s surprisingly easy to do.)

While this part is not quite finished, I have a good idea of how to break up the data and pass it to insertion queries that will eventually send it to the db.

All in all, it’s been quite an eventful week, and I learnt a lot! I’ll probably post again soon once I finish up parsing the DAT files, hope you check in for that!

Thanks for reading!