Ellen's GSoC Blog

Week 4: MADE

Introduction

During this week of GSoC, I mostly focused on adding TTS to MADE, with some work on ADL as well. A PR has been made for MADE, though there may be more work for it in the future.

MADE

Most of my week was dedicated to adding TTS to MADE. MADE, or at least Return to Zork, offered many new challenges. For one, rather than displaying text one piece at a time, as most of the engines I’ve worked with do, Return to Zork displays several pieces every frame across multiple channels. This meant that I couldn’t simply track the previously said text in a single variable to avoid speech loops, as it would be repeatedly overwritten.

To solve this problem, I tried several approaches. First, I looked for the function that actually sets most of the text, which turned out to be sfReadMenu. This function does set the text one piece at a time, which would be good for voicing. However, sfReadMenu is called several times when a new scene loads, even when the text that it’s setting isn’t visible, resulting in extraneous text being voiced. I checked for flags that could identify this text as invisible, but I didn’t find anything reliable, so I scrapped this idea. I then thought of tracking several previously said texts and their channels in an array, but this seemed unnecessarily cumbersome, especially because the channels change often. Eventually, I settled on adding a variable for the previously said text to each channel itself, and using this variable to check whether text should be voiced. This worked quite well in most cases, and after accounting for exceptions where the spoken text should be queued, it resulted in good TTS for much of the game.
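As a rough illustration of the per-channel approach, here is a minimal sketch. It is not the code from the PR: TextChannel, previousSaidText, and collectTextToVoice are hypothetical names, and the real engine’s channel type certainly looks different.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one of the engine's text channels.
struct TextChannel {
    std::string text;             // text drawn by this channel in the current frame
    std::string previousSaidText; // last text from this channel that was sent to TTS
};

// Returns the texts that should be voiced this frame. Each channel keeps its
// own record, so text in one channel cannot overwrite the record of another.
std::vector<std::string> collectTextToVoice(std::vector<TextChannel> &channels) {
    std::vector<std::string> toVoice;
    for (TextChannel &channel : channels) {
        if (channel.text.empty() || channel.text == channel.previousSaidText)
            continue; // nothing new in this channel
        channel.previousSaidText = channel.text;
        toVoice.push_back(channel.text);
    }
    return toVoice;
}
```

Calling this once per frame voices each piece of text only when it changes within its own channel, which is what avoids the speech loop.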

MADE, however, had a few additional problems. For one, it seems to handle hovering over objects completely within the game scripts, which means that there’s no easy place to detect when the cursor hovers over buttons in Return to Zork’s save/load screen. After trying to search for possible conditions I could use, I eventually decided to recreate the click boxes for the save, load, and cancel buttons, as well as the text entry boxes for new save slots. Fortunately, none of the text for these buttons is in the form of an image, which meant I could find the IDs for each individual object or menu that displays them and use these IDs to get their text, a strategy that should work across languages.
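Recreating the click boxes comes down to a simple hit test. The sketch below assumes a hypothetical ClickBox type; the coordinates and object IDs used with it would be placeholders, not values taken from Return to Zork.

```cpp
#include <vector>

// Hypothetical click box for a button or text entry box.
struct ClickBox {
    int left, top, right, bottom; // right and bottom are exclusive
    int textObjectId;             // placeholder ID used to look up the label text

    bool contains(int x, int y) const {
        return x >= left && x < right && y >= top && y < bottom;
    }
};

// Returns the text object ID under the cursor, or -1 if the cursor is over none.
int findHoveredObject(const std::vector<ClickBox> &boxes, int x, int y) {
    for (const ClickBox &box : boxes) {
        if (box.contains(x, y))
            return box.textObjectId;
    }
    return -1;
}
```

Because the lookup returns an ID rather than a string, the text itself can still be fetched from the game data, which is what keeps the approach independent of the language.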

Another issue arises with the tape recorder in Return to Zork. The name and numbers of each tape recorder entry are displayed within channels, but the labels that correspond to these pieces, such as “name” or “trk”, are drawn in sfDrawText, which results in awkward, out-of-order voicing (for example, when the track is 001 and the name is Wizard Trembyle, TTS would say “trk, name, 001, Wizard Trembyle”). Fortunately, this could easily be avoided by not voicing these labels in sfDrawText, and instead fetching them when voicing the channel text, resulting in cleaner voicing (such as “trk: 001, name: Wizard Trembyle”). Unfortunately, the tape recorder is interactive, which further complicates matters. Voicing each piece separately can cause issues if the user switches between entries that share a number, such as two entries that both have max track 001: the previously said text won’t change, so the number won’t be voiced at all. Getting around this involved voicing all pieces (name, track, and max track) whenever the name changes, as it should be unique for each entry, alongside voicing the track individually in specific situations to account for the user switching tracks. Voicing the time was another matter: if it’s voiced whenever it changes, then the TTS system will repeatedly try to voice it as it ticks upward, resulting in awkward interrupts or unnecessary lag. I eventually decided that it was best to voice it when the sound clip ends, so it’s only voiced at the moment it stops moving. These techniques involved keeping track of flags and the status of the previously said variables, since they’re modified and retrieved each frame, but they seem to have resulted in more responsive voicing of the tape recorder.
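The rule of voicing everything when the name changes can be sketched as follows. TapeEntry and TapeVoicer are hypothetical types, and the “max” label is an assumption made for the example; only “trk” and “name” are known labels.

```cpp
#include <string>

// Hypothetical snapshot of what the tape recorder shows in one frame.
struct TapeEntry {
    std::string name;
    std::string track;
    std::string maxTrack;
};

struct TapeVoicer {
    std::string previousName;
    std::string previousTrack;

    // Returns the text to voice for this frame, or an empty string if nothing changed.
    std::string update(const TapeEntry &entry) {
        std::string result;
        if (entry.name != previousName) {
            // New entry: voice every piece, even if its numbers match the old entry.
            result = "trk: " + entry.track + ", max: " + entry.maxTrack + ", name: " + entry.name;
        } else if (entry.track != previousTrack) {
            // Same entry, but the user switched tracks.
            result = "trk: " + entry.track;
        }
        previousName = entry.name;
        previousTrack = entry.track;
        return result;
    }
};
```

Keying the full announcement on the name means two entries with identical track numbers are still voiced in full when the user moves between them.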

In conclusion, MADE was a little more challenging than expected, since Return to Zork has several simultaneous channels and handles much of its logic within the game scripts. Nonetheless, it was a fun experience adding TTS to it. It seems finished now, though it remains to be seen how well it will handle different games or translations, and it has a few hardcoded translations that need verification.

ADL

ADL has seemed quite easy so far. As an engine for rather simple text-based games, almost all of its text is handled by Display::printString and Display::printAsciiString. It doesn’t even require tracking the previously said text, since these methods are called only once for each piece of text, and there are no buttons that require responsive voicing. Thus, I think I’ve almost finished TTS for it, though challenges may arise later.

Conclusion

Over week 4 of GSoC, I’ve completed more TTS implementations, with MADE finished and ADL mostly finished. MADE offered the most challenges, due to its different ways of handling text, but its lack of text in the form of images was welcome. It’s been an interesting week, and I’m excited to explore more engines. Next week, I’ll be looking to finish TTS for ADL and begin work on TTS for Parallaction.

Week 3: Draci

Introduction

Another week of GSoC has passed, with more text-to-speech work done, as a PR has been opened adding TTS to Draci. In addition, TTS for WAGE and CruisE has been merged, and TTS for Cine has been more rigorously tested.

Draci

Much of my time this week was invested in adding TTS to Draci. In terms of text complexity, Draci was somewhat below average: most of its text was displayed in a few places, which were easy to track down. There were, however, several considerations regarding the engine. For one, Draci’s Czech and Polish translations have full (or almost full) voiceovers, while the German and English translations only have subtitles. To account for these differences, I split TTS into subtitles and objects, my first time doing so, and had to refrain from voicing subtitles for versions with voiceovers. Beyond that, I had to implement three things. First, the subtitles now last as long as the TTS is speaking, as they normally move so fast that the speech can’t keep up. Second, TTS covers missing voice clips in the Czech and Polish versions, which received its own option at the suggestion of my mentor. Third, the engine recognizes cases where the text is only punctuation, which isn’t voiced by the TTS system and would thus be immediately skipped (the text remains on screen for as long as TTS is speaking, meaning that if the text isn’t voiced, the system immediately believes that it’s time to move to the next subtitle). All of these were interesting exercises in timing and accommodating different versions.
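The punctuation check and the subtitle timing can be sketched like this. Both functions are hypothetical, and falling back to the normal timer for punctuation-only text is an assumption about how such a case could be handled, not a description of the actual PR.

```cpp
#include <cctype>
#include <string>

// True if the text contains no letters or digits, so TTS would have nothing to say.
bool isOnlyPunctuation(const std::string &text) {
    for (unsigned char c : text) {
        // Bytes >= 0x80 are assumed to belong to an accented letter.
        if (c >= 0x80 || std::isalnum(c))
            return false;
    }
    return true;
}

// Decides whether a subtitle should stay on screen. ttsSpeaking, elapsedMs, and
// normalDurationMs are hypothetical inputs supplied by the caller.
bool keepSubtitle(const std::string &text, bool ttsSpeaking, int elapsedMs, int normalDurationMs) {
    if (isOnlyPunctuation(text))
        return elapsedMs < normalDurationMs; // nothing is voiced, so use the timer alone
    return ttsSpeaking || elapsedMs < normalDurationMs;
}
```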

In my opinion, one of the most interesting components of Draci was the encoding. Draci uses encodings not present in ScummVM’s CodePage enum, which meant I had to hunt them down and manually write a translation table. Fortunately, the Dragon History website states that the Czech, English, and German translations all use Kamenický encoding, with the German translation having an exception for ß. This simplified the process greatly, as it just required creating a translation table for Kamenický encoding and converting the bytes, a similar process to the fix for TeenAgent’s Russian translation. Unfortunately, the Polish encoding is only described by the same website as “some ridiculous proprietary encoding”. I initially tried observing the bytes of the in-game strings to figure out the encoding for each character, but I worried that I’d miss a few characters. Instead, I found the character sets in the original Draci source code, which showed the UTF-8 bytes of each character as if they were characters in the Windows-1250 code page. This meant that all I had to do was map each of these characters back to its Windows-1250 byte, then combine the bytes to obtain the UTF-8 encodings for the Polish characters. That was most of the work for encodings, aside from altering the encoding table for German and English to replace certain Czech characters with equivalents in those languages, as TTS for the credits struggled to pronounce those characters.
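To make the Windows-1250 step concrete, here is a small sketch of the idea. The table is deliberately partial (just enough for the example), and recoverUtf8 is a hypothetical helper, not a function from ScummVM or the Draci source.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Partial Windows-1250 table: code point of the displayed character -> original byte.
// A complete table would cover the whole code page.
static const std::map<char32_t, uint8_t> kWindows1250ToByte = {
    {0x00C4, 0xC4}, // A with diaeresis
    {0x0139, 0xC5}, // L with acute
    {0x2026, 0x85}, // horizontal ellipsis
    {0x201A, 0x82}, // single low-9 quotation mark
    {0x2122, 0x99}, // trade mark sign
};

// Converts text that shows UTF-8 bytes as Windows-1250 characters back into
// those bytes. Characters missing from the table are replaced with '?'.
std::string recoverUtf8(const std::vector<char32_t> &displayed) {
    std::string bytes;
    for (char32_t c : displayed) {
        if (c < 0x80) {
            bytes.push_back(static_cast<char>(c));
            continue;
        }
        auto it = kWindows1250ToByte.find(c);
        bytes.push_back(it != kWindows1250ToByte.end() ? static_cast<char>(it->second) : '?');
    }
    return bytes;
}
```

For example, the pair “A with diaeresis” followed by an ellipsis maps back to the bytes C4 85, which is the UTF-8 encoding of the Polish letter ą (U+0105).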

Ultimately, Draci’s encodings were interesting to explore, and I’m very glad that the Dragon History website had them listed.

MADE

At the end of the week, I started work on MADE, but more work must be done for it. It seems that Return to Zork handles text a little differently from what I’m used to, as it displays text in several simultaneous channels every frame. My usual solution of tracking the previously said text with a variable won’t work here, as it’ll be overwritten by the other text. I’ve thought of a few possible solutions, including tracking several different previously said texts or finding where the text is set (which I may have already found, but I need to do more investigation), and I plan to explore them over this week.

Miscellaneous

Aside from Draci and MADE, I fixed a few problems with CruisE (thanks to my mentor for noticing that the encoding was wrong for some of the versions, among other necessary changes) and tested TTS for Cine on other versions. For Cine, I was worried that my method of checking image names for text in the form of an image would fail across different versions of Future Wars and Operation Stealth, but I was pleasantly surprised to not find any issues, at least in my testing. The only changes I ended up making were adding a few new translations, fixing some small issues with the encoding for the German and French translations, adding line-by-line TTS for the Operation Stealth copy protection screen (which I initially had, but removed due to fears that it wouldn’t work across versions), adding TTS for Future Wars’s grid copy protection screen, and adjusting for exceptions across versions (such as the US Amiga version of Future Wars using a different copy protection screen from all other Amiga versions). Cine is thus much more thoroughly tested across different game editions now.

Conclusion

This week involved more work on TTS, with testing for Cine, TTS implementation for Draci, and the beginnings of TTS implementation for MADE. Exploring Draci’s encodings was the most entertaining part, and I hope to bring over my more in-depth knowledge of encoding methods into future implementations. Next week, I’ll be finishing TTS for MADE and beginning TTS for ADL, alongside any changes I need to make to my open PRs for Cine and Draci.

Week 2: CruisE

Introduction

Another week of GSoC has passed, and I’m fairly satisfied with my progress. This week, I mostly focused on adding TTS to CruisE, and I’ve opened a PR for it. However, I also spent some time working on Draci, and on updating older PRs.

CruisE

I began this week with adding TTS to CruisE. Fortunately, CruisE wasn’t too difficult to work with, as it has very few cases of text in the form of an image. In addition, most of Cruise for a Corpse’s text is displayed from renderText, which made it much easier to identify where text is displayed in the code. The difficulty with Cruise for a Corpse, however, came from making it user-friendly. For example, text can’t simply be voiced from renderText, as there are cases when the text appears on screen a significant amount of time after renderText is called, or cases where the text isn’t visible, such as when the user tries to open the inventory in the copy protection screen. In a similar manner, text can’t be voiced from createMenu, because there are times when it shouldn’t be voiced (mainly when the inventory is empty). It also can’t be voiced from createTextObject, which handles much of the game’s text, because freezeCell is sometimes called on objects created with this method, which means that the text appears on screen later. Voicing text for CruisE, therefore, required addressing these exceptions accordingly, such as by storing the text from createTextObject and voicing it when the cell is unfrozen; queuing the text of the first menu item hovered over after a menu is opened instead of interrupting the current speech, so the title of the menu is always voiced; and checking for button input in Op_GetMouseButton to stop TTS when skipping through dialog or a cutscene. I believe that these changes make the TTS for CruisE more responsive and more accurate to the gameplay.
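The idea of storing text and voicing it once the cell is unfrozen can be sketched as below. TextCell, SpeechLog, onTextCreated, and onCellUnfrozen are hypothetical stand-ins for illustration and do not mirror CruisE’s actual structures.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a text cell that can be frozen (shown only later).
struct TextCell {
    std::string text;
    bool frozen = false;
    bool voiced = false;
};

// Records what would be sent to TTS; a stand-in for a real speech backend.
struct SpeechLog {
    std::vector<std::string> spoken;
    void say(const std::string &text) { spoken.push_back(text); }
};

// Called when a text object is created: voice it now only if it is visible.
void onTextCreated(TextCell &cell, SpeechLog &tts) {
    if (!cell.frozen) {
        tts.say(cell.text);
        cell.voiced = true;
    }
}

// Called when a cell is unfrozen: voice the stored text if it was held back.
void onCellUnfrozen(TextCell &cell, SpeechLog &tts) {
    cell.frozen = false;
    if (!cell.voiced) {
        tts.say(cell.text);
        cell.voiced = true;
    }
}
```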

Ultimately, actually finding where text was displayed in CruisE wasn’t that difficult. The more time-consuming part was finding and accounting for the many exceptions, skips, and places in the code that freeze, show, or hide text, in order to create a user-friendly experience. It was entertaining to look through the code and ponder the best implementation, as its code was more spread out than that of Cine or WAGE.

Miscellaneous

After CruisE, I worked on various components related to my project. For one, I revisited my WAGE PR, which required a fair number of changes, including resolving some difficulties across games that I missed (in some cases, different WAGE games handle the same circumstances surprisingly differently, such as Eidisi only calling renderSubmenu when hovering over a new item, while Ray’s Maze and Another Fine Mess call it every frame, requiring a check for the last submenu item that was hovered over in these games). I also worked on fixing TTS for TeenAgent’s Russian translation, which uses a custom encoding that has to be replicated for proper voicing – fortunately, it seemed that this encoding followed a simple pattern of adding a number to the original character to obtain a Cyrillic character in UTF-8, though since I don’t speak Russian, I’m uncertain if this works in all cases.
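The offset pattern for the Russian text might look something like the sketch below. The first custom byte and the offset are hypothetical parameters chosen for the example; they are not TeenAgent’s real values.

```cpp
#include <cstdint>
#include <string>

// Encodes a code point below U+0800 as UTF-8 (one or two bytes).
std::string encodeUtf8(uint32_t codePoint) {
    std::string out;
    if (codePoint < 0x80) {
        out.push_back(static_cast<char>(codePoint));
    } else {
        out.push_back(static_cast<char>(0xC0 | (codePoint >> 6)));
        out.push_back(static_cast<char>(0x80 | (codePoint & 0x3F)));
    }
    return out;
}

// Converts bytes from a custom encoding by adding a fixed offset to every byte
// at or above firstCustomByte. Both parameters are assumptions for illustration.
std::string convertWithOffset(const std::string &input, uint8_t firstCustomByte, uint32_t offset) {
    std::string out;
    for (unsigned char c : input)
        out += encodeUtf8(c >= firstCustomByte ? c + offset : c);
    return out;
}
```

With an assumed offset of 0x390, the byte 0x80 becomes U+0410 (the Cyrillic capital А), which encodes to the UTF-8 bytes D0 90.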

Beyond that, I started work on TTS for Draci. So far, Draci seems even simpler than CruisE, with fewer exceptions and oddities. Good progress has been made on Draci, but there is always a chance that something unexpected will emerge.

Conclusion

CruisE was a somewhat more complex engine to add TTS to than WAGE or Cine, and it was entertaining to hunt down its different exceptions and learn how it handles its inputs and menus. A PR has been opened for it, and while I imagine that there’ll be more work for it in the future, the hardest part for it is done. Next week, I’ll be continuing work on Draci, and I look forward to completing TTS support for it.

Week 1: WAGE and Cine

Introduction

The first week of GSoC is over, and I’m fairly happy with how it went. My TeenAgent PR was merged, and I’ve opened two new PRs: one adding TTS to WAGE and one adding TTS to Cine. They’re still under review, with Cine requiring more testing and translation verifications, and I imagine that there will be more work to be done with them, but the hardest parts – getting familiar with the engines and adding TTS to most of their text – are over.

WAGE

I started this week with WAGE, which was a relatively simple engine. There was an abandoned PR adding TTS to this engine that I picked up, but while it provided a base for me to work with, it was buggy and unfinished, meaning there was still a significant amount of work to be done. As for the engine itself, WAGE’s games were almost entirely text-based, with few graphical components. For me, this had its benefits and disadvantages. On the positive side, I didn’t have to worry much about hovering over objects, and finding where text was output to the screen wasn’t too difficult: nearly all of the text was in the form of the action log, which was modified by only a small number of functions. On the negative side, the greater quantity of text meant that there was more to look at and be aware of.

For the most part, adding TTS to WAGE was simple. It was a matter of adding a toggle, identifying the small handful of methods that displayed text, and feeding that text to the TTS manager. There was no need to clean up any of the text. I soon encountered a caveat, however, with how WAGE handles its command menu. Rather than embedding the menu in the engine, WAGE uses the MacWindowManager class to manage it, including submenus and dialogs. My first approach was to identify and process this text inside WAGE’s Gui::processEvent method by retrieving the menu item that the mouse is over from the MacWindowManager and voicing its text. This worked fine initially, until I tried to voice the buttons: once a MacDialog is open, it pauses the loop inside of WageEngine::run, which is what runs Gui::processEvent. Without Gui::processEvent running, I couldn’t use it to check for the mouse hovering over a button. At this point, I realized that it would be difficult to keep everything exclusively in WAGE itself, so I added TTS code to MacWindowManager, which worked much better. I did end up restoring some of my original code to Gui::processEvent for menu items, since MacMenu didn’t seem to have a trigger for hovering over a menu item (only for clicking one, and I wanted to voice the menu item as soon as the user hovers over it). The end result was TTS working as expected for the command menu.

Ultimately, WAGE wasn’t particularly difficult, but the fact that it used MacWindowManager for much of its GUI was an initial challenge. With a little extra code, however, it functioned fine.

Cine

After WAGE, I worked on adding TTS to Cine. Cine’s games, Future Wars and Operation Stealth, have much less text than WAGE games, and much of it is displayed through a few methods. Voicing the majority of the engine’s text was as simple as voicing the text passed to a small handful of methods, mainly drawMessage, drawMenu, and drawCommand. From there, all that was necessary was making sure it all behaved in a user-friendly way, like voicing the “USE” and “INVENTORY” commands when using the F3 and F4 keys and vocalizing inventory items.

Unfortunately, Cine came with a rather large problem: it has a lot of text in the form of images. Just about any text in Future Wars and Operation Stealth that isn’t directly related to gameplay (i.e. commands and menus) is an image. This includes credits, some cutscene text in Operation Stealth, and everything in the copy protection screens. Two problems resulted from this. For one, much of Cine’s work is handled by global and object scripts built into the files, which meant that there was no easy location to find where these images are displayed. As a result, I had to go to the methods that render general images and catch the exact conditions (PRC name, object index and frame, background name, and so on) under which they display. For another, I needed to know the text of these images in all supported languages and versions, so I could hardcode it. This meant a mixture of looking through videos on YouTube and asking the community (thanks to my mentor, criezy, for providing the copy protection images in French, and eientei for providing the German, Spanish, and Italian copy protection fail texts for Operation Stealth). It also meant keeping track of exceptions: Future Wars has two different copy protection screens, one for DOS and one for Amiga and Atari ST; Operation Stealth’s opening credits have a credit for the IBM version only in the DOS version, but its end credits have this credit in all versions; and Future Wars has a different opening title screen for the French version (“Les Voyageurs du Temps: La Menace” as opposed to “Future Wars: Time Travellers” for Amiga and Atari or “Future Wars: Adventures in Time” for DOS). Accounting for these exceptions meant more checks and more text to include, but it has been done.
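One way to picture the condition matching is a lookup table keyed on the circumstances under which an image is drawn. This is only a sketch: ImageKey and lookupImageText are hypothetical, and any file names or strings put in the table would be placeholders rather than real game data.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical key describing the conditions under which a text image is drawn.
struct ImageKey {
    std::string prcName;
    int objectIndex;
    int frame;

    bool operator<(const ImageKey &other) const {
        return std::tie(prcName, objectIndex, frame) <
               std::tie(other.prcName, other.objectIndex, other.frame);
    }
};

// Returns the hardcoded text for an image, or an empty string if none is known.
std::string lookupImageText(const std::map<ImageKey, std::string> &table, const ImageKey &key) {
    auto it = table.find(key);
    return it != table.end() ? it->second : std::string();
}
```

A real table would also need to account for the language and platform, since the same image can carry different text across versions.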

In the end, getting the copy protection screen and credits to be voiced was the bulk of the work for Cine. Finding the different translations, deciphering the best place to voice them, and adding a new method to recognize hovering over buttons in the copy protection screens was time-consuming, but entertaining.

Conclusion

WAGE and Cine weren’t too difficult to add TTS to, and it was fun to implement. The most time-consuming part was working with Cine’s copy protection screens. There were some difficulties, but through enough investigation, they’ve been resolved. However, as of this blog post, a fair number of translations in Cine need to be verified, and other versions of the games have to be tested.

Next week, I’ll be focusing on adding TTS to the CruisE engine. I’m looking forward to exploring it.

Week 0: Introduction

Hello! I’m Ellen, a second-year undergraduate computer science student. Over the summer, I’ll be adding text-to-speech to several ScummVM engines to enhance their accessibility and assist language learners. I’ve already worked on adding TTS to two engines (Drascula and TeenAgent), and I hope to continue the process for other engines. I’m excited to work on this project!
