Introduction
During this week of GSoC, I mostly focused on adding TTS to MADE, with some work on ADL as well. A PR has been made for MADE, though there may be more work for it in the future.
MADE
Most of my week was dedicated to adding TTS to MADE. MADE – or at least, Return to Zork – offered many new challenges. For one, rather than displaying text one piece at a time, like I’ve seen in most of the engines I’ve worked with, Return to Zork displays several pieces every frame across multiple channels. This meant that I couldn’t simply track the previously said text in a single variable to avoid speech loops, as it would be repeatedly overridden. To solve this problem, I tried several different approaches. First, I tried finding the function that actually sets most of the text, which I found in the form of sfReadMenu
. This function does set the text one piece at a time, which would be good for voicing. However, the issue with it is that sfReadMenu
is called several times when a new scene loads, even when the text that it’s setting isn’t visible, resulting in extraneous text being voiced. I tried to check for any flags that could identify this text as invisible, but I didn’t find anything good to use, so I scrapped this idea. I then thought of tracking several different previously said texts and their channels in an array, but this seemed unnecessarily cumbersome, especially because the channels are often changing. Eventually, I settled on adding a variable for the previously said text to the individual channels themselves, and using this variable to check if text should be voiced. This worked quite well in most cases, and after accounting for exceptions where the spoken text should be queued, it resulted in good TTS for much of the game.
MADE, however, had a few additional problems. For one, it seems to handle hovering over objects completely within the game scripts, which means that there’s no easy place to detect when the cursor hovers over buttons in Return to Zork’s save/load screen. After trying to search for possible conditions I could use, I eventually decided to recreate the click boxes for the save, load, and cancel buttons, as well as the text entry boxes for new save slots. Fortunately, none of the text for these buttons is in the form of an image, which meant I could find the IDs for each individual object or menu that displays them and use these IDs to get their text, a strategy that should work across languages.
Another issue arises with the tape recorder in Return to Zork. The text for the name and numbers on each tape recorder entry are displayed within channels, but the text that corresponds to these pieces – such as “name” or “trk” – are in sfDrawText
, which results in awkward, out-of-order voicing (for example, when the track is 001 and the name is Wizard Trembyle, TTS would say “trk, name, 001, Wizard Trembyle”). Fortunately, this could easily be avoided by not voicing these pieces in sfDrawText
, and instead fetching this text when voicing the channel text, resulting in cleaner voicing (such as “trk: 001, name: Wizard Trembyle”). Unfortunately, the tape recorder is interactive, which further complicates matters. Voicing each piece separately can result in issues if the user switches between entries where a number is the same, such as one entry having max track 001 and the other having max track 001, since the previously said text won’t change and thus it won’t be voiced at all. Getting around this involved voicing all pieces – name, track, and max track – whenever the name changes, as it should be unique for each entry, alongside voicing the track individually in specific situations to account for the user switching tracks. Voicing the time was another matter: if it’s voiced whenever it changes, then the TTS system will repeatedly try to voice it as it ticks upward, resulting in awkward interrupts or unnecessary lag. I eventually decided that it was best to try to voice it when the sound clip ends, so it’ll only be voiced at the moment it stops moving. These techniques involved keeping track of flags and the status of the previously said variables, since they’re modified and retrieved each frame, but it seems to have resulted in more responsive voicing of the tape recorder.
In conclusion, MADE was a little more challenging than expected, since Return to Zork has several simultaneous channels and handles much of its logic within the game scripts. Nonetheless, it was a fun experience adding TTS to it. It seems finished now, though it remains to be seen how well it will handle different games or translations, and it has a few hardcoded translations that need verification.
ADL
ADL has seemed quite easy so far. As an engine for rather simple text-based games, almost all of its text is handled by Display::printString
and Display::printAsciiString
. It doesn’t even require tracking the previously said text, since these methods to display text are called only once, and there are no buttons that require responsive voicing. Thus, I think I’ve almost finished TTS for it, though challenges may arise later.
Conclusion
Over week 4 of GSoC, I’ve completed more TTS implementations, with MADE finished and ADL mostly finished. MADE offered the most challenges, due to its different ways of handling text, but its lack of text in the form of images was welcome. It’s been an interesting week, and I’m excited to explore more engines. Next week, I’ll be looking to finish TTS for ADL and begin work on TTS for Parallaction.