Category: Uncategorized

Week 10: AGI

Introduction

This week, I opened a PR for adding text-to-speech to AGI. It was the last engine marked on my proposal, which means that I have now made PRs for every engine that was planned for my project. This is a major milestone, and gives me time to work on Gob as a stretch goal.

AGI

I spent much of the week working on AGI, which was fortunately a rather simple engine. While it supports many different games and fangames, most of these games seem to function very similarly: there may be introductions or other screens that display text using TextMgr::display, with most other text being displayed through popup windows, typed commands, or menus. Thus, AGI was fairly straightforward, requiring only a handful of TTS calls in key methods, as well as some consideration for when to voice the status menu and clock, which are always visible – I settled for voicing one or the other primarily when the game is loaded, the status menu changes, and the clock is enabled, to avoid voicing them too frequently – and when to stop TTS.

The major concerns with AGI were how it handles timing and TextMgr::display across games. Some games appear to use timer variables to signal when it’s time to change the text on screen, while others use when a sound ends. Therefore, from what I could tell, there was no consistent way to predict when text will change, which made it difficult to delay text changes until TTS is finished. I eventually settled for queuing text and delaying room and window changes until TTS finishes, which seems to be enough to allow TTS to voice everything displayed on screen in a reasonable manner. In addition, TextMgr::display differs between games: some call it only once, while others call it every frame, which necessitates keeping track of the previously said text to avoid speech loops. However, much of the text passed through this method is displayed in sentences or chunks, rather than all at once in a single call. This makes simply tracking the previously said text fail and results in awkward, choppy voicing when a sentence is broken across lines. My solution to this issue was to combine the text passed to TextMgr::display in a _combinedText variable, and then voice this when the game script returns or halts. Using this method, paragraphs and blocks of text are spoken cleanly as one sentence, and the previously said text can simply be set to this combined text to prevent speech loops. I also decided to add in newlines between these pieces of text if they aren’t displayed on subsequent rows, since in most cases, pieces of text that aren’t in subsequent rows shouldn’t be voiced as one quick sentence.

In conclusion, AGI was a fairly simple engine, since its games didn’t have much complexity. Most of the work went into handling timing and subsequently displayed text, as these factors can differ significantly between its many games. Otherwise, AGI fangames seem to have little variation in how they display text, which simplified work on it, though there’s a chance that games that aren’t fangames may function differently, as I mainly worked with fangames.

Gob

After opening a PR for AGI, I started work on Gob. So far, Gob doesn’t seem too complex: its text seems limited and appears to be displayed through only a few select methods. However, certain games, like Adibou 2, separate text from many of their buttons, which requires a technique to sync them. So far, I’ve decided on a method of checking whether the displayed text intersects with a hotspot, then expanding the collision rectangle of the text accordingly, which seems to work fairly well. Nonetheless, I would like to see if I can make this solution more robust for better compatibility, as it seems that Gob supports a variety of games with many differences between them.

Conclusion

This week, I opened a PR for my last planned engine, AGI, and started work on Gob as a stretch goal. I plan to have a PR up for Gob by the end of this week, which may possibly give me enough time to start another stretch goal.

Uncategorized

Week 9: SCUMM

Introduction

This week, I opened a PR for adding text-to-speech to SCUMM, which was an interesting engine to work on. I’m quite happy with the result, though with the wide variety of games supported by SCUMM, it may need more work in the future.

SCUMM

Most of my week was spent on adding TTS to SCUMM, an engine that was neither especially difficult nor especially easy to work on. On one hand, SCUMM games are very similar to most of the games I’ve worked on before. Most of them have few menus and rather simple means of displaying text, unlike engines like Efh or MM. Thus, creating a user-friendly TTS system was fairly easy for SCUMM. In addition, finding where text is displayed wasn’t difficult: most of it goes through printString and drawString, with actor speech being displayed with displayDialog. Furthermore, most of the GUI controls and buttons come with labels built into them, which makes voicing buttons when they’re hovered over a very simple process. As such, much of the work for SCUMM involved simply adding TTS voicing calls to these functions, while accounting for situations such as subtitles for voiced dialog and the need to delay the disappearance of text until TTS is finished.

On the other hand, however, SCUMM is larger than some of the engines I’ve worked with previously, and it supports many different games. This means that compatibility issues are a recurring problem, as what works for TTS in one version of the SCUMM engine may not work in others. A good example of this is verbMouseOver. When I first came across this method, I thought that it would be a great place to voice verbs when they’re hovered over. However, SCUMM version 5 – or at least, Indiana Jones and the Fate of Atlantis – doesn’t seem to reliably use this function for detection of hovering over verbs. In addition, while some games only call drawVerb once as a verb is hovered over, games like Fate of Atlantis call it every frame. Thus, to try to voice verbs as robustly as possible, I decided to add code to drawVerb, which most SCUMM versions seem to go through for verb drawing, to check whether the current verb is highlighted before voicing it, a strategy that seemed to work for many games. There were other compatibility issues that I had to resolve as well, such as SCUMM versions 0, 1, and 2 using custom text encodings that needed to be replicated; drawVerb sometimes being used to print strings that aren’t verbs, requiring them to be voiced even if they aren’t highlighted; and Passport to Adventure having a special help menu with buttons that aren’t considered GUI controls, which required a means of storing the text for each button and detecting when they’re hovered over.

Another concern was that SCUMM versions 7 and 8 use their own methods for displaying text, though they were fortunately similar enough to those used by earlier versions that it wasn’t too difficult to voice them. Humongous Entertainment games also seem to have different means of handling text, but because they don’t have much text in the first place, I didn’t have to worry as much about them. Thus, most of the compatibility issues were in earlier SCUMM versions, with later ones having fewer problems.

Ultimately, the most difficult component of SCUMM was the wide variety of games supported. Each version has its own ways of handling text that have to be considered, requiring careful thought about the best places to voice or stop text. I’m fairly happy with my implementation of TTS for this engine, but because of its many games, there may be some oddities that will need to be resolved.

Conclusion

I opened a PR for SCUMM this week, which was an interesting engine to explore, due to its greater size and variety of versions. I also revisited my MADE PR, an engine that had its own compatibility issues, with text indices varying across game versions, that should be solved now. Next week, I’ll be working on AGI, the last engine listed on my project.

Uncategorized

Week 8: MM

Post author By ellen
Post date July 28, 2025
No Comments on Week 8: MM

Introduction

This week, I opened a PR for adding text-to-speech to MM, which proved to be one of the more challenging engines of my project. The process of adding TTS to it took up most of the week, but I’m quite satisfied with the result. Nonetheless, I do have concerns about its compatibility across versions and about the possibility of missed menus, so there may be more work for it in the future.

MM

I began work on MM last week, when I finished most of the difficult work for it, mainly in the form of coming up with a strategy to clean and parse text and voice buttons. However, after that was finished, there was a great amount of time-consuming work to be done. Might and Magic games – or at least, Xeen games – have a relatively high number of menus and user input. In general, each menu has its own dialogs file, which handles text, graphics, and menu input. On the positive side, this meant that it was quite easy to find text for menus. On the negative side, there is practically no pattern among this text, and it’s very often in an order that doesn’t match the order that it’s displayed on screen. Button text may be listed first, or it may be listed somewhere closer to the end; labels and their values may be listed next to each other, or they may be listed in completely different blocks – for instance, “Might, Intellect, Hit Points, 9, 5, 15” instead of “Might, 9, Intellect, 5, Hit Points, 15”, which would be much easier to voice; and button text may be listed in the order that the buttons themselves are displayed, or in a completely different order. This lack of pattern applies even across similar situations: for example, locations order button text at the end, except for banks, which order them at the beginning. Such a problem occurs because text is displayed using a wide variety of formatting characters for factors such as position and justification, but it seems that they can’t easily be used for voicing, since these traits alone are usually not predictive of the best voicing order.

This lack of standardization prompted several concerns, because it meant that voicing the text directly would result in awkward ordering that may not make sense. It was possible to simply let FontSurface::writeString – which I discovered last week – handle the voicing, as it’s seemingly used for all displayed text. Such a strategy would be easier to implement, but wouldn’t be nearly as responsive or logical, since no buttons would be voiced with input and text would often be voiced out of order. However, some text is in the correct order, and so it makes sense to use this method without changing anything in a few cases. Thus, I decided to only override voicing in FontSurface::writeString individually for each dialog where it’s necessary, allowing for much smoother and cleaner voicing and providing a means of always voicing any text that I may have missed.

Voicing this text with different code for each dialog was rather time-consuming. It required looking through each menu, seeing how it orders text, and splitting it up accordingly, with each dialog requiring a different process. I decided to split the cleaned text along newlines using a method called getNextTextSection, as it was the most reliable means of getting pieces of text, and made voicing each dialog only a matter of switching around getNextTextSection calls. Unfortunately, this still required a fair amount of work, especially because of the occasional occurrence of oddities, such as the party screen having twice the text after leaving the character creation screen, which causes the first repetition of text – an outdated version – to be voiced. Fortunately, this process of tailoring each dialog wasn’t particularly difficult, but it did take a while.

Thus, after deciding on a procedure for voicing buttons – which involves a separate array of strings, with each applicable button having an index that corresponds to a string in this array – and for cleaning and splitting text, most of the work for MM was simply time-consuming. Nonetheless, as I added support to more menus, I did have to consider whether my methods were optimal, and I went over several possibilities. I considered storing text directly in each button without using any index variable, but this would be unreliable: the strings for each button are rarely in the same order as the buttons themselves, which would require hardcoding indices (for example, _buttons[5] = getNextTextSection(...)). This strategy would break completely if, for some reason, the order that buttons are added to the _buttons array is changed. I also considered reordering the buttons themselves, but after this caused the wrong images to be rendered on the character info sheet, I decided it was too dangerous. In contrast, I highly doubt that the order of text taken directly from the game files will ever change, which makes the original method of keeping the buttons and their text – populated by hardcoded splitting – separate the safest option I could think of. Similarly, I considered whether my current method of passing a string to FontSurface::writeString to be cleaned was optimal, as it restricted TTS input to whenever this method is called. Perhaps catching the text earlier in the code could remove the need to split the string, as its fields would already be separate. Indeed, input to this method in nearly all cases can be found earlier, with many of its values split from it. Unfortunately, most of the text directly from the resources is already in a large block, which necessitates sorting anyway, but this time with the need to clean it as well (though I could perhaps recreate the cleaning process that exists in writeString, I wasn’t certain if I wanted to run the risk of missing an important step, and it seemed unnecessary). I also considered using the string’s formatting characters, but this seemed unreliable, as traits such as exact text position aren’t consistently indicative of the best voicing order, and it seemed easier to clean first and split later. Thus, I settled on many of my initial strategies, though I may change them in the future if a better idea arises.

Ultimately, working on MM was more challenging than previous engines, due to its immense amount of text, menus, and input. Its text is also full of formatting characters that shouldn’t be voiced, requiring a lot of cleaning, and its lack of standardization demanded a different process for each menu. I had to think of several different approaches, and pick which one seemed to be the most robust and effective. It was a very interesting and entertaining experience, and I’m quite happy with the TTS functionality for this engine. However, I’m not certain how well it will perform with Might and Magic 1 or other languages, or if my methods are optimal, and thus there may be more work to be done in the future.

Conclusion

This week, I finished TTS for MM, a larger engine that required more work. It was a rewarding experience, and I now have a PR opened for it. Next week, I’ll be working on SCUMM, and perhaps AGI if all goes well. These are the last two engines planned for my project. I hope to reach a stretch goal, but that depends on whether SCUMM and AGI have any surprises like MM did.

Uncategorized

Week 7: Efh

Introduction

This week, I mainly worked on adding text-to-speech to Efh, which posed a few unique challenges. I’ve opened a PR for it, and have begun work on adding TTS to MM.

Efh

Adding TTS to Efh was a little different from my experiences with other games. In most of the engines I’ve worked on, text appears on screen one piece at a time. Different sections of text rarely appear consecutively on screen, and if they do, they’re often part of the same string or part of a clickable menu. Escape from Hell, however, handles text differently, primarily because it has many menus full of numerous different pieces of non-interactable text, meaning it should be voiced as soon as it appears rather than as the user hovers over or clicks it. This text is usually split into many different strings, and may be displayed in the code in an order that doesn’t make sense for TTS – for example, the text that appears at the bottom of the menu may actually be displayed first in the code, so voicing it in the order that it displays, which is the easier route, would be awkward. In addition, very little text is displayed or fetched in methods that are only called once. Such an issue is common in many engines, but is further concerning in Efh because of the great variety of text: tracking the previously spoken text in a string, which is what I usually do in situations such as this one, won’t work easily, because each menu has many different strings of text. Fortunately, my solution to this problem was rather simple for the most part, as it involved keeping track of a “say menu” flag. The flag is turned on after user input, then turned off the instant the text is spoken. This worked for most menus, but didn’t work perfectly for the lower status menu that stays visible throughout most actions, as the display of the menu could consume the flag for other menus or be voiced in situations where it wasn’t appropriate. Therefore, I introduced a separate flag specifically for this menu as a solution.

After solving this problem, most of Efh was straightforward. Because a significant amount of the game’s text is hardcoded, it was very easy to find the majority of it. From that point on, most of the work on TTS for Efh involved delaying text on screen during combat until TTS finishes; speaking the user’s choices in menus and combat; and checking for when it’s appropriate to stop or interrupt speech, as the same methods may be used for different situations, some of which shouldn’t stop speech.

Ultimately, Efh had a lot of little details that were important to consider for the player’s ease of use, such as the best time to voice the player’s inventory or the best moments to interrupt text. Its abundance of text in the form of menus necessitated a slightly different approach from other engines, which was fun to explore. I was also able to identify and tentatively solve a few bugs in this engine while I tested my TTS implementation.

MM

After Efh, I began work on MM. Like Efh, MM games have an abundance of text in their menus, which is oftentimes displayed out of order. Fortunately, most of its text seems to be displayed in methods that are called only once, meaning that I rarely have to consider repeat voicing. Unfortunately, at least in Xeen, it has a wide variety of menus that each have their own specific code and considerations, including how buttons are displayed, how text is ordered, and how many times the display methods are called. In many cases, the text for an entire menu is in one string, which would usually be acceptable, but most of the menus display attributes and their values in separate blocks – for example, one string may be “Might, Intelligence, Gold, 10, 9, 800”, instead of “Might 10, Intelligence 9, Gold 800” – that makes voicing it directly awkward for the user. In addition, its text is full of different characters used to define traits, including the text’s coloration and alignment, which means it has to be thoroughly cleaned before TTS can voice it. I’ve found that FontSurface::writeString is responsible for cleaning and displaying most text, which makes it a great place to start, but it doesn’t always work to voice the text from there, as text may be displayed out of order. Thus, for situations such as these, I’ve decided on passing a string to it to be cleaned, then storing that string to be voiced at a later time. In each menu, this string can be split up and stitched back together in a manner that makes more sense for TTS, primarily by combining attributes and their respective values.

Buttons are another matter for this engine, as most buttons have text that is inherently separate from them in the code. In most menus, the text for buttons is combined with the text for the rest of the menu. To allow for more interactive voicing of buttons, I’ve currently settled on taking the cleaned text for the buttons, splitting it along newline characters, and storing this text in an array. Each button then has an integer index that corresponds to a string in the array. Putting the strings directly in the buttons was a possibility that I initially considered, but because the display order of the buttons often doesn’t match the order of the text, I found that it was easier to do it this way and not have to worry about reordering the divided text. However, TTS for MM is still in a relatively early stage, and I may change these methods if I find better solutions.

So far, MM’s variety of menus and buttons require a fair amount of TTS work. I’m looking forward to continuing work on it next week.

Conclusion

During this week of GSoC, I finished TTS for Efh, and began work on TTS for MM. It was an entertaining week, as I had a chance to work on a different type of game, which provided unique challenges compared to the games I’ve worked on up to this point. Next week, I’ll be continuing work on MM, and hopefully beginning work on SCUMM.

Uncategorized

Week 6: Prince

Introduction

This week, I worked on adding text-to-speech to Prince, which was a fun engine to work on. A PR has been opened for it, though more work may be needed in the future. In addition, I began work on adding TTS to Efh, which I plan to finish next week. My ADL PR was also merged this week, and I updated some of my earlier TTS PRs.

Prince

Most of my week was spent on adding TTS to Prince. Fortunately, the Prince and the Coward has no text in the form of images that I could find, which meant no hardcoded text was needed for this engine. However, Prince had some complexities in how it displays text. Its text is displayed rather simply in methods such as checkMob and printAt – which are either only called once or have means of tracking text changes by checking changes in indices, meaning there is no need to track the previously spoken text for this engine – but there are several exceptions to consider. For example, how the text in printAt should be voiced depends on several factors, including slot and location: slot 9 is generally subtitles, while slot 10 is often either subtitles or, if the location is the map, map text. Differentiating between these types of text is important because of the presence of the dub, which necessitates splitting TTS into several categories of subtitles, objects, and missing voiceovers. Since the Polish, German, and Russian translations all have dubs in their languages, subtitles should almost never be voiced for them, while the English and Spanish translations, which lack dubs in their languages, should only have subtitles voiced if the dub is muted. Thus, a fair amount of consideration had to be given to splitting up the text.

Prince also had a few other key exceptions. For one, when I worked on voicing the text of objects when they’re hovered over, I initially thought to use the _selectedMob variable, which keeps track of the mob that the player is hovering over: if, in checkMob, the selected mob doesn’t match the current mob number, then the user must be hovering over a new mob, meaning that the text should be voiced. However, I found that left clicking resets _selectedMob, which results in the text being awkwardly voiced again even though it hasn’t changed. This was easily fixed by introducing a new variable that tracks the selected mob, but is not reset upon left clicking. In addition, I worked on speaking missing voiceovers; solving the issue with the gambling merchants in the town, which constantly talk even as the player interacts with the environment and thus interrupt other TTS, requiring an exception for them that only voices their text if the player isn’t in dialog; and creating several custom encoding tables.

Another significant problem was changing voices. There appears to be no easy indicator that differentiates speaking characters in Prince: text colors are shared across several characters, mob numbers are not unique to certain characters and are instead specific to each location, and dialog seems to be controlled almost entirely by game scripts without any key character indicators. Therefore, my solution was to use a combination of several factors to determine the voice. The text color is enough to differentiate some characters, as the color is sometimes unique. For characters that share text colors, I opted to also check for the location number, since most characters don’t move locations, and those that do can be a catch-all for cases when the location number doesn’t match that of other characters. However, I found that this didn’t work in a few specific scenarios, such as the tavern with Arivald and the bard, who both have the same text color and are in the same location. For such exceptions, I decided to check for the mob number as well, as it differs between them. The result is different voices for each character, though I do wonder if there may be some other cleaner indicator I could use.

Ultimately, Prince was an entertaining engine to explore. It was neither particularly difficult nor particularly easy, as it had its own unique set of challenges, but none that were daunting.

Efh

After opening a PR for Prince, I started work on Efh. So far, Efh seems fairly simple, as much of its text is directly hardcoded, making displayed text very easy to find. However, the fact that its menus display many pieces of text at once every frame is different from most of the engines I’ve worked with, though I’ve currently solved the issue with a simple flag that toggles on after user input and is then toggled off after voicing occurs. Aside from that, there doesn’t seem to be much complexity with Efh, though I still have a fair amount of text left to voice, since I need to account for user input and ease of use.

Conclusion

During this week of GSoC, I opened a PR for adding TTS to Prince and started work on Efh, as well as updated some of my earlier PRs. It was an interesting week, since I enjoyed Prince. Next week, I’ll be continuing work on Efh, and possibly beginning MM if all goes well.

Uncategorized

Week 5: ADL and Parallaction

Post author By ellen
Post date July 7, 2025
No Comments on Week 5: ADL and Parallaction

Introduction

During this week of GSoC, I opened PRs for adding text-to-speech to ADL and Parallaction. I thought that these engines would take me longer than they did, but neither of them were particularly challenging, which was a pleasant surprise.

ADL

I spent the beginning of this week finishing TTS for ADL. Fortunately, it needed very little additional work from last week: I only needed to add TTS to a few extra key presses and clean up the text, primarily by removing dashes that could interfere with voicing. Since ADL’s games are so simple, there wasn’t much to do with this engine, and I was able to make a PR early in the week.

Parallaction

After ADL, I worked on Parallaction, which has a level of complexity that is comparable to that of other engines I’ve worked with. To begin, Parallaction’s text is displayed through methods that are only called once, instead of every frame like most of the engines I’ve worked on. As a result, I didn’t have to track the previously said text for this engine, and it was rather simple to voice the text – although in certain cases, such as labels, I had to store the text to be voiced later, as all possible labels are initialized at once, and voicing them upon initialization would result in a great amount of invisible text being voiced. I was glad to find that voicing the text in these methods covered almost all instances of text, including dialogue options when hovered over.

Nonetheless, Parallaction did have a few exceptions to consider, mainly in the form of its credits and introduction. Most of the text in the introductory cutscene and the credits is in the form of an image, which required finding a good place to detect when these images are rendered. For example, the second and third opening credit lines appear after an animation plays, unlike the first line, which appears the instant the location switches. Therefore, I opted to voice this text in the on instruction opcode, as it is executed when these animations finish. A similar problem emerges with the end credits, which are in the form of images that slowly scroll upward. Trying to voice all of them the instant the credits begin results in the TTS speaking too quickly and being poorly synced with the actual credits. To fix this issue, I found that the instruction opcode inc is executed roughly when another credit moves onto screen. Thus, after voicing the first few credits at once – since inc isn’t executed for these – I voiced the credits one at a time in this opcode, resulting in much cleaner voicing. Finally, I resolved another syncing problem where the test results, displayed after picking a character, disappear too quickly for the TTS to keep up by delaying the loop that controls the location change while TTS is speaking. A similar strategy was used for some of the opening credits. The result is TTS that is better synced to the text for certain images and situations.

Another interesting task for Parallaction was switching voices for each character, which was doable because each character is differentiated by their .talk file. In most cases, a simple array with each file name was enough for switching voices during dialogue. However, some dialogue interactions, such as the one between all three playable characters at the end of the game, have more than one character per .talk file, differentiated by the mood value. This was resolved by updating the array to have extra entries, and adding the mood as an offset to the index for these special cases. Beyond that, the TTS voice needed to be switched when exiting a slide, showing a location comment (which I decided to use an extra “narrator” voice for), and switching characters, allowing for more dynamic character voicing.

Ultimately, Parallaction was an entertaining engine to add TTS to. It was mostly simple, with its exceptions – such as text in the form of an image, unique means of selecting a language, and password inputs – being not too hard to handle. Nonetheless, it’s currently only been tested with Nippon Safes, Inc. From the code, the Big Red Adventure seems to handle text similarly, allowing me to add tentative TTS to it, but I’m uncertain if it will work as well. Some translations also need verification.

Conclusion

I think that this week of GSoC was fairly successful, with PRs opened for adding TTS to ADL and Parallaction. ADL was easy, while Parallaction had very few surprises, making them both simpler than I expected. Next week, I’ll be working on adding TTS to Prince, which I’m looking forward to exploring.

Uncategorized

Week 4: MADE

Introduction

During this week of GSoC, I mostly focused on adding TTS to MADE, with some work on ADL as well. A PR has been made for MADE, though there may be more work for it in the future.

MADE

Most of my week was dedicated to adding TTS to MADE. MADE – or at least, Return to Zork – offered many new challenges. For one, rather than displaying text one piece at a time, like I’ve seen in most of the engines I’ve worked with, Return to Zork displays several pieces every frame across multiple channels. This meant that I couldn’t simply track the previously said text in a single variable to avoid speech loops, as it would be repeatedly overridden. To solve this problem, I tried several different approaches. First, I tried finding the function that actually sets most of the text, which I found in the form of sfReadMenu. This function does set the text one piece at a time, which would be good for voicing. However, the issue with it is that sfReadMenu is called several times when a new scene loads, even when the text that it’s setting isn’t visible, resulting in extraneous text being voiced. I tried to check for any flags that could identify this text as invisible, but I didn’t find anything good to use, so I scrapped this idea. I then thought of tracking several different previously said texts and their channels in an array, but this seemed unnecessarily cumbersome, especially because the channels are often changing. Eventually, I settled on adding a variable for the previously said text to the individual channels themselves, and using this variable to check if text should be voiced. This worked quite well in most cases, and after accounting for exceptions where the spoken text should be queued, it resulted in good TTS for much of the game.

MADE, however, had a few additional problems. For one, it seems to handle hovering over objects completely within the game scripts, which means that there’s no easy place to detect when the cursor hovers over buttons in Return to Zork’s save/load screen. After trying to search for possible conditions I could use, I eventually decided to recreate the click boxes for the save, load, and cancel buttons, as well as the text entry boxes for new save slots. Fortunately, none of the text for these buttons is in the form of an image, which meant I could find the IDs for each individual object or menu that displays them and use these IDs to get their text, a strategy that should work across languages.

Another issue arises with the tape recorder in Return to Zork. The text for the name and numbers on each tape recorder entry are displayed within channels, but the text that corresponds to these pieces – such as “name” or “trk” – are in sfDrawText, which results in awkward, out-of-order voicing (for example, when the track is 001 and the name is Wizard Trembyle, TTS would say “trk, name, 001, Wizard Trembyle”). Fortunately, this could easily be avoided by not voicing these pieces in sfDrawText, and instead fetching this text when voicing the channel text, resulting in cleaner voicing (such as “trk: 001, name: Wizard Trembyle”). Unfortunately, the tape recorder is interactive, which further complicates matters. Voicing each piece separately can result in issues if the user switches between entries where a number is the same, such as one entry having max track 001 and the other having max track 001, since the previously said text won’t change and thus it won’t be voiced at all. Getting around this involved voicing all pieces – name, track, and max track – whenever the name changes, as it should be unique for each entry, alongside voicing the track individually in specific situations to account for the user switching tracks. Voicing the time was another matter: if it’s voiced whenever it changes, then the TTS system will repeatedly try to voice it as it ticks upward, resulting in awkward interrupts or unnecessary lag. I eventually decided that it was best to try to voice it when the sound clip ends, so it’ll only be voiced at the moment it stops moving. These techniques involved keeping track of flags and the status of the previously said variables, since they’re modified and retrieved each frame, but it seems to have resulted in more responsive voicing of the tape recorder.

In conclusion, MADE was a little more challenging than expected, since Return to Zork has several simultaneous channels and handles much of its logic within the game scripts. Nonetheless, it was a fun experience adding TTS to it. It seems finished now, though it remains to be seen how well it will handle different games or translations, and it has a few hardcoded translations that need verification.

ADL

ADL has seemed quite easy so far. As an engine for rather simple text-based games, almost all of its text is handled by Display::printString and Display::printAsciiString. It doesn’t even require tracking the previously said text, since these methods to display text are called only once, and there are no buttons that require responsive voicing. Thus, I think I’ve almost finished TTS for it, though challenges may arise later.

Conclusion

Over week 4 of GSoC, I’ve completed more TTS implementations, with MADE finished and ADL mostly finished. MADE offered the most challenges, due to its different ways of handling text, but its lack of text in the form of images was welcome. It’s been an interesting week, and I’m excited to explore more engines. Next week, I’ll be looking to finish TTS for ADL and begin work on TTS for Parallaction.

Uncategorized

Week 3: Draci

Introduction

Another week of GSoC has passed, with more text-to-speech work done, as a PR has been opened adding TTS to Draci. In addition, TTS for WAGE and CruisE has been merged, and TTS for Cine has been more rigorously tested.

Draci

Much of my time this week was invested in adding TTS to Draci. In terms of text complexity, Draci was roughly below average: most of its text was displayed in a few places, which were easy to track down. There were, however, several considerations regarding the engine. For one, Draci’s Czech and Polish translations have full (or almost full) voiceovers, while the German and English translations only have subtitles. To account for these differences, I split TTS into subtitles and objects, my first time doing so, and had to refrain from voicing subtitles for versions with voiceovers. Beyond that, I had to implement making the subtitles last as long as the TTS is speaking, as the subtitles normally move so fast that the speech can’t keep up; TTS for missing voice clips for the Czech and Polish versions, which received its own option at the suggestion of my mentor; and recognition of cases where the text is only punctuation, which isn’t voiced by the TTS system, and is thus immediately skipped (the text remains on screen for as long as TTS is speaking, meaning if the text isn’t voiced, the system immediately believes that it’s time to move to the next subtitle). All of these were interesting exercises in timing and accommodating different versions.

In my opinion, one of the most interesting components of Draci was the encoding. Draci uses encodings not present in ScummVM’s CodePage enum, which meant I had to hunt them down and manually write a translation table. Fortunately, the Dragon History website states that the Czech, English, and German translations all use Kamenický encoding, with the German translation having an exception for ß. This simplified the process greatly, as it just required creating a translation table for Kamenický encoding and converting the bytes, a similar process to the fix for TeenAgent’s Russian translation. Unfortunately, the Polish encoding is only described by the same website as “some ridiculous proprietary encoding”. I initially tried observing the bytes of the in-game strings to figure out the encoding for each character, but I worried that I’d miss a few characters. Instead, I found the character sets in the original Draci source code, which showed the bytes of UTF-8 encoding for each character as characters in the Windows-1250 code page. This meant that all I had to do was translate these characters according to Windows-1250 encoding to bytes, then combine these bytes to obtain the UTF-8 encodings for the Polish characters. That was most of the work for encodings, aside from altering the encoding table for German and English to replace certain Czech characters with equivalents in those languages, as TTS for the credits struggled to pronounce those characters.

Ultimately, Draci’s encodings were interesting to explore, and I’m very glad that the Dragon History website had them listed.

MADE

At the end of the week, I started work on MADE, but more work must be done for it. It seems that Return to Zork handles text a little differently from what I’m used to, as it displays text in several simultaneous channels every frame. My usual solution of tracking the previously said text with a variable won’t work here, as it’ll be overridden by the other text. I’ve thought of a few possible solutions, including tracking several different previously said texts or finding where the text is set – which I may have already found, but I need to do more investigation – and I plan to explore them over this week.

Miscellaneous

Aside from Draci and MADE, I fixed a few problems with CruisE – thanks to my mentor for noticing that the encoding was wrong for some of the versions, among other necessary changes – and tested TTS for Cine for other versions. For Cine, I was worried that my method of checking image names for text in the form of an image would fail across different versions of Future Wars and Operation Stealth, but I was pleasantly surprised to not find any issues, at least in my testing. The only changes I ended up making were adding a few new translations, fixing some small caveats with the encoding for the German and French translations, adding line-by-line TTS for the Operation Stealth copy protection screen (which I initially had, but I removed due to fears that it wouldn’t work across versions), adding TTS for Future Wars’s grid copy protection screen, and adjusting for exceptions across versions (such as the US Amiga version of Future Wars using a different copy protection screen from all other Amiga versions). Cine is thus much more thoroughly tested across different game editions now.

Conclusion

This week involved more work on TTS, with testing for Cine, TTS implementation for Draci, and the beginnings of TTS implementation for MADE. Exploring Draci’s encodings was the most entertaining part, and I hope to bring over my more in-depth knowledge of encoding methods into future implementations. Next week, I’ll be finishing TTS for MADE and beginning TTS for ADL, alongside any changes I need to make to my open PRs for Cine and Draci.

Uncategorized

Week 2: CruisE

Introduction

Another week of GSoC has passed, and I’m fairly satisfied with my progress. This week, I mostly focused on adding TTS to CruisE, and I’ve opened a PR for it. However, I also spent some time working on Draci, and on updating older PRs.

CruisE

I began this week with adding TTS to CruisE. Fortunately, CruisE wasn’t too difficult to work with, as it has very few cases of text in the form of an image. In addition, most of Cruise for a Corpse’s text is displayed from renderText, which made it much easier to identify where text is displayed in the code. The difficulty with Cruise for a Corpse, however, came from making it user-friendly. For example, text can’t simply be voiced from renderText, as there are cases when the text appears on screen a significant amount of time after renderText is called, or cases where the text isn’t visible, such as when the user tries to open the inventory in the copy protection screen. In a similar manner, text can’t be voiced from createMenu, because there are times when it shouldn’t be voiced (mainly when the inventory is empty). It also can’t be voiced from createTextObject, which handles much of the game’s text, because freezeCell is sometimes called on objects created with this method, which means that the text appears on screen later. Voicing text for CruisE, therefore, required addressing these exceptions accordingly, such as by storing the text from createTextObject and voicing it when the cell is unfrozen; queuing the text of the first menu item hovered over after a menu is opened instead of interrupting the current speech, so the title of the menu is always voiced; and checking for button input in Op_GetMouseButton to stop TTS when skipping through dialog or a cutscene. I believe that these changes make the TTS for CruisE more responsive and more accurate to the gameplay.

Ultimately, actually finding where text was displayed in CruisE wasn’t that difficult. The more time-consuming part was finding and accounting for the many exceptions, skips, and places in the code that freeze, show, or hide text, in order to create a user-friendly experience. It was entertaining to look through the code and ponder the best implementation, as its code was more spread out than that of Cine or WAGE.

Miscellaneous

After CruisE, I worked on various components related to my project. For one, I revisited my WAGE PR, which required a fair number of changes, including resolving some difficulties across games that I missed (in some cases, different WAGE games handle the same circumstances surprisingly differently, such as Eidisi only calling renderSubmenu when hovering over a new item, while Ray’s Maze and Another Fine Mess call it every frame, requiring a check for the last submenu item that was hovered over in these games). I also worked on fixing TTS for TeenAgent’s Russian translation, which uses a custom encoding that has to be replicated for proper voicing – fortunately, it seemed that this encoding followed a simple pattern of adding a number to the original character to obtain a Cyrillic character in UTF-8, though since I don’t speak Russian, I’m uncertain if this works in all cases.

Beyond that, I started work on TTS for Draci. So far, Draci seems even simpler than CruisE, with fewer exceptions and oddities. Good progress has been made on Draci, but there is always a chance that something unexpected will emerge.

Conclusion

CruisE was a somewhat more complex engine to add TTS to than WAGE or Cine, and it was entertaining to hunt down its different exceptions and learn how it handles its inputs and menus. A PR has been opened for it, and while I imagine that there’ll be more work for it in the future, the hardest part for it is done. Next week, I’ll be continuing work on Draci, and I look forward to completing TTS support for it.

Uncategorized

Week 1: WAGE and Cine

Introduction

The first week of GSoC is over, and I’m fairly happy with how it went. My TeenAgent PR was merged, and I’ve opened two new PRs: one adding TTS to WAGE and one adding TTS to Cine. They’re still under review, with Cine requiring more testing and translation verifications, and I imagine that there will be more work to be done with them, but the hardest parts – getting familiar with the engines and adding TTS to most of their text – are over.

WAGE

I started this week with WAGE, which was a relatively simple engine. There was an abandoned PR adding TTS to this engine that I picked up, but while it provided a base for me to work with, it was buggy and unfinished, meaning there was still a significant amount of work to be done. As for the engine itself, WAGE’s games were almost entirely text-based, with few graphical components. For me, this had its benefits and disadvantages. On the positive side, I didn’t have to worry much about hovering over objects, and finding where text was output to the screen wasn’t too difficult: nearly all of the text was in the form of the action log, which was modified by only a small number of functions. On the negative side, the greater quantity of text meant that there was more to look at and be aware of.

For the most part, adding TTS to WAGE was simple. It was a matter of adding a toggle, identifying the small handful of methods that displayed text, and feeding it through the TTS manager. There was no need to clean up any of the text. I soon encountered a caveat, however, with how WAGE handles its command menu. Rather than being embedded into the engine, WAGE uses the MacWindowManager class to manage its menu, including submenus and dialogs. My first approach was to identify and process this text inside WAGE’s Gui::processEvent method by retrieving the menu item that the mouse is over from the MacWindowManager and voicing its text. This worked fine initially, until I tried to voice the buttons: once a MacDialog is open, it pauses the loop inside of WageEngine::run, which is what runs Gui::processEvent. Without Gui::processEvent running, I couldn’t check for the mouse hovering over a button from it. At this point, I realized that it would be difficult to keep everything exclusively in WAGE itself, so I added TTS code to MacWindowManager, which worked much better. I did end up restoring some of my original code to Gui::processEvent for menu items, since MacMenu didn’t seem to have a trigger for hovering over a menu item (only for clicking one, and I wanted to voice the menu item as soon as the user hovers over it). The end result was TTS working as expected for the command menu.

Ultimately, WAGE wasn’t particularly difficult, but the fact that it used MacWindowManager for much of its GUI was an initial challenge. With a little extra code, however, it functioned fine.

Cine

After WAGE, I worked on adding TTS to Cine. Cine’s games, Future Wars and Operation Stealth, have much less text than WAGE games, and much of it is displayed through a few methods. Voicing the majority of the engine’s text was as simple as feeding the text into a small handful of methods, mainly drawMessage, drawMenu, and drawCommand. From there, all that was necessary was making sure it all behaved in a user-friendly way, like voicing the “USE” and “INVENTORY” commands when using the F3 and F4 keys and vocalizing inventory items.

Unfortunately, Cine came with a rather large problem: it has a lot of text in the form of images. Just about any text in Future Wars and Operation Stealth that isn’t directly related to gameplay (i.e. commands and menus) is an image. This includes credits, some cutscene text in Operation Stealth, and everything in the copy protection screens. Two problems resulted from this issue. For one, much of Cine’s work is handled by global and object scripts built into the files, which meant that there was no easy location to find where these images are displayed. As a result, I had to go to the methods that render general images and catch the exact conditions (PRC name, object index and frame, background name, and so on) under which they display. For another, I needed to know the text of these images in all supported languages and versions, so I could hardcode it. This meant a mixture between looking through videos on YouTube and asking the community (thanks to my mentor, criezy, for providing the copy protection images in French, and eientei for providing the German, Spanish, and Italian copy protection fail texts for Operation Stealth). It also meant keeping track of exceptions: Future Wars has two different copy protection screens, one for DOS and one for Amiga and Atari ST; Operation Stealth’s opening credits has a credit for the IBM version only for the DOS version, but its end credits has this credit in all versions; and Future Wars has a different opening title screen for the French version (“Les Voyageurs du Temps: La Menace” as opposed to “Future Wars: Time Travellers” for Amiga and Atari or “Future Wars: Adventures in Time” for DOS). Accounting for these exceptions meant more checks and more text to include, but it has been done.

In the end, getting the copy protection screen and credits to be voiced was the bulk of the work for Cine. Finding the different translations, deciphering the best place to voice them, and adding a new method to recognize hovering over buttons in the copy protection screens was time-consuming, but entertaining.

Conclusion

WAGE and Cine weren’t too difficult to add TTS to, and it was fun to implement it. The most time-consuming was working with Cine’s copy protection screens. There were some difficulties, but through enough investigation, they’ve been resolved. However, as of this blog post, a fair number of translations in Cine need to be verified, and other versions of the game have to be tested.

Next week, I’ll be focusing on adding TTS to the cruisE engine. I’m looking forward to exploring it.

Recent Posts

Recent Comments

Archives

Categories