Introduction
This week, I mainly worked on adding text-to-speech to Efh, which posed a few unique challenges. I’ve opened a PR for it, and have begun work on adding TTS to MM.
Efh
Adding TTS to Efh was a little different from my experiences with other games. In most of the engines I’ve worked on, text appears on screen one piece at a time. Different sections of text rarely appear consecutively on screen, and if they do, they’re often part of the same string or part of a clickable menu. Escape from Hell handles text differently: it has many menus full of non-interactable text, which should be voiced as soon as it appears rather than when the user hovers over or clicks it.
This text is usually split into many different strings, and the code may draw them in an order that doesn’t make sense for TTS. For example, the text at the bottom of a menu may actually be drawn first, so voicing text in draw order, which is the easier route, would be awkward. In addition, very little text is displayed or fetched in methods that are only called once. This issue is common in many engines, but it is a bigger concern in Efh because of the great variety of text: tracking the previously spoken text in a string, which is what I usually do in this situation, won’t work easily, because each menu has many different strings of text.
Fortunately, my solution was rather simple for the most part: keep track of a “say menu” flag. The flag is turned on after user input, then turned off the instant the text is spoken. This worked for most menus, but not perfectly for the lower status menu that stays visible throughout most actions, as displaying it could consume the flag meant for other menus, or it could be voiced in situations where that wasn’t appropriate. Therefore, I introduced a separate flag specifically for this menu.
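To make the flag idea concrete, here is a minimal sketch of how it could look. It is not the actual Efh code: FakeTts, MenuSpeaker, and every method name are hypothetical stand-ins, and when exactly the status menu flag gets raised is my own assumption for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a TTS backend: it only records what was "spoken".
struct FakeTts {
    std::vector<std::string> spoken;
    void say(const std::string &text) { spoken.push_back(text); }
};

// Hypothetical helper illustrating the "say menu" flag; not the engine's real class.
class MenuSpeaker {
public:
    explicit MenuSpeaker(FakeTts &tts) : _tts(tts), _sayMenu(false), _sayStatusMenu(false) {}

    // Raised after user input, so the next menu drawn is voiced once.
    void onUserInput() { _sayMenu = true; }

    // Assumption: the status menu is requested separately, so it never
    // consumes the flag that belongs to the other menus.
    void requestStatusMenu() { _sayStatusMenu = true; }

    // Draw methods may run many times per menu; only the first call after
    // the flag was raised speaks, because the flag is cleared immediately.
    void drawMenu(const std::string &text) {
        if (_sayMenu) {
            _tts.say(text);
            _sayMenu = false;
        }
    }

    void drawStatusMenu(const std::string &text) {
        if (_sayStatusMenu) {
            _tts.say(text);
            _sayStatusMenu = false;
        }
    }

private:
    FakeTts &_tts;
    bool _sayMenu;
    bool _sayStatusMenu;
};
```

With this shape, redrawing the status menu between a key press and the next menu does not use up the main flag, which is the problem the second flag was meant to avoid.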
After solving this problem, most of Efh was straightforward. Because a significant amount of the game’s text is hardcoded, it was very easy to find the majority of it. From that point on, most of the work on TTS for Efh involved delaying text on screen during combat until TTS finishes; speaking the user’s choices in menus and combat; and checking for when it’s appropriate to stop or interrupt speech, as the same methods may be used for different situations, some of which shouldn’t stop speech.
Ultimately, Efh had a lot of little details that were important to consider for the player’s ease of use, such as the best time to voice the player’s inventory or the best moments to interrupt text. Its abundance of text in the form of menus necessitated a slightly different approach from other engines, which was fun to explore. I was also able to identify and tentatively solve a few bugs in this engine while I tested my TTS implementation.
MM
After Efh, I began work on MM. Like Efh, MM games have an abundance of text in their menus, which is oftentimes displayed out of order. Fortunately, most of its text seems to be displayed in methods that are called only once, meaning that I rarely have to consider repeat voicing. Unfortunately, at least in Xeen, it has a wide variety of menus that each have their own specific code and considerations, including how buttons are displayed, how text is ordered, and how many times the display methods are called. In many cases, the text for an entire menu is in one string, which would usually be acceptable, but most of the menus display attributes and their values in separate blocks. For example, one string may be “Might, Intelligence, Gold, 10, 9, 800” instead of “Might 10, Intelligence 9, Gold 800”, which makes voicing it directly awkward for the user.
In addition, its text is full of different characters used to define traits, including the text’s coloration and alignment, which means it has to be thoroughly cleaned before TTS can voice it. I’ve found that FontSurface::writeString is responsible for cleaning and displaying most text, which makes it a great place to start, but voicing the text from there doesn’t always work, as text may be displayed out of order. Thus, for situations such as these, I’ve decided on passing a string to it to be cleaned, then storing that string to be voiced at a later time. In each menu, this string can be split up and stitched back together in a manner that makes more sense for TTS, primarily by combining attributes and their respective values.
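As a rough illustration of the stitching step, the sketch below pairs each attribute with its value using the simplified example string from above. The function names are hypothetical, and it assumes the text is already cleaned, uses “, ” as the separator, and lists all labels before all values; the real menu strings in Xeen are laid out differently and need per-menu handling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split text on a separator string (hypothetical helper for this sketch).
static std::vector<std::string> splitOn(const std::string &text, const std::string &sep) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = text.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + sep.size();
    }
    return parts;
}

// Assumes the first half of the pieces are labels and the second half are
// their values, in the same order. Returns the input unchanged if the
// pieces cannot be paired up evenly.
std::string pairLabelsWithValues(const std::string &cleaned) {
    std::vector<std::string> parts = splitOn(cleaned, ", ");
    if (parts.size() % 2 != 0)
        return cleaned;

    std::size_t half = parts.size() / 2;
    std::string result;
    for (std::size_t i = 0; i < half; ++i) {
        if (i > 0)
            result += ", ";
        result += parts[i] + " " + parts[half + i];
    }
    return result;
}
```

Falling back to the unchanged string when the pieces don’t pair up keeps the text voiced, even if awkwardly, rather than dropping it.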
Buttons are another matter for this engine, as most buttons have text that is inherently separate from them in the code. In most menus, the text for buttons is combined with the text for the rest of the menu. To allow for more interactive voicing of buttons, I’ve currently settled on taking the cleaned text for the buttons, splitting it along newline characters, and storing this text in an array. Each button then has an integer index that corresponds to a string in the array. Putting the strings directly in the buttons was a possibility that I initially considered, but because the display order of the buttons often doesn’t match the order of the text, I found that it was easier to do it this way and not have to worry about reordering the divided text. However, TTS for MM is still in a relatively early stage, and I may change these methods if I find better solutions.
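The index approach can be sketched roughly as follows. SpokenButton, splitButtonText, and textForButton are hypothetical names made up for this example and are not the engine’s actual button type or methods.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical button: it stores an index into the text array instead of
// the text itself, so display order and text order can differ.
struct SpokenButton {
    char key;       // key that activates the button
    int textIndex;  // position of this button's line in the array
};

// Split the cleaned button text into one entry per line.
std::vector<std::string> splitButtonText(const std::string &cleaned) {
    std::vector<std::string> lines;
    std::istringstream stream(cleaned);
    std::string line;
    while (std::getline(stream, line))
        lines.push_back(line);
    return lines;
}

// Look up the text to voice for a button; returns an empty string if the
// index is out of range.
std::string textForButton(const SpokenButton &button, const std::vector<std::string> &lines) {
    if (button.textIndex < 0 || static_cast<std::size_t>(button.textIndex) >= lines.size())
        return std::string();
    return lines[button.textIndex];
}
```

Because each button only carries an index, the array can stay in text order while the buttons are created in whatever order the menu displays them.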
So far, MM’s variety of menus and buttons requires a fair amount of TTS work. I’m looking forward to continuing work on it next week.
Conclusion
During this week of GSoC, I finished TTS for Efh, and began work on TTS for MM. It was an entertaining week, as I had a chance to work on a different type of game, which provided unique challenges compared to the games I’ve worked on up to this point. Next week, I’ll be continuing work on MM, and hopefully beginning work on SCUMM.