
Week 8: MM

Introduction

This week, I opened a PR for adding text-to-speech to MM, which proved to be one of the more challenging engines of my project. The process of adding TTS to it took up most of the week, but I’m quite satisfied with the result. Nonetheless, I do have concerns about its compatibility across versions and about the possibility of missed menus, so there may be more work for it in the future.


MM

I began work on MM last week, when I finished most of the difficult work for it, mainly coming up with a strategy to clean and parse text and to voice buttons. Once that was finished, however, a great amount of time-consuming work remained. Might and Magic games – or at least, the Xeen games – have a relatively large number of menus and a lot of user input. In general, each menu has its own dialogs file, which handles text, graphics, and menu input. On the positive side, this meant that it was quite easy to find the text for each menu. On the negative side, there is practically no pattern to this text, and it’s very often in an order that doesn’t match the order in which it’s displayed on screen. Button text may be listed first, or somewhere closer to the end. Labels and their values may be listed next to each other, or in completely different blocks – for instance, “Might, Intellect, Hit Points, 9, 5, 15” instead of “Might, 9, Intellect, 5, Hit Points, 15”, which would be much easier to voice. Button text may be listed in the order that the buttons themselves are displayed, or in a completely different order. This lack of pattern applies even across similar situations: locations, for example, put button text at the end, except for banks, which put it at the beginning. The problem arises because text is positioned using a wide variety of formatting characters for traits such as position and justification, and these traits alone are usually not predictive of the best voicing order, so they can’t easily be used to drive voicing.
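
To make the reordering concrete, here’s a minimal sketch – the helper name and the use of std::string are my own, not the engine’s – that interleaves a block of labels with a block of values so each stat is voiced next to its number:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: the resource lists all labels first, then all
// values ("Might, Intellect, Hit Points, 9, 5, 15"). Pairing each label
// with its value produces the order a player would expect to hear.
static std::vector<std::string> interleaveForVoicing(const std::vector<std::string> &sections) {
	std::vector<std::string> voiced;
	size_t half = sections.size() / 2;
	for (size_t i = 0; i < half; ++i)
		voiced.push_back(sections[i] + " " + sections[half + i]);
	return voiced;
}

int main() {
	std::vector<std::string> sections = { "Might", "Intellect", "Hit Points", "9", "5", "15" };
	for (const std::string &line : interleaveForVoicing(sections))
		std::cout << line << '\n';	// Might 9 / Intellect 5 / Hit Points 15
}
```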

This lack of standardization raised several concerns, because it meant that voicing the text directly would produce awkward ordering that might not make sense. One option was to simply let FontSurface::writeString – which I discovered last week – handle all of the voicing, as it’s seemingly used for all displayed text. That strategy would be easier to implement, but wouldn’t be nearly as responsive or logical, since no buttons would be voiced in response to input and text would often be voiced out of order. However, some text is already in the correct order, so it makes sense to use this method unchanged in a few cases. Thus, I decided to override the voicing in FontSurface::writeString individually for each dialog where it’s necessary, allowing for much smoother and cleaner voicing while keeping a fallback that always voices any text I may have missed.
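
As a rough sketch of this arrangement – with hypothetical names, std::string standing in for ScummVM’s Common::String, and a drastically simplified cleaning step – writeString voices whatever it draws unless the active dialog has claimed the text to reorder and voice itself:

```cpp
#include <string>

static void sayText(const std::string &text) { /* hand off to the TTS backend */ }

struct FontSurfaceSketch {
	bool _voicingOverridden = false;	// set by dialogs that voice their text themselves
	std::string _pendingText;		// cleaned text saved for a dialog to split and reorder

	void writeString(const std::string &raw) {
		std::string cleaned = stripFormatting(raw);
		if (_voicingOverridden)
			_pendingText += cleaned + "\n";	// the dialog voices this later, in its own order
		else
			sayText(cleaned);		// default path: voice anything not handled elsewhere
		// ...drawing would happen here...
	}

	static std::string stripFormatting(const std::string &s) {
		std::string out;
		for (char c : s)
			if (c == '\n' || (unsigned char)c >= ' ')	// keep newlines, drop other control codes
				out += c;
		return out;
	}
};
```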

Voicing this text with different code for each dialog was rather time-consuming: it required looking through each menu, seeing how it orders its text, and splitting the text up accordingly, with each dialog requiring a different process. I decided to split the cleaned text along newlines using a method called getNextTextSection, as this was the most reliable means of getting pieces of text, and it made voicing each dialog only a matter of rearranging getNextTextSection calls. This still required a fair amount of work, especially because of occasional oddities, such as the party screen containing its text twice after leaving the character creation screen, which causes the first copy – an outdated version – to be voiced. Fortunately, tailoring each dialog this way wasn’t particularly difficult, but it did take a while.
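
A simplified stand-in for how this splitting works (the real getNextTextSection and the dialog code differ, and the bank text here is invented for illustration):

```cpp
#include <iostream>
#include <string>

// Simplified stand-in for getNextTextSection: consume the cleaned text
// one newline-delimited piece at a time.
static std::string getNextTextSection(std::string &text) {
	size_t pos = text.find('\n');
	std::string section = text.substr(0, pos);
	text.erase(0, pos == std::string::npos ? text.size() : pos + 1);
	return section;
}

int main() {
	// Each dialog rearranges a different sequence of calls. Banks list
	// button text first in the resource, but it should be voiced last.
	std::string text = "Exit\nWelcome to the bank!\nDeposit gold?";
	std::string buttonText = getNextTextSection(text);
	std::string greeting = getNextTextSection(text);
	std::string prompt = getNextTextSection(text);
	std::cout << greeting << '\n' << prompt << '\n' << buttonText << '\n';
}
```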

Thus, after deciding on a procedure for voicing buttons – a separate array of strings, with each applicable button holding an index that corresponds to a string in this array – and for cleaning and splitting text, most of the work for MM was simply time-consuming. Nonetheless, as I added support to more menus, I had to consider whether my methods were optimal, and I went over several possibilities. I considered storing text directly in each button without any index variable, but this would be unreliable: the strings for each button are rarely in the same order as the buttons themselves, which would require hardcoding indices (for example, _buttons[5] = getNextTextSection(...)). This strategy would break completely if, for some reason, the order in which buttons are added to the _buttons array changed. I also considered reordering the buttons themselves, but after this caused the wrong images to be rendered on the character info sheet, I decided it was too dangerous. In contrast, I highly doubt that the order of text taken directly from the game files will ever change, which makes the original method – keeping the buttons and their text, populated by hardcoded splitting, separate – the safest option I could think of; see the sketch after this paragraph.

Similarly, I considered whether my current method of passing a string to FontSurface::writeString to be cleaned was optimal, as it restricts TTS input to whenever this method is called. Perhaps catching the text earlier in the code could remove the need to split the string, as its fields would already be separate. Indeed, the input to this method can in nearly all cases be found earlier, with many of its values already split. Unfortunately, most of the text taken directly from the resources is already in one large block, which necessitates sorting anyway, but now with the added need to clean it (I could perhaps recreate the cleaning process that exists in writeString, but I wasn’t certain I wanted to risk missing an important step, and it seemed unnecessary). I also considered using the string’s formatting characters, but this seemed unreliable, as traits such as exact text position aren’t consistently indicative of the best voicing order, and it seemed easier to clean first and split later. Thus, I settled on many of my initial strategies, though I may change them in the future if a better idea arises.
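
As a sketch of the button arrangement described above (all names here are hypothetical): each button stores only an index into a shared array of voiced strings, so the order in which buttons are created never has to match the order of the text.

```cpp
#include <string>
#include <vector>

struct ButtonSketch {
	int _textIndex = -1;	// index into the dialog's button-text array; -1 = no text
};

struct DialogSketch {
	std::vector<ButtonSketch> _buttons;
	std::vector<std::string> _buttonTexts;	// filled by the hardcoded per-dialog splitting

	// Fetch the text to voice for a button, e.g. when it's pressed.
	std::string textForButton(size_t buttonIndex) const {
		const ButtonSketch &b = _buttons[buttonIndex];
		if (b._textIndex < 0 || (size_t)b._textIndex >= _buttonTexts.size())
			return "";
		return _buttonTexts[(size_t)b._textIndex];
	}
};
```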

Ultimately, working on MM was more challenging than the previous engines, due to its immense amount of text, menus, and input. Its text is also full of formatting characters that shouldn’t be voiced, requiring a lot of cleaning, and its lack of standardization demanded a different process for each menu. I had to weigh several different approaches and pick the one that seemed most robust and effective. It was a very interesting and entertaining experience, and I’m quite happy with the TTS functionality for this engine. However, I’m not certain how well it will perform with Might and Magic 1 or other languages, or whether my methods are optimal, so there may be more work to be done in the future.


Conclusion

This week, I finished TTS for MM, a larger engine that required more work. It was a rewarding experience, and I now have a PR opened for it. Next week, I’ll be working on SCUMM, and perhaps AGI if all goes well. These are the last two engines planned for my project. I hope to reach a stretch goal, but that depends on whether SCUMM and AGI have any surprises like MM did.


Week 7: Efh

Introduction

This week, I mainly worked on adding text-to-speech to Efh, which posed a few unique challenges. I’ve opened a PR for it, and have begun work on adding TTS to MM.


Efh

Adding TTS to Efh was a little different from my experiences with other games. In most of the engines I’ve worked on, text appears on screen one piece at a time. Different sections of text rarely appear consecutively on screen, and when they do, they’re often part of the same string or part of a clickable menu. Escape from Hell, however, handles text differently: it has many menus full of numerous pieces of non-interactable text, which should be voiced as soon as it appears rather than when the user hovers over or clicks it. This text is usually split across many different strings, and may be displayed in the code in an order that doesn’t make sense for TTS – for example, the text that appears at the bottom of a menu may actually be displayed first in the code, so voicing it in display order, the easier route, would be awkward. In addition, very little text is displayed or fetched in methods that are only called once. This issue is common in many engines, but is more concerning in Efh because of the great variety of text: tracking the previously spoken text in a string, which is what I usually do in situations like this one, wouldn’t work easily, because each menu has many different strings of text. Fortunately, my solution was rather simple for the most part: a “say menu” flag that is turned on after user input, then turned off the instant the text is spoken. This worked for most menus, but not quite for the lower status menu that stays visible throughout most actions, since displaying it could consume the flag meant for other menus, or voice the status text in situations where it wasn’t appropriate. Therefore, I introduced a separate flag specifically for this menu.
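
In simplified form (with hypothetical names), the flags work like this: input arms them, and the first menu draw afterwards voices the text and disarms them, so redrawing the same menu every frame doesn’t re-voice anything.

```cpp
#include <string>

static void sayText(const std::string &text) { /* queue for speech */ }

struct EfhTTSSketch {
	bool _sayMenu = false;        // armed by user input, consumed by the next menu draw
	bool _sayStatusMenu = false;  // separate flag for the persistent lower status menu

	void onUserInput(bool statusChanged) {
		_sayMenu = true;
		if (statusChanged)
			_sayStatusMenu = true;  // only re-voice the status menu when appropriate
	}

	// Called every frame while a menu is displayed.
	void drawMenu(const std::string &menuText) {
		if (_sayMenu) {
			sayText(menuText);
			_sayMenu = false;       // turn off the instant the text is spoken
		}
	}

	// The lower status menu is drawn alongside other menus; with only one
	// flag, drawing it first would consume _sayMenu before the real menu.
	void drawStatusMenu(const std::string &statusText) {
		if (_sayStatusMenu) {
			sayText(statusText);
			_sayStatusMenu = false;
		}
	}
};
```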

After solving this problem, most of Efh was straightforward. Because a significant amount of the game’s text is hardcoded, it was very easy to find the majority of it. From that point on, most of the work on TTS for Efh involved delaying text on screen during combat until TTS finishes; speaking the user’s choices in menus and combat; and checking for when it’s appropriate to stop or interrupt speech, as the same methods may be used for different situations, some of which shouldn’t stop speech.

Ultimately, Efh had a lot of little details that were important to consider for the player’s ease of use, such as the best time to voice the player’s inventory or the best moments to interrupt text. Its abundance of text in the form of menus necessitated a slightly different approach from other engines, which was fun to explore. I was also able to identify and tentatively solve a few bugs in this engine while I tested my TTS implementation.


MM

After Efh, I began work on MM. Like Efh, MM games have an abundance of text in their menus, which is often displayed out of order. Fortunately, most of the text seems to be displayed in methods that are called only once, meaning I rarely have to consider repeat voicing. Unfortunately, at least in Xeen, there is a wide variety of menus that each have their own specific code and considerations, including how buttons are displayed, how text is ordered, and how many times the display methods are called. In many cases, the text for an entire menu is in one string, which would usually be acceptable, but most of the menus list attributes and their values in separate blocks – for example, one string may be “Might, Intelligence, Gold, 10, 9, 800” instead of “Might 10, Intelligence 9, Gold 800” – which makes voicing it directly awkward for the user. In addition, the text is full of different characters used to define traits such as coloration and alignment, which means it has to be thoroughly cleaned before TTS can voice it. I’ve found that FontSurface::writeString is responsible for cleaning and displaying most text, which makes it a great place to start, but voicing the text from there doesn’t always work, as text may be displayed out of order. Thus, for situations like these, I’ve decided to pass a string to it to be cleaned, then store that string to be voiced at a later time. In each menu, this string can be split up and stitched back together in a manner that makes more sense for TTS, primarily by combining attributes with their respective values.

Buttons are another matter for this engine, as most buttons have text that is inherently separate from them in the code. In most menus, the text for the buttons is combined with the text for the rest of the menu. To allow for more interactive voicing of buttons, I’ve currently settled on taking the cleaned button text, splitting it along newline characters, and storing the pieces in an array. Each button then holds an integer index that corresponds to a string in the array. I initially considered putting the strings directly in the buttons, but because the display order of the buttons often doesn’t match the order of the text, it was easier to do it this way and not worry about reordering the divided text. However, TTS for MM is still at a relatively early stage, and I may change these methods if I find better solutions.

So far, MM’s variety of menus and buttons has required a fair amount of TTS work. I’m looking forward to continuing work on it next week.


Conclusion

During this week of GSoC, I finished TTS for Efh, and began work on TTS for MM. It was an entertaining week, as I had a chance to work on a different type of game, which provided unique challenges compared to the games I’ve worked on up to this point. Next week, I’ll be continuing work on MM, and hopefully beginning work on SCUMM.


Week 6: Prince

Introduction

This week, I worked on adding text-to-speech to Prince, which was a fun engine to work on. A PR has been opened for it, though more work may be needed in the future. In addition, I began work on adding TTS to Efh, which I plan to finish next week. My ADL PR was also merged this week, and I updated some of my earlier TTS PRs.


Prince

Most of my week was spent adding TTS to Prince. Fortunately, The Prince and the Coward has no text in the form of images that I could find, which meant no hardcoded text was needed for this engine. However, Prince had some complexities in how it displays text. Its text is displayed rather simply in methods such as checkMob and printAt – which are either only called once or can track text changes by checking changes in indices, meaning there is no need to track the previously spoken text for this engine – but there are several exceptions to consider. For example, how the text in printAt should be voiced depends on several factors, including slot and location: slot 9 is generally subtitles, while slot 10 is often either subtitles or, if the location is the map, map text. Differentiating between these types of text matters because of the dub, which necessitates splitting TTS into several categories: subtitles, objects, and missing voiceovers. Since the Polish, German, and Russian translations all have dubs in their languages, subtitles should almost never be voiced for them, while the English and Spanish translations, which lack dubs in their languages, should only have subtitles voiced if the dub is muted. Thus, a fair amount of consideration had to be given to splitting up the text.
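
A condensed sketch of that decision – the enum and function here are placeholders for however the engine actually exposes the language and mute state:

```cpp
enum class Lang { Polish, German, Russian, English, Spanish };

// Does this translation ship with a dub in its own language?
static bool hasOwnDub(Lang lang) {
	return lang == Lang::Polish || lang == Lang::German || lang == Lang::Russian;
}

// Voice a subtitle only if the player can't already hear it spoken:
// translations with their own dub never need it, and the others need it
// only when the (foreign-language) dub is muted.
static bool shouldVoiceSubtitle(Lang lang, bool dubMuted) {
	if (hasOwnDub(lang))
		return false;
	return dubMuted;
}
```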

Prince also had a few other key exceptions. For one, when I worked on voicing the text of objects as they’re hovered over, I initially thought to use the _selectedMob variable, which keeps track of the mob the player is hovering over: if, in checkMob, the selected mob doesn’t match the current mob number, the user must be hovering over a new mob, meaning the text should be voiced. However, I found that left-clicking resets _selectedMob, which results in the text being awkwardly voiced again even though it hasn’t changed. This was easily fixed by introducing a new variable that tracks the selected mob but is not reset upon left-clicking. In addition, I worked on speaking missing voiceovers; solving the issue of the gambling merchants in the town, who constantly talk even as the player interacts with the environment and thus interrupt other TTS, requiring an exception that only voices their text if the player isn’t in dialog; and creating several custom encoding tables.
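
In outline (everything except _selectedMob is named hypothetically), the fix adds a TTS-side copy of _selectedMob that left-clicking doesn’t reset:

```cpp
struct MobHoverSketch {
	int _selectedMob = -1;     // engine variable: reset on left click
	int _ttsSelectedMob = -1;  // hypothetical TTS copy: survives left clicks

	// Called from the hover check for the mob under the cursor.
	void checkMobForTTS(int mobNumber, const char *mobText) {
		if (mobNumber != _ttsSelectedMob) {
			sayText(mobText);          // only voice genuinely new hovers
			_ttsSelectedMob = mobNumber;
		}
		_selectedMob = mobNumber;      // engine bookkeeping as before
	}

	void onLeftClick() {
		_selectedMob = -1;             // the engine resets this...
		// ..._ttsSelectedMob deliberately keeps its value, so the same
		// mob isn't voiced again after the click.
	}

	static void sayText(const char *) { /* queue for speech */ }
};
```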

Another significant problem was changing voices. There appears to be no easy indicator that differentiates speaking characters in Prince: text colors are shared across several characters, mob numbers are not unique to certain characters but are instead specific to each location, and dialog seems to be controlled almost entirely by game scripts without any key character indicators. My solution was therefore to use a combination of several factors to determine the voice. The text color is enough to differentiate some characters, as it is sometimes unique. For characters that share text colors, I opted to also check the location number, since most characters don’t move between locations, and those that do can serve as a catch-all for cases where the location number doesn’t match any other character’s. However, this still didn’t work in a few specific scenarios, such as the tavern with Arivald and the bard, who share a text color and are in the same location. For such exceptions, I decided to check the mob number as well, as it differs between them. The result is a different voice for each character, though I do wonder if there is some cleaner indicator I could use.
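
The cascade can be sketched like this, with invented numbers standing in for the real color, location, and mob values:

```cpp
// Pick a TTS voice from the available signals, most specific check last.
// Color alone identifies some speakers; shared colors fall back to the
// location number; rare collisions (two speakers sharing a color and a
// location, like Arivald and the bard) are broken by the mob number.
static int pickVoice(int textColor, int location, int mobNumber) {
	if (textColor == 3)
		return 1;	// a color unique to one character
	if (textColor == 5) {
		if (location == 7 && mobNumber == 2)
			return 2;	// e.g. the bard in the tavern
		if (location == 7)
			return 3;	// e.g. Arivald, same color and location
		return 4;		// catch-all for this color elsewhere
	}
	return 0;	// default/narrator voice
}
```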

Ultimately, Prince was an entertaining engine to explore. It was neither particularly difficult nor particularly easy, as it had its own unique set of challenges, but none that were daunting.


Efh

After opening a PR for Prince, I started work on Efh. So far, Efh seems fairly simple, as much of its text is directly hardcoded, making displayed text very easy to find. However, its menus display many pieces of text at once every frame, which is different from most of the engines I’ve worked with, though I’ve solved that for now with a simple flag that is toggled on after user input and toggled off once voicing occurs. Aside from that, there doesn’t seem to be much complexity to Efh, though I still have a fair amount of text left to voice, since I need to account for user input and ease of use.


Conclusion

During this week of GSoC, I opened a PR for adding TTS to Prince and started work on Efh, as well as updated some of my earlier PRs. It was an interesting week, since I enjoyed Prince. Next week, I’ll be continuing work on Efh, and possibly beginning MM if all goes well.


Week 5: ADL and Parallaction

Introduction

During this week of GSoC, I opened PRs for adding text-to-speech to ADL and Parallaction. I thought these engines would take me longer than they did, but neither of them was particularly challenging, which was a pleasant surprise.


ADL

I spent the beginning of this week finishing TTS for ADL. Fortunately, it needed very little additional work from last week: I only needed to add TTS to a few extra key presses and clean up the text, primarily by removing dashes that could interfere with voicing. Since ADL’s games are so simple, there wasn’t much to do for this engine, and I was able to open a PR early in the week.


Parallaction

After ADL, I worked on Parallaction, whose complexity is comparable to that of the other engines I’ve worked with. To begin with, Parallaction’s text is displayed through methods that are only called once, instead of every frame like most of the engines I’ve worked on. As a result, I didn’t have to track the previously said text for this engine, and voicing the text was rather simple – although in certain cases, such as labels, I had to store the text to be voiced later, since all possible labels are initialized at once, and voicing them upon initialization would result in a great amount of invisible text being voiced. I was glad to find that voicing the text in these methods covered almost all instances of text, including dialogue options when hovered over.

Nonetheless, Parallaction did have a few exceptions to consider, mainly in its credits and introduction. Most of the text in the introductory cutscene and the credits is in the form of an image, which required finding a good place to detect when these images are rendered. For example, the second and third opening credit lines appear after an animation plays, unlike the first line, which appears the instant the location switches. I therefore opted to voice this text in the on instruction opcode, as it is executed when these animations finish. A similar problem emerges with the end credits, which are images that slowly scroll upward. Voicing all of them the instant the credits begin results in the TTS speaking too quickly and syncing poorly with the actual credits. To fix this, I found that the instruction opcode inc is executed roughly when another credit moves onto screen. Thus, after voicing the first few credits at once – since inc isn’t executed for these – I voiced the remaining credits one at a time in this opcode, resulting in much cleaner voicing. Finally, I resolved another syncing problem – the test results displayed after picking a character disappear too quickly for the TTS to keep up – by delaying the loop that controls the location change while TTS is speaking. A similar strategy was used for some of the opening credits. The result is TTS that is better synced to the text for certain images and situations.
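
Schematically (the opcode names are the ones described above; everything else is hypothetical), the credits pacing looks like this:

```cpp
#include <queue>
#include <string>
#include <vector>

static void sayText(const std::string &text) { /* queue for speech */ }

struct CreditsTTSSketch {
	std::queue<std::string> _pending;	// credit lines not yet voiced

	// When the end credits start: voice the first few lines immediately,
	// since the inc opcode isn't executed for them, and queue the rest.
	void onCreditsStart(const std::vector<std::string> &lines, size_t voicedImmediately) {
		for (size_t i = 0; i < lines.size(); ++i) {
			if (i < voicedImmediately)
				sayText(lines[i]);
			else
				_pending.push(lines[i]);
		}
	}

	// Executed roughly each time another credit image scrolls onto screen.
	void onIncOpcode() {
		if (!_pending.empty()) {
			sayText(_pending.front());
			_pending.pop();
		}
	}
};
```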

Another interesting task for Parallaction was switching voices for each character, which was doable because each character is differentiated by their .talk file. In most cases, a simple array with each file name was enough to switch voices during dialogue. However, some dialogue interactions, such as the one between all three playable characters at the end of the game, have more than one character per .talk file, differentiated by the mood value. I resolved this by adding extra entries to the array and using the mood as an offset to the index in these special cases. Beyond that, the TTS voice needed to be switched when exiting a slide, when showing a location comment (which I decided to voice with an extra “narrator” voice), and when switching characters, allowing for more dynamic character voicing.
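
In essence (the file names and table layout below are invented for illustration), the voice lookup becomes an index computed from the .talk file, with the mood added as an offset for files that contain several speakers:

```cpp
#include <string>

// Illustrative table: one voice per .talk file, with extra entries
// appended for files that hold more than one speaker.
static const char *kVoiceTable[] = {
	"dino.talk",    // voice 0
	"donna.talk",   // voice 1
	"dough.talk",   // voice 2
	"ending.talk",  // voices 3..5: shared file, one entry per mood
};

static int voiceForSpeaker(const std::string &talkFile, int mood) {
	for (int i = 0; i < 4; ++i)
		if (talkFile == kVoiceTable[i])
			return i + mood;  // mood is 0 for single-speaker files
	return -1;                // unknown speaker: fall back to a default voice
}
```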

Ultimately, Parallaction was an entertaining engine to add TTS to. It was mostly simple, and its exceptions – such as text in the form of an image, a unique means of selecting a language, and password inputs – were not too hard to handle. Nonetheless, it has currently only been tested with Nippon Safes, Inc. From the code, The Big Red Adventure seems to handle text similarly, allowing me to add tentative TTS to it, but I’m uncertain whether it will work as well. Some translations also need verification.


Conclusion

I think that this week of GSoC was fairly successful, with PRs opened for adding TTS to ADL and Parallaction. ADL was easy, while Parallaction had very few surprises, making them both simpler than I expected. Next week, I’ll be working on adding TTS to Prince, which I’m looking forward to exploring.