Week 8: MM – Ellen's GSoC Blog

Introduction

This week, I opened a PR for adding text-to-speech to MM, which proved to be one of the more challenging engines of my project. The process of adding TTS to it took up most of the week, but I’m quite satisfied with the result. Nonetheless, I do have concerns about its compatibility across versions and about the possibility of missed menus, so there may be more work for it in the future.

MM

I began work on MM last week, when I finished most of the difficult work for it, mainly in the form of coming up with a strategy to clean and parse text and voice buttons. However, after that was finished, there was still a great deal of time-consuming work to be done. Might and Magic games – or at least, Xeen games – have a relatively high number of menus and a lot of user input. In general, each menu has its own dialogs file, which handles text, graphics, and menu input. On the positive side, this meant that it was quite easy to find text for menus. On the negative side, there is practically no pattern among this text, and it’s very often in an order that doesn’t match the order in which it’s displayed on screen. Button text may be listed first, or it may be listed somewhere closer to the end. Labels and their values may be listed next to each other, or they may be listed in completely different blocks: for instance, “Might, Intellect, Hit Points, 9, 5, 15” instead of “Might, 9, Intellect, 5, Hit Points, 15”, which would be much easier to voice. Button text may be listed in the order that the buttons themselves are displayed, or in a completely different order. This lack of pattern applies even across similar situations: for example, locations order button text at the end, except for banks, which order it at the beginning. The problem arises because text is laid out using a wide variety of formatting characters for factors such as position and justification, so its order in the string doesn’t have to match what appears on screen. Those formatting characters can’t easily be used for voicing either, since these traits alone are usually not predictive of the best voicing order.
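To illustrate the label and value problem above, here is a minimal sketch of reordering sections that arrive as “all labels, then all values” into “label, value” pairs. The function name interleaveLabelsAndValues is hypothetical and isn’t taken from the PR; it also assumes the text has already been cleaned and split into sections.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the actual PR): takes sections ordered as
// "label1, label2, ..., value1, value2, ..." and returns them ordered as
// "label1, value1, label2, value2, ...", which is easier to voice.
std::vector<std::string> interleaveLabelsAndValues(const std::vector<std::string> &sections) {
    std::vector<std::string> result;
    const std::size_t half = sections.size() / 2;

    for (std::size_t i = 0; i < half; ++i) {
        result.push_back(sections[i]);         // label
        result.push_back(sections[half + i]);  // matching value
    }

    // With an odd number of sections, one is left without a partner;
    // keep it at the end rather than dropping it.
    if (sections.size() % 2 != 0)
        result.push_back(sections.back());

    return result;
}
```

With the example from above, the input “Might, Intellect, Hit Points, 9, 5, 15” would come out as “Might, 9, Intellect, 5, Hit Points, 15”. In practice a single helper like this wouldn’t cover every dialog, since the ordering differs from menu to menu.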

This lack of standardization prompted several concerns, because it meant that voicing the text directly would result in awkward ordering that may not make sense. It was possible to simply let FontSurface::writeString – which I discovered last week – handle the voicing, as it’s seemingly used for all displayed text. Such a strategy would be easier to implement, but wouldn’t be nearly as responsive or logical, since buttons wouldn’t be voiced in response to input and text would often be voiced out of order. However, some text is already in the correct order, so in a few cases it makes sense to use this method without changing anything. Thus, I decided to override the voicing in FontSurface::writeString only for the individual dialogs where it’s necessary, which allows for much smoother and cleaner voicing while still providing a fallback that voices any text I may have missed.
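The override-with-fallback idea can be sketched roughly as follows. This is only an illustration under my own assumptions: TtsSketch, dialogHandlesVoicing, and onWriteString are hypothetical names standing in for whatever the engine code actually uses, and the spoken vector stands in for the real TTS queue.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the engine's TTS state (not the real classes).
struct TtsSketch {
    // Set by a dialog that wants to voice its own text in a custom order.
    bool dialogHandlesVoicing = false;

    // Stand-in for the real TTS queue, so the behavior can be inspected.
    std::vector<std::string> spoken;

    void say(const std::string &text) {
        if (!text.empty())
            spoken.push_back(text);
    }

    // Stand-in for the voicing hook inside FontSurface::writeString:
    // by default, any text that gets drawn is also voiced as-is.
    void onWriteString(const std::string &cleanedText) {
        if (!dialogHandlesVoicing)
            say(cleanedText);
    }
};
```

A dialog with well-ordered text leaves the flag alone and gets voiced automatically. A dialog with awkward ordering sets the flag and calls say itself, in whatever order reads best.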

Voicing this text with different code for each dialog was rather time-consuming. It required looking through each menu, seeing how it orders text, and splitting it up accordingly, with each dialog requiring a different process. I decided to split the cleaned text along newlines using a method called getNextTextSection, as it was the most reliable means of getting pieces of text, and made voicing each dialog only a matter of switching around getNextTextSection calls. Unfortunately, this still required a fair amount of work, especially because of the occasional occurrence of oddities, such as the party screen having twice the text after leaving the character creation screen, which causes the first repetition of text – an outdated version – to be voiced. Fortunately, this process of tailoring each dialog wasn’t particularly difficult, but it did take a while.
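As a rough idea of what splitting along newlines looks like, here is a simplified sketch of a getNextTextSection-style function. The signature is an assumption made for illustration (the version in the PR may take different parameters), and it uses std::string rather than the engine’s own string class.

```cpp
#include <cstddef>
#include <string>

// Simplified sketch with an assumed signature: returns the next
// newline-delimited section of already-cleaned text, starting at `index`,
// and advances `index` past it. Returns an empty string when nothing is left.
std::string getNextTextSection(const std::string &text, std::size_t &index) {
    // Skip any newlines separating this section from the previous one.
    while (index < text.size() && text[index] == '\n')
        ++index;

    if (index >= text.size())
        return "";

    std::size_t end = text.find('\n', index);
    if (end == std::string::npos)
        end = text.size();

    std::string section = text.substr(index, end - index);
    index = end;
    return section;
}
```

With something like this, a dialog whose button text comes first can read those sections into the button strings and then voice the remaining sections, while a dialog whose button text comes last does the same calls in the opposite order.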

Thus, after deciding on a procedure for voicing buttons (a separate array of strings, with each applicable button holding an index into this array) and for cleaning and splitting text, most of the work for MM was simply time-consuming. Nonetheless, as I added support for more menus, I did have to consider whether my methods were optimal, and I went over several possibilities. I considered storing text directly in each button without using any index variable, but this would be unreliable: the strings for each button are rarely in the same order as the buttons themselves, which would require hardcoding indices (for example, _buttons[5] = getNextTextSection(...)). This strategy would break completely if, for some reason, the order in which buttons are added to the _buttons array changed. I also considered reordering the buttons themselves, but after this caused the wrong images to be rendered on the character info sheet, I decided it was too dangerous. In contrast, I highly doubt that the order of text taken directly from the game files will ever change, which makes the original method, keeping the buttons separate from their text (which is populated by hardcoded splitting), the safest option I could think of.

Similarly, I considered whether my current method of passing a string to FontSurface::writeString to be cleaned was optimal, as it restricts TTS input to whenever this method is called. Perhaps catching the text earlier in the code could remove the need to split the string, as its fields would already be separate. Indeed, in nearly all cases the input to this method can be found earlier, with many of its values still separate. Unfortunately, most of the text directly from the resources is already in a large block, which necessitates sorting anyway, but this time with the need to clean it as well (though I could perhaps recreate the cleaning process that exists in writeString, I wasn’t certain if I wanted to run the risk of missing an important step, and it seemed unnecessary). I also considered using the string’s formatting characters, but this seemed unreliable, as traits such as exact text position aren’t consistently indicative of the best voicing order, and it seemed easier to clean first and split later. Thus, I settled on many of my initial strategies, though I may change them in the future if a better idea arises.
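The button scheme described above (a separate array of strings, with each applicable button holding an index into it) could look roughly like this. ButtonSketch, ttsIndex, and textForButton are hypothetical names chosen for the example, not the engine’s real types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, stripped-down button: only the fields needed for voicing.
struct ButtonSketch {
    char key;      // key that activates the button
    int ttsIndex;  // index into the separate text array, or -1 if not voiced
};

// Looks up the text to voice for the button bound to `key`.
// Returns an empty string if no button matches or it has no voiced text.
std::string textForButton(const std::vector<ButtonSketch> &buttons,
                          const std::vector<std::string> &buttonTexts,
                          char key) {
    for (const ButtonSketch &button : buttons) {
        if (button.key != key)
            continue;
        if (button.ttsIndex < 0 ||
            static_cast<std::size_t>(button.ttsIndex) >= buttonTexts.size())
            return "";
        return buttonTexts[button.ttsIndex];
    }
    return "";
}
```

Because each button carries its own index, the text array can stay in the order the game files provide, and the buttons can stay in the order the rendering code expects. Neither order has to match the other.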

Ultimately, working on MM was more challenging than working on previous engines, due to its immense amount of text, menus, and input. Its text is also full of formatting characters that shouldn’t be voiced, requiring a lot of cleaning, and its lack of standardization demanded a different process for each menu. I had to think of several different approaches and pick the one that seemed the most robust and effective. It was a very interesting and entertaining experience, and I’m quite happy with the TTS functionality for this engine. However, I’m not certain how well it will perform with Might and Magic 1 or other languages, or if my methods are optimal, and thus there may be more work to be done in the future.

Conclusion

This week, I finished TTS for MM, a larger engine that required more work. It was a rewarding experience, and I now have a PR opened for it. Next week, I’ll be working on SCUMM, and perhaps AGI if all goes well. These are the last two engines planned for my project. I hope to reach a stretch goal, but that depends on whether SCUMM and AGI have any surprises like MM did.
