Introduction
This week, I opened PRs for adding text-to-speech to Got and Hugo. They were relatively simple engines, and neither had any major challenges.
Got
I started this week by finishing TTS for Got. As previously mentioned, Got was quite simple for TTS: most of its text was easy to locate in the code, and text was rarely handled primarily by the game scripts. Thus, most of the game’s text could be voiced rather easily. For cleaner voicing, I added three things: checks for when the score, jewels, or keys change, so that their values are voiced only when they change; delays while the credits are voiced, to prevent them from moving too quickly for TTS to keep up; and voicing of dialogue as soon as it starts, so that it syncs well with the text as it appears one character at a time.
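The change checks for the score, jewels, and keys can be illustrated with a minimal sketch. This is not the engine’s actual code: `StatusValues`, `StatusVoicer`, and the phrase wording are hypothetical names chosen for illustration, and the returned phrases stand in for whatever would be handed to TTS.

```cpp
#include <string>
#include <vector>

// Hypothetical snapshot of the values shown in the status area.
struct StatusValues {
    int score;
    int jewels;
    int keys;
};

// Hypothetical helper: remembers the last values it saw and reports
// only the ones that differ from the previous update.
class StatusVoicer {
public:
    explicit StatusVoicer(const StatusValues &initial) : _last(initial) {}

    // Returns the phrases that would be sent to TTS for this update.
    // An empty result means nothing changed, so nothing is voiced.
    std::vector<std::string> update(const StatusValues &current) {
        std::vector<std::string> phrases;
        if (current.score != _last.score)
            phrases.push_back("Score: " + std::to_string(current.score));
        if (current.jewels != _last.jewels)
            phrases.push_back("Jewels: " + std::to_string(current.jewels));
        if (current.keys != _last.keys)
            phrases.push_back("Keys: " + std::to_string(current.keys));
        _last = current; // Remember the values for the next comparison.
        return phrases;
    }

private:
    StatusValues _last;
};
```

Calling `update` on every redraw is harmless in this sketch, since repeated calls with unchanged values return nothing to voice.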
Nevertheless, one consideration for Got was the best means of cleaning text. In most cases, newlines break up sentences in dialogue, which results in choppy voicing if they aren’t replaced. However, some text, such as signs, uses newlines instead of punctuation to separate distinct sentences. If these newlines are replaced, TTS awkwardly pronounces everything as one quick sentence. My solution to this issue was to keep newlines only if there are two or more in a row, as this is usually the case for sentences separated exclusively by newlines. Single newlines tend to fall in the middle of a sentence, and thus should be replaced.
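A rough sketch of this newline rule is shown below. It is an illustration rather than the code from the PR: the function name `cleanNewlinesForTts` is hypothetical, it works on `std::string` instead of ScummVM’s own string classes, and it assumes that a single newline is replaced with a space.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keeps runs of two or more newlines (which tend to
// separate distinct sentences) and replaces lone newlines (which tend to
// fall mid-sentence) with a space. The space is an assumption.
std::string cleanNewlinesForTts(const std::string &text) {
    std::string result;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '\n') {
            result += text[i];
            ++i;
            continue;
        }
        // Measure the run of consecutive newlines starting here.
        std::size_t runStart = i;
        while (i < text.size() && text[i] == '\n')
            ++i;
        std::size_t runLength = i - runStart;
        if (runLength >= 2)
            result.append(runLength, '\n'); // Keep the sentence break.
        else
            result += ' ';                  // Join the wrapped line.
    }
    return result;
}
```

With this sketch, `"Hello\nworld"` becomes `"Hello world"`, while `"North\n\nSouth"` is left unchanged so that TTS can pause between the two entries.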
Ultimately, Got was one of the simpler engines that I’ve worked on. It required some care for properly cleaning up text and guarding against awkward voicing, but it otherwise offered few unique challenges.
Hugo
After Got, I worked on adding TTS to Hugo. Like Got, Hugo was a fairly simple engine for TTS. Its games are straightforward and similar to each other, which made it easier to implement functional TTS for all scenarios. Furthermore, it has few distinct ways of displaying text, so voicing most or all of its text only required adding TTS calls to a few methods.
Nevertheless, Hugo required some consideration for its dialog boxes. Unlike most other engines that I’ve worked with, Hugo displays a considerable amount of its text in ScummVM dialog boxes, which have their own TTS. This means that opening or working with some dialog boxes, like the top menu, stops all TTS. Stopping TTS in this manner, however, can interrupt voicing of the sound setting, which is shown in Hugo’s score line. Voicing it as soon as it changes doesn’t work properly because, in many cases, the top menu is immediately opened again, which interrupts the voicing. Therefore, my solution was to keep trying to voice the new sound setting for as long as the cursor is in a position to open the top menu. In this way, once the top menu finally closes fully and doesn’t open again, the sound setting is voiced.
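One possible reading of this retry logic is sketched below. It is an assumption about how such a loop could be structured, not the code from the PR: `SoundSettingAnnouncer`, `onSettingChanged`, and `shouldAttemptVoicing` are hypothetical names, and the cursor check is reduced to a boolean passed in once per frame.

```cpp
#include <string>

// Hypothetical helper: keeps a sound-setting announcement pending while
// the cursor could still reopen the top menu and cut the speech off.
class SoundSettingAnnouncer {
public:
    // Called when the sound setting shown in the score line changes.
    void onSettingChanged(const std::string &text) {
        _text = text;
        _pending = true;
    }

    // Called once per frame. Returns true if the caller should try to
    // voice text() on this frame.
    bool shouldAttemptVoicing(bool cursorCanOpenTopMenu) {
        if (!_pending)
            return false;
        // Once the cursor has left the menu area, the menu can no longer
        // interrupt this attempt, so it is treated as the final one.
        if (!cursorCanOpenTopMenu)
            _pending = false;
        return true;
    }

    const std::string &text() const { return _text; }

private:
    std::string _text;
    bool _pending = false;
};
```

A real implementation would also need to avoid restarting speech that is already in progress when the same text is submitted on consecutive frames.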
The other aspects of Hugo were rather simple. Most of the work was adding more TTS for the aforementioned dialog boxes: their built-in TTS only voices text as it’s hovered over and only one line at a time, whereas it seemed to me that these boxes needed to be voiced as one clean paragraph or sentence as soon as they appear. I also added voicing of score changes, and correctly queued the voicing of dialog boxes, user input, and scores to prevent them from interrupting each other as dialog boxes appear.
In conclusion, Hugo was fairly simple. Because its games have simple controls and few different means of displaying text, it didn’t offer many unique challenges.
Conclusion
This week, I finished adding TTS to Got and Hugo. Both of these engines were simple to work with, and they didn’t have any major caveats or challenges.