Introduction
This week, I opened a PR for adding text-to-speech to AGI. It was the last engine marked on my proposal, which means that I have now made PRs for every engine that was planned for my project. This is a major milestone, and gives me time to work on Gob as a stretch goal.
AGI
I spent much of the week working on AGI, which was fortunately a rather simple engine. While it supports many different games and fangames, most of them seem to function very similarly: introductions and other screens may display text using TextMgr::display, and most other text is displayed through popup windows, typed commands, or menus. Thus, AGI was fairly straightforward, requiring only a handful of TTS calls in key methods, plus some consideration for when to stop TTS and when to voice the status menu and clock, which are always visible. To avoid voicing them too frequently, I settled for voicing one or the other primarily when the game is loaded, the status menu changes, and the clock is enabled.
The major concerns with AGI were how it handles timing and how TextMgr::display behaves across games. Some games appear to use timer variables to signal when it’s time to change the text on screen, while others wait for a sound to end. Therefore, from what I could tell, there was no consistent way to predict when text will change, which made it difficult to delay text changes until TTS is finished. I eventually settled for queuing text and delaying room and window changes until TTS finishes, which seems to be enough to allow TTS to voice everything displayed on screen in a reasonable manner.
In addition, the use of TextMgr::display differs between games: some call it only once, while others call it every frame, which necessitates keeping track of the previously said text to avoid speech loops. However, much of the text passed through this method is displayed in sentences or chunks, rather than all at once in a single call. This defeats simple tracking of the previously said text and results in awkward, choppy voicing when a sentence is broken across lines. My solution was to combine the text passed to TextMgr::display in a _combinedText variable, and then voice it when the game script returns or halts. Using this method, paragraphs and blocks of text are spoken cleanly as one sentence, and the previously said text can simply be set to this combined text to prevent speech loops. I also decided to add newlines between these pieces of text if they aren’t displayed on subsequent rows, since in most cases such pieces shouldn’t be voiced as one quick sentence.
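To make the combining idea concrete, here is a minimal, self-contained sketch of how it could work. It is not the code from the PR: the TextCombiner class, its onDisplay and flush methods, and the _previousSaid and _lastRow members are hypothetical names chosen for illustration, and only _combinedText comes from the description above. It also assumes each piece of text arrives with a row number, and it treats only the directly following row as a continuation.

```cpp
#include <string>

// Hypothetical helper, not ScummVM's actual API: it only illustrates
// combining displayed text and suppressing repeated speech.
class TextCombiner {
public:
    // Assumed to be called for each piece of text shown at a given row.
    void onDisplay(const std::string &text, int row) {
        if (text.empty())
            return;

        if (!_combinedText.empty()) {
            // Text on the directly following row continues the sentence;
            // anything else is separated by a newline so it is voiced apart.
            _combinedText += (row == _lastRow + 1) ? ' ' : '\n';
        }

        _combinedText += text;
        _lastRow = row;
    }

    // Assumed to be called when the game script returns or halts.
    // Returns the text to voice, or an empty string if it was already said.
    std::string flush() {
        std::string toSpeak;
        if (!_combinedText.empty() && _combinedText != _previousSaid) {
            toSpeak = _combinedText;
            _previousSaid = _combinedText;
        }
        _combinedText.clear();
        return toSpeak;
    }

private:
    std::string _combinedText;
    std::string _previousSaid;
    int _lastRow = 0;
};
```

With this sketch, a game that redraws the same screen every frame produces the same combined string on each flush, so only the first one is voiced. A game that draws a sentence across two rows gets it spoken as a single sentence.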
In conclusion, AGI was a fairly simple engine, since its games didn’t have much complexity. Most of the work went into handling timing and subsequently displayed text, as these factors can differ significantly between its many games. Otherwise, AGI fangames seem to have little variation in how they display text, which simplified the work. Since I mainly tested with fangames, though, there’s a chance that games that aren’t fangames function differently.
Gob
After opening a PR for AGI, I started work on Gob. So far, Gob doesn’t seem too complex: its text seems limited and appears to be displayed through only a few select methods. However, certain games, like Adibou 2, separate text from many of their buttons, which requires a technique to sync them. So far, I’ve decided on a method of checking whether the displayed text intersects with a hotspot, then expanding the collision rectangle of the text accordingly, which seems to work fairly well. Nonetheless, I would like to see if I can make this solution more robust for better compatibility, as it seems that Gob supports a variety of games with many differences between them.
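As a rough illustration of the hotspot idea, the sketch below grows a text rectangle to cover every hotspot it overlaps. It is only a sketch under assumptions: the Rect struct is a small stand-in rather than ScummVM’s own rectangle class, expandToHotspots is a hypothetical name, and hotspots are assumed to be available as a plain list of rectangles.

```cpp
#include <algorithm>
#include <vector>

// Stand-in rectangle type for illustration; right and bottom are exclusive.
struct Rect {
    int left, top, right, bottom;

    // True if the two rectangles share at least one pixel.
    bool intersects(const Rect &o) const {
        return left < o.right && o.left < right &&
               top < o.bottom && o.top < bottom;
    }

    // Grow this rectangle so that it also covers o.
    void extend(const Rect &o) {
        left = std::min(left, o.left);
        top = std::min(top, o.top);
        right = std::max(right, o.right);
        bottom = std::max(bottom, o.bottom);
    }
};

// Hypothetical helper: returns the text rectangle expanded to cover every
// hotspot that overlaps the original text rectangle.
Rect expandToHotspots(const Rect &textRect, const std::vector<Rect> &hotspots) {
    Rect result = textRect;
    for (const Rect &hotspot : hotspots) {
        // Test against the original rectangle so that the result does not
        // chain outward through hotspots the text never touched.
        if (textRect.intersects(hotspot))
            result.extend(hotspot);
    }
    return result;
}
```

Testing against the original rectangle rather than the growing one is a deliberate choice in this sketch. It keeps the text tied only to buttons it visibly overlaps, at the cost of missing buttons that merely sit next to it.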
Conclusion
This week, I opened a PR for my last planned engine, AGI, and started work on Gob as a stretch goal. I plan to have a PR up for Gob by the end of this week, which may give me enough time to start another stretch goal.