Introduction
This week, I opened a PR for adding text-to-speech to Gob, my first stretch goal. The engine posed a few more challenges than usual, but it was entertaining to work on. I also started early work on adding TTS to Got, my next stretch goal.
Gob
Gob was a rather interesting engine for adding TTS to, primarily because of how it handles text and buttons. Unlike many of the engines that I’ve worked on, hotspots and text are inherently separated in Gob. This meant that there was no easy way to retrieve the text of a button or menu as the user hovers over it, which makes it more difficult to voice. Furthermore, without knowing which text is part of a clickable hotspot and which is not, text that shouldn’t be voiced the instant it appears would be voiced anyway. My solution to this issue was to maintain an array, _hotspotText, which holds printed text and their positions. Then, as hotspots are added, their collision boxes are checked against those of the text array. If they intersect, the hotspot text’s collision rect is expanded to that of the hotspot. Through this method, hotspot text can be voiced as the user hovers over it, while most other non-hotspot text is voiced by a separate method to make sure that all text on screen is voiced.
Nonetheless, refining this strategy took some time, as I ran into three issues. First, elements weren’t properly removed from _hotspotText when text was removed from the screen. I resolved this by deleting unassigned hotspot text in certain methods and removing pieces of assigned hotspot text when their corresponding hotspots are removed. Second, the cursor is instantly placed over a hotspot in some cases, which interrupted the speaking of non-hotspot text. I resolved this by queuing hotspot text when a non-hotspot piece of text is voiced. Third, hotspot text has to be moved in o1_copySprite, as the text itself is sometimes moved there, and not moving the hotspot text with it results in a disjointed collision.
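To make the hotspot-text bookkeeping described above more concrete, here is a minimal, standalone sketch. It is not the code from the PR: the `Rect` type, the `HotspotTextTracker` class, and all of its method names are hypothetical stand-ins, and only the name `_hotspotText` comes from the actual work. It also assumes that "expanding" the text rect means taking the union of the text and hotspot boxes, which is just one reading of the approach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical rectangle type for this sketch; right and bottom are exclusive.
struct Rect {
    int left, top, right, bottom;

    bool intersects(const Rect &o) const {
        return left < o.right && o.left < right && top < o.bottom && o.top < bottom;
    }
    bool contains(int x, int y) const {
        return x >= left && x < right && y >= top && y < bottom;
    }
    // Grow this rect to the smallest box covering both rects.
    void extend(const Rect &o) {
        left = std::min(left, o.left);
        top = std::min(top, o.top);
        right = std::max(right, o.right);
        bottom = std::max(bottom, o.bottom);
    }
};

const int kNoHotspot = -1; // marks text that no hotspot has claimed yet

struct HotspotText {
    std::string text;
    Rect rect;
    int hotspot; // kNoHotspot until a hotspot claims this text
};

// Hypothetical tracker; only the member name _hotspotText mirrors the post.
class HotspotTextTracker {
public:
    // Record a piece of printed text and where it was drawn.
    void addText(const std::string &text, const Rect &rect) {
        HotspotText entry = {text, rect, kNoHotspot};
        _hotspotText.push_back(entry);
    }

    // When a hotspot is added, claim any unassigned text it overlaps
    // and grow that text's collision rect to cover the hotspot.
    void addHotspot(int id, const Rect &hotspotRect) {
        assert(id != kNoHotspot);
        for (HotspotText &e : _hotspotText) {
            if (e.hotspot == kNoHotspot && e.rect.intersects(hotspotRect)) {
                e.rect.extend(hotspotRect);
                e.hotspot = id;
            }
        }
    }

    // Drop text that belonged to a hotspot that has been removed.
    void removeHotspot(int id) { removeWhere(id); }

    // Drop text that never got attached to a hotspot.
    void clearUnassigned() { removeWhere(kNoHotspot); }

    // Text to voice on hover, or an empty string if nothing is claimed here.
    std::string textAt(int x, int y) const {
        for (const HotspotText &e : _hotspotText)
            if (e.hotspot != kNoHotspot && e.rect.contains(x, y))
                return e.text;
        return std::string();
    }

    std::size_t size() const { return _hotspotText.size(); }

private:
    void removeWhere(int hotspot) {
        _hotspotText.erase(
            std::remove_if(_hotspotText.begin(), _hotspotText.end(),
                           [hotspot](const HotspotText &e) { return e.hotspot == hotspot; }),
            _hotspotText.end());
    }

    std::vector<HotspotText> _hotspotText;
};
```

In this sketch, only text that a hotspot has claimed is returned by `textAt`, so unassigned text is left for whatever separate path voices non-hotspot text. Moving entries along with their sprites, as needed for o1_copySprite, is left out to keep the example short.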
Another problem with Gob was the timing of dialogue. From what I could tell, Gob has no unique indicators for when dialogue starts, progresses, or ends. This causes issues during dialogue interactions, as lines may advance too quickly and interrupt voicing. In addition, dialogue appears to be displayed as TOT text, but so are object names, which means that TOT text can’t be set to always interrupt or always queue without causing awkward voicing. My solution was to queue text only if the user isn’t hovering over a hotspot, as in most cases, this means that it’s text that should be queued. After resolving this issue, most of Gob’s other problems were rather straightforward. One example was Ween’s notepad, which awkwardly tried to voice one character or section at a time as the user typed; it is now voiced only when it first opens.
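The queue-or-interrupt rule for TOT text can be illustrated with a small sketch. Everything here is an assumption made for illustration: `SpeakAction`, `FakeSpeaker`, `chooseAction`, and `voiceTotText` are hypothetical names, and the toy speaker only records what would be spoken rather than calling any real TTS backend.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical actions; a real TTS backend may offer more options.
enum class SpeakAction { Interrupt, Queue };

// Toy speaker that records pending speech in order instead of producing audio.
class FakeSpeaker {
public:
    void say(const std::string &text, SpeakAction action) {
        if (action == SpeakAction::Interrupt)
            _pending.clear(); // drop whatever was waiting or being spoken
        _pending.push_back(text);
    }
    const std::vector<std::string> &pending() const { return _pending; }

private:
    std::vector<std::string> _pending;
};

// If the cursor is over a hotspot, the text is most likely an object name,
// so it interrupts. Otherwise it is most likely dialogue, so it is queued.
SpeakAction chooseAction(bool hoveringHotspot) {
    return hoveringHotspot ? SpeakAction::Interrupt : SpeakAction::Queue;
}

void voiceTotText(FakeSpeaker &speaker, const std::string &text, bool hoveringHotspot) {
    assert(!text.empty()); // callers are expected to skip empty strings
    speaker.say(text, chooseAction(hoveringHotspot));
}
```

With this rule, two dialogue lines shown while the cursor is away from any hotspot are both kept and spoken in order, while an object name shown during a hover replaces whatever was pending. It is a heuristic, so text that appears during a hover but is really dialogue would still interrupt.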
Ultimately, Gob was an interesting engine because of how it handles hotspots and dialogue. It required different solutions from previous engines, and I’m quite happy with the result, though there may be more problems to resolve in the future.
Got
After Gob, I started work on adding TTS to Got. So far, Got seems very simple for TTS. Most timing and menus do not seem to be handled by game scripts, which makes it easier to locate where to voice text and to implement clean voicing. At the moment, nothing difficult has emerged, but this may change as I add more TTS to the engine.
Conclusion
This week, I added TTS to Gob and started work on adding TTS to Got. Gob had several considerable intricacies, while Got appears to be rather simple so far. Next week, I’ll be continuing work on Got, and perhaps starting work on adding TTS to another engine if time permits. However, my semester starts next week, which will limit the time I have to work on my project.