Month: August 2025

GSoC Final Work Product

Project Goals

The main goal of this project was to add text-to-speech (TTS) to a variety of ScummVM engines. TTS makes the games supported by these engines more accessible and helps language learners. Furthermore, adding TTS to more engines standardizes the feature, so that TTS support will be more consistent across engines. Adding TTS to an engine entails voicing text elements such as dialogue, menus, user input, objects, and credits as they appear or are interacted with, and adapting TTS to user actions and to the functionality of the games themselves.

What was Done

The original goal of the project was to add TTS to 12 engines, but I had enough time to add it to three more. In total, I added full TTS support to 15 engines over the summer. In addition, I added TTS to 2 engines before the official start of GSoC, and I helped fix a few bugs in one engine.

Current State

All of my pull requests have been merged, meaning 17 engines now have full TTS implementations. Games that use these engines should have comprehensive and usable TTS.

What’s Left

All of my project’s goals, as well as a few stretch goals, have been completed. Nonetheless, some engines that support a wide variety of games, such as SCUMM, may need additional testing by end users. Any changes that this testing reveals should only be tweaks to behavior: the core TTS implementations themselves should be complete. In addition, a few engines still lack TTS support.

Code

Most PRs add TTS to specific engines, but some fix previously merged code. The following is a list of every PR I opened:

TTS for Drascula: https://github.com/scummvm/scummvm/pull/6526
TTS for TeenAgent: https://github.com/scummvm/scummvm/pull/6566
TTS for WAGE: https://github.com/scummvm/scummvm/pull/6669
Fix for Russian TTS for TeenAgent: https://github.com/scummvm/scummvm/pull/6722
TTS for Cine: https://github.com/scummvm/scummvm/pull/6700
TTS for CruisE: https://github.com/scummvm/scummvm/pull/6710
TTS for Draci: https://github.com/scummvm/scummvm/pull/6742
Fixes for TTS for CruisE: https://github.com/scummvm/scummvm/pull/6752
TTS for MADE: https://github.com/scummvm/scummvm/pull/6779
TTS for ADL: https://github.com/scummvm/scummvm/pull/6785
TTS for Parallaction: https://github.com/scummvm/scummvm/pull/6795
TTS for Prince: https://github.com/scummvm/scummvm/pull/6807
TTS for EFH: https://github.com/scummvm/scummvm/pull/6820
Bug fixes for EFH: https://github.com/scummvm/scummvm/pull/6821
TTS for MM: https://github.com/scummvm/scummvm/pull/6835
Czech and Polish TTS for TeenAgent: https://github.com/scummvm/scummvm/pull/6867
TTS for SCUMM: https://github.com/scummvm/scummvm/pull/6856
TTS for AGI: https://github.com/scummvm/scummvm/pull/6862
TTS for Gob: https://github.com/scummvm/scummvm/pull/6880
TTS for Got: https://github.com/scummvm/scummvm/pull/6888
Fix for TTS for Got: https://github.com/scummvm/scummvm/pull/6900
TTS for Hugo: https://github.com/scummvm/scummvm/pull/6897

Challenges and What was Learned

Ensuring TTS compatibility across the numerous games, languages, and versions that a single engine may support was sometimes difficult to address comprehensively. Furthermore, some engines handle their text almost entirely through game scripts, which makes it harder to pinpoint where text is displayed and how to feed it to TTS. To address these problems, I had to test thoroughly, consider limitations and features that differ between games or platforms, and employ strategies such as recreating click boxes to properly voice text as the user interacts with the game. Thus, most challenges with adding TTS revolved around compatibility and the logical voicing of text elements.

I learned a great deal this summer about working with a team, understanding and modifying code written by other developers, and writing code that fits into an existing codebase. I am now more comfortable navigating and working on larger projects.

Conclusion

I enjoyed working on my Google Summer of Code project, and I found it to be a very entertaining and rewarding experience. I am happy that I was able to contribute something to ScummVM, and I hope that it will be helpful to users.

I would like to thank my mentor, criezy, for kindly guiding me throughout the summer, and sev, for guiding me as well. I would also like to thank all members of the ScummVM team that I interacted with for their patience and assistance, and the entire ScummVM team for their work on this application.


Week 12: Got and Hugo

Introduction

This week, I opened PRs for adding text-to-speech to Got and Hugo. They were relatively simple engines, and neither had any major challenges.

Got

I started this week by finishing TTS for Got. As previously mentioned, Got was quite simple for TTS: most of its text was easy to locate in the code, and text was rarely handled by the game scripts. Thus, most of the game’s text could be voiced rather easily. For cleaner voicing, I added checks so that the score, jewels, and keys are voiced only when their values change; delays while the credits are voiced, so that they don’t move too quickly for TTS to keep up; and voicing of dialogue as soon as it starts, so that speech stays in sync with text that appears one character at a time.
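Below is a minimal sketch of the “voice only on change” checks. It assumes the HUD values can be read every frame. HudVoicer and its members are hypothetical names, not Got’s actual code, and a vector of strings stands in for the real TTS call:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: voice HUD values only when they change, instead of
// every frame. The "spoken" vector is a stand-in for a real TTS call.
struct HudVoicer {
    int lastScore = -1;   // -1 means "never voiced yet"
    int lastJewels = -1;
    int lastKeys = -1;
    std::vector<std::string> spoken;

    // Speak "label: value" only if value differs from the last one voiced.
    void voiceIfChanged(const char *label, int value, int &last) {
        if (value == last)
            return;  // unchanged: stay silent
        last = value;
        spoken.push_back(std::string(label) + ": " + std::to_string(value));
    }

    // Called once per frame with the current HUD values.
    void update(int score, int jewels, int keys) {
        voiceIfChanged("Score", score, lastScore);
        voiceIfChanged("Jewels", jewels, lastJewels);
        voiceIfChanged("Keys", keys, lastKeys);
    }
};
```

Because the last-known values start at -1, the first update voices all three. An engine could instead seed them silently when a game is loaded.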

Nevertheless, one consideration for Got was how best to clean text. In most cases, newlines break up sentences in dialogue, which results in choppy voicing if they aren’t replaced. However, some dialogue, such as signs, uses newlines instead of punctuation to separate distinct sentences. If these newlines are replaced, TTS awkwardly pronounces everything as one quick sentence. My solution was to keep newlines only when there are two or more in a row, as sentences separated exclusively by newlines usually have a blank line between them. Single newlines, by contrast, tend to split one sentence across lines, and thus should be replaced.
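The newline rule can be sketched as a small cleanup function. cleanForTts is a hypothetical name, not Got’s actual code. The sketch keeps each run of two or more newlines intact and turns every single newline into a space:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the rule above: a single '\n' usually splits one
// sentence across lines, so it becomes a space; a run of two or more '\n'
// usually separates distinct sentences, so it is kept as-is.
std::string cleanForTts(const std::string &text) {
    std::string out;
    out.reserve(text.size());
    size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '\n') {
            out += text[i++];
            continue;
        }
        // Count the run of consecutive newlines starting at i.
        size_t run = 0;
        while (i < text.size() && text[i] == '\n') {
            ++run;
            ++i;
        }
        if (run >= 2)
            out.append(run, '\n');  // keep the sentence break
        else
            out += ' ';             // join the wrapped sentence
    }
    return out;
}
```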

Ultimately, Got was one of the simpler engines that I’ve worked on. It required some care for properly cleaning up text and guarding against awkward voicing, but it otherwise offered few unique challenges.

Hugo

After Got, I worked on adding TTS to Hugo. Like Got, Hugo was a fairly simple engine for TTS. Its games are straightforward and similar to each other, which made it easier to implement functional TTS for all scenarios. Furthermore, it has few distinct ways of displaying text, so voicing most or all of its text only required adding TTS calls to a few methods.

Nevertheless, Hugo’s dialog boxes required some consideration. Unlike most other engines that I’ve worked with, Hugo displays a considerable amount of its text in ScummVM dialog boxes, which have their own TTS. This means that opening or interacting with some dialog boxes, like the top menu, stops all TTS. Stopping TTS in this manner, however, can interrupt voicing of the sound setting, which is shown in Hugo’s score line. Voicing the setting as soon as it changes doesn’t work properly, as in many cases the top menu is immediately opened again, interrupting the voicing. Therefore, my solution was to keep trying to voice the new sound setting for as long as the cursor is in a position to open the top menu. In this way, once the top menu finally closes and doesn’t open again, the sound setting is voiced.
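This retry logic can be expressed as a small state machine. The sketch below rests on a few assumptions:

- Each frame, the engine knows whether the cursor is in the top-menu zone and whether the menu is open.
- A vector of strings stands in for TTS output.
- SoundSettingVoicer is a hypothetical name.
- Treating each menu opening as cutting off the last attempt is one reading of “keep trying”, not necessarily Hugo’s actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: re-voice a changed setting after every interruption,
// for as long as the cursor could still reopen the top menu.
struct SoundSettingVoicer {
    bool pending = false;       // setting changed; it may still be cut off
    bool needsVoicing = false;  // last attempt was stopped (or never made)
    std::string text;
    std::vector<std::string> spoken;  // stand-in for TTS say() calls

    void onSettingChanged(const std::string &newText) {
        text = newText;
        pending = true;
        needsVoicing = true;
    }

    // Called every frame.
    void update(bool cursorInMenuZone, bool menuOpen) {
        if (!pending)
            return;
        if (menuOpen) {
            needsVoicing = true;  // the dialog's own TTS stopped ours
        } else if (needsVoicing) {
            spoken.push_back(text);  // menu closed: try again
            needsVoicing = false;
        }
        if (!cursorInMenuZone && !menuOpen)
            pending = false;  // menu can no longer reopen: stop retrying
    }
};
```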

Other aspects of Hugo were rather simple. The remaining work mostly consisted of three things. First, I added more TTS for the aforementioned dialog boxes: their built-in TTS voices text only as it’s hovered over, one line at a time, while it seemed to me that these boxes should be voiced as one clean paragraph or sentence as soon as they appear. Second, I voiced score changes. Third, I queued speech correctly for dialog boxes, user input, and scores, so that they don’t interrupt each other as dialog boxes appear.

In conclusion, Hugo was fairly simple. Because its games involved few different means of displaying text and simple controls, it didn’t offer that many unique challenges.

Conclusion

This week, I finished adding TTS to Got and Hugo. Both of these engines were simple to work with, and they didn’t have any major caveats or challenges.


Week 11: Gob

Introduction

This week, I opened a PR for adding text-to-speech to Gob, my first stretch goal. It was an engine that posed a few more challenges than usual, but that was entertaining to work on. I also started early work on adding TTS to Got as my next stretch goal.

Gob

Gob was a rather interesting engine for adding TTS to, primarily because of how it handles text and buttons. Unlike many of the engines that I’ve worked on, Gob keeps hotspots and text inherently separate. This meant that there was no easy way to retrieve the text of a button or menu as the user hovers over it, which made it more difficult to voice. Furthermore, without knowing which text belongs to a clickable hotspot and which does not, text that should only be voiced on hover would be voiced the instant it appears. My solution was to maintain an array, _hotspotText, which holds printed text and its position. Then, as hotspots are added, their collision boxes are checked against those of the text in the array. If they intersect, the text’s collision rect is expanded to that of the hotspot. Through this method, hotspot text can be voiced as the user hovers over it, while most non-hotspot text is voiced by a separate method, ensuring that all text on screen is voiced.

Nonetheless, refining this strategy took some time, as I ran into three issues. First, elements weren’t always properly removed from _hotspotText when text was removed from the screen. I resolved this by deleting unassigned hotspot text in certain methods and removing assigned hotspot text when its corresponding hotspot is removed. Second, in some cases the cursor is instantly placed over a hotspot, which interrupted the voicing of non-hotspot text. I resolved this by queuing hotspot text whenever a piece of non-hotspot text is voiced. Third, I had to move hotspot text in o1_copySprite, as the text itself is sometimes moved there, and not moving the hotspot text along with it leaves a mismatched collision box.
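The _hotspotText idea can be sketched as a small tracker under a few assumptions:

- Rect is a simplified stand-in rather than ScummVM’s own rectangle class.
- HotspotTextTracker and its methods are hypothetical names.
- Matched text grows to cover both its own area and the hotspot’s.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified rectangle with exclusive right/bottom edges (a stand-in, not
// ScummVM's actual class).
struct Rect {
    int left, top, right, bottom;

    bool intersects(const Rect &o) const {
        return left < o.right && o.left < right && top < o.bottom && o.top < bottom;
    }
    bool contains(int x, int y) const {
        return x >= left && x < right && y >= top && y < bottom;
    }
    // Smallest rectangle covering both this one and o.
    Rect united(const Rect &o) const {
        return {std::min(left, o.left), std::min(top, o.top),
                std::max(right, o.right), std::max(bottom, o.bottom)};
    }
};

struct HotspotText {
    std::string text;
    Rect rect;
    int hotspotId;  // -1 while not yet matched to a hotspot
};

class HotspotTextTracker {
public:
    std::vector<HotspotText> entries;

    // Record every piece of printed text with its on-screen rectangle.
    void onTextPrinted(const std::string &text, const Rect &r) {
        entries.push_back({text, r, -1});
    }

    // When a hotspot appears, claim unmatched text overlapping it and grow
    // the text's rectangle so hovering anywhere on the hotspot finds it.
    void onHotspotAdded(int id, const Rect &hotspot) {
        for (HotspotText &e : entries) {
            if (e.hotspotId == -1 && e.rect.intersects(hotspot)) {
                e.rect = e.rect.united(hotspot);
                e.hotspotId = id;
            }
        }
    }

    // Removing a hotspot removes the text tied to it (erase-remove idiom).
    void onHotspotRemoved(int id) {
        entries.erase(std::remove_if(entries.begin(), entries.end(),
                                     [id](const HotspotText &e) { return e.hotspotId == id; }),
                      entries.end());
    }

    // Drop text that never matched a hotspot, e.g. when the screen changes.
    void clearUnassigned() { onHotspotRemoved(-1); }

    // Hotspot text under the cursor, or nullptr if there is none.
    const std::string *textAt(int x, int y) const {
        for (const HotspotText &e : entries)
            if (e.hotspotId != -1 && e.rect.contains(x, y))
                return &e.text;
        return nullptr;
    }
};
```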

Another problem with Gob was the timing of dialogue. From what I could tell, Gob has no distinct indicators for when dialogue starts, progresses, or ends, so dialogue may advance too quickly and interrupt voicing. In addition, dialogue appears to be displayed as TOT text, but so are object names, which means that TOT text can’t be set to always interrupt or always queue without causing awkward voicing. My solution was to queue text only when the user isn’t hovering over a hotspot, since in most cases such text should be queued. After resolving this issue, most of Gob’s other problems were rather straightforward. For example, Ween’s notepad awkwardly tried to voice one character or section at a time as the user typed, which I improved by voicing it only when it first opens.
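The queue-or-interrupt rule for TOT text can be illustrated with a toy speech backend. MockSpeech and voiceTotText are hypothetical names. The mock models only the two behaviors mentioned above: interrupting drops pending speech, while queuing waits behind it.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy TTS backend: "interrupt" drops everything pending, "queue" appends.
struct MockSpeech {
    std::deque<std::string> pending;  // utterances playing or waiting

    void say(const std::string &text, bool queue) {
        if (!queue)
            pending.clear();  // interrupt: drop current and queued speech
        pending.push_back(text);
    }
};

// Hypothetical sketch of the rule: over a hotspot, TOT text is most likely a
// hovered object name that should cut in; elsewhere it is most likely text
// (such as dialogue) that should wait for earlier speech to finish.
void voiceTotText(MockSpeech &tts, const std::string &text, bool cursorOverHotspot) {
    tts.say(text, /*queue=*/!cursorOverHotspot);
}
```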

Ultimately, Gob was an interesting engine because of how it handles hotspots and dialogue. It required different solutions from previous engines, and I’m quite happy with the result, though there may be more problems to resolve in the future.

Got

After Gob, I started work on adding TTS to Got. So far, Got seems very simple for TTS. Most timing and menus do not seem to be handled by game scripts, which makes it easier to locate where to voice text and to implement clean voicing. At the moment, nothing difficult has emerged, but this may change as I add more TTS to the engine.

Conclusion

This week, I added TTS to Gob, and started work on adding TTS to Got. Gob had several considerable intricacies, while Got appears to be rather simple so far. Next week, I’ll be continuing work on Got, and perhaps starting work on adding TTS to another engine if time permits. However, my semester starts next week, which will limit the time I have to work on my project.


Week 10: AGI

Introduction

This week, I opened a PR for adding text-to-speech to AGI. It was the last engine listed in my proposal, which means that I have now made PRs for every engine planned for my project. This is a major milestone, and it gives me time to work on Gob as a stretch goal.

AGI

I spent much of the week working on AGI, which was fortunately a rather simple engine. While it supports many different games and fangames, most of them seem to function very similarly. There may be introductions or other screens that display text using TextMgr::display, and most other text is displayed through popup windows, typed commands, or menus. Thus, AGI was fairly straightforward, requiring only a handful of TTS calls in key methods. The main decisions were when to stop TTS and when to voice the status menu and clock, which are always visible. For the latter, I settled on voicing one or the other primarily when the game is loaded, when the status menu changes, and when the clock is enabled, to avoid voicing them too frequently.

The major concerns with AGI were how its games handle timing and how they use TextMgr::display. Some games appear to use timer variables to signal when to change the text on screen, while others wait for a sound to end. Therefore, from what I could tell, there was no consistent way to predict when text will change, which made it difficult to delay text changes until TTS is finished. I eventually settled on queuing text and delaying room and window changes until TTS finishes, which seems to be enough to let TTS voice everything displayed on screen in a reasonable manner.

In addition, games use TextMgr::display differently: some call it only once, while others call it every frame, which necessitates keeping track of the previously said text to avoid speech loops. However, much of the text passed through this method arrives in pieces, such as single lines or chunks, rather than all at once in a single call. Simply tracking the previously said text therefore fails, and a sentence broken across lines is voiced awkwardly and choppily. My solution was to combine the text passed to TextMgr::display in a _combinedText variable, and then voice it when the game script returns or halts. Using this method, paragraphs and blocks of text are spoken cleanly as one sentence, and the previously said text can simply be set to the combined text to prevent speech loops. I also add newlines between pieces of text that aren’t displayed on subsequent rows, since in most cases such pieces shouldn’t be voiced as one quick sentence.
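The _combinedText approach can be sketched as follows, assuming each display call reports the row it draws on and the engine can signal when the script returns or halts. CombinedTextVoicer, onDisplay, onScriptYield, and _previousSaid are hypothetical names. A vector of strings again stands in for TTS output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: buffer text fragments, voice the buffer once per
// script yield, and skip it if it matches what was said last time.
class CombinedTextVoicer {
public:
    std::vector<std::string> spoken;  // stand-in for TTS output

    // Called for each display call with the row the text is drawn on.
    void onDisplay(int row, const std::string &text) {
        if (!_combinedText.empty()) {
            // Next row: same sentence, join with a space.
            // Any other row: separate piece, keep a newline for a pause.
            _combinedText += (row == _lastRow + 1) ? ' ' : '\n';
        }
        _combinedText += text;
        _lastRow = row;
    }

    // Called when the game script returns or halts.
    void onScriptYield() {
        // Games that redraw the same text every frame produce an identical
        // buffer, which is not voiced again.
        if (!_combinedText.empty() && _combinedText != _previousSaid) {
            spoken.push_back(_combinedText);
            _previousSaid = _combinedText;
        }
        _combinedText.clear();
    }

private:
    std::string _combinedText;
    std::string _previousSaid;
    int _lastRow = 0;
};
```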

In conclusion, AGI was a fairly simple engine, since its games aren’t very complex. Most of the work went into handling timing and text displayed across several calls, as these factors can differ significantly between its many games. Otherwise, AGI fangames seem to vary little in how they display text, which simplified work on the engine. However, non-fangames may function differently, as I mainly worked with fangames.

Gob

After opening a PR for AGI, I started work on Gob. So far, Gob doesn’t seem too complex: its text seems limited and appears to be displayed through only a few select methods. However, certain games, like Adibou 2, separate text from many of their buttons, which requires a technique to sync them. So far, I’ve decided on a method of checking whether the displayed text intersects with a hotspot, then expanding the collision rectangle of the text accordingly, which seems to work fairly well. Nonetheless, I would like to see if I can make this solution more robust for better compatibility, as it seems that Gob supports a variety of games with many differences between them.

Conclusion

This week, I opened a PR for my last planned engine, AGI, and started work on Gob as a stretch goal. I plan to have a PR up for Gob by the end of this week, which may possibly give me enough time to start another stretch goal.


Week 9: SCUMM

Introduction

This week, I opened a PR for adding text-to-speech to SCUMM, which was an interesting engine to work on. I’m quite happy with the result, though with the wide variety of games supported by SCUMM, it may need more work in the future.

SCUMM

Most of my week was spent adding TTS to SCUMM, an engine that was neither especially difficult nor especially easy to work on. On one hand, SCUMM games are very similar to most of the games I’ve worked on before. Most of them have few menus and rather simple means of displaying text, unlike engines such as EFH or MM. Thus, creating a user-friendly TTS system for SCUMM was fairly easy. In addition, finding where text is displayed wasn’t difficult: most of it goes through printString and drawString, with actor speech displayed through displayDialog. Furthermore, most GUI controls and buttons have labels built into them, which makes voicing buttons as they’re hovered over very simple. As such, much of the work for SCUMM involved adding TTS calls to these functions, while accounting for situations such as subtitles for voiced dialog and the need to delay the disappearance of text until TTS is finished.
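Delaying the disappearance of text until TTS finishes can be sketched as a display timer. The timer may expire, but the text is not cleared while speech is still playing. SubtitleTimer is a hypothetical name, and the engine is assumed to be able to ask whether TTS is still speaking.

```cpp
#include <cassert>

// Hypothetical sketch: the game's own timer still counts down, but text is
// only removed once the timer has expired AND speech has finished.
struct SubtitleTimer {
    int framesLeft;  // the game's normal display duration for this text

    // Called once per frame; returns true when the text may be removed.
    bool tick(bool ttsSpeaking) {
        if (framesLeft > 0)
            --framesLeft;
        return framesLeft == 0 && !ttsSpeaking;
    }
};
```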

On the other hand, SCUMM is larger than some of the engines I’ve worked with previously, and it supports many different games. This means that compatibility issues are a recurring problem, as what works for TTS in one version of the SCUMM engine may not work in others. A good example of this is verbMouseOver. When I first came across this method, I thought that it would be a great place to voice verbs when they’re hovered over. However, SCUMM version 5 (or at least Indiana Jones and the Fate of Atlantis) doesn’t seem to reliably use this function to detect hovering over verbs. In addition, while some games call drawVerb only once as a verb is hovered over, games like Fate of Atlantis call it every frame. Thus, to voice verbs as robustly as possible, I added code to drawVerb, which most SCUMM versions seem to use for drawing verbs, that checks whether the current verb is highlighted before voicing it. This strategy seemed to work for many games.

I had to resolve other compatibility issues as well. SCUMM versions 0, 1, and 2 use custom text encodings that needed to be replicated. Sometimes drawVerb is used to print strings that aren’t verbs, and these have to be voiced even if they aren’t highlighted. Finally, Passport to Adventure has a special help menu whose buttons aren’t GUI controls, which required storing the text for each button and detecting when it’s hovered over.
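The highlight check can be sketched in a way that handles both call patterns. A verb is voiced only when it is highlighted and differs from the verb currently being voiced, and the hover resets once the highlight moves away. VerbVoicer and onDrawVerb are hypothetical names, not SCUMM’s actual code, and the sketch assumes the drawing hook knows each verb’s id, name, and highlight state.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: works whether the verb-drawing hook is called once
// per hover or every frame, because a verb is spoken only when it becomes
// the highlighted one.
struct VerbVoicer {
    int hoveredVerb = -1;  // verb currently voiced as hovered, or -1
    std::vector<std::string> spoken;  // stand-in for TTS output

    // Called for every verb the game draws.
    void onDrawVerb(int verbId, const std::string &name, bool highlighted) {
        if (highlighted) {
            if (verbId != hoveredVerb) {  // newly hovered: speak once
                hoveredVerb = verbId;
                spoken.push_back(name);
            }
        } else if (verbId == hoveredVerb) {
            hoveredVerb = -1;  // hover ended; allow re-voicing later
        }
    }
};
```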

Another concern was that SCUMM versions 7 and 8 use their own methods for displaying text, though they were fortunately similar enough to those used by earlier versions that it wasn’t too difficult to voice them. Humongous Entertainment games also seem to have different means of handling text, but because they don’t have much text in the first place, I didn’t have to worry as much about them. Thus, most of the compatibility issues were in earlier SCUMM versions, with later ones having fewer problems.

Ultimately, the most difficult component of SCUMM was the wide variety of games supported. Each version has its own ways of handling text that have to be considered, requiring careful thought about the best places to voice or stop text. I’m fairly happy with my implementation of TTS for this engine, but because of its many games, there may be some oddities that will need to be resolved.

Conclusion

This week, I opened a PR for SCUMM, an engine that was interesting to explore due to its size and variety of versions. I also revisited my MADE PR. MADE had its own compatibility issues, with text indices varying across game versions, and these should now be resolved. Next week, I’ll be working on AGI, the last engine listed in my proposal.
