Google Summer of Code summary

Hello, GSoC is at its end and it is time for me to summarize my work over the past 12 weeks. In case you only want to see the code, here are the pull requests: Mission Supernova, Text to speech, Encoding conversion.


Mission Supernova

As my first project, I worked on an engine for Mission Supernova 2. The project description can be found here.

The engine for the first Mission Supernova game was already almost finished. At first, I decided to create a separate engine (supernova2), and I started by copying the code that is the same for both games from supernova to supernova2. After this I started adding the Mission Supernova 2 specific parts (interactions with objects, rooms, …), which, thanks to having the original source code, wasn’t hard. When I had most of the game working, I had to spend some time improving the translation. There was a tool from the first engine to create translations of game strings and of images with only two colors. But unlike with the first game, we needed to translate colored images, and we also needed to recreate the original image format, which contained multiple images that got rendered on top of each other as layers. So I created a tool that recreates the original image format from multiple BMP images and thus allows us to translate any image we want. At the end I merged the supernova and supernova2 engines into one called “supernova” and created a pull request, which, after some modifications suggested by the community, got successfully merged.
A walkthrough of the Mission Supernova 2 can be found here.
Working on this project gave me the opportunity to get to know the ScummVM code base a bit. Thanks to this, I tried to create my own MIDI sound, which was pretty interesting, and I also learned a tiny bit of German.
After GSoC, there will be public testing of the Mission Supernova games, and I want to be around to fix the bugs that come up.

Text to speech

Because I finished the first project early, I chose to work on a text to speech project next. The task was to implement text to speech support in ScummVM for at least two platforms and then use this feature in the GUI for people with reduced sight and in the Mortevielle engine. The whole project description can be found here.

I began by identifying the best text to speech backend for each platform (Linux and Windows). For Windows, the choice was pretty clear: Microsoft’s SAPI is probably the best way to implement text to speech on Windows. For Linux the choice was a bit harder. There are several backends that can be used: eSpeak, Festival, Flite, MaryTTS, speech-dispatcher and a lot of others. Unfortunately, none of those could output speech in as high quality as SAPI on Windows or NSSpeechSynthesizer on macOS. In the end I chose speech-dispatcher, because it is just a high-level API that uses another backend for the speech, so it is up to the user to configure speech-dispatcher to use the backend they like most. Using speech-dispatcher also means that in the future the user will be able to pick a backend that doesn’t even exist yet and outputs speech of comparable or better quality than the Windows or macOS synthesizers.
I started by implementing a simple text to speech manager for Linux and then added the same on Windows. Implementing the manager for Windows was a little bit harder, because this was the first time I developed something Windows-specific, so some things were pretty new to me (Windows’s heavy use of wchar_t, Microsoft’s __uuidof operator and a lot more). I also had to rewrite some parts of SAPI’s headers to work with MinGW. After implementing the basic speech managers, I added text to speech to the GUI and then also to the Mortevielle engine. Then I spent a lot of time fixing bugs and adding more features to the managers, for example the different actions that should be taken when trying to speak while another speech is in progress. This led to implementing my own queueing of speeches and making the managers multi-threaded. Once there were no more features to add and all tests were passing, I created a pull request.
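The queueing idea can be sketched roughly like this: speech requests go into our own queue, and a worker thread hands them to the platform backend one by one. All names here are illustrative (the backend call is a stub that just records what was spoken), not ScummVM's actual API:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Records what was "spoken"; stands in for the real SAPI / speech-dispatcher call.
std::vector<std::string> g_spoken;

class SpeechQueue {
public:
    SpeechQueue() : _stop(false), _worker(&SpeechQueue::run, this) {}

    // On destruction, let the worker drain the queue and then join it.
    ~SpeechQueue() {
        { std::lock_guard<std::mutex> lock(_mutex); _stop = true; }
        _cond.notify_one();
        _worker.join();
    }

    // Called from the GUI/engine thread: enqueue a speech and wake the worker.
    void say(const std::string &text) {
        { std::lock_guard<std::mutex> lock(_mutex); _queue.push(text); }
        _cond.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::string text;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _cond.wait(lock, [this] { return _stop || !_queue.empty(); });
                if (_queue.empty())
                    return; // only reached when _stop is set
                text = _queue.front();
                _queue.pop();
            }
            speakImmediately(text); // blocking backend call, one speech at a time
        }
    }

    void speakImmediately(const std::string &text) { g_spoken.push_back(text); }

    std::queue<std::string> _queue;
    std::mutex _mutex;
    std::condition_variable _cond;
    bool _stop;
    std::thread _worker;
};
```

Owning the queue ourselves (rather than relying on the backend's internal queue) is what makes it possible to inspect and modify pending speeches, which the different speech actions need.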
Part of the pull request is also a text to speech manager for macOS, which was implemented by my GSoC mentor Criezy.
The text to speech can be seen in action here: Windows GUI, Linux GUI, macOS GUI, Windows Mortevielle, Linux Mortevielle, macOS Mortevielle.
Thanks to this project I learned more about how different backends are handled in ScummVM and how to add optional features and optional libraries to ScummVM (text to speech is an optional feature). I had an opportunity to work with multiple threads, which I don’t do that often, and I wrote my first Windows-specific code ever.
After GSoC, once this project gets merged, I want to be there to fix any bug that comes up (hopefully there won’t be many of them).

Encoding conversion

Because there was still a little bit of GSoC left, I started to look for another project to work on. While working on the text to speech project, I needed to convert between different character encodings (to UTF-8 on Linux, to UTF-16 on Windows). I implemented some conversion there, but it wasn’t perfect. Another pull request, which improved cloud support in ScummVM, also had an issue with encodings, and there was a short discussion about encoding conversion on IRC. So I decided to add encoding conversion to ScummVM.

I started by adding an option to compile ScummVM with iconv and implementing conversions using the iconv library (if it is available). After that, I added other ways of converting encodings (SDL_iconv_string, ansiToUnicode and unicodeToAnsi on Windows, and some already existing ways of conversion within ScummVM). The resulting code tries each available conversion method in turn until one succeeds: first iconv, then the backend conversion algorithms (SDL, Win32, …) and then TransMan (the translation class in ScummVM, which can be used for some encoding conversions). After this I added transliteration from Cyrillic to ASCII, which was needed by the cloud pull request. In the end I added tests and created a pull request.
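The Cyrillic-to-ASCII transliteration boils down to a lookup table. Here is a hedged sketch working on Unicode code points — only a handful of letters are shown, and the function name, table contents and input form are my own illustration, not ScummVM's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Map each Cyrillic code point to an ASCII string; a real table covers the
// whole alphabet (some letters expand to two ASCII characters, like "Zh").
std::string transliterate(const std::vector<uint32_t> &codePoints) {
    struct Entry { uint32_t cp; const char *ascii; };
    static const Entry table[] = {
        { 0x0410, "A" }, { 0x0411, "B" },  { 0x0412, "V" }, { 0x0413, "G" },
        { 0x0414, "D" }, { 0x0415, "E" },  { 0x0416, "Zh" }, { 0x0417, "Z" },
        { 0x0418, "I" }, { 0x041F, "P" },  { 0x0420, "R" }, { 0x0422, "T" },
    };
    std::string out;
    for (uint32_t cp : codePoints) {
        bool found = false;
        for (const Entry &e : table) {
            if (e.cp == cp) { out += e.ascii; found = true; break; }
        }
        if (!found && cp < 128)
            out += static_cast<char>(cp); // plain ASCII passes through unchanged
    }
    return out;
}
```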
This project helped me to further explore ScummVM’s code base and I learned a few new things about character encodings.
After GSoC, once this project gets merged, I want to be there to fix any bug that comes up.

What did GSoC give me?

During GSoC, I had the opportunity to experiment with quite a lot of things that were new to me: text to speech synthesis, encoding conversions, creating MIDI sounds, multi-threading, reading through old C code (Kernighan and Ritchie function declarations, a lot of gotos, jumps, quite a bit of assembly). I got better at Vim, I learned more about git, and a lot more. I also had to familiarize myself with quite a bit of ScummVM’s code base. I got used to programming eight hours per day, which, along with the programming experience I gained, will surely help me with my school projects next semester. And the most important thing: I memorized like 30 hours of metal songs 😀

What’s next?

I would like to enjoy the rest of the holidays before school begins again, so I won’t work as much, but I certainly want to fix anything not working in my code. After that I will probably have a lot of work with my school projects, but I still want to find at least a little bit of time throughout the school year to work on another project for ScummVM, maybe another engine, but I don’t know yet.

This is probably my last blog post for some time, because I am not the kind of person who writes a blog post unless it is really interesting or important, so goodbye, readers.

Vacation update

Hello, last week I was on vacation, so I got almost no work done, and that’s why this blog post will be a bit shorter.

I worked only on the text to speech project. I had to rework quite big parts of the Windows and Linux managers, because I had used SAPI’s and speech-dispatcher’s queues to easily queue speeches, and those didn’t give me the control over the queues needed to properly implement the INTERRUPT_NO_REPEAT action I wrote about in the last post. So now I use my own queues and start speeches one by one in a separate thread.

The goal for this last week of GSoC is clear: finish TTS and encoding conversion projects and create pull requests for them. Hopefully I can achieve this goal.

TTS almost done

Hello readers, since the last blog post, the TTS project has slowly but surely neared its end. I added OSD message and GUI tooltip reading. We added the possibility to choose what should happen if the TTS gets a request to read a message while another message is being read: it can be added to a queue, it can just be ignored, or it can interrupt the current speech and get read instead (more actions, which depend on the first message in the queue, are possible). Before, the only option was to interrupt the currently read message and read the new one instead. Criezy (who implemented the macOS part) and I understood some actions differently, so right now the INTERRUPT_NO_REPEAT action, which should interrupt the current speech and delete the speech queue only if the new message is different from the currently spoken message, behaves differently on Windows and Linux (it interrupts and deletes the queue if the last message in the queue is different, not the currently spoken one). Once I reimplement this action, the TTS will be ready for a pull request.
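The intended INTERRUPT_NO_REPEAT semantics can be sketched as a tiny state machine. This is a hypothetical illustration (the struct and method names are mine, not ScummVM's class): the new message is compared against the message currently being spoken, not the last one queued.

```cpp
#include <cassert>
#include <queue>
#include <string>

// If the new message differs from the one being spoken, drop the queue and
// speak the new message; if it is the same, leave the current speech running.
struct TtsState {
    std::string current;             // the message being spoken right now
    std::queue<std::string> pending; // our speech queue

    void interruptNoRepeat(const std::string &msg) {
        if (msg == current)
            return; // already saying exactly this: don't repeat it
        std::queue<std::string>().swap(pending); // delete the speech queue
        current = msg;                           // interrupt with the new message
    }
};
```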

In the encoding project, I originally tried to program everything in an Encoding class I created in Common. But Criezy thought it would be better (and it is) to make an encoding conversion method in OSystem and move all of the platform-specific code there. So I reworked everything I had so far. Right now, the code is in a state where it is able to use every way of encoding conversion I wanted to use (iconv, SDL, Win32, TransMan). I even added Cyrillic to ASCII transliteration. The only problem seems to be multi-byte encodings (UTF-16 and UTF-32). Win32 isn’t able to work with UTF-32 at all (at least I didn’t find a way to convert from / to UTF-32), so I use the U32String class for this. Another thing is endianness. Iconv and SDL use a BOM (byte order mark) at the beginning of the string to specify in which endianness the string is, but the Win32 conversion (which uses U32String for UTF-32 and Win32 ansiToUnicode for UTF-16) and the TransMan conversion don’t use a BOM. The plan is to remove the BOM and use the machine’s native endianness, or a different endianness, but only when the programmer specifies it (I assume that in such a case the programmer doesn’t need the BOM, because they are the one who specified which endianness they want).
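The BOM handling amounts to peeking at the first bytes of the buffer. A minimal sketch for UTF-16 (a hypothetical helper of my own, not the actual OSystem method):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum Endianness { kNoBom, kLittleEndian, kBigEndian };

// Look at the first two bytes of a UTF-16 buffer: 0xFF 0xFE marks a
// little-endian string, 0xFE 0xFF a big-endian one. `skip` tells the caller
// how many bytes of BOM to drop before converting.
Endianness detectUtf16Bom(const uint8_t *data, size_t len, size_t &skip) {
    skip = 0;
    if (len >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) { skip = 2; return kLittleEndian; }
        if (data[0] == 0xFE && data[1] == 0xFF) { skip = 2; return kBigEndian; }
    }
    return kNoBom; // no BOM: fall back to the machine's native endianness
}
```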

Project update

Hello, last week I was making some finishing touches to the TTS project. I improved voice mapping in the Mortville Manor game, because I used to use pitch values from the original game to map the characters to different voices. That meant, for example, that if a user had only three female voices available, both female characters would map to the same voice. So I added an array of voice indices, so that the voices map as evenly as possible. I improved TTS in the GUI, but that still needs some more work. But the biggest addition to the TTS actually isn’t from me: Criezy implemented TTS for macOS, which I couldn’t do (I don’t own a Mac).
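The even mapping can be sketched like this: each gender round-robins over its available voices, so two characters only share a voice when there are more characters than voices. This is a hypothetical helper of my own, not ScummVM's code; the pitch-to-gender rule (> 5 means female) comes from the game's table described in an earlier post.

```cpp
#include <cassert>
#include <vector>

// Assign a voice index to each character, spreading each gender's characters
// as evenly as possible over that gender's available voices.
std::vector<int> mapVoices(const std::vector<int> &pitches,
                           int femaleVoiceCount, int maleVoiceCount) {
    std::vector<int> voiceIndices;
    int nextFemale = 0, nextMale = 0;
    for (int pitch : pitches) {
        if (pitch > 5)
            voiceIndices.push_back(femaleVoiceCount ? nextFemale++ % femaleVoiceCount : -1);
        else
            voiceIndices.push_back(maleVoiceCount ? nextMale++ % maleVoiceCount : -1);
    }
    return voiceIndices; // -1 means "no voice of that gender available"
}
```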

Because TTS is almost finished, I started to work on my next project, which is adding a way to convert between different character encodings. Right now, if you need to convert between encodings, you can:

  • use the U32String class to convert between a few encodings,
  • use TransMan (if TransMan is enabled) to convert between Unicode and the current GUI charset,
  • use platform-specific code (SDL, Win32 and Mac have their own ways of converting encodings), or
  • implement your own conversion (the most reliable option, probably even after I finish this project).
This seems quite messy and confusing to me, and we probably don’t want each engine to implement its own encoding conversions. So I am adding an option to compile ScummVM with iconv, a library that specializes in encoding conversion. The plan is to first try iconv, then the platform-specific conversion code, and as a last resort use TransMan to convert the string either to the GUI encoding or to UTF-32 and then try to convert that to the final encoding.
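The planned fallback order can be sketched as trying a list of converters until one succeeds. The converters below are stand-ins for iconv, the platform code and TransMan (hypothetical names and signatures of my own, not the real API):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Each conversion method returns true on success and fills `out`; the chain
// tries them in order (iconv first, platform code next, TransMan last) and
// stops at the first one that works.
typedef std::function<bool(const std::string &, std::string &)> Converter;

bool convertString(const std::vector<Converter> &methods,
                   const std::string &in, std::string &out) {
    for (const Converter &method : methods) {
        if (method(in, out))
            return true;
    }
    return false; // every method failed
}
```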

Mortville Manor TTS

Hello, in the past week I worked some more on the TTS project. First I finished the GUI TTS: I needed to make sure that everything behaves the same way on Windows and on Linux, and I had to make sure I free everything I allocate. I even had to implement reference counting for a class, which wasn’t that hard, but it still was pretty interesting, because I had never needed to do that before. I also had to implement conversion of the spoken text to UTF-8. On Windows this involved adding a table to the code, which allows looking up a codepage identifier by its name. On Linux I just used SDL_iconv_string, which can convert from most of the GUI encodings, but unfortunately it seems it cannot convert from CP850, which is apparently quite often used by game engines (it is used by the Mortevielle engine and also by the supernova engine). So right now I just convert those strings as if they were ASCII (most encodings are just ASCII extensions, so I hope to successfully convert at least some strings). I am thinking about using the iconv library to handle the conversion, but that would mean another dependency needed for compiling the code.
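The Windows-side lookup amounts to a small name-to-identifier table for the codepage argument that Win32 conversion functions like MultiByteToWideChar expect. A few illustrative entries (the actual table in the code is longer, and the function name here is my own):

```cpp
#include <cassert>
#include <cstring>

// Map an encoding name to the Windows codepage identifier. Returns -1 for
// unknown names so the caller can fall back to another conversion method.
int codepageFromName(const char *name) {
    struct Entry { const char *name; int id; };
    static const Entry table[] = {
        { "cp850",      850 },   // DOS Latin-1, used by Mortevielle and supernova
        { "cp1250",     1250 },  // Windows Central European
        { "iso-8859-1", 28591 },
        { "utf-8",      65001 },
    };
    for (const Entry &e : table) {
        if (strcmp(e.name, name) == 0)
            return e.id;
    }
    return -1; // unknown encoding
}
```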

After I was happy with the state the TTS was in and everything worked in the GUI, I moved on to implementing TTS in the Mortevielle engine. It took me some time to get oriented in the engine (partly because some functions have their original French names). But once I located the correct spot, the implementation was pretty easy.
The engine uses a table of character voice pitches (a number > 5 is a female voice, < 5 a male voice):

const int haut[9] = { 0, 0, 1, -3, 6, -2, 2, 7, -1 };

At first, I wanted to map each number to a different voice, which would be alright on Linux, because there are four voices for each gender (some characters would still be mapped to the same voice, but I think the result would be good enough). But unfortunately it’s quite unlikely that on other platforms the user is going to have that many different voices. So I ended up using only one voice for every character and just changing the pitch a little bit to make them at least a little bit different. If there is ever a situation where there is no voice for a gender, I try to simulate that gender’s voice by taking a voice that is available and changing its pitch a lot (higher for female, lower for male). It sounds quite comical, but with a little bit of imagination it works.

Here are some results of the Mortevielle TTS. I wanted to show that switching TTS states really works (that the TTS works as it should when switching between the GUI and the game).



ScummVM speaks

Hello readers, first I might have to apologize, because with how busy I was at the end of last week, I actually forgot that I should write a blog post. But at least now I have a lot more to show you.

The Mission Supernova project is near its end. I created a pull request with the work I have done on it, and right now it waits to be reviewed by the ScummVM community. You can find the pull request here.

This allowed me to fully focus on my next project, which is text to speech. First I implemented the TTS on Linux, which was pretty easy, because the speech-dispatcher API is quite simple to use (a lot simpler than Microsoft’s SAPI). The code for that should be close to finished; I just need to add some checks in a few places. Then, so I could test the TTS, I implemented reading of text in the GUI (a blind user could use this to navigate through the menus and maybe play a game which also uses the TTS to read text). I spent quite a while doing this, because I wasn’t sure where to put all the code the GUI needed for TTS, but I quite like the state it is in now. After I finished the GUI modifications, I started to work on the Windows side of the TTS. This is quite challenging for me, because I am used to working on Linux with my pretty specific setup, so I quite often unsuccessfully tried to use the custom shortcuts I am used to on Linux, which was quite annoying. This was also my first Windows-specific C++ code, so working with COM or seeing Microsoft’s custom __uuidof operator was quite new for me. But even with all this, I have managed to implement a pretty good portion of the Windows TTS code. I don’t have all the features programmed yet, but some results can already be seen, or rather heard.

I guess this is enough of me writing. I promised to show you stuff, so here is what ScummVM sounds like when it speaks:


This is using speech-dispatcher with espeak-ng; other TTS engines should work as well (I have tried Festival and it worked). The voices are set by the user in the speech-dispatcher configuration, but the programmer can make some adjustments to them.
This is using Microsoft SAPI. The voice selection depends on the selected language and the set of installed voices (the user has to install them into Windows first for them to appear in the voice selection in ScummVM). As you can see, I for some reason have a Chinese voice installed, so I can show you that switching languages works. But I think the voice is supposed to be used for a different dialect or something like that, because it doesn’t seem to understand the text: it only spells it out instead of reading it as a whole.

GSoC update

Hello, last week I almost finished the Supernova engine. All of the bugs I managed to find should be fixed, I added translations to most images and help files, I merged the .dat files with translations for each game (and the tools to create them) into one, and I also added the “improved” mode. Right now the improved mode:
In the first game

  • Automatically puts the space suit on and takes it off every time the player walks through the airlock (after they first do this on their own).
  • Automatically closes one side and opens the other side of the airlock (after the player first walks through the airlock on their own).
In the second game
  • Enables using the go verb alongside the push verb to interact with the puzzle tiles inside the pyramid (that way the player doesn’t have to click on push between puzzle interactions and can focus only on solving the puzzle).
  • Enables using the go verb alongside the push verb to interact with the password-locked door inside the pyramid.
The only thing left to do should be to create one last translated image and then create a pull request.

New project

Because I am nearly finished with the supernova engine and there are still six weeks of GSoC left, I had to find another project to work on. The project I chose is adding text to speech capability. I tried to find the best library to use for this; the dream would be something that supports at least English, German and French with decent voice quality and runs on as many platforms as possible. Of course the dream didn’t happen, so the plan for now is to use SAPI on Windows and speech-dispatcher on Linux (these are the only platforms I can develop for). Finding a good Linux library wasn’t easy at all: the speech quality usually isn’t very good, quite often the only supported language is English, or there were other issues that made me dislike the library. So we chose speech-dispatcher, because it is a high-level interface to whatever speech engine the end user chooses, and so in the future it might let the user choose an engine that doesn’t exist yet and sounds good.

Supernova images

Last week I finished the merge pretty quickly; there was only a bit of refactoring left. After that I worked on a tool that can recreate the Mission Supernova datafiles containing images, which allows us to translate them.

The MS datafile format is pretty simple and reminds me of .bmp images (there is a header, after that a palette, and after that pixel data). The first 4 bytes are the size of the pixel data field. After that is 1 byte giving the size of the palette, and the palette itself follows. A pretty interesting thing is that each color is brightened by the engine (shifted by 2 to the left), so we have to account for that when making a new image and generating palettes for it. This also means that we lose a lot of colors (for example, we aren’t able to encode all 256 levels of grayscale). After the palette comes information about sections. A section is an image that can get rendered on top of other sections (images). The information is basically only its size, its location on the screen and the location of its pixel data inside the file. Next is the click field information. A click field is similar to a section: it has a size and a location on the screen, but it doesn’t have an image assigned to it. After this follow 2 to 258 bytes used to decompress the image. The last thing is the pixel data for all the sections.
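Reading the start of such a datafile can be sketched as follows. I'm assuming little-endian byte order (typical for DOS-era formats) and three bytes (R, G, B) per palette color; the struct and field names are my own, not the actual tool's code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DatHeader {
    uint32_t pixelDataSize;       // first 4 bytes: size of the pixel data field
    uint8_t paletteSize;          // next byte: number of palette colors
    std::vector<uint8_t> palette; // assumed 3 bytes (R, G, B) per color
};

DatHeader readHeader(const uint8_t *data) {
    DatHeader h;
    h.pixelDataSize = data[0] | (data[1] << 8) | (data[2] << 16)
                    | ((uint32_t)data[3] << 24);
    h.paletteSize = data[4];
    h.palette.assign(data + 5, data + 5 + 3 * h.paletteSize);
    return h;
}

// The engine brightens each palette component by shifting it left by 2, so a
// component value v in the file shows up as v << 2 on screen. That's why all
// 256 levels of grayscale can't be encoded: only every fourth level survives.
uint8_t onScreenComponent(uint8_t fileValue) {
    return fileValue << 2;
}
```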

The tool right now is semi-automated (I think automated enough). As its input it uses files containing the palette, section and click field descriptions and a .bmp file for each section. It just copies the descriptions into the file, after which it takes the pixel data from each .bmp image and copies that too. The one thing that isn’t automated is that the user has to tell the tool by a command line argument where the pixel data of the images starts (how many bytes of each .bmp image to skip).

I wrote about the reasons the tool is needed in the last post, but just to remind you: the main reason is that we need to translate this image:

This writing basically means “You are almost there”, and the player is supposed to push certain letters, using a clue on a note, to open the door. A temporary solution was to translate the note to English (translating the password too), change the password on the door, and add a subtitle saying “You are almost there” when entering the room with this door.
When translating the image, I had quite a few problems with the palette:

But then I noticed that the first 16 colors don’t change and the image palette starts at the 17th color of the engine’s palette, so I shifted the palette by 16 colors (just adding blue, which isn’t used in this image, to the front a few times) and I got this:

You probably noticed that the picture is upside down and that there is a noticeable lighter circle around the writing. But with the help of dithering and mirroring the image in GIMP, I managed to translate the whole image:

And thanks to having the tool, I was able to add a better version of the translated ciphered image too:

Now I will fix some more bugs my brother and I found while playing the game, and then I will add an “improved” mode making some repetitive and annoying tasks less repetitive and less annoying.

Supernova engine merge

Hello, the last week was really busy for me. Because I had a lot of personal stuff to take care of, I sometimes struggled to work as much as I should have (that’s the reason why I am a day late with the blog post), but in the end I think I managed. Last week I planned to implement the outro, test the game, fix all the bugs I found and prepare to merge the supernova engines into one. I managed to complete everything I wanted, but I not only prepared the engines for the merge, I actually managed to merge them. The merge is not finished yet: I still have to do a lot of refactoring, testing and fixing of bugs I made while merging the engines, but the games can be played on the single engine and they seem to work.

So the plan for next week is mainly to finish the merge. After that, there is not much left to do, so I will work on enhancing the translation.

Right now, we are able to load custom images from a .dat file in .pbm format, which is a simple format with only two colors per image. This format is enough for, for example, newspaper images like this one:

But there are images in the second game that should be translated for which only two colors aren’t enough, like this one:

which is in grayscale, and with only two colors I don’t think I can get it much better than this:

Another problem with images is that the engine uses an image format that contains “sections”, which are rendered on top of each other, so right now we have no way to translate this:

Because in this room, you have to push the right letters to write a password. After a letter is pushed, it is highlighted, and that is done by rendering different sections of the image.
So the task after I finish the merge will be figuring out and implementing a way to load custom images similar to the original ones (containing sections and a color palette).

Nearing the finish

Hi, it’s been another busy week, and the finish line for the Supernova2 engine is pretty near. Last time I wrote a post I had 22 out of 71 rooms implemented. Now I have finished implementing all of the rooms, and the only thing missing is the outro. The game is playable from start to end, so my work for the next week will be, at first, implementing the outro, then a lot of bug hunting, and once I can’t find any more bugs, preparing for and eventually doing the merge of the supernova and supernova2 engines.

Implementing rooms is a pretty boring and repetitive task, but still, one interesting thing happened. There is a part of the game where the player is supposed to steal a dinosaur skeleton from a museum. There are pressure sensors, cameras and a security guard wandering around the museum, and the player has to evade all of that in order to not get caught and not trigger the alarm. The implementation of the alarm sound in the original code is pretty simple: it’s a few lines of assembly using the OUT instruction to send frequencies between 1500 Hz and 1800 Hz and thus make a siren sound. So there isn’t any file containing the sound, and because of that it has to be generated. I tried using the PCSpeaker class, which can play tones of a given frequency. Using the class I managed to play the sound, but unfortunately between every tone there was a slight “tick” sound. So the solution was to write my own code to generate a raw sound stream with the siren sound. Should be easy, right? Well, it is, but because I have no experience with sound it took me longer than it should have. At first I was pretty surprised that generating a stream where every byte has the same value produces no sound. Fortunately it was enough to google “raw sound” (which took me a few hours), and then I remembered I actually took a course this winter semester, part of which was about this. So I wrote the code, just to end up with the same problem I had when using the PCSpeaker class: a “tick” sound between every tone change. I had to raise the audio rate up to 80000 (a lot for just a siren sound) to get rid of it. But thanks to Criezy’s advice we managed to modify the code so it is able to generate a good enough siren sound even at a 44000 audio rate.
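The idea behind the generated siren can be sketched like this: a square wave whose frequency sweeps between 1500 Hz and 1800 Hz, with the wave's phase kept continuous across frequency changes — that continuity is what avoids a jump in the waveform (the source of the "tick" sound). The sample format (8-bit unsigned mono) and the sweep shape here are my assumptions, not the actual ScummVM code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Generate `seconds` of an 8-bit unsigned mono siren at the given sample
// rate. The frequency sweeps 1500 -> 1800 -> 1500 Hz over the duration, and
// the square wave's phase is advanced continuously sample by sample, so a
// frequency change never produces a discontinuity in the output.
std::vector<uint8_t> generateSiren(int rate, double seconds) {
    std::vector<uint8_t> samples;
    const int total = (int)(rate * seconds);
    double phase = 0.0; // position inside the current wave period, in [0, 1)
    for (int i = 0; i < total; ++i) {
        double t = (double)i / total; // progress through the whole sound
        double freq = 1500.0 + 300.0 * (t < 0.5 ? 2.0 * t : 2.0 * (1.0 - t));
        phase += freq / rate;
        if (phase >= 1.0)
            phase -= 1.0;
        samples.push_back(phase < 0.5 ? 200 : 56); // square wave around 128
    }
    return samples;
}
```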