Project update

Hello, last week I made some finishing touches to the TTS project. I improved the voice mapping in the Mortville Manor game: I used to use the pitch values from the original game to map the characters to different voices, which meant, for example, that if a user had only 3 female voices available, both female characters would map to the same voice. So I added an array of voice indices, so that the characters map to the voices as evenly as possible. I also improved the TTS in the GUI, but that still needs some more work. The biggest addition to the TTS actually isn’t from me, though: Criezy implemented TTS for Mac OS, which I couldn’t do myself (I don’t own a Mac).
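As a rough illustration of the idea (this is a hypothetical sketch, not the actual ScummVM code), the mapping now works roughly like this: the characters of one gender are spread over the voices of that gender that are actually installed, instead of using the pitch value as a voice index directly.

#include <vector>

// Hypothetical helper: voicesOfGender holds the indices of the installed
// voices for one gender; characters are distributed over them evenly.
int voiceForCharacter(int characterIndex, const std::vector<int> &voicesOfGender) {
    if (voicesOfGender.empty())
        return -1; // no voice of this gender installed, the caller has to improvise
    return voicesOfGender[(size_t)characterIndex % voicesOfGender.size()];
}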

Because the TTS is almost finished, I started to work on my next project, which is adding a way to convert between different character encodings. Right now, if you need to convert between encodings, you can:

  • use the U32String class to convert between a few encodings,
  • use TransMan (if TransMan is enabled) to convert between Unicode and the current GUI charset,
  • use platform-specific code (SDL, Win32 and Mac have their own ways of converting encodings),
  • implement your own conversion (the most reliable option, probably even after I finish this project).
This seems quite messy and confusing to me; we probably don’t want each engine to implement its own encoding conversions. So I am adding an option to compile ScummVM with iconv, a library that specializes in encoding conversion. The plan is to first try iconv, then the platform-specific conversion code, and as a last resort use TransMan to convert the string either to the GUI encoding or to UTF-32 and then try to convert that to the final encoding.
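As a very rough sketch of what the iconv step of that chain could look like (the function below and its error handling are simplified and hypothetical, only the iconv calls themselves are the real library API):

#include <iconv.h>
#include <string>
#include <vector>

// Try to convert `in` from encoding `from` to encoding `to` using iconv;
// return false so the caller can fall back to platform code or TransMan.
static bool iconvConvert(const char *to, const char *from, const std::string &in, std::string &out) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return false; // this conversion pair isn't supported

    std::vector<char> buf(in.size() * 4 + 4); // enough for most conversions of short strings
    char *src = const_cast<char *>(in.data());
    char *dst = buf.data();
    size_t srcLeft = in.size(), dstLeft = buf.size();

    bool ok = iconv(cd, &src, &srcLeft, &dst, &dstLeft) != (size_t)-1;
    iconv_close(cd);
    if (ok)
        out.assign(buf.data(), buf.size() - dstLeft);
    return ok;
}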

Mortville Manor TTS

Hello, in the past week I worked some more on the TTS project. First I finished the GUI TTS: I needed to make sure that everything behaves the same way on Windows and on Linux, and I had to make sure I free everything I allocate. I even had to implement reference counting for a class, which wasn’t that hard, but it was still pretty interesting, because I had never needed to do it before. I also had to implement conversion of the spoken text to UTF-8. On Windows this involved adding a table to the code that allows looking up a codepage identifier by its name. On Linux I just used SDL_iconv_string, which can convert from most of the GUI encodings, but unfortunately it seems it cannot convert from CP850, which is apparently used quite often by game engines (it is used by the Mortevielle engine and also by the Supernova engine). So right now I just convert those strings as if they were ASCII (most of the encodings are just ASCII extensions, so I hope to successfully convert at least some strings). I am thinking about using the iconv library for handling the conversion, but that would mean another dependency needed for compiling the code.
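The Linux side is roughly like the sketch below (simplified, not the actual ScummVM code); SDL_iconv_string does the real work and the ASCII path is the fallback:

#include <SDL.h>
#include <string>

// Convert `text` from `fromEncoding` to UTF-8, falling back to keeping only
// the ASCII bytes when SDL cannot handle the source encoding (e.g. CP850).
std::string toUtf8(const std::string &text, const char *fromEncoding) {
    char *converted = SDL_iconv_string("UTF-8", fromEncoding, text.c_str(), text.size() + 1);
    if (converted) {
        std::string result(converted);
        SDL_free(converted);
        return result;
    }
    std::string ascii;
    for (size_t i = 0; i < text.size(); ++i)
        if ((unsigned char)text[i] < 128)
            ascii += text[i];
    return ascii;
}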

After I was happy with the state the TTS was in and everything worked for the GUI, I moved on to implementing TTS in the Mortevielle engine. It took me some time to get oriented in the engine (partly because some functions still have their original French names), but once I located the correct spot, the implementation was pretty easy.
The engine uses a table of character voice pitches (a number > 5 is a female voice, < 5 is a male voice):

const int haut[9] = { 0, 0, 1, -3, 6, -2, 2, 7, -1 };

At first, I wanted to map each number to a different voice, which would be alright on Linux, because there are 4 voices for each gender (some characters would still be mapped to the same voice, but I think the result would be good enough). But unfortunately it’s quite unlikely that on other platforms the user is going to have that many different voices. So I ended up using only one voice for every character and just changing the pitch a little bit to make them at least somewhat different. If there is ever a situation where there is no voice for a gender, I try to simulate that gender’s voice by taking a voice that is available and changing its pitch a lot (higher for female, lower for male). It sounds quite comical, but with a little bit of imagination it works.
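A hypothetical sketch of that logic (simplified, not the real ScummVM TTS interface):

// voiceIndex says which installed voice to use, pitchOffset is a relative
// pitch change applied on top of it.
struct VoiceChoice {
    int voiceIndex;
    int pitchOffset;
};

VoiceChoice pickVoice(bool wantFemale, int characterPitch, int femaleVoices, int maleVoices) {
    VoiceChoice choice;
    choice.voiceIndex = 0;                    // always the first installed voice of the wanted gender
    choice.pitchOffset = characterPitch * 5;  // small per-character variation

    bool haveWantedGender = wantFemale ? (femaleVoices > 0) : (maleVoices > 0);
    if (!haveWantedGender) {
        // No voice of the wanted gender: take whatever voice exists and shift
        // its pitch a lot to at least hint at the right gender.
        choice.pitchOffset += wantFemale ? 60 : -60;
    }
    return choice;
}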

Here are some results of the Mortevielle TTS. I wanted to show that switching TTS states really works (that the TTS behaves as it should when switching between the GUI and the game).

Linux:

Windows:

ScummVM speaks

Hello readers. First I might have to apologize, because with how busy I was at the end of last week, I actually forgot that I should write a blog post. But at least now I have a lot more to show you.

The Mission Supernova project is near its end; I created a pull request with the work I have done on it, and right now it is waiting to be reviewed by the ScummVM community. You can find the pull request here.

This allowed me to fully focus on my next project, which is text-to-speech. First I implemented the TTS on Linux, which was pretty easy, because the speech-dispatcher API is quite simple to use (a lot simpler than SAPI from Microsoft). The code for that should be close to finished; I just need to add some checks in a few places. Then, so I could test the TTS, I implemented reading of text in the GUI (a blind user could use this to navigate through the menus and maybe play a game which also uses the TTS to read text). I spent quite a while on this, because I wasn’t sure where to put all the code the GUI needed for TTS, but I quite like the state it is in now. After I finished the GUI modifications, I started to work on the Windows side of the TTS. This is quite challenging for me, because I am used to working on Linux with my pretty specific setup, so I quite often unsuccessfully tried to use the custom shortcuts I am used to on Linux, which was quite annoying. This was also my first Windows-specific C++ code, so working with COM or seeing the Microsoft-specific uuid operator was quite new to me. But even with all this, I have managed to implement a pretty good portion of the Windows TTS code. I don’t have all the features programmed yet, but some results can already be seen, or rather heard.
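To give an idea of why the speech-dispatcher side was the easier one, here is a minimal standalone example using just the bare libspeechd calls (this is not the ScummVM code):

#include <libspeechd.h>
#include <cstdio>
#include <unistd.h>

int main() {
    SPDConnection *conn = spd_open("example", "main", NULL, SPD_MODE_SINGLE);
    if (!conn) {
        std::fprintf(stderr, "Could not connect to speech-dispatcher\n");
        return 1;
    }
    spd_say(conn, SPD_TEXT, "Hello from speech-dispatcher");
    sleep(3);        // spd_say is asynchronous, give it a moment before exiting
    spd_close(conn);
    return 0;
}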

I guess this is enough of me writing. I promised to show you stuff, so here is what ScummVM sounds like when it speaks:

Linux:

This is using speech-dispatcher with espeak-ng; other TTS engines should work as well (I have tried Festival and it worked). The voices are set by the user in the speech-dispatcher configuration, but the programmer can make some adjustments to them.
Windows:
This is using Microsoft SAPI; the voice selection depends on the selected language and on the set of installed voices (the user has to install them into Windows first in order for them to appear in the voice selection in ScummVM). As you can see, I for some reason have a Chinese voice installed, so I can show you that switching languages works. But I think the voice is supposed to be used for a different dialect or something like that, because it apparently doesn’t understand the text: it only spells it out instead of reading it as a whole.
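For comparison with the speech-dispatcher example above, a minimal standalone SAPI example (again not the ScummVM code) boils down to creating the voice object through COM and calling Speak; the voice actually heard depends on what is installed in Windows:

#include <windows.h>
#include <sapi.h>

int main() {
    if (FAILED(::CoInitialize(NULL)))
        return 1;

    ISpVoice *voice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&voice);
    if (SUCCEEDED(hr)) {
        // Speaks with whatever voice is currently selected in Windows.
        voice->Speak(L"Hello from SAPI", SPF_DEFAULT, NULL);
        voice->Release();
    }
    ::CoUninitialize();
    return 0;
}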

GSoC update

Hello, last week I almost finished the Supernova engine. All of the bugs I managed to find should be fixed, I added translations to most of the images and help files, I merged the .dat files with translations for each game (and the tools to create them) into one, and I also added the “improved” mode. Right now the improved mode does the following:
In the first game

  • Automatically puts the space suit on and takes it off every time the player walks through the airlock (after he first does this on his own).
  • Automatically closes one side and opens the other side of the airlock (after the player first walks through the airlock on his own).
In the second game
  • Enables using the go verb alongside the push verb to interact with the puzzle tiles inside the pyramid (that way the player doesn’t have to click on push between puzzle interactions and can focus only on solving the puzzle).
  • Enables using the go verb alongside the push verb to interact with the password-locked door inside the pyramid.
The only thing left to do is to create one last translated image and then open a pull request.

New project

Because I am close to finishing the Supernova engine and there are still 6 weeks of GSoC left, I have to find another project to work on. The project I chose is adding a text-to-speech capability. I tried to find the best library to use for this; the dream would be something that supports at least English, German and French with decent voice quality and runs on as many platforms as possible. Of course the dream didn’t happen, so the plan for now is to use SAPI on Windows and speech-dispatcher on Linux (these are the only platforms I can develop for). Finding a good Linux library wasn’t easy at all: the speech quality usually isn’t very good, quite often the only supported language is English, or there are other issues that put me off the library. So we chose speech-dispatcher, because it is a high-level interface to whatever speech engine the end user chooses, and so in the future it might let the user pick an engine that doesn’t exist yet and sounds good.

Supernova images

Last week I pretty quickly finished the merge; there was only a bit of refactoring left. After that I worked on a tool that can recreate the Mission Supernova datafiles containing images, which allows us to translate them.

The MS datafile format is pretty simple and it reminds me of .bmp images (there is a header, after that a palette and after that pixel data). The first 4 bytes are the size of the pixel data field. After that is 1 byte, which is the size of the palette. The palette follows after that. A pretty interesting thing is that each color is brightened by the engine (shifted by 2 to the left), so we have to account for that when making a new image and generating palettes for it. This also means that we lose a lot of colors (for example, we aren’t able to encode all 256 shades of grayscale). After the palette comes information about sections. A section is an image that can get rendered on top of other sections (images). The information is basically only its size, its location on the screen and the location of its pixel data inside the file. Next is the click field information. A click field is similar to a section: it has a size and a location on the screen, but it doesn’t have an image assigned to it. After this follow 2 to 258 bytes used to decompress the image. The last thing is the pixel data for all the sections.
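A rough sketch of reading the start of such a file could look like this (field names and everything past the palette are simplified and hypothetical, following the description above):

#include <cstdint>
#include <fstream>
#include <vector>

static uint32_t readUint32LE(std::ifstream &f) {
    uint8_t b[4];
    f.read(reinterpret_cast<char *>(b), 4);
    return b[0] | (b[1] << 8) | (b[2] << 16) | ((uint32_t)b[3] << 24);
}

int main() {
    std::ifstream file("image.dat", std::ios::binary);   // hypothetical file name
    uint32_t pixelDataSize = readUint32LE(file);         // size of the pixel data field
    int paletteSize = file.get();                        // number of palette entries

    std::vector<uint8_t> palette(paletteSize * 3);       // assuming 3 bytes per color
    file.read(reinterpret_cast<char *>(palette.data()), palette.size());

    // The engine brightens every color by shifting it 2 bits to the left when
    // rendering, so a palette for a new image has to store values divided by 4,
    // which is where the loss of colors comes from.

    (void)pixelDataSize;
    // The section descriptions, click fields, the 2-258 decompression bytes and
    // the pixel data itself would be read here.
    return 0;
}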

The tool right now is semi-automated (automated enough, I think). As its input it uses files containing the palette, section and click field descriptions and a .bmp file for each section. It just copies the descriptions into the file, and after that it takes the pixel data from each .bmp image and copies it too. The thing that isn’t automated is that the user has to tell the tool, via a command line argument, where the pixel data of the images starts (how many bytes of each .bmp image to skip).
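That non-automated step is really just this: seek past the .bmp header by the number of bytes given on the command line and copy the rest. A tiny hypothetical sketch (not the actual tool’s code):

#include <cstdlib>
#include <fstream>
#include <iostream>

int main(int argc, char **argv) {
    if (argc < 4) {
        std::cerr << "usage: " << argv[0] << " <section.bmp> <out.dat> <bytesToSkip>\n";
        return 1;
    }
    std::ifstream in(argv[1], std::ios::binary);
    std::ofstream out(argv[2], std::ios::binary | std::ios::app);
    long skip = std::atol(argv[3]);  // where the pixel data starts in the .bmp

    in.seekg(skip, std::ios::beg);   // skip the .bmp header and palette
    out << in.rdbuf();               // append the raw pixel data to the datafile
    return 0;
}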

I wrote about the reasons the tool is needed in the last post, but just to remind you: the main reason is that we need to translate this image:

The writing basically means “You are almost there”, and the player is supposed to push certain letters, using a clue on a note, to open the door. A temporary solution was to translate the note into English (translating the password too), change the password on the door and add a subtitle saying “You are almost there” when entering the room with this door.
When translating the image, I had quite a few problems with the palette:

But after that I noticed that the first 16 colors don’t change and that the image palette starts at the 17th color of the engine’s palette, so I shifted the palette by 16 colors (just adding blue, which isn’t used in this image, to the front a few times) and I got this:

You probably noticed that the picture is upside down and that there is a noticeable lighter circle around the writing. But with the help of dithering and mirroring the image in GIMP I managed to translate the whole image:

And thanks to having the tool, I was able to add a better version of the translated ciphered image too:

Now I will fix some more bugs that my brother and I found while playing the game, and then I will add an “improved” mode making some repetitive and annoying tasks less repetitive and less annoying.