WME TODO: Notes for posterity (including future me)

Here is a list of todos and ideas for further improvement of the WME engine:

1. The new patches contributed by yours truly could probably benefit from some optimization, performance-wise.

In particular,

  • bilinear blitting is very naive, plus it’s all float, so it has large margins for optimization simply by replacing it with a well-known optimized algorithm.
  • Multirect could probably benefit from some fine tuning;
  • It’s a matter of finding the best balance between the additional cost of multirect and the cost of going through tickets and blitting.

On the other hand

  • ui_tiled shouldn’t really require any further efforts unless some weird bug pops up.
  • blending could have some margin for improvement, but it’s basically pretty much fixed as it is.
2. The debugger is a bit of a hack as it is.

It is by far my least-favorite piece of code in there.

The thing is, we don’t really have a compiler giving us debug symbols; instead what the compiler does (and always does, even in released games) is adding a special II_DBG_LINE instruction that keeps a special register up-to-date with the last line traversed.

What we do is watch for those instructions and infer everything from there, including keeping a history of the watched variables and inspecting them to see if they’ve changed.

Among other things, the original debugger does not have a watch feature, as it’s very awkward to do.

What one should do then is hack the compiler in order to have symbols and easily improve on the debugger.

Another interesting side effect of hacking the compiler would be the ability to inject code.

If nothing else, it would help to have the compiler bundle the source file with the code and/or adding a checksum, to avoid copying source files by hand (please note that ATM we have NO way of knowing if the source file matches bytecode or if it’s something completely different, at least some support for generating checksums should be useful and easily added. I’ve found myself wondering what the hell was going on more than once, only to find that my sources were outdated)

Apart from that, I guess there is room for some further polishing/refactoring anyway (it’s become slightly uglier than I’d like over time).

In particular, it would help to improve the command line and let it accept e.g. hex values.

3. Fonts. We tend to have ugly fallback fonts. Find why and fix it. Would we perhaps be interested in selectively applying bilinear on bitmapped fonts only?
  1. Also, results from profiling show three areas of interest performance wise:
  • FMVs are sometimes slower and uglier than they should be.
    Software scaling ain’t terribly efficient.
  • The particle engine can benefit from some optimization, perhaps akin to what’s been done with ui_tiled image to lessen the load on the ticket system.
    It is heavy and it plays an integral role in the speed of J.u.l.i.a.
  • Fonts, again, are sometimes at least as heavy as their ui_tiled backgrounds.
    Perhaps caching text as well, as has been done for the backgrounds, to reduce load?

Apart from this, there are a few TODO’s lingering all over the code which can be taken care of with a quick fix, most of them pre-existing, some of them new, just grep it.

What’s next

Well, looks like GSOC is coming to a conclusion.
These will be the last 10 days of it.

The actual code is basically in place and, so I’m told, not that far from mergeability.

At least one feature-branch allows for glitch-less completion of the two main games I initially set on supporting (Rosemary and Dirty Split), and I hope I can test more (there’s Carol Reed and a bunch of others).

While we are at it: Dirty Split is a FANTASTIC game and I deeply regret having to play it with a walkthrough because of time constraints.
It’s beyond awesome. Everything there is to love in a graphic adventure is there.
Do yourself a favor and check it out (and also, the Carol Reed games – also because these people are INCREDIBLY supportive).

What I’ll be doing next will be:

* Producing a profiling document (I’m midway through data gathering).
This will keep me busy until at least Sunday and will presumably help posterity in further improving performance.

* Producing a TODO-list re:features, bugs and optimizations which I haven’t been able to squeeze in yet.
I’ll most likely do this in the final day before the soft-deadline – that is, Sunday night or Monday.

Then during the last week I’ll probably be busy doing some emergency fixes and I’ll be doing some cleanup, documenting and refactoring, plus squeezing some playtesting here and there (playtesting is an excellent activity to do when you are too cooked to do anything else).

I expect documentation-writing to come last (did I mention I also write documentation like a cow?), so I’ll probably be doing that during the very last couple of days, with cleanup of the actual code and refactoring taking the first half of the week.

There’s not *that* many obvious targets for refactoring, in fact – the code is quite clean, plus some has already been done in-branch.
I still hope I’ll be able to bring it up a notch or two.


Let-s spend a few words on blending.

Blending is kind of like bringing blitting a step further.

We don’t want to just copy a pixel from source to destination, but we want to combine the source with the destination in peculiar ways.

The most obvious is transparency – we want to overlay the two colors so that you can still see an hint of the previous color.

But we may also want to perform additive blending, which is kind of what happens when you project two cones of light on the same surface.

Subtractive blending is the opposite. It’s like when you have two differently-colored light sources and two objects create two shadows. When they meet, you have subtractive blending.

WME, in addition to opaque blitting, supports “regular” alpha blending, additive blending and subtractive blending.
Each can have a color modifier as a nice added feature – basically, the input sprite gets multiplied by the colorMod, which acts kind of like a pair of colored lenses.

Of course, the original uses the ready-made blending options provided by DirectX.
We, on the other hand, must roll our own.

Here’s some examples.
For starters, subtractive blending for a single pixel.
There are two special cases which are useful optimization-wise but could be stripped away.

void BlittingTools::blendPixelSubtractive(byte *ina, byte *inr, byte *ing, byte *inb, byte *outa, byte *outr, byte *outg, byte *outb) {
     * in[argb] = input pixel, a,r,g,b channel;
     * out[argb] = output pixel, a,r,g,b channel;
        if (*ina == 0) { // Fully transparent... let's not waste time on this.
        } else if (*ina == 255) { // Fully opaque. Forget
            *outa = *outa;
            *outr = MAX(*outr - *inr, 0);
            *outg = MAX(*outg - *ing, 0);
            *outb = MAX(*outb - *inb, 0);
        } else {
            *outa = *outa;
            *outb = MAX(*outb - (*inb * *ina >> 8), 0);
            *outg = MAX(*outg - (*ing * *ina >> 8), 0);
            *outr = MAX(*outr - (*inr * *ina >> 8), 0);
            // * ina >> 8 is kind of like * ina / 255, ina
            // goes from 0 - 255, so you end up multiplying by 0...1

Another example, additive with colormod.

void BlittingTools::blendPixelAdditive(byte *ina, byte *inr, byte *ing, byte *inb, byte *outa, byte *outr, byte *outg, byte *outb, byte *ca, byte *cr, byte *cg, byte *cb) {
        if (*ca != 255) {
            *ina = *ina * *ca >> 8;
        if (*ina == 0) {
        } else if (*ina == 255) {
            if (*cb != 255)
                *outb = MIN(*outb + ((*inb * *cb * *ina) >> 16), 255);
                *outb = MIN(*outb + (*inb * *ina >> 8), 255);
            if (*cg != 255)
                *outg = MIN(*outg + ((*ing * *cg * *ina) >> 16), 255);
                *outg = MIN(*outg + (*ing * *ina >> 8), 255);
            if (*cr != 255)
                *outr = MIN(*outr + ((*inr * *cr * *ina) >> 16), 255);
                *outr = MIN(*outr + (*inr * *ina >> 8), 255);

Tiled Surfaces

Okay, today’s entry is about optimizing tiled surfaces by caching them.
Here’s the deal – we have tiled surfaces, surfaces where a little square is repeated all over.

It’s great for backgrounds, as people who had a Geocities page in 1999 know (if you did not:

An example of a tiled image is the default system menu:

Another is dialogue balloons in Rosemary:

They are heavy.

Each tile is handled more or less as a stand alone surface at every redraw, which means: huge overhead.

The fix is: cache them.

This involves hacking the renderer a bit so that it draws to a “dummy” surface and changing the UITIledImage code accordingly:

bool BaseRenderOSystem::startSpriteBatch(bool swap, int width, int height) {
    if (swap) {
        _swapped = true;
        _auxSurface = _renderSurface;
        _renderSurface = new
        _renderSurface->create(width, height, _auxSurface->format);
    _spriteBatch = true;
    _batchNum = 1;
    return STATUS_OK;
bool BaseRenderOSystem::endSpriteBatch(bool swap) {
    if (_swapped) {
        _swapped = false;
        Graphics::Surface *temp;
        temp = _renderSurface;
        _renderSurface = _auxSurface;
        _auxSurface = temp;
    _spriteBatch = false;
    _batchNum = 0;
    return STATUS_OK;
// ....
bool UITiledImage::display(int x, int y, int width, int height) {
    int tileWidth = _middleMiddle.right - _middleMiddle.left;
    int tileHeight = _middleMiddle.bottom -;
    int nuColumns = (width - (_middleLeft.right - _middleLeft.left) - (_middleRight.right - _middleRight.left)) / tileWidth;
    int nuRows = (height - (_upMiddle.bottom - - (_downMiddle.bottom - / tileHeight;
    int col, row;
    assert (width != 0);
    assert (height != 0);
    // Below is the new if block, which is executed only once:
    if (_cache == nullptr || width != _width || height != _height) {
        // This as a side effect enables the special "dummy" rendering mode
        _gameRef->_renderer->startSpriteBatch(true, width, height);
        int x = 0;
        int y = 0;
        _width = width;
        _height = height;
        // top left/right
        _image->_surface->displayTrans(x,                                                       y, _upLeft);
        _image->_surface->displayTrans(x + (_upLeft.right - _upLeft.left) + nuColumns * tileWidth, y, _upRight);
        // Etc, same for SW, NW, NE tiles
        // tiles
        if (nuRows > 0 && nuColumns > 0) {
            yyy = y + (_upMiddle.bottom -;
            xxx = x + (_upLeft.right - _upLeft.left);
            _image->_surface->displayTrans(xxx, yyy, _middleMiddle);
            _image->_surface->repeatLastDisplayOp(tileWidth, tileWidth, nuColumns, nuRows);
        // This is where we retrieve the complete image
        _cache = _gameRef->_renderer->getAuxSurface();
    Rect32 dst; = 0;
    dst.left = 0;
    _cache->displayTrans(x, y, dst);
    return STATUS_OK;

Then we store that surface and when we need to redraw the UITiledImage we simply use the cached version: single drawing call, less overhead.

Multi rects

In our last post we were talking about the dirty rect system.
The simplest implementation uses a single rectangle; it can save us quite some unnecessary blitting already, but there is some dangers to be taken into

account that could spoil our grand plan.

For example, what happens if a character is moving in the SE corner of the screen and I’m moving the mouse cursor in the NW corner?

What would be the screen area to be considered “dirty” and thus redrawn?

Yu guessed it – basically, the whole screen.


This was in fact the way dirty recting was originally implemented in the WME engine for ScummVM:

void BaseRenderOSystem::addDirtyRect(const Common::Rect &rect) {
  if (!_dirtyRect) {
    _dirtyRect = new Common::Rect(rect);
  } else {

That’s it, one rect, extend it as needed.

We needed multiple dirty rects.

Whenever we “dirty” a screen area, we need to mark that separately.
But a simple array is probably not enough.

For example, what happens if multiple partially overlapping sprites are to be redrawn and we naively traverse the array?

We redraw the same area over and over again; if the rects are big enough, we may end up doing *more* blitting than without dirty rects, *more* blitting than

with a single rect.

Also bad, very, very bad.

Even if unnecessary redrawing is avoided, needlessly big rect lists can be a problem.

I chose to address the issue with a special class that:

a) Takes requests for adding rects…
a’) …while doing some on-the-fly-optimizations (e.g. discarding duplicates)
b) Can spit out, upon request, an optimized version with only the necessary rectangles.

It is still in the works so this is not final code, but the basic idea is definitely there.
Have a look – code is partial and amended as usual, but you don’t want to know about all the boring housekeeping and utility methods, do you?

namespace Wintermute {
DirtyRectContainer::DirtyRectContainer() {
    _disableDirtyRects = false;
    _clipRect = nullptr;
DirtyRectContainer::~DirtyRectContainer() {
    delete _clipRect;
void DirtyRectContainer::addDirtyRect(const Common::Rect &rect, const Common::Rect *clipRect) {
    if (_clipRect == nullptr) {
        _clipRect = new Common::Rect(*clipRect);
    } else {
    if (target > kMaxInputRects) {
        _disableDirtyRects = true;
    } else if (isHuge(&rect)) { // More than 90% of the viewport
        _disableDirtyRects = true;
    } else if (rect.width() == 0 || rect.height() == 0) {
    } else {
        _rectArray.insert_at(target, tmp);
Common::Array DirtyRectContainer::getOptimized() {
    if (_disableDirtyRects) {
        return getFallback();
    Common::Array ret;
    Common::Array queue;
    for (int i = 0; i < _rectArray.size(); i++) {
        queue.insert_at(queue.size(), _rectArray[i]);
    int j = 0;
    while (j < queue.size()) { Common::Rect *candidate = queue[j]; if (candidate->width() == 0 || candidate->height() == 0) {
            // We have no use for this
        if (ret.size() > kMaxOutputRects) {
            _disableDirtyRects = true;
            return getFallback();
        bool discard = false;
        // See if it's contained or containes
        for (uint i = 0; i < ret.size() && !discard; i++) { Common::Rect *existing = ret[i]; if (existing->contains(*candidate) || existing->equals(*candidate)) {
                // It is already contained by an existing rect - useless!
                discard = true;
            if (candidate->contains(*(existing))) {
                // Contains an existing one.
                // Extend the pre-existing one and discard this.
                discard = true;
            // Okay, we now see if we have an overlapping corner and slice accordingly.
            Common::Rect intersecting = existing->findIntersectingRect(*candidate);
            // We have to remove intersecting and enqueue the rest
            // We know that it's not a simple contained rect, we know that
            // it's not a cross-shaped thing like above, we know that it's not
            // a Commodore-logo type of deal (where the C is the bigger rect)
            // So intersecting is either the NE, SE, SW or NW corner of candidate
            if (intersecting.width() >= kMaxSplicingX &&
                    intersecting.height() >= kMaxSplicingY
               ) {
                if ( >= candidate->top &&
                        intersecting.left >= candidate->left &&
                        intersecting.bottom == candidate->bottom &&
                        intersecting.right == candidate->right
                   ) { // SE case
                    Common::Rect *neslice = new Common::Rect(*candidate);
                    neslice->bottom =;
                    neslice->left = intersecting.left;
                    if (neslice->width() != 0 && neslice->height() != 0) {
                        queue.insert_at(queue.size(), neslice);
                    candidate->right = intersecting.left;
                    if (candidate->width() == 0) {
                        discard = true;
                } else if () //SAME FOR NE 
                { // ...
                } else if () //SAME FOR NW
                { // ...
                } else if () //SAME FOR SW 
                { // ...
                } else if (existing->left <= candidate->left &&
                           existing->right >= candidate->right &&
                           existing->top >= candidate->top &&
                           existing->bottom <= candidate->bottom) { // Cross shaped intersections
                    Common::Rect *top_slice = new Common::Rect(*candidate);
                    Common::Rect *bottom_slice = new Common::Rect(*candidate);
                    top_slice->bottom = existing->top;
                    bottom_slice->top = existing->bottom;
                    if (top_slice->height() > 0 && top_slice->width() > 0) {
                        queue.insert_at(queue.size(), top_slice);
                    if (bottom_slice->height() > 0 && bottom_slice->width() > 0) {
                        queue.insert_at(queue.size(), bottom_slice);
                    discard = true;
                } else if () // Same for vertical vs horizontal cross intersection
            } // End of intersecting test
        } // End loop
        if (discard) {
        } else {
            if (isHuge(candidate)) {
                _disableDirtyRects = true;
                return getFallback();
            ret.insert_at(ret.size(), candidate);
    } // End loop
    return ret;

The Dreaded Rosemary Bug

The Rosemary Bug, as we call it, is the first bug I’ve run into in ScummVM.

That means – after 10 years I found a bug in an unsupported, unstable, unreleased (in builds) piece of code, which must mean something about the QA at ScummVM.

It caused the main character to appear “jigsawed” when looking east AND you crossed her with the mouse pointer.
Like this:

Not pretty, huh?

This is due to the dirty rect system.
Basically, the idea is that normally you don’t need to redraw ALL of the rectangle containing the sprite, but just the little square where the cursor was and is no more.
This applies not only to the cursor, of course, but to ANY need that may arise to redraw part of a sprite that’s been obscured by something smaller without redrawing the rest, which stays unchanged.

E.g. if your game had a lunar eclipse, when it’s over you would need to redraw the moon, not the whole sky.

Now, there is a problem.

Consider this example:

1) Our background sprite, painted in our viewport:

2) An eclipse! Another sprite (the black disc) is drawn on top of the first one.

3) For the eclipse to be over, we only need redraw the bit “ruined” by the black disc; everything else is *already* on the viewport.
We take the area that corresponds to the pink square from the original sprite and paint it over:

4) All is fine!

Everything is okay, but what if a sprite is mirrored – that is , it has a bit set that tells the engine “always draw this flipped”?
Look what happens if we don’t take special measures:

1) We draw the original texture FLIPPED.


3) Now, as before, we take the area that corresponds to the pink square from the original texture (which we had drawn flipped before)


Uh, oh.

Mike Oldfield is not happy.

We need to “mirror” the offsets too for dirty rects to work with mirrored sprites, of course.

That’s what was happening with Rosemary, in transparent_surface.cpp:

Common::Rect TransparentSurface::blit(Graphics::Surface &target, int posX, int posY, int flipping, Common::Rect *pPartRect, uint color, int width, int height) {
    int ca = (color >> 24) & 0xff;
    Common::Rect retSize; = 0;
    retSize.left = 0;
    // Check if we need to draw anything at all
    if (ca == 0)
        return retSize;
    int cr = (color >> 16) & 0xff;
    int cg = (color >> 8) & 0xff;
    int cb = (color >> 0) & 0xff;
    // Compensate for transparency. Since we're coming
    // down to 255 alpha, we just compensate for the colors here
    if (ca != 255) {
        cr = cr * ca >> 8;
        cg = cg * ca >> 8;
        cb = cb * ca >> 8;
    // Create an encapsulating surface for the data
    TransparentSurface srcImage(*this, false);
    // TODO: Is the data really in the screen format?
    if (format.bytesPerPixel != 4) {
        warning("TransparentSurface can only blit 32 bpp images");
        return retSize;
    if (pPartRect) {
        srcImage.pixels = getBasePtr(pPartRect->left, pPartRect->top);
        srcImage.w = pPartRect->width();
        srcImage.h = pPartRect->height();
    if (width == -1)
        width = srcImage.w;
    if (height == -1)
        height = srcImage.h;
    Graphics::Surface *img = nullptr;
    Graphics::Surface *imgScaled = nullptr;
    byte *savedPixels = nullptr;
         * et cetera
         * ...

This is how it’s been fixed:

if (pPartRect) {
    int xOffset = pPartRect->left;
    int yOffset = pPartRect->top;
    if (flipping & FLIP_V) {
        yOffset = srcImage.h - pPartRect->bottom;
    if (flipping & FLIP_H) {
        xOffset = srcImage.w - pPartRect->right;
    srcImage.pixels = getBasePtr(xOffset, yOffset);
    srcImage.w = pPartRect->width();
    srcImage.h = pPartRect->height();

You can now move the cursor over Rosemary all you want – it’s not gonna break ?

The debug console

So, we were talking about the debugger.

We’ve seen how hooks are inserted into the execution unit, let’s see how you interact with the debugger.
We extend Common::Console, which already provides some very minimal functionality.
It mainly enables us to easily bind commands to functions, tokenizing arguments on our behalf and giving us C-style argc/argv.
It works like this:

Console::Console(WintermuteEngine *vm) : GUI::Debugger() {
    DCmd_Register("print_foo", WRAP_METHOD(Console, Cmd_PrintFoo));
    DCmd_Register("do_bar", WRAP_METHOD(Console, Cmd_DoBar));
// ...
bool Console::Cmd_AddBreakpoint(int argc, const char **argv) {
    return true;

Then, we proceed to parse user input on our own.

Our methods must return true or false – true for keeping the console open, false for closing it.

Ideally, for simple tasks, we could work on the engine directly from there, e.g.:

DCmd_Register( ....  Cmd_SkipInstruction));
Cmd_SkipInstruction (...) {
    // usage: skip threadnumber
    return true;

This could get messy fast, though.
What we have going on here, though, is a MVA sort of pattern, which is more or less like its more famous brother MVC, except that the +** is bidirectional – the adapter can fiddle with the model and vice versa – the reason is decoupling and ease of mantainance:




We have a middle layer called the DebuggerAdapter that works on the script engine for us, while we need to knownothing about the script engine itself in the console proper.

The header looks kinda like this:

namespace Wintermute {
class DebuggerAdapter {
    DebuggerAdapter(WintermuteEngine *vm);
    // Called by Script (=~Model)
    bool triggerBreakpoint(ScScript *script);
    bool triggerStep(ScScript *script);
    bool triggerWatch(ScScript *script, const char *symbol);
    // Called by Console (~=View)
    int addWatch(const char *filename, const char *symbol);
    int addBreakpoint(const char *filename, int line);
    // ...
    int stepOver();
    int stepInto();
    int stepContinue();
    // ...
    SourceFile *_lastSource;
    bool compiledExists(Common::String filename);
    void reset();
    WintermuteEngine *_engine;
    int32 _lastDepth;
    ScScript *_lastScript;
    int32 _lastLine;

An example of how it works:

// in debugger.cpp:
bool Console::Cmd_AddBreakpoint(int argc, const char **argv) {
     * Add a breakpoint
    if (argc == 3) {
        int error = ADAPTER->addBreakpoint(argv[1], atoi(argv[2]));
        if (!error) {
            DebugPrintf("%s: OK\n", argv[0]);
        } else if (error == NO_SUCH_SCRIPT) {
            Common::String msg = Common::String::format("no such script: %s, breakpoint NOT created\n", argv[1]);
            debugWarning(argv[0], ERROR, msg);
        } else if (error == NO_SUCH_SOURCE) {
            Common::String msg = Common::String::format("no such source file: %s\n", argv[1]);
            debugWarning(argv[0], WARNING, msg);
        } else if (error == NO_SUCH_LINE) {
            Common::String msg = Common::String::format("source %s has no line %d\n", argv[1], atoi(argv[2]));
            debugWarning(argv[0], WARNING, msg);
        } else if (error == IS_BLANK) {
            Common::String msg = Common::String::format("%s:%d looks like a comment/blank line.\n", argv[1], atoi(argv[2]));
            debugWarning(argv[0], WARNING, msg);
        } else {
            Common::String msg = Common::String::format("Error code %d", error);
            debugWarning(argv[0], WARNING, msg);
    } else {
        DebugPrintf("Usage: %s   to break at line  of file \n", argv[0]);
    return true;

// in debugger_adapter.cpp
int DebuggerAdapter::addBreakpoint(const char *filename, int line) {
    // TODO: Check if file exists, check if line exists
    if (!compiledExists(filename)) {
        return NO_SUCH_SCRIPT;
    int isLegal = isBreakpointLegal(filename, line);
    if (isLegal == OK) {
        SCENGINE->addBreakpoint(filename, line);
        return OK;
    } else if (isLegal == IS_BLANK) {
        // We don't have the SOURCE. A warning will do.
        SCENGINE->addBreakpoint(filename, line);
        return IS_BLANK;
    } else if (isLegal == NO_SUCH_SOURCE) {
        // We don't have the SOURCE. A warning will do.
        SCENGINE->addBreakpoint(filename, line);
        return NO_SUCH_SOURCE;
    } else if (isLegal == NO_SUCH_LINE) {
        // No line in the source A warning will do.
        SCENGINE->addBreakpoint(filename, line);
        return NO_SUCH_LINE;
    } else {
        // Something weird? Don't do anything.
        return isLegal;

This is a somewhat trivial case since SCENGINE gives us addBreakpoint and we simply wrap it, but DebuggerAdapter does get more complex.

This way, when and if we want a prettier user interface, we don’t need to worry (not too much, at least) about breaking functionality, and if we decide to restructure the script engine and / or the way the adapter interacts with it, we don’t need to worry too much about rendering the interface unusable.

This could actually turn out useful if somebody decides to stitch together a graphical debugger – just code the actual GUI and add water.

One cool thing our interface does is displaying source files – we’ll talk about it in the next installment.


Mid term roundup

So, midterm is upon us.
It’s been a few really challenging, but also fun weeks where only a couple of times I found myself really stuck.
I am surprisingly more-or-less on schedule, which means that two new, cool features are on their way to players and developers.

Let’s talk about them.

The first one, as we mentioned before, is rotozoom.
The branch is basically ready (I still may have to fix just a couple of non-important few lines, but it’s there) , which means that WME games get sprite rotation+scaling suypport on ScummVM:

Which in turn means that at least a puzzle in one game, J.u.l.i.a, is now playable.
The J.u.l.i.a greenlight demo, available here, is playable to completion thanks to the patch:
I’m not sure which ones, but I guess there are or will be other games that use rotation around.
They are now playable on ScummVM, bar other issues.

The second one is the debugger.

It’s not quite as production ready as the rotozoom feature, but the bulk is most definitely there, so it’s a matter of polishing it:

You can set breakpoints, watch variables, step-execute (both step over and step into), import source code if available (if not, you’ll have a hard time using the debugger anyway) and dump variables and engine resources.

All you need to debug your games on ScummVM+WME and, more importantly, very useful in debugging the engine itself.
Will turn out rather handy post-midterm, where the real fun (for arbitrary values of “fun”) is.

Post-midterm, I plan on putting the finishing touches on the debugger, obviously, but also doing some bug hunting (for example, by playing popular WME games) and some doing refactoring and profiling+optimizing.

Now, there are quite a few areas of interest there.

Let’s see them.

I have learned by now that the biggest culprit for performance issues (and a few bugs) is the rect-system + the ticket system.
The latter, especially, is a rather complex business that is not present in the original engine, which uses a SDL backend, that does a *lot* of comparisons.

For starters, there is a known , aka “the Rosemary bug” (as it is evident in Rosemary when the main character walks right-to-left) that seems to be triggered by mirrored sprites.
For some reason, the sprite gets “jigsawed” WHEN dirty rects are enabled.
This may mean a bug in the dirty rect system.

The rect system also allows for optimization in the form of support for multiple rects.
Which means that, basically, at the moment, if two single pixels are updated, the whole rectangular area encompassing them is redrawn (bad for performance).
We would hope to just work with two 1px-wide rectangles.

The ticket comparison (which is carried out by the ticket system to recognized duplicate rendering tickets) might also be optimized further, but that’s something that will be considered after profiling.

Something that may yield better results is avoiding *abusing* the ticket system by prerendering stuff that would result in *huge* amounts of requests – namely, UITiledImage, which results in a request for every single tile.
Most evident in the system menu.

TransparentSurface my benefit from optimizing too; my bilinear-filtered copy function certainly does not help, as it is far from optimized.
To have good performance with that either the rect system must be made a lot better, or that must be optimized.

There is also some refactoring to be done in there.
For starters, there are cases where handy things provided by scummvm are not used, e.g. obvious things like const char*’s everywhere instead of Common::String.
Then there is some leftover code here and there, and a few paths *seem* to be modelled around implementation specific things of the original WME (the whole graphic stack *may* be it), so maybe one could try to see if there is a simpler way of handling it.

I hope the next half will be as fun and as fulfilling as the first half.

See you!

Introducing the debugger

I have been cooking up a little debugger for a while.
Now, writing a debugger for the Wintermute Engine is a bit weird, because things do work in a slightly peculiar fashion.
For example, you can have no debugging symbols, because that would mean hacking the compiler, which is not part of the ScummVM package.
But, even more interesting: you don’t need any.

Here’s the deal: each script runs inside its own “VM” – let’s call it that – with its iP, stack and execution unit.
There is a scheduler that handles the various VMs, creates new ones (e.g. on event) and kills off those that have finished the execution of their scripts.
The “execution unit” is a method that gets called once per instruction, evaluates the instruction and advances the IP.

Like this:

bool ScScript::executeInstruction() {
    // ... 
    ScValue *op1; 
    ScValue *op2;
    uint32 inst = getDWORD();
    switch (inst) {
    case II_DEF_VAR:
        // ... 
    case II_DEF_GLOB_VAR:
        // ... 
    case II_DEF_CONST_VAR: {
        // ...
    case II_RET:
        // ...
        // ...
    case II_ADD:
        op2 = _stack->pop();
        op1 = _stack->pop();
        if (op1->isNULL() || op2->isNULL()) {
        } else if (op1->getType() == VAL_STRING || op2->getType() == VAL_STRING) {
            char *tempStr = new char [strlen(op1->getString()) + strlen(op2->getString()) + 1];
            strcpy(tempStr, op1->getString());
            strcat(tempStr, op2->getString());
            delete[] tempStr;
        } else if (op1->getType() == VAL_INT && op2->getType() == VAL_INT) {
            _operand->setInt(op1->getInt() + op2->getInt());
        } else {
            _operand->setFloat(op1->getFloat() + op2->getFloat());
    case II_SUB: {
        // ...
    case II_DBG_LINE: {
        // ...
        _gameRef->LOG(0, "Fatal: Invalid instruction %d ('%s', line %d, IP:0x%x)\n", inst, _filename, _currentLine, _iP - sizeof(uint32));
        _state = SCRIPT_FINISHED;
        ret = STATUS_FAILED;
    return ret;

So do we do?
We put hooks in there, ideally.

This way, each time an instruction is executed we can look around and see what’s happening.
We can implement watch, breakpoints and stuff.
Oh, but then there’s the problem of knowing where the hell we are first.
Okay – watching can easily implemented all the same, by keeping a watch list and keeping track of what has changed since last time, but breaking?
We have no debug symbols, so how do we know WHERE we are?
Interestingly, the WME team has been so kind as to add a special instruction to the instruction set: II_DBG_LINE.
Guess what it does?
It updates a counter, _line, in a way that it always contains the number of the current line.

What we do then is to place our hooks in a way to monitor that number and act accordingly – e.g. we have a list of breakpoints, if the current line is in there, we break – that is, we open the debugger window like this:

case II_DBG_LINE: {
    int newLine = getDWORD();
    if (newLine != _currentLine) {
        _currentLine = newLine;
    for (int j = 0; j < _engine->_breakpoints.size(); j++) {
        if (_engine->_breakpoints[j]._line == _currentLine &&
            !strcmp(_engine->_breakpoints[j]._filename.c_str(), _filename)) { 
                // This one breaks execution and does housekeeping.
                // We resume from here once user stops execution
    if (1) { 
        if (_callStack->_sP <= _step) { // This is set elsewhere, suffice saying that I may or may not want to enter a function past a certain call depth _adapter->triggerStep(this);
    if (1) { 
             for (int i = 0; i < _watchlist.size(); i++) {
                // Etc

Then, we wait for user input.
How this is handled we’ll see in the next post.

Interpolation made stupid

Here we are again.
We left talking about transforms and rotations.
See, all in there is good and well, but there is a slight problem in practice I hinted at.

If you remember we said how the fact that we are working in the discrete poses a problem: we have to approximate real coordintes of the destination point with integer coordinates – this, using the naive method, nearest neighbor approximation (aka: simply rounding to the next pixel) causes aliasing.

That’s where interpolation kicks in.

A warning for those unaware: once again, signal processing is terribly big field and everything has vast theoretical implications (and vast practical applications too, e.g. most concepts apply to audio processing as well).
We deal with the matter in an extremely practical fashion, hence the post title.

Now – specifically, we deal with bilinear transform.

Intuitively, the idea is: we try to infer what colour would be a pixel at the given coordinates if there actually were one.
We can’t add new information – this is immediately apparent by the fact that interpolated transforms are inevitably slightly “blurry” -, but we can eliminate the aliasing caused by abruptly rounding to the closest value.

If I am not mistaken, “abruptly” has a special meaning in this context, and means: “upper bound in frequency as per the Shannon-Nyquist theorem”:

Now, let’s get going. Look at this:

(You will excuse me for posting a pic of a whiteboard, but my parallel scanner from 1997 has apprently just died)

In the example, out projected source is (4/5, 4/5).
What would look like a pixel in (4/5, 4/5)?
We don’t really know, but we can estimate its color to be a weighted average of its neighbors.

So, the math (from Wikipedia):

And now let’s see some code from transparent_surface.cpp – this time you don’t get the pseudocode since it’s really a matter of carrying out the above and checking for a few edge cases and avoiding division by zero:

void TransparentSurface::copyPixelBilinear(float projX, float projY, int dstX, int dstY, const Common::Rect &srcRect, const Common::Rect &dstRect, const TransparentSurface *src, TransparentSurface *dst) {
            int srcW = srcRect.width();
            int srcH = srcRect.height();
            int dstW = dstRect.width();
            int dstH = dstRect.height();
            assert(dstX >= 0 && dstX < dstW);
            assert(dstY >= 0 && dstY < dstH);
            float x1 = floor(projX);
            float x2 = ceil(projX);
            float y1 = floor(projY);
            float y2 = ceil(projY);
            uint32 Q11, Q12, Q21, Q22;
            if (x1 >= srcW || x1 < 0 || y1 >= srcH || y1 < 0) { 
                Q11 = 0;
            } else {
                Q11 = READ_UINT32((const byte *)src->getBasePtr((int)(x1 + srcRect.left),(int)(y1 +;
            if (x1 >= srcW || x1 < 0 || y2 >= srcH || y2 < 0) { 
                Q12 = 0;
            } else {
                Q12 = READ_UINT32((const byte *)src->getBasePtr((int)(x1 + srcRect.left), (int)(y2 +;
            if (x2 >= srcW || x2 < 0 || y1 >= srcH || y1 < 0) { 
                Q21 = 0;
            } else {
                Q21 = READ_UINT32((const byte *)src->getBasePtr((int)(x2 + srcRect.left), (int)(y1 +;
            if (x2 >= srcW || x2 < 0 || y2 >= srcH || y2 < 0) { 
                Q22 = 0;
            } else {
                Q22 = READ_UINT32((const byte *)src->getBasePtr((int)(x2 + srcRect.left), (int)(y2 +;
            byte *Q11s = (byte *)&Q11;
            byte *Q12s = (byte *)&Q12;
            byte *Q21s = (byte *)&Q21;
            byte *Q22s = (byte *)&Q22;
            uint32 color;
            byte *dest = (byte *)&color;
            float q11x = (x2 - projX);
            float q11y = (y2 - projY);
            float q21x = (projX - x1);
            float q21y = (y2 - projY);
            float q12x = (x2 - projX);
            float q12y = (projY - y1);
            if (x1 == x2 && y1 == y2) {
                for (int c = 0; c < 4; c++) {
                    dest[c] = ((float)Q11s[c]);
            } else {
                if (x1 == x2) { 
                    q11x = 0.5; 
                    q12x = 0.5; 
                    q21x = 0.5; 
                } else if (y1 == y2) { 
                    q11y = 0.5; 
                    q12y = 0.5; 
                    q21y = 0.5; 
                for (int c = 0; c < 4; c++) {
                    dest[c] = (byte)(
                                ((float)Q11s[c]) * q11x * q11y +
                                ((float)Q21s[c]) * q21x * q21y +
                                ((float)Q12s[c]) * q12x * q12y +
                                ((float)Q22s[c]) * (1.0 - 
                                                     q11x * q11y - 
                                                     q21x * q21y - 
                                                     q12x * q12y)
            WRITE_UINT32((byte *)dst->getBasePtr(dstX + dstRect.left, dstY +, color);