Mid term roundup

So, midterm is upon us.
It’s been a few really challenging, but also fun weeks where only a couple of times I found myself really stuck.
I am surprisingly more-or-less on schedule, which means that two new, cool features are on their way to players and developers.

Let’s talk about them.

The first one, as we mentioned before, is rotozoom.
The branch is basically ready (I still may have to fix just a couple of non-important few lines, but it’s there) , which means that WME games get sprite rotation+scaling suypport on ScummVM:

Which in turn means that at least a puzzle in one game, J.u.l.i.a, is now playable.
The J.u.l.i.a greenlight demo, available here, is playable to completion thanks to the patch: http://steamcommunity.com/sharedfiles/filedetails/?id=122116943
I’m not sure which ones, but I guess there are or will be other games that use rotation around.
They are now playable on ScummVM, bar other issues.

The second one is the debugger.

It’s not quite as production ready as the rotozoom feature, but the bulk is most definitely there, so it’s a matter of polishing it: https://github.com/tobiatesan/scummvm/tree/38db3907a7a0e31a6ff71e2e2c89b0d91a476589

You can set breakpoints, watch variables, step-execute (both step over and step into), import source code if available (if not, you’ll have a hard time using the debugger anyway) and dump variables and engine resources.

All you need to debug your games on ScummVM+WME and, more importantly, very useful in debugging the engine itself.
Will turn out rather handy post-midterm, where the real fun (for arbitrary values of “fun”) is.

Post-midterm, I plan on putting the finishing touches on the debugger, obviously, but also doing some bug hunting (for example, by playing popular WME games) and some doing refactoring and profiling+optimizing.

Now, there are quite a few areas of interest there.

Let’s see them.

I have learned by now that the biggest culprit for performance issues (and a few bugs) is the rect-system + the ticket system.
The latter, especially, is a rather complex business that is not present in the original engine, which uses a SDL backend, that does a *lot* of comparisons.

For starters, there is a known , aka “the Rosemary bug” (as it is evident in Rosemary when the main character walks right-to-left) that seems to be triggered by mirrored sprites.
For some reason, the sprite gets “jigsawed” WHEN dirty rects are enabled.
This may mean a bug in the dirty rect system.

The rect system also allows for optimization in the form of support for multiple rects.
Which means that, basically, at the moment, if two single pixels are updated, the whole rectangular area encompassing them is redrawn (bad for performance).
We would hope to just work with two 1px-wide rectangles.

The ticket comparison (which is carried out by the ticket system to recognized duplicate rendering tickets) might also be optimized further, but that’s something that will be considered after profiling.

Something that may yield better results is avoiding *abusing* the ticket system by prerendering stuff that would result in *huge* amounts of requests – namely, UITiledImage, which results in a request for every single tile.
Most evident in the system menu.

TransparentSurface my benefit from optimizing too; my bilinear-filtered copy function certainly does not help, as it is far from optimized.
To have good performance with that either the rect system must be made a lot better, or that must be optimized.

There is also some refactoring to be done in there.
For starters, there are cases where handy things provided by scummvm are not used, e.g. obvious things like const char*’s everywhere instead of Common::String.
Then there is some leftover code here and there, and a few paths *seem* to be modelled around implementation specific things of the original WME (the whole graphic stack *may* be it), so maybe one could try to see if there is a simpler way of handling it.

I hope the next half will be as fun and as fulfilling as the first half.

See you!

Introducing the debugger

I have been cooking up a little debugger for a while.
Now, writing a debugger for the Wintermute Engine is a bit weird, because things do work in a slightly peculiar fashion.
For example, you can have no debugging symbols, because that would mean hacking the compiler, which is not part of the ScummVM package.
But, even more interesting: you don’t need any.

Here’s the deal: each script runs inside its own “VM” – let’s call it that – with its iP, stack and execution unit.
There is a scheduler that handles the various VMs, creates new ones (e.g. on event) and kills off those that have finished the execution of their scripts.
The “execution unit” is a method that gets called once per instruction, evaluates the instruction and advances the IP.

Like this:

bool ScScript::executeInstruction() {
    // ... 
    ScValue *op1; 
    ScValue *op2;
    uint32 inst = getDWORD();
    switch (inst) {
    case II_DEF_VAR:
        // ... 
    case II_DEF_GLOB_VAR:
        // ... 
    case II_DEF_CONST_VAR: {
        // ...
    case II_RET:
        // ...
        // ...
    case II_ADD:
        op2 = _stack->pop();
        op1 = _stack->pop();
        if (op1->isNULL() || op2->isNULL()) {
        } else if (op1->getType() == VAL_STRING || op2->getType() == VAL_STRING) {
            char *tempStr = new char [strlen(op1->getString()) + strlen(op2->getString()) + 1];
            strcpy(tempStr, op1->getString());
            strcat(tempStr, op2->getString());
            delete[] tempStr;
        } else if (op1->getType() == VAL_INT && op2->getType() == VAL_INT) {
            _operand->setInt(op1->getInt() + op2->getInt());
        } else {
            _operand->setFloat(op1->getFloat() + op2->getFloat());
    case II_SUB: {
        // ...
    case II_DBG_LINE: {
        // ...
        _gameRef->LOG(0, "Fatal: Invalid instruction %d ('%s', line %d, IP:0x%x)\n", inst, _filename, _currentLine, _iP - sizeof(uint32));
        _state = SCRIPT_FINISHED;
        ret = STATUS_FAILED;
    return ret;

So do we do?
We put hooks in there, ideally.

This way, each time an instruction is executed we can look around and see what’s happening.
We can implement watch, breakpoints and stuff.
Oh, but then there’s the problem of knowing where the hell we are first.
Okay – watching can easily implemented all the same, by keeping a watch list and keeping track of what has changed since last time, but breaking?
We have no debug symbols, so how do we know WHERE we are?
Interestingly, the WME team has been so kind as to add a special instruction to the instruction set: II_DBG_LINE.
Guess what it does?
It updates a counter, _line, in a way that it always contains the number of the current line.

What we do then is to place our hooks in a way to monitor that number and act accordingly – e.g. we have a list of breakpoints, if the current line is in there, we break – that is, we open the debugger window like this:

case II_DBG_LINE: {
    int newLine = getDWORD();
    if (newLine != _currentLine) {
        _currentLine = newLine;
    for (int j = 0; j < _engine->_breakpoints.size(); j++) {
        if (_engine->_breakpoints[j]._line == _currentLine &&
            !strcmp(_engine->_breakpoints[j]._filename.c_str(), _filename)) { 
                // This one breaks execution and does housekeeping.
                // We resume from here once user stops execution
    if (1) { 
        if (_callStack->_sP <= _step) { // This is set elsewhere, suffice saying that I may or may not want to enter a function past a certain call depth _adapter->triggerStep(this);
    if (1) { 
             for (int i = 0; i < _watchlist.size(); i++) {
                // Etc

Then, we wait for user input.
How this is handled we’ll see in the next post.

Interpolation made stupid

Here we are again.
We left talking about transforms and rotations.
See, all in there is good and well, but there is a slight problem in practice I hinted at.

If you remember we said how the fact that we are working in the discrete poses a problem: we have to approximate real coordintes of the destination point with integer coordinates – this, using the naive method, nearest neighbor approximation (aka: simply rounding to the next pixel) causes aliasing.

That’s where interpolation kicks in.

A warning for those unaware: once again, signal processing is terribly big field and everything has vast theoretical implications (and vast practical applications too, e.g. most concepts apply to audio processing as well).
We deal with the matter in an extremely practical fashion, hence the post title.

Now – specifically, we deal with bilinear transform.

Intuitively, the idea is: we try to infer what colour would be a pixel at the given coordinates if there actually were one.
We can’t add new information – this is immediately apparent by the fact that interpolated transforms are inevitably slightly “blurry” -, but we can eliminate the aliasing caused by abruptly rounding to the closest value.

If I am not mistaken, “abruptly” has a special meaning in this context, and means: “upper bound in frequency as per the Shannon-Nyquist theorem”: http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem

Now, let’s get going. Look at this:

(You will excuse me for posting a pic of a whiteboard, but my parallel scanner from 1997 has apprently just died)

In the example, out projected source is (4/5, 4/5).
What would look like a pixel in (4/5, 4/5)?
We don’t really know, but we can estimate its color to be a weighted average of its neighbors.

So, the math (from Wikipedia):

And now let’s see some code from transparent_surface.cpp – this time you don’t get the pseudocode since it’s really a matter of carrying out the above and checking for a few edge cases and avoiding division by zero:

void TransparentSurface::copyPixelBilinear(float projX, float projY, int dstX, int dstY, const Common::Rect &srcRect, const Common::Rect &dstRect, const TransparentSurface *src, TransparentSurface *dst) {
            int srcW = srcRect.width();
            int srcH = srcRect.height();
            int dstW = dstRect.width();
            int dstH = dstRect.height();
            assert(dstX >= 0 && dstX < dstW);
            assert(dstY >= 0 && dstY < dstH);
            float x1 = floor(projX);
            float x2 = ceil(projX);
            float y1 = floor(projY);
            float y2 = ceil(projY);
            uint32 Q11, Q12, Q21, Q22;
            if (x1 >= srcW || x1 < 0 || y1 >= srcH || y1 < 0) { 
                Q11 = 0;
            } else {
                Q11 = READ_UINT32((const byte *)src->getBasePtr((int)(x1 + srcRect.left),(int)(y1 + srcRect.top)));
            if (x1 >= srcW || x1 < 0 || y2 >= srcH || y2 < 0) { 
                Q12 = 0;
            } else {
                Q12 = READ_UINT32((const byte *)src->getBasePtr((int)(x1 + srcRect.left), (int)(y2 + srcRect.top)));
            if (x2 >= srcW || x2 < 0 || y1 >= srcH || y1 < 0) { 
                Q21 = 0;
            } else {
                Q21 = READ_UINT32((const byte *)src->getBasePtr((int)(x2 + srcRect.left), (int)(y1 + srcRect.top)));
            if (x2 >= srcW || x2 < 0 || y2 >= srcH || y2 < 0) { 
                Q22 = 0;
            } else {
                Q22 = READ_UINT32((const byte *)src->getBasePtr((int)(x2 + srcRect.left), (int)(y2 + srcRect.top)));
            byte *Q11s = (byte *)&Q11;
            byte *Q12s = (byte *)&Q12;
            byte *Q21s = (byte *)&Q21;
            byte *Q22s = (byte *)&Q22;
            uint32 color;
            byte *dest = (byte *)&color;
            float q11x = (x2 - projX);
            float q11y = (y2 - projY);
            float q21x = (projX - x1);
            float q21y = (y2 - projY);
            float q12x = (x2 - projX);
            float q12y = (projY - y1);
            if (x1 == x2 && y1 == y2) {
                for (int c = 0; c < 4; c++) {
                    dest[c] = ((float)Q11s[c]);
            } else {
                if (x1 == x2) { 
                    q11x = 0.5; 
                    q12x = 0.5; 
                    q21x = 0.5; 
                } else if (y1 == y2) { 
                    q11y = 0.5; 
                    q12y = 0.5; 
                    q21y = 0.5; 
                for (int c = 0; c < 4; c++) {
                    dest[c] = (byte)(
                                ((float)Q11s[c]) * q11x * q11y +
                                ((float)Q21s[c]) * q21x * q21y +
                                ((float)Q12s[c]) * q12x * q12y +
                                ((float)Q22s[c]) * (1.0 - 
                                                     q11x * q11y - 
                                                     q21x * q21y - 
                                                     q12x * q12y)
            WRITE_UINT32((byte *)dst->getBasePtr(dstX + dstRect.left, dstY + dstRect.top), color);


Rotation made stupid

In many of the more relaxed civilizations on the Outer Eastern Rim of the Galaxy, the Hitchhiker’s Guide has already supplanted the great Encyclopaedia Galactica as the standard repository of all knowledge and wisdom, for though it has many omissions and contains much that is apocryphal, or at least wildly inaccurate, it scores over the older, more pedestrian work in two important respects. First, it is slightly cheaper; and second, it has the words “DON’T PANIC” inscribed in large friendly letters on its cover.
—Hitchhiker’s Guide to the Galaxy

This is post will go into some detail explaining how to actually rotate a sprite in practice.
It’s a good starting point if you are rolling your own rotozoom function.
It has, naturally, zero academic value and I am also quite sure you can buy manuals that will give you a better idea for just a few dollars.
And you can always troll IRC.
But still.

Now, we assume a little knowledge of linear algebra.
You can probably go ahead all the same if you haven’t taken Algebra 101, but it makes much more sense if you have.
There is countless algebra books which will teach you the basics. Try Strang’s or Nicholson’s.

That said.
Let’s dive in.
Well, a sprite is a matrix of pixels, all right?
To rotate, zoom it and do all the nice things we want to do we need to match a pixel in the source matrix to a pixel in the destination matrix and copy its color.
“This goes here”.

So – look at the source pixel, calculate where it ends up and copy that color. [well, almost, hold on].

Copying a byte is something very simple and your language of choice already offers support for that.
Calculating the destination, well, we need a function for that.
So we have:

(x’, y’) = f(x, y, phi)

don’t we?
There is the textbook formula for rotation:

I’m not giving you a proof or anything (hey, if books aren’t enough you can easily trick a doctoral student into doing the job for you)- it just works.
We assume that rotation is carried out around the origin and that no zooming is involved.

Zooming is actually rather trivial (just multiply by the desidred amount), rotating around a custom point basically means shifting the whole thing so that the hotspot is the new origin, then rotating as above, then shifting again so that the hotspot is where it was in the first place and everything else has moved.
Once you’ve got the simple case, it’s rather simple, though.
See also:

That’s it. Now we can match a point in our destination sprite to a point in our source sprite.
Well, this is not quite it.
You see, while rotating points is a rather mundane matter, we are dealing with pixels, that is… we are working in the discrete.
As you can see in the drawing below, P (x,y) in the destination sprite does not match exactly any pixel in the source.

Which means that we must do some approximation.
What do we do? Well, we can either round to the nearest integer or interpolate.
Interpolation will be the subject of the next post, for the moment, we round.
We basically want something like a).

This also means that it’s wonderfully more practical to scan the destination, calculate where the destination pixel would end up and copy that instead of the other way around.

Now, all we need to do is to scan the whole of the destination matrix and copy the correct color into each pixel.
A nice for() loop will do.
Here you are.
First in pseudocode:

procedure rotate (angle, src, dst, zoom):
  invAngle = 2 * PI - angle; // We need to rotate in the opposite direction, remember?
  for (y = 0; y < dstH; y++):
    for (x = 0; x < dstW; x++):
    targX = ((x * cos(invAngle) - y * sin(invAngle))) * zoom
    targY = ((x * sin(invAngle) + y * cos(invAngle))) * zoom
    dst[x][y] = src[targX][targY]

And now for the actual code used in my patch.
It is only marginally different in that it supports custom centering (omitted above for clarity) and works in degrees instead of radians because that’s what the original Wintermute Engine does.

TransparentSurface *TransparentSurface::rotate(Common::Rect aSrcRect, TransformStruct transform) const {
Point32 newHotspot;
Rect32 rect = TransformTools::newRect(Rect32 (aSrcRect), transform, &newHotspot);
Common::Rect srcRect(0, 0, (int16)(aSrcRect.right - aSrcRect.left), (int16)(aSrcRect.bottom - aSrcRect.top));
Common::Rect dstRect(0, 0, (int16)(rect.right - rect.left), (int16)(rect.bottom - rect.top));
TransparentSurface *target = new TransparentSurface();
assert(format.bytesPerPixel == 4);
int dstW = dstRect.width();
int dstH = dstRect.height();
target->create((uint16)dstW, (uint16)dstH, this->format);
uint32 invAngle = (360 - transform._angle) % 360;
float invCos = cos(invAngle * M_PI / 180.0);
float invSin = sin(invAngle * M_PI / 180.0);
for (int y = 0; y < dstH; y++) {
for (int x = 0; x < dstW; x++) {
int x1 = x - newHotspot.x;
int y1 = y - newHotspot.y;
float targX = ((x1 * invCos - y1 * invSin)) * 100.0 / transform._zoom.x + srcRect.left;
float targY = ((x1 * invSin + y1 * invCos)) * 100.0 / transform._zoom.y + srcRect.top;
targX += transform._hotspot.x;
targY += transform._hotspot.y;
nearestCopy(targX, targY, x, y, srcRect, dstRect, this, target);
} else {
bilinearCopy(targX, targY, x, y, srcRect, dstRect, this, target);
return target;


It’s done, it works, it’s pretty and it is not too sluggish on an i7-4770 if you overclock it a little and close all other apps.

…it’s the best thing ever!

We left with me trying to come up with an alternative approach to handle the rotation problem because rotating first and centering later was a mess because of rounding errors piling up.

We also talked about bounding boxes.

So, the good news is that the new approach – it works.

It is definitely better than the other approach, compare:

The old & ugly one is left, the new and pretty one is right.
Even better, you can zoom as much as you want and it will stay perfectly centered instead of wandering around.

It’s actually good enough to play the J.U.L.I.A. Greenlight demo, ladies and gentlemen:

See those rays? Those are rotated & scaled sprites. Yay!
Just wait for the patch to be merged (or for some horribly bad showstopper to pop up), download the demo here and see for yourself.

How did I do it?
Well, we are talking about it in two blog posts:

  • Rotation made stupid
  • Interpolation made stupid

Now, the only problem with it is that it’s rather slow.
Well, not “with it” (even if the interpolating code could do with some optimization), but with the whole of the GFX/rect subsystem.
Something I’ll tackle on the the second half-term, I guess.