Autumn Cleaning

The past week has seen quite a few deletions from the codebase. I’ve removed the debugger, the registry and a lot of smaller pieces here and there in the code, all stuff that isn’t really necessary in a ScummVM version of the engine (like in-engine handling of fullscreen switching).

I’ve also found time for quite a bit of cleanup. With the assistance of the nifty astyle tool, I added braces to all if-, for- and while-statements, breaking single-line ifs that looked like this:
if (conditional) doStuff();

into:

if (conditional) {
    doStuff();
}

which made for quite a bit more readable and consistent code.

I’ve also removed Base as super-class for a few classes that didn’t really need to reference BaseGame (formerly CBGame), thus easing the cross-file-dependencies that were making the compile-times horrible.

File Management
The package management in the file manager has been refactored into a subclass of Common::Archive, which means that it’s now quite close to being able to be added to SearchMan, after which SearchMan could be used directly. One of the blockers for that is the filename handling for absolute paths (i.e. redirecting C:\Windows\fonts somewhere useful). Another is that quite a few classes use the “readWholeFile” function (those who just got a bad case of “endianness” itch can relax, it’s only used for text files).

Save games
Where save games formerly took ages and ages to load/save, they should now be quite a bit faster. The speed issue came from the fact that the “load-progress-indicator” was drawn for every instance of every class that was loaded/saved, and each draw in turn triggered a full screen update (as everything does for the moment, for lack of fully working dirty rect-updates). This meant that for every millisecond of useful load/save-work, the engine would be doing hundreds of milliseconds of work updating the screen just to possibly move a progress bar another pixel.

I changed this behaviour to only update the exact region of the screen where the indicator is, and only as far as the indicator has progressed (and only when the progress bar has changed at all). This might result in minor differences in how the progress bar looks (as it isn’t redrawn on top of itself for every single instance any longer), but that’s a cost I think most users will be willing to pay when the end result is a decrease in save/load-time by a factor of 10.
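A minimal sketch of that idea, not the engine's actual code: the ProgressIndicator class, its Rect type and the update() method are names made up for illustration. It remembers how many pixels of the bar are already on screen and only reports the newly filled strip, returning an empty region when nothing moved.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type, used only for this sketch.
struct Rect {
    int left, top, right, bottom;
    bool isEmpty() const { return right <= left || bottom <= top; }
};

// Tracks how far the bar has been drawn and reports only the strip
// that was added since the previous call.
class ProgressIndicator {
public:
    ProgressIndicator(int x, int y, int width, int height)
        : _x(x), _y(y), _width(width), _height(height), _drawnPixels(0) {}

    // percent is clamped to 0..100. Returns the screen region that needs
    // an update, or an empty rect if the bar did not advance.
    Rect update(int percent) {
        percent = std::max(0, std::min(100, percent));
        int filled = _width * percent / 100;
        if (filled <= _drawnPixels)
            return Rect{0, 0, 0, 0}; // nothing new to show: skip the update
        Rect dirty{_x + _drawnPixels, _y, _x + filled, _y + _height};
        _drawnPixels = filled;
        return dirty;
    }

private:
    int _x, _y, _width, _height;
    int _drawnPixels; // pixels of the bar already on screen
};
```

A caller would invoke update() after each saved or loaded instance and only touch the screen when the returned region is non-empty, so most calls cost nothing.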

Lazy loading of images
The engine used to take a few seconds to start up a game on my computer, simply because all the images in the entire game would be loaded. I looked into this, and the only reason most of these images were loaded was that their width and height were needed to set the default size of _rect in BaseSubFrame. I changed _rect to private and made ALL access to it use getters and setters (even internally in the class). Then I added the field _wantsDefaultRect, which keeps track of whether _rect should have the width/height-values or something else. When getRect is triggered and _wantsDefaultRect is true, the image will be loaded to get at those values.

Now, this might sound like a rather odd approach for a simple getter, but the end result is that the state of the class is preserved WITHOUT loading every single image on startup, thus reducing both the load time and the memory footprint by quite a bit. (For Dirty Split I halved the memory footprint of the first screen, and reduced the load-time to between a fifth and a tenth).
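Roughly, the getter pattern looks like the sketch below. Only _rect, _wantsDefaultRect and getRect come from the description above. LazyImage, SubFrame and setRect are hypothetical stand-ins, and clearing the flag once the default rect has been computed is my assumption rather than something the engine is known to do.

```cpp
#include <cassert>

struct Rect { int left, top, right, bottom; };

// Hypothetical stand-in for an image that is only decoded on first use.
class LazyImage {
public:
    LazyImage(int w, int h) : _w(w), _h(h), _loaded(false) {}
    int getWidth() { load(); return _w; }
    int getHeight() { load(); return _h; }
    bool isLoaded() const { return _loaded; }

private:
    void load() { _loaded = true; } // real code would read the file here
    int _w, _h;
    bool _loaded;
};

// Simplified sub-frame: _rect is private and only reachable via accessors.
class SubFrame {
public:
    explicit SubFrame(LazyImage *image)
        : _image(image), _rect{0, 0, 0, 0}, _wantsDefaultRect(true) {}

    // An explicit rect means the image size is never needed.
    void setRect(const Rect &r) {
        _rect = r;
        _wantsDefaultRect = false;
    }

    // The image is touched only if the default (full-image) rect is wanted.
    const Rect &getRect() {
        if (_wantsDefaultRect) {
            _rect = Rect{0, 0, _image->getWidth(), _image->getHeight()};
            _wantsDefaultRect = false; // assumption: compute once, then cache
        }
        return _rect;
    }

private:
    LazyImage *_image;
    Rect _rect;
    bool _wantsDefaultRect;
};
```

Constructing a SubFrame no longer loads anything. The cost moves to the first getRect() call, which matches the side effect of animations possibly being late the first time they play.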

One side-effect of this that might prove this solution quirky, is that there is no guarantee that all frames of an animation reside in memory now, which might result in some animations being slow or late the first time they are played.

Singleton and separation
Instead of passing the File Manager around to everyone and everything, it’s now moved to a singleton that is supposed to keep it, along with other stuff that survives between loading savegames. Since BaseGame is already quite a bit on the heavy side (clocking in at over 4 KLOCs), I think it’s appropriate to try to move as much functionality as possible OUT of that class and put it elsewhere.

Minor stuff
I had forgotten to add the space-bar as a “printable” key in the events-checks, which made typing anything but rather long words in e.g. save-game titles impossible. That is fixed now.

  • Sounds that were playing when saving now resume properly on load.
  • Screen-fading now works.
  • Save game thumbnails now scale properly (thanks to some scaling code I got from clone2727), where before they would repeat the first column in the last few rows, owing to some integer-division gone haywire.
  • The settings that were formerly in the settings.xml-file have now been moved to ConfMan, which is responsible for the ScummVM-settings-file (.scummvmrc/ScummVM Preferences/scummvm.ini). Any game-specific settings will be stored there with the prefix priv_ (for instance the subtitles-flag in Dirty Split is stored as priv_Subtitles).

What’s left?

I’m back again after a week of harvest-work, and I’m now ready to start the second half of GSoC. A big question, then, is what’s left to do?

  • Volume-settings are currently ignored (well, mute works, but that’s it)
  • Videos desync a bit (I used the video-code from the Sword 2.5-engine, which states this to be a known issue)
  • Only OGG-audio is supported at the moment, while games like “the white chamber” also use WAV.
  • Detection still needs a bit of work:
    • Detection is hardcoded for the games I have tested with, and doesn’t allow for user-games with changing/unknown MD5’s.
    • Detection is set with a common target, and Savegames use hardcoded filenames, thus making ALL WinterMute-games share savegame-namespace. This means that slot 3 in J.U.L.I.A. also is slot 3 in Dirty Split…
  • TTF-fonts still need a bit of adjusting:
    • TTF-fonts currently lack a decent fallback to theme-fonts (in ANY case text will be drawn, just not with a font that looks at all similar to what was intended when the games were made)
    • TTF-fonts drew a bit on the dark side (cheap Star Wars-jokes aside, this was because I drew 16bpp for some reason, and then converted that to 32bpp, I have a fix for it now)
  • Variable renaming:
    • There is still the odd variable name here and there that follows either the VarName-naming convention, or the style_where_you_put_in_a_bunch_of_these. I think I got most of the former. And the latter, well, it will have to change as well.
  • Drawing is slow, or as _sev put it:

<_sev> i don’t agree it is slow
<_sev> it is über-slow

Well, this is the bit that I talked a bit about last time. Sadly I have still not solved the dirty rect-thing (since I haven’t really been working on GSoC the past week, owing to the mentioned harvest-work). I did go through the blit-function I use right now, and noticed two things:

As fuzzie mentioned back when I originally tried to refactor the blit-code from the Sword25-engine for common usage, I changed the constant shifts to variable shifts (i.e. “pix << 24 & 0xff” became “pix << blueShift & 0xff”), which would disable any compiler-optimizations that could change that to byte-access instead of a shift. Accordingly I also changed the byte-writes to format.colorFromARGB-calls, which add even more shifts. Doing some profiling on this code revealed that ~30% of the total runtime was spent in this function. Simply changing the shifts back to constant shifts, and the format.colorFromARGB-calls to byte-writes, reduced this to ~19%. That now makes flip() the heaviest part of the entire code (since it does a complete redraw of the frame every frame), with ~57% of the total runtime spent in g_system->updateScreen(). Finishing up dirty rects should reduce that load by quite a bit more.
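To illustrate the difference between the two variants, here is a small sketch. It assumes a fixed ARGB8888 layout with alpha in the top byte and uses right shifts to extract channels. The PixelFormat struct and all function names are made up for the example and are not ScummVM's own types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical format descriptor: the shifts are only known at run time,
// so the compiler has to emit real shift instructions.
struct PixelFormat {
    int aShift, rShift, gShift, bShift;
};

// Variable-shift extraction: works for any layout, but is harder to optimize.
inline uint8_t redVariable(uint32_t pix, const PixelFormat &f) {
    return (pix >> f.rShift) & 0xff;
}

// Constant-shift extraction (assumed ARGB8888): the compiler is free to
// turn this into a plain byte access.
inline uint8_t redConstant(uint32_t pix) {
    return (pix >> 16) & 0xff;
}

// Packing with constant shifts, the counterpart of the byte-writes.
inline uint32_t packConstant(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return (uint32_t(a) << 24) | (uint32_t(r) << 16) |
           (uint32_t(g) << 8) | uint32_t(b);
}
```

Both extraction functions return the same value for an ARGB8888 pixel. The trade-off is that the constant version only works for that one layout, which is presumably why it sits in engine-specific code rather than common code.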

I did do some minor game testing the past week. As meta kindly provided me with a retail version of J.U.L.I.A., I was able to play on from the demo. The downside is that I found J.U.L.I.A. to be using sprite rotation, which is not supported in WME Lite, on which I have based this port. This makes it rather unlikely that the full J.U.L.I.A.-version will become completable with this port during GSoC, although it isn’t unlikely that I’ll end up adding in sprite rotation at some future point.

Going forwards, I’ll try to get the things listed above fixed, and hopefully end up reducing my list of TODOs, instead of finding new roadblocks.

Time to clean those rects up.

My current solution to the rendering in WME is rather slow, capping out at around 30 fps on my i7, which means I get lower frame rates in Dirty Split than I do in Diablo III. The major difference between the two would of course be partially explained by Dirty Split being rendered entirely in software. But that alone shouldn’t excuse a non-changing 2D scene not managing to get more than 30 fps.

Now there are a few places that the rendering can be improved: First of all, is there any reason to redraw the screen if NOTHING has changed? Probably not. Is there any reason to produce as many frames per second as possible, maxing the CPU-load? Probably not.

So, I put in a framerate cap. For the time being this is capped at 25 fps, to keep my CPU from maxing out (and thus my laptop from getting quite hot). In practice it should probably go a bit higher than that, but since the rendering itself limits what is possible at the moment, that discussion is rather moot.
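A frame cap of this kind can be sketched as below. FrameLimiter and its methods are hypothetical names. The class only does the arithmetic, so the caller supplies the current time in milliseconds and performs the actual sleep.

```cpp
#include <cassert>
#include <cstdint>

// Computes how long to wait so that frames are at least 1000/fps ms apart.
class FrameLimiter {
public:
    explicit FrameLimiter(uint32_t fps)
        : _frameTime(1000 / fps), _lastFrame(0) {}

    // Milliseconds still to wait before the next frame may start,
    // given the current time 'now' in milliseconds.
    uint32_t delayFor(uint32_t now) const {
        uint32_t elapsed = now - _lastFrame;
        return elapsed >= _frameTime ? 0 : _frameTime - elapsed;
    }

    // Call once a frame has been presented.
    void frameDone(uint32_t now) { _lastFrame = now; }

private:
    uint32_t _frameTime; // minimum time per frame, in ms (40 for 25 fps)
    uint32_t _lastFrame; // timestamp of the last presented frame
};
```

In a ScummVM engine the timestamps would presumably come from g_system->getMillis(), with the returned value handed to g_system->delayMillis(), so the CPU sleeps instead of spinning.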

To get the rendering to go a bit smoother, I had to detect if the new frame was at all different from the last frame, and preferably also WHAT was different from the previous frame, and then only redraw the parts that were changed. (The “dirty rectangles”)

The old renderer

Originally, all surfaces that wanted to be rendered would apply a scale to themselves and then pass the scaled surface to the renderer (the surfaces would also cache the last scale, to avoid reapplying the scale every frame if it was only drawn at one size). Every frame would start off with clearing the screen-buffer, then drawing the surfaces one by one. Thus there was almost no difference between drawing a very different frame, or the same frame again, as all the surfaces would need to be redrawn anyhow.

Replacing this means I have to take care to keep the same behaviour intact, which means:

  • Any area that was drawn last frame, but isn’t drawn now needs a redraw
  • Any area that doesn’t get anything drawn in it, should get filled with the clear-colour (which was originally drawn into the screen-buffer on clear anyhow)
  • If any element was drawn before element X, it still needs to be redrawn before element X if it needs to be redrawn.
  • If any element was drawn after element X, it still needs to be redrawn after element X if it needs to be redrawn.

Render-tickets

The name I chose for my solution is “renderTickets”. Where before any draw-call would mean an immediate update to the screen-buffer, in the new system, a renderTicket makes note of the operation that was asked for (a ticket), as well as a copy of the data that was to be used as a source for the operation.
When adding a ticket, a search is done to see if the surface that asked to draw issued any tickets last frame, and if they are the same as the ones this frame. Any tickets that are unchanged will be reused, while any new tickets will trigger an update in their target screen region.
Additionally the renderer keeps track of the order of the tickets. A ticket is only accepted for reuse if it is asked for at the same point it was asked for last frame, with one exception: a new ticket that arrives as ticket N will increment the expected order of the tickets that should arrive as N+1 and on. Since a new ticket will get its region redrawn completely anyhow, this should still keep the Z-order, but avoid unnecessarily redrawing areas whose tickets arrive in order but with new tickets before them. Any ticket that triggers a redraw has its target position added as a dirty rect (at this point, that means that the single rect used is scaled to include its area).
Finally, when the engine asks for a flip from the back-buffer to the screen-buffer, the actual drawing starts:
  1. First, the list of render tickets is purged of all items that were drawn last frame, but did not receive requests for draw this frame.
  2. The dirty rect is filled with the clear-colour
  3. Any tickets that intersect the dirty rect get that section redrawn
  4. Finally the back-buffer is copied to the screen buffer and onto the screen.
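
The ticket bookkeeping and the flip steps above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the engine's renderer: tickets are identified by a hypothetical owner id and their target rect only, no pixel data is stored, and the actual clearing, blitting and buffer copy are left as comments. flip() returns the number of tickets it would redraw so the behaviour can be observed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect {
    int left, top, right, bottom;
    bool isEmpty() const { return right <= left || bottom <= top; }
    bool operator==(const Rect &o) const {
        return left == o.left && top == o.top &&
               right == o.right && bottom == o.bottom;
    }
    bool intersects(const Rect &o) const {
        return left < o.right && o.left < right &&
               top < o.bottom && o.top < bottom;
    }
    // Grow this rect so that it also covers 'o' (single dirty rect).
    void extend(const Rect &o) {
        if (o.isEmpty()) return;
        if (isEmpty()) { *this = o; return; }
        left = std::min(left, o.left);
        top = std::min(top, o.top);
        right = std::max(right, o.right);
        bottom = std::max(bottom, o.bottom);
    }
};

struct Ticket {
    int owner;   // hypothetical id of the surface that asked for the draw
    Rect dst;    // target region on screen
    bool wanted; // was this ticket requested again this frame?
};

class TicketRenderer {
public:
    void addTicket(int owner, const Rect &dst) {
        // Reuse only if the same ticket arrives at the expected position.
        if (_cursor < _tickets.size() && _tickets[_cursor].owner == owner &&
            _tickets[_cursor].dst == dst) {
            _tickets[_cursor].wanted = true;
        } else {
            // New ticket: insert here, shifting the expected order of the
            // rest, and mark its region dirty.
            _tickets.insert(_tickets.begin() + _cursor, Ticket{owner, dst, true});
            _dirty.extend(dst);
        }
        ++_cursor;
    }

    // Returns how many tickets had to be redrawn.
    int flip() {
        // 1. Purge tickets that were not requested this frame.
        for (std::size_t i = 0; i < _tickets.size();) {
            if (!_tickets[i].wanted) {
                _dirty.extend(_tickets[i].dst);
                _tickets.erase(_tickets.begin() + i);
            } else {
                ++i;
            }
        }
        int redrawn = 0;
        if (!_dirty.isEmpty()) {
            // 2. Fill the dirty rect with the clear colour (omitted).
            // 3. Redraw, in order, every ticket intersecting the dirty rect.
            for (const Ticket &t : _tickets)
                if (t.dst.intersects(_dirty))
                    ++redrawn;
            // 4. Copy the back-buffer to the screen (omitted).
        }
        for (Ticket &t : _tickets)
            t.wanted = false;
        _dirty = Rect{0, 0, 0, 0};
        _cursor = 0;
        return redrawn;
    }

private:
    std::vector<Ticket> _tickets;
    Rect _dirty{0, 0, 0, 0};
    std::size_t _cursor = 0; // position the next ticket is expected at
};
```

Submitting the same tickets two frames in a row makes the second flip() redraw nothing, which is the idle case where no buffer copy is needed at all.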

Issues

The current implementation has a few issues with Z-order, and uses a bit more memory than the old solution. The memory usage was expected, as every ticket has to keep a copy of the section it wants to draw (as the scale can differ between draws, or the Surface that asked for a draw can be destroyed before screen-flip). Thus the implementation isn’t enabled by default at this point.
There also is no code done yet for Fading/Line-drawing with dirty-rects, and I also might want to use multiple dirty rects, instead of scaling a single update-area. Happily though, the current solution allows the engine to idle without problem, which means that a screen that doesn’t change doesn’t even trigger a buffer-copy, and puts the CPU-usage down from 100% to less than 10% when nothing is happening.

Other developments

I did a quick test of the engine on my PPC-machine, and fixed an endian-assumption in the scripts, which means that I possibly might be the first person to ever start up J.U.L.I.A. and Dirty Split on a PPC Mac 😀 That did expose the need for a way more efficient render-solution though, because while the game did RUN fine, it wasn’t fast enough to be very playable.
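For readers unfamiliar with this class of bug: an endian-assumption typically means casting a byte buffer straight to an integer, which only gives the right value on little-endian machines. A portable read assembles the value byte by byte, as in this generic sketch (readLE32 is a name chosen for the example, and this is not the actual fix made in the script code):

```cpp
#include <cassert>
#include <cstdint>

// Reads a 32-bit little-endian value from a byte buffer. The result is
// the same on little-endian (x86) and big-endian (PPC) hosts, because
// the byte order is spelled out instead of inherited from the CPU.
inline uint32_t readLE32(const uint8_t *p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}
```

ScummVM ships its own helpers for this (such as the READ_LE_UINT32 macro), which would be the natural choice inside the engine.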
In some free moments the past week I have also done quite a bit of variable and function-renaming to move towards following the ScummVM convention (ScummVM uses “funcName()” and “varName”, while WME uses “FuncName()” and “VarName”). Thanks to _sev’s earlier help, the “m_MemberVar” -> “_memberVar” rename is already done, although a few of those might need a bit of a look-through to catch cases like _iD (which probably should either be _id or _ID).

Current engine status

While there are no fancy pictures this time around, I thought I’d list my estimates on the engine-status:
  • Graphics: 80-90% Works completely, but is very slow
  • Sound: 50-60% Works, but lacks volume-control and WAV-file support
  • Fonts: 70-80% Works fine, but is a bit darker than in the original, also no solution is in place for replacing system-fonts (that won’t necessarily be available on non-Windows-platforms)
  • PPC-support: Can’t guess at a number, but the engine starts, and seems to run OK
  • Video: Works, but has the same issues as Broken Sword 2.5 lists, video desyncs from audio, and is very slow
  • Savegames: 80-90% Currently broken by a mis-setting of version-numbers, but that should be a quick fix, otherwise works fine, and has no noticed memory-leaks/issues.
  • Renaming to ScummVM-convention: 40-50% would be my estimate, but I would guess at closer to 40 than 50.
  • Sprite mirroring: 100%