Categories
Week 4

When Structs Lie

Hey everyone! This week I finally got back to the task given by Sev in my first week — converting all the raw struct casts in the Dungeon Master engine to portable byte-offset macros. This was responsible for undefined behavior that could cause crashes or wrong results on platforms with different endianness or alignment requirements.

Earlier it used to be like this:

// Struct definition
class TextString {
    Thing _nextThing;
    uint16 _textDataRef;
public:
    explicit TextString(uint16 *rawDat) : _nextThing(rawDat[0]), _textDataRef(rawDat[1]) {}
    bool isVisible() { return _textDataRef & 1; }
    void setVisible(bool visible) { _textDataRef = (_textDataRef & ~1) | (visible ? 1 : 0); }
};

Now it looks like this:

#define TEXTSTRING_nextThing(address)   (Thing(READ_LE_UINT16((address) + 0)))
#define TEXTSTRING_textDataRef(address)  READ_LE_UINT16((address) + 2)
#define TEXTSTRING_isVisible(address)    (READ_LE_UINT16((address) + 2) & 1)
#define TEXTSTRING_setVisible(address, visible) WRITE_LE_UINT16((address) + 2, (READ_LE_UINT16((address) + 2) & ~1) | ((visible) ? 1 : 0))

And the call site changed from:

TextString textString(_thingData[kDMstringTypeText] + thing.getIndex() * _thingDataWordCount[kDMstringTypeText]);
if (textString.isVisible()) { ... }

To:

byte *textString = getThingData(thing);
if (TEXTSTRING_isVisible(textString)) { ... }

Why this refactor was necessary

This refactor was necessary because the original code relied on casting raw game data directly to C++ structs, which assumes specific object layouts, alignment, and byte ordering. These assumptions are not guaranteed across different architectures, especially on big-endian machines. By switching to explicit byte offsets and endian-aware accessors, the engine now accesses game data in a portable and predictable way.

Lesson 1:

It was a massive refactor across 13 structs in the engine and required multiple attempts. My first approach was still broken — I wrote:

uint16 getTextDataRef() const { return READ_LE_UINT16((const uint16 *)this + 1); }

I thought using READ_LE_UINT16 alone would fix the portability issue, but I missed the real problem: the C++ standard does not guarantee that a struct’s members are packed without padding. A compiler is free to insert padding between members, so the second field is not guaranteed to be located exactly one uint16 after the start of the object. As a result, (const uint16 *)this + 1 may not point to the second field at all. READ_LE_UINT16 handles endianness correctly, but it can’t save you if you’re reading from the wrong offset in the first place.

Lesson 2:

After converting all the struct methods to macros operating on byte *, I still had an issue at the call sites. I was passing non-byte * pointers directly to the macros, relying on the cast inside the macro definition to handle it. While this compiled and ran, Sev pointed out that casts hide problems — you are forcing the compiler to interpret memory as a different type, relying on assumptions about object layout that may not hold on all platforms. By switching the call sites to use byte * explicitly, the macros now operate directly on raw data using fixed byte offsets rather than through potentially misleading type casts.

Result

In total, this refactor touched 13 classes throughout the Dungeon Master engine. After several iterations, the final result is a portable implementation that accesses serialized game data explicitly through byte offsets.

Another Freeze

Towards the end of the week I continued my playthrough of the game and reached level 9 out of the 13 dungeon levels. During testing I encountered another freeze when interacting with a Wizard Eye monster. After some debugging, I found that the issue originated from projectile allocation. Wizard Eyes attack by spawning projectile objects, which are allocated through getUnusedThing(). However, the size of projectile data was being incorrectly divided by two using >> 1, causing the engine to allocate only 2 words instead of the required 5. In ScummVM, _thingDataWordCount already stores sizes in words, making the division unnecessary.
Before:

int16 thingDataByteCount = _thingDataWordCount[thingType] >> 1;

After:

int16 thingDataWordCount = _thingDataWordCount[thingType];

As a result, newly spawned projectiles overlapped existing objects in memory and corrupted their linked-list pointers. In the failing case, a projectile’s next pointer was overwritten and linked back to Door 0 instead of terminating at _thingEndOfList, creating a cyclic linked list on that map square. The rendering function drawObjectsCreaturesProjectilesExplosions() then looped infinitely while traversing this list, freezing the game. Removing the obsolete >> 1 division restored the correct allocation size and resolved the issue.

 

That was it for this week. Next week I plan to finish my gameplay testing of Dungeon Master, continue hunting for any remaining issues, and start implementing the remaining unimplemented methods in the engine. See you in the next one. Until then, goodbye! 🙂

Categories
Week 3

Reading Between the Bytes

Hi, this is week 3 of my GSoC journey. After spending the last two weeks chasing crashes across the DM engine, this week turned into something more like detective work — digging into array indices, stale pointers, and byte alignment bugs that were quietly corrupting things under the hood.
This week, I faced freezes along with crashes. Some of these bugs only revealed themselves when stress-testing specific actions, like throwing explosives at a creature group, which would lock up the game entirely instead of just throwing an error.

Stride and Seek

Flying items rendering with garbled pixels when shrunk on screen. The cause was in blitToBitmapShrinkWithPalChange, which computed the destination row stride from destPixelWidth, the raw pixel width. But allocated bitmap rows are actually 8-byte aligned via getNormalizedByteWidth, so the real stride is wider than that. Using the raw width meant every row started at the wrong offset, bleeding into the next and corrupting pixel data.

Fix: compute destStride = getNormalizedByteWidth(destPixelWidth / 2) * 2 and use it instead of destPixelWidth when indexing destLine = &destBitmap[destY * destStride]

Before:

After:

 

The Defense Drift

Next was a crash in processEventEnableChampionAction, but the more dangerous part of this bug wasn’t the crash. _actionDefense is a lookup table indexed by action type, via _actionIndex, but the code was indexing it with curChampion->_actionDefense instead, the champion’s current defense value. That’s an arbitrary runtime number, not a valid index, so when it grew large enough to exceed the table size, the read went out of bounds and crashed outright. That part was at least loud and easy to notice.

The real problem showed up when the stat stayed within bounds. The lookup would quietly succeed, but against the wrong table entry, subtracting an incorrect defense bonus every time a combat action ended. There was no crash, no error, nothing to flag it. Just a small wrong number applied again and again, every fight, every action. Over enough time, that drift compounded into a champion whose defense stat had wandered far from what it should be, sometimes ending up nearly immortal, sometimes alarmingly fragile, with nothing in the logs to explain why.

Fix: a one-character typo fix, replacing _actionDefense[curChampion->_actionDefense] with _actionDefense[curChampion->_actionIndex].

The Gimme Slot Bug

Next one came from the gimme debug command. Cmd_gimme loops over every slot for a given thing type, but unused slots have their first word set to thingNone and the loop wasn’t checking for that before passing them to getIconIndex, which calls into getWeaponInfo or getObjectType. Those ended up reading garbage from empty slots, sometimes crashing.

Fix: read the raw first word of each slot, and skip it with continue if it equals thingNone.

The Duplicate Item Bug

Another gimme bug. When it allocated a new item slot by copying thing data and bumping _thingCounts, it forgot to update dummyThing’s index to point at the new slot, still handing over the original thingIndex instead of thingCount. Two references ended up pointing at the same thing slot, causing duplicate item corruption and crashes whenever one reference was modified or deleted.

Fix: one line, dummyThing.setIndex(thingCount), after the allocation.

The Object Aspect Crash

Next one was inside drawObjectsCreaturesProjectilesExplosions, the function responsible for rendering objects, creatures, and projectiles in the dungeon view. It chained getObjectInfoIndex straight into _objectInfos[...]._objectAspectIndex and then into _objectAspects209[...], all in one expression, with no validation along the way. But getObjectInfoIndex can return an out-of-range value for invalid or corrupt things, and _objectAspectIndex itself can exceed k85_ObjAspectCount, causing crashes.

Fix: guard both indices before use, skipping the object with continue if either is out of range.

An Enum That Wasn’t One

Next one was about ActiveGroup::_directions, typed as Direction, an enum. But the field doesn’t actually store a single direction, it stores packed 2-bit direction values for up to 4 creatures combined together, built via getGroupValueUpdatedWithCreatureValue. That’s a bit-packed integer, not a valid enum value, and casting a packed integer to an enum type is undefined behavior in C++.

Fix: change the field type from Direction to uint16, and remove the (Direction) casts at every assignment site, letting the packed value be stored and manipulated as a plain integer like it should’ve been.

The Freeze That Wouldn’t Crash

This one ate up most of my week, and it didn’t go down easy. The game would freeze, not crash, whenever explosives were thrown at certain creature groups. No ASan output to point at anything, since nothing was actually invalid memory. GDB’s backtrace wasn’t any more helpful either, it didn’t even surface the method actually responsible.
So it came down to reasoning it out manually. The freeze only ever happened when throwing explosives at a group with loot to drop, which meant something to do with damaging groups and handing out their possessions. That pointed toward GroupMan, so I started going through its methods one by one, comparing them against the original reversed code to spot where our implementation had drifted.
That led to dropCreatureFixedPossessions, which iterates a fixedPossessions pointer array to hand out loot. Two early-exit continue paths, one for skipping a random drop, one for when no unused thing slot was available, both forgot to advance the pointer before looping back. Without that advance, the loop kept re-reading the exact same array entry forever, spinning in place instead of crashing.

Fix: advance currFixedPossession = *fixedPossessions++ in both continue branches before skipping, so the loop always makes forward progress.

The Projectile Crash

Next one started simple, throw a sword or any projectile, and the game crashed. No obvious cause at first. Going through the methods from the GDB backtrace one by one eventually led to initializeGraphicData, where the loop over 6 projectile scale levels was writing all 6 byte-counts to the same fixed offsets, derivedBitmapIndex, derivedBitmapIndex+6, derivedBitmapIndex+12, every iteration, missing + projectileScaleIndex. Only the last iteration’s values stuck, leaving every earlier entry sized incorrectly.

Fix: add + projectileScaleIndex to all three cache offsets, so each scale level writes to its own correct slot.

New Addition

Added a nuke debugger command. Killing monsters manually just to clear a path for testing wasted time, so nuke finds the group directly in front of the party via mapCoordsAfterRelMovement and groupGetThing, then removes it instantly with groupDelete. No gameplay impact.

Looking Ahead, Faster

I’m going to push to finish the DM engine faster going forward, since there are more engines waiting in line after this one. I’ll do my best to wrap it up as quickly as I can, though I’ll admit, it’s turned out to be far more complex and broken than I initially expected.

Thank you for reading. See You in the Next One 🙂


 

Categories
Week 2

More Bug Fixes in the DM Engine

Hi everyone! Following up on Week 1, most of this week was spent resolving PVS-Studio warnings in the DM engine. Many of these warnings were responsible for crashes or pointed to code paths that could lead to unstable behavior.

By the end of the week, I was able to reduce the warning count from 103 to 13, with the remaining ones being false positives. With the static analysis issues largely addressed, I can now focus fully on gameplay-related bugs, fixing the remaining crashes, and implementing the remaining methods.

One such crash was in DisplayMan::drawDoorButton():
doorButtonOrdinal = kDMDerivedBitmapFirstDoorButton + (doorButtonOrdinal * 2) + ((doorButton != kDMDoorButtonD3R) ? 0 : doorButton - 1)
The existing logic worked for most door button values but failed when doorButton = 0, leading to an incorrect offset and eventually a crash. I corrected the expression to ((!doorButton) ? 0 : doorButton - 1), fixing the crash and restoring the intended behavior.

Another crash occurred while picking up a coin placed on a pressure plate. The game would immediately crash when processing the resulting door animation. After tracing the issue, I found that processEventDoorAnimation() unconditionally queried creature attributes before checking whether a creature group actually existed on that square. Reordering the checks fixed the heap buffer overflow and matched the behavior of the reversed source code.

I also fixed several other crashes and memory safety issues in the DM engine, including an incorrect bitmask in ProjExpl::projectileDelete() and an out-of-bounds champion access in drawPanelFoodWaterPoisoned().

Towards the end of the week, I shifted my focus back to the GUI, where I fixed a bug in the launcher that caused the collapsed state of groups to be lost when switching between views.
The fix prevents LauncherSimple::updateListing() from saving the collapsed group state during the initial build, avoiding an unnecessary overwrite of the already saved configuration and correctly preserving the collapsed groups when switching views.

I also added support for stopping fluid scrolling animations with a click, allowing the first click to stop the ongoing animation and the next click to perform the intended selection in lists, grids, and scroll containers.

Additionally, I added configuration support for fluid scrolling, allowing it to be toggled from the options menu, as the feature did not turn out to be a pleasant experience for everyone.

That wraps up this week’s progress. Next week, I’ll continue working on the remaining gameplay issues and implementing the remaining methods. Thanks for reading, and see you in the next update!

Categories
Week 1

Chasing Crashes in the DM Engine

Hi everyone! Now that the GSoC coding period has begun, I moved from working primarily on the GUI to investigating crashes and gameplay bugs in the DM engine while becoming more familiar with its internals.

The first issue I looked at was a progression blocker near the start of the game. After assembling my party in the Hall of Champions, I found, as Strangerke had pointed out, that the gate to the dungeon remained shut despite stepping on the floor sensors. The issue turned out to be an incorrect early return in the sensor effect logic, preventing the pressure plate from triggering the door-opening event.

Another crash occurred when highlighting a selection in the game’s save dialog. The cause was a simple parameter ordering mistake in a box constructor used to draw the selection highlight.

Another task involved implementing EventManager::highlightScreenBox(), enabling visual highlighting for UI elements such as the movement controls.

 

Next was a save-loading bug, where the first in-game load attempt would start a new game instead of loading the selected save. Setting the correct game mode when a valid save slot was chosen resolved the issue.

Next was a stairs transition crash when moving from level 0 to level 1. The crash is currently fixed, but Sev explained that I will need to revisit this area and make the implementation aware of endianness and struct padding to achieve a fully portable solution.

 

Next was a crash that occurred whenever certain creatures, such as mummies, entered the player’s field of view.

 

Another bug involved a teleporter that was intended to appear as a normal wall. Incorrect bitfield masks caused it to render instead as a room filled with teleporter, hiding the key that was supposed to be visible on the ground.

Before:

After:

 

Next was a crash in the game’s debugger, where entering an invalid gimme command such as gimme gold key instead of the expected gimme GOLD KEY would result in a crash.

Another set of fixes involved correcting several incorrectly ordered CLIP calls across the codebase, which were responsible for a number of crashes.

I also resolved several warnings reported by PVS-Studio, addressing a number of potential issues in the codebase.

Now that I’m gaining momentum, I hope to tackle many more bugs in the coming weeks while continuing to resolve PVS-Studio warnings and improve the overall stability of the DM engine. That’s all for this week. See you in the next update!