Kyra disassembled!

It’s a day later than I’d estimated, but last night, I finished the Kyra disassembler, including function detection.

For now, I’m only handling one game – Hand of Fate – but it would be pretty simple to add the other Kyra games as well, as the only things different between them are the “magic” functions, and they’re all referred to by a parameter to a specific opcode. However, there’s not much point in doing that right now; it won’t help me to decompile the bytecode.

Here’s an excerpt from the disassembly of a small script:

00000016: jumpTo 0x0 (0)
00000018: push 6 (1)
0000001a: push 3 (1)
0000001c: setCharacterPos (0)
0000001e: addSP 2 (-2)
00000020: push 6 (1)
00000022: push 2 (1)
00000024: setCharacterPos (0)
00000026: addSP 2 (-2)

The next step is to get the code flow analysis working with KYRA, and then it’s on to the code generation.

SCUMM over!

I’ve finished the SCUMM code generation now. Unfortunately, a bunch of stuff got in the way, and I didn’t quite make my new revised deadline – but at least I finished in time for the original deadline.

There may still be some tweaks here and there which would be nice to add, but the code generation is working for all opcodes, and generating some pretty nice output. To give an example, here is the complete output for the script I showed you last time:

C:\scummvm\gsoc2010-decompiler>decompile -escummv6 script-33.dmp
00000000: VAR_GAME_LOADED = 0;
00000006: var173 = var177;
0000000C: var177 = 874;
00000012: cursorCmd_Image(var177, 93);
0000001A: stopObjectCodeB();

And just to show it does more than just such simple scripts, here’s part of another, more complicated script:

C:\scummvm\gsoc2010-decompiler>decompile -escummv6 script-30.dmp
00000000: localvar2 = 1;
00000006: while ((localvar2 <= (60 * var132))) {
00000014:   breakHere();
00000015:   delaySeconds(1);
00000019:   if (((VAR_VIRT_MOUSE_X != localvar0) || (VAR_VIRT_MOUSE_Y != localvar1))) {
0000002B:     localvar0 = VAR_VIRT_MOUSE_X;
00000031:     localvar1 = VAR_VIRT_MOUSE_Y;
00000037:     jump 0;
0000003A:   }
0000003A:   if (bitvar1) {
00000040:     bitvar1 = 0;
00000046:     jump 0;
00000049:   }
00000049:   ++(localvar2);
0000004F: }

Now that all parts of the SCUMM decompilation have been handled, it’s time to move on to the next engine – KYRA. My next task is therefore to get a disassembler for that written.

First output!

I finally have some actual output working. It’s only a proof of concept right now, but it’s a nice start.

Here is the output the decompiler generates for one of the short Sam & Max scripts – one of the scripts I used for the code flow testing (link points to the graph for that script):

C:\scummvm\gsoc2010-decompiler>decompile -escummv6 script-33.dmp
VAR_GAME_LOADED = 0;
var173 = var177;
var177 = 874;
Unknown opcode 6B99 at address 00000018
Unknown opcode 66 at address 0000001A

It only does these simple assignments for now – everything else is left unknown – but like I said, it’s a start.

The opcodes it complains about are cursorCmd_Image and stopObjectCodeB, since I don’t handle those yet.

Re-assessment

After a brief discussion with my mentors, I’m going to use a different approach than I’d originally envisioned for the code generation part of the decompiler.

Basically, I’m going to try and emulate the principle used by descumm for generating the SCUMM code. This approach should be simpler than what I had originally envisioned, but still sufficiently generic to work without too many changes for KYRA.

Since this approach is vastly different from the original plan, I’m also re-evaluating my milestone plan. The original plan puts the expected date for SCUMM code generation at July 24, but I think I’ll be able to get this done a few days earlier than that – I’m guessing it’ll be around July 21 instead, so I’m setting that as the new date for that milestone for now, although it is still a pretty rough estimate. In a day or two, I will hopefully have a much better feel about how long this will really take, and then I will look at changing the other milestones as well – and possibly re-evaluating this one as well.

CFG analysis complete!

And with that, the CFG analysis is now completed and documented, completing the fourth milestone. There are still a couple of things that could be improved upon, but they are not applicable to SCUMM, so I’ll leave those things alone for now and then I can get back to them if KYRA needs them (or if time permits later in the project).

Here’s an example of a script with all of the analysis complete:

Code flow graph of samnmax/script-30.dmp

It may seem a little messy, but I’ve verified it, and the analysis gives the expected results for everything.

The next milestone is code generation for SCUMMv6 – so each of these boxes must be converted to a piece of code. Once that milestone is complete, the decompiler should be able to fully decompile all SCUMMv6 scripts.

Short-circuiting

I’ve just finished adding short-circuit detection to the decompiler. This detection is done as part of the initial group creation.

The algorithm for detecting short-circuiting is as follows: Two groups, A and B, that follow sequentially in the byte code (no instructions appear between the last instruction of A and the first instruction of B), are merged if:

  • Both group A and B end with a conditional jump (they have two possible successors each)
  • For each outgoing edge from A, the edge must either go to B, or to a successor of B – in other words, merging may not add any new successors. More formally, succ(A) ⊆ {B} ∪ succ(B), where succ(X) is the possible successors for X.

To make sure the merging happens correctly when several groups should be merged, the groups are iterated in reverse order (the group with the last instruction is processed first).

Here’s an example to show how this works. The number in parenthesis is the effect that instruction has on the stack, and for ease of reading, the jump addresses have been made absolute.

0x00: wordVarDec 123 (0)
0x03: pushWord 41 (1)
0x06: pushWordVar 321 (1)
0x09: eq (-1)
0x0a: jumpTrue 0x17 (-1)
0x0d: pushWord 42 (1)
0x10: pushWordVar 123 (1)
0x13: eq (-1)
0x14: jumpTrue 0x00 (-1)
0x17: stopObjectCodeA (0)

Normally, there would be 4 groups in this script: the first and last instructions would be groups on their own, and instructions 0x03 through 0x0a would be a group, as would 0x0d through 0x14. For simplicitly, we’ll number these groups 1 through 4.

0x00: wordVarDec 123 (0)
===
0x03: pushWord 41 (1)
0x06: pushWordVar 321 (1)
0x09: eq (-1)
0x0a: jumpTrue 0x17 (-1)
===
0x0d: pushWord 42 (1)
0x10: pushWordVar 123 (1)
0x13: eq (-1)
0x14: jumpTrue 0x00 (-1)
===
0x17: stopObjectCodeA (0)

The possible successors for group 3 are group 4 and group 1, while the possible suceessors for group 2 are group 3 and group 4. Since group 4 is a shared successor, and the group 2-group 3 link becomes irrelevant due to the merge, the two groups can be merged to a single one since the algorithm detects them as a short-circuited condition check.