Continued from the previous entry
In the last post, we learned that the code for skipping the introduction movie is different between the demo and the full version of EMI. In this post, we’ll delve into how this works in the demo and see if we can fix the problem in ResidualVM.
I first tried to inject my Z button debug script into the main demo file with all of the scripts (MagDemo.lab), but I was only able to get a dialog box in ResidualVM. Undaunted, I tried to modify the _system.lua script to load different scenes depending on the state of the system.buttonHandler variable, which does key mapping. In short, I wanted to see if the default button handler was loaded during the intro cut scene. However, whatever I tried to do, nothing was changing! Even things like removing BOOT, the first function called in the game, weren’t having any effect.
As it turns out, the demo has an extra lab file (i9n.lab) which contains a slightly different version of the _system.lua script. This file is apparently loaded after or perhaps instead of the _system.lua script in the main object file. With this sorted, I went back to testing to see what the state of system.buttonHandler was.
As it turns out, in ResidualVM, the default handler is set, likely by the _system.lua script, when it loads the default controls in _control.lua. In the demo, there is no handler! I checked for this using the following test:
- Inserted code after the call to StartMovie("intro") to save whether the buttonHandler was set or not
- Changed the set that you switch to depending on the state of that variable.
It is likely that the demo uses custom code for handling ESC during the movie. This is also why ResidualVM crashes with a segfault when you press F1: at this point in the demo, nothing is watching for F1. The original designers missed this because this state isn’t reachable in the retail demo. The behavior also resembles what happens in the full game, where ESC is the only active key while movies are playing.
That said, how can we fix this in ResidualVM?
A simple patch was pushed, but I’m not sure this is the right way to do it. If not, there will be a follow up post describing how to fix the issue the right way!
Continued from the previous entry
After the initial excitement of fixing the bug, Inguin remembered that the Manatee was the reason why that code was written in the first place. We have to test it, so let’s get to the MicroGroggery and try it out:
- dofile("_jumpscripts.lua")
- jump_script("250")
- switch_to_set("mig")
Great, we’re there! After a brief conversation with the barkeep, we get to ride the Manatee. And then things go a little bit wrong:
It turns out that the Chandelier wasn’t a great test of the position code. The base actor was at (0,0,0) and had no rotation. The Manatee (or more specifically: mig.manatee_rotator) has a rotation before we attach Guybrush’s actor to it. So, there’s a problem with attaching actors when there’s rotation. Let’s do some investigation and see what the behavior of the retail engine is, as compared to the current code in ResidualVM.
The configuration for all of the tests starts with this setup:
- Actor1 is located at [1,2,3]
- Actor2 is located at [2,4,6]
- Actor1 is rotated to (0, 90, 0) using setrot
- We call Actor1:attach(Actor2, nil) to attach the two actors
Once these steps have been completed, I looked at the information we can get from the Actors:
Command | ResidualVM | Retail |
---|---|---|
Actor1:getpos() | (1,2,3) | (1,2,3) |
Actor1:getworldpos() | (1,2,3) | (1,2,3) |
Actor1:getrot() | (0,90,0) | (0,90,0) |
Actor2:getpos() | (1,2,3) | (-3,2,1) |
Actor2:getworldpos() | (4,4,2) | (2,4,6) |
Actor2:getrot() | (0,0,0) | (0,-90,0) |
So, what does this tell us? For Actor1, attaching doesn’t change its position or behavior, and both engines do the same thing! Actor2, which was attached to Actor1, doesn’t fare quite as well. Starting with getrot, we see that the retail engine reports the angle relative to the actor it’s attached to. In the retail engine, getworldpos still returns the original position from before we attached the actors, and getpos returns a position relative to Actor1. ResidualVM reports different values for both, so its position handling is wrong.
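The numbers in the table are consistent with a simple model, sketched here in Python rather than the engine’s C++. It rests on two assumptions inferred from the table, not confirmed from either engine’s source: that the middle setrot value is a rotation about the Y axis, and that the retail engine stores an attached actor’s position in the parent’s rotated frame. The helper names are hypothetical.

```python
import math

def rotate_y(v, degrees):
    """Rotate v = (x, y, z) about the Y axis by the given angle in degrees."""
    x, y, z = v
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return (x * c + z * s, y, -x * s + z * c)

def world_to_local(child_world, parent_pos, parent_yaw):
    """Assumed retail behaviour: offset from the parent, un-rotated by the parent's yaw."""
    offset = tuple(c - p for c, p in zip(child_world, parent_pos))
    return rotate_y(offset, -parent_yaw)

def local_to_world(child_local, parent_pos, parent_yaw):
    """Inverse of world_to_local: rotate by the parent's yaw, then add its position."""
    rotated = rotate_y(child_local, parent_yaw)
    return tuple(r + p for r, p in zip(rotated, parent_pos))

def rounded(v):
    """Round away floating point noise so results compare cleanly with the table."""
    return tuple(round(c, 6) + 0.0 for c in v)

actor1_pos, actor1_yaw = (1, 2, 3), 90
actor2_world = (2, 4, 6)

# Assumed retail model: gives (-3.0, 2.0, 1.0), the Retail getpos() row.
retail_local = rounded(world_to_local(actor2_world, actor1_pos, actor1_yaw))

# Storing the plain offset (1, 2, 3) and then applying the parent's rotation
# gives (4.0, 4.0, 2.0), the ResidualVM getworldpos() row.
plain_offset = tuple(c - p for c, p in zip(actor2_world, actor1_pos))
residual_world = rounded(local_to_world(plain_offset, actor1_pos, actor1_yaw))

print(retail_local, residual_world)
```

Under this model the two engines differ only in whether the offset is un-rotated by the parent’s rotation when the actors are attached.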
Let’s try another test with this setup:
- Actor1 is located at [1,2,3]
- Actor2 is located at [2,4,6]
- Actor1 is rotated to (0, 90, 0) using setrot
- Actor2 is rotated to (20, 0, 0) using setrot
- We call Actor1:attach(Actor2, nil) to attach the two actors
What happens this time?
Command | ResidualVM | Retail |
---|---|---|
Actor1:getpos() | (1,2,3) | (1,2,3) |
Actor1:getworldpos() | (1,2,3) | (1,2,3) |
Actor1:getrot() | (0,90,0) | (0,90,0) |
Actor2:getpos() | (1,2,3) | (-3,2,1) |
Actor2:getworldpos() | (4,4,2) | (2,4,6) |
Actor2:getrot() | (20,0,0) | (90,70,-90) |
Finally, let’s look at what happens when we detach Actor2 from Actor1 using the second example:
- Use the setup from the previous example
- Then a2:detach()
Here’s what happened this time:
Command | ResidualVM | Retail |
---|---|---|
Actor1:getpos() | (1,2,3) | (1,2,3) |
Actor1:getworldpos() | (1,2,3) | (1,2,3) |
Actor1:getrot() | (0,90,0) | (0,90,0) |
Actor2:getpos() | (4,4,2) | (2,4,6) |
Actor2:getworldpos() | (4,4,2) | (2,4,6) |
Actor2:getrot() | (0,0,0) | (0,0,-20) |
We can now make a few guesses as to what’s going wrong. In the next post, we’ll figure out how to fix the problem!
One of the requirements for the 2014 GSoC is to write a patch and complete a pull request. While we’ve finished a small fix with the segfault, I wanted to take on something a bit bigger for this pull request. Unfortunately, I didn’t have enough time to figure out all of the Actor offsets, embedded functions and structures to complete the SetActorLocalAlpha function for this deadline. Instead, I decided to try and figure out why the chandelier lights were being rendered in the wrong place inside the Governor’s Mansion. Unfortunately, this ended up being a much bigger problem than anticipated. Here’s what I’ve found, along with more information about what the original engine does.
To start with, it was tedious to replay the same part of the game over and over again to get into the mansion. To avoid this in ResidualVM, we can use the debug console to run a script to change the current setup. To do this, we must first load the file that contains this script. Bring up the console with CTRL-D and type:
- lua_do dofile("_jumpscripts.lua")
Press ESC to close the console, and this code will be executed. Open the console again and run the jump_script function:
- lua_do jump_script("mansion interior")
Press ESC, and we’ll be warped into the Governor’s Mansion setup, cool! The jump_script function takes care of setting all of the game states as well; you can think of these scripts as bookmarks in the story. From one of these bookmarks, we can also jump to specific setups using switch_to_set("setname"), where setname is the name of the set you’d like to switch to.
Unfortunately, this trick doesn’t work in the retail build because we don’t have a console to type Lua into. Thanks to klusark, who recently fixed mklab, we can now unpack and repack the .m4b file and insert our own code into the game’s data. I’m using two scripts, one which extracts a Lua script, and another that pushes in the new script and rebuilds the .m4b file.
I used these scripts to unpack the _control.lua script and wrote this patch (containing only my own code, so it should be okay to post) to add a dialog box for typing in Lua to be executed directly. This mimics the console in ResidualVM, but works in the retail build as well. You enter this console by pressing the z key. We can also print variables as the message line for the dialog box by assigning a value to the dd variable.
Now, let’s take a look at what’s going wrong with the chandelier lights. Examining gmi.lua, the script that contains the setup for the Governor’s mansion, we see that the main chandelier actor is located in the variable gmi.chandelier. There’s another actor called gmi.chandelier2 that is attached to the first chandelier variable that actually has all of the candle actors attached to it. So, let’s explore these variables and see if we can figure out what the problem is.
Using the new console tool, we can press z to bring up our debug console, switch into the Governor’s Mansion setup and print out the position for the chandeliers:
- dofile("_jumpscripts.lua")
- jump_script("mansion interior")
- dd = gmi.chandelier:getpos()
- dd = gmi.chandelier2:getpos()
Here’s the output from the retail version for gmi.chandelier2:
Here’s the output from ResidualVM:
So, the chandelier2 variable is in the wrong position, with the y axis being -1 instead of 1 as in the retail version. Let’s see if just moving chandelier2 to the right place fixes the problem:
- gmi.chandelier2:setpos(0.0, 1.0, 0.0)
Indeed it does! So, what broke it? After inserting some more debugging print outs, I found that the actor.attach method was the culprit that was adjusting the position. Specifically, the call to the AttachActor Lua method was causing the problem, which leads us to the method Actor::attachToActor. In this function, the attached actor’s position is modified by subtracting the base object’s position from the attached actor’s position. Checking against the retail version, it seems that the subtraction should be reversed, with the attached actor’s position subtracted from the object it’s being attached to.
I injected a new Lua script with a series of attachments and compared those results with the retail engine and things looked perfect! Or did they…
Continued from the previous entry.
Before digging into the Quaternion implementation, I first inspected the retail version of EMI again to verify its approach to applying the rotation angles.
In the scene ‘mig’, I moved Guybrush to the position (1,0,-7) so it was easy to see the result of rotation. Next, I set the rotation to (0,0,0), resulting in the following, which represents our origin pose:
To make sure that nothing else is interfering, I’ll set Guybrush back to this rotation before each operation. Next, I set Guybrush to 90 degrees for each of the rotation angles to determine the name for each principal axis:
- guybrush:setrot(90,0,0) – Pitch (rotation in the Y-Axis)
- guybrush:setrot(0,90,0) – Yaw (rotation in the Z-Axis)
- guybrush:setrot(0,0,90) – Roll (rotation in the X-Axis)
We now know which axis is which in the setrot() method. Using this information, we’ll determine what order the rotations are applied by combining the principal rotations:
Which is produced by the following rotations:
- guybrush:setrot(45,45,0)
- guybrush:setrot(0,45,45)
- guybrush:setrot(0,45,45)
- guybrush:setrot(45,45,45)
From this, we can say that the setrot() method’s arguments are definitely Pitch, Yaw, Roll, in that order. From the combined rotations, it appears that the rotations are applied in the order ZXY (by axis). With this information, I can ensure that the Quaternion implementation is using the correct rotation order.
Continued from the previous entry
With the segfault resolved, the bug no longer crashes ResidualVM, but instead, the game gets stuck, preventing the player from continuing. In the game log, we see some messages that might help us to determine the problem:
In this error, we see first that the Lua interpreter is printing lua: (null). This indicates that a value was unexpectedly null. The Active Stack tells us where this error occurred, much like the backtrace in the previous entry, but for the Lua script. Finally, we see the warning that we added in the previous entry, telling us that there’s a registry key read, SpewOnError, which doesn’t cause the segfault anymore because of our fix. So that’s good!
In the stack trace, we see that this failed on a call to a tag method named (helpfully) function, so let’s see if we can find that code. In the EMI Demo, the scripts of interest are located in the MagDemo.lab file. Following the same directions as before, unlab the file and delua the Lua scripts.
Inside _options.lua, there’s a comment with the original line number, 1503. Search for that and we find that the function called was main_menu.return_to_game, which goes some way toward explaining why things are getting fouled up. In the original demo on Windows, pressing F1 does not bring up the main menu, but rather, does nothing, while pressing ESC skips the cutscene.
It appears that the game is in a wrong state, but it would be helpful to have more information about the problem and more details as to what was run. Let’s enable debugging information in ResidualVM to see if there’s anything else that can help us track this down.
In ResidualVM, there are debug flags that can be enabled from the command line, like this:
- ./residualvm --debugflags=<flag list separated by commas>
In addition, there’s an in game debug mode you can enter by pressing CTRL-D. This will bring up a console from which you can turn on debug flags. For both, individual classes of flags may be enabled instead of all flags if you’re working on a specific area of the engine.
Let’s tackle the part of the bug where pressing ESC doesn’t skip the cutscene first. With all of the debugging messages on, we get these messages in the log when we press ESC during the opening cutscene:
Following the path of execution, we see a call to GetControlState(). This function returns the state of the key being passed in. From common/keyboard.h, we see that the keys it’s checking are:
- KEYCODE_LCTRL (306)
- KEYCODE_LALT (308)
- KEYCODE_BACKSPACE (8)
- KEYCODE_LCTRL (306)
- KEYCODE_LALT (308)
From this sequence, it appears that the script being run is the SampleButtonHandler (in _control.lua), which includes the first three calls, then the CommonButtonHandler (in _control.lua) which does the second two.
We then see that the script reported that the Override key was hit. This code is in the CommonButtonHandler which then calls the call_override function, which can be found in the _system.lua file. This function is supposed to stop the current script if the system override is active. Let’s check the value of this variable.
Using the console, type:
- lua_do if (system.override.is_active) then PrintDebug("Active") else PrintDebug("Inactive") end
After starting the game up again and checking the log, we find that the override is inactive. So this function call does nothing: control is handed back to the button handler and the movie continues. This matches the behavior that we see when playing the game, where the ESC key is ignored.
Also, in the previous screenshot, we see a repeated series of functions:
- function: IsMoviePlaying()
- function: break_here()
These function calls either come from a function in the _system.lua script called wait_for_movie which does these checks repeatedly until the movie is finished, or from the StartMovie function which contains similar logic.
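The shape of such a polling loop, and why it produces the alternating pair of calls in the log, can be sketched in Python. This is only an illustration of the pattern: the real wait_for_movie body is not reproduced here, and the generator, the log list and the simulate driver are hypothetical stand-ins.

```python
def wait_for_movie(is_movie_playing, log):
    """Poll until the movie ends, yielding control between checks."""
    while True:
        log.append("IsMoviePlaying()")
        if not is_movie_playing():
            return
        log.append("break_here()")
        yield  # stands in for break_here(): give up control until the next frame

def simulate(frames):
    """Drive the waiter while a pretend movie plays for the given number of frames."""
    state = {"frames_left": frames}
    log = []
    for _ in wait_for_movie(lambda: state["frames_left"] > 0, log):
        state["frames_left"] -= 1  # the pretend engine advances one frame
    return log

# Two frames of playback give two check/yield pairs and one final check.
print(simulate(2))
```

Nothing in a loop like this looks at the keyboard, which fits the observation that ESC has no effect while the movie is playing.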
In the actual game, in _cut_scenes.lua, there’s a call to EscapeMovie when the override key is pressed during the playback. In the demo, RunFullscreenMovie is much simpler, without this logic. In the demo, the relevant code is in the BOOTTWO function, which is part of the game scripts’ startup sequence. This function calls StartMovie, which begins playing the intro movie. So the demo doesn’t use RunFullscreenMovie at all! We can confirm that there is a movie named intro by checking in the movies directory, so we are sure that this is the code that starts the demo and plays the movie.
So, how did it work in the original interpreter in Windows and what can we do to fix it in ResidualVM? We’ll keep digging in the next blog post!
When playing with the demo version of EMI (which is available for free from the ResidualVM project), I found that when you press F1, which usually brings you to a menu, the game pauses, but no menu appears. While this might be the intended behavior, pressing Esc now causes ResidualVM to crash. Let’s explore this crash and figure out how to fix it!
First, we’ll need to restart ResidualVM with a debugger. Start up ResidualVM with the command:
- gdb ./residualvm
When the prompt comes up, type run to begin running ResidualVM. Start the game as usual, and then trigger the crash. When the crash occurs this time, in our window with gdb, we can see that the debugger has caught the error and stopped execution:
We can see that there is an error from ResidualVM itself, warning us that there was a null value in the Lua engine from the ‘gettable’ tag method, and also the actual crash in ResidualVM which follows that.
It’s helpful to see how we got here, and GDB lets us do that by checking the backtrace. Type bt to see the backtrace, which is a list of the function calls that brought us to this location. Typing up or down moves the current position up or down in the stack, letting us look at different variables at each call. To print out a variable’s value, use the print or p command followed by the name of the variable you’re interested in. It’s also sometimes useful to see the code that preceded the error. Using the list or l command will print the code around the current position in the stack.
After some exploration, we see that we dereferenced the g_registry variable when it was null, causing the segfault.
Now that we know what’s causing the problem, let’s look at the engine code for the registry functions and see if we can identify why this is breaking in the demo.
In engines/grim/grim.cpp (line 87) we see that the registry is initialized when the engine is started and the game type is GType_GRIM. In engines/grim/detection.cpp (line 413) we see that the game type for the EMI Demo is GType_MONKEY4. So, the registry is never created for the demo, and because of that, the g_registry variable is never set.
The registry for Grim Fandango holds settings and values that would normally be found in the Windows registry. As of now, there are no EMI specific registry options set up and the registry is specific to Grim Fandango. Although we’ll probably need to implement something like this later on, for now, let’s fix the segfault by preventing the code from dereferencing g_registry when it’s null.
Searching through the code, we find that the only accesses to g_registry that don’t check for a null value are in engines/grim/lua_v1.cpp in the functions:
- Lua_V1::ReadRegistryValue
- Lua_V1::WriteRegistryValue
- Lua_V1::postRestoreHandle
Looking back at our backtrace, we see that indeed, the path that the code took went through the ReadRegistryValue function before segfaulting. Since the code in postRestoreHandle that accesses g_registry is only used in games with the GType_GRIM tag, we can safely ignore this instance, as those games will always have the g_registry variable set. With checks added to see if g_registry is set before performing registry actions in the functions ReadRegistryValue and WriteRegistryValue, the bug is fixed and ResidualVM no longer crashes. I also added a warning to let anyone else know that there was an attempt to access a registry that didn’t exist. To fix this bug properly, we should override these functions so that they point to the EMI_Registry instead. For now, we’ll stick with just fixing the segfault.
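The guard itself is a small pattern. Here is a minimal sketch of it in Python rather than the engine’s C++, assuming a plain dictionary as a stand-in for the registry. The function names mirror the Lua_V1 functions only for readability; the bodies and the warning text are hypothetical, not the actual patch.

```python
import logging

def read_registry_value(registry, key):
    """Return the stored value, or None (with a warning) when no registry exists."""
    if registry is None:
        logging.warning("ReadRegistryValue: no registry available, ignoring read of %s", key)
        return None
    return registry.get(key)

def write_registry_value(registry, key, value):
    """Store the value and return True, or warn and return False without a registry."""
    if registry is None:
        logging.warning("WriteRegistryValue: no registry available, ignoring write of %s", key)
        return False
    registry[key] = value
    return True

# With no registry (the situation described for the demo) the calls are harmless.
print(read_registry_value(None, "SpewOnError"))

# With a registry present, values round-trip as usual.
registry = {}
write_registry_value(registry, "SpewOnError", "0")
print(read_registry_value(registry, "SpewOnError"))
```

Returning a neutral value keeps the calling script running, at the cost of hiding the missing registry behind a warning.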
So, we have fixed the segfault, but the game is now stuck, preventing the demo from continuing further. In the next post, we’ll discuss how to fix this issue.
Continued from the previous entry
In this post, we’ll be focusing on variables, function calls and the structure of the decompiled function, SetActorLocalAlpha.
In the previous entry, we examined the Lua script that calls this function and identified its arguments. Applying that knowledge to the disassembled code indicates that the first 4 calls to lua_lua2C are actually calls to get the parameters for the function. As such, this code can be re-written with descriptive variable names. Additionally, the types of these parameters point to similarities with code that’s already been written.
Starting with the first parameter, we see that in the script, the function is called by itself, with no colon or period operator. This indicates that the function is standalone and not a member of any class. Next, we see that the first parameter is self.hActor, as seen in the call to SetActorLocalAlpha in the previous post. Since we know that the variable is a member of the “self” object, we need to identify what this variable is used for. Often, it’s possible to tell the object’s type by looking at what sets the member variable. Searching through the scripts for “hActor =” will identify where the hActor variable was set, and we’re in luck! The file _actors.lua is the only file in the scripts directory that matches this criterion. Let’s take a look at where it’s used.
The first hit we get is in the actorTemplate, a structure that is used in the Lua script as a template for all new actor objects. While this is useful for identifying members of the Actor class, this doesn’t help with identifying the type for hActor. Let’s move on to the next instance.
In this function, Actor.create, we see that the actorTemplate is copied into the variable local1. The variable later has the hActor member set by saving the return value of the function LoadActor. From the source for LoadActor found in engines/grim/lua_v1_actor.cpp (line 37), we can see that this function creates a new Actor, and therefore, the type for hActor is most likely an Actor. While this might seem a bit obvious, when things are less obvious, you’ll still follow the same basic steps.
With the knowledge that this variable is an Actor object, we can improve our translated code by giving the variable that holds the 1st parameter the name actorObj. We can also continue through the code and simplify functions that use this as a parameter. It is also a good idea to compare other uses of Actor objects to see if already rewritten code matches what has been found so far. In this case, we see a call to lua_isuserdata and lua_tag:
The call to lua_isuserdata is checking that the 1st parameter contains an object with the type UserData. If the 1st parameter doesn’t have a UserData object, the whole function will just return. In the second box, the call to lua_tag is comparing the UserData’s tag with the number 52544341h. While this might seem like a random number, if we interpret this value as a string of four characters, they spell out ‘RTCA’, or ‘ACTR’ in little endian.
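The byte-order reading of that constant is easy to check. A short Python sketch (the helper name is hypothetical) that interprets the 32-bit value as four ASCII characters in both byte orders:

```python
TAG_FROM_DISASSEMBLY = 0x52544341  # the constant compared against the lua_tag result

def tag_as_text(value, byteorder):
    """Interpret a 32-bit value as four ASCII characters in the given byte order."""
    return value.to_bytes(4, byteorder).decode("ascii")

print(tag_as_text(TAG_FROM_DISASSEMBLY, "big"))     # RTCA
print(tag_as_text(TAG_FROM_DISASSEMBLY, "little"))  # ACTR
```

Read from the most significant byte down, the constant spells RTCA; laid out in memory on a little-endian machine, the same bytes read ACTR.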
Before we continue, let’s look at what we mean by Lua UserData and tags. In the ResidualVM wiki we see that in the modified version of Lua used in this engine, variables are saved in a pool and are identified by their pool id number and tagged with an identifier. The id number is used to retrieve the data from the pool, while the tag is used to identify the type of the object. Going back to engines/grim/lua_v1_actor.cpp, when the engine loads the Actor, it creates a new Actor object instance, then adds this instance into the pool. It is also applying the tag using a macro: MKTAG.
So, applying this information, we interpret these two lines of code as checking to see if the variable is UserData, and if so, does it have the tag ‘ACTR’. If not, the code will return. This bit of assembly can now be converted into C++:
To this point, we haven’t really added anything to the project yet since this code had already been worked out by one of the previous developers. In the next post, we’ll start working on filling in the missing parts of the code with the information that we’ve learned so far.
Continued from the previous entry
After our work from the previous post, we have a skeleton of the functionality provided by the Lua command we’re working on. However, working from assembly isn’t always the best approach. For EMI, we already have a whole lot of help from the excellent work done by the developers who previously worked on Grim Fandango, EMI and Myst3 support in ResidualVM! In this project, they have provided us with code and tools to inspect the scripts that are being run in the game. Let’s inspect a script that called the function stub we’re working on.
First, let’s take a look at the game scripts. In EMI, the game scripts can be found in the file local.m4b. This file is actually a bundle of files, which can be extracted using the tool unlab, found in the residualvm-utils repository. Once you’ve built this utility, let’s unpack the bundle so we can get at the Lua scripts that make up the game. In the directory with the EMI data files:
- mkdir local
- cd local
- unlab ../local.m4b
After running this command, inside the local directory there will be a large number of files, the most important for us now are the files that end in .lua. If you inspect these files, you’ll see that they’re not text, but a binary format. To make the scripts readable, we’ll use the tool delua, also found in the residualvm-utils repository. To make this easier, we’ll just decode all of the scripts in the directory at once so we can easily search through them:
- mkdir scripts
- for i in *.lua; do delua "$i" > "scripts/$i"; done
- cd scripts
With the scripts converted into a readable format, we can now search through them for instances where our function of interest is used:
- grep SetActorLocalAlpha *.lua
From this, we can see that when SetActorLocalAlpha is called, it’s called with 4 arguments. These correspond with the 4 calls to lua_lua2C at the beginning of the disassembled function from the previous entry. Importantly, we also see that the first variable passed is arg1.hActor. This is useful information because it gives us context for the type of the variable and hints as to how it’s used in the code. We can also look at other Lua functions that are called from SetActorLocalAlpha to find the types of the variables used.
Before we go back to SetActorLocalAlpha, let’s examine the set_vertex_alpha_mode Lua function more closely. As you can see in the code listing above, there are a bunch of unnamed variables. Let’s figure out what they mean so that we can better identify the arguments to SetActorLocalAlpha. Let’s start with the function arguments.
In Lua, a method function can be called in two different ways, with a period between the class and the method or a colon. These methods differ in that the second transparently passes a reference to the object as the first argument. In most applications, we name this argument “self”. In wed.lua, we see that the set_vertex_alpha_mode method is called using the colon operator. Because of this, we know that the first argument is really the “self” variable. Once we have identified a variable like this, we’ll update the rest of the function to reflect this new knowledge. We’ll continue this process, using cues from the calling function and the contents of the function to name the rest of the arguments and the local variables.
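Python has a close analogue to the two Lua call styles, which may help if the colon operator is unfamiliar. In this sketch the class, the object and the argument are hypothetical; only the method name comes from the game scripts.

```python
class GameSet:
    """Hypothetical stand-in for a Lua table with methods."""

    def __init__(self, name):
        self.name = name

    def set_vertex_alpha_mode(self, mode):
        # "self" is the object the method was called on, as in Lua.
        return (self.name, mode)

wed = GameSet("wed")

# Like Lua's wed:set_vertex_alpha_mode(1): the object is passed implicitly.
implicit = wed.set_vertex_alpha_mode(1)

# Like Lua's wed.set_vertex_alpha_mode(wed, 1): the object is passed by hand.
explicit = GameSet.set_vertex_alpha_mode(wed, 1)

print(implicit == explicit)
```

In both languages the decompiled or compiled function always has the extra first parameter; the call syntax only decides who fills it in.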
With all of this additional information, we can now continue filling in the details for the SetActorLocalAlpha function.
Continued from the previous entry
A Stubbed Function is one in which the function is present, but doesn’t implement the full behavior required. In ResidualVM, the EMI engine has a number of stubbed functions which represent the unfinished Lua function calls. These functions print a warning to the console to show that they’re actually used, and when they’re used. In the previous post, I identified a function, SetActorLocalAlpha that was stubbed and located the code in the original binary that implements this function. In this post, we’ll work through the assembly and create a patch implementing the missing functionality from this call.
After renaming the routine to SetActorLocalAlpha, we start at the beginning of the disassembled code. Here, we see a representation of the stack, with one entry for each local variable in this code. After these, the first real code from the function is present. On a first pass over the code, I usually start by examining any functions called from here. In this code, the first function we see called is:
?lua_lua2C@@YAIH@Z
While this looks a little confusing, this function call is encoded or mangled to ensure that it doesn’t collide with any other named functions in this program. Mangling names allows for implementations with different calling parameters, such as the overloading functionality found in C++.
Dropping this name into a demangler gives us this result:
unsigned int __cdecl lua_lua2C(int)
Which is a whole lot more readable! We know that the EMI engine uses Lua to execute the game scripting, so looking into the Lua documentation for what this does will help us to understand what the decompiled function is intended to do. We can also look at the existing code for ResidualVM and its Lua implementation for information. With some research, we find that Lua maintains a stack of values or objects, and lua_lua2C is how values are accessed from this stack.
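To see how the mangled name maps onto the readable one, here is a toy decoder in Python. It is not a real demangler: it handles only free functions with a handful of primitive types, and the tables cover just the small subset of the Microsoft C++ mangling scheme needed for names of this shape. The function name demangle_simple is hypothetical.

```python
# Partial tables for the Microsoft C++ name mangling scheme.
TYPES = {"D": "char", "H": "int", "I": "unsigned int", "J": "long",
         "M": "float", "N": "double", "X": "void"}
CALLING_CONVENTIONS = {"A": "__cdecl", "G": "__stdcall", "I": "__fastcall"}

def demangle_simple(mangled):
    """Decode names shaped like ?name@@Y<convention><return><params>@Z."""
    if not mangled.startswith("?") or "@@Y" not in mangled:
        raise ValueError("not a simple free function name")
    # "?" starts the name, "@@" ends it, "Y" marks a free function.
    name, rest = mangled[1:].split("@@Y", 1)
    convention = CALLING_CONVENTIONS[rest[0]]
    return_type = TYPES[rest[1]]
    params = rest[2:]
    if params == "XZ":            # an empty parameter list is encoded as void
        args = ["void"]
    elif params.endswith("@Z"):   # otherwise one code per parameter, then "@Z"
        args = [TYPES[code] for code in params[:-2]]
    else:
        raise ValueError("unsupported parameter encoding")
    return f"{return_type} {convention} {name}({', '.join(args)})"

print(demangle_simple("?lua_lua2C@@YAIH@Z"))
```

Reading the suffix left to right, A is the calling convention, I is the return type and H is the single parameter, which is how the demangler arrives at unsigned int __cdecl lua_lua2C(int).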
As we go, we examine the structure of the branches and jumps (helpfully represented as blocks in IDA) to sketch out the shape of the code. I usually re-write the code in C as I work through it, so, after the first pass my code will contain the function calls, if/else statements and loops.
The code at this point is really just pseudo-code, but we’ll expand on it further in the next pass, in the next post.