Hi,
Just a quick update since the last one. I made several commits on Tuesday to finish implementing flameSet and get ready to start on sprite drawing. I am now in the process of translating the sprite drawing; however, I will be away for a few days (back on Monday) and will not be able to work on this during that time. When I get back, I will be working on sprite and tile drawing (tiles also use the sprite function superSprite to draw). In the meantime, I will leave you with an explanation of this fascinating little routine used in processing cycles:
Now, the interesting thing here is the LDA instruction at the end. It looks fairly normal: LDA >image2, meaning LoaD into A the value at (>) ‘image2’. However, there is something really interesting going on. Let’s start by going over what the routine does, step by step:
In: ACC = cycID, Y = cycIndex
– cycIndex -> tmp
– switch to 16-bit mode
– preserve X
– grab just the 8-bit index
– cycID -> X
– load the pointer from cycPtrs that corresponds to cycID
– add tmp (this gets us to the frame within the frame list inside the cycle)
– store to… the operand of the upcoming LDA instruction? Okay, something weird is going on…
– back to 8-bit mode
– load the value at… no longer ‘image2’, but rather the address of the frame within the cycle? (this gets us the value of the frame at the address we put together dynamically earlier)
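To make the steps above concrete, here is a small Python model of them. It is only a sketch: memory is a flat bytearray, and every address and table in it (`LDA_AT`, `IMAGE2`, `CYC_PTRS`, the frame data) is hypothetical. It also assumes cycID is used directly as the byte offset into cycPtrs (the way an indexed load with X = cycID would treat it) and models the LDA with a plain 2-byte absolute operand, whereas the real instruction is written with the > prefix.

```python
# Hypothetical model of the cycle-frame lookup. Every address below is
# invented for illustration; only the sequence of steps mirrors the text.

MEM = bytearray(0x10000)    # flat 64 KB of writable RAM (stand-in for WRAM)

LDA_ABS = 0xAD              # 65816 opcode for LDA absolute
LDA_AT = 0x2000             # hypothetical address of the LDA instruction
IMAGE2 = 0x3000             # hypothetical address assembled in as 'image2'
CYC_PTRS = 0x4000           # hypothetical cycPtrs table of 16-bit pointers


def read16(addr):
    """Read a little-endian 16-bit word, as the 65816 stores them."""
    return MEM[addr] | (MEM[addr + 1] << 8)


def write16(addr, value):
    """Write a little-endian 16-bit word."""
    MEM[addr] = value & 0xFF
    MEM[addr + 1] = (value >> 8) & 0xFF


# "Assemble" LDA image2 into RAM: one opcode byte, then a 2-byte operand.
MEM[LDA_AT] = LDA_ABS
write16(LDA_AT + 1, IMAGE2)


def run_lda():
    """Execute the LDA: its operand is fetched from RAM when it runs."""
    assert MEM[LDA_AT] == LDA_ABS
    return MEM[read16(LDA_AT + 1)]


def get_cycle_frame(cyc_id, cyc_index):
    tmp = cyc_index & 0xFF                # grab just the 8-bit index
    ptr = read16(CYC_PTRS + cyc_id)       # pointer for this cycle (X = cycID)
    frame_addr = (ptr + tmp) & 0xFFFF     # frame within the cycle's frame list
    write16(LDA_AT + 1, frame_addr)       # self-modify: patch the LDA operand
    return run_lda()                      # now reads the frame, not image2


# Two made-up cycles: pointers at cycPtrs+0 and cycPtrs+2.
write16(CYC_PTRS + 0, 0x5000)
write16(CYC_PTRS + 2, 0x5010)
MEM[0x5000:0x5003] = bytes([7, 8, 9])
MEM[0x5010:0x5012] = bytes([42, 43])

print(get_cycle_frame(0, 2))        # 9
print(hex(read16(LDA_AT + 1)))      # 0x5002 (the operand was rewritten)
print(get_cycle_frame(2, 1))        # 43
```

The key line is the `write16(LDA_AT + 1, ...)` call: the routine never touches `image2` itself; it overwrites the bytes that the upcoming LDA will read as its address.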
Alright, so now it should be clear what’s going on and why it’s interesting: the routine rewrites the address inside its own LDA instruction before that instruction runs. This is only possible because the program both runs from regular work memory (WRAM), as opposed to consoles where the game runs from ROM, and is written in assembly. This is what that looks like:
What the routine does is exploit the fact that the operand of the LDA instruction is in mutable WRAM, just like any other RAM address (for example, ‘image2’). This means there’s no reason you cannot modify a future instruction on the fly. This is referred to as ‘self-modifying code’, and in principle it can be very dangerous, since changing instructions at run time has pretty huge implications. In this case, however, the change is tiny: only the LDA’s address operand is rewritten.

So you may be asking, why would it need to do this at all? You could always use X and write LDA >startofcycles,X, or use LDA (DP), where DP is a Direct Page location holding the full address. In the latter case, the only difference is the wasted DP memory; the cycle count is the same (STA addr (4) + LDA addr (4) = 8 versus STA dp (3) + LDA (dp) (5) = 8). For the former, however, the comparison is STA addr (4) + LDA addr (4) versus TAX (2) + LDA addr,X (4-5) + SEP #$20 (3). The extra SEP is there because you need to index in 16-bit mode but LDA in 8-bit mode, so the SEP #$30 becomes a SEP #$10 after the load, with an additional SEP #$20 before it. So of the three options, we have 8 cycles while using extra DP memory, 8 cycles with no DP memory, or 9-10 cycles with no DP memory (and possibly other small performance costs from using the X register). This is more or less what that looks like:
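The cycle arithmetic in the comparison is easy to lose track of, so here it is tallied in a short Python sketch. The per-instruction counts are simply the figures quoted in the text, not measured values; actual 65816 timings also vary with the M/X flag settings, direct-page alignment and page crossings. The option names are labels made up for this sketch.

```python
# Tally of the cycle counts quoted in the text for the three ways of doing
# the final load. Figures are the article's; real 65816 timings also depend
# on the M/X flags, direct-page alignment and page crossings.

# Each option: list of (instruction, min_cycles, max_cycles), plus DP usage.
OPTIONS = {
    "self-modifying": {
        "instrs": [("STA addr", 4, 4), ("LDA addr", 4, 4)],
        "uses_dp": False,
    },
    "DP indirect": {
        "instrs": [("STA dp", 3, 3), ("LDA (dp)", 5, 5)],
        "uses_dp": True,
    },
    "X indexed": {
        "instrs": [("TAX", 2, 2), ("LDA addr,X", 4, 5), ("SEP #$20", 3, 3)],
        "uses_dp": False,
    },
}


def cycle_range(instrs):
    """Return (best case, worst case) total cycles for a list of instructions."""
    return sum(lo for _, lo, _ in instrs), sum(hi for _, _, hi in instrs)


for name, opt in OPTIONS.items():
    lo, hi = cycle_range(opt["instrs"])
    cycles = str(lo) if lo == hi else f"{lo}-{hi}"
    dp = "uses DP memory" if opt["uses_dp"] else "no DP memory"
    print(f"{name}: {cycles} cycles, {dp}")

# How much the self-modifying version saves over the X-indexed one.
sm = cycle_range(OPTIONS["self-modifying"]["instrs"])
ix = cycle_range(OPTIONS["X indexed"]["instrs"])
print(f"saving over X indexed: {ix[0] - sm[0]}-{ix[1] - sm[1]} cycles")

# Output:
# self-modifying: 8 cycles, no DP memory
# DP indirect: 8 cycles, uses DP memory
# X indexed: 9-10 cycles, no DP memory
# saving over X indexed: 1-2 cycles
```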
So yes, as far as I can tell, this is the most efficient way to write this routine. But was making the routine so strange and hard to read really worth the 1-2 cycle performance improvement? I’ll let you decide 😛