New Game Plus

Well guys, I already got optimizations done for x86_64 and ARM on the Adventure Game Studio engine, which is almost everything I wanted done with it anyways. All I have left is optimizing it for PowerPC (and possibly other architectures, but I can’t think of any other ones). Anyways, my mentors and I both thought it would be better to also work on the general Graphics::ManagedSurface code and optimize that, instead of optimizing only AGS code. So my plan looks as follows:

  • Clean up the Graphics::TransparentSurface code by moving its blitting functions into graphics/blit-alpha.cpp.
  • Then add the relevant blitting methods into Graphics::ManagedSurface, and then…
  • Phase out Graphics::TransparentSurface, first by removing it from the Broken Sword 2.5 engine, and then possibly removing it from the other engines.
  • Then start on a way to put CPU extension detection into ScummVM so I can use SSE2/SSE4/AVX depending on what’s available (or NEON if the ARMv7 target supports it).
  • Actually implement the vectorized versions of the blitting/blending code for the new Graphics::ManagedSurface.
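To make the CPU detection step a bit more concrete, here is a minimal sketch of what runtime detection plus code-path selection could look like. Everything named here (CpuFeatures, detectCpuFeatures, BlitPath, pickBlitPath) is made up for illustration and is not ScummVM's actual API. It also assumes a GCC or Clang compiler on x86, where the __builtin_cpu_supports builtin is available; on anything else the sketch just reports no extensions.

```cpp
#include <cassert>

// Hypothetical feature flags, for illustration only (not ScummVM's real API).
struct CpuFeatures {
    bool sse2;
    bool avx2;
};

// Ask the CPU at runtime which extensions it has.
// Assumption: GCC/Clang on x86; other compilers/targets report nothing.
CpuFeatures detectCpuFeatures() {
    CpuFeatures f = {false, false};
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
    // Sanity check: every AVX2-capable x86 CPU also has SSE2.
    assert(!f.avx2 || f.sse2);
    return f;
}

// Hypothetical list of blitting code paths, from slowest to fastest.
enum BlitPath { kBlitGeneric, kBlitSSE2, kBlitAVX2 };

// Pick the widest vector path the CPU supports, falling back to plain C++.
BlitPath pickBlitPath(const CpuFeatures &f) {
    if (f.avx2)
        return kBlitAVX2;
    if (f.sse2)
        return kBlitSSE2;
    return kBlitGeneric;
}
```

The idea would be to call detectCpuFeatures() once at startup and keep the chosen path around, so the check doesn't happen on every blit.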


And when I finally get back to my AGS code, I will have to make sure that it compiles on all of ScummVM’s targets (even if I didn’t specifically optimize for them, you know, just in case), and try to get PowerPC to work (it was proving quite difficult, as I don’t have PowerPC hardware except for my Xbox 360, I guess).


Intel and AMD!

I finally ported over my vectorized code to Intel and AMD chips! And with time to spare, because my midterm evaluations are coming up (I’d like to thank my mentors; they have helped me so much, and I wouldn’t have done all of this so quickly without their help). So yeah, where to go from here? My current plans are to port the code to PowerPC’s AltiVec extensions and make an AVX2 version of the SSE2 code I made for x86_64 processors. Other than that, here are some pictures I took of weird bugs while porting the Arm NEON code to x86 SSE2, with some comments (these pictures are dearly needed; this blog has been quite boring without them).

I think this was the first picture I took. Here is what the game “Kings Quest 2: AGDI” looks like with only 32-bit pixel graphics blitting (I hadn’t implemented 16-bit pixel formats yet here).
Same build as the one above. As you can see by the water on the shore, I got alpha blending working correctly, but there is some off-by-one error at the right of the screen where it overdraws a pixel or 2.
Yeah, so when I finally did get 16-bit blitting/blending working, I noticed that scaled images were being messed with a lot and, well, just looked completely borked.
This is probably the worst looking picture of them all. It’s got the nasty off-by-one error, and the main character looks like something is not right…

Now don’t worry, I fixed all the bugs. In fact you will be able to tell that it’s fine once my PR (#5144) gets accepted. Hopefully it makes your games run a bit faster (even if you don’t have vector extensions in your computer).


How Testing is Going! (And How to Easily Vectorize Any Algorithm)

Good Morning! This post builds off of my last one here about the test code I’m writing. So to actually test these functions, I moved the cobbled-together benchmarking code from some random function in the engine init code into another function already found in the AGS codebase: Test_Gfx. From there I needed to implement the testing functions for the blender modes and for the actual drawing functions themselves, to make sure the new versions behave the same as the originals. So far I only have the blender mode unit test working, and it pretty much is just a loop over every parameter of the blending function that asserts that the original function and the current Arm NEON one give the same result. But right now I’m still ironing out a LOT of small edge cases with the blender function that don’t really show up in real usage, but should still be the same.
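To give an idea of the shape of that blender test, here is a minimal sketch. It is not the actual Test_Gfx code: blendReference and blendOptimized are hypothetical stand-ins (a plain alpha blend, and the same blend with the division by 255 replaced by shifts and adds), and countBlendMismatches is a made-up name. The point is only the pattern: loop over every value of every parameter and compare the two implementations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "original" blender: plain per-channel alpha blend.
uint8_t blendReference(uint8_t src, uint8_t dst, uint8_t alpha) {
    return static_cast<uint8_t>((src * alpha + dst * (255 - alpha)) / 255);
}

// Hypothetical "optimized" blender: same math, but the division by 255
// is replaced with shifts and adds (exact for this input range).
uint8_t blendOptimized(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint32_t x = src * alpha + dst * (255u - alpha);
    return static_cast<uint8_t>((x + 1 + (x >> 8)) >> 8);
}

// Try every combination of the three parameters and count disagreements.
uint32_t countBlendMismatches() {
    uint32_t mismatches = 0;
    for (int src = 0; src < 256; ++src) {
        for (int dst = 0; dst < 256; ++dst) {
            for (int alpha = 0; alpha < 256; ++alpha) {
                uint8_t expected = blendReference(src, dst, alpha);
                uint8_t actual = blendOptimized(src, dst, alpha);
                if (expected != actual)
                    ++mismatches;
            }
        }
    }
    return mismatches;
}

// The unit test itself is then a single assertion.
void testBlenderModes() {
    assert(countBlendMismatches() == 0);
}
```

Counting mismatches instead of asserting inside the loop makes it easier to see how many edge cases are still off while debugging.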

Oh yeah, I also thought it would be nice to let people know that using SIMD intrinsics is not as hard as it sounds. I’ll speak for myself here, since there are a lot of people reading this who won’t think it’s a difficult task to vectorize a function, but I sure did when I started this GSoC job. So here is a pretty simple way of making things SIMDized.

  1. Load your data into a SIMD data structure.
    For example, Arm NEON has a helpful function: vld1q_u32 (SSE equivalent is _mm_lddqu_si128, a little scarier looking).
    This takes a pointer to data, and stores it into a uint32x4_t structure, which is pretty much just an array of 4 uint32_t’s. Sometimes it’s not this easy though, and you have to load stuff in 1 at a time in serial. Storing is pretty much the same but the opposite of this.
  2. Just translate normal serial operations into SIMD ones.
    + becomes vaddq_u32
    * becomes vmulq_u32
  3. I’ve only told half the story though. While most of porting serial code to SIMD is this simple, you will have to mess with the order and structure of your vectors (uint32x4_t’s and __m128i’s) to make them work nicely. You may want to, for example, take a uint32x4_t of ARGB pixels and transform it into 4 uint32x4_t’s holding the alpha, red, green, and blue of each pixel to make the math operations easier, and this is where the other functions that convert and swizzle vectors come in handy.
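Here is a small sketch of the three steps above, written with SSE2 intrinsics so it builds on any x86_64 machine (with a plain serial fallback everywhere else). The function names addArrays and splitChannels are made up for this example, and the pixel layout (alpha in the top byte, blue in the bottom byte) is an assumption. It uses _mm_loadu_si128, the SSE2 unaligned load, and on NEON the load and add would be the vld1q_u32 and vaddq_u32 mentioned in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Steps 1 and 2: add two arrays of 32-bit values, four lanes per iteration.
void addArrays(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t count) {
    assert(a && b && out);
    size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    for (; i + 4 <= count; i += 4) {
        // Step 1: load four values from each input (unaligned loads).
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
        // Step 2: "+" becomes _mm_add_epi32.
        __m128i sum = _mm_add_epi32(va, vb);
        // Storing is the mirror image of loading.
        _mm_storeu_si128(reinterpret_cast<__m128i *>(out + i), sum);
    }
#endif
    // Leftover elements (or everything, without SSE2) are handled serially.
    for (; i < count; ++i)
        out[i] = a[i] + b[i];
}

// Step 3: split four packed pixels into one array of four values per channel.
// Assumption: alpha is the top byte and blue the bottom byte of each pixel.
void splitChannels(const uint32_t *pixels, uint32_t *alpha, uint32_t *red,
                   uint32_t *green, uint32_t *blue) {
#ifdef SKETCH_HAVE_SSE2
    __m128i px = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pixels));
    __m128i mask = _mm_set1_epi32(0xFF);
    // Shift the wanted byte down to the bottom of each lane, then mask it off.
    __m128i a = _mm_srli_epi32(px, 24);
    __m128i r = _mm_and_si128(_mm_srli_epi32(px, 16), mask);
    __m128i g = _mm_and_si128(_mm_srli_epi32(px, 8), mask);
    __m128i b = _mm_and_si128(px, mask);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(alpha), a);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(red), r);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(green), g);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(blue), b);
#else
    for (int i = 0; i < 4; ++i) {
        alpha[i] = pixels[i] >> 24;
        red[i] = (pixels[i] >> 16) & 0xFF;
        green[i] = (pixels[i] >> 8) & 0xFF;
        blue[i] = pixels[i] & 0xFF;
    }
#endif
}
```

Once the channels sit in their own vectors, the blending math can be written once per channel instead of picking bytes out of every pixel by hand.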