
How Testing is Going! (And How to Easily Vectorize Any Algorithm)

Good Morning! This post builds off of my last one about the test code I’m writing. To actually test these functions, I moved the cobbled-together benchmarking code out of a random function in the engine init code and into a function that already exists in the AGS codebase: Test_Gfx. From there I needed to implement the testing functions for the blender modes and for the actual drawing functions themselves, to make sure the new versions behave pretty much the same as the originals. So far I only have the blender mode unit test working. It’s pretty much just a loop over every parameter of the blending function, with an assert that the original function and the current Arm NEON one give the same result. Right now I’m still ironing out a LOT of small edge cases with the blender function that don’t really show up in real usage, but should still match.
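
To give a rough idea of what that kind of test looks like, here is a minimal sketch in C. It is not the AGS code: blend_reference, blend_optimized and test_blend_all_inputs are names made up for this example, and the "optimized" version is just a scalar stand-in for a NEON routine. The pattern is the point: loop over every possible input and assert that both versions agree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference blend for one 8-bit channel (NOT the AGS blender). */
uint8_t blend_reference(uint8_t src, uint8_t dst, uint8_t alpha) {
    return (uint8_t)((src * alpha + dst * (255 - alpha)) / 255);
}

/* Hypothetical "optimized" version: same math, but the division by 255
 * is replaced by shifts and adds. It stands in for the SIMD rewrite. */
uint8_t blend_optimized(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint32_t x = (uint32_t)src * alpha + (uint32_t)dst * (255u - alpha);
    return (uint8_t)((x + 1 + (x >> 8)) >> 8);
}

/* Walk every combination of the three parameters and assert that both
 * versions agree. Returns the number of combinations that were checked. */
unsigned long test_blend_all_inputs(void) {
    unsigned long checked = 0;
    for (int src = 0; src < 256; ++src) {
        for (int dst = 0; dst < 256; ++dst) {
            for (int alpha = 0; alpha < 256; ++alpha) {
                uint8_t expected = blend_reference((uint8_t)src, (uint8_t)dst, (uint8_t)alpha);
                uint8_t actual = blend_optimized((uint8_t)src, (uint8_t)dst, (uint8_t)alpha);
                assert(expected == actual);
                ++checked;
            }
        }
    }
    return checked;
}
```

With three 8-bit parameters that is only about 16.7 million combinations, so brute-forcing every input is cheap, and it catches exactly the kind of edge cases that never show up in real usage.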

Oh yeah, I also thought it would be nice to let people know that using SIMD intrinsics is not as hard as it sounds. I’ll speak for myself here, since there are a lot of people reading this who won’t think vectorizing a function is a difficult task, but I sure did when I started this GSoC job. So here is a pretty simple way of SIMDizing things.

  1. Load your data into a SIMD data structure.
    For example, Arm NEON has a helpful function: vld1q_u32 (SSE equivalent is _mm_lddqu_si128, a little scarier looking).
    This takes a pointer to your data and loads it into a uint32x4_t, which is pretty much just an array of 4 uint32_t’s. Sometimes it’s not this easy though, and you have to load values in one at a time, serially. Storing is pretty much the same thing in reverse (vst1q_u32 on NEON).
  2. Just translate normal serial operations into SIMD ones.
    + becomes vaddq_u32
    * becomes vmulq_u32
    etc…
  3. I’ve only told half the story though. While most of porting serial code to SIMD is this simple, you will have to mess with the order and structure of your vectors (uint32x4_t’s and __m128i’s) to make them work nicely. For example, you may want to take a uint32x4_t of ARGB pixels and split it into 4 uint32x4_t’s holding the alpha, red, green, and blue of each pixel to make the math operations easier. This is where the other functions that convert and swizzle vectors come in handy.
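
Here is a small sketch in C of the three steps above. The function names (mul_add_4, split_argb_4, check_simd_sketch) are made up for this example, and I’m assuming each pixel is packed as 0xAARRGGBB in a uint32_t. The NEON path only gets compiled when the compiler targets NEON; everywhere else a plain serial loop does the same work, so you can see the serial and SIMD versions side by side.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Steps 1 and 2: out[i] = a[i] * b[i] + c[i] for four lanes at once. */
void mul_add_4(const uint32_t *a, const uint32_t *b, const uint32_t *c,
               uint32_t *out) {
#if HAVE_NEON
    uint32x4_t va = vld1q_u32(a);                /* step 1: load 4 values */
    uint32x4_t vb = vld1q_u32(b);
    uint32x4_t vc = vld1q_u32(c);
    uint32x4_t prod = vmulq_u32(va, vb);         /* step 2: * becomes vmulq_u32 */
    uint32x4_t sum = vaddq_u32(prod, vc);        /*         + becomes vaddq_u32 */
    vst1q_u32(out, sum);                         /* store: the reverse of load */
#else
    for (int i = 0; i < 4; ++i)                  /* serial fallback */
        out[i] = a[i] * b[i] + c[i];
#endif
}

/* Step 3: split four packed pixels (assumed 0xAARRGGBB) into one
 * vector per channel, so later math can work on a whole channel at once. */
void split_argb_4(const uint32_t *pixels, uint32_t *alpha, uint32_t *red,
                  uint32_t *green, uint32_t *blue) {
#if HAVE_NEON
    uint32x4_t px = vld1q_u32(pixels);
    uint32x4_t mask = vdupq_n_u32(0xFF);         /* 0xFF in every lane */
    vst1q_u32(alpha, vshrq_n_u32(px, 24));       /* top byte needs no mask */
    vst1q_u32(red, vandq_u32(vshrq_n_u32(px, 16), mask));
    vst1q_u32(green, vandq_u32(vshrq_n_u32(px, 8), mask));
    vst1q_u32(blue, vandq_u32(px, mask));
#else
    for (int i = 0; i < 4; ++i) {                /* serial fallback */
        alpha[i] = pixels[i] >> 24;
        red[i] = (pixels[i] >> 16) & 0xFF;
        green[i] = (pixels[i] >> 8) & 0xFF;
        blue[i] = pixels[i] & 0xFF;
    }
#endif
}

/* Usage example: run both helpers on known inputs and check the results. */
void check_simd_sketch(void) {
    uint32_t a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, c[4] = {10, 20, 30, 40};
    uint32_t out[4];
    mul_add_4(a, b, c, out);
    assert(out[0] == 15 && out[1] == 32 && out[2] == 51 && out[3] == 72);

    uint32_t px[4] = {0xFF112233u, 0x80445566u, 0x00000000u, 0x01FFFEFDu};
    uint32_t al[4], r[4], g[4], bl[4];
    split_argb_4(px, al, r, g, bl);
    assert(al[0] == 0xFF && r[0] == 0x11 && g[0] == 0x22 && bl[0] == 0x33);
    assert(al[3] == 0x01 && r[3] == 0xFF && g[3] == 0xFE && bl[3] == 0xFD);
}
```

Shifts and masks are only one way to pull the channels apart. Depending on your data layout, the dedicated de-interleaving loads and table-lookup intrinsics may be a better fit.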
