The performance of the scaler is fine now, even in debug builds without optimization. I have reimplemented the changed-pixel detection through a new part of the API: the backend queries the plugin to see whether it supports comparing against the previous image to find changed pixels. One problem is that panning the screen causes the whole image to be rescaled, which makes mouse movement choppy. However, this happens rarely enough that it really is not an issue, and in optimized builds it does not matter.
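To illustrate the idea of changed-pixel detection, here is a minimal sketch of how the previous frame could be compared with the current one. The names `findChangedPixels` and `countChanged` are hypothetical and are not the actual plugin API; the real implementation may track changes quite differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the real plugin API): marks every pixel whose
// value differs between the previous frame and the current frame.
// pitchInPixels is the distance between the starts of two rows, in pixels.
template<typename Pixel>
std::vector<bool> findChangedPixels(const Pixel *oldFrame, const Pixel *newFrame,
                                    std::size_t width, std::size_t height,
                                    std::size_t pitchInPixels) {
    std::vector<bool> changed(width * height, false);
    for (std::size_t y = 0; y < height; ++y) {
        const Pixel *oldRow = oldFrame + y * pitchInPixels;
        const Pixel *newRow = newFrame + y * pitchInPixels;
        for (std::size_t x = 0; x < width; ++x)
            changed[y * width + x] = (oldRow[x] != newRow[x]);
    }
    return changed;
}

// Counts how many pixels were marked as changed.
inline std::size_t countChanged(const std::vector<bool> &changed) {
    std::size_t n = 0;
    for (bool c : changed)
        if (c)
            ++n;
    return n;
}
```

With a scheme like this, only the marked pixels (and their neighbours) need to be rescaled. It also shows why panning is the worst case: when the whole screen shifts, nearly every pixel differs from the old image, so nearly everything is marked.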
I also templated the function for 32bpp support. This scaler is unique in that it compares the products of interpolation against other pixels (in other scalers, the products of interpolation are only written to the final image). The existing interpolation functions mangled the alpha channel (in RGBA and ARGB) and the padding bits (in RGB888). This caused quirky image defects that were tricky to track down, because the differing alpha values changed the outcome of the comparisons without changing the visible color of the pixels.
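As a rough illustration of the problem, here is one way a 50/50 interpolation can be written so that all four bytes of a 32bpp pixel, including alpha or padding, are averaged consistently. This is only a sketch under the assumption of 8 bits per channel; `interpolateHalf` and `sameColor` are hypothetical names, and this is not necessarily how the scaler's interpolation functions were fixed.

```cpp
#include <cstdint>

// Hypothetical helper: averages two 32bpp pixels byte by byte (rounding down).
// For each byte, a + b == 2 * (a & b) + (a ^ b), so the average is
// (a & b) + ((a ^ b) >> 1). Masking with 0xFEFEFEFE before the shift keeps
// the low bit of one byte from leaking into the byte below it.
inline std::uint32_t interpolateHalf(std::uint32_t a, std::uint32_t b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

// Hypothetical helper: compares only the bits selected by colorMask, so that
// differences in alpha or padding bits cannot affect the result.
inline bool sameColor(std::uint32_t a, std::uint32_t b, std::uint32_t colorMask) {
    return ((a ^ b) & colorMask) == 0;
}
```

Because every byte is treated the same way, interpolating two pixels with identical alpha yields that same alpha, so a later equality test against an untouched pixel behaves as expected. Alternatively, a masked comparison such as `sameColor(a, b, 0x00FFFFFF)` (assuming ARGB with alpha in the top byte) sidesteps the alpha bits entirely.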
In the past, I debugged problems like these by simply returning the color red from an interpolation function and then checking whether the broken pixels turned red. Since this scaler compares the results of the interpolation, whenever I tried a similar technique, the scaler would choose a different path, and the image would change in more chaotic ways (e.g. lines ceasing to anti-alias, or black pixels appearing instead of red ones). Everything at least appears to be fixed now.
Here are some sample images scaled with the 32bpp scaler.