Dirty Rectangles system performance considerations

As I’ve spent the last week fixing some minor bugs and documenting code, I wanted to analyse better the effects of this system on the current games.
I’ve done some profiling by measuring fps in two modes: analysis and release;
Analysis build is a build with less optimizations enabled and debug symbols whilst release mode is the classic o3 build with every possible optimization enabled.
The scenes used for these tests are the following:
EMI – ship scene, lucre island and act 1 beginning.
Grim demo: first scenes of the demo.

Here are some screenshots for clarity.

And here are some results.

Before dirty rectangle system (analysis / release):
Ship scene: 13.50 / 57 fps
Lucre Island: 9 / 47 fps
Act 1 Beginning: 25 / 135 fps
Grim scene 1: 50 / 160 fps
Grim scene 2: 62 / 220 fps
Grim scene 3: 57 / 243 fps
Grim scene 4: 60 / 205 fps
After dirty rectangle system (analysis / release):
Ship scene: 12 / 55 fps
Lucre Island: 9 / 45 fps
Act 1 Beginning: 24 / 133 fps
Grim scene 1: 23 / 136 fps
Grim scene 2: 62 / 500 fps
Grim scene 3: 27 / 180 fps
Grim scene 4: 42 / 250 fps

As we can see dirty rects introduces an heavy overhead, especially with analysis build; but release build is somewhat balanced: the fps is pretty much the same for crowded scenes whereas it goes up by quite a bit if the scene has only a few animated objects (like grim scene 2 or scene 4, where animated objects are small and dirty rects yield some performance advantage).

In my personal opinion dirty rects should only be employed on some specific scenarios, as its overhead generally slows down the code and it only shines in some cases.
Dirty rects is a system that is probably better off being used in 2D games where screen changes are more controllable and there is no need to perform more calculation to know which region of the screen is going to be affected.

Developing this system was quite challenging and it took a lot of time but I think that the overall task was beneficial because it gave us an insight on how this could have affected performance: I think that implementing this system on an higher level of abstraction might result in being more effective but more research would be required for doing so (such system would not be applicable for this project though as the engine has to support a vast variety of games).

Dirty Rectangle System Pt 4

In the past week I’ve been working on the last two tasks for dirty rects:
The first one was implementing a system that allowed to clip a draw call and only render a portion of what’s supposed to be drawn, I implemented this in three different ways, based on the type of the draw call:

Blitting draw calls were implemented with a scissor rectangle logic inside the blitting module: this allows the clipping to be very fast as the clipped parts are skipped completely.

Rasterization draw calls are implemented in a different way: this time the scissor rectangle function is implemented on a pixel level, which means that every pixel is checked to be within the scissor rect before being written to the color buffer: this allows the dirty rects system to ignore everything that is outside the dirty region and thus manages to not cover regions that shouldn’t be touched that frame.

Clear buffers draw calls are clipped with a special function inside FrameBuffer that only clears a region of the screen instead of clearing everything.

This covers the implementation of the first sub task: the second one was to detect which regions of the screen changed and output a list of rectangles that need to be updated.

This task was implemented by keeping a copy of the previous frame draw calls and comparing the current frame draw calls with the previous one: this comparison tries to find the first difference between the two lists and then marks as dirty every rectangle that is covered by subsequent draw calls.
Once this list is obtained with this method I also use a simple merging algorithm to avoid re-rendering of the same region with overlapping rectangles and also to reduce the number of draw calls.

What happens after I have this information can be described with the following pseudocode:

foreach (drawCall in currentFrame.drawCalls) {
    foreach (dirtyRegion in currentFrame.dirtyRegions) {
        if (drawCall.dirtyRegion.intersects(dirtyRegion)) {
           drawCall.execute(dirtyRegion);
        }
    }
}

There’s only one problem with this implementation: I have found that EMI intro sequence is not detected properly and causes some glitches whereas everything else works fine.

From what I’ve seen though this method isn’t helping the overall engine performance by much: as most of the time is spent in 3D rasterization this system doesn’t cope well with animated models that change very frequently and fails to be effective.

I will keep you updated with how this progresses in the next blog posts, stay tuned for more info!