vercidium

Forward vs Deferred Rendering

Added 2024-12-22 23:27:11 +0000 UTC

Hi-res video and screenshots are available here

Most games use one of two rendering systems:

Forward Rendering - objects are rendered at full quality when they are first rendered
Deferred Rendering - objects are rendered quickly, then expensive effects are applied later on

In this article, 'effects' refers to expensive per-pixel work like lighting, shadows and fog.

I've always used deferred rendering systems, but decided to try forward rendering for my low-poly forest because I wanted to learn something new.

I always thought deferred rendering was the way to go, but the benchmarks at the end of this article say otherwise!

Waste

The purpose of both rendering systems is to prevent waste. Waste is when your GPU spends time rendering something that you never see. For example, in my forest project, objects are rendered in this order:

terrain
vegetation
buildings
players

The terrain is rendered first, with all effects (lighting, shadows, fog) applied:

Then trees and grass are rendered over the top:

This means that the time spent rendering the parts of the terrain that are now covered by vegetation was wasted. I'd guess that over 50% of the original terrain pixels are now covered by grass + trees.

And now some of the grass + trees are covered by the building.

At each stage of rendering, we're covering up part of the result of the last stage. This is what I call Waste, and I've highlighted it in red below:

Forward and deferred rendering each reduce waste in a different way. First we'll look at forward renderers.

Waste Solution #1

The first solution is the simplest to explain, but difficult to implement. Let's say the player is right up close to our camera, and we render them first.

When we render this player, we are writing to two textures:

RGB colour texture
Float depth texture

The colour texture stores the pixels you see, and the depth texture stores how far away each pixel is from your camera. Think of depth as a 3rd axis that goes into your screen.

The depth texture only stores one number for each pixel (as opposed to 3 values in an RGB texture), which is in the range 0.0 to 1.0. Pixels default to the value 1.0, which is shown as white in the image below. A value of 0.0 represents the closest a pixel can be to your camera.

The player's body is slightly lighter as it's further away than their head:

The main purpose of the depth texture is to stop far-away pixels from rendering on top of close-up pixels. If we didn't use a depth texture and rendered the world after the player, it would look like this:

By using a depth texture, we can prevent far-away pixels from rendering on top of close-up pixels:

The depth texture now looks like this:

This depth texture also allows for a powerful optimisation that's used in nearly every 3D game: Early Depth Testing. In the old days, we would check each pixel's depth after the fragment shader runs:

But modern GPUs swap this around by checking the depth of a pixel before the fragment shader runs:

This prevents the fragment shader from spending time shading pixels that we'll never see.

However, this optimisation doesn't work unless we render the close-up objects first. If we render the terrain first and then a large building in front of it, the fragment shader will run for both the terrain and the building, because each building pixel is closer to the camera than the terrain pixel already stored in the depth texture, so it passes the depth test.

But if we render the building first and then the terrain, the fragment shader won't run for the pixels in the terrain that are behind the building. This is an amazing optimisation!
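To make the effect of draw order concrete, here is a small CPU-side simulation of early depth testing. It is a sketch, not GPU code: each object is reduced to a hypothetical list of (pixel, depth) fragments, the depth test runs before the stand-in 'fragment shader', and the shader itself is replaced by a counter.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    name: str
    # Hypothetical stand-in for rasterised geometry: (pixel_index, depth) pairs,
    # with depth in [0.0, 1.0] where smaller values are closer to the camera.
    fragments: list


def render(meshes, pixel_count):
    """Render meshes in the given order with an early (pre-shader) depth test.

    Returns the number of fragment shader invocations and the final depth buffer.
    """
    depth = [1.0] * pixel_count  # cleared to the far value (white in the depth image)
    shader_runs = 0
    for mesh in meshes:
        for pixel, z in mesh.fragments:
            if z < depth[pixel]:   # 'Less' comparison, done BEFORE shading
                shader_runs += 1   # the expensive fragment shader would run here
                depth[pixel] = z
    return shader_runs, depth


# A 10-pixel strip: terrain covers every pixel, the building covers pixels 2..7.
terrain = Mesh("terrain", [(p, 0.9) for p in range(10)])
building = Mesh("building", [(p, 0.3) for p in range(2, 8)])

back_to_front, _ = render([terrain, building], 10)
front_to_back, _ = render([building, terrain], 10)
print(back_to_front, front_to_back)  # 16 10
```

Both orders produce the same final depth buffer; only the number of shaded fragments changes, and the difference is exactly the terrain pixels hidden behind the building.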

But if there's a building between two trees, what do we render first?

We have three options:

Draw all trees first, meaning some time spent rendering the trees in the background will be wasted
Draw all buildings first, meaning some time spent rendering the building will be wasted
Sort every object in the world based on its distance to the camera, and render the closest objects first
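The third option can be sketched as a per-frame sort on squared distance from the camera. This is a minimal Python sketch with hypothetical object records; sorting by an object's centre is only an approximation, since part of a large building can be closer than its centre.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    position: tuple  # hypothetical world-space centre (x, y, z)


def sort_front_to_back(objects, camera_pos):
    """Return objects ordered closest-first so early depth testing can reject hidden pixels."""
    def distance_sq(obj):
        # Squared distance skips the sqrt; the ordering is identical.
        return sum((a - b) ** 2 for a, b in zip(obj.position, camera_pos))
    return sorted(objects, key=distance_sq)


camera = (0.0, 1.7, 0.0)
scene = [
    SceneObject("tree_far", (0.0, 0.0, -40.0)),
    SceneObject("building", (0.0, 0.0, -5.0)),
    SceneObject("tree_near", (2.0, 0.0, -3.0)),
]
print([o.name for o in sort_front_to_back(scene, camera)])
# ['tree_near', 'building', 'tree_far']
```

Because the camera usually moves only a little between frames, last frame's order is already nearly sorted, and adaptive sorts (such as Python's Timsort) handle nearly-sorted input quickly.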

This is where things get complicated. With thousands of trees and buildings, it can be slow to calculate the order to render them. Is it quicker to just waste time rendering hidden pixels on the GPU than it is to calculate the order of all objects on the CPU?

I tested a worst-case scenario, where the camera is positioned right next to a building:

If I render the building first, the scene renders in ~2.1ms (470 FPS)
If I render the building last, the scene renders in ~2.8ms (350 FPS)

That's a saving of 0.7ms. If the CPU can order the objects in less than 0.7ms, then it's worth doing. But if it takes longer than that, it might be worth looking at deferred rendering.

Deferred Rendering

With deferred rendering, the goal is for rendering objects to be so cheap that it doesn't matter what order you render them in.

In our forward renderer, the effects are applied at the end of each terrain shader, tree shader, grass shader, etc:

Deferred rendering delays these expensive effects until after everything has rendered. To do this, we need to keep track of all of the parameters that are passed to the ApplyEffects function, so we'll store them in extra textures.
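The difference in where `ApplyEffects` runs can be shown by counting calls. In this hypothetical Python model, both renderers draw the same fragments back-to-front (the worst case). The forward path calls `apply_effects` for every fragment that passes the depth test, while the deferred path only stores its inputs in G-buffer arrays and calls it once per covered pixel in a final full-screen pass. The `apply_effects` body is a toy placeholder, not the author's shader.

```python
calls = {"forward": 0, "deferred": 0}


def apply_effects(albedo, normal, mode):
    """Placeholder for ApplyEffects: a toy light term, plus a call counter."""
    calls[mode] += 1
    light = max(0.0, normal[1])  # toy diffuse term for a sun straight above
    return tuple(c * light for c in albedo)


UP, WALL = (0.0, 1.0, 0.0), (0.0, 0.5, 0.866)
# (pixel, depth, albedo, normal) fragments in back-to-front order:
# terrain over pixels 0..9, then a closer building over pixels 2..7.
fragments = [(p, 0.9, (0.2, 0.6, 0.2), UP) for p in range(10)]
fragments += [(p, 0.3, (0.7, 0.7, 0.7), WALL) for p in range(2, 8)]


def forward(frags, n):
    depth, colour = [1.0] * n, [None] * n
    for p, z, albedo, normal in frags:
        if z < depth[p]:
            depth[p] = z
            colour[p] = apply_effects(albedo, normal, "forward")  # shaded immediately
    return colour


def deferred(frags, n):
    depth, g_albedo, g_normal = [1.0] * n, [None] * n, [None] * n
    for p, z, albedo, normal in frags:  # geometry pass: store the inputs only
        if z < depth[p]:
            depth[p], g_albedo[p], g_normal[p] = z, albedo, normal
    # Full-screen pass: one apply_effects call per covered pixel.
    return [apply_effects(g_albedo[p], g_normal[p], "deferred")
            if g_albedo[p] is not None else None
            for p in range(n)]


same_image = forward(fragments, 10) == deferred(fragments, 10)
print(same_image, calls)  # True {'forward': 16, 'deferred': 10}
```

The final image is identical; the deferred version pays for the extra G-buffer writes instead of the 6 wasted `apply_effects` calls.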

Earlier I mentioned that the forward renderer writes to two textures:

RGB colour
Float depth

Our deferred renderer needs 3 more textures:

XYZ normal
XYZ world position
Float reflectivity

These textures hold all the data we need to apply the effects later.

Then after the world has rendered to these 5 textures, we'll use a new shader to read from them, apply the effects, and render it to the screen:

Note this shader isn't the same as a 3D model shader. It runs on a single rectangle that's the same size as the window, so its fragment shader runs once for each pixel on the screen. This is referred to as the 'Window Buffer' in my engine, and in this free code repo.

We've now guaranteed that expensive effects will only be applied to visible pixels.

But this seems too easy, so what's the catch? Writing to and reading from these extra textures isn't free, and it's still important to render the world front-to-back so the fragment shaders do as little work as possible.

I implemented deferred rendering for the forest project and compared the performance of both approaches when the camera is pressed up against a building:

Note: there's an extra deferred rendering optimisation at the end of this article that speeds this up!

Both approaches slow down when rendering back-to-front, but forward rendering is affected the most. When rendering front-to-back, both approaches are nearly identical!

But there is an extra problem to solve when using deferred rendering: transparency.

Deferred Caveat: Transparency

Since effects aren't applied until the very end, transparent objects like water would blend onto the unshaded terrain (the raw colour texture) if we rendered them along with the rest of the world.

This means we need to render transparent objects after the deferred shading step. But something doesn't look right:

We can see water + fog through the terrain. This looks similar to the strange output from before when we weren't using a depth texture. This is because the deferred shading step writes directly to the screen, which doesn't have access to our depth texture.

So we'll create a new framebuffer (FBO) that has two textures:

RGB colour texture
Float depth texture

The deferred shader will write to this colour texture instead of the screen. Then we'll render transparent objects onto the colour texture, which means they'll blend onto the correct colours.
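The frame order described above can be sketched as a small Python model on a hypothetical 6-pixel strip (all names and values are illustrative, not from the engine). Opaque geometry writes the shared depth texture, the deferred pass writes shaded colour into the new FBO's colour texture, and the water is then depth-tested against that same depth texture before being alpha-blended.

```python
N = 6  # a hypothetical 6-pixel strip

# Geometry pass: raw (unshaded) colour plus the shared depth texture.
# Terrain at depth 0.5 everywhere; a hill at depth 0.2 covers pixels 4 and 5.
world_depth = [0.5, 0.5, 0.5, 0.5, 0.2, 0.2]
raw_colour = [(0.3, 0.5, 0.2)] * N


def shade(colour):
    """Stand-in for the deferred lighting/shadow/fog pass."""
    return tuple(c * 0.8 for c in colour)


# Deferred pass: write shaded colour into the new FBO's colour texture, not the screen.
deferred_colour = [shade(c) for c in raw_colour]

# Transparent pass: water at depth 0.4 spans the strip and is alpha-blended.
WATER, ALPHA, WATER_DEPTH = (0.0, 0.2, 0.8), 0.5, 0.4
for p in range(N):
    # Testing against the SAME depth texture hides the water behind the hill.
    if WATER_DEPTH < world_depth[p]:
        deferred_colour[p] = tuple(
            ALPHA * s + (1 - ALPHA) * d
            for s, d in zip(WATER, deferred_colour[p]))

print(deferred_colour)
```

Without a depth texture to test against (the situation when blending straight onto the screen), pixels 4 and 5 would also receive water, which is the water-through-terrain artifact from earlier.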

This extra framebuffer isn't the end of the world, but it's one extra step that makes deferred shading slightly slower:

Note that the World FBO and Deferred FBO share the same depth texture, as a texture can be attached to more than one FBO.

Whereas the forward rendering approach is much simpler:

Both approaches use a dithering shader to avoid colour banding. You can read more about that here.

Forward Optimisation - Depth Pre-Pass

If sorting objects front-to-back is too complex, you can instead perform a depth pre-pass. This is where you render the whole world, but only write to the depth texture. No fragment shaders, no colour, no lighting.

Then, we'll render the scene normally, but instead of using a 'Less' depth comparison (i.e. only process pixels that are closer to the camera), we'll use an 'Equal' comparison. This means that the fragment shader only runs for pixels whose depth exactly matches the value already in the depth texture, i.e. the visible surface.
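Counting fragment shader invocations shows why the pre-pass makes draw order irrelevant for shading cost. In OpenGL terms, the first pass roughly corresponds to `glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE)` with `glDepthFunc(GL_LESS)`, and the second to re-enabling colour writes with `glDepthFunc(GL_EQUAL)` (often with `glDepthMask(GL_FALSE)`, since depth is already final). The Python below only simulates that state on a hypothetical strip of fragments.

```python
def depth_prepass_render(meshes, pixel_count):
    """Two-pass render: depth-only pre-pass, then shading with an 'Equal' depth test.

    meshes: list of fragment lists, each fragment a (pixel, depth) pair.
    Returns (shader_runs, depth_buffer).
    """
    depth = [1.0] * pixel_count

    # Pass 1: depth only ('Less' test), no colour and no expensive shading.
    for frags in meshes:
        for pixel, z in frags:
            if z < depth[pixel]:
                depth[pixel] = z

    # Pass 2: full shading, but only where this fragment IS the visible surface.
    shader_runs = 0
    for frags in meshes:
        for pixel, z in frags:
            if z == depth[pixel]:  # 'Equal' comparison; depth writes can stay off
                shader_runs += 1
    return shader_runs, depth


terrain = [(p, 0.9) for p in range(10)]
building = [(p, 0.3) for p in range(2, 8)]

# Worst-case order (terrain first) still shades each visible pixel exactly once.
runs, _ = depth_prepass_render([terrain, building], 10)
print(runs)  # 10
```

On a real GPU the 'Equal' test only works reliably if both passes produce bit-identical depths, which is why the same vertex transform is used in both passes (GLSL's `invariant` qualifier on `gl_Position` exists for this).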

This optimisation doesn't work well for the forest scene because there is a lot of geometry (every object is now drawn twice), but it may improve performance for games that are bottlenecked by shading rather than by geometry.

Deferred Optimisation - Compression

The extra bottleneck added by deferred rendering is the time spent writing to the extra textures. With 5 textures, we are writing 29 bytes for each pixel:

Colour - RGB16 (16-bit to remove banding - see my Modern Dithering with OpenGL post) = 6 bytes. Possibly 8, as RGB textures may still be stored as RGBA (for 4-byte alignment).
Depth - R32F = 4 bytes
Normal - RGB16F = 6 bytes
Position - RGB32F = 12 bytes
Reflectivity - R8 = 1 byte

For a 1920x1080px image, that's 57 MiB (about 60 million bytes). That's a lot!
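The figures above can be checked with a few lines of Python. The sizes are the ones listed; 'MiB' means 1024 × 1024 bytes, which is how the 57 figure comes out. The `overdraw` parameter is a hypothetical average number of writes per pixel, used to show how back-to-front rendering pushes the total past 100 MiB.

```python
# Bytes per pixel for each render target in the uncompressed deferred renderer.
TARGET_BYTES = {
    "colour": 6,        # RGB16; possibly 8 if the driver pads RGB to RGBA
    "depth": 4,         # R32F
    "normal": 6,        # RGB16F
    "position": 12,     # RGB32F
    "reflectivity": 1,  # single 8-bit channel
}


def frame_mib(width, height, bytes_per_pixel, overdraw=1.0):
    """Bytes written per frame, in MiB (1024 * 1024 bytes).

    overdraw: hypothetical average writes per pixel (1.0 = every pixel written once).
    """
    return width * height * bytes_per_pixel * overdraw / (1024 * 1024)


per_pixel = sum(TARGET_BYTES.values())
print(per_pixel)                                                 # 29
print(round(frame_mib(1920, 1080, per_pixel), 1))                # 57.3
print(round(frame_mib(3440, 1440, per_pixel), 1))                # 137.0
print(round(frame_mib(1920, 1080, per_pixel, overdraw=2.0), 1))  # 114.7
```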

If we render back-to-front, we'll have a lot of waste as we're writing extra data to these textures that gets covered up later. So it's possible we're writing over 100 MB of texture data each frame.

Let's see if we can reduce the amount of texture data we need, and if it has an effect on performance.

The first thing we can do is compress normals into pitch-yaw byte vectors (6 bytes down to 2 bytes). This is explained in my video here (timestamped). This results in some data loss, as each angle is rounded to one of 256 steps, so normals are less precise than before. You could use 16-bit floats for more quality.
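Here is a minimal Python sketch of one way to pack a unit normal into two bytes as pitch and yaw. The angle conventions are assumptions and may differ from the linked video: pitch is `asin(y)` (angle above the XZ plane), yaw is `atan2(x, z)`, and each is quantised to an unsigned 0..255 value.

```python
import math


def encode_normal(x, y, z):
    """Pack a unit normal into two bytes (pitch, yaw), each in 0..255."""
    pitch = math.asin(max(-1.0, min(1.0, y)))  # -pi/2 .. pi/2, angle above the XZ plane
    yaw = math.atan2(x, z)                     # -pi .. pi, angle around the Y axis
    p = round((pitch / math.pi + 0.5) * 255)
    w = round((yaw / (2 * math.pi) + 0.5) * 255)
    return p, w


def decode_normal(p, w):
    """Unpack two bytes back into an approximate unit normal."""
    pitch = (p / 255 - 0.5) * math.pi
    yaw = (w / 255 - 0.5) * 2 * math.pi
    cos_pitch = math.cos(pitch)
    return (cos_pitch * math.sin(yaw), math.sin(pitch), cos_pitch * math.cos(yaw))


def angle_error_degrees(a, b):
    """Angle between two unit vectors, to measure the quantisation loss."""
    dot = max(-1.0, min(1.0, sum(i * j for i, j in zip(a, b))))
    return math.degrees(math.acos(dot))


n = (0.48, 0.6, -0.64)
packed = encode_normal(*n)
print(packed, angle_error_degrees(n, decode_normal(*packed)))
```

With 256 steps per angle the worst-case error is under a degree, which is usually invisible in diffuse lighting but can show up in sharp specular highlights.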

We can completely remove the position texture and instead reconstruct the position from the depth texture using fancy matrix maths. This is a common 3D rendering trick; I don't understand it well enough to explain it, so it's better to read the linked article.
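For reference, here is a minimal Python sketch of the idea for view-space positions. It assumes an OpenGL-style perspective projection (camera looking down -Z, vertical field of view `fov_y`, `near`/`far` planes) and the default 0..1 depth range. Shaders usually multiply by the inverse projection matrix instead (or the inverse view-projection matrix for world space); the closed-form version below is the same inversion for this projection, and is easier to check by round-tripping a point.

```python
import math


def project(view_pos, fov_y, aspect, near, far):
    """Project a view-space point to screen UV and a 0..1 depth value (OpenGL-style)."""
    x, y, z = view_pos
    t = math.tan(fov_y / 2)
    w = -z  # clip-space w of a perspective projection
    ndc_x = x / (aspect * t) / w
    ndc_y = y / t / w
    ndc_z = ((far + near) / (far - near) * -z - 2 * far * near / (far - near)) / w
    return ndc_x * 0.5 + 0.5, ndc_y * 0.5 + 0.5, ndc_z * 0.5 + 0.5


def reconstruct(u, v, depth, fov_y, aspect, near, far):
    """Recover the view-space position from screen UV and a depth-texture sample."""
    ndc_x, ndc_y, ndc_z = u * 2 - 1, v * 2 - 1, depth * 2 - 1
    # Undo the non-linear depth mapping to get linear view-space z (negative).
    z = -2 * far * near / ((far + near) - ndc_z * (far - near))
    t = math.tan(fov_y / 2)
    # Undo the perspective divide: scale the NDC coordinates by the distance (-z).
    return (ndc_x * aspect * t * -z, ndc_y * t * -z, z)


params = (math.radians(60), 16 / 9, 0.1, 500.0)
original = (1.5, -0.7, -12.0)
u, v, d = project(original, *params)
print(reconstruct(u, v, d, *params))  # approximately (1.5, -0.7, -12.0)
```

Multiplying the result by the camera's inverse view matrix would give the world-space position that the removed texture used to store.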

Reflectivity is already as small as it can be (1 byte).

Now we're only storing 3 more bytes than forward rendering:

2 bytes for normals = RG8i
1 byte for reflectivity = R8

I took these two screenshots, and now I can't remember which one used forward rendering and which used deferred rendering with compressed textures:

They look the same to me! There are tiny shadow artifacts in the deferred renderer when the leaves sway in the wind, but you'd never know.

Let's compare the benchmarks for rendering the above image:

Forward rendering is slightly faster, and the naïve deferred approach (with no texture compression) plummets on a 3440x1440px scene.

To make deferred rendering look good, let's render our scene from back-to-front while looking at a building up close. This is the worst-case scenario for forward rendering because it renders all the beautiful terrain + trees + grass, then covers it up with a building:

The benchmarks are as follows:

Deferred (compressed) slightly outperformed forward rendering at 1080p, but fell behind at 1440p. The memory bandwidth needed for a 3440x1440px screen is likely too high!

I was curious how much faster forward rendering would be if I rendered the scene front-to-back. I'm surprised back-to-front kept up so well at 1080p, but it makes sense that it falls behind at 1440p.

Future Additions

I haven't figured out how ambient occlusion (SSAO) will work with a forward renderer, because SSAO needs the normals of the surrounding pixels. Maybe it requires storing those extra 2 bytes for normals? Maybe I can pack them into the alpha component of the RGB colour texture, as OpenGL may store RGB textures as RGBA behind the scenes.

There's also a method that reconstructs normal vectors from the depth buffer, which sounds interesting. It has a lot of artifacts so it wouldn't look great for lighting, but for ambient occlusion it might be enough.
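The usual form of that method takes the view-space positions of a pixel and its neighbours (reconstructed from depth) and crosses the two difference vectors. This is a minimal Python sketch assuming screen x increases to the right and y increases upward, with a hypothetical tilted plane as input.

```python
import math


def normal_from_positions(center, right, up):
    """Estimate a view-space normal from the positions of a pixel and two neighbours.

    center, right, up: (x, y, z) at pixel (i, j), (i + 1, j) and (i, j + 1).
    """
    dx = tuple(r - c for r, c in zip(right, center))
    dy = tuple(a - c for a, c in zip(up, center))
    # Cross product of the two screen-space tangent vectors.
    n = (dx[1] * dy[2] - dx[2] * dy[1],
         dx[2] * dy[0] - dx[0] * dy[2],
         dx[0] * dy[1] - dx[1] * dy[0])
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


# Points on a hypothetical tilted plane z = -10 + 0.5x + 0.25y (camera looks down -Z).
def plane(x, y):
    return (x, y, -10 + 0.5 * x + 0.25 * y)


print(normal_from_positions(plane(0, 0), plane(1, 0), plane(0, 1)))
```

The artifacts come from object edges: if `right` or `up` lands on a surface much further away, the cross product describes a slope that doesn't exist. A common mitigation is to pick, on each axis, whichever neighbour (left/right, up/down) has the smaller depth difference.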

I also learned about virtual shadow maps recently, which provide hugely improved shadow quality and also mean shadows don't need to be rendered every frame. For example, in Sector's Edge, since the sun angle stays the same (no day/night cycle), each section of the map can have its own shadow texture. This texture only needs to be updated when voxels are added to or removed from it. Players and particles would use the default shadow rendering method. For this low-poly forest, it wouldn't work because the trees constantly sway in the wind.
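The per-section idea can be sketched as a cache with dirty flags. This Python sketch is hypothetical: `SECTION_SIZE`, the `render_shadow_section` callback and the column-shaped (X/Z) sections are illustrative, and it ignores the possibility that an edit near a section border also changes shadows stored in a neighbouring section.

```python
SECTION_SIZE = 32  # hypothetical section width in voxels


class ShadowCache:
    """Keeps one shadow texture per map section and re-renders only edited sections."""

    def __init__(self, render_shadow_section):
        self.render_shadow_section = render_shadow_section  # callback: section key -> texture
        self.textures = {}
        self.dirty = set()

    def section_of(self, x, z):
        # Hypothetical: sections are vertical columns, so height doesn't matter.
        return (x // SECTION_SIZE, z // SECTION_SIZE)

    def on_voxel_changed(self, x, y, z):
        """Call when a voxel is added or removed; the sun never moves, so nothing else invalidates."""
        self.dirty.add(self.section_of(x, z))

    def update(self, visible_sections):
        """Render shadows for visible sections that are new or dirty; return how many were rendered."""
        rendered = 0
        for key in visible_sections:
            if key not in self.textures or key in self.dirty:
                self.textures[key] = self.render_shadow_section(key)
                self.dirty.discard(key)
                rendered += 1
        return rendered


cache = ShadowCache(lambda key: f"shadow{key}")
sections = [(0, 0), (0, 1), (1, 0)]
first = cache.update(sections)       # 3: every section rendered once
idle = cache.update(sections)        # 0: nothing changed
cache.on_voxel_changed(40, 5, 3)     # lands in section (1, 0)
after_edit = cache.update(sections)  # 1
print(first, idle, after_edit)       # 3 0 1
```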
