
Engine Optimization: Fast High-Quality Cloudscape through Noise and Reprojection

Chen
Hey guys, it’s been a long time. This time I want to share an optimization I did for my volumetric clouds. Here is the procedural volumetric cloudscape article that contains my initial implementation. In this post, we are going to expand upon that and introduce techniques that make things go fast without loss of quality.

Where we left off

As you can recall, our volumetric cloud solution was quite expensive. To run at a decent quality at 1080p, it was taking about 10~30ms on a GTX 1060 card, definitely over the frame budget.

There were two dimensions to our cost. One is the number of rays we are shooting, which corresponds to the size of our shader output. The other is the sample count per ray, which determines the fidelity of our cloud. Both need to be pretty high to achieve nice quality.
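
To put rough numbers on those two dimensions, here is a minimal sketch that just multiplies them out. The function name is hypothetical, and the 960x540 figure in the note below is only an assumed example of a "lower res" target, not the resolution actually used in the previous post.

```c
/* Total cloud density samples evaluated per frame:
   one ray per output pixel, samples_per_ray marching steps along each ray.
   (Hypothetical helper, for illustration only.) */
long long cloud_samples_per_frame(int width, int height, int samples_per_ray)
{
    return (long long)width * height * samples_per_ray;
}
```

At 1920x1080 with 64 samples per ray this comes to 132,710,400 density evaluations per frame. Halving the resolution in each dimension (960x540, an assumed example) cuts that to 33,177,600, while staying at full resolution and dropping to 4 samples per ray cuts it to 8,294,400.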

At the end of the last post, I ended up doing this cloud pass on a lower-res texture and upscaling it. This reduces the number of rays drastically, which makes the implementation practical. However, I was not quite happy with the quality loss. You can really notice the bilinear upsample artifacts, and it’s just not satisfying to have it at such a low quality. So I decided to do a second pass and optimize it in the other dimension instead: sample count.

The problem

I hate the low-res look on the clouds, so this time I’m going to render this at 1080p. Let’s see how fast it is with different numbers of samples. Let’s start small:


16 samples per pixel

Not looking good. The slices are artifacts from undersampling the volume. Despite the low quality, we are already way above the frame budget. Let’s try cranking things up a bit:


32 samples per pixel

Better, but still riddled with noticeable artifacts. Let’s up it a bit more:


64 samples per pixel

Still some artifacts remain … but somewhat passable. Even this barely passable cloud shading is taking us THIRTY milliseconds. That is way above our frame budget.

Let’s jitter things up a bit

One common trick in graphics is to turn artifacts into noise. If you recall, we sample the cloud by ray marching at a constant step. This gives the “slices” artifact when the sampling rate is too low. However, we can add noise to the starting point of our sampling region.

float Delta = GetStepSize();   // this is our raymarch step size
float T0, T1;                  // T0 is the start of the sampling range, T1 is the end
SetSamplingRange(T0, T1);      // fills in T0 and T1 (out parameters)

// Our noise trick. Before sampling, we jitter the start of the range by up to one step
T0 += Random(PixelCoordinate) * Delta;

// then we raymarch ...
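
For anyone who wants to poke at the idea outside a shader, here is a minimal CPU-side sketch in C. Everything in it is an assumption made for illustration: `hash01` is a plain integer hash standing in for `Random(PixelCoordinate)` (the real thing uses blue noise), `density_at` is a toy slab of cloud, and `march_optical_depth` is a hypothetical name for the raymarch loop.

```c
#include <stdint.h>

/* Hypothetical stand-in for Random(PixelCoordinate): hashes the pixel
   coordinate to a value in [0, 1). Not blue noise, just white noise. */
float hash01(uint32_t x, uint32_t y)
{
    uint32_t h = x * 374761393u + y * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return (float)(h >> 8) / 16777216.0f; /* top 24 bits mapped to [0, 1) */
}

/* Toy density field (assumption): a uniform slab of density 1 for t in [2.25, 6). */
float density_at(float t)
{
    return (t >= 2.25f && t < 6.0f) ? 1.0f : 0.0f;
}

/* Constant-step raymarch over [t0, t1]. The first sample is pushed forward
   by jitter * delta, with jitter expected in [0, 1).
   Returns the accumulated optical depth along the ray. */
float march_optical_depth(float t0, float t1, int steps, float jitter)
{
    float delta = (t1 - t0) / (float)steps;
    float t = t0 + jitter * delta;
    float optical_depth = 0.0f;
    for (int i = 0; i < steps; ++i) {
        optical_depth += density_at(t) * delta;
        t += delta;
    }
    return optical_depth;
}
```

The slab is 3.75 units thick. With 8 steps over [0, 8] and no jitter, every pixel reports 3, which is the kind of consistent error that shows up as slices. With a per-pixel jitter such as `hash01(px, py)`, each pixel reports either 3 or 4, and the average over many pixels lands on 3.75: the error is still there, but it has turned into noise.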


The jitter adds a lot of randomness to our regular sampling pattern, removing the “slicing” artifacts and introducing noise instead. Even better, I use a blue noise distribution for my random numbers to get an “even” randomness, which really plays a trick on your eyes. Here is the result with only 16 samples:


16 samples per pixel jittered

(Make sure you open the original image to check out those blue noise patterns! When the image is downscaled it's really hard to notice them)

Wow, this is already much better. One more trick is to cycle through multiple noise textures to trick the eye into doing temporal integration for us. When we do that, the noise is actually substantially less noticeable … hold on, why don’t we temporally integrate it ourselves?

Temporal reprojection

Instead of throwing away the samples from our previous frame, how about we keep them and add them to the current frame’s integration? This is a neat idea borrowed from the newly emerged TAA (temporal anti-aliasing) technique, which is explained in this video by Playdead.
Essentially, we can take the camera transforms from the previous frame and the current frame, then reproject pixels from the last frame to the current frame.


Illustration of temporal reprojection by Playdead
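
As a rough illustration of the reprojection step, here is a minimal sketch in C. It assumes we have a world-space point for the current pixel (for clouds, some representative point along the view ray, which is my assumption and not something the post specifies) and the previous frame's combined view-projection matrix. The names `Vec4`, `Mat4`, and `reproject_to_previous_uv` are hypothetical.

```c
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4; /* row-major, used as clip = M * p */

Vec4 mat4_mul_vec4(const Mat4 *a, Vec4 v)
{
    Vec4 r;
    r.x = a->m[0][0] * v.x + a->m[0][1] * v.y + a->m[0][2] * v.z + a->m[0][3] * v.w;
    r.y = a->m[1][0] * v.x + a->m[1][1] * v.y + a->m[1][2] * v.z + a->m[1][3] * v.w;
    r.z = a->m[2][0] * v.x + a->m[2][1] * v.y + a->m[2][2] * v.z + a->m[2][3] * v.w;
    r.w = a->m[3][0] * v.x + a->m[3][1] * v.y + a->m[3][2] * v.z + a->m[3][3] * v.w;
    return r;
}

/* Projects a world-space point with the PREVIOUS frame's view-projection
   matrix to find where it was on screen last frame.
   Returns 1 and writes UV in [0, 1]^2 if the point was visible last frame,
   0 if there is no usable history (behind the camera or off screen). */
int reproject_to_previous_uv(const Mat4 *prev_view_proj, Vec4 world_pos,
                             float *u, float *v)
{
    Vec4 clip = mat4_mul_vec4(prev_view_proj, world_pos);
    if (clip.w <= 0.0f) {
        return 0; /* behind the previous camera */
    }
    float ndc_x = clip.x / clip.w;
    float ndc_y = clip.y / clip.w;
    if (ndc_x < -1.0f || ndc_x > 1.0f || ndc_y < -1.0f || ndc_y > 1.0f) {
        return 0; /* was off screen last frame */
    }
    *u = ndc_x * 0.5f + 0.5f;
    *v = ndc_y * 0.5f + 0.5f;
    return 1;
}
```

When the function returns 0 there is nothing to reuse, so a reasonable fallback is to take the current frame's sample as-is for that pixel. Whether V needs flipping depends on the graphics API's texture convention, which this sketch ignores.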

By doing this every frame, we create a feedback loop of samples and accumulate them through time. TAA uses an exponential moving average to integrate samples across time, which gives a nice weight falloff for stale samples while keeping them in the integration. It’s a nice scheme, so that’s what I chose to use for my clouds.
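
The exponential moving average itself is tiny. Here is a minimal sketch in C, where `alpha` is the weight given to the new sample; the post does not state the blend factor actually used, so any value plugged in here is an assumption, and both function names are hypothetical.

```c
/* One step of the exponential moving average: blend the reprojected
   history value with this frame's new sample.
   alpha = 1 ignores history, small alpha leans heavily on it. */
float temporal_blend(float history, float current, float alpha)
{
    return history + alpha * (current - history);
}

/* Weight that a sample taken `age` frames ago still carries in the
   accumulated result: alpha * (1 - alpha)^age. */
float sample_weight(float alpha, int age)
{
    float w = alpha;
    for (int i = 0; i < age; ++i) {
        w *= (1.0f - alpha);
    }
    return w;
}
```

With `alpha = 0.5`, for example, a sample from two frames ago still contributes with weight 0.125. Old samples are never dropped outright; their weight just keeps shrinking geometrically.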

A typical TAA implementation would also account for a velocity buffer in order to achieve more accurate reprojection for moving/deforming objects. However, our clouds are very slow moving, so it’s not necessary here.

However, since the reprojection is rarely pixel-perfect, we have to be careful with how we sample our previous frames. A bilinear filter works fine in our case. It introduces some blurring in TAA, but since we are dealing with fuzzy clouds, this much blurring is barely noticeable.
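
On the GPU the bilinear filter comes for free from the texture sampler, but to make it concrete, here is a minimal CPU-side sketch in C of what that fetch does. It assumes a single-channel, row-major history buffer with texel centers at (i + 0.5) / size and clamp-to-edge addressing; `sample_bilinear` is a hypothetical name.

```c
/* Bilinearly samples a single-channel, row-major image at UV in [0, 1]^2.
   Coordinates are clamped to the image, so edge texels are repeated. */
float sample_bilinear(const float *img, int w, int h, float u, float v)
{
    /* Convert UV to continuous texel space, where texel i is centered at i. */
    float x = u * (float)w - 0.5f;
    float y = v * (float)h - 0.5f;
    if (x < 0.0f) x = 0.0f;
    if (y < 0.0f) y = 0.0f;
    if (x > (float)(w - 1)) x = (float)(w - 1);
    if (y > (float)(h - 1)) y = (float)(h - 1);

    int x0 = (int)x; /* x >= 0 here, so truncation is a floor */
    int y0 = (int)y;
    int x1 = (x0 + 1 < w) ? x0 + 1 : x0;
    int y1 = (y0 + 1 < h) ? y0 + 1 : y0;
    float fx = x - (float)x0;
    float fy = y - (float)y0;

    /* Blend horizontally on both rows, then vertically between them. */
    float top = img[y0 * w + x0] * (1.0f - fx) + img[y0 * w + x1] * fx;
    float bottom = img[y1 * w + x0] * (1.0f - fx) + img[y1 * w + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

A reprojected UV that lands exactly between four texels returns their average, which is where the slight blur comes from each time the history is resampled.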

Putting it all together

With our temporal reprojection in place, we can cut our samples even more. I’ve cut it down to 4 samples per pixel. The image still contains some noise, but trust me, when this gets animated, you can’t even notice a difference:


4 samples per pixel, temporally reprojected

(Again, open the image at full res to see the quality improvements for yourself!)

And the best part is, this takes only 3 milliseconds. Compared with the 30 milliseconds of the 64-sample version, that is a 10x speedup with even better quality. This is awesome!

Video

Here is a video demonstrating the robustness of this technique under heavy camera motion. I achieved a pretty high-quality cloudscape at 3ms (worst case). There is still room for improvement, but I say this is good for now!


Comments

With blue noise and temporal reprojection, it looks kind of like a soft film grain has been applied to the clouds. If I didn't know otherwise, I would assume it was an intentional stylization. Good stuff!
Looking really nice, the blue noise effect looks really cool. With the TAA technique, do you just keep the last frame buffer around to sample from or more?
@Miles, @Oliver thanks guys!

@Oliver for the temporal integration, I only keep the last frame. However, since I am integrating using an exponential moving average of the samples, that means we don't kick out any old samples. Instead, we decrease their weights in the total integration exponentially through time. Due to floating point precision they will eventually vanish, but they will stick around for quite a lot of frames.