Chen
I want to try something different for this post. Rather than a boring rundown of the technology I implemented, I am going to write about how I approached solving a bug in my shadow system. I think this post is a lot more down to earth than the previous ones, and I hope you will enjoy it.

Cascaded Shadow Map (CSM) Bug

Alright, just when I wanted to take a shot from a different angle to show off my new assets, my shadow map broke. You can clearly see the huge gap in the shadow on the right of this screenshot:

From the look of it, it seems to be a problem with how my cascade volumes are computed. The first thing to do is step through the code that calculates the cascade volumes. Well, the values look right until they get multiplied with a bunch of other matrices and turn into numbers that are hard to reason about.

Debugging the CSM

In situations like these, a graphical debugger is priceless. I just have to pass the 8 view frustum corners into some renderer that draws them out in the game for me, and I can immediately see what's going on. Unfortunately, Monter doesn't have that yet, and for the time being I'm too lazy to implement one. Since I have the shadow map texture stored, I can start by inspecting that first. So I quickly wrote a shadow map texture viewer routine. When my camera hits an angle that produces the artifact above, I switch to texture viewing mode to see what's going on with the shadow map.

Immediately I can see what's causing the artifact. This is the shot taken:

And this is the shadow map texture at that exact frame:

(Each of these four splits is used for one of the four shadow cascades)
The shadow map is offset too much to one side for some strange reason.

After playing around with it a little bit, a spark of brilliance struck me: I could just turn up the PCF rate and make the light direction tangential to the plane surface, so that the area covered by the shadow map will have noise on it. That way, I can easily see which part of the scene the shadow map covers! That solution worked beautifully:

From the gif, you can clearly see the four shadow map splits. Depending on the camera's orientation, these shadow maps resize, and they are incorrectly sized when the camera faces toward the negative X axis.

Guided by what appeared on the screen, I arrived at the lines of code that most likely caused this behavior:

for (int CornerIndex = 0; CornerIndex < ARRAY_COUNT(FrustumCorners); ++CornerIndex)
{
    FrustumCorners[CornerIndex] = ApplyMat4(FrustumCorners[CornerIndex], Inverse(View) * LightSpaceView);
}


All this code does is transform the view frustum corners from view space to light space, and that's probably where everything went wrong.

To make it easier to see what's going on in the code, I pulled stuff out to make each operation more explicit:

mat4 InverseView = Inverse(View);
for (int CornerIndex = 0; CornerIndex < ARRAY_COUNT(FrustumCorners); ++CornerIndex)
{
    FrustumCorners[CornerIndex] = ApplyMat4(FrustumCorners[CornerIndex], InverseView);
}
for (int CornerIndex = 0; CornerIndex < ARRAY_COUNT(FrustumCorners); ++CornerIndex)
{
    FrustumCorners[CornerIndex] = ApplyMat4(FrustumCorners[CornerIndex], LightSpaceView);
}


Inverse() is a new function that I introduced not long ago, so it might be the culprit. I stepped in and inspected InverseView when the artifact appeared. The result is quite strange: InverseView's first row is basically a zero vector, while I expected it to be a normalized vector (by the way, the convention I'm using is row-major matrices and a left-handed coordinate system).

Actual values inside InverseView:


As a view matrix, even when inverted, its first 3 row vectors should be the three orthogonal axes of the local view coordinate system, but here the first row vector is a zero vector. I used a trustworthy online matrix inverse calculator to compare the results, and it agrees with me: the first row vector should still be a normalized vector after inverting. Therefore I concluded that my Inverse() is busted.

Correctly inverted view matrix values:


Diving into Inverse()

I use the Gauss-Jordan elimination method to invert matrices, so there are quite a few steps to walk through to find what went wrong. After some digging, I found a subtle bug in this code snippet:

//scale all pivots to 1
for (int R = 0; R < 4; ++R)
{
    for (int C = 0; C < 4; ++C)
    {
        Result.Data[R][C] /= Augment.Data[R][R];
        Augment.Data[R][C] /= Augment.Data[R][R];
    }
}


At the end of the Gauss-Jordan elimination algorithm, every row is scaled so that its pivot becomes 1 again. When I do this operation in my head, it happens in parallel. However, when the machine executes it, it can only scale one element at a time. In this code, we divide each element by the pivot, but the pivot itself also gets scaled. If the pivot gets scaled before the other elements in the same row, the subsequent divisions produce incorrect results.

We can fix this problem by caching the pivot value first, then applying it to each row element:

//scale all pivots to 1
for (int R = 0; R < 4; ++R)
{
    f32 Scale = 1.0f / Augment.Data[R][R];
    for (int C = 0; C < 4; ++C)
    {
        Result.Data[R][C] *= Scale;
        Augment.Data[R][C] *= Scale;
    }
}


In fact, since this is the last step of the algorithm, the half of the augmented matrix that's being reduced to the identity has no use anymore. We can stop modifying it and use it only to scale the result matrix:

//scale all pivots to 1
for (int R = 0; R < 4; ++R)
{
    for (int C = 0; C < 4; ++C)
    {
        Result.Data[R][C] /= Augment.Data[R][R];
    }
}


Now CSM works at all view angles. I am happy again.

Here’s a full shot of the scene, with SSAO turned on:




Chen
Hi folks. I will talk about the progress I've made since last time and how the style of these blog posts will change.

Less Emphasis on the Well-covered Subjects on the Internet

When I read my own posts from the past, I noticed they duplicate a lot of info from popular tutorials out there. So in future posts, I will minimize the effort spent on well-covered subjects and talk more about their unspoken quirks and side effects.

Shadow Improvement

If the player's PC can't handle a high resolution shadow map texture, then aliasing will occur on the edges of the shadow, which is not good. One way to deal with the jagged shadow edges is PCF. It blurs the shadow's edges by sampling neighboring texels' depth test results and blending them together. The way I blend is to precompute some Poisson disk samples, then for each pixel being tested, rotate the disk by a random amount and use that rotated disk to sample the neighboring texels.
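To make that blending concrete, here's a minimal CPU-side sketch of the idea in C++. It is illustrative only: the SampleShadowDepth callback and the DiskOffsets array stand in for the real shader's depth texture fetch and precomputed Poisson samples, and the names are not Monter's.

#include <cmath>

// Average the depth-test results of a rotated disk of offsets around (U, V).
// DiskOffsets holds precomputed 2D sample offsets in texel units; Angle is the
// per-pixel random rotation applied to the whole disk.
float PCFVisibility(float U, float V, float FragmentDepth,
                    const float DiskOffsets[][2], int OffsetCount,
                    float Angle, float TexelSize,
                    float (*SampleShadowDepth)(float U, float V))
{
    float CosA = cosf(Angle);
    float SinA = sinf(Angle);

    float LitCount = 0.0f;
    for (int I = 0; I < OffsetCount; ++I)
    {
        // rotate the precomputed offset by the random angle
        float OffsetU = DiskOffsets[I][0] * CosA - DiskOffsets[I][1] * SinA;
        float OffsetV = DiskOffsets[I][0] * SinA + DiskOffsets[I][1] * CosA;

        float OccluderDepth = SampleShadowDepth(U + OffsetU * TexelSize,
                                                V + OffsetV * TexelSize);
        if (FragmentDepth <= OccluderDepth)
        {
            LitCount += 1.0f; // this neighbor's depth test says "lit"
        }
    }
    return LitCount / (float)OffsetCount; // 0 = fully shadowed, 1 = fully lit
}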

Result of random poisson sampling PCF:


Incorrectness of PCF

What most tutorials miss is that PCF causes self-intersection errors if its sampling area is too big. By taking neighboring depth samples from the shadow map texture, you are comparing against depths that don't belong to your fragment. The reason PCF works at all is that most shadow mapping systems use a bias value, and that small bias leaves some room for error, so PCF over a small area works most of the time.

Diagram describing why PCF doesn’t work if the sample area is too big:


Cascaded Shadow Map Seams

Cascaded shadow map seams are another common artifact most tutorials ignore. By seam, I mean a substantial difference in shadow quality between cascades. I briefly mentioned this in my first blog post and said I would just leave it as is. Well, I lied. It turned out the artifact is quite noticeable and I was forced to fix it.

This is the CSM seam in action (I tweaked the cascade size a bit to exaggerate the artifact):


My solution to this problem is to blend the edges of shadow cascades. When the shadow map is rendered, I extend the near plane of each cascade a little bit to overlap with the preceding cascade. When a fragment ends up in that overlapped region, I take the depth test results from both cascades, then linearly interpolate them based on where the fragment sits.
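As a rough sketch of that blend (hypothetical names and split values, not Monter's exact shader code):

// Blend the depth-test results of two neighboring cascades inside the overlap
// region [OverlapStart, CascadeEnd] along the view-space depth axis.
// NearVisibility/FarVisibility are the 0..1 shadow test results of the two cascades.
float BlendCascades(float ViewDepth, float OverlapStart, float CascadeEnd,
                    float NearVisibility, float FarVisibility)
{
    if (ViewDepth <= OverlapStart) return NearVisibility; // fully inside near cascade
    if (ViewDepth >= CascadeEnd)   return FarVisibility;  // fully inside far cascade

    // linear weight based on where the fragment sits inside the overlap
    float T = (ViewDepth - OverlapStart) / (CascadeEnd - OverlapStart);
    return (1.0f - T) * NearVisibility + T * FarVisibility;
}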

Here’s CSM seam being blended:


Renderer Tweaks

Besides shadow, there isn't a substantial amount of improvement in the renderer. When I shade the scene, I now apply the ambient occlusion factor only to the ambient term, meaning only surfaces covered in shadow will show any ambient occlusion. I also gave the ambient light a bluish tint, since most of the ambient light in outdoor scenes comes from the atmosphere.

World Editor Improved

Before I finished the manipulator widget for my world editor, I edited the scene by typing in numbers to change an entity's size or position. Now I can do all of that by dragging the mouse and pressing a couple of hotkeys:

Editor in action:


Surprisingly, I didn't find any material on the internet that seriously covers how to do this kind of entity manipulator, so I came up with my own hacks.

For scaling, I project the entity position into screen space and record the initial mouse position. Then, depending on the mouse's distance from the entity position in screen space, I calculate how much to shrink or grow the entity.
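A minimal sketch of that scaling rule (the names are hypothetical; Monter's actual editor code will differ):

#include <cmath>

// Scale factor from how far the mouse has moved away from (or toward) the
// entity's projected screen position, relative to where the drag started.
float ComputeScaleFactor(float EntityScreenX, float EntityScreenY,
                         float FirstMouseX, float FirstMouseY,
                         float MouseX, float MouseY)
{
    float FirstDX = FirstMouseX - EntityScreenX;
    float FirstDY = FirstMouseY - EntityScreenY;
    float CurrDX  = MouseX - EntityScreenX;
    float CurrDY  = MouseY - EntityScreenY;

    float FirstDist = sqrtf(FirstDX * FirstDX + FirstDY * FirstDY);
    float CurrDist  = sqrtf(CurrDX * CurrDX + CurrDY * CurrDY);

    // dragging away from the entity grows it, dragging toward it shrinks it
    return (FirstDist > 0.0001f) ? (CurrDist / FirstDist) : 1.0f;
}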

For translation along the XZ plane, I project a ray through where the mouse is pointing into the 3D scene. That ray will hit the XZ plane (hopefully). If it hits, the first hit position is recorded; let's call it FirstHitP. In each subsequent frame, the intersection between the mouse ray and the XZ plane is recorded; let's call that HitP. We can take (HitP - FirstHitP) as the temporary translation vector for that entity. It works pretty well for XZ plane translation, and you can easily see how this can be applied to translation along other axes too.
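Here's a small sketch of the ray-vs-XZ-plane step, with a bare-bones vector struct defined inline for illustration (Monter's real math types are different):

#include <cmath>

struct v3 { float X, Y, Z; };

// Intersect the mouse ray (Origin + T * Dir) with the horizontal plane Y = PlaneY.
// Returns false if the ray is parallel to the plane or the plane is behind the ray.
static bool RayHitXZPlane(v3 RayOrigin, v3 RayDir, float PlaneY, v3 *HitP)
{
    if (fabsf(RayDir.Y) < 0.0001f) return false;   // parallel: no usable hit
    float T = (PlaneY - RayOrigin.Y) / RayDir.Y;
    if (T < 0.0f) return false;                    // hit point is behind the ray origin
    HitP->X = RayOrigin.X + T * RayDir.X;
    HitP->Y = PlaneY;
    HitP->Z = RayOrigin.Z + T * RayDir.Z;
    return true;
}

// The per-frame translation is simply the difference between the current hit
// and the hit recorded when the drag started.
static v3 TranslationDelta(v3 FirstHitP, v3 HitP)
{
    return { HitP.X - FirstHitP.X, HitP.Y - FirstHitP.Y, HitP.Z - FirstHitP.Z };
}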

Rotation uses the same idea of casting mouse rays against a plane. I take the hit vectors and use a dot product to find the angle between them. That angle is then used to produce a quaternion that gets concatenated with the entity's local orientation (also a quaternion).
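And a sketch of the rotation step: the signed angle between the first and current hit vectors (relative to the entity, measured around the Y axis) becomes a quaternion to concatenate with the entity's orientation. The struct and helper names here are illustrative, not Monter's:

#include <cmath>

struct quat { float W, X, Y, Z; };

// Signed angle between two vectors lying in the XZ plane, measured around the
// Y axis. Inputs need not be normalized; the common scale cancels in atan2.
static float AngleAroundY(float AX, float AZ, float BX, float BZ)
{
    float Dot   = AX * BX + AZ * BZ;  // proportional to cos(theta)
    float Cross = AX * BZ - AZ * BX;  // proportional to sin(theta), sign gives direction
    return atan2f(Cross, Dot);
}

// Quaternion for a rotation of Angle radians about the Y axis.
static quat QuatAboutY(float Angle)
{
    return { cosf(0.5f * Angle), 0.0f, sinf(0.5f * Angle), 0.0f };
}

// Hamilton product; concatenating this with the entity's local orientation
// applies the drag rotation on top of it.
static quat QuatMul(quat A, quat B)
{
    return {
        A.W * B.W - A.X * B.X - A.Y * B.Y - A.Z * B.Z,
        A.W * B.X + A.X * B.W + A.Y * B.Z - A.Z * B.Y,
        A.W * B.Y - A.X * B.Z + A.Y * B.W + A.Z * B.X,
        A.W * B.Z + A.X * B.Y - A.Y * B.X + A.Z * B.W,
    };
}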

Ditched <windows.h>

Yep. I no longer need the windows.h header in my platform layer. Inspired by Casey's hint about removing windows.h in Handmade Hero, I pulled all the struct and function declarations I actually use into my own custom header, which is only 700 lines. This reduced the game's compile time by a whole second.
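To give a flavor of what such a header contains, here's a tiny slice using the virtual memory functions as an example. The declarations mirror the documented Win32 signatures, but this is just an illustration, not Monter's actual header:

// Instead of #include <windows.h>, declare only what the platform layer uses.
#include <cstddef>

#define MEM_COMMIT      0x00001000
#define MEM_RESERVE     0x00002000
#define MEM_RELEASE     0x00008000
#define PAGE_READWRITE  0x04

extern "C"
{
    __declspec(dllimport) void *__stdcall VirtualAlloc(void *BaseAddress, size_t Size,
                                                       unsigned long AllocationType,
                                                       unsigned long Protect);
    __declspec(dllimport) int __stdcall VirtualFree(void *BaseAddress, size_t Size,
                                                    unsigned long FreeType);
}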

Last but not Least, New Assets

A picture is worth a thousand words:


The point of importing more detailed assets into the game at this stage is to field test Monter's renderer, making sure it looks good for both low-poly and high-detail models. The result isn't half bad. That being said, this is placeholder art, and better models will be made!
Chen
Hi everyone. This post is just me picking up where I left off in the last blog post, so it's gonna be a short one.

It is often the case that temporary shaders are written for debugging purposes and need to be added and removed frequently. The same goes for merging shaders to speed up rendering and splitting shaders for more reusability. If we build shaders separately and store them in their own files, managing those files and editing the code that builds and stores the shaders quickly becomes an unpleasant hassle. Unable to bear this chore, I set out to resolve the issue.

Runtime Uber Shader

My first approach to this problem was a runtime uber shader. A runtime uber shader is a shader that contains all the shader subroutines. Instead of binding different shaders by calling glUseProgram() for different operations, I permanently bind the uber shader and upload some kind of flag to change which of its routines is executed.

An example uber shader would look something like this:

*some data layout here*

uniform int Mode;

void main() 
{
    if (Mode == SHADER_MODE_PHONG_SHADE) 
    {
        //phong shading code
    }
    else if (Mode == SHADER_MODE_BLUR)
    {
        //blur code
    }
    else if (Mode == SHADER_MODE_DEPTH_PASS)
    {
        //depth pass code
    }
    //and a lot of other subroutines
}


And the C++ code that uses this shader looks like this:

UploadShaderUniform("Mode", SHADER_MODE_PHONG_SHADE);
DrawWorld();
PrepareStatesForBlur();
UploadShaderUniform("Mode", SHADER_MODE_BLUR);
DrawFullscreenQuad();


It pretty much solves the shader management problem. You only have to build one shader and keep all the shader code in one place, and it runs different code depending on the mode you set. It worked pretty well at first, so I adopted it and used it to build out the renderer.

Uber Shader Performance

After I finished the renderer pipeline with the uber shader, it had a dozen branches. It was holding up pretty well until I moved my development environment from a PC to a laptop with a crappy GPU. When I ran the game on the new laptop, rendering was dramatically slower; it was so severe that the game barely ran at 60 FPS with just phong shading on. That led me to suspect that the uber shader approach itself was slowing everything down. There was no way to prove it unless I pulled the shader apart into small pieces that run on their own, so I started pulling them out.

My suspicion turned out to be correct. Running the shaders without the branching was substantially faster. Now, this behavior is probably hardware dependent, but I want Monter to run even on machines with crappy hardware, so the uber shader approach isn't gonna cut it.

"Compile-time Branching" in Shaders

We are back to square one, with small pieces of shaders lying around that need to be managed manually. Another simple solution that came to mind is to #define each part of the shader code and store them all in one file. When compiling the file into a certain type of shader, we insert "#define <SHADER_TYPE>" at the top of the string so that the shader compiler only compiles the code for that shader.

Here’s what the shader looks like:

//common code

#if defined(SHADER_PHONG_SHADE)
//phong shading code
#endif

#if defined(SHADER_DEPTH_PASS)
//depth pass code
#endif

#if defined(SHADER_BLUR)
//blur code
#endif


Here's the C++ code that builds these shaders:

shader PhongShader = BuildShader(ShaderCode, "#define SHADER_PHONG_SHADE");
shader DepthPassShader = BuildShader(ShaderCode, "#define SHADER_DEPTH_PASS");
shader BlurShader = BuildShader(ShaderCode, "#define SHADER_BLUR");


Automatic Shader Construction

We now have a single place to store all the shaders, but it's still a hassle to visit the shader building code this frequently. I wanted to do better.

The first thing I noticed is that, when we are compiling the shader, we can already deduce what shaders there are by looking at the compile-time branching code. If each shader segment is marked, we can compile all those segments separately and dump them into a shader table. To "mark" the shader segments, I replaced the #define's with my own annotation syntax, which is preprocessed by the shader builder code. I also designed the annotations to contain name tags, which are used as keys when the shader table is built. This way, the shader builder can intelligently pull the shader segments out of that file, compile them into separate shaders, and insert them into a table under their own unique keys. Things become very convenient; we only have to edit the shader code, and everything else is automated.

Here’s the previous shader code transformed into annotated form:

//common code

@begin SHADER_PHONG_SHADE => PhongShade
//phong shading code
@end

@begin SHADER_DEPTH_PASS => DepthPass
//depth pass code
@end

@begin SHADER_BLUR => Blur
//blur code
@end


If the game code needs to use the shader, all it needs to do is:
BindShader(ShaderTable["DepthPass"]);
//draw calls that do something
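Under the hood, the preprocessing step can be a fairly dumb string scan. Here's a minimal sketch in C++ (standard library only; the helper names are hypothetical and this is not Monter's actual builder):

#include <map>
#include <string>

// Crude whitespace trim used when parsing the "@begin ... => Name" header.
static std::string Trim(const std::string &S)
{
    size_t Begin = S.find_first_not_of(" \t\r");
    if (Begin == std::string::npos) return "";
    size_t End = S.find_last_not_of(" \t\r");
    return S.substr(Begin, End - Begin + 1);
}

// Split an annotated shader file into {name tag -> compilable source}. Every
// segment gets the code that lives outside all @begin/@end blocks plus its own
// body, which is all the compile-time-branching scheme needs.
static std::map<std::string, std::string> ExtractShaderSegments(const std::string &File)
{
    std::map<std::string, std::string> Bodies;
    std::string Common;

    size_t Cursor = 0;
    for (;;)
    {
        size_t Begin = File.find("@begin", Cursor);
        if (Begin == std::string::npos)
        {
            Common += File.substr(Cursor);   // everything left is common code
            break;
        }
        Common += File.substr(Cursor, Begin - Cursor);

        size_t HeaderEnd = File.find('\n', Begin);
        size_t End = File.find("@end", HeaderEnd);
        if (HeaderEnd == std::string::npos || End == std::string::npos) break; // malformed

        // header looks like "@begin SHADER_DEPTH_PASS => DepthPass"
        std::string Header = File.substr(Begin, HeaderEnd - Begin);
        std::string Name = Trim(Header.substr(Header.find("=>") + 2));

        Bodies[Name] = File.substr(HeaderEnd + 1, End - (HeaderEnd + 1));

        size_t AfterEnd = File.find('\n', End);
        Cursor = (AfterEnd == std::string::npos) ? File.size() : AfterEnd + 1;
    }

    // each segment compiles as: common code followed by its own body
    std::map<std::string, std::string> Result;
    for (const auto &Pair : Bodies)
    {
        Result[Pair.first] = Common + "\n" + Pair.second;
    }
    return Result;
}

The builder can then compile each entry and insert the resulting shader object into the shader table under its name tag.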


Victory

At this point, all the hassles have been automated. Adding a new shader to the system is as easy as typing a few dozen characters in the shader file. Not to mention that Monter has a live shader editing system, so new shaders can be inserted while the game is running. How nice is that!
Chen
As promised, I will be doing a full review of the game engine in its current state. This post is dedicated to the renderer in my game engine. I will give you a high-level overview, how the renderer came about, and some interesting implementation details.

Overview



Like every other game, Monter's rendering pipeline is not just a single draw call. It involves precomputing textures such as shadow maps and ambient occlusion textures to help enrich the visual scene when the actual rendering starts. It also has a post-processing stack that smooths out jagged edges and emulates color bleeding from bright surfaces. Each pass, except shadow mapping, will be discussed in detail in the following sections.

Shadow Map Pass

This pass was already covered in my first blog post, so I won't explain it in detail. All it does is generate the depth texture we use to determine which points are shadowed when the actual scene rendering happens.

SSAO Pass (Position + AO + Blur)

Screen-space ambient occlusion is a method that approximates global illumination at a small scale. For each point on the screen, an AO (ambient occlusion) factor is computed based on its surrounding geometry. These AO values are stored in a texture which is used to approximate how much ambient light each point receives when it gets phong-shaded.

In Monter, the SSAO pass consists of three draw calls: position gather, AO calculation, and a 4x4 blur. Position gather is a pass in which each pixel's corresponding view-space position is rendered to a texture (why view space? explained later). The resulting position texture contains the geometry information we need to compute an AO factor for each pixel, because we can query the neighboring positions at any pixel on the screen. So we compute an AO factor for each pixel and store them in a new texture. The AO calculation goes as follows:

1. Treat the position texture as a function of two real numbers, x and y, outputting a 3D position. The partial derivatives of this function along the screen's x and y axes are computed at each point and used to find the normal of each position, which is the normalized cross product of these derivatives. This comes in handy when we compute the AO factor.

2. We now want to distribute a bunch of random sample points within the hemisphere above that point, oriented along the normal we just calculated. Then we project each sample back into the position texture and test its depth against the depth retrieved from the position texture. If the sample's depth is smaller (closer to the camera), that sample is not occluded; otherwise it is occluded. Of course, the requirement for this to work is to sample positions in view space, which is why we store positions in view space during the position gather pass.

To accomplish this, a certain number of random unit vectors (offsets) are generated with the aid of a 4x4 texture containing random noise tiled over the screen. Then, for each offset, we compute a sample point by adding the offset to the original point. This generates a bunch of random samples within the sphere centered at the point. However, that is not enough; we want all the samples to lie inside the hemisphere. So, for each offset that makes an angle of more than 90 degrees with the normal (easily testable with a dot product), we simply negate it. That ensures all of our offsets lie within the hemisphere above the point along its normal.

An extremely simplified version of the SSAO code in Monter's renderer:
float AO = 0.0f; //where we store AO factor

//get normal
vec2 dXPos = Fragment.TexCoord + vec2(1, 0) * TexelSize;
vec2 dYPos = Fragment.TexCoord + vec2(0, 1) * TexelSize;
vec3 dX = texture(PositionsTex, dXPos).xyz - Origin;  //derivative relative to X
vec3 dY = texture(PositionsTex, dYPos).xyz - Origin;  //derivative relative to Y
vec3 Normal = normalize(cross(dY, dX));
    
//calculate occlusion factor
float OcclusionCount = 0;
for (int SampleIndex = 0; SampleIndex < SSAO_SAMPLE_MAX; ++SampleIndex)
{
    vec3 RandomOffset = RandomOffsets[SampleIndex];
    if (dot(Normal, normalize(RandomOffset)) < 0.0f)
    {
       RandomOffset = -RandomOffset;
    }
    vec3 Sample = Origin + SSAOSampleRadius * RandomOffset;	
    OcclusionCount += SampleIsOccluded(Sample);
}
   
AO = OcclusionCount / float(SSAO_SAMPLE_MAX);


The code is extremely simplified for ease of understanding. I'm hiding a lot of details here, especially in the SampleIsOccluded() function, but for now just imagine it to be:

int SampleIsOccluded(vec3 Sample)
{
    //Project() function does projection & manual W division & remapping to [0, 1] range
    vec3 NeighborPosition = texture(PositionsTex, Project(Sample).xy).xyz;
    if (NeighborPosition.z < Sample.z) return 1;
    return 0;
}


The occlusion test for each sample is just a simple depth test. After all this, you sum up the samples that are not occluded and you are left with the AO factor you are looking for. However, in my case, I'm not finished yet. I used a 4x4 tiled texture of random vectors to generate the noise. As a result, there is banding noise caused by the 4x4 pattern repetition. So all I have to do is a 4x4 blur on the AO texture to remove that artifact.

If you are looking to implement this feature yourself, I highly recommend Chapman's SSAO blog, as it covers a lot of details that I didn't mention.

SSAO Optimization

The original SSAO method presented in Chapman's blog produces good results, but it's too slow. In Monter's renderer, the entire AO pass is done on a low resolution texture, which is upsampled when needed for rendering. This speeds up the AO pass by at least 4 times, but the result is often blurry and contains halo artifacts. This can be compensated for by using a bilateral blur and bilateral upsampling.

AO factor texture:


Don’t Abuse SSAO

Despite how much realism it adds to the scene, it's meant to be a subtle effect and shouldn't be exaggerated in a game. It's nowhere near physically accurate in most cases and is quite appalling to look at when used incorrectly. Here's a tweet showing how overusing SSAO totally ruins the look of your game. So don't worship SSAO.

Drawing the Scene (Lambertian)

If you have done any 3D programming, you must know what lambert shading is. It's the most basic lighting model and is, to a certain degree, physically accurate for diffuse (rough) surfaces, which is all I plan to have in Monter (you can probably tell I'm going with a low-poly style for the game, and it goes nicely with purely diffuse surfaces). Here's the basic intuition for lambert lighting:

A diffuse surface is really rough, so light will not bounce off in a single mirror direction from the point it hits. Rather, it scatters randomly, and the direction of the exiting light could be any of the unit vectors in the hemisphere above the point (all with equal probability). Therefore, we can think of the light exitance as completely independent of the view direction, since you get roughly the same amount of light no matter where you view the surface from.

To calculate the light attenuation on a surface, the angle between the surface normal and the incoming light direction is the key. Imagine a stream of photons hitting a surface head-on; the direction of that stream is the negation of the surface's normal. Now imagine it makes an angle of 45 degrees instead. The same number of photons hits the surface per second, but the surface area that gets hit increases, so the photons-hit-per-area, i.e. the photon density on the surface, decreases. If you work it out geometrically, the attenuation of the light intensity is the cosine of the angle between the negated incoming light direction and the surface normal. This is called the lambert shading model, and it's one of the most common reflection equations used in games.

In Monter, there are two types of light: directional lights and point lights. It's important to point out that a point light does not only attenuate based on the incident angle; it also attenuates with distance. Don't get me wrong, photons don't lose energy as they travel unless they collide with other substances such as dust or fog, but we are not trying to simulate that here. It attenuates purely because the light radiates as an expanding sphere: the further away the surface is, the less dense the photon stream hitting it will be. Due to the way a sphere's surface area grows, the light intensity falls off with the inverse square of the distance. In order to have complete control over how each light behaves, I don't just use the inverse square equation for distance attenuation; as you will see in a bit, there are other parameters I can tweak to change the lighting.

Now we know the relationship between incoming light and light exitance, but we also need the light to interact with the surface's material to show any color on the screen. One thing to realize is that surfaces do not have color; they absorb a certain part of the light and reflect the rest (not exactly what's happening, but it's a good enough substitute model). So I simply multiply the light exitance with the color of the surface.

Again, here's a simplified version of the lambert lighting code in Monter:

vec3 DiffuseTerm;
if (Light.Type == LIGHT_TYPE_POINT)
{
	float LightDist = length(Position - Light.P);
	vec3 IncidentDirection = normalize(Position - Light.P);
	vec3 DiffuseColor = MatColor * AtLeast(0.0f, -dot(Normal, IncidentDirection));
	DiffuseTerm = ((Light.Color * DiffuseColor) / 
				   (Light.Constant + 
					Light.Linear * LightDist + 
					Light.Quadradic * Square(LightDist)));
} 
else if (Light.Type == LIGHT_TYPE_DIRECTIONAL)
{
	vec3 IncidentDirection = normalize(Light.Direction);
	vec3 DiffuseColor = MatColor * AtLeast(0.0f, -dot(Normal, IncidentDirection));
	float Visiblity = UseShadow? GetVisiblity(Fragment.P, Light): 1.0f;
	DiffuseTerm = Visiblity * (Light.Color * DiffuseColor);
} 

LightExitance = AO * AmbientTerm + DiffuseTerm;    //"correct" formula
//LightExitance = AO * (AmbientTerm + DiffuseTerm);  //the one used in Monter


Now, remember that I calculated the AO factor only for ambient light, so it should be multiplied with the ambient term only. But after experimenting with the rendering, I decided to also multiply it with the diffuse term for a better visual effect.

Lambert Shading + Shadow + SSAO:


HDR and Tone Mapping

By default, when you write color values to a texture or the default framebuffer, they are clamped to [0, 1]. This greatly limits how we can design our scenes; if we want an area with a lot of light, it's going to be all white because of this limitation. One way to work around this is to render the scene into an HDR texture, then map it back to the [0, 1] range. This process is called tone mapping.

This is all pretty straightforward except the part where we convert HDR to low-dynamic-range (LDR) color. How are we going to do it? Currently Monter's engine uses Reinhard tone mapping, which is just the following in GLSL:

vec3 LDR = HDR / (HDR + 1.0f); 


Pretty simple and gets the job done, but not done well. If I have time, I'd much prefer a tone mapping operator that handles this properly. But for the time being, I will just have to bear with Reinhard tone mapping.

Bloom

Since we are no longer limited by the range of color, we can do something fancy like a bloom effect. It's pretty simple: when we output the scene, we write to two HDR buffers, the scene buffer and, if the color is above a certain threshold, a second HDR buffer. After that, we do a gaussian blur on the second HDR texture, which only contains the bright pixels. Then we add the two HDR textures together and tone map the final output.
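As a rough sketch of the bright pass and the final composite, in plain C++ standing in for the shader math (the Rec. 709 luminance weights and the threshold parameter are my choice for illustration, not necessarily what Monter uses):

struct hdr_color { float R, G, B; };

// Bright pass: anything whose luminance exceeds Threshold goes into the bloom buffer.
static hdr_color BrightPass(hdr_color Hdr, float Threshold)
{
    float Luma = 0.2126f * Hdr.R + 0.7152f * Hdr.G + 0.0722f * Hdr.B; // Rec. 709 weights
    if (Luma > Threshold) return Hdr;
    return {0.0f, 0.0f, 0.0f};
}

// Composite: add the (blurred) bloom back onto the scene, then apply the same
// Reinhard operator shown above to bring the result into [0, 1].
static hdr_color Composite(hdr_color Scene, hdr_color BlurredBloom)
{
    hdr_color Sum = { Scene.R + BlurredBloom.R,
                      Scene.G + BlurredBloom.G,
                      Scene.B + BlurredBloom.B };
    return { Sum.R / (Sum.R + 1.0f),
             Sum.G / (Sum.G + 1.0f),
             Sum.B / (Sum.B + 1.0f) };
}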

Bloom Optimization

Like SSAO, bloom is also too slow if we do it at the same resolution as the default framebuffer. So we repeat what we did to speed up SSAO: we downsample to a low resolution texture and upsample it when we do the addition.

FXAA

FXAA is a post-processing effect that blurs jagged edges. While it smooths out the jagged edges, it suffers from temporal aliasing, so it's best combined with some temporal anti-aliasing technique. If you're interested in more detail, you should head over to this paper.

FXAA vs no FXAA comparison:


I still have a whole lot more to talk about, but that would make this post way too long, so I'll save it for the next one. Stay tuned.

Chen
Hi everyone. It's an honor to be featured by Handmade Network. I will be pushing out blog posts where I talk about the challenges I face during Monter's development. I've been thinking long and hard about what to write for a first post. The best choice seemed to be a review of what I've done for the game engine so far, but I've just finished shadow mapping, so I decided to go with that this time while it's still fresh. A review of the entire engine will probably be the topic of my second post, so stay tuned.

Shadow in Real Time Rendering

The cleanest way to do shadows, which is what most ray tracers do, is raycasting against the light casters. When computing the radiance from a surface point to the screen, a ray is cast from the surface point to the light caster, and if it's obstructed by any shadow occluder, the surface point is shadowed. However, testing rays against meshes made of polygons is expensive. The process can be accelerated with kd-trees and the like, but that doesn't integrate easily into my rendering pipeline, which mainly runs on the GPU. It also raises complexity, and bugs creep out of that. The more time I spend on the engine before actually prototyping the game, the more likely Monter will stall in development. So this method is a no-no.

The low-complexity way of doing shadows is shadow texture mapping (or, more accurately, depth mapping). It's standard practice for games because it's relatively cheap and simple to implement. But it's pretty nasty, and we are going to talk about the shadow mapping implementation in Monter and why I think it's an unpleasant experience.

The General Idea

The general idea of shadow mapping is pretty simple. You move the camera to where the light caster is and point it in the direction the light shines. You render a depth map of the scene and store it away as a texture. After that, when you are lighting any surface point in the scene, you can transform that point into the light caster's view space and compare its depth against the depth value stored in the texture at the texel the point maps to. If the depth of the shaded point is deeper than the one queried from the texture, then we know this point is occluded from the light caster, since some surface has a smaller depth value than that point. It might seem like a clever idea at first, but it's hard to execute well in practice. While implementing shadow mapping for Monter's renderer, a lot of gross artifacts appeared. I am going to list them, explain why they occur, and show how I eliminated them (mostly) in the following sections.
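In code, the core test boils down to something like this minimal C++ sketch. It assumes the fragment position has already been multiplied by the light's view-projection matrix (orthographic, so no perspective divide) using the usual OpenGL [-1, 1] clip range; SampleShadowDepth stands in for the depth texture fetch and is not Monter's actual function:

// Returns 1 if the point receives light, 0 if something closer to the light occludes it.
static float ShadowVisibility(float LightClipX, float LightClipY, float LightClipZ,
                              float (*SampleShadowDepth)(float U, float V))
{
    // remap from [-1, 1] clip space to [0, 1] texture / depth space
    float U = 0.5f * LightClipX + 0.5f;
    float V = 0.5f * LightClipY + 0.5f;
    float FragmentDepth = 0.5f * LightClipZ + 0.5f;

    float OccluderDepth = SampleShadowDepth(U, V);

    // a smaller stored depth means some surface sits between this point and the light
    return (FragmentDepth > OccluderDepth) ? 0.0f : 1.0f;
}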

Shadow Quality Problem

OK, so we are going to render the scene into a depth texture. And we know one thing about texture mapping: if too many pixels map to the same texel, the rendered image looks blocky. If we are not careful with the way we render the depth texture, this can happen to us too. The first thing to realize is that we are mapping a scene onto a finite texture. If we want the shadow mapping pass to run fast, we'd better not improve shadow quality by increasing the texture resolution, since the cost goes up in both memory and performance. Our only option is decreasing the amount of the scene that gets mapped into the depth texture, which results in fewer pixels mapping to the same texel.

The only part of the scene that needs to be covered by the shadow map is whatever the camera is viewing. So we can use the view frustum to deduce how to place our light caster camera to make the best use of the texture. For now, let's assume the light caster is the sun, with all light rays parallel, so I use an orthographic projection here. We can easily find the view frustum corners in world space, then fit them tightly with a bounding box in light view space. The near plane needs special care, though, because we need to include all shadow occluders present in the scene, even the ones outside the view frustum. The bounding box we computed in light view space becomes the light view frustum used for rendering the scene into the depth texture. That way, I made sure every pixel that gets rendered to the screen is covered by the shadow depth texture, at the highest quality possible (not really). The shadow quality still turned out to be terrible, even with a 2048x2048 depth texture on a normal scene. So apparently, our "most optimized" method is not enough.
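A sketch of the fitting step, assuming the 8 frustum corners have already been transformed into light view space (the names and the ExtraNearRange parameter are illustrative, not Monter's):

#include <cfloat>

struct v3 { float X, Y, Z; };

struct ortho_bounds { float MinX, MaxX, MinY, MaxY, MinZ, MaxZ; };

// Tightly bound the 8 view-frustum corners (already in light view space) with an
// axis-aligned box; that box becomes the light's orthographic frustum.
// ExtraNearRange pulls the box back toward the light so occluders outside the
// view frustum can still cast shadows into it.
static ortho_bounds FitLightFrustum(const v3 Corners[8], float ExtraNearRange)
{
    ortho_bounds B = {FLT_MAX, -FLT_MAX, FLT_MAX, -FLT_MAX, FLT_MAX, -FLT_MAX};
    for (int I = 0; I < 8; ++I)
    {
        if (Corners[I].X < B.MinX) B.MinX = Corners[I].X;
        if (Corners[I].X > B.MaxX) B.MaxX = Corners[I].X;
        if (Corners[I].Y < B.MinY) B.MinY = Corners[I].Y;
        if (Corners[I].Y > B.MaxY) B.MaxY = Corners[I].Y;
        if (Corners[I].Z < B.MinZ) B.MinZ = Corners[I].Z;
        if (Corners[I].Z > B.MaxZ) B.MaxZ = Corners[I].Z;
    }
    B.MinZ -= ExtraNearRange; // which end of Z is "near" depends on your convention
    return B;
}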

Cascaded Shadow Maps

One important realization is that the player pays almost all of their attention to the geometry close to the camera, such as the player character and enemies, while geometry farther away, like mountains, is ignored. Objects that are closer need high shadow quality because that's what the player will always be looking at, but not so much for distant objects. So it's reasonable to distribute more texels to the closer geometry and fewer texels to the farther geometry, even though the farther geometry is much bigger in size and volume.

With that in mind, the solution to the problem is cascaded shadow maps. It's a rather simple idea: the view frustum is chopped up into smaller sub frustums, and each one of them is rendered into a separate depth texture of equal size (which is not strictly necessary, and I will explain why in a minute). Therefore, the closest quarter of the view frustum gets the same shadow map resolution as the farthest quarter, which is what we want here.



When I implemented cascaded shadow maps, I didn't make separate depth textures for each sub frustum; instead, I simply stored all four depth maps in a texture atlas. It saves the hassle of creating new framebuffers and switching textures when rendering different slices of the scene.

The resulting image from this technique is much nicer and almost acceptable. There are probably smarter ways to chop up the view frustum, such as giving the closer scene more texels and the farther scene fewer, but I stopped digging further and stayed with 4 equal sub frustums with the same z length, since the result is good enough for a first pass.

Cascaded Shadow Map Artifacts

Artifacts are introduced because we refit the frustum every frame. When the view camera rotates or translates, the edges of the shadows shimmer. It's due to the fact that the same surface points do not map to the same shadow map texels across frames, because the shadow map "wiggles" too much. I eliminated this artifact by snapping the orthographic light caster view frustum to multiples of the texel size in world units.
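The snapping itself is just rounding the orthographic bounds down to whole texel increments. A sketch, with hypothetical parameter names:

#include <cmath>

// Snap a light-space coordinate to whole texel increments so the shadow map
// doesn't "wiggle" as the camera moves. WorldSize is the orthographic extent
// covered by the shadow map; ShadowMapRes is its resolution in texels.
static float SnapToTexel(float LightSpaceCoord, float WorldSize, int ShadowMapRes)
{
    float TexelSizeInWorldUnits = WorldSize / (float)ShadowMapRes;
    return floorf(LightSpaceCoord / TexelSizeInWorldUnits) * TexelSizeInWorldUnits;
}

// Usage sketch: snap the X/Y extents of the orthographic bounds before building
// the light's projection matrix, e.g.
// Bounds.MinX = SnapToTexel(Bounds.MinX, Bounds.MaxX - Bounds.MinX, 2048);
// Bounds.MinY = SnapToTexel(Bounds.MinY, Bounds.MaxY - Bounds.MinY, 2048);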

Another artifact is the "seams" between sub frustums. They're not really seams, but the noticeable quality difference right at the boundary between two sub frustums can be an unpleasant artifact. It occurs in Monter, but it's not really noticeable, so I am letting it pass for now. Again, in order to get to the game prototyping phase as fast as possible, polishing work must be deferred.

Shadow Acne

Another noticeable artifact is shadow acne, a phenomenon where the renderer decides that a surface shadows itself. The cause is, again, too many pixels mapping to the same texel in the depth texture. During the shadow mapping pass, some pixels will map to the same texel, since we have a finite amount of texture to work with. These pixels can have different depth values, yet they are all compared against the same shadow map depth value because the pixel/texel ratio isn't 1:1. It's clear that some pixels will be falsely shadowed, even though they shouldn't be. Here's what it looks like:



The first thing I did was turn on bilinear filtering, so that four depth texels are fetched at once and blended before comparing with the pixel. This mitigates the artifact but doesn't remove it completely. I then tried adding a depth bias to the depth value being tested; it gives some room between the previously falsely-shadowed pixels and the depth value sampled from the texture. That eliminates the self-intersection issue on most surfaces, but not on surfaces whose normal is orthogonal to the light caster direction. A bias offset along the normal of the surface fixes those. Here's what the same image looks like after the fix:



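In code form, one common reading of those two tweaks amounts to something like this sketch (the bias magnitudes are placeholders to be tuned per scene, not Monter's values):

struct v3 { float X, Y, Z; };

// Offset the point being tested along its surface normal before transforming it
// into the light's space; this pulls grazing-angle surfaces out of their own shadow.
static v3 ApplyNormalOffsetBias(v3 WorldP, v3 Normal, float NormalBias)
{
    return { WorldP.X + Normal.X * NormalBias,
             WorldP.Y + Normal.Y * NormalBias,
             WorldP.Z + Normal.Z * NormalBias };
}

// The plain depth bias just loosens the comparison itself:
// bool Shadowed = (FragmentDepth - DepthBias) > OccluderDepth;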
Other small details

If a user's computer can't run the game fast enough, the shadow map texture will have to use a lower resolution, and the shadow is going to look blocky. To mitigate the jagged edges, I sample the neighboring depth values and blend them, which blurs the shadow edges as a result.

Recall that I said I should fit the near plane of the light caster view frustum to the tallest shadow occluder; that can be simplified in Monter. Since this is a top-down game, there's a limit to how tall an object can be, so I just set a magic near plane value for the sun light caster, since all objects are in front of that plane. It's subject to change, though.

Final result

The shadow generated by this technique adds a great deal of realism to the final scene. Here's a comparison of what it looks like with and without shadows: