A practical shadow mapping pipeline

Shadow mapping is a well-known technique, first introduced in 1978 by Lance Williams.

It has since evolved into many variations and tweaks aimed at running at interactive frame rates,
but the foundation of the technique remains the same:

1. Capture a depth buffer from the light's point of view (commonly called a shadow map). The projection used in this step depends on the type of light being represented. A directional light casts parallel rays, so its volume is a cuboid and an orthographic projection fits. For a point light, one or several perspective projections can be used to cover all the directions the light casts to.
2. For each fragment in the current point of view, transform it to light space coordinates (this step can be tedious, as the matrix math involved requires care), then transform the light space coordinates to shadow map texture coordinates.
3. Compare the light space depth of the fragment computed in 2. to the depth sampled from the shadow map saved in 1.
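
Steps 2 and 3 can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the code used in the pipeline: the 4x4 `ShadowMap`, `texelIndex` and `isLit` are hypothetical names, and the fragment is assumed to be already transformed to the light's normalized device coordinates.

```cpp
#include <algorithm>
#include <array>

// Hypothetical 4x4 shadow map storing light-space depths in [0, 1].
constexpr int kMapSize = 4;
using ShadowMap = std::array<float, kMapSize * kMapSize>;

struct Vec3 { float x, y, z; };

// Nearest-texel index for a texture coordinate in [0, 1], clamped to the edge.
int texelIndex(float t) {
    int i = static_cast<int>(t * kMapSize);
    return std::min(std::max(i, 0), kMapSize - 1);
}

// lightNdc: fragment position in the light's normalized device coordinates,
// each component in [-1, 1]. Returns true when the fragment is lit.
bool isLit(const ShadowMap& shadowMap, Vec3 lightNdc, float bias) {
    // Step 2: [-1, 1] -> [0, 1] for the texture coordinates and the depth.
    float u = lightNdc.x * 0.5f + 0.5f;
    float v = lightNdc.y * 0.5f + 0.5f;
    float fragmentDepth = lightNdc.z * 0.5f + 0.5f;

    // Step 3: compare against the depth recorded from the light.
    float storedDepth = shadowMap[texelIndex(v) * kMapSize + texelIndex(u)];
    return fragmentDepth - bias <= storedDepth;
}
```

A fragment is considered lit when nothing closer to the light was recorded at its texel; the small `bias` keeps a surface from shadowing itself.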

This post describes one of several possible pipelines to implement shadow mapping using OpenGL, adapted to target mobile hardware following the OpenGL ES 2.0 specification. A modern OpenGL implementation would look quite different from the one presented here.

Implementing shadow maps for a multiplatform solution can quickly become cumbersome, especially on mobile, where a big share of the market might not support the extension you need. When I started implementing the technique, I found that using a common solution across all targets was simpler than checking extension support and discovering rendering issues on a few of the targets later on. Using the simplest variant of shadow mapping is also a good trade-off: the quality will not be the best, but you need fewer features from the driver, with a potential gain in performance.

An issue not so related to the hardware is shadow map acne, which can be mitigated with a small bias. There is no real workaround for this unless you use another filtering technique such as variance shadow maps. Acne happens naturally because of the finite resolution of the texture used to store the shadow map.

Before implementing a rendering technique for a non-specific target, I keep a handy list of resources to check GPU hardware shares and extension support for each of the needed features:

- http://opengles.gpuinfo.org/gles_extensions.php
- https://hwstats.unity3d.com/mobile/

In order to implement shadow mapping using OpenGL, we need at least a framebuffer with a way to store and sample the depth from the light's point of view. A framebuffer with a depth texture attachment is, in this case, an extension, as the target is OpenGL ES 2.0 hardware. Here is the description of the OES_depth_texture extension (from https://www.khronos.org/registry/...ensions/OES/OES_depth_texture.txt):

This extension defines a new texture format that stores depth values in the texture. Depth texture images are widely used for shadow casting but can also be used for other effects such as image based rendering, displacement mapping etc.

The great news is that this is exactly what we were looking for, and it is supported by 94% of the hardware. One problem is that the attachment texture may or may not behave as you expect, depending on the driver.

As per the OpenGL ES spec, there is no guarantee that the OpenGL ES implementation will use the <type> to determine how to store the depth texture internally. It may choose to downsample the 32-bit depth values to 16-bit or even 24-bit. There is currently no way for the application to know or find out how the depth texture (or any texture) will be stored internally by the OpenGL ES implementation.

Relying on this extension is risky for the graphics of your game: driver implementations can differ, and you may not have a device with each specific GPU model supporting the extension to test on.

The extension might also not be implemented the way you think it is, which is unfortunately quite common in the GL ecosystem. A safe choice in those cases is to use only what the official OpenGL specification of your target (in this particular case 2.0) states as supported, and to find workarounds for whatever else your rendering technique needs.

Depending less on driver implementations is important if your engine targets numerous platforms.

Depth value storage

1. Depth write pass (generating the shadow map)

In this implementation, the depth is stored as a packed RGB value in the color texture attachment of the framebuffer so we don't rely on a depth texture attachment.

glGenTextures(1, &textureHandle);
glBindTexture(GL_TEXTURE_2D, textureHandle);

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

// Allocate a power-of-two color texture and attach it to the currently bound framebuffer
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, textureHandle, 0);

Note that if you are using this technique, the texture attachment must be filtered using GL_NEAREST: the packed depth values must not be interpolated when sampling.

NPOT (Non Power Of Two) textures are an issue on some mobile hardware (extension GL_OES_texture_npot). So instead of creating a framebuffer with an attached texture the size of the default viewport, the texture attachment is set to a resolution of 512x512 pixels, which also reduces the footprint of the shadow map for both depth writing and depth sampling.

The fragment shader used to pack the depth value is the following:

// Packs a depth value in 0..1 into three 8-bit components (24 bits in total),
// z holds the most significant bits and x the least significant ones
vec3 packDepth888(float depth) {
    vec3 packedDepth;

    packedDepth.x = depth * 65536.0;
    packedDepth.y = depth * 256.0;
    packedDepth.z = depth;
    packedDepth.x = fract(packedDepth.x);
    packedDepth.y = fract(packedDepth.y) - packedDepth.x / 256.0;
    packedDepth.z = fract(packedDepth.z) - packedDepth.y / 256.0;

    return packedDepth;
}

void main() {
    // Store the packed depth in the framebuffer texture attachment
    gl_FragColor = vec4(packDepth888(gl_FragCoord.z), 1.0);
}

A separate vertex buffer is used in this pass, with a vertex layout containing only the position of the meshes.

The packed depth looks like this:

Which is a quantized representation of the following:

Using this technique, there will still be precision issues, but we will make sure that the precision is maximized when generating the view frustum for this pass.
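
To get a feel for these precision issues, the packing can be replayed on the CPU. The sketch below mirrors the arithmetic of packDepth888 and of its inverse used in the sampling pass. It assumes that each 8-bit channel stores round(v * 255), which may not match every driver; `Rgb8`, `toUnorm8` and `maxRoundTripError` are hypothetical helpers introduced for the illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Rgb8 { std::uint8_t r, g, b; };

// Equivalent of GLSL fract(): fractional part of v.
static double fract(double v) { return v - std::floor(v); }

// Assumed storage model: an 8-bit normalized channel keeps round(v * 255).
static std::uint8_t toUnorm8(double v) {
    return static_cast<std::uint8_t>(std::lround(v * 255.0));
}

// Same arithmetic as the packing shader, followed by the 8-bit quantization.
Rgb8 packDepth888(double depth) {
    double x = fract(depth * 65536.0);
    double y = fract(depth * 256.0);
    double z = fract(depth);
    y -= x / 256.0;
    z -= y / 256.0;
    return { toUnorm8(x), toUnorm8(y), toUnorm8(z) };
}

// Same arithmetic as the unpacking done when sampling the shadow map.
double unpackDepth888(Rgb8 c) {
    return (c.r / 255.0) / 65536.0 + (c.g / 255.0) / 256.0 + c.b / 255.0;
}

// Largest round-trip error over `samples` evenly spaced depths in [0, 1).
double maxRoundTripError(int samples) {
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        double depth = static_cast<double>(i) / samples;
        double error = std::abs(unpackDepth888(packDepth888(depth)) - depth);
        worst = std::max(worst, error);
    }
    return worst;
}
```

Comparing `maxRoundTripError(100000)` with the depth bias used at sampling time gives a rough idea of whether the bias is large enough to absorb the storage error.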

2. Depth sampling pass

The depth values are unpacked and used as follows in the fragment shader:

float unpackDepth888(vec3 depthRGB) {
    return depthRGB.r / 65536.0 + depthRGB.g / 256.0 + depthRGB.b;
}

float texture2DCompare(sampler2D depths, vec2 uv, float compare) {
    // Retrieve and unpack the depth stored in the shadow map at uv
    float shadowMapDepth = unpackDepth888(texture2D(depths, uv).rgb);

    // step(edge, x) returns 0.0 if x is below edge, 1.0 otherwise, so this returns
    // 1.0 (lit) when compare <= shadowMapDepth and 0.0 (in shadow) otherwise
    // See https://thebookofshaders.com/glossary/?search=step for reference
    return step(compare, shadowMapDepth);
}

// Function returning 0.0 if the shadow coordinate shadowCoord is within the shadow, 1.0 otherwise
float getShadow(sampler2D shadowMap, float shadowBias, vec4 shadowCoord) {
    // Transform normalized device coordinates to texture coordinates [-1.0 .. 1.0] -> [0.0 .. 1.0]
    // (no perspective divide, which assumes an orthographic light projection where w is 1.0)
    vec2 uv = shadowCoord.xy * 0.5 + vec2(0.5);

    // Transform depth to [0 .. 1.0] range, with a slight offset of shadowBias
    float shadowCoordDepth = (0.5 * shadowCoord.z + 0.5) - shadowBias;

    // Compare the depth value with the shadow map stored value at this shadow coordinate position
    return texture2DCompare(shadowMap, uv, shadowCoordDepth);
}

shadowCoord is a value computed in the vertex shader and passed along to the fragment shader as a varying:

shadowCoord = lightModelViewProjection * vec4(position, 1.0);

Where lightModelViewProjection is the matrix transforming position to the light's point of view. This is the same matrix as the one used in the depth write pass. It is an important one, as it also defines the precision of your depth.

When calculating the uv used to sample the shadow map, a bias matrix can be used to move operations from the fragment shader to the vertex shader. lightModelViewProjection would be multiplied by this bias matrix, which scales by 0.5 and then translates by +0.5 on every axis.
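
As a minimal sketch of that idea, the bias matrix can be written out and applied with plain arrays. The `Mat4` and `Vec4` aliases and the `multiply`, `transform` and `biasMatrix` helpers are assumptions made for this example; a math library such as glm would normally provide the matrix type and the products.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
// Column-major 4x4 matrix, indexed as m[column][row] like OpenGL matrices.
using Mat4 = std::array<Vec4, 4>;

// Matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int k = 0; k < 4; ++k)
                out[c][r] += a[k][r] * b[c][k];
    return out;
}

// Matrix-vector product m * v.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c][r] * v[c];
    return out;
}

// Scale by 0.5, then translate by 0.5 on x, y and z: [-1, 1] -> [0, 1].
Mat4 biasMatrix() {
    return {{
        {0.5f, 0.0f, 0.0f, 0.0f},
        {0.0f, 0.5f, 0.0f, 0.0f},
        {0.0f, 0.0f, 0.5f, 0.0f},
        {0.5f, 0.5f, 0.5f, 1.0f},
    }};
}
```

With `multiply(biasMatrix(), lightModelViewProjection)` uploaded in place of the light matrix, the vertex shader output is already in the [0, 1] range and the fragment shader no longer needs the `* 0.5 + 0.5` remapping.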

Shadow map bias

Shadow acne is a visual artifact that happens because the shadow map has a limited resolution, and shadow map bias is the way to correct it. You can imagine the scene from the light's point of view as perceived through a grid. Each grid bucket (commonly called a texel) is assigned a single depth, which inherently can't represent the precise depth of all the geometry falling under it.

If the shadow map had an infinite resolution, every texel would have an infinitely small area and the artifact would not be visible. So this issue is not due to the mathematical definition of the shadow mapping technique itself; it comes from the representation that we have to use in order to implement it.

In practice, this issue can be solved by applying an offset to the depth comparison, which is equivalent to pushing the stored depth slightly away from the light so that more points pass the depth test. In this implementation, the requirements for this bias are:

When the light direction is aligned with the triangle normal (the light is directly above the triangle), we want this bias to be minimized.
When the light hits the triangle at a grazing angle, or comes from behind it, we want this bias to be maximized.

This technique is called slope-scaled depth bias, because the bias adapts to the slope of the geometry the shadow map is sampled from.

A function that fits these requirements is tan(arccos(x)), where x is the visibility term dot(N, L) clamped between 0 and 1, N is the surface normal, and L is the normalized surface-to-light vector.

This function is also equal to sqrt(1-x*x)/x (since sin(arccos(x)) = sqrt(1-x*x) and cos(arccos(x)) = x), which might be more efficient to use in the shader. It grows without bound as x approaches 0, so the resulting bias is usually clamped to a maximum value.
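
The identity and the resulting bias can be checked with a small CPU sketch. The `baseBias` and `maxBias` parameters, as well as the lower clamp of 0.0001 on x, are assumptions added for the illustration and are not taken from the pipeline described here.

```cpp
#include <algorithm>
#include <cmath>

// tan(acos(x)) written without trigonometric calls: with theta = acos(x),
// cos(theta) = x and sin(theta) = sqrt(1 - x * x).
float slopeFactor(float nDotL) {
    // Clamp away from zero so that the division stays finite.
    float x = std::min(std::max(nDotL, 0.0001f), 1.0f);
    return std::sqrt(1.0f - x * x) / x;
}

// Bias scaled by the slope and limited to maxBias.
float slopeScaledBias(float nDotL, float baseBias, float maxBias) {
    return std::min(baseBias * slopeFactor(nDotL), maxBias);
}
```

The factor is zero when N and L are aligned and grows quickly as the light approaches a grazing angle, which is why the upper clamp matters in practice.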

Note: The bias can be applied either when rendering the shadow map or when sampling it. In the pipeline implemented here, it is done at sampling time.

Optimize depth buffer precision

To optimize the shadow map resolution and depth precision, you want the view frustum to be as tight as possible, with every side of the cuboid tightly fitting the scene. My use case works quite nicely, since the scene is composed of a few tiles and the bounding box can be updated every time a tile moves.

The idea is to generate a view matrix for the light camera and transform the world space bounding box of the scene with it. This matrix can be obtained by placing the camera at the opposite of the light direction vector, making it look at the origin of the world, and then killing the translation component of the view matrix so that only the rotation components remain.

The documented code to fit the orthographic frustum to the scene is the following:

#include <algorithm>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Assuming boundingBox encompasses the whole set of visible tiles for the current viewport
FrustumFit fit(BBox boundingBox, glm::vec3 sunDirection) {
    // The light camera is placed at -sunDirection and points at (0.0, 0.0, 0.0).
    // Note: lookAt is degenerate when sunDirection is parallel to the up vector
    glm::vec3 up = glm::vec3(0.0f, 0.0f, 1.0f);
    glm::vec3 center = glm::vec3(0.0f);
    glm::vec3 eye = -sunDirection;
    glm::mat4 view = glm::lookAt(eye, center, up);

    // Kill translation component (last column of the view
    // matrix) since we only want the rotation components
    view[3][0] = 0.0f;
    view[3][1] = 0.0f;
    view[3][2] = 0.0f;

    // Initialize the 8 corners of the scene bounding box
    glm::vec3 BBoxCorners[8] = {
        glm::vec3(boundingBox.max.x, boundingBox.min.y, boundingBox.min.z),
        glm::vec3(boundingBox.max.x, boundingBox.min.y, boundingBox.max.z),
        glm::vec3(boundingBox.max.x, boundingBox.max.y, boundingBox.min.z),
        glm::vec3(boundingBox.max.x, boundingBox.max.y, boundingBox.max.z),
        glm::vec3(boundingBox.min.x, boundingBox.min.y, boundingBox.min.z),
        glm::vec3(boundingBox.min.x, boundingBox.min.y, boundingBox.max.z),
        glm::vec3(boundingBox.min.x, boundingBox.max.y, boundingBox.min.z),
        glm::vec3(boundingBox.min.x, boundingBox.max.y, boundingBox.max.z),
    };

    // Transform each box corner to light view space
    // (a default-constructed BBox is assumed to be empty)
    BBox lightViewSpaceBBox;
    for (int i = 0; i < 8; ++i) {
        glm::vec3 lightViewSpaceBBoxPoint = glm::vec3(view * glm::vec4(BBoxCorners[i], 1.0f));

        // Merge the light view space bbox with this newly transformed point
        lightViewSpaceBBox = merge(lightViewSpaceBBox, lightViewSpaceBBoxPoint);
    }

    // The camera looks down -z, so negate the z bounds to get the near and far
    // distances that make the frustum fit tightly
    double zFar = std::max(-lightViewSpaceBBox.min.z, -lightViewSpaceBBox.max.z);
    double zNear = std::min(-lightViewSpaceBBox.min.z, -lightViewSpaceBBox.max.z);

    glm::mat4 projection = glm::mat4(glm::ortho(
        (double)lightViewSpaceBBox.min.x, // left
        (double)lightViewSpaceBBox.max.x, // right
        (double)lightViewSpaceBBox.min.y, // bottom
        (double)lightViewSpaceBBox.max.y, // top
        zNear, zFar));

    return { projection, view };
}
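
The listing relies on a BBox type and a merge helper that are not shown in this post. A minimal sketch of what they could look like is given below; the layout and the "empty box" convention are assumptions, not the actual types used by the engine, and a plain `Vec3` stands in for glm::vec3 to keep the example self-contained.

```cpp
#include <algorithm>
#include <limits>

struct Vec3 { float x, y, z; };

constexpr float kInf = std::numeric_limits<float>::infinity();

// Hypothetical bounding box: starts empty (min = +inf, max = -inf) so that
// merging the first point yields a box containing only that point.
struct BBox {
    Vec3 min{ kInf,  kInf,  kInf};
    Vec3 max{-kInf, -kInf, -kInf};
};

// Returns the smallest box containing both `box` and the point `p`.
BBox merge(BBox box, Vec3 p) {
    box.min = { std::min(box.min.x, p.x), std::min(box.min.y, p.y), std::min(box.min.z, p.z) };
    box.max = { std::max(box.max.x, p.x), std::max(box.max.y, p.y), std::max(box.max.z, p.z) };
    return box;
}
```

Starting from an empty box avoids special-casing the first corner in the loop that transforms the eight corners to light view space.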

This can be visualized by the blue box in the following debug view:

References & Further reading

[1] http://fabiensanglard.net/shadowmapping/index.php

[2] https://software.intel.com/en-us/...solution-rendering-on-opengl-es-2

[3] http://aras-p.info/blog/2007/03/0...ll-spent-encoding-floats-to-rgba/

[4] https://msdn.microsoft.com/en-us/...ws/desktop/ee416324(v=vs.85).aspx
