Pixel-exact texture mapping

Hi everyone,

I'd first like to take the occasion of my first post to express my gratitude towards Casey and the HMN founders and maintainers. Like it has for many others, HMH has had a strong effect on my programming attitude, mindset, and motivation over the roughly one year since I stumbled upon it. I feel like I've learned more from Casey and lurking in this community than I have in my professional dev job, period.

Now, my question is about doing pixel-exact texture mapping. I wouldn't say I'm new to graphics programming, but my experience is limited to personal projects. I'm writing something using Vulkan right now, and I'm having trouble reconciling normalized device coordinates, framebuffer coordinates, and texel coordinates with what I've been observing. I'll start with what I understand from the Vulkan spec, so someone can tell me if my assumptions are already wrong. All section references are to the 1.1.134 spec, since that's what I've had open for a while.

  • Normalized device coordinates: Section 25.7 says that clip coordinates are whatever the vertex shader outputs (gl_Position in GLSL, Position in SPIR-V) and that normalized device coordinates are derived from them by the usual perspective divide.
  • Framebuffer coordinates: This is a little trickier. The description of the FragCoord SPIR-V built-in decoration (section 14.6; I believe this corresponds to gl_FragCoord), as well as the glossary, states that (0, 0) is the upper-left corner of the upper-left pixel of the framebuffer and that pixel centers are at half-integer coordinates.
  • Texel coordinates: Figure 3 of section 15.1.1 shows the relationship between integer texel coordinates (i,j), unnormalized coordinates (u,v), and normalized coordinates (s,t). It doesn't seem to explicitly say that texel centers in (u,v) are also at half-integers, but that's what I take from it. Finally, 15.5.8 followed by 15.6 describes how (s,t) coordinates are used to pick a texel or group of texels for filtering. This also seems to support my idea that starting from, say, u=0.5 and dividing by the texture width to get s should select exactly i=0, even with bilinear filtering (that is, the weight for i=1 is 0, per the weights defined later in 15.8.3); see the sketch after this list.
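To make that last point concrete, here's a small C sketch of how I understand the 1D bilinear arithmetic. The names and the standalone program are mine, not from the spec, so take it as my reading rather than anything authoritative:

```c
// Sketch of the 1D bilinear weight computation as I read 15.6/15.8.3.
// All names here are mine, not from the spec.
#include <math.h>
#include <stdio.h>

typedef struct { int i0, i1; float w0, w1; } BilinearTaps1D;

// u is the unnormalized texel coordinate (u = s * width).
static BilinearTaps1D bilinear_taps_1d(float u)
{
    BilinearTaps1D t;
    float shifted = u - 0.5f;     // shift so texel centers land on integers
    float i = floorf(shifted);
    float frac = shifted - i;     // weight of the "upper" neighbour texel
    t.i0 = (int)i;
    t.i1 = t.i0 + 1;
    t.w1 = frac;
    t.w0 = 1.0f - frac;
    return t;
}

int main(void)
{
    // u = 0.5 is the center of texel 0: the neighbour weight should be exactly 0.
    BilinearTaps1D t = bilinear_taps_1d(0.5f);
    printf("i0=%d (w=%.2f), i1=%d (w=%.2f)\n", t.i0, t.w0, t.i1, t.w1);
    return 0;
}
```

If my reading is right, this prints i0=0 with weight 1.00 and i1=1 with weight 0.00, which is what I meant by "the weight for i=1 is 0".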


I'm trying to apply this to a monospace bitmap font that I've arranged into a glyph atlas. Most of the lines are exactly one pixel wide, so any offset tends to look wrong. What works is to clamp the draw coordinates to integer values (using a typical orthographic projection to scale down to [-1,1] from [-width/2, width/2]), use integer values for (u,v), and use the full glyph width and height (i.e. not (width-1,height-1)) for the maximum texture coordinates. So, drawing the bottom-left glyph at the bottom left of the screen means drawing to (-width/2, -height/2) with texture coordinates ranging from (0,0) to (6,12), where the glyph dimensions are (6,12). My question is: why isn't the correct way to do this to draw at (-width/2 + 0.5, -height/2 + 0.5) with texture coordinates ranging from (0.5,0.5) to (5.5,11.5)?
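To be concrete, here's roughly the "what works" setup as a sketch. The struct and function names are made up for this post, the atlas offset and normalization by the atlas size are omitted, and the y-axis flip isn't handled:

```c
// Integer draw coordinates, integer texture coordinates, full glyph
// width/height. Names are just for this post, not my actual code.
typedef struct { float x, y; float u, v; } Vertex;

// x, y are integer pixel coordinates in the [-screen/2, screen/2] range
// from the post; glyph_w/glyph_h are the glyph size in texels (6 x 12 here).
static void make_glyph_quad(Vertex out[4],
                            float x, float y,
                            float glyph_w, float glyph_h,
                            float screen_w, float screen_h)
{
    // Orthographic scale from [-screen/2, screen/2] down to [-1, 1].
    float sx = 2.0f / screen_w;
    float sy = 2.0f / screen_h;

    // Texture coordinates run from (0,0) to (glyph_w, glyph_h), i.e. the
    // full glyph, not (glyph_w - 1, glyph_h - 1).
    out[0] = (Vertex){ sx * x,             sy * y,             0.0f,    0.0f    };
    out[1] = (Vertex){ sx * (x + glyph_w), sy * y,             glyph_w, 0.0f    };
    out[2] = (Vertex){ sx * (x + glyph_w), sy * (y + glyph_h), glyph_w, glyph_h };
    out[3] = (Vertex){ sx * x,             sy * (y + glyph_h), 0.0f,    glyph_h };
}
```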

I modified my glyph atlas to draw a green line at the very bottom of each glyph and a red line at the top to clearly show when bounds aren't being met or are exceeded. Here's what the atlas looks like:

(I don't know if it's upside-down because renderdoc flipped the sense of coordinates or if it's actually upside-down, but the text comes out fine and I don't think that's related to the offset problem anyway.)

Correct behavior:


Reduce texture coordinate dimensions down to (width-1,height-1):


Only offset draw coordinates by 0.5:


Only offset texture coordinates by 0.5:


Same but also reduce texture coordinate dimensions down to (width-1,height-1):


The last partial theory I have is that the fragment shader actually receives half-integer coordinates, and the integer coordinates merely lead to the correct fragments being picked, but even if that's true I don't really understand it. Unfortunately, I don't have a working OpenGL pipeline, and neither renderdoc nor nsight supports shader debugging for Vulkan, so I can't verify exactly which coordinates are used in the fragment shader.

If you read this far, thanks for at least considering my problem!

-Brandon

Hi and welcome.

I don't know much about that topic, but I was interested, so I did a little searching about pixel centers and found some information that I think will answer your question:
Pixel center and top-left rule in OpenGL?
GLSL: Center or Centroid?
Chris Hecker - Miscellaneous Technical Articles (articles on texture mapping).

A short version could be (assuming I got it right):
When you ask the GPU to draw something, you define an area that isn't tied to pixels/texels. The rasterizer then determines whether a framebuffer pixel needs to be drawn by checking whether that pixel's center is inside the area. The idea is similar for textures, where you define the area of the texture you want to use.
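Assuming I got it right, a minimal sketch of that rule for an axis-aligned rectangle would be something like this (just the idea, not what a real rasterizer does):

```c
// Idea only: a framebuffer pixel (px, py) is drawn when its center
// (px + 0.5, py + 0.5) falls inside the rectangle [x0, x1) x [y0, y1)
// you asked to draw. Centers landing exactly on the right/bottom edge
// fail the test, which is the fill convention those links talk about.
static int pixel_is_covered(int px, int py,
                            float x0, float y0, float x1, float y1)
{
    float cx = px + 0.5f;
    float cy = py + 0.5f;
    return cx >= x0 && cx < x1 && cy >= y0 && cy < y1;
}
```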
Ah, okay. It looks like the problem is the top-left pixel fill convention that at least the first link and Chris Hecker both talk about. When I use all (0.5,0.5) offsets and only draw to (width-1,height-1), I'm trying to exactly touch all the pixel centers. I can confirm now that what's being chopped off is indeed the very bottom row and the right-most column, but everything else is correct. Expanding the area out to integer values makes sure the pixel centers are completely included. Thanks!
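To spell out the numbers for my 6x12 glyphs: with the quad spanning x in [0.5, 5.5], the last pixel center lands exactly on the right edge at 5.5, so that column gets dropped by the fill rule (and likewise the bottom row in y). With the quad spanning [0, 6], all six centers from 0.5 through 5.5 are strictly inside, so every column is filled.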
Don't forget to test your implementation on many computers, because OpenGL driver developers may just go "Meh, this looks okay." without actually following the standard, leaving a random 0.5-pixel offset. Then your game breaks when moving to another computer. Maybe not on the second computer, but after 30 different models I can guarantee your OpenGL game will be broken somewhere, whether or not you follow the standard. This even applies to the OpenGL reference implementations, which are equally full of bugs.

OpenGL
I would use gl_FragCoord.xy combined with a 2D translation offset given as a uniform. Anything involving vertex data and samplers is too open to interpretation, which is why OpenCL doesn't use texture sampling at all. HLSL got a "Load" instruction in shader model 4.0, which takes whole pixel coordinates for deterministic pixel look-ups, but I don't remember if GLSL has an equivalent. Otherwise, the usual trick is to multiply whole pixel coordinates by the input texture's reciprocal dimensions, passed as a uniform vector, then sample with normalized coordinates. Also stay away from integer types in GLSL, because they are less tested and may return random noise on certain graphics cards.
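As a sketch of that last trick, written as plain C for clarity (the same arithmetic would go in the shader; the names are made up for this post):

```c
// The "reciprocal dimensions" trick: keep 1/width and 1/height around (as a
// uniform on the GPU side) and turn whole pixel coordinates into normalized
// sampling coordinates.
typedef struct { float x, y; } Vec2;

// texel is a whole pixel coordinate (column, row); inv_size is
// (1.0f / texture_width, 1.0f / texture_height).
static Vec2 pixel_to_normalized(Vec2 texel, Vec2 inv_size)
{
    // Add 0.5 to land on the texel center, then scale into [0, 1].
    Vec2 uv;
    uv.x = (texel.x + 0.5f) * inv_size.x;
    uv.y = (texel.y + 0.5f) * inv_size.y;
    return uv;
}
```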

CPU
When making 2D on top of a 3D rendering pipeline, always ask yourself whether you are using enough heavy 3D features to make it worth depending on a 3D abstraction that might not work two years from now. A pure 2D game will probably be faster, lighter, and easier to render correctly using SIMD intrinsics on the CPU. OpenGL has precision issues from bugs, which led to OpenCL being defined instead, and OpenCL is also full of bugs. Direct3D can have volatile memory wipe your textures at random times (the device-lost exception), so Microsoft had to avoid using their own API for interfaces that have to actually work.

Hybrid
Another option is to render a passively updated interface into a buffer on the CPU and upload it as a single texture, so there are no gaps or seams, though it may come out blurry on some computers. Anything passive just takes an extra millisecond on the CPU while the GPU works in parallel.
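A rough sketch of what I mean, assuming the OpenGL route (names are made up, and the initial glTexImage2D allocation and drawing of the final textured quad are omitted):

```c
// Hybrid approach sketch: the UI is rendered into a CPU buffer and
// re-uploaded as one full-screen texture only when something changed.
#include <stdint.h>
#include <GL/gl.h>

typedef struct {
    int      width, height;
    int      dirty;        // set by the UI code whenever pixels changed
    uint32_t *pixels;      // width * height RGBA8 pixels on the CPU
    GLuint   texture;      // texture object allocated earlier with glTexImage2D
} UiSurface;

static void ui_surface_upload(UiSurface *ui)
{
    if (!ui->dirty) return;   // passive: nothing changed, nothing to upload
    glBindTexture(GL_TEXTURE_2D, ui->texture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, ui->width, ui->height,
                    GL_RGBA, GL_UNSIGNED_BYTE, ui->pixels);
    ui->dirty = 0;
}
```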