The problem with tessellation in DirectX 11

As you may have heard, DirectX 11 brings tessellation support, and all the hardware vendors and benchmarks are going crazy with super-finely tessellated meshes, promising us automatic level of detail and unprecedented visual fidelity. What you may not be aware of is that the Xbox 360 has very similar tessellation hardware too, which means many game developers have had the opportunity to use these same techniques for about five years now. So why aren’t all console games using tessellation pervasively? That’s what I will tell you in this blog post, as well as demonstrate a solution to the problem.

The clue is present in just about any of the dozens of tessellation demos that are now available. In case you haven’t seen one, here’s a capture I just took of one of the samples in DirectX 11 (ignore the frame rate, the capturing software interferes).

This looks good and all, but notice the mesh density. Even when not doing any tessellation at all this is a 50×50 grid! That’s five thousand triangles at the lowest level of detail for roughly one square metre of cobblestone. That’s plainly a ridiculous poly-count for any practical purposes in games. There are two main problems with such a high vertex density: it wastes time processing more vertices than needed, and leads to the small triangle problem.

It’s wasteful because even at a modest distance you’re not going to need 5000 triangles to represent a square metre of ground, so transforming that many vertices is just throwing precious cycles away.

The small triangle problem refers to the efficiency loss when rendering small triangles. Current GPUs rasterize pixels in small groups of at least 2×2 pixels. Whenever a triangle covers the entire “quad” of 2×2 pixels, all is well. If, however, some of those pixels are not covered, then the GPU resources associated with the unused pixels are just squandered. For example, if each triangle only covers a single pixel, then every one of those quads will be 3/4 unused, leading to just 25% pixel shading efficiency.
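To put a number on that waste, here is a minimal C++ sketch of the arithmetic. The function name and the simplified model (every touched quad costs four shader lanes, and nothing else matters) are assumptions made for illustration; real hardware is more complicated.

```cpp
#include <cstddef>

// Fraction of launched pixel-shader lanes that end up as visible pixels,
// under the simplifying assumption that the GPU always shades whole 2x2 quads.
//   coveredPixels: pixels that actually lie inside a triangle
//   touchedQuads:  number of 2x2 quads that had to be launched
inline double quadShadingEfficiency(std::size_t coveredPixels, std::size_t touchedQuads)
{
    if (touchedQuads == 0) {
        return 1.0; // nothing was shaded, so nothing was wasted
    }
    return static_cast<double>(coveredPixels) / (4.0 * static_cast<double>(touchedQuads));
}
```

With one covered pixel per quad, quadShadingEfficiency(1, 1) gives 0.25, which is the 25% figure above. A triangle that fills its quads completely gives 1.0.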

The problem

So why are all these demos using control meshes that are already finely tessellated? It’s no accident. There are two main reasons for this. The first is that you can only specify tessellation factors on a per-edge level, so if you want to adaptively tessellate some areas more than others, you’re going to have to have a dense distribution of edges to support that variation across the surface. The bigger reason, though, is that smoothly varying tessellation on a displaced surface looks like rubbish at lower tessellation levels. See this example, which is just the previous demo with a more reasonable base mesh resolution (4×4, instead of 50×50).

Notice the shimmering, bucking artefact as the tessellation slider is moved up and down. This is the dirty little secret that Xbox 360 developers have known for years: at reasonable mesh densities, continuous tessellation just looks awful and is unusable as a general strategy. We used it in Banjo-Kazooie: Nuts & Bolts for water, because the shimmering artefact actually looked pretty decent on a water surface that was supposed to shimmer, but while we tried to use it on some other things we could never live with the artefacts (the solution below is something that came to me after BKNB shipped, and could in principle be implemented in a future title).

A solution

So what’s going on here? Well, the problem is that as vertices get added and removed they smoothly travel to their final location by sliding across the surface. This leads to vertices rapidly bobbing up and down as they move over the surface and sample different displacements from the displacement map. This problem is not a new artefact; it’s actually just standard minification aliasing. When you do normal straight texture mapping, the texels are sampled at pixel locations. If the ratio of texel density to pixel density gets too high you get minification aliasing because each pixel “doesn’t know” which of the many texels it covers to sample from, so small changes in pixel position will lead to entirely different texels being used. For regular texture mapping we solve this by using MIP-mapping, which effectively just means choosing a lower resolution version of the texture when the sample locations are too sparsely distributed to accurately reconstruct the full resolution texture.

Displacement mapping is no different. Instead of sampling at pixels we have vertices, and instead of colours the values sampled are geometric offsets so the aliasing artefacts manifest in a different way. The basic problem, though, is that we’re reading a high frequency texture (the displacement map) at too low of a sampling frequency. The solution to this problem is the same as for regular texture mapping – use MIP-mapping to choose a lower resolution texture when the sampling frequency (i.e. tessellation factor) is lower.

So how do we determine which LOD to use? Well, one simple way is to look at the length of each edge in the control mesh in texture space, and choose a MIP level for each edge so that the distance covered by each subdivision will be no more than 0.5 texels at that MIP level (this is to satisfy the Nyquist criterion, which says that the sampling frequency must be at least twice the highest signal frequency). In other words, if the length of the edge in texels is L, and the edge’s tessellation factor is T, then we will get L/T texels per subdivision. We want that to be 0.5, so we have to choose a MIP level M so that (L/T)*2^(-M) = 0.5 (the linear distance in texture space, measured in texels, halves with each MIP level). Solve for M and we get: M = log2(L/T) + 1. In practice, linear interpolation isn’t a perfect reconstruction filter, so we may need a small fudge factor to boost the MIP level up slightly further. Note that although we use MIP mapping, the MIP level used doesn’t depend on viewing distance or angle, just on the tessellation factor (which in turn may depend on those factors, of course).
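Here is a minimal C++ sketch of the formula above, written as a CPU-side function so it can be checked in isolation. The function name, the bias parameter (standing in for the fudge factor) and the maxMip clamp are assumptions for illustration; in a real implementation this would live in the hull shader or in a pre-pass.

```cpp
#include <algorithm>
#include <cmath>

// M = log2(L / T) + 1, clamped to the MIP range of the displacement map.
//   edgeLengthTexels: L, the edge length in texture space, measured in texels
//   tessFactor:       T, the edge's tessellation factor (clamped to >= 1 here)
//   bias:             optional fudge factor, 0 by default (assumed parameter)
//   maxMip:           coarsest MIP level available (assumed parameter)
inline float displacementMipForEdge(float edgeLengthTexels, float tessFactor,
                                    float bias = 0.0f, float maxMip = 16.0f)
{
    const float texelsPerSegment = edgeLengthTexels / std::max(tessFactor, 1.0f);
    // Guard against log2(0) for degenerate edges.
    const float mip = std::log2(std::max(texelsPerSegment, 1e-6f)) + 1.0f + bias;
    return std::min(std::max(mip, 0.0f), maxMip);
}
```

As a worked example, an edge spanning 256 texels with a tessellation factor of 4 gives 64 texels per subdivision, so M = log2(64) + 1 = 7. At MIP level 7 each texel covers 128 base-level texels, so a subdivision spans 64/128 = 0.5 texels, as intended.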

So, now that we have a MIP level per edge, we can simply interpolate between them in the Domain Shader to pick a suitable MIP level for each vertex. It’s important that the interpolation you use has the property that when the point is on an edge, the weights for the two other edges are zero, so that edge-vertices use the same MIP level regardless of which patch they belong to. Here’s how this looks:

We can see the basic idea working here. Rather than shimmering artefacts when the tessellation level is too low to adequately represent the displacement, we just get a flatter surface instead (due to choosing a lower-res MIP level of the displacement map). However, it should be obvious that there’s a problem here if you look closely at the “spikes” visible at each vertex in the base mesh. Each one of those lies on two edges per patch, so won’t know from which to retrieve a MIP level. So what do we do? Just pick one of the two candidate edges at random? Take the average? No, neither will work because neighbouring patches can have entirely different edges using the vertex. In fact, there can be an arbitrary number of patches associated with a vertex, and for each of them that vertex must pick the exact same MIP level in order to avoid cracks.

So what do we do for the control vertices themselves? Ideally, we’d find the average MIP level for all the edges used by a vertex, but that would be expensive since tessellation factors (and therefore MIP levels) can change each frame. The simplest solution I can think of is to store a “preferred edge” index for each control vertex. This would simply be a randomly chosen edge that uses that vertex. When you detect that you’re on a control point (by checking the barycentric coordinates), you simply check the vertex’s preferred edge index, and fetch the MIP level associated with that edge. Note that the preferred edge is not necessarily in the same patch as the current patch, but it is consistent in that every patch using that vertex will use the same preferred edge, and therefore MIP level, which eliminates cracks.
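The preferred-edge lookup described above could be sketched like this in C++. The ControlMesh layout, the function names and the epsilon used to detect a control point are all assumptions for illustration; in practice the two arrays would be GPU buffers read from the domain shader.

```cpp
#include <cstdint>
#include <vector>

struct ControlMesh {
    // Per control vertex: index of one edge that uses the vertex, chosen once offline.
    std::vector<std::uint32_t> preferredEdge;
    // Per edge: MIP level, recomputed whenever the tessellation factors change.
    std::vector<float> edgeMip;
};

// Returns which corner (0..2) of the patch the point sits on, or -1 if it is
// not on a control vertex. b holds the point's barycentric coordinates.
inline int cornerIndex(const float b[3], float eps = 1e-5f)
{
    for (int i = 0; i < 3; ++i) {
        if (b[i] >= 1.0f - eps) {
            return i;
        }
    }
    return -1;
}

// MIP level for a control vertex. Every patch sharing the vertex looks up the
// same edge, so they all get the same value and the surface stays crack-free.
inline float controlVertexMip(const ControlMesh& mesh, std::uint32_t vertexIndex)
{
    return mesh.edgeMip[mesh.preferredEdge[vertexIndex]];
}
```

In a domain shader the flow would be: call cornerIndex on the barycentric coordinates, and if it returns a corner, map it to the global control vertex index and use controlVertexMip instead of the interpolated per-edge value.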

Here’s how this looks:

This is much better. We’ve got rid of most of the aliasing, and each corner falls in line with a MIP level chosen from its immediate neighbourhood. Notice that there’s still some ever-so-subtle shimmering going on in the creases here. That’s because we’re approximating the MIP level based on the coarse control mesh, whereas in reality the actual triangles in the tessellated mesh vary in size and shape. The main downside to this strategy is that we need to compute edge tessellation and MIP levels outside the main draw calls to produce a buffer of per-edge tessellation factors and MIP levels, which not only adds a draw call and associated bandwidth increases, but also requires us to transform all of our control vertices several times (assuming the tessellation factor depends on viewing direction, skinning etc.).

It would be ideal if the domain shader would give you some information about adjacent patches, so that we could easily compute an average LOD for all the edges connected to a vertex.

Conclusion

Generating geometry by sampling a texture at varying frequencies is not a trivial problem, and requires careful consideration in order to avoid the same aliasing problems that we’ve already dealt with for regular texture mapping. Unlike regular texture mapping, however, we don’t have any real help from the hardware in figuring out the appropriate MIP level for a displacement map, so we have to figure out approximations ourselves, and while there are workable solutions, I haven’t been able to figure anything out that isn’t ugly or inefficient in some way. I do hope that future talks about and demos of tessellation and displacement mapping will at least acknowledge this problem, instead of just ramping up the base mesh’s polycount, sweeping the issue under the carpet.

Update

A simple variation of this idea which has the benefit of being very cheap is to compute a per-object representative MIP level on the CPU (in some approximate way), and then in the domain shader simply interpolate between the MIP level you got from the edges, and this per-object MIP level, based on how close you are to a hull vertex (e.g. use the maximum value of your barycentric coordinates). You’d probably want to ensure that only the vertices that are very close to the hull vertices get influenced by the per-object MIP level, but this would at least ensure that all hull vertices use exactly the same MIP level (the per-object one!) while keeping the surface looking smooth.
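A minimal C++ sketch of that blend follows. The function name and the start threshold (how close to a hull vertex the per-object MIP level begins to take over) are assumptions; 0.9 is just a placeholder value to tune.

```cpp
#include <algorithm>

// Blend from the per-edge MIP level towards a per-object MIP level as the
// point approaches a hull vertex. b holds the barycentric coordinates.
//   start: value of the largest barycentric coordinate at which the blend
//          begins (assumed parameter, must be < 1)
inline float blendTowardsObjectMip(float edgeMip, float objectMip,
                                   const float b[3], float start = 0.9f)
{
    const float maxB = std::max(b[0], std::max(b[1], b[2]));
    float s = (maxB - start) / (1.0f - start);
    s = std::min(std::max(s, 0.0f), 1.0f);
    // Written as a weighted sum so that s == 1 returns objectMip exactly,
    // which is what keeps all patches sharing a hull vertex in agreement.
    return edgeMip * (1.0f - s) + objectMip * s;
}
```

Away from the corners the result is just the interpolated edge value, and exactly at a hull vertex it is the per-object value regardless of what the edge interpolation produced.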

Update 2

Obvious-in-retrospect tweak to this technique: simply store the “UV coverage” of each control vertex (basically the average UV area for all the triangles touching it divided by 3). If you know the area in UV-space for a vertex, you can compute an appropriate MIP level for it that does not depend on the triangles it’s used in. In the domain shader, detect when you’re at a control point and use the per-vertex MIP-level. This is cheap, and better than a per-object MIP level. This is looking like a pretty workable solution. It’s still not perfect because patches with a lot of internal variation have to be seriously oversampled to capture all the high frequencies.
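The coverage value itself can be precomputed offline. Here is a minimal C++ sketch, assuming an indexed triangle list with one UV per control vertex; the type and function names are made up for illustration. How the coverage is then mapped to a MIP level is left open here.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct UV { float u, v; };

// Area of a triangle in UV space (half the magnitude of the 2D cross product).
inline float uvTriangleArea(UV a, UV b, UV c)
{
    return 0.5f * std::fabs((b.u - a.u) * (c.v - a.v) - (c.u - a.u) * (b.v - a.v));
}

// Per-vertex "UV coverage": the average UV area of the triangles touching the
// vertex, divided by 3. Vertices used by no triangle get a coverage of 0.
inline std::vector<float> computeUvCoverage(
    const std::vector<UV>& uvs,
    const std::vector<std::array<std::uint32_t, 3>>& triangles)
{
    std::vector<float> areaSum(uvs.size(), 0.0f);
    std::vector<int> count(uvs.size(), 0);
    for (const auto& tri : triangles) {
        const float area = uvTriangleArea(uvs[tri[0]], uvs[tri[1]], uvs[tri[2]]);
        for (std::uint32_t v : tri) {
            areaSum[v] += area;
            ++count[v];
        }
    }
    std::vector<float> coverage(uvs.size(), 0.0f);
    for (std::size_t i = 0; i < uvs.size(); ++i) {
        if (count[i] > 0) {
            coverage[i] = areaSum[i] / (3.0f * static_cast<float>(count[i]));
        }
    }
    return coverage;
}
```

For a unit square in UV space split into two triangles, every vertex gets a coverage of 1/6, since each touching triangle has area 0.5.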

15 thoughts on “The problem with tessellation in DirectX 11”

Sam says:

April 19, 2010 at 11:31

Sebastian, thanks, this is a great post. It’s really nice to see someone take the time and effort to have a good dig into a subject. I don’t have much to add, apart from a minor observation about explicit/implicit representations. I think a fair few of the issues here come from the explicit nature of the problem (i.e. explicit geometry -> verts, edges, etc), that might become easier under an implicit one (e.g. level sets, voxels, etc). Mind you, implicit reps have problems as well.. just different ones :). Given all this tessellation lark is essentially for data compression (ignoring some details like LOD), I suspect it could be supplanted by simply better/easier ways to compress on GPUs.

Sebastian says:

April 19, 2010 at 19:13

Sam, thanks. Yes something like voxels etc. might have some benefits in this respect. At the end of the day, though, the display is discrete so you’ll end up with regular old aliasing but we know how to somewhat mitigate that using regular old MSAA or SSAA, and you could probably sample at a lower resolution too in a similar fashion to the MIP calculation above (e.g. stop earlier in the voxel hierarchy if your pixel size reaches the Nyquist limit).

Sebastian says:

April 29, 2010 at 21:54

Taffy, thanks. I’m afraid I know nothing about the FBX mesh format.

Galactus says:

April 7, 2011 at 02:55

IMO nothing can look worse than a flat texture for bricks or stones (certainly nothing in these videos). When you have a flat cobblestone texture (like in Crysis 2) it looks like somebody made wallpaper out of a photo of some stones and then pasted it onto a perfectly flat pavement. And even pavement isn’t perfectly flat in the real world.

Galactus says:

April 7, 2011 at 02:58

Think the problem here is maybe you thought flat textures were actually fooling somebody. Not so.

Galactus says:

April 7, 2011 at 03:00

Basically what I’m saying shoddy tessellation appears to me to look better than a good flat texture. Because a flat texture is an illusion and we all can see past it.

jemery says:

June 14, 2011 at 00:41

Testing the transition from tessellated to untessellated at the same zoom level is completely pointless. That transition, as in any LOD system, is supposed to happen as the camera or the object moves! So most of the shimmering would not be noticed because the object would just be too small.
That’s like changing mip levels of a texture without moving the object away from the camera and saying you can notice the lack of detail of lower mips. Of course you can, that’s just not how the thing will be perceived in the game. It’s an unfair test.

- sebastiansylvan says:
  
  June 14, 2011 at 01:34
  
  The transition is continuous. It doesn’t just happen in the distance, it happens all the time (just like for regular texture mapping where the texel:pixel ratio is constantly changing). Thus, you’ll see shimmering for near objects and far objects.
  
  The issue is not about seeing lack of detail (which would be fixed by zooming out). The issue is about rapid shimmering. If you move your camera 10 cm back during half a second of movement, and see a vertex slide across 10 texels jumping wildly up and down to 10 different displacement levels, then that will still look terrible even though the object is now 10 cm further away and marginally smaller on the screen.
  
  You can see this for regular texture mapping too if you turn off MIP-mapping and filtering. The reason minification aliasing is objectionable is not because of lack of detail, it’s because of the shimmering you get due to the “filter kernel”, as it were, not covering enough of the underlying signal.
  
  This is really easy to try in the DX 11 tessellation demo, btw. Trust me, it looks just as terrible.
  
Oliver says:

August 23, 2011 at 14:29

I don’t fully understand all the jargon but I’d just like to say that I love your explanatory style. Keep up the good work.

Scott Kircher says:

July 10, 2012 at 17:24

I know this article is a couple of years old, but I just came across it. First let me say the article is great and was helpful to me specifically as I am working on implementing displacement mapping in my studio’s engine.

There is one error in the article, I believe: The result of solving for M in (L/T)*2^(-M) = 0.5 should be M = log2(L/T) + 1 (not 2*log2(L/T)). I assume this is just a typo (perhaps you meant to write log2(2.0*L/T), which is equivalent), but the difference in the result when applied for mipping is significant.

Thanks for the great article, though!

- sebastiansylvan says:
  
  July 10, 2012 at 18:18
  
  Good catch! That is indeed a typo. I think I actually had the +1 version in the code but messed it up in the writeup. Will fix, thanks!
  
Rich says:

December 16, 2013 at 14:47

Hey, I’m trying to make the same modifications to that DX11 sample as you, but I’m struggling with the initial solution to the shimmering problem.

I’ve added code to calculate the desired mip level per edge, but I must be doing it wrong because I’m getting cracks in the mesh, and it basically isn’t working 🙂

Here’s what I’m doing in the hull shader’s patchconstantfunc:

uint width, height;
g_nmhTexture.GetDimensions( width, height );
for ( int i = 0; i < 3; i++ )
{
    uint iNext = ( i + 1 ) % 3;
    float2 edgeUV = ( p[ iNext ].texCoord - p[ i ].texCoord );
    float edgeLength = length( edgeUV * float2( width, height ) );

    float L = edgeLength; // length of the edge in pixels
    float T = g_vTessellationFactor.x; // edge's tessellation factor
    output.MipLevels[ i ] = log2( L / T ) + 1;
}

And then using those values in the domain shader:

float displacementMip = BarycentricCoordinates.x * input.MipLevels[ 0 ] +
                        BarycentricCoordinates.y * input.MipLevels[ 1 ] +
                        BarycentricCoordinates.z * input.MipLevels[ 2 ];

Could you help me out? Thanks!

- Sebastian Sylvan says:
  
  December 16, 2013 at 23:14
  
  I believe it’s your interpolation that’s wrong. It’s important that the displacementMip is constant for all vertices that are on the same edge, and your interpolation does not do that. I can’t remember exactly what interpolation I used, but I believe something simple like computing each edge weight by multiplying the two vertex weights for that edge would sort of work (whenever the output vertex is on an edge, the other two edges will have one vertex weight that’s zero), but it wouldn’t give you a nice linear ramp. I can’t recall exactly what I ended up with, sorry (I guess this is why you post code!).
  
  Perhaps something to do with using the minimum vertex weight for each edge as the edge weight? You’d have to normalize to make sure the weights add up to one at the end.
  
  A nice way to debug this is to give each edge a color, then output a vertex color based on the edge weights. If you get a solid color on each edge you’re golden, and then within each triangle you want a smooth transition.
  
tiba3195 says:

June 27, 2015 at 06:58

const float MipInterval = 50;
float mipLevel = clamp((distance(dout.PosW, gEyePosW) - MipInterval) / MipInterval, 0.0f, 6.0f);

play with the MipInterval and 6.0f


A Random Walk Through Geek-Space

Brain dumps and other ramblings