Idea for efficient AO type raytracing applications


Quick idea that I had a few weeks ago regarding raytracing things like AO and potentially GI/FG. The basic observation is that for AO ray tracing you have far more rays than you have triangles, so it really doesn’t make sense that we only focus on spatial paritioning for the triangles. I suspect there’s some bias here because we’re so used to thinking of our rays as travelling around a static scene, so we just kind spatially subdivide the scene without even thinking about it. For AO the rays are just as static as the scene so we really should be looking into doing something smarter with them.

So the first implication of this is the following: organize your rays into some sort of spatial subdivision structure (e.g. an R-tree). Efficiency of construction is probably a concern here since we have a boatload of rays. Then, send your triangles down this structure one at a time finding which rays it intersects, jotting down the distance for each (so you can keep track of the minimum). You may want something more clever like some kind of cone-based tree to gather up clusters of rays facing roughly in the same direction from roughly the same location, but AO rays in particular tend to have a short maximum distance and are therefore pretty “stubby”, so something as simple as an AABB wouldn’t be terribly inefficient, I suspect.

Then of course the next step is to put the scene geometry in a spatial sudivision structure as well. Now you have two trees that sudivides the scene and rays respectively, so you can simply join them spatially to very efficiently prune out areas of your rays that can not touch areas of the scene. You can think of this as amortizing the cost of traversing the scene structure over all rays, since most of the scene traversal will be done for large chunks of your rays at the same time rather than having to repeat it for each ray. Basic CPU-pseudo-code would be:

raySceneJoin( rayNode, sceneNode ){

if ( rayNode and sceneNode are both leafs){




// We have at least one non-leaf. Figure out what children we have.

rayChildren = rayNode.isLeaf ? [rayNode] : rayNode.children

sceneChildren = sceneNode.isLeaf ? [sceneNode] : sceneNode.children

foreach rayChild in rayChildren {

foreach nodeChild in sceneChildren{

// Any pairs of nodes that intersect needs to be refined further

if ( intersection( rayChild, nodeChild) ){

raySceneJoin( rayChild, nodeChild)





This would give you something like ~O( log m log n ) complexity where m and n is the number of rays and triangles respectively  (as opposed to O(m log n) which is the usual “foreach ray, trace against scene structure” type of AO computation).

This can obvoiusly be done by the GPU in a variety of ways, e.g. by traversing the trees depth first (keep a global queue of ray/scene node pairs, and just keep adding intersecting pairs to it until you hit a child, read from the queue until it’s empty), although there are further things you may want to do to improve performance and dynamic space utilization (e.g. use a short local stack for parts of the traversal).


4 thoughts on “Idea for efficient AO type raytracing applications

  1. One concern I would have is that you’ve transformed a from a O(m log n) time complexity problem with storage based on n, lets say O(n log n) space, into an O(log m log n) time complexity problem with storage requirements based on both m and n, say O(m log m + n log n) space And by your argument above m is larger than n. The spatial subdivision structure you describe will take up a lot of space, and you’ll have to wait a while to get back your AO results pushing things out of cache. If you limit your scope to AO it probably isn’t bad, but for general purpose ray tracing, it seems like some sort of approximation to the visibility skeleton.

    Did you get a chance to try it?

    • Seems to me that storage will be O(n) in the first case, and O(m+n) in the second. If we assume a balanced tree, since each triangle/ray has one node wrapping it in the leaf, then 1/K nodes wrapping that which, for K=2 (binary tree), is asymptotically N + 0.5N + 0.25N… = 2N.

      It’s true that rays can be generated “on the fly” so it’s conceivable that even organizing them in a hierarchy is enough to significantly hur the benefit of being able to more agressively prune the rays.. Maybe you could let your ray hierarchy be some kind of dilated version of your scene hierarchy since rays, for AO, are associated with surfaces, that way you only need one tree joined with itself (but with some sort of dilation operation related to AO ray lengths). That may be overly conservative though.. Who knows.

      No, I have not tried this, sadly. I hadn’t heard about the visibility skeleton. Thanks!

    • Hey, that’s a pretty cool paper! Gives me half a dozen ideas immediately. For example, if your subdivision happens from scratch every frame, then surely the right subdivision is one which makes the num_rays*num_tris on each side of the splitting plane roughly equal (rather than just splitting the number of tris in half – that’s old-school off-line-preprocessing thinking!).

      Also, just splitting it in a view-aligned fashion seems promising (i.e. 2D screen space subdivision, plus a linear Z subdivision, in a kd-tree style – so your splitting planes will always fall between rows or columns of pixels). That way your rays will never straddle a subdivision, and the cost metric for rays is simplified (you know exactly how many rays will be on either side of a subdivision without having to count), plus of course that there’s no actual ray-splitter intersection required other than in the Z dimension and that’s as trivial as in the AASS case.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s