I implemented a form of Bounding Volume Hierarchies (BVH) based on Axis Aligned Bounding Boxes (AABB) to speed up geometry - ray intersection testing. The data structure is called Bounding Box Hierarchy (BBH, not to be confused with BBL).

BBBackground

In the current implementation of my ray tracer I simply test all triangles of each mesh against the traced ray. This obviously results in a linear runtime complexity and becomes sluggish when using larger objects.

Using a bounding volume (an AABB in my case), it is possible to make a quick decision whether a larger group of triangles is worth testing or can all be skipped:

Testing the ray against the bounding box is cheap, compared to the triangle intersection test, and rays that do not hit the box need not be checked against the triangles at all.

To test if an AABB is hit by a ray, I check the intersection intervals for each axis and see if their intersection is non-empty:

Hierarchy of Bounding Bunnies Boxes

If using a bounding box can avoid a lot of triangle tests, why not use more boxes? Meshes can be broken into multiple pieces and those pieces tested individually:

Can be split into two boxes, left and right:

Each box can be further divided, until the boxes only contain a single triangle. Organizing these boxes into a tree yields a hierarchy:

When tracing, this hierarchy can be traversed from the root node. When a bounding box is hit, its children are tested as well. When a box is missed, its children can be skipped:

How exactly this tree is best built, traversed, and stored are active areas of research still. My current approach for each is suboptimal but great for a first implementation. I expect to see great speedups, especially on large meshes.

Constructing the BBH

Building the BBH is surprisingly simple; the algorithm I use is very similar to how a merge sort works. In each iteration I sort the triangles of the mesh, based on their farthest vertex along an axis:

Then split the processed range into two:

Notice how the two smaller bounding boxes actually overlap. I found this confusing at first, but it makes sense, since they enclose triangles, not just vertices.

Continuing to recurse on the smaller ranges, I rotate the active axis between X, Y, and Z. This yields a log-linear runtime in the number of triangles $O(n \log n)$. Later on, I might look into algorithms that construct the BBH on the GPU.

Each node stores the AABB associated with it, the indices of its children, its parent’s index, and the range of the triangles contained in it:

struct BBHNode
{
    AABB box;
    int left_child = -1;
    int right_child = -1;
    int skip_index = -1;

    int tris_begin;
    int tris_end;
};

The nodes are stored in a flat array, meaning tree traversal is simplified down to looping through an array iteratively rather than emulating recursion using a stack.

Upon missing a node, the algorithm has to skip all of its children. This is what the skip_index is used for. Because I build and flatten the tree in a pre-order fashion, the next node outside of a parent’s entire subtree is allocated immediately after that subtree is fully constructed. Therefore, if an AABB is missed, I can instantly bypass its entire branch of children by jumping straight to that skip_index.

To construct the BBH I use a recursive implementation:

int generateBBHLayer(std::vector<BBHNode> &nodes,
                        std::vector<Triangle> &triangles,
                        int tris_begin, int tris_end,
                        int depth,
                        const Mesh &mesh)
{
    if (tris_end - tris_begin < 1) return -1;

    const auto trianglesInBox = std::span{triangles}.subspan(tris_begin, tris_end - tris_begin);
    const auto parent_bbox = findBoundingBox(trianglesInBox, mesh);
    const auto parent_index = static_cast<int>(nodes.size());

    nodes.push_back(BBHNode{
        .box = parent_bbox,
        .tris_begin = tris_begin,
        .tris_end = tris_end,
    });

    if (tris_end - tris_begin > 1)
    {
        // Sort and split range
    }

    const auto next_node = nodes.size();
    nodes[parent_index].skip_index = next_node;

    return parent_index;
}

The function returns the index of the newly created node, so the recursive step can assign it to the children attributes:

if (tris_end - tris_begin > 1)
{
    const auto direction = Vec3(depth%3 == 0, depth%3 == 1, depth%3 == 2);

    std::sort(triangles.begin() + tris_begin, triangles.begin() + tris_end, [direction, &mesh](const Triangle &tri1, const Triangle &tri2){
        return maxExtentAlong(tri1, direction, mesh) < maxExtentAlong(tri2, direction, mesh);
    });

    const auto tris_mid = (tris_begin + tris_end) / 2;

    if (tris_mid - tris_begin >= 1) {
        const auto left_child_index = generateBBHLayer(nodes, triangles, tris_begin, tris_mid, depth+1, mesh);
        nodes[parent_index].left_child = left_child_index;
    }

    if (tris_end - tris_mid >= 1) {
        const auto right_child_index = generateBBHLayer(nodes, triangles, tris_mid, tris_end, depth+1, mesh);
        nodes[parent_index].right_child = right_child_index;
    }
}

Using the BBH during tracing

Tracing with the BBH is also surprisingly simple, and it involves little intrusion to the existing intersection algorithm. Instead of looping through all triangles of a mesh, it just has to be able to deal with a subrange. Then, the BBH-aware tracer traverses the BBH tree and calls the old intersection test function:

std::optional<Intersection> getIntersectionImpl(
    const Ray &ray_in_world,
    const TriangleMesh &tris
){
    std::optional<Intersection> best = std::nullopt;
    
    const auto &bbh = tris.bbh;
    const auto ray = tris.modelToWorld.applyInverse(ray_in_world);

    int bbox_index = 0;
    while (bbox_index < bbh.nodes.size())
    {
        const auto &node = bbh.nodes[bbox_index];
        const auto bboxHit = testBoxIntersection(node.box, ray);

        if (bboxHit.has_value())
        {
            if (node.isLeaf())
            {
                getIntersectionTris(ray, tris, node.tris_begin, node.tris_end, best);
            }

            ++bbox_index;
        }
        else
        {
            bbox_index = node.skip_index;
        }
    }

    return best;
}

Notice how instead of transforming each triangle during the tracing, I now apply the inverse of the world-to-model transform of the currently traced mesh. This approach basically applies the transform to the BBH for free as well.

Results

Finally, I can render an image of the Stanford Bunny! This would have taken ages without the BBH.

I ran some rudimentary benchmarks using a maximum path length of 10, the Cornell Box, and 9 ceiling lights (as in the setup above) on a laptop RTX GPU with various meshes:

  • Stanford Bunny: 70'000 tris
  • Blender Monkey: 1'000 tris
  • Icosahedron: 20 tris

I got the following runtimes for tracing a single path through each pixel on a 640x640 image:

The performance gains are undeniable. For instance, rendering the highpoly Stanford Bunny mesh went from a grueling 31s baseline down to just 100ms, a jaw-dropping ~310x speedup! I am really happy with the results. The baseline performance can be further improved with optimizations later on.

Optimization opportunities

Multiple of the above tests could be improved further:

  1. Intersection Tweaks: The triangle-ray and bbox-ray intersection code can be optimized at instruction level.
  2. Surface Area Awareness: The BBH construction currently ignores the physical size of triangles. Because of this, meshes with highly detailed parts mixed with large primitive areas will perform poorly. Implementing a Surface Area Heuristic (SAH) split logic would improve this.
  3. Smart Traversal: BBH tracing currently takes a fixed traversal path. Dynamically choosing which child box to visit first (based on ray direction) could cut down heavily on unnecessary box tests.
  4. Deferred Evaluations: The triangle-ray intersection calculates all possibly needed data members immediately. This work could be deferred to only execute once for the final, verified closest hit triangle.

I will explore these opportunities in the coming weeks.

Conclusion

Bounding Box Hierarchies are a great tool for speeding up ray-triangle intersection tests. They were also easy and fun to code. There are plenty of optimization opportunities right now in my code, so I am looking forward to tackling those next.

As a bonus, I rendered the bunny in glass: