This is a (relatively) simple OpenCL voxel raytracer (CPU rendering is also supported, but is very slow). The underlying data structure is a Sparse Voxel Octree (SVO). The main point of departure between this implementation, and the standard SVO formulation, is that each node in the tree stores not a single voxel, but rather a 4x4x4 grid of voxels. The advantages of this approach are twofold:
- Once we reach a leaf (equivalently, maximum mipmap level), and are raytracing in a 4x4x4 grid, we can forget about the tree structure until we exit the grid again, which simplifies the "inner loop". This effect would of course be magnified if we increased the grid size (to 8x8x8 or higher), but there are significant diminishing returns.
- On the GPU, these grids can be stored in texture memory, and are therefore cached, even on older cards (I'm using a GTX 260). On newer NVIDIA cards, global memory is also cached, but the use of textures may still be worthwhile, since the caching will take advantage of the 3D structure of the data.
Much of the complexity of this code comes from the fact that I try to keep the SVO state synchronized between the host and device (i.e. main memory, and graphics memory), and do so transparently, with smart pointers taking care of such tasks as marking blocks of data to be transferred to the device, and (at some point in the future) copy-on-write.
- prioritize the parts that are closest to "done" (not renderer or octree).
- Next files to document:
- Calculate normal vectors from depth buffer.
- Cast of direct lighting rays.
- Cast indirect lighting rays.
- Normal vectors
- Calculate approximate normal vectors from a depth map. I think that the best way to do this is probably using a Markov Random Field (MRF) model, learned using expectation propagation in a coarse-to-fine manner on the GPU. The advantage of calculating normals from depth, instead of storing the normals in each voxel, is that it should make things a lot smoother, if it's done right. Distant surfaces made of voxels will have normals found "as if" they were smooth, and will therefore be lit up as if they weren't made of voxels at all. For close-up voxels, however, the six sides will wind up with different normals, causing them to be lit up as cubes. I think that this is pretty much exactly the behavior that we want—try to not look like voxels, unless we're so close that there is no avoiding it, in which case we should make the best of it, and make them look like perfect little cubes.
- Avoiding self-occlusion
- When casting direct lighting rays, first push them off of the surface in the direction of the normal vector, by a distance equal to the mipmapping cube size. Notice that I said the "mipmapping cube size" (i.e. the smallest cube size which would be displayed at the current distance from the camera, which is not necessarily the actual cube size). If you can actually see the individual cubes, then we want them to occlude each other! to avoid artifacts, it might be best to just push a distance which is proportional to the current camera distance (the proportionality factor being the spread, I suppose), which should be a bit smoother.
- Indirect lighting rays
- When casting indirect lighting rays "into the octree", do not have some extra data hanging off of the quadtree nodes, and do not have an extra (small) quadtree sitting around just for lighting. Instead, simply cast the rays, and store the coordinates of their intersection points. Store these coordinates, along with lighting color/intensity, in a big 'ol buffer, and sort it on the GPU (probably a library for this). Then, find lower/upper bounds using binary search, to determine how many light rays hit a target location.
- Better textures
- Since we're using OpenCL textures now to store the color data, we can rotate or scale textures, with interpolation (in the colors, not the voxels). We could even do animation. One question that I have is how we can combine this with non-periodic tiling–the two seem to be in conflict.
- Asynchronous OpenCL synchronization
- Try to think of how we can perform the octtree updates (like in Octree::Subtract) in a separate thread than the main drawing thread (I mean on the CPU), and also how we can perform the Synchronized::Heap and Synchronized::TextureHeap synchronization of data between the host and device asynchronously, so that we are drawing the "previous" octtree at the same time as the new one is being sent to the device. Right now, even moderately-sized octree changes make everything extremely choppy.
- Marching cubes
- At the highest mipmap level (there is no point in doing it elsewhere), use marching cubes to get smooth surfaces. Assuming that we store 1 bit to indicate whether each voxel is present or absent in each 4x4x4 tile, this would increase the memory usage to 5x5x5, roughly doubling it (from 64 bits to 125 bits). It would also complicate changing the quadreee, since whenever we change a voxel at the edge of a leaf, we would need to also update its neighbors.
- Terrain generation
- GPU Gems 3 contains a really nice chapter on terrain generation. The images look very impressive–when I get to that point, I should use something like this. Of course, while this might take care of the shape of the terrain, texturing it is a different problem entirely. To that end, I think that looking into non-periodic tilings would be a good idea. The Wikipedia page on Wang tiles contains links to some papers which might prove helpful. The point of this is to reduce memory usage through tiling, while exploiting non-periodicity to reduce the "visual evidence".