Culling techniques for rendering lots of cubes
Render front to back. You don't need sorting for that; use an octree. Its leaves won't be individual cubes but larger groups of them.
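Here is a minimal sketch of that traversal in Python. It assumes each inner node stores its center and eight children indexed by octant (bit 0 = +x, bit 1 = +y, bit 2 = +z). `OctreeNode`, `front_to_back` and `leaf_id` are hypothetical names.

Instead of sorting, the code picks the child octant that contains the camera and visits the children in the order `near ^ i`. A child that could hide another one differs from the camera's octant only on a subset of the axes where the hidden child differs. That gives it a smaller `i`, so it is always visited first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class OctreeNode:
    center: Vec3
    # Eight children indexed by octant (bit 0 = +x, bit 1 = +y, bit 2 = +z),
    # or None if this node is a leaf holding a group of cubes.
    children: Optional[List["OctreeNode"]] = None
    leaf_id: int = -1  # hypothetical handle to the leaf's cached mesh

def front_to_back(node: OctreeNode, cam: Vec3, out: List[int]) -> None:
    """Append leaf ids to `out` in front-to-back order as seen from `cam`."""
    if node.children is None:
        out.append(node.leaf_id)
        return
    cx, cy, cz = node.center
    # Octant of the camera relative to this node's center: the nearest child.
    near = int(cam[0] > cx) | int(cam[1] > cy) << 1 | int(cam[2] > cz) << 2
    # Potential occluders get smaller i, so they are visited first. No sort needed.
    for i in range(8):
        front_to_back(node.children[near ^ i], cam, out)
```

With a root whose eight children are leaves 0 to 7, a camera at (10, 10, 10) yields the order 7, 6, 5, 4, 3, 2, 1, 0.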
A mesh for each such leaf should be cached in a vertex buffer. When you generate this mesh, don't generate every face of every cube by brute force. Instead, check each cube face for an opaque neighbor within the same leaf. If there is one, you don't need to generate that face at all. This way you render only the surface between solid cubes and empty space. You can also merge neighboring faces with the same material into a single long rectangle. Finally, you can split the mesh into six sets, one for each principal direction (the +/-X, +/-Y and +/-Z faces), and draw only those sets whose faces may face the camera.
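A minimal sketch of the face culling and the six direction sets, assuming each leaf is given as a set of opaque integer cell coordinates and each cube occupies [x, x+1] along every axis. The function names are hypothetical, and the merging of same-material faces into long rectangles is left out to keep it short. The direction test is conservative: a +X face at plane x = i+1 can only face the camera if the camera's x is greater than i+1.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int, int]

# The six principal directions: +/-X, +/-Y, +/-Z.
DIRECTIONS: Dict[str, Cell] = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}

def build_face_sets(solid: Set[Cell]) -> Dict[str, List[Cell]]:
    """Keep only faces whose neighbor (within this leaf) is empty, grouped by direction."""
    faces: Dict[str, List[Cell]] = {d: [] for d in DIRECTIONS}
    for (x, y, z) in solid:
        for name, (dx, dy, dz) in DIRECTIONS.items():
            if (x + dx, y + dy, z + dz) not in solid:
                faces[name].append((x, y, z))
    return faces

def facing_sets(lo: Cell, hi: Cell, cam: Tuple[float, float, float]) -> List[str]:
    """Conservatively pick the face sets of a leaf spanning cells lo..hi
    (inclusive) that may face the camera."""
    result = []
    for axis, letter in ((0, "x"), (1, "y"), (2, "z")):
        # +axis faces lie on planes lo+1 .. hi+1; visible only if cam is beyond one.
        if cam[axis] > lo[axis] + 1:
            result.append("+" + letter)
        # -axis faces lie on planes lo .. hi; visible only if cam is before one.
        if cam[axis] < hi[axis]:
            result.append("-" + letter)
    return result
```

For a solid 2x2x2 block this keeps 24 of the 48 brute-force faces. A camera far out on +X, level with the leaf, only needs the +X set plus whichever sets the conservative test cannot rule out.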
Rendering front to back doesn't help much by itself. However, you can use the occlusion queries offered by modern hardware to benefit from this ordering. Before rendering an octree leaf, issue an occlusion query for its bounding box. If no part of the box passes the depth test, you don't need to draw the leaf at all.
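A real occlusion query needs a GL context, so this sketch swaps in a software stand-in: a coverage test on a coarse grid of screen pixels. It assumes each leaf arrives with its projected bounding box (a half-open pixel rectangle) and the set of pixels its mesh actually covers, already in front-to-back order. Because of that ordering, "already covered" implies "covered by something nearer", so a bounding box whose pixels are all covered can be skipped. The names and the pixel model are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # x0, y0, x1, y1 (half-open) of a projected bbox

def rect_pixels(r: Rect) -> Set[Pixel]:
    x0, y0, x1, y1 = r
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def draw_front_to_back(leaves: List[Tuple[str, Rect, Set[Pixel]]]) -> List[str]:
    """leaves: (name, projected bbox, pixels its mesh covers), ordered front to back.
    Returns the names of the leaves that actually get drawn."""
    covered: Set[Pixel] = set()
    drawn: List[str] = []
    for name, bbox, mesh_pixels in leaves:
        # Stand-in for the occlusion query: would any bbox pixel pass?
        if rect_pixels(bbox) <= covered:
            continue              # fully hidden by nearer leaves: skip the draw
        drawn.append(name)        # here the real code would draw the leaf's mesh
        covered |= mesh_pixels    # its geometry now hides what lies behind it
    return drawn
```

Fed front to back, a wall hides the leaf behind it and only the wall and an off-to-the-side leaf are drawn. Fed back to front, nothing is culled, which is why the ordering matters.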
An alternative to occlusion queries is ray tracing, which works well for this kind of environment. You can cast a sparse set of rays to approximate which leaves are visible and draw only those leaves. However, because the rays are sparse, this will underestimate the visible set: a leaf that is visible only between rays will not be drawn.
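A minimal sketch of the ray-casting variant, assuming unit cubes stored as a set of solid cells and leaves that are cubes of `leaf_size` cells. Each ray walks the grid cell by cell with a 3D DDA in the style of Amanatides and Woo. The first solid cell it hits marks that cell's leaf as visible. How the sparse ray directions are chosen (for example, one per coarse block of pixels) is left to the caller. All names here are hypothetical.

```python
import math
from typing import Iterable, Optional, Set, Tuple

Cell = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]

def first_hit(solid: Set[Cell], origin: Vec3, direction: Vec3,
              max_steps: int = 256) -> Optional[Cell]:
    """Step through unit cells along the ray; return the first solid cell or None."""
    cell = [math.floor(c) for c in origin]
    step, t_max, t_delta = [], [], []
    for a in range(3):
        d = direction[a]
        if d > 0:
            step.append(1)
            t_max.append((cell[a] + 1 - origin[a]) / d)  # t of next boundary
            t_delta.append(1 / d)                        # t between boundaries
        elif d < 0:
            step.append(-1)
            t_max.append((cell[a] - origin[a]) / d)
            t_delta.append(-1 / d)
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    for _ in range(max_steps):
        if tuple(cell) in solid:
            return (cell[0], cell[1], cell[2])
        a = t_max.index(min(t_max))  # axis whose cell boundary is crossed next
        cell[a] += step[a]
        t_max[a] += t_delta[a]
    return None

def visible_leaves(solid: Set[Cell], origin: Vec3,
                   directions: Iterable[Vec3], leaf_size: int) -> Set[Cell]:
    """Leaves hit by at least one ray; leaves between rays are missed."""
    leaves: Set[Cell] = set()
    for d in directions:
        hit = first_hit(solid, origin, d)
        if hit is not None:
            leaves.add((hit[0] // leaf_size, hit[1] // leaf_size, hit[2] // leaf_size))
    return leaves
```

A wall in front of the camera stops every ray, so the leaf behind the wall is never reported. That is the same effect an occlusion query would have, at the cost of possibly missing small visible leaves.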
Here is what I've learned while writing my own clone:
- Don't just dump every cube into OpenGL, but also don't worry about doing all of the visibility pruning yourself. As another answer stated, check all 6 faces of each cube to see whether they are fully occluded by an adjacent block, and only render faces that could be visible. This reduces your face count from a cubic term (a volume of n*n*n cubes) to roughly a quadratic term (a surface of about n*n faces).
- OpenGL can do view frustum culling much faster than you can. Once you have rendered all of your surface faces into a display list or VBO, just send the entire blob to OpenGL. If you break your geometry into slices (what Minecraft calls chunks), you can also skip drawing the chunks you can easily determine are behind the camera.
- Render your entire geometry into a display list (or lists) and redraw that each time. This is an easy step to take if you're using immediate mode because you just wrap your existing code in glNewList/glEndList and redraw with glCallList. Reducing the OpenGL call count (per frame) will have a vastly bigger impact than reducing the total volume of polygons to render.
- Once you see how much longer it takes to generate the display lists than to draw them, you'll start thinking about how to put the updates into a thread. This is where conversion to VBOs pays off: The thread renders into plain old arrays (adding 3 floats to an array instead of calling glVertex3f, for example) and then the GL thread only has to load those into the card with glBufferSubData. You win twice: The code can run in a thread, and it can "draw" a point with 3 array writes instead of 3 function calls.
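The threaded update described in the last point can be sketched like this. A worker thread builds each chunk's vertices into a plain float array, and the GL thread drains finished meshes once per frame without blocking. The queue layout, the function names and the top-faces-only mesher are hypothetical simplifications, and the dictionary write marks the spot where the real code would call glBufferSubData. In CPython the GIL keeps the worker from meshing in parallel, so read this as a picture of the data flow rather than a speedup.

```python
import queue
import threading
from array import array
from typing import Dict, Set, Tuple

Cell = Tuple[int, int, int]

# Corner offsets of the top (+Y) face of a unit cube; a full mesher emits all six.
TOP_FACE = ((0, 1, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1))

def mesh_chunk(solid: Set[Cell]) -> array:
    """Worker side: append floats to a plain array instead of calling glVertex3f."""
    verts = array("f")
    for (x, y, z) in solid:
        if (x, y + 1, z) not in solid:  # only exposed top faces in this sketch
            for ox, oy, oz in TOP_FACE:
                verts.extend((x + ox, y + oy, z + oz))  # 3 array writes per vertex
    return verts

def worker(jobs: "queue.Queue", done: "queue.Queue") -> None:
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut the worker down
            break
        chunk_id, solid = item
        done.put((chunk_id, mesh_chunk(solid)))

def drain_uploads(done: "queue.Queue") -> Dict[int, array]:
    """GL-thread side, once per frame: collect finished meshes without blocking."""
    uploaded: Dict[int, array] = {}
    while True:
        try:
            chunk_id, verts = done.get_nowait()
        except queue.Empty:
            return uploaded
        uploaded[chunk_id] = verts  # stand-in for glBufferSubData on that chunk's VBO
```

The GL thread only ever touches finished arrays, so no OpenGL call has to be made from the worker.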
Other things I've noticed:
VBOs and display lists have very similar performance. It's quite possible that a given OpenGL implementation uses a VBO internally to store a display list. I skipped right past vertex arrays (a sort of client-side VBO), so I'm not sure about those. Use the ARB extension version of VBOs instead of the GL 1.5 core version, because the Intel drivers only implement the extension (despite claiming to support 1.5), while the NVIDIA and ATI drivers accept either.
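One way to act on that advice is to decide at startup which set of entry points to load. The decision is based on the strings returned by glGetString(GL_VERSION) and glGetString(GL_EXTENSIONS). This is a minimal sketch of only the decision logic. The helper names are hypothetical, and actually loading glGenBuffersARB or glGenBuffers is left to your GL binding.

```python
from typing import Tuple

def parse_gl_version(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a GL_VERSION string
    ('major.minor[.release] vendor-specific-info')."""
    major, minor = version.split(" ")[0].split(".")[:2]
    return int(major), int(minor)

def pick_vbo_entry_points(version: str, extensions: str) -> str:
    """Prefer the ARB names (glGenBuffersARB, glBindBufferARB, ...) whenever the
    extension is advertised; fall back to the core 1.5 names otherwise."""
    # GL_EXTENSIONS is space-separated; compare whole names, not substrings.
    if "GL_ARB_vertex_buffer_object" in extensions.split():
        return "ARB"
    if parse_gl_version(version) >= (1, 5):
        return "core"
    return "none"
```

A driver that reports 1.5 but lists the ARB extension gets the ARB path, which covers the Intel case described above.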
Texture atlases rule. If you are using one texture per face, look at how atlases work.
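With an atlas, each face keeps a single bound texture and just picks its tile through its texture coordinates. A minimal sketch, assuming a square atlas of equally sized tiles numbered row by row starting at UV (0, 0). Whether that corner is the top or bottom of your image depends on how you load it. `atlas_uv` and `inset` are hypothetical. The optional inset shrinks the rectangle slightly, which helps against filtering pulling in colors from neighboring tiles.

```python
from typing import Tuple

def atlas_uv(tile: int, tiles_per_row: int,
             inset: float = 0.0) -> Tuple[float, float, float, float]:
    """Return (u0, v0, u1, v1) for `tile` in a tiles_per_row x tiles_per_row atlas."""
    size = 1.0 / tiles_per_row
    col, row = tile % tiles_per_row, tile // tiles_per_row
    u0, v0 = col * size, row * size
    return (u0 + inset, v0 + inset, u0 + size - inset, v0 + size - inset)
```

The four corners of a face then get (u0, v0), (u1, v0), (u1, v1) and (u0, v1), and a whole chunk can be drawn with one texture bind.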
If you want to see my code, find me on GitHub.
Like others, I've been playing around with a block world "engine" using Ogre and writing some articles as I go (see Block World Articles). The basic approach I've been taking is:
- Only create the visible faces of blocks (not faces between blocks).
- Split up world into smaller chunks (only necessary for faster updating of individual blocks).
- Combine block textures into one texture file (texture atlas).
Just using these can get you very good performance on large simple block worlds (for example, 1024x1024x1024 on decent hardware).
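The point about chunks making block updates faster can be sketched as follows. Editing a block only marks its own chunk, plus any neighboring chunk it borders, as needing a new mesh. The rest of the world's geometry is left alone. This is a minimal illustration, not the Ogre code from the articles. `BlockWorld`, `CHUNK = 16` and the "0 means air" convention are hypothetical. Marking neighbors assumes the mesher culls faces against blocks across chunk borders.

```python
from typing import Dict, Set, Tuple

CHUNK = 16  # hypothetical chunk edge length in blocks

Cell = Tuple[int, int, int]

class BlockWorld:
    """Blocks stored per chunk; edits mark chunks dirty instead of rebuilding all."""

    def __init__(self) -> None:
        self.chunks: Dict[Cell, Dict[Cell, int]] = {}
        self.dirty: Set[Cell] = set()

    @staticmethod
    def chunk_of(x: int, y: int, z: int) -> Cell:
        return (x // CHUNK, y // CHUNK, z // CHUNK)

    def set_block(self, x: int, y: int, z: int, block: int) -> None:
        key = self.chunk_of(x, y, z)
        blocks = self.chunks.setdefault(key, {})
        if block == 0:
            blocks.pop((x, y, z), None)  # 0 = air: remove the block
        else:
            blocks[(x, y, z)] = block
        self.dirty.add(key)
        # A block on a chunk border changes which faces the neighboring chunk
        # shows (if faces are culled across borders), so re-mesh it too.
        for axis in range(3):
            for d in (-1, 1):
                n = [x, y, z]
                n[axis] += d
                nkey = self.chunk_of(n[0], n[1], n[2])
                if nkey != key:
                    self.dirty.add(nkey)

    def take_dirty(self) -> Set[Cell]:
        """Chunks to re-mesh this frame; clears the dirty set."""
        dirty, self.dirty = self.dirty, set()
        return dirty
```

An edit in the middle of a chunk costs one chunk rebuild, and an edit on a chunk corner costs at most four.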