Once upon a time, when > was faster than < ... Wait, what?
If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.
I didn't explain that particularly well, because it wasn't important. I just felt it was an interesting bit of trivia to add. I didn't intend to go over the algorithm specifically.
However, context is key. I never said that a < comparison was faster than a > comparison. Remember: we're talking about graphics hardware depth tests, not your CPU. Not operator<.
What I was referring to was a specific old optimization in which, on one frame, you would render with GL_LESS and a depth range of [0, 0.5]. On the next frame, you would render with GL_GREATER and a depth range of [1.0, 0.5]. You go back and forth, literally "flipping the sign of Z and the depth test" every frame.
This costs one bit of depth precision, since each frame only uses half of the depth range, but it meant you never had to clear the depth buffer, which once upon a time was a rather slow operation. Since depth clearing is not only free these days but actually faster than this technique, people don't do it anymore.
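To make the alternation concrete, here is a minimal CPU-side sketch of the idea. It is not real driver or GL code: DepthBuffer and write are hypothetical names invented for this illustration, and the two mappings only mimic what glDepthRange(0.0, 0.5) with GL_LESS and glDepthRange(1.0, 0.5) with GL_GREATER would do. It also assumes every pixel gets drawn every frame; a pixel that is skipped would keep a stale value from an older frame.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical software depth buffer, used only to illustrate the trick.
struct DepthBuffer {
    std::vector<float> depth;

    // The buffer is "cleared" exactly once, at creation, and never again.
    explicit DepthBuffer(std::size_t pixels) : depth(pixels, 1.0f) {}

    // Tries to write a fragment with scene depth z in [0, 1] (0 = near, 1 = far).
    // Returns true if the fragment passes the depth test and is stored.
    bool write(std::size_t pixel, float z, bool oddFrame) {
        if (!oddFrame) {
            // Even frame: depth mapped to [0, 0.5], smaller value wins (GL_LESS).
            float d = 0.5f * z;
            if (d < depth[pixel]) { depth[pixel] = d; return true; }
        } else {
            // Odd frame: depth mapped to [1.0, 0.5], larger value wins (GL_GREATER).
            float d = 1.0f - 0.5f * z;
            if (d > depth[pixel]) { depth[pixel] = d; return true; }
        }
        return false;
    }
};
```

The point is that whatever the previous frame left behind always sits in the "wrong" half of the range, so the first fragment of the new frame beats it no matter how far away it is. Within a single frame, the nearer fragment still wins as usual.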
The answer is almost certainly that, for whatever combination of chip and driver was used, the Hierarchical Z only worked in one direction. This was a fairly common issue back in the day. Low-level assembly and branching have nothing to do with it: Z-buffering is done in fixed-function hardware and is pipelined, so there is no speculation and hence no branch prediction.