How expensive is binding an FBO (framebuffer object)
As is often the case with performance characteristics, there isn't a simple answer. It depends heavily on the hardware architecture, driver optimizations, and usage conditions.
To give you the tl;dr first: switching render surfaces can be anywhere from fairly inexpensive to very expensive. My recommendation is the following:
1. Try various approaches, and benchmark them on all the platforms you care about.
2. If you can't do option 1, and you still want to be confident that your code will perform well across a variety of architectures, group your rendering by render target, and avoid unnecessary switches (a minimal sketch of this grouping follows the list).
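To illustrate option 2, here is a minimal sketch of grouping draws by render target so that each framebuffer is bound at most once per frame. DrawItem and submitSorted are hypothetical names invented for this example; glBindFramebuffer() is the only actual OpenGL call:

#include <glad/glad.h>  // hypothetical loader header; use whatever your project uses
#include <algorithm>
#include <vector>

struct DrawItem {
    GLuint targetFbo;  // framebuffer this item renders into (0 = default)
    // ... the rest of the draw state (shader, buffers, ...) omitted
};

void submitSorted(std::vector<DrawItem>& items)
{
    // Make all draws that share a framebuffer adjacent; stable_sort
    // preserves the submission order within each target.
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.targetFbo < b.targetFbo;
                     });

    GLuint bound = ~0u;  // sentinel: nothing bound yet
    for (const DrawItem& item : items) {
        if (item.targetFbo != bound) {
            glBindFramebuffer(GL_FRAMEBUFFER, item.targetFbo);
            bound = item.targetFbo;
        }
        // ... issue the actual draw call for this item
    }
}

Keep in mind that real code often has ordering constraints across targets (e.g. a pass that samples from a texture rendered in an earlier pass), so the sort key usually ends up being a pass index rather than the raw FBO handle.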
I'm hesitant to give numbers on how many switches per frame are harmless, mostly because I don't have them, I don't like to guess, and it depends on so many factors. I know from a normally very reliable source that on at least one platform, just 2 or 3 switches per frame can have a very substantial negative performance impact. Apart from this very bad case, my intuition would be to avoid switching more than 10-100 times per frame. But that's really just a guess, and it's absolutely possible that you can get away with more, particularly if you're targeting a limited set of hardware.
Your question sounds like it covers two different scenarios. Let me discuss them separately:
Redundant Bind Calls
From your description, it sounds like you partly have this usage pattern:
glBindFramebuffer(GL_FRAMEBUFFER, fboId);  // bind the FBO
glDraw...(...);                            // render to it
glBindFramebuffer(GL_FRAMEBUFFER, 0);      // switch back to the default framebuffer
glBindFramebuffer(GL_FRAMEBUFFER, fboId);  // re-bind the very same FBO
glDraw...(...);
glBindFramebuffer(GL_FRAMEBUFFER, 0);
In this case, you make multiple glBindFramebuffer() calls, but all your rendering goes to the same framebuffer. I would expect most drivers to detect that these bind calls are redundant, and not do any serious work for them. Even though there are sometimes philosophical disputes about whether drivers should detect redundant state changes, in practice they mostly do.
It depends on how much you trust your GPU/driver vendors in this case. Unless I have benchmarked it, I tend to be on the paranoid side in cases like this. I would avoid the redundant calls if there is any reasonable way to do that within my software architecture.
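If you want to stay on the paranoid side without contorting your architecture, one common option is to filter out redundant binds yourself. Here is a minimal sketch; bindFramebufferCached and g_currentFbo are hypothetical names invented for it, and the only real OpenGL call is glBindFramebuffer():

// Tracks the framebuffer we last bound; the default framebuffer (0)
// is bound when a context is first made current.
static GLuint g_currentFbo = 0;

void bindFramebufferCached(GLuint fboId)
{
    if (fboId != g_currentFbo) {
        glBindFramebuffer(GL_FRAMEBUFFER, fboId);
        g_currentFbo = fboId;
    }
}

The obvious caveat is that this only works if every bind in your code base goes through this function; any direct glBindFramebuffer() call elsewhere makes the cached value stale.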
Actual Framebuffer Switches
As I mentioned in the introduction, what happens here is highly GPU and driver dependent. Just switching the state to point rendering to a different target is cheap. But there can be much more to it.
You often have additional memory allocations associated with an active render target. Typical examples include buffers for early depth testing, and compressed color buffers. What happens with these allocations when you switch to a different render target depends on the hardware architecture, driver implementation, and potentially other conditions:
- As long as there is enough space, it might be possible to keep these allocations alive for all the render surfaces you cycle through, and switch between them along with the actual render target switch. In this case, there would be very little overhead.
- If these allocations are in on-chip memory, the space can be very limited. If there's not enough room to keep them all on-chip, the allocation for the old surface might be evicted to either video memory (if the GPU has any) or system memory, and the allocation for the new surface loaded back. This can be moderately expensive.
- The GPU/driver might not support evicting and reloading these allocations, and might have to resolve their content to the actual buffer (e.g. expand the content of the compressed color buffer, and write it back to the full color buffer). This is expensive.
Things get even more interesting with tiled architectures, which, in various flavors, are used very widely on mobile devices. The key selling points of tiled architectures are that they can run fragment shaders only once per pixel, and that each tile has to be written to the framebuffer only once. This reduces the overall memory bandwidth spent on framebuffer writes, and also greatly improves the locality of those writes, because a whole tile is written at once.
As far as I know, the tile memory used to store the triangles that will be rendered for each tile is normally on-chip memory. So if you switch framebuffers, you have to either:
- Execute the whole process of rendering each tile, and writing the result back to the framebuffer.
- Save the tile memory for the old surface to system memory, and load the previously saved tile memory for the new surface.
I don't know which approach is most commonly used (and if I did, I probably couldn't share it). But both of them sound very expensive, and switching framebuffers too frequently defeats the whole purpose of using a tile-based architecture.
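As an aside, one widely available mitigation on tiled GPUs is to tell the driver which attachment contents you don't need preserved, so it can skip saving them when you switch away. A minimal sketch using glInvalidateFramebuffer() (core in OpenGL ES 3.0 and desktop OpenGL 4.3), where nextFbo is a hypothetical handle for whatever target comes next:

// We are done with this FBO for the frame and never read its depth or
// stencil contents, so the driver may discard them instead of writing
// tile memory back out to main memory.
const GLenum discard[] = { GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT };
glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, discard);
glBindFramebuffer(GL_FRAMEBUFFER, nextFbo);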
How expensive is binding a framebuffer object
Not very, by itself. Essentially, the amount of work for the OpenGL implementation is not much more than what it does after a double buffer swap.
What really hurts performance is switching the attachments of an FBO, because every time this happens, the validity of the FBO has to be re-checked, which is a costly operation.
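The practical consequence is to create one FBO per attachment configuration and validate it once at setup time, instead of re-attaching textures during the frame. A minimal sketch, where colorTex is a hypothetical, already-created texture:

GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, colorTex, 0);
// Completeness is checked once here, at creation time, rather than
// being re-validated every frame because attachments keep changing.
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    // handle the error during setup
}
glBindFramebuffer(GL_FRAMEBUFFER, 0);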
Interestingly enough, Vertex Array Objects (VAOs) are expensive to switch in existing OpenGL implementations, and so far the common practice in the industry is to rebind buffer objects and vertex attribute pointers/offsets instead (see the sketch below). I'm only mentioning this because, superficially, FBOs and VAOs look and behave very similarly, but they exhibit very different performance profiles right now.
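For completeness, this is roughly what that rebind-per-draw pattern looks like; vbo, vertexCount, and the single vec3 position attribute are hypothetical details invented for the sketch:

// Instead of switching VAOs with glBindVertexArray(), re-specify the
// vertex state for the next draw directly:
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
                      3 * sizeof(float), (const void*)0);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);

Note that a core profile context still requires some VAO to be bound; the pattern is to keep a single VAO bound and re-specify its contents, rather than switching between many VAOs.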