Tuning OpenGL performance for geometry throughput

You're probably not going to like this response...

I've found your problem: the Intel GM965 with the open-source Linux drivers.

My current job doesn't hit your volume of data, but we have rendered several million vertices in VBOs, and Intel graphics hardware and drivers have proven useless for it. Get yourself an NVIDIA card (and get over having to use the binary driver; it just works) and you'll be all set. It doesn't even have to be current generation, though a top-end Quadro (if work is paying) or a top-end GTX 400 series card (if you're paying, or just trying to save a few bucks at work) should do just fine with the latest drivers. If upgrading your machine is not an option, you could also try to find a machine with this hardware to test on.
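Before you swap hardware or borrow a test machine, it's worth confirming which vendor and renderer are actually doing the drawing. Here's a rough sketch, assuming the `glxinfo` utility is installed and prints its usual `OpenGL vendor string:` / `OpenGL renderer string:` / `OpenGL version string:` lines. The `parse_glxinfo` helper is a name I made up for illustration, not part of any library.

```python
import re
import subprocess


def parse_glxinfo(text):
    """Pull the OpenGL vendor/renderer/version strings out of glxinfo output.

    Hypothetical helper: it only matches the plain "OpenGL <field> string:"
    lines and ignores the "core profile" and "ES profile" variants.
    """
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*OpenGL (vendor|renderer|version) string:\s*(.*)", line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields


if __name__ == "__main__":
    # Assumes glxinfo is on PATH and an X display is available.
    out = subprocess.run(
        ["glxinfo"], capture_output=True, text=True, check=True
    ).stdout
    for key, value in parse_glxinfo(out).items():
        print(f"{key}: {value}")
```

Run it on both your current machine and any candidate test machine so you know you're comparing the drivers you think you are.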