I'm doing some speed comparisons between different implementations of "batching" for my game engine ND2D, to display thousands of objects at a high framerate. I'm wondering why one implementation is faster than the other, and whether this is the expected result or I messed something up ;)
1. The cloud creates one large vertex buffer containing the triangles for all 6000 sprites. This vertex buffer is modified and re-uploaded each frame, and a single draw call renders everything.
FP11 Debug Player: 21 FPS, 100% CPU load
FP11 Release Player: 60 FPS, 100% CPU load
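To make the trade-off concrete, here is a minimal sketch of the CPU side of approach 1 (written in TypeScript for illustration; the names `Sprite`, `fillVertexBuffer`, and the unit-quad layout are my assumptions, not the actual ND2D code): every frame, each sprite's quad is transformed on the CPU and written into one big dynamic vertex buffer, which is then rendered with a single draw call.

```typescript
// Hypothetical sketch of approach 1 (not the actual ND2D source):
// transform every sprite's quad on the CPU each frame, write the
// results into one flat vertex array, upload it, and draw once.

interface Sprite { x: number; y: number; rotation: number; }

// Corners of a unit quad, centered on the sprite's position.
const QUAD: [number, number][] = [[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]];

// Fill a flat vertex array (x, y per vertex) for all sprites.
function fillVertexBuffer(sprites: Sprite[]): Float32Array {
  const out = new Float32Array(sprites.length * QUAD.length * 2);
  let i = 0;
  for (const s of sprites) {
    const cos = Math.cos(s.rotation), sin = Math.sin(s.rotation);
    for (const [vx, vy] of QUAD) {
      out[i++] = s.x + vx * cos - vy * sin; // rotate, then translate (CPU work)
      out[i++] = s.y + vx * sin + vy * cos;
    }
  }
  return out;
}

// Per frame, roughly:
//   vertexBuffer.uploadFromVector(fillVertexBuffer(sprites), 0, numVertices);
//   context3D.drawTriangles(indexBuffer); // one draw call for all 6000 sprites
```

The Stage3D calls in the trailing comment are only indicative; the point is that all transformation math and the full buffer upload happen on the CPU, once per frame.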
2. The batch uses "real" geometry batching: a static vertex buffer as large as the maximum batch size (21 objects) is created, and all 21 objects are rendered in a single draw call by passing 21 different transformation matrices to the shader. To render all 6000 objects, this draw loop has to run 286 times, so 286 draw calls are spent.
FP11 Debug Player: 18 FPS, 113% CPU load
FP11 Release Player: 50 FPS, 130% CPU load
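The draw loop of approach 2 can be sketched like this (again TypeScript for illustration; `BATCH_SIZE`, `drawBatched`, and the `draw` callback are hypothetical names, not the ND2D API): the vertex buffer is static, and each draw call only uploads up to 21 matrices as shader constants before drawing.

```typescript
// Hypothetical sketch of approach 2 (not the actual ND2D source):
// a static vertex buffer holds 21 quads, each vertex carrying the index
// of "its" object; per draw call, up to 21 transformation matrices are
// uploaded as vertex shader constants and the shader picks the right one.

const BATCH_SIZE = 21; // bounded by the available vertex constant registers

// Runs the batched draw loop for `numObjects` objects and returns the
// number of draw calls issued; `draw` stands in for the actual GPU call.
function drawBatched(numObjects: number, draw: (count: number) => void): number {
  let calls = 0;
  for (let start = 0; start < numObjects; start += BATCH_SIZE) {
    const count = Math.min(BATCH_SIZE, numObjects - start);
    // Per call, roughly: upload `count` matrices via
    // setProgramConstantsFromMatrix(...), then draw `count` quads.
    draw(count);
    calls++;
  }
  return calls;
}
```

For 6000 objects this loop runs Math.ceil(6000 / 21) = 286 times, matching the 286 draw calls quoted above; the CPU does far less math, but pays the per-call API overhead 286 times.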
It seems that in Molehill, saving draw calls (at the cost of doing more calculations on the CPU side) is more efficient than saving CPU time and letting the GPU do the math.
How are the tests running on your machines?
You can check out the sources at: https://github.com/nulldesign/nd2d