When it comes to rendering geometry, most graphics developers know how to implement instancing, using one technique or another, to get better performance than traditional single-instance rendering. So does this mean we should draw every object in the game with instancing? If not, what should we base the decision on?
Before answering that question, we must recognize that rendering primitives has both a CPU cost and a GPU cost. The CPU cost comes from the API calls that submit the primitives, and it is roughly fixed per call regardless of the mesh's size. The GPU cost comes from the actual drawing and depends on how complex the geometry is. This tells us that drawing a few large meshes might be more efficient than drawing many small ones. Note the word "might".
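To make the split concrete, here is a toy cost model. The per-call and per-triangle costs are hypothetical numbers chosen only to show the shape of the trade-off, and `submission_cost` is an illustrative helper, not part of any graphics API. It draws one fixed budget of triangles as meshes of different sizes, one draw call per mesh.

```python
# Hypothetical costs, chosen only to illustrate the shape of the trade-off.
CPU_US_PER_DRAW = 20.0      # assumed fixed CPU cost of one draw call, independent of mesh size
GPU_US_PER_TRIANGLE = 0.04  # assumed GPU cost per triangle drawn


def submission_cost(total_triangles: int, triangles_per_mesh: int) -> tuple[float, float]:
    """Return (cpu_us, gpu_us) for drawing total_triangles as separate meshes, one call each."""
    draw_calls = -(-total_triangles // triangles_per_mesh)  # ceiling division
    cpu_us = draw_calls * CPU_US_PER_DRAW           # grows with the number of calls
    gpu_us = total_triangles * GPU_US_PER_TRIANGLE  # grows with the amount of geometry
    return cpu_us, gpu_us


for size in (100, 1_000, 10_000):
    cpu_us, gpu_us = submission_cost(100_000, size)
    print(f"{size:>6} tris/mesh: CPU {cpu_us:8.1f} us, GPU {gpu_us:8.1f} us")
```

With these numbers the CPU cost drops tenfold each time the meshes get ten times larger, while the GPU cost for the same 100,000 triangles stays the same.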
The CPU and GPU run in parallel in the pipeline, meaning that either of them could be the bottleneck at any time. Our job is to get as close as possible to the point where the two workloads balance, which is where we reach maximum performance. Given a set of mesh instances, as mentioned earlier, drawing them one by one might be more efficient than batching them, or vice versa. So what criterion should we use to decide whether to batch them? The answer is triangle count. Let's look at an example.
Suppose we know that 500 triangles is the best decision point for our rendering system, the one that gives the most balanced workload between the CPU and the GPU. If we draw objects with fewer than 500 triangles without instancing, the frame will be CPU-bound because there are too many API calls; no matter how fast the graphics card is, performance will not improve. On the other hand, if we draw high-poly objects with more than 500 triangles using instancing, the frame will be GPU-bound, since the GPU has too much work relative to the CPU, and we still won't reach maximum performance because the two workloads are not balanced. This already answers the first question of this article, whether we should use instancing for everything. The answer is no.
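The example above can be reproduced with a toy frame model. The costs are hypothetical, picked so that the balance point lands at 500 triangles per draw call (20 / 0.04 = 500) to match the example. Instancing is modeled simply as submitting all copies of a mesh in one call, and any extra overhead of a particular instancing technique is ignored. `frame_time` is an illustrative helper, not an engine API.

```python
# Toy model of one frame in which CPU submission and GPU drawing overlap.
# All costs are hypothetical; real numbers must come from profiling.
CPU_US_PER_DRAW = 20.0      # assumed fixed CPU cost per draw call
GPU_US_PER_TRIANGLE = 0.04  # assumed GPU cost per triangle


def frame_time(mesh_count: int, triangles_per_mesh: int, instanced: bool) -> tuple[float, str]:
    """Return (estimated frame time in microseconds, which side is the bottleneck)."""
    draw_calls = 1 if instanced else mesh_count  # instancing submits every copy in one call
    cpu = draw_calls * CPU_US_PER_DRAW
    gpu = mesh_count * triangles_per_mesh * GPU_US_PER_TRIANGLE  # same GPU work either way
    # CPU and GPU run in parallel, so the slower side sets the frame time.
    return max(cpu, gpu), ("cpu" if cpu > gpu else "gpu")


for count, tris in ((1_000, 100), (10, 10_000)):
    for inst in (False, True):
        t, side = frame_time(count, tris, instanced=inst)
        print(f"{count} x {tris} tris, instanced={inst}: {t:.0f} us ({side}-bound)")
```

With these numbers, 1,000 meshes of 100 triangles drawn one by one are CPU-bound at 20,000 us, and instancing brings that down to 4,000 us. Ten meshes of 10,000 triangles are GPU-bound at 4,000 us whether or not they are instanced, so instancing buys nothing there.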
In summary, we pick a number, 500 in this example, as a breakpoint: we batch meshes whose triangle count is below it and draw meshes above it individually. Keep in mind that this number is just an example and will vary with many factors, such as hardware specifications, platform, graphics API, the instancing technique used, game logic complexity, etc. So how do we find the best number? The key is measurement. Many performance profiling tools can help you track down the bottleneck in your rendering. Keep experimenting until you find the best value for your system. In more advanced cases, you might also want to determine this breakpoint dynamically.
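The breakpoint rule, plus one possible way to tune it at runtime, might look like the sketch below. `should_instance`, `adjust_breakpoint`, the step size, the 10% tolerance and the clamp range are all hypothetical. The CPU and GPU frame times are assumed to come from whatever profiler or GPU timer queries your engine already uses. The adjustment follows this article's balancing argument: raise the breakpoint when the CPU is the bottleneck so more meshes get batched, and lower it when the GPU is the bottleneck and the CPU has headroom for more individual draw calls.

```python
def should_instance(triangles_per_mesh: int, breakpoint_tris: int) -> bool:
    # Low-poly meshes go into instanced batches; high-poly meshes are drawn one by one.
    return triangles_per_mesh < breakpoint_tris


def adjust_breakpoint(breakpoint_tris: int, cpu_ms: float, gpu_ms: float,
                      step: int = 50, tolerance: float = 0.1,
                      min_tris: int = 50, max_tris: int = 5000) -> int:
    """Nudge the breakpoint toward balanced CPU/GPU frame times (hypothetical heuristic)."""
    if cpu_ms > gpu_ms * (1.0 + tolerance):
        # CPU-bound: batch more meshes to cut the number of draw calls.
        breakpoint_tris += step
    elif gpu_ms > cpu_ms * (1.0 + tolerance):
        # GPU-bound: the CPU has headroom, so fewer meshes need batching.
        breakpoint_tris -= step
    # Otherwise both sides are within tolerance; the breakpoint is left alone.
    return max(min_tris, min(max_tris, breakpoint_tris))


# Usage: split a frame's meshes by triangle count (names and counts are made up).
meshes = {"grass": 120, "rock": 300, "tree": 2_400, "hero": 15_000}
breakpoint_tris = 500
instanced = [name for name, tris in meshes.items() if should_instance(tris, breakpoint_tris)]
individual = [name for name, tris in meshes.items() if not should_instance(tris, breakpoint_tris)]

# Once per frame (or every few frames), feed in the measured times.
breakpoint_tris = adjust_breakpoint(breakpoint_tris, cpu_ms=12.0, gpu_ms=8.0)
```

A small step and a tolerance band keep the breakpoint from oscillating every frame when the two timings are already close.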
- Thitipong (Nick)