Hello there!
I am working on a rendering engine using Vulkan. My engine follows the GPU-driven approach. At the beginning of the frame, a culling compute shader is executed. It iterates through the instance buffer, performs frustum and occlusion culling (using a depth pyramid), and writes the resulting indirect command buffer. Then vkCmdDrawIndexedIndirectCount is issued. Material parameters are fetched from another buffer using InstanceIndex (the descriptorIndexing feature is also used).
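To make the data flow concrete, here is a minimal CPU-side sketch of what that culling pass does. It is only an illustration under assumptions: the `Instance` and `Plane` layouts and the function names are hypothetical, the command struct merely mirrors the field order of `VkDrawIndexedIndirectCommand`, the occlusion test against the depth pyramid is left out, and the plain counter stands in for what would be an atomic increment in the compute shader.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the field order of VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirectCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

// Hypothetical per-instance record: bounding sphere plus mesh range.
struct Instance {
    float    center[3];
    float    radius;
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
};

// Plane in the form n.p + d >= 0 for points on the inside (assumed convention).
struct Plane { float nx, ny, nz, d; };

// Sphere-vs-frustum test: reject if the sphere lies fully behind any plane.
bool sphereInFrustum(const Instance& inst, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        float dist = p.nx * inst.center[0] + p.ny * inst.center[1] +
                     p.nz * inst.center[2] + p.d;
        if (dist < -inst.radius) return false;
    }
    return true;
}

// Writes one compacted command per visible instance and returns the draw
// count (the value the GPU version would leave in the count buffer).
uint32_t cullInstances(const std::vector<Instance>& instances,
                       const std::array<Plane, 6>& planes,
                       std::vector<DrawIndexedIndirectCommand>& outCommands) {
    assert(outCommands.size() >= instances.size());  // worst case: all visible
    uint32_t drawCount = 0;  // an atomic counter in the compute shader
    for (uint32_t i = 0; i < instances.size(); ++i) {
        const Instance& inst = instances[i];
        if (!sphereInFrustum(inst, planes)) continue;
        // firstInstance carries the instance index, so InstanceIndex in the
        // vertex shader can be used to look up material parameters.
        outCommands[drawCount++] = {inst.indexCount, 1u, inst.firstIndex,
                                    inst.vertexOffset, i};
    }
    return drawCount;
}
```

Storing the instance index in `firstInstance` is what lets a single draw-count call still reach per-instance material data.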
Now I need to do more complex rendering using different (possibly arbitrary) pipelines (e.g. alpha blending, tessellation, etc.), and I'm stuck on how to handle this with GPU-driven indirect rendering.
I am considering two approaches:
- Allocate a separate indirect command buffer for every pipeline and add a pipeline ID to every entity in the instance buffer. My doubts are about how to predict the space needed for each buffer (in the worst case it would be number of instances * number of pipelines). It also requires more atomic increments in divergent branches of the culling shader. And finally, it requires executing multiple DrawIndexedIndirect calls (one per pipeline) even when the actual draw count is zero (since the CPU doesn't know anything about the culling results).
- Do a dispatch (overwriting the previous indirect buffer) and then a draw for every pipeline in the scene. This requires starting a new render pass for every pipeline (since CmdDispatch cannot be recorded inside a render pass), so I don't like that idea at all.
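The first approach can be sketched on the CPU as follows, mainly to show where the worst-case sizing and the extra per-pipeline counters come from. This is a sketch under assumptions, not a recommended design: `VisibleInstance`, `PipelineBins`, `makeBins`, `binVisible`, and `worstCaseBytes` are hypothetical names, the command struct only mirrors the field order of `VkDrawIndexedIndirectCommand`, and the plain increments stand in for atomic adds in the culling shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the field order of VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirectCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

// Hypothetical record for an instance that already passed culling.
struct VisibleInstance {
    uint32_t instanceIndex;
    uint32_t pipelineId;
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
};

// One fixed-size region of commands and one draw counter per pipeline.
struct PipelineBins {
    uint32_t capacityPerPipeline = 0;
    std::vector<DrawIndexedIndirectCommand> commands;  // pipelines * capacity
    std::vector<uint32_t> drawCounts;                  // one per pipeline
};

// Sizes every region for the worst case: all instances use one pipeline.
PipelineBins makeBins(uint32_t pipelineCount, uint32_t instanceCount) {
    PipelineBins bins;
    bins.capacityPerPipeline = instanceCount;
    bins.commands.resize(static_cast<std::size_t>(pipelineCount) * instanceCount);
    bins.drawCounts.assign(pipelineCount, 0u);
    return bins;
}

// Appends each visible instance to the region of its pipeline.
void binVisible(const std::vector<VisibleInstance>& visible, PipelineBins& bins) {
    for (const VisibleInstance& v : visible) {
        assert(v.pipelineId < bins.drawCounts.size());
        uint32_t slot = bins.drawCounts[v.pipelineId]++;  // atomic add on the GPU
        assert(slot < bins.capacityPerPipeline);
        std::size_t base =
            static_cast<std::size_t>(v.pipelineId) * bins.capacityPerPipeline;
        bins.commands[base + slot] = {v.indexCount, 1u, v.firstIndex,
                                      v.vertexOffset, v.instanceIndex};
    }
}

// Memory reserved for commands, regardless of how many draws survive culling.
std::size_t worstCaseBytes(const PipelineBins& bins) {
    return bins.commands.size() * sizeof(DrawIndexedIndirectCommand);
}
```

With this layout the CPU would still bind each pipeline and record one vkCmdDrawIndexedIndirectCount per region, pointing at that pipeline's offset in the command and count buffers, even when the counter ends up at zero.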
Another option I see is to give up and do GPU-driven rendering only for the single most frequently used pipeline, and do traditional CPU-driven rendering for all other instances (losing the ability to use depth pyramid culling for them). I don't want to do this because I would need to support both paths (indirect and direct) side by side.
What would you do? How do large GPU-driven renderers handle this? Thank you for your answers!