
Fence usage in double buffering

Started by September 13, 2019 05:49 PM
3 comments, last by MJP 5 years ago

From looking at Microsoft's working code samples and people's posts on forums, it seems that people do double buffering in the following way:

                                                                     fence 0
frame 0:  | submit command | execute command .......................... |
frame 1:                      | submit command | execute command .......|....................

It seems that people tend to submit and execute their commands for the current frame first, _then_ fence and wait for the previous frame to finish. This seems counter-intuitive to me, since this potential overlap means duplicating temporary per-frame data.

My question is why not just do it this way instead:

                                                    fence 0
frame 0:  | submit command | execute command ..........|
frame 1:                                                 | submit command | execute command .....|....................

This way, CPU work still overlaps with GPU work, but no duplication needs to happen. Yes, de-overlapping frame 0 and frame 1 seems to be bad for performance, but can it be that bad? I feel like if frames are finished on time, this overlap should never occur in the first place? So I'm wondering why the majority prefers the first approach to the second approach, even though the second approach is simpler and seems more natural to me. Thanks. 

8 hours ago, Chen96 said:

This seems counter-intuitive to me, since this potential overlap means duplicating temporary per-frame data.

And that's bad because...? :)

It's called double buffering exactly because of this. The "double" in double buffer means the region of memory you write at frame 0 is different from the one you will be writing at frame 1, at least for the data that must change every frame (like view & world matrices, etc).

To use an analogy, there are two trucks. You load packages into the first truck; when you're done, it sets off, and you start loading more stuff into the second truck while the first one is en route to its destination and back.

If you're lucky, by the time you're done loading truck #2, truck #1 has already arrived. If not, you'll have to wait a little.
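
In code, the two trucks are two sets of per-frame data, each tagged with the fence value that tells you when the GPU is done with it. Below is a minimal sketch of that loop. It assumes a hypothetical `MockGpu` that stands in for a real command queue and fence (it is not a D3D12 API, and it only "finishes" work when waited on), so the names and structure are illustrative only:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a GPU queue plus fence: submitted frames complete
// in order, and `completed` plays the role of the fence's completed value.
struct MockGpu {
    std::deque<std::uint64_t> pending;  // fence values of frames still "executing"
    std::uint64_t completed = 0;        // highest fence value reached so far

    void submit(std::uint64_t fenceValue) { pending.push_back(fenceValue); }

    // "Blocks" until the fence reaches `value` (here: retires pending frames).
    void waitFor(std::uint64_t value) {
        while (completed < value) {
            assert(!pending.empty());
            completed = pending.front();
            pending.pop_front();
        }
    }
};

constexpr std::size_t kFrameCount = 2;  // the "double" in double buffering

struct FrameSlot {
    std::uint64_t fenceValue = 0;  // signaled when the GPU is done with this slot
    int perFrameData = -1;         // stand-in for view/world matrices etc.
};

// Runs `frames` frames and returns the largest number of frames that were
// ever queued on the mock GPU at the same time.
std::size_t runFrames(int frames) {
    MockGpu gpu;
    std::array<FrameSlot, kFrameCount> slots{};
    std::uint64_t nextFence = 0;
    std::size_t maxInFlight = 0;

    for (int frame = 0; frame < frames; ++frame) {
        FrameSlot& slot = slots[static_cast<std::size_t>(frame) % kFrameCount];
        // Wait only for the frame that last used THIS slot (two frames ago),
        // not for the frame that was submitted most recently.
        gpu.waitFor(slot.fenceValue);
        slot.perFrameData = frame;      // safe: the GPU no longer reads this slot
        slot.fenceValue = ++nextFence;  // fence value that marks this frame done
        gpu.submit(slot.fenceValue);
        maxInFlight = std::max(maxInFlight, gpu.pending.size());
    }
    gpu.waitFor(nextFence);  // drain everything before tearing down
    return maxInFlight;
}
```

Calling `runFrames(10)` reports at most two frames queued at once: the CPU never touches a slot the GPU may still be reading, yet it is free to fill the other slot while the previous frame executes.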

8 hours ago, Chen96 said:

This way, CPU work still overlaps with GPU work, but no duplication needs to happen.

No, it doesn't. The GPU won't start executing your commands until you submit them, and after that the CPU won't do anything more because it's waiting on the fence.

As per the truck analogy, truck #1 can't start until you're done loading all the packages, and once that's done, you sit idle waiting for truck #1 to come back before you start working on truck #2. That's an inefficient use of your time.

Quote

but can it be that bad?

Yes, the framerate difference can be up to 2x. That's a lot. (YMMV, depends on the kind of workloads you're doing)
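
To see where "up to 2x" comes from, here is a rough timing model. It assumes every frame costs the same fixed CPU time and GPU time, that the GPU runs frames in order, and that the CPU may only reuse a set of per-frame data once the GPU has finished the frame that last used it. The function and parameter names are made up for this sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Time (ms) until the GPU finishes the last of `frames` frames, given `slots`
// sets of per-frame data (1 = wait for the GPU every frame, 2 = double buffering).
double totalTime(int frames, int slots, double cpuMs, double gpuMs) {
    assert(frames > 0 && slots > 0);
    std::vector<double> gpuDone(static_cast<std::size_t>(frames), 0.0);
    double cpuFree = 0.0;  // when the CPU finished submitting the previous frame
    double gpuFree = 0.0;  // when the GPU finished the previous frame

    for (int i = 0; i < frames; ++i) {
        // Fence wait: the slot is reusable once the frame `slots` ago is done.
        double slotFree = (i >= slots) ? gpuDone[static_cast<std::size_t>(i - slots)] : 0.0;
        double cpuStart = std::max(cpuFree, slotFree);
        double cpuEnd = cpuStart + cpuMs;
        // The GPU starts once the commands exist and the previous frame is done.
        double gpuStart = std::max(cpuEnd, gpuFree);
        gpuDone[static_cast<std::size_t>(i)] = gpuStart + gpuMs;
        cpuFree = cpuEnd;
        gpuFree = gpuStart + gpuMs;
    }
    return gpuDone.back();
}
```

With 8 ms of CPU work and 8 ms of GPU work per frame, `totalTime(100, 1, 8, 8)` gives 1600 ms while `totalTime(100, 2, 8, 8)` gives 808 ms, which is close to the 2x limit. With a lopsided workload such as 2 ms CPU and 14 ms GPU the gap shrinks to 1600 ms versus 1402 ms, hence the YMMV.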


Hi Matias, thanks for the great explanation. I still have one follow-up question:

When you say there is no GPU and CPU work overlap, I don't see how that is the case. The only part of the CPU work that needs to stall on the fence is the part that populates the command list. If I order my CPU workload so that the command list population is at the very end of my CPU frame, then it seems the only time saved by fencing _after_ submitting the command list is the time it takes the CPU to fill out the command list. Is that correct?

 

If you wait for the GPU to finish before building command buffers, you're going to end up with a bubble on the GPU where it's waiting for the CPU to submit more commands. That bubble, which will be longer than the time it takes you to build command buffers, will take away from your per-frame GPU budget.
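
The bubble can be made concrete with a small model. The sketch below splits the CPU frame into work that never waits (`prepMs`) and command building (`buildMs`) that must wait on the fence, then adds up how long the GPU sits with nothing to run. All names are hypothetical, and real submission latency is ignored, so in this model the bubble comes out as at least `buildMs` rather than strictly more:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct FrameCost {
    double prepMs;   // CPU work that never touches per-frame GPU data
    double buildMs;  // command building, gated by the fence wait
    double gpuMs;    // GPU execution time
};

// Total time (ms) the GPU spends idle between frames, given `slots` sets of
// per-frame data. The idle time before the very first frame is not counted.
double gpuIdleTime(int frames, int slots, FrameCost c) {
    assert(frames > 0 && slots > 0);
    std::vector<double> gpuDone(static_cast<std::size_t>(frames), 0.0);
    double cpuFree = 0.0, gpuFree = 0.0, idle = 0.0;

    for (int i = 0; i < frames; ++i) {
        double prepEnd = cpuFree + c.prepMs;  // overlaps with GPU work
        // Fence wait happens here, right before command building.
        double slotFree = (i >= slots) ? gpuDone[static_cast<std::size_t>(i - slots)] : 0.0;
        double buildEnd = std::max(prepEnd, slotFree) + c.buildMs;
        double gpuStart = std::max(buildEnd, gpuFree);
        if (i > 0) idle += gpuStart - gpuFree;  // GPU had nothing queued in this gap
        gpuFree = gpuStart + c.gpuMs;
        gpuDone[static_cast<std::size_t>(i)] = gpuFree;
        cpuFree = buildEnd;
    }
    return idle;
}
```

For 4 ms of prep, 2 ms of command building and 10 ms of GPU work, `gpuIdleTime(10, 1, {4, 2, 10})` reports 18 ms of idle GPU time over ten frames (2 ms per frame boundary), while `gpuIdleTime(10, 2, {4, 2, 10})` reports none, because the next frame's commands are already queued when the GPU finishes.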
