Rollback Prediction Management

NetworkDev19 · 2020-11-13T06:10:58

I have a simulation that ticks at 60hz and expect devices that may only update at 30hz to play too. I've struggled for awhile to handle the performance of rollbacks for prediction with this setup. And since I've been struggling with the engine and low level code doing micro optimizations, I figured maybe I need to reevaluate the higher level problem: the rollbacks themselves.. Does anyone have a particular resource that describes or demonstrates their rollback and reprediction technique for reference? The way mine is setup is every frame has x number of ticks at a fixed delta time to sample and then predict the players input, which gets sent at the end of the frame. So every frame at 30fps would have 2 prediction ticks. I also do a “fractional” prediction tick which is an additional prediction tick that uses left over frame time to do a smoothed out prediction tick. This fractional prediction tick always gets rollbacked every frame. The others do not. The others get rollback when the game mispredicts OR their sim falls behind (dropped packets or device freeze/stall). If the hash of the prediction world does not match the hash of the server world, then it must rollback to the server world state. Then, all pending input from the server state up to the last prediction tick are resimulated. On a 30 fps device running a 60hz sim with 40ms rtt (assume 20ms one way clean), I have something like 20-40 inputs not yet confirmed. So I have to resimulated that many ticks in a single frame. This is killer for mobile devices in particular. Rollbacks destroy their performance. And often it's not even a misprediction, it's just lag. The client has dropped packets or something or a frame stalls, then they have to rollback because they're behind the sim. Trying to think of ways to soften this! Maybe there's some kind of method of checking input values, seeing if they haven't changed, and merging/skipping certain parts of the resimulation in a rollback?

ddlox

309

November 03, 2020 06:04 PM

Don't sim multiple predictions in 1 frame -i could hear my gran'ma screaming-

You want to simulate them from the rollback frame (the one that failed) to the current frame. Then grab local inputs again and predict from the current frame on. Look at the diagram u posted of ggpo, it shows you that failure between frame 3 and frame 4, right?, but u rollback to frame 1 and u sim to frame 4 ( the reason u roll back to frame 1 is because remote inputs have arrived and are informing u that your prediction is wrong and these remote inputs are still on frame 1. So u rollback to frame 1 (or whatever that failed frame will be in your case))

So you rollback to the frame of which the remote inputs (when they have arrived) indicate that those players are at! And then you simulate to the current frame 4 (which means you will be repredicting in some of these frames) and then finally you can gather local input data again from frame 5 and predict (and so on and the cycle goes on etc…)

That's kind of what ggpo does when it rolls back (in a nutshell). Also, that's why ggpo gathers all inputs from all players BEFORE the next frame is executed, this gives the best predictions (it's not about sharing network performance or load), it's about giving your system the best chance of predicting correctly (but yes unfortunately when prediction fails then u rollback as mentioned)

I hope this makes more sense now, I predict that u understand it better now ?

Have fun ?

NetworkDev19

0

Author

November 03, 2020 09:05 PM

ddlox said:
Don't sim multiple predictions in 1 frame -i could hear my gran'ma screaming-
You want to simulate them from the rollback frame (the one that failed) to the current frame. Then grab local inputs again and predict from the current frame on. Look at the diagram u posted of ggpo, it shows you that failure between frame 3 and frame 4, right?, but u rollback to frame 1 and u sim to frame 4 ( the reason u roll back to frame 1 is because remote inputs have arrived and are informing u that your prediction is wrong and these remote inputs are still on frame 1. So u rollback to frame 1 (or whatever that failed frame will be in your case))
So you rollback to the frame of which the remote inputs (when they have arrived) indicate that those players are at! And then you simulate to the current frame 4 (which means you will be repredicting in some of these frames) and then finally you can gather local input data again from frame 5 and predict (and so on and the cycle goes on etc…)
That's kind of what ggpo does when it rolls back (in a nutshell). Also, that's why ggpo gathers all inputs from all players BEFORE the next frame is executed, this gives the best predictions (it's not about sharing network performance or load), it's about giving your system the best chance of predicting correctly (but yes unfortunately when prediction fails then u rollback as mentioned)
I hope this makes more sense now, I predict that u understand it better now ?
Have fun ?

Thanks but that was what I was saying before. Ggpo doesn't seem to apply to my model. That's like lockstep. My players don't care about the inputs of other players. Time progresses regardless of input. You mentioned remote input arriving. In the case of the local player, there is no remote input.

ddlox

309

November 03, 2020 11:15 PM

ok i pointed u to ggpo to see if it could help in your case but u found that it doesn't; ok understood.

let's step back a bit… u previously said:

Because device performance is so low due to having to sim multiple ticks a frame. ?

and much earlier:

The prediction ticks just take up so much frame time.

if u sim at 30fps or 60 or whatever, how much time exactly does each of those multiple ticks take per frame? and why?

i understand that you're running on mobile with all optimization guns blazing, but u need to see if u can log or profile those prediction ticks times somehow, otherwise there's a chance we'll keep giving you advice which won't fit your model.

There is no question that something is eating up those times, but we will never know what it is if u don't clock it… u said if u remove the tech/experimental engine, u get better results… well… clock your code without this engine, then put the engine back on and clock your code again and see where the difference comes from?….

is this not possible ?

Until then ?

NetworkDev19

0

Author

November 04, 2020 12:15 AM

ddlox said:
ok i pointed u to ggpo to see if it could help in your case but u found that it doesn't; ok understood.
let's step back a bit… u previously said:
Because device performance is so low due to having to sim multiple ticks a frame. ?
and much earlier:
The prediction ticks just take up so much frame time.
if u sim at 30fps or 60 or whatever, how much time exactly does each of those multiple ticks take per frame? and why?
i understand that you're running on mobile with all optimization guns blazing, but u need to see if u can log or profile those prediction ticks times somehow, otherwise there's a chance we'll keep giving you advice which won't fit your model.
There is no question that something is eating up those times, but we will never know what it is if u don't clock it… u said if u remove the tech/experimental engine, u get better results… well… clock your code without this engine, then put the engine back on and clock your code again and see where the difference comes from?….
is this not possible ?
Until then ?

Yes I'm looking for ways to ditch it. Unfortunately that choice is likely out of my hands but I can use the results to poke the engine creator.

Sorry I'm not trying to be dismissiv,I I appreciate thehelp I guess I'm just trying to make sure my prediction loop is actually correct and I'm the one not making the mistake. Like doing multiple ticks per frame when prediction is okay, and having to do up to 30~ ticks in 1 frame when a rollback happens.

Kylotan

10,512

November 04, 2020 12:02 PM

NetworkDev19 said:
I'm not having rollbacks every frame, but I am having issues where rollbacks have to happen because I inevitably mispredicts. Because device performance is so low due to having to sim multiple ticks a frame. ?

This still sounds like a bug to me. You should be able to predict a long way into the future without anything going wrong, unless some other entity is affecting the simulation. Whether it's one tick or one hundred ticks in a frame shouldn't matter. Either you're capable of correctly calculating the results of your local input, or you're not.

NetworkDev19

0

Author

November 04, 2020 12:43 PM

Kylotan said:
NetworkDev19 said:
I'm not having rollbacks every frame, but I am having issues where rollbacks have to happen because I inevitably mispredicts. Because device performance is so low due to having to sim multiple ticks a frame. ?
This still sounds like a bug to me. You should be able to predict a long way into the future without anything going wrong, unless some other entity is affecting the simulation. Whether it's one tick or one hundred ticks in a frame shouldn't matter. Either you're capable of correctly calculating the results of your local input, or you're not.

The issue is that the server is running 60hz. Clients are trying to keep up and predicting ahead of that time wise. But if performance is too low, their progression of time slows. So their prediction falls behind what the server is calculating. Thus the need of a rollback. The client will diverge if they can't keep up with the server. So it absolutely matters if it's one tick or 100 ticks in a frame no? The client is running, say, 15fps or lower. The server will always be ahead of the client. Then the client has to try to rollback and repredict, which further kills performance. The tick time in the client is always now in a “catch-up”. The only way to get back in sync is to rollback, skip it's predicted tick time into future to readjust, then resim. It's constantly going to be trying to do this because perf is low. And because of that, it will mispredict. Does that make sense? Can I elaborate on any particular step that maybe I'm doing something wrong on?

Also, another issue comes up as a side effect. The server has to process inputs for specific tick times. If the client predicts their input will arrive on the server at tick 10, but the server is at tick 20 because the client has fallen behind when it should be ahead for prediction, the server has to ignore the input or it won't be deterministic. So that too will influence mispredictions right? A client who is too far in the past (thanks to low frame rate) will have difficulty sending commands for the right tick which means it will predict the input but the server will reject it. Misprediction occurs.

This however I think can be combatted by using the frame time delta and trying to guess at the server's real tick time.

Edit: oh fudge. This made me think of something. Quick side question, is it more “correct” to send a packet with input at the beginning of the frame or at the end? Right now I'm sending inputs at the end of the frame. This does not account for the current frame time. So if you do an input, the game samples it, then does prediction (which is slow currently) and then sends the input at the end. That frame time doing prediction is not accounted for in the packet being sent… this could be significant to my woes about missed input rollbacks issue mentioned at the end there.

Kylotan

10,512

November 04, 2020 04:05 PM

NetworkDev19 said:
The issue is that the server is running 60hz. Clients are trying to keep up and predicting ahead of that time wise. But if performance is too low, their progression of time slows.

Right, that's the bug. The idea of a tick-based system is that each tick represents a fixed and predictable unit of time. You can't have time going slower or faster*. You can drop rendering frames to save processing time, but you can't define ticks as 1/60th of a second and then have some seconds where ticks cover 1/50th of a second. The simulations will play out differently and therefore you lose most of the benefits.

It's fine for the client to lag behind a little more from time to time. Whether this happens due to network lag or other performance characteristics doesn't really matter. Nothing needs repredicting, because nothing is wrong - we just haven't caught up yet, which takes time. The idea is that you don't render until you've caught up on your tick updates. If the client can't catch up, e.g. because the CPU burden is just too high, then your model parameters are not suitable for the constraints you have. You will need to adjust the parameters to fit within the resources available.

(* Actually, you can, but that's an added complication, and you need to get the basics working first. Watch the video on Overwatch's architecture and netcode to hear about how they stretch time.)

NetworkDev19 said:
If the client predicts their input will arrive on the server at tick 10, but the server is at tick 20 because the client has fallen behind when it should be ahead for prediction, the server has to ignore the input or it won't be deterministic. So that too will influence mispredictions right?

Sure. However, once you're in this position, the predictions are mostly worthless because the rest of the world has already overtaken that point. The server's already told everyone else where that client was on tick 11, 12, 13, etc. Not much point rolling back and reapplying locally when you know that state is invalid - better to just snap to whatever is the latest server state so that you've caught up.

This is not generally a tenable situation though, and for reasons mentioned above, you want to get your system into a position where this is a last resort for dealing with extreme lag spikes, not a standard part of most player's experience.

NetworkDev19 said:
is it more “correct” to send a packet with input at the beginning of the frame or at the end?

It doesn't really matter when you send it. What matters is how you reason about it. Each tick takes the current world state at T=N plus the current inputs and produces a successor world state T=N+1. When you receive a state from the server, you need to know what N was for that state, so that you can find your corresponding input N and re-do that tick (and all subsequent ones).

NetworkDev19

0

Author

November 04, 2020 04:18 PM

Kylotan said:
NetworkDev19 said:
The issue is that the server is running 60hz. Clients are trying to keep up and predicting ahead of that time wise. But if performance is too low, their progression of time slows.
Right, that's the bug. The idea of a tick-based system is that each tick represents a fixed and predictable unit of time. You can't have time going slower or faster*. You can drop rendering frames to save processing time, but you can't define ticks as 1/60th of a second and then have some seconds where ticks cover 1/50th of a second. The simulations will play out differently and therefore you lose most of the benefits.
It's fine for the client to lag behind a little more from time to time. Whether this happens due to network lag or other performance characteristics doesn't really matter. Nothing needs repredicting, because nothing is wrong - we just haven't caught up yet, which takes time. The idea is that you don't render until you've caught up on your tick updates. If the client can't catch up, e.g. because the CPU burden is just too high, then your model parameters are not suitable for the constraints you have. You will need to adjust the parameters to fit within the resources available.
(* Actually, you can, but that's an added complication, and you need to get the basics working first. Watch the video on Overwatch's architecture and netcode to hear about how they stretch time.)
NetworkDev19 said:
If the client predicts their input will arrive on the server at tick 10, but the server is at tick 20 because the client has fallen behind when it should be ahead for prediction, the server has to ignore the input or it won't be deterministic. So that too will influence mispredictions right?
Sure. However, once you're in this position, the predictions are mostly worthless because the rest of the world has already overtaken that point. The server's already told everyone else where that client was on tick 11, 12, 13, etc. Not much point rolling back and reapplying locally when you know that state is invalid - better to just snap to whatever is the latest server state so that you've caught up.
This is not generally a tenable situation though, and for reasons mentioned above, you want to get your system into a position where this is a last resort for dealing with extreme lag spikes, not a standard part of most player's experience.

NetworkDev19 said:
is it more “correct” to send a packet with input at the beginning of the frame or at the end?
It doesn't really matter when you send it. What matters is how you reason about it. Each tick takes the current world state at T=N plus the current inputs and produces a successor world state T=N+1. When you receive a state from the server, you need to know what N was for that state, so that you can find your corresponding input N and re-do that tick (and all subsequent ones).

I think you misunderstood the first part. The ticks are fixed timesteps. If it's 60hz, then the timesteps is 16ms per tick no matter what. It's a question of how many at that point. And if the client has to run 2+ ticks in a frame every frame, they continue to fall behind often to keep up.

When I said progression of time slows, I literally mean the time it takes for them to tick n frames it thinks it needs to predict ahead of the server, the server is already past those ticks by the time it receives the inputs.

And yes we do the similar time scaling as overwatch to speed up or slow down.

Kylotan

10,512

November 04, 2020 04:47 PM

I'm saying that it can't work the way you want it to work. You have to be able to run sufficient tick steps to be able to keep pace with the other simulations. And if you have submitted inputs to the server in time for them to be processed, there should be no mispredictions in normal operation except for the rare cases where some other entity has interacted with you on the server and changed the results.

It's normal for different clients to need to process different numbers of ticks per rendering frame. It's also okay to have some variance from frame to frame because you don't want to process partial ticks, or because you had a short-lived bit of lag. But if you can't execute enough ticks per second to keep up with the server in general, it's game over. The whole model is built around the assumption that every client can provide inputs in time and can process them in time.

I would say your options are basically:

change the game to use longer ticks, so that the overhead is lower
reduce your rendering time so that you have more CPU time left over for handling ticks
change your networking model - e.g. to a cheaper algorithm that attempts to converge simulations, rather than using deterministic steps

hplus0603

11,917

November 04, 2020 04:51 PM

if the client has to run 2+ ticks in a frame every frame, they continue to fall behind often to keep up.

This is true if there are no locality benefits (caching, for example) and if there is no other cost for a “frame” (such as rendering.)

If there are other costs that amortize per frame, then the marginal cost of an extra tick is smaller than the cost of the first tick in a frame, and running more ticks will give you some amount of benefit.

That being said – if your simulation can't be run in real time on a given CPU, then the simulation can't be run in real time on that CPU. That's why games have “minimum spec” recommendations. If you want to target some ill defined market like “the web,” then you have to make a very hard trade-off between what you want your game experience to be, and how harsh you need to be against people whose clients don't keep up.

enum Bool { True, False, FileNotFound };

🎉 Celebrating 25 Years of GameDev.net! 🎉

Rollback Prediction Management

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

🎉 Celebrating 25 Years of GameDev.net! 🎉

Rollback Prediction Management

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines