Rollback Prediction Management

Thanks to both of you for your input; it's helping me “rubber duck” the logic that was here before me and the stuff I've added on top.

So the rollbacks due to performance problems are of course annoying, and I can't do too much about them. But I have noticed that my time adjustment calculations are perhaps a bit off.

Prediction tick time on the client has to be in the future, i.e. ahead of the server. So to figure out that magic number I want to predict, I do the following calculation (a rough code sketch follows the list):

  1. Last Server Tick + (rtt in seconds * server tick rate) + maximum unack'd player inputs
    Example: the server packet received is tick 4; a 30ms RTT at a 60Hz tick rate is ~2 ticks (rounded); max unack'd player inputs is 10; so the target is 4 + 2 + 10 = tick 16
  2. if previous predicted tick < 13 (3 ticks less than preferred) = snap predicted time forward because the client has fallen behind
  3. if previous predicted tick > 24 (8 ticks more than preferred) = snap predicted time backwards because the client has fallen too far ahead
  4. if no snap has occurred and the number of unack'd commands is too low, we will slow down predicted time in the next frame. And vice versa (speed up if too many)
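
In rough code form, the idea above looks something like this (a simplified sketch; the names, thresholds, and scale factors are illustrative, not my actual implementation):

```cpp
// Rough sketch of steps 1-4 above. Names, thresholds, and the 0.98/1.02
// scale factors are illustrative only, not the actual implementation.
#include <cmath>
#include <cstdint>

struct PredictionClock {
    double  tickRate      = 60.0;  // server ticks per second
    int     maxUnackedCmd = 10;    // input headroom added to the target
    int64_t snapBehind    = 3;     // snap forward if this far behind the target
    int64_t snapAhead     = 8;     // snap backward if this far ahead of the target
    double  timeScale     = 1.0;   // multiplied into next frame's delta time

    // Step 1: last server tick + RTT expressed in ticks + unacked-input headroom.
    int64_t targetTick(int64_t lastServerTick, double rttSeconds) const {
        return lastServerTick
             + static_cast<int64_t>(std::llround(rttSeconds * tickRate))
             + maxUnackedCmd;
    }

    // Steps 2-4: snap if we drifted too far, otherwise stretch/compress time.
    int64_t adjust(int64_t predictedTick, int64_t target, int unackedCmds) {
        if (predictedTick < target - snapBehind) {   // step 2: fell behind, snap forward
            timeScale = 1.0;
            return target;
        }
        if (predictedTick > target + snapAhead) {    // step 3: too far ahead, snap backward
            timeScale = 1.0;
            return target;
        }
        // Step 4: no snap. Pivoting around the unacked-input target here is an
        // assumption; the real low/high watermarks aren't spelled out above.
        if (unackedCmds < maxUnackedCmd)      timeScale = 0.98;  // too few unacked: slow down
        else if (unackedCmds > maxUnackedCmd) timeScale = 1.02;  // too many: speed up
        else                                  timeScale = 1.0;
        return predictedTick;
    }
};
```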

Perhaps this is far too complicated? I'm noticing that slower hardware ends up snapping time forward a lot because it's too far behind. But maybe my definition of “too far behind” is way too strict? In other words, using the example of targeting tick 16: if the client's last predicted tick was 12 or earlier, it will snap time forward. Snapping time like that will lead to rollbacks, but maybe I just need to redo the whole idea.

Thankfully, at the moment I don't have any situations where the client is predicting too far ahead… yet. I obviously want to support hardware faster than 60fps in the future, but for now I'm targeting the situations where the client is behind.

For the record, I believe I originally got this idea from this thread that I read sometime early last year:

https://www.gamedev.net/forums/topic/696756-command-frames-and-tick-synchronization/?page=3

the rollbacks due to performance problems are of course annoying and I can't do too much about them.

Well, normally you either:

(a) improve performance/reduce simulation complexity
(b) make your client more resilient in the face of network delays (i.e. bigger buffer)
(c) change your networking model.

So I think there are 3 things you could probably do. If you can't get what you need from (a) and (b) then I think you're left with (c).

So to figure out that magic number I want to predict

These things are always mind-bending because everyone has a different mental model of whether a given tick on a given process is future or past or whatever. So I'm not even going to try and follow your logic directly. I'll just say this:

  • if your client is falling behind on a semi-regular basis, then (as above) either:
    • your client simply cannot process the simulation adequately, or:
    • the buffer is too short for your level of network jitter (see the small sizing sketch after this list)
  • if your client is getting ahead, your server is not working properly or your logic is wrong
  • time stretching and compression is adding a layer of complexity that is probably not necessary and also likely to be hiding bugs
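
On the buffer point, sizing it from measured jitter is usually just this kind of arithmetic (an illustrative sketch; the names and safety margin are made up):

```cpp
// Illustrative only: derive an input/snapshot buffer depth from measured
// network jitter, with a small safety margin on top.
#include <cmath>

int bufferDepthTicks(double jitterMs, double tickMs = 1000.0 / 60.0, int safetyTicks = 2) {
    return static_cast<int>(std::ceil(jitterMs / tickMs)) + safetyTicks;
}
// e.g. 25 ms of jitter at 60 Hz -> ceil(25 / 16.67) + 2 = 4 ticks of buffering
```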

if the client has to run 2+ ticks in a frame, every frame, it keeps falling behind and has to do this often just to keep up.

You still haven't told us why your simulation can't simulate a single tick in less than 8 milliseconds.

What's taking so long per simulation step?

All you've said is “it's mobile,” which is a non-answer – games on the PS/1 ran at 60 Hz, and they ran on a CPU that was about 50x less powerful than a single core on a modern mobile phone, much less a full multi-core phone with GPU and large caches and vector FPU and everything.

enum Bool { True, False, FileNotFound };

hplus0603 said:

if the client has to run 2+ ticks in a frame, every frame, it keeps falling behind and has to do this often just to keep up.

You still haven't told us why your simulation can't simulate a single tick in less than 8 milliseconds.

What's taking so long per simulation step?

All you've said is “it's mobile,” which is a non-answer – games on the PS/1 ran at 60 Hz, and they ran on a CPU that was about 50x less powerful than a single core on a modern mobile phone, much less a full multi-core phone with GPU and large caches and vector FPU and everything.

I said it earlier - it's the engine/experimental tech. There's overhead with the engine itself, never mind the tech, which I don't have control over. It's good for other things, but not for resimulating over and over, especially on mobile.

I don't think comparing it to PS1 games is… helpful; it's too reductionist. PS1 games didn't have half the stuff modern games have, plus they weren't sharing the frame with HDRP rendering and whatnot, never mind operating system overhead. And my game is more than just run around and shoot - I have to share the frame with a lot of physics, to give just one example.

Even if I play the game on PC, though, on a beefy computer at an average 60 FPS - you can still find rollbacks due to time drifting. If a frame gets held up because of physics or asset loading, it will fall behind inevitably.

Even if I play the game on PC, though, on a beefy computer at an average 60 FPS - you can still find rollbacks due to time drifting. If a frame gets held up because of physics or asset loading, it will fall behind inevitably.

This is not normal. If your physics is taking so long to resolve that you're dropping frames, something is very wrong. And if your asset loading is stalling the main thread, again something is very wrong.

Rollbacks due to an entity on the server interacting with your locally-controlled entity - normal, but usually short-lived, and cheap.
Rollbacks due to your engine not providing adequate headroom for performance fluctuation and thereby incurring constant buffer underruns - very weird.

You say you don't have control over the tech - so what do you have control over? What can you change?

Kylotan said:

Even if I play the game on PC, though, on a beefy computer at an average 60 FPS - you can still find rollbacks due to time drifting. If a frame gets held up because of physics or asset loading, it will fall behind inevitably.

This is not normal. If your physics is taking so long to resolve that you're dropping frames, something is very wrong. And if your asset loading is stalling the main thread, again something is very wrong.

Rollbacks due to an entity on the server interacting with your locally-controlled entity - normal, but usually short-lived, and cheap.
Rollbacks due to your engine not providing adequate headroom for performance fluctuation and thereby incurring constant buffer underruns - very weird.

You say you don't have control over the tech - so what do you have control over? What can you change?

It's depressing, trust me, I know. I've been dealing with this for half a year now, trying to make sense of it and get it prioritized -.-

If I were to move away from the prediction/reconcile model, I honestly don't know what I'd move to. I guess I'd have to make it client-authoritative, have clients send RPCs for everything to the server, and then have the server do lots of validation? Or start stuffing the inputs I send to the server with things like player position and have the server accept them within some acceptable range of error?
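
For the second idea, what I picture on the server is something like this (a made-up sketch; the names, types, and tolerance are purely illustrative):

```cpp
// Made-up sketch of "accept the client's reported position if it's within
// some error range of the server's own result". Names/values are illustrative.
struct Vec3 { float x, y, z; };

static float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Server side: simulate the input ourselves, then take the client's position
// only if it stays within the allowed error radius; otherwise keep ours.
Vec3 acceptOrCorrect(const Vec3& serverResult, const Vec3& clientReported,
                     float maxErrorMeters = 0.25f) {
    const float maxSq = maxErrorMeters * maxErrorMeters;
    return distanceSquared(serverResult, clientReported) <= maxSq
         ? clientReported
         : serverResult;
}
```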

You didn't really answer the question - what things can you change?

From some of your comments, it doesn't sound like you have properly decoupled your logical update times from your rendering times, and that will cause all sorts of grief.

But from other comments, it seems like you have a fixed logical update period of 16.6ms - if so, the question is simply this:

  • Are you allowed to render at 30Hz? Or even 20Hz?
  • If so, will you have enough CPU time to execute 2 logical updates when rendering at 30Hz, or 3 updates when rendering at 20Hz?

The answer is either yes - in which case, do that and ensure you have sufficient buffering to deal with jitter - or no, in which case you basically have to give up on this simulation model.
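
For reference, the decoupling I mean is just the standard fixed-timestep accumulator; a minimal sketch follows (the three hooks are hypothetical placeholders, not any particular engine's API):

```cpp
// Minimal fixed-timestep sketch: logic always steps at 60 Hz, rendering runs
// at whatever rate the device manages (30 Hz just means ~2 logic steps per frame).
#include <chrono>

bool gameRunning();               // hypothetical: returns false when quitting
void simulateOneTick(double dt);  // hypothetical: one fixed 16.6 ms logic step
void renderFrame(double alpha);   // hypothetical: draw, interpolating by alpha

void runLoop() {
    using clock = std::chrono::steady_clock;
    const double tickDt = 1.0 / 60.0;   // fixed logical step
    double accumulator = 0.0;
    auto previous = clock::now();

    while (gameRunning()) {
        const auto now = clock::now();
        accumulator += std::chrono::duration<double>(now - previous).count();
        previous = now;

        // Run however many fixed steps we owe, independent of render rate.
        while (accumulator >= tickDt) {
            simulateOneTick(tickDt);
            accumulator -= tickDt;
        }
        renderFrame(accumulator / tickDt);  // leftover fraction for interpolation
    }
}
```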

There's not much point discussing different simulation models without knowing what you actually have the authority to change, because in my experience the client → server communication model will have a knock-on effect on how the server communicates with other clients too.

Kylotan said:

You didn't really answer the question - what things can you change?

From some of your comments, it doesn't sound like you have properly decoupled your logical update times from your rendering times, and that will cause all sorts of grief.

But from other comments, it seems like you have a fixed logical update period of 16.6ms - if so, the question is simply this:

  • Are you allowed to render at 30Hz? Or even 20Hz?
  • If so, will you have enough CPU time to execute 2 logical updates when rendering at 30Hz, or 3 updates when rendering at 20Hz?

The answer is either yes - in which case, do that and ensure you have sufficient buffering to deal with jitter - or no, in which case you basically have to give up on this simulation model.

There's not much point discussing different simulation models without knowing what you actually have the authority to change, because in my experience the client → server communication model will have a knock-on effect on how the server communicates with other clients too.

“what can I change”

Like I said, I could change the network model but I don't know where I would go with it.

“it doesn't sound like you have properly decoupled your logical update times from your rendering times”

Prediction happens before rendering in the loop. The prediction time/ticks are explicitly separate from the rendering time/ticks. I carry around 2 separate times, and only the prediction time is adjusted using the method I mentioned before plus the delta time for the frame. The rendering time is adjusted based only on delta time plus any scaling of time (i.e. slow down, speed up, or snap for interpolation).
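
In code terms it's roughly this shape (a simplified sketch with illustrative names, not the actual engine code):

```cpp
// Simplified sketch of the "two separate times" described above.
struct ClientClocks {
    double predictionTime  = 0.0;  // drives which ticks we predict, ahead of the server
    double renderTime      = 0.0;  // drives interpolation of remote entities, behind the server
    double predictionScale = 1.0;  // set by the snap / speed-up / slow-down logic
    double renderScale     = 1.0;  // set by the interpolation time adjustment

    void advance(double frameDeltaSeconds) {
        predictionTime += frameDeltaSeconds * predictionScale;
        renderTime     += frameDeltaSeconds * renderScale;
    }
};
```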

As for the rest, maybe this will clear things up a bit:

On a higher end device, a single prediction tick takes 6~8ms depending on what's happening ingame (abilities, shooting, sprinting, etc). So, assuming a frame is exclusively prediction (no UI, no animation, no rendering, no network, no logging, nothing else), at 30 fps (33ms) I can do at most 5 prediction ticks before starting to dip any lower. Granted, this is a profiled build, so it isn't the fully optimized Release build, which can probably squeeze out 2-3 more ticks depending on the device.

Of course, a frame is never going to be exclusively prediction. So let me break down the rest (roughly):

  • Outside of prediction, the core game logic has an additional 6~8ms overhead.
    So with 1 tick of prediction, the total comes to 12-16ms. (that's the 60fps marker)
  • Networking is taking up 3-4ms (definitely needs some love)
    Total so far = max 20ms
  • Rendering takes 6-10ms
    Total so far = max 30ms
  • Animation/UI/Physics takes a total of 10~16ms
    Total so far = max 46ms (well below 30 fps)
  • Everything else is around an extra 5ms
    Total (excluding profiling overhead) = max ~51ms

Again, this is kind of a middle ground between debug and release optimization, so the numbers improve in release, but only by maybe 10-20ms depending on the hardware.

So you can see where my headache is coming from. Again, this is with 1 tick of prediction, which means it is not actually correctly synced. At that frame time (somewhere between 15-30 fps), in order to get an accurate prediction, you'd need something around 4-5 prediction ticks to keep up with a 60Hz tick server, right? Which clearly isn't going to fit in this horrible frame, LOL.
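
To put rough numbers on that (back-of-the-envelope arithmetic only, not real code):

```cpp
// Back-of-the-envelope only: how many 60 Hz ticks a client must simulate per
// rendered frame just to keep pace with the server, given its frame time.
#include <cmath>
#include <cstdio>

long ticksPerFrame(double frameMs, double tickMs = 1000.0 / 60.0) {
    return std::lround(frameMs / tickMs);
}

int main() {
    std::printf("33 ms frame (30 fps): ~%ld ticks\n", ticksPerFrame(33.3)); // ~2
    std::printf("50 ms frame (20 fps): ~%ld ticks\n", ticksPerFrame(50.0)); // ~3
    std::printf("67 ms frame (15 fps): ~%ld ticks\n", ticksPerFrame(66.7)); // ~4
    return 0;
}
```

So that's roughly 2 ticks per frame at 30 fps and about 4 at 15 fps just to keep pace, before counting any extra ticks needed to catch back up after falling behind or snapping.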

On a higher end device, a single prediction tick takes 6~8ms depending on what's happening ingame (abilities, shooting, sprinting, etc)

I still don't understand this. What are you doing during all that time? Running a 1000 rigid body pile using a solver written in Ruby? 8 milliseconds on a modern CPU is something between 20 and 50 million instructions per core. If you have 10 players, that's between 2 and 5 million instructions per player per core per 8ms step.

I think the #1 thing for you to do is to apply a profiler, and measure where your simulation is spending all its time. It still sounds to me as if something is a factor 10x-100x slower than it should be (depending on how advanced the simulation you have is.)

enum Bool { True, False, FileNotFound };

Yes, I know where the time is going. It's the engine, as I've mentioned previously. Prediction systems that don't have anything to predict that frame (say, an ability you're not using) will soak up time anyway, doing nothing. It's absolutely mental to be dealing with this, trust me. But I think it's out of my control because I've been very loud about it with no success.

There will be meetings in the future with the folks behind the engine to try to sort that out, but in the meantime I have to figure out a way to make this presentable despite the problems. I could write a book about my grief with this thing.

Anyways, despite all these issues I just wanted to sanity-check that I'm doing everything else right, because I can only go so long blaming something else and beating that drum. I'm sure I'm doing something not technically correct in prediction somewhere, but it's super hard to pinpoint what it could be without the game actually being performant.

Anyways, thanks for all the input. Maybe once this gets sorted out I'll come back with something more tangible.
