Rollback Prediction Management

Started by
33 comments, last by NetworkDev19 3 years, 7 months ago

Among people who haven't yet developed a networked game, there's this meme that you can “just add networking.” That's like thinking you can have a baby and “just add a car seat.”

This is not true. EVERYTHING CHANGES.

Not only does your main simulation engine need to be intimately tied into the networking system, but often the rendering needs to be, too, at a minimum to be able to extrapolate or interpolate animations based on delayed remote entity state.

But, even worse! Gameplay itself needs to be designed with the limitations of your particular networking engine in mind. There is a reason most RTS games work the way they do. There's a reason most FPS games work the way they do. There's a reason most “base defender” games work the way they do. There's a reason most “farming” games work the way they do. All of this is because that's the gameplay you end up with when you choose a particular networking approach, and push the gameplay into the corners that particular technology allows it to go.

This is not optional in any way – networking determines engine, and networking plus engine determine certain gameplay rules. If you want to go the other way, and start with novel gameplay rules, then you will have to attempt building a custom networking setup and custom engine tailored to those rules. As it turns out, many such “custom” approaches have been tried over the years, and the ones that actually ship tend to fall back to one of the “known” attractor states.

In a battle between game designers and the speed of light, the speed of light wins.

It sounds like you're on a team that hasn't done this before. If that is the case, and the entire team isn't 100% committed to doing everything it takes to succeed at networking, then prepare for your project to fail. If you're not OK with that, then you should find a team that will listen to reason.

If you absolutely have to make some random game with some random engine “work” (and I use that word loosely), then the best you can do is to never re-simulate, but instead just receive state from other players in a constant stream of updates (probably at a rate slower than your simulation rate). Then use a simple “simulation” for the other players that just interpolates between the last two received states, and thus shows them delayed by one network tick on the screen. Accept whatever gameplay and display glitches happen because of this, because you can do nothing better at that point.
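A minimal sketch of that “interpolate between the last two received states” playback, in Python for brevity. The names `Snapshot`, `RemotePlayer`, `on_snapshot` and `position_at` are made up for illustration and don't come from any particular engine. It assumes each update carries a sender timestamp, and that the caller keeps `render_time` roughly one network tick behind the newest snapshot.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    time: float  # sender's timestamp for this state, in seconds (assumed)
    x: float
    y: float


class RemotePlayer:
    """Dumb playback of a remote player: no re-simulation, just interpolation."""

    def __init__(self):
        self.prev = None  # older of the last two received snapshots
        self.last = None  # newest received snapshot

    def on_snapshot(self, snap):
        # Ignore duplicate or out-of-order packets.
        if self.last is not None and snap.time <= self.last.time:
            return
        self.prev, self.last = self.last, snap

    def position_at(self, render_time):
        """Position to draw for a render_time that trails the newest snapshot."""
        if self.last is None:
            return None  # nothing received yet
        if self.prev is None:
            return (self.last.x, self.last.y)
        span = self.last.time - self.prev.time  # > 0, enforced by on_snapshot
        t = (render_time - self.prev.time) / span
        t = max(0.0, min(1.0, t))  # clamp: this path never extrapolates
        return (self.prev.x + (self.last.x - self.prev.x) * t,
                self.prev.y + (self.last.y - self.prev.y) * t)
```

The clamp is where the visible glitches come from: when updates arrive late, the remote player simply freezes at the newest known state, then jumps when the next one shows up.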

“How does this square with GGPO, which says that anything can be networked?” you may ask. It turns out, when that article came out, the observation was made that it only works where you have a simple simulation engine that not only supports cheap frame updates, but also supports cheap state rewinds/snapshots. For a fighting game, that's generally not so bad; for something like an RTS or Battle Royale, that would be a very poor match. Hence, the gameplay itself is predicated on simulation being cheap to update and roll back, and the engine used matches that requirement, and thus that particular networking approach works out in that case.
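To make the “cheap updates plus cheap snapshots” requirement concrete, here is a toy rollback loop. This is not GGPO's API, just a sketch under assumptions: the `step` function is deterministic, a missing input counts as zero, and `RollbackSession`, `advance` and `correct_input` are hypothetical names.

```python
import copy


def step(state, inputs):
    """Deterministic toy simulation: each input is a 1D velocity per player."""
    return {player: pos + inputs.get(player, 0) for player, pos in state.items()}


class RollbackSession:
    def __init__(self, initial_state):
        self.frame = 0
        self.states = [copy.deepcopy(initial_state)]  # states[f]: state at start of frame f
        self.inputs = []                              # inputs[f]: inputs applied in frame f

    def advance(self, known_inputs):
        """Simulate one frame with whatever inputs are known right now."""
        self.inputs.append(dict(known_inputs))
        self.states.append(step(self.states[self.frame], self.inputs[self.frame]))
        self.frame += 1

    def correct_input(self, frame, player, value):
        """A late remote input arrived for a past frame: rewind and re-simulate."""
        self.inputs[frame][player] = value
        del self.states[frame + 1:]           # drop the mispredicted snapshots
        for f in range(frame, self.frame):    # re-run every frame since then
            self.states.append(step(self.states[f], self.inputs[f]))

    @property
    def current(self):
        return self.states[self.frame]
```

Every correction costs one `step` per frame rewound, and every frame costs one stored snapshot. That is fine when the whole state is a couple of fighters, and it is exactly what stops scaling once the state is a full RTS or Battle Royale world.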

enum Bool { True, False, FileNotFound };

Yep! You're on the nose. It's a lot of inexperienced folks and I've had to spend a lot of time teaching people (while also teaching myself, mind you, because this is the first time I've been solo on something as big as this, haha. Literally one-man-armying it until recently). There's actually a really big backstory to it too, which led us to redo the whole game from scratch for this networking model, that I can't go into details about. Yet somehow, it still got into a bad spot. I bit off more than I could chew, admittedly, and COVID has not helped since visibility/comms suffered massively.

The blame being tossed around keeps coming back to the prediction model, but I keep trying to explain that it should be the other way around: we need to model the game based on the networking model. Granted, the tech performance issues were something none of us saw coming because we were advertised the complete opposite, so it's not any one individual's fault but rather the third party's.

I may end up hijacking some of your reasoning here; it's much clearer than what I've tried to herald to the troops, so to speak. It's definitely one of my weaknesses overall. We have so many different kinds of predictions that can be done in a frame for 1 player, yet most of them will never trigger their “core” logic where the real work is done. And the engine that surrounds the logic that determines whether that core logic gets triggered is doing something we can't explain on our side because it's 3rd party. So frustrations abound :(

we were advertised the complete opposite

Rule #1 of vendor software: Profile the actual thing you're going to do, using the actual code, before trusting any vendor claims. I guess that's a thing less experienced developers need to learn, too :-) There's a reason why “just npm install themagiclib” doesn't actually serve actual production systems very well in many cases…

That being said, if you just extrapolate players from the previously received snapshots, you literally just need to calculate position from previous positions and time. That should take no time at all. You don't need to run any triggers or collision detection or anything on the viewing client, for anything other than the local player.
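Roughly what “calculate position from previous positions and time” amounts to, as a hedged sketch. The function name `extrapolate` and the `MAX_EXTRAPOLATION` cap are assumptions added here, not something from the thread; the cap just keeps a stalled connection from sending a remote player off into the distance.

```python
MAX_EXTRAPOLATION = 0.25  # seconds; hypothetical cap, tune per game


def extrapolate(p0, t0, p1, t1, now):
    """Dead-reckon an (x, y) position from the two most recent snapshots.

    p0 was received for time t0 and p1 for time t1, with t0 < t1 <= now.
    """
    dt = t1 - t0
    if dt <= 0:
        return p1  # degenerate timestamps: hold the newest position
    vx = (p1[0] - p0[0]) / dt  # velocity implied by the two snapshots
    vy = (p1[1] - p0[1]) / dt
    ahead = min(now - t1, MAX_EXTRAPOLATION)
    return (p1[0] + vx * ahead, p1[1] + vy * ahead)
```

That is a handful of arithmetic operations per remote player per frame, with no triggers and no collision queries involved.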

Separately, if the “predictions” each end up running a lot of world collision queries and such, then it's possible that there's lots of overhead each time you spin up such a query. In that case, it's better to first walk your entities, batch up ALL collision queries they'd like to run, and then run all the queries as one batch into the world physics system, and then go back and resolve the results for each entity. The anti-pattern is update code that asks “Can I see this thing?” no? then “Do I have a widget in front of me?” no? then “Is there ground under me?” no? … Each of those questions ends up round-tripping into the physics system, which (depending on architecture) may require thread hand-off, singleton locking, invalidation of cached collision volumes, and many other bad things.
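The collect / batch / resolve shape could look something like the sketch below. `PhysicsWorld` is a stand-in, not a real engine API: it answers point queries against a set of solid cells and counts how many times it gets called. `Entity`, `wanted_queries`, `resolve` and `update_all` are likewise hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PhysicsWorld:
    """Stand-in for an engine physics system; solid cells are a set of (x, y)."""
    solid: set
    round_trips: int = 0

    def query_batch(self, points):
        # One hand-off into the physics system, however many queries it carries.
        self.round_trips += 1
        return [p in self.solid for p in points]


@dataclass
class Entity:
    x: int
    y: int
    grounded: bool = False
    blocked: bool = False

    def wanted_queries(self):
        # "Is there ground under me?" and "Is something in front of me?"
        return [(self.x, self.y - 1), (self.x + 1, self.y)]

    def resolve(self, results):
        self.grounded, self.blocked = results


def update_all(world, entities):
    # Pass 1: walk the entities and collect every query they want to run.
    batch, slices = [], []
    for e in entities:
        qs = e.wanted_queries()
        slices.append((len(batch), len(batch) + len(qs)))
        batch.extend(qs)
    # Pass 2: a single round trip into the physics system.
    results = world.query_batch(batch)
    # Pass 3: hand each entity back its own slice of the results.
    for e, (lo, hi) in zip(entities, slices):
        e.resolve(results[lo:hi])
```

With N entities this is one call into the physics system per update instead of 2N, which matters if every call pays for a thread hand-off or a lock.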

Anyway, if you don't control the engine, arrange for each “remote player” to simply be a dumb playback of older states from the network, and only do the expensive stuff for the local player. Live with the small glitches you will see when the remote-players end up lagging or jumping – that's the price for all the other choices you've decided to make.


We are only doing the expensive stuff for the local player already. We only predict what the local player is doing/trying to do with their commands. You mention walking your entities, and that's exactly what is taking up the most time. The issue is we don't control how entities are “walked”. That's maybe a simplified explanation, but it's the best I've got without blatantly calling out the tech publicly :P

I'll update back if we make progress with their support about the issue, but I don't expect it for some time. Trying to make up for it in other areas like UI, the networking itself, etc., in the meantime.

This topic is closed to new replies.
