🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Back to Networking and Multiplayer

Article on Lockstep Multiplayer

Andy W · 2016-01-06T14:47:22

I wrote an article on lockstep multiplayer, with a particular focus on the steps you can take to prevent or catch the vast majority of desync bugs long before you write a single line of multiplayer code: http://www.tundragames.com/minimizing-the-pain-of-lockstep-multiplayer/

Networking and Multiplayer Programming

Started by C0lumbo November 24, 2015 07:10 AM

15 comments, last by Sergey Ignatchenko 8 years, 6 months ago

C0lumbo

4,415

Author

January 04, 2016 09:15 AM

Deterministic lock-step has an inherent flaw (openy admitted in the article) that it goes with a speed of the slowest player. What about (ab)using deterministm in a different way - to send all the inputs to the server, where the server will timestamp them, and to send them to all the clients where the input will be "replayed" based on determinism? In other words, the only thing which server will do, is timestamping+forwarding.

Actually, I omitted it from the article for brevity but this is what I do in Rapture World Conquest.

The reason I do it that way isn't really because I'm worried about slowing everyone down to the speed of the slowest connection. In fact that might not even be helped in my case, because I don't have a good solution to identify the individual with the best connection to make them act as server, so I'm as likely to pick the worst choice as the best choice.

The reason I do it is to simplify handling what happens when one player disconnects. If everyone was sending their input messages to everyone in the fully connected mesh, and someone disconnects, it's hard to manage things so that all players agree on exactly when the individual disconnected. With more of a server-client approach (I have a fully connected mesh, but someone is an arbitrarily designated server) it's a little simpler because clients inform the server what the last step they received was, and the server tells clients the step that it knows everyone has received, then when a server disconnects and a new server is chosen, the first act of the new server is to tell all the clients to rewind to the point that it knows everyone reached.

I remember now why I left this out of the article! I'm not sure the above paragraph is very clear. TLDR version: Disconnect handling in a peer-to-peer input sharing scenario is very complicated, using a server-client model and host migration is also very complicated, but slightly less so (IMO).

Rapture - World Conquest

hplus0603

11,920

January 04, 2016 04:21 PM

the only thing which server will do, is timestamping+forwarding

That is exactly what the server does in deterministic lockstep. You still have the problem that the gameplay must proceed at the speed of the slowest simulation CPU participating in the simulation. Even if some "slower" player gets the time-stamped messages, then what? If that player can't keep up, that player can't see the game state, and thus can't play.

Luckily, game simulation is very seldom actually the biggest load on CPU, unless you have many thousands of physically simulated collision fragments flying around or something.

I think we all agree that deterministic lockstep is not right for all cases. It provides a particular point in the game server optimization space of latency/bandwidth/visual flaws. Many games fit close to this point. Many other games do not. And still other games could go either way.

enum Bool { True, False, FileNotFound };

Sergey Ignatchenko

625

January 05, 2016 11:50 AM

That is exactly what the server does in deterministic lockstep.

I _hate_ arguing about terminology (in particular, on questions like "what REALLY is named 'deterministic lockstep'?". On the otther hand, there are two quite different schemas here.

The first one is a pure P2P schema essentially described in https://en.wikipedia.org/wiki/Lockstep_protocol and aimed to deal with players meddling with timestamps to get advantage. It's key point (in a typical "lockstep" manner) is that none of clients can start processing before the previous frame is completed. In other words, if one single frame gets delayed - everybody needs to wait.

The second schema (admittedly similar, and also determinism-based) is with the server timestamping all the inputs and redistributing them to other clients. If I'm not badly mistaken, it can be made much more forgiving in terms of delays: while on average the game still needs to run with the speed of the slowest client, with a server-side timestamping it should be possible to ignore occasional delayed packets (such ignoring causing problems only for the slow guy). This approach (ignore the packet if it's too slow) would lead to certain implications and complications, but because of it being less synchronized, I'd expect it to be more suitable for the Internet (where the packets do get lost). In other words - server-side thing can be implemented in (almost, though I don't see why commitment is necessary for server-side timestamping) the same way as P2P thing (a.k.a. "punish everybody for somebody being slow"), or it can be slightly modified into "punish only the slow guy".

hplus0603

11,920

January 05, 2016 05:01 PM

Terminology is important, because that's how we actually manage to communicate with each other. Without terminology, nothing else works.
Your description of "deterministic lockstep" for games doesn't align with the general consensus I'm aware of -- "lockstep" (without "deterministic" in the mix) is something different.

Specifically, close coupling between simulation and networking and user input and rendering happens when you implement deterministic lockstep:

1) Players only exchange inputs (and some initial game state)
2) Players simulate the same game using the exact same rules (and simulation order, and such)
3) This means players can't move the simulation forward to step T until they have received the input for step T from every other player.
4) Typically this means that players time-stamp the input for their actions at some point in the future (one worst-case RTT into the future.)
5) The server, if there is one, simply collects the inputs for time step T, and when all players have provided input for time step T, forwards all the inputs to all the players.

There are a few assumptions in your statement that aren't part of this.
First, whether you use a star topology ("server" based) or a network topology ("peer to peer" based) doesn't matter.
Second, while it is true that simulation can't progress beyond step S until all the inputs for step S are available, this is usually solved by sending the inputs for step S when you're simulating and displaying step S-N on the client, for some N > 1. This leads to a pipelining of commands.

So, delaying for "the slowest player" really only happens when there are network problems -- if there is a network hiccup that prevents transmission for some time > N-RTT, there won't be sufficient buffering/pipelining in the system to keep the game flowing, and it will temporarily pause.

One of the first descriptions of this system that is easily accessible is the "1,500 archers on a 28.8 kbps modem" article about Age of Empires. Other games that have used this mechanism include Starcraft (RTS,) Warcraft (RTS,) and There.com (virtual world.)

The main gain of this technology is savings in bandwidth and the prevention of certain kinds of cheats (or, at least, leveling the playing field.)
The main drawback is the command latency (in RTS games typically hidden with "yes, sir!" acknowledgement animations) and making it really hard to support late joiners and the problems with slight variations in machine architecture/code compilers. A codegen difference between GCC and MSVC will make Windows/Linux cross-play hard!

enum Bool { True, False, FileNotFound };

Sergey Ignatchenko

625

January 06, 2016 10:04 AM

So, delaying for "the slowest player" really only happens when there are network problems -- if there is a network hiccup that prevents transmission for some time > N-RTT, there won't be sufficient buffering/pipelining in the system to keep the game flowing, and it will temporarily pause.

Yes, that's what I've meant. The problem is that as 99% of such hiccups are per-user (per-last-mile, to be exact), then the more players you have - the more hiccups you will experience (each such hiccup of any player affecting everybody within the same game world). In other words, while with 2 players (or within LAN) it will rarely be a problem, with a 100 players all over the country (or even worse, all over the world) - it will be pretty bad (or will require N to be rather high, which inevitably hits interactivity). In practice, 3 packets lost in a row, happen over the Internet quite frequently, and 5 in a row do happen too; situations with several-packets-in-a-row delayed for a second or so (which is a pretty bad hiccup), happen pretty much all the time.

And the trick I've mentioned (ignoring inputs of the currently-too-slowpoke guy), should IMHO help to address this tinsy winsy problem (or from the other hand, should allow to reduce the depth of N of the "backlog", improving response times). And to play this trick, you should have a server (at least I don't know how to organize ignoring slow guy easily in pure P2P). Not 100% sure if it is really practical (as I've said, mesh-level deterministic stuff is not really my cup of development tea), but well, before you guys can discard it as useless, I should try to describe clearly what it is about :-).

Hodgman

52,718

January 06, 2016 12:00 PM

You can do traditional deterministic lockstep without designating any one peer as the server... Each peer waits until it's received every other peer's inputs, then advances.

And you can also do deterministic lockstep where the arbitrary server can choose to delay slow peer's inputs. The only change here is that peers are not allowed to apply their own commands to their own local sim until the server says to. You might intend to give a command on frame 4, but the server might hate you and tell you to apply the command on frame 400. This may result in you losing the game thanks to your crappy network, but at least the game doesn't slow down for everyone else.

Second, while it is true that simulation can't progress beyond step S until all the inputs for step S are available

As above, you can bend this rule by allowing the server to declare a turn/step complete in the case of a slow client, and if that client's old commands do eventually turn up at the server, it can schedule them for an upcoming turn.
This allows the game to tolerate latency spikes, while only punishing the peers directly involved. If the server is causing latency spikes, then everyone suffers and all commands become laggy, so it would be time to attempt host migration if too many of these delay events occur across all peers.

. 22 Racing Series .

Sergey Ignatchenko

625

January 06, 2016 02:47 PM

As above, you can bend this rule by allowing the server to declare a turn/step complete in the case of a slow client, and if that client's old commands do eventually turn up at the server, it can schedule them for an upcoming turn.

Yep, this is (almost) exactly what I was trying to say (probably I wasn't clear enough).