
Berkeley Sockets and IOCP

Started by
13 comments, last by hplus0603 7 years, 1 month ago
Of course a shared-nothing implementation will scale better, since there's less locking. This is pretty obvious if you think about it.

But not all games can use a shared-nothing implementation, e.g. an MMO. This is where your scalability becomes highly contingent on efficiently dispatching incoming traffic to the worker thread pool. It's hard to beat IOCP and the non-Windows equivalents when you're in that situation.

But to reiterate what I've already said, that isn't most games. Most games are fine taking a simpler approach, and IOCP is exceedingly tough to get right. It's not worth the effort unless you are really pushing the OS.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


A more naive design would have jobs submit to a threadpool for packet receives,


The game doesn't need to "ask" for data. Data will be coming in. The game should just process what's there.

If you're using UDP, you only need one thread, because there is only one socket, and there is only one network card, and presumably your CPU can shuffle data faster than your network card can send/receive it over the internet.

Thus, the simplest possible implementation, which is portable across all kinds of systems, and is also fast, is something that uses a simple loop with non-blocking socket calls:

int udp = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
if ((udp < 0)
  || (fcntl(udp, F_SETFL, fcntl(udp, F_GETFL, 0) | O_NONBLOCK) < 0)
  || (bind(udp, (struct sockaddr *)&listenaddr, listenaddrsize) < 0)) {
  you_lose();    //  report the error and bail
}

while (true) {
  int worked = 0;
  int r;
  packet *p;
  addrlen = sizeof(address);    //  recvfrom() overwrites this; reset per call
  while ((r = recvfrom(udp, buffer, bufsize, 0, (struct sockaddr *)&address, &addrlen)) > 0) {
    dispatch_incoming_packet(buffer, r, address, addrlen);
    ++worked;
    addrlen = sizeof(address);
  }
  while ((p = dequeue_outgoing_packet()) != NULL) {
    if (sendto(udp, p->buffer, p->size, 0, (struct sockaddr *)&p->address, p->addrlen) < 0) {
      requeue_outgoing_packet(p);    //  try again next time around
      break;
    } else {
      ++worked;
      free_packet(p);
    }
  }
  if (!worked) {
    usleep(1000);    //  one millisecond is a good balance on modern systems
  }
}
Obviously, there are some bits that you have to implement here; this just shows the structure of an active network send/receive thread for UDP.

Because the kernel queues incoming and outgoing packets per socket, it's efficient (enough) to first drain the incoming queue, then generate the outgoing queue (from the game,) and then repeat. If there's no work to do (no packet came in, nothing to send) then the system is running at low load, and sleeps a millisecond to save some CPU.
"Polling" I/O like this is often frowned upon when you're learning systems programming, because it can be inefficient (burning CPU polling when there's nothing to do,) and it can add additional latency (the sleep means the CPU won't wake up the instant a packet comes in.)
That's a valid concern, but in this case, the construct as shown above is often the best solution for real-time networked games (which is kind of a special case compared to traditional server programming.)

There's still interesting code inside the "dispatch incoming message" function, where you have to figure out whether the address in question is an existing connection that belongs to a known client, or whether it's a new client trying to connect, and then route it appropriately.
A hash table of existing clients, pointing at their game instances, is usually used; when the client is not in the hash table, you can assume it's a new client trying to connect, and route it to some actor that deals with that.
enum Bool { True, False, FileNotFound };

7 years ago, we were handling between 100 and 500 concurrent clients on one process with nothing more complex than select() calls, on Windows and Linux. It helped to have this on a secondary thread which performed the basic read and deserialisation before handing the data to the logic thread, but that was a luxury rather than a requirement as there was plenty of CPU time to spare. This does require a strong separation between deserialisation and game logic, of course.

It's interesting however to see that the initial post mentions "many game contexts in one process" - if the idea is to host a lot of individual games (e.g. MOBA or RTS sessions) rather than one big session (e.g. an MMO) then you'll quickly reach the point where background threads no longer give you 'free' CPU time, and I'd speculate that context-switching could be costly; especially if you have OS-level interrupts trying to wake up each game's receiving thread every few milliseconds.

7 years ago, we were handling between 100 and 500 concurrent clients on one process with nothing more complex than select() calls, on Windows and Linux

While this will admittedly "work", even under Windows if you define FD_SETSIZE, it is nevertheless an awful approach. Not only are you transferring two kilobytes of data to the kernel every time (the descriptor set is not a bitfield!), but more importantly the reason why FD_SETSIZE is only 64 with Winsock is that 64 happens to be the limit of what WaitForMultipleObjects can tackle. Which, in the case of calling select on 500 sockets, means that Winsock will spawn 8 threads, each doing a wait-all on a subset of 64 sockets (well, one of them only has 52), and your main thread doing a wait-all on these threads. Compared to IOCP, that's just... a disastrous approach.

It's not quite that disastrous under Linux, but still epoll_wait will be roughly 30-40 times (30-40 times, not 30-40%) faster for a set of several hundred descriptors.

Don't get me wrong, I'm not saying select is inherently bad. For watching 2 or 3 descriptors, or for watching a moderately small set of descriptors that changes very often, it's mighty fine (possibly even better). It's just that, for mostly-static descriptor sets with counts in the hundreds, one might really consider something better, which is readily available and doesn't cost anything extra.

you'll quickly reach the point where background threads no longer give you 'free' CPU time,


The server can still use a single UDP socket for all the different sessions.
Only if you use TCP do you need select() and more sockets.
enum Bool { True, False, FileNotFound };

