UDP multithreaded load balancing

27 comments, last by taby 3 years, 2 months ago

I am emulating lots of clients by using the following code:

int select_ret = select(0, &fds, 0, 0, &timeout);

if (SOCKET_ERROR == select_ret)
{
	cout << "  Socket select error." << endl;
	cleanup();
	return 7;
}
else if (0 < select_ret)
{
	int temp_bytes_received = 0;

	if (SOCKET_ERROR == (temp_bytes_received = recvfrom(udp_socket, &rx_buf[0], rx_buf_size, 0, reinterpret_cast<struct sockaddr*>(&their_addr), &addr_len)))
	{
		cout << "  Socket recvfrom error." << endl;
		cleanup();
		return 8;
	}

	ostringstream oss;
	oss << "127.";// static_cast<int>(their_addr.sin_addr.S_un.S_un_b.s_b1) << ".";
	oss << "0.";// static_cast<int>(their_addr.sin_addr.S_un.S_un_b.s_b2) << ".";
	oss << "0.";// static_cast<int>(their_addr.sin_addr.S_un.S_un_b.s_b3) << ".";
	oss << rand() % 256;// static_cast<int>(their_addr.sin_addr.S_un.S_un_b.s_b4);
	...

Does anyone have another way of testing many clients, without having actual clients?


The best way of emulating clients is perhaps, having clients?

I don't know what your code is supposed to do: it selects on the socket and starts a receive, but how does that help you test for a heavy network load? If you want a simple loopback server/client setup, take a port range and spawn a lot of UDP listeners on that range, all sending to and receiving from whatever you want to test.

If you want a more elegant way, write a test suite application that spawns its own processes and collects the results in a nice GUI.

Thanks for the reply!

Are you suggesting that I somehow set up a multi-homed client? Are you familiar with this kind of thing in Windows 10? I've only got 2 Windows machines, and zero Linux / Mac OS X machines. :(

The data that I'm looking for are the average load per thread in bytes per second, as well as the standard deviation. If the final load balancing code is working, then the standard deviation should drop, right?

P.S. Do you know how to enumerate all of the multi-homed client's IP addresses, using sockets? Is that even possible?

The following is the output of the working product (https://github.com/sjhalayka/udpspeed_5_load_balancing/blob/main/main.cpp):

Thread 0 342.459 Mbit/s
Thread 1 366.351 Mbit/s
Thread 2 263.315 Mbit/s
Thread 3 368.342 Mbit/s
Start mean: 335.117 +/- 42.6876
Start mean: 335.117 +/- 38.7732
Start mean: 335.117 +/- 34.5134
Start mean: 335.117 +/- 30.0215
Start mean: 335.117 +/- 28.1694
Start mean: 335.117 +/- 25.7621
Start mean: 335.117 +/- 21.911
Start mean: 335.117 +/- 18.2867
Start mean: 335.117 +/- 15.3798
Start mean: 335.117 +/- 12.7215
Start mean: 335.117 +/- 9.51213
Start mean: 335.117 +/- 6.68162
Start mean: 335.117 +/- 2.54722
Start mean: 335.117 +/- 2.54722
Start mean: 335.117 +/- 1.66489
Start mean: 335.117 +/- 1.01858
End mean: 335.117 +/- 1.01858
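
For anyone wanting to check figures like these by hand, here is a minimal sketch of the mean and standard deviation calculation. The names LoadStats and compute_load_stats are made up for illustration and are not taken from the linked main.cpp. It uses the population form (divide by N rather than N - 1), which is an assumption, although the first reported line, 335.117 +/- 42.6876, is consistent with it for the four thread rates listed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: mean and spread of the per-thread loads.
struct LoadStats
{
	double mean;
	double stddev;
};

// Population mean and standard deviation (divides by N, not N - 1).
// The loads can be in any unit, e.g. Mbit/s or bytes per second.
LoadStats compute_load_stats(const std::vector<double>& loads)
{
	assert(!loads.empty()); // the mean of zero threads is undefined

	double sum = 0.0;
	for (double x : loads)
		sum += x;

	const double mean = sum / static_cast<double>(loads.size());

	double squared_diff_sum = 0.0;
	for (double x : loads)
	{
		const double d = x - mean;
		squared_diff_sum += d * d;
	}

	const double variance = squared_diff_sum / static_cast<double>(loads.size());
	return { mean, std::sqrt(variance) };
}
```

Calling compute_load_stats({ 342.459, 366.351, 263.315, 368.342 }) gives a mean of about 335.117 and a standard deviation of about 42.69. A falling standard deviation with an unchanged mean is what you would expect if jobs are only being moved between threads.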

Yes, load testing is typically done with several hosts for the clients, and one host for the server, to make sure the clients can do all the necessary client-ey things (except render the graphics.) Renting server instances temporarily on Amazon or a similar cloud service is often helpful for this.

While you're still just developing in your garage, you can run the clients on the same machine as the server. Clients don't need to bind to any particular port, so they each get allocated a different port. The server always needs to use both IP address and source host port to identify a client, because multiple clients may be behind the same NAT firewall, which will give those clients the same IP address, but different source ports.
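
As a rough sketch of what identifying a client by address and port can look like, the following keys a hash table on both values, so that two clients behind the same NAT address stay separate. ClientKey, ClientKeyHash, ClientState, ClientTable and record_datagram are all hypothetical names for illustration; a real server would fill the key from the sockaddr_in returned by recvfrom().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical key: IPv4 address plus UDP source port.
struct ClientKey
{
	std::uint32_t ip;
	std::uint16_t port;

	bool operator==(const ClientKey& other) const
	{
		return ip == other.ip && port == other.port;
	}
};

// Packs address and port into one 64-bit value and hashes that.
struct ClientKeyHash
{
	std::size_t operator()(const ClientKey& k) const
	{
		const std::uint64_t packed = (static_cast<std::uint64_t>(k.ip) << 16) | k.port;
		return std::hash<std::uint64_t>{}(packed);
	}
};

// Hypothetical per-client bookkeeping.
struct ClientState
{
	std::uint64_t bytes_received = 0;
};

using ClientTable = std::unordered_map<ClientKey, ClientState, ClientKeyHash>;

// Adds a datagram's size to the client's total, creating the entry if needed.
void record_datagram(ClientTable& table, const ClientKey& key, std::size_t bytes)
{
	table[key].bytes_received += bytes;
}
```

With this layout, datagrams from 203.0.113.7:50000 and 203.0.113.7:50001 land in two different entries, even though the IP address is the same.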

By the way: Doing anything string related in a tight network server loop is unlikely to perform well. ostringstream, std::string, std::vector, and other standard affordances are known to allocate/deallocate memory, which typically ends up interlocking with the heap, which is slower than using pre-allocated buffers. You're unlikely to yet be at the point where your game needs that attention to performance, but it's good to keep in mind as you start scaling up. Heap contention is often hard to really “see” in profiles, and ends up just being this tax that everything pays more or less invisibly…

enum Bool { True, False, FileNotFound };

taby said:
I've only got 2 Windows machines

The poor (wo)man's test suite usually means starting multiple processes on the same machine, if possible. If you have a second machine, better to use that to spawn as many clients as you need, so that the server's performance isn't disturbed by processes running in parallel. You can for example choose a port-range like

const int end = 23250;

int start = 23230;
for(int port = start; port < end; port++)
    SpawnUdpClientAt(port);

and start a separate process for each port, which sends and listens on that port.

taby said:
The data that I'm looking for are the average load per thread in bytes per second, as well as the standard deviation.

Do you run your network code in threads that your application manages, or do you use IOCP?

I/O Completion Ports have the advantage that you can handle hundreds of connections at the same time without significantly increasing the number of threads, as they're managed in a background pool by the OS itself. In our game engine, network handling is limited to 2 IOCP threads, which do nothing other than spawn a task for the message send. This plays well with our micro-scheduled task worker pool and allows faster processing of incoming messages.

By the way, since you mentioned Linux and Mac: IOCP is a Windows-only thing, which can be emulated using the UNIX poll() API call.

Shaarigan said:
You can for example choose a port-range like

You don't need to choose a port range on a client. You don't need to call bind() at all for a client UDP socket. Just open it and call sendto() to send the messages to the right destination. The implementation will pick a free port number the first time you send on the socket, and auto-bind to that port number. You, as a client, don't need to know which port this is (and it may change on the way to the server, because of NAT.) Similarly, the server uses recvfrom() to see where the message came from, and then returns responses to that same address, again using sendto()

Also, it is possible to use connect() on a UDP port; if you do, then calling send() will always send to the given destination, and UDP datagrams that come back from some other address/port will be filtered out. This may be convenient, but it may also make NAT traversal and load balancing at large scale harder, so it's not commonly used.
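
Here is a small loopback sketch of the unbound-client pattern: several client sockets that never call bind() each send one datagram, and the server reads the source port of each with recvfrom(). It is written against POSIX sockets to keep it short, which is an assumption on my part since the thread is about Windows. The same calls exist in Winsock after WSAStartup(), with closesocket() in place of close(). The function name collect_client_ports is made up for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Opens client_count unbound UDP sockets, sends one datagram from each to a
// loopback server socket, and returns the distinct source ports the server saw.
std::set<std::uint16_t> collect_client_ports(int client_count)
{
	assert(client_count > 0);
	std::set<std::uint16_t> ports;

	// Server: bind to 127.0.0.1 with port 0 so the OS picks a free port.
	const int server = socket(AF_INET, SOCK_DGRAM, 0);
	if (server < 0)
		return ports;

	sockaddr_in server_addr{};
	server_addr.sin_family = AF_INET;
	server_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	server_addr.sin_port = 0;
	socklen_t len = sizeof(server_addr);

	if (bind(server, reinterpret_cast<sockaddr*>(&server_addr), len) != 0 ||
		getsockname(server, reinterpret_cast<sockaddr*>(&server_addr), &len) != 0)
	{
		close(server);
		return ports;
	}

	// One-second receive timeout, so a lost datagram cannot hang the loop.
	timeval tv{};
	tv.tv_sec = 1;
	setsockopt(server, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

	// Clients: no bind() call. The first sendto() auto-binds each socket.
	std::vector<int> clients;
	for (int i = 0; i < client_count; ++i)
	{
		const int c = socket(AF_INET, SOCK_DGRAM, 0);
		if (c < 0)
			break;
		clients.push_back(c);

		const char msg[] = "hello";
		sendto(c, msg, sizeof(msg), 0,
			reinterpret_cast<const sockaddr*>(&server_addr), sizeof(server_addr));
	}

	// Server: recvfrom() reports the address and port each datagram came from.
	for (std::size_t i = 0; i < clients.size(); ++i)
	{
		char buf[64];
		sockaddr_in from{};
		socklen_t from_len = sizeof(from);
		if (recvfrom(server, buf, sizeof(buf), 0,
			reinterpret_cast<sockaddr*>(&from), &from_len) < 0)
			break;
		ports.insert(ntohs(from.sin_port));
	}

	for (int c : clients)
		close(c);
	close(server);
	return ports;
}
```

Every client shows up as 127.0.0.1 with a different source port, so the (address, port) pair is enough to tell them apart on one machine.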

enum Bool { True, False, FileNotFound };

Thanks for the advice!

P.S. I changed the code to use an IPv4_address object instead of a string representation. Every cycle counts! Thanks again.

P.P.S. I discontinued the use of push_back when dealing with vectors of a known size. Hopefully it's not still too inefficient compared to using the stack.

hplus0603 said:
You don't need to choose a port range on a client. You don't need to call bind() at all for a client UDP socket

With UDP, you have to bind() the socket in the client because UDP is connectionless, so there is no other way for the stack to know which program to deliver datagrams to for a particular port. If you could recvfrom() without bind(), you'd essentially be asking the stack to give your program all UDP datagrams sent to that computer

I made a major change to the algorithm. First, I find a candidate thread with 2 or more jobs (each job handles a client IP address). That's the same as before. Next, I pick the smallest job in the candidate's job pool. This is different from before, where I just picked the first job in the pool. I made this change because the algorithm was failing to reduce the standard deviation of the job sizes when there were around 65536 different client IP addresses. It all works great now, I believe. Thanks again for all of the ideas!
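
A minimal sketch of one way such a step could look is below. It is not the code from the linked main.cpp. The two parts described above (a candidate thread with 2 or more jobs, and picking its smallest job) come from the post; everything else is an assumption made for illustration: the candidate is the most loaded eligible thread, the destination is the least loaded thread, and a job only moves when that lowers the standard deviation. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// One entry per job (client IP address): that job's load, e.g. bytes per second.
using Jobs = std::vector<double>;

double thread_load(const Jobs& jobs)
{
	return std::accumulate(jobs.begin(), jobs.end(), 0.0);
}

// Population standard deviation of the per-thread loads.
double load_stddev(const std::vector<Jobs>& threads)
{
	assert(!threads.empty());
	const double n = static_cast<double>(threads.size());
	double mean = 0.0;
	for (const Jobs& t : threads)
		mean += thread_load(t);
	mean /= n;

	double squared_diff_sum = 0.0;
	for (const Jobs& t : threads)
	{
		const double d = thread_load(t) - mean;
		squared_diff_sum += d * d;
	}
	return std::sqrt(squared_diff_sum / n);
}

// One balancing step. Returns true if a job was moved.
bool balance_step(std::vector<Jobs>& threads)
{
	const std::size_t none = threads.size();
	std::size_t src = none; // most loaded thread with 2 or more jobs
	std::size_t dst = none; // least loaded thread (assumption)

	for (std::size_t i = 0; i < threads.size(); ++i)
	{
		const double load = thread_load(threads[i]);
		if (threads[i].size() >= 2 && (src == none || load > thread_load(threads[src])))
			src = i;
		if (dst == none || load < thread_load(threads[dst]))
			dst = i;
	}

	if (src == none || src == dst)
		return false;

	// Pick the smallest job in the candidate's pool.
	const Jobs::iterator smallest = std::min_element(threads[src].begin(), threads[src].end());
	const double job = *smallest;

	// Moving a load j from a thread with load A to one with load B changes the
	// sum of squared loads by 2j(j - (A - B)), so it helps only if 0 < j < A - B.
	const double gap = thread_load(threads[src]) - thread_load(threads[dst]);
	if (job <= 0.0 || job >= gap)
		return false;

	threads[src].erase(smallest);
	threads[dst].push_back(job);
	return true;
}

// Repeats the step until no move improves the balance.
void rebalance(std::vector<Jobs>& threads)
{
	while (balance_step(threads)) {}
}
```

Because each accepted move strictly lowers the sum of squared loads while the total stays fixed, the loop ends, and the mean stays constant while the spread shrinks. Starting from loads of 100, 5, 0 and 15 (jobs {10, 20, 30, 40}, {5}, {}, {15}), this sketch ends at 40, 25, 30 and 25.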
