Double precision on GPU, still too slow or now the way to go?

13 comments, last by scippie 3 years, 6 months ago

I remember when I was programming my 3D engine in 2015 or so, that I needed huge coordinates for my game (and needed double precision for that), but had to make them relative to the camera so that they would fit into a single precision floating point value so that the GPU would be able to handle them quickly.
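For readers who haven't seen the 'relative to camera' approach, here is a minimal sketch of the idea described above. The `Vec3d`/`Vec3f` types and both function names are hypothetical, not taken from any particular engine. The functions are marked `constexpr` only so the checks at the bottom can run at compile time.

```cpp
// Hypothetical vector types for illustration only.
struct Vec3d { double x, y, z; };
struct Vec3f { float x, y, z; };

// Subtract in double precision on the CPU, then narrow to float.
// The large magnitudes cancel before the cast, so the float result
// keeps the detail that matters near the camera.
constexpr Vec3f toCameraRelative(const Vec3d& world, const Vec3d& camera) {
    return Vec3f{
        static_cast<float>(world.x - camera.x),
        static_cast<float>(world.y - camera.y),
        static_cast<float>(world.z - camera.z)
    };
}

// For comparison: narrow to float first, subtract afterwards.
// Fine detail is already lost by the time the subtraction happens.
constexpr Vec3f naiveRelative(const Vec3d& world, const Vec3d& camera) {
    return Vec3f{
        static_cast<float>(world.x) - static_cast<float>(camera.x),
        static_cast<float>(world.y) - static_cast<float>(camera.y),
        static_cast<float>(world.z) - static_cast<float>(camera.z)
    };
}

// A point 0.25 units from a camera that sits 10 million units from the origin.
// Near 1e7 adjacent floats are 1.0 apart, so the naive version loses the offset.
static_assert(toCameraRelative(Vec3d{10000000.25, 0.0, 0.0},
                               Vec3d{10000000.0, 0.0, 0.0}).x == 0.25f,
              "offset survives when subtracting in double first");
static_assert(naiveRelative(Vec3d{10000000.25, 0.0, 0.0},
                            Vec3d{10000000.0, 0.0, 0.0}).x == 0.0f,
              "offset is lost when narrowing to float first");
```

The GPU only ever sees the small float offsets, so no double-precision shader math is needed for display.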

I also remember a friend creating a fractal shader around that same time which ran 10x faster on single precision than double precision (although the latter would look better).

Double precision was already available on the GPU, but it was terribly slow.

I did some searches on the internet and can't find any useful information on how it progressed.

How are GPUs with double precision today? On attributes? On textures? Is it still so much slower than single precision? Is it still better to do the ‘relative to camera’ thing on the CPU than to simply pass double precision values to the GPU?

I understand that the memory footprint will be twice as big and take twice as long to transfer, but I guess memory has become more than twice as fast since then, which would even out the difference unless you're creating an AAA game where every extra instruction counts.

@scippie That's a good question and unfortunately I don't have an answer for you. I'm kind of in the same boat as you. I think there are some cards with fast double performance, but I wouldn't count on the average gamer's card being fast.

I'm currently doing the ‘relative to camera’ thing as you put it, and that seems to work pretty well, although it does mean you have to have a pretty solid LOD system to keep Z-fighting down. The other issue is that most game engines don't support 64-bit coordinates, which is another headache. I plan to wait a while before I think about 64-bit coordinates on the GPU myself.

RTX 3000 has a single:double ratio of 64:1
RDNA2 has a ratio of 16:1

So yep, it's still too slow on consumer HW.
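To put those ratios in perspective, here is a rough sketch of the arithmetic. The function name and the FP32 figures in the checks are made-up round numbers for illustration, not specs of any real card, and peak ALU throughput is only an upper bound, not a prediction of how a particular shader will behave.

```cpp
// Estimate peak FP64 throughput from peak FP32 throughput and the
// single:double ratio (for example 64.0 for a 64:1 ratio).
// Units are whatever the FP32 figure uses, e.g. TFLOPS.
constexpr double estimateFp64Peak(double fp32Peak, double singleToDoubleRatio) {
    return fp32Peak / singleToDoubleRatio;
}

// Hypothetical cards, both with 32 TFLOPS of FP32:
static_assert(estimateFp64Peak(32.0, 64.0) == 0.5, "64:1 leaves 0.5 TFLOPS of FP64");
static_assert(estimateFp64Peak(32.0, 16.0) == 2.0, "16:1 leaves 2 TFLOPS of FP64");
```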

Gnollrunner said: The other issue is that most game engines don't support 64-bit coordinates, which is another headache. I plan to wait a while before I think about 64-bit coordinates on the GPU myself.

That's not an issue for me. I'm actually thinking of rewriting my own engine, maybe even with Vulkan.

JoeJ said:

RTX 3000 has a single:double ratio of 64:1
RDNA2 has a ratio of 16:1

So yep, it's still too slow on consumer HW.

Wow, those numbers are… terrible! Where did you get that information?

E.g. from here: https://www.techpowerup.com/gpu-specs/radeon-rx-6800.c3713

The site has all GPUs, even for consoles.

This is useful if questions are more detailed: https://vulkan.gpuinfo.org/

64:1 and 16:1 are terrible numbers indeed, but we must only be talking about the transformation part of the pipeline. Surely that runs in parallel with rasterization, which is what takes up most of the time?

JoeJ said:

E.g. from here: https://www.techpowerup.com/gpu-specs/radeon-rx-6800.c3713

This is useful if questions are more detailed: https://vulkan.gpuinfo.org/

That's a very useful site indeed. I never thought these rates would be documented. I also didn't know that half-precision floats were twice as fast; I thought they just took half the memory.

Brian Sandberg said:
64:1 and 16:1 are terrible numbers indeed, but we must only be talking about the transformation part of the pipeline. Surely that runs in parallel with rasterization, which is what takes up most of the time?

Yeah, my generic statement of double being just too slow isn't justified.

I have no experience with double prec. vertex shaders in practice and assume it might make sense in rare cases. But we double the cost everywhere: transfer, bandwidth, registers and on-chip memory. Having parallel work available will not hide all those costs. (There are non-obvious effects like reduced occupancy due to increased register pressure.)
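The doubling is easy to see for vertex positions alone. A small sketch, with hypothetical struct and function names, assuming the usual 4-byte `float` and 8-byte `double` and tightly packed buffers:

```cpp
#include <cstddef>

// Hypothetical position-only vertex layouts.
struct PositionF { float x, y, z; };   // 3 * 4 bytes on typical platforms
struct PositionD { double x, y, z; };  // 3 * 8 bytes on typical platforms

static_assert(sizeof(PositionF) == 12, "assumes 4-byte float, no padding");
static_assert(sizeof(PositionD) == 24, "assumes 8-byte double, no padding");

// Bytes needed to store (and transfer) a tightly packed vertex buffer.
constexpr std::size_t bufferBytes(std::size_t vertexCount, std::size_t stride) {
    return vertexCount * stride;
}

// One million positions: 12,000,000 bytes as float, 24,000,000 as double.
static_assert(bufferBytes(1000000, sizeof(PositionF)) == 12000000, "float buffer size");
static_assert(bufferBytes(1000000, sizeof(PositionD)) == 24000000, "double buffer size");
```

The same factor of two applies to every register and cache line that holds those values, which is where the occupancy effect mentioned above comes from.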

I would say: Using double prec. just for display purposes seems bad. It can be avoided without much effort.
Even if the game itself uses double prec. (Star Citizen is the only example I know), it's likely worth reducing the display system to single prec.

Brian Sandberg said:

64:1 and 16:1 are terrible numbers indeed, but we must only be talking about the transformation part of the pipeline. Surely that runs in parallel with rasterization, which is what takes up most of the time?

I guess that depends on the implementation of course. But even then, the numbers are totally huge.

This topic is closed to new replies.