Extreme performance differences

Hydrael · 2005-07-19T09:34:31

Hello everyone, a few days ago, I tried running my project on 2 other computers, to see if and how it runs. What I encoutered was quite surprising: I tested my project on 3 computers a) Siemens Notebook, 1.9 Ghz (Centrino), 1 GB DDR, Radeon 9700 Mobiliy (used for development) b) Pentium 4 3.2 Ghz, HT, 1 GB DDR, Radeon 9700 c) Pentium 4, 3.0 Ghz, 512 DDR, some crappy Intel onboard graphic chip At the moment I'm not using any Extensions, except for Point Sprites (which are supported by all PCs, except for c) I'm rendering a 512x512 heightmap which makes 786.432 polys, and I'm using a custom made frustum culling method. Now for the FPS I get on those three computers: a) 60-80 b) 20-30 c) 180-220 (!) It seems, that the computer, from which I expected the lowest results totally owns the other two, while b, which I assumed to be the stronges machine goes down the drain. Another thing I noticed is, that my program uses 100% CPU on a and b, while c only uses 50%. I tried debugging my project on all three computers to find out where this insane performance difference comes together, but I don't know where to start. Does anyone know any standard mistakes I could have made? Thanks in advance Chris [Edited by - Hydrael on July 17, 2005 2:32:53 AM]

NeHe Productions Affiliates

Started by Hydrael July 17, 2005 02:13 AM

12 comments, last by AndyTX 19 years, 1 month ago

RipTorn

722

July 19, 2005 02:20 AM

512*512*2 = 524288 triangles

524288 * 220fps = 115,343,360 triangles/second
= not possible on an intel integrated graphics.

No intel integrated graphics chip supports vertex shaders or hardware transforms, so the absolute max you would be looking at would probably be ~8 million/sec.

The max you will see with absolutly bare bones geometry (ie, no lighting, no textures, nothing) is around about 80 million and thats on a x850 or the like, and thats when using VBO, with vcache optimizations, etc.

So I'd suggest your frame rate counter is broken.

Hydrael

Author

205

July 19, 2005 02:36 AM

I finally found the bottleneck - VTune gave the big hint :D
It was the glDrawElements calls, from which I had a _whole_ lot sending 24k vertices to the GPU every call. Basically the amount of calls and the amount of vertices being sent by themselves were both "ok", but it was simply too many vertices per single call.
So I only changed one little variable - the one telling how big one logical unit is. Initially it was 64 - meaning: my 512x512 heightmap was split up several, smaller 64x64 units. I know the middle point and the radius of one LU (logical unit), so I used a "sphere in frustum" function, to determine if that whole block is within the frustum. If it is, I render the whole thingy at once (64*64*6 vertices, because LUSize was 64).
All I now did is adjustung the LUSize value down. First to 32 which gave a speed increase of 100% - then knew that's the bogey ;)
After testing around a bit, I found out that the optimal value for LUSize is 8, which means within one frame I always send a huge sequence of 8*8*6 vertices to the GPU.
On machine b I now also come up to 190-230 fps.
I expect that fact (way more performance with minimal vertices per call) to be changed as soon I start using (dynamic) VBOs - I will let you know as soon as I tried it out ;)

So it only was changing LUSize=64; to LUSize=8; within ~4000 lines of code :D

The only thing I still don't understand is, why machine c was doing so good. Must have to do with the crappy graphic card in some way.

Anyways...thanks a lot everyone - you really helped me out here ;)

Greets

Chris

Edit:
@RipTorn: I'm not sending all the vertices at once - I first do a visibility check. Depending on viewing distance, all together about 50k (at standard, 64.0f viewing distance) triangles were being rendered - now, after the little change it should be less.

@markr: Sorry, I kind of overread your post :/
...but you were right, VSync was disabled

[Edited by - Hydrael on July 19, 2005 2:36:22 AM]

lc_overlord

436

July 19, 2005 04:21 AM

Quote: Original post by Hydrael
The only thing I still don't understand is, why machine c was doing so good. Must have to do with the crappy graphic card in some way.

Probobly, those graphics cards where never designed for openGL or anything 3d by the way.
so the OGL drivers kinda ignores stuff, that's why it might run faster, but the result is not the same.

www.flashbang.se | www.thegeekstate.com | nehe.gamedev.net | glAux fix for lesson 6 | [twitter]thegeekstate[/twitter]

AndyTX

807

July 19, 2005 09:34 AM

It's possible that the partially software implementation (ie. Intel graphics) has different bottlenecks than the hardware one and thus your code just happened to be playing nice with the Intel onboard, and not the dedicated hardware. In face if you are rendering straight from RAM, my guess is that this is the case (Intel is optimized to always render from RAM... it emulates things like VBOs etc. with host buffers - plus no need to send the data to the GPU).

However, if you send the data in well-formed batches always using hardware buffers, I can guarantee that the GPUs will outperform the onboard solution every time, or something is very wrong ;)

Extreme performance differences

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Extreme performance differences

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines