
6600GT delusion

After fixing my issue with the broken sampler, I was finally able to render some "really HDR" lights on the 6600GT. I was expecting to easily hit 60+ FPS, but instead I'm much lower: for a full 600x600 window, a frame can take up to 1/8 of a second! What a shame!

I'm surely fragment-processing bound, but I'm not sure how I can get more performance out of those shaders (which are DOT3 with a twist, by the way). I suspect I could also be spending a lot of time moving data around the system bus, and I'm sure I'm also messing up states and various other stuff, wasting CPU time. Come on, it's 1/8 of a second!

What were your experiences with the 6600GT and other similarly budget cards? I always thought budget cards were the same as the higher-end ones, just with fewer pipelines. Now I suspect that maybe the NV folks also stripped an ALU unit from each pipe (the one dedicated to math), or maybe their FP16 filtering hardware is so crap (I'm using NEAREST, so I don't know why this should even matter) that it can't hit decent scheduling.
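To put some numbers on it, here's the kind of back-of-the-envelope budget I mean (a rough sketch; the 500 MHz core and 8 pixel pipes are the commonly quoted 6600GT figures, not something I measured, and one op per pipe per cycle is obviously an idealization):

```cpp
// Rough per-pixel cycle budget for the scenario above.
// Assumed (not measured): 6600GT at ~500 MHz core with 8 pixel pipes.
#include <cstdio>

int main()
{
    const double core_hz     = 500e6;     // assumed core clock
    const double pixel_pipes = 8.0;       // assumed pipe count
    const double width = 600.0, height = 600.0;
    const double frame_time  = 1.0 / 8.0; // observed: up to 1/8 s per frame

    const double pixels_per_frame  = width * height;                 // 360,000
    const double pixels_per_second = pixels_per_frame / frame_time;  // ~2.9 Mpix/s
    const double cycles_per_pixel  = core_hz * pixel_pipes / pixels_per_second;

    std::printf("shaded pixels/s : %.2e\n", pixels_per_second);
    std::printf("cycles per pixel: %.0f\n", cycles_per_pixel);       // ~1400
    return 0;
}
```

Roughly 1400 cycles per shaded pixel at that frame time, which at least makes the fragment-bound theory plausible given how heavy the shader is.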

Previously "Krohm"

A while ago I developed an app that used FP16 render targets, etc. Quite a lot of them, actually. It was rather heavy on the effects. It was designed to run on (at the time the project started) top-of-the-line X850s.
I initially started developing it on a 9500 Pro (128 MB). I later spent time in the States to complete it, so I needed to use my laptop (a GeForce 6600 Go). Not a GT, but still close (desktop clock rates, from memory).

I expected the 6600 to wallop my 9500. It seemed logical...
However, well, it didn't.

I had fallback low-detail modes in the app specifically to ease development. They went in stages.
The first was to use 8-bit integer instead of FP16 where possible. This was a big change, and it made things look rubbish [wink]. The next was to run at half the internal resolution, and the final mode did both (which looked beyond rubbish).
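In D3D9 terms the fallback boiled down to something like this (a sketch only; the helper and its "quality" switch are made up for illustration, not the app's actual code):

```cpp
// Sketch of the fallback render-target creation, assuming a live IDirect3DDevice9*.
// quality 0 = FP16 at full res, 1 = 8-bit at full res, 2 = 8-bit at half res.
#include <d3d9.h>

HRESULT CreateSceneTarget(IDirect3DDevice9* dev, UINT width, UINT height,
                          int quality, IDirect3DTexture9** outTex)
{
    D3DFORMAT fmt = (quality == 0) ? D3DFMT_A16B16G16R16F  // full FP16 target
                                   : D3DFMT_A8R8G8B8;      // 8-bit integer fallback
    if (quality == 2) { width /= 2; height /= 2; }         // half internal resolution

    return dev->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
                              fmt, D3DPOOL_DEFAULT, outTex, NULL);
}
```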

Generally, I had to do dev on the lowest setting. The performance I got from the full setting was utterly abysmal on the 6600: about 2 fps on average (the X850s were at 45+). I generally got better performance on my 9500 running full detail than on the 6600 running 8-bit (I was careful to keep the FP16 filtering to the X850's ability, i.e. none :).

But the real kicker? Well, as the effects started to mount up (depth of field, soft shadows, bloom, deferred shading, soft depth particles, etc.), halfway through development the full mode no longer worked on the 6600: out-of-memory D3D exceptions. The 9500 was fine, however, even at higher resolutions and full precision. Both cards were 128 MB. The 6600 also had unexplainable shader weirdness, and I became quite familiar with ye olde 'nv4_disp.dll' drawn as white text on a blue background (grrr!). Finally, certain model files (.x format) would simply crash, seemingly at the driver level, when loading (even in the DirectX SDK model loader samples).

Overall I was greatly disappointed. Sure, the 6600 monstered Doom 3, but at the end of the day it was the slower card, and far less reliable. :-(

That said, it's fine for my current project. However, I've found the CPU usage to be considerably higher than on my current Radeon once the drawing gets going.

The 9500 pro was the best budget card in the history of ever. That thing was, and still is, an absolute gem of a card. Never had a problem with it.
In my experience, the 6600GT is a pretty solid mid-range card. I have one in my DVR/extra gaming box. It ran the Half-Life 2: Lost Coast demo decently (1024x768 with HDR on). It's right on the border of being able to handle it, but that's somewhat expected of this card. It's a trade-off of price vs. performance.

"I can't believe I'm defending logic to a turing machine." - Kent Woolworth [Other Space]

...yes, it surely is with existing shipping apps...
Now, getting back to business:

Quote: Original post by RipTorn
A while ago I developed an app that used FP16 render targets...

The issue is that I am NOT using FP render targets, just two 128x128 textures as a constant array to loop over.
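Something like this, in GL terms (simplified from the real thing; treat the exact format and the helper as illustrative):

```cpp
// Sketch: a 128x128 FP16 texture used as a lookup table, sampled with NEAREST.
// Assumes a current OpenGL context and the ARB_texture_float extension.
#include <GL/gl.h>
#include <vector>

#ifndef GL_RGBA16F_ARB
#define GL_RGBA16F_ARB 0x881A
#endif

GLuint CreateLookupTexture(const std::vector<float>& rgba /* 128*128*4 floats */)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // NEAREST on purpose: no FP16 filtering is ever requested.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    // Upload float data into a 16-bit float internal format.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F_ARB, 128, 128, 0,
                 GL_RGBA, GL_FLOAT, &rgba[0]);
    return tex;
}
```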
Quote: Not a GT, but still close (desktop clock rates from memory).
I don't know about the Go, but when I bought the GT it was almost 20% faster than the plain 6600, so if you say it's similar I'll take your word for it.
Quote: The next was to run at half internal resolution...
I am thinking about using half precision in the shaders, but that would require work I'm not willing to do for this experiment (which was planned to be simple). NV's papers, however, say NV4x doesn't get a real boost from 16-bit vector processing, so it doesn't even look attractive.
Quote: I generally got better performance on my 9500 running full detail than on the 6600 running 8-bit (I was careful to keep the FP16 filtering to the X850's ability, i.e. none :).

I won't go into those tricks for the reasons above. The fragment program I'm using relies heavily on PS3.0 to shave off cycles. Last time I checked (last week), the branching provided a definite speedup, but today I checked again and the result is... horrific! (See the end of this message.)
Quote: as the effects started to mount up (depth of field, soft shadows, bloom, deferred shading, soft depth particles, etc.), halfway through development the full mode no longer worked on the 6600: out-of-memory D3D exceptions. [...] The 6600 also had unexplainable shader weirdness, and I became quite familiar with ye olde 'nv4_disp.dll' drawn as white text on a blue background (grrr!)...

Ouch! The driver really does not like your system at all. With a thousand instructions being executed, I really cannot blame the driver for the op management (although I do blame it when it comes to the broken virtualization)!
Quote: The 9500 pro was the best budget card in the history of ever. That thing was, and still is, an absolute gem of a card. Never had a problem with it.

I agree, so I recommended it to a friend. Unluckily, the drivers refuse to update on his machine, no matter what.
I still think the 6600GT is the successor to the 9500, and that there must be something in my application doing the wrong things. This is why I'm asking: it's simply not possible that it runs this slow!

Anyway, here are my speculations on this horror, based on the tests I did today.
Fasten your seatbelts: the 6600 does not really support PS3.0. It muddles through it happily, but without early-outs it's hopelessly condemned to always pay the maximum cost, defeating half the purpose of PS3.0.
I removed all the branches, replacing them with clamps (thus zeroing out the following computations), and I got the same performance.
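To be clear about what I mean by "replacing branches with clamps", here is the idea reduced to a CPU-side sketch (the actual shader is different; a plain N·L term stands in for it):

```cpp
// CPU-side sketch of "branch" vs "clamp" for a per-pixel lighting term.
// The real shader is DOT3-based and far more involved; this only shows the idea.
#include <algorithm>
#include <cstdio>

struct Vec3 { float x, y, z; };

static float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// PS3.0-style version: skip the expensive part when the light can't contribute.
static float ShadeWithBranch(const Vec3& n, const Vec3& l, float expensiveTerm)
{
    float ndotl = Dot(n, l);
    if (ndotl <= 0.0f)            // early-out: pay almost nothing for these pixels
        return 0.0f;
    return ndotl * expensiveTerm;
}

// Clamp version: always run the expensive part, let the clamp zero it out.
static float ShadeWithClamp(const Vec3& n, const Vec3& l, float expensiveTerm)
{
    float ndotl = std::max(Dot(n, l), 0.0f);   // saturate-style clamp
    return ndotl * expensiveTerm;              // multiplies to zero when clamped
}

int main()
{
    Vec3 n     = { 0.0f, 0.0f,  1.0f };
    Vec3 lBack = { 0.0f, 0.0f, -1.0f };        // light behind the surface

    // Same result either way; the question is only what the GPU pays for it.
    std::printf("branch: %f  clamp: %f\n",
                ShadeWithBranch(n, lBack, 42.0f),
                ShadeWithClamp(n, lBack, 42.0f));
    return 0;
}
```

If the hardware actually took the early-out, the branch version should be measurably cheaper whenever most pixels hit the zero path; on my 6600GT the two versions perform the same, which is exactly my point.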

Does this make sense? Unluckily, it does.
Managing op flow isn't easy. They obviously wanted an op controller that was easier to manufacture, so what did they do? They slapped in an optimized and improved NV3x mask controller. As ATI well knows, this kind of choice can swing manufacturing costs quite a bit.
Then what did they do? They may have stripped out the second ALU unit per pipe.
So: half the pipes, with half the power each, and no early-outs. Ouch.

Does this hamper performance in real games? No, because they use neither dynamic branching nor LUTs, for that matter.

Is this a winning strategy? I still think it is. It's my app being strange after all.

Now comes the real horror: between the two branching-optimization tests, I updated my drivers! I just hope my memory is at fault because, if it's serving me well, it means the budget 7xxx parts need more than a little help... not to mention they did the same with my NV25 a few years ago.

Is what I found similar to your experiences?

Previously "Krohm"

