🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

parallelizing a software rasterizer algorithm using opencl

Started by
58 comments, last by JoeJ 1 year, 9 months ago

AhmedSaleh said:
I'm just confused, what is the point of prefix sum ?

It's a really simple concept, so before i have learned GPU programming, i did not know it has name.

But it's just that: We have an array of numbers, and the prefix sum is another array, where each element is the sum of all former elements.

This has two applications, usually:
We know the sum of all numbers of the whole array (last element), which often need to allocate memory of that size.
And we know this sum also for any element in our initial array, so we can put all objects in compact order to process them in such order, often in parallel.

See the end of the code:

// now we can iterate points per bin:
  
  for (int binI = 0; binI<NUM_BINS; binI++)
  {
      int listBegin = prefixBinCount[binI];
      int listEnd = prefixBinCount[binI+1];
      
      for (int i=listBegin; i<listEnd; i++)
      {
          std::cout << pointPerBinLists[i] << " ";
      }
        
  }

Notice that's ideal. We don't need to traverse some linked list or something like that - we have our stuff in sequence.
And getting the begin and end of our range per bin is easy and ideal as well.

Once this goal is clear, it is easier to understand how and why the prefix sum gives us this, and why the concept is generally useful.

Basically the applications are similar to those where we want to use sorting. But binning is faster (O(number of elements + number of bins)), so preferable if we don't need exactly sorted order, just some coarse division (into 'bins').
(Another typical application is summed area tables (SAT)).

The alternative to binning would be to build linked lists of points per bin. This is simpler, and has time complexity of only O(number of elements).
But to do this, we would again require atomics to build the lists in parallel.
Plus: With triangles, it would not even work, because triangles go into multiple tiles, not just one.
Finally, traversing linked lists is slow due to pointer chasing, and on GPU this is even more of a problem than on CPU.

This topic is closed to new replies.

Advertisement