Mirror Image

Mostly AR and Stuff

__volatile__ in #cuda reduce sample

“Reduce” is one of the most useful samples in NVIDIA CUDA SDK. It’s implementation of highly optimized cuda algorithm for some of the elements of the array of the arbitrary length. It’s hardly possible to make anything better and generic enough with existing GPGPU architecture (if anyone know something as generic but considerably more efficient I’d like to know too). One of the big plus of the reduce algorithm is that it can work for any binary commutative associative operation – like min, max, multiply etc. And NVIDIA sample provide this ability – it’s implemented as reduce on template class, so all one have to do is implement class with overload of addition and assignment operations.

However there is one obstacle – it’s a __volatile__ qualifier in the code. Simple overload of “=” “+=” and “+” operations  in class LSum cause compiler error like

error: no operator “+” matches these operands
1> operand types are: LSum + volatile LSum

The answer is add __volatile__ to all class operation, but there is the trick here:

for “=”  just

volatile LSum& operator =(volatile LSum &rhs)

is not enough. You should add volatile to the end too, to specify not only input and output, but function itself as volatile.

At the end class looks like:

class LSum
{
public:

__device__ LSum& operator+=(volatile LSum &rhs)

{

return *this;

};

__device__ LSum operator+(volatile LSum &rhs)
{
LSum res = *this;
res += rhs;
return res;
};

__device__ LSum& operator =(const float &LSum)
{

return *this;

};

__device__ volatile LSum& operator =(volatile LSum &rhs) volatile
{

return *this;
};
};

11, May, 2013 Posted by | Uncategorized | Comments Off on __volatile__ in #cuda reduce sample