2013 May 11 « Mirror Image

volatile in #cuda reduce sample

“Reduce” is one of the most useful samples in NVIDIA CUDA SDK. It’s implementation of highly optimized cuda algorithm for some of the elements of the array of the arbitrary length. It’s hardly possible to make anything better and generic enough with existing GPGPU architecture (if anyone know something as generic but considerably more efficient I’d like to know too). One of the big plus of the reduce algorithm is that it can work for any binary commutative associative operation – like min, max, multiply etc. And NVIDIA sample provide this ability – it’s implemented as reduce on template class, so all one have to do is implement class with overload of addition and assignment operations.

However there is one obstacle – it’s a __volatile__ qualifier in the code. Simple overload of “=” “+=” and “+” operations in class LSum cause compiler error like

error: no operator “+” matches these operands
1> operand types are: LSum + volatile LSum

The answer is add __volatile__ to all class operation, but there is the trick here:

for “=” just

volatile LSum& operator =(volatile LSum &rhs)

is not enough. You should add volatile to the end too, to specify not only input and output, but function itself as volatile.

At the end class looks like:

class LSum
{
public:
…

__device__ LSum& operator+=(volatile LSum &rhs)

{

…

return *this;

};

__device__ LSum operator+(volatile LSum &rhs)
{
LSum res = *this;
res += rhs;
return res;
};

__device__ LSum& operator =(const float &LSum)
{

…

return *this;

};

__device__ volatile LSum& operator =(volatile LSum &rhs) volatile
{
…

return *this;
};
};

11, May, 2013 Posted by mirror2image | Uncategorized | Comments Off on __volatile__ in #cuda reduce sample

Mirror Image

Mostly AR and Stuff

volatile in #cuda reduce sample

About

Top Posts

Blogroll

downloads

My places

Old homepage

People

Symbian

Video

Twitter

Archives

Meta

Site info