“get nan or inf error” in cuda-convnet – possible fix

The “get nan or inf” error sometimes happens in cuda-convnet on lower-end GPUs. I have traced this error to NaN values in the weights of the convolutional layers. It is still not clear to me why these NaN values appear in the weights. Are they backpropagated from the fully-connected layers, or do they pop up in the convolution kernel? The latter looks more likely to me. Anyway, I made a temporary fix: scan the weight gradients with a simple CUDA kernel and replace NaNs with zeroes. I haven’t observed the error since.
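The per-element logic of such a fix is simple. Below is a minimal host-side C++ sketch of it, assuming the gradients sit in a flat float buffer. `zeroNaNs` is a hypothetical name, not the function in win_convnet, and the actual fix runs as a CUDA kernel that would typically give each element its own thread instead of looping.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side sketch of the NaN fix described above.
// A CUDA kernel would handle one index per thread instead of this loop.
// Returns how many NaN entries were replaced with zero.
std::size_t zeroNaNs(std::vector<float>& grad) {
    std::size_t replaced = 0;
    for (float& g : grad) {
        if (std::isnan(g)) {
            g = 0.0f;  // drop the NaN so it is not applied to the weights
            ++replaced;
        }
    }
    return replaced;
}
```

Zeroing only skips the update for the affected elements, so it hides the symptom rather than explaining where the NaNs come from, which is why it is called a temporary fix here.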

I have pushed the fix into the Windows version of cuda-convnet at

https://github.com/s271/win_convnet

The fix is activated with the option --fix-nan=1

Porting these changes to the Linux version should be straightforward: there are only a few small changes, all in *.cu and *.py files.

If anyone is wondering what cuda-convnet is, here is a nice explanation:

http://fastml.com/object-recognition-in-images-with-cuda-convnet/

And here is the main paper about cuda-convnet:

NIPS2012_0534.pdf (Krizhevsky, Sutskever and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012)

17, January, 2014 - Posted by mirror2image | Coding | convolutional network, cuda, cuda-convnet, Deep Learning

3 Comments

Thanks very much for this! I’m trying to apply these changes, but I’m working on a Linux machine, and when I build, the changes in matrix.cpp throw ‘function undefined’ errors. I think this might be because the #define statements for the two “_host” functions are inside a Windows-specific #if block in matrix.h…

Do you have any idea what I could do at this point? I’ve tried relocating the #define statements, but that hasn’t fixed the compile errors so far…

Comment by quoyle | 20, January, 2014
Those functions are just wrappers around the system’s isnan and isinf functions, defined in matrix.h, line 44. My implementation seems to be Windows-specific.
They have different names to distinguish them from CUDA’s isnan.

Different portable implementations of isnan are discussed here:
http://stackoverflow.com/questions/2249110/how-do-i-make-a-portable-isnan-isinf-function
The simplest seems to be:

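#include <cmath>

// Host-side wrappers; the distinct names avoid clashing with CUDA's isnan.
// Declared inline so the definitions can live in matrix.h without
// multiple-definition link errors.
inline int isnan_host(double x) {
    return std::isnan(x);
}

inline int isinf_host(double x) {
    return std::isinf(x);
}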
Let me know if it helps.
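For the Linux build problem raised above, one option is to keep a single matrix.h and choose the wrapper bodies with the preprocessor, instead of hiding the definitions inside a Windows-only #if. This sketch assumes MSVC’s `_isnan` and `_finite` from <float.h> on Windows. `hasNanOrInf` is a hypothetical helper showing how a host-side “nan or inf” check might use the wrappers; it is not the actual code in matrix.cpp.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <float.h>
// Older MSVC toolchains may lack std::isnan/std::isinf, so use the CRT versions.
inline int isnan_host(double x) { return _isnan(x); }
// _finite is zero for both NaN and Inf, so exclude NaN to get Inf only.
inline int isinf_host(double x) { return !_finite(x) && !_isnan(x); }
#else
#include <cmath>
// Distinct names avoid clashing with CUDA's device-side isnan/isinf.
inline int isnan_host(double x) { return std::isnan(x); }
inline int isinf_host(double x) { return std::isinf(x); }
#endif

// Hypothetical host-side check: true if any element is NaN or +/-Inf.
inline bool hasNanOrInf(const float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (isnan_host(data[i]) || isinf_host(data[i])) {
            return true;
        }
    }
    return false;
}
```

On Linux only the std:: branch is compiled, which corresponds to the std::isnan(x) variant above.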

Comment by mirror2image | 20, January, 2014
I’ve pushed the std::isnan(x) variant to GitHub.

Comment by mirror2image | 20, January, 2014


Mirror Image

Mostly AR and Stuff