“get nan or inf error” in cuda-convnet – possible fix
The “get nan or inf” error sometimes happens on lower-end GPUs in cuda-convnet. I have traced this error to NaN values in the weights of the convolutional layers. It is still not clear to me why these NaN values appear in the weights. Are they backpropagated from the fully-connected layers, or do they pop up in the convolution kernel? The latter looks more likely to me. Anyway, I made a temporary fix: scan the weight gradients with a simple CUDA kernel and replace NaNs with zeroes. I haven't observed the error since.
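The actual fix is a CUDA kernel in the repository linked below. The sketch here only illustrates the per-element logic in plain host C++, assuming a flat float buffer of gradients; the function name replaceNanWithZero is hypothetical and not taken from the win_convnet code.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical host-side sketch of the NaN fix (not the win_convnet kernel).
// Replaces every NaN in a flat gradient buffer with 0.0f and returns how many
// values were replaced, which is handy for logging how often the fix fires.
std::size_t replaceNanWithZero(float* grad, std::size_t n) {
    std::size_t replaced = 0;
    for (std::size_t i = 0; i < n; ++i) {  // in a CUDA kernel, each thread would take one i
        if (std::isnan(grad[i])) {         // true only for NaN, not for +/-Inf
            grad[i] = 0.0f;
            ++replaced;
        }
    }
    return replaced;
}
```

Only NaNs are zeroed here, matching the description above; infinities pass through unchanged.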
I have pushed the fix into the Windows version of cuda-convnet at
https://github.com/s271/win_convnet
The fix is activated with the option --fix-nan=1
Porting these changes to the Linux version should be straightforward: there are only a few small changes in the *.cu and *.py files.
PS
If anyone is wondering what cuda-convnet is, here is a nice explanation:
http://fastml.com/object-recognition-in-images-with-cuda-convnet/
And here is the main paper about cuda-convnet:
NIPS2012_0534.pdf
3 Comments
Thanks very much for this! I'm trying to implement these changes on a Linux machine, and when I build, the changes in matrix.cpp throw ‘function undefined’ errors. I think this might be because the #define statements for the two “_host” functions are inside a Windows-looking #if block in matrix.h…
Do you have an idea of what I could do at this point? I've tried relocating the #define statements, but that hasn't fixed the compile errors so far…
Those functions are just wrappers around the system's isnan and isinf functions, defined in matrix.h, line 44. My implementation seems to be Windows-specific.
They have different names to distinguish them from CUDA's isnan.
Several portable implementations of isnan are discussed here:
http://stackoverflow.com/questions/2249110/how-do-i-make-a-portable-isnan-isinf-function
The simplest seems to be:
#include <cmath>
// Host-side wrappers; the _host suffix keeps them distinct from CUDA's device isnan/isinf
int isnan_host(double x) {
    return std::isnan(x);
}
int isinf_host(double x) {
    return std::isinf(x);
}
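A host-side scan built on the same standard calls could also help answer the original question of where the NaNs first appear, for example by checking each layer's weights after copying them back from the GPU. This is a minimal sketch; firstNonFinite is a hypothetical helper, not a cuda-convnet function, and it calls std::isnan / std::isinf directly (the same calls the wrappers forward to) so that it compiles on its own.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of cuda-convnet): returns the index of the
// first NaN or Inf in a host buffer, or n if every value is finite.
std::size_t firstNonFinite(const float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::isnan(data[i]) || std::isinf(data[i])) {
            return i;
        }
    }
    return n;
}
```

Running it per layer after each batch would show whether the bad values show up first in a convolutional layer or in a fully-connected one.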
Let me know if it helps.
I've pushed the std::isnan(x) variant to GitHub.