Mirror Image

Mostly AR and Stuff

“get nan or inf error” in cuda-convnet – possible fix

“get nan or inf” error happens sometimes on lower-end GPU’s in cuda-convnet. I have traced this error to NaN values in the weights of convolutional layers. I still not clear to me why these NaN values appear in the weights. Are they backpropagate from fully-connected layers or popping up in the convolution kernel? It looks to me latter is more likely. Anyway I made a temporary fix – just scan weight’s gradients with simple cuda kernel and replace NaN’a with zeroes. Didn’t observe the error after that.

I have pushed fix into windows version of cuda-convnet at

https://github.com/s271/win_convnet

Fix activated with option –fix-nan=1

There shouldn’t be any problem with making those changes for linux version – there are several small changes in  *.cu and *.py files only

PS

If anyone wondering what cuda-convnet is here is a nice explanation:

http://fastml.com/object-recognition-in-images-with-cuda-convnet/

And here is the main paper about cuda-convnet

http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf

 

17, January, 2014 Posted by | Coding | , , , | 3 Comments