The main loop gives NaNs when in a computation expression #8
In the sequence recall program below, when I uncomment the `let train =` binding and the code at the end, I get NaN values after around 2000-3000 iterations. When I leave it as it is, it optimizes just fine through the whole 10k iterations. This looks like a bug to me, as there is nothing to indicate that the results of the two runs should differ.

Also, although not related to this issue, one thing that sticks out to me is the lack of a map for the DM matrices. It would really help in implementing various activation functions, not to mention the clipping function for the final sigmoid layer. Would that be something that is difficult to add to the AD library?
Comments
I am not 100% sure, but I think the above might be a memory corruption bug, because I have not transposed my data properly, so the matrix dimensions do not match. I also forgot to subtract the targets from the outputs in the cost variable. Right now I've only gotten the OR example to work, and I've noticed that DiffSharp seems to have no bounds checks on matrix dimensions anywhere! Even more so than map, I would say bounds checking is essential to a library like this. Is it really not included?

Edit: I also cannot get the logistic cross entropy cost function to work for some reason. At first it took me a bit to realize that I was inadvertently using the matrix product operator * instead of the elementwise .* operator, but even after I fixed that, the function does not act as one would expect. The code below is mostly the same as in the neural networks tutorial. The squared error cost function works fine.

Edit (2 days later): I still have not gotten the cross entropy error shown below to work. The code in the example above surely fails because of a memory bug (though it won't converge properly even after I fixed it), but why the code below will not work is a complete mystery to me. I am not sure whether this is a bug in the library or some misunderstanding on my part of how the library works. Please advise.
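For reference, the difference between the two operators comes down to the following; a minimal sketch, assuming the toDM constructor shown later in this thread and the DiffSharp.AD.Float32 module used by the 0.7-era tutorials:

```fsharp
open DiffSharp.AD.Float32   // assumed module name, as in the 0.7 tutorials

let a = toDM [[1.; 2.]; [3.; 4.]]
let b = toDM [[5.; 6.]; [7.; 8.]]

// Matrix product: rows of a dotted with columns of b.
let matProd  = a * b    // DM [[19; 22]; [43; 50]]

// Hadamard (elementwise) product: entries multiplied pairwise.
let hadamard = a .* b   // DM [[5; 12]; [21; 32]]
```

In a cost function the two generally give different shapes and different values, which is easy to miss when there are no dimension checks.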
I've managed to find the error in the cross entropy function. It turns out that it was a library bug. Take a look at this and you will see what I mean, at around line 140. I'll give it a shot at fixing it directly.

Edit: I managed to find the errors, on line 2958. Based on symmetry, I would say that both Sub_DCons_DV and Sub_DCons_DM are wrong.
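For context on the symmetry argument: for z = c - v, where c is a scalar constant and v is a vector (the Sub_DCons_DV case), each dz_i/dv_i is -1, so reverse mode has to push back the negated adjoint, whereas for v - c it is +1. A hand-rolled sketch of that rule, using plain arrays rather than anything from the DiffSharp source:

```fsharp
// Reverse-mode rule for z = c - v (scalar constant minus vector):
//   dz_i/dv_j = -1 when i = j, 0 otherwise,
// so the adjoint flowing back to v is the negation of z's adjoint.
let subConstVectorReverse (zAdjoint : float32[]) : float32[] =
    zAdjoint |> Array.map (fun a -> -a)   // v.Adjoint <- v.Adjoint + (-zAdjoint)

// By the same symmetry, z = v - c pushes back +zAdjoint unchanged,
// which is why a sign mix-up in the c - v case is easy to introduce.
```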
Hi Marko, thank you for reporting this. I'm sorry for the delay; I've only just seen the issue you opened here. I will look at it and let you know as soon as I understand what the problem is.
Got your mail. It happened to me too just recently that I did not get a GitHub notification. The only thing I have to add to this thread, for the sake of completeness, is that the code in my first post also had one other error besides using matrix multiplication instead of the Hadamard product. I am not used to working without bounds checks bailing me out of these kinds of bugs, so I did not see it at first. That cross entropy bug was also pretty nasty.
I can confirm that this was a bug in the reverse AD code of some scalar-vector and scalar-matrix operations.

Seeing that you are implementing recurrent neural networks, I would suggest having a look at the RNN, LSTM, and GRU code here: https://github.com/hypelib/Hype/blob/master/src/Hype/Neural.fs You can maybe use/modify that code to suit your needs. The training code there is also working quite well and implements several gradient-based optimization methods like RMSProp.

There are some questions you asked. Let me try to answer them briefly:

Map operations

If you really need this type of mapping, you can still achieve it by converting a matrix to a 2D array and mapping the function. For example:

```fsharp
let m = toDM [[1; 2]; [3; 4]];;
let m1 = m |> DM.toArray2D |> Array2D.map (fun (v:D) -> sin (v * v))
```

```
val m : DM = DM [[1.0f; 2.0f]
                 [3.0f; 4.0f]]
val m1 : D [,] = [[D 0.841470957f; D -0.756802499f]
                  [D 0.412118495f; D -0.287903309f]]
```

The suggested and faster way of doing this, without a map, is to apply the elementwise operations to the whole matrix:

```fsharp
let m = toDM [[1; 2]; [3; 4]];;
let m1 = sin (m .* m)
```

```
val m : DM = DM [[1.0f; 2.0f]
                 [3.0f; 4.0f]]
val m1 : DM = DM [[0.841470957f; -0.756802499f]
                  [0.412118495f; -0.287903309f]]
```
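The same elementwise pattern covers the activation functions mentioned in the original post. A sketch, assuming the usual elementwise functions (tanh, like the sin above) are also defined on DM and DV; the clipping helper works on D scalars through the Array2D route shown above:

```fsharp
// Elementwise activation without a map, following the sin (m .* m) pattern:
let tanhActivation (m : DM) : DM = tanh m

// The same idea inside a layer, as in the neural networks tutorial
// (hypothetical names; w : DM weights, b : DV bias, x : DV input):
let tanhLayer (w : DM) (b : DV) (x : DV) : DV = tanh (w * x + b)

// Clipping a sigmoid output into (eps, 1 - eps) before it reaches log,
// to keep the cross entropy away from log 0. D values compare by primal.
let clip (eps : float32) (x : D) : D =
    if   x < D eps          then D eps
    elif x > D (1.0f - eps) then D (1.0f - eps)
    else x

// Applied through the Array2D route above; the result is a D [,].
let clipped (m : DM) = m |> DM.toArray2D |> Array2D.map (clip 1e-6f)
```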
Bounds checking

Some suggestions for usage

I can give you a few quick suggestions to make your life a bit easier when using the library. :) Instead of

```fsharp
let XORx = [|[0.; 0.]
             [0.; 1.]
             [1.; 0.]
             [1.; 1.]
           |] |> Array.map List.toSeq |> Array.toSeq |> toDM |> DM.Transpose
```

you can just write

```fsharp
let XORx = toDM [[0;0;1;1];[0;1;0;1]]
```

which gives the same result. This is because lists and arrays can be passed as sequences; it's the reason we use seq types in the API. Instead of

```fsharp
W = [|[|-0.55f;-0.4f;-0.25f|]|] |> Array.map Array.toSeq |> Array.toSeq |> toDM
```

you can write

```fsharp
W = toDM [[-0.55f;-0.4f;-0.25f]]
```

Again, thank you very much for catching the bug! :)
One more thing! :) The API is still evolving and we would be very happy to hear if you have any comments or suggestions regarding it!
The bug is fixed in version 0.7.5.
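A quick way to sanity-check the fixed rule from F# interactive; a sketch, assuming grad and toDV from the standard API and DV.sum as the scalar-valued reduction (any reduction would do):

```fsharp
open DiffSharp.AD.Float32

// f(v) = sum(c - v) has gradient -1 in every component, so after the fix
// this should give DV [-1; -1; -1] rather than +1s.
let f (v : DV) = DV.sum (D 3.0f - v)
let g = grad f (toDV [1.; 2.; 3.])
```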