Re-design ANN loss functions along with some bug fixes #2495
Conversation
```cpp
typename InputType::elem_type lossSum = arma::accu(loss);

if (reduction)
  return lossSum;

return lossSum / input.n_elem;
```
I think if `reduction = true`, the mean loss has to be returned.
I am returning the sum when `reduction = true`.
I am doing that for each and every loss function, so there shouldn't be any non-uniformity, I guess.
Is there any specific reason why I should return the mean instead of the sum when `reduction = true` (apart from the fact that PyTorch does it)?
Not a big deal, but it just looked counter-intuitive. Any new user would expect the "reduced" form to be smaller than the "non-reduced" form. (It's like swapping the Home and End keys on your keyboard.)
The non-reduced form is a vector / matrix / cube.
For all the loss functions I have touched above, the non-reduced loss is stored in the variable `loss`.
The reduced form is a single number: either the sum or the mean of all the values in `loss`.
The reason I preferred the sum reduction is simply that the sum is never smaller than the mean, which might be more effective when back-propagating it through previous layers (see the sketch below).
Let's wait for some more opinions on this.
If more people feel this is counter-intuitive, I will swap them back to what you suggested :)
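To make the sum-versus-mean distinction concrete, here is a minimal, self-contained Armadillo sketch. The squared-error loss and the matrix sizes are purely illustrative; this is not mlpack's actual implementation.

```cpp
#include <armadillo>
#include <iostream>

int main()
{
  // Illustrative element-wise (non-reduced) squared-error loss for a 3x4 batch.
  arma::mat prediction(3, 4, arma::fill::randu);
  arma::mat target(3, 4, arma::fill::randu);
  arma::mat loss = arma::square(prediction - target);

  // Sum reduction: a single number, never smaller than the mean here since the
  // element-wise losses are non-negative.
  double lossSum = arma::accu(loss);

  // Mean reduction: the sum divided by the number of elements.
  double lossMean = lossSum / loss.n_elem;

  std::cout << "sum reduction:  " << lossSum << "\n"
            << "mean reduction: " << lossMean << std::endl;

  return 0;
}
```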
If we do this consistently for all the other loss functions, which I think you do, it's fine with me.
Hi @iamshnoo, are you working on this?
Feel free to go ahead with the merge fixes. I am not working on this at the moment. You can tag me in the PR if you need me to take a look at it.
This PR tries to solve #2444.
It also re-designs some of the loss functions to prepare them for a future PR that will add support for none-type reduction (see the sketch at the end of this description).
Google Colab notebooks and a brief description of the problem for each of the loss functions can be found in the first comment of the linked issue above. (I might be wrong about some of these; please let me know if I am, and how I should go about correcting them.)
I have given each loss function its own commit for the first 11 commits on this branch, so that the changes are easier to review.
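For reference, "none-type reduction" as mentioned above would mean returning the element-wise loss itself rather than collapsing it to a scalar. Below is a rough sketch of how the three modes relate; the `Reduction` enum and `Reduce` function are purely illustrative and not mlpack's actual interface.

```cpp
#include <armadillo>

// Illustrative only; the real mlpack loss classes expose this differently.
enum class Reduction { None, Sum, Mean };

// "None" keeps the element-wise loss; "Sum" and "Mean" collapse it to a single
// value, returned here as a 1x1 matrix so all three modes share one type.
arma::mat Reduce(const arma::mat& loss, const Reduction reduction)
{
  if (reduction == Reduction::None)
    return loss;

  arma::mat out(1, 1);
  out(0, 0) = (reduction == Reduction::Sum) ? arma::accu(loss)
                                            : arma::accu(loss) / loss.n_elem;
  return out;
}
```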