In the MLPClassifier backpropagation code, alpha (the L2 regularization term) is divided by the sample size.
It makes sense for the cross-entropy part of the loss to be divided by the sample size, since it is a sum over the samples; this averaging is common. But I have never seen the regularization term divided by the sample size.
For contrast, Keras averages the cross-entropy but applies the L2 penalty without averaging over the sample size. This is also how things are described in Goodfellow, Bengio, and Courville's book.
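To make the difference concrete, here is a minimal sketch of the two conventions (function and argument names are mine; `ce_sum` stands for the unaveraged cross-entropy summed over the batch, and `coefs` for the list of weight matrices):

```python
import numpy as np

def mlp_style_loss(ce_sum, coefs, alpha, n_samples):
    """Behavior described for MLPClassifier's backprop: the L2 penalty
    is added to the summed cross-entropy, and the total is divided
    by the sample size."""
    l2 = 0.5 * alpha * sum(np.dot(w.ravel(), w.ravel()) for w in coefs)
    return (ce_sum + l2) / n_samples

def keras_style_loss(ce_sum, coefs, alpha, n_samples):
    """Convention in Keras / Goodfellow et al.: only the cross-entropy
    is averaged; the L2 penalty enters at full strength."""
    l2 = 0.5 * alpha * sum(np.dot(w.ravel(), w.ravel()) for w in coefs)
    return ce_sum / n_samples + l2
```

Note that under the first convention the effective regularization strength shrinks as the dataset grows, so the same alpha value behaves differently at different sample sizes.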
I also believe that the scikit-learn SGDClassifier implementation does not divide alpha by the sample size (but I am not sure).