8000 MLPClassifier: regularization is divided by sample size · Issue #10477 · scikit-learn/scikit-learn · GitHub

Closed
rpmcruz opened this issue Jan 15, 2018 · 1 comment
rpmcruz commented Jan 15, 2018

In the MLPClassifier backpropagation code, alpha (the L2 regularization coefficient) is divided by the sample size.

It makes sense for the cross-entropy part of the loss function to be divided by the sample size, since that term is a sum over samples. This is common. But I have never seen the regularization term being divided by the sample size.

For contrast, Keras averages the cross-entropy term but applies L2 without averaging over the sample size. This is also how the loss is described in Goodfellow, Bengio, and Courville's book.

I believe the scikit-learn SGDClassifier implementation also does not divide alpha by the sample size (but I am not sure).
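To illustrate the consequence, here is a minimal sketch (not scikit-learn's actual code; `l2_penalty` is a hypothetical helper): when the penalty is divided by the sample size, the effective regularization strength weakens as the dataset grows, even though the weights and alpha are unchanged.

```python
import numpy as np

def l2_penalty(coefs, alpha, n_samples, divide_by_n=True):
    """L2 penalty term of an MLP-style loss.

    divide_by_n=True mirrors the behaviour described in this issue
    (the regularization term is divided by the sample size);
    divide_by_n=False is the textbook formulation, where only the
    data-fit term is averaged over samples.
    """
    penalty = 0.5 * alpha * sum(np.sum(W ** 2) for W in coefs)
    return penalty / n_samples if divide_by_n else penalty

# Hypothetical weight matrices for a tiny 2-3-1 network.
coefs = [np.ones((2, 3)), np.ones((3, 1))]

# With the weights and alpha fixed, the divided penalty shrinks as
# n_samples grows, so the effective regularization depends on dataset size.
small = l2_penalty(coefs, alpha=1e-4, n_samples=100)
large = l2_penalty(coefs, alpha=1e-4, n_samples=10_000)
fixed = l2_penalty(coefs, alpha=1e-4, n_samples=10_000, divide_by_n=False)
```

Here `small` is 100 times larger than `large`, while `fixed` is independent of the sample size, which is the behaviour the issue argues for.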

@rpmcruz rpmcruz closed this as completed Jan 15, 2018
@rpmcruz rpmcruz reopened this Jan 15, 2018
rpmcruz commented Jan 15, 2018

Sorry, someone please mark this as a duplicate of #1395.

I hadn't noticed this had already been reported.

@rpmcruz rpmcruz closed this as completed Jan 15, 2018