In the MLPClassifier backpropagation code, alpha (the L2 regularization term) is divided by the sample size.
It makes sense for the cross-entropy part of the loss to be divided by the sample size, since it is a sum over the samples; this averaging is common. But I have never seen the regularization term divided by the sample size.
For contrast, Keras averages the cross-entropy but applies the L2 penalty without averaging over the sample size. This is also how things are described in Goodfellow, Bengio, and Courville's book.
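To make the difference concrete, here is a minimal sketch of the two conventions (function and argument names are mine; `ce_sum` stands for the unaveraged cross-entropy summed over the batch, and `coefs` for the list of weight matrices):

```python
import numpy as np

def mlp_style_loss(ce_sum, coefs, alpha, n_samples):
    """Behavior described for MLPClassifier's backprop: the L2 penalty
    is added to the summed cross-entropy, and the total is divided
    by the sample size."""
    l2 = 0.5 * alpha * sum(np.dot(w.ravel(), w.ravel()) for w in coefs)
    return (ce_sum + l2) / n_samples

def keras_style_loss(ce_sum, coefs, alpha, n_samples):
    """Convention in Keras / Goodfellow et al.: only the cross-entropy
    is averaged; the L2 penalty enters at full strength."""
    l2 = 0.5 * alpha * sum(np.dot(w.ravel(), w.ravel()) for w in coefs)
    return ce_sum / n_samples + l2
```

Note that under the first convention the effective regularization strength shrinks as the dataset grows, so the same alpha value behaves differently at different sample sizes.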
I also believe that the scikit-learn SGDClassifier implementation does not divide alpha by the sample size (but I am not sure).