ENH better 'constant' learning rate schedule by ogrisel · Pull Request #22 · amueller/scikit-learn

Merged
merged 1 commit into amueller:mlp_refactoring on Dec 18, 2014

Conversation

@ogrisel commented on Dec 18, 2014

I switched the MNIST benchmark to a deeper and narrower network using SGD + momentum, with a better "constant" learning rate schedule that makes it possible to be aggressive without diverging:

...
Iteration 49, cost = 0.00434729

Classification performance:
===========================

Classifier               train-time   test-time   error-rate
------------------------------------------------------------
MultilayerPerceptron        157.24s       0.08s       0.0199
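
The exact schedule lives in the diff; as a rough illustration of the idea only (the divisor, patience and tol values below are assumptions, not the PR's), holding the step size constant while the cost improves and shrinking it when progress stalls is what lets you start with an aggressive learning rate without diverging:

# Illustrative sketch, not the PR's code: keep the learning rate constant while
# the per-epoch cost keeps improving, divide it once progress stalls.
# All constants here (divisor=5, patience=2, tol=1e-4) are assumptions.
def adaptive_constant_lr(costs, lr_init=1.0, divisor=5.0, patience=2, tol=1e-4):
    """Return the learning rate to use after observing the per-epoch costs."""
    lr = lr_init
    stalled = 0
    for prev, curr in zip(costs, costs[1:]):
        if prev - curr < tol:        # no significant improvement this epoch
            stalled += 1
            if stalled >= patience:  # progress stalled: shrink the step size
                lr /= divisor
                stalled = 0
        else:
            stalled = 0
    return lr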

@amueller (Owner) commented

Nice :) I am just experimenting with that, lol.

This one gives a 1.5% error rate but takes 1000s :-/

mlp = MultilayerPerceptronClassifier(hidden_layer_sizes=(800, 800), algorithm="sgd",
                                     random_state=42, verbose=10, max_iter=30,
                                     alpha=0, momentum=.9, learning_rate_init=1)

@amueller (Owner) commented

Does yours actually stop because of convergence? 157s seems very short for 400 iterations. I guess you tweaked the tol?

@ogrisel (Author) commented on Dec 18, 2014

Does yours actually stop because of convergence? 157s seems very short for 400 iterations. I guess you tweaked the tol?

Yes, tol is set to 1e-4 to keep the benchmark fast enough.

Actually, I am thinking that this convergence check would be better done on a validation set instead of the training set. That would get us early stopping for free, which is what people are actually interested in in practice. WDYT?
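
A minimal sketch of such a check, assuming per-epoch validation scores are recorded (the tol and patience values are illustrative, not the PR's):

# Illustrative stopping rule: stop once the last `patience` validation scores
# have failed to beat the earlier best by at least `tol`.
def should_stop(val_scores, tol=1e-4, patience=2):
    if len(val_scores) <= patience:
        return False
    best_before = max(val_scores[:-patience])
    return all(score < best_before + tol for score in val_scores[-patience:])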

@ogrisel (Author) commented on Dec 18, 2014

this one gives 1.5% but takes 1000s :-/

I am sure we could get better results with (2000, 2000) and dropout. Adding dropout is not easy with the current code base though. That might be a code smell.

@IssamLaradji commented
Nice changes.

I am thinking that this convergence check could better be done on a validation set instead of the training set

Are you suggesting we divide (X, y) into a training and a validation set when running .fit(X, y)? Isn't that risky, in that the algorithm will train on fewer training samples?

@ogrisel (Author) commented on Dec 18, 2014

Isn't that risky, in that the algorithm will train on fewer training samples?

Yes, but overfitting by training too much (especially without dropout) is an even bigger risk. I would reserve 10% of the (X, y) data as a validation set by default, and make it possible for the user to pass a custom validation set:

mlp.fit(X, y, X_validation=X_validation, y_validation=y_validation)

Also, if verbose >= 2, I would compute and report the .score() value on the training and validation sets in addition to the cost.
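
A sketch of that default split as a standalone helper (the function name is hypothetical, and the train_test_split import path shown is the modern one; the 2014 code base had it in sklearn.cross_validation):

from sklearn.model_selection import train_test_split

def split_validation(X, y, X_validation=None, y_validation=None, random_state=None):
    """Reserve 10% of (X, y) as a validation set unless one is passed explicitly."""
    if X_validation is None:
        X, X_validation, y, y_validation = train_test_split(
            X, y, test_size=0.1, random_state=random_state)
    return X, y, X_validation, y_validation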

@amueller (Owner) commented

I think we shouldn't do early stopping in the current PR. We should really rather merge this really fast and then iterate on it.

@ogrisel (Author) commented on Dec 18, 2014

We should really rather merge this really fast and then iterate on it.

+1 for merging my PR into your PR into @IssamLaradji's PR at least so that we can all iterate and experiment with the same code base.

I would really like to have a way to monitor the progress of the train score + validation score vs. epochs before we merge into master, both for L-BFGS and SGD. I don't think you can use and tune the hyperparameters of an MLP in practice without plotting those curves to get some intuition on what is going wrong. Doing brute-force randomized parameter search is too expensive.

Maybe the validation API is not good. Maybe we should introduce a monitor callback instead.
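
For reference, GradientBoosting exposes this as a monitor callable passed to fit, invoked after each stage as monitor(i, self, locals()), with a True return value requesting early stopping. A sketch of a callback with that signature that records the curves discussed above (the factory name and recording logic are illustrative, not an existing API of this branch):

def make_score_monitor(X_train, y_train, X_val, y_val):
    """Build a monitor(i, estimator, local_vars) callback that records scores."""
    history = {"train": [], "validation": []}

    def monitor(i, estimator, local_vars):
        history["train"].append(estimator.score(X_train, y_train))
        history["validation"].append(estimator.score(X_val, y_val))
        return False  # returning True would request early stopping

    return monitor, history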

@amueller (Owner) commented

You can monitor the verbose output, but I agree that is suboptimal.
A callback actually seems like a good idea. Do we want to do it the same way that GradientBoosting does it? I would rather not have a fit parameter.

@ogrisel (Author) commented on Dec 18, 2014

You can monitor the verbose output, but I agree that is suboptimal.

This is what I do, but you have no way to detect how much the network is overfitting and how that evolves with the number of epochs.

Yes, we should be consistent with GradientBoosting, as that makes it possible to address our use case.

@amueller (Owner) commented

For monitoring a validation set I used partial_fit ;)
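
One way to do that, sketched with placeholder names (the helper is hypothetical; it assumes the estimator exposes partial_fit with the usual classes argument on the first call, as the comment above suggests):

import numpy as np

def train_and_monitor(mlp, X_train, y_train, X_val, y_val, n_epochs=30):
    """Train one epoch at a time with partial_fit and record both scores."""
    classes = np.unique(y_train)
    curves = {"train": [], "validation": []}
    for epoch in range(n_epochs):
        if epoch == 0:
            mlp.partial_fit(X_train, y_train, classes=classes)
        else:
            mlp.partial_fit(X_train, y_train)
        curves["train"].append(mlp.score(X_train, y_train))
        curves["validation"].append(mlp.score(X_val, y_val))
    return curves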

amueller added a commit that referenced this pull request on Dec 18, 2014:
ENH better 'constant' learning rate schedule
@amueller merged commit a096d4a into amueller:mlp_refactoring on Dec 18, 2014