ENH better 'constant' learning rate schedule #22
Conversation
Nice :) I am just experimenting with that, lol. This one gives 1.5% but takes 1000s :-/
Does yours actually stop because of convergence? 157s seems very short for 400 iterations. I guess you tweaked the tol?
Yes, tol is set to 1e-4 to keep the benchmark fast enough. Actually I am thinking that this convergence check could better be done on a validation set instead of the training set. That would get us early stopping for free, which is what people actually care about in practice. WDYT?
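A minimal sketch of that idea, stopping on a held-out validation score rather than the training objective. The function name, the `patience` parameter, and the one-epoch-per-`partial_fit` assumption are all illustrative, not this code base's actual API:

```python
def fit_with_early_stopping(model, X_train, y_train, X_val, y_val,
                            max_iter=400, tol=1e-4, patience=2):
    """Sketch: stop when the validation score stops improving by > tol.

    Assumes `model` exposes `partial_fit` (one epoch per call) and
    `score`, in the usual scikit-learn style; a real classifier would
    also need `classes` passed on the first `partial_fit` call.
    """
    best_score = float("-inf")
    no_improvement = 0
    for epoch in range(max_iter):
        model.partial_fit(X_train, y_train)
        score = model.score(X_val, y_val)
        if score > best_score + tol:
            best_score = score
            no_improvement = 0
        else:
            no_improvement += 1
        if no_improvement >= patience:
            break  # validation score has plateaued: early stop
    return model
```

The benefit over checking the training loss is that training too long past the plateau (overfitting) is caught, not just non-convergence.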
I am sure we could get better results with (2000, 2000) and dropout. Adding dropout is not easy with the current code base though. That might be a code smell.
Nice changes.
Are you suggesting we divide
Yes, but overfitting by training too much (especially without dropout) is an even bigger risk. I would reserve 10% of the (X, y) data as a validation set by default, and make it possible for the user to pass a custom validation set: mlp.fit(X, y, X_validation=X_validation, y_validation=y_validation). Also if
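The default 10% hold-out could be sketched like this (the function name and the fixed seed are illustrative assumptions, not the PR's API):

```python
import numpy as np

def split_validation(X, y, validation_fraction=0.1, seed=0):
    """Hold out a fraction of (X, y) as a validation set.

    Sketch of the 10%-by-default idea above; a user-supplied
    validation set would simply bypass this split.
    """
    rng = np.random.RandomState(seed)
    n_samples = X.shape[0]
    indices = rng.permutation(n_samples)
    n_val = int(n_samples * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```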
I think we shouldn't do early stopping in the current PR. We should rather merge this quickly and then iterate on it.
+1 for merging my PR into your PR into @IssamLaradji's PR at least, so that we can all iterate and experiment with the same code base. I would really like to have a way to monitor the progress of train score + validation score vs epochs before we merge into master, both for LBFGS and SGD. I don't think you can use and tune the hyperparameters of an MLP in practice without plotting those curves to get some intuition about what is going wrong. Doing brute-force randomized parameter search is too expensive. Maybe the validation API is not good. Maybe we should introduce a monitor callback instead.
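A monitor callback along those lines might look like this (an entirely hypothetical API, just to make the idea concrete: the class name and the call signature are assumptions, not anything in this code base):

```python
class ProgressMonitor:
    """Hypothetical callback invoked once per epoch.

    Records train and validation scores so the two curves can be
    plotted against epochs after (or during) fitting.
    """

    def __init__(self, X_val, y_val):
        self.X_val, self.y_val = X_val, y_val
        self.train_scores = []
        self.val_scores = []

    def __call__(self, model, X_train, y_train):
        # Assumes `model` exposes a scikit-learn style `score` method.
        self.train_scores.append(model.score(X_train, y_train))
        self.val_scores.append(model.score(self.X_val, self.y_val))
        return False  # a True return could request early stopping
```

A growing gap between the two recorded curves would show directly how much the network is overfitting as epochs go by.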
You can monitor the verbose output, but I agree that is suboptimal. |
This is what I do, but you have no way to detect how much the network is overfitting and how that evolves with the number of epochs. Yes, we should be consistent with GradientBoosting, which makes it possible to address our use case.
For monitoring a validation set I used
ENH better 'constant' learning rate schedule
I switched the MNIST benchmark to a deeper and narrower network using SGD + momentum with a better "constant" learning rate schedule that makes it possible to be aggressive without diverging:
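For illustration, a generic SGD-with-momentum update at a constant learning rate (a textbook sketch, not the PR's actual code) can be written as:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, learning_rate=0.01, momentum=0.9):
    """One SGD update with classical momentum.

    The velocity accumulates past gradients, which smooths the updates
    and is what lets a fixed ("constant") learning rate be fairly
    aggressive without the iterates diverging.
    """
    velocity = momentum * velocity - learning_rate * grad
    w = w + velocity
    return w, velocity
```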