Added fit and score times for learning_curve by H4dr1en · Pull Request #13938 · scikit-learn/scikit-learn · GitHub

Conversation

@H4dr1en (Contributor) commented May 24, 2019

What does this implement/fix? Explain your changes.

It would be interesting to have access to the fit and score times of the estimators while computing learning curves. This can be done very easily because the _fit_and_score method already has a return_times parameter. I suggest setting it to True when computing learning curves.

Any other comments?

Here is how fit and score times can be used to get valuable information:

Using the following setup:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, Y = make_regression(n_samples=int(1e4), n_features=50, n_informative=25, bias=-92, noise=100)
estimator = RandomForestRegressor()

Having the fit and score times lets us plot the following:

[figure: learning_curve_doc_time_all — fit and score times across training set sizes]

And also such plots:

[figure: learning_curve_doc_diag]

As you can see, it is easy to determine the best estimator for the considered dataset, taking into account the "scalability" of the model: it helps tell which model will perform better if we add more data to our dataset.
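For reference, the proposed flag can be exercised roughly like this (a sketch: Ridge and the small synthetic problem are stand-ins chosen for speed, not part of the PR):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=500, n_features=20, noise=10, random_state=0)

# With return_times=True, learning_curve additionally returns per-split
# fit and score times, each of shape (n_train_sizes, n_cv_folds).
train_sizes, train_scores, test_scores, fit_times, score_times = learning_curve(
    Ridge(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5),
    return_times=True)

print(fit_times.shape)  # one row per training size, one column per CV fold
```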

@H4dr1en H4dr1en changed the title [MRG] Added fit and score times for learning curves [WIP] Added fit and score times for learning curves May 24, 2019
@H4dr1en H4dr1en changed the title [WIP] Added fit and score times for learning curves [MRG] Added fit and score times for learning curves May 24, 2019
@jnothman (Member) left a comment:

Thanks for the pull request. We will not make this kind of backwards-incompatible change. Please add a parameter such as return_times. It may also be worth adding a plot like that to an example.

@H4dr1en (Contributor, Author) commented May 27, 2019

I added backwards compatibility (the default behavior is as before, i.e. return_times=False).
I also added plot_learning_curves_times.py under examples/model_selection, which generates the following plot:

[figure: plot generated by plot_learning_curves_times.py]

Should I take care of generating the doc for this example? If yes, how?

@jnothman (Member) commented May 27, 2019 via email


Member (review comment on the example code, at plt.show()):
Is there a way to actually integrate this into the example above, rather than having an entirely separate demonstration of this functionality?

@H4dr1en (Contributor, Author) replied May 28, 2019:
I guess we can, I will try!

EDIT: I am wondering whether it is a good idea to merge the examples, as the first one focuses on showing two distinct learning curves, whereas the second focuses on stacking the fit time curves of multiple estimators to find the best one.

So could you be more specific about what you would expect?

For the moment, I guess you would like to directly retrieve the fit times of the two estimators in the first example (GaussianNB and SVC) and plot the two corresponding fit time curves (the new feature). If so, I can take care of that. Otherwise, if we want to show more curves (and therefore more estimators), we should keep the examples separate (or would you prefer to add other classifiers to the first example?).

Member:

So could you be more specific about what you would expect?

Why not just add your two new plots (n_samples vs. fit_time, and fit_time vs. score) for both pre-existing estimators?

That would be a 3-by-2 grid of plots and would avoid having this whole new code section.
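The three kinds of panels under discussion can be sketched roughly as follows (an illustration, not the PR's actual example code; GaussianNB stands in for one of the two estimators):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=600, random_state=0)
train_sizes, train_scores, test_scores, fit_times, _ = learning_curve(
    GaussianNB(), X, y, cv=5, return_times=True)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
# 1) the classic learning curve
axes[0].plot(train_sizes, test_scores.mean(axis=1), "o-")
axes[0].set(xlabel="Training examples", ylabel="Score", title="Learning curve")
# 2) scalability: training set size vs. fit time
axes[1].plot(train_sizes, fit_times.mean(axis=1), "o-")
axes[1].set(xlabel="Training examples", ylabel="fit_time (s)", title="Scalability")
# 3) performance: fit time vs. score
axes[2].plot(fit_times.mean(axis=1), test_scores.mean(axis=1), "o-")
axes[2].set(xlabel="fit_time (s)", ylabel="Score", title="Performance")
fig.savefig("learning_curve_times.png")
```

Repeating the three panels for the second estimator gives the grid discussed above.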

@H4dr1en (Contributor, Author) replied May 29, 2019:

OK, so here are the resulting plots with changes in ea0cc71:

[figure: resulting grid of learning-curve and timing plots]

Member:

If you look at the generated doc (https://59951-843222-gh.circle-artifacts.com/0/doc/auto_examples/model_selection/plot_learning_curve.html), the plots are very small. Maybe try 3 rows and 2 columns instead.
(You can build the doc locally by following these guidelines: https://scikit-learn.org/dev/developers/contributing.html#building-the-documentation)

@H4dr1en (Contributor, Author) replied:

I am having trouble generating the doc. I am using a miniconda venv, so I enter the following at the command prompt:

(py3) path\scikit-learn\doc>set EXAMPLES_PATTERN=plot_learning_curve.py
(py3) path\scikit-learn\doc>make html

But all the examples are then generated (the regex variable is not taken into account), and this takes forever on my computer. I also tried with (py3) path\scikit-learn\doc>set EXAMPLES_PATTERN=*plot_learning_curve.py, but got the same result. How would you do it?

Member:
The doc says EXAMPLES_PATTERN=your_regex_goes_here make html, in one command.
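For the record, the two shells handle this differently (a sketch; only the echo line below is actually executed here, the make invocations are shown as comments):

```shell
# Unix shells support a one-shot environment assignment before a command, e.g.:
#   EXAMPLES_PATTERN=plot_learning_curve make html
# The assignment is visible to that one command only:
EXAMPLES_PATTERN=plot_learning_curve sh -c 'echo "pattern=$EXAMPLES_PATTERN"'
# cmd.exe on Windows has no one-shot form; there, 'set' persists for the
# session, so chain the two commands instead:
#   set EXAMPLES_PATTERN=plot_learning_curve&& make html
```

This explains the "unknown command" error: cmd.exe does not understand the VAR=value prefix syntax.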

@H4dr1en (Contributor, Author) replied Jun 6, 2019:

This gave me an "unknown command" error, but anyway, make html eventually worked.

@NicolasHug (Member) left a comment:

A few comments; looks good in general.

I'm not a huge fan of returning tuples of different sizes depending on the input, but I guess it's too late to return a Bunch anyway.



@jnothman (Member) left a comment:

@thomasjpfan wrote on gitter:

This seems like a good candidate for returning a dictionary (or a Bunch). I think going through a deprecation cycle for it is a little rough.

Are you trying to suggest changing the return value without deprecation? I don't think that's a good idea, certainly not unless you have a return value that can unpack as three elements.

Or are you suggesting that you want feedback on whether we should consider deprecating and moving to a Bunch? I agree that deprecation would be disruptive to users and tutorials, so I'm not entirely fond of it.

If we want to change to a Bunch, then instead of return_times we should add return_bunch and include the times in the Bunch, make the default value change from False to True over a couple of versions, etc.

@thomasjpfan (Member):

Or are you suggesting that you want feedback on whether we should consider deprecating and moving to a Bunch?

I was looking for feedback on such an API change. (I will be more direct next time.) Even in the example we have something like this:

train_sizes, train_scores, test_scores, fit_times, _ = \
    learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs,
                   train_sizes=train_sizes,
                   return_times=True)

Every time I see \ before the function is even called, I think of returning a Bunch.
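For context, a Bunch is a dict with attribute access, which is what makes it attractive compared to an ever-growing return tuple (a toy sketch with made-up values, not the merged API):

```python
from sklearn.utils import Bunch

# A Bunch is a dict whose keys are also attributes; callers pick the fields
# they want by name instead of positionally unpacking (and discarding parts
# of) a long tuple.
results = Bunch(train_scores=[0.9, 0.95], fit_times=[0.1, 0.2])

print(results.fit_times)        # attribute access
print(results["train_scores"])  # dict-style access also works
```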

@jnothman (Member) commented May 30, 2019 via email

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
@NicolasHug (Member) left a comment:

Minimal comments. LGTM anyway.


@jnothman (Member) left a comment:

Otherwise LGTM. We can separately consider a return_bunch option, I suppose.

Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors, if applicable) with :user:.

@NicolasHug (Member):

@H4dr1en, please look at https://github.com/scikit-learn/scikit-learn/pull/13938/files and revert all the unrelated changes.

@NicolasHug NicolasHug changed the title [MRG] Added fit and score times for learning curves Added fit and score times for learning_curve Jun 14, 2019
@NicolasHug NicolasHug merged commit b28aadf into scikit-learn:master Jun 14, 2019
@NicolasHug (Member):

Thanks @H4dr1en !

@rth (Member) commented Jun 16, 2019

It looks like Circle CI is failing on master after this PR was merged?

@thomasjpfan (Member):

Raised an issue regarding this at #14098.
