update documentation to reflect fit, transform, and predict parameter… #7156

shanglun · 2016-08-07T01:45:35Z

Reference Issue

This PR responds to scikit-learn issue #7142, which asks for a documentation addition to "rolling your own estimator".

What does this implement/fix? Explain your changes.

This PR adds two paragraphs describing the conventions for extra parameters passed to the fit, predict, and transform methods of a custom estimator.

Any other comments?

If you'd like me to change the wording or add/remove sections, please feel free to reach out.

nelson-liu · 2016-08-07T01:53:02Z

doc/developers/contributing.rst

+should be restricted to variables that have the shape ``n_samples``.
+This allows for all relevant parameters to be sliced cleanly during
+cross-validation. All other parameters should be passed to
+``__init__`` or ``set_params``.


I think it's clearer to say "All parameters should be passed into init or set after construction with set_params".

I feel like parameters that affect the behavior of fit should always go into __init__; this also maintains compatibility with tools like GridSearchCV.
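A minimal sketch of this convention, using a hypothetical MeanThresholdClassifier: its only hyperparameter is an __init__ argument stored unchanged, so BaseEstimator's get_params/set_params and sklearn.base.clone work without extra code. GridSearchCV relies on exactly those methods to create and reconfigure candidate estimators.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class MeanThresholdClassifier(BaseEstimator):
    """Hypothetical toy estimator: predicts 1 when a row's mean exceeds
    ``threshold``. The hyperparameter lives in __init__, not in fit."""

    def __init__(self, threshold=0.0):
        # Store the argument unchanged; validation belongs in fit.
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return (X.mean(axis=1) > self.threshold).astype(int)


clf = MeanThresholdClassifier(threshold=0.5)
print(clf.get_params())  # {'threshold': 0.5}
```

Because nothing behaviour-changing is passed to fit, clone(clf) reproduces the configuration and set_params(threshold=...) is all a grid search needs.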

Is "init" without the surrounding underscores clearer, do you think? Elsewhere in the documentation, the init function is referred to using the underscore notation. I do agree with the wording of set_params. I will make the change.

Oops, sorry: I didn't wrap the underscored init in a code block, so Markdown rendered it in bold. I didn't mean to suggest removing the underscores; that's good as it is.

Parameters that affect the behavior of fit should always go into init. The idea with the fit parameters is to accommodate variables like sample_weight, which need to be sliced along with X and y during cross-validation. That should be clear in the documentation; I'll make the change.
Edit: Nice catch on the init function there, @nelson-liu
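To make the slicing requirement concrete, here is a minimal sketch of what cross-validation does with a per-sample fit parameter. The helper weighted_folds is hypothetical; only KFold comes from scikit-learn. Each fold indexes X, y and sample_weight with the same training indices, so every weight stays attached to its row.

```python
import numpy as np
from sklearn.model_selection import KFold


def weighted_folds(X, y, sample_weight, n_splits=5):
    """Hypothetical helper: yield (X_train, y_train, w_train) for each fold,
    slicing every sample-aligned array with the same training indices."""
    X, y, sample_weight = map(np.asarray, (X, y, sample_weight))
    for train_idx, _ in KFold(n_splits=n_splits).split(X):
        yield X[train_idx], y[train_idx], sample_weight[train_idx]


rng = np.random.RandomState(0)
X = rng.rand(10, 3)
y = rng.randint(0, 2, size=10)
sample_weight = np.arange(10, dtype=float)  # one weight per sample

folds = list(weighted_folds(X, y, sample_weight))
```

A hyperparameter such as a regularization strength has no per-sample axis, so there is nothing to slice; that is why it belongs in __init__ instead.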

Yup, make sure to use underscores and not bold init ;) (__init__)

nelson-liu · 2016-08-07T05:43:33Z

doc/developers/contributing.rst

-cross-validation. All other parameters should be passed to
-``__init__`` or ``set_params``.
+should be restricted to variables that need to be sliced during
+cross-validation. Consequently, these variables would be either arrays with shape=[N] or


I don't think "Consequently" fits here, you can just omit it.

amueller · 2016-08-26T17:11:34Z

doc/developers/contributing.rst

 All logic behind estimator parameters,
 like translating string arguments into functions, should be done in ``fit``.

+Parameters can be passed to the ``fit`` method. Generally, these parameters


I would say "While it is possible to pass parameters to the fit method, these should be ..." and also explain why (pipelining, grid-search). Also, the shape can be anything as long as the first dimension is n_samples.
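As a sketch of the "why" (pipelining, grid-search), the snippet below assumes scikit-learn's default behaviour with metadata routing disabled: Pipeline forwards a fit parameter named step__param to that step's fit, and GridSearchCV slices sample-aligned fit parameters with the same train indices as X and y in every split. The hyperparameter C travels through the parameter grid (that is, __init__/set_params), while the per-sample weights go through fit. The data is random and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = (X[:, 0] + 0.5 * rng.randn(60) > 0).astype(int)
sample_weight = rng.uniform(0.5, 2.0, size=60)  # shape[0] == n_samples

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# C is searched via set_params; "clf__sample_weight" is routed by the
# Pipeline to LogisticRegression.fit and sliced per split by GridSearchCV.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0]}, cv=3)
search.fit(X, y, clf__sample_weight=sample_weight)
```

If sample_weight were not aligned with the rows of X, the per-split slicing above would have nothing meaningful to index.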

Ok! The change has been made. Please review when you have a chance.

amueller · 2016-09-06T19:02:55Z

doc/developers/contributing.rst

 should be restricted to variables that need to be sliced during
-cross-validation. These variables would be either arrays with shape=[N] or
-array-like with shape=[N,] where N is the number of samples. An example of
+cross-validation. These variables would be either arrays with shape=[N] where N=n_samples


I think we only require shape[0] == n_samples. Why not just "These would be arrays with ``shape[0] == n_samples``"?
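A minimal sketch of the shape[0] == n_samples rule, using a hypothetical helper check_sample_aligned: only the first dimension is checked, so a 1-D array of length n_samples and a 2-D array of shape (n_samples, k) both qualify. scikit-learn's own sklearn.utils.check_consistent_length performs a related first-dimension check across several arrays.

```python
import numpy as np


def check_sample_aligned(param, n_samples, name="fit parameter"):
    """Hypothetical helper: accept any array whose first dimension equals
    n_samples; trailing dimensions are left unconstrained."""
    param = np.asarray(param)
    if param.ndim == 0 or param.shape[0] != n_samples:
        raise ValueError(
            f"{name} must have shape[0] == n_samples ({n_samples}), "
            f"got shape {param.shape}"
        )
    return param
```

Checking only shape[0] matches what cross-validation needs: the array must be indexable by sample, whatever else it contains per sample.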

I think there are many ways to express the same idea here. I like the
method you're suggesting as it's very clear and concise. I'll make this
change today.

jnothman · 2016-09-08T03:33:42Z

I'm sure this all needs a rewrite to some extent, but this is already covered, at least with respect to fit, in the "Fitting" section above. Perhaps it should point back there? Or some other amalgamation?

shanglun · 2016-09-08T10:51:25Z

@jnothman Definitely makes sense to point back to related part of the doc. I'll look at it and figure out the best wording. I believe the ticket was created because the shape[0]==n_samples requirement was not explicitly mentioned in the rolling your own estimator section and created some confusion.

Unrelated, but the AppVeyor and Travis CI builds are failing. I've checked my PR a few times and don't think my changes should cause the errors the output is showing. I went into doc and ran make html, and it currently works. Any ideas what I can do to make the builds pass?

maniteja123 · 2016-09-08T11:39:35Z

Hi, it seems that the test failure is unrelated to the changes in this PR. I suppose the master was failing and it is now fixed in 62ad5fe.

shanglun · 2016-09-18T04:55:08Z

I pushed another change in the documentation directory. It appears the travis-ci tests are still failing. Is there something I can change to help this CI process go through correctly?

nelson-liu · 2016-09-18T06:15:23Z

looking at it again, seems like the flake8 diff script got a hold of the wrong ancestor, triggering it to diff every file touched for around 19k commits..

amueller · 2016-09-30T01:21:49Z

please don't merge the master branch. that really messes up history. You can rebase your branch on top of master if you like, but it's usually not necessary.

amueller · 2016-09-30T01:22:28Z

doc/developers/contributing.rst

+While it is possible to pass parameters into the ``fit`` method, these
+should be restricted to variables that need to be sliced during
+cross-validation. These variables need to be arrays with shape[0]==n_samples (see
+:ref:`fitting` above). An example of


weird linebreak.

amueller

Looks good apart from my comment.

amueller · 2016-09-30T01:24:01Z

doc/developers/contributing.rst

+``__init__`` or set after construction using ``set_params``.
+
+As a general rule, ``transform`` and ``predict`` should not take additional
+parameters. In cases where additional parameters would be useful, e.g.


I don't think this is a good explanation of what we want. I would say "In case parameters need to be changed after the call to __init__, e.g. when setting a threshold value for feature selection, this can be done by setting the attribute directly on the estimator or calling set_params."
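A minimal sketch of the pattern described here, with a hypothetical VarianceCutoff transformer (not scikit-learn's VarianceThreshold): threshold is an __init__ parameter read inside transform, so it can be changed after fitting with set_params or by assigning the attribute, without adding a parameter to transform.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class VarianceCutoff(BaseEstimator, TransformerMixin):
    """Hypothetical selector: keeps columns whose training variance is
    strictly above ``threshold``."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Learn per-column variances once; threshold is not applied here.
        self.variances_ = np.var(np.asarray(X, dtype=float), axis=0)
        return self

    def transform(self, X):
        # threshold is read at transform time, so changing it via
        # set_params or attribute assignment needs no refit.
        X = np.asarray(X, dtype=float)
        return X[:, self.variances_ > self.threshold]


X = [[0, 1, 10], [0, 2, 20], [0, 3, 30]]
selector = VarianceCutoff(threshold=0.0).fit(X)
selector.set_params(threshold=1.0)  # or: selector.threshold = 1.0
```

Reading the threshold in transform rather than in fit is one possible design; an estimator that applies it during fit would need refitting after set_params.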

dalmia · 2016-11-08T18:06:19Z

This seems inactive now. Should I create a fresh pull request incorporating all the changes discussed so far in the thread?

shanglun · 2016-11-08T18:12:43Z

Didn't realize there was another suggestion to implement. I'll issue a PR
today

dalmia · 2016-11-08T18:14:34Z

Hi @shanglun, I don't suppose you need to create a new PR. Just commit the required changes. That should work.

shanglun · 2016-11-08T18:22:44Z

Right, my English :p

shanglun · 2016-11-09T04:47:33Z

Hmm, I'm getting this error when trying to build the documentation

Exception occurred: File "C:\Users\shang\Anaconda3\lib\zipfile.py", line 1433, in write st = os.stat(filename) FileNotFoundError: [WinError 2] The system cannot find the file specified: 'auto_examples\\tree\\unveil_tree_structure.ipynb' The full traceback has been saved in C:\Users\shang\AppData\Local\Temp\sphinx-err-ueft6pnv.log, if you want to report the issue to the developers. Please also report this if it was a user error, so that a better error message can be provided next time. A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
Any ideas on how to fix this? (besides getting a Linux machine, heh)

dalmia · 2016-11-09T05:46:51Z

Seems like an issue with the file path you specified. This might help you: http://stackoverflow.com/questions/19767179/python-file-not-found-error

shanglun · 2016-11-09T06:05:03Z

Yeah, figured that. Looks like the backslashes are in fact escaped, and
previous makes have succeeded. Git status doesn't think anything is missing.

shanglun · 2016-11-14T05:58:36Z

I'm having trouble building the documentation, looks like there is a plot_unveil_tree_structure.py but no unveil_tree_structure.ipynb

I thought maybe this is just an issue with my branch, but when I clone a fresh copy of the master branch and try to build the docs, I still get errors.

Are others able to build documentation without problems?

GaelVaroquaux · 2016-11-14T06:09:14Z

I'm having trouble building the documentation, looks like there is a plot_unveil_tree_structure.py but no unveil_tree_structure.ipynb

Try removing the "auto_examples" directory in docs.

shanglun · 2016-11-15T01:17:11Z

That built the documentation page in question. Thank you for the help. For future reference, what specifically caused the error? Was it a permissions issue?

amueller · 2016-11-22T17:39:57Z

doc/developers/contributing.rst

+As a general rule, ``transform`` and ``predict`` should not take additional
+parameters. In cases where additional parameters would be useful, e.g. when
+setting a threshold value for feature selection, the custom parameters
+should be passed to ``__init__``. In case parameters need to be changed after the call to __init__, e.g. when setting a threshold value for feature selection, this can be done by setting the attribute directly on the estimator or calling set_parms.


please format so that each line is <80 characters.

amueller · 2016-11-22T17:40:12Z

doc/developers/contributing.rst

 like translating string arguments into functions, should be done in ``fit``.
+While it is possible to pass parameters into the ``fit`` method, these
+should be restricted to variables that need to be sliced during
+cross-validation. These variables need to be arrays with shape[0]==n_samples (see


please use double backticks around code.

amueller · 2016-11-22T17:40:39Z

doc/developers/contributing.rst

+While it is possible to pass parameters into the ``fit`` method, these
+should be restricted to variables that need to be sliced during
+cross-validation. These variables need to be arrays with shape[0]==n_samples (see
+:ref:`fitting` above). An example of


These seem to fit on a single line.

amueller · 2016-11-22T17:40:57Z

doc/developers/contributing.rst

 would have to be performed in ``set_params``,
 which is used in algorithms like ``GridSearchCV``.

+.. _fitting:


is this used anywhere?

This is linked to in a previous section - line 982.

cmarmo · 2020-07-09T10:13:37Z

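@shanglun, @amueller, the documentation about "Rolling your own estimator" has been moved to its own section (https://scikit-learn.org/dev/developers/develop.html#rolling-your-own-estimator) and file (https://github.com/scikit-learn/scikit-learn/blob/master/doc/developers/develop.rst). Are this PR and the corresponding issue (#7142) still relevant? Thanks!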

shanglun · 2020-07-09T18:13:50Z

Hi Chiara, I think it's ok to close the PR. If it is still an issue it's probably best to create a new issue and PR for the new file. Sean

cmarmo · 2020-07-10T14:28:10Z

Thanks @shanglun (Sean). This was your first pull request to scikit-learn, though, and this is not a nice way to end it... :(
Please, feel free to pick a new issue and we will try to do better. :)

update documentation to reflect fit, transform, and predict parameter…

c019330

… passing rules

nelson-liu reviewed Aug 7, 2016
wording changes and clarifications to documentation on fit, transfor,…

bf626c9

… and predict parameter conventions

nelson-liu reviewed Aug 7, 2016
shanglun added 2 commits August 7, 2016 12:08

adjust wording for documentation on fit, transform, and predict param…

f094b0d

…eters

adjust shape notation

43fdd09

amueller reviewed Aug 26, 2016
docfix feedback

01ccec3

amueller reviewed Sep 6, 2016
change wording

634391c

refer back to fitting section above

7102e64

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

866cc49

…into skl_7142_doc_fix

amueller reviewed Sep 30, 2016

amueller requested changes Sep 30, 2016

Fix change requests

34fe620

amueller reviewed Nov 22, 2016

line length fix

ed793b8

amueller added the Waiting for Reviewer label Aug 5, 2019

amueller approved these changes Aug 16, 2019

cmarmo closed this Jul 10, 2020

cmarmo mentioned this pull request Sep 29, 2020

arguments to fit in "rolling your own estimator" #7142

Closed
