-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
update documentation to reflect fit, transform, and predict parameter… #7156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
doc/developers/contributing.rst
Outdated
should be restricted to variables that have the shape ``n_samples``. | ||
This allows for all relevant parameters to be sliced cleanly during | ||
cross-validation. All other parameters should be passed to | ||
``__init__`` or ``set_params``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its more clear to say "All parameters should be passed into init or set after construction with set_params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like parameters that affect the behavior of fit should always go into __init__
... this also serves to maintain compatibility with things like GridSearchCV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "init" without the surrounding underscores clearer, do you think? Elsewhere in the documentation, the init function is referred to using the underscore notation. I do agree with the wording of set_params. I will make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry I didn't put the underscores with init in a code block and it bolded. I didn't mean to suggest a change to remove underscores, that's good as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameters that affect behavior of fit does should always go into init. The idea with the fit parameters is to accommodate variables like sample_weights, which need to sliced up along with the X and Y during cross-validation. That should be clear in the documentation - I'll make the change.
Edit: Nice catch on the init function there, @nelson-liu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, make sure to use underscores and not bold init ;) (__init__
)
… and predict parameter conventions
doc/developers/contributing.rst
Outdated
cross-validation. All other parameters should be passed to | ||
``__init__`` or ``set_params``. | ||
should be restricted to variables that need to be sliced during | ||
cross-validation. Consequently, these variables would be either arrays with shape=[N] or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "Consequently" fits here, you can just omit it.
doc/developers/contributing.rst
Outdated
@@ -952,6 +952,22 @@ The easiest and recommended way to accomplish this is to | |||
All logic behind estimator parameters, | |||
like translating string arguments into functions, should be done in ``fit``. | |||
|
|||
Parameters can be passed to the ``fit`` method. Generally, these parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say "why it is possible to pass parameter to the fit
method, these should be ``` and also explain why (pipelining, grid-search). Also, the shape can be anything as long as the first dimension in n_samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! The change has been made. Please review when you have a chance.
doc/developers/contributing.rst
Outdated
should be restricted to variables that need to be sliced during | ||
cross-validation. These variables would be either arrays with shape=[N] or | ||
array-like with shape=[N,] where N is the number of samples. An example of | ||
cross-validation. These variables would be either arrays with shape=[N] where N=n_samples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only require shape[0] == n_samples
. Why not just These would be arrays with
shape[0] == n_samples``?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are many ways to express the same idea here. I like the
method you're suggesting as it's very clear and concise. I'll make this
change today.
On Sep 6, 2016 3:03 PM, "Andreas Mueller" notifications@github.com wrote:
In doc/developers/contributing.rst
#7156 (comment)
:should be restricted to variables that need to be sliced during
-cross-validation. These variables would be either arrays with shape=[N] or
-array-like with shape=[N,] where N is the number of samples. An example of
+cross-validation. These variables would be either arrays with shape=[N] where N=n_samplesI think we only require shape[0] == n_samples. Why not just These would
be arrays withshape[0] == n_samples``?—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/scikit-learn/scikit-learn/pull/7156/files/43fdd093b0bf14a22d7371334503949d61dfb491..01ccec3d297b421d86876d15f35df65aef11b511#r77698216,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AKNQ_CnJUt4oMYWJLxQR-DtpLDO9IfjDks5qnbkTgaJpZM4JeZ4l
.
I'm sure this all needs a rewrite to some extent, but this is covered, at least with respect to |
@jnothman Definitely makes sense to point back to related part of the doc. I'll look at it and figure out the best wording. I believe the ticket was created because the shape[0]==n_samples requirement was not explicitly mentioned in the rolling your own estimator section and created some confusion. Unrelated, but the AppVeyor and Travis CI builds are failing. Checked my PR a few times and don't think they should cause the errors that the output are showing. Went into doc and tried make html and it works currently. Any ideas what I can do to make the builds pass? |
Hi, it seems that the test failure is unrelated to the changes in this PR. I suppose the master was failing and it is now fixed in 62ad5fe. |
I pushed another change in the documentation directory. It appears the travis-ci tests are still failing. Is there something I can change to help this CI process go through correctly? |
looking at it again, seems like the flake8 diff script got a hold of the wrong ancestor, triggering it to diff every file touched for around 19k commits.. |
…into skl_7142_doc_fix
please don't merge the master branch. that really messes up history. You can rebase your branch on top of master if you like, but it's usually not necessary. |
doc/developers/contributing.rst
Outdated
While it is possible to pass parameters into the ``fit`` method, these | ||
should be restricted to variables that need to be sliced during | ||
cross-validation. These variables need to be arrays with shape[0]==n_samples (see | ||
:ref:`fitting` above). An example of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird linebreak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good apart from my comment.
doc/developers/contributing.rst
Outdated
``__init__`` or set after construction using ``set_params``. | ||
|
||
As a general rule, ``transform`` and ``predict`` should not take additional | ||
parameters. In cases where additional parameters would be useful, e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good explanation of what we want. I would say "In case parameters need to be changed after the call to __init__
, e.g. when setting a threshold value for feature selection, this can be done by setting the attribute directly on the estimator or calling set_parms
.
This seems inactive now. Should I create a fresh pull request incorporating all the changes discussed so far in the thread? |
Didn't realize there was another suggestion to implement. I'll issue a PR On Nov 8, 2016 1:06 PM, "Aman Dalmia" notifications@github.com wrote:
|
Hi @shanglun, I don't suppose you need to create a new PR. Just commit the required changes. That should work. |
Right, my English :p On Nov 8, 2016 1:15 PM, "Aman Dalmia" notifications@github.com wrote:
|
Hmm, I'm getting this error when trying to build the documentation
|
Seems like an issue with the file path you specified. This might help you. |
Yeah, figured that. Looks like the backslashes are in fact escaped, and On Nov 9, 2016 12:47 AM, "Aman Dalmia" notifications@github.com wrote:
|
I'm having trouble building the documentation, looks like there is a plot_unveil_tree_structure.py but no unveil_tree_structure.ipynb I thought maybe this is just an issue with my branch, but when I clone a fresh copy of the master branch and try to build the docs, I still get errors. Are others able to build documentation without problems? |
I'm having trouble building the documentation, looks like there is a
plot_unveil_tree_structure.py but no unveil_tree_structure.ipynb
Try removing the "auto_example" directory in docs.
|
That build the documentation page in question. Thank you for the help. For future reference, what specifically caused the error? Was it a permissions issue? |
doc/developers/contributing.rst
Outdated
As a general rule, ``transform`` and ``predict`` should not take additional | ||
parameters. In cases where additional parameters would be useful, e.g. when | ||
setting a threshold value for feature selection, the custom parameters | ||
should be passed to ``__init__``. In case parameters need to be changed after the call to __init__, e.g. when setting a threshold value for feature selection, this can be done by setting the attribute directly on the estimator or calling set_parms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please format so that each line is <80 characters.
doc/developers/contributing.rst
Outdated
@@ -974,6 +976,21 @@ The easiest and recommended way to accomplish this is to | |||
**not do any parameter validation in** ``__init__``. | |||
All logic behind estimator parameters, | |||
like translating string arguments into functions, should be done in ``fit``. | |||
While it is possible to pass parameters into the ``fit`` method, these | |||
should be restricted to variables that need to be sliced during | |||
cross-validation. These variables need to be arrays with shape[0]==n_samples (see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please double backticks around code.
doc/developers/contributing.rst
Outdated
While it is possible to pass parameters into the ``fit`` method, these | ||
should be restricted to variables that need to be sliced during | ||
cross-validation. These variables need to be arrays with shape[0]==n_samples (see | ||
:ref:`fitting` above). An example of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These seem to fit on a single line.
@@ -798,6 +798,8 @@ The reason for postponing the validation is that the same validation | |||
would have to be performed in ``set_params``, | |||
which is used in algorithms like ``GridSearchCV``. | |||
|
|||
.. _fitting: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is linked to in a previous section - line 982.
Hi Chiara,
I think it's ok to close the PR. If it is still an issue it's probably best
to create a new issue and PR for the new file.
Sean
…On Thu, Jul 9, 2020 at 6:13 AM Chiara Marmo ***@***.***> wrote:
@shanglun <https://github.com/shanglun> , @amueller
<https://github.com/amueller>, the documentatition about "Rolling your
own estimation" has been moved to its own section
<https://scikit-learn.org/dev/developers/develop.html#rolling-your-own-estimator>
(and file
<https://github.com/scikit-learn/scikit-learn/blob/master/doc/developers/develop.rst>).
Are this PR and the correspondent issue (#7142
<#7142>) still
relevant? Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7156 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACRVB7GFMQIIPFIHQ3I4XK3R2WJ6LANCNFSM4CLZTYSQ>
.
|
Thanks @shanglun (Sean), this was your first pull request to scikit-learn, though and this is not a nice way to end it... :( |
Reference Issue
This PR is a response to Scikit-learn issue #7142 which is a documentation addition for "rolling your own estimator"
What does this implement/fix? Explain your changes.
This PR adds two additional paragraphs detailing conventions for additional parameters to pass to the fit, predict, and transform functions of the custom estimator.
Any other comments?
If you'd like me to change the wording or add/remove sections, please feel free to reach out.
… passing rules
This change is