8000 DOC Give local recommendations about IterativeImputer in docstrings by aperezlebel · Pull Request #23701 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

DOC Give local recommendations about IterativeImputer in docstrings #23701

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 27, 2022

Conversation

aperezlebel
Copy link
Contributor

Reference Issues/PRs

Addresses first task of #21967.

What does this implement/fix? Explain your changes.

Adapt the docstrings to give local recommendations for IterativeImputer.

Any other comments?

I refer to KNNImputer as a "multivariate imputer" in the "see also" part because it uses observed features to impute the missing ones (and does not use only the missing ones as does the univariate SimpleImputer). However, in the user guide it is outside of the "multivariate imputation" part. Is it ok to refer to KNNImputer as a multivariate imputer as I did?

@glemaitre glemaitre changed the title Give local recommendations about IterativeImputer in docstrings DOC Give local recommendations about IterativeImputer in docstrings Jun 21, 2022
@glemaitre
Copy link
Member

I assume that we can solve the SyntaxError: invalid escape sequence \m using a raw string:

r"""Multivariate ....
"""

@glemaitre
Copy link
Member

Is it ok to refer to KNNImputer as a multivariate imputer as I did?

Yes, I think this is indeed a multivariate imputer. We should be updating the user guide accordingly in another PR.

@aperezlebel
Copy link
Contributor Author
aperezlebel commented Jun 21, 2022

We should be updating the user guide accordingly in another PR.

Ok, I will do this in task 6 of #21967.

For the record, I derived the complexity as follows: $\mathcal{O}(np\min(n,p))$ for a single fit of BayesianRidge, which is fitted $\mathcal{O}(kp^2)$ times inside IterativeImputer.

@glemaitre
Copy link
Member

Trying to solve the doc issue. It seems weird to me. Let see if the warning is still up.
We could ass the information about the missing indicator as well.

@glemaitre glemaitre self-requested a review June 23, 2022 14:44
@glemaitre
Copy link
Member

I just escaped the backslashes because I don't get where the warning is raised from.

@jjerphan
Copy link
Member

@ArturoAmorQ: I think you might be interested in reviewing this PR. 🙂

Copy link
Member
@ArturoAmorQ ArturoAmorQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @aperezlebel, I am looking forward for the rest of the related PRs. For the moment here are a couple of comments.

Comment on lines +95 to +96
IterativeImputer : Multivariate imputer that estimates values to impute for
each feature with missing values from all the others.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of consistency, please make the same change in the first line of the IterativeImputer docstring and other references to it in sklearn/impute/_base.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it does not fit on a single line, it will not be compliant with numpydoc. I think that we can let it as-is

where :math:`k` = `max_iter`, :math:`n` the number of samples and
:math:`p` the number of features. It thus becomes prohibitively costly when
the number of features increases. Setting
`n_nearest_features << n_features`, `skip_complete=True` or increasing `tol`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that

`n_nearest_features` << `n_features`, `skip_complete` = `True`

was changed to

`n_nearest_features << n_features`, `skip_complete=True`

in f5a1ebd (#23701). I agree that skip_complete=True is the correct markdown as the = operator is part of the user syntax, but I am wondering if the notation n_nearest_features << n_features is more correct for such a mathematical (and not user) syntax. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is fine. We are not super consistent in this regard. At least, the HTML rendering should be on the same line then.

@glemaitre glemaitre self-requested a review July 27, 2022 12:44
@glemaitre glemaitre merged commit 8a01010 into scikit-learn:main Jul 27, 2022
@glemaitre
Copy link
Member

Thanks @aperezlebel LGTM

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022
glemaitre added a commit that referenced this pull request Aug 5, 2022
…23701)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0