-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Rename MICEImputer
to ChainedImputer
#11314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
MICEImputer
to ChainedImputer
MICEImputer
to ChainedImputer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
MICEImputer
to ChainedImputer
MICEImputer
to ChainedImputer
Getting some |
@sergeyf Any reference for the new name? |
@qinhanmin2014 thanks re: master, I'll do that. Re: name. So the point here is that we're not actually doing MICE, because the traditional way of applying the MICE technique is to get multiple imputations, and this class only returns a single one. But the full name is multivariate imputation by chained equations (MICE), hence the named |
Sorry, I broke master. |
Part of the confusion is that:
|
@jnothman Which paper? Stef's paper that I reference is this one: "mice: Multivariate Imputation by Chained Equations in R" https://www.jstatsoft.org/article/view/v045i03 There's also this paper, but it's not referenced in this PR: "Multiple Imputation by Chained Equations: What is it and how does it work?" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/ @qinhanmin2014 will you please take a look at my changes to impute.rst and let me know if it's sufficiently clear now. |
You're right. I've just been confused by people implying that the M must mean Multiple.... because of course what we're doing here is Multivariate Imputation by Chaining.... just not Multiple. |
I have confused that at least 5 times while writing the original PR =) |
doc/modules/impute.rst
Outdated
A more sophisticated approach is to use the :class:`ChainedImputer` class, which | ||
implements MICE: Multivariate Imputation by Chained Equations. MICE is usually used | ||
to generate multiple imputations, but :class:`ChainedImputer` generates a single | ||
(averaged) imputation, which is why it is not named `MICEImputer`. MICE models each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to link those terms to a glossary entry that explains:
- univariate vs multivariate imputation
- single vs multiple imputation (increasing the number of samples in the resulting transformed dataset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. There is no entry for imputation
. I'll add one and include these points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the above comment on the missing glossary entry, +1 for merge.
@ogrisel I added a glossary entry and rewrote the entries in impute.rst. Please take a look and confirm that it satisfies your request. |
MICEImputer
to ChainedImputer
MICEImputer
to ChainedImputer
Yeah, mice is both multivariate and multiple.
multivariate: it loops over the features and imputes all incomplete features
multiple: if you impute every incomplete value m times.
ChainedImputer in its current form is chained and multivariate (and not yet
multiple).
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
Virusvrij.
www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
2018-06-20 17:19 GMT+02:00 Sergey Feldman <notifications@github.com>:
… @ogrisel <https://github.com/ogrisel> I added a glossary entry and
rewrote the entries in impute.rst. Please take a look and confirm that it
satisfies your request.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#11314 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AVQqeyAS6OX5LzBwe2JtzlCeq2jly0RCks5t-mfqgaJpZM4UsF-9>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The glossary entry for imputation is significantly longer than the average entry.
Maybe we should keep a short entry in the glossary and move the discussing on univariate vs multivariate vs single vs multiple imputations in a dedicated section at the end of the missing value imputation chapter of the narrative documentation.
Alternatively, we could split the entries: univariate imputation, multivariate imputation, multiple imputations and add cross-links. See for instance the entries on "inductive" vs "transductive".
impute | ||
imputation | ||
Most machine learning algorithms require that their inputs have no | ||
:term:`missing values`, and will not work if this requirement is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a backlink to the :term:imputation
entry from the :term:missing values
entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
doc/glossary.rst
Outdated
The above practice is called multiple imputation. The | ||
:class:`impute.ChainedImputer` class can be used for multiple | ||
imputations by applying it to the same dataset with different random | ||
seeds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would further refine:
... imputations by applying it repeatedly to the same dataset with different random seeds.
Note that a call to the transform
method of a :term:transformer
is not allowed to change the number of samples. Therefore multiple imputations cannot be achieved by a single call to transform
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add.
@ogrisel Done! I moved most of the discussion to impute.rst |
Hmm tests are failing because of StandardScaler?
|
sklearn/utils/estimator_checks.py
Outdated
@@ -77,8 +77,8 @@ | |||
'RANSACRegressor', 'RadiusNeighborsRegressor', | |||
'RandomForestRegressor', 'Ridge', 'RidgeCV'] | |||
|
|||
ALLOW_NAN = ['Imputer', 'SimpleImputer', 'MICEImputer', | |||
'MinMaxScaler', 'StandardScaler', 'QuantileTransformer'] | |||
ALLOW_NAN = ['Imputer', 'SimpleImputer', 'ChainedImputer', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the StandardScaler
back here to remove the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh dang that was my bad on the merging. Thanks!
Thank you very much @sergeyf! |
This reverts commit 93382cc.
Addresses a point raised in #8478 and #11259. The
MICEImputer
isn't really a MICE imputer because we're only keeping a single averaged imputation by default. This is a purely cosmetic PR so that end-users don't get as confused.CC @glemaitre @jnothman @RianneSchouten