X does not have valid feature names, but IsolationForest was fitted with feature names #25844
Comments
I tried this in Jupyter on windows. It is working fine. Also, I tried one more thing.
|
This is a bug indeed, I can reproduce on 1.2.2 and |
The root cause as you hinted:
Not sure what the best approach is here, cc @glemaitre and @jeremiedbb who may have suggestions. |
OK. What if we pass the original feature names to clf.score_samples() along with the input array X? The feature names used during training are available from the feature_names_in_ attribute of the fitted IsolationForest model clf.
# Assuming clf is already fitted with contamination != "auto"
X = ...  # input array that triggered the warning
feature_names = clf.feature_names_in_  # feature names seen during fit
# Proposed API only: score_samples() does not currently accept feature_names
scores = clf.score_samples(X, feature_names=feature_names)
In #24873 we solved a similar problem (internally passing a numpy array when the user passed in a dataframe). I've not looked at the code related to |
It seems like this approach could work indeed, thanks! To summarise the idea would be to:
I am labelling this as "good first issue", @abhi1628, feel free to start working on it if you feel like it! If that's the case, you can comment |
Indeed, using a private function to validate or not the input seems the way to go. |
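The "private function" idea can be illustrated with a toy, dependency-free sketch. Everything here (`NamedTable`, `ToyDetector`, the `use_private_scorer` flag, the scoring formula) is hypothetical and only mimics the pattern under discussion; it is not scikit-learn's implementation. The public `score_samples` checks feature names, while `fit` can call a private `_score_samples` on the already-validated data and skip that check.

```python
import warnings


class NamedTable:
    """Hypothetical stand-in for a DataFrame: rows plus column names."""

    def __init__(self, rows, columns):
        self.rows = [list(r) for r in rows]
        self.columns = list(columns)


def to_rows(X):
    # Mimics input validation: the column names are dropped here.
    return X.rows if isinstance(X, NamedTable) else [list(r) for r in X]


class ToyDetector:
    """Toy estimator (not scikit-learn code) showing the bug and the fix."""

    def __init__(self, contamination="auto", use_private_scorer=True):
        self.contamination = contamination
        self.use_private_scorer = use_private_scorer

    def fit(self, X):
        self.feature_names_in_ = getattr(X, "columns", None)
        rows = to_rows(X)
        n = len(rows)
        self.center_ = [sum(col) / n for col in zip(*rows)]
        if self.contamination == "auto":
            self.offset_ = -0.5
            return self
        if self.use_private_scorer:
            # Fix: score the validated rows without re-checking names.
            scores = self._score_samples(rows)
        else:
            # Bug: the public method sees a nameless array and warns.
            scores = self.score_samples(rows)
        self.offset_ = sorted(scores)[int(self.contamination * n)]
        return self

    def score_samples(self, X):
        if self.feature_names_in_ is not None and not hasattr(X, "columns"):
            warnings.warn(
                "X does not have valid feature names, but ToyDetector "
                "was fitted with feature names",
                UserWarning,
            )
        return self._score_samples(to_rows(X))

    def _score_samples(self, rows):
        # Arbitrary toy score: negative L1 distance to the column means.
        return [-sum(abs(v - c) for v, c in zip(r, self.center_)) for r in rows]


def warns_on_fit(detector, X):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        detector.fit(X)
    return any("valid feature names" in str(w.message) for w in caught)


X = NamedTable([[-1.1], [0.3], [0.5], [100.0]], columns=["a"])
buggy = warns_on_fit(ToyDetector(0.25, use_private_scorer=False), X)
fixed = warns_on_fit(ToyDetector(0.25, use_private_scorer=True), X)
auto = warns_on_fit(ToyDetector("auto", use_private_scorer=False), X)
print(buggy, fixed, auto)  # True False False
```

The default `"auto"` path never scores the training data, which matches the observation that the warning only shows up with a non-default `contamination`.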
Considering the idea of @glemaitre and @betatim I tried this logic.
|
Please modify the source code and add a non-regression test such that we can discuss implementation details. It is not easy to do that in an issue. |
Hi, I'm not sure if anyone is working on making a PR to solve this issue. If not, can I take this issue? |
@abhi1628 are you planning to open a Pull Request to try to solve this issue? If not, @Charlie-XIAO you would be more than welcome to work on it. |
Thanks, I will wait for @abhi1628's response then. |
I am not working on it currently. @Charlie-XIAO, you can take this issue. Thank you.
|
Thanks, will work on it soon. |
/take |
…n not 'auto'" Also see Issue scikit-learn#25844, now does not generate warnings
A very similar bug seems to occur for LocalOutlierFactor as well. |
Can you open a separate issue with a minimal reproducible example? Thanks. |
Describe the bug
If you fit an IsolationForest using a pd.DataFrame, it generates a warning:
X does not have valid feature names, but IsolationForest was fitted with feature names
This only seems to occur if you supply a non-default value (i.e. not "auto") for the contamination parameter. This warning is unexpected because a) X does have valid feature names and b) it is raised by the fit() method, whereas it is generally supposed to indicate that predict has been called with e.g. an ndarray when the model was fitted using a dataframe.
The reason is most likely that when you pass contamination != "auto", the estimator essentially calls predict on the training data in order to determine the offset_ parameter:
scikit-learn/sklearn/ensemble/_iforest.py
Line 337 in 9aaed49
Steps/Code to Reproduce
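A minimal sketch of a reproduction consistent with the description above, assuming pandas and scikit-learn are installed. The data values and the variable name `feature_name_warning` are illustrative assumptions. Whether the flag comes out true depends on the installed scikit-learn version: the thread reports the warning on 1.2.2.

```python
import warnings

import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative data (an assumption): one named column with an obvious outlier.
X = pd.DataFrame({"a": [-1.1, 0.3, 0.5, 100.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A non-default contamination makes fit() score the training data.
    IsolationForest(random_state=0, contamination=0.05).fit(X)

feature_name_warning = any(
    "does not have valid feature names" in str(w.message) for w in caught
)
print(feature_name_warning)
```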
Expected Results
Does not raise "X does not have valid feature names, but IsolationForest was fitted with feature names"
Actual Results
raises "X does not have valid feature names, but IsolationForest was fitted with feature names"
Versions