Add missing value support for AdaBoost? #30477
Comments
Indeed, missing values are not supported in `AdaBoostClassifier`. I would suggest using an estimator that natively supports missing values in the meantime. Even if the individual trees support missing values, adding missing-value support to the boosting meta-estimator requires dedicated work. To get an idea of the kind of work we are talking about, see #26391 for RandomForest and #27966 for ExtraTrees.
Thank you for the reply and for providing those links. For my current use case, I'm essentially benchmarking a bunch of different classification methods on various datasets, and I was hoping to include AdaBoost among them. I was able to get `AdaBoostClassifier` to run on data containing missing values by editing my local copy of scikit-learn so that the NaN check in the input validation is skipped when the base classifier is a `DecisionTreeClassifier`.

This approach worked for my own use. However, I know it would be better to check whether the classifier supports missing values directly, instead of just checking that the base classifier is a `DecisionTreeClassifier`. I also did not do any benchmarking to see how the change affects runtime performance, so essentially all I can say now is that my changes allow `AdaBoostClassifier` to run without error on data containing NaNs.

So, I realize there's a big gap between what I did and something that could actually be merged as a pull request. But there was also a thread on SO asking about using AdaBoost with missing values, so other people seem to be interested in this as well.
Great to hear that you managed to get it to work for your use case. In my opinion, that kind of hackability is one of scikit-learn's strengths. I would expect that in your benchmarks a model with native missing-value support, such as `HistGradientBoostingClassifier`, would be competitive with or outperform AdaBoost anyway.
Thanks! I agree it's really cool that scikit-learn is easy enough to edit that I was able to get AdaBoost running with missing values at all.

Part of me would still like to see support for missing values eventually added, but I wouldn't be able to attempt it myself anytime in the near future (and whatever I'd come up with would probably need at least a little tweaking, even if it were considered worth adding). And I realize, too, that in most cases AdaBoost probably wouldn't be the best-performing option anyway.
I had a quick glance at your code change, and I think the meta-estimator should inspect the tags of its base estimator to decide whether NaN values are allowed, rather than special-casing `DecisionTreeClassifier`.

You are welcome to open a PR if you have the time to contribute your fix to scikit-learn. Otherwise, let's close this issue if nobody else is interested enough to write a PR.
Shoot, yes, that would have been better to check. I'm on a different project now, so I'm not sure if or when I'd be able to get this into good enough shape to be merged. I'm fine with the issue being closed if no one else is able to tackle it either.
Let's close, but if someone else is interested in fixing this, feel free to open a PR and link to the discussion of this issue from the PR description.
Describe the bug
I am working on classifying samples in various datasets using the AdaBoostClassifier with the DecisionTreeClassifier as the base estimator.
The DecisionTreeClassifier can handle np.nan values, so I assumed the AdaBoostClassifier would be able to as well.
However, that does not seem to be the case, as AdaBoost gives the error
ValueError: Input X contains NaN
when I try to use it with data containing NAs. I asked if this was the intended behavior here but have yet to receive a response.
If this isn't intentional, could AdaBoostClassifier be updated to support missing values when the base estimator does?
Steps/Code to Reproduce
Expected Results
No error is thrown when `DecisionTreeClassifier(max_depth=1)` is used as the base classifier, since `DecisionTreeClassifier` can handle `np.nan` values. Because of that, I expected AdaBoost to fit and predict successfully too.
Actual Results
ValueError: Input X contains NaN
Versions