FEA Add missing-value support to sparse splitter in RandomForest and ExtraTrees

@adam2392

Summary

While missing-value support for decision trees has been added recently, it only works when the data is encoded as a dense array. Since RandomForest* and ExtraTrees* both support sparse X, a sparse X that contains np.nan should work as well.

Solution

Add missing-value logic to SparsePartitioner in _partitioner.pyx, and to BestSparseSplitter and RandomSparseSplitter in _splitter.pyx.

The logic is the same as in the dense case; it only has to account for X being stored in sparse CSC format.
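To make the sparse case concrete, here is a rough pure-Python sketch of partitioning one CSC column around a threshold while routing missing values to one side. It is only an illustration of the idea, not the Cython code in SparsePartitioner: the helper name `partition_column` and the `missing_go_left` parameter are hypothetical, and the CSC arrays are written out by hand. The point it shows is that np.nan must be an explicitly stored entry, while rows with no stored entry are implicit zeros.

```python
import math

nan = math.nan

# Hand-written CSC arrays (illustrative data) for this 4 x 2 matrix:
#   [[1.0, 0.0],
#    [nan, 2.0],
#    [0.0, nan],
#    [3.0, 0.0]]
data = [1.0, nan, 3.0, 2.0, nan]   # explicitly stored values, column by column
indices = [0, 1, 3, 1, 2]          # row index of each stored value
indptr = [0, 3, 5]                 # column j spans data[indptr[j]:indptr[j + 1]]


def partition_column(data, indices, indptr, col, n_rows, threshold, missing_go_left):
    """Split row indices into (left, right) for one feature column.

    Hypothetical helper: non-missing values go left when value <= threshold,
    and missing values go to the side chosen by `missing_go_left`.
    """
    start, end = indptr[col], indptr[col + 1]
    stored = dict(zip(indices[start:end], data[start:end]))

    left, right = [], []
    for row in range(n_rows):
        # Rows without a stored entry are implicit zeros, never missing.
        value = stored.get(row, 0.0)
        if math.isnan(value):
            goes_left = missing_go_left
        else:
            goes_left = value <= threshold
        (left if goes_left else right).append(row)
    return left, right


left, right = partition_column(data, indices, indptr, col=0, n_rows=4,
                               threshold=0.5, missing_go_left=True)
print(left, right)
```

A real implementation would avoid the per-row dictionary lookup and work directly on the stored entries, but the three-way distinction (stored non-missing value, stored NaN, implicit zero) is what the sparse splitters would need to handle.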

Misc.

FYI https://github.com/scikit-learn/scikit-learn/pull/27966 will introduce native support for missing values in the `ExtraTree*` models (i.e. random splitter).

One thing I noticed though as I went through the PR is that the current codebase still does not support missing values in the sparse splitter. I think this might be pretty easy to add, but should we re-open this issue technically?

Xref: #5870 (comment)

Originally posted by @adam2392 in #5870 (comment)
