FEA Add missing-value support to sparse splitter in RandomForest and ExtraTrees #29542 · scikit-learn/scikit-learn
Open
@adam2392

Description

Summary

While missing-value support for decision trees has been added recently, it only works when X is a dense array. Since `RandomForest*` and `ExtraTrees*` both support sparse X, a user who encodes `np.nan` inside a sparse X should get the same missing-value handling.
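
For context on why sparse input needs its own handling: in SciPy's sparse formats zeros are implicit (not stored), whereas `np.nan` is a nonzero value and is therefore stored explicitly in the `.data` array like any other nonzero entry. The short NumPy/SciPy sketch below (independent of scikit-learn, with made-up values) illustrates this.

```python
import numpy as np
from scipy import sparse

# A small dense matrix with one missing value (np.nan) and several zeros.
X_dense = np.array([
    [0.0, 1.5],
    [np.nan, 0.0],
    [2.0, 0.0],
    [0.0, -3.0],
])

# Convert to CSC, the column-oriented format used for sparse X in the trees.
X_csc = sparse.csc_matrix(X_dense)

# Zeros are not stored, but np.nan is nonzero, so it is kept in X_csc.data
# next to the ordinary nonzero values (1.5, 2.0 and -3.0).
print(X_csc.nnz)                    # number of explicitly stored entries
print(np.isnan(X_csc.data).sum())   # how many stored entries are missing

# Row indices of the missing entries, recovered from the CSC structure.
print(X_csc.indices[np.isnan(X_csc.data)])
```

So a sparse splitter cannot rely on "stored vs. implicit" alone: stored entries must additionally be checked for NaN.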

Solution

Add missing-value logic to `SparsePartitioner` in `_partitioner.pyx`, and to `BestSparseSplitter` and `RandomSparseSplitter` in `_splitter.pyx`.
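
As a rough illustration of the missing-value logic the splitters need, here is a simplified pure-Python best-split search on one feature, loosely modeled on how the dense best splitter treats missing values: for each candidate threshold, the NaN samples are tried in the left child and in the right child, and the cheaper option is kept. The helper names `sse` and `best_split_with_missing`, and the use of squared error as the impurity, are assumptions for illustration; this is not scikit-learn's Cython implementation, which uses sample weights and incremental criterion updates.

```python
import numpy as np


def sse(y):
    """Sum of squared errors around the mean (0.0 for an empty node)."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0


def best_split_with_missing(x, y):
    """Hypothetical helper: exhaustive best split on one feature, where NaN
    entries of x are treated as missing.

    Returns (threshold, missing_go_to_left, impurity).
    """
    missing = np.isnan(x)
    x_obs, y_obs, y_miss = x[~missing], y[~missing], y[missing]

    # Sort the non-missing samples by feature value.
    order = np.argsort(x_obs)
    x_sorted, y_sorted = x_obs[order], y_obs[order]

    best = (None, None, np.inf)
    for i in range(1, x_sorted.size):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate equal values
        threshold = (x_sorted[i - 1] + x_sorted[i]) / 2.0
        left, right = y_sorted[:i], y_sorted[i:]
        # Try sending the missing samples to each child in turn.
        for go_left in (True, False):
            if go_left:
                cost = sse(np.concatenate([left, y_miss])) + sse(right)
            else:
                cost = sse(left) + sse(np.concatenate([right, y_miss]))
            if cost < best[2]:
                best = (threshold, go_left, cost)
    return best
```

For instance, with `x = [1.0, 2.0, nan, 10.0, 11.0]` and `y = [1.0, 1.0, 5.0, 5.0, 5.0]`, the sketch returns threshold `6.0` with the missing sample sent right (`missing_go_to_left` is `False`) and zero impurity.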

The logic is the same as in the dense case; the only difference is that it must handle X stored in sparse CSC format.
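
Handling CSC format mostly means reading a column through `indptr`, `indices` and `data`, remembering that rows absent from the column are implicit zeros, and then placing the NaN samples after the sorted non-missing ones, roughly as the dense partitioner does. The sketch below, using the hypothetical helper `sorted_column_with_missing`, shows that step in plain NumPy/SciPy. For readability it materializes the column for the requested samples; a real sparse partitioner would avoid densifying and work only with the stored entries.

```python
import numpy as np
from scipy import sparse


def sorted_column_with_missing(X_csc, j, samples):
    """Hypothetical helper: gather feature j for the given sample indices
    from a CSC matrix, sort the non-missing values, and move samples whose
    stored value is NaN to the end.

    Returns (samples, values, n_missing).
    """
    # Stored entries of column j live in data[indptr[j]:indptr[j + 1]].
    start, end = X_csc.indptr[j], X_csc.indptr[j + 1]
    rows = X_csc.indices[start:end]
    vals = X_csc.data[start:end]

    # Row index -> stored value; rows not stored in the column are zeros.
    stored = dict(zip(rows.tolist(), vals.tolist()))
    samples = np.asarray(samples)
    values = np.array([stored.get(int(s), 0.0) for s in samples],
                      dtype=np.float64)

    # Sort non-missing values, then append missing samples at the end.
    missing = np.isnan(values)
    order = np.argsort(values[~missing], kind="stable")
    new_samples = np.concatenate([samples[~missing][order], samples[missing]])
    new_values = np.concatenate([values[~missing][order], values[missing]])
    return new_samples, new_values, int(missing.sum())


# Example: one column containing a missing value and two implicit zeros.
X = sparse.csc_matrix(np.array([[0.0], [np.nan], [2.0], [0.0], [-1.0]]))
print(sorted_column_with_missing(X, 0, [0, 1, 2, 3, 4]))
```

For this column the helper yields samples `[4, 0, 3, 2, 1]`, values `[-1.0, 0.0, 0.0, 2.0, nan]` and `n_missing == 1`, i.e. the same layout a dense column would produce.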

Misc.

FYI https://github.com/scikit-learn/scikit-learn/pull/27966 will introduce native support for missing values in the `ExtraTree*` models (i.e. the random splitter).

One thing I noticed while going through the PR is that the current codebase still does not support missing values in the sparse splitter. I think this might be pretty easy to add, but should we technically re-open this issue?

Xref: #5870 (comment)

Originally posted by @adam2392 in #5870 (comment)

Metadata

Assignees: No one assigned

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
