Fixes #11996 : Handle missing values in OneHotEncoder by Olamyy · Pull Request #12017 · scikit-learn/scikit-learn · GitHub

Fixes #11996 : Handle missing values in OneHotEncoder #12017


Closed
wants to merge 2 commits into from
10 changes: 10 additions & 0 deletions doc/modules/preprocessing.rst
@@ -540,6 +540,16 @@ columns for this feature will be all zeros
array([[1., 0., 0., 0., 0., 0.]])


Missing categorical values in the training data can be handled with the ``handle_missing`` parameter, which accepts one of the following values:

``all-missing``: Rows with missing values are replaced with NaN.
``all-zero``: Rows with missing values are replaced with zeros.
``categorical``: Missing values are represented by a separate one-hot column.

Note that for ``OneHotEncoder`` to handle missing values, you must specify which placeholder should be treated as missing. This is done with the ``missing_values`` parameter, which can be either ``NaN`` or a custom value of your choice.
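
The three options can be illustrated with a minimal pure-Python sketch for a single column. The helpers ``is_missing`` and ``one_hot_column`` are hypothetical names introduced here for illustration and are not part of scikit-learn. The behaviour of each option is an assumption based on the description above, not on the implementation in this pull request.

```python
import math


def is_missing(value, missing_values):
    """Return True if `value` equals the placeholder; NaN is matched with isnan."""
    if isinstance(missing_values, float) and math.isnan(missing_values):
        return isinstance(value, float) and math.isnan(value)
    return value == missing_values


def one_hot_column(values, handle_missing="all-zero", missing_values=float("nan")):
    """One-hot encode one categorical column (hypothetical helper, not sklearn API)."""
    allowed = ("all-missing", "all-zero", "categorical")
    if handle_missing not in allowed:
        raise ValueError(f"handle_missing must be one of {allowed}")

    # Categories are learned from the non-missing entries only.
    categories = sorted({v for v in values if not is_missing(v, missing_values)})
    # "categorical" reserves one extra trailing column for missing entries.
    width = len(categories) + (1 if handle_missing == "categorical" else 0)

    rows = []
    for v in values:
        if not is_missing(v, missing_values):
            row = [0.0] * width
            row[categories.index(v)] = 1.0
        elif handle_missing == "all-missing":
            row = [float("nan")] * width
        elif handle_missing == "all-zero":
            row = [0.0] * width
        else:  # "categorical"
            row = [0.0] * width
            row[-1] = 1.0
        rows.append(row)
    return categories, rows


# Example: the third entry is missing (NaN is the default placeholder).
column = ["a", "b", float("nan"), "a"]
categories, encoded = one_hot_column(column, handle_missing="categorical")
```

In this sketch the ``categorical`` option widens the output by one column, while the other two keep the width equal to the number of observed categories.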



See :ref:`dict_feature_extraction` for categorical features that are represented
as a dict, not as scalars.
