Handle missing values in OrdinalEncoder #11997

jnothman · 2018-09-04T05:40:04Z

A minimal implementation would pass through NaNs from the input to the output of transform and make sure the presence of NaN does not affect the categories identified in fit.

A missing_values parameter might allow the user to configure what object is a placeholder for missingness (e.g. NaN, None, etc.).
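To make the proposal concrete, here is a minimal pure-Python sketch of the pass-through behaviour. It is not scikit-learn code: the helper names `is_missing`, `fit_categories` and `transform` are hypothetical, and treating both `None` and float NaN as missing is an assumption made for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_categories(column):
    """Collect the sorted categories, ignoring missing entries."""
    return sorted({v for v in column if not is_missing(v)})


def transform(column, categories):
    """Map each value to its category index; missing values stay NaN."""
    index = {cat: float(i) for i, cat in enumerate(categories)}
    return [math.nan if is_missing(v) else index[v] for v in column]


column = ["low", math.nan, "high", "low", None]
categories = fit_categories(column)    # missing entries do not become categories
encoded = transform(column, categories)  # missing entries come out as NaN
```

A configurable `missing_values` parameter would replace the hard-coded check in `is_missing` with whatever placeholder the user specifies.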

See #10465 for background


maxcopeland · 2018-09-04T17:45:44Z

Hi @jnothman, do you mind if I work on this?

jnothman · 2018-09-04T21:39:22Z

Go for it

jnothman · 2018-09-04T21:40:13Z

I suppose we might also consider a handle_missing param that would allow NaN to be encoded as the smallest/largest number...?

jashrathod · 2018-09-08T11:57:42Z

Hi.
I'm new to open source contributions. Can someone help me get started?

maxcopeland · 2018-09-08T15:01:26Z

I'm currently working on this issue, but I think the best way to start is to review the contributing guidelines. When you see an issue no one is working on, ask the member who submitted it if you can get started. (I'm fairly new to this project myself.)

CatChenal · 2018-09-29T14:39:40Z

I wish the help wanted tag would disappear once a contributor adopts an issue...

shashvat-kedia · 2019-04-01T06:03:20Z

@jnothman I am new to this project and would like to contribute. Can I start by working on this issue?

jnothman · 2019-04-01T07:32:16Z

@maxcopeland is working on this issue. It's just taking some time to finish because it's not so easy :)

Catadanna · 2019-09-17T07:08:56Z

A suggestion: assign 0 only to missing values and start encoding from 1 (not from 0, as is done now), even when there are no missing values in the data set. Such a normalization could make it easier to identify the formerly missing values later (in order to handle them).
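As a rough illustration of this suggestion (not an existing scikit-learn option), the sketch below reserves code 0 for missing values and numbers the categories from 1. The function name `encode_reserving_zero` is hypothetical, and treating `None` and float NaN as missing is an assumption.

```python
import math


def encode_reserving_zero(column):
    """Encode categories as 1..n and reserve 0 for missing values."""

    def is_missing(value):
        # Assumption: None and float NaN both count as missing.
        return value is None or (isinstance(value, float) and math.isnan(value))

    categories = sorted({v for v in column if not is_missing(v)})
    # Codes start at 1 so that 0 never refers to a real category.
    codes = {cat: i for i, cat in enumerate(categories, start=1)}
    return [0 if is_missing(v) else codes[v] for v in column], categories


codes, categories = encode_reserving_zero(["red", None, "blue", math.nan, "red"])
```

Because 0 is reserved even when no value is missing, the codes of the real categories do not shift between data sets with and without missing entries.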

glemaitre · 2019-11-13T15:17:39Z

The add_indicator option in the imputer also makes things difficult right now.
Indeed, one has to define an imputer + encoder pipeline. With add_indicator=True, the imputer outputs extra indicator columns that you don't want to encode.

~~The workaround is to make a column transform with a MissingIndicator and set add_indicator=False for the imputer.~~

~~A reasonable use case would be to first encode ignoring the missing values and then apply the imputer.~~

~~I might pick up this and make some reviews on the different PRs~~

EDIT: Since we will encode the missing values as a category, we will not need add_indicator=True in practice.

thomasjpfan · 2023-02-23T17:05:31Z

I am closing this issue because this feature was added in #21988, which added encoded_missing_value to choose the encoding for missing values.

jnothman mentioned this issue Sep 4, 2018

Handling of missing values in the CategoricalEncoder #10465

Closed

jnothman added the Easy, help wanted, and good first issue labels Sep 4, 2018

maxcopeland mentioned this issue Sep 9, 2018

[WIP] Handle missing values in OrdinalEncoder #12045

Closed

jnothman removed the help wanted label Sep 30, 2018

baluyotraf mentioned this issue Feb 15, 2019

[WIP] NaN Support for OneHotEncoder #13028

Closed

johngian mentioned this issue Feb 25, 2019

Ignore missing categorical features mozilla/bugbug#192

Merged

rragundez mentioned this issue Mar 11, 2019

Handle unseen labels in LabelEncoder #13423

Closed

jnothman removed the Easy and good first issue labels Apr 1, 2019

fmder mentioned this issue May 16, 2019

[MRG] Adds handle unknown option to ordinal encoder #13897

Closed

TwsThomas mentioned this issue Sep 18, 2019

[WIP] Handle missing values in label._encode() #15009

Closed

franchuterivera mentioned this issue May 3, 2021

Simplify InputValidator: Allows pandas frame to directly reach the pipeline automl/auto-sklearn#1135

Merged

cmarmo added the Enhancement, help wanted, and module:preprocessing labels Jan 30, 2022

thomasjpfan closed this as completed Feb 23, 2023
