Improve near constant feature detection in scalers #19898

jeremiedbb · 2021-04-14T13:30:01Z

To avoid scaling near-constant features by a very large value, a heuristic was introduced in #19527. However, its threshold was a bit too absolute, i.e. it did not take the magnitude of the data into account (e.g. #19726). This was fixed for StandardScaler, but the other scalers might have similar issues.
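A minimal sketch of the kind of absolute cutoff described above, in plain Python: the feature is divided by its standard deviation unless that value is below a fixed threshold, in which case the scale is replaced by 1.0. The `10 * eps` threshold and the helper names `safe_scale` and `standardize` are assumptions for illustration, not the code from #19527.

```python
import statistics
import sys

# float64 machine epsilon (about 2.22e-16)
EPS = sys.float_info.epsilon


def safe_scale(values, threshold=10 * EPS):
    """Hypothetical helper: return the standard deviation used to divide the
    feature, replaced by 1.0 when it falls below an absolute threshold, so a
    near-constant feature is not blown up by a huge factor 1 / scale."""
    scale = statistics.pstdev(values)
    return 1.0 if scale < threshold else scale


def standardize(values):
    """Center the feature and divide it by the (possibly replaced) scale."""
    mean = statistics.fmean(values)
    scale = safe_scale(values)
    return [(v - mean) / scale for v in values]
```

With this rule, `standardize([0.0, 1.0, 2.0, 3.0])` has unit variance, but the same feature multiplied by `1e-20` keeps values around `1e-20`: its standard deviation is below the fixed threshold, so it is wrongly treated as constant even though its relative spread is unchanged.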

It should be checked case by case, depending on how each scaler defines its scale. To see whether the heuristic is appropriate, it should be tested on data rescaled over a very wide range of magnitudes. If it breaks similarly to #19726, we need to find an appropriate bound on the scale below which the feature is declared constant (as was done in #19788 for StandardScaler).
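One way to run the check suggested above is to rescale a clearly non-constant feature by `10**k` over many orders of magnitude and see where a rule starts declaring it constant. The sketch below compares a hypothetical absolute rule (`std < 10 * eps`) with a hypothetical scale-aware rule that compares the variance to a rounding-error bound depending on both the variance and the mean. Both rules and all names are assumptions for illustration; the bound is not necessarily what #19788 merged.

```python
import statistics
import sys

# float64 machine epsilon (about 2.22e-16)
EPS = sys.float_info.epsilon


def absolute_is_constant(values):
    """Hypothetical absolute rule: constant if the standard deviation is
    below a fixed 10 * eps, whatever the magnitude of the data."""
    return statistics.pstdev(values) < 10 * EPS


def relative_is_constant(values):
    """Hypothetical scale-aware rule: constant if the variance is within a
    rounding-error bound that grows with the variance and the mean, so
    multiplying the data by c multiplies both sides by c**2 (up to rounding)."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    upper_bound = n * EPS * var + (n * mean * EPS) ** 2
    return var <= upper_bound


def scan_scales(values, exponents=range(-20, 21)):
    """Apply both rules to the feature rescaled by 10**k for each k."""
    results = {}
    for k in exponents:
        scaled = [v * 10.0 ** k for v in values]
        results[k] = (absolute_is_constant(scaled), relative_is_constant(scaled))
    return results


if __name__ == "__main__":
    varying = [0.0, 1.0, 2.0, 3.0]  # not constant at any scale
    for k, (absolute, relative) in scan_scales(varying).items():
        print(f"x 1e{k:+d}: absolute={absolute} relative={relative}")
```

For `[0.0, 1.0, 2.0, 3.0]`, the absolute rule starts declaring the feature constant once the factor drops to about `1e-15`, while the scale-aware rule never does over this range; an exactly constant feature such as `[5.0, 5.0, 5.0]` is still detected. The same scan can be repeated with each scaler's own definition of the scale (e.g. a range or an interquartile range instead of a standard deviation).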

ogrisel · 2021-04-14T14:46:26Z

Not sure if this should be labeled as a bug or an enhancement. I suppose this will depend on the result of the investigations.

RogerWszolek · 2021-05-24T22:13:30Z

Hey, I saw this post and found it very interesting. I'm going to put some research into this unless any of you has a reason that I shouldn't.

jeremiedbb · 2021-05-24T23:10:06Z

Sure, go ahead!

Higgs32584 · 2024-01-18T16:50:54Z

Any updates?

jeremiedbb added the Bug: triage label Apr 14, 2021

ogrisel added the Enhancement, help wanted and module:preprocessing labels and removed the Bug: triage label Apr 14, 2021
