Improve near constant feature detection in scalers · Issue #19898 · scikit-learn/scikit-learn · GitHub
Improve near constant feature detection in scalers #19898

Open
jeremiedbb opened this issue Apr 14, 2021 · 4 comments

Comments

@jeremiedbb
Member

To avoid scaling near-constant features by a very large value, a heuristic was introduced in #19527. However, it turned out to be a bit too absolute (e.g. #19726). This was fixed for StandardScaler, but the other scalers might have similar issues.

This should be tested case by case, depending on how each scaler defines its scale. To see whether the heuristic is appropriate, it should be tested on data rescaled over a very wide range of magnitudes. If it breaks similarly to #19726, we need to find an appropriate bound on the scale below which the feature is declared constant (as was done in #19788 for StandardScaler).
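
A rough sketch of the kind of sweep described above, in plain Python rather than against the real scalers. It assumes the heuristic compares the scale to a fixed multiple of machine epsilon, which is an assumption made here for illustration and is not spelled out in this thread. The names `is_constant_absolute`, `is_constant_relative` and `sweep` are hypothetical, and the relative rule is only a contrast case, not the bound adopted in #19788.

```python
import sys

EPS = sys.float_info.epsilon  # float64 machine epsilon, about 2.22e-16


def feature_range(values):
    # Scale as a min-max style scaler would define it: max - min.
    return max(values) - min(values)


def is_constant_absolute(values):
    # Assumed absolute rule: the scale is below a fixed multiple of eps.
    return feature_range(values) < 10 * EPS


def is_constant_relative(values):
    # Hypothetical contrast rule: compare the scale to the data magnitude.
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return True
    return feature_range(values) <= 10 * EPS * max_abs


def sweep(base, exponents):
    # Rescale the same feature by 10**e and record both verdicts per exponent.
    results = {}
    for e in exponents:
        factor = 10.0 ** e
        scaled = [v * factor for v in base]
        results[e] = (is_constant_absolute(scaled), is_constant_relative(scaled))
    return results


if __name__ == "__main__":
    # A feature that clearly varies, whatever its unit.
    for e, (absolute, relative) in sweep([0.0, 0.5, 1.0], range(-20, 21, 5)).items():
        print(f"1e{e:+d}: absolute={absolute} relative={relative}")
```

Under these assumptions the absolute rule flags the feature as constant once the data is scaled down to around 1e-15 or below, even though its relative variation is unchanged, while the relative rule gives the same verdict at every scale. The same sweep could be run against each real scaler to see where its own heuristic breaks.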

@ogrisel
Member
ogrisel commented Apr 14, 2021

Not sure if this should be labeled as a bug or an enhancement. I suppose this will depend on the result of the investigations.

@RogerWszolek

Hey, I saw this post and found it very interesting. I'm going to put some research into this unless any of you has a reason that I shouldn't.

@jeremiedbb
Member Author

Sure, go ahead!

@Higgs32584
Contributor

Any updates?
