Uniform columns return a standard deviation of 1 in StandardScaler #4609
Comments
StandardScaler.fit uses sklearn.preprocessing.data._mean_and_std; its docstring may help explain this behaviour.
Yeah, it is a bit awkward, but setting it to 0 is also awkward. I am a bit conflicted about what would be best here.
0 is the correct answer, so I find it less awkward than 1. My preference would be to return 0, handle it internally to avoid NaNs, and throw a warning. Adding functionality to optionally remove those columns would be even better.
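A rough sketch of what that proposal could look like, using plain NumPy rather than scikit-learn internals. The helper names `fit_mean_and_std` and `scale` are hypothetical and only illustrate the idea: report the true standard deviation (0 for a constant column), warn about it, and substitute 1 only at division time so no NaNs appear.

```python
import warnings

import numpy as np


def fit_mean_and_std(X):
    """Hypothetical helper: return per-column mean and the true std.

    Constant columns keep a reported std of 0 and trigger a warning.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    n_constant = int(np.sum(std == 0.0))
    if n_constant:
        warnings.warn(f"{n_constant} column(s) have zero standard deviation")
    return mean, std


def scale(X, mean, std):
    """Hypothetical helper: centre and scale without dividing by zero."""
    X = np.asarray(X, dtype=float)
    # Replace zeros only for the division; the reported std is untouched.
    safe_std = np.where(std == 0.0, 1.0, std)
    return (X - mean) / safe_std


# Second column is uniform.
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean, std = fit_mean_and_std(X)

X_scaled = scale(X, mean, std)
```

With this split, the stored `std` matches `np.std(X, axis=0)`, while the scaled output for the uniform column is all zeros instead of NaN.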
What do you mean by "the correct answer"? This is not a function to compute the standard deviation; you can use np.std for that. This is the "internal handling" to get the desired scaling. I feel that the promise here is that scaler.transform(X) == (X - scaler.mean_) / scaler.std_, and not that scaler.mean_ == np.mean(X, axis=0) and scaler.std_ == np.std(X, axis=0). I don't see what the usefulness of the second contract would be. Optionally removing these columns would be a nice addition. PR welcome.
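The two contracts can be checked side by side on a small array with one uniform column. This is only an illustrative check, and it assumes a recent scikit-learn release, where the scaling factor is exposed as `scale_` rather than the `std_` attribute discussed in this thread.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Second column is uniform, so its true standard deviation is 0.
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])

scaler = StandardScaler().fit(X)

# Contract 1: transform(X) equals (X - mean_) / scale_.
manual = (X - scaler.mean_) / scaler.scale_
transform_contract_holds = np.allclose(scaler.transform(X), manual)

# Contract 2: the stored scaling factor equals np.std(X, axis=0).
std_contract_holds = np.allclose(scaler.scale_, np.std(X, axis=0))

print("transform contract holds:", transform_contract_holds)
print("std contract holds:", std_contract_holds)
print("scale_:", scaler.scale_)
print("np.std:", np.std(X, axis=0))
```

On the uniform column the stored factor is 1 while `np.std` gives 0, so only the first contract holds.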
By correct I mean reporting the calculated standard deviation (in LaTeX, sorry). I'm new to sklearn, so that PR may be a while.
well, I agree that if we name the attribute
yes, it is #3639
Fixed by #4796.
It should be 0.