Add sample_weight support to StandardScaler (and friends?) #15601
Comments
Touching the Cython code doesn't look that hard, but it would require deciding whether we want to duplicate the functions to handle the weighted case or reuse the existing code paths (risking a loss in both efficiency and precision).
In #14696 we duplicate. I think it's worth it as long as the function is simple enough?
This would also help a "sparse" solution of #3702.
Actually, I'm using a numpy/scipy based helper in my PR and I think this should be fine as long as we don't need inplace?
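For illustration, here is a minimal sketch of what a numpy/scipy based (non-Cython, non-inplace) weighted mean and variance for sparse input could look like. This is not the helper from the PR mentioned above; the function name `sparse_weighted_mean_and_var` is hypothetical, and it assumes non-negative weights with a positive sum. It uses the E[x^2] - E[x]^2 form, which is simple but can lose precision compared with a two-pass or Cython implementation.

```python
import numpy as np
import scipy.sparse as sp


def sparse_weighted_mean_and_var(X, sample_weight):
    """Per-feature weighted mean and (biased) variance of a sparse matrix.

    Hypothetical helper for illustration only; assumes sample_weight is
    non-negative and sums to a positive value.
    """
    X = sp.csr_matrix(X, dtype=np.float64)
    w = np.asarray(sample_weight, dtype=np.float64)
    w_sum = w.sum()

    # X.T @ w gives the weighted column sums without densifying X.
    mean = np.asarray(X.T @ w).ravel() / w_sum

    # Weighted mean of squares; implicit zeros contribute nothing.
    mean_sq = np.asarray(X.multiply(X).T @ w).ravel() / w_sum

    # E[x^2] - E[x]^2 can go slightly negative from rounding, so clip at 0.
    var = np.maximum(mean_sq - mean ** 2, 0.0)
    return mean, var
```

Nothing here modifies `X` in place, which is the trade-off raised above: it allocates a squared copy of the non-zero entries in exchange for staying in pure numpy/scipy.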
I can take that issue.
Computing a weighted mean and variance (for non-sparse matrices) in partial_fit requires either applying _incremental_mean_and_var twice or developing a new (to my knowledge) method similar to _incremental_mean_and_var, but with sample_weight (like here). I believe the latter option is far better and should be dealt with in another issue. What do you think, guys?
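To make the second option concrete, here is a rough sketch of a weighted incremental update for dense data. It is not scikit-learn's `_incremental_mean_and_var`; the name `weighted_incremental_mean_and_var` and its signature are assumptions for illustration. It merges the running statistics with each batch's weighted statistics using the standard pairwise combination of means and sums of squared deviations, and it assumes each batch has a positive weight sum.

```python
import numpy as np


def weighted_incremental_mean_and_var(X, sample_weight, last_mean, last_var,
                                      last_weight_sum):
    """Update a running weighted mean and (biased) variance with one batch.

    Hypothetical helper for illustration; assumes sample_weight is
    non-negative and sums to a positive value in every batch.
    """
    X = np.asarray(X, dtype=np.float64)
    w = np.asarray(sample_weight, dtype=np.float64)

    # Weighted statistics of the new batch alone.
    batch_weight_sum = w.sum()
    batch_mean = np.average(X, axis=0, weights=w)
    batch_var = np.average((X - batch_mean) ** 2, axis=0, weights=w)

    total = last_weight_sum + batch_weight_sum
    delta = batch_mean - last_mean

    # Combined mean is the weight-proportional blend of the two means.
    mean = last_mean + delta * batch_weight_sum / total

    # Combine weighted sums of squared deviations, plus a correction term
    # for the shift between the old mean and the batch mean.
    m2 = (last_var * last_weight_sum
          + batch_var * batch_weight_sum
          + delta ** 2 * last_weight_sum * batch_weight_sum / total)
    return mean, m2 / total, total
```

Starting from `last_mean = 0`, `last_var = 0` and `last_weight_sum = 0.0`, calling this once per batch should reproduce the weighted mean and variance of the concatenated data up to floating-point error.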
I'm okay with the latter proposal, but I'd need to see a pull request to be confident.
I think we should just use a non-Cython variant as I did here:
But we want StandardScaler to compute stats in an incremental (batch) manner, which is, I believe, quite different.
Why would this be useful for SS? How are the weights used?
Via #15583: it would be nice to have sample_weight support in StandardScaler. I don't see how to do that for sparse data right now, though (without touching the Cython code).