ENH Improves memory usage for standard scalar by thomasjpfan · Pull Request #20652 · scikit-learn/scikit-learn · GitHub

ENH Improves memory usage for standard scalar #20652


Merged

Member
@thomasjpfan thomasjpfan commented Aug 1, 2021

Reference Issues/PRs

Closes #5651

What does this implement/fix? Explain your changes.

This PR:

  1. Chooses more carefully between np.mean and np.nanmean. np.nanmean always computes the NaN mask, even when the data contains no NaN.
  2. Computes the correction first and then the new normalized variance, reusing a single temporary variable.
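The two points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the function name `mean_and_var` is hypothetical, the NaN check via the column sums is one possible way to pick the reduction, and `X` is assumed to be a 2D float array in which no column is entirely NaN.

```python
import numpy as np


def mean_and_var(X):
    """Column-wise mean and (biased) variance of a 2D float array.

    Hypothetical helper for illustration only.
    """
    # np.sum propagates NaN, so a NaN in the column totals means at least
    # one value is missing and the NaN-aware reductions are needed.
    col_sums = np.sum(X, axis=0)
    has_nan = np.isnan(col_sums).any()
    sum_op = np.nansum if has_nan else np.sum

    if has_nan:
        col_sums = np.nansum(X, axis=0)
        counts = X.shape[0] - np.isnan(X).sum(axis=0)
    else:
        counts = np.full(X.shape[1], X.shape[0])
    mean = col_sums / counts

    # One temporary with the shape of X, reused in place.
    temp = X - mean
    correction = sum_op(temp, axis=0)   # compensates rounding in the mean
    temp **= 2                          # square in place, no second buffer
    var = sum_op(temp, axis=0)
    var -= correction ** 2 / counts
    var /= counts
    return mean, var
```

On NaN-free input the NaN-aware path is skipped entirely, and the only array of the same size as `X` that gets allocated is `temp`.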

Any other comments?

With the script in #5651:

import numpy as np
from sklearn.preprocessing import StandardScaler


shape = (2 ** 16, 2 ** 12)
big = np.random.RandomState(0).uniform(-1, 1, size=shape)
scaler = StandardScaler().fit(big)

And running on main and this PR:

mprof run standard_scaler_bench.py

I get the following plot:

[Figure: just_fit]

The black line is this PR and the blue line is main. This PR reduces both the compute time and the memory usage.
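For a quick check that does not need mprof, the extra memory used by np.nanmean compared with np.mean can also be observed with the standard-library tracemalloc module. This is only a rough sketch: the helper name `peak_bytes` and the small array shape are assumptions made for illustration, and the absolute numbers depend on the NumPy version.

```python
import tracemalloc

import numpy as np


def peak_bytes(func, *args, **kwargs):
    """Return the peak traced memory (in bytes) while running func."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# X is allocated before tracing starts, so it is not counted in the peaks.
X = np.random.RandomState(0).uniform(-1, 1, size=(1000, 50))

peak_mean = peak_bytes(np.mean, X, axis=0)
peak_nanmean = peak_bytes(np.nanmean, X, axis=0)

print(f"np.mean peak:    {peak_mean} bytes")
print(f"np.nanmean peak: {peak_nanmean} bytes")
```

Unlike mprof, this only reports allocations that are visible to tracemalloc, so it is better suited to comparing two code paths than to measuring total process memory.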

CC @jeremiedbb

Member
@glemaitre glemaitre left a comment

LGTM. Thanks for looking into this issue.

Member
@jnothman jnothman left a comment

Great to have this fixed. I find it strange that nansum etc. do not have a more efficient implementation.

@glemaitre glemaitre merged commit f812e2a into scikit-learn:main Aug 5, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021

Successfully merging this pull request may close these issues.

Is it possible to reduce StandardScaler.fit() memory consumption?