8000 PERF: Prevent using copy in astype if not needed by jrings · Pull Request #4575 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

PERF: Prevent using copy in astype if not needed #4575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

jrings
Copy link
@jrings jrings commented Apr 11, 2015

I went over all instances of astype in the code and, if a copy was not needed because it was assigned back to the same variable, used utils.fixes.astype with copy=False.

Resolves #4573

8000
@jrings jrings closed this Apr 11, 2015
@jrings jrings reopened this Apr 11, 2015
@jnothman
Copy link
Member

Sometimes the copy will have been intentional. Should we merely hope that
Travis catches these?

On 12 April 2015 at 08:14, Joerg Rings notifications@github.com wrote:

Reopened #4575 #4575.


Reply to this email directly or view it on GitHub
#4575 (comment).

@jrings
Copy link
Author
jrings commented Apr 12, 2015

So far, the only instances I replaced were those where a variable was
reassigned to the same variable - so not using a copy should be fine,
right?
On Apr 11, 2015 6:00 PM, "jnothman" notifications@github.com wrote:

Sometimes the copy will have been intentional. Should we merely hope that
Travis catches these?

On 12 April 2015 at 08:14, Joerg Rings notifications@github.com wrote:

Reopened #4575 #4575.


Reply to this email directly or view it on GitHub
<#4575 (comment)
.


Reply to this email directly or view it on GitHub
#4575 (comment)
.

@jnothman
Copy link
Member

Depends if it is then modified (and not part of the output, which is almost
certainly tested, while modifications to the input are not always tested)

On 12 April 2015 at 12:48, Joerg Rings notifications@github.com wrote:

So far, the only instances I replaced were those where a variable was
reassigned to the same variable - so not using a copy should be fine,
right?
On Apr 11, 2015 6:00 PM, "jnothman" notifications@github.com wrote:

Sometimes the copy will have been intentional. Should we merely hope that
Travis catches these?

On 12 April 2015 at 08:14, Joerg Rings notifications@github.com wrote:

Reopened #4575 <#4575
.


Reply to this email directly or view it on GitHub
<
#4575 (comment)
.


Reply to this email directly or view it on GitHub
<
#4575 (comment)

.


Reply to this email directly or view it on GitHub
#4575 (comment)
.

@ogrisel
Copy link
Member
ogrisel commented Apr 12, 2015

I think that when the copy is intended, it should be made explicit with:

from sklearn.utils.fixes import astype

...

X = astype(X, dtype, copy=True)

@jnothman
Copy link
Member

+1 for always using the version from fixes.

On 13 April 2015 at 04:36, Olivier Grisel notifications@github.com wrote:

I think that when the copy is intended, it should be made explicit with:

from sklearn.utils.fixes import astype

...

X = astype(X, dtype, copy=True)


Reply to this email directly or view it on GitHub
#4575 (comment)
.

@amueller
Copy link
Member

@jrings why would assigning it to the same variable be related to whether a copy was intentional? The question is whether we overwrite data that the user passed in and that the user might expect to be usable afterwards.

@amueller
Copy link
Member

Fails on python3 on sparse matrices?

@ogrisel
Copy link
Member
ogrisel commented Apr 13, 2015

It's not related to Python 3, it fails because the last job is on recent versions of numpy were copy=False is supported and therefore our backport breaks.

@amueller
Copy link
Member

Ahhh... right, makes sense.

@ogrisel
Copy link
Member
ogrisel commented Apr 24, 2015

So based on the discussion from #4585, we should:

  • use sklearn.utils.fixes's astype(X, dtype=the_expected_dtype) whenever X is known to be a numpy array,
  • use our check_array(X, dtype=the_expected_dtype, accept_sparse=True) whenever X can also be a scipy sparse matrix (check the docstring of the function to decide).

@jrings are you interested in tacking this or would you like someone else to take over this PR?

@ogrisel
Copy link
Member
ogrisel commented Apr 24, 2015

BTW the above means that we should extend check_array to add support for accept_sparse=True to let sparse matrices go through without mentioning the kinds of sparse matrices we accept (pass-through if the dtype is ok).

@ogrisel
Copy link
Member
ogrisel commented Apr 24, 2015

If @jrings is not available for finishing this maybe @sseg who was involved in #4585 might be interested.

@jrings
Copy link
Author
jrings commented Apr 24, 2015

I'll take a look on the weekend, I should have time, if not I'll let you
know!

On Fri, Apr 24, 2015 at 11:02 AM, Olivier Grisel notifications@github.com
wrote:

If @jrings https://github.com/jrings is not available for finishing
this maybe @sseg https://github.com/sseg who was involved in #4585
#4585 might be
interested.


Reply to this email directly or view it on GitHub
#4575 (comment)
.

@ogrisel
Copy link
Member
ogrisel commented Apr 24, 2015

Thanks.

@jrings
Copy link
Author
jrings commented Apr 25, 2015

I'm just gonna start over from a fresh branch off master and do a separate PR since I messed up the first attempts...is that ok?

@jrings
Copy link
Author
jrings commented Apr 26, 2015

Actually, I don't think I'll be able to find time, a had a quick look yesterday but won't have more time and have to work on something else today. So if someone wants to jump in?

@agramfort
Copy link
Member

taken over by #4645 closing.

@agramfort agramfort closed this Apr 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Audit our usage of numpy.astype to remove unecessary memory copies
5 participants
0