8000 ENH: add additional features to nan_to_num by kikocorreoso · Pull Request #9355 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: add additional features to nan_to_num #9355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

ENH: add additional features to nan_to_num #9355

wants to merge 1 commit into from

Conversation

kikocorreoso
Copy link
Contributor
@kikocorreoso kikocorreoso commented Jul 3, 2017

Add some special features to nan_to_num in order to allow a user define the values to be used to fill the inexact values.

The main idea would be to allow the user to pass some user defined values so somethink like this:

>>> import numpy as np
>>> np.set_printoptions(precision=8)
>>> x = np.array([np.inf, -np.inf, np.nan, -128, 128])
>>> np.nan_to_num(x)
array([  1.79769313e+308,  -1.79769313e+308,   0.00000000e+000,
        -1.28000000e+002,   1.28000000e+002])
>>> np.nan_to_num(x, nan_fill_value=-999, posinf_fill_value=-999, neginf_fill_value=-999)
array([  -999.,  -999.,   -999., -128.,   128.])

Would be possible without modifying the actual behavior of the original function.

The PR doesn't have tests or other stuff as I prefer to ask before. Also, [nan/posinf/neginf]_fill_value is horrible naming so if the PR is interesting you can propose other keyword names.

Rationale:

Maybe there exists something in numpy that already does this but I couldn't find it. It is pretty easy to do this with very few code but maybe modifying slightly an already existing function would allow to provide this functionality.

The main idea is to allow to use it in functions like np.savetxt or when you do not need so much precision int64/float64 and you use x.astype(np.int32) in a more controlled way specifying the values to use to convert a nan to an int32.

Please, comment in order to work on it or discard it.

@eric-wieser
Copy link
Member
eric-wieser commented Jul 3, 2017

Note that you can already write this as a single expression of comparable length with:

np.select([np.isnan(x), np.isposinf(x), np.isneginf(x)], [-999, 999, -999], x)

So I'm -0.5 on this, as "There should be one-- and preferably only one --obvious way to do it."

Copy link
Contributor
@madphysicist madphysicist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this PR. It is a simple change that does not break backwards compatibility and gives a good amount of added functionality that I personally find myself needing on occasion. There are a few changes that are necessary to make the function work properly. I would also request that you add some tests for the new functionality (especially one for setting the infs to 0.0).

@@ -329,7 +329,10 @@ def _getmaxmin(t):
f = getlimits.finfo(t)
return f.max, f.min

def nan_to_num(x, copy=True):
def nan_to_num(x, copy=True,
nan_fill_value=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably just be 0.0, not None. Using None adds no value to the code and just adds an unnecessary branch in the function.

nan_fill_value : int, float, optional
Value to be used to fill NaN values. If no value is passed
then NaN values will be replaced with 0.0.
posinf_fill_value : int, float, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I am not super happy with this being the default behavior to begin with, you have to keep it like this to preserve backwards compatibility. At least there is now a way to preserve infinities while removing NaNs.

@@ -392,9 +407,14 @@ def nan_to_num(x, copy=True):

x = x[None] if isscalar else x
dest = (x.real, x.imag) if iscomplex else (x,)
nanv = nan_fill_value if nan_fill_value else 0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this (the default should be 0.0 as per the comment above).

maxf, minf = _getmaxmin(x.real.dtype)
if posinf_fill_value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be if posinf_fill_value is not None: .... Otherwise you remove the user's ability to replace infinities with zeros.

maxf, minf = _getmaxmin(x.real.dtype)
if posinf_fill_value:
maxf = posinf_fill_value
if neginf_fill_value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly test for None, same as for positive infinities.

@madphysicist
Copy link
Contributor

@eric-wieser While I agree with you that there should only be one obvious way to do this, but I think this PR solves a fundamental issue with an existing function. What might be worth doing is replacing that for loop with the select statement you mentioned, but I think that the changes in and of themselves are useful. In particular, this allows you to not replace the infinities while replacing NaNs, as I mentioned in a previous comment.

@eric-wieser
Copy link
Member
eric-wieser commented Jul 3, 2017

Looking at that for loop - nan_to_num has different and frankly weird behaviour for complex variables. In my mind, if part of a complex number is non-finite, you probably don't want to replace it with anything else. For instance:

>>> np.nan_to_num(no.complex64(np.nan*(1 + 1j)), nan_fill_value=-999)
-999 - 999j

@kikocorreoso
Copy link
Contributor Author

Thanks @eric-wieser and @madphysicist for your time.

For me nan_to_num is very useful but also very limited. The actual state of the PR is just a proof of concept to agree if it is worth to include the extended functionality in numpy.

Pd: I'm not so well versed on numpy and for me the np.select way is not so evident :-(

@madphysicist
Copy link
Contributor

Since the function already exists, we may as well improve it. @kikocorreoso, don't worry about not being familiar with select. Take your time to play around with it as long as you need until you are happy with how it works. Also, don't forget to write tests for any changes you make!

@eric-wieser
Copy link
Member
eric-wieser commented Jul 4, 2017

The main idea is to allow to [...] use x.astype(np.int32) in a more controlled way specifying the values to use to convert a nan to an int32.

This only creates a trap - all you've solved is controlled casting of float to np.int32 for infinities - large numbers still behave as badly as infinity did:

# note that many of these casts invoke undefined behaviour, so your results may vary
>>> a = np.array([np.inf, np.finfo(float).max]); a
array([              inf,   1.79769313e+308])
>>> a.astype(np.int32)  # undefined behaviour
array([-2147483648, -2147483648])
>>> np.nan_to_num(a).astype(np.int32) # makes no difference on my machine, may on yours
array([-2147483648, -2147483648])
>>> np.nan_to_num(x, posinf_fill_value=-999).astype(np.int32) # with this patch
array([-999, -2147483648])   # only half solves the problem

Can you give a more detailed example of the problem you're trying to solve here?

@charris charris changed the title add additional features to nan_to_num ENH: add additional features to nan_to_num Jul 5, 2017
@madphysicist
Copy link
Contributor

@eric-wieser. I see your point and I agree with you. Perhaps a better way to do this would be to introduce a dtype parameter, which would default to the type of the input array. You could then spec-out much more clearly how you would want your function to behave for out-of-bounds and NaN elements. I think that would give this function a real reason to exist.

@charris
Copy link
Member
charris commented Jul 5, 2017

Needs rebase.

@kikocorreoso
Copy link
Contributor Author

Hi all,

I've been busy the latest days and 3 days travelling this week. I hope to come back to this in some days to provide a new PR with the suggestions and use cases.

@eric-wieser eric-wieser marked this as a duplicate of #9356 Jul 28, 2017
@ahaldane
Copy link
Member

Reviewing this as part of my 10 random issues, as requested in a 3/20/19 request.

I like the idea. This just needs to be finished up and rebased. @kikocorreoso are you still available to finish?

To finish, @madphysicist's comments need to be addressed. Also, I personally think the kwd names should be shortened to just 'nan', 'posinf', and 'neginf', so you would do np.nan_to_num(x, nan=1.0).

@kikocorreoso
Copy link
Contributor Author

Hi @ahaldane

Yes, I can have a look again on this and try to address all the issues you and @madphysicist commented.

I will try to have a look this week.

kikocorreoso pushed a commit to kikocorreoso/numpy that referenced this pull request Mar 30, 2019
@kikocorreoso
Copy link
Contributor Author

I created a new PR (#13219) as I was not able to figure out how to commit to this one as my original branch was removed.

Please, let me know if I should do it in a more sane way or if I should close this PR and continue the conversation on #13219 .

ahaldane added a commit that referenced this pull request Apr 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0