ENH: add additional features to nan_to_num #9355

kikocorreoso · 2017-07-03T13:41:53Z

Add some special features to nan_to_num in order to allow a user define the values to be used to fill the inexact values.

The main idea would be to allow the user to pass some user defined values so somethink like this:

>>> import numpy as np
>>> np.set_printoptions(precision=8)
>>> x = np.array([np.inf, -np.inf, np.nan, -128, 128])
>>> np.nan_to_num(x)
array([  1.79769313e+308,  -1.79769313e+308,   0.00000000e+000,
        -1.28000000e+002,   1.28000000e+002])
>>> np.nan_to_num(x, nan_fill_value=-999, posinf_fill_value=-999, neginf_fill_value=-999)
array([  -999.,  -999.,   -999., -128.,   128.])

Would be possible without modifying the actual behavior of the original function.

The PR doesn't have tests or other stuff as I prefer to ask before. Also, [nan/posinf/neginf]_fill_value is horrible naming so if the PR is interesting you can propose other keyword names.

Rationale:

Maybe there exists something in numpy that already does this but I couldn't find it. It is pretty easy to do this with very few code but maybe modifying slightly an already existing function would allow to provide this functionality.

The main idea is to allow to use it in functions like np.savetxt or when you do not need so much precision int64/float64 and you use x.astype(np.int32) in a more controlled way specifying the values to use to convert a nan to an int32.

Please, comment in order to work on it or discard it.

eric-wieser · 2017-07-03T16:06:05Z

Note that you can already write this as a single expression of comparable length with:

np.select([np.isnan(x), np.isposinf(x), np.isneginf(x)], [-999, 999, -999], x)

So I'm -0.5 on this, as "There should be one-- and preferably only one --obvious way to do it."

madphysicist

I like this PR. It is a simple change that does not break backwards compatibility and gives a good amount of added functionality that I personally find myself needing on occasion. There are a few changes that are necessary to make the function work properly. I would also request that you add some tests for the new functionality (especially one for setting the infs to 0.0).

madphysicist · 2017-07-03T15:18:15Z

numpy/lib/type_check.py

@@ -329,7 +329,10 @@ def _getmaxmin(t):
    f = getlimits.finfo(t)
    return f.max, f.min

-              def nan_to_num(x, copy=True):

            
+def nan_to_num(x, copy=True, 
+               nan_fill_value=None, 


This should probably just be 0.0, not None. Using None adds no value to the code and just adds an unnecessary branch in the function.

madphysicist · 2017-07-03T15:23:58Z

numpy/lib/type_check.py

+    nan_fill_value : int, float, optional
+        Value to be used to fill NaN values. If no value is passed 
+        then NaN values will be replaced with 0.0.
+    posinf_fill_value : int, float, optional


While I am not super happy with this being the default behavior to begin with, you have to keep it like this to preserve backwards compatibility. At least there is now a way to preserve infinities while removing NaNs.

madphysicist · 2017-07-03T15:24:30Z

numpy/lib/type_check.py

@@ -392,9 +407,14 @@ def nan_to_num(x, copy=True):

    x = x[None] if isscalar else x
    dest = (x.real, x.imag) if iscomplex else (x,)
+    nanv = nan_fill_value if nan_fill_value else 0.0


Remove this (the default should be 0.0 as per the comment above).

madphysicist · 2017-07-03T15:25:35Z

numpy/lib/type_check.py

    maxf, minf = _getmaxmin(x.real.dtype)
+    if posinf_fill_value:


Should be if posinf_fill_value is not None: .... Otherwise you remove the user's ability to replace infinities with zeros.

madphysicist · 2017-07-03T15:26:02Z

numpy/lib/type_check.py

    maxf, minf = _getmaxmin(x.real.dtype)
+    if posinf_fill_value:
+        maxf = posinf_fill_value
+    if neginf_fill_value:


Explicitly test for None, same as for positive infinities.

madphysicist · 2017-07-03T17:06:39Z

@eric-wieser While I agree with you that there should only be one obvious way to do this, but I think this PR solves a fundamental issue with an existing function. What might be worth doing is replacing that for loop with the select statement you mentioned, but I think that the changes in and of themselves are useful. In particular, this allows you to not replace the infinities while replacing NaNs, as I mentioned in a previous comment.

eric-wieser · 2017-07-03T17:27:36Z

Looking at that for loop - nan_to_num has different and frankly weird behaviour for complex variables. In my mind, if part of a complex number is non-finite, you probably don't want to replace it with anything else. For instance:

>>> np.nan_to_num(no.complex64(np.nan*(1 + 1j)), nan_fill_value=-999)
-999 - 999j

kikocorreoso · 2017-07-03T18:15:53Z

Thanks @eric-wieser and @madphysicist for your time.

For me nan_to_num is very useful but also very limited. The actual state of the PR is just a proof of concept to agree if it is worth to include the extended functionality in numpy.

Pd: I'm not so well versed on numpy and for me the np.select way is not so evident :-(

madphysicist · 2017-07-03T18:18:56Z

Since the function already exists, we may as well improve it. @kikocorreoso, don't worry about not being familiar with select. Take your time to play around with it as long as you need until you are happy with how it works. Also, don't forget to write tests for any changes you make!

eric-wieser · 2017-07-04T10:33:57Z

The main idea is to allow to [...] use x.astype(np.int32) in a more controlled way specifying the values to use to convert a nan to an int32.

This only creates a trap - all you've solved is controlled casting of float to np.int32 for infinities - large numbers still behave as badly as infinity did:

# note that many of these casts invoke undefined behaviour, so your results may vary
>>> a = np.array([np.inf, np.finfo(float).max]); a
array([              inf,   1.79769313e+308])
>>> a.astype(np.int32)  # undefined behaviour
array([-2147483648, -2147483648])
>>> np.nan_to_num(a).astype(np.int32) # makes no difference on my machine, may on yours
array([-2147483648, -2147483648])
>>> np.nan_to_num(x, posinf_fill_value=-999).astype(np.int32) # with this patch
array([-999, -2147483648])   # only half solves the problem

Can you give a more detailed example of the problem you're trying to solve here?

madphysicist · 2017-07-05T15:13:41Z

@eric-wieser. I see your point and I agree with you. Perhaps a better way to do this would be to introduce a dtype parameter, which would default to the type of the input array. You could then spec-out much more clearly how you would want your function to behave for out-of-bounds and NaN elements. I think that would give this function a real reason to exist.

charris · 2017-07-05T21:36:37Z

Needs rebase.

kikocorreoso · 2017-07-11T06:17:03Z

Hi all,

I've been busy the latest days and 3 days travelling this week. I hope to come back to this in some days to provide a new PR with the suggestions and use cases.

ahaldane · 2019-03-23T19:57:03Z

Reviewing this as part of my 10 random issues, as requested in a 3/20/19 request.

I like the idea. This just needs to be finished up and rebased. @kikocorreoso are you still available to finish?

To finish, @madphysicist's comments need to be addressed. Also, I personally think the kwd names should be shortened to just 'nan', 'posinf', and 'neginf', so you would do np.nan_to_num(x, nan=1.0).

kikocorreoso · 2019-03-24T09:31:17Z

Hi @ahaldane

Yes, I can have a look again on this and try to address all the issues you and @madphysicist commented.

I will try to have a look this week.

kikocorreoso · 2019-03-30T15:59:30Z

I created a new PR (#13219) as I was not able to figure out how to commit to this one as my original branch was removed.

Please, let me know if I should do it in a more sane way or if I should close this PR and continue the conversation on #13219 .

ENH: nan_to_num keyword addition (was #9355)

add additional features to nan_to_num

5e1de84

kikocorreoso mentioned this pull request Jul 3, 2017

[ENH] nan_to_num enhancement to fill inexact values with user provided values #9356

Closed

madphysicist suggested changes Jul 3, 2017

View reviewed changes

eric-wieser added 01 - Enhancement component: numpy.lib labels Jul 4, 2017

charris changed the title ~~add additional features to nan_to_num~~ ENH: add additional features to nan_to_num Jul 5, 2017

eric-wieser marked this as a duplicate of #9356 Jul 28, 2017

eric-wieser added the 55 - Needs work label Nov 3, 2017

kikocorreoso pushed a commit to kikocorreoso/numpy that referenced this pull request Mar 30, 2019

address comments on numpy#9355

4b1d5cb

kikocorreoso mentioned this pull request Mar 30, 2019

ENH: nan_to_num keyword addition (was #9355) #13219

Merged

ahaldane added a commit that referenced this pull request Apr 9, 2019

Merge pull request #13219 from kikocorreoso/nan_to_num-enhancement

d6a8cab

ENH: nan_to_num keyword addition (was #9355)

kikocorreoso closed this Apr 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: add additional features to nan_to_num #9355

ENH: add additional features to nan_to_num #9355

ENH: add additional features to nan_to_num #9355

ENH: add additional features to nan_to_num #9355

Conversation

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment