-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: add additional features to nan_to_num #9355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note that you can already write this as a single expression of comparable length with: np.select([np.isnan(x), np.isposinf(x), np.isneginf(x)], [-999, 999, -999], x) So I'm -0.5 on this, as "There should be one-- and preferably only one --obvious way to do it." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this PR. It is a simple change that does not break backwards compatibility and gives a good amount of added functionality that I personally find myself needing on occasion. There are a few changes that are necessary to make the function work properly. I would also request that you add some tests for the new functionality (especially one for setting the infs to 0.0).
@@ -329,7 +329,10 @@ def _getmaxmin(t): | |||
f = getlimits.finfo(t) | |||
return f.max, f.min | |||
|
|||
def nan_to_num(x, copy=True): | |||
def nan_to_num(x, copy=True, | |||
nan_fill_value=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably just be 0.0, not None
. Using None
adds no value to the code and just adds an unnecessary branch in the function.
nan_fill_value : int, float, optional | ||
Value to be used to fill NaN values. If no value is passed | ||
then NaN values will be replaced with 0.0. | ||
posinf_fill_value : int, float, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I am not super happy with this being the default behavior to begin with, you have to keep it like this to preserve backwards compatibility. At least there is now a way to preserve infinities while removing NaNs.
@@ -392,9 +407,14 @@ def nan_to_num(x, copy=True): | |||
|
|||
x = x[None] if isscalar else x | |||
dest = (x.real, x.imag) if iscomplex else (x,) | |||
nanv = nan_fill_value if nan_fill_value else 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this (the default should be 0.0 as per the comment above).
maxf, minf = _getmaxmin(x.real.dtype) | ||
if posinf_fill_value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be if posinf_fill_value is not None: ...
. Otherwise you remove the user's ability to replace infinities with zeros.
maxf, minf = _getmaxmin(x.real.dtype) | ||
if posinf_fill_value: | ||
maxf = posinf_fill_value | ||
if neginf_fill_value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly test for None, same as for positive infinities.
@eric-wieser While I agree with you that there should only be one obvious way to do this, but I think this PR solves a fundamental issue with an existing function. What might be worth doing is replacing that |
Looking at that for loop -
|
Thanks @eric-wieser and @madphysicist for your time. For me Pd: I'm not so well versed on numpy and for me the |
Since the function already exists, we may as well improve it. @kikocorreoso, don't worry about not being familiar with |
This only creates a trap - all you've solved is controlled casting of # note that many of these casts invoke undefined behaviour, so your results may vary
>>> a = np.array([np.inf, np.finfo(float).max]); a
array([ inf, 1.79769313e+308])
>>> a.astype(np.int32) # undefined behaviour
array([-2147483648, -2147483648])
>>> np.nan_to_num(a).astype(np.int32) # makes no difference on my machine, may on yours
array([-2147483648, -2147483648])
>>> np.nan_to_num(x, posinf_fill_value=-999).astype(np.int32) # with this patch
array([-999, -2147483648]) # only half solves the problem Can you give a more detailed example of the problem you're trying to solve here? |
@eric-wieser. I see your point and I agree with you. Perhaps a better way to do this would be to introduce a |
Needs rebase. |
Hi all, I've been busy the latest days and 3 days travelling this week. I hope to come back to this in some days to provide a new PR with the suggestions and use cases. |
Reviewing this as part of my 10 random issues, as requested in a 3/20/19 request. I like the idea. This just needs to be finished up and rebased. @kikocorreoso are you still available to finish? To finish, @madphysicist's comments need to be addressed. Also, I personally think the kwd names should be shortened to just 'nan', 'posinf', and 'neginf', so you would do |
Hi @ahaldane Yes, I can have a look again on this and try to address all the issues you and @madphysicist commented. I will try to have a look this week. |
ENH: nan_to_num keyword addition (was #9355)
Add some special features to
nan_to_num
in order to allow a user define the values to be used to fill the inexact values.The main idea would be to allow the user to pass some user defined values so somethink like this:
Would be possible without modifying the actual behavior of the original function.
The PR doesn't have tests or other stuff as I prefer to ask before. Also,
[nan/posinf/neginf]_fill_value
is horrible naming so if the PR is interesting you can propose other keyword names.Rationale:
Maybe there exists something in numpy that already does this but I couldn't find it. It is pretty easy to do this with very few code but maybe modifying slightly an already existing function would allow to provide this functionality.
The main idea is to allow to use it in functions like
np.savetxt
or when you do not need so much precisionint64
/float64
and you usex.astype(np.int32)
in a more controlled way specifying the values to use to convert anan
to anint32
.Please, comment in order to work on it or discard it.