8000 BUG: quantile 1 with 'weibull' gives confusing error · Issue #24592 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: quantile 1 with 'weibull' gives confusing error #24592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicki 8000 ng “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matthew-brett opened this issue Aug 30, 2023 · 9 comments · Fixed by #24710
Closed

BUG: quantile 1 with 'weibull' gives confusing error #24592

matthew-brett opened this issue Aug 30, 2023 · 9 comments · Fixed by #24710
Labels

Comments

@matthew-brett
Copy link
Contributor
matthew-brett commented Aug 30, 2023

Describe the issue:

Asking for the 1 quantile with weibull gives a confusing error - see below.

Reproduce the code example:

import numpy as np

np.quantile(np.arange(10), 1, method='weibull')

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [57], line 1
----> 1 np.quantile(np.arange(10), 1, method='weibull')

File <__array_function__ internals>:180, in quantile(*args, **kwargs)

File ~/Library/Python/3.10/lib/python/site-packages/numpy/lib/function_base.py:4412, in quantile(a, q, axis, out, overwrite_input, method, keepdims, interpolation)
   4410 if not _quantile_is_valid(q):
   4411     raise ValueError("Quantiles must be in the range [0, 1]")
-> 4412 return _quantile_unchecked(
   4413     a, q, axis, out, overwrite_input, method, keepdims)

File ~/Library/Python/3.10/lib/python/site-packages/numpy/lib/function_base.py:4424, in _quantile_unchecked(a, q, axis, out, overwrite_input, method, keepdims)
   4416 def _quantile_unchecked(a,
   4417                         q,
   4418                         axis=None,
   (...)
   4421                         method="linear",
   4422                         keepdims=False):
   4423     """Assumes that q is in [0, 1], and is an ndarray"""
-> 4424     r, k = _ureduce(a,
   4425                     func=_quantile_ureduce_func,
   4426                     q=q,
   4427                     axis=axis,
   4428                     out=out,
   4429                     overwrite_input=overwrite_input,
   4430                     method=method)
   4431     if keepdims:
   4432         return r.reshape(q.shape + k)

File ~/Library/Python/3.10/lib/python/site-packages/numpy/lib/function_base.py:3725, in _ureduce(a, func, **kwargs)
   3722 else:
   3723     keepdim = (1,) * a.ndim
-> 3725 r = func(a, **kwargs)
   3726 return r, keepdim

File ~/Library/Python/3.10/lib/python/site-packages/numpy/lib/function_base.py:4593, in _quantile_ureduce_func(a, q, axis, out, overwrite_input, method)
   4591     else:
   4592         arr = a.copy()
-> 4593 result = _quantile(arr,
   4594                    quantiles=q,
   4595                    axis=axis,
   4596                    method=method,
   4597                    out=out)
   4598 return result

File ~/Library/Python/3.10/lib/python/site-packages/numpy/lib/function_base.py:4683, in _quantile(arr, quantiles, axis, method, out)
   4680     slices_having_nans = np.isnan(arr[-1])
   4681 else:
   4682     # cannot contain nan
-> 4683     arr.partition(virtual_indexes.ravel(), axis=0)
   4684     slices_having_nans = np.array(False, dtype=bool)
   4685 result = take(arr, virtual_indexes, axis=0, out=out)

ValueError: kth(=10) out of bounds (10)


### Runtime information:

[1] import sys, numpy; print(numpy.version); print(sys.version)
1.26.0b1
3.10.11 (main, Apr 7 2023, 07:24:53) [Clang 14.0.0 (clang-1400.0.29.202)]
In [2]: print(numpy.show_runtime())
WARNING: threadpoolctl not found in system! Install it by pip install threadpoolctl. Once installed, try np.show_runtime again for more detailed build information
[{'numpy_version': '1.26.0b1',
'python': '3.10.11 (main, Apr 7 2023, 07:24:53) [Clang 14.0.0 '
'(clang-1400.0.29.202)]',
'uname': uname_result(system='Darwin', node='nipraxis.dynevor.org', release='22.6.0', version='Darwin Kernel Version 22.6.0: Wed Jul 5 22:17:35 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T8112', machine='arm64')},
{'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
'found': ['ASIMDHP'],
'not_found': ['ASIMDFHM']}}]


### Context for the issue:

Writing a tutorial on quantiles, and failing to understand them!
@eendebakpt
Copy link
Contributor

The problem is you are passing 1 as an integer argument, while the documentation states q : array_like of float
The output of quantile(x, 1., method='weibull') is correct: 9.

For input zero however, the results are:

In[83]: quantile(x, 0., method='weibull')
Out[83]: 0.0

In[84]: quantile(x, 0, method='weibull')
Out[84]: 9

The fact that we get an incorrect answer when passing 0, I consider to be a bug in the implementation. I will look if this can be easily addressed.

@eendebakpt
Copy link
Contributor

There are a couple more cases affected:

import numpy as np
x=np.arange(10)

for m in ['inverted_cdf', 'averaged_inverted_cdf', 'closest_observation', 
          'interpolated_inverted_cdf', 'hazen', 'weibull', 'linear', 'median_unbiased', 'normal_unbiased']:
    for q in [0,1]:
        try:
            print(f'{m}: q {q}: {np.quantile(x, q, method=m)}')
        except:
            print(f'{m}: could not calculate q={q}')     

Output:

inverted_cdf: q 0: 0
inverted_cdf: q 1: 9
averaged_inverted_cdf: q 0: 9
averaged_inverted_cdf: q 1: 9
closest_observation: q 0: 0
closest_observation: q 1: 9
interpolated_inverted_cdf: q 0: 9
interpolated_inverted_cdf: q 1: 9
hazen: q 0: 0.0
hazen: q 1: 9.0
weibull: q 0: 9
weibull: could not calculate q=1
linear: q 0: 0
linear: q 1: 9
median_unbiased: q 0: 0.0
median_unbiased: q 1: 9.0
normal_unbiased: q 0: 0.0
normal_unbiased: q 1: 9.0

Two simple solutions:

  • If the argument q is of integer type, cast to float
  • If the argument q is of integer type, raise a deprecation warning or exception.

The problems arise in this code path: https://github.com/numpy/numpy/blob/main/numpy/lib/_function_base_impl.py#L4731. I suspect the special code path for integers is there for performance reasons.

@matthew-brett
Copy link
Contributor Author

Is there any reason not to convert to float? I mean - what is the advantage of the error?

@eendebakpt
Copy link
Contributor

Such a change would be backwards incompatible. For example:

from fractions import Fraction
import numpy as np

x=[Fraction(i) for i in range(10)]
print(np.quantile(x, 0)) # works 
print(np.quantile(x, 0.)) # fails

Not sure whether the Fraction is a use case that needs to be supported with float argument q. But there are tests for Fraction, that suggest we should support it in some way:

def test_fraction(self):

Adding @eric-wieser as the author of the tests in the CC.

@matthew-brett
Copy link
Contributor Author

Sorry @eendebakpt - just checking - you mean automatic conversion to float would be backwards incompatible?

Surely the current behavior can't be intentional, of failing with a floating point 0 (but not integer 0) for a fraction value? What would the purpose of such a failure be?

@eendebakpt
Copy link
Contributor

Yes, the automatic conversion would break one of the tests I linked to, so it would be backwards incompatible.

I do agree that the current behavior is strange, and perhaps something we can change. But that is up to a numpy core developer.

@matthew-brett
Copy link
Contributor Author

Just checking - but the point here is that e.g. np.quantile(Fraction(5), 0) gives Fraction(5, 1), but gives an error for np.quantile(Fraction(5), 0.). I'm not sure at all what anyone would use np.quantile(Fraction(5), 0) for. I notice that you get the same answer for np.quantile(Fraction(5), 1). Why would np.quantile(Fraction(5), 0.) be expected to raise an error? Isn't it surprising that you can only use integers for the second argument in that case?

@anirudh-hegde
Copy link
anirudh-hegde commented Oct 13, 2023 8000
import numpy as np

np.quantile(np.arange(10), 1, method='weibull')

I am interested tosolve this can you please assign me this issue

@eendebakpt
Copy link
Contributor
import numpy as np

np.quantile(np.arange(10), 1, method='weibull')

I am interested tosolve this can you please assign me this issue

@Anianonymous There already is a PRs to fix this issue: #24710. You can help to review the PR.

seberg pushed a commit that referenced this issue Feb 11, 2024
Issue #24592 was opened because the np.quantile method gives incorrect answers for the weibull method. The problem:

import numpy as np
x=np.arange(10)
np.quantile(x, 0., method='weibull') # output 0.0, correct
np.quantile(x, 0, method='weibull') # output 9, incorrect!
np.quantile(x, 1, method='weibull') # raises an error
The value of np.quantile(x, q=0, method='weibull') is incorrect for integer inputs. Two options to address this:

Do not allow integer input for np.quantile. The documentation for the probability argument states that "q : array_like of float", so this seems reasonable. But this would be a behaviour change that might have a large impact and there are unit tests that test for integer input as well.

Automatically cast the input argument q to float. Also this would be a behaviour change and break existing tests.

We rejected these two options because of the issues mentioned. Applying either one of the options only for weibull would make the input handling less uniform.

The problem with weibull (and some other methods) is that inside np.lib._function_base_impl._quantile there are two paths: one for integer virtual indices and one for other types of virtual indices (in the code the statement if np.issubdtype(virtual_indexes.dtype, np.integer):. The integer path is only valid for methods returning integer indices in the range [0, size(arr)>. For the weibull method something like q = [0, .5, 1] is cast to float, which results in float virtual_indices (which contains floats with integer values, but that is okay).

We can either:

i) Modify the code in the integer path to handle integers < 0 and > size(arr) -1
ii) Check whether the methods supports integer output, and if not use the alternative path

We opted for the second option.

Fixes #24592
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
0