-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
MAINT: stats.DiscreteDistribution: fix inversion methods #22970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time looking through this and I am happy that this matches what we discussed yesterday
Just a merge conflict to resolve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I rebase this branch locally and resolve merge conflicts I consistently see a single test failure locally, reproducible via spin test -j 12 -t scipy/stats/tests/test_continuous.py::TestMakeDistribution::test_rv_generic --durations=10
.
___________________________________ TestMakeDistribution.test_rv_generic[128-distdata128] _______________________________________
[gw1] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_scipy_dev/bin/python3
self = <scipy.stats.tests.test_continuous.TestMakeDistribution object at 0x11b53ec60>, i = 128
distdata = ['nchypergeom_fisher', (140, 80, 60, 0.5)]
@pytest.mark.parametrize('i, distdata', enumerate(distcont + distdiscrete))
def test_rv_generic(self, i, distdata):
distname = distdata[0]
slow = {'argus', 'exponpow', 'exponweib', 'genexpon', 'gompertz', 'halfgennorm',
'johnsonsb', 'kappa4', 'ksone', 'kstwo', 'kstwobign', 'norminvgauss',
'powerlognorm', 'powernorm', 'recipinvgauss', 'studentized_range',
'vonmises_line', # continuous
'skellam'} # discrete
if not int(os.environ.get('SCIPY_XSLOW', '0')) and distname in slow:
pytest.skip('Skipping as XSLOW')
if distname in { # skip these distributions
'levy_stable', # private methods seem to require >= 1d args
'vonmises', # circular distribution; shouldn't work
'poisson_binom', # vector shape parameter
'hypergeom', # distribution functions need interpolation
}:
return
# skip single test, mostly due to slight disagreement
custom_tolerances = {'ksone': 1e-5, 'kstwo': 1e-5} # discontinuous PDF
skip_entropy = {'kstwobign', 'pearson3'} # tolerance issue
skip_skewness = {'exponpow', 'ksone', 'nchypergeom_wallenius'} # tolerance
skip_kurtosis = {'chi', 'exponpow', 'invgamma', # tolerance
'johnsonsb', 'ksone', 'kstwo', # tolerance
'nchypergeom_wallenius'} # tolerance
skip_logccdf = {'arcsine', 'skewcauchy', 'trapezoid', 'triang'} # tolerance
skip_raw = {2: {'alpha', 'foldcauchy', 'halfcauchy', 'levy', 'levy_l'},
3: {'pareto'}, # stats.pareto is just wrong
4: {'invgamma'}} # tolerance issue
skip_standardized = {'exponpow', 'ksone', 'nchypergeom_wallenius'} # tolerances
dist = getattr(stats, distname)
params = dict(zip(dist.shapes.split(', '), distdata[1])) if dist.shapes else {}
rng = np.random.default_rng(7548723590230982)
CustomDistribution = stats.make_distribution(dist)
X = CustomDistribution(**params)
Y = dist(**params)
x = X.sample(shape=10, rng=rng)
p = X.cdf(x)
rtol = custom_tolerances.get(distname, 1e-7)
atol = 1e-12
with np.errstate(divide='ignore', invalid='ignore'):
m, v, s, k = Y.stats('mvsk')
assert_allclose(X.support(), Y.support())
if distname not in skip_entropy:
assert_allclose(X.entropy(), Y.entropy(), rtol=rtol)
if isinstance(Y, stats.rv_discrete):
# some continuous distributions have trouble with `logentropy` because
# it uses complex numbers
assert_allclose(np.exp(X.logentropy()), Y.entropy(), rtol=rtol)
assert_allclose(X.median(), Y.median(), rtol=rtol)
assert_allclose(X.mean(), m, rtol=rtol, atol=atol)
assert_allclose(X.variance(), v, rtol=rtol, atol=atol)
if distname not in skip_skewness:
assert_allclose(X.skewness(), s, rtol=rtol, atol=atol)
if distname not in skip_kurtosis:
assert_allclose(X.kurtosis(convention='excess'), k,
rtol=rtol, atol=atol)
if isinstance(dist, stats.rv_continuous):
assert_allclose(X.logpdf(x), Y.logpdf(x), rtol=rtol)
assert_allclose(X.pdf(x), Y.pdf(x), rtol=rtol)
else:
assert_allclose(X.logpmf(x), Y.logpmf(x), rtol=rtol)
assert_allclose(X.pmf(x), Y.pmf(x), rtol=rtol)
assert_allclose(X.logcdf(x), Y.logcdf(x), rtol=rtol)
assert_allclose(X.cdf(x), Y.cdf(x), rtol=rtol)
if distname not in skip_logccdf:
assert_allclose(X.logccdf(x), Y.logsf(x), rtol=rtol)
assert_allclose(X.ccdf(x), Y.sf(x), rtol=rtol)
# old infrastructure convention for ppf(p=0) and isf(p=1) is different than
# new infrastructure. Adjust reference values accordingly.
a, _ = Y.support()
ref_ppf = Y.ppf(p)
ref_ppf[p == 0] = a
ref_isf = Y.isf(p)
ref_isf[p == 1] = a
> assert_allclose(X.icdf(p), ref_ppf, rtol=rtol)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E
E Mismatched elements: 6 / 10 (60%)
E Max absolute difference among violations: 1.
E Max relative difference among violations: 0.03448276
E ACTUAL: array([30., 27., 30., 31., 28., 29., 29., 29., 32., 26.])
E DESIRED: array([31., 27., 31., 31., 29., 30., 30., 30., 32., 26.])
CustomDistribution = <class 'scipy.stats._distribution_infrastructure._make_distribution_rv_generic.<locals>.CustomDistribution'>
X = NoncentralHypergeometricFisher(M=np.float64(140.0), N=np.float64(60.0), n=np.float64(80.0), odds=np.float64(0.5))
Y = <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x11b33f470>
_ = np.int64(60)
a = np.int64(0)
atol = 1e-12
custom_tolerances = {'ksone': 1e-05, 'kstwo': 1e-05}
dist = <scipy.stats._discrete_distns.nchypergeom_fisher_gen object at 0x11a73d430>
distdata = ['nchypergeom_fisher', (140, 80, 60, 0.5)]
distname = 'nchypergeom_fisher'
i = 128
k = np.float64(-0.015031959595980027)
m = np.float64(28.44642744936175)
p = array([0.76340047, 0.37233802, 0.76340047, 0.85618448, 0.50915771,
0.64451321, 0.64451321, 0.64451321, 0.92056245, 0.24967749])
params = {'M': 140, 'N': 60, 'n': 80, 'odds': 0.5}
ref_isf = array([26., 29., 26., 25., 28., 27., 27., 27., 24., 30.])
ref_ppf = array([31., 27., 31., 31., 29., 30., 30., 30., 32., 26.])
rng = Generator(PCG64) at 0x10C93A340
rtol = 1e-07
s = np.float64(0.025421969966314347)
self = <scipy.stats.tests.test_continuous.TestMakeDistribution object at 0x11b53ec60>
skip_entropy = {'kstwobign', 'pearson3'}
skip_kurtosis = {'chi', 'exponpow', 'invgamma', 'johnsonsb', 'ksone', 'kstwo', ...}
skip_logccdf = {'arcsine', 'skewcauchy', 'trapezoid', 'triang'}
skip_raw = {2: {'alpha', 'foldcauchy', 'halfcauchy', 'levy', 'levy_l'}, 3: {'pareto'}, 4: {'invgamma'}}
skip_skewness = {'exponpow', 'ksone', 'nchypergeom_wallenius'}
skip_standardized = {'exponpow', 'ksone', 'nchypergeom_wallenius'}
slow = {'argus', 'exponpow', 'exponweib', 'genexpon', 'gompertz', 'halfgennorm', ...}
v = np.float64(8.300922457539917)
x = array([30., 27., 30., 31., 28., 29., 29., 29., 32., 26.])
scipy/stats/tests/test_continuous.py:1189: AssertionError
The 3 slowest test durations on that call are also all > 12 seconds on a fairly recent M3 Mac:
======================================================= slowest 10 durations =======================================================
25.08s call build-install/usr/lib/python3.12/site-packages/scipy/stats/tests/test_continuous.py::TestMakeDistribution::test_rv_generic[120-distdata120]
13.45s call build-install/usr/lib/python3.12/site-packages/scipy/stats/tests/test_continuous.py::TestMakeDistribution::test_rv_generic[138-distdata138]
12.89s call build-install/usr/lib/python3.12/site-packages/scipy/stats/tests/test_continuous.py::TestMakeDistribution::test_rv_generic[130-distdata130]
On latest main
, the same test incantation has no test slower than 0.20 seconds.
Matt and I are still in the same physical location--I'll see if we can chat after supper today.
Thanks @tylerjereddy! I skipped the slow tests and prevented the failing |
Jake wasn't aware of a few other recent changes, but I'll consider his approving review still valid given the minor review commits that Matt applied after my review and the passing CI. I poked around locally with my above incantation and it doesn't seem to cause problems with failure or performance of tests now. Thanks Matt |
Reference issue
gh-22312
What does this implement/fix?
gh-22312 noted a problem with
method='inversion'
for discrete distributions when the correct inverse is the left end of the support. This fixes the bug.Additional information
It may still be slow because it waits for bracket finding to fail. An enhancement (which I might include before merge) would be to skip inversion when the correct result is the left end of the support.
@steppi were there any other tests we skipped due to this issue that we should unskip?