BUG: Fix bug in Runs.runs_test for the case of a single run yielding … by shusheer · Pull Request #9524 · statsmodels/statsmodels · GitHub

BUG: Fix bug in Runs.runs_test for the case of a single run yielding … #9524


Merged
merged 2 commits into statsmodels:main on Jul 17, 2025

Conversation

@shusheer shusheer commented Mar 5, 2025

…incorrect zscore and pvalue

@josef-pkt
Member

Thanks for pointing this out.

what is the theoretical justification for the chosen pvalue?

I need to check this.
For example, for the one-sample binomial test there are different options for the case where all observations are identical (Wald, score, exact, ...). Those result in either NaNs or values that differ by test (or confidence-interval) method.
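For illustration of that point (a sketch, not code from this PR), the confidence-interval methods already available in statsmodels.stats.proportion disagree when all observations are identical:

```python
from statsmodels.stats.proportion import proportion_confint

count, nobs = 20, 20  # degenerate case: every observation is a success
for method in ("normal", "wilson", "beta"):  # Wald, score, exact (Clopper-Pearson)
    print(method, proportion_confint(count, nobs, alpha=0.05, method=method))

# The Wald ("normal") interval collapses to (1.0, 1.0) because the estimated
# standard error is zero, while the score and exact intervals still give
# informative lower bounds.
```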

@shusheer
Author
shusheer commented Mar 5, 2025

what is the theoretical justification for the chosen pvalue?

I don't choose a fixed p-value; I calculate the probability that a sample of size N is all the same, which is 2^(1-N). I then use this to calculate a z-score assuming a normal approximation. The one bit I am slightly unsure of is that, because we have two possible states (above vs. below the threshold used for calculating the run value), I think the probability might need to be doubled. In any case, the z-score is negative because the run is longer than expected by chance.
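A minimal sketch of that calculation under the stated assumptions (hypothetical helper name, not necessarily the code in this PR): with all N observations on the same side of the cutoff, the fair-coin probability is 2^(1-N), and a z-score can be backed out of it with the normal quantile function.

```python
from scipy import stats

def single_run_zscore_pvalue(nobs):
    """Sketch for the degenerate single-run case: all nobs observations
    fall on the same side of the cutoff.

    Under an i.i.d. fair-coin null, P(all on one side) = 2 * 0.5**nobs = 2**(1 - nobs).
    """
    pvalue = 2.0 ** (1 - nobs)
    # One way to back out a z-score under a normal approximation; it is
    # negative because one run is fewer runs than expected under the null.
    zscore = stats.norm.ppf(pvalue)
    return zscore, pvalue

print(single_run_zscore_pvalue(20))  # -> roughly (-4.6, 1.9e-06)
```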

@josef-pkt
Member

I kind of understand the argument, and it looks reasonable.

However I'm a bit distracted these days (home renovation) and might need some time to figure out the details.
One possibility is to run some Monte Carlo simulations to see whether the size (rejection rate) is correct in this case.
I guess it will only be relevant in small samples.

But it looks intuitive that we should reject in large samples if all outcomes are the same, i.e. only one run is observed.

(A bit of caution: in the scipy.stats t-test we thought we had a justification for a specific non-NaN result when the variance is zero, but there are different ways of approaching a zero-variance limit, so scipy.stats switched back to returning NaNs.
I don't think this applies in this case.)
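One possible version of that simulation (a sketch, only illustrative): draw i.i.d. Bernoulli(0.5) samples under the null, apply runstest_1samp, and look at the rejection rate at a nominal level.

```python
import numpy as np
from statsmodels.sandbox.stats.runs import runstest_1samp

rng = np.random.default_rng(12345)
nobs, nrep, alpha = 20, 10_000, 0.05

reject = 0
for _ in range(nrep):
    x = rng.integers(0, 2, size=nobs)  # i.i.d. Bernoulli(0.5) under the null
    _, pvalue = runstest_1samp(x, cutoff=0.5, correction=True)
    reject += pvalue < alpha

# With small nobs the statistic is discrete, so the empirical size will not
# hit alpha exactly; the point is to check it stays in a reasonable range,
# including the rare replications where all outcomes are identical.
print("empirical rejection rate:", reject / nrep)
```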

@josef-pkt
Member
josef-pkt commented Mar 6, 2025

aside:
https://www.statsmodels.org/dev/generated/statsmodels.sandbox.stats.runs.runstest_1samp.html
The docstring does not specify what the null and alternative hypotheses are, nor under what assumptions the p-value is computed.
My guess is that the null assumption is i.i.d. Bernoulli with p=0.5. Then we would reject if either p != 0.5 or the observations are not sequentially independent.

Based on the comments and notes the functions were largely based on SAS documentation.

(AFAIR I had given up on "exact" distributions for runs as too complicated and not worth the effort. Some results are still in the sandbox module.)

another aside
"This tests whether all groups have the same fraction of observations above the median"
in https://www.statsmodels.org/dev/generated/statsmodels.sandbox.stats.runs.median_test_ksample.html
sounds strange; half of the observations are above the median.
This should mention the pooled median to be more explicit.

(It looks like I have not worked on this since 2013. Another module to see what can be moved out of the sandbox and what is unfinished experimental code.)

update
SAS has a runs test in the AUTOREG procedure
https://go.documentation.sas.com/doc/en/pgmsascdc/v_060/etsug/etsug_autoreg_details27.htm
but not much information on it except the formula.
There is no mention of the case where all events are identical; the formula has a division by zero there.

update
The module docstring mentions the NPAR procedure in SAS, but SAS does not have that procedure.
SPSS has NPAR, which also includes the runs test, the one-sample test, and the Wald-Wolfowitz two-sample test.
The SPSS algorithm also applies the 0.5 small-sample correction for nobs < 50.

It is likely that I mixed up the references when I wrote the docstrings.
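For reference, the textbook Wald-Wolfowitz statistic with that 0.5 small-sample correction looks roughly like the sketch below (the standard formula, not necessarily the exact statsmodels implementation); it also makes the division by zero in the all-identical case visible, because n1 or n2 is zero there and the variance vanishes.

```python
import numpy as np
from scipy import stats

def runs_z(n1, n2, nruns, correction=True):
    """Textbook runs-test z-statistic: n1/n2 observations above/below the
    cutoff, nruns observed runs (a sketch, not the statsmodels code)."""
    n = n1 + n2
    mu = 2.0 * n1 * n2 / n + 1.0                                     # expected runs
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))   # variance of runs
    diff = nruns - mu
    if correction and n < 50:
        # SPSS-style continuity correction for small samples
        diff = np.sign(diff) * max(abs(diff) - 0.5, 0.0)
    # If n1 == 0 or n2 == 0 (a single run), var is 0 and this divides by zero.
    z = diff / np.sqrt(var)
    return z, 2 * stats.norm.sf(abs(z))

print(runs_z(12, 8, 6))  # many fewer runs than expected -> negative z
```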

@shusheer
Author
shusheer commented Mar 6, 2025

There are a few test failures, but these seem entirely unrelated to the runs test, so I presume they reflect other issues in the wider package. I have therefore not reviewed them in any detail.

FAILED statsmodels/stats/tests/test_deltacov.py::TestDeltacovOLS::test_ttest

FAILED statsmodels/tsa/forecasting/tests/test_theta.py::test_auto - Assertion...

FAILED statsmodels/regression/tests/test_regression.py::test_summary_as_latex
FAILED statsmodels/distributions/copula/tests/test_copula.py::TestGaussianCopula::test_cdf
FAILED statsmodels/regression/tests/test_theil.py::TestTheilTextile::test_summary
FAILED statsmodels/iolib/tests/test_summary2.py::test_ols_summary_rsquared_label
FAILED statsmodels/sandbox/stats/tests/test_multicomp.py::test_tukey_pvalues
ERROR statsmodels/distributions/tests/test_mixture.py
ERROR statsmodels/sandbox/distributions/tests/test_extras.py
ERROR statsmodels/sandbox/distributions/tests/test_multivariate.py
ERROR statsmodels/sandbox/distributions/tests/test_norm_expan.py
ERROR statsmodels/sandbox/distributions/tests/test_transf.py

@josef-pkt
Member

The unit test failures are unrelated and can be ignored here.
The larger last group is most likely due to some recent scipy changes to the multivariate normal cdf.

@josef-pkt
Member

Out of curiosity, how did you run into this problem with only one run?
If the threshold is the mean or median, then there have to be at least 2 runs (in the case of a trend or level shift).

@shusheer
Author
shusheer commented Mar 7, 2025

Out of curiosity, how did you run into this problem with only one run? If the threshold is the mean or median, then there have to be at least 2 runs (in the case of a trend or level shift).

I'm a very minor author on the CSAPS package https://github.com/espdev/csaps which is a smoothing cubic spline package. When doing cubic spline smoothing, the question naturally arises as to "what is the correct level of smoothing" for arbitrary data, such that you can implement auto-smoothing.

My solution is to maximise the probability from the runs test, thus avoiding the problem of outliers and so on in the data, which might influence other tests. There are two boundaries for the smoothing parameter: effectively a linear fit to the data at one end (which will have some valid number of runs), or an unsmoothed cubic spline at the other end (which passes exactly through each datapoint). At that boundary the residuals are a string of zeroes, so the data fed to the runs test hits exactly the issue I came upon here. For my optimiser to work, I need a valid value at the boundary conditions, hence the need to correct the runs test implementation!
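A minimal reproduction of that boundary case (a sketch; before this PR the returned z-score and p-value for such input were incorrect, which is exactly what the fix addresses):

```python
import numpy as np
from statsmodels.sandbox.stats.runs import runstest_1samp

# Residuals of an unsmoothed (interpolating) cubic spline: all zeros, so every
# value falls on the same side of the cutoff and the test sees a single run.
resid = np.zeros(25)
z, pvalue = runstest_1samp(resid, cutoff="mean", correction=True)
# With the fix, this should return a finite negative z-score and a small
# p-value instead of the incorrect values reported in the linked issue.
print(z, pvalue)
```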

@josef-pkt
Member

merging, looks good AFAIR (before I got lost in runs test extensions)

@josef-pkt josef-pkt merged commit a6f3a94 into statsmodels:main Jul 17, 2025
2 of 3 checks passed
Development

Successfully merging this pull request may close these issues.

statsmodels.sandbox.stats.runs.runstest_1samp returns incorrect values with a single run
2 participants