BUG: Fix bug in Runs.runs_test for the case of a single run yielding … by shusheer · Pull Request #9524 · statsmodels/statsmodels · GitHub

BUG: Fix bug in Runs.runs_test for the case of a single run yielding … #9524


Merged
merged 2 commits into statsmodels:main on Jul 17, 2025

Conversation

@shusheer shusheer commented Mar 5, 2025

…incorrect zscore and pvalue

@josef-pkt
Member

Thanks for pointing this out.

what is the theoretical justification for the chosen pvalue?

I need to check this.
For example, for the one-sample binomial test there are different options for the case where all observations are identical (Wald, score, exact, ...). Those result in either NaNs or values that differ by test (or confidence-interval) method.
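For illustration of that point (a sketch, not code from this PR), the confidence-interval methods already available in statsmodels.stats.proportion disagree when all observations are identical:

```python
from statsmodels.stats.proportion import proportion_confint

count, nobs = 20, 20  # degenerate case: every observation is a success
for method in ("normal", "wilson", "beta"):  # Wald, score, exact (Clopper-Pearson)
    print(method, proportion_confint(count, nobs, alpha=0.05, method=method))

# The Wald ("normal") interval collapses to (1.0, 1.0) because the estimated
# standard error is zero, while the score and exact intervals still give
# informative lower bounds.
```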

@shusheer
Author
shusheer commented Mar 5, 2025

what is the theoretical justification for the chosen pvalue?

I don't choose a fixed p-value; I calculate the probability that a sample of size N is all the same, which is 2^(1-N). I then use this to calculate a z-score assuming a normal approximation. The one bit I am slightly unsure of is that, because we have two possible states (above vs. below the threshold used for calculating the run value), I think the probability might need to be doubled. In any case, the z-score is negative because the run is longer than expected by chance.
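A minimal sketch of that calculation under the stated assumptions (hypothetical helper name, not necessarily the code in this PR): with all N observations on the same side of the cutoff, the fair-coin probability is 2^(1-N), and a z-score can be backed out of it with the normal quantile function.

```python
from scipy import stats

def single_run_zscore_pvalue(nobs):
    """Sketch for the degenerate single-run case: all nobs observations
    fall on the same side of the cutoff.

    Under an i.i.d. fair-coin null, P(all on one side) = 2 * 0.5**nobs = 2**(1 - nobs).
    """
    pvalue = 2.0 ** (1 - nobs)
    # One way to back out a z-score under a normal approximation; it is
    # negative because one run is fewer runs than expected under the null.
    zscore = stats.norm.ppf(pvalue)
    return zscore, pvalue

print(single_run_zscore_pvalue(20))  # -> roughly (-4.6, 1.9e-06)
```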

@josef-pkt
Member

I kind of understand the argument, and it looks reasonable.

However I'm a bit distracted these days (home renovation) and might need some time to figure out the details.
One possibility is to run some Monte Carlo simulations to see whether the size (rejection rate) is correct in this case.
I guess it will only be relevant in small samples.

But it looks intuitive that we should reject in large samples if all outcomes are the same, i.e. only one run is observed.

(A bit of caution: in the scipy.stats t-test we thought we had a justification for a specific non-NaN result when the variance is zero, but there are different ways of approaching a zero-variance limit, so scipy.stats switched back to returning NaNs.
I don't think this applies in this case.)
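One possible version of that simulation (a sketch, only illustrative): draw i.i.d. Bernoulli(0.5) samples under the null, apply runstest_1samp, and look at the rejection rate at a nominal level.

```python
import numpy as np
from statsmodels.sandbox.stats.runs import runstest_1samp

rng = np.random.default_rng(12345)
nobs, nrep, alpha = 20, 10_000, 0.05

reject = 0
for _ in range(nrep):
    x = rng.integers(0, 2, size=nobs)  # i.i.d. Bernoulli(0.5) under the null
    _, pvalue = runstest_1samp(x, cutoff=0.5, correction=True)
    reject += pvalue < alpha

# With small nobs the statistic is discrete, so the empirical size will not
# hit alpha exactly; the point is to check it stays in a reasonable range,
# including the rare replications where all outcomes are identical.
print("empirical rejection rate:", reject / nrep)
```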

@josef-pkt
Member
josef-pkt commented Mar 6, 2025

aside:
https://www.statsmodels.org/dev/generated/statsmodels.sandbox.stats.runs.runstest_1samp.html
The docstring does not specify what the null and alternative hypotheses are, nor under what assumptions the p-value is computed.
My guess is that the null assumption is i.i.d. Bernoulli with p=0.5. Then we would reject if either p != 0.5 or the observations are not sequentially independent.

Based on the comments and notes the functions were largely based on SAS documentation.

(AFAIR I had given up on "exact" distributions for runs as too complicated and not worth the effort. Some results are still in the sandbox module.)

another aside
"This tests whether all groups have the same fraction of observations above the median"
in https://www.statsmodels.org/dev/generated/statsmodels.sandbox.stats.runs.median_test_ksample.html
sounds strange; half of the observations are above the median.
This should mention the pooled median to be more explicit.

(It looks like I have not worked on this since 2013. Another module to see what can be moved out of the sandbox and what is unfinished experimental code.)

update
SAS has a runs test in the AUTOREG procedure
https://go.documentation.sas.com/doc/en/pgmsascdc/v_060/etsug/etsug_autoreg_details27.htm
but not much information on it except the formula.
There is no mention of the case where all events are identical; the formula has a division by zero there.

update
The module docstring mentions the NPAR procedure in SAS, but SAS does not have that procedure.
SPSS has NPAR, which also includes the runs test, the one-sample test, and the Wald-Wolfowitz two-sample test.
The SPSS algorithm also applies the 0.5 small-sample correction for nobs < 50.

It is likely that I mixed up the references when I wrote the docstrings.
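For reference, the textbook Wald-Wolfowitz statistic with that 0.5 small-sample correction looks roughly like the sketch below (the standard formula, not necessarily the exact statsmodels implementation); it also makes the division by zero in the all-identical case visible, because n1 or n2 is zero there and the variance vanishes.

```python
import numpy as np
from scipy import stats

def runs_z(n1, n2, nruns, correction=True):
    """Textbook runs-test z-statistic: n1/n2 observations above/below the
    cutoff, nruns observed runs (a sketch, not the statsmodels code)."""
    n = n1 + n2
    mu = 2.0 * n1 * n2 / n + 1.0                                     # expected runs
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))   # variance of runs
    diff = nruns - mu
    if correction and n < 50:
        # SPSS-style continuity correction for small samples
        diff = np.sign(diff) * max(abs(diff) - 0.5, 0.0)
    # If n1 == 0 or n2 == 0 (a single run), var is 0 and this divides by zero.
    z = diff / np.sqrt(var)
    return z, 2 * stats.norm.sf(abs(z))

print(runs_z(12, 8, 6))  # many fewer runs than expected -> negative z
```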

@shusheer
Author
shusheer commented Mar 6, 2025

There are a few test failures, but these seem entirely unrelated to the runs test, so I presume they reflect other issues in the wider package. I have therefore not reviewed them in any detail.

FAILED statsmodels/stats/tests/test_deltacov.py::TestDeltacovOLS::test_ttest

FAILED statsmodels/tsa/forecasting/tests/test_theta.py::test_auto - Assertion...

FAILED statsmodels/regression/tests/test_regression.py::test_summary_as_latex
FAILED statsmodels/distributions/copula/tests/test_copula.py::TestGaussianCopula::test_cdf
FAILED statsmodels/regression/tests/test_theil.py::TestTheilTextile::test_summary
FAILED statsmodels/iolib/tests/test_summary2.py::test_ols_summary_rsquared_label
FAILED statsmodels/sandbox/stats/tests/test_multicomp.py::test_tukey_pvalues
ERROR statsmodels/distributions/tests/test_mixture.py
ERROR statsmodels/sandbox/distributions/tests/test_extras.py
ERROR statsmodels/sandbox/distributions/tests/test_multivariate.py
ERROR statsmodels/sandbox/distributions/tests/test_norm_expan.py
ERROR statsmodels/sandbox/distributions/tests/test_transf.py

@josef-pkt
Member

The unit test failures are unrelated and can be ignored here.
The larger last group is most likely due to some recent scipy changes to the multivariate normal cdf.

@josef-pkt
Member

Out of curiosity, how did you run into this problem with only one run?
If the threshold is the mean or median, then there have to be at least 2 runs (in the case of a trend or level shift).

@shusheer
Author
shusheer commented Mar 7, 2025

Out of curiosity, how did you run into this problem with only one run? If the threshold is the mean or median, then there have to be at least 2 runs (in the case of a trend or level shift).

I'm a very minor author on the CSAPS package https://github.com/espdev/csaps which is a smoothing cubic spline package. When doing cubic spline smoothing, the question naturally arises as to "what is the correct level of smoothing" for arbitrary data, such that you can implement auto-smoothing.

My solution is to maximise the probability from the runs test, thus avoiding the problem of outliers and so on in the data, which might influence other tests. There are two boundaries for the smoothing parameter: effectively a linear fit to the data at one end (which will have some valid number of runs), or an unsmoothed cubic spline at the other end (which passes exactly through each datapoint). At that boundary the residuals are a string of zeroes, so the data fed to the runs test hits exactly the issue I came upon here. For my optimiser to work, I need a valid value at the boundary conditions, hence the need to correct the runs test implementation!
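A minimal reproduction of that boundary case (a sketch; before this PR the returned z-score and p-value for such input were incorrect, which is exactly what the fix addresses):

```python
import numpy as np
from statsmodels.sandbox.stats.runs import runstest_1samp

# Residuals of an unsmoothed (interpolating) cubic spline: all zeros, so every
# value falls on the same side of the cutoff and the test sees a single run.
resid = np.zeros(25)
z, pvalue = runstest_1samp(resid, cutoff="mean", correction=True)
# With the fix, this should return a finite negative z-score and a small
# p-value instead of the incorrect values reported in the linked issue.
print(z, pvalue)
```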

@josef-pkt
Member

merging, looks good AFAIR (before I got lost in runs test extensions)

@josef-pkt josef-pkt merged commit a6f3a94 into statsmodels:main Jul 17, 2025
2 of 3 checks passed
Development

Successfully merging this pull request may close these issues.

statsmodels.sandbox.stats.runs.runstest_1samp returns incorrect values with a single run
2 participants