ENH: multivariate normality tests by josef-pkt · Pull Request #9564 · statsmodels/statsmodels · GitHub

ENH: multivariate normality tests #9564

Open: josef-pkt wants to merge 1 commit into main

josef-pkt
Member

resurrecting some old code of mine

Jarque-Bera type tests for multivariate normality.
Tests based on skewness and kurtosis of orthogonalized data, and possibly tests based on Mardia's multivariate skewness and kurtosis.
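
For orientation, here is a minimal sketch of Mardia's multivariate skewness and kurtosis tests in their textbook form. It is not the code in this PR; the function name `mardia_test` and its return layout are made up for illustration, and it uses only the standard asymptotic chi-square and normal approximations.

```python
import numpy as np
from scipy import stats


def mardia_test(x):
    """Mardia's skewness and kurtosis tests (illustrative sketch only)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    xc = x - x.mean(axis=0)
    # Mardia's statistics use the maximum likelihood (1/n) covariance.
    s = xc.T @ xc / n
    # d[i, j] = (x_i - mean)' S^{-1} (x_j - mean)
    d = xc @ np.linalg.solve(s, xc.T)
    b1 = (d ** 3).sum() / n ** 2        # multivariate skewness
    b2 = (np.diag(d) ** 2).sum() / n    # multivariate kurtosis

    # n * b1 / 6 is asymptotically chi2 with p(p+1)(p+2)/6 degrees of freedom.
    skew_stat = n * b1 / 6
    df = p * (p + 1) * (p + 2) / 6
    skew_pvalue = stats.chi2.sf(skew_stat, df)

    # b2 is asymptotically normal with mean p(p+2) and variance 8p(p+2)/n.
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    kurt_pvalue = 2 * stats.norm.sf(abs(kurt_stat))
    return (skew_stat, skew_pvalue), (kurt_stat, kurt_pvalue)


# Simulated normal data, only to show the call.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3))
(skew_stat, skew_pvalue), (kurt_stat, kurt_pvalue) = mardia_test(x)
```

Both statistics are invariant to affine transformations of the data, so no choice between cov and corr arises for them.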

see #382, #6194

AFAICS, this old version does not have the Doornik-Hansen (DH) orthogonalization, which orthogonalizes the correlation matrix instead of the covariance matrix.
(There are comments that the code does not replicate the skew and kurtosis in the DH article or working paper.)

Several more recent articles orthogonalize based on the covariance matrix. So we need an option for cov versus corr, i.e. whether or not to scale the data by the standard deviation before orthogonalization.
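
A rough sketch of what such an option could look like. The function name `jb_orthogonalized` and the keyword `use_corr` are hypothetical and not taken from the PR. The statistic here is a plain Jarque-Bera sum over the orthogonalized components; as far as I understand, the DH test additionally transforms skewness and kurtosis before summing, which this sketch leaves out.

```python
import numpy as np
from scipy import stats


def jb_orthogonalized(x, use_corr=False):
    """Jarque-Bera type test on orthogonalized data (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    xc = x - x.mean(axis=0)
    if use_corr:
        # Scale by the standard deviation first, so the matrix that is
        # decomposed below is the correlation matrix.
        xc = xc / xc.std(axis=0)
    m = xc.T @ xc / n
    evals, evecs = np.linalg.eigh(m)
    # Symmetric inverse square root: evecs @ diag(evals**-0.5) @ evecs.T
    m_invhalf = (evecs / np.sqrt(evals)) @ evecs.T
    z = xc @ m_invhalf  # columns have mean 0, variance 1, zero correlation

    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0)
    stat = n * (skew ** 2).sum() / 6 + n * ((kurt - 3) ** 2).sum() / 24
    pvalue = stats.chi2.sf(stat, 2 * p)
    return stat, pvalue, skew, kurt


# Simulated data, only to show the call with both settings.
rng = np.random.default_rng(0)
x = rng.standard_normal((300, 3))
stat_cov, pvalue_cov, skew_cov, kurt_cov = jb_orthogonalized(x)
stat_corr, pvalue_corr, skew_corr, kurt_corr = jb_orthogonalized(x, use_corr=True)
```

With `use_corr=True` the result does not change when columns are rescaled; with the covariance version it generally does, because the symmetric square root is not scale equivariant.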

@josef-pkt
Member Author
josef-pkt commented May 8, 2025

Two problems:
The data in statsmodels\statsmodels\genmod\tests\results\iris.csv (which I had used in my examples) are modified and are not the original data. I loaded the data from sklearn, which match the DH example in the appendix.

sigmainvhalf is missing the post-multiplication with evecs.T.

After fixing these, I can replicate the DH example results: orthogonalized skew and kurtosis, test statistic and p-value.
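
To illustrate why the post-multiplication with evecs.T matters, here is a small sketch on an arbitrary positive definite matrix (random numbers, not the iris data). Only the names `sigmainvhalf` and `evecs` come from the comment above; everything else is made up for the example. Both versions whiten the data, but only the one with `evecs.T` is the symmetric inverse square root; the other differs from it by a rotation, which changes the component-wise skew and kurtosis.

```python
import numpy as np

rng = np.random.default_rng(12345)
a = rng.standard_normal((4, 4))
sigma = a @ a.T + 4 * np.eye(4)  # arbitrary positive definite matrix

evals, evecs = np.linalg.eigh(sigma)

# Without the post-multiplication: evecs @ diag(evals**-0.5)
incomplete = evecs / np.sqrt(evals)
# With it: the symmetric inverse square root of sigma
sigmainvhalf = incomplete @ evecs.T

# Both transformations produce an identity covariance ...
whitens_incomplete = np.allclose(incomplete.T @ sigma @ incomplete, np.eye(4))
whitens_full = np.allclose(sigmainvhalf.T @ sigma @ sigmainvhalf, np.eye(4))

# ... but only the symmetric version squares to the inverse of sigma.
sigma_inv = np.linalg.inv(sigma)
full_is_inv_sqrt = np.allclose(sigmainvhalf @ sigmainvhalf, sigma_inv)
incomplete_is_inv_sqrt = np.allclose(incomplete @ incomplete, sigma_inv)
is_symmetric = np.allclose(sigmainvhalf, sigmainvhalf.T)
```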

Note on numerical precision
This uses the eigenvalue decomposition of cov or corr, following the articles, and not an SVD of the original data.
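
As a sketch of how the two routes relate (simulated data, hypothetical variable names, not code from the PR): the eigenvalues of the covariance matrix are the squared singular values of the centered data divided by n, and the symmetric whitening can be written directly from the SVD factors. Forming the covariance matrix first squares the condition number of the problem, which is the usual precision argument for the SVD route.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((150, 4)) @ rng.standard_normal((4, 4))
xc = x - x.mean(axis=0)
n = xc.shape[0]

# Route 1: eigenvalue decomposition of the (1/n) covariance matrix.
evals, evecs = np.linalg.eigh(xc.T @ xc / n)
z_eig = xc @ (evecs / np.sqrt(evals)) @ evecs.T

# Route 2: SVD of the centered data, xc = u @ diag(s) @ vt.
u, s, vt = np.linalg.svd(xc, full_matrices=False)
evals_svd = np.sort(s ** 2 / n)   # eigh returns ascending eigenvalues
z_svd = np.sqrt(n) * u @ vt       # same symmetric whitening, no cov formed

same_evals = np.allclose(evals, evals_svd)
same_z = np.allclose(z_eig, z_svd)
```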

update
iris.csv was added in 2011 as a test case for perfect separation.
I don't see any info about changes to the original values
dd1f2bf

two lines have been changed:

idx = np.nonzero((iris_sk.data - iris[:, :-1]).sum(1))
idx, (iris_sk.data - iris[:, :-1])[idx]
((array([34, 37], dtype=int64),),
 array([[ 0. ,  0. ,  0. ,  0.1],
        [ 0. ,  0.5, -0.1,  0. ]]))
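
The row-sum check above finds the changed rows here. In general, differences within a row could cancel in the sum, so an element-wise comparison is a safer variant. A small sketch with made-up numbers (not the iris values; the array names are hypothetical):

```python
import numpy as np

# Made-up values that are exactly representable in binary floating point.
original = np.array([[5.0, 3.5, 1.5, 0.25],
                     [4.5, 3.0, 1.5, 0.25],
                     [4.75, 3.25, 1.25, 0.25]])
modified = original.copy()
modified[1] += [0.5, -0.5, 0.0, 0.0]  # the two changes cancel in the row sum

diff = original - modified

# Row-sum check: misses row 1 because its differences sum to zero.
idx_sum = np.nonzero(diff.sum(1))[0]

# Element-wise check: flags any row with at least one differing entry.
idx_any = np.nonzero(~np.isclose(original, modified).all(axis=1))[0]
```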

@josef-pkt
Member Author

Multivariate mean and covariance tests are in statsmodels.stats.
I'm not sure that putting the multivariate normality tests in statsmodels.multivariate is better than in stats. But the currently added module is private, so the public path is still open.
