10000 ENH: stats: Add matrix variate t distribution by johnmdusel · Pull Request #22925 · scipy/scipy · GitHub

ENH: stats: Add matrix variate t distribution #22925


Open · wants to merge 55 commits into main

Conversation

@johnmdusel commented May 2, 2025

Reference issue

Closes issue #22870.

What does this implement/fix?

Background explanation is in this post on the developer forum.

Additional information

From the PR checklist:

  • This code can be distributed under a BSD license. I followed the existing random matrix implementation for data validation and wrote the methods from scratch.

  • I provided unit tests in stats.tests.test_multivariate.TestMatrixT. These are mostly similar to the ones for matrix_normal, except that the moment checking and sample checking also perform a spot-check of two basic theorems from Gupta and Nagar (2000). All unit tests pass locally.

  • Re: style: I used black to format my additions to stats._multivariate. For unit tests and benchmarking I followed the style of existing code.

  • I provided benchmarks in stats.MatrixSampling. This also includes benchmarks for rvs methods of matrix_normal and invwishart.

  • All public functions have docstrings with examples, and I've verified that the documentation builds locally and renders correctly.

… that frozen instance checks for singularity of spread matrices.
    - Completed `test_bad_input`
    - Stubs of other tests
    - Adds unit tests for bad input and default inputs and covariance expansion to
      matrix variate t test class.
    - Fixes bugs in matrix variate t distribution which were exposed by those tests.
    - Adds unit tests for array input to matrix variate t test class.

    - Fixes bugs in matrix variate t distribution which were exposed by those tests.

        _logpdf did not expect to receive an array input with > 3 dimensions and its Einstein sum
        is defined based on that expectation. So, if there are > 3 dimensions then _logpdf will
        flatten out the extra ones, perform the calculation, then reshape before returning.
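The flatten-compute-reshape pattern described in that commit message can be sketched as follows. This is a hedged illustration with hypothetical names, not the PR's actual `_logpdf`; the real method uses an Einstein sum where the dummy kernel below just sums entries:

```python
import numpy as np

def _logpdf_batched(x, core_logpdf, p, q):
    """Evaluate a log-PDF whose core computation only handles inputs of
    shape (n, p, q), by flattening any extra leading batch axes first.

    `core_logpdf` is a hypothetical stand-in for an einsum-based
    calculation defined for a single batch dimension.
    """
    x = np.asarray(x)
    batch_shape = x.shape[:-2]         # leading axes beyond the matrix dims
    flat = x.reshape((-1, p, q))       # collapse to a single batch axis
    out = core_logpdf(flat)            # shape (prod(batch_shape),)
    return out.reshape(batch_shape)    # restore the original batch shape

# Example: a dummy "log-pdf" that just sums the entries of each matrix
core = lambda a: a.sum(axis=(-2, -1))
x = np.arange(2 * 3 * 2 * 2, dtype=float).reshape(2, 3, 2, 2)
print(_logpdf_batched(x, core, 2, 2).shape)  # (2, 3)
```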
Covers matrix normal, matrix t, and inverse Wishart.
@github-actions github-actions bot added scipy.stats enhancement A new feature or improvement labels May 2, 2025
@tylerjereddy (Contributor) left a comment

Looks like you made a solid effort to follow our usual practices. I added a few surface-level comments ahead of the review by stats regulars/domain experts.

johnmdusel added 2 commits May 3, 2025 06:13
…t and remove leftover debugging lines in `_multivariate.py`
@johnmdusel (Author) commented

Is this PR still active? Some tests show as failed, but I’m unsure how to resolve them on my end. I’m happy to make any necessary changes to fix them.

@lucascolley (Member) left a comment

The lint failures shown at https://github.com/scipy/scipy/actions/runs/14981775469/job/42087379243?pr=22925 are an easy fix; they are just due to some lines being too long.

@dschmitz89 (Contributor) left a comment

This looks very promising, thank you @johnmdusel . Could you add references to the paper in the docstring?

About tests: how can we evaluate that this is indeed the matrix t distribution? Can we test the PDF method against some external reference? Or is there a special case where this reduces to a distribution we already have implemented?

@johnmdusel (Author) commented

This looks very promising, thank you @johnmdusel . Could you add references to the paper in the docstring?

Yes, I will add the reference to Gupta and Nagar.

About tests: how can we evaluate that this is indeed the matrix t distribution? Can we test the PDF method against some external reference? Or is there a special case where this reduces to a distribution we already have implemented?

When I was developing this class, I tested the rvs, pdf, and logpdf methods against Mathematica's MatrixTDistribution. The matrix_t_gen implemented here does agree with the Mathematica version. I can add some or all of that validation to the TestMatrixT class, if there's interest.
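One cross-check that needs no external software is the scalar special case raised above: for 1 × 1 matrices with unit spread parameters and zero location, the Gupta and Nagar (2000) matrix variate t density reduces to a rescaled univariate Student t. A minimal sketch, where `matrix_t_pdf_1x1` is a hypothetical stand-alone implementation rather than the PR's `matrix_t_gen`:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def matrix_t_pdf_1x1(x, nu):
    """Gupta & Nagar matrix variate t density for p = q = 1, with zero
    location and unit spread: proportional to (1 + x^2)^(-(nu + 1)/2)."""
    log_c = gammaln((nu + 1) / 2) - 0.5 * np.log(np.pi) - gammaln(nu / 2)
    return np.exp(log_c - (nu + 1) / 2 * np.log1p(x**2))

nu = 4.0
x = np.linspace(-3, 3, 7)
# For 1x1 matrices the density equals a rescaled Student t with nu dof:
# f(x) = sqrt(nu) * t_nu.pdf(sqrt(nu) * x)
expected = np.sqrt(nu) * stats.t(df=nu).pdf(np.sqrt(nu) * x)
np.testing.assert_allclose(matrix_t_pdf_1x1(x, nu), expected, rtol=1e-12)
```

The rescaling appears because the Gupta–Nagar parametrization omits the 1/nu factor inside the kernel that the standard Student t carries.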

Motivated by their role in the `rvs` method
@johnmdusel johnmdusel requested a review from dschmitz89 July 2, 2025 13:54
…terface consistency

Post `0dceecb` they need to be `row_spread=1` and `col_spread=1` instead of `None`. This is how they're defined in `matrix_normal_gen.__call__`
@mdhaber (Contributor) commented Jul 3, 2025

Would you have time to also have a look @mdhaber (especially the mathematica reference)?

I'm limited to using Mathematica version "14.2.1 for Linux x86 (64-bit) (March 16, 2025)". When I execute that code, I don't get the same sample or pdf values.

@dschmitz89 (Contributor) commented

Would you have time to also have a look @mdhaber (especially the mathematica reference)?

I'm limited to using Mathematica version "14.2.1 for Linux x86 (64-bit) (March 16, 2025)". When I execute that code, I don't get the same sample or pdf values.

Hm, unfortunate. Can we do a minimal test with a fixed set of matrices to test the pdf values if we cannot rely on the random variates? I do not expect differences in the first few digits between different mathematica versions. @johnmdusel @mdhaber

@johnmdusel (Author) commented Jul 4, 2025

Can we do a minimal test with a fixed set of matrices to test the pdf values if we cannot rely on the random variates?

@dschmitz89 I could set up a minimal Julia script to generate random variates and compute PDF values. Would that be alright, as an alternative to Mathematica? I would provide a Dockerfile and a .jl file and copy-paste the outputs as is currently done in the unit test.

@mdhaber mdhaber closed this Jul 4, 2025
@mdhaber (Contributor) commented Jul 4, 2025

Oops, that was a misclick. I'll see if I can just verify the PDF values @johnmdusel generated.

@mdhaber mdhaber reopened this Jul 4, 2025
@johnmdusel (Author) commented

@dschmitz89 @mdhaber I added a unit test that checks PDF values against Julia. The source is in the test's docstring and in this small repo. All tests pass locally.

@dschmitz89 (Contributor) left a comment

I am happy with this PR and the pdf values were also confirmed against an external source (Mathematica). There is one last stylistic nit regarding tests: we typically use relative tolerances, not absolute ones. Will wait for other reviewers before merging.

@scipy scipy deleted a comment from abuazamatov1919 Jul 10, 2025
@johnmdusel (Author) commented

@rgommers The tests against Julia and Mathematica are now based on a handful of reference values, instead of a file.

@johnmdusel (Author) commented Jul 16, 2025

@dschmitz89 -- questions about the failing CI checks above

  • Linux tests: The test_samples test took too long; it takes more samples for the test to pass with a lower relative tolerance. How would you like me to manage this tradeoff? With 10k samples it will pass with a 5% relative tolerance; with 100k it will pass with 2.5% but will take > 1 second.

  • Windows tests: This seems to be related to HiGHS; can I assume it'll be fixed in some other PR?
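The sample-count/tolerance tradeoff above is generic to Monte Carlo moment checks: the sample mean converges at O(1/sqrt(n)), so halving the relative tolerance needs roughly four times the samples. A hedged sketch using a normal stand-in, not the actual `test_samples` code:

```python
import numpy as np

rng = np.random.default_rng(12345)

def check_sample_mean(draw, true_mean, n, rtol):
    """Monte Carlo moment check: draw n samples and compare their mean
    against the analytic mean at the given relative tolerance."""
    samples = draw(n)
    np.testing.assert_allclose(samples.mean(axis=0), true_mean, rtol=rtol)

# Illustration with a normal stand-in for the distribution's entries:
draw = lambda n: rng.normal(loc=5.0, scale=1.0, size=(n, 2, 2))
check_sample_mean(draw, 5.0, n=10_000, rtol=0.05)    # loose tolerance, fast
check_sample_mean(draw, 5.0, n=100_000, rtol=0.025)  # tighter, ~10x slower
```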

@dschmitz89 (Contributor) commented

@dschmitz89 -- questions about the failing CI checks above

  • Linux tests: The test_samples test took too long; it takes more samples for the test to pass with a lower relative tolerance. How would you like me to manage this tradeoff? With 10k samples it will pass with a 5% relative tolerance; with 100k it will pass with 2.5% but will take > 1 second.

Let's go with the 5% tolerance for simplicity. Sorry about the churn.

  • Windows tests: This seems to be related to HiGHS; can I assume it'll be fixed in some other PR?

That one is already fixed in main.

@johnmdusel (Author) commented

@dschmitz89 -- questions about the failing CI checks above

  • Linux tests: The test_samples test took too long; it takes more samples for the test to pass with a lower relative tolerance. How would you like me to manage this tradeoff? With 10k samples it will pass with a 5% relative tolerance; with 100k it will pass with 2.5% but will take > 1 second.

Let's go with the 5% tolerance for simplicity. Sorry about the churn.

No problem! I made the change.

@johnmdusel (Author) commented

@rgommers The tests against Julia and Mathematica are now based on a handful of reference values, instead of a file.

@rgommers would you be able to approve if no further changes are required? On the other hand, I'm happy to make further changes as needed. Thank you very much!

Successfully merging this pull request may close these issues.

ENH: stats: add matrix t-distribution
7 participants